Data & Data Governance
Controls on training, validation, and testing data — quality, representativeness, bias examination, and documentation.
Required by: ISO/IEC 42001, EU AI Act, GDPR Art. 22
Why this obligation matters
Data governance under EU AI Act Article 10 is one of the most operationally demanding obligations. High-risk AI providers must ensure that training, validation, and testing data sets are relevant, sufficiently representative, and to the best extent possible free of errors and complete in view of the intended purpose.
Article 10 also requires examination of possible biases, identification of relevant data gaps, and appropriate measures to detect, prevent, and mitigate such biases. Article 10(5) allows providers to process special categories of personal data, as defined in GDPR Article 9, where strictly necessary to detect and correct bias, subject to safeguards.
These requirements intersect with GDPR Article 5 on data minimisation and accuracy, and with GDPR Article 9 on special-category data.
What vendors typically provide
Data governance for AI is a mature category. Established vendors handle data lineage, quality metrics, sensitive-attribute discovery, bias-detection scans, and access controls aligned to the AI use case.
Capabilities to look for:
- Lineage from raw source to model-ready feature, with provenance for every input.
- Automated detection of sensitive attributes (race, gender, age, disability, location).
- Bias-detection scans against a defined protected-attribute taxonomy.
- Synthetic-data and rebalancing tooling to address identified gaps.
- Access controls and audit logs that satisfy GDPR Article 32 alongside EU AI Act Article 10.
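To make the sensitive-attribute capability concrete, here is a minimal sketch of column-name scanning. The `SENSITIVE_TOKENS` taxonomy and the `flag_sensitive_columns` helper are hypothetical names invented for illustration; they are not any vendor's API, and real products typically inspect values as well as names.

```python
import re

# Hypothetical keyword taxonomy (an assumption for this sketch).
# A production taxonomy would come from legal and domain review.
SENSITIVE_TOKENS = {
    "race": {"race", "ethnicity", "ethnic"},
    "gender": {"gender", "sex"},
    "age": {"age", "dob", "birth", "birthdate"},
    "disability": {"disability", "disabled"},
    "location": {"zip", "postcode", "postal", "city", "region", "country"},
}


def flag_sensitive_columns(columns):
    """Map each column name to the sensitive categories its tokens suggest."""
    flagged = {}
    for column in columns:
        # Split on non-alphanumerics so "page_views" does not match "age".
        # camelCase names are not split; that is a known limit of this sketch.
        tokens = set(re.split(r"[^a-z0-9]+", column.lower()))
        categories = [
            category
            for category, keywords in SENSITIVE_TOKENS.items()
            if tokens & keywords
        ]
        if categories:
            flagged[column] = categories
    return flagged


if __name__ == "__main__":
    print(flag_sensitive_columns(["customer_id", "date_of_birth", "zip_code"]))
```

Name matching of this kind only surfaces candidates for review. Proxy attributes, such as a postcode standing in for ethnicity, still need human judgment.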
Compliance checklist
- [ ] Catalogue every data source used for training, validation, and testing.
- [ ] Document the relevance of each source to the intended purpose.
- [ ] Run a representativeness analysis against the intended population.
- [ ] Scan for and document errors, gaps, and biases.
- [ ] Apply documented mitigation when biases are found.
- [ ] If processing special-category data under Article 10(5), document the safeguards.
- [ ] Re-run the data governance analysis after every retraining.
- [ ] Tie data governance findings to the risk management system (EU AI Act Article 9).
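The representativeness item in the checklist above can be sketched as a comparison between group shares in the data set and shares in the intended population. The function name `representation_gaps`, the 5-point `tolerance`, and the reference shares are all assumptions made for illustration; Article 10 does not prescribe a threshold.

```python
from collections import Counter


def representation_gaps(records, attribute, reference_shares, tolerance=0.05):
    """Return groups whose share of `records` deviates from the reference
    population share by more than `tolerance` (absolute difference)."""
    counts = Counter(record[attribute] for record in records)
    total = sum(counts.values())
    gaps = {}
    for group, expected in reference_shares.items():
        observed = counts.get(group, 0) / total if total else 0.0
        if abs(observed - expected) > tolerance:
            gaps[group] = {"observed": round(observed, 3), "expected": expected}
    return gaps


if __name__ == "__main__":
    # Illustrative data only: 100 training records bucketed by age band.
    training = (
        [{"age_band": "18-39"}] * 60
        + [{"age_band": "40-64"}] * 30
        + [{"age_band": "65+"}] * 10
    )
    # Hypothetical shares for the intended deployment population.
    population = {"18-39": 0.45, "40-64": 0.33, "65+": 0.22}
    print(representation_gaps(training, "age_band", population))
```

With these illustrative numbers, the 18-39 band is over-represented and the 65+ band under-represented, while 40-64 falls inside the tolerance. Recording the reference source and the tolerance alongside the result supports the documentation items in the same checklist.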
Common gaps we see
Three patterns dominate.
First, organizations document data sources but not provenance. Saying the data came from "internal CRM" does not satisfy Article 10. The CRM data came from somewhere upstream, and that origin chain matters for relevance, representativeness, and lawful basis.
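As a rough illustration of the difference between a source and its provenance, the sketch below records each source with a link to whatever fed it, then walks the chain. `SourceRecord`, its fields, and the example values are hypothetical; a real catalogue would carry more metadata and allow multiple upstream sources.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SourceRecord:
    name: str
    collected_by: str               # who originally gathered the data
    lawful_basis: Optional[str]     # None means the basis is undocumented
    upstream: Optional["SourceRecord"] = None  # where this source got its data


def provenance_chain(source):
    """Walk from a source back to its earliest documented origin."""
    chain = []
    node = source
    while node is not None:
        chain.append(node)
        node = node.upstream
    return chain


def undocumented_links(source):
    """Names of sources in the chain with no recorded lawful basis."""
    return [s.name for s in provenance_chain(source) if not s.lawful_basis]


if __name__ == "__main__":
    # Illustrative values only.
    leads = SourceRecord("purchased lead list", "third-party broker", None)
    crm = SourceRecord("internal CRM", "sales ops", "legitimate interest", leads)
    print([s.name for s in provenance_chain(crm)])
    print(undocumented_links(crm))
```

In this example the CRM entry looks complete on its own, yet the chain shows an upstream list with no recorded lawful basis, which is the kind of gap a source-only catalogue hides.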
Second, bias is examined only along one or two protected attributes (typically gender and race) when the system's intended deployment context implicates more (age, disability, language, region). Annex III high-risk categories often involve multiple protected attributes simultaneously.
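A small sketch shows why a one-attribute scan can mislead. It computes, for each attribute, the ratio of the lowest to the highest positive-outcome rate across groups. The function name, the `approved` outcome key, and the sample records are assumptions for illustration; the metric is one common choice, not one mandated by Article 10.

```python
from collections import defaultdict


def selection_rate_ratios(records, attributes, outcome_key="approved"):
    """For each attribute, return min group rate / max group rate.
    A value of 1.0 means all groups have equal positive-outcome rates."""
    results = {}
    for attribute in attributes:
        positives = defaultdict(int)
        totals = defaultdict(int)
        for record in records:
            group = record[attribute]
            totals[group] += 1
            if record[outcome_key]:
                positives[group] += 1
        rates = {group: positives[group] / totals[group] for group in totals}
        highest = max(rates.values())
        results[attribute] = min(rates.values()) / highest if highest > 0 else 1.0
    return results


if __name__ == "__main__":
    # Illustrative records: outcomes are balanced by gender but not by age.
    records = [
        {"gender": g, "age_band": a, "approved": ok}
        for g in ("F", "M")
        for a, ok in (
            ("under_40", True), ("under_40", True),
            ("over_40", False), ("over_40", True),
        )
    ]
    print(selection_rate_ratios(records, ["gender", "age_band"]))
```

Here the gender ratio is 1.0 while the age ratio is 0.5, so a scan limited to gender would report no disparity. Intersectional groups (for example, gender within each age band) would need a further pass that this sketch does not cover.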
Third, mitigation is applied once and never re-verified. Bias scores from the original training set are reported as if they apply to every subsequent model version. Article 10 implicitly requires bias work to keep pace with model updates.
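One way to catch stale bias results, sketched under assumptions, is to bind each report to the model version and a fingerprint of the training data it was computed on, then refuse to reuse it when either changes. `BiasReport`, `dataset_fingerprint`, and `report_is_current` are hypothetical names, and hashing full rows as JSON is only practical for small data sets.

```python
import hashlib
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class BiasReport:
    model_version: str
    dataset_hash: str


def dataset_fingerprint(rows):
    """Order-independent SHA-256 fingerprint of a list of JSON-serialisable rows."""
    canonical_rows = sorted(json.dumps(row, sort_keys=True) for row in rows)
    return hashlib.sha256("\n".join(canonical_rows).encode("utf-8")).hexdigest()


def report_is_current(report, deployed_version, deployed_rows):
    """True only if the report matches both the deployed model and its data."""
    return (
        report.model_version == deployed_version
        and report.dataset_hash == dataset_fingerprint(deployed_rows)
    )


if __name__ == "__main__":
    original = [{"id": 1, "age_band": "18-39"}, {"id": 2, "age_band": "65+"}]
    report = BiasReport("v1", dataset_fingerprint(original))
    retrained = original + [{"id": 3, "age_band": "40-64"}]
    print(report_is_current(report, "v1", original))
    print(report_is_current(report, "v2", retrained))
```

A check like this could gate a release pipeline so that a retrained model cannot ship with bias scores inherited from an earlier version.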
Regulator guidance and primary sources
- EU AI Act Article 10: Data and Data Governance
- GDPR Article 5: Principles relating to processing
- GDPR Article 9: Special categories of personal data
- GDPR Article 22: Automated decision-making — relevant when data governance bears on consequential decisions.
- NIST Special Publication 1270, "Towards a Standard for Identifying and Managing Bias in Artificial Intelligence": the standard methodology US regulators cite.
Vendors that support this obligation
| Vendor | HQ | Founded | Size | Pricing | Last verified |
|---|---|---|---|---|---|
| Credo AI | Palo Alto, US | 2020 | 51-200 | Contact sales for enterprise subscription quote. | Apr 26, 2026 |
| Fiddler AI | Palo Alto, US | 2018 | 51-200 | Contact for pricing | Apr 26, 2026 |
| Arthur | New York, US | 2019 | 51-200 | Contact for pricing | Apr 26, 2026 |
| Monitaur | Boston, US | 2019 | 11-50 | Enterprise annual subscription; no public pricing listed. Forrester Wave cited 'pricing flexibility and transparency' as a highest-score criterion. Contact sales for quotes. | Apr 22, 2026 |
| Trustible | Arlington, US | 2023 | 11-50 | Contact sales for enterprise pricing; no public plans listed | Apr 23, 2026 |
| FairNow | McLean, US | 2023 | 11-50 | Contact sales for quote; no public pricing listed | Apr 26, 2026 |
| Fairly AI | Kitchener, Canada | 2020 | 11-50 | On-premises or private-cloud deployments; quote-based. | Apr 21, 2026 |