Data & Data Governance

Controls on training, validation, and testing data — quality, representativeness, bias examination, and documentation.

Required by: ISO/IEC 42001, EU AI Act, GDPR Art. 22

Why this obligation matters

Data governance under EU AI Act Article 10 is one of the most operationally demanding obligations in the Act. Providers of high-risk AI systems must ensure that training, validation, and testing data sets are relevant, sufficiently representative, and, to the best extent possible, free of errors and complete in view of the intended purpose.

Article 10 requires examination of possible biases, identification of relevant data gaps, and appropriate measures to detect, prevent, and mitigate such biases. Article 10(5) also allows providers to process special categories of personal data (as defined in GDPR Article 9) where strictly necessary to detect and correct biases, subject to safeguards.

These requirements intersect with GDPR Article 5 on data minimisation and accuracy, and with GDPR Article 9 on special-category data.

What vendors typically provide

Data governance for AI is a mature category. Established vendors handle data lineage, quality metrics, sensitive-attribute discovery, bias-detection scans, and access controls aligned to the AI use case.

Capabilities to look for:

Lineage from raw source to model-ready feature, with provenance for every input.
Automated detection of sensitive attributes (race, gender, age, disability, location).
Bias-detection scans against a defined protected-attribute taxonomy.
Synthetic-data and rebalancing tooling to address identified gaps.
Access controls and audit logs that satisfy GDPR Article 32 alongside EU AI Act Article 10.

Compliance checklist

[ ] Catalogue every data source used for training, validation, and testing.
[ ] Document the relevance of each source to the intended purpose.
[ ] Run a representativeness analysis against the intended population.
[ ] Scan for and document errors, gaps, and biases.
[ ] Apply documented mitigation when biases are found.
[ ] If processing special-category data under EU AI Act Article 10(5), document the safeguards.
[ ] Re-run the data governance analysis after every retraining.
[ ] Tie data governance findings to the risk management system (EU AI Act Article 9).
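
The representativeness item in the checklist can be made concrete by comparing group shares in a data set against the shares expected in the intended population. The sketch below is only an illustration, not a method prescribed by Article 10: the function name `representation_gaps`, the 5-percentage-point tolerance, the `age_band` attribute, and the reference shares are all hypothetical assumptions.

```python
from collections import Counter

def representation_gaps(records, attribute, reference_shares, tolerance=0.05):
    """Compare group shares in a data set against reference population shares.

    Returns {group: (dataset_share, reference_share)} for every group whose
    absolute difference exceeds `tolerance`. The names and the default
    tolerance are illustrative assumptions, not regulatory thresholds.
    """
    counts = Counter(r[attribute] for r in records)
    total = sum(counts.values())
    gaps = {}
    for group, ref_share in reference_shares.items():
        share = counts.get(group, 0) / total if total else 0.0
        if abs(share - ref_share) > tolerance:
            gaps[group] = (round(share, 3), ref_share)
    return gaps

# Hypothetical training sample and reference population shares.
sample = (
    [{"age_band": "18-34"}] * 60
    + [{"age_band": "35-54"}] * 37
    + [{"age_band": "55+"}] * 3
)
reference = {"18-34": 0.30, "35-54": 0.40, "55+": 0.30}

gaps = representation_gaps(sample, "age_band", reference)
for group, (have, want) in gaps.items():
    print(f"{group}: data set share {have}, reference share {want}")
```

In this made-up sample the youngest band is over-represented and the oldest band is badly under-represented, which is the kind of finding that would be documented and fed into mitigation. Choosing the reference population and the tolerance is a judgment that depends on the intended purpose.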

Common gaps we see

Three patterns dominate.

First, organizations document data sources but not provenance. Saying the data came from "internal CRM" does not satisfy Article 10. The CRM data came from somewhere upstream, and that origin chain matters for relevance, representativeness, and lawful basis.
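
The difference between a source and its provenance can be pictured as a record that points back to its upstream origin. The following is a sketch only: the `SourceRecord` class, its fields, and the example systems are hypothetical and are not taken from Article 10 or from any vendor's schema.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class SourceRecord:
    name: str
    lawful_basis: Optional[str] = None         # None means it was never documented
    upstream: Optional["SourceRecord"] = None  # where this data came from

def origin_chain(source):
    """Walk from a source back to its earliest recorded origin."""
    chain = []
    node = source
    while node is not None:
        chain.append(node)
        node = node.upstream
    return chain

def undocumented_links(source):
    """Names of links in the chain that have no recorded lawful basis."""
    return [s.name for s in origin_chain(source) if s.lawful_basis is None]

# Hypothetical chains: both end in a CRM, but their origins differ.
web_form = SourceRecord("web signup form", lawful_basis="consent")
broker = SourceRecord("purchased lead list")  # lawful basis never recorded
crm = SourceRecord("internal CRM", lawful_basis="contract", upstream=web_form)
crm_import = SourceRecord("CRM bulk import", lawful_basis="contract", upstream=broker)

print([s.name for s in origin_chain(crm)])
print(undocumented_links(crm_import))
```

Both records would be described as "internal CRM" data in a flat source catalogue, yet only one of them has a fully documented origin chain.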

Second, bias is examined only along one or two protected attributes (typically gender and race) when the system's intended deployment context implicates more (age, disability, language, region). Annex III high-risk categories often involve multiple protected attributes simultaneously.
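
A small example shows how a scan limited to one attribute can miss a disparity on another. This is a hedged sketch: the ratio of lowest to highest selection rate is one common fairness measure among many and is not mandated by Article 10, and the function names, attribute names, and data are hypothetical.

```python
from collections import defaultdict

def selection_rates(records, attribute, outcome="selected"):
    """Share of positive outcomes per group of `attribute`."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for r in records:
        totals[r[attribute]] += 1
        positives[r[attribute]] += 1 if r[outcome] else 0
    return {g: positives[g] / totals[g] for g in totals}

def min_max_ratio(rates):
    """Lowest group rate divided by the highest; 1.0 means equal rates."""
    highest = max(rates.values())
    return min(rates.values()) / highest if highest else 1.0

def scan(records, attributes):
    """Compute the ratio for every listed attribute, not just one or two."""
    return {a: min_max_ratio(selection_rates(records, a)) for a in attributes}

def make(gender, age_band, n, n_selected):
    """Build n hypothetical records, the first n_selected of them selected."""
    return [
        {"gender": gender, "age_band": age_band, "selected": i < n_selected}
        for i in range(n)
    ]

records = (
    make("f", "18-34", 50, 25) + make("m", "18-34", 50, 25)
    + make("f", "55+", 50, 10) + make("m", "55+", 50, 10)
)

report = scan(records, ["gender", "age_band"])
print(report)
```

In this constructed data the gender ratio is 1.0 while the age-band ratio is 0.4, so a gender-only examination would report no disparity at all. The attribute list passed to `scan` would need to follow from the deployment context rather than from habit.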

Third, mitigation is applied once and never re-verified. Bias scores from the original training set are reported as if they apply to every subsequent model version. Article 10 implicitly requires bias work to keep pace with model updates.
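
One way to keep bias results from outliving the model they describe is to bind each report to a model version and a fingerprint of the data it was computed on. The sketch below assumes a hypothetical `BiasReport` record and a simple hashing scheme; it is not a format required by Article 10 or used by any particular vendor.

```python
import hashlib
import json
from dataclasses import dataclass

def dataset_fingerprint(rows):
    """SHA-256 over the rows, independent of row order."""
    canonical = sorted(json.dumps(r, sort_keys=True) for r in rows)
    return hashlib.sha256("\n".join(canonical).encode("utf-8")).hexdigest()

@dataclass(frozen=True)
class BiasReport:
    model_version: str
    data_fingerprint: str
    scores: dict

def report_is_current(report, model_version, rows):
    """True only if the report was produced for this model version and data."""
    return (
        report.model_version == model_version
        and report.data_fingerprint == dataset_fingerprint(rows)
    )

# Hypothetical data: version 2 is retrained with one additional row.
v1_rows = [{"id": 1, "age_band": "18-34"}, {"id": 2, "age_band": "55+"}]
v2_rows = v1_rows + [{"id": 3, "age_band": "35-54"}]

report = BiasReport("v1", dataset_fingerprint(v1_rows), {"age_band": 0.4})

print(report_is_current(report, "v1", v1_rows))  # same version, same data
print(report_is_current(report, "v2", v2_rows))  # retrained: report is stale
```

A release process could refuse to publish bias scores whenever `report_is_current` returns false, forcing the analysis to be re-run after each retraining.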

Regulator guidance and primary sources

EU AI Act Article 10: Data and Data Governance
GDPR Article 5: Principles relating to processing
GDPR Article 9: Special categories of personal data
GDPR Article 22: Automated decision-making — relevant when data governance bears on consequential decisions.
NIST Special Publication 1270 (Towards a Standard for Identifying and Managing Bias in Artificial Intelligence), the methodology US regulators commonly cite.

Vendors that support this obligation

Vendor	HQ	Founded	Size	Pricing	Last verified
Credo AI	Palo Alto, US	2020	51-200	Contact sales for enterprise subscription quote.	Apr 26, 2026
Fiddler AI	Palo Alto, US	2018	51-200	Contact for pricing	Apr 26, 2026
Arthur	New York, US	2019	51-200	Contact for pricing	Apr 26, 2026
Monitaur	Boston, US	2019	11-50	Enterprise annual subscription; no public pricing listed. Forrester Wave cited 'pricing flexibility and transparency' as a highest-score criterion. Contact sales for quotes.	Apr 22, 2026
Trustible	Arlington, US	2023	11-50	Contact sales for enterprise pricing; no public plans listed	Apr 23, 2026
FairNow	McLean, US	2023	11-50	Contact sales for quote; no public pricing listed	Apr 26, 2026
Fairly AI	Kitchener, Canada	2020	11-50	On-premises or private-cloud deployments; quote-based.	Apr 21, 2026