AI Compliance Vendors
vendor-due-diligence · procurement · third-party-risk · security-questionnaire · ai-governance

AI Compliance Vendor Due Diligence: The Complete Procurement Guide

A procurement-grade due diligence framework for AI compliance software: financial-stability checks, security questionnaires (SIG, CAIQ), regulatory coverage validation, customer references, EU AI Act traceability, and a vendor risk-tier scoring rubric.

By AI Compliance Vendors Editorial · Published April 25, 2026 · Last verified April 25, 2026

Buying AI compliance software is not like buying a project management tool. The market is young, vendor claims are rarely testable from the outside, and the cost of a wrong choice shows up in regulatory audits — not in the product itself. This guide gives procurement, legal, IT security, and compliance teams a structured method for evaluating AI compliance and governance vendors, from initial RFP through reference checks and final contract.


Why AI Vendor Due Diligence Is Different from Regular Software DD

Standard software due diligence asks whether a tool does what it claims. AI compliance vendor due diligence has an additional layer: you must assess whether the vendor can support your own compliance obligations, which are still evolving.

Three factors make this evaluation genuinely harder than conventional SaaS procurement.

Regulatory timelines are not synchronized. The EU AI Act fully applies to high-risk AI systems from August 2026. Colorado's SB 24-205 created the first comprehensive U.S. state-level algorithmic discrimination obligations for private entities. NYC Local Law 144 has enforced annual bias audits for automated hiring tools since July 2023. Texas's TRAIGA took effect January 1, 2026. A vendor covering only one of these frameworks leaves compliance gaps that are your organization's problem.

Vendor claims are not independently verifiable during the sales cycle. Unlike infrastructure with measurable SLAs, AI governance effectiveness is invisible until an incident or audit. Any vendor can claim "full EU AI Act coverage." Your evaluation must demand actual evidence artifacts — policy packs, sample audit outputs, model cards from real customer deployments — not feature slides.

The buyer's liability does not transfer to the vendor. Under EU AI Act Article 26, deployers retain their own compliance obligations regardless of vendor promises. Under Colorado SB 24-205, deployers must maintain a risk management program aligned with a recognized framework. Buying software does not buy compliance; it buys tooling that supports your compliance posture.


The 8 Pillars of AI Compliance Vendor Due Diligence

1. Regulatory Framework Coverage

| Framework | Jurisdiction | Core obligation |
| --- | --- | --- |
| EU AI Act | European Union | Risk classification, conformity assessment, technical documentation, logging, human oversight for high-risk systems |
| NIST AI RMF | U.S. (voluntary) | Four functions: Govern, Map, Measure, Manage; 72 playbook actions across the AI lifecycle |
| ISO/IEC 42001:2023 | International | AI Management System standard; Plan-Do-Check-Act methodology; third-party certifiable |
| NYC Local Law 144 | New York City | Annual independent bias audit for automated employment decision tools; public results disclosure |
| Colorado SB 24-205 | Colorado | Impact assessments; risk program aligned to NIST AI RMF or ISO/IEC 42001; consumer disclosure for consequential decisions |
| Texas TRAIGA | Texas | Prohibits discriminatory and manipulative AI; applies to private entities doing business in Texas; effective Jan 1, 2026 |

Framework coverage is not binary. "We support the EU AI Act" can mean a compliance checklist or automated evidence generation for every Article 9–15 obligation with version-controlled audit trails.

RFP questions: Can you provide a control mapping document showing which product features satisfy which specific requirements for [Framework X]? Can you produce a sample audit artifact — not a screenshot — that a customer would present to a regulator? Which frameworks update automatically when regulatory guidance changes?

What good looks like: Credo AI publishes per-framework policy packs with control mappings for EU AI Act, NIST AI RMF, ISO/IEC 42001, and NYC Local Law 144. OneTrust includes EU AI Act, NIST, and ISO 42001 templates. IBM watsonx.governance has particular depth in financial services and government. As of April 2026, OneTrust does not publish per-framework control mapping documents publicly in the same granularity as Credo AI's policy packs — request a sample mapping in your RFP.


2. Evidence and Audit-Trail Capabilities

Regulators will not accept a vendor's assurance that your AI system was governed properly. They will ask for documentation. Evidence capabilities fall into two categories.

Development-time artifacts: Model cards, data documentation, bias test results tied to specific model versions, risk and impact assessments (a risk management system is required by EU AI Act Article 9; impact assessments by Colorado SB 24-205), and control evidence mapped to regulatory requirements.

Runtime logs: Automatic event logging (EU AI Act Article 19 requires retention for at least six months); decision-level audit trails for consequential decision systems; human oversight records; drift and performance monitoring logs.

RFP questions: What artifacts are generated automatically vs. requiring manual input? Are logs cryptographically signed or tamper-evident? Can you produce a sample EU AI Act Technical File? Does your platform support Fundamental Rights Impact Assessments?

Red flags: Vendor cannot show a real audit artifact from a customer; outputs are PDF summaries rather than structured, machine-readable files; logs exist but cannot be exported outside the vendor's platform.
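The "cryptographically signed or tamper-evident" RFP question is concrete and testable: each log record should commit to the record before it, so that editing or deleting any entry invalidates everything after it. A minimal hash-chain sketch of the idea (illustrative only; production platforms may use signed records or Merkle trees instead):

```python
import hashlib
import json

def append_record(chain, event):
    """Append an event, chaining it to the hash of the previous record."""
    prev_hash = chain[-1]["hash"] if chain else "0" * 64
    body = {"event": event, "prev_hash": prev_hash}
    digest = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
    chain.append({**body, "hash": digest})

def verify_chain(chain):
    """Recompute every hash; any edited or deleted record breaks the chain."""
    prev_hash = "0" * 64
    for rec in chain:
        body = {"event": rec["event"], "prev_hash": rec["prev_hash"]}
        digest = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
        if rec["prev_hash"] != prev_hash or rec["hash"] != digest:
            return False
        prev_hash = rec["hash"]
    return True
```

During the demo, ask the vendor to show the equivalent property on their own exported logs: modify one exported record and confirm verification fails.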


3. Model Lifecycle Integration (CI/CD, MLOps)

Governance that sits outside the development process gets bypassed. Evaluate integration depth with:

  • Version control (GitHub, GitLab, Azure DevOps) — governance checks trigger on code commits
  • Model registries (MLflow, Databricks Unity Catalog) — governance metadata captured at registration
  • CI/CD pipelines (GitHub Actions, Jenkins) — fairness and policy checks run as deployment gates
  • MLOps platforms (SageMaker, Azure ML, Vertex AI, Databricks) — hooks within active training and deployment workflows
  • LLM infrastructure (LangChain, LlamaIndex, OpenAI, Anthropic) — for generative AI workloads

RFP questions: Which integrations are native vs. custom connectors requiring professional services? Can governance checks act as hard deployment gates (block, not just warn)? When a model is updated, is a new governance review automatically triggered?
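Mechanically, a hard deployment gate is just a pipeline step whose nonzero exit code blocks promotion. A hypothetical sketch of such a step (the check names and thresholds are illustrative, not any vendor's API):

```python
import sys

def governance_gate(checks):
    """Return the failed check names; an empty list means the gate passes.

    checks: {name: {"value": observed, "minimum": required}} — a stand-in
    for results a CI step might fetch from a governance platform's API.
    """
    return [name for name, c in checks.items() if c["value"] < c["minimum"]]

if __name__ == "__main__":
    checks = {
        "bias_disparate_impact": {"value": 0.85, "minimum": 0.80},
        "documentation_complete": {"value": 1.0, "minimum": 1.0},
        "approval_recorded": {"value": 1.0, "minimum": 1.0},
    }
    failures = governance_gate(checks)
    if failures:
        print("Governance gate FAILED:", ", ".join(failures))
        sys.exit(1)  # nonzero exit halts pipeline promotion — a hard gate
    print("Governance gate passed; deployment may proceed.")
```

A vendor whose checks can only post a warning comment, rather than fail a step like this, is offering advisory tooling, not a gate.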

Vendor differentiation: Credo AI integrates natively with Databricks MLflow, Azure AI Foundry, GitHub, Slack, and ServiceNow. OneTrust offers continuous synchronization with Databricks Unity Catalog. Arthur AI focuses on post-deployment monitoring for both ML and LLM systems. Evaluate governance vendors before locking in your MLOps stack — integration depth varies significantly.


4. Data Residency and Deployment Options

Where your data goes when it enters a vendor's platform is a compliance question, not just a preference.

| Deployment model | Best for |
| --- | --- |
| Multi-tenant SaaS | Lower-sensitivity workloads; no localization requirements |
| Single-tenant SaaS | Mid-tier sensitivity; logical isolation needed |
| Customer-hosted VPC | Data must remain within your cloud environment |
| Private/sovereign cloud | EU organizations; GDPR; regulated financial services |
| On-premises / air-gapped | Defense, intelligence, highly regulated institutions |

Critical: A U.S. vendor deploying in an EU AWS region does not exempt that data from U.S. CLOUD Act requests. Regional deployment is not the same as data sovereignty.

RFP questions: In which countries is customer data stored and processed? Are you subject to the CLOUD Act? Do you offer private-cloud or on-premises deployment options, and at what cost? Who is on your full sub-processor list, and how are we notified of changes? If we run bias tests on a dataset containing PII, does that PII leave our environment?


5. Bias, Fairness, and Red-Teaming Features

Every vendor claims "bias detection." The meaningful questions concern which bias definitions are supported and whether adversarial testing is genuinely available.

Different regulatory requirements use different fairness definitions:

  • Disparate impact (selection rate ratio) — required by NYC LL 144 for employment tools
  • Equalized odds / equal opportunity — common in lending and healthcare
  • Demographic parity — used in some fairness-constrained objectives
  • Intersectional bias — bias across combined protected characteristics; frequently missed by single-attribute tests
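The NYC LL 144-style impact ratio, for example, compares each group's selection rate to the rate of the most-selected group. A minimal sketch of the calculation:

```python
def impact_ratios(selections):
    """Compute selection-rate impact ratios per group.

    selections: {group: (selected_count, total_count)}.
    Returns each group's selection rate divided by the highest group's rate.
    """
    rates = {g: sel / total for g, (sel, total) in selections.items()}
    top = max(rates.values())
    return {g: rate / top for g, rate in rates.items()}

# e.g. group A selected 40/100 (rate 0.40), group B 20/100 (rate 0.20)
# → ratios {A: 1.0, B: 0.5}
```

Ratios below 0.8 (the classic four-fifths rule of thumb) are a common, though not legally definitive, flag. In the demo, confirm the vendor can compute this per protected attribute and across intersections.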

For LLMs and agentic systems, demographic-level bias testing is insufficient. Red-teaming involves adversarial prompting to test jailbreak resistance, harmful output elicitation, and off-label behavior. Ask whether the vendor integrates with frameworks such as PyRIT or Garak, or supports custom red-team test sets.

Demo checklist:

  - [ ] Multiple bias metrics tested simultaneously across protected groups
  - [ ] Intersectional analysis supported (multi-attribute testing)
  - [ ] Test results version-controlled and tied to specific model versions
  - [ ] Results can serve as evidence in a NYC LL 144 annual audit
  - [ ] Red-team capability for LLMs beyond basic content filtering

Vendor notes: Arthur AI provides continuous bias monitoring across tabular, NLP, and image modalities. Holistic AI emphasizes quantitative fairness testing specifically designed for generative AI. IBM AI Fairness 360 (open-source) offers 70+ fairness metrics; several governance platforms integrate it.


6. LLM and Agentic AI Support

Generative and agentic AI governance differs from traditional ML governance in ways that expose immature vendors.

What LLM governance requires at minimum:

  • Prompt and completion logging with PII masking
  • Output content policy enforcement (block or flag policy violations)
  • Hallucination detection and groundedness scoring
  • Foundation model version change notifications — when an upstream provider changes model behavior, your governance records must capture it

What agentic AI governance additionally requires:

  • Agent registration with defined scope, permitted tools, and action boundaries
  • Step-level audit logs for multi-step agent workflows
  • Policy-based action blocking (e.g., an agent cannot access production databases without approval)
  • MCP (Model Context Protocol) environment governance
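Policy-based action blocking reduces to checking each proposed agent action against its registered scope before execution. A hypothetical sketch (the registry structure and tool names are invented for illustration):

```python
# Hypothetical agent registration: permitted tools and action boundaries.
AGENT_REGISTRY = {
    "invoice-agent": {
        "allowed_tools": {"read_invoices", "draft_email"},
        "requires_approval": {"send_payment"},
    }
}

def authorize(agent_id, tool, approved=False):
    """Return (allowed, reason); block anything outside the agent's scope."""
    scope = AGENT_REGISTRY.get(agent_id)
    if scope is None:
        return False, "unregistered agent"
    if tool in scope["allowed_tools"]:
        return True, "within scope"
    if tool in scope["requires_approval"]:
        return (True, "approved") if approved else (False, "approval required")
    return False, "tool not permitted"
```

In a real platform, every `authorize` decision would also be written to the step-level audit log, so the trail shows not only what an agent did but what it was prevented from doing.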

RFP questions: For a production LLM, what is captured in the audit log per inference request? How is audit trail continuity maintained across multi-step agent executions? How are customers notified when a foundation model they depend on is updated or deprecated?

Red flag: A vendor who demos LLM governance using only a simple chatbot, with no discussion of agentic workflows or multi-step pipelines, has likely not tested their platform under realistic enterprise conditions.


7. Vendor Stability: Financial, Leadership, Customer Base

AI governance software is compliance infrastructure. A vendor who runs out of runway or gets acquired mid-contract leaves your compliance program on a deprecated platform.

Financial signals:

  • Funding stage and disclosed runway (Series A or earlier carries meaningful discontinuity risk)
  • Revenue growth trajectory; customer concentration risk
  • Renewals and net revenue retention (a proxy for whether customers are actually finding value)

Stability data points (April 2026):

  • Credo AI reported 2× revenue growth in 2025, 150% enterprise customer growth, and a Leader ranking in the Forrester Wave: AI Governance Solutions, Q3 2025. Partners include IBM, Microsoft, Databricks, and McKinsey.
  • IBM watsonx.governance carries IBM's balance sheet and enterprise relationship depth; evaluate integration complexity with existing IBM tooling.
  • OneTrust is well-capitalized as a privacy and risk management vendor; assess whether its AI governance module is natively integrated or acquired and bolted on.

For smaller vendors, request audited financials or a D&B report as part of procurement. For any vendor, include a technology escrow clause and a data portability provision in the contract.


8. Security Posture: SOC 2 Type II, ISO 27001, Penetration Tests, Sub-Processor List

An AI governance vendor holds sensitive information: model performance data, fairness results, governance decisions, and potentially model weights. Their security posture is a core evaluation criterion.

Minimum security baseline:

| Certification | What to verify |
| --- | --- |
| SOC 2 Type II | Report covers 12 months; scope includes the services you will use; no significant exceptions; review the full report, not a summary letter |
| ISO 27001 | Certificate current; certification body is accredited; scope covers relevant services; recent surveillance audit |
| Penetration test | External pen test within the last 12 months; confirm the testing firm's credentials; review remediation status of critical/high findings |
| Sub-processor list | Full list with jurisdictions and data types processed; notification process for changes |

Beyond standard certifications: Neither SOC 2 nor ISO 27001 covers AI-specific security risks such as model extraction attacks, training data membership inference, or prompt injection through the governance platform's own LLM features. Ask explicitly whether the vendor's AI features fall within the scope of their SOC 2 audit.

Compare vendor security postures and find accredited auditors at /auditors. See tool comparisons at /best/eu-ai-act-compliance-tools.


Red Flags During the Demo

The demo is a vendor's best foot forward. Anything wrong here will be worse in production.

  • "Full compliance" with no artifacts. A vendor who claims EU AI Act coverage but cannot produce a sample Technical File or control mapping during the demo has coverage only in their marketing.
  • Framework breadth without depth. Twelve regulatory frameworks listed on a website, generic controls for each — less useful than deep automated coverage of the three frameworks you actually need.
  • Demo data that never misbehaves. Ask to see what happens when a model fails a governance check. If the happy path is all they show, the failure path is untested.
  • Unexplained bias metrics. Any vendor providing fairness testing must explain which fairness definition is applied (disparate impact, equalized odds, demographic parity) and why. "It checks for bias" is not a technical answer.
  • No hard deployment gates. A governance check that warns but does not block is advisory, not governance. Confirm whether checks can halt CI/CD pipeline promotion.
  • Single foundation model dependency. A vendor whose own platform runs entirely on one LLM provider creates a fragility that becomes your problem if that provider changes pricing or deprecates the model.

Pricing Transparency and Procurement Traps

Most enterprise AI governance vendors require a discovery call before disclosing pricing. Common pricing structures include per-AI-use-case or per-model fees, per-seat fees, platform-plus-consumption, and token-based fees for LLM monitoring. For high-volume LLM deployments, token-based fees can produce significant and hard-to-forecast costs.
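Token-based monitoring fees are easy to underestimate, so forecast them from your own traffic before signing. A rough sketch (the fee rate below is an illustrative placeholder, not any vendor's price):

```python
def monthly_monitoring_cost(requests_per_day, avg_tokens_per_request,
                            fee_per_million_tokens, days=30):
    """Estimate monthly token-based monitoring fees for an LLM workload."""
    tokens = requests_per_day * avg_tokens_per_request * days
    return tokens / 1_000_000 * fee_per_million_tokens

# e.g. 50,000 requests/day at ~1,500 tokens each, at a hypothetical
# $0.50 per million tokens: 2.25B tokens/month → $1,125/month
```

Run the same calculation at 3–5× your current volume: if adoption grows, a fee that looks trivial today can dominate the contract.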

Contract terms to review with legal:

  • Data portability on termination. Can you export all governance records, audit logs, model cards, and assessment results in a machine-readable format? What is the export window? A vendor who controls your compliance evidence has leverage at renewal.
  • Auto-renewal and price escalation. Cap escalation at CPI or a fixed percentage.
  • Framework update commitments. If regulation changes, is the vendor contractually required to update their policy packs within a defined timeline at no additional cost?
  • IP ownership of governance artifacts. Model cards and impact assessments generated using the vendor's platform should be explicitly owned by your organization.
  • Liability caps. Standard SaaS liability caps typically do not meaningfully compensate for a regulatory penalty caused by missing audit trail functionality. Understand your actual risk position.

When vendors present ROI analyses citing Forrester's Total Economic Impact methodology, recognize that these studies are commissioned by the vendor. Build your own business case using your actual model count, compliance team cost, and audit frequency.
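A self-built business case can be as simple as comparing current manual governance effort against platform cost. An illustrative sketch, where every input is your own number rather than a vendor figure (the automation fraction is the assumption to pressure-test hardest):

```python
def annual_net_benefit(models, hours_per_model_per_year, loaded_hourly_rate,
                       automation_fraction, platform_annual_cost):
    """Rough annual savings from automating a fraction of manual governance work."""
    manual_cost = models * hours_per_model_per_year * loaded_hourly_rate
    return manual_cost * automation_fraction - platform_annual_cost

# e.g. 30 models × 80 hours/year × $120/hour = $288,000 manual effort;
# automating half against a $100,000 platform → $44,000 net benefit
```

If the case only closes at an automation fraction the reference customers did not actually achieve, the business case is the vendor's, not yours.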


Reference Checks: What to Actually Ask

Vendor-provided references are pre-selected satisfied customers. Most reference calls confirm only that the vendor demos well. Effective AI procurement reference checks require specific questions designed to surface production reality.

Request references who meet these criteria: Same industry vertical; comparable AI portfolio size; deployed within the last 12 months; has completed at least one compliance review or audit using the platform.

Questions that surface production reality:

  1. "Walk me through implementation — from contract signature to your first live governance review. How long did it actually take, and where did you hit friction?"
  1. "Where did the platform fail to work as demonstrated in the sales cycle?" Every deployment has gaps. A reference who names them is being candid; one who insists everything was perfect has been coached.
  1. "When a regulatory requirement changed, how did the vendor update their framework coverage? How long did it take?" This tests whether "automatic framework updates" is real.
  1. "What does your governance team actually spend time on in the platform versus what you hoped to automate?" This surfaces manual effort the demo made look automatic.
  1. "What did actual costs look like compared to procurement projections? Were there usage charges that surprised you?" References who have completed a full annual billing cycle give ground truth that no pricing model will.
  1. "If you were renegotiating the contract today, what would you change?" This frequently reveals gaps in data export provisions, framework update SLAs, or liability allocations that were not apparent until production.

Final Scorecard and Decision Framework

Use this scorecard to structure final vendor comparison. Weight each pillar to your organizational priorities. Any pillar marked M (mandatory) that scores below 3 requires explicit sign-off from the CISO, General Counsel, and Chief Compliance Officer before advancing to contract.

| Pillar | Weight | Score (1–5) | Notes |
| --- | --- | --- | --- |
| Regulatory framework coverage | High | | |
| Evidence and audit-trail capabilities | High | | |
| Model lifecycle integration | Medium | | |
| Data residency and deployment options | High (M if EU/GDPR) | | |
| Bias, fairness, and red-teaming | High | | |
| LLM and agentic AI support | Medium–High | | |
| Vendor stability | Medium | | |
| Security posture (SOC 2 Type II, ISO 27001) | M | | |

Decision thresholds:

  • Average ≥ 4.0, no pillar below 3 → Advance to contract negotiation
  • Average 3.0–3.9, no pillar below 2 → Advance with documented risk acceptance
  • Any pillar at 1–2 → Require remediation plan with contractual milestones, or disqualify
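Once scores are in, the thresholds can be applied mechanically. A sketch of the decision logic, using numeric weight multipliers as an assumed mapping for the rubric's High/Medium labels:

```python
def decide(scores, weights, mandatory):
    """Apply the scorecard decision thresholds.

    scores:    {pillar: score 1-5}
    weights:   {pillar: numeric weight} (assumed mapping of High/Medium labels)
    mandatory: set of pillars marked M
    """
    avg = (sum(scores[p] * weights[p] for p in scores)
           / sum(weights[p] for p in scores))
    worst = min(scores.values())
    if any(scores[p] < 3 for p in mandatory):
        return "escalate: mandatory pillar below 3 needs executive sign-off"
    if avg >= 4.0 and worst >= 3:
        return "advance to contract negotiation"
    if avg >= 3.0 and worst >= 2:
        return "advance with documented risk acceptance"
    return "remediation plan or disqualify"
```

Keeping the logic this explicit makes the comparison auditable: two evaluators with the same scores cannot reach different conclusions.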

Pre-signature checklist:

  - [ ] Legal reviewed DPA, IP ownership clause, and exit provisions
  - [ ] IT security reviewed full SOC 2 Type II report and pen test executive summary
  - [ ] Compliance validated framework coverage maps to your specific obligations (not just the frameworks in general)
  - [ ] Contractual data portability provision in place with defined export window
  - [ ] Framework update SLA defined (e.g., vendor updates policy packs within 90 days of material regulatory guidance changes)
  - [ ] Reference checks documented, scored, and filed


Frequently Asked Questions

Do we need a dedicated AI governance platform, or will our existing GRC tool work?

For organizations with fewer than 10 AI use cases (all traditional ML), a well-configured GRC tool with custom AI risk templates may be sufficient. However, general GRC platforms cannot embed governance into CI/CD pipelines, auto-generate model cards, or monitor LLM inputs and outputs. As AI adoption scales, manual GRC-based governance becomes unsustainable. Most enterprises with more than 20 active AI use cases benefit from a purpose-built platform. Many organizations run both: a GRC tool (OneTrust, ServiceNow) for privacy and vendor risk, and a dedicated AI governance platform for AI-specific lifecycle controls.

How do we evaluate EU AI Act coverage when implementing guidance is still evolving?

Focus on the vendor's process, not their current state. Who on their team tracks EU AI Office guidance? What is their contractual commitment to update policy packs when guidance changes? Look for vendors who publish analysis of new guidance within weeks of publication. A vendor who is reactive to regulatory change will leave your compliance posture perpetually behind.

Our models are built by third-party vendors, not internally. Does AI governance software still apply?

Yes — this is one of the highest-value use cases for AI governance platforms. As a deployer of third-party AI, you remain responsible under the EU AI Act for risk assessments, usage logs, and demonstrated human oversight even if you did not build the model. Colorado SB 24-205 similarly requires deployers to conduct impact assessments and maintain risk programs regardless of who built the underlying model.

What security certifications should be non-negotiable?

SOC 2 Type II is the minimum credible baseline. ISO 27001 is increasingly expected in financial services and healthcare. Neither certification covers AI-specific risks such as model extraction or prompt injection. Budget for AI-specific security questions beyond what certifications verify, and request the most recent external pen test executive summary explicitly asking whether the vendor's own AI features were in scope.

A vendor claims they will support a regulatory framework that is not yet finalized. How do we handle this?

Ask for the specific regulatory text they are tracking (draft, proposed rulemaking, published guidance), the date of the version, and how they handle changes before final publication. Get contractual language committing to updates after the final regulation publishes, without additional cost. Do not pay a framework-readiness premium based on a draft unless the contract includes an update obligation with defined timelines.


Sources and Further Reading

  1. NIST AI RMF Playbook — Govern, Map, Measure, Manage functions: https://airc.nist.gov/AI_RMF_Knowledge_Base/Playbook
  2. ISO/IEC 42001:2023 — AI Management Systems: https://www.iso.org/standard/81230.html
  3. EU AI Act — Article 16 (Provider obligations): https://artificialintelligenceact.eu/article/16/
  4. EU AI Act — Article 43 (Conformity Assessment): https://artificialintelligenceact.eu/article/43/
  5. NYC DCWP — Automated Employment Decision Tools (Local Law 144): https://www.nyc.gov/site/dca/about/automated-employment-decision-tools.page
  6. Colorado Attorney General — SB 24-205 AI Rulemaking: https://coag.gov/ai/
  7. Texas TRAIGA — Wiley Rein analysis: https://www.wiley.law/alert-Texas-Responsible-AI-Governance-Act-Enacted
  8. Credo AI — Regulations and standards coverage: https://www.credo.ai/solutions/regulations-and-standards
  9. OneTrust — AI Governance platform: https://www.onetrust.com/solutions/ai-governance/
  10. Pertama Partners — AI Vendor Certifications (SOC 2, ISO 27001): https://www.pertamapartners.com/insights/ai-vendor-certifications-soc2-iso27001
  11. Enterprise AI Procurement — Reference Check Guide: https://www.enterpriseaiprocurement.com.au/enterprise-ai-vendor-reference-checks/
  12. EU AI Act — Deployer obligations (Sentra): https://www.sentra.io/learn/eu-ai-act-compliance-what-enterprise-ai-deployers-need-to-know
  13. Colorado SB 24-205 Compliance Guide (TrustArc): https://trustarc.com/resource/colorado-ai-law-sb24-205-compliance-guide/
  14. Forrester TEI Methodology: https://www.forrester.com/policies/tei/
  15. AI Vendor Red Flags Checklist (Aikaara): https://aikaara.com/blog/ai-vendor-evaluation-red-flags
  16. Gartner Market Guide for AI Governance Platforms (Credo AI summary): https://www.credo.ai/gartner-market-guide-for-ai-governance-platforms
  17. A&O Shearman — EU AI Act obligations for high-risk AI systems: https://www.aoshearman.com/en/insights/ao-shearman-on-tech/zooming-in-on-ai-10-eu-ai-act-what-are-the-obligations-for-high-risk-ai-systems
  18. AI data residency by region — Prem AI: https://blog.premai.io/ai-data-residency-requirements-by-region-the-complete-enterprise-compliance-guide/
  19. NYC Local Law 144 compliance guide (FairNow): https://fairnow.ai/guide/nyc-local-law-144/
  20. Top enterprise AI governance tools 2026 (Reco): https://www.reco.ai/compare/ai-governance-tools

Compare AI compliance auditors at [/auditors](/auditors) | EU AI Act tool comparison: [/best/eu-ai-act-compliance-tools](/best/eu-ai-act-compliance-tools)
