Model Risk Management for Banks: Integrating SR 11-7 and OCC 2011-12 with AI Governance

SR 11-7 and OCC 2011-12 still govern model risk at banks—but ML and LLMs demand new validation. Here's what examiners expect from AI governance in 2026.

By ACV Editorial · April 22, 2026 · 12 min read · Last reviewed April 22, 2026

Model risk management in banking has a clear regulatory foundation: the Federal Reserve's SR 11-7, issued April 4, 2011, and its near-identical companion OCC Bulletin 2011-12, issued the same month. These two documents — which the FDIC formally adopted via FIL-22-2017 in 2017 — define supervisory expectations for model development, validation, and governance across all U.S. federally supervised institutions.

For traditional credit, market, and operational risk models, the SR 11-7 framework has proven durable. Its core principles — effective challenge, independent validation, conceptual soundness, outcomes analysis — extend reasonably well even to established ML techniques such as gradient-boosted trees.

They translate much less cleanly to large language models, generative AI, and agentic systems. The gap between what SR 11-7 was designed to govern and what banks are actually deploying in 2026 is the central model risk management problem of this decade. This post maps the existing framework, identifies where it breaks down for modern AI, and explains what examiners are actually looking for.

SR 11-7: The Enduring Foundation

SR 11-7 defines a model as a quantitative method, system, or approach that applies statistical, economic, financial, or mathematical theories to process input data into quantitative estimates. This definition — though drafted with traditional econometric models in mind — is broad enough to encompass machine learning systems, which the OCC's 2021 Model Risk Management Handbook explicitly confirmed.

The guidance organizes model risk management across three pillars:

Pillar 1: Model Development, Implementation, and Use

Models must be developed with a clear statement of purpose, sound theoretical grounding, rigorous assessment of data quality, and complete documentation. Developers bear responsibility for demonstrating that model logic is appropriate, that assumptions are explicitly stated, and that the model is not used beyond its intended scope.

For ML models, "clear statement of purpose" becomes more demanding: a gradient-boosted credit model trained on historical loan performance may generalize poorly to populations underrepresented in training data, a limitation that must be explicitly documented and communicated to business users.

Pillar 2: Model Validation

SR 11-7 requires validation to be conducted by parties independent of model development — typically a dedicated Model Risk Management function or, for smaller institutions, an appropriately segregated internal team. The core validation elements are:

  • Evaluation of conceptual soundness: Does the theoretical foundation support the model's intended use? Are design choices and variable selections consistent with published research and sound industry practice?
  • Ongoing monitoring: Is the model performing as intended? Are there signs of data drift, population shift, or degraded performance that require recalibration or redevelopment?
  • Outcomes analysis: Are model outputs consistent with actual observed outcomes? Back-testing against holdout samples is the standard technique.

The guidance specifies that validation should occur at least annually for each model, more frequently for models with higher risk profiles or material changes.
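Outcomes analysis via back-testing can be sketched in a few lines. The example below scores a credit model's predicted default probabilities against realized outcomes on a holdout sample; the sample data and metrics are illustrative, not drawn from the guidance.

```python
# Minimal outcomes-analysis sketch: back-test predicted default
# probabilities (PDs) against observed outcomes on a holdout sample.
# Data and metric choices are illustrative.

def brier_score(predicted, observed):
    """Mean squared error between predicted PDs and 0/1 outcomes."""
    return sum((p - o) ** 2 for p, o in zip(predicted, observed)) / len(predicted)

def calibration_gap(predicted, observed):
    """Average predicted PD minus observed default rate."""
    return sum(predicted) / len(predicted) - sum(observed) / len(observed)

# Holdout sample: model-predicted PDs and realized defaults (1 = default)
pd_hat = [0.02, 0.05, 0.10, 0.30, 0.60]
defaults = [0, 0, 0, 1, 1]

print(f"Brier score:     {brier_score(pd_hat, defaults):.4f}")
print(f"Calibration gap: {calibration_gap(pd_hat, defaults):.4f}")
```

A validation report would typically compute these metrics per segment and per vintage, and compare them against thresholds set in the model's performance monitoring plan.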

Pillar 3: Governance, Policies, and Controls

SR 11-7 places ultimate accountability for model risk at the board and senior management level. Management must maintain a comprehensive model inventory, define clear roles for developers and validators, set appropriate policies and procedures, and ensure that model limitations are communicated to decision-makers. The creation and maintenance of a model inventory — documenting purpose, ownership, validation status, and known limitations — is a core governance deliverable.

The 2021 Interagency RFI: A Regulatory Signal

In March 2021, the Federal Reserve, FDIC, OCC, CFPB, and NCUA jointly issued a Request for Information on Financial Institutions' Use of Artificial Intelligence (with the comment period extended to July 1, 2021). The RFI sought public comment on how financial institutions use AI, governance challenges, and whether regulatory clarification would be helpful.

The RFI was notable for three things it highlighted as novel risk management challenges specific to AI: explainability (how AI uses inputs to produce outputs), data usage (including potential bias), and dynamic updating (models that retrain on new data without explicit human intervention). These three dimensions — none of which was a primary concern in 2011 — signal where supervisory expectations were heading.

The RFI produced no binding guidance, but it established that the five agencies view existing MRM frameworks (SR 11-7 explicitly referenced) as applicable to AI/ML systems and that clarifications, rather than entirely new rules, are the likely near-term regulatory approach.

What Changes With Machine Learning and LLMs

Explainability and Effective Challenge

SR 11-7's "effective challenge" requirement — critical analysis by objective, informed parties — assumes that model outputs can be interrogated with sufficient technical depth to identify errors in conceptual design or implementation. For traditional statistical models, this is tractable: coefficients have economic interpretations, assumptions can be tested, and performance can be decomposed.

For gradient-boosted trees, neural networks, and especially large language models, effective challenge requires different competencies. The MRM function must be able to:

  • Evaluate whether SHAP values or other post-hoc explanation methods accurately represent feature importance in the deployed model
  • Identify whether observed model behavior reflects intended design or emergent properties of the training process
  • Assess whether the model's behavior under distribution shift (data from populations not well-represented in training) is within acceptable limits
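Drift assessment in particular lends itself to a simple quantitative check. The sketch below computes the Population Stability Index (PSI), a widely used drift metric, over binned score distributions; the bin shares are made up, and the 0.1 / 0.25 cutoffs are common rules of thumb rather than regulatory requirements.

```python
import math

# Population Stability Index (PSI) sketch for drift monitoring.
# Bin shares are illustrative; the 0.1 / 0.25 thresholds are
# conventional rules of thumb, not supervisory requirements.

def psi(expected_shares, actual_shares):
    """PSI across matching bins of a score distribution."""
    return sum(
        (a - e) * math.log(a / e)
        for e, a in zip(expected_shares, actual_shares)
    )

baseline = [0.25, 0.35, 0.25, 0.15]   # development-sample score bins
current  = [0.15, 0.30, 0.30, 0.25]   # recent production bins

value = psi(baseline, current)
status = "stable" if value < 0.1 else "monitor" if value < 0.25 else "investigate"
print(f"PSI = {value:.4f} -> {status}")
```

In practice the same calculation runs per input feature as well as on the output score, so that the monitoring report can localize which inputs are shifting.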

Platforms like Fiddler AI and Arthur AI have built validation tooling specifically for these ML-specific challenges — providing drift detection, bias analysis across demographic segments, and explainability in formats that MRM teams can include in validation reports and present to examiners.

Challenger Models and Benchmarking

SR 11-7 calls for benchmarking as part of ongoing monitoring — comparing a model's inputs and outputs to estimates from alternative models. For traditional models, a challenger model is typically a simpler linear regression or a model from an alternative vendor.

For LLMs used in document summarization, compliance screening, or customer communication, identifying an appropriate challenger model is non-trivial. What constitutes an alternative benchmark for a fine-tuned GPT-4-class model processing loan application narratives? MRM teams are increasingly relying on ensemble evaluation — running the same input against multiple models and comparing outputs — and human evaluator baselines where subject matter experts assess model output quality on structured samples.
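The ensemble-evaluation approach can be approximated with a crude agreement score: run the same input through several models and flag low pairwise similarity for human review. The outputs below are stand-ins for real model responses, the token-overlap similarity is a deliberately simple proxy, and the 0.5 threshold is illustrative.

```python
# Ensemble-evaluation sketch: compare outputs from several models on the
# same input and flag low agreement for human review. Outputs are
# stand-ins for real model responses; the 0.5 threshold is illustrative.

def jaccard(a: str, b: str) -> float:
    """Token-level Jaccard similarity between two outputs."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 1.0

def agreement(outputs):
    """Mean pairwise similarity across all model outputs."""
    pairs = [(i, j) for i in range(len(outputs)) for j in range(i + 1, len(outputs))]
    return sum(jaccard(outputs[i], outputs[j]) for i, j in pairs) / len(pairs)

outputs = [
    "applicant reports stable income and low existing debt",
    "applicant reports stable income with low debt",
    "income is stable and existing debt is low",
]
score = agreement(outputs)
print(f"agreement = {score:.2f}; " + ("ok" if score >= 0.5 else "route to human review"))
```

Production implementations would swap the lexical similarity for an embedding- or NLI-based comparison, but the governance logic — disagreement routes to a human — stays the same.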

Generative AI Validation: Three New Dimensions

Researchers at Wells Fargo published a detailed framework in 2024 for extending SR 11-7's three validation pillars to generative AI applications, identifying three categories of heightened risk unique to LLMs:

Hallucinations and factual accuracy: Unlike traditional models that produce a single output within a bounded range, LLMs can generate plausible-sounding but factually incorrect text. Validation frameworks must include structured factual accuracy testing, retrieval-augmented generation (RAG) quality assessment, and faithfulness metrics for summarization tasks.
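A minimal faithfulness check for summarization can be sketched as a lexical support ratio: what share of the summary's content words actually appear in the source document. This is a crude proxy — production checks typically use NLI- or QA-based faithfulness metrics — and the texts, stopword list, and labels below are illustrative.

```python
# Faithfulness-check sketch for summarization: score the fraction of
# summary content words supported by the source document. A crude
# lexical proxy; texts and stopword list are illustrative.

STOPWORDS = {"the", "a", "an", "is", "was", "and", "of", "to", "in"}

def support_ratio(summary: str, source: str) -> float:
    """Share of summary content tokens that also appear in the source."""
    src = set(source.lower().split())
    toks = [t for t in summary.lower().split() if t not in STOPWORDS]
    if not toks:
        return 1.0
    return sum(t in src for t in toks) / len(toks)

source = "the borrower requested a 30-year fixed mortgage of 400000"
summary = "borrower requested 30-year fixed mortgage"       # faithful
bad     = "borrower requested adjustable rate mortgage"     # unsupported terms

print(f"faithful: {support_ratio(summary, source):.2f}")
print(f"suspect:  {support_ratio(bad, source):.2f}")
```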

Toxicity and content safety: LLMs used in customer-facing applications require toxicity screening, demographic bias testing in generated content, and red-teaming protocols designed to surface harmful outputs under adversarial inputs.

Prompt sensitivity and robustness: Small changes in prompt framing can produce materially different outputs — a characteristic with no analog in traditional models. MRM teams must test prompt robustness across paraphrase variations and assess whether model behavior is stable enough to support governance-level accountability.
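Prompt-robustness testing can be operationalized as a stability rate over paraphrase sets. In the sketch below, `classify` is a deterministic stand-in for a real LLM call — in practice you would call the deployed model and parse its response — and the paraphrases and keyword logic are invented for illustration.

```python
# Prompt-robustness sketch: run paraphrased prompts through the same
# model and measure how often the decision-relevant output changes.
# `classify` is a deterministic stand-in for a real LLM call.

def classify(prompt: str) -> str:
    """Stand-in model: returns 'review' if certain trigger terms appear."""
    keywords = {"unusual", "suspicious", "atypical"}
    return "review" if any(k in prompt.lower() for k in keywords) else "clear"

paraphrases = [
    "Is this wire transfer unusual for the customer?",
    "Does this wire transfer look atypical for this customer?",
    "Is this transfer out of pattern for the customer?",  # no trigger term
]

outputs = [classify(p) for p in paraphrases]
# Stability = share of outputs matching the majority answer
stability = outputs.count(max(set(outputs), key=outputs.count)) / len(outputs)
print(f"outputs = {outputs}, stability = {stability:.2f}")
```

A stability rate well below 1.0, as here, is exactly the kind of finding a validation report would document along with the remediation plan.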

For each of these dimensions, validation documentation must include the testing methodology, the evaluation criteria, the results, and the remediation plan for identified failures. WhyLabs and Fiddler AI provide continuous monitoring capabilities that extend these validation tests into production, helping institutions satisfy SR 11-7's ongoing monitoring requirement for generative AI systems.

Third-Party and Vendor Model Risk

SR 11-7 explicitly covers vendor models: banks are responsible for applying the same validation rigor to externally sourced models as to internally developed ones. The guidance states that banks must require vendors to provide developmental evidence — model components, design rationale, intended use, known limitations.

In 2011, this applied primarily to off-the-shelf credit scoring and fraud detection models. In 2026, it applies to foundation models licensed from OpenAI, Anthropic, or Google; fine-tuned models from AI specialist vendors; and embedded AI features in core banking platforms from Temenos, FIS, or Fiserv.

The practical challenge is that foundation model providers rarely disclose training data composition, architecture details, or systematic bias testing results at the level SR 11-7 contemplates. Banks have addressed this through:

  • Behavioral testing: Validating model outputs on bank-specific test sets rather than relying on vendor-provided performance metrics
  • Scope limitation: Restricting LLM use to applications where output errors are detectable and consequential decisions are subject to human review
  • Contractual requirements: Incorporating SR 11-7 documentation obligations into vendor agreements, triggering notification requirements if vendor models are substantially modified
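The behavioral-testing mitigation can be sketched as a simple acceptance gate: score the vendor model on a bank-specific labeled test set rather than trusting vendor-reported metrics. Here `vendor_model` is a stand-in for an external API call, and the test cases and 0.90 threshold are illustrative.

```python
# Behavioral-testing sketch: score a vendor model on a bank-specific
# labeled test set instead of relying on vendor-reported metrics.
# `vendor_model` is a stand-in; the 0.90 threshold is illustrative.

def vendor_model(text: str) -> str:
    """Stand-in for an externally sourced fraud classifier."""
    return "fraud" if "urgent wire" in text.lower() else "legitimate"

# Bank-specific test set: (input, expected label)
test_set = [
    ("urgent wire needed before close of business", "fraud"),
    ("monthly mortgage payment as scheduled", "legitimate"),
    ("urgent wire to new overseas beneficiary", "fraud"),
    ("payroll deposit for existing account", "legitimate"),
]

hits = sum(vendor_model(x) == y for x, y in test_set)
accuracy = hits / len(test_set)
print(f"accuracy = {accuracy:.2f} -> " + ("accept" if accuracy >= 0.90 else "escalate"))
```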

The April 2021 Interagency Statement on Model Risk Management for Bank Systems Supporting BSA/AML compliance reinforced this point directly: "Banks are ultimately responsible for complying with BSA/AML requirements, even if they choose to use third-party models."

Monitaur and Saidot offer governance platforms that manage third-party model inventory and validation documentation, providing audit trails that support examiner-facing evidence packages for vendor model oversight.

What Bank Examiners Are Looking For in 2026

Bank supervisors have not issued new AI-specific model risk guidance to replace SR 11-7. Instead, they have issued supplemental guidance that interprets SR 11-7 in the AI context:

  • OCC Bulletin 2025-26 for community banks: Reinforces that model validation frequency and scope should be calibrated to risk exposure and complexity; annual validation is not mandatory for all models — but high-risk AI applications generally warrant it.
  • Financial Services Sector Coordinating Council (FSSCC) AI Risk Management Framework (released late 2025): 230 control objectives mapped to NIST AI RMF, covering governance, data, model development, validation, monitoring, and third-party risk. Treasury simultaneously released an AI Lexicon to standardize terminology. While not binding, this framework is expected to become examination scaffolding in the same way FFIEC guidelines inform IT examinations.

In practice, examination teams are asking:

  1. Do you have a comprehensive model inventory that includes AI/ML systems? Any gap here — shadow models, vendor tools not formally inventoried — is a finding.
  2. Can you demonstrate independent validation for your highest-risk AI applications? Independence means the validator had no role in development and no stake in the outcome.
  3. How do you validate explainability and fairness for models used in credit decisioning? Fair lending risk from AI credit models is a supervisory priority; MRM teams must demonstrate bias testing protocols.
  4. What are your controls for generative AI? Hallucination risk, content safety, and scope limitation are the primary examiner concerns for LLM deployments.
  5. How do you manage third-party model risk? Vendor onboarding due diligence, contractual documentation requirements, and ongoing performance monitoring for vendor AI models.

Practical Integration: Building the AI-Extended MRM Framework

Organizations that have operated mature SR 11-7 frameworks for traditional models are in a better position than those starting from scratch — but the AI extension requires deliberate effort:

Expand the model inventory taxonomy: Add fields for model type (traditional, ML, LLM, generative), training data lineage, third-party provenance, explainability method used, bias testing results, and known limitations specific to AI.
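One way to make the expanded taxonomy concrete is as a typed inventory record. The field names below follow the taxonomy described above but are illustrative, not a regulatory schema, and the sample record is invented.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Sketch of an AI-extended model-inventory record. Field names follow
# the taxonomy described in the text; they are illustrative choices,
# not a regulatory schema.

@dataclass
class ModelInventoryRecord:
    model_id: str
    name: str
    model_type: str                       # "traditional" | "ml" | "llm" | "generative"
    owner: str
    third_party_provenance: Optional[str]  # vendor name, or None if internal
    training_data_lineage: str
    explainability_method: str             # e.g. "SHAP", "coefficients", "n/a"
    validation_status: str                 # "validated" | "pending" | "overdue"
    bias_testing_results: str
    known_limitations: List[str] = field(default_factory=list)

record = ModelInventoryRecord(
    model_id="CR-0042",
    name="Retail credit PD model",
    model_type="ml",
    owner="Credit Risk Analytics",
    third_party_provenance=None,
    training_data_lineage="2015-2023 originations, internal servicing data",
    explainability_method="SHAP",
    validation_status="validated",
    bias_testing_results="disparate impact ratios within policy thresholds",
    known_limitations=["underrepresents thin-file applicants"],
)
print(record.model_id, record.validation_status)
```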

Differentiate validation intensity by AI risk tier: High-risk AI applications (credit decisioning, fraud detection, regulatory capital models) warrant full independent validation. Lower-risk applications (internal productivity tools, research summarization) may qualify for lighter-touch monitoring with periodic spot validation.
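The tiering logic above can be encoded as a simple policy function. The tier criteria and review cadences below are illustrative policy choices, not requirements from SR 11-7.

```python
# Sketch of tiering validation intensity by AI risk. Tier criteria and
# cadences are illustrative policy choices, not from SR 11-7.

HIGH_RISK_USES = {"credit decisioning", "fraud detection", "regulatory capital"}

def validation_tier(use_case: str, customer_facing: bool) -> str:
    """Map an AI application to a validation intensity tier."""
    if use_case in HIGH_RISK_USES:
        return "full independent validation, annual"
    if customer_facing:
        return "targeted validation, annual"
    return "spot validation and monitoring, biennial"

print(validation_tier("credit decisioning", customer_facing=True))
print(validation_tier("research summarization", customer_facing=False))
```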

Build LLM-specific validation protocols: Document your methodology for hallucination testing, prompt robustness assessment, and content safety evaluation — and ensure these are in the validation report format that examiners will review.

Operationalize ongoing monitoring for drift: For models that retrain on streaming data or are updated by vendors without explicit notification, continuous monitoring platforms provide the ongoing monitoring evidence that SR 11-7 requires. Fiddler AI, Arthur AI, and WhyLabs each offer production monitoring with configurable alert thresholds and automated performance reporting.

Align with emerging financial services AI standards: The NIST AI RMF and the FSSCC AI RMF are both voluntary but increasingly represent the baseline against which examiner expectations will be calibrated. Organizations that can demonstrate alignment gain both a governance framework and a credible narrative for examination.

For institutions with financial services-specific AI governance needs, the intersection of SR 11-7 compliance, fair lending requirements, and emerging generative AI risk creates a governance surface that purpose-built MRM tools handle more efficiently than general-purpose GRC platforms.

Key Takeaways

  • SR 11-7 and OCC 2011-12 remain the governing frameworks for model risk management at U.S. banks — they have not been replaced by AI-specific guidance; they have been extended.
  • Three pillars — development/implementation/use, validation, governance — apply to AI/ML systems; the 2021 OCC MRM Handbook and the 2021 interagency RFI confirmed this explicitly.
  • Machine learning and LLMs introduce distinct validation challenges: explainability, hallucination testing, prompt robustness, and adversarial evaluation have no direct analogs in traditional MRM.
  • Vendor model risk is fully in scope: banks cannot offload SR 11-7 obligations by using foundation model providers; behavioral testing on bank-specific data is the primary mitigation when vendor documentation is insufficient.
  • Examiners in 2026 prioritize: model inventory completeness (including AI), validation independence, fair lending / bias testing for credit models, and generative AI scope controls.
  • The FSSCC AI RMF (230 control objectives, released late 2025) is expected to become de facto examination scaffolding for AI governance.
  • Tools from Fiddler AI, Arthur AI, WhyLabs, and Monitaur directly address the monitoring, validation, and documentation requirements that SR 11-7 imposes on AI/ML deployments.

Sources

  1. SR 11-7: Supervisory Guidance on Model Risk Management (Federal Reserve, April 2011) — https://www.federalreserve.gov/supervisionreg/srletters/sr1107.htm
  2. Agencies Seek Views on Financial Institutions' Use of Artificial Intelligence (OCC, March 2021) — https://www.occ.treas.gov/news-issuances/news-releases/2021/nr-ia-2021-39.html
  3. Request for Information on Artificial Intelligence (FDIC, March 2021) — https://www.fdic.gov/news/financial-institution-letters/2021/fil21020.html
  4. Model Risk Management for Generative AI in Financial Institutions (Wells Fargo / arXiv, 2024) — https://arxiv.org/pdf/2503.15668
  5. How to Comply with OCC 2011-12 in Model Risk Management (ValidMind, 2025) — https://validmind.com/blog/mrm-teams-and-how-to-comply-with-occ-2011-12/
  6. Financial Services AI Risk Management Framework — 230 Control Objectives (Lowenstein Sandler, 2026) — https://www.lowenstein.com/news-insights/publications/client-alerts/financial-services-ai-risk-management-framework-operationalizing-the-230-control-objectives-before-the-market-wakes-up-data-privacy
  7. Managing AI Model Risk in Financial Institutions (Kaufman Rossin, 2025) — https://kaufmanrossin.com/blog/managing-ai-model-risk-in-financial-institutions-best-practices-for-compliance-and-governance/
  8. AI Risk Management in Banking: Detecting, Governing Model Risk (Cygeniq, 2026) — https://cygeniq.ai/blog/ai-risk-management-in-banking/
  9. SR 11-7 Model Risk Management — ModelOp — https://www.modelop.com/ai-governance/ai-regulations-standards/sr-11-7
  10. Artificial Intelligence in Financial Services — U.S. Treasury (December 2024) — https://home.treasury.gov/system/files/136/Artificial-Intelligence-in-Financial-Services.pdf
