External Audits & Third-Party Evaluation

Apr 29

External audits and third-party evaluations are independent assessments of AI systems conducted by organizations outside the company that built them. They are meant to enable independent validation of safety claims, external scrutiny of model behaviour, and accountability beyond a developer’s own processes. The intent is to bring external governance pressure to bear on AI systems.

Compliance

External audits can support compliance with the NIST AI Risk Management Framework, which encourages transparency, validation, and independent assessment of AI risks, and with ISO/IEC 42001, which supports auditability and conformity assessment processes. They can also support compliance with the EU AI Act, which introduces conformity assessment requirements for high-risk systems, effectively mandating forms of external or semi-external review.

In Practice

External audits and third-party evaluations take several forms: selective external red teaming and safety evaluations for frontier models, structured external research collaborations and safety testing, and limited third-party research partnerships and audits in particular contexts. External evaluations are common but not standardized, and evaluator access to systems is typically restricted and negotiated.

The channels for this work include red teaming partnerships, academic collaborations, regulatory and compliance audits, and benchmark-based evaluations.

Embedding Responsibility and Ethical Practices

External audits augment internal evaluations. They offer independent verification of safety claims, reduce reliance on self-reporting, and add the pressure of external accountability. They matter because AI systems are complex and opaque, so transparency and accountability have to be actively ensured. Failures often emerge only after deployment, and external audits offer a way to understand these outcomes. External review adds transparency, but it is not always independent, complete, or comprehensive. Oversight is more likely to be partial than full, and accountability is often negotiated.

Kirthi Jayakumar
