One-Shot Audit Risk Calculator

Answer eight questions about how you actually run an AI audit today to see where a one-shot, drop-it-in-ChatGPT process is most likely to fail, and the failure mode most likely to bite first.

Rate each of the eight questions from 0 (your process already follows the disciplined practice) to 4 (your process is a one-shot, drop-it-in-ChatGPT habit). Score what your process actually does today, not what you intend to fix.

Rate your process on eight questions
0 = fully disciplined practice, 4 = fully one-shot practice. Each question belongs to one of the three failure surfaces below the form.
The deliverable
0 = Every finding cites its exact source
4 = Findings are a confident summary, not traceable

The deliverable
0 = Contradictions are surfaced and flagged
4 = One version gets averaged away, unnoticed

The deliverable
0 = You can point to the exact source, every time
4 = The honest answer is "the model thought so"

The relationship
0 = Findings compound across engagements
4 = Every engagement is solved in total isolation

The relationship
0 = Scale adds structure, not strain
4 = You are stuffing one window past its limit

The relationship
0 = Everything is organized and findable instantly
4 = It is scattered across drive, doc, and text thread

The implementation
0 = Readiness is assessed before any solution is named
4 = Solutions get recommended before readiness is checked

The implementation
0 = Never. Readiness gaps surface before the pitch
4 = Regularly. The walk-back happens after the sale

How this works

A one-shot audit, dropping a client's documents into a general-purpose model and prompting it to "do an audit," does not fail because the prompt was weak. It fails in three specific places at once: the deliverable (can a finding cite where it came from), the relationship (does anything you learn carry forward, or does every engagement start from zero), and the implementation (is a solution sequenced behind a readiness check, or recommended before anyone looked).

The math here is deliberately simple and fully visible. Each of the eight questions above is answered on the same 0-4 scale (0 = disciplined practice, 4 = one-shot practice) and belongs to exactly one of the three surfaces. Every surface score is just the sum of its own questions' answers, turned into a percentage of that surface's own maximum (12 for the deliverable and the relationship, which have three questions each, and 8 for the implementation, which has two). The overall score is the sum of all eight answers, turned into a percentage of the overall maximum of 32. That is the entire formula: additive, with every question weighted equally, and stated here in full. There are no hidden weights. This is not Audity's internal scoring engine; it is an illustrative model built to make the three failure surfaces concrete.
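The formula described above can be sketched in a few lines of Python. This is a minimal illustration, not the code behind this page: the names `SURFACES`, `score`, and `riskiest_surface` are hypothetical, the question order is assumed to match the form (three deliverable questions, three relationship questions, two implementation questions), and picking the highest-percentage surface as the one "most likely to bite first" is an assumption rather than something the page specifies.

```python
from typing import Dict, List

MAX_PER_QUESTION = 4  # every answer is on the same 0-4 scale

# Surface of each question, assumed to follow the form order (3 + 3 + 2).
SURFACES: List[str] = (
    ["deliverable"] * 3 + ["relationship"] * 3 + ["implementation"] * 2
)


def score(answers: List[int]) -> Dict[str, float]:
    """Return each surface's percentage plus the overall percentage."""
    if len(answers) != len(SURFACES):
        raise ValueError("expected exactly eight answers")
    if any(a not in range(MAX_PER_QUESTION + 1) for a in answers):
        raise ValueError("each answer must be an integer from 0 to 4")

    totals: Dict[str, int] = {}
    counts: Dict[str, int] = {}
    for surface, answer in zip(SURFACES, answers):
        totals[surface] = totals.get(surface, 0) + answer
        counts[surface] = counts.get(surface, 0) + 1

    # Surface score: sum of its answers as a percentage of its own maximum.
    result = {
        s: 100 * totals[s] / (counts[s] * MAX_PER_QUESTION) for s in totals
    }
    # Overall score: sum of all answers as a percentage of the overall maximum.
    result["overall"] = 100 * sum(answers) / (len(answers) * MAX_PER_QUESTION)
    return result


def riskiest_surface(scores: Dict[str, float]) -> str:
    """Assumption: the surface with the highest percentage is the riskiest."""
    surfaces = {k: v for k, v in scores.items() if k != "overall"}
    return max(surfaces, key=surfaces.get)
```

For example, the answers `[4, 3, 2, 1, 1, 0, 4, 4]` give 9 of 12 on the deliverable (75 percent), 2 of 12 on the relationship (about 16.7 percent), 8 of 8 on the implementation (100 percent), and 19 of 32 overall (59.375 percent). Because the overall score sums raw answers, the two three-question surfaces contribute more to it than the two-question implementation surface does.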

The stakes are not hypothetical. By some estimates, more than 80 percent of AI projects fail, roughly twice the failure rate of non-AI IT work, and the leading root cause is misunderstanding the problem and optimizing for the wrong metric, not the model (RAND, 2024). And provenance is not a nicety even when a system is grounded: in a study of leading AI legal-research tools, grounded systems still hallucinated in roughly 17 to 33 percent of queries (Stanford RegLab, 2025). If even grounded tools miss that often, an ungrounded one-shot summary is not a shortcut. It is where the risk actually lives.

This tool is one page of the method

The Diagnostic Discipline is the full framework behind this tool. Join the community to put it into practice with other AI transformation partners, or read the paper it comes from.