The Black Box Has No Audit Log

By Leigh Garrity, May 8, 2026

What It Actually Is

Every layer in a neural network does the same basic thing: it takes a set of numbers as input, multiplies each one by a learned weight, sums the results, adds a bias term, and passes the output through an activation function that introduces non-linearity. Then the next layer does it again. And again. The final layer produces the output — a classification, a probability, a generated token, whatever the network was built to produce.
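The arithmetic of a single layer is small enough to write out. What follows is a minimal sketch, not any particular framework's implementation: the function names, the weight layout, and the numbers are all made up for illustration, and ReLU stands in for the activation function.

```python
def relu(x: float) -> float:
    """Activation: negative values become zero, everything else passes through."""
    return x if x > 0.0 else 0.0


def dense_layer(inputs, weights, biases):
    """Compute one fully connected layer.

    weights[j] holds the weights feeding output node j (an illustrative layout).
    """
    outputs = []
    for node_weights, bias in zip(weights, biases):
        # Multiply each input by its weight and sum the results.
        weighted_sum = sum(w * x for w, x in zip(node_weights, inputs))
        # Add the bias, then apply the non-linearity.
        outputs.append(relu(weighted_sum + bias))
    return outputs


# Hypothetical values chosen only to keep the arithmetic easy to follow.
inputs = [1.0, 2.0]
weights = [[0.5, -1.0],   # weights into node 0
           [1.0, 1.0]]    # weights into node 1
biases = [0.0, -1.0]

layer_output = dense_layer(inputs, weights, biases)
print(layer_output)  # [0.0, 2.0]
```

Node 0 sums to -1.5 and is clipped to zero; node 1 sums to 3.0, drops to 2.0 after the bias, and passes through unchanged. A real network repeats this with far wider layers, and the output of one layer becomes the input to the next.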

The weights are the whole game. They're not programmed; they're learned. During training, the network sees examples, makes predictions, measures how wrong it was, and adjusts the weights slightly to be less wrong next time. Do this across enough examples and enough iterations and the weights settle into a configuration that produces useful outputs. The weights are, in a meaningful sense, what the network "knows."
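The predict, measure, adjust loop can be shown with a model that has exactly one weight. This is a toy sketch under stated assumptions: the data, the learning rate, and the squared-error measure are illustrative choices, and real training computes the same kind of adjustment for billions of weights at once.

```python
# Hypothetical training examples: (input, target), where target = 2 * input.
examples = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]

weight = 0.0          # the model starts out knowing nothing
learning_rate = 0.05  # how large each adjustment is (an assumed value)

for epoch in range(200):
    for x, target in examples:
        prediction = weight * x            # make a prediction
        error = prediction - target        # measure how wrong it was
        gradient = 2.0 * error * x         # slope of error**2 with respect to the weight
        weight -= learning_rate * gradient # adjust slightly to be less wrong

print(round(weight, 6))
```

Nobody wrote "multiply by two" anywhere in this program. The weight drifts toward 2.0 because that value reduces the error on the examples, which is the same sense in which a full-size network's weights are learned and not programmed.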

Scale makes this hard to intuit. A mid-sized language model might have 7 billion weights. A large one, 70 billion or more. Each weight is just a floating-point number. The intelligence, if that's the right word, is in the collective configuration of all of them, not in any individual parameter you could point to and explain.
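One way to get a feel for the scale is to work out how much raw storage the weights alone would occupy. The sketch below assumes each weight is stored as a 2-byte (16-bit) floating-point number; that is a common choice but an assumption here, and the helper name is made up for illustration.

```python
def weight_storage_gb(num_weights: int, bytes_per_weight: int) -> float:
    """Raw size of the weights in gigabytes (1 GB = 1e9 bytes)."""
    return num_weights * bytes_per_weight / 1e9


BYTES_PER_WEIGHT = 2  # assumption: 16-bit floats

for label, count in [("7 billion", 7_000_000_000), ("70 billion", 70_000_000_000)]:
    size = weight_storage_gb(count, BYTES_PER_WEIGHT)
    print(f"{label} weights: about {size:.0f} GB of numbers")
```

Under that assumption the two models come to roughly 14 GB and 140 GB of floating-point values, none of which carries a label saying what it is for.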

Depth buys you abstraction: each layer learns to represent something more abstract than the layer before it. In image recognition systems, you can observe this fairly cleanly — early layers respond to edges and textures, middle layers to shapes and parts, later layers to objects and categories. In language models, the abstraction hierarchy is harder to describe but the principle holds. The network learns intermediate representations that are useful for the final task, and those representations emerge from training rather than being designed.

Nobody specified what those intermediate representations should be. They appeared because they were useful for reducing error. That's a genuinely strange thing to sit with.

• Neural network: A layered mathematical structure where each layer applies learned weights to transform inputs into increasingly abstract representations. The weights — not the architecture — encode what the network has learned.

• Deep: Many layers stacked sequentially. Depth enables the learning of hierarchical, abstract representations that shallow networks cannot capture efficiently, and is the primary structural reason modern AI systems perform as well as they do.

How the Mechanism Produces Behavior

Take a concrete forward pass. You feed the network an input — say, a document. That document gets converted to numbers (how that happens is a later piece). Those numbers hit the first layer. Each node in that layer computes a weighted sum of its inputs, applies an activation function, and passes a single number to every node in the next layer. The second layer does the same thing with its own weights. So does the third. And the fourth. By the time you reach the final layer, the original input has been transformed through dozens or hundreds of successive operations, each one a small mathematical step, and the output reflects the accumulated effect of all of them.

The activation function introduces non-linearity, which is doing real structural work. Without it, stacking layers would be mathematically equivalent to a single layer: a linear transformation of a linear transformation is itself a linear transformation, so the whole stack collapses into one. The activation function is what makes depth meaningful. One of the most widely used, ReLU (Rectified Linear Unit), is almost embarrassingly simple: if the input is negative, output zero; otherwise, output the input unchanged. That's it. The sophistication of deep learning emerges from applying this trivial operation billions of times in a carefully structured way.
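The collapse is easy to check by hand with two tiny layers. The sketch below uses made-up 2x2 weight matrices and leaves out bias terms to keep the arithmetic short; the helper names are illustrative and do not come from any library.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def matmul(a, b):
    """Matrix product of a and b (both lists of rows)."""
    columns = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in columns] for row in a]


def relu_vec(vector):
    """Apply ReLU to every element."""
    return [max(0.0, v) for v in vector]


# Hypothetical weights for two layers, and one input.
W1 = [[1.0, -1.0], [0.0, 2.0]]
W2 = [[1.0, 1.0], [-1.0, 1.0]]
x = [1.0, 3.0]

# Two layers with no activation in between...
two_linear = matvec(W2, matvec(W1, x))
# ...give the same answer as ONE layer whose weights are the product W2 * W1.
collapsed = matvec(matmul(W2, W1), x)

# With ReLU between the layers, no single matrix reproduces the stack in general.
with_relu = matvec(W2, relu_vec(matvec(W1, x)))

print(two_linear, collapsed, with_relu)  # [4.0, 8.0] [4.0, 8.0] [6.0, 6.0]
```

The first layer produces [-2.0, 6.0]. Without an activation, the second layer turns that into [4.0, 8.0], exactly what the single merged matrix gives. With ReLU, the -2.0 is zeroed before the second layer sees it, and the result changes to [6.0, 6.0].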

There's no reverse path to read out a reason. You can observe that a particular input produced a particular output. You can perturb the input slightly and observe how the output changes. You can use interpretability techniques to identify which parts of the input the network weighted most heavily. But you cannot trace a reasoning path. There is no equivalent of stepping through a policy evaluation engine and seeing which rule fired, which condition matched, which claim was present. The output is the end of the trail.
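Perturbing the input and watching the output is simple enough to sketch. The model below is a stand-in with made-up weights, treated as a black box by the probing function; the function names and the nudge size are assumptions for illustration, not a named interpretability library.

```python
def toy_model(features):
    """Stand-in for an opaque model: hypothetical weights, one hidden ReLU layer."""
    hidden = [
        max(0.0, 0.8 * features[0] - 0.5 * features[1]),
        max(0.0, 0.1 * features[1] + 0.3 * features[2]),
    ]
    return 1.5 * hidden[0] + 0.2 * hidden[1]


def sensitivity(model, features, delta=1e-3):
    """Nudge each input feature in turn and record how much the output moves."""
    baseline = model(features)
    scores = []
    for i in range(len(features)):
        nudged = list(features)
        nudged[i] += delta
        scores.append((model(nudged) - baseline) / delta)
    return scores


scores = sensitivity(toy_model, [1.0, 0.5, 2.0])
print([round(s, 3) for s in scores])  # [1.2, -0.73, 0.06]
```

The scores say the first feature moves the output most and the second pushes it down. That is a measurement of behavior around one input. It does not say why the model responds that way, and a different input could produce a different ranking.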

• Forward pass: The sequential transformation of an input through all layers of the network, from input to output. Each layer applies its weights; no layer stores a record of why it responded the way it did.

• Weights: The learned parameters of the network — floating-point numbers adjusted during training to minimize prediction error. They are the encoded knowledge of the model, distributed across billions of values with no human-readable structure.

When You'll Need This

The question surfaces in procurement conversations, not research seminars. A CISO at a mid-size civilian agency asks: "If the AI flags something as anomalous, can we audit why?" A CAIO wants to know whether the agency can satisfy emerging explainability requirements in an AI governance framework. A program manager is trying to figure out what goes in the risk assessment.

The honest answer to the CISO's question is: partially. You can log the input and the output. With a fixed model version and deterministic settings, you can run the same input again and get the same output. You can use interpretability tools (attention visualization, feature attribution, saliency maps) to get a rough sense of what the model weighted. Producing an audit trail that looks like the ones your team already knows how to read is a different matter. The reasoning, such as it is, doesn't exist as a discrete artifact. It's the collective behavior of billions of weights, and no current tool translates that into a claim-by-claim account.
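The part that can be logged looks something like the record below. This is a sketch only: the field names, the model version string, and the example event are hypothetical, not drawn from any product or standard.

```python
import hashlib
import json
from datetime import datetime, timezone


def audit_record(model_version: str, input_text: str, output: dict) -> dict:
    """Build a log entry for one model decision (hypothetical schema)."""
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "model_version": model_version,
        # Hash of the exact input, so the same input can be identified and re-run.
        "input_sha256": hashlib.sha256(input_text.encode("utf-8")).hexdigest(),
        "output": output,
    }


record = audit_record(
    "anomaly-model-v1",                      # hypothetical version label
    "login from new device at 03:14",        # hypothetical event
    {"label": "anomalous", "score": 0.93},   # hypothetical model output
)
print(json.dumps(record, indent=2))
```

Every field here is something the system can actually record: what went in, what came out, and which model produced it. The record has no field for which rule matched, because the model has nothing of that kind to report.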

Federal accounts feel this specifically because AI governance frameworks are starting to require explainability without defining what explainability means for a deep learning system. The NIST AI Risk Management Framework uses the term. OMB guidance references it. The field's honest answer — "we can characterize behavior but not fully explain mechanism" — is not what those documents envision, and the gap between policy language and technical reality is where your buyer's anxiety lives.

• Interpretability gap: The absence of a mechanism for tracing a neural network's internal reasoning in the way one can trace a rule-based or policy-driven system. Active research area; no clean solution exists at present.

IDAM Concept Mapping

Where the analogy holds: A neural network functions like a policy evaluation engine in the sense that both take structured inputs and produce outputs based on encoded rules. Feed it a request; get a decision.

Where it breaks, and this is the part to hold: You can read the policy. You can open the SAML assertion and inspect every claim. You can trace an access decision through the policy evaluation log and identify exactly which rule matched, which attribute was present, which condition failed. A 70-billion-parameter model has 70 billion weights, and no tool currently exists that translates those weights into something a human can audit claim by claim. The "policy" is real; it's just encoded in a form that isn't human-readable, isn't separable into discrete rules, and wasn't written by a human in the first place.
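For contrast, here is roughly what the auditable side of that comparison looks like. The policy, rule names, and request attributes below are hypothetical and much simpler than a real policy engine; the point is only that the decision arrives with a trace of which rules were checked and which one matched.

```python
# A hypothetical three-rule policy: (rule name, condition, decision if matched).
POLICY = [
    ("deny-disabled-accounts", lambda req: req["account_status"] != "active", "deny"),
    ("allow-admin-group", lambda req: "admins" in req["groups"], "allow"),
    ("default-deny", lambda req: True, "deny"),
]


def evaluate(request):
    """Return the decision plus a trace of every rule checked along the way."""
    trace = []
    for name, condition, decision in POLICY:
        matched = condition(request)
        trace.append((name, matched))
        if matched:
            return decision, trace
    return "deny", trace  # unreachable with a catch-all rule; kept as a safe default


decision, trace = evaluate({"account_status": "active", "groups": ["admins", "staff"]})
print(decision)  # allow
print(trace)     # [('deny-disabled-accounts', False), ('allow-admin-group', True)]
```

The trace is the audit log: it names the rule that fired and the ones that did not. A neural network's forward pass produces the equivalent of the first print statement and nothing resembling the second.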

The interpretability gap isn't a temporary limitation waiting on a software update. It's a structural property of how these systems work. Your IDAM intuition — that access decisions are auditable because the rules are inspectable — does not transfer here.

The Honest Framing

"We trained it and it works" is closer to the truth than most vendor pitches suggest. That's not a criticism of the technology; it's a description of where the science actually stands. Interpretability is a serious, active research field. Anthropic, DeepMind, and academic groups publish meaningful work on it regularly. But the current state is: we observe behavior, we test behavior, we characterize behavior — and we don't fully understand mechanism.

Carry this into the room as a precise description of what "trust" means in this context, not as a reason to distrust AI capabilities. Trusting a policy you can read is different from trusting a behavioral track record you can measure. Conflating them is where procurement conversations go sideways.

The depth is what makes these systems powerful. It's also what makes them opaque. Both are true, and the technology doesn't resolve the tension — it just asks you to manage it.