Notebook 06 — Audit & The Federal Story¶
Premise: In a SCIF, 'it worked' isn't enough. An auditor needs to verify — months later, with the original event log — exactly what the agent did, in what order, who authorized it, and that nothing was altered after the fact.
Arc's answer: every event in the loop is hashed, chained, and verifiable. Tamper one byte and verification fails at the broken link.
By the end you will have:
- Captured every event from a real run via on_event=
- Verified the integrity of the chain with verify_chain()
- Tampered with an event and watched verification catch it
- Seen where this plugs into NIST 800-53 / FedRAMP audit controls
- Sketched what the full ArcAgent adds on top (identity, policy, signing)
Setup¶
import dataclasses
from pathlib import Path
from types import MappingProxyType
from dotenv import load_dotenv
load_dotenv()
from arcllm import load_model
from arcrun import run, Tool, ToolContext, verify_chain
from rich import print
from rich.table import Table
from rich.console import Console
console = Console()
model = load_model('anthropic')
1. Capture every event from a run¶
on_event= is called once for every event as it is emitted. The same events are also returned in result.events, so here we simply collect them in a list.
async def list_files(args: dict, ctx: ToolContext) -> str:
    p = Path(args['path']).expanduser()
    return '\n'.join(sorted(f.name for f in p.iterdir())) if p.is_dir() else 'not a dir'
tool = Tool(
    name='list_files', description='List a directory.',
    input_schema={'type': 'object', 'properties': {'path': {'type': 'string'}}, 'required': ['path']},
    execute=list_files,
)
captured = []
result = await run(
    model=model,
    tools=[tool],
    system_prompt='You are a research assistant.',
    task='List the notebooks in ~/projects/ai-roadshow/notebooks/.',
    on_event=captured.append,
)
print(f'captured {len(captured)} events; result.events has {len(result.events)}')
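Since on_event= is handed a plain callable here (captured.append), the same hook could plausibly write events to disk as they arrive, which is what an auditor would need months later. The following is a minimal sketch under stated assumptions: DemoEvent, its field names, the event type strings, and make_jsonl_sink are hypothetical stand-ins for illustration, not part of arcrun.

```python
import dataclasses
import json
import tempfile
from collections.abc import Mapping
from pathlib import Path
from types import MappingProxyType


@dataclasses.dataclass(frozen=True)
class DemoEvent:
    """Hypothetical stand-in for an arcrun event (field names are assumed)."""
    sequence: int
    type: str
    data: Mapping


def make_jsonl_sink(path: Path):
    """Return a callback that appends one JSON line per event to `path`."""
    def sink(event) -> None:
        record = {}
        for field in dataclasses.fields(event):
            value = getattr(event, field.name)
            # Read-only mappings are not JSON-serializable; copy them to dicts.
            record[field.name] = dict(value) if isinstance(value, Mapping) else value
        with path.open("a", encoding="utf-8") as fh:
            fh.write(json.dumps(record, sort_keys=True) + "\n")
    return sink


# Demo with made-up events; in the notebook this would be on_event=sink.
log_path = Path(tempfile.mkdtemp()) / "events.jsonl"
sink = make_jsonl_sink(log_path)
for i, kind in enumerate(["loop.start", "tool.call"]):
    sink(DemoEvent(sequence=i, type=kind, data=MappingProxyType({"step": i})))

lines = log_path.read_text(encoding="utf-8").splitlines()
print(len(lines), json.loads(lines[1])["type"])
```

Because the sink copies every dataclass field, hash fields such as prev_hash and event_hash would be persisted too if the real event type carries them as plain attributes, which is what later re-verification would need.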
2. Inspect the hash chain¶
Each event carries a prev_hash and an event_hash. Each event_hash incorporates the previous event's hash, which links the events into a chain. This is the same linking structure a blockchain uses, just simpler: there is no proof-of-work and no consensus, only single-writer tamper detection.
tbl = Table(title='Event chain (first 8)', show_lines=False)
tbl.add_column('seq', style='dim', width=4)
tbl.add_column('type', width=24)
tbl.add_column('prev_hash', style='dim')
tbl.add_column('event_hash')
for ev in result.events[:8]:
    tbl.add_row(str(ev.sequence), ev.type, ev.prev_hash[:12]+'…', ev.event_hash[:12]+'…')
console.print(tbl)
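To make the chaining idea concrete, here is a minimal sketch of how such a chain could be built with the standard library. The notebook does not show arcrun's actual hashing scheme, so SHA-256 over canonical JSON, the all-zero GENESIS value, the event type strings, and the names hash_event and build_chain are all assumptions made for illustration.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed placeholder prev_hash for the first event


def hash_event(sequence: int, type_: str, data: dict, prev_hash: str) -> str:
    """SHA-256 over a canonical JSON encoding that includes prev_hash."""
    payload = json.dumps(
        {"sequence": sequence, "type": type_, "data": data, "prev_hash": prev_hash},
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def build_chain(raw_events):
    """Turn (type, data) pairs into events linked by prev_hash/event_hash."""
    chain, prev = [], GENESIS
    for seq, (type_, data) in enumerate(raw_events):
        event_hash = hash_event(seq, type_, data, prev)
        chain.append({
            "sequence": seq, "type": type_, "data": data,
            "prev_hash": prev, "event_hash": event_hash,
        })
        prev = event_hash  # the next event commits to this one
    return chain


chain = build_chain([
    ("loop.start", {}),
    ("tool.call", {"name": "list_files"}),
    ("loop.end", {"ok": True}),
])
for ev in chain:
    print(ev["sequence"], ev["type"], ev["prev_hash"][:12], ev["event_hash"][:12])
```

Because each event_hash covers prev_hash, changing any earlier event changes every hash that follows it, which is the property the table above lets you eyeball.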
3. Verify integrity¶
Recompute every hash, walk the chain. Any break — sequence gap, hash mismatch, broken prev-pointer — reports the first bad index.
check = result.verify_integrity()
print('valid: ', check.valid)
print('event_count: ', check.event_count)
print('first_broken: ', check.first_broken_index)
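The three kinds of break named above (sequence gap, hash mismatch, broken prev-pointer) can be illustrated with a toy verifier. This is a sketch, not arcrun's implementation: toy_hash, toy_verify_chain, ToyCheck, the all-zero GENESIS value, and the event type strings are hypothetical, and the result fields merely mirror the attribute names printed in the cell above.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Optional

GENESIS = "0" * 64  # assumed prev_hash of the first event


@dataclass
class ToyCheck:
    valid: bool
    event_count: int
    first_broken_index: Optional[int] = None
    error: Optional[str] = None


def toy_hash(ev: dict) -> str:
    """Hash every field except event_hash itself (assumed scheme)."""
    body = {k: ev[k] for k in ("sequence", "type", "data", "prev_hash")}
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()


def toy_verify_chain(events: list) -> ToyCheck:
    """Walk the chain and report the first index that fails any check."""
    prev = GENESIS
    for i, ev in enumerate(events):
        if ev["sequence"] != i:
            return ToyCheck(False, len(events), i, "sequence gap")
        if ev["prev_hash"] != prev:
            return ToyCheck(False, len(events), i, "broken prev-pointer")
        if toy_hash(ev) != ev["event_hash"]:
            return ToyCheck(False, len(events), i, "hash mismatch")
        prev = ev["event_hash"]
    return ToyCheck(True, len(events))


# Build a small valid chain.
events, prev = [], GENESIS
for i, kind in enumerate(["loop.start", "tool.call", "tool.result", "loop.end"]):
    ev = {"sequence": i, "type": kind, "data": {"step": i}, "prev_hash": prev}
    ev["event_hash"] = toy_hash(ev)
    events.append(ev)
    prev = ev["event_hash"]

ok = toy_verify_chain(events)

# Deleting an event leaves a sequence gap at the position it used to occupy.
gap = toy_verify_chain(events[:1] + events[2:])

# Editing an event AND recomputing its own hash still breaks the next link.
relinked = [dict(e) for e in events]
relinked[1]["data"] = {"step": 1, "edited": True}
relinked[1]["event_hash"] = toy_hash(relinked[1])
relink = toy_verify_chain(relinked)

print(ok.valid, gap.first_broken_index, gap.error)
print(relink.first_broken_index, relink.error)
```

One limit of any unsigned chain of this kind: a writer who recomputes every hash from the edited event to the end produces a chain that verifies again, so the latest hash has to be anchored or signed somewhere the writer cannot change.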
4. Tamper detection¶
Mutate one event's data. Watch verification fail at exactly that index. This is what an auditor's tool looks like at its core.
events = list(result.events)
victim = events[2]
tampered = dataclasses.replace(victim, data=MappingProxyType({**dict(victim.data), 'hacked': True}))
events[2] = tampered
broken = verify_chain(events)
print('valid: ', broken.valid)
print('first_broken_index: ', broken.first_broken_index)
print('error: ', broken.error)
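The tampering cell goes through dataclasses.replace and MappingProxyType rather than assigning to the event directly. That suggests, though the notebook does not confirm it, that events are frozen dataclasses with a read-only data mapping. A small sketch of that pattern using only the standard library, with DemoEvent as a hypothetical stand-in:

```python
import dataclasses
from collections.abc import Mapping
from types import MappingProxyType


@dataclasses.dataclass(frozen=True)
class DemoEvent:
    """Hypothetical stand-in; assumes arcrun events are frozen dataclasses."""
    sequence: int
    data: Mapping


original = DemoEvent(sequence=2, data=MappingProxyType({"tool": "list_files"}))

# Direct attribute assignment is rejected on a frozen dataclass.
try:
    original.sequence = 99
    attr_error = None
except dataclasses.FrozenInstanceError as exc:
    attr_error = type(exc).__name__

# Item assignment is rejected on a mappingproxy.
try:
    original.data["hacked"] = True
    item_error = None
except TypeError as exc:
    item_error = type(exc).__name__

# The only way to "edit" is to build a modified copy, leaving the original intact.
tampered = dataclasses.replace(
    original, data=MappingProxyType({**original.data, "hacked": True})
)

print(attr_error, item_error)
print(dict(original.data), dict(tampered.data))
```

Immutability of this kind only guards against accidental in-process mutation. A deliberate edit, like the copy above, is what the hash chain is there to detect.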
5. Where this plugs in¶
NIST 800-53 (federal):
- AU-2 Audit Events: every tool call, strategy switch, LLM call emits a typed event
- AU-9 Protection of Audit Information: hash chain makes silent tampering detectable
- AU-10 Non-repudiation: combined with arctrust Ed25519 signatures, events are signed by the agent's DID
- AU-12 Audit Generation: events are emitted at the source, not reconstructed from logs
For your lab demo, the punchline: the same code runs in a Jupyter notebook on a laptop and in a FedRAMP-authorized SCIF. The auditing isn't bolted on — it's built in.
6. Where ArcAgent goes from here¶
arcrun gives you the loop + audit chain. arcagent wraps it with the rest of the federal stack:
- Identity — every agent has a DID, every event is signed
- Skill signing — skills are bundled, signed, and verified before load
- Policy engine — every tool call is authorized (allow/deny) by a multi-layer policy pipeline (first-deny-wins, fail-closed)
- Trace store — persistent, queryable trace history for post-incident review
- OpenTelemetry export — feeds your existing SOC stack
Sketch (not run here; production agents need an arcagent.toml):
from arcagent.core.agent import ArcAgent
agent = await ArcAgent.from_config('arcagent.toml')
result = await agent.run('Analyze experiment log 42')
# → identity verified · skill signatures checked · policy enforced
# → tool calls authorized · every event audited & signed
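The policy engine bullet above describes two properties, first-deny-wins and fail-closed, that are easy to state and easy to get wrong. The sketch below illustrates one way those semantics could work. It is not arcagent's policy engine: authorize, the layer functions, and the tri-state verdict convention (True, False, None) are hypothetical.

```python
from typing import Callable, Optional, Sequence

# A layer returns True (allow), False (deny), or None (no opinion).
PolicyLayer = Callable[[str, dict], Optional[bool]]


def authorize(tool_name: str, args: dict, layers: Sequence[PolicyLayer]) -> bool:
    """Evaluate layers in order with first-deny-wins, fail-closed semantics."""
    allowed = False  # fail-closed default: no explicit allow means deny
    for layer in layers:
        try:
            verdict = layer(tool_name, args)
        except Exception:
            return False  # fail-closed: a layer that errors counts as a deny
        if verdict is False:
            return False  # first-deny-wins: later layers cannot override
        if verdict is True:
            allowed = True
    return allowed


def allowlist_layer(tool_name: str, args: dict) -> Optional[bool]:
    return True if tool_name in {"list_files"} else None


def path_layer(tool_name: str, args: dict) -> Optional[bool]:
    return False if str(args.get("path", "")).startswith("/etc") else None


def broken_layer(tool_name: str, args: dict) -> Optional[bool]:
    raise RuntimeError("policy backend unreachable")


layers = [allowlist_layer, path_layer]
decisions = [
    authorize("list_files", {"path": "~/notebooks"}, layers),   # allowed
    authorize("list_files", {"path": "/etc/shadow"}, layers),   # denied by path_layer
    authorize("delete_files", {"path": "~/x"}, layers),         # no layer allowed it
    authorize("list_files", {"path": "~/notebooks"}, layers + [broken_layer]),
]
print(decisions)
```

The design choice worth noting is the default: starting from allowed = False means an empty or misconfigured pipeline denies everything rather than permitting everything.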
Takeaway¶
- The audit trail is the agent's diary. Every step. Hashed. Chained. Tamper-evident.
- Verification is one function call: verify_chain(events).
- This is not federal-mode-only. Every Arc agent emits this chain, at every tier. Federal turns the strictness up; the mechanism is identical.
Next: 07 — Security at Runtime. Audit proves what happened. Now: the active defenses that prevent harm at runtime — PII redaction, sandboxed code execution, allowlists.