Notebook 06: Audit & The Federal Story

Premise: In a SCIF, 'it worked' isn't enough. An auditor needs to verify — months later, with the original event log — exactly what the agent did, in what order, who authorized it, and that nothing was altered after the fact.

Arc's answer: every event in the loop is hashed, chained, and verifiable. Tamper one byte and verification fails at the broken link.

By the end you will have:

  • Captured every event from a real run via on_event=
  • Verified the integrity of the chain with result.verify_integrity() and verify_chain()
  • Tampered with an event and watched verification catch it
  • Seen where this plugs into NIST 800-53 / FedRAMP audit controls
  • Sketched what the full ArcAgent adds on top (identity, policy, signing)

Setup

In [ ]:
import dataclasses
from pathlib import Path
from types import MappingProxyType
from dotenv import load_dotenv
load_dotenv()

from arcllm import load_model
from arcrun import run, Tool, ToolContext, verify_chain
from rich import print
from rich.table import Table
from rich.console import Console

console = Console()
model = load_model('anthropic')

1. Capture every event from a run

The on_event= callback is called once for every event. The same events are also returned in result.events, so here we just collect them in a list.

In [ ]:
async def list_files(args: dict, ctx: ToolContext) -> str:
    """Return the sorted entry names in args['path'], or 'not a dir'."""
    p = Path(args['path']).expanduser()
    return '\n'.join(sorted(f.name for f in p.iterdir())) if p.is_dir() else 'not a dir'

tool = Tool(
    name='list_files', description='List a directory.',
    input_schema={'type': 'object', 'properties': {'path': {'type': 'string'}}, 'required': ['path']},
    execute=list_files,
)

captured = []
result = await run(
    model=model,
    tools=[tool],
    system_prompt='You are a research assistant.',
    task='List the notebooks in ~/projects/ai-roadshow/notebooks/.',
    on_event=captured.append,
)

print(f'captured {len(captured)} events; result.events has {len(result.events)}')
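
Since the premise is an auditor returning months later with the original event log, it can help to persist events as they arrive rather than only holding them in memory. The following is a minimal sketch, not part of arcrun: `make_jsonl_sink` and `StandInEvent` are hypothetical names, and the sketch assumes each event exposes only the attributes this notebook uses (`sequence`, `type`, `data`, `prev_hash`, `event_hash`).

```python
import json
from dataclasses import dataclass
from pathlib import Path
from types import MappingProxyType
from typing import Any, Callable, Mapping


@dataclass(frozen=True)
class StandInEvent:
    """Hypothetical stand-in for an event; real arcrun events may carry more fields."""
    sequence: int
    type: str
    data: Mapping[str, Any]
    prev_hash: str
    event_hash: str


def make_jsonl_sink(path: Path) -> Callable[[Any], None]:
    """Return a callback that appends each event to `path` as one JSON line."""
    def sink(ev: Any) -> None:
        record = {
            "sequence": ev.sequence,
            "type": ev.type,
            "data": dict(ev.data),  # copy out of a read-only mapping
            "prev_hash": ev.prev_hash,
            "event_hash": ev.event_hash,
        }
        # default=str is a blunt fallback for values json cannot encode natively.
        line = json.dumps(record, sort_keys=True, default=str)
        with path.open("a", encoding="utf-8") as fh:
            fh.write(line + "\n")
    return sink
```

Under these assumptions, passing `on_event=make_jsonl_sink(Path('events.jsonl'))` would write one line per event. An append-only file is only a starting point; a real deployment would also need to protect the file itself.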

2. Inspect the hash chain

Each event carries prev_hash and event_hash. Each event_hash incorporates the previous one, which forms a chain. This is the same linking idea a blockchain uses between blocks, only simpler: there is no proof-of-work and no consensus, just single-writer tamper detection.

In [ ]:
tbl = Table(title='Event chain (first 8)', show_lines=False)
tbl.add_column('seq', style='dim', width=4)
tbl.add_column('type', width=24)
tbl.add_column('prev_hash', style='dim')
tbl.add_column('event_hash')
for ev in result.events[:8]:
    tbl.add_row(str(ev.sequence), ev.type, ev.prev_hash[:12]+'…', ev.event_hash[:12]+'…')
console.print(tbl)
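
To make the chaining concrete, here is a small standalone sketch of how such a chain can be built with the standard library. It illustrates the general technique only; arcrun's actual hash function, field encoding, and genesis value are not shown in this notebook, so `GENESIS`, `hash_event`, `build_chain`, and the event type names below are all assumptions.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed placeholder prev_hash for the first event


def hash_event(sequence: int, type_: str, data: dict, prev_hash: str) -> str:
    """SHA-256 over a canonical JSON encoding of the event, including prev_hash."""
    payload = json.dumps(
        {"sequence": sequence, "type": type_, "data": data, "prev_hash": prev_hash},
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def build_chain(items: list[tuple[str, dict]]) -> list[dict]:
    """Turn (type, data) pairs into events linked by prev_hash -> event_hash."""
    chain: list[dict] = []
    prev = GENESIS
    for seq, (type_, data) in enumerate(items):
        event_hash = hash_event(seq, type_, data, prev)
        chain.append({
            "sequence": seq,
            "type": type_,
            "data": data,
            "prev_hash": prev,
            "event_hash": event_hash,
        })
        prev = event_hash  # the next event commits to this one
    return chain


# Hypothetical event types, for illustration only.
chain = build_chain([
    ("run.start", {"task": "list notebooks"}),
    ("tool.call", {"name": "list_files"}),
    ("run.end", {"ok": True}),
])
for ev in chain:
    print(ev["sequence"], ev["type"], ev["prev_hash"][:12], ev["event_hash"][:12])
```

Because each hash covers prev_hash, changing any earlier event changes every hash that follows it, which is what makes the chain tamper-evident.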

3. Verify integrity

Verification recomputes every hash and walks the chain. Any break (a sequence gap, a hash mismatch, or a broken prev-pointer) is reported with the index of the first bad event.

In [ ]:
check = result.verify_integrity()
print('valid:        ', check.valid)
print('event_count:  ', check.event_count)
print('first_broken: ', check.first_broken_index)
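
The three failure modes named above can be sketched as a single loop. This is an illustrative reimplementation under assumed conventions (dict events, SHA-256 over sorted JSON, an all-zero genesis hash), not arcrun's verify_chain; `Check`, `digest`, and `verify` are hypothetical names chosen to mirror the fields printed in the cell above.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Check:
    valid: bool
    event_count: int
    first_broken_index: Optional[int] = None
    error: Optional[str] = None


def digest(ev: dict) -> str:
    """Hash the fields an event commits to (everything except event_hash)."""
    body = {k: ev[k] for k in ("sequence", "type", "data", "prev_hash")}
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()


def verify(events: list[dict], genesis: str = "0" * 64) -> Check:
    """Walk the chain and stop at the first event that fails any check."""
    prev = genesis
    for i, ev in enumerate(events):
        if ev["sequence"] != i:
            return Check(False, len(events), i, "sequence gap")
        if ev["prev_hash"] != prev:
            return Check(False, len(events), i, "broken prev-pointer")
        if digest(ev) != ev["event_hash"]:
            return Check(False, len(events), i, "hash mismatch")
        prev = ev["event_hash"]
    return Check(True, len(events))


# Build a tiny valid chain, then break it in two different ways.
events: list[dict] = []
prev = "0" * 64
for i, kind in enumerate(["start", "tool_call", "finish"]):
    ev = {"sequence": i, "type": kind, "data": {"step": i}, "prev_hash": prev}
    ev["event_hash"] = digest(ev)
    events.append(ev)
    prev = ev["event_hash"]

ok = verify(events)

edited = list(events)
edited[1] = {**events[1], "data": {"step": 1, "hacked": True}}  # stale event_hash
bad_edit = verify(edited)

dropped = events[:1] + events[2:]  # delete the middle event
bad_drop = verify(dropped)

print(ok)
print(bad_edit)
print(bad_drop)
```

Checking the sequence number and prev-pointer before the hash means deletions and reorderings are reported as such, rather than as a generic mismatch.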

4. Tamper detection

Swap one event for a copy with altered data, then watch verification fail at exactly that index. At its core, this is what an auditor's tool does.

In [ ]:
events = list(result.events)
victim = events[2]
# Build an altered copy of event 2; its data gains a key the original never had.
tampered = dataclasses.replace(victim, data=MappingProxyType({**dict(victim.data), 'hacked': True}))
events[2] = tampered

broken = verify_chain(events)
print('valid:               ', broken.valid)
print('first_broken_index:  ', broken.first_broken_index)
print('error:               ', broken.error)

5. Where this plugs in

NIST SP 800-53 (federal; control names as in Rev. 4):

  • AU-2 Audit Events — every tool call, strategy switch, LLM call emits a typed event
  • AU-9 Protection of Audit Information — hash chain makes silent tampering detectable
  • AU-10 Non-repudiation — combined with arctrust Ed25519 signatures, events are signed by the agent's DID
  • AU-12 Audit Generation — events are emitted at the source, not reconstructed from logs

For your lab demo, the punchline: the same code runs in a Jupyter notebook on a laptop and in a FedRAMP-authorized SCIF. The auditing isn't bolted on — it's built in.

6. Where ArcAgent goes from here

arcrun gives you the loop + audit chain. arcagent wraps it with the rest of the federal stack:

  • Identity — every agent has a DID, every event is signed
  • Skill signing — skills are bundled, signed, and verified before load
  • Policy engine — every tool call is authorized (allow/deny) by a multi-layer policy pipeline (first-deny-wins, fail-closed)
  • Trace store — persistent, queryable trace history for post-incident review
  • OpenTelemetry export — feeds your existing SOC stack

Sketch (not run here; production agents need an arcagent.toml):

from arcagent.core.agent import ArcAgent
agent = await ArcAgent.from_config('arcagent.toml')
result = await agent.run('Analyze experiment log 42')
# → identity verified · skill signatures checked · policy enforced
# → tool calls authorized · every event audited & signed
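
The policy engine's "first-deny-wins, fail-closed" behaviour can be illustrated with a few lines of plain Python. This is a conceptual sketch only, not arcagent's API: `Verdict`, `authorize`, and the two example layers are hypothetical, and the real pipeline's layer interface is not shown in this notebook.

```python
from enum import Enum
from typing import Callable, Iterable


class Verdict(Enum):
    ALLOW = "allow"
    DENY = "deny"


# Hypothetical layer signature: (tool_name, args) -> Verdict
Layer = Callable[[str, dict], Verdict]


def authorize(tool: str, args: dict, layers: Iterable[Layer]) -> Verdict:
    """Evaluate layers in order: first deny wins, and any doubt means deny."""
    ordered = list(layers)
    if not ordered:
        return Verdict.DENY  # fail closed: no policy means no permission
    for layer in ordered:
        try:
            verdict = layer(tool, args)
        except Exception:
            return Verdict.DENY  # fail closed: a crashing layer denies
        if verdict is not Verdict.ALLOW:
            return Verdict.DENY  # first deny wins; later layers are skipped
    return Verdict.ALLOW


def tool_allowlist(tool: str, args: dict) -> Verdict:
    """Example layer: only explicitly listed tools may run."""
    return Verdict.ALLOW if tool in {"list_files"} else Verdict.DENY


def path_scope(tool: str, args: dict) -> Verdict:
    """Example layer: restrict paths to an assumed project prefix."""
    path = str(args.get("path", ""))
    return Verdict.ALLOW if path.startswith("~/projects/") else Verdict.DENY


layers = [tool_allowlist, path_scope]
print(authorize("list_files", {"path": "~/projects/ai-roadshow/notebooks/"}, layers))
print(authorize("list_files", {"path": "/etc"}, layers))
print(authorize("delete_files", {"path": "~/projects/x"}, layers))
```

The design choice worth noting is that a call is allowed only when every layer says allow; an empty pipeline, an exception, or an unexpected return value all end in a deny.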

Takeaway

  • The audit trail is the agent's diary. Every step. Hashed. Chained. Tamper-evident.
  • Verification is one function call: verify_chain(events).
  • This is not federal-mode-only. Every Arc agent emits this chain, at every tier. Federal turns the strictness up; the mechanism is identical.

Next: 07 — Security at Runtime. Audit proves what happened. Now: the active defenses that prevent harm at runtime — PII redaction, sandboxed code execution, allowlists.