Notebook 07 — Security at Runtime¶
Premise: Notebook 06 showed the audit chain — forensic verification of what happened. This notebook shows the active defenses: things that fire at runtime to prevent harm before it lands in the audit log.
These are the controls federal auditors and lab CISOs ask about first:
- PII redaction at the LLM boundary — sensitive data never leaves the agent in plaintext.
- Sandboxed code execution — the agent can write and run Python without ever touching the host's filesystem or network.
- Tool allowlists — the agent literally cannot call a tool that's not in its sandbox.
- PII-safe audit logging — what gets written to logs is metadata + classification, never raw content.
- Token / cost budgets — runaway loops trip a circuit breaker, not your finance department.
By the end you'll have run each one and seen its effect on the wire — what the LLM saw vs. what the user wrote, what was denied vs. allowed, and how each event landed in the audit chain.
Setup¶
from dotenv import load_dotenv
load_dotenv()
from arcllm import load_model, Message
from arcllm._pii import RegexPiiDetector, redact_text
from arcrun import run, SandboxConfig, make_execute_tool
from rich import print
from rich.panel import Panel
from rich.console import Console
from rich.table import Table
console = Console()
1. PII detection — what's in the message?¶
ArcLLM ships a regex-based PII detector with built-in patterns for SSN, credit card, email, IP address, and phone number. It's pluggable — you can swap in a Presidio or Microsoft DLP backend for higher- stakes deployments — but the regex backend is fine for demos and most lab use cases.
RAW = (
'Hi, please email jane.doe@nationallab.gov about subject 47. '
'Her SSN is 123-45-6789 and the corp card on file is 4242 4242 4242 4242. '
'My cell is 555-867-5309.'
)
detector = RegexPiiDetector()
matches = detector.detect(RAW)
tbl = Table(title='PII findings')
tbl.add_column('type', style='red')
tbl.add_column('span')
tbl.add_column('matched text')
for m in matches:
tbl.add_row(m.pii_type, f'{m.start}-{m.end}', repr(m.matched_text))
console.print(tbl)
console.print(Panel(redact_text(RAW, matches), title='redacted', border_style='green'))
2. Plug it into the LLM call — SecurityModule¶
Setting security={'pii_enabled': True} on load_model() wraps
the adapter so that every outbound message is redacted before
the network call, and every inbound response is scanned on the
way back. The model literally cannot see the raw SSN.
Watch the response — the model tells us it sees [PII:SSN].
model_redacted = load_model('anthropic', security={'pii_enabled': True})
resp = await model_redacted.invoke([
Message(role='user', content=f'Summarize what you can see about this customer: {RAW}')
])
console.print(Panel(resp.content or '', title='LLM response (saw redacted text)', border_style='cyan'))
This is the lethal-trifecta break. Private data + external comms + untrusted input is OWASP LLM02 (Sensitive Information Disclosure). Redacting before egress severs it.
In federal mode the SecurityModule also signs the request payload (Ed25519) so the receiving end can verify what was sent.
3. Sandbox — only allowed tools fire¶
Now the loop side. SandboxConfig(allowed_tools=[...]) is a hard
permission boundary checked before every tool dispatch. The
model can ask for any tool; the sandbox will deny anything not on
the list and emit a tool.denied event you can audit.
Build two scenarios with the same tool — once in the allowlist, once not. Watch the difference.
model = load_model('anthropic')
exec_tool = make_execute_tool(timeout_seconds=10, max_output_bytes=4096)
events_open: list = []
result_open = await run(
model=model, tools=[exec_tool],
sandbox=SandboxConfig(allowed_tools=['execute_python']),
system_prompt='Use execute_python when math is needed.',
task='Compute the first 10 Fibonacci numbers and print them.',
on_event=events_open.append,
)
console.print(Panel(result_open.content or '', title='OPEN sandbox — tool allowed', border_style='green'))
n_denied = sum(1 for e in events_open if e.type == 'tool.denied')
print(f' turns={result_open.turns} tool_calls={result_open.tool_calls_made} tool.denied events: {n_denied}')
events_closed: list = []
result_closed = await run(
model=model, tools=[exec_tool],
sandbox=SandboxConfig(allowed_tools=[]), # empty allowlist => nothing fires
system_prompt='Use execute_python when math is needed.',
task='Compute the first 5 Fibonacci numbers.',
on_event=events_closed.append,
max_turns=3,
)
denied = [dict(e.data) for e in events_closed if e.type == 'tool.denied']
console.print(Panel(str(denied), title='CLOSED sandbox — tool.denied events', border_style='red'))
print(f' turns={result_closed.turns} tool_calls={result_closed.tool_calls_made}')
Same tool. Same task. Same model. The sandbox stops it cold. And
every denial is in the audit chain (verifiable end-to-end with
result.verify_integrity() from notebook 06) — so a SOC analyst
can prove that an attempted action was blocked.
4. Sandboxed code execution — make_execute_tool¶
Look at the protections layered into the local subprocess sandbox:
- Process isolation: new process group (
start_new_session=True) — the agent can't signal the host process. - Locked PATH: only
/usr/bin:/bin— nopip install foo. - Hard timeout: SIGTERM at the deadline, SIGKILL after a 5s grace — runaway scripts can't hang the loop.
- Output cap: stdout+stderr truncated at
max_output_bytesso a fork-bomb ofprint('x')can't OOM the audit log. - TempDir cwd: each invocation gets a fresh
tempfile.TemporaryDirectory— no cross-invocation FS state.
Demo it — give the agent a small computational task and watch what code it writes.
events_exec: list = []
exec_result = await run(
model=model, tools=[make_execute_tool(timeout_seconds=15, max_output_bytes=8192)],
sandbox=SandboxConfig(allowed_tools=['execute_python']),
system_prompt='Use execute_python to compute things rather than reasoning step-by-step.',
task='Find all prime numbers under 100 using a sieve. Print them comma-separated.',
on_event=events_exec.append,
)
for e in events_exec:
if e.type == 'tool.start' and e.data.get('name') == 'execute_python':
code_run = e.data.get('arguments', {}).get('code', '')
console.print(Panel(code_run, title='code the agent wrote', border_style='yellow'))
console.print(Panel(exec_result.content or '', title='final answer', border_style='cyan'))
5. Container isolation for higher tiers — make_contained_execute_tool¶
For air-gapped, federal, or anything where subprocess isn't enough,
arcrun also ships make_contained_execute_tool — Docker/Podman
container per execution. Same interface as the local tool, but the
code now runs inside a container with explicit limits.
We won't run it here (requires Docker on the demo machine), but the constructor surface tells you what's enforced:
from arcrun.builtins.contained_execute import make_contained_execute_tool
tool = make_contained_execute_tool(
image='python:3.12-slim',
timeout_seconds=30,
max_output_bytes=65536,
mem_limit='256m', # cgroup memory ceiling
cpu_period=100_000, # CFS period (microseconds)
cpu_quota=50_000, # 50% of one core
pids_limit=64, # process count cap
tmpfs_size='64m', # writable tmpfs only — no host FS
network_disabled=True, # no egress, no exfiltration
)
OOM kill becomes SandboxOOMError, exceeded timeout becomes
SandboxTimeoutError, container failures become
SandboxRuntimeError, missing runtime is SandboxUnavailableError.
Each is auditable; each is recoverable.
Federal posture progression: subprocess sandbox for personal/dev → container sandbox for enterprise/SCIF → Firecracker microVM (planned) for the strictest tiers. Same agent code; the sandbox swap is a config flag.
6. PII-safe audit by default — AuditModule¶
ArcLLM's AuditModule logs metadata only by default — provider,
model, message count, stop reason, content length, tool counts. It
does not log message content unless you explicitly opt in via
include_messages=True (DEBUG level) — and even then, ideally only
after redaction (i.e. after the SecurityModule has run).
Module stack order matters: Audit → Security → Adapter. Audit sees what Security has already redacted. There is no path where raw PII reaches the audit sink.
import logging
logging.basicConfig(level=logging.INFO, format='%(name)s %(message)s', force=True)
model_audited = load_model(
'anthropic',
security={'pii_enabled': True},
audit={'log_level': 'INFO'},
)
_ = await model_audited.invoke([Message(role='user', content='What is 2 + 2?')])
print('(audit metadata logged above by arcllm.modules.audit — no message content)')
7. Budgets — task_complete and make_budget_breach_args¶
Token and cost budgets are part of the loop, not the application
code. Set them once and the loop terminates cleanly when crossed,
emitting a task_complete with status='budget_breach' so the
auditor can see why the run stopped.
ArcRun ships a make_task_complete_tool factory — pair it with
make_budget_breach_args to convert an over-budget event into the
structured terminator. The point: the agent has a graceful stop
signal that's auditable, not a kill -9 from the runtime.
from arcrun.builtins import make_task_complete_tool, make_budget_breach_args
complete_tool = make_task_complete_tool()
# loop sees usage cross threshold → emits budget breach via task_complete
Combined with arcllm's telemetry={'budget_scope': 'agent:plasma-007'}
you get per-agent budgets enforced at the LLM boundary AND graceful
termination at the loop boundary. Belt and suspenders.
8. The full security stack on one call¶
What it looks like to compose them. Each module is opt-in via
load_model() kwargs; the loop applies the sandbox and budget;
the audit chain (notebook 06) records every event.
model = load_model(
'anthropic',
security={'pii_enabled': True, 'signing_enabled': True}, # PII redact + Ed25519 sign
audit={'log_level': 'INFO'}, # PII-safe metadata logs
telemetry={'budget_scope': 'agent:plasma-007'}, # cost ceilings
rate_limit=True, # provider TPS caps
retry=True, # exponential backoff
fallback=True, # provider failover
otel=True, # OpenTelemetry export
)
result = await run(
model=model,
tools=[make_execute_tool(timeout_seconds=10, max_output_bytes=8192)],
sandbox=SandboxConfig(allowed_tools=['execute_python']),
system_prompt=YOUR_SYSTEM,
task=YOUR_TASK,
)
assert result.verify_integrity().valid # tamper-evident chain
That's the full federal-tier posture. Every module composable, every default sane, every event audited.
Takeaway¶
| Defense | Mechanism | OWASP map |
|---|---|---|
| PII redaction | security={'pii_enabled': True} on load_model() |
LLM02 (Sensitive Disclosure) |
| Tool allowlist | SandboxConfig(allowed_tools=[...]) |
LLM06 (Excessive Agency) |
| Sandboxed code exec | make_execute_tool (subprocess) / make_contained_execute_tool (Docker) |
LLM05 + ASI05 (RCE) |
| PII-safe audit | audit={...} (metadata-only by default) |
LLM07 (log hygiene) |
| Cost / token budgets | telemetry={'budget_scope': ...} + task_complete terminator |
LLM10 (Unbounded Consumption) |
| Tamper-evident chain | verify_chain() (notebook 06) |
NIST AU-9, AU-10 |
Each defense is one keyword argument away. No code branches, no separate codepath, no "federal mode fork." Same agent code; the modules and the sandbox flip with config.
Next: 08 — The Coding Workflow. Now that you've seen the runtime defenses, see the development workflow that ensures every feature ships with them turned on.