Notebook 07: Security at Runtime

Premise: Notebook 06 showed the audit chain — forensic verification of what happened. This notebook shows the active defenses: things that fire at runtime to prevent harm before it lands in the audit log.

These are the controls federal auditors and lab CISOs ask about first:

  1. PII redaction at the LLM boundary: sensitive data never leaves the agent in plaintext.
  2. Sandboxed code execution: the agent can write and run Python in an isolated subprocess, or at higher tiers in a container with no network and no host filesystem, instead of inside the host process.
  3. Tool allowlists: the agent can ask for any tool, but only tools on its sandbox allowlist are ever dispatched.
  4. PII-safe audit logging: by default, logs record metadata and classification, never raw content.
  5. Token / cost budgets: runaway loops trip a circuit breaker, not your finance department.

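By the end you'll have run most of these controls and seen their effect on the wire: what the LLM saw vs. what the user wrote, what was denied vs. allowed, and how each event landed in the audit chain. The container sandbox and the budget terminator are shown as configuration rather than run.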

Setup

In [ ]:
from dotenv import load_dotenv
load_dotenv()

from arcllm import load_model, Message
from arcllm._pii import RegexPiiDetector, redact_text
from arcrun import run, SandboxConfig, make_execute_tool
from rich import print
from rich.panel import Panel
from rich.console import Console
from rich.table import Table

console = Console()

1. PII detection: what's in the message?

ArcLLM ships a regex-based PII detector with built-in patterns for SSN, credit card, email, IP address, and phone number. It's pluggable: you can swap in a Presidio or Microsoft DLP backend for higher-stakes deployments, but the regex backend is fine for demos and most lab use cases.

In [ ]:
RAW = (
    'Hi, please email jane.doe@nationallab.gov about subject 47. '
    'Her SSN is 123-45-6789 and the corp card on file is 4242 4242 4242 4242. '
    'My cell is 555-867-5309.'
)

detector = RegexPiiDetector()
matches = detector.detect(RAW)

tbl = Table(title='PII findings')
tbl.add_column('type', style='red')
tbl.add_column('span')
tbl.add_column('matched text')
for m in matches:
    tbl.add_row(m.pii_type, f'{m.start}-{m.end}', repr(m.matched_text))
console.print(tbl)

console.print(Panel(redact_text(RAW, matches), title='redacted', border_style='green'))
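
To make the mechanics concrete, here is a minimal standalone sketch of what a regex detector plus span-based redaction does. The patterns, the PiiMatch class, and the detect/redact helpers are hypothetical illustrations, not ArcLLM's actual pattern set or implementation; the only thing borrowed from this notebook is the [PII:TYPE] placeholder style.

```python
import re
from dataclasses import dataclass

# Illustrative patterns only (hypothetical); production detectors need more care,
# e.g. Luhn checks for card numbers and international phone formats.
PATTERNS = {
    'EMAIL': re.compile(r'\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b'),
    'SSN': re.compile(r'\b\d{3}-\d{2}-\d{4}\b'),
    'CREDIT_CARD': re.compile(r'\b(?:\d{4}[ -]?){3}\d{4}\b'),
    'PHONE': re.compile(r'\b\d{3}-\d{3}-\d{4}\b'),
}


@dataclass
class PiiMatch:
    pii_type: str
    start: int
    end: int
    matched_text: str


def detect(text: str) -> list[PiiMatch]:
    """Run every pattern, then keep the earliest non-overlapping spans."""
    found = [
        PiiMatch(pii_type, m.start(), m.end(), m.group())
        for pii_type, pattern in PATTERNS.items()
        for m in pattern.finditer(text)
    ]
    # Sort by start offset; on a tie, prefer the longer span.
    found.sort(key=lambda m: (m.start, m.start - m.end))
    kept: list[PiiMatch] = []
    for m in found:
        if not kept or m.start >= kept[-1].end:
            kept.append(m)
    return kept


def redact(text: str, matches: list[PiiMatch]) -> str:
    """Replace each span with a [PII:TYPE] placeholder, walking left to right."""
    parts, cursor = [], 0
    for m in matches:  # detect() returns sorted, non-overlapping spans
        parts.append(text[cursor:m.start])
        parts.append(f'[PII:{m.pii_type}]')
        cursor = m.end
    parts.append(text[cursor:])
    return ''.join(parts)


sample = 'Reach me at 555-867-5309; SSN 123-45-6789.'
print(redact(sample, detect(sample)))  # Reach me at [PII:PHONE]; SSN [PII:SSN].
```

Resolving overlaps before substituting matters: if two patterns claimed the same characters, replacing both spans would corrupt the text, so the sketch keeps the earliest (and, on a tie, the longest) match.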

2. Plug it into the LLM call: SecurityModule

Setting security={'pii_enabled': True} on load_model() wraps the adapter so that every outbound message is redacted before the network call, and every inbound response is scanned on the way back. The model literally cannot see the raw SSN.

Watch the response: the model should describe placeholders such as [PII:SSN] rather than the actual digits, because that is all it received.

In [ ]:
model_redacted = load_model('anthropic', security={'pii_enabled': True})

resp = await model_redacted.invoke([
    Message(role='user', content=f'Summarize what you can see about this customer: {RAW}')
])
console.print(Panel(resp.content or '', title='LLM response (saw redacted text)', border_style='cyan'))

This is the lethal-trifecta break. Private data + untrusted input + external communication is the combination behind OWASP LLM02 (Sensitive Information Disclosure). Redacting before egress removes the private data from everything the model sees, which severs it.

When signing is enabled (security={'signing_enabled': True}, part of the federal-tier stack in section 8), the SecurityModule also signs the request payload with Ed25519 so the receiving end can verify what was sent.
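
The wrapping idea can be sketched without a network call. Everything here (Msg, EchoAdapter, RedactingModel) is hypothetical and only mirrors the shape described above: redact outbound messages before the adapter sees them, then scan the response on the way back. It uses asyncio.run so it works as a standalone script; inside this notebook you would await model.invoke(...) directly instead.

```python
import asyncio
import re
from dataclasses import dataclass, replace

SSN = re.compile(r'\b\d{3}-\d{2}-\d{4}\b')  # a single pattern keeps the sketch short


@dataclass(frozen=True)
class Msg:
    """Hypothetical stand-in for arcllm's Message."""
    role: str
    content: str


class EchoAdapter:
    """Fake provider: replies with exactly the text it received, exposing what crossed the wire."""

    async def invoke(self, messages: list[Msg]) -> Msg:
        return Msg('assistant', ' | '.join(m.content for m in messages))


class RedactingModel:
    """Hypothetical wrapper showing the redact-on-egress / scan-on-ingress pattern."""

    def __init__(self, inner) -> None:
        self.inner = inner
        self.inbound_hits = 0  # PII-shaped strings found in responses

    async def invoke(self, messages: list[Msg]) -> Msg:
        # Egress: the inner adapter only ever receives redacted messages.
        safe = [replace(m, content=SSN.sub('[PII:SSN]', m.content)) for m in messages]
        resp = await self.inner.invoke(safe)
        # Ingress: scan the reply as well, counting and redacting any hits.
        self.inbound_hits += len(SSN.findall(resp.content))
        return replace(resp, content=SSN.sub('[PII:SSN]', resp.content))


model = RedactingModel(EchoAdapter())
reply = asyncio.run(model.invoke([Msg('user', 'Her SSN is 123-45-6789')]))
print(reply.content)  # Her SSN is [PII:SSN]
```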

3. Sandbox: only allowed tools fire

Now the loop side. SandboxConfig(allowed_tools=[...]) is a hard permission boundary checked before every tool dispatch. The model can ask for any tool; the sandbox will deny anything not on the list and emit a tool.denied event you can audit.

Build two scenarios with the same tool — once in the allowlist, once not. Watch the difference.

In [ ]:
model = load_model('anthropic')
exec_tool = make_execute_tool(timeout_seconds=10, max_output_bytes=4096)

events_open: list = []
result_open = await run(
    model=model, tools=[exec_tool],
    sandbox=SandboxConfig(allowed_tools=['execute_python']),
    system_prompt='Use execute_python when math is needed.',
    task='Compute the first 10 Fibonacci numbers and print them.',
    on_event=events_open.append,
)
console.print(Panel(result_open.content or '', title='OPEN sandbox — tool allowed', border_style='green'))
n_denied = sum(1 for e in events_open if e.type == 'tool.denied')
print(f'  turns={result_open.turns}  tool_calls={result_open.tool_calls_made}  tool.denied events: {n_denied}')
In [ ]:
events_closed: list = []
result_closed = await run(
    model=model, tools=[exec_tool],
    sandbox=SandboxConfig(allowed_tools=[]),  # empty allowlist => nothing fires
    system_prompt='Use execute_python when math is needed.',
    task='Compute the first 10 Fibonacci numbers and print them.',
    on_event=events_closed.append,
    max_turns=3,
)
denied = [dict(e.data) for e in events_closed if e.type == 'tool.denied']
console.print(Panel(str(denied), title='CLOSED sandbox — tool.denied events', border_style='red'))
print(f'  turns={result_closed.turns}  tool_calls={result_closed.tool_calls_made}')

Same tool. Same task. Same model. The sandbox stops it cold. And every denial is in the audit chain (verifiable end-to-end with result.verify_integrity() from notebook 06) — so a SOC analyst can prove that an attempted action was blocked.
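
The allowlist check itself is simple; what matters is where it sits. Here is a minimal sketch, assuming a dispatcher shaped roughly like the one described above (the Allowlist, Event, and dispatch names are hypothetical, not arcrun's API): the check runs before the tool is looked up, and a denial is both emitted as a tool.denied event and returned to the model as an ordinary tool result.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class Event:
    type: str
    data: dict


@dataclass(frozen=True)
class Allowlist:
    """Hypothetical stand-in for SandboxConfig(allowed_tools=[...])."""
    allowed_tools: frozenset[str]


def dispatch(name: str, args: dict, tools: dict[str, Callable[..., Any]],
             sandbox: Allowlist, on_event: Callable[[Event], None]) -> Any:
    # The permission check runs before the tool is even looked up.
    if name not in sandbox.allowed_tools:
        on_event(Event('tool.denied', {'name': name, 'reason': 'not in allowlist'}))
        # Returned as a tool result, so the model can adapt instead of crashing the loop.
        return f'Error: tool {name!r} is not permitted in this sandbox'
    on_event(Event('tool.start', {'name': name, 'arguments': args}))
    return tools[name](**args)


tools = {'add': lambda a, b: a + b}
events: list[Event] = []
print(dispatch('add', {'a': 2, 'b': 3}, tools, Allowlist(frozenset({'add'})), events.append))  # 5
print(dispatch('add', {'a': 2, 'b': 3}, tools, Allowlist(frozenset()), events.append))
print([e.type for e in events])  # ['tool.start', 'tool.denied']
```

Because the permission check precedes the lookup, even a request for a tool name that doesn't exist produces a clean tool.denied event instead of a KeyError.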

4. Sandboxed code execution: make_execute_tool

Look at the protections layered into the local subprocess sandbox:

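  • Process-group separation: start_new_session=True runs the script in its own session, so terminal-generated signals such as Ctrl-C don't propagate to it, and the script can be killed together with any children it spawns. It is not a privilege boundary: the script still runs as the same OS user as the host.
  • Locked PATH: only /usr/bin:/bin, so tools installed elsewhere (a virtualenv's pip, anything in /usr/local/bin) can't be invoked by bare name.
  • Hard timeout: SIGTERM at the deadline, SIGKILL after a 5s grace period, so runaway scripts can't hang the loop.
  • Output cap: stdout+stderr truncated at max_output_bytes, so a script that floods output (e.g. print('x') in an infinite loop) can't bloat the model's context or the audit log.
  • TempDir cwd: each invocation gets a fresh tempfile.TemporaryDirectory as its working directory, so files left in the cwd don't carry over between invocations.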

Demo it — give the agent a small computational task and watch what code it writes.

In [ ]:
events_exec: list = []
exec_result = await run(
    model=model, tools=[make_execute_tool(timeout_seconds=15, max_output_bytes=8192)],
    sandbox=SandboxConfig(allowed_tools=['execute_python']),
    system_prompt='Use execute_python to compute things rather than reasoning step-by-step.',
    task='Find all prime numbers under 100 using a sieve. Print them comma-separated.',
    on_event=events_exec.append,
)

for e in events_exec:
    if e.type == 'tool.start' and e.data.get('name') == 'execute_python':
        code_run = e.data.get('arguments', {}).get('code', '')
        console.print(Panel(code_run, title='code the agent wrote', border_style='yellow'))

console.print(Panel(exec_result.content or '', title='final answer', border_style='cyan'))
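
The protections listed above map onto standard-library calls. This hypothetical run_sandboxed is a POSIX-only sketch, not arcrun's make_execute_tool; it shows one reasonable way to combine a fresh TemporaryDirectory, a minimal environment, start_new_session=True, a SIGTERM-then-SIGKILL timeout applied to the whole process group, and an output cap.

```python
import os
import signal
import subprocess
import sys
import tempfile


def _signal_group(pgid: int, sig: int) -> None:
    try:
        os.killpg(pgid, sig)
    except ProcessLookupError:
        pass  # the group already exited


def run_sandboxed(code: str, timeout_seconds: float = 10, max_output_bytes: int = 4096,
                  grace_seconds: float = 5) -> dict:
    """Hypothetical sketch of the subprocess protections listed above (POSIX only)."""
    with tempfile.TemporaryDirectory() as workdir:      # fresh cwd per invocation
        proc = subprocess.Popen(
            [sys.executable, '-c', code],
            cwd=workdir,
            env={'PATH': '/usr/bin:/bin'},              # locked PATH; no other env vars inherited
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,                   # merge streams so one cap covers both
            start_new_session=True,                     # own session, so pgid == proc.pid
        )
        timed_out = False
        try:
            out, _ = proc.communicate(timeout=timeout_seconds)
        except subprocess.TimeoutExpired:
            timed_out = True
            _signal_group(proc.pid, signal.SIGTERM)     # ask the whole group to stop
            try:
                out, _ = proc.communicate(timeout=grace_seconds)
            except subprocess.TimeoutExpired:
                _signal_group(proc.pid, signal.SIGKILL) # grace period over: force it
                out, _ = proc.communicate()
    # communicate() buffers everything; a hardened version would stop reading at the cap.
    return {
        'exit_code': proc.returncode,
        'timed_out': timed_out,
        'truncated': len(out) > max_output_bytes,
        'output': out[:max_output_bytes].decode('utf-8', errors='replace'),
    }


print(run_sandboxed('print(2 ** 10)')['output'])  # 1024
```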

5. Container isolation for higher tiers: make_contained_execute_tool

For air-gapped, federal, or anything where subprocess isn't enough, arcrun also ships make_contained_execute_tool — Docker/Podman container per execution. Same interface as the local tool, but the code now runs inside a container with explicit limits.

We won't run it here (requires Docker on the demo machine), but the constructor surface tells you what's enforced:

from arcrun.builtins.contained_execute import make_contained_execute_tool
tool = make_contained_execute_tool(
    image='python:3.12-slim',
    timeout_seconds=30,
    max_output_bytes=65536,
    mem_limit='256m',         # cgroup memory ceiling
    cpu_period=100_000,       # CFS period (microseconds)
    cpu_quota=50_000,         # 50% of one core
    pids_limit=64,            # process count cap
    tmpfs_size='64m',         # writable tmpfs only — no host FS
    network_disabled=True,    # no egress, no exfiltration
)

OOM kill becomes SandboxOOMError, exceeded timeout becomes SandboxTimeoutError, container failures become SandboxRuntimeError, missing runtime is SandboxUnavailableError. Each is auditable; each is recoverable.
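
The constructor arguments above correspond closely to standard container-runtime flags. As a sketch, assuming a CLI-based runtime (the container_argv helper is hypothetical and not necessarily how arcrun launches containers), here is a docker run / podman run command line that would enforce the same limits. Note the CPU arithmetic: cpu_quota / cpu_period = 50,000 / 100,000 µs, so the container gets at most half of one core's time per scheduling period.

```python
import shlex


def container_argv(code: str, image: str = 'python:3.12-slim', mem_limit: str = '256m',
                   cpu_period: int = 100_000, cpu_quota: int = 50_000, pids_limit: int = 64,
                   tmpfs_size: str = '64m', network_disabled: bool = True,
                   runtime: str = 'docker') -> list[str]:
    """Hypothetical: build a `docker run` / `podman run` command enforcing the limits above."""
    argv = [
        runtime, 'run', '--rm',
        '--memory', mem_limit,                    # cgroup memory ceiling
        '--cpu-period', str(cpu_period),          # CFS period in microseconds
        '--cpu-quota', str(cpu_quota),            # CPU time allowed per period
        '--pids-limit', str(pids_limit),          # caps process count (stops fork bombs)
        '--read-only',                            # root filesystem is read-only...
        '--tmpfs', f'/tmp:rw,size={tmpfs_size}',  # ...so this tmpfs is the only writable path
        '--workdir', '/tmp',
    ]
    if network_disabled:
        argv += ['--network', 'none']             # loopback only, no egress
    return argv + [image, 'python', '-c', code]


print(shlex.join(container_argv('print(sum(range(10)))')))
print(50_000 / 100_000)  # 0.5 -> half of one core
```

Pairing --read-only with a size-capped --tmpfs is one way to get "writable tmpfs only": the script can write scratch files under /tmp but nothing on the image's root filesystem, and no host directory is mounted.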

Federal posture progression: subprocess sandbox for personal/dev → container sandbox for enterprise/SCIF → Firecracker microVM (planned) for the strictest tiers. Same agent code; the sandbox swap is a config flag.

6. PII-safe audit by default: AuditModule

ArcLLM's AuditModule logs metadata only by default — provider, model, message count, stop reason, content length, tool counts. It does not log message content unless you explicitly opt in via include_messages=True (DEBUG level) — and even then, ideally only after redaction (i.e. after the SecurityModule has run).

Module stack order matters: Audit → Security → Adapter. Audit sees what Security has already redacted, and with the default metadata-only logging there is no path where raw PII reaches the audit sink.

In [ ]:
import logging
logging.basicConfig(level=logging.INFO, format='%(name)s %(message)s', force=True)

model_audited = load_model(
    'anthropic',
    security={'pii_enabled': True},
    audit={'log_level': 'INFO'},
)
_ = await model_audited.invoke([Message(role='user', content='What is 2 + 2?')])
print('(audit metadata logged above by arcllm.modules.audit — no message content)')
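
What "metadata only" means in practice can be sketched as the record an audit layer might emit. The audit function and its field names are hypothetical; they mirror the fields listed above (provider, model, message count, stop reason, content length, tool counts), and message content is logged only on explicit opt-in, at DEBUG.

```python
import json
import logging
from dataclasses import dataclass


@dataclass
class Msg:
    role: str
    content: str


log = logging.getLogger('audit_sketch')


def audit(provider: str, model: str, messages: list[Msg], response_text: str,
          stop_reason: str, tool_calls: int, include_messages: bool = False) -> dict:
    """Hypothetical metadata-only audit record; field names are illustrative."""
    record = {
        'provider': provider,
        'model': model,
        'message_count': len(messages),
        'stop_reason': stop_reason,
        'content_length': len(response_text),  # a length, never the text itself
        'tool_calls': tool_calls,
    }
    log.info(json.dumps(record))
    if include_messages:
        # Opt-in only, and at DEBUG; assumes the content was already redacted upstream.
        log.debug(json.dumps([m.content for m in messages]))
    return record


logging.basicConfig(level=logging.INFO, format='%(name)s %(message)s')
rec = audit('anthropic', 'example-model', [Msg('user', 'My SSN is 123-45-6789')], '4', 'end_turn', 0)
```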

7. Budgets: task_complete and make_budget_breach_args

Token and cost budgets are part of the loop, not the application code. Set them once and the loop terminates cleanly when crossed, emitting a task_complete with status='budget_breach' so the auditor can see why the run stopped.

ArcRun ships a make_task_complete_tool factory — pair it with make_budget_breach_args to convert an over-budget event into the structured terminator. The point: the agent has a graceful stop signal that's auditable, not a kill -9 from the runtime.

from arcrun.builtins import make_task_complete_tool, make_budget_breach_args
complete_tool = make_task_complete_tool()
# loop sees usage cross threshold → emits budget breach via task_complete

Combined with arcllm's telemetry={'budget_scope': 'agent:plasma-007'} you get per-agent budgets enforced at the LLM boundary AND graceful termination at the loop boundary. Belt and suspenders.
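
The budget-as-loop-control idea reduces to a check after every turn. The Budget class and run_with_budget loop below are a hypothetical sketch, not arcrun's implementation (where the breach is reported through task_complete and make_budget_breach_args); the returned dict stands in for that structured terminator.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Budget:
    max_tokens: int
    used: int = 0


def run_with_budget(step: Callable[[], tuple[int, bool]], budget: Budget,
                    max_turns: int = 50) -> dict:
    """Hypothetical loop: step() performs one turn and returns (tokens_used, done)."""
    for turn in range(1, max_turns + 1):
        tokens, done = step()
        budget.used += tokens
        if budget.used > budget.max_tokens:
            # Structured, auditable stop instead of killing the process.
            return {'status': 'budget_breach', 'turns': turn,
                    'used': budget.used, 'limit': budget.max_tokens}
        if done:
            return {'status': 'completed', 'turns': turn, 'used': budget.used}
    return {'status': 'max_turns', 'turns': max_turns, 'used': budget.used}


# A runaway agent: never finishes, spends 1,200 tokens per turn.
print(run_with_budget(lambda: (1200, False), Budget(max_tokens=5000)))
# {'status': 'budget_breach', 'turns': 5, 'used': 6000, 'limit': 5000}
```

Because usage is only known after a turn completes, a post-turn check can overshoot the limit by up to one turn's spend (6,000 against 5,000 here); a pre-flight estimate before each call would tighten that bound.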

8. The full security stack on one call

Here is what composing them looks like. Each module is opt-in via a load_model() keyword argument; the loop applies the sandbox and budget, and the audit chain (notebook 06) records every event.

model = load_model(
    'anthropic',
    security={'pii_enabled': True, 'signing_enabled': True},  # PII redact + Ed25519 sign
    audit={'log_level': 'INFO'},                              # PII-safe metadata logs
    telemetry={'budget_scope': 'agent:plasma-007'},           # cost ceilings
    rate_limit=True,                                          # provider TPS caps
    retry=True,                                               # exponential backoff
    fallback=True,                                            # provider failover
    otel=True,                                                # OpenTelemetry export
)

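YOUR_SYSTEM = 'Use execute_python when math is needed.'  # placeholder: your system prompt
YOUR_TASK = 'Compute the first 10 Fibonacci numbers.'      # placeholder: your task

result = await run(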
    model=model,
    tools=[make_execute_tool(timeout_seconds=10, max_output_bytes=8192)],
    sandbox=SandboxConfig(allowed_tools=['execute_python']),
    system_prompt=YOUR_SYSTEM,
    task=YOUR_TASK,
)

assert result.verify_integrity().valid  # tamper-evident chain

That's the full federal-tier posture. Every module composable, every default sane, every event audited.

Takeaway

| Defense | Mechanism | OWASP / NIST mapping |
|---|---|---|
| PII redaction | security={'pii_enabled': True} on load_model() | LLM02 (Sensitive Information Disclosure) |
| Tool allowlist | SandboxConfig(allowed_tools=[...]) | LLM06 (Excessive Agency) |
| Sandboxed code exec | make_execute_tool (subprocess) / make_contained_execute_tool (Docker) | LLM05 (Improper Output Handling) + ASI05 (RCE) |
| PII-safe audit | audit={...} (metadata-only by default) | LLM02 (Sensitive Information Disclosure, via logs) |
| Cost / token budgets | telemetry={'budget_scope': ...} + task_complete terminator | LLM10 (Unbounded Consumption) |
| Tamper-evident chain | verify_chain() (notebook 06) | NIST SP 800-53 AU-9, AU-10 |

Each defense is one keyword argument away. No code branches, no separate codepath, no "federal mode fork." Same agent code; the modules and the sandbox flip with config.

Next: 08 — The Coding Workflow. Now that you've seen the runtime defenses, see the development workflow that ensures every feature ships with them turned on.