Notebook 09 — Your Workflow¶
Premise: The point of the previous four notebooks wasn't Arc. It was the thinking. Build a harness around a model:
- Goal — what is the agent for? One sentence.
- Knowledge — what does it need to know? (system prompt + skills)
- Capabilities — what does it need to do? (tools)
- Boundaries — what must it not do? (sandbox, policy)
- Evidence — how do we know it worked? (audit, verification)
If you can answer those five questions, you can build an agent — in Arc, in LangChain, in raw Python, anywhere.
This notebook walks one example end-to-end, then leaves space for you to build your own.
Worked example: the Run Triage agent¶
The task: scientist hands the agent a run id. Agent reads the log, classifies any anomalies against the known failure modes, and writes a one-paragraph triage note recommending an action.
Walk through the five questions:
| Question | Answer |
|---|---|
| Goal | Triage an experiment run from its log |
| Knowledge | log_analyst skill (built in notebook 03) |
| Capabilities | read_log(run_id), skill_lookup(reference), write_triage_note(text) |
| Boundaries | Read-only on the log store; write only to the triage notes folder |
| Evidence | Capture all events, verify chain, save signed triage note |
Setup¶
from pathlib import Path
from textwrap import dedent
from datetime import datetime
from dotenv import load_dotenv
load_dotenv()
from arcllm import load_model
from arcrun import run, Tool, ToolContext, SandboxConfig
from rich import print
model = load_model('anthropic')
SKILL = Path('../skills/log_analyst').resolve()
TRIAGE_DIR = Path('../data/triage').resolve()
TRIAGE_DIR.mkdir(parents=True, exist_ok=True)
1. Knowledge — load the skill we already wrote¶
skill_md = (SKILL / 'SKILL.md').read_text()
print(skill_md)
2. Capabilities — three tools¶
Each tool is one function. The bounds of what the agent can affect are exactly the union of these three tools — nothing more.
# Stand-in log store. In production this would be a query against your real one.
FAKE_LOGS = {
'42': dedent('''
2026-04-12 09:01:00 INFO node-7 Run 42 started
2026-04-12 09:01:55 INFO node-7 Loading dataset shard 3/8
2026-04-12 09:02:00 ERROR node-7 DRAM ECC uncorrectable at 0x7f3c...
2026-04-12 09:02:01 WARN node-7 Re-routing job to node-8
2026-04-12 09:05:00 INFO node-8 Run completed
''').strip(),
'43': dedent('''
2026-04-12 11:00:00 INFO node-3 Run 43 started
2026-04-12 11:14:22 ERROR node-3 CUDA error: out of memory
2026-04-12 11:14:22 INFO node-3 Run aborted
''').strip(),
}
async def read_log(args, ctx: ToolContext) -> str:
rid = args['run_id']
return FAKE_LOGS.get(rid, f'No log for run {rid}')
async def skill_lookup(args, ctx: ToolContext) -> str:
ref = args['reference']
p = SKILL / ref
return p.read_text() if p.exists() else f'No reference: {ref}'
async def write_triage_note(args, ctx: ToolContext) -> str:
rid = args['run_id']
out = TRIAGE_DIR / f'run_{rid}_{datetime.now():%Y%m%d_%H%M%S}.md'
out.write_text(args['note'])
return f'Wrote {out}'
tools = [
Tool(name='read_log', description='Fetch full log for a run id.',
input_schema={'type': 'object', 'properties': {'run_id': {'type': 'string'}}, 'required': ['run_id']},
execute=read_log),
Tool(name='skill_lookup', description='Fetch a reference file. Available: references/known_failure_modes.md',
input_schema={'type': 'object', 'properties': {'reference': {'type': 'string'}}, 'required': ['reference']},
execute=skill_lookup),
Tool(name='write_triage_note', description='Save the final triage note for a run.',
input_schema={'type': 'object', 'properties': {'run_id': {'type': 'string'}, 'note': {'type': 'string'}}, 'required': ['run_id', 'note']},
execute=write_triage_note),
]
3. Boundaries — the sandbox¶
Right now the sandbox is permissive (all 3 tools allowed). In
production you'd allowlist by role — a triage agent shouldn't have
delete_log or modify_run. The SandboxConfig.allowed_tools field
is your enforcement point.
sandbox = SandboxConfig(
allowed_tools=['read_log', 'skill_lookup', 'write_triage_note'],
)
4. Run with full evidence capture¶
events = []
system_prompt = dedent(f'''
You are an experiment Run Triage agent at a national lab.
For any run id, fetch the log, identify anomalies using the skill below,
and write a one-paragraph triage note recommending the next action.
Always end by calling write_triage_note with your final paragraph.
{skill_md}
''').strip()
result = await run(
model=model,
tools=tools,
sandbox=sandbox,
system_prompt=system_prompt,
task='Triage run 42.',
on_event=events.append,
)
print(result.content)
print()
print(f'turns={result.turns} tool_calls={result.tool_calls_made} cost=${result.cost_usd:.4f}')
print(f'chain valid: {result.verify_integrity().valid}')
5. Show the evidence¶
from collections import Counter
type_counts = Counter(e.type for e in events)
for t, n in type_counts.most_common():
print(f' {n:3d} {t}')
print()
print('triage notes saved:')
for f in sorted(TRIAGE_DIR.iterdir()):
print(f' {f.name} ({f.stat().st_size} bytes)')
Now build your own¶
Pick a real workflow from your lab. Walk the five questions:
Goal: (one sentence — what does the agent accomplish?)
Knowledge: (what skill / domain expertise does it need? write a SKILL.md)
Capabilities: (list the 2-5 tools it needs. each is one Python function)
Boundaries: (what is it forbidden to touch? encode in SandboxConfig)
Evidence: (what does the audit trail need to prove? capture & verify the chain)
Use the cell below as your scaffold.
# === YOUR AGENT BELOW ===
MY_SKILL = dedent('''
---
name: my_skill
description: TODO
---
# TODO
''').strip()
async def my_tool(args, ctx: ToolContext) -> str:
return 'TODO'
my_tools = [Tool(
name='my_tool', description='TODO',
input_schema={'type': 'object', 'properties': {}},
execute=my_tool,
)]
events = []
result = await run(
model=model,
tools=my_tools,
system_prompt=f'You are TODO.\n\n{MY_SKILL}',
task='TODO',
on_event=events.append,
)
print(result.content)
print('chain valid:', result.verify_integrity().valid)
Closing¶
You started with a chat call. You ended with an audited, sandboxed, skill-aware agent that runs a real workflow. The total amount of new code you wrote was small.
What scaled was the mental model:
Goal · Knowledge · Capabilities · Boundaries · Evidence.
That is portable. To LangChain. To LangGraph. To raw Python. To whatever framework your lab adopts in 2027. The frameworks are implementations of the model — the model is what you take with you.