Notebook 09 — Your Workflow¶

Premise: The point of the previous four notebooks wasn't Arc. It was the thinking. Build a harness around a model:

  1. Goal — what is the agent for? One sentence.
  2. Knowledge — what does it need to know? (system prompt + skills)
  3. Capabilities — what does it need to do? (tools)
  4. Boundaries — what must it not do? (sandbox, policy)
  5. Evidence — how do we know it worked? (audit, verification)

If you can answer those five questions, you can build an agent — in Arc, in LangChain, in raw Python, anywhere.

This notebook walks one example end-to-end, then leaves space for you to build your own.

Worked example: the Run Triage agent¶

The task: scientist hands the agent a run id. Agent reads the log, classifies any anomalies against the known failure modes, and writes a one-paragraph triage note recommending an action.

Walk through the five questions:

Question Answer
Goal Triage an experiment run from its log
Knowledge log_analyst skill (built in notebook 03)
Capabilities read_log(run_id), skill_lookup(reference), write_triage_note(text)
Boundaries Read-only on the log store; write only to the triage notes folder
Evidence Capture all events, verify chain, save signed triage note

Setup¶

In [ ]:
from pathlib import Path
from textwrap import dedent
from datetime import datetime
from dotenv import load_dotenv
load_dotenv()

from arcllm import load_model
from arcrun import run, Tool, ToolContext, SandboxConfig
from rich import print

model = load_model('anthropic')
SKILL = Path('../skills/log_analyst').resolve()
TRIAGE_DIR = Path('../data/triage').resolve()
TRIAGE_DIR.mkdir(parents=True, exist_ok=True)

1. Knowledge — load the skill we already wrote¶

In [ ]:
skill_md = (SKILL / 'SKILL.md').read_text()
print(skill_md)

2. Capabilities — three tools¶

Each tool is one function. The bounds of what the agent can affect are exactly the union of these three tools — nothing more.

In [ ]:
# Stand-in log store. In production this would be a query against your real one.
FAKE_LOGS = {
    '42': dedent('''
        2026-04-12 09:01:00 INFO  node-7  Run 42 started
        2026-04-12 09:01:55 INFO  node-7  Loading dataset shard 3/8
        2026-04-12 09:02:00 ERROR node-7  DRAM ECC uncorrectable at 0x7f3c...
        2026-04-12 09:02:01 WARN  node-7  Re-routing job to node-8
        2026-04-12 09:05:00 INFO  node-8  Run completed
    ''').strip(),
    '43': dedent('''
        2026-04-12 11:00:00 INFO  node-3  Run 43 started
        2026-04-12 11:14:22 ERROR node-3  CUDA error: out of memory
        2026-04-12 11:14:22 INFO  node-3  Run aborted
    ''').strip(),
}

async def read_log(args, ctx: ToolContext) -> str:
    rid = args['run_id']
    return FAKE_LOGS.get(rid, f'No log for run {rid}')

async def skill_lookup(args, ctx: ToolContext) -> str:
    ref = args['reference']
    p = SKILL / ref
    return p.read_text() if p.exists() else f'No reference: {ref}'

async def write_triage_note(args, ctx: ToolContext) -> str:
    rid = args['run_id']
    out = TRIAGE_DIR / f'run_{rid}_{datetime.now():%Y%m%d_%H%M%S}.md'
    out.write_text(args['note'])
    return f'Wrote {out}'

tools = [
    Tool(name='read_log', description='Fetch full log for a run id.',
         input_schema={'type': 'object', 'properties': {'run_id': {'type': 'string'}}, 'required': ['run_id']},
         execute=read_log),
    Tool(name='skill_lookup', description='Fetch a reference file. Available: references/known_failure_modes.md',
         input_schema={'type': 'object', 'properties': {'reference': {'type': 'string'}}, 'required': ['reference']},
         execute=skill_lookup),
    Tool(name='write_triage_note', description='Save the final triage note for a run.',
         input_schema={'type': 'object', 'properties': {'run_id': {'type': 'string'}, 'note': {'type': 'string'}}, 'required': ['run_id', 'note']},
         execute=write_triage_note),
]

3. Boundaries — the sandbox¶

Right now the sandbox is permissive (all 3 tools allowed). In production you'd allowlist by role — a triage agent shouldn't have delete_log or modify_run. The SandboxConfig.allowed_tools field is your enforcement point.

In [ ]:
sandbox = SandboxConfig(
    allowed_tools=['read_log', 'skill_lookup', 'write_triage_note'],
)

4. Run with full evidence capture¶

In [ ]:
events = []

system_prompt = dedent(f'''
    You are an experiment Run Triage agent at a national lab.
    For any run id, fetch the log, identify anomalies using the skill below,
    and write a one-paragraph triage note recommending the next action.
    Always end by calling write_triage_note with your final paragraph.

    {skill_md}
''').strip()

result = await run(
    model=model,
    tools=tools,
    sandbox=sandbox,
    system_prompt=system_prompt,
    task='Triage run 42.',
    on_event=events.append,
)

print(result.content)
print()
print(f'turns={result.turns} tool_calls={result.tool_calls_made} cost=${result.cost_usd:.4f}')
print(f'chain valid: {result.verify_integrity().valid}')

5. Show the evidence¶

In [ ]:
from collections import Counter

type_counts = Counter(e.type for e in events)
for t, n in type_counts.most_common():
    print(f'  {n:3d}  {t}')

print()
print('triage notes saved:')
for f in sorted(TRIAGE_DIR.iterdir()):
    print(f'  {f.name}  ({f.stat().st_size} bytes)')

Now build your own¶

Pick a real workflow from your lab. Walk the five questions:

Goal: (one sentence — what does the agent accomplish?)

Knowledge: (what skill / domain expertise does it need? write a SKILL.md)

Capabilities: (list the 2-5 tools it needs. each is one Python function)

Boundaries: (what is it forbidden to touch? encode in SandboxConfig)

Evidence: (what does the audit trail need to prove? capture & verify the chain)

Use the cell below as your scaffold.

In [ ]:
# === YOUR AGENT BELOW ===

MY_SKILL = dedent('''
    ---
    name: my_skill
    description: TODO
    ---
    # TODO
''').strip()

async def my_tool(args, ctx: ToolContext) -> str:
    return 'TODO'

my_tools = [Tool(
    name='my_tool', description='TODO',
    input_schema={'type': 'object', 'properties': {}},
    execute=my_tool,
)]

events = []
result = await run(
    model=model,
    tools=my_tools,
    system_prompt=f'You are TODO.\n\n{MY_SKILL}',
    task='TODO',
    on_event=events.append,
)
print(result.content)
print('chain valid:', result.verify_integrity().valid)

Closing¶

You started with a chat call. You ended with an audited, sandboxed, skill-aware agent that runs a real workflow. The total amount of new code you wrote was small.

What scaled was the mental model:

Goal · Knowledge · Capabilities · Boundaries · Evidence.

That is portable. To LangChain. To LangGraph. To raw Python. To whatever framework your lab adopts in 2027. The frameworks are implementations of the model — the model is what you take with you.