Notebook 04: Skills and Prompts
Premise: A tool teaches the model what it can do. A skill
teaches it how to think about a domain. Skills are just markdown
— a SKILL.md plus optional reference files. The agent loads the
summary into context, and pulls references on demand.
This is the same pattern that Claude Code, Anthropic Skills, and many other agent frameworks use. It scales because only the index lives in context permanently; the deep knowledge is fetched when relevant. This is called progressive disclosure.
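As a rough illustration of what "only the index lives in context" can mean, the sketch below builds a one-line-per-skill index from the front matter of each SKILL.md. The helper names `read_front_matter` and `build_skill_index` are hypothetical, and the parser assumes the flat `key: value` front matter used in this notebook rather than full YAML.

```python
import tempfile
from pathlib import Path


def read_front_matter(skill_md: Path) -> dict:
    """Parse flat `key: value` front matter at the top of a SKILL.md (not full YAML)."""
    lines = skill_md.read_text().splitlines()
    meta = {}
    if not lines or lines[0].strip() != '---':
        return meta  # no front matter block at all
    for line in lines[1:]:
        if line.strip() == '---':
            break  # closing delimiter: stop before the markdown body
        key, sep, value = line.partition(':')
        if sep:
            meta[key.strip()] = value.strip()
    return meta


def build_skill_index(skills_root: Path) -> str:
    """Return one line per skill; this is the part that would stay in context."""
    entries = []
    for skill_md in sorted(skills_root.glob('*/SKILL.md')):
        meta = read_front_matter(skill_md)
        name = meta.get('name', skill_md.parent.name)  # fall back to the folder name
        description = meta.get('description', '(no description)')
        entries.append(f'- {name}: {description}')
    return '\n'.join(entries)


# Demo against a throwaway folder so nothing on disk is touched.
with tempfile.TemporaryDirectory() as tmp:
    skill_dir = Path(tmp) / 'log_analyst'
    skill_dir.mkdir()
    (skill_dir / 'SKILL.md').write_text(
        '---\n'
        'name: log_analyst\n'
        'description: Analyze experiment logs for anomalies and root causes.\n'
        '---\n'
        '# Log Analyst\n'
    )
    print(build_skill_index(Path(tmp)))
```

With many skills installed, an index like this grows by one short line per skill, while the bodies and references stay on disk until the agent asks for them.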
By the end you will have:
- Built a skill folder by hand (SKILL.md + references)
- Loaded it into a system prompt
- Given the agent a skill_lookup tool to fetch references on demand
- Watched the agent choose when to drill in
Setup
from pathlib import Path
from textwrap import dedent
from dotenv import load_dotenv
load_dotenv()
from arcllm import load_model
from arcrun import run, Tool, ToolContext
from rich import print
model = load_model('anthropic')
1. Build a skill folder
Standard layout:
skills/log_analyst/
├── SKILL.md                      # always loaded: small, descriptive
└── references/
    └── known_failure_modes.md    # loaded on demand
We'll write it from this notebook so you can see the structure.
SKILL = Path('../skills/log_analyst').resolve()
SKILL.mkdir(parents=True, exist_ok=True)
(SKILL / 'references').mkdir(exist_ok=True)
(SKILL / 'SKILL.md').write_text(dedent('''
---
name: log_analyst
description: Analyze experiment logs for anomalies and root causes.
---
# Log Analyst
When analyzing an experiment log:
1. Skim for ERROR, WARN, FAIL, exception, panic.
2. Note the timestamp and node/process for each anomaly.
3. Read the 5 lines before each error for context.
4. If the error matches a known failure mode, look it up via
`skill_lookup(reference="references/known_failure_modes.md")`.
5. Output: timestamp · node · anomaly · suspected cause.
''').strip())
(SKILL / 'references' / 'known_failure_modes.md').write_text(dedent('''
# Known failure modes
| Pattern | Meaning | Action |
|---|---|---|
| DRAM ECC uncorrectable | Memory hardware fault on the node | Mark node bad; rerun on different node |
| tcp connection reset | Upstream service flap | Auto-retry usually works |
| OOM killer | Process exceeded memory budget | Increase budget or split job |
| CUDA error: out of memory | GPU OOM | Reduce batch size |
| Permission denied: /scratch | Filesystem quota or ACL issue | Check quota, escalate to admins |
''').strip())
print('skill structure:')
for p in sorted(SKILL.rglob('*')):
    if p.is_file():
        print(f' {p.relative_to(SKILL.parent)} ({p.stat().st_size} bytes)')
2. Load the skill into a system prompt
The simplest integration: paste the SKILL.md into the system prompt. The agent now 'knows' how to analyze logs.
skill_md = (SKILL / 'SKILL.md').read_text()
SAMPLE_LOG = dedent('''
2026-04-12 09:01:00 INFO node-7 Run 42 started
2026-04-12 09:01:30 INFO node-7 Detector calibrated
2026-04-12 09:01:55 INFO node-7 Loading dataset shard 3/8
2026-04-12 09:02:00 ERROR node-7 DRAM ECC uncorrectable at 0x7f3c...
2026-04-12 09:02:01 WARN node-7 Re-routing job to node-8
2026-04-12 09:02:30 INFO node-8 Resumed from checkpoint
2026-04-12 09:05:00 INFO node-8 Run completed
''').strip()
# We're not giving it skill_lookup yet — see what it does with just the SKILL.md.
result = await run(
    model=model,
    tools=[Tool(
        name='noop', description='No-op (placeholder so the loop has a tool)',
        input_schema={'type': 'object', 'properties': {}},
        execute=lambda a, c: 'noop',
    )],
    system_prompt=f'You are a research assistant.\n\n{skill_md}',
    task=f'Analyze this log:\n\n{SAMPLE_LOG}',
)
print(result.content)
3. Progressive disclosure: references on demand
The SKILL.md is small. The references are bigger and only relevant sometimes. Don't pay for them in every prompt — fetch them with a tool when needed.
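To put rough numbers on "don't pay for them in every prompt", here is a back-of-the-envelope sketch. The 4-characters-per-token figure is a common rule of thumb and an assumption here, not a measured tokenizer count; `estimate_tokens` and `compare_context_cost` are hypothetical helpers, and the model ignores tool-call overhead and any turns where a fetched reference stays in context.

```python
def estimate_tokens(text: str) -> int:
    """Rough size estimate, assuming about 4 characters per token."""
    return len(text) // 4


def compare_context_cost(summary: str, references: list, runs: int, lookup_rate: float) -> tuple:
    """Return (always_loaded, on_demand) estimated input tokens over `runs` runs.

    lookup_rate is the assumed fraction of runs in which the agent fetches
    the references (0.0 = never, 1.0 = every run).
    """
    summary_tokens = estimate_tokens(summary)
    reference_tokens = sum(estimate_tokens(r) for r in references)
    # Strategy A: paste everything into every prompt.
    always_loaded = runs * (summary_tokens + reference_tokens)
    # Strategy B: summary every run, references only on the runs that need them.
    on_demand = runs * summary_tokens + round(runs * lookup_rate * reference_tokens)
    return always_loaded, on_demand


# Placeholder texts: a 400-character summary and one 4,000-character reference.
always, lazy = compare_context_cost('s' * 400, ['r' * 4000], runs=10, lookup_rate=0.2)
print(f'always loaded: {always} tokens, on demand: {lazy} tokens')
```

The gap widens as references grow or as the fraction of runs that need them shrinks; if nearly every run needs a reference, pasting it into the prompt may be the simpler choice.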
async def skill_lookup(args: dict, ctx: ToolContext) -> str:
    ref = args['reference']
    p = SKILL / ref
    # is_file() rather than exists(): a directory path would make read_text() raise.
    if not p.is_file():
        return f'No reference: {ref}'
    return p.read_text()

skill_lookup_tool = Tool(
    name='skill_lookup',
    description='Fetch a reference file from the log_analyst skill. Available: references/known_failure_modes.md',
    input_schema={
        'type': 'object',
        'properties': {'reference': {'type': 'string', 'description': 'Path within the skill, e.g. references/known_failure_modes.md'}},
        'required': ['reference'],
    },
    execute=skill_lookup,
)
result = await run(
    model=model,
    tools=[skill_lookup_tool],
    system_prompt=f'You are a research assistant.\n\n{skill_md}',
    task=f'Analyze this log and recommend an action:\n\n{SAMPLE_LOG}',
    on_event=lambda e: print(f' [dim cyan][{e.type}][/dim cyan]', e.data.get('name') or '') if e.type.startswith('tool') else None,
)
print()
print(result.content)
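One caveat about `skill_lookup` above: it joins a model-supplied path onto the skill folder, so a value such as `../../something` could point outside the skill. A minimal sketch of one way to guard against that follows; `resolve_reference` is a hypothetical helper, not part of arcrun, and it relies on `Path.is_relative_to`, which requires Python 3.9 or later.

```python
from pathlib import Path
from typing import Optional


def resolve_reference(skill_root: Path, reference: str) -> Optional[Path]:
    """Return the resolved file path if it lies inside skill_root, else None."""
    root = skill_root.resolve()
    # resolve() collapses '..' segments and follows symlinks before the check.
    candidate = (root / reference).resolve()
    if not candidate.is_relative_to(root):
        return None  # escapes the skill folder (e.g. '../x' or an absolute path)
    if not candidate.is_file():
        return None  # missing, or a directory rather than a file
    return candidate
```

Inside the tool, the lookup could then call `resolve_reference(SKILL, args['reference'])` and return the same `No reference: ...` message whenever the result is `None`, so the model sees an ordinary miss rather than an exception.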
4. Skills compose with tools
A skill is knowledge. A tool is action. They compose: the skill
tells the agent how to think, and the tools let it act on the
thinking. Add a read_log tool and the agent can fetch a log by
run id instead of having it pasted into the task. The version
here is a stub that always returns the sample log.
async def read_log(args: dict, ctx: ToolContext) -> str:
    # Stub: ignores args['run_id'] and always returns the sample log.
    return SAMPLE_LOG  # in production: fetch from your log store

read_log_tool = Tool(
    name='read_log',
    description='Fetch the full text of an experiment log by run id.',
    input_schema={
        'type': 'object',
        'properties': {'run_id': {'type': 'string'}},
        'required': ['run_id'],
    },
    execute=read_log,
)
result = await run(
    model=model,
    tools=[read_log_tool, skill_lookup_tool],
    system_prompt=f'You are a research assistant.\n\n{skill_md}',
    task='Run 42 looked weird. Pull the log and tell me what happened.',
)
print(result.content)
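The `read_log` above returns `SAMPLE_LOG` regardless of the run id. As a sketch of what a disk-backed version might look like, the helper below loads a log by run id. The `run_<id>.log` naming scheme and the `load_log` name are assumptions made for illustration, not part of the notebook's setup.

```python
import re
from pathlib import Path

# Restrict run ids to a safe character set so they cannot contain path separators.
RUN_ID_PATTERN = re.compile(r'[A-Za-z0-9_-]+')


def load_log(log_dir: Path, run_id: str) -> str:
    """Return the log text for run_id, or a short message the model can act on."""
    if not RUN_ID_PATTERN.fullmatch(run_id):
        return f'Invalid run id: {run_id!r}'
    path = log_dir / f'run_{run_id}.log'  # hypothetical naming scheme
    if not path.is_file():
        return f'No log found for run {run_id}'
    return path.read_text()
```

Returning a message instead of raising mirrors how `skill_lookup` reports a missing reference: the agent receives the failure as a tool result and can retry with a different id or tell the user.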
Takeaway
- Skills are just folders of markdown. No exotic format. No vector DB. Markdown.
- Progressive disclosure: small SKILL.md always in context, big references fetched on demand. Cheap and scalable.
- A skill written for one agent can be reused by any agent that can take markdown in its prompt and call a lookup tool. Knowledge is reusable, the same way library code is reusable.
- In production, arcskill validates and signs skill bundles before loading. The pattern is identical; the trust story is added on top.
Next: 05 — Identity & Aligned Decisions. Same data, different identity, different — but both rational — decisions.