Notebook 01 — From Chat to Coworker¶
Premise: A chat window is one prompt at a time. A harness is the structure around the model — the message thread, the system prompt, the tools, the budget, the audit trail — that turns a one-shot model into something you can hand a real job to.
In this notebook we build the simplest possible harness — just structured calls — and prove that the model is interchangeable. Anthropic, OpenAI, Ollama, vLLM: same code, different brain.
By the end you will have:
- Made a raw LLM call with
arcllm.load_model() - Steered behavior with a system prompt
- Held a multi-turn conversation by managing the message list
- Swapped models without changing your code
- Inspected token usage and cost
Setup¶
import os
from dotenv import load_dotenv
load_dotenv() # picks up ANTHROPIC_API_KEY etc. from ../.env
from arcllm import load_model, Message
from rich import print
1. The simplest possible call¶
load_model(provider) returns an adapter. invoke(messages) returns
an LLMResponse. That is the whole API at this layer.
model = load_model('anthropic')
resp = await model.invoke([
Message(role='user', content='In one sentence: what is plasma confinement?')
])
print(resp.content)
2. Steering with a system prompt¶
The system prompt sets the role. Same model, very different output.
SYSTEM_GENERIC = 'You are a helpful assistant.'
SYSTEM_SCIENTIST = (
'You are a senior plasma physicist at a national lab. '
'Answer in 2-3 dense sentences. Use specifics. No hedging.'
)
QUESTION = 'What is the biggest open problem in fusion plasma confinement?'
for label, sys_prompt in [('generic', SYSTEM_GENERIC), ('scientist', SYSTEM_SCIENTIST)]:
resp = await model.invoke([
Message(role='system', content=sys_prompt),
Message(role='user', content=QUESTION),
])
print(f'\n[bold]{label}[/bold]')
print(resp.content)
3. Multi-turn — the model has no memory¶
Models are stateless between calls. The 'conversation' is just the list of messages you pass in each time. That is what a harness manages: the running thread.
history: list[Message] = [
Message(role='system', content=SYSTEM_SCIENTIST),
]
async def turn(user_text: str) -> str:
history.append(Message(role='user', content=user_text))
resp = await model.invoke(history)
history.append(Message(role='assistant', content=resp.content or ''))
return resp.content or ''
print('Q1:', await turn('What is ITER?'))
print('\nQ2:', await turn("What's the most important milestone left for it?"))
print('\nQ3:', await turn('Why does that matter for stellarators?'))
print(f'\n[dim]history now has {len(history)} messages[/dim]')
4. Swap the model — same code, different brain¶
If you have an OpenAI key, this just works. If you don't, skip — the point is the code didn't change, only the provider string.
For air-gapped labs: change 'openai' to 'ollama' (with model='llama3.1')
or 'vllm'. The harness is the same.
from contextlib import suppress
QUESTION = 'In one sentence, what is plasma?'
msgs = [Message(role='user', content=QUESTION)]
for provider in ['anthropic', 'openai']:
with suppress(Exception) as _:
m = load_model(provider)
r = await m.invoke(msgs)
print(f'[bold]{provider}[/bold] ({r.model}): {r.content}')
5. Inspect what came back¶
Every response carries usage, cost, stop reason, and provider metadata. This is what a harness uses to enforce budgets, retry on truncation, and audit the call.
resp = await model.invoke([Message(role='user', content='Name one thing.')])
print('model: ', resp.model)
print('stop_reason: ', resp.stop_reason)
print('usage: ', resp.usage)
print('cost_usd: ', resp.cost_usd)
print('content: ', resp.content)
Takeaway¶
- A chat call is one shot. A harness is the code that manages many shots — message list, system prompt, model choice, budget, retries, observability.
- The model is interchangeable. Anthropic, OpenAI, Ollama, vLLM
— same
Messagetypes, sameinvoke()call. - This is already enough to build something useful. Most 'AI features' in production are exactly this loop, with a little plumbing.
Next: 02 — Prompts That Work. Same model, very different output. The craft of prompting.