Agentic AI: Harness and Reliability(with code) - Part 3
This is fully written by me without any help from AI, except for learning the concepts. This blog covers everything everyone should be aware of in the AI world, and it took me weeks to learn it so I could make it easy for you.
To understand the basics of agents and how to build one, check out Part 2 of the Agentic AI series here.
Overview
- How agents remembers past conversations
- How agents personalize the replies for you
- History and checkpointing
Limitation of Agents
Harnesses target the major limitations of agents i.e. they are single shot and don't validate the results before you give any output to the user.
If you ask an agent to write something, it will do that.
But it will not validate the results even if you ask it to validate as LLMs are single shot and once they give you a reply, they just stop after first reply.
Harness
A harness make the agent more reliable and powerful so the ouput is verified and improved on before providing the user with final output.
Without harness, if you give an input to an agent it will think --> call tools --> give output.
But if you add a layer that can validate the ouput from an agent before providing user the final output, it can automatically give the feedback the the agent to improve the ouput without manual involvement of the user.
To avoid confusion between various terminologies used, re-iterating:
- LLM = a pre-trained model that can think but can't act. You can think of it as a brain in a jar.
- Agent = can run tools when told by the LLM, the ouput of tools is used as additional context to make LLM aware of outside world.
- Harness = Verifies correctness of output, improve on it, take care of guardrails and re-triggers the agent by itself for a better final output.
- Tool registry - keeps track of which all tools the agent can access. Example, in claude settings.json you can define the list of tools it can use. This is enforced by the harness.
- Context management - when to discard old context, when to compress the context, what to keep and what to remove from context when context gets full, what to add in context from past history -- all these are taken care by the harness.
- Guardrails - these are rules or checks that restrict agents to perform actions which can do unintentional wrong things mostly pre/post running tools which update anything.
- Agent loop - a layer which re-iterates the agent for a better ouput.
- Verification layer - a layer that verifies the results before giving the final output to the user. The same verification results are used to improve the final output in agent loop.
| Component | Who owns it? | Who uses it? |
|---|---|---|
| LLM/model | Model provider / app config | Agent |
| System prompt | Usually harness/app developer | Agent/LLM |
| Tool registry | Harness | Harness + Agent/LLM sees subset |
| Tool descriptions | Harness/tool registry | Agent/LLM |
| Tool execution | Harness | Tools |
| Permissions | Harness | Harness enforces |
| Memory store | Harness/platform | Agent receives selected memory |
| Context retrieval | Harness | Agent receives selected context |
| Planning policy | Usually harness/app developer | Agent follows it |
| Output schema | Harness/app developer | Agent must output in it |
| Decision of next action | Agent/LLM | Harness validates |
| Loop orchestration | Harness | Harness |
| Guardrails | Harness | Harness enforces |
| Quality checks/evals | Harness/eval system | Harness |
Comments
Post a Comment