A document-processing AI agent that lives in a Telegram chat. You send it a PDF and a task; it parses, extracts, classifies, reasons, and replies asynchronously when it's done.
Small enough to dissect on stage. Real enough to be interesting.
Most agents look like a black box: prompt in, answer out. The interesting engineering lives in the gap between those two. We'll walk through it using four anatomical metaphors:
If the brain is unreliable, every downstream step inherits that unreliability. The fix has to start at the LLM call itself.
Every LLM call in LobsterX is constrained by a typed JSON schema. The model cannot reply with free-form prose — it must fill in a known shape.
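To make the "known shape" idea concrete, here is a minimal standard-library sketch. It assumes a hypothetical `ThinkDecision` shape (the field names are illustrative, not LobsterX's actual schema) and validates a reply after the fact. A real structured-output API would typically constrain the model during generation instead.

```python
import json
from dataclasses import dataclass, fields
from typing import Literal, get_args, get_type_hints


# Hypothetical shape for one decision; field names are illustrative only.
@dataclass(frozen=True)
class ThinkDecision:
    action: Literal["call_tool", "stop"]
    reasoning: str
    final_answer: str


class SchemaError(ValueError):
    """Raised when a model reply does not match the expected shape."""


def parse_decision(raw: str) -> ThinkDecision:
    """Parse a raw model reply and reject anything outside the schema."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise SchemaError(f"reply is not JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise SchemaError("reply must be a JSON object")

    # Keys must match the declared fields exactly: nothing missing, nothing extra.
    expected = {f.name for f in fields(ThinkDecision)}
    if set(data) != expected:
        raise SchemaError(f"expected keys {sorted(expected)}, got {sorted(data)}")

    # The action must be one of the literal values declared on the dataclass.
    allowed = get_args(get_type_hints(ThinkDecision)["action"])
    if data["action"] not in allowed:
        raise SchemaError(f"action must be one of {allowed}")

    for name in ("reasoning", "final_answer"):
        if not isinstance(data[name], str):
            raise SchemaError(f"{name} must be a string")
    return ThinkDecision(**data)
```

A free-form reply such as `"Sure! Here is the answer."` raises `SchemaError`, so downstream steps only ever see a `ThinkDecision`.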
LobsterX is built on LlamaIndex Agent Workflows: an event-driven, async-first stepwise execution engine.
Each arrow is a typed event. Observe re-enters Think until Think decides the task is done and emits Stop — at which point the workflow terminates and the answer goes back to the user.
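The loop can be sketched in plain `asyncio` without the LlamaIndex API. The event classes, the stand-in handlers, and the `max_steps` budget below are assumptions made for illustration. They show only the control flow, not how LobsterX or LlamaIndex implements it.

```python
import asyncio
from dataclasses import dataclass
from typing import Union


@dataclass
class Think:
    history: list[str]


@dataclass
class Act:
    tool: str
    history: list[str]


@dataclass
class Observe:
    result: str
    history: list[str]


@dataclass
class Stop:
    answer: str


async def think(ev: Think) -> Union[Act, Stop]:
    # Stand-in for the LLM call: stop once one observation has been recorded.
    if any(h.startswith("observed:") for h in ev.history):
        return Stop(answer=ev.history[-1])
    return Act(tool="parse", history=ev.history)


async def act(ev: Act) -> Observe:
    # Stand-in for a real tool call.
    return Observe(result=f"{ev.tool} ok", history=ev.history)


async def observe(ev: Observe) -> Think:
    # Record the result and hand control back to Think.
    return Think(history=ev.history + [f"observed: {ev.result}"])


# Each event type is routed to exactly one step.
HANDLERS = {Think: think, Act: act, Observe: observe}


async def run(task: str, max_steps: int = 20) -> str:
    ev = Think(history=[f"task: {task}"])
    for _ in range(max_steps):
        ev = await HANDLERS[type(ev)](ev)
        if isinstance(ev, Stop):
            return ev.answer
    raise RuntimeError("step budget exhausted before Stop")
```

Calling `asyncio.run(run("summarize"))` walks Think, Act, Observe, Think and then terminates on Stop. The step budget is a guard added here against a Think step that never emits Stop.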
The brain and the loop together are generalist scaffolding. What makes LobsterX a document agent are the three interfaces it exposes to the world.
.env, and other
files are excluded entirely)
If the agent is jailbroken into writing something destructive, the damage stays inside the virtual FS. Nothing leaks to the host unless you explicitly sync it.
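One way to picture that isolation is an in-memory file store with an explicit write-back step. The sketch below is an assumption about the general shape, not LobsterX's implementation: the class name, the method names, and the one-entry exclusion set are all hypothetical.

```python
import posixpath
from pathlib import Path

# Assumption: only ".env" is listed here; the real exclusion list is not shown.
EXCLUDED_NAMES = {".env"}


class VirtualFS:
    """In-memory file store; the host is only written when sync_to() is called."""

    def __init__(self) -> None:
        self._files: dict[str, str] = {}

    @staticmethod
    def _normalize(path: str) -> str:
        # Resolve "." and ".." against a virtual root so paths cannot escape it.
        key = posixpath.normpath("/" + path).lstrip("/")
        if not key:
            raise ValueError("path must name a file")
        return key

    def load(self, host_dir: Path) -> None:
        """Copy text files from the host into memory, skipping excluded names."""
        for p in host_dir.rglob("*"):
            if not p.is_file() or p.name in EXCLUDED_NAMES:
                continue
            try:
                self._files[p.relative_to(host_dir).as_posix()] = p.read_text()
            except UnicodeDecodeError:
                continue  # binary files are ignored in this sketch

    def write(self, path: str, content: str) -> None:
        self._files[self._normalize(path)] = content

    def read(self, path: str) -> str:
        return self._files[self._normalize(path)]

    def delete(self, path: str) -> None:
        self._files.pop(self._normalize(path), None)

    def sync_to(self, host_dir: Path) -> None:
        """Explicit, opt-in write-back of the virtual files to the host."""
        for rel, content in self._files.items():
            target = host_dir / rel
            target.parent.mkdir(parents=True, exist_ok=True)
            target.write_text(content)
```

In this sketch a destructive `delete("notes.txt")` removes only the in-memory copy, and a path such as `../../escape.txt` is clamped to `escape.txt` under the virtual root.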
Filesystem ops alone only see plain text. To actually understand unstructured documents, the agent calls three LlamaCloud tools — each with its own typed input schema.
A document agent is only as good as its eyes on unstructured content. Generic OCR isn't enough — layout, tables and figures all carry meaning that naive text extraction loses.
Each tool exposes a typed input schema, so the Act step can call them with full structured-output guarantees end to end.
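A rough sketch of how an Act step might dispatch to tools through typed inputs follows. The tool names, input fields, and stand-in handlers are hypothetical and do not reflect the LlamaCloud API. Note that constructing a dataclass checks only the field names, not the value types.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class ParseInput:
    file_path: str


@dataclass(frozen=True)
class ClassifyInput:
    file_path: str
    categories: tuple[str, ...]


def fake_parse(args: ParseInput) -> str:
    # Stand-in for a real parsing call.
    return f"parsed {args.file_path}"


def fake_classify(args: ClassifyInput) -> str:
    # Stand-in for a real classification call.
    return f"{args.file_path} in one of: {', '.join(args.categories)}"


# Registry: tool name -> (input schema, handler).
TOOLS: dict[str, tuple[type, Callable[[Any], str]]] = {
    "parse": (ParseInput, fake_parse),
    "classify": (ClassifyInput, fake_classify),
}


def call_tool(name: str, raw_args: dict[str, Any]) -> str:
    """Build the typed input first; call the handler only if that succeeds."""
    if name not in TOOLS:
        raise KeyError(f"unknown tool: {name}")
    schema, handler = TOOLS[name]
    try:
        args = schema(**raw_args)  # missing or unexpected fields raise TypeError
    except TypeError as exc:
        raise ValueError(f"bad arguments for {name}: {exc}") from exc
    return handler(args)
```

With this shape, a call whose arguments are malformed fails before any tool runs, so the error can be fed back to the Think step rather than surfacing inside the tool.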
The right interface for a long-running agent isn't a chatbot — it's a colleague who replies when they're finished.
The Telegram bot is one frontend. The same agent core also runs as a FastAPI server, with the workflow's async-first shape carried all the way through.
task_id → asyncio.Task, guarded by a lock. POST /task spawns, GET polls, DELETE cancels.
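The registry behind those three endpoints might look roughly like this. The sketch uses plain `asyncio` with no FastAPI routes, and the class name, method names, and status strings are assumptions made for illustration.

```python
import asyncio
import uuid
from typing import Any, Callable, Coroutine


class TaskRegistry:
    """Maps task_id -> asyncio.Task; every dict access happens under a lock."""

    def __init__(self) -> None:
        self._tasks: dict[str, asyncio.Task] = {}
        self._lock = asyncio.Lock()

    async def spawn(self, job: Callable[[], Coroutine[Any, Any, str]]) -> str:
        """What a POST handler would call: start the job and return its id."""
        task_id = uuid.uuid4().hex
        async with self._lock:
            self._tasks[task_id] = asyncio.create_task(job())
        return task_id

    async def poll(self, task_id: str) -> dict:
        """What a GET handler would call: report status without blocking."""
        async with self._lock:
            task = self._tasks.get(task_id)
        if task is None:
            return {"status": "unknown"}
        if not task.done():
            return {"status": "running"}
        if task.cancelled():
            return {"status": "cancelled"}
        if task.exception() is not None:
            return {"status": "failed", "error": str(task.exception())}
        return {"status": "done", "result": task.result()}

    async def cancel(self, task_id: str) -> bool:
        """What a DELETE handler would call: forget the task and cancel it."""
        async with self._lock:
            task = self._tasks.pop(task_id, None)
        if task is None:
            return False
        task.cancel()
        return True
```

Create the registry inside the running event loop (for example at application startup) so the lock and the tasks belong to the same loop.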
fastapi-throttle: uploads, creates, polls and deletes each have their own budget
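The idea of a separate budget per operation can be illustrated with a small sliding-window limiter. This is a standard-library sketch, not fastapi-throttle's API or implementation, and the budget numbers are invented for the example.

```python
import time
from collections import defaultdict, deque
from typing import Callable

# Hypothetical budgets: operation -> (max requests, window in seconds).
BUDGETS = {
    "upload": (5, 60.0),
    "create": (10, 60.0),
    "poll": (120, 60.0),
    "delete": (10, 60.0),
}


class SlidingWindowLimiter:
    """Tracks recent request times per (client, operation) pair."""

    def __init__(
        self,
        budgets: dict[str, tuple[int, float]],
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._budgets = budgets
        self._clock = clock
        self._hits: dict[tuple[str, str], deque] = defaultdict(deque)

    def allow(self, client: str, operation: str) -> bool:
        limit, window = self._budgets[operation]
        now = self._clock()
        hits = self._hits[(client, operation)]
        # Drop timestamps that have aged out of the window.
        while hits and now - hits[0] >= window:
            hits.popleft()
        if len(hits) >= limit:
            return False
        hits.append(now)
        return True
```

Because each operation has its own queue, a client that polls aggressively exhausts only its poll budget and can still upload or cancel.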
AGENTS.md file, not via potentially unvetted instructions
None of this prevents prompt injection from a malicious document the agent has been asked to read. The mitigations bound the blast radius; they don't eliminate it.