Magus

Arcadia Rose is a Senior Developer on Jane.app's Developer Acceleration team.

Magus is a CLI-invoked coding agent that parallelizes streams of work, limits actions to known-safe commands, and learns from its work.

Check out the README on GitHub for a deeper technical look.

Why Though?

The promise of AI (setting aside all the hype and rhetoric) is that it will make software development faster. If you're coming to this post from an AI-forward community, you have probably already proven to yourself, as I have, that there is significant merit to the idea. The question becomes: how does one go fast with AI? Especially when you're dealing with problems like these:

  1. Agents will make up silly commands to navigate your codebase and process information
  2. Agents don't always reliably follow workflows or instructions and may not use the tools you specify while working
  3. Agents reinforce whatever patterns are already present in your project; it's on you to know how to steer things
  4. Agents will spend a lot of time reading unrelated files and making poor judgment calls due to conflated context

Of course, these problems can be solved through careful project structuring and by developing your own suite of tooling, and that's exactly what Magus seeks to provide out of the box. Generally speaking, our goal is to minimize the inaccuracies and unreliability of agents and maximize the value of their robust capabilities. We can do this by writing good old-fashioned deterministic code around the non-deterministic agentic bits.
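To make that idea concrete, here's a minimal sketch of deterministic code wrapping a non-deterministic reply. The schema and function names are purely illustrative, not Magus internals: the point is that the agent's free-form output only gets acted on if it parses into exactly the structure the deterministic layer expects.

```python
import json

# Illustrative only: the deterministic shell defines the contract,
# and anything the agent returns that doesn't match it fails loudly.
REQUIRED_KEYS = {"objective", "files_to_edit"}

def parse_agent_output(raw: str) -> dict:
    """Parse and validate an agent's reply; raise instead of guessing."""
    data = json.loads(raw)  # non-JSON replies fail here
    missing = REQUIRED_KEYS - data.keys()
    if missing:
        raise ValueError(f"agent output missing keys: {sorted(missing)}")
    return data

stage = parse_agent_output('{"objective": "add tests", "files_to_edit": ["a.py"]}')
print(stage["objective"])
```

The validation code is boring on purpose: every decision that can be made deterministically is taken away from the model.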

Safety, Speed and Continuous Refinement

There are a few key decisions I made when building Magus that I think make it a great solution (for me!) to the problem of how to really streamline my experience of working with LLMs.

  1. Mandatory Planning
  2. Parallelized Workstreams
  3. Encoded Learnings

Planning for Efficiency

Knowing how to plan effectively when working with LLMs is practically a super-power. As you "talk" to your agent, you can provide necessary context, do exploration, validate assumptions, and encode decisions so as to pre-load your agent's context window with everything it needs to execute effectively. I will say, though, that I've personally not been terribly impressed by the "planning modes" of many agents, if only because they don't adhere to a structure dependable enough to keep the agent on track without explicit guidance.

Planning in Magus is organized around achieving a few critical outcomes:

  1. Ensuring that the relevant context is available
  2. Tightly scoping work into stages with clear acceptance criteria
  3. Structuring the plan to maximize concurrent work streams

Magus' planner agent achieves this by producing structured output that organizes work items into a directed acyclic graph. Each "stage" includes a list of relevant files for context, the files to edit, and a clear objective expressed as a prompt. This structured output is processed into a state machine that an orchestrator executes concurrently.
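A rough sketch of how a plan like this can be scheduled; the field names and the wave-based scheduler below are my illustration, not Magus' actual schema or orchestrator. Any stage whose dependencies are complete is eligible to run, so independent stages land in the same wave and run concurrently.

```python
from dataclasses import dataclass, field

# Illustrative stage shape: a DAG node with context, edit targets, and a prompt.
@dataclass
class Stage:
    name: str
    objective: str                      # the prompt the coder agent receives
    context_files: list = field(default_factory=list)
    files_to_edit: list = field(default_factory=list)
    depends_on: list = field(default_factory=list)

def execution_waves(stages: list) -> list:
    """Group stages into waves; stages within a wave can run concurrently."""
    done, waves = set(), []
    pending = {s.name: s for s in stages}
    while pending:
        ready = [n for n, s in pending.items() if set(s.depends_on) <= done]
        if not ready:
            raise ValueError("cycle detected in plan")
        waves.append(sorted(ready))
        done.update(ready)
        for n in ready:
            del pending[n]
    return waves

plan = [
    Stage("schema", "define the data model"),
    Stage("api", "expose endpoints", depends_on=["schema"]),
    Stage("docs", "document the model", depends_on=["schema"]),
    Stage("e2e", "end-to-end tests", depends_on=["api", "docs"]),
]
print(execution_waves(plan))  # → [['schema'], ['api', 'docs'], ['e2e']]
```

Here "api" and "docs" both depend only on "schema", so they form one concurrent wave; the orchestrator's state machine is essentially this loop plus per-stage lifecycle tracking.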

TDD is Just the Tip of the Iceberg

Most of the time, the agents that actually write your code don't need access to a lot of tools. For the work I do, everything they need is available through my Makefile targets. To ensure that my agents don't go too far off the rails, coder agents are limited to working inside the scope of the project and have access to the following tools:

  1. An EditFile tool that always prints a diff and prohibits edits to package.json
  2. A Makefile tool that allows calling targets defined in Makefile

These restrictions make me very, very confident that my agents will never do anything that can't be fixed with a git operation or two. They also give me the flexibility to customize the agents' capabilities as I see fit.

The Virtuous Cycle of Learning

After every completed work cycle, a scribe agent receives the final results from the planning stage and from each of the coders. It is tasked with writing a memory file as well as authoring and updating skills: tidbits of deeper awareness about the project and the technology in use that will help inform future development work.

Some may scoff at the idea of these cyclical workflows: "Garbage in, garbage out," they'll say. But I've seen first-hand just how much friction is eliminated when your agents can meaningfully "remember" things they had to reason through. Implementation work gets faster and more predictable. These mechanisms reduce the work you have to do to document such details and also serve as an "audit trail" for understanding the outcomes the agents produce.
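A minimal sketch of what such a scribe step could look like; the file names, formats, and merge-by-key behavior here are assumptions of mine, not Magus' actual persistence layer. Memory entries accumulate as an append-only log, while skills are keyed by name so a later cycle updates a learning rather than duplicating it.

```python
import json
import tempfile
from pathlib import Path

# Illustrative scribe step: append a memory entry, merge skills by key.
def record_cycle(workdir: Path, memory_entry: str, skills: dict) -> None:
    memory = workdir / "MEMORY.md"
    with memory.open("a") as f:
        f.write(f"- {memory_entry}\n")
    skills_file = workdir / "skills.json"
    existing = json.loads(skills_file.read_text()) if skills_file.exists() else {}
    existing.update(skills)  # newer learnings win
    skills_file.write_text(json.dumps(existing, indent=2))

with tempfile.TemporaryDirectory() as d:
    record_cycle(Path(d), "make test requires DB_URL to be set",
                 {"run-tests": "use `make test`, not pytest directly"})
    print((Path(d) / "MEMORY.md").read_text())
```

The append-only log doubles as the "audit trail" mentioned above: each line traces back to the cycle that produced it.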

A Simple, Flexible User Experience

Over the course of building Magus, I experimented with a variety of approaches and found that none of them worked so well that I felt excited. A lot of the fancy multi-agent orchestration with message-passing architectures others are exploring, and even the shiny terminal user interfaces written in Rust, were cool but didn't actually feel excellent to me in practice.

I chose to make Magus a relatively simple command-line application for a few reasons.

  1. Unix-like operating systems already have great ways to supervise and pipe output between processes
  2. The simplicity makes it possible to use it in more contexts, such as from within Claude Code or automated workflows
  3. Dealing with plain (okay, modestly stylized) text makes it easy to search and store the agents' outputs

All of this, without compromising on a really nice experience.

Final Thoughts

This version of Magus is my fourth attempt at turning all of the things I've learned and experienced in my work into a really nice tool that doesn't just satisfy me but excites me about what I'm able to do with AI. Honestly, I think those Anthropic guys might have been onto something with the operating system analogy. A lot of the time, you don't want to build your own operating system. For software developers especially, though, the exercise can be incredibly valuable.

I've found that by actually simplifying the software I'm building and reducing the surface area the LLM is exposed to, you can build more powerful and trustworthy software. I can't recommend taking the time to try out high-level agent libraries enough. The value of having so much control over your agents' capabilities cannot be overstated, and the ability to deterministically direct the flow of information dramatically improves reliability.

Software engineering is dead; long live software engineering.