Evaluation May 7, 2026 · 7 min read · FuturOne Engineering

How we measure whether an agent's work is actually good

An agent that ships deliverables needs a sharper bar than "looks plausible", and for a code review agent, "passes tests" measures the code, not the review. This post covers how we score artifacts against versioned rubrics, how we calibrate confidence against acceptance outcomes, and what the 91% acceptance number actually counts.

Read the post →

Infrastructure Mar 19, 2026 · 6 min read · FuturOne Engineering

Zero retention is an architecture decision, not a policy promise

Most vendors offer a retention policy — a promise to delete data from a system that stores it by default. We built a system with nothing to delete, and it collapsed enterprise security reviews from months to about two weeks. Here is how it works, and what it honestly costs us.

Read the post →

Architecture Feb 12, 2026 · 7 min read · FuturOne Engineering

Agents vs. chatbots: it is an architecture, not a feature

Every product page says "agent" now; most of them ship a chat loop with a longer system prompt. The real difference is a planner–orchestrator–verifier loop over tools and durable state — and you cannot prompt your way into one.

Read the post →

Prefer watching to reading? Watch an agent run live, or follow smaller releases on the changelog.