The Only Way Forward
Everybody acknowledges that everything in AI and tech is moving at a breakneck speed. In the two weeks since our last Field Note, I'm reminded of Theodore Roosevelt's "Man in the Arena":
This feels viscerally resonant about the moment in which we find ourselves. And I'm more emboldened than ever to believe that the only way to create something, something of value in this space is to get deep into the trenches, get one's hands dirty and succeed through trial-and-error. This imperfect motion feels like the only way to not just do the incremental thing in front of us, but to have more of a "peek around the corner to see what's next" motion.
This feels especially apt as we finally had the time to start playing with OpenClaw two weeks ago, which led to us discovering an interesting exploration adjacent to tootoo to explore. We've asked ourselves whether it makes sense to bank on OpenClaw and then we'd answer that irrespective of whether it's OpenClaw or another platform that thrives in future, there's stuff that agents will need. And we'll build that stuff. Only for OpenClaw's founder to join OpenAI in a tie up that may (or may not) impact the project. Time (which we don't seem to have) will tell.
We've been benchmarked
Whilst we've been in the arena, one of the "side quests" that popped up for us was learning all about benchmarking (something that we've never done before or planned to do, but this too changed at breakneck speed).
When we were building tootoo, we thought that the middleware we built was performing very well. But we didn't know how well and tootoo didn't yet have any usage or users for us to validate. The only other way that we could figure out whether we had something valuable on our hands was to test ourselves against reputable benchmarks.
So, we've worked hard to abstract "Cortex" (effectively a memory layer or memory system) from tootoo. We then ran it against the LongMemEval benchmark and learnt just how significant the cost of compute and processing is to stay relevant with benchmarks.

But there was good news too… We scored 92.8% on the benchmark.
That makes Cortex the third long-term memory system in the world on this benchmark. The current leaders currently have 93.6% and 93.2% respectively.
What makes these results more exceptional (and makes us very proud) is that we achieved this within only two months of Ubundi's existence and the team working together.
All the tootoo Updates
Our surface area with tootoo has grown and we have real users finding initial value from the product. More importantly, their usage is a learning opportunity for us.

So we've doubled down and made a whole bunch of improvements to make tootoo more engaging and fun to use.
The Big Theme: Finding Where Your Codex Needs More
We've been specifically finding places where your codex could use more context — and then amplifying that in Daily Reflections. Two features work hand-in-hand:
Codex Readiness — We now show you how "ready" your codex is to represent you. It's not just about having answers; it's about having the right kind of context. We surface the gaps so you know exactly what to add.
Daily Reflections — These aren't just prompts. They're targeted questions based on what's missing from your codex. Answer a reflection, and your codex gets more complete. It's a feedback loop that makes your AI representation stronger over time.
Other Improvements
- Refined the conversation experience to feel more natural
- Better handling of complex, multi-faceted answers
- Improved the way we extract and store nuanced context
- Performance optimisations across the board
- For more, check out our Build notes at tootoo.ai
Ready to try it?
Create your codexMeet Rune: Our Own AI Teammate
We finally have our own OpenClaw agent. Its name is Rune Calder.

A note from Rune:
The Challenge: Understanding the Black Box
One big problem we've encountered is truly understanding how, why, and when Rune operates in certain ways. AI agents can feel opaque — you see the output, but not the reasoning.
So we built something to fix that.
Introducing Claw Journal
Claw Journal is our open-source tool for exposing an OpenClaw agent's inner workings. We built it for ourselves, and we're releasing it for everyone.

The dashboard focuses on costs — We want to better understand how cost and usage relates to value and outcomes.
Tool selection and reasoning — We can now see exactly what tools Rune is selecting and why. This transparency is essential for learning how to upskill an AI teammate.
Part of the reason for exposing this information is to learn how to best empower and upskill Rune to be more useful. With this greater observability, we've managed to make Rune more autonomous, proactive and accurate, and it can now handle more tasks.

We've also built a Control Center for Rune, so that we have a less ephemeral way of working together on tasks (especially those that are recurring, scheduled or sequenced). The Kanban board isn't a new concept. Human teams have used this forever to collaborate. We're just reimagining it for a world where humans and AI's are working together. Here is what that looks like:

Along with the Control Center, we've also built some specific skills for Rune. Those skills have unique UI's — each with its own workflow — within the Control Center too. Rune runs its own X account (@runecalder) using such a skill and sub-agent with a human-in-the-loop UI.
As we've tried to build for Rune and tried to both understand it so that we can empower it more, we've realised that the biggest gap in getting an AI to do more is simply trust. We've (literally) started building trust with greater monitoring, logging and observability. Both Claw Journal and Rune's own Control Center is a good start. But we still felt like something was lacking…

What's Next: Human Context + Agentic AI
We're not about to make a bet that OpenClaw is still a popular or relevant platform in X months' time. Everything is moving too fast for that. We are making a bet that the humans who deploys AI agents will want to trust them.
In a slightly roundabout way, we're back at the original hypothesis of tootoo: if a human can share their nuanced context with an AI, the output will be more tailored to that individual.
It is with that original curiosity in mind that we asked another question: Could we use a tootoo codex (mine in this case) to give Rune more of that nuanced context, so that it can better align with and represent me.
So in the spirit of building new and more surface areas as a way to accelerate our learning, we now have an internal prototype that does something like this:
- After every major or material action that Rune takes, we spawn a new bot (different model and no memory of Rune) to act as an "independent observer".
- The bot is instructed to evaluate Rune's reasoning, decisions and actions (Claw Journal is helpful in that regard) against my tootoo codex. It then shares that feedback with Rune.
- Rune has a defined skill in this regard to then receive that feedback and to consider to what extent to accept and incorporate it. If it believes that refinement or improvements are warranted, it'll then draft proposal cards for a human to review.
Here are two recent examples (screenshots from Claw Journal) of this in action:


All of this remains an experiment and we might be moving in a holding pattern, where we confuse motion for progress. This does however feel like an incredibly powerful way to use AI to fine-tune, improve and realign AI agents in a way that builds trust.
And greater trust will unlock so much productivity and value.
Links & Resources
- Cortex Technical Doc
- LongMemEval Benchmark
- Claw Journal (open source)
- tootoo Updates
- OpenClaw Journal Dashboard Preview
- Tool Selection View
- Create your codex
- Previous Field Note: Introducing tootoo
Join the Community
Part of our bigger mission is to build a community of awesome people that cares as much as we do about adding as much human context into AI. We're of the strong opinion that it requires a village to make a real impact here.
We've created our "Community Capital" initiative where community members can earn real equity in Ubundi for their contributions.
Field Notes are published every two weeks. Subscribe to stay in the loop.
