Google senior AI product manager Shubham Saboo has turned one of the thorniest problems in agent design into an open-source engineering exercise: persistent memory.
This week, he published an open-source "Always-On Memory Agent" on the official Google Cloud Platform GitHub page under a permissive MIT license, allowing commercial use.
It was built with Google's Agent Development Kit, or ADK, released in spring 2025, and Gemini 3.1 Flash-Lite, a low-cost model Google released on March 3, 2026 as the fastest and most cost-efficient model in its Gemini 3 series.
The project serves as a practical reference implementation for something many AI teams want but few have productionized cleanly: an agent system that can ingest information continuously, consolidate it in the background, and retrieve it later without relying on a traditional vector database.
For enterprise developers, the release matters less as a product launch than as a signal about where agent infrastructure is headed.
The repo packages a view of long-running autonomy that is increasingly attractive for support systems, research assistants, internal copilots and workflow automation. It also brings governance questions into sharper focus as soon as memory stops being session-bound.
What the repo appears to do, and what it doesn't clearly claim
The repo also appears to use a multi-agent internal architecture, with specialist components handling ingestion, consolidation and querying.
But the supplied materials don't clearly establish a broader claim that this is a shared memory framework for multiple independent agents.
That distinction matters. ADK as a framework supports multi-agent systems, but this particular repo is best described as an always-on memory agent, or memory layer, built with specialist subagents and persistent storage.
Even at this narrower level, it addresses a core infrastructure problem many teams are actively working through.
The architecture favors simplicity over a traditional retrieval stack
According to the repository, the agent runs continuously, ingests files or API input, stores structured memories in SQLite, and performs scheduled memory consolidation every 30 minutes by default.
A local HTTP API and Streamlit dashboard are included, and the system supports text, image, audio, video and PDF ingestion. The repo frames the design with an intentionally provocative claim: "No vector database. No embeddings. Just an LLM that reads, thinks, and writes structured memory."
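The ingest-store-consolidate loop described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions: the SQLite schema, function names and consolidation placeholder are hypothetical, not the repo's actual code.

```python
import sqlite3
import time

# Illustrative sketch of the loop the repo describes: ingest input,
# store structured memories in SQLite, consolidate on a schedule.
# Schema and names are assumptions, not taken from the repo.

CONSOLIDATE_INTERVAL_S = 30 * 60  # repo default: every 30 minutes

def init_db(path=":memory:"):
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS memories ("
        "  id INTEGER PRIMARY KEY,"
        "  topic TEXT,"
        "  content TEXT,"
        "  created_at REAL)"
    )
    return db

def ingest(db, topic, content):
    # In the real agent, an LLM would first extract structured memory
    # from text, images, audio, video or PDFs before this insert.
    db.execute(
        "INSERT INTO memories (topic, content, created_at) VALUES (?, ?, ?)",
        (topic, content, time.time()),
    )
    db.commit()

def consolidate(db):
    # Placeholder for the background step: an LLM would re-read recent
    # memories and merge duplicates or rewrite them into summaries.
    return db.execute("SELECT topic, content FROM memories").fetchall()

db = init_db()
ingest(db, "user_pref", "Prefers summaries in bullet points")
print(len(consolidate(db)))  # → 1
```

A production version would wrap `consolidate` in a scheduler loop and expose `ingest` behind the HTTP API the repo mentions.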
That design choice is likely to draw attention from developers managing cost and operational complexity. Traditional retrieval stacks often require separate embedding pipelines, vector storage, indexing logic and synchronization work.
Saboo's example instead leans on the model to organize and update memory directly. In practice, that can simplify prototypes and reduce infrastructure sprawl, especially for smaller or medium-memory agents. It also shifts the performance question from vector search overhead to model latency, memory compaction logic and long-run behavioral stability.
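The "reads, thinks, and writes" pattern can be illustrated with a prompt-construction sketch: rather than embedding text, the agent hands the model its raw memories and asks for a rewritten, deduplicated set. The prompt wording and JSON contract here are assumptions, not the repo's actual prompts.

```python
# Hypothetical sketch of LLM-driven memory consolidation: the model,
# not a vector index, is responsible for organizing what is kept.

def build_consolidation_prompt(memories: list[str]) -> str:
    listing = "\n".join(f"- {m}" for m in memories)
    return (
        "You maintain an agent's long-term memory.\n"
        "Merge duplicates, resolve contradictions, and return the\n"
        "consolidated memories as a JSON list of strings.\n\n"
        f"Current memories:\n{listing}"
    )

prompt = build_consolidation_prompt(
    ["User likes Python", "The user prefers Python for scripting"]
)
print("JSON list" in prompt)  # → True
```

The response would then be parsed and written back to storage, replacing the merged entries.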
Flash-Lite gives the always-on model some economic logic
That's where Gemini 3.1 Flash-Lite enters the story.
Google says the model is built for high-volume developer workloads at scale and priced at $0.25 per 1 million input tokens and $1.50 per 1 million output tokens.
The company also says Flash-Lite is 2.5 times faster than Gemini 2.5 Flash in time to first token and delivers a 45% increase in output speed while maintaining similar or better quality.
On Google's published benchmarks, the model posts an Elo score of 1432 on LMArena, 86.9% on GPQA Diamond and 76.8% on MMMU Pro. Google positions these traits as a match for high-frequency tasks such as translation, moderation, UI generation and simulation.
These numbers help explain why Flash-Lite is paired with a background-memory agent. A 24/7 service that periodically re-reads, consolidates and serves memory needs predictable latency and inference costs low enough to keep "always on" from becoming prohibitively expensive.
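At the quoted prices, a back-of-envelope calculation shows why the economics can work. The per-pass token counts below are illustrative assumptions, not measurements from the repo.

```python
# Rough daily cost of an always-on consolidation loop at the quoted
# Flash-Lite prices ($0.25/M input tokens, $1.50/M output tokens).

INPUT_PRICE = 0.25 / 1_000_000   # dollars per input token
OUTPUT_PRICE = 1.50 / 1_000_000  # dollars per output token

passes_per_day = 24 * 2          # one consolidation pass every 30 minutes
input_tokens_per_pass = 20_000   # assumed: memory store re-read each pass
output_tokens_per_pass = 2_000   # assumed: rewritten/merged memories

daily_cost = passes_per_day * (
    input_tokens_per_pass * INPUT_PRICE
    + output_tokens_per_pass * OUTPUT_PRICE
)
print(f"${daily_cost:.2f}/day")  # → $0.38/day
```

Even with these assumed token volumes, the background loop stays well under a dollar a day, which is the kind of arithmetic that makes "always on" plausible.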
Google's ADK documentation reinforces the broader story. The framework is presented as model-agnostic and deployment-agnostic, with support for workflow agents, multi-agent systems, tools, evaluation and deployment targets including Cloud Run and Vertex AI Agent Engine. That combination makes the memory agent feel less like a one-off demo and more like a reference point for a broader agent runtime strategy.
The enterprise debate is about governance, not just capability
Public reaction shows why enterprise adoption of persistent memory won't hinge on speed or token pricing alone.
Several responses on X highlighted exactly the concerns enterprise architects are likely to raise. Franck Abe called Google ADK and 24/7 memory consolidation "good leaps for continuous agent autonomy," but warned that an agent "dreaming" and cross-pollinating memories in the background without deterministic boundaries becomes "a compliance nightmare."
ELED made a related point, arguing that the main cost of always-on agents is not tokens but "drift and loops."
These critiques go to the heart of the operational burden of persistent systems: who can write memory, what gets merged, how retention works, when memories are deleted, and how teams audit what the agent learned over time?
Another response, from Iffy, challenged the repo's "no embeddings" framing, arguing that the system still has to chunk, index and retrieve structured memory, and that it may work well for small-context agents but break down once memory stores grow much larger.
That criticism is technically important. Removing a vector database doesn't remove retrieval design; it changes where the complexity lives.
For developers, the tradeoff is less about ideology than fit. A lighter stack may be attractive for low-cost, bounded-memory agents, while larger-scale deployments may still demand stricter retrieval controls, more explicit indexing strategies and stronger lifecycle tooling.
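For teams that want stricter retrieval controls without adopting a full vector stack, one illustrative middle ground (not something this repo ships) is SQLite's built-in FTS5 full-text index over the same memory store.

```python
import sqlite3

# Illustrative option: a keyword index narrows what the model must
# re-read per query, without adding embeddings or a vector database.
# Assumes the SQLite build includes the FTS5 extension (most do).

db = sqlite3.connect(":memory:")
db.execute("CREATE VIRTUAL TABLE memories USING fts5(topic, content)")
db.executemany(
    "INSERT INTO memories (topic, content) VALUES (?, ?)",
    [
        ("prefs", "User prefers bullet-point summaries"),
        ("project", "Q3 migration to Cloud Run is on hold"),
    ],
)
hits = db.execute(
    "SELECT topic FROM memories WHERE memories MATCH ?", ("migration",)
).fetchall()
print(hits)  # → [('project',)]
```

This keeps the "LLM reads and writes memory" design while bounding how much of the store reaches the context window at query time.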
ADK broadens the story beyond a single demo
Other commenters focused on developer workflow. One asked for the ADK repo and documentation and wanted to know whether the runtime is serverless or long-running, and whether tool-calling and evaluation hooks are available out of the box.
Based on the supplied materials, the answer is effectively both: the memory-agent example itself is structured as a long-running service, while ADK more broadly supports multiple deployment patterns and includes tools and evaluation capabilities.
The always-on memory agent is interesting on its own, but the larger message is that Saboo is trying to make agents feel like deployable software systems rather than isolated prompts. In that framing, memory becomes part of the runtime layer, not just an add-on feature.
What Saboo has shown, and what he has not
What Saboo has not shown yet is just as important as what he has published.
The supplied materials don't include a direct Flash-Lite versus Anthropic Claude Haiku benchmark for agent loops in production use.
They also don't lay out enterprise-grade compliance controls specific to this memory agent, such as deterministic policy boundaries, retention guarantees, segregation rules or formal audit workflows.
And while the repo appears to use multiple specialist agents internally, the materials don't clearly demonstrate a larger claim about persistent memory shared across multiple independent agents.
For now, the repo reads as a compelling engineering template rather than a complete enterprise memory platform.
Why this matters now
Still, the release lands at the right time. Enterprise AI teams are moving beyond single-turn assistants and into systems expected to remember preferences, preserve project context and operate across longer horizons.
Saboo's open-source memory agent offers a concrete starting point for that next layer of infrastructure, and Flash-Lite gives the economics some credibility.
But the strongest takeaway from the reaction around the launch is that continuous memory will be judged on governance as much as capability.
That's the real enterprise question behind Saboo's demo: not whether an agent can remember, but whether it can remember in ways that stay bounded, inspectable and safe enough to trust in production.

