In this tutorial, we walk through building a Context-Folding LLM Agent that efficiently solves long, complex tasks by intelligently managing limited context. We design the agent to break a large task into smaller subtasks, perform reasoning or calculations when needed, and then fold each completed sub-trajectory into a concise summary. This preserves essential knowledge while keeping the active memory small. Check out the FULL CODES here.
import os, re, sys, math, random, json, textwrap, subprocess, shutil, time
from typing import List, Dict, Tuple

# Install transformers (plus accelerate and sentencepiece) if it is missing.
try:
    import transformers
except ImportError:
    subprocess.run([sys.executable, "-m", "pip", "install", "-q", "transformers", "accelerate", "sentencepiece"], check=True)

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM, pipeline

MODEL_NAME = os.environ.get("CF_MODEL", "google/flan-t5-small")
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSeq2SeqLM.from_pretrained(MODEL_NAME)
llm = pipeline("text2text-generation", model=model, tokenizer=tokenizer, device_map="auto")

def llm_gen(prompt: str, max_new_tokens=160, temperature=0.0) -> str:
    out = llm(prompt, max_new_tokens=max_new_tokens, do_sample=temperature > 0.0, temperature=temperature)[0]["generated_text"]
    return out.strip()
We begin by setting up the environment and loading a lightweight Hugging Face model. We use this model to generate and process text locally, ensuring the agent runs smoothly on Google Colab without any API dependencies. Check out the FULL CODES here.
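Before moving on, we can sanity-check the helper with a one-off generation. This is a minimal sketch reusing the llm_gen function defined above; the exact text returned will vary with the model and library versions.

# Quick smoke test of the local pipeline; output is model-dependent.
print(llm_gen("List two benefits of summarizing old context for an LLM agent."))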
import ast, operator as op

# Whitelist of AST operators allowed in the safe calculator.
OPS = {ast.Add: op.add, ast.Sub: op.sub, ast.Mult: op.mul, ast.Div: op.truediv, ast.Pow: op.pow, ast.USub: op.neg, ast.FloorDiv: op.floordiv, ast.Mod: op.mod}

def _eval_node(n):
    if isinstance(n, ast.Num): return n.n
    if isinstance(n, ast.UnaryOp) and type(n.op) in OPS: return OPS[type(n.op)](_eval_node(n.operand))
    if isinstance(n, ast.BinOp) and type(n.op) in OPS: return OPS[type(n.op)](_eval_node(n.left), _eval_node(n.right))
    raise ValueError("Unsafe expression")

def calc(expr: str):
    node = ast.parse(expr, mode="eval").body
    return _eval_node(node)

class FoldingMemory:
    def __init__(self, max_chars: int = 800):
        self.active = []; self.folds = []; self.max_chars = max_chars
    def add(self, text: str):
        self.active.append(text.strip())
        # Fold the oldest entries once the active buffer exceeds the budget.
        while len(self.active_text()) > self.max_chars and len(self.active) > 1:
            popped = self.active.pop(0)
            fold = f"- Folded: {popped[:120]}..."
            self.folds.append(fold)
    def fold_in(self, summary: str): self.folds.append(summary.strip())
    def active_text(self) -> str: return "\n".join(self.active)
    def folded_text(self) -> str: return "\n".join(self.folds)
    def snapshot(self) -> Dict: return {"active_chars": len(self.active_text()), "n_folds": len(self.folds)}
We define a simple calculator tool for basic arithmetic and create a memory system that dynamically folds old context into concise summaries. This helps us keep the active memory manageable while retaining essential information. Check out the FULL CODES here.
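As a quick illustration, here is a minimal usage sketch of the calc tool and FoldingMemory defined above; with a tight max_chars budget, the second add pushes the first entry out of active memory and into the folds.

# Minimal usage sketch of the calculator tool and folding memory.
mem = FoldingMemory(max_chars=100)
mem.add("First note: this entry will be folded once the buffer overflows.")
mem.add("Second note: adding this pushes the first entry past the limit.")
print(calc("(799.99 + 149.5 + 23.75) * 1.08"))  # safe AST-based arithmetic
print(mem.snapshot())      # e.g. {'active_chars': ..., 'n_folds': 1}
print(mem.folded_text())   # "- Folded: First note: ..."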
SUBTASK_DECOMP_PROMPT = """You are an expert planner. Decompose the task below into 2-4 crisp subtasks.
Return each subtask as a bullet starting with '- ' in priority order.
Task: "{task}" """

SUBTASK_SOLVER_PROMPT = """You are a precise problem solver who uses minimal steps.
If a calculation is required, write one line 'CALC(expr)'.
Otherwise write 'ANSWER: ...'.
Think briefly; avoid chit-chat.
Task: {task}
Subtask: {subtask}
Notes (folded context):
{notes}
Now respond with either CALC(...) or ANSWER: ..."""

SUBTASK_SUMMARY_PROMPT = """Summarize the subtask result in <=3 bullets, total <=50 tokens.
Subtask: {name}
Steps:
{trace}
Final: {final}
Return only bullets starting with '- '."""

FINAL_SYNTH_PROMPT = """You are a senior agent. Synthesize a final, coherent solution using ONLY:
- The original task
- Folded summaries (below)
Avoid repeating steps. Be concise and actionable.
Task: {task}
Folded summaries:
{folds}
Final answer:"""

def parse_bullets(text: str) -> List[str]:
    return [ln[2:].strip() for ln in text.splitlines() if ln.strip().startswith("- ")]
We design prompt templates that guide the agent through decomposing tasks, solving subtasks, and summarizing results. These structured prompts enable clear communication between reasoning steps and the model's responses. Check out the FULL CODES here.
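To see how the bullet parser behaves, here is a minimal sketch using a hypothetical planner response shaped like the output the decomposition prompt asks for.

# Hypothetical planner output, used only to illustrate parse_bullets.
plan = "- Estimate the subtotal\n- Apply 8% tax and 5% buffer\n- Draft a recommendation"
print(parse_bullets(plan))
# ['Estimate the subtotal', 'Apply 8% tax and 5% buffer', 'Draft a recommendation']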
def run_subtask(task: str, subtask: str, memory: FoldingMemory, max_tool_iters: int = 3) -> Tuple[str, str, List[str]]:
    notes = (memory.folded_text() or "(none)")
    trace = []; final = ""
    for _ in range(max_tool_iters):
        prompt = SUBTASK_SOLVER_PROMPT.format(task=task, subtask=subtask, notes=notes)
        out = llm_gen(prompt, max_new_tokens=96); trace.append(out)
        # If the model requested a calculation, run the tool and ask for a final answer.
        m = re.search(r"CALC\((.+?)\)", out)
        if m:
            try:
                val = calc(m.group(1))
                trace.append(f"TOOL:CALC -> {val}")
                out2 = llm_gen(prompt + f"\nTool result: {val}\nNow produce 'ANSWER: ...' only.", max_new_tokens=64)
                trace.append(out2)
                if out2.strip().startswith("ANSWER:"):
                    final = out2.split("ANSWER:", 1)[1].strip(); break
            except Exception as e:
                trace.append(f"TOOL:CALC ERROR -> {e}")
        if out.strip().startswith("ANSWER:"):
            final = out.split("ANSWER:", 1)[1].strip(); break
    if not final:
        final = "No definitive answer; partial reasoning:\n" + "\n".join(trace[-2:])
    # Compress the sub-trajectory into a few bullets that get folded into memory.
    summ = llm_gen(SUBTASK_SUMMARY_PROMPT.format(name=subtask, trace="\n".join(trace), final=final), max_new_tokens=80)
    summary_bullets = "\n".join(parse_bullets(summ)[:3]) or f"- {subtask}: {final[:60]}..."
    return final, summary_bullets, trace
class ContextFoldingAgent:
    def __init__(self, max_active_chars: int = 800):
        self.memory = FoldingMemory(max_chars=max_active_chars)
        self.metrics = {"subtasks": 0, "tool_calls": 0, "chars_saved_est": 0}
    def decompose(self, task: str) -> List[str]:
        plan = llm_gen(SUBTASK_DECOMP_PROMPT.format(task=task), max_new_tokens=96)
        subs = parse_bullets(plan)
        return subs[:4] if subs else ["Main solution"]
    def run(self, task: str) -> Dict:
        t0 = time.time()
        self.memory.add(f"TASK: {task}")
        subtasks = self.decompose(task)
        self.metrics["subtasks"] = len(subtasks)
        folded = []
        for st in subtasks:
            self.memory.add(f"SUBTASK: {st}")
            final, fold_summary, trace = run_subtask(task, st, self.memory)
            # Fold the summary into long-term notes and mark the subtask done.
            self.memory.fold_in(fold_summary)
            folded.append(f"- {st}: {final}")
            self.memory.add(f"SUBTASK_DONE: {st}")
        final = llm_gen(FINAL_SYNTH_PROMPT.format(task=task, folds=self.memory.folded_text()), max_new_tokens=200)
        t1 = time.time()
        return {"task": task, "final": final.strip(), "folded_summaries": self.memory.folded_text(),
                "active_context_chars": len(self.memory.active_text()),
                "subtask_finals": folded, "runtime_sec": round(t1 - t0, 2)}
We implement the agent's core logic, in which each subtask is executed, summarized, and folded back into memory. This step demonstrates how context folding lets the agent reason iteratively without losing track of prior work. Check out the FULL CODES here.
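Before running the full demo, a single subtask round-trip shows the fold in isolation. This is a minimal sketch reusing run_subtask and FoldingMemory from above; the model's exact answer will vary.

# One subtask round-trip: solve, summarize, fold the summary into memory.
mem = FoldingMemory(max_chars=700)
final, bullets, trace = run_subtask(
    task="Compute a small budget.",
    subtask="Add 799.99 and 149.5",
    memory=mem,
)
mem.fold_in(bullets)
print(final)              # the subtask's answer (model-dependent)
print(mem.folded_text())  # the compressed summary carried forward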
DEMO_TASKS = [
    "Plan a 3-day study schedule for ML with daily workouts and simple meals; include time blocks.",
    "Compute a small project budget with 3 items (laptop 799.99, course 149.5, snacks 23.75), add 8% tax and 5% buffer, and present a one-paragraph recommendation."
]

def pretty(d): return json.dumps(d, indent=2, ensure_ascii=False)

if __name__ == "__main__":
    agent = ContextFoldingAgent(max_active_chars=700)
    for i, task in enumerate(DEMO_TASKS, 1):
        print("=" * 70)
        print(f"DEMO #{i}: {task}")
        res = agent.run(task)
        print("\n--- Folded Summaries ---\n" + (res["folded_summaries"] or "(none)"))
        print("\n--- Final Answer ---\n" + res["final"])
        print("\n--- Diagnostics ---")
        diag = {k: res[k] for k in ["active_context_chars", "runtime_sec"]}
        diag["n_subtasks"] = len(agent.decompose(task))
        print(pretty(diag))
We run the agent on sample tasks to observe how it plans, executes, and synthesizes final results. Through these examples, we see the entire context-folding process in action, producing concise and coherent outputs.
In conclusion, we demonstrate how context folding enables long-horizon reasoning while avoiding memory overload. Each subtask is planned, executed, summarized, and distilled into compact knowledge, mimicking how an intelligent agent would handle complex workflows over time. By combining decomposition, tool use, and context compression, we build a lightweight yet powerful agentic system that scales reasoning efficiently.
Check out the FULL CODES here and the Paper. Feel free to check out our GitHub Page for Tutorials, Codes, and Notebooks.