
A Coding Guide To Build A Procedural Memory Agent That Learns, Stores, Retrieves, And Reuses Skills As Neural Modules Over Time

In this tutorial, we explore how an intelligent agent can gradually form procedural memory by learning reusable skills directly from its interactions with an environment. We design a minimal yet powerful framework in which skills behave like neural modules: they store action sequences, carry contextual embeddings, and are retrieved by similarity when a new situation resembles a past experience. As we run our agent through several episodes, we watch its behaviour become more efficient, moving from primitive exploration to leveraging a library of skills that it has discovered on its own. Check out the FULL CODES here.

import numpy as np
import matplotlib.pyplot as plt
from collections import defaultdict


class Skill:
    def __init__(self, name, preconditions, action_sequence, embedding, success_count=0):
        self.name = name
        self.preconditions = preconditions
        self.action_sequence = action_sequence
        self.embedding = embedding
        self.success_count = success_count
        self.times_used = 0

    def is_applicable(self, state):
        for key, value in self.preconditions.items():
            if state.get(key) != value:
                return False
        return True

    def __repr__(self):
        return f"Skill({self.name}, used={self.times_used}, success={self.success_count})"


class SkillLibrary:
    def __init__(self, embedding_dim=8):
        self.skills = []
        self.embedding_dim = embedding_dim
        self.skill_stats = defaultdict(lambda: {"attempts": 0, "successes": 0})

    def add_skill(self, skill):
        for existing_skill in self.skills:
            if self._similarity(skill.embedding, existing_skill.embedding) > 0.9:
                existing_skill.success_count += 1
                return existing_skill
        self.skills.append(skill)
        return skill

    def retrieve_skills(self, state, query_embedding=None, top_k=3):
        applicable = [s for s in self.skills if s.is_applicable(state)]
        if query_embedding is not None and applicable:
            similarities = [self._similarity(query_embedding, s.embedding) for s in applicable]
            # Sort on similarity only; comparing Skill objects on ties would raise a TypeError.
            sorted_skills = [s for _, s in sorted(zip(similarities, applicable), key=lambda pair: pair[0], reverse=True)]
            return sorted_skills[:top_k]
        return sorted(applicable, key=lambda s: s.success_count / max(s.times_used, 1), reverse=True)[:top_k]

    def _similarity(self, emb1, emb2):
        return np.dot(emb1, emb2) / (np.linalg.norm(emb1) * np.linalg.norm(emb2) + 1e-8)

    def get_stats(self):
        return {
            "total_skills": len(self.skills),
            "total_uses": sum(s.times_used for s in self.skills),
            "avg_success_rate": np.mean([s.success_count / max(s.times_used, 1) for s in self.skills]) if self.skills else 0
        }

We define how skills are represented and stored in a memory structure. We implement similarity-based retrieval so that the agent can match a new state with past skills using cosine similarity. As we work through this layer, we see how skill reuse becomes possible once skills gain metadata, embeddings, and usage statistics. Check out the FULL CODES here.
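As a quick standalone illustration of the retrieval idea, the sketch below ranks a few toy skill embeddings against a query vector by cosine similarity; the skill names and vectors are made up for the example and are not part of the library above.

```python
import numpy as np

def cosine_similarity(a, b):
    # Small epsilon guards against division by zero for all-zero vectors.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

# Toy skill embeddings and a query (illustrative values only).
skills = {
    "acquire_key": np.array([1.0, 0.0, 0.0]),
    "open_door":   np.array([0.0, 1.0, 0.0]),
    "navigate":    np.array([0.7, 0.7, 0.0]),
}
query = np.array([0.9, 0.1, 0.0])

# Rank skill names by how closely their embedding aligns with the query.
ranked = sorted(skills, key=lambda name: cosine_similarity(query, skills[name]), reverse=True)
print(ranked[0])  # → acquire_key: its embedding points most in the query's direction
```

This is the same cosine formula used by `_similarity`, just applied outside the class so the ranking behaviour is easy to inspect.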

class GridWorld:
    def __init__(self, size=5):
        self.size = size
        self.reset()

    def reset(self):
        self.agent_pos = [0, 0]
        self.goal_pos = [self.size - 1, self.size - 1]
        self.objects = {"key": [2, 2], "door": [3, 3], "box": [1, 3]}
        self.inventory = []
        self.door_open = False
        return self.get_state()

    def get_state(self):
        return {
            "agent_pos": tuple(self.agent_pos),
            "has_key": "key" in self.inventory,
            "door_open": self.door_open,
            "at_goal": self.agent_pos == self.goal_pos,
            "objects": {k: tuple(v) for k, v in self.objects.items()}
        }

    def step(self, action):
        reward = -0.1
        if action == "move_up":
            self.agent_pos[1] = min(self.agent_pos[1] + 1, self.size - 1)
        elif action == "move_down":
            self.agent_pos[1] = max(self.agent_pos[1] - 1, 0)
        elif action == "move_left":
            self.agent_pos[0] = max(self.agent_pos[0] - 1, 0)
        elif action == "move_right":
            self.agent_pos[0] = min(self.agent_pos[0] + 1, self.size - 1)
        elif action == "pickup_key":
            if self.agent_pos == self.objects["key"] and "key" not in self.inventory:
                self.inventory.append("key")
                reward = 1.0
        elif action == "open_door":
            if self.agent_pos == self.objects["door"] and "key" in self.inventory:
                self.door_open = True
                reward = 2.0
        done = self.agent_pos == self.goal_pos and self.door_open
        if done:
            reward = 10.0
        return self.get_state(), reward, done

We build a simple environment in which the agent learns tasks such as picking up a key, opening a door, and reaching a goal. We use this environment as a playground for our procedural memory system, allowing us to watch how primitive actions evolve into more complex, reusable skills. The environment's structure helps us track clear, interpretable improvements in behaviour across episodes. Check out the FULL CODES here.
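To make the reward shaping concrete, here is a stripped-down, standalone version of the same key → door → goal dynamics (the hard-coded positions mirror the 5×5 layout above), used to trace the total reward of a hand-written plan:

```python
def step(state, action):
    # Minimal mirror of GridWorld.step: per-step cost, key/door bonuses, terminal reward.
    pos, has_key, door_open = state
    reward = -0.1
    moves = {"move_right": (1, 0), "move_up": (0, 1)}
    if action in moves:
        dx, dy = moves[action]
        pos = (min(pos[0] + dx, 4), min(pos[1] + dy, 4))
    elif action == "pickup_key" and pos == (2, 2) and not has_key:
        has_key, reward = True, 1.0
    elif action == "open_door" and pos == (3, 3) and has_key:
        door_open, reward = True, 2.0
    done = pos == (4, 4) and door_open
    if done:
        reward = 10.0
    return (pos, has_key, door_open), reward, done

# Hand-written optimal plan: key at (2, 2), door at (3, 3), goal at (4, 4).
plan = ["move_right"] * 2 + ["move_up"] * 2 + ["pickup_key",
        "move_right", "move_up", "open_door", "move_right", "move_up"]
state, total, done = ((0, 0), False, False), 0.0, False
for a in plan:
    state, r, done = step(state, a)
    total += r
print(total, done)  # → 12.3 True: 7 step costs of -0.1, plus 1 + 2 + 10 in bonuses
```

Tracing the numbers by hand like this makes it easy to sanity-check the incentives: the step penalty pushes the agent toward short trajectories, while the key and door bonuses shape it toward the right subgoal order.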

class ProceduralMemoryAgent:
    def __init__(self, env, embedding_dim=8):
        self.env = env
        self.skill_library = SkillLibrary(embedding_dim)
        self.embedding_dim = embedding_dim
        self.episode_history = []
        self.primitive_actions = ["move_up", "move_down", "move_left", "move_right", "pickup_key", "open_door"]

    def create_embedding(self, state, action_seq):
        state_vec = np.zeros(self.embedding_dim)
        state_vec[0] = hash(str(state["agent_pos"])) % 1000 / 1000
        state_vec[1] = 1.0 if state.get("has_key") else 0.0
        state_vec[2] = 1.0 if state.get("door_open") else 0.0
        for i, action in enumerate(action_seq[:self.embedding_dim - 3]):
            state_vec[3 + i] = hash(action) % 1000 / 1000
        return state_vec / (np.linalg.norm(state_vec) + 1e-8)

    def extract_skill(self, trajectory):
        if len(trajectory) < 2:
            return None
        start_state = trajectory[0][0]
        actions = [a for _, a, _ in trajectory]
        preconditions = {"has_key": start_state.get("has_key", False), "door_open": start_state.get("door_open", False)}
        end_state = self.env.get_state()
        if end_state.get("has_key") and not start_state.get("has_key"):
            name = "acquire_key"
        elif end_state.get("door_open") and not start_state.get("door_open"):
            name = "open_door_sequence"
        else:
            name = f"navigate_{len(actions)}_steps"
        embedding = self.create_embedding(start_state, actions)
        return Skill(name, preconditions, actions, embedding, success_count=1)

    def execute_skill(self, skill):
        skill.times_used += 1
        trajectory = []
        total_reward = 0
        for action in skill.action_sequence:
            state = self.env.get_state()
            next_state, reward, done = self.env.step(action)
            trajectory.append((state, action, reward))
            total_reward += reward
            if done:
                skill.success_count += 1
                return trajectory, total_reward, True
        return trajectory, total_reward, False

    def explore(self, max_steps=20):
        trajectory = []
        state = self.env.get_state()
        for _ in range(max_steps):
            action = self._choose_exploration_action(state)
            next_state, reward, done = self.env.step(action)
            trajectory.append((state, action, reward))
            state = next_state
            if done:
                return trajectory, True
        return trajectory, False

We focus on building embeddings that encode the context of a state-action sequence, enabling us to compare skills meaningfully. We also extract skills from successful trajectories, turning raw experience into reusable behaviours. As we run this code, we watch how simple exploration gradually yields structured knowledge that the agent can apply later. Check out the FULL CODES here.
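The embedding recipe can be tried in isolation. This small sketch reproduces its shape outside the class; note that Python's built-in `hash` is salted per process for strings, so the exact component values differ between runs, but the unit-norm property and the feature layout always hold.

```python
import numpy as np

def embed(state, actions, dim=8):
    # Slots 0-2: hashed position plus binary has_key / door_open flags.
    vec = np.zeros(dim)
    vec[0] = hash(str(state["agent_pos"])) % 1000 / 1000
    vec[1] = 1.0 if state.get("has_key") else 0.0
    vec[2] = 1.0 if state.get("door_open") else 0.0
    # Remaining slots: hashed action names, truncated to fit the vector.
    for i, action in enumerate(actions[:dim - 3]):
        vec[3 + i] = hash(action) % 1000 / 1000
    # Normalize so cosine similarity reduces to a dot product.
    return vec / (np.linalg.norm(vec) + 1e-8)

e = embed({"agent_pos": (0, 0), "has_key": True}, ["move_right", "pickup_key"])
print(e.shape, round(float(np.linalg.norm(e)), 3))  # → (8,) 1.0
```

Hashing is a cheap stand-in for a learned encoder here; in a larger system one would swap in a trained embedding model while keeping the same retrieval interface.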

    def _choose_exploration_action(self, state):
        agent_pos = state["agent_pos"]
        if not state.get("has_key"):
            key_pos = state["objects"]["key"]
            if agent_pos == key_pos:
                return "pickup_key"
            if agent_pos[0] < key_pos[0]:
                return "move_right"
            if agent_pos[0] > key_pos[0]:
                return "move_left"
            if agent_pos[1] < key_pos[1]:
                return "move_up"
            return "move_down"
        if state.get("has_key") and not state.get("door_open"):
            door_pos = state["objects"]["door"]
            if agent_pos == door_pos:
                return "open_door"
            if agent_pos[0] < door_pos[0]:
                return "move_right"
            if agent_pos[0] > door_pos[0]:
                return "move_left"
            if agent_pos[1] < door_pos[1]:
                return "move_up"
            return "move_down"
        goal_pos = (4, 4)
        if agent_pos[0] < goal_pos[0]:
            return "move_right"
        if agent_pos[1] < goal_pos[1]:
            return "move_up"
        return np.random.choice(self.primitive_actions)

    def run_episode(self, use_skills=True):
        self.env.reset()
        total_reward = 0
        steps = 0
        trajectory = []
        while steps < 50:
            state = self.env.get_state()
            if use_skills and self.skill_library.skills:
                query_emb = self.create_embedding(state, [])
                skills = self.skill_library.retrieve_skills(state, query_emb, top_k=1)
                if skills:
                    skill_traj, skill_reward, success = self.execute_skill(skills[0])
                    trajectory.extend(skill_traj)
                    total_reward += skill_reward
                    steps += len(skill_traj)
                    if success:
                        return trajectory, total_reward, steps, True
                    continue
            action = self._choose_exploration_action(state)
            next_state, reward, done = self.env.step(action)
            trajectory.append((state, action, reward))
            total_reward += reward
            steps += 1
            if done:
                return trajectory, total_reward, steps, True
        return trajectory, total_reward, steps, False

    def train(self, episodes=10):
        stats = {"rewards": [], "steps": [], "skills_learned": [], "skill_uses": []}
        for ep in range(episodes):
            trajectory, reward, steps, success = self.run_episode(use_skills=True)
            if success and len(trajectory) >= 3:
                segment = trajectory[-min(5, len(trajectory)):]
                skill = self.extract_skill(segment)
                if skill:
                    self.skill_library.add_skill(skill)
            stats["rewards"].append(reward)
            stats["steps"].append(steps)
            stats["skills_learned"].append(len(self.skill_library.skills))
            stats["skill_uses"].append(self.skill_library.get_stats()["total_uses"])
            print(f"Episode {ep+1}: Reward={reward:.1f}, Steps={steps}, Skills={len(self.skill_library.skills)}, Success={success}")
        return stats

We define how the agent chooses between using known skills and exploring with primitive actions. We train the agent across several episodes and record the evolution of learned skills, usage counts, and success rates. As we test this part, we observe that skill reuse reduces episode length and improves overall rewards. Check out the FULL CODES here.
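When no query embedding is supplied, `retrieve_skills` falls back to ranking by empirical success rate. This toy sketch, with made-up counts, shows that ordering and the division-by-zero guard in isolation:

```python
# Illustrative skill statistics (names and counts invented for the example).
skills = [
    {"name": "acquire_key", "successes": 4, "uses": 5},
    {"name": "open_door_sequence", "successes": 1, "uses": 4},
    {"name": "navigate_3_steps", "successes": 0, "uses": 0},  # never executed
]

def success_rate(s):
    # max(uses, 1) avoids division by zero for skills that were stored but never run.
    return s["successes"] / max(s["uses"], 1)

ranked = sorted(skills, key=success_rate, reverse=True)
print([s["name"] for s in ranked])  # → ['acquire_key', 'open_door_sequence', 'navigate_3_steps']
```

This is the same `success_count / max(times_used, 1)` expression used in the library, so skills that reliably finish episodes are preferred over ones that frequently stall.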

def visualize_training(stats):
    fig, axes = plt.subplots(2, 2, figsize=(12, 8))
    axes[0, 0].plot(stats["rewards"])
    axes[0, 0].set_title("Episode Rewards")
    axes[0, 1].plot(stats["steps"])
    axes[0, 1].set_title("Steps per Episode")
    axes[1, 0].plot(stats["skills_learned"])
    axes[1, 0].set_title("Skills in Library")
    axes[1, 1].plot(stats["skill_uses"])
    axes[1, 1].set_title("Cumulative Skill Uses")
    plt.tight_layout()
    plt.savefig("skill_learning_stats.png", dpi=150, bbox_inches="tight")
    plt.show()


if __name__ == "__main__":
    print("=== Procedural Memory Agent Demo ===\n")
    env = GridWorld(size=5)
    agent = ProceduralMemoryAgent(env)
    print("Training agent to learn reusable skills...\n")
    stats = agent.train(episodes=15)
    print("\n=== Learned Skills ===")
    for skill in agent.skill_library.skills:
        print(f"{skill.name}: {len(skill.action_sequence)} actions, used {skill.times_used} times, {skill.success_count} successes")
    lib_stats = agent.skill_library.get_stats()
    print("\n=== Library Statistics ===")
    print(f"Total skills: {lib_stats['total_skills']}")
    print(f"Total skill uses: {lib_stats['total_uses']}")
    print(f"Avg success rate: {lib_stats['avg_success_rate']:.2%}")
    visualize_training(stats)
    print("\n✓ Skill learning complete! Check the visualization above.")

We bring everything together by running training, printing the learned skills, and plotting behaviour statistics. We visualize the trend in rewards and how the skill library grows over time. By running this snippet, we complete the lifecycle of procedural memory formation and confirm that the agent learns to act more intelligently with experience.
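If the per-episode reward curve looks noisy, one simple extension (not part of the code above) is to smooth it with a moving average before plotting; the window size here is illustrative:

```python
import numpy as np

def moving_average(values, window=3):
    # Convolve with a uniform kernel; "valid" mode drops the partial edges.
    kernel = np.ones(window) / window
    return np.convolve(values, kernel, mode="valid")

print(moving_average([0.0, 3.0, 6.0, 9.0], window=3))  # → [3. 6.]
```

Plotting the smoothed series alongside the raw rewards makes the learning trend easier to read without hiding individual episodes.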

In conclusion, we see how procedural memory emerges naturally when an agent learns to extract skills from its own successful trajectories. We observe how skills gain structure, metadata, embeddings, and usage patterns, allowing the agent to reuse them effectively in future situations. Finally, we appreciate how even a small environment and simple heuristics lead to meaningful learning dynamics, giving us a concrete understanding of what it means for an agent to develop reusable internal competencies over time.


Check out the FULL CODES here. Feel free to check out our GitHub Page for Tutorials, Codes, and Notebooks.

Asif Razzaq is the CEO of Marktechpost Media Inc. As a visionary entrepreneur and engineer, Asif is committed to harnessing the potential of Artificial Intelligence for social good. His most recent endeavor is the launch of an Artificial Intelligence media platform, Marktechpost, which stands out for its in-depth coverage of machine learning and deep learning news that is both technically sound and easily understandable to a wide audience. The platform boasts over 2 million monthly views, illustrating its popularity among audiences.
