Memory Enhanced AI Agents

Adnan Writes
12 min read · Jan 2, 2025


Explore the theory behind AI agents and learn how to build memory-enhanced conversational AI with this detailed tutorial. Understand the importance of short-term and long-term memory in AI agents for maintaining context, personalizing interactions, and improving overall performance. This guide provides step-by-step instructions for enhancing AI agent functionality to enable more coherent, personalized conversations.

Introduction to Agents in AI

At its core, an agent in artificial intelligence is any entity that can perceive its environment, reason about it, and take actions to achieve a particular goal. The term “agent” comes from the field of multi-agent systems and is now widely used across many AI domains, including machine learning, robotics, and generative AI.

In the context of generative AI, agents can be thought of as autonomous systems capable of producing content, generating solutions to problems, and interacting with users or other systems. These agents are often powered by advanced models, like GPT or diffusion-based models, that are capable of creating text, images, audio, or even video content based on given prompts or environments. Unlike traditional AI systems, which may be rule-based or reactive, agents are proactive. They autonomously decide what actions to take in a given context, based on learned knowledge.

AI agents stand out due to their ability to adapt, learn, and improve over time. The generative aspect refers to the agent’s ability to produce novel outputs based on existing information and context, rather than simply performing predefined tasks or classifications. Generative agents are typically used for tasks such as writing, content creation, design, and even autonomous interactions in virtual environments.

Types of AI Agents

AI agents come in different varieties, depending on their level of autonomy, complexity, and capabilities. These can be classified as follows:

a) Reactive Agents

Reactive agents are the simplest form of AI agents. They react to inputs from the environment without any internal representation of the world or long-term goals. They operate based on a set of predefined rules or decision-making logic. While they can be very effective in specific scenarios, such as automated chatbots that follow scripted rules, they lack the ability to learn or adapt beyond their programmed responses.

In generative AI, reactive agents are less common since they don’t have the complexity to create new, diverse outputs. However, they can be useful for scenarios that don’t require much creativity but rather fast responses or rule-based decision-making.
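As a minimal sketch (the rules and function names here are purely illustrative), a reactive agent reduces to a lookup from input patterns to canned responses, with no internal state, memory, or learning:

```python
# A minimal reactive agent: maps input keywords directly to responses.
# There is no memory, no planning, and no learning - just rules.
RULES = {
    "hello": "Hi there! How can I help you?",
    "hours": "We are open 9am-5pm, Monday to Friday.",
    "refund": "Please visit our returns page to start a refund.",
}

def reactive_agent(user_input: str) -> str:
    # Fire the first rule whose keyword appears in the input.
    text = user_input.lower()
    for keyword, response in RULES.items():
        if keyword in text:
            return response
    return "Sorry, I don't understand. Could you rephrase?"
```

The limitation is visible immediately: any input outside the rule set falls through to the fallback, and nothing the user said earlier influences the answer.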

b) Deliberative Agents

Deliberative agents are more sophisticated than reactive agents. They have an internal representation of the environment and use reasoning to make decisions. This means that deliberative agents are capable of planning and considering multiple possibilities before taking action. These agents typically utilize advanced decision-making algorithms, such as reinforcement learning or model-based reasoning systems, to assess possible actions and outcomes.

In the world of generative AI, deliberative agents are often used in tasks that require more complex decision-making, such as generating meaningful narratives, creating designs, or conducting research. These agents generate outputs not just based on prompts but through logical steps and a deeper understanding of the environment they operate in.

c) Hybrid Agents

Hybrid agents combine the capabilities of reactive and deliberative agents, integrating both rule-based systems and reasoning capabilities. These agents may rely on reactive rules in certain situations but also use reasoning or learning algorithms to handle more complex or dynamic tasks. Hybrid agents are capable of working in real-time scenarios but can also adapt to unexpected situations.

In generative AI, hybrid agents are particularly useful in environments where a balance between efficiency and creativity is necessary. For example, an AI system for generating marketing copy might rely on predefined templates (reactive) but can also learn from user feedback to improve the quality of future outputs (deliberative).
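A toy sketch of this hybrid idea, with hypothetical templates and a deliberately simple feedback rule: the agent answers from fixed templates (the reactive part) but re-ranks them using accumulated user feedback (the deliberative part):

```python
# A toy hybrid agent for marketing copy: reactive templates plus a
# deliberative layer that re-ranks templates based on user feedback.
class HybridCopyAgent:
    def __init__(self):
        # Reactive part: fixed, predefined templates.
        self.templates = [
            "Buy {product} today and save big!",
            "Discover {product} - loved by thousands.",
        ]
        # Deliberative part: learned preference scores, one per template.
        self.scores = [0.0] * len(self.templates)
        self.last_choice = 0

    def generate(self, product: str) -> str:
        # Pick the currently best-scoring template.
        best = max(range(len(self.templates)), key=lambda i: self.scores[i])
        self.last_choice = best
        return self.templates[best].format(product=product)

    def feedback(self, reward: float):
        # Deliberative update: nudge the chosen template's score.
        self.scores[self.last_choice] += reward
```

Negative feedback on one template shifts future generations toward the alternatives, which is the "learn from user feedback" behavior described above in its simplest possible form.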

d) Autonomous Agents

Autonomous agents represent the highest level of sophistication in AI agents. These agents have a high degree of independence and can perform complex tasks without much human intervention. They can plan, learn, and even modify their behavior based on experience. Autonomous agents are typically used in real-world applications such as autonomous vehicles, robotics, and AI-driven simulations.

Generative AI agents that are autonomous might include those that generate content based on ongoing interactions, learning from the environment and adjusting their outputs in real-time. These agents could autonomously create marketing campaigns, generate personalized content for users, or even design entire websites based on user preferences.

Architecture of AI Agents

The architecture of AI agents involves the integration of several components, which work together to enable the agent to act intelligently. In generative AI, these architectures are complex, involving deep learning models, reinforcement learning, and even large-scale language models like GPT.

a) Perception Module

The perception module allows the agent to sense and understand its environment. This could be a visual input (such as images or videos), audio (such as speech recognition), or textual data (such as understanding prompts or commands). The agent uses this data to form a representation of the environment or task at hand.

This often involves models suited to the input modality, such as a convolutional neural network (CNN) for images or a transformer for text.

b) Decision-Making Module

Once the agent has perceived the environment, it needs to decide what actions to take. The decision-making module typically involves complex algorithms such as reinforcement learning, planning systems, or decision trees. In generative AI agents, decision-making could also involve generative models that produce content based on learned data.

c) Action Module

The action module is responsible for executing decisions. In generative AI, this could involve generating text, producing an image, or taking some other form of action to create the desired output. The quality and creativity of the action depend heavily on the generative model used by the agent.

d) Learning Module

Learning is a critical aspect of AI agents. It allows them to adapt and improve over time. The learning module is responsible for updating the agent’s knowledge base or model based on feedback from the environment or users. This is often done using techniques such as supervised learning, reinforcement learning, or unsupervised learning.

In generative AI, learning plays a crucial role in improving the quality and creativity of outputs. For instance, agents generating text might improve their writing based on user feedback, while agents generating images might refine their visual styles through repeated exposure to different datasets.

Role of Generative AI in Building Agents

Generative AI plays a pivotal role in empowering AI agents to perform creative tasks, learn from data, and generate meaningful outputs. Unlike traditional AI models that might focus on tasks like classification or regression, generative models create new data. This ability is key for agents that aim to autonomously generate content.

a) Generating Text and Speech

Generative AI models like GPT-3 and GPT-4 are widely used in conversational agents, chatbots, and virtual assistants. These models can generate coherent and contextually relevant text, allowing agents to interact with users in a natural and engaging manner. For instance, in customer service, AI agents can generate responses that mimic human-like interactions, improving the user experience.

b) Image and Video Generation

Generative models in the realm of computer vision, such as GANs (Generative Adversarial Networks), are capable of creating realistic images and even videos. These models are used in design applications, entertainment, and healthcare, among others. AI agents powered by these generative models can autonomously create visual content based on user inputs or specific requirements.

c) Music and Sound Synthesis

Generative AI agents can also produce music and sound effects. By leveraging models such as OpenAI’s Jukebox or Google’s Magenta, agents can compose original music in specific styles or around particular themes. These agents learn from vast amounts of music data and can produce tracks that approach the quality of human-composed pieces.

Architecture of Generative AI Agents

The architecture of generative AI agents is complex yet modular, ensuring flexibility and scalability. Each component plays a critical role in enabling functionality and adaptability.

1. Input Processor:

This component handles raw input data, transforming it into a format suitable for model processing. Techniques like tokenization for text, normalization for images, and feature extraction for audio are employed.
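A minimal sketch of these preprocessing steps (real systems use subword tokenizers such as BPE and per-channel image statistics; this is only illustrative):

```python
# Toy input processor: tokenize text and normalize image-like pixel data.
def tokenize(text: str) -> list:
    # Lowercase and split on whitespace - the simplest tokenization scheme.
    return text.lower().split()

def normalize(pixels: list) -> list:
    # Scale raw 0-255 pixel values into the [0, 1] range models expect.
    return [p / 255.0 for p in pixels]
```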

2. Core Model:

The heart of the generative AI agent, this component determines the agent’s capabilities. Transformers dominate this space for their unparalleled sequence-to-sequence modeling.

Example:

  • GPT excels in language generation.
  • GANs generate high-quality images.

3. Memory Module:

A crucial element for maintaining context over time, the memory module enables agents to make informed decisions based on historical interactions. Advanced architectures like Long Short-Term Memory (LSTM) networks and attention mechanisms are employed.
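As a simple illustration of bounded context (not an LSTM or attention mechanism, just a fixed-size window), a memory module can be sketched as:

```python
from collections import deque

# Toy memory module: a fixed-size window of recent exchanges, mimicking
# the bounded context that sequence models operate over.
class ConversationMemory:
    def __init__(self, window: int = 3):
        # Oldest turns fall off automatically once the window is full.
        self.turns = deque(maxlen=window)

    def add(self, user: str, agent: str):
        self.turns.append((user, agent))

    def context(self) -> str:
        # Serialize the window into a prompt-ready string.
        return "\n".join(f"User: {u}\nAgent: {a}" for u, a in self.turns)
```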

4. Reasoning Engine:

This module integrates external tools and APIs for advanced problem-solving. For instance, a reasoning engine might connect to a knowledge graph to verify facts or perform computations.

5. Output Processor:

Once the core model generates an output, the output processor refines and formats it. This may include converting text into speech or formatting images for web use.

6. Feedback Loop:

The feedback loop ensures continuous improvement by integrating user interactions and performance metrics. Techniques like reinforcement learning are used to fine-tune the agent’s behavior.
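One common, simple pattern for such a loop is an exponentially weighted quality score that drifts toward recent user ratings; a sketch (the weighting factor `alpha` is an arbitrary illustrative choice, not a prescribed value):

```python
# Toy feedback loop: blend the current quality score with the latest
# user rating, so recent feedback gradually steers the agent's metric.
def update_score(current: float, rating: float, alpha: float = 0.5) -> float:
    return (1 - alpha) * current + alpha * rating
```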

Workflow of Generative AI Agents

The typical workflow of a generative AI agent is iterative and dynamic, ensuring adaptability and efficiency.

1. Data Collection and Preprocessing:

Raw data is gathered from diverse sources such as user inputs, databases, or real-time environments. Preprocessing involves cleaning and organizing this data to eliminate inconsistencies.

2. Contextual Understanding:

Natural Language Understanding (NLU) modules analyze the input, extracting semantic meaning and intent. This step often employs pre-trained language models fine-tuned for domain-specific tasks.

3. Generation:

The core model processes the contextualized input to generate an output. For text, this might involve predicting the next word sequence; for images, it involves reconstructing visual features.

4. Evaluation and Optimization:

Generated outputs are evaluated using metrics like BLEU for text or Fréchet Inception Distance (FID) for images. Feedback mechanisms refine the model’s predictions.
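As a toy stand-in for BLEU (the real metric uses modified n-gram precision and a brevity penalty, and is implemented in libraries such as sacrebleu), unigram precision can be sketched as:

```python
# Toy, BLEU-flavored evaluation: the fraction of candidate tokens
# that also appear in the reference.
def unigram_precision(candidate: str, reference: str) -> float:
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not cand:
        return 0.0
    matches = sum(1 for tok in cand if tok in ref)
    return matches / len(cand)
```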

5. User Interaction:

Outputs are presented to users through intuitive interfaces. For example, a chatbot might deliver responses via a conversational UI, while an art generator might provide downloadable images.

6. Continuous Learning:

Data from user interactions is fed back into the system, enhancing the model’s capabilities over time.

Applications of AI Agents

AI agents have a broad range of applications across industries, particularly when paired with generative AI models. Some of the most notable use cases include:

Content Creation

AI agents are increasingly used in content generation across fields such as writing, marketing, and design. For instance, AI writers can create articles, blog posts, and advertisements with minimal human input. AI agents can also generate designs, music, and videos for digital media companies.

Autonomous Systems

Generative AI agents are used in autonomous systems, such as self-driving cars and drones. These agents are able to make decisions based on real-time environmental data and execute complex tasks without human intervention.

Personalized Learning Systems

In education, AI agents can act as personalized tutors, adapting their teaching styles and content based on individual students’ needs. These agents can generate educational content, quizzes, and exercises, making learning more engaging and customized.

Healthcare Assistants

AI agents can assist in healthcare by analyzing medical data and generating diagnostic suggestions. These agents can process large datasets of medical images, reports, and patient histories to generate personalized treatment recommendations.

Let’s dive into a tutorial on how to build a memory-enhanced agent for customer support.

Tutorial for Building a Memory-Enhanced Conversational Agent

This tutorial walks through the creation of a conversational AI agent designed to enhance its capabilities by utilizing both short-term and long-term memory systems. By integrating these memory types, we create a conversational agent that doesn’t just respond to single-session interactions but also recalls past conversations and user preferences over time. This allows the agent to offer more personalized, coherent, and context-aware responses, ensuring a richer user experience.

Traditional chatbots struggle with understanding the context of a conversation once the interaction ends. Without memory, chatbots can only respond based on immediate input, resulting in disjointed interactions and limited engagement. To overcome this, we implement short-term memory, which helps the chatbot understand and maintain context within a single conversation, and long-term memory, which allows the chatbot to retain useful data across multiple interactions.

With both memory types in place, our goal is to build a chatbot that:

  1. Maintains context within a conversation session, remembering previous exchanges.
  2. Remembers user preferences or important details across sessions, offering personalized and intelligent responses.
  3. Provides coherent responses even when sessions are not consecutive, enhancing the experience over time.

Key Components:

  1. Language Model (LM): This is the core of the agent, used for understanding and generating responses based on user input.
  2. Short-term Memory: This stores the ongoing conversation history within a single session.
  3. Long-term Memory: This holds essential information, such as user preferences or important past interactions, across different sessions.
  4. Prompt Template: This defines the structure for how the agent forms its responses by incorporating both types of memory.
  5. Memory Manager: Manages the storage and retrieval of short-term and long-term memories to facilitate seamless interactions.

Method Details:

  1. Environment Setup: Import necessary libraries for AI models, memory management, and prompt handling.
  2. Memory Systems Implementation: Design separate stores for short-term and long-term memories. Use logic to decide what to store in long-term memory and how to update these stores.
  3. Conversation Structure: Develop a prompt template to structure inputs from both memory types and manage user interaction.
  4. Interaction Loop: Design the function to handle chat inputs, retrieve and update memories, and generate responses.

Let’s dive into the practical implementation.

Environment Setup and Imports

from langchain_openai import ChatOpenAI
from langchain_core.runnables.history import RunnableWithMessageHistory
from langchain.memory import ChatMessageHistory
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from dotenv import load_dotenv
import os

Imports: Here, we import all necessary libraries:

  • ChatOpenAI: Interface with OpenAI's language model.
  • RunnableWithMessageHistory: Handles message history for maintaining context.
  • ChatMessageHistory: Used to store and manage the chat history (short-term memory).
  • ChatPromptTemplate and MessagesPlaceholder: For constructing prompts that include both memory types.
  • load_dotenv and os: For loading environment variables securely (like the OpenAI API key).

# Load environment variables
load_dotenv()
os.environ["OPENAI_API_KEY"] = os.getenv('OPENAI_API_KEY')

# Initialize the language model
llm = ChatOpenAI(model="gpt-4o-mini", max_tokens=1000, temperature=0)

  • API Key Setup: We load the OpenAI API key securely from a .env file to authenticate API requests.
  • Language Model Initialization: We initialize the language model (gpt-4o-mini) with a token limit of 1000 and a low temperature (which makes responses more deterministic).

Memory Stores: Short-term and Long-term

# Short-term memory store (conversation history)
chat_store = {}

# Long-term memory store (persistent user data)
long_term_memory = {}

# Function to retrieve chat history for a specific session
def get_chat_history(session_id: str):
    if session_id not in chat_store:
        chat_store[session_id] = ChatMessageHistory()
    return chat_store[session_id]

# Function to update long-term memory
def update_long_term_memory(session_id: str, input: str, output: str):
    if session_id not in long_term_memory:
        long_term_memory[session_id] = []
    if len(input) > 20:  # Store inputs longer than 20 characters
        long_term_memory[session_id].append(f"User said: {input}")
    if len(long_term_memory[session_id]) > 5:  # Keep only the last 5 memories
        long_term_memory[session_id] = long_term_memory[session_id][-5:]

# Function to retrieve long-term memory
def get_long_term_memory(session_id: str):
    return ". ".join(long_term_memory.get(session_id, []))

  • Short-term memory (chat_store) holds the immediate chat history for each session. The get_chat_history() function retrieves or creates a new history store for each session.
  • Long-term memory (long_term_memory) stores persistent information such as user preferences. The update_long_term_memory() function checks if the input is significant (e.g., longer than 20 characters) and updates the memory. We also limit the stored memories to the most recent 5 to prevent overflow.
  • get_long_term_memory() retrieves and formats long-term memories for the session.
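The long-term memory rules can be exercised on their own, without LangChain; here is a compact, self-contained re-implementation of the same logic (with `input` renamed to `user_input` to avoid shadowing the Python builtin), handy for quick testing:

```python
# Standalone check of the long-term memory rules: only inputs longer
# than 20 characters are stored, and only the 5 most recent are kept.
long_term_memory = {}

def update_long_term_memory(session_id: str, user_input: str, output: str):
    memories = long_term_memory.setdefault(session_id, [])
    if len(user_input) > 20:  # store only substantial inputs
        memories.append(f"User said: {user_input}")
    if len(memories) > 5:  # cap at the 5 most recent memories
        long_term_memory[session_id] = memories[-5:]

update_long_term_memory("s1", "hi", "hello")  # too short: ignored
update_long_term_memory("s1", "I prefer vegetarian meals only", "noted")
```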

Prompt Template

prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful AI assistant. Use the information from long-term memory if relevant."),
    ("system", "Long-term memory: {long_term_memory}"),
    MessagesPlaceholder(variable_name="history"),
    ("human", "{input}")
])

  • Prompt Template: We create a structured prompt for the language model that:
    • Defines the role of the assistant (You are a helpful AI assistant).
    • Provides the long-term memory in the prompt.
    • Includes placeholders for short-term memory (chat history) and user input.

Conversational Chain

# Chain combining the language model and memory management
chain = prompt | llm

# Manage message history with RunnableWithMessageHistory
chain_with_history = RunnableWithMessageHistory(
    chain,
    get_chat_history,
    input_messages_key="input",
    history_messages_key="history"
)

  • Conversational Chain: We combine the prompt template and the language model into a chain of processing. The RunnableWithMessageHistory ensures that the message history is properly updated and accessed, allowing the AI to remember and use past interactions.
  • The get_chat_history function is invoked to manage message histories.

def chat(input_text: str, session_id: str):
    long_term_mem = get_long_term_memory(session_id)
    response = chain_with_history.invoke(
        {"input": input_text, "long_term_memory": long_term_mem},
        config={"configurable": {"session_id": session_id}}
    )
    update_long_term_memory(session_id, input_text, response.content)
    return response.content

chat() Function: This function handles user input:

  1. Retrieves relevant long-term memory for the session.
  2. Invokes the conversational chain with the current input and memory.
  3. Updates the long-term memory based on the conversation.
  4. Returns the AI’s response.

session_id = "user_123"

print("AI:", chat("Hello! My name is Alice.", session_id))
print("AI:", chat("What's the weather like today?", session_id))
print("AI:", chat("I love sunny days.", session_id))
print("AI:", chat("Do you remember my name?", session_id))

Here, we simulate multiple user inputs for a session (user_123), testing the agent's ability to maintain context and remember user details across different inputs.

  • The AI remembers the user’s name (“Alice”) and can refer back to previous interactions.
# Print conversation history and long-term memory
print("Conversation History:")
for message in chat_store[session_id].messages:
    print(f"{message.type}: {message.content}")

print("\nLong-term Memory:")
print(get_long_term_memory(session_id))

  • Memory Review: This section prints out the entire conversation history and long-term memory for review, allowing us to see what the AI has learned and remembered.

Conclusion:

By leveraging both short-term and long-term memory, this implementation creates a much more engaging and personalized conversational agent. It remembers key details, ensuring context is maintained both within and across sessions, and provides a clear path for further enhancements, such as memory consolidation, emotional tracking, or external knowledge base integration.

This tutorial sets up a solid foundation for building smarter, more capable AI agents that enhance user experience by retaining and utilizing important context.

Subscribe to my newsletter for the latest updates in AI. Thank you for reading!
