Introduction
In this tutorial, we'll build a reinforcement learning-powered agent that learns to retrieve relevant long-term memories for accurate question answering with large language models (LLMs). This approach enhances LLM performance by enabling the model to access specific contextual information from a memory bank, rather than relying solely on its pre-trained knowledge.
The agent will learn to select the most relevant memories from a long-term memory bank by receiving rewards for successful retrieval. We'll use OpenAI embeddings to convert memories and queries into vector representations, enabling similarity-based retrieval.
Prerequisites
- Python 3.7+
- OpenAI API key
- Required Python packages:
openai, numpy, scikit-learn, faiss-cpu, pandas
Step-by-Step Instructions
1. Set up the environment and install dependencies
First, create a new Python environment and install the required packages:
pip install openai numpy scikit-learn faiss-cpu pandas
This setup ensures we have all necessary libraries for embeddings, vector similarity, and memory management.
2. Create a synthetic memory dataset
We'll generate a synthetic dataset of memories that our agent will learn to retrieve from:
import pandas as pd
import numpy as np
# Create synthetic memories
memories_data = [
{'id': 1, 'content': 'John Smith works at Google as a software engineer'},
{'id': 2, 'content': 'Sarah Johnson graduated from MIT with a PhD in Computer Science'},
{'id': 3, 'content': 'The company acquired a new office in San Francisco'},
{'id': 4, 'content': 'Dr. Michael Chen published a paper on reinforcement learning in 2023'},
{'id': 5, 'content': 'The marketing campaign launched on March 15th'},
]
memories_df = pd.DataFrame(memories_data)
print(memories_df)
This creates a dataset of contextual information that our agent will learn to retrieve based on queries.
3. Initialize OpenAI embeddings
Next, we'll set up the OpenAI client to generate embeddings for our memories and queries:
from openai import OpenAI
# Create the client (or set the OPENAI_API_KEY environment variable instead)
client = OpenAI(api_key='your-api-key-here')
# Function to generate embeddings
def get_embedding(text):
    response = client.embeddings.create(
        input=text,
        model='text-embedding-ada-002'
    )
    return response.data[0].embedding
The embeddings transform text into high-dimensional vectors that capture semantic meaning, enabling similarity comparisons.
4. Generate embeddings for all memories
We'll convert all our memories into vector representations:
# Generate embeddings for all memories (one API call per row)
memories_df['embedding'] = memories_df['content'].apply(get_embedding)
print(memories_df[['id', 'content']])  # embeddings are 1536-dimensional, too long to print
These embeddings will be used for similarity matching when retrieving memories for queries.
5. Create a memory bank with FAISS for efficient retrieval
We'll use FAISS (Facebook AI Similarity Search) to create an efficient memory bank:
import faiss
# Convert embeddings to numpy array
embeddings = np.array(memories_df['embedding'].tolist()).astype('float32')
# Create FAISS index
# Inner-product index; ada-002 embeddings are unit-normalized, so this is cosine similarity
index = faiss.IndexFlatIP(1536)  # 1536 is the dimension of the embedding
index.add(embeddings)
# Save index and memories for later use
faiss.write_index(index, 'memory_bank.index')
memories_df.to_pickle('memories.pkl')
FAISS provides fast similarity search capabilities, crucial for real-time memory retrieval in agent systems.
6. Create a query processing function
We'll implement a function that processes queries and retrieves relevant memories:
def retrieve_relevant_memories(query, top_k=3):
    # Generate an embedding for the query
    query_embedding = get_embedding(query)
    query_embedding = np.array([query_embedding]).astype('float32')
    # Search for the most similar memories (IndexFlatIP returns similarity scores)
    scores, indices = index.search(query_embedding, top_k)
    # Return the matching memory rows and their scores
    relevant_memories = memories_df.iloc[indices[0]]
    return relevant_memories, scores[0]
This function demonstrates how embeddings enable semantic similarity search.
7. Implement a reward system for reinforcement learning
We'll create a reward function that evaluates how well the agent retrieves relevant memories:
def calculate_reward(query, retrieved_memories, ground_truth):
    # Binary reward: 1 if the ground truth appears in any retrieved memory, 0 otherwise
    for _, memory in retrieved_memories.iterrows():
        if ground_truth.lower() in memory['content'].lower():
            return 1
    return 0
The reward system is crucial for training the agent to learn which memories are most relevant to specific queries.
8. Test the agent with sample queries
Let's test our memory retrieval system:
# Test queries
queries = [
'Where does John Smith work?',
'What degree did Sarah Johnson earn?',
'When did the marketing campaign start?'
]
for query in queries:
    print(f'Query: {query}')
    relevant_memories, scores = retrieve_relevant_memories(query)
    print('Retrieved memories:')
    for (_, memory), score in zip(relevant_memories.iterrows(), scores):
        print(f' - {memory["content"]} (similarity: {score:.3f})')
    print()
This demonstrates the agent's ability to retrieve contextually relevant information from memory.
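Retrieved memories are only useful for question answering if they actually reach the LLM. A common pattern is to paste them into the prompt as context before the question; the sketch below is illustrative (the `build_qa_prompt` helper and its prompt wording are not from the original):

```python
def build_qa_prompt(query, memory_contents):
    # Concatenate retrieved memories as a bulleted context block, then pose the question
    context = '\n'.join(f'- {m}' for m in memory_contents)
    return (
        'Answer the question using only the context below.\n\n'
        f'Context:\n{context}\n\n'
        f'Question: {query}\nAnswer:'
    )

memories = ['John Smith works at Google as a software engineer']
prompt = build_qa_prompt('Where does John Smith work?', memories)
print(prompt)
```

The resulting string can then be sent to a chat or completion endpoint of your choice; the retrieval step stays the same regardless of which model answers.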
9. Train the reinforcement learning agent (conceptual)
While full RL training is complex, we can conceptualize how the agent would learn:
# Conceptual RL training loop
# In practice, this would involve more complex architecture
# Pseudocode for RL agent training
# 1. For each query, retrieve top-k memories
# 2. Calculate reward based on relevance
# 3. Update agent policy using reinforcement learning algorithm
# 4. Repeat for many query-memory pairs
The agent learns to select the most relevant memories by receiving rewards for successful retrieval, improving over time.
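One concrete way to instantiate the loop above is a simple bandit-style policy: keep a learnable bonus weight per memory, rank memories by similarity plus bonus, and nudge the weights of retrieved memories up or down according to the reward. The toy sketch below uses synthetic similarity scores and a per-memory relevance reward standing in for `calculate_reward`; it is a minimal illustration, not a full RL implementation:

```python
import numpy as np

rng = np.random.default_rng(42)
n_memories, top_k, lr = 5, 2, 0.1
relevant_id = 3  # ground truth for this fixed toy query

# Synthetic base similarities; the relevant memory starts slightly mis-ranked
base_similarity = np.array([0.50, 0.48, 0.47, 0.46, 0.45])
weights = np.zeros(n_memories)  # learnable per-memory bonus

for step in range(200):
    scores = base_similarity + weights
    if rng.random() < 0.1:
        # Epsilon-greedy exploration: occasionally try random memories
        chosen = rng.choice(n_memories, size=top_k, replace=False)
    else:
        # Exploit: take the top-k memories by current score
        chosen = np.argsort(scores)[-top_k:]
    for i in chosen:
        reward = 1.0 if i == relevant_id else 0.0
        # Reinforce: push chosen memories up on reward, down otherwise
        weights[i] += lr * (reward - 0.5)

print(int(np.argmax(base_similarity + weights)))  # the relevant memory now ranks first
```

With enough trials the relevant memory's bonus dominates its initial similarity deficit, which is the essence of the conceptual loop: retrieve, score with the reward, update the policy, repeat.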
Summary
In this tutorial, we've built a reinforcement learning-powered agent that learns to retrieve relevant long-term memories for accurate question answering. We created a synthetic memory dataset, generated embeddings using OpenAI, and implemented FAISS-based retrieval. The agent's learning capability comes from a reward system that evaluates the relevance of retrieved memories.
This approach enhances LLM performance by providing contextual information that may not be in the model's pre-trained knowledge, making it particularly useful for applications requiring domain-specific or up-to-date information. The foundation we've built can be extended with more sophisticated RL algorithms for continuous learning and improvement.