Introduction
In this tutorial, we'll implement a simplified version of Google's agentic Retrieval-Augmented Generation (RAG) framework, focusing on the Sufficient Context Agent approach to multi-hop queries. The technique aims to improve factual accuracy by searching again until enough context has been gathered to answer a complex question. We won't build the full Gemini Enterprise platform; instead, we'll create a working prototype that demonstrates the core concepts of agentic RAG.
Prerequisites
- Python 3.8+
- Basic understanding of RAG systems
- Knowledge of vector databases (we'll use ChromaDB)
- Familiarity with OpenAI's API or similar LLM services
- Installed packages:
chromadb, openai, numpy, python-dotenv
Step-by-Step Instructions
1. Setting Up the Environment
1.1 Create Project Structure
First, let's create our project directory structure:
mkdir agentic_rag_tutorial
cd agentic_rag_tutorial
mkdir data src
1.2 Install Required Packages
We'll install all necessary packages:
pip install chromadb openai numpy python-dotenv
1.3 Create Environment File
Create a .env file in your project root:
OPENAI_API_KEY=your_openai_api_key_here
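A missing or placeholder key otherwise only surfaces at the first API call. As a minimal sketch, assuming the variable name and placeholder text from the .env file above, a fail-fast check might look like the following. `require_env` is a hypothetical helper written for this tutorial; it is not part of python-dotenv or the OpenAI SDK.

```python
import os


def require_env(name, env=None):
    """Return the value of an environment variable or raise a clear error.

    `require_env` is a hypothetical helper for this tutorial. Pass `env`
    (any mapping) to check something other than the real os.environ.
    """
    env = os.environ if env is None else env
    value = (env.get(name) or "").strip()
    # Treat the tutorial's placeholder text as "not configured".
    if not value or value == "your_openai_api_key_here":
        raise RuntimeError(f"{name} is missing or still set to the placeholder value")
    return value


if __name__ == "__main__":
    # In the tutorial's script, call this after load_dotenv().
    try:
        require_env("OPENAI_API_KEY")
        print("OPENAI_API_KEY is set")
    except RuntimeError as exc:
        print(f"Configuration problem: {exc}")
```

Calling it once at startup, right after load_dotenv(), keeps configuration errors separate from retrieval or generation errors.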
2. Implementing the Sufficient Context Agent
2.1 Create the Main Agent Class
Let's build our core agent class that will handle the multi-hop search:
import os

import chromadb
import numpy as np
import openai
from dotenv import load_dotenv

# Read OPENAI_API_KEY from the .env file created in step 1.3
load_dotenv()


class SufficientContextAgent:
    def __init__(self):
        # PersistentClient (chromadb 0.4+) stores the index on disk and replaces
        # the legacy Settings(chroma_db_impl=...) configuration.
        self.client = chromadb.PersistentClient(path="./chroma_db")
        self.openai_client = openai.OpenAI(api_key=os.getenv("OPENAI_API_KEY"))
        self.collection = self.client.get_or_create_collection("documents")

    def add_documents(self, documents):
        """Add documents to the vector store"""
        # IDs are list positions; upsert overwrites an existing ID, so loading
        # the same list again does not create duplicates.
        ids = [str(i) for i in range(len(documents))]
        self.collection.upsert(
            ids=ids,
            documents=documents
        )

    def retrieve_context(self, query, n_results=3):
        """Retrieve relevant documents for a query"""
        results = self.collection.query(
            query_texts=[query],
            n_results=n_results
        )
        # One query text was sent, so take the first (and only) result list
        return results['documents'][0]

    def generate_response(self, query, context):
        """Generate response using LLM with context"""
        context_text = " ".join(context)
        prompt = (
            "Based on the following context, answer the question:\n"
            f"Context: {context_text}\n"
            f"Question: {query}\n"
            "Answer:"
        )
        response = self.openai_client.chat.completions.create(
            model="gpt-3.5-turbo",
            messages=[
                {"role": "system", "content": "You are a helpful assistant."},
                {"role": "user", "content": prompt}
            ],
            max_tokens=200
        )
        return response.choices[0].message.content.strip()
2.2 Add Multi-Hop Logic
Now we'll implement the key multi-hop search functionality:
    # These methods belong inside the SufficientContextAgent class from section 2.1.

    def search_with_hops(self, query, max_hops=3):
        """Search through multiple hops to gather sufficient context"""
        current_query = query
        context = []
        for _ in range(max_hops):
            # Retrieve documents for the current query
            docs = self.retrieve_context(current_query)
            # Add only documents not already collected in an earlier hop
            for doc in docs:
                if doc not in context:
                    context.append(doc)
            # Judge sufficiency against the original question, not the follow-up
            if self.is_sufficient_context(context, query):
                break
            # Generate a new query based on the retrieved documents
            current_query = self.generate_next_query(current_query, docs)
        # Generate the final response from everything gathered
        return self.generate_response(query, context)

    def is_sufficient_context(self, context, query):
        """Determine if current context is sufficient"""
        # Simple heuristic: treat the context as sufficient once it is long enough.
        # In a real implementation, this would be more sophisticated.
        context_length = len(" ".join(context))
        return context_length > 500  # Arbitrary threshold, in characters

    def generate_next_query(self, current_query, docs):
        """Generate a follow-up query based on retrieved documents"""
        docs_text = " ".join(docs)
        prompt = (
            "Given the following documents and query, generate a follow-up query "
            "that would help gather more information:\n"
            f"Query: {current_query}\n"
            f"Documents: {docs_text}\n"
            "Follow-up query:"
        )
        response = self.openai_client.chat.completions.create(
            model="gpt-3.5-turbo",
            messages=[
                {"role": "system", "content": "You are a helpful assistant that generates follow-up queries for research."},
                {"role": "user", "content": prompt}
            ],
            max_tokens=100
        )
        return response.choices[0].message.content.strip()
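The character-count heuristic above says nothing about whether the gathered text actually answers the question. One possible refinement, sketched here under assumptions, is to ask a model for a YES/NO judgment. The function names and prompt wording are hypothetical, and `ask_llm` is an assumed callable (prompt string in, reply string out) so the logic can be exercised without an API key.

```python
def build_sufficiency_prompt(query, context):
    """Build a yes/no prompt asking whether the context can answer the query."""
    joined = "\n".join(f"- {doc}" for doc in context)
    return (
        "Decide whether the context below contains enough information "
        "to fully answer the question. Reply with exactly YES or NO.\n\n"
        f"Question: {query}\n"
        f"Context:\n{joined}\n"
        "Sufficient:"
    )


def is_sufficient_context_llm(query, context, ask_llm):
    """Return True only when the judge's reply starts with YES.

    `ask_llm` is an assumed callable (prompt -> reply string). An empty
    context short-circuits to False without calling the model.
    """
    if not context:
        return False
    reply = ask_llm(build_sufficiency_prompt(query, context))
    return reply.strip().upper().startswith("YES")


if __name__ == "__main__":
    # Stand-in judge for a dry run: answers YES once the prompt mentions "330".
    def fake_judge(prompt):
        return "YES" if "330" in prompt else "NO"

    question = "How tall is the tower?"
    docs = ["The Eiffel Tower was built in 1887-1889 by Gustave Eiffel."]
    print(is_sufficient_context_llm(question, docs, fake_judge))
    docs.append("The tower is 330 meters tall.")
    print(is_sufficient_context_llm(question, docs, fake_judge))
```

Inside the agent, `ask_llm` could wrap the same chat.completions.create call used in generate_response. Any reply that does not start with YES counts as insufficient, so an unclear verdict leads to another hop rather than an early answer.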
3. Testing the Implementation
3.1 Prepare Sample Data
Let's create some sample documents to populate our vector store:
def prepare_sample_data():
    documents = [
        "The Eiffel Tower is a wrought-iron lattice tower on the Champ de Mars in Paris, France.",
        "The Eiffel Tower was built in 1887-1889 by Gustave Eiffel.",
        "Gustave Eiffel was a French civil engineer and architect.",
        "The tower is 330 meters tall and was the world's tallest man-made structure until the Chrysler Building was built in 1930.",
        "The Eiffel Tower is one of the most recognizable structures in the world.",
        "It was initially criticized by some of France's leading artists and intellectuals.",
        "The tower has become a global cultural icon of France and one of the most-visited paid monuments in the world.",
        "The tower was originally intended to be a temporary structure for the 1889 World's Fair.",
        "It was almost dismantled in 1909 but was saved due to its value as a radio transmission tower.",
        "The tower has been a symbol of Paris since its construction."
    ]
    return documents
3.2 Run the Complete Example
Let's create a complete example that demonstrates the agent in action:
def main():
    # Initialize agent
    agent = SufficientContextAgent()

    # Add sample documents
    documents = prepare_sample_data()
    agent.add_documents(documents)

    # Test multi-hop query
    query = "How tall is the Eiffel Tower and who built it?"
    print(f"Query: {query}")
    response = agent.search_with_hops(query)
    print(f"Response: {response}")

    # Test another complex query
    query2 = "What is the significance of the Eiffel Tower in French culture?"
    print(f"\nQuery: {query2}")
    response2 = agent.search_with_hops(query2)
    print(f"Response: {response2}")


if __name__ == "__main__":
    main()
4. Understanding the Implementation
4.1 Why This Approach Works
The Sufficient Context Agent approach combines three mechanisms:
- Iterative Retrieval: Instead of relying on a single search, it performs multiple hops to gather more information
- Context Evaluation: It determines when enough information has been gathered
- Query Refinement: It generates follow-up queries to expand the search
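The three mechanisms above can be sketched as a single loop with the retriever, the sufficiency check, and the query generator passed in as functions. This is an offline illustration, not the agent itself: `run_hops` and the toy keyword index are hypothetical stand-ins for the vector store and the LLM calls, which makes the control flow visible without any API access.

```python
def run_hops(query, retrieve, is_sufficient, next_query, max_hops=3):
    """Collect context over up to `max_hops` retrieval rounds.

    All three callables are injected so the loop can run offline:
      retrieve(query) -> list of document strings
      is_sufficient(query, context) -> bool
      next_query(query, docs) -> follow-up query string
    Returns (context, hops_used).
    """
    context = []
    current_query = query
    hops_used = 0
    for _ in range(max_hops):
        hops_used += 1
        # Iterative retrieval: search with the current query.
        docs = retrieve(current_query)
        for doc in docs:
            if doc not in context:  # skip documents seen in an earlier hop
                context.append(doc)
        # Context evaluation: always judged against the original question.
        if is_sufficient(query, context):
            break
        # Query refinement: derive the next search from what was found.
        current_query = next_query(current_query, docs)
    return context, hops_used


if __name__ == "__main__":
    # Toy keyword index standing in for the vector store.
    index = {
        "tower": ["The Eiffel Tower was built by Gustave Eiffel."],
        "eiffel": ["Gustave Eiffel was a French civil engineer."],
    }

    def retrieve(q):
        return [doc for key, docs in index.items() if key in q.lower() for doc in docs]

    context, hops = run_hops(
        "Who designed the tower?",
        retrieve,
        is_sufficient=lambda q, ctx: len(ctx) >= 2,
        next_query=lambda q, docs: "Tell me about Gustave Eiffel",
    )
    print(hops, context)
```

Injecting the three functions also makes it straightforward to swap in a different sufficiency check or retriever without touching the loop.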
4.2 Performance Benefits
Google's approach is reported to improve factual accuracy by up to 34%. The reasons given are:
- It avoids jumping to conclusions based on partial information
- It ensures sufficient grounding before generating responses
- It handles complex, multi-faceted queries more effectively
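Whether the extra hops help on a given corpus is an empirical question. As a rough sketch, assuming you have hand-labelled the facts each answer must mention, a crude coverage metric could be computed as follows and compared between a single-retrieval run and a multi-hop run. `contains_all_facts` and `fact_coverage_rate` are hypothetical helpers, and substring matching is only a coarse proxy for factual accuracy.

```python
def contains_all_facts(answer, required_facts):
    """True when every required fact string appears in the answer (case-insensitive)."""
    lowered = answer.lower()
    return all(fact.lower() in lowered for fact in required_facts)


def fact_coverage_rate(answers, required_facts_per_question):
    """Fraction of answers that mention every required fact for their question."""
    if len(answers) != len(required_facts_per_question):
        raise ValueError("answers and required facts must have the same length")
    if not answers:
        return 0.0
    hits = sum(
        contains_all_facts(answer, facts)
        for answer, facts in zip(answers, required_facts_per_question)
    )
    return hits / len(answers)


if __name__ == "__main__":
    # Hand-written placeholder answers for illustration, not real model output.
    answers = [
        "The tower is 330 meters tall and was built by Gustave Eiffel.",
        "It was built by Gustave Eiffel.",
    ]
    required = [["330", "Gustave Eiffel"], ["330", "Gustave Eiffel"]]
    print(fact_coverage_rate(answers, required))
```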
5. Optimization Considerations
5.1 Adding Memory Management
For production use, consider implementing memory management to prevent context overflow:
    def manage_context_length(self, context, max_length=1000):
        """Trim context to at most max_length characters (a rough proxy for tokens)"""
        kept, used = [], 0
        for doc in context:
            # Documents arrive in retrieval order, so earlier ones are kept first
            if used + len(doc) > max_length:
                break
            kept.append(doc)
            used += len(doc)
        return kept
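The method above trims by position, which only approximates relevance. A possible alternative, assuming the distances that Chroma's query returns alongside the documents (results['distances'][0]) are kept in a parallel list, is to rank by distance before applying the budget. `trim_by_distance` is a hypothetical helper, and the budget is still counted in characters rather than tokens.

```python
def trim_by_distance(docs, distances, max_chars=1000):
    """Keep the closest-ranked documents that fit within `max_chars`.

    `docs` and `distances` are parallel lists; a smaller distance means a
    closer match. Documents are taken in ranked order until the next one
    would exceed the budget, so the result is always a prefix of the ranking.
    """
    if len(docs) != len(distances):
        raise ValueError("docs and distances must have the same length")
    ranked = sorted(zip(distances, docs), key=lambda pair: pair[0])
    kept, used = [], 0
    for _, doc in ranked:
        if used + len(doc) > max_chars:
            break
        kept.append(doc)
        used += len(doc)
    return kept


if __name__ == "__main__":
    docs = ["far match " * 5, "closest match", "middle match"]
    distances = [0.9, 0.1, 0.5]
    print(trim_by_distance(docs, distances, max_chars=30))
```

Stopping at the first document that does not fit, instead of skipping it and trying smaller ones, keeps the result predictable: a lower-ranked document never displaces a higher-ranked one.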
5.2 Adding Caching
Implement caching to avoid redundant queries:
    def __init__(self):
        # ... existing code ...
        self.query_cache = {}

    def cached_retrieve(self, query, n_results=3):
        """Retrieve with caching"""
        # Key on both arguments so a different n_results is not served a stale list
        key = (query, n_results)
        if key not in self.query_cache:
            self.query_cache[key] = self.retrieve_context(query, n_results)
        return self.query_cache[key]
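The dictionary cache above grows without bound and treats queries that differ only in case or spacing as distinct. A bounded variant might look like the sketch below. `LRUQueryCache` is a hypothetical class built on the standard library's OrderedDict, and the normalization rule (lowercase, collapsed whitespace) is an assumption that may not suit every corpus.

```python
from collections import OrderedDict


class LRUQueryCache:
    """Small least-recently-used cache for retrieval results (illustrative)."""

    def __init__(self, max_entries=128):
        if max_entries < 1:
            raise ValueError("max_entries must be at least 1")
        self.max_entries = max_entries
        self._entries = OrderedDict()

    @staticmethod
    def _key(query, n_results):
        # Collapse case and whitespace so trivially different queries share an entry.
        return (" ".join(query.lower().split()), n_results)

    def get_or_compute(self, query, n_results, compute):
        """Return the cached result, or call compute(query, n_results) and store it."""
        key = self._key(query, n_results)
        if key in self._entries:
            self._entries.move_to_end(key)  # mark as most recently used
            return self._entries[key]
        value = compute(query, n_results)
        self._entries[key] = value
        if len(self._entries) > self.max_entries:
            self._entries.popitem(last=False)  # evict the least recently used entry
        return value


if __name__ == "__main__":
    def fake_retrieve(query, n_results):
        print(f"retrieving for {query!r}")
        return [f"doc about {query}"][:n_results]

    cache = LRUQueryCache(max_entries=2)
    cache.get_or_compute("Eiffel Tower height", 3, fake_retrieve)
    cache.get_or_compute("  eiffel tower HEIGHT ", 3, fake_retrieve)  # served from cache
```

In the agent, `compute` would be self.retrieve_context. Cached entries go stale when documents are added to the collection, so clearing the cache inside add_documents is worth considering.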
Summary
In this tutorial, we've built a simplified version of Google's agentic RAG framework with a Sufficient Context Agent. We've implemented the core functionality: multi-hop querying, context gathering, and response generation. Although simplified, the implementation demonstrates the fundamental concepts behind Google's approach, which is reported to achieve up to a 34% improvement in factual accuracy. The key ideas are iterative retrieval, context evaluation, and query refinement, all of which help ensure that complex queries receive well-grounded responses.



