Google Research Adds Agentic RAG to Gemini Enterprise Agent Platform with a Sufficient Context Agent for multi-hop queries

Learn to build a simplified version of Google's agentic RAG framework with Sufficient Context Agent for handling multi-hop queries and improving factuality accuracy.

Introduction

In this tutorial, we'll explore how to implement a simplified version of Google's agentic Retrieval-Augmented Generation (RAG) framework, specifically focusing on the Sufficient Context Agent approach for handling multi-hop queries. This technique enhances factuality accuracy by re-searching until sufficient context is gathered to answer complex questions. While we won't be building the full Gemini Enterprise platform, we'll create a working prototype that demonstrates core concepts of agentic RAG.

Prerequisites

Python 3.8+
Basic understanding of RAG systems
Knowledge of vector databases (we'll use ChromaDB)
Familiarity with OpenAI's API or similar LLM services
Installed packages: chromadb, openai (1.0 or later, for the openai.OpenAI client used below), numpy, python-dotenv

Step-by-Step Instructions

1. Setting Up the Environment

1.1 Create Project Structure

First, let's create our project directory structure:

mkdir agentic_rag_tutorial
cd agentic_rag_tutorial
mkdir data src

1.2 Install Required Packages

We'll install all necessary packages:

pip install chromadb openai numpy python-dotenv

1.3 Create Environment File

Create a .env file in your project root:

OPENAI_API_KEY=your_openai_api_key_here
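
A missing or placeholder key otherwise only surfaces at the first API call. As a minimal sketch, a fail-fast check could look like the following; `require_env` is a hypothetical helper, not part of python-dotenv or the OpenAI client:

```python
import os


def require_env(name, environ=None):
    """Return the value of an environment variable or raise a clear error."""
    environ = os.environ if environ is None else environ
    value = environ.get(name, "").strip()
    # Treat the tutorial's placeholder value as "not configured"
    if not value or value == "your_openai_api_key_here":
        raise RuntimeError(f"{name} is not set; add it to your .env file")
    return value
```

Calling `require_env("OPENAI_API_KEY")` right after `load_dotenv()` would stop the script early with a readable message.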

2. Implementing the Sufficient Context Agent

2.1 Create the Main Agent Class

Let's build our core agent class that will handle the multi-hop search:

import hashlib
import os

import chromadb
import numpy as np
import openai
from dotenv import load_dotenv

# Read OPENAI_API_KEY from the .env file
load_dotenv()

class SufficientContextAgent:
    def __init__(self):
        # Persist the vector store on disk (chromadb 0.4+ API)
        self.client = chromadb.PersistentClient(path="./chroma_db")
        self.openai_client = openai.OpenAI(api_key=os.getenv("OPENAI_API_KEY"))
        self.collection = self.client.get_or_create_collection("documents")

    def add_documents(self, documents):
        """Add documents to the vector store"""
        # Content-derived IDs keep re-runs from storing the same text twice
        unique_docs = list(dict.fromkeys(documents))
        ids = [hashlib.sha256(doc.encode("utf-8")).hexdigest() for doc in unique_docs]
        self.collection.upsert(
            ids=ids,
            documents=unique_docs
        )

    def retrieve_context(self, query, n_results=3):
        """Retrieve relevant documents for a query"""
        results = self.collection.query(
            query_texts=[query],
            n_results=n_results
        )
        return results['documents'][0]

    def generate_response(self, query, context):
        """Generate response using LLM with context"""
        prompt = f"""
        Based on the following context, answer the question:

        Context: {" ".join(context)}

        Question: {query}

        Answer:"""
        
        response = self.openai_client.chat.completions.create(
            model="gpt-3.5-turbo",
            messages=[
                {"role": "system", "content": "You are a helpful assistant."},
                {"role": "user", "content": prompt}
            ],
            max_tokens=200
        )
        
        return response.choices[0].message.content.strip()

2.2 Add Multi-Hop Logic

Now we'll implement the key multi-hop search functionality. Add the following methods to the SufficientContextAgent class:

    def search_with_hops(self, query, max_hops=3):
        """Search through multiple hops to gather sufficient context"""
        current_query = query
        context = []
        hop_count = 0

        while hop_count < max_hops:
            # Retrieve documents for current query
            docs = self.retrieve_context(current_query)
            hop_count += 1

            # Add only unseen documents so repeats don't inflate the context
            for doc in docs:
                if doc not in context:
                    context.append(doc)

            # Judge sufficiency against the original question, and skip the
            # follow-up query when no hops remain
            if hop_count == max_hops or self.is_sufficient_context(context, query):
                break

            # Generate a new query based on retrieved documents
            current_query = self.generate_next_query(current_query, docs)

        # Generate final response
        return self.generate_response(query, context)

    def is_sufficient_context(self, context, query):
        """Determine if current context is sufficient"""
        # Simple heuristic: check if we have enough information
        # In a real implementation, this would be more sophisticated
        context_length = len(" ".join(context))
        return context_length > 500  # Arbitrary threshold

    def generate_next_query(self, current_query, docs):
        """Generate a follow-up query based on retrieved documents"""
        prompt = f"""
        Given the following documents and query, generate a follow-up query that would help gather more information:

        Query: {current_query}

        Documents: {" ".join(docs)}

        Follow-up query:"""
        
        response = self.openai_client.chat.completions.create(
            model="gpt-3.5-turbo",
            messages=[
                {"role": "system", "content": "You are a helpful assistant that generates follow-up queries for research."},
                {"role": "user", "content": prompt}
            ],
            max_tokens=100
        )
        
        return response.choices[0].message.content.strip()
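
The length threshold above says nothing about whether the retrieved text actually answers the question. One alternative, sketched here under assumptions, is to ask a model to judge sufficiency. The `judge` parameter is a hypothetical callable that takes a prompt string and returns the model's text reply; in the agent it could wrap `self.openai_client.chat.completions.create`. The prompt wording and the YES/NO protocol are illustrative and are not taken from Google's implementation.

```python
def is_sufficient_context_llm(query, context, judge):
    """Ask a model whether the context is enough to answer the query.

    `judge` is any callable mapping a prompt string to a reply string.
    """
    if not context:
        return False  # nothing retrieved yet, so nothing to judge

    prompt = (
        "Decide whether the context contains enough information to fully "
        "answer the question. Reply with exactly YES or NO.\n\n"
        "Context:\n" + "\n".join(context) + "\n\n"
        f"Question: {query}\n"
        "Sufficient:"
    )
    reply = judge(prompt).strip().upper()
    # Anything other than a clear YES counts as insufficient, so an
    # ambiguous reply triggers another hop rather than an early answer.
    return reply.startswith("YES")
```

Defaulting to "insufficient" costs an extra retrieval in ambiguous cases, which seems a reasonable trade when the goal is grounded answers.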

3. Testing the Implementation

3.1 Prepare Sample Data

Let's create some sample documents to populate our vector store:

def prepare_sample_data():
    documents = [
        "The Eiffel Tower is a wrought-iron lattice tower on the Champ de Mars in Paris, France.",
        "The Eiffel Tower was built in 1887-1889 by Gustave Eiffel.",
        "Gustave Eiffel was a French civil engineer and architect.",
        "The tower is 330 meters tall and was the world's tallest man-made structure until the Chrysler Building was built in 1930.",
        "The Eiffel Tower is one of the most recognizable structures in the world.",
        "It was initially criticized by some of France's leading artists and intellectuals.",
        "The tower has become a global cultural icon of France and one of the most-visited paid monuments in the world.",
        "The tower was originally intended to be a temporary structure for the 1889 World's Fair.",
        "It was almost dismantled in 1909 but was saved due to its value as a radio transmission tower.",
        "The tower has been a symbol of Paris since its construction."
    ]
    return documents

3.2 Run the Complete Example

Let's create a complete example that demonstrates the agent in action:

def main():
    # Initialize agent
    agent = SufficientContextAgent()
    
    # Add sample documents
    documents = prepare_sample_data()
    agent.add_documents(documents)
    
    # Test multi-hop query
    query = "How tall is the Eiffel Tower and who built it?"
    print(f"Query: {query}")
    
    response = agent.search_with_hops(query)
    print(f"Response: {response}")
    
    # Test another complex query
    query2 = "What is the significance of the Eiffel Tower in French culture?"
    print(f"\nQuery: {query2}")
    
    response2 = agent.search_with_hops(query2)
    print(f"Response: {response2}")

if __name__ == "__main__":
    main()

4. Understanding the Implementation

4.1 Why This Approach Works

The Sufficient Context Agent approach works by:

Iterative Retrieval: Instead of relying on a single search, it performs multiple hops to gather more information
Context Evaluation: It determines when enough information has been gathered
Query Refinement: It generates follow-up queries to expand the search
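
The three steps can be seen in isolation by swapping the vector store and the LLM for plain callables. This is a sketch only: `retrieve`, `is_sufficient`, and `refine` are hypothetical stand-ins, and the dictionary below is a toy substitute for real retrieval.

```python
def multi_hop(query, retrieve, is_sufficient, refine, max_hops=3):
    """Run retrieve -> evaluate -> refine until the context suffices."""
    context, current_query, hops = [], query, 0
    while hops < max_hops:
        hops += 1
        # Iterative retrieval: collect documents not seen on earlier hops
        for doc in retrieve(current_query):
            if doc not in context:
                context.append(doc)
        # Context evaluation: always judged against the original question
        if is_sufficient(query, context):
            break
        # Query refinement: derive the next search from what was found
        current_query = refine(current_query, context)
    return context, hops


# Toy stand-ins so the loop runs without a vector store or an API key
index = {
    "who built the tower": ["The tower was built by Gustave Eiffel."],
    "who was Gustave Eiffel": ["Gustave Eiffel was a French civil engineer."],
}
context, hops = multi_hop(
    "who built the tower",
    retrieve=lambda q: index.get(q, []),
    is_sufficient=lambda q, ctx: len(ctx) >= 2,
    refine=lambda q, ctx: "who was Gustave Eiffel",
)
print(hops, context)
```

Here the first hop finds only the builder's name, the evaluation step rejects the context as too thin, and the refined query retrieves the second fact.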

4.2 Performance Benefits

Google's approach is credited with improving factuality accuracy by up to 34%. This prototype is not benchmarked here, but it follows the same principles:

It avoids jumping to conclusions based on partial information
It ensures sufficient grounding before generating responses
It handles complex, multi-faceted queries more effectively
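
To check whether multi-hop retrieval helps on your own data, a small harness is enough to compare a single-pass answer function with the multi-hop one. The sketch below assumes a hypothetical labeled set of (question, expected substring) pairs and uses case-insensitive substring matching, which is only a crude proxy for factuality:

```python
def substring_accuracy(answer_fn, labeled_examples):
    """Fraction of questions whose answer contains the expected substring."""
    if not labeled_examples:
        return 0.0
    hits = 0
    for question, expected in labeled_examples:
        answer = answer_fn(question)
        # Case-insensitive match; a real evaluation would need a stricter judge
        if expected.lower() in answer.lower():
            hits += 1
    return hits / len(labeled_examples)
```

With the agent from this tutorial, `substring_accuracy(agent.search_with_hops, examples)` could be compared against the same call with `max_hops=1` wrapped in a lambda.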

5. Optimization Considerations

5.1 Adding Memory Management

For production use, consider implementing memory management to prevent context overflow:

    def manage_context_length(self, context, max_length=1000):
        """Trim context to a character budget (a rough proxy for tokens)"""
        trimmed, total = [], 0
        for doc in context:
            # Keep documents in retrieval order until the budget is used up
            if total + len(doc) > max_length:
                break
            trimmed.append(doc)
            total += len(doc) + 1  # +1 for the joining space
        return trimmed
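
If relevance should decide what stays, Chroma's query results can supply it: `collection.query` returns a `distances` list alongside `documents`, where smaller values mean closer matches. A sketch under that assumption follows; `trim_by_distance` is a hypothetical helper, and the character budget remains a rough proxy for tokens:

```python
def trim_by_distance(docs, distances, max_chars=1000):
    """Keep the closest documents that fit within a character budget."""
    # Pair each document with its distance and sort closest-first
    ranked = sorted(zip(distances, docs), key=lambda pair: pair[0])
    kept, used = [], 0
    for _, doc in ranked:
        if used + len(doc) > max_chars:
            continue  # skip documents that would exceed the budget
        kept.append(doc)
        used += len(doc)
    return kept
```

Inside the agent, the inputs would come from `results['documents'][0]` and `results['distances'][0]` of a single query.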

5.2 Adding Caching

Implement caching to avoid redundant queries:

    def __init__(self):
        # ... existing code ...
        self.query_cache = {}

    def cached_retrieve(self, query, n_results=3):
        """Retrieve with caching"""
        # Key on both arguments so a different n_results is not served stale results
        key = (query, n_results)
        if key in self.query_cache:
            return self.query_cache[key]
        
        result = self.retrieve_context(query, n_results)
        self.query_cache[key] = result
        return result
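
A cache like this goes stale once new documents are added, because earlier results no longer reflect the collection. One way to handle that is sketched below as a standalone wrapper. `RetrievalCache` is a hypothetical class, and `retrieve` is any callable with the signature `(query, n_results)`, such as the agent's `retrieve_context`. Collapsing whitespace in the key is an assumption about which queries should count as identical.

```python
class RetrievalCache:
    """Memoize retrieval results and drop them when the index changes."""

    def __init__(self, retrieve):
        self._retrieve = retrieve
        self._cache = {}

    def get(self, query, n_results=3):
        # Collapse whitespace so trivially different queries share an entry
        key = (" ".join(query.split()), n_results)
        if key not in self._cache:
            self._cache[key] = self._retrieve(query, n_results)
        return self._cache[key]

    def invalidate(self):
        """Call after adding documents so stale results are not reused."""
        self._cache.clear()
```

In the agent, `add_documents` would call `invalidate()` after writing to the collection.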

Summary

In this tutorial, we've built a simplified version of Google's agentic RAG framework with a Sufficient Context Agent. We've implemented core functionality including multi-hop querying, context gathering, and response generation. While this is a simplified implementation, it demonstrates the fundamental concepts behind Google's approach that achieved up to 34% improvement in factuality accuracy. The key insights are iterative retrieval, context evaluation, and query refinement - all of which help ensure that complex queries receive well-grounded responses.