
What is RAG and Grounding?

Retrieval-Augmented Generation (RAG) improves Large Language Models (LLMs) by allowing them to access and process external information sources during generation. Grounding responses in this retrieved, factual data reduces hallucinations.

Ungrounded Generation

Relies on the LLM's training data alone and is prone to hallucinations when the model lacks the right facts.

Grounded Generation

Provides fresh and potentially private data to the model as part of its input or prompt.
RAG is a technique that retrieves relevant facts, often via search, and provides them to the LLM to improve generation quality and reduce hallucinations.
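The retrieve-then-generate loop described above can be sketched in a few lines (a toy illustration: the keyword-overlap `retrieve` and the `llm` callable are hypothetical stand-ins for a real search backend and model call):

```python
def retrieve(query, documents, top_k=2):
    """Rank documents by naive keyword overlap with the query (stand-in for real search)."""
    q_terms = set(query.lower().split())
    scored = [(len(q_terms & set(d.lower().split())), d) for d in documents]
    return [d for score, d in sorted(scored, reverse=True)[:top_k] if score > 0]

def grounded_generate(query, documents, llm):
    """Augment the prompt with retrieved facts before calling the model."""
    facts = retrieve(query, documents)
    prompt = "Answer using only these facts:\n" + "\n".join(facts) + f"\n\nQuestion: {query}"
    return llm(prompt)
```

The key point is that the model only ever sees the query plus the retrieved facts, so its answer is constrained by data the application controls.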

Why Use RAG?

1. Access Up-to-Date Information

LLMs are trained on static datasets, so their knowledge can become outdated. RAG allows them to access real-time or frequently updated information.
2. Improved Accuracy

RAG reduces the risk of LLM “hallucinations” (generating false or misleading information) by grounding responses in verified external data.
3. Enhanced Context

By combining additional knowledge sources with existing LLM knowledge, RAG provides better context to enhance response quality.
4. Private Data Access

Enable LLMs to understand and use your organization’s private data that wasn’t part of their training.

RAG Architecture

A typical RAG system consists of several key components:

1. Data Ingestion

Intake data from different sources:
  • Local files
  • Google Cloud Storage
  • Google Drive
  • BigQuery
  • Websites and structured data
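For local files, ingestion reduces to reading each source into a document record with its metadata (a minimal stdlib sketch; managed options such as RAG Engine handle ingestion from these sources for you, and `ingest_local` is a hypothetical helper):

```python
from pathlib import Path

def ingest_local(directory, pattern="*.txt"):
    """Read matching files into simple document records with source metadata."""
    docs = []
    for path in sorted(Path(directory).glob(pattern)):
        docs.append({"source": str(path), "text": path.read_text(encoding="utf-8")})
    return docs
```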

2. Data Transformation

Conversion and preparation of data for indexing:
  • Document parsing and extraction
  • Text chunking and splitting
  • Metadata extraction
  • Format normalization
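Chunking is the step that most affects retrieval quality. A fixed-size character window with overlap is the simplest strategy (one option among many; sentence- or structure-aware splitters are often preferable, and `chunk_text` below is an illustrative sketch):

```python
def chunk_text(text, chunk_size=200, overlap=50):
    """Split text into overlapping character windows so context spans chunk boundaries."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap
    return chunks
```

The overlap means a fact straddling a chunk boundary still appears whole in at least one chunk.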

3. Embedding

Numerical representations of text that capture semantic meaning and context. Similar or related text tends to have similar embeddings in high-dimensional vector space.
from vertexai.language_models import TextEmbeddingModel

# Initialize embedding model
model = TextEmbeddingModel.from_pretrained("text-embedding-005")

# Generate embeddings
embeddings = model.get_embeddings(["Sample text to embed"])
for embedding in embeddings:
    vector = embedding.values
    print(f"Embedding dimension: {len(vector)}")
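"Similar embeddings" is typically quantified with cosine similarity between vectors. A pure-Python sketch (the small hand-made vectors in the usage below stand in for real model output):

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 = same direction, 0.0 = orthogonal."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)
```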

4. Data Indexing

Structure the knowledge base for optimized searching:
  • Vector databases (Vertex AI Vector Search, Feature Store)
  • Enterprise search indexes (Vertex AI Search)
  • Database storage (AlloyDB, BigQuery)

5. Retrieval

When a user asks a question, the retrieval component searches through the knowledge base to find relevant information:
  • Semantic search using vector similarity
  • Keyword-based search
  • Hybrid search combining both approaches
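At small scale, semantic retrieval is just a nearest-neighbor scan over stored embeddings. A brute-force sketch (toy 2-D vectors stand in for real embeddings; production systems delegate this to an index such as Vertex AI Vector Search):

```python
import math

def top_k(query_vec, index, k=2):
    """Return the k documents whose embeddings are most similar to the query (cosine)."""
    def cos(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))
    ranked = sorted(index, key=lambda item: cos(query_vec, item[0]), reverse=True)
    return [doc for vec, doc in ranked[:k]]
```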

6. Generation

The retrieved information becomes context added to the original user query to guide the LLM in generating factually grounded responses.
from google import genai
from google.genai.types import Tool, Retrieval, VertexRagStore, VertexRagStoreRagResource

# Client location should match the region of the RAG corpus
client = genai.Client(vertexai=True, project=PROJECT_ID, location="us-east1")

# Define RAG tool pointing at an existing RAG corpus
rag_tool = Tool(
    retrieval=Retrieval(
        vertex_rag_store=VertexRagStore(
            rag_resources=[
                VertexRagStoreRagResource(
                    rag_corpus=f"projects/{PROJECT_ID}/locations/us-east1/ragCorpora/{corpus_name}"
                )
            ]
        )
    )
)

# Generate with RAG context
response = client.models.generate_content(
    model="gemini-2.0-flash-001",
    contents="What are the key features of our product?",
    config={"tools": [rag_tool]}
)

RAG Solutions on Google Cloud

Google Cloud offers multiple approaches for implementing RAG:

Vertex AI Search

Out-of-the-box enterprise search with Google-quality results for your data

RAG Engine

Managed data framework for building context-augmented LLM applications with flexible backends

Custom RAG

Build your own RAG pipeline using Vertex AI components and vector databases

Grounding API

Ground Gemini responses in Google Search or Vertex AI Search with a simple API

Common Use Cases

Enterprise Knowledge Search

Enable employees to search across company documents, wikis, and data sources with natural language queries.

Customer Support

Provide customer service agents or chatbots with instant access to product documentation and support knowledge bases.

Document Q&A

Answer questions about contracts, reports, research papers, and other documents by extracting relevant information.

Code Assistance

Help developers find relevant code snippets, API documentation, and implementation examples.

For best results with RAG, focus on high-quality data ingestion, appropriate chunking strategies, and comprehensive evaluation of retrieval accuracy.

Next Steps

RAG Engine

Learn about managed RAG orchestration with Vertex AI

Vertex AI Search

Explore enterprise search capabilities and datastores

Grounding Techniques

Understand chunking, retrieval, and grounding strategies

Evaluation

Learn how to evaluate RAG system performance
