
Overview

InterviewGuide uses pgvector (a PostgreSQL extension) for vector storage and similarity search. This enables RAG (Retrieval-Augmented Generation) for the knowledge base feature.

Architecture


Spring AI Configuration

application.yml

Location: src/main/resources/application.yml:52
spring:
  ai:
    openai:
      # Alibaba Cloud DashScope (OpenAI-compatible mode)
      base-url: https://dashscope.aliyuncs.com/compatible-mode
      api-key: ${AI_BAILIAN_API_KEY}
      
      # Embedding model: text-embedding-v3
      embedding:
        options:
          model: text-embedding-v3
    
    # pgvector configuration
    vectorstore:
      pgvector:
        index-type: HNSW                    # Hierarchical Navigable Small World
        distance-type: COSINE_DISTANCE       # Cosine distance (1 - cosine similarity)
        dimensions: 1024                     # text-embedding-v3 dimension
        initialize-schema: true              # Auto-create tables (dev only)
        remove-existing-vector-store-table: false  # Keep existing data
text-embedding-v3 generates 1024-dimensional embeddings. The HNSW index provides fast approximate nearest neighbor search.
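COSINE_DISTANCE ranks matches by the angle between vectors, ignoring their magnitude. As an illustrative sketch of the math behind pgvector's vector_cosine_ops (not the extension's actual code):

```java
// Illustrative sketch of cosine distance as computed by pgvector's
// vector_cosine_ops operator class; not the extension's implementation.
public class CosineDistance {

    public static double cosineSimilarity(double[] a, double[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // pgvector's <=> operator returns 1 - cosine similarity:
    // 0 for identical direction, 1 for orthogonal, 2 for opposite.
    public static double cosineDistance(double[] a, double[] b) {
        return 1.0 - cosineSimilarity(a, b);
    }
}
```

Identical vectors yield distance 0, orthogonal vectors distance 1, which is why smaller `<=>` values mean more relevant chunks.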

Database Schema

Spring AI automatically creates the vector_store table:
CREATE TABLE vector_store (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    content TEXT,                          -- Original text chunk
    metadata JSON,                         -- Metadata (kb_id, filename, etc.)
    embedding vector(1024)                 -- 1024-dim embedding vector
);

-- HNSW index for fast similarity search
CREATE INDEX vector_store_embedding_idx 
    ON vector_store 
    USING hnsw (embedding vector_cosine_ops);
In production, set initialize-schema: false and manage schema migrations manually to avoid accidental data loss.
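For manual schema management, a migration along these lines reproduces what initialize-schema: true creates. The file name and IF NOT EXISTS guards are illustrative; adapt them to your migration tool:

```sql
-- e.g. V1__create_vector_store.sql (illustrative Flyway-style naming)
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE IF NOT EXISTS vector_store (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    content TEXT,
    metadata JSON,
    embedding vector(1024)
);

CREATE INDEX IF NOT EXISTS vector_store_embedding_idx
    ON vector_store
    USING hnsw (embedding vector_cosine_ops);
```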

KnowledgeBaseVectorService

Core service for document vectorization and similarity search. Location: modules/knowledgebase/service/KnowledgeBaseVectorService.java:1

Text Splitting

Large documents are split into smaller chunks for better retrieval:
// modules/knowledgebase/service/KnowledgeBaseVectorService.java:23
@Slf4j   // Lombok logger; supplies the 'log' used by the methods below
@Service
public class KnowledgeBaseVectorService {
    
    private static final int MAX_BATCH_SIZE = 10;  // Alibaba Cloud limit
    private final VectorStore vectorStore;
    private final TextSplitter textSplitter;
    private final VectorRepository vectorRepository;

    public KnowledgeBaseVectorService(VectorStore vectorStore, 
                                      VectorRepository vectorRepository) {
        this.vectorStore = vectorStore;
        this.vectorRepository = vectorRepository;
        
        // TokenTextSplitter: ~500 tokens per chunk, 50 token overlap
        this.textSplitter = new TokenTextSplitter();
    }
}
Why chunk overlap? Overlapping chunks ensure context isn’t lost at chunk boundaries. A 50-token overlap means the last 50 tokens of chunk N appear at the start of chunk N+1.
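The windowing logic can be sketched with a simplified word-based splitter. The real TokenTextSplitter operates on tokenizer tokens rather than raw strings, but the overlap mechanics are the same idea:

```java
import java.util.ArrayList;
import java.util.List;

// Simplified sketch of overlapping chunking (assumes overlap < chunkSize).
// TokenTextSplitter works on tokenizer tokens; the sliding window is the same.
public class OverlapSplitter {

    public static List<List<String>> split(List<String> tokens, int chunkSize, int overlap) {
        List<List<String>> chunks = new ArrayList<>();
        int step = chunkSize - overlap;  // advance by chunk size minus overlap
        for (int start = 0; start < tokens.size(); start += step) {
            int end = Math.min(start + chunkSize, tokens.size());
            chunks.add(new ArrayList<>(tokens.subList(start, end)));
            if (end == tokens.size()) break;  // final chunk reached
        }
        return chunks;
    }
}
```

With chunkSize 4 and overlap 1, the last token of each chunk reappears as the first token of the next, so a sentence straddling a boundary still lands intact in at least one chunk.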

Vectorization and Storage

// modules/knowledgebase/service/KnowledgeBaseVectorService.java:44
@Transactional
public void vectorizeAndStore(Long knowledgeBaseId, String content) {
    log.info("Starting vectorization: kbId={}, contentLength={}", 
        knowledgeBaseId, content.length());
    
    try {
        // 1. Delete old vectors for this knowledge base
        deleteByKnowledgeBaseId(knowledgeBaseId);
        
        // 2. Split text into chunks
        List<Document> chunks = textSplitter.apply(
            List.of(new Document(content))
        );
        
        log.info("Text split into {} chunks", chunks.size());
        
        // 3. Add metadata (kb_id) to each chunk
        chunks.forEach(chunk -> 
            chunk.getMetadata().put("kb_id", knowledgeBaseId.toString())
        );
        
        // 4. Batch process (Alibaba Cloud limit: 10 per batch)
        int totalChunks = chunks.size();
        int batchCount = (totalChunks + MAX_BATCH_SIZE - 1) / MAX_BATCH_SIZE;
        
        log.info("Processing {} batches ({} chunks per batch)", 
            batchCount, MAX_BATCH_SIZE);
        
        for (int i = 0; i < batchCount; i++) {
            int start = i * MAX_BATCH_SIZE;
            int end = Math.min(start + MAX_BATCH_SIZE, totalChunks);
            List<Document> batch = chunks.subList(start, end);
            
            log.debug("Processing batch {}/{}: chunks {}-{}", 
                i + 1, batchCount, start + 1, end);
            
            // Generate embeddings and store
            vectorStore.add(batch);
        }
        
        log.info("Vectorization complete: kbId={}, chunks={}, batches={}",
            knowledgeBaseId, totalChunks, batchCount);
    } catch (Exception e) {
        log.error("Vectorization failed: kbId={}, error={}", 
            knowledgeBaseId, e.getMessage(), e);
        throw new RuntimeException("向量化失败: " + e.getMessage(), e);  // "Vectorization failed"
    }
}

What Happens in vectorStore.add(batch)?

1. Embedding Generation: Spring AI calls Alibaba Cloud’s text-embedding-v3 API to generate an embedding for each chunk.
2. Database Insert: a row is inserted into the vector_store table with the content, metadata, and embedding vector.
3. Index Update: PostgreSQL automatically updates the HNSW index so the new vectors are immediately searchable.

Similarity Search

// modules/knowledgebase/service/KnowledgeBaseVectorService.java:88
public List<Document> similaritySearch(String query, 
                                       List<Long> knowledgeBaseIds, 
                                       int topK, 
                                       double minScore) {
    log.info("Vector similarity search: query={}, kbIds={}, topK={}, minScore={}",
        query, knowledgeBaseIds, topK, minScore);
    
    try {
        SearchRequest.Builder builder = SearchRequest.builder()
            .query(query)
            .topK(Math.max(topK, 1));

        if (minScore > 0) {
            builder.similarityThreshold(minScore);
        }

        // Filter by knowledge base IDs
        if (knowledgeBaseIds != null && !knowledgeBaseIds.isEmpty()) {
            builder.filterExpression(buildKbFilterExpression(knowledgeBaseIds));
        }

        List<Document> results = vectorStore.similaritySearch(builder.build());
        if (results == null) {
            return List.of();
        }
        
        log.info("Search complete: found {} relevant documents", results.size());
        return results;
        
    } catch (Exception e) {
        log.warn("Filter-based search failed, falling back to local filtering: {}", 
            e.getMessage());
        return similaritySearchFallback(query, knowledgeBaseIds, topK, minScore);
    }
}

Filter Expressions

Filters restrict search to specific knowledge bases:
// modules/knowledgebase/service/KnowledgeBaseVectorService.java:167
private String buildKbFilterExpression(List<Long> knowledgeBaseIds) {
    String values = knowledgeBaseIds.stream()
        .filter(Objects::nonNull)
        .map(String::valueOf)
        .map(id -> "'" + id + "'")  // Quote string values
        .collect(Collectors.joining(", "));
    
    return "kb_id in [" + values + "]";
}
Example: kb_id in ['123', '456', '789']
The filter uses the metadata JSON field. Spring AI translates this to a PostgreSQL JSON query:
WHERE metadata->>'kb_id' IN ('123', '456', '789')

Fallback Strategy

If filter-based search fails (e.g., due to database limitations), the service falls back to local filtering:
// modules/knowledgebase/service/KnowledgeBaseVectorService.java:119
private List<Document> similaritySearchFallback(String query, 
                                                List<Long> knowledgeBaseIds, 
                                                int topK, 
                                                double minScore) {
    try {
        // Fetch more results (3x topK) and filter locally
        SearchRequest.Builder builder = SearchRequest.builder()
            .query(query)
            .topK(Math.max(topK * 3, topK));
        
        if (minScore > 0) {
            builder.similarityThreshold(minScore);
        }

        List<Document> allResults = vectorStore.similaritySearch(builder.build());
        if (allResults == null || allResults.isEmpty()) {
            return List.of();
        }

        // Filter by knowledge base IDs in application code
        if (knowledgeBaseIds != null && !knowledgeBaseIds.isEmpty()) {
            allResults = allResults.stream()
                .filter(doc -> isDocInKnowledgeBases(doc, knowledgeBaseIds))
                .collect(Collectors.toList());
        }

        List<Document> results = allResults.stream()
            .limit(topK)
            .collect(Collectors.toList());

        log.info("Fallback search complete: found {} documents", results.size());
        return results;
    } catch (Exception e) {
        log.error("Fallback search failed: {}", e.getMessage(), e);
        throw new RuntimeException("向量搜索失败: " + e.getMessage(), e);  // "Vector search failed"
    }
}

VectorRepository

Handles direct SQL operations for vector data. Location: modules/knowledgebase/repository/VectorRepository.java:1

Deleting Vectors by Knowledge Base

// modules/knowledgebase/repository/VectorRepository.java:16
@Repository
public class VectorRepository {
    
    private final JdbcTemplate jdbcTemplate;
    
    @Transactional(rollbackFor = Exception.class)
    public int deleteByKnowledgeBaseId(Long knowledgeBaseId) {
        log.info("Deleting vector data for kbId={}", knowledgeBaseId);
        
        /* 
         * PostgreSQL JSON query:
         * - metadata->>'kb_id' extracts kb_id as text
         * - Supports both string and numeric kb_id storage
         */
        String sql = """
            DELETE FROM vector_store
            WHERE metadata->>'kb_id' = ?
               OR (metadata->>'kb_id_long' IS NOT NULL 
                   AND (metadata->>'kb_id_long')::bigint = ?)
            """;
        
        try {
            int deletedRows = jdbcTemplate.update(sql, 
                knowledgeBaseId.toString(),  // String match
                knowledgeBaseId);             // Numeric match
            
            if (deletedRows > 0) {
                log.info("Deleted {} vector rows for kbId={}", deletedRows, knowledgeBaseId);
            } else {
                log.info("No vector data found for kbId={}", knowledgeBaseId);
            }
            
            return deletedRows;
            
        } catch (Exception e) {
            log.error("Failed to delete vectors: kbId={}, error={}", 
                knowledgeBaseId, e.getMessage());
            throw new RuntimeException("删除向量数据失败", e);  // "Failed to delete vector data"
        }
    }
}
Metadata Type Handling: The query checks both kb_id (string) and kb_id_long (numeric) to handle different storage formats. This ensures compatibility across schema versions.

Metadata Structure

Example Metadata

{
  "kb_id": "12345",
  "filename": "redis-guide.pdf",
  "upload_time": "2024-03-10T08:30:00Z",
  "chunk_index": 3
}

Accessing Metadata in Code

Document doc = searchResults.get(0);
Map<String, Object> metadata = doc.getMetadata();

String kbId = (String) metadata.get("kb_id");
String filename = (String) metadata.get("filename");

RAG Integration

The vector store integrates with KnowledgeBaseQueryService for RAG:
// From KnowledgeBaseQueryService.java
public String answerQuestion(List<Long> knowledgeBaseIds, String question) {
    // 1. Query rewriting (optional)
    String rewrittenQuery = rewriteQuestion(question);
    
    // 2. Vector similarity search
    List<Document> relevantDocs = vectorService.similaritySearch(
        rewrittenQuery,
        knowledgeBaseIds,
        topK,
        minScore
    );
    
    // 3. Build context from retrieved documents
    String context = relevantDocs.stream()
        .map(Document::getText)
        .collect(Collectors.joining("\n\n---\n\n"));
    
    // 4. Generate AI response with context
    String systemPrompt = buildSystemPrompt();
    String userPrompt = buildUserPrompt(context, question);
    
    String answer = chatClient.prompt()
        .system(systemPrompt)
        .user(userPrompt)
        .call()
        .content();
    
    return answer;
}
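buildSystemPrompt and buildUserPrompt are referenced above but not shown in this doc. A minimal, hypothetical sketch of the user-prompt assembly might look like:

```java
// Hypothetical sketch: the real buildUserPrompt is not shown in this doc.
// One common way to assemble a RAG user prompt from retrieved context.
public class RagPrompts {

    public static String buildUserPrompt(String context, String question) {
        return """
            Answer the question using ONLY the context below.
            If the context does not contain the answer, say so.

            Context:
            %s

            Question: %s
            """.formatted(context, question);
    }
}
```

Grounding the model in the retrieved context this way is what turns similarity search results into a RAG answer.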

Chunk Management

Optimal Chunk Size

Too Small

  • Problem: Lack of context, poor retrieval quality
  • Example: Single sentences or short paragraphs

Too Large

  • Problem: Noisy results, irrelevant content mixed in
  • Example: Entire documents or long sections

Just Right

  • Size: 400-600 tokens (~300-450 words)
  • Overlap: 50-100 tokens for context preservation

Configuration

  • Splitter: TokenTextSplitter (default settings)
  • Embedder: text-embedding-v3 (1024 dims)

Chunk Statistics

Typical document statistics:
Document Size: 50KB
Total Tokens: ~12,500
Chunks: ~25 (500 tokens each)
Overlap: 50 tokens per boundary
Embedding API Calls: 3 batches (10 chunks per batch)
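These figures follow from simple arithmetic, sketched below using the rough 4-characters-per-token heuristic (an approximation; the real tokenizer's counts will differ):

```java
// Back-of-envelope estimator reproducing the statistics above.
// The 4-chars-per-token ratio is a heuristic, not the real tokenizer.
public class ChunkEstimator {

    public static int estimateTokens(int sizeBytes) {
        return sizeBytes / 4;  // ~4 characters per token
    }

    public static int estimateChunks(int tokens, int chunkTokens) {
        return (tokens + chunkTokens - 1) / chunkTokens;   // ceiling division
    }

    public static int estimateBatches(int chunks, int maxBatchSize) {
        return (chunks + maxBatchSize - 1) / maxBatchSize; // ceiling division
    }
}
```

A 50,000-byte document works out to ~12,500 tokens, 25 chunks of 500 tokens, and 3 embedding batches of up to 10 chunks each.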

Performance Optimization

HNSW Index Parameters

The HNSW (Hierarchical Navigable Small World) index balances speed and accuracy:
-- Default HNSW parameters (configured by Spring AI)
CREATE INDEX vector_store_embedding_idx 
    ON vector_store 
    USING hnsw (embedding vector_cosine_ops)
    WITH (m = 16, ef_construction = 64);
  • m: Number of connections per layer (higher = more accurate, slower)
  • ef_construction: Search width during index build (higher = better quality, slower build)
  • Defaults: Good balance for most use cases

Query Optimization

-- Efficient query with filter
SELECT id, content, metadata, embedding
FROM vector_store
WHERE metadata->>'kb_id' IN ('123', '456')
ORDER BY embedding <=> '[0.1, 0.2, ..., 0.9]'  -- Cosine distance
LIMIT 10;

Batch Processing

Always batch embedding API calls to reduce latency. Alibaba Cloud text-embedding-v3 supports up to 10 texts per request.
// Good: Batch of 10
vectorStore.add(chunks.subList(0, 10));

// Bad: One at a time
for (Document chunk : chunks) {
    vectorStore.add(List.of(chunk));  // 10x slower!
}
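A small generic helper makes the batching reusable. This is a hypothetical utility mirroring the loop in vectorizeAndStore, not existing project code:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of the service) that partitions a list into
// batches of at most maxBatchSize, matching the loop in vectorizeAndStore.
public class BatchUtil {

    public static <T> List<List<T>> partition(List<T> items, int maxBatchSize) {
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < items.size(); start += maxBatchSize) {
            int end = Math.min(start + maxBatchSize, items.size());
            batches.add(new ArrayList<>(items.subList(start, end)));
        }
        return batches;
    }
}
```

For example, 25 chunks with a batch size of 10 partition into batches of 10, 10, and 5.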

Troubleshooting

No Search Results

Possible Causes:
  • minScore threshold too high
  • No vectors stored for the specified kb_id
  • Query embedding fails to match any documents
Solutions:
  • Lower minScore (try 0.2-0.3 for initial testing)
  • Check that vectorization completed successfully
  • Verify the kb_id metadata is correct

Slow Similarity Search

Possible Causes:
  • Missing HNSW index
  • Large result set (high topK)
  • Complex filter expressions
Solutions:
  • Verify the index exists: \d vector_store in psql
  • Reduce topK to the minimum needed
  • Simplify filters or use the fallback strategy

Embedding API Errors

Possible Causes:
  • Invalid API key
  • Rate limit exceeded
  • Batch size > 10
Solutions:
  • Check the AI_BAILIAN_API_KEY environment variable
  • Add retry logic with exponential backoff
  • Ensure MAX_BATCH_SIZE = 10

Filter Expression Failures

Possible Causes:
  • Metadata stored as the wrong type (string vs. number)
  • PostgreSQL JSON syntax issues
Solutions:
  • Store kb_id as a string: knowledgeBaseId.toString()
  • Use the fallback strategy for broader compatibility
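The retry suggestion above can be sketched as a small helper. This is a hypothetical utility, not code from the project:

```java
import java.util.function.Supplier;

// Hypothetical retry helper sketching the "exponential backoff" fix
// suggested above for transient embedding-API errors.
public class Retry {

    public static <T> T withBackoff(Supplier<T> call, int maxAttempts, long baseDelayMs) {
        for (int attempt = 1; ; attempt++) {
            try {
                return call.get();
            } catch (RuntimeException e) {
                if (attempt >= maxAttempts) {
                    throw e;  // out of attempts: propagate the last failure
                }
                try {
                    // Delay doubles each attempt: base, 2x base, 4x base, ...
                    Thread.sleep(baseDelayMs * (1L << (attempt - 1)));
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    throw e;
                }
            }
        }
    }
}
```

A call such as Retry.withBackoff(() -> vectorStore-style work, 5, 200) would retry up to five times with delays of 200, 400, 800, and 1600 ms before giving up.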

Best Practices

1. Normalize Metadata

Always store kb_id as a string for consistent filtering:
chunk.getMetadata().put("kb_id", knowledgeBaseId.toString());

2. Delete Before Re-Vectorizing

Always delete old vectors before adding new ones to avoid duplicates:
deleteByKnowledgeBaseId(knowledgeBaseId);
vectorizeAndStore(knowledgeBaseId, newContent);

3. Monitor Chunk Count

Track the number of chunks per document to detect anomalies:
log.info("Document vectorized: kbId={}, chunks={}", kbId, chunks.size());

4. Use Appropriate Thresholds

Adjust minScore based on query length:
  • Short queries (1-4 chars): 0.18
  • Medium queries (5-12 chars): 0.28
  • Long queries (>12 chars): 0.28

5. Handle Missing Results

Always check for empty results and provide a user-friendly fallback:
if (relevantDocs.isEmpty()) {
    return "抱歉,未找到相关信息";  // "Sorry, no relevant information was found"
}

See Also

  • Service Layer: How KnowledgeBaseQueryService orchestrates RAG
  • Redis Streams: Async vectorization with VectorizeStreamConsumer
  • Database Config: PostgreSQL and pgvector setup
  • AI Model Config: Embedding model and API configuration
