
Overview

InterviewGuide uses pgvector (a PostgreSQL extension) for vector storage and similarity search. This enables RAG (Retrieval-Augmented Generation) for the knowledge base feature.

Architecture


Spring AI Configuration

application.yml

Location: src/main/resources/application.yml:52
spring:
  ai:
    openai:
      # Alibaba Cloud DashScope (OpenAI-compatible mode)
      base-url: https://dashscope.aliyuncs.com/compatible-mode
      api-key: ${AI_BAILIAN_API_KEY}
      
      # Embedding model: text-embedding-v3
      embedding:
        options:
          model: text-embedding-v3
    
    # pgvector configuration
    vectorstore:
      pgvector:
        index-type: HNSW                    # Hierarchical Navigable Small World
        distance-type: COSINE_DISTANCE       # Cosine distance (1 - cosine similarity)
        dimensions: 1024                     # text-embedding-v3 dimension
        initialize-schema: true              # Auto-create tables (dev only)
        remove-existing-vector-store-table: false  # Keep existing data
text-embedding-v3 generates 1024-dimensional embeddings. The HNSW index provides fast approximate nearest neighbor search.
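COSINE_DISTANCE ranks matches by the angle between vectors, ignoring their magnitude. As an illustrative sketch of the math behind pgvector's vector_cosine_ops (not the extension's actual code):

```java
// Illustrative sketch of cosine distance as computed by pgvector's
// vector_cosine_ops operator class; not the extension's implementation.
public class CosineDistance {

    public static double cosineSimilarity(double[] a, double[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // pgvector's <=> operator returns 1 - cosine similarity:
    // 0 for identical direction, 1 for orthogonal, 2 for opposite.
    public static double cosineDistance(double[] a, double[] b) {
        return 1.0 - cosineSimilarity(a, b);
    }
}
```

Identical vectors yield distance 0, orthogonal vectors distance 1, which is why smaller `<=>` values mean more relevant chunks.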

Database Schema

Spring AI automatically creates the vector_store table:
CREATE TABLE vector_store (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    content TEXT,                          -- Original text chunk
    metadata JSON,                         -- Metadata (kb_id, filename, etc.)
    embedding vector(1024)                 -- 1024-dim embedding vector
);

-- HNSW index for fast similarity search
CREATE INDEX vector_store_embedding_idx 
    ON vector_store 
    USING hnsw (embedding vector_cosine_ops);
In production, set initialize-schema: false and manage schema migrations manually to avoid accidental data loss.
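For manual schema management, a migration along these lines reproduces what initialize-schema: true creates. The file name and IF NOT EXISTS guards are illustrative; adapt them to your migration tool:

```sql
-- e.g. V1__create_vector_store.sql (illustrative Flyway-style naming)
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE IF NOT EXISTS vector_store (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    content TEXT,
    metadata JSON,
    embedding vector(1024)
);

CREATE INDEX IF NOT EXISTS vector_store_embedding_idx
    ON vector_store
    USING hnsw (embedding vector_cosine_ops);
```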

KnowledgeBaseVectorService

Core service for document vectorization and similarity search. Location: modules/knowledgebase/service/KnowledgeBaseVectorService.java:1

Text Splitting

Large documents are split into smaller chunks for better retrieval:
// modules/knowledgebase/service/KnowledgeBaseVectorService.java:23
@Slf4j   // Lombok logger; supplies the 'log' used by the methods below
@Service
public class KnowledgeBaseVectorService {
    
    private static final int MAX_BATCH_SIZE = 10;  // Alibaba Cloud limit
    private final VectorStore vectorStore;
    private final TextSplitter textSplitter;
    private final VectorRepository vectorRepository;

    public KnowledgeBaseVectorService(VectorStore vectorStore, 
                                      VectorRepository vectorRepository) {
        this.vectorStore = vectorStore;
        this.vectorRepository = vectorRepository;
        
        // TokenTextSplitter: ~500 tokens per chunk, 50 token overlap
        this.textSplitter = new TokenTextSplitter();
    }
}
Why chunk overlap? Overlapping chunks ensure context isn’t lost at chunk boundaries. A 50-token overlap means the last 50 tokens of chunk N appear at the start of chunk N+1.
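The windowing logic can be sketched with a simplified word-based splitter. The real TokenTextSplitter operates on tokenizer tokens rather than raw strings, but the overlap mechanics are the same idea:

```java
import java.util.ArrayList;
import java.util.List;

// Simplified sketch of overlapping chunking (assumes overlap < chunkSize).
// TokenTextSplitter works on tokenizer tokens; the sliding window is the same.
public class OverlapSplitter {

    public static List<List<String>> split(List<String> tokens, int chunkSize, int overlap) {
        List<List<String>> chunks = new ArrayList<>();
        int step = chunkSize - overlap;  // advance by chunk size minus overlap
        for (int start = 0; start < tokens.size(); start += step) {
            int end = Math.min(start + chunkSize, tokens.size());
            chunks.add(new ArrayList<>(tokens.subList(start, end)));
            if (end == tokens.size()) break;  // final chunk reached
        }
        return chunks;
    }
}
```

With chunkSize 4 and overlap 1, the last token of each chunk reappears as the first token of the next, so a sentence straddling a boundary still lands intact in at least one chunk.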

Vectorization and Storage

// modules/knowledgebase/service/KnowledgeBaseVectorService.java:44
@Transactional
public void vectorizeAndStore(Long knowledgeBaseId, String content) {
    log.info("Starting vectorization: kbId={}, contentLength={}", 
        knowledgeBaseId, content.length());
    
    try {
        // 1. Delete old vectors for this knowledge base
        deleteByKnowledgeBaseId(knowledgeBaseId);
        
        // 2. Split text into chunks
        List<Document> chunks = textSplitter.apply(
            List.of(new Document(content))
        );
        
        log.info("Text split into {} chunks", chunks.size());
        
        // 3. Add metadata (kb_id) to each chunk
        chunks.forEach(chunk -> 
            chunk.getMetadata().put("kb_id", knowledgeBaseId.toString())
        );
        
        // 4. Batch process (Alibaba Cloud limit: 10 per batch)
        int totalChunks = chunks.size();
        int batchCount = (totalChunks + MAX_BATCH_SIZE - 1) / MAX_BATCH_SIZE;
        
        log.info("Processing {} batches ({} chunks per batch)", 
            batchCount, MAX_BATCH_SIZE);
        
        for (int i = 0; i < batchCount; i++) {
            int start = i * MAX_BATCH_SIZE;
            int end = Math.min(start + MAX_BATCH_SIZE, totalChunks);
            List<Document> batch = chunks.subList(start, end);
            
            log.debug("Processing batch {}/{}: chunks {}-{}", 
                i + 1, batchCount, start + 1, end);
            
            // Generate embeddings and store
            vectorStore.add(batch);
        }
        
        log.info("Vectorization complete: kbId={}, chunks={}, batches={}",
            knowledgeBaseId, totalChunks, batchCount);
    } catch (Exception e) {
        log.error("Vectorization failed: kbId={}, error={}", 
            knowledgeBaseId, e.getMessage(), e);
        throw new RuntimeException("向量化失败: " + e.getMessage(), e);  // "Vectorization failed"
    }
}

What Happens in vectorStore.add(batch)?

1. Embedding Generation: Spring AI calls Alibaba Cloud’s text-embedding-v3 API to generate an embedding for each chunk.
2. Database Insert: a row is inserted into the vector_store table with the content, metadata, and embedding vector.
3. Index Update: PostgreSQL automatically updates the HNSW index so the new vectors are immediately searchable.

Similarity Search

// modules/knowledgebase/service/KnowledgeBaseVectorService.java:88
public List<Document> similaritySearch(String query, 
                                       List<Long> knowledgeBaseIds, 
                                       int topK, 
                                       double minScore) {
    log.info("Vector similarity search: query={}, kbIds={}, topK={}, minScore={}",
        query, knowledgeBaseIds, topK, minScore);
    
    try {
        SearchRequest.Builder builder = SearchRequest.builder()
            .query(query)
            .topK(Math.max(topK, 1));

        if (minScore > 0) {
            builder.similarityThreshold(minScore);
        }

        // Filter by knowledge base IDs
        if (knowledgeBaseIds != null && !knowledgeBaseIds.isEmpty()) {
            builder.filterExpression(buildKbFilterExpression(knowledgeBaseIds));
        }

        List<Document> results = vectorStore.similaritySearch(builder.build());
        if (results == null) {
            return List.of();
        }
        
        log.info("Search complete: found {} relevant documents", results.size());
        return results;
        
    } catch (Exception e) {
        log.warn("Filter-based search failed, falling back to local filtering: {}", 
            e.getMessage());
        return similaritySearchFallback(query, knowledgeBaseIds, topK, minScore);
    }
}

Filter Expressions

Filters restrict search to specific knowledge bases:
// modules/knowledgebase/service/KnowledgeBaseVectorService.java:167
private String buildKbFilterExpression(List<Long> knowledgeBaseIds) {
    String values = knowledgeBaseIds.stream()
        .filter(Objects::nonNull)
        .map(String::valueOf)
        .map(id -> "'" + id + "'")  // Quote string values
        .collect(Collectors.joining(", "));
    
    return "kb_id in [" + values + "]";
}
Example: kb_id in ['123', '456', '789']
The filter uses the metadata JSON field. Spring AI translates this to a PostgreSQL JSON query:
WHERE metadata->>'kb_id' IN ('123', '456', '789')

Fallback Strategy

If filter-based search fails (e.g., due to database limitations), the service falls back to local filtering:
// modules/knowledgebase/service/KnowledgeBaseVectorService.java:119
private List<Document> similaritySearchFallback(String query, 
                                                List<Long> knowledgeBaseIds, 
                                                int topK, 
                                                double minScore) {
    try {
        // Fetch more results (3x topK) and filter locally
        SearchRequest.Builder builder = SearchRequest.builder()
            .query(query)
            .topK(Math.max(topK * 3, topK));
        
        if (minScore > 0) {
            builder.similarityThreshold(minScore);
        }

        List<Document> allResults = vectorStore.similaritySearch(builder.build());
        if (allResults == null || allResults.isEmpty()) {
            return List.of();
        }

        // Filter by knowledge base IDs in application code
        if (knowledgeBaseIds != null && !knowledgeBaseIds.isEmpty()) {
            allResults = allResults.stream()
                .filter(doc -> isDocInKnowledgeBases(doc, knowledgeBaseIds))
                .collect(Collectors.toList());
        }

        List<Document> results = allResults.stream()
            .limit(topK)
            .collect(Collectors.toList());

        log.info("Fallback search complete: found {} documents", results.size());
        return results;
    } catch (Exception e) {
        log.error("Fallback search failed: {}", e.getMessage(), e);
        throw new RuntimeException("向量搜索失败: " + e.getMessage(), e);  // "Vector search failed"
    }
}

VectorRepository

Handles direct SQL operations for vector data. Location: modules/knowledgebase/repository/VectorRepository.java:1

Deleting Vectors by Knowledge Base

// modules/knowledgebase/repository/VectorRepository.java:16
@Repository
public class VectorRepository {
    
    private final JdbcTemplate jdbcTemplate;
    
    @Transactional(rollbackFor = Exception.class)
    public int deleteByKnowledgeBaseId(Long knowledgeBaseId) {
        log.info("Deleting vector data for kbId={}", knowledgeBaseId);
        
        /* 
         * PostgreSQL JSON query:
         * - metadata->>'kb_id' extracts kb_id as text
         * - Supports both string and numeric kb_id storage
         */
        String sql = """
            DELETE FROM vector_store
            WHERE metadata->>'kb_id' = ?
               OR (metadata->>'kb_id_long' IS NOT NULL 
                   AND (metadata->>'kb_id_long')::bigint = ?)
            """;
        
        try {
            int deletedRows = jdbcTemplate.update(sql, 
                knowledgeBaseId.toString(),  // String match
                knowledgeBaseId);             // Numeric match
            
            if (deletedRows > 0) {
                log.info("Deleted {} vector rows for kbId={}", deletedRows, knowledgeBaseId);
            } else {
                log.info("No vector data found for kbId={}", knowledgeBaseId);
            }
            
            return deletedRows;
            
        } catch (Exception e) {
            log.error("Failed to delete vectors: kbId={}, error={}", 
                knowledgeBaseId, e.getMessage());
            throw new RuntimeException("删除向量数据失败", e);  // "Failed to delete vector data"
        }
    }
}
Metadata Type Handling: The query checks both kb_id (string) and kb_id_long (numeric) to handle different storage formats. This ensures compatibility across schema versions.

Metadata Structure

Example Metadata

{
  "kb_id": "12345",
  "filename": "redis-guide.pdf",
  "upload_time": "2024-03-10T08:30:00Z",
  "chunk_index": 3
}

Accessing Metadata in Code

Document doc = searchResults.get(0);
Map<String, Object> metadata = doc.getMetadata();

String kbId = (String) metadata.get("kb_id");
String filename = (String) metadata.get("filename");

RAG Integration

The vector store integrates with KnowledgeBaseQueryService for RAG:
// From KnowledgeBaseQueryService.java
public String answerQuestion(List<Long> knowledgeBaseIds, String question) {
    // 1. Query rewriting (optional)
    String rewrittenQuery = rewriteQuestion(question);
    
    // 2. Vector similarity search
    List<Document> relevantDocs = vectorService.similaritySearch(
        rewrittenQuery,
        knowledgeBaseIds,
        topK,
        minScore
    );
    
    // 3. Build context from retrieved documents
    String context = relevantDocs.stream()
        .map(Document::getText)
        .collect(Collectors.joining("\n\n---\n\n"));
    
    // 4. Generate AI response with context
    String systemPrompt = buildSystemPrompt();
    String userPrompt = buildUserPrompt(context, question);
    
    String answer = chatClient.prompt()
        .system(systemPrompt)
        .user(userPrompt)
        .call()
        .content();
    
    return answer;
}
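buildSystemPrompt and buildUserPrompt are referenced above but not shown in this doc. A minimal, hypothetical sketch of the user-prompt assembly might look like:

```java
// Hypothetical sketch: the real buildUserPrompt is not shown in this doc.
// One common way to assemble a RAG user prompt from retrieved context.
public class RagPrompts {

    public static String buildUserPrompt(String context, String question) {
        return """
            Answer the question using ONLY the context below.
            If the context does not contain the answer, say so.

            Context:
            %s

            Question: %s
            """.formatted(context, question);
    }
}
```

Grounding the model in the retrieved context this way is what turns similarity search results into a RAG answer.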

Chunk Management

Optimal Chunk Size

Too Small

  • Problem: Lack of context, poor retrieval quality
  • Example: Single sentences or short paragraphs

Too Large

  • Problem: Noisy results, irrelevant content mixed in
  • Example: Entire documents or long sections

Just Right

  • Size: 400-600 tokens (~300-450 words)
  • Overlap: 50-100 tokens for context preservation

Configuration

  • Splitter: TokenTextSplitter (default settings)
  • Embedder: text-embedding-v3 (1024 dims)

Chunk Statistics

Typical document statistics:
Document Size: 50KB
Total Tokens: ~12,500
Chunks: ~25 (500 tokens each)
Overlap: 50 tokens per boundary
Embedding API Calls: 3 batches (10 chunks per batch)
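These figures follow from simple arithmetic, sketched below using the rough 4-characters-per-token heuristic (an approximation; the real tokenizer's counts will differ):

```java
// Back-of-envelope estimator reproducing the statistics above.
// The 4-chars-per-token ratio is a heuristic, not the real tokenizer.
public class ChunkEstimator {

    public static int estimateTokens(int sizeBytes) {
        return sizeBytes / 4;  // ~4 characters per token
    }

    public static int estimateChunks(int tokens, int chunkTokens) {
        return (tokens + chunkTokens - 1) / chunkTokens;   // ceiling division
    }

    public static int estimateBatches(int chunks, int maxBatchSize) {
        return (chunks + maxBatchSize - 1) / maxBatchSize; // ceiling division
    }
}
```

A 50,000-byte document works out to ~12,500 tokens, 25 chunks of 500 tokens, and 3 embedding batches of up to 10 chunks each.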

Performance Optimization

HNSW Index Parameters

The HNSW (Hierarchical Navigable Small World) index balances speed and accuracy:
-- Default HNSW parameters (configured by Spring AI)
CREATE INDEX vector_store_embedding_idx 
    ON vector_store 
    USING hnsw (embedding vector_cosine_ops)
    WITH (m = 16, ef_construction = 64);
  • m: Number of connections per layer (higher = more accurate, slower)
  • ef_construction: Search width during index build (higher = better quality, slower build)
  • Defaults: Good balance for most use cases

Query Optimization

-- Efficient query with filter
SELECT id, content, metadata, embedding
FROM vector_store
WHERE metadata->>'kb_id' IN ('123', '456')
ORDER BY embedding <=> '[0.1, 0.2, ..., 0.9]'  -- Cosine distance
LIMIT 10;

Batch Processing

Always batch embedding API calls to reduce latency. Alibaba Cloud text-embedding-v3 supports up to 10 texts per request.
// Good: Batch of 10
vectorStore.add(chunks.subList(0, 10));

// Bad: One at a time
for (Document chunk : chunks) {
    vectorStore.add(List.of(chunk));  // 10x slower!
}
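A small generic helper makes the batching reusable. This is a hypothetical utility mirroring the loop in vectorizeAndStore, not existing project code:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of the service) that partitions a list into
// batches of at most maxBatchSize, matching the loop in vectorizeAndStore.
public class BatchUtil {

    public static <T> List<List<T>> partition(List<T> items, int maxBatchSize) {
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < items.size(); start += maxBatchSize) {
            int end = Math.min(start + maxBatchSize, items.size());
            batches.add(new ArrayList<>(items.subList(start, end)));
        }
        return batches;
    }
}
```

For example, 25 chunks with a batch size of 10 partition into batches of 10, 10, and 5.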

Troubleshooting

No Search Results

Possible Causes:
  • minScore threshold too high
  • No vectors stored for the specified kb_id
  • Query embedding fails to match any documents
Solutions:
  • Lower minScore (try 0.2-0.3 for initial testing)
  • Check that vectorization completed successfully
  • Verify the kb_id metadata is correct

Slow Similarity Search

Possible Causes:
  • Missing HNSW index
  • Large result set (high topK)
  • Complex filter expressions
Solutions:
  • Verify the index exists: \d vector_store in psql
  • Reduce topK to the minimum needed
  • Simplify filters or use the fallback strategy

Embedding API Errors

Possible Causes:
  • Invalid API key
  • Rate limit exceeded
  • Batch size > 10
Solutions:
  • Check the AI_BAILIAN_API_KEY environment variable
  • Add retry logic with exponential backoff
  • Ensure MAX_BATCH_SIZE = 10

Filter Expression Failures

Possible Causes:
  • Metadata stored as the wrong type (string vs. number)
  • PostgreSQL JSON syntax issues
Solutions:
  • Store kb_id as a string: knowledgeBaseId.toString()
  • Use the fallback strategy for broader compatibility
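The retry suggestion above can be sketched as a small helper. This is a hypothetical utility, not code from the project:

```java
import java.util.function.Supplier;

// Hypothetical retry helper sketching the "exponential backoff" fix
// suggested above for transient embedding-API errors.
public class Retry {

    public static <T> T withBackoff(Supplier<T> call, int maxAttempts, long baseDelayMs) {
        for (int attempt = 1; ; attempt++) {
            try {
                return call.get();
            } catch (RuntimeException e) {
                if (attempt >= maxAttempts) {
                    throw e;  // out of attempts: propagate the last failure
                }
                try {
                    // Delay doubles each attempt: base, 2x base, 4x base, ...
                    Thread.sleep(baseDelayMs * (1L << (attempt - 1)));
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    throw e;
                }
            }
        }
    }
}
```

A call such as Retry.withBackoff(() -> vectorStore-style work, 5, 200) would retry up to five times with delays of 200, 400, 800, and 1600 ms before giving up.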

Best Practices

1. Normalize Metadata

Always store kb_id as a string for consistent filtering:
chunk.getMetadata().put("kb_id", knowledgeBaseId.toString());

2. Delete Before Re-Vectorizing

Always delete old vectors before adding new ones to avoid duplicates:
deleteByKnowledgeBaseId(knowledgeBaseId);
vectorizeAndStore(knowledgeBaseId, newContent);

3. Monitor Chunk Count

Track the number of chunks per document to detect anomalies:
log.info("Document vectorized: kbId={}, chunks={}", kbId, chunks.size());

4. Use Appropriate Thresholds

Adjust minScore based on query length:
  • Short queries (1-4 chars): 0.18
  • Medium queries (5-12 chars): 0.28
  • Long queries (>12 chars): 0.28

5. Handle Missing Results

Always check for empty results and provide a user-friendly fallback:
if (relevantDocs.isEmpty()) {
    return "抱歉,未找到相关信息";  // "Sorry, no relevant information was found"
}

See Also

  • Service Layer: How KnowledgeBaseQueryService orchestrates RAG
  • Redis Streams: Async vectorization with VectorizeStreamConsumer
  • Database Config: PostgreSQL and pgvector setup
  • AI Model Config: Embedding model and API configuration
