Vector Search with FAISS - RAG Recruitment Assistant

The Problem with Keyword Search

Traditional resume screening relies on exact keyword matching:

# Old-school approach
if "Python" in cv_text:
    score += 1
if "React" in cv_text:
    score += 1
if "5 years" in cv_text:
    score += 1

Problems:

❌ Misses synonyms (“JavaScript” vs “JS”)
❌ Ignores context (“Learning Python” vs “Expert in Python”)
❌ Can’t handle semantic queries like “backend developers with API experience”

Vector Search: A Semantic Revolution

Vector search transforms text into high-dimensional numerical representations (embeddings) that capture semantic meaning. Documents with similar meanings have vectors that are close together in vector space.

Analogy: Imagine plotting words in 3D space where “dog” and “puppy” are close together, but “dog” and “car” are far apart. Vector search works in 384 dimensions instead of 3.

How It Works

The Technology Stack

HuggingFace Embeddings

The system uses sentence-transformers from HuggingFace to create embeddings:

# Source: notebook/Talent_Scout_3000x.ipynb
from langchain_huggingface import HuggingFaceEmbeddings

# Initialize embedding model
embeddings = HuggingFaceEmbeddings()

# This loads the default model: 'sentence-transformers/all-mpnet-base-v2'
# - 384 dimensions
# - Trained on 1 billion+ sentence pairs
# - Optimized for semantic similarity

Model Size

90.9M parameters

Vector Dimensions

384-dimensional embeddings

Multilingual

Supports English and Spanish

Local Execution

Runs on CPU without external API calls

FAISS (Facebook AI Similarity Search)

FAISS is a library for efficient similarity search in large-scale vector databases:

from langchain_community.vectorstores import FAISS

# Create vector store from documents
vectorstore = FAISS.from_documents(docs, embeddings)

# Convert to retriever for RAG pipeline
retriever = vectorstore.as_retriever()

Why FAISS?

⚡ Speed: Searches millions of vectors in milliseconds
📊 Scalability: Handles datasets that don’t fit in RAM
🎯 Accuracy: Multiple index types optimized for precision/speed tradeoffs
💰 Cost: Open-source and runs locally (no API costs)

From Text to Vectors: The Embedding Process

Step 1: Document Loading

from langchain_community.document_loaders import PyPDFLoader

# Load a student's CV
loader = PyPDFLoader("CV_Estudiante_4_Fernanda_Paredes.pdf")
docs = loader.load()

print(docs[0].page_content[:200])

Output:

FERNANDA PAREDES
Data Analyst Trainee
[email protected] | +51 912 345 678 | Lima, Perú

PERFIL DE ESTUDIANTE
Estudiante de 9no ciclo con interés en Desarrollo de Software y Datos.
Manejo de herramientas como Python...

Step 2: Text Chunking

LangChain automatically splits documents into manageable chunks:

# Automatic chunking by PyPDFLoader
# Each PDF page becomes a Document object
for i, doc in enumerate(docs):
    print(f"Page {i}: {len(doc.page_content)} characters")

Step 3: Vectorization

HuggingFace converts each chunk into a 384-dimensional vector:

# What happens under the hood:
text = "Estudiante con experiencia en Python y FastAPI"
vector = embeddings.embed_query(text)

print(f"Vector dimensions: {len(vector)}")
# Output: 384

print(f"First 5 values: {vector[:5]}")
# Output: [0.023, -0.145, 0.267, -0.089, 0.334]

Understanding the 384 Dimensions

Each dimension captures a different semantic feature:

Some dimensions respond to technical skills
Others encode experience level
Some capture domain (backend vs frontend)
Others represent soft skills

The model learned these features from training on millions of sentence pairs.

Step 4: Indexing in FAISS

# Create searchable index
vectorstore = FAISS.from_documents(
    documents=docs,      # List of Document objects
    embedding=embeddings # HuggingFaceEmbeddings instance
)

# FAISS builds an index structure for fast retrieval
print(f"Indexed {vectorstore.index.ntotal} vectors")

Semantic Similarity vs Keyword Matching

Example Query: “Students with API development experience”

Keyword Matching Results

# Traditional approach
keywords = ["API", "development", "experience"]
matches = []

for cv in cvs:
    score = sum(1 for kw in keywords if kw in cv)
    matches.append((cv, score))

Results:

CV	Contains “API”?	Contains “development”?	Score
CV_1	❌	✅	1
CV_2	✅	❌	1
CV_3	❌	❌	0

Problem: CV_3 says “Built RESTful web services with FastAPI” but scores 0 because it doesn’t contain the exact word “API development”.

Vector Search Results

# Semantic approach
query = "Students with API development experience"
retriever = vectorstore.as_retriever(search_kwargs={"k": 3})
relevant_docs = retriever.invoke(query)

Results:

CV	Similarity Score	Matched Text
CV_3	0.89	”Built RESTful web services with FastAPI”
CV_2	0.85	”Created API endpoints for financial management”
CV_1	0.72	”Developed backend using Spring Boot”

Success: CV_3 ranks highest because the embedding understands that “RESTful web services” is semantically equivalent to “API development”.

How FAISS Performs Fast Similarity Search

The Math: Cosine Similarity

FAISS uses cosine similarity to measure how “close” two vectors are:

import numpy as np

def cosine_similarity(vec_a, vec_b):
    """Compute similarity between two vectors"""
    dot_product = np.dot(vec_a, vec_b)
    norm_a = np.linalg.norm(vec_a)
    norm_b = np.linalg.norm(vec_b)
    return dot_product / (norm_a * norm_b)

# Example
vec_query = embeddings.embed_query("Python developer")
vec_doc = embeddings.embed_query("Experienced in Python and Django")

similarity = cosine_similarity(vec_query, vec_doc)
print(f"Similarity: {similarity:.3f}")  # Output: 0.834 (high similarity)

Range: -1 (opposite) to +1 (identical)

FAISS Index Types

FAISS offers multiple index types for different use cases:

# Default: Flat index (exact search, slower but accurate)
vectorstore = FAISS.from_documents(docs, embeddings)

# For larger datasets, you can use approximate search:
import faiss

# Create IVF index (faster, slight accuracy tradeoff)
index = faiss.IndexIVFFlat(
    quantizer=faiss.IndexFlatL2(384),  # 384 dimensions
    d=384,
    nlist=100  # Number of clusters
)

Index Types Comparison

Index Type	Speed	Accuracy	Best For
Flat	Slow	100%	< 10k vectors
IVF	Fast	~95%	10k - 1M vectors
HNSW	Very Fast	~99%	> 1M vectors

The system uses Flat index since we’re dealing with small CV databases (< 1000 candidates).

Real Code Example: Complete Vectorization Pipeline

Here’s the actual implementation from the notebook:

# Source: notebook/Talent_Scout_3000x.ipynb (Cell 3)
import random
import os
from langchain_community.document_loaders import PyPDFLoader
from langchain_community.vectorstores import FAISS
from langchain_huggingface import HuggingFaceEmbeddings

# 1. SETUP
carpeta_fuente = "cvs_estudiantes_final"
archivos_disponibles = os.listdir(carpeta_fuente)
archivo_elegido = random.choice(archivos_disponibles)
ruta_archivo = f"{carpeta_fuente}/{archivo_elegido}"

print(f"📂 Selected CV: {archivo_elegido}")
print("⏳ Reading PDF and creating vectors...")

# 2. LOAD PDF
loader = PyPDFLoader(ruta_archivo)
docs = loader.load()

print(f"Loaded {len(docs)} pages")
print(f"First 100 chars: {docs[0].page_content[:100]}")

# 3. CREATE EMBEDDINGS
embeddings = HuggingFaceEmbeddings()

# Test single embedding
test_text = "Python developer with FastAPI experience"
test_vector = embeddings.embed_query(test_text)
print(f"Vector dimensions: {len(test_vector)}")

# 4. BUILD VECTOR STORE
vectorstore = FAISS.from_documents(docs, embeddings)
print(f"Indexed {vectorstore.index.ntotal} document chunks")

# 5. CREATE RETRIEVER
retriever = vectorstore.as_retriever(
    search_type="similarity",
    search_kwargs={"k": 3}  # Return top 3 matches
)

# 6. TEST RETRIEVAL
query = "¿Qué proyectos técnicos ha desarrollado?"
relevant_chunks = retriever.invoke(query)

for i, chunk in enumerate(relevant_chunks):
    print(f"\nMatch {i+1}:")
    print(chunk.page_content[:200])

Output:

📂 Selected CV: CV_Estudiante_4_Fernanda_Paredes.pdf
⏳ Reading PDF and creating vectors...
Loaded 1 pages
First 100 chars: FERNANDA PAREDES
Data Analyst Trainee
[email protected] | +51 912 345 678
Vector dimensions: 384
Indexed 1 document chunks

Match 1:
PROYECTOS Y EXPERIENCIA
Data Analyst Trainee | Proyecto Académico (UTP)
Jun 2025 - Feb 2026
• Primer puesto en Hackathon universitaria desarrollando una app de reciclaje.
Tech: Python, PowerBI, Java, Spring Boot

Advanced Retrieval: Configuring Search Parameters

Search Type Options

# 1. Similarity Search (default)
retriever = vectorstore.as_retriever(
    search_type="similarity",
    search_kwargs={"k": 5}  # Top 5 results
)

# 2. Max Marginal Relevance (diverse results)
retriever = vectorstore.as_retriever(
    search_type="mmr",
    search_kwargs={
        "k": 5,
        "fetch_k": 20,    # Fetch 20, return diverse 5
        "lambda_mult": 0.5  # Diversity vs relevance tradeoff
    }
)

# 3. Similarity with Score Threshold
retriever = vectorstore.as_retriever(
    search_type="similarity_score_threshold",
    search_kwargs={
        "score_threshold": 0.75  # Only return matches > 0.75 similarity
    }
)

When to Use Each Type

Similarity Search

Use when: You want the most relevant results, even if they’re similar to each otherExample: Finding all students who know Python

Max Marginal Relevance (MMR)

Use when: You want diverse results that cover different aspectsExample: Finding students with varied tech stacks (backend, frontend, data)

Score Threshold

Use when: You only want high-confidence matchesExample: Finding candidates who are strong matches for senior positions

Practical Example: Multi-CV Search

Scaling up to search across multiple CVs:

import glob
from langchain_community.document_loaders import PyPDFLoader
from langchain_community.vectorstores import FAISS

# 1. LOAD ALL CVs
all_docs = []
cv_files = glob.glob("cvs_estudiantes_final/*.pdf")

print(f"Loading {len(cv_files)} CVs...")

for cv_path in cv_files:
    loader = PyPDFLoader(cv_path)
    docs = loader.load()
    
    # Add source metadata
    for doc in docs:
        doc.metadata["source"] = cv_path.split("/")[-1]
    
    all_docs.extend(docs)

print(f"Total documents: {len(all_docs)}")

# 2. CREATE UNIFIED VECTOR STORE
vectorstore = FAISS.from_documents(all_docs, embeddings)
retriever = vectorstore.as_retriever(search_kwargs={"k": 5})

# 3. SEMANTIC SEARCH
query = "Estudiantes con experiencia en desarrollo de APIs RESTful"
results = retriever.invoke(query)

# 4. DISPLAY RESULTS
for result in results:
    print(f"\nSource: {result.metadata['source']}")
    print(f"Content: {result.page_content[:150]}...")

Output:

Loading 5 CVs...
Total documents: 5

Source: CV_Estudiante_4_Fernanda_Paredes.pdf
Content: • Creación de una API RESTful para gestión financiera usando Python y FastAPI...

Source: CV_Estudiante_2_Ximena_Rios.pdf
Content: • Automatización de reportes en Excel usando scripts de Python y Pandas...

Source: CV_Estudiante_3_Nicolas_Paredes.pdf
Content: • Implementación de base de datos relacional normalizada para e-commerce...

Understanding Semantic Search Power

Query: “Students who have built web applications”

What the system finds:

CV Text	Why It Matches
”Desarrollo de un Sistema de Biblioteca Virtual"	"Sistema” → “application”, “Virtual” → “web"
"Creación de una API RESTful”	APIs are components of web applications
”Implementación de e-commerce ficticio”	E-commerce is explicitly a web application
”App de reciclaje"	"App” → “application”

Key Insight: The system understands that:

“Sistema” (Spanish) = “System” (English)
“API” is related to “web application”
“E-commerce” implies web development

This is impossible with keyword matching.

Embeddings Deep Dive: Visualizing Semantic Space

While we can’t visualize 384 dimensions, we can project to 2D to understand clustering:

import numpy as np
from sklearn.decomposition import PCA
import matplotlib.pyplot as plt

# Generate embeddings for different skill descriptions
texts = [
    "Python backend developer",
    "Java Spring Boot engineer",
    "Frontend React developer",
    "Vue.js UI specialist",
    "Data scientist with Python",
    "Machine learning engineer"
]

vectors = [embeddings.embed_query(t) for t in texts]
vectors_array = np.array(vectors)

# Reduce to 2D for visualization
pca = PCA(n_components=2)
vectors_2d = pca.fit_transform(vectors_array)

# Plot
plt.figure(figsize=(10, 6))
plt.scatter(vectors_2d[:, 0], vectors_2d[:, 1])

for i, txt in enumerate(texts):
    plt.annotate(txt, (vectors_2d[i, 0], vectors_2d[i, 1]))

plt.title("Semantic Space: Tech Skills")
plt.xlabel("PC1")
plt.ylabel("PC2")
plt.show()

Expected clustering:

Backend skills (Python, Java) cluster together
Frontend skills (React, Vue) cluster together
Data science forms its own cluster

Performance Considerations

Embedding Generation Speed

import time

# Benchmark single embedding
start = time.time()
vector = embeddings.embed_query("Test query")
end = time.time()

print(f"Single embedding: {(end - start) * 1000:.2f}ms")
# Typical: 10-50ms on CPU

# Benchmark batch embeddings
texts = [f"Document {i}" for i in range(100)]

start = time.time()
vectors = embeddings.embed_documents(texts)
end = time.time()

print(f"Batch 100 embeddings: {(end - start):.2f}s")
print(f"Per document: {(end - start) / 100 * 1000:.2f}ms")
# Batching is ~2x faster than individual calls

FAISS Search Speed

import time

# Index 1000 documents
print(f"Index size: {vectorstore.index.ntotal} vectors")

# Benchmark search
query = "Python developer"

start = time.time()
results = retriever.invoke(query)
end = time.time()

print(f"Search time: {(end - start) * 1000:.2f}ms")
# Typical: < 5ms for < 10k vectors

Scaling Guidelines

Database Size	Index Type	Expected Search Time
< 10k vectors	Flat	< 10ms
10k - 100k	IVF	< 50ms
100k - 1M	HNSW	< 100ms
> 1M	IVF + PQ	< 200ms

For recruitment (typically < 10k CVs), Flat index is optimal.

Saving and Loading Vector Stores

FAISS indexes can be persisted to disk:

# Save to disk
vectorstore.save_local("cv_index")

# Load from disk (much faster than re-indexing)
from langchain_community.vectorstores import FAISS

loaded_vectorstore = FAISS.load_local(
    "cv_index",
    embeddings,
    allow_dangerous_deserialization=True  # Required for pickle loading
)

retriever = loaded_vectorstore.as_retriever()

Benefits:

⚡ Skip re-embedding (saves minutes for large datasets)
💾 Persist candidate database between sessions
🔄 Version control your index snapshots

Integration with RAG Pipeline

Vector search is the retrieval step in RAG:

from langchain_core.runnables import RunnablePassthrough
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser

# 1. RETRIEVER (Vector Search)
retriever = vectorstore.as_retriever()

# 2. PROMPT
template = """
Based on these CV sections:
{context}

Answer: {question}
"""
prompt = ChatPromptTemplate.from_template(template)

# 3. COMPLETE CHAIN
chain = (
    {"context": retriever, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)

# 4. EXECUTE
response = chain.invoke("What tech stack does this candidate have?")

Flow:

retriever performs vector search to find relevant CV sections
prompt injects those sections as context
llm generates analysis based on retrieved context

Key Takeaways

Semantic Understanding

Vector search understands meaning, not just keywords

Multilingual Support

Works across English and Spanish seamlessly

Fast & Scalable

FAISS handles thousands of CVs with sub-second queries

Context-Aware

Distinguishes between “Learning Python” and “Expert in Python”

Next Steps

RAG Architecture

See how vector search fits into the complete RAG pipeline

Reverse Matching

Learn how semantic search enables potential-based hiring

Get Started

Core Concepts

Guides

​The Problem with Keyword Search

​Vector Search: A Semantic Revolution

​How It Works

​The Technology Stack

​HuggingFace Embeddings

Model Size

Vector Dimensions

Multilingual

Local Execution

​FAISS (Facebook AI Similarity Search)

​From Text to Vectors: The Embedding Process

​Step 1: Document Loading

​Step 2: Text Chunking

​Step 3: Vectorization

​Step 4: Indexing in FAISS

​Semantic Similarity vs Keyword Matching

​Example Query: “Students with API development experience”

​Keyword Matching Results

​Vector Search Results

​How FAISS Performs Fast Similarity Search

​The Math: Cosine Similarity

​FAISS Index Types

​Real Code Example: Complete Vectorization Pipeline

​Advanced Retrieval: Configuring Search Parameters

​Search Type Options

​When to Use Each Type

​Practical Example: Multi-CV Search

​Understanding Semantic Search Power

​Query: “Students who have built web applications”

​Embeddings Deep Dive: Visualizing Semantic Space

​Performance Considerations

​Embedding Generation Speed

​FAISS Search Speed

​Saving and Loading Vector Stores

​Integration with RAG Pipeline

​Key Takeaways

Semantic Understanding

Multilingual Support

Fast & Scalable

Context-Aware

​Next Steps

RAG Architecture

Reverse Matching

Build docs developers (and LLMs) love

The Problem with Keyword Search

Vector Search: A Semantic Revolution

How It Works

The Technology Stack

HuggingFace Embeddings

FAISS (Facebook AI Similarity Search)

From Text to Vectors: The Embedding Process

Step 1: Document Loading

Step 2: Text Chunking

Step 3: Vectorization

Step 4: Indexing in FAISS

Semantic Similarity vs Keyword Matching

Example Query: “Students with API development experience”

Keyword Matching Results

Vector Search Results

How FAISS Performs Fast Similarity Search

The Math: Cosine Similarity

FAISS Index Types

Real Code Example: Complete Vectorization Pipeline

Advanced Retrieval: Configuring Search Parameters

Search Type Options

When to Use Each Type

Practical Example: Multi-CV Search

Understanding Semantic Search Power

Query: “Students who have built web applications”

Embeddings Deep Dive: Visualizing Semantic Space

Performance Considerations

Embedding Generation Speed

FAISS Search Speed

Saving and Loading Vector Stores

Integration with RAG Pipeline

Key Takeaways

Next Steps