Dense vectors are fixed-length arrays of floating-point numbers that represent semantic meaning. They’re ideal for capturing similarity relationships in text, images, and other data.

Understanding Dense Vectors

Dense vectors contain a value at every dimension, unlike sparse vectors, which store only non-zero values. Common use cases include:
  • Semantic text search using models such as BERT or OpenAI embeddings
  • Image similarity search using vision models
  • Multimodal search combining text and images
  • Recommendation systems based on user/item embeddings
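
The dense/sparse distinction above can be illustrated in plain Python (this sketch is independent of zvec):

```python
# A dense vector stores a value at every dimension.
dense = [0.12, -0.05, 0.33, 0.0, 0.27, -0.14, 0.09, 0.41]

# The same data as a sparse representation keeps only the
# non-zero entries, keyed by dimension index.
sparse = {i: v for i, v in enumerate(dense) if v != 0.0}

print(len(dense))   # always equals the dimension count (8)
print(len(sparse))  # only the non-zero entries (7 here)
```

For embeddings from neural models, nearly every dimension is non-zero, which is why the dense layout is the natural fit.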

Creating a Dense Vector Schema

1. Define the Vector Field

Create a VectorSchema with your desired dimensions and data type:
from zvec import VectorSchema, DataType, HnswIndexParam

# 768-dimensional dense vector with HNSW index
vector_field = VectorSchema(
    name="embedding",
    data_type=DataType.VECTOR_FP32,
    dimension=768,
    index_param=HnswIndexParam(
        ef_construction=200,
        m=16
    )
)

2. Add to Collection Schema

Combine with scalar fields in a complete schema:
from zvec import CollectionSchema, FieldSchema

schema = CollectionSchema(
    name="documents",
    fields=[
        FieldSchema("id", DataType.INT64),
        FieldSchema("title", DataType.STRING)
    ],
    vectors=vector_field
)

3. Create the Collection

import zvec

zvec.init()
collection = zvec.create_and_open(
    path="./my_collection",
    schema=schema
)

Inserting Dense Vectors

Single Document

from zvec import Doc

doc = Doc(
    id="doc_001",
    fields={
        "id": 1,
        "title": "Introduction to Vector Search"
    },
    vectors={
        "embedding": [0.1, 0.2, 0.3, ...]  # 768 dimensions
    }
)

result = collection.insert(doc)
if result.ok():
    print("Document inserted successfully")

Batch Insert

docs = [
    Doc(
        id=f"doc_{i:03d}",
        fields={"id": i, "title": f"Document {i}"},
        vectors={"embedding": [i * 0.1] * 768}
    )
    for i in range(1, 101)
]

results = collection.insert(docs)
print(f"Inserted {sum(r.ok() for r in results)} documents")

Querying Dense Vectors

By Vector Similarity

Search for the most similar documents using a query vector:
from zvec import VectorQuery

# Your query embedding (from same model as indexed vectors)
query_vector = [0.15, 0.25, 0.35, ...]  # 768 dims

results = collection.query(
    VectorQuery(
        field_name="embedding",
        vector=query_vector
    ),
    topk=10
)

for doc in results:
    print(f"{doc.id}: {doc.field('title')} (score: {doc.score})")

By Document ID

Find similar documents to an existing document:
# Use a document's vector as the query
results = collection.query(
    VectorQuery(
        field_name="embedding",
        id="doc_001"  # Use this document's vector
    ),
    topk=10
)

With Filters

Combine vector search with scalar filters:
results = collection.query(
    VectorQuery(
        field_name="embedding",
        vector=query_vector
    ),
    filter="id > 50 and id < 100",
    topk=10
)

Choosing Vector Dimensions

Match your dimension size to your embedding model’s output:
  • OpenAI text-embedding-3-small: 1536 dimensions
  • OpenAI text-embedding-3-large: 3072 dimensions
  • all-MiniLM-L6-v2: 384 dimensions
  • BERT-base: 768 dimensions
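
A mismatch between the model's output size and the schema's `dimension` is a common setup error. A minimal guard like the following (a sketch, not part of the zvec API) fails fast before insertion:

```python
def check_dimension(embedding, expected_dim):
    """Fail fast if an embedding does not match the schema's dimension."""
    if len(embedding) != expected_dim:
        raise ValueError(
            f"Expected {expected_dim} dimensions, got {len(embedding)}"
        )
    return embedding

# e.g. a schema declared with dimension=768 (BERT-base output size)
vec = [0.0] * 768
check_dimension(vec, 768)  # passes silently; raises on mismatch
```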

Dimension Trade-offs

Dimensions   Storage   Speed    Quality
128-384      Low       Fast     Good for general use
512-768      Medium    Medium   Better semantic capture
1024+        High      Slower   Best quality, domain-specific
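
The storage column scales linearly with dimension: an FP32 vector costs 4 bytes per dimension (before index overhead, which varies by index type):

```python
def vector_bytes(dimension, bytes_per_value=4, count=1):
    """Raw vector storage for `count` vectors.
    FP32 = 4 bytes/value, FP16 = 2, INT8 = 1. Excludes index overhead."""
    return dimension * bytes_per_value * count

print(vector_bytes(768))                          # 3072 bytes per vector
print(vector_bytes(768, count=1_000_000) / 1024**3)  # ~2.86 GiB for 1M vectors
```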

Data Type Options

# Full precision (recommended)
VectorSchema("emb", DataType.VECTOR_FP32, dimension=768)

# Half precision (2x storage savings, slight quality loss)
VectorSchema("emb", DataType.VECTOR_FP16, dimension=768)

# Quantized (4x storage savings, faster search)
VectorSchema("emb", DataType.VECTOR_INT8, dimension=768)
INT8 quantization reduces storage and speeds up search but may decrease recall by 1-3%. Test with your data before production use.
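
To build intuition for where the recall loss comes from, here is a sketch of symmetric int8 quantization in plain Python. This illustrates the general idea only; it is not zvec's internal quantization scheme:

```python
def quantize_int8(vector):
    """Symmetric int8 quantization: scale by max |value|, round to [-127, 127]."""
    scale = max(abs(v) for v in vector) / 127.0
    if scale == 0:
        return [0] * len(vector), 1.0
    return [round(v / scale) for v in vector], scale

def dequantize(quantized, scale):
    """Recover approximate floats; the rounding error is the recall cost."""
    return [q * scale for q in quantized]

q, s = quantize_int8([0.5, -1.0, 0.25])
print(q)  # [64, -127, 32]
```

Each value is snapped to one of 255 levels, so dequantized vectors only approximate the originals; distances computed on them drift slightly, which is the source of the 1-3% recall drop mentioned above.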

Index Configuration

HNSW (Default)

HNSW balances speed and accuracy for most use cases:
from zvec import HnswIndexParam

index_param = HnswIndexParam(
    ef_construction=200,  # Higher = better quality, slower build
    m=16                  # Higher = better recall, more memory
)

IVF for Large-Scale

Best for collections with millions of vectors:
from zvec import IVFIndexParam

index_param = IVFIndexParam(
    nlist=1000,          # Number of clusters
    nprobe=10            # Search clusters (query time)
)
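
A common rule of thumb (popularized by FAISS-style IVF guidance, not an official zvec recommendation) is to set `nlist` to roughly the square root of the collection size:

```python
import math

def suggest_nlist(num_vectors):
    """Rule-of-thumb IVF cluster count: roughly sqrt(N).
    A starting point to tune, not a guarantee of optimal recall."""
    return max(1, round(math.sqrt(num_vectors)))

print(suggest_nlist(1_000_000))  # 1000
```

Raising `nprobe` at query time then trades latency for recall, since more clusters are scanned per query.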

Flat for Exact Search

A flat index guarantees exact results and is best for small collections:
from zvec import FlatIndexParam

index_param = FlatIndexParam()  # No parameters needed

Best Practices

1. Normalize Your Vectors

For cosine similarity, normalize vectors to unit length:
import numpy as np

def normalize(vector):
    """Scale a vector to unit length; pass zero vectors through unchanged."""
    vector = np.asarray(vector, dtype=np.float32)
    norm = np.linalg.norm(vector)
    if norm == 0:
        return vector.tolist()
    return (vector / norm).tolist()

normalized = normalize(embedding)

2. Use Consistent Models

Always use the same embedding model for indexing and querying. Mixing models (e.g., OpenAI for indexing, BERT for querying) will give poor results.
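
One way to enforce this is to route every embedding call through a single object, so indexing and querying can never drift onto different models. The `embed_fn` below is a stand-in for your real model's encode call:

```python
class Embedder:
    """Wraps one embedding function shared by insert and query paths."""

    def __init__(self, model_name, embed_fn):
        self.model_name = model_name
        self._embed = embed_fn

    def __call__(self, text):
        return self._embed(text)

# Toy stand-in model for illustration only (length-based pseudo-embedding).
embedder = Embedder("toy-model", lambda t: [float(len(t)), 0.0])

index_vec = embedder("some document")  # used at insert time
query_vec = embedder("some document")  # used at query time
assert index_vec == query_vec
```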

3. Batch Operations

Insert documents in batches of 100-1000 for best performance:
batch_size = 500
for i in range(0, len(all_docs), batch_size):
    batch = all_docs[i:i + batch_size]
    collection.insert(batch)

4. Monitor Index Quality

Check index completeness after insertions:
stats = collection.stats
print(f"Documents: {stats.doc_count}")
print(f"Index completeness: {stats.index_completeness}")

Common Patterns

Multi-Vector Fields

Store different embedding types in one collection:
schema = CollectionSchema(
    name="multimodal",
    fields=[FieldSchema("id", DataType.INT64)],
    vectors=[
        VectorSchema("text_emb", DataType.VECTOR_FP32, dimension=768),
        VectorSchema("image_emb", DataType.VECTOR_FP32, dimension=512)
    ]
)
See the Hybrid Search guide for querying multiple vector fields.

Vector Updates

Update a document’s vector:
updated_doc = Doc(
    id="doc_001",
    vectors={"embedding": new_embedding}
)

result = collection.update(updated_doc)
