Performance Tuning

Index Selection Guidelines

GigaVector supports multiple index types, each optimized for different scenarios:

KD-Tree (`GV_INDEX_TYPE_KDTREE`)

Best for:

Small to medium datasets (< 1M vectors)
Exact nearest neighbor search
Low-dimensional vectors (< 100 dimensions)
Static or rarely updated datasets
Memory-constrained environments

Characteristics:

O(log n) search time for balanced trees
Exact results (no approximation)
Memory efficient (Structure-of-Arrays layout)
Fast insertion for small datasets
Tree rebalancing may be needed for large datasets

When to avoid:

High-dimensional vectors (> 100 dimensions) - curse of dimensionality
Very large datasets (> 1M vectors) - tree depth becomes problematic
Frequent updates - tree rebalancing overhead

HNSW (`GV_INDEX_TYPE_HNSW`)

Best for:

Large datasets (millions to billions of vectors)
High-dimensional vectors (100+ dimensions)
Approximate nearest neighbor search with high recall
Dynamic datasets with frequent insertions
Production systems requiring sub-millisecond search latency

Characteristics:

O(log n) search time with high recall
Hierarchical graph structure
Excellent scalability
Supports binary quantization for faster candidate selection
ACORN-style exploration for filtered search

When to avoid:

Very small datasets (< 10K vectors) - overhead not justified
Exact search requirements - HNSW is approximate
Memory-constrained environments - higher memory overhead

IVFPQ (`GV_INDEX_TYPE_IVFPQ`)

Best for:

Very large datasets (millions+ vectors)
High-dimensional vectors (64+ dimensions)
Memory-constrained environments
Batch search scenarios
Production systems with trained models

Characteristics:

Product Quantization for memory efficiency
Inverted File index for fast candidate retrieval
Requires training phase before use
Supports scalar quantization for additional compression
Configurable trade-off between accuracy and speed

When to avoid:

Small datasets (< 100K vectors) - training overhead not justified
Low-dimensional vectors (< 32 dimensions) - PQ overhead
Exact search requirements - IVFPQ is approximate
Frequently changing data - requires retraining

Sparse Index (`GV_INDEX_TYPE_SPARSE`)

Best for:

Sparse vectors (most dimensions are zero)
Text embeddings and bag-of-words representations
High-dimensional sparse data

Characteristics:

Optimized for sparse vector operations
Memory efficient for sparse data
Fast search for sparse queries

Parameter Tuning Recommendations

HNSW Parameters

`M` (Number of connections per node)

Default: 16
Range: 4-64
Higher values:
- Better recall and search quality
- Higher memory usage
- Slower insertion time
Lower values:
- Faster insertion
- Lower memory usage
- Potentially lower recall
Recommendation: Start with 16, increase to 32-48 for higher quality, decrease to 8-12 for faster insertion

`efConstruction` (Candidate list size during construction)

Default: 200
Range: 50-500
Higher values:
- Better index quality and recall
- Slower insertion time
- Higher memory during construction
Lower values:
- Faster insertion
- Potentially lower recall
Recommendation: Use 200-400 for high quality, 100-200 for faster insertion

`efSearch` (Candidate list size during search)

Default: 50
Range: k to 500
Higher values:
- Better recall
- Slower search time
Lower values:
- Faster search
- Potentially lower recall
Recommendation: Set to 2-4x your typical k value. For k=10, use efSearch=20-40

Binary Quantization

use_binary_quant: Enable for 2-3x faster search with minimal recall loss
quant_rerank: Number of candidates to rerank with exact distance
- Set to 2-3x your k value
- 0 disables reranking (fastest, lower accuracy)
- Higher values improve accuracy at cost of speed

ACORN Exploration (for filtered search)

use_acorn: Enable for better recall in filtered searches
acorn_hops: Exploration depth (1-2 recommended)
- 1 hop: Fast, good recall
- 2 hops: Better recall, slower

IVFPQ Parameters

`nlist` (Number of coarse centroids)

Default: 64
Range: 16-4096
Higher values:
- Better accuracy
- More memory for centroids
- Slower training
Lower values:
- Faster training and search
- Lower memory
- Potentially lower accuracy
Recommendation:
- Small datasets (< 1M): 64-256
- Medium datasets (1M-10M): 256-1024
- Large datasets (> 10M): 1024-4096
- Rule of thumb: sqrt(N) where N is dataset size

`m` (Number of subquantizers)

Default: 8
Range: Must divide dimension evenly
Higher values:
- Better accuracy
- More memory for codebooks
- Slightly slower search
Lower values:
- Faster search
- Lower memory
- Potentially lower accuracy
Recommendation:
- Dimension 64: m=8 or m=16
- Dimension 128: m=8, m=16, or m=32
- Dimension 256: m=16, m=32, or m=64
- Must evenly divide dimension

`nbits` (Bits per subquantizer code)

Default: 8
Range: 4-8
8 bits: Standard, good accuracy
4 bits: 2x memory reduction, lower accuracy
Recommendation: Use 8 bits unless memory is critical

`nprobe` (Lists to probe during search)

Default: 4
Range: 1 to nlist
Higher values:
- Better recall
- Slower search
Lower values:
- Faster search
- Potentially lower recall
Recommendation:
- Start with nlist/16 to nlist/8
- Increase for better recall
- Decrease for faster search

`default_rerank` (Rerank pool size)

Default: 32
Range: 0 (disabled) to several hundred
Higher values:
- Better accuracy
- Slower search
0 (disabled):
- Fastest search
- Lower accuracy
Recommendation: Set to 2-4x your typical k value

Scalar Quantization

use_scalar_quant: Enable for additional 2-4x memory reduction
scalar_quant_config.bits: 4-8 bits per component
- 8 bits: Minimal accuracy loss
- 4 bits: Significant memory savings, noticeable accuracy loss
Recommendation: Enable for very large datasets where memory is critical

Oversampling Factor

oversampling_factor: Multiplier for candidate selection
Default: 1.0
Range: 1.0-3.0
Higher values: Better recall, slower search
Recommendation: Use 1.5-2.0 for high-recall scenarios

SIMD Optimization Notes

GigaVector automatically detects and uses available CPU SIMD features for optimized distance calculations.

Supported SIMD Features

SSE4.2: Basic vector operations
AVX2: 256-bit vector operations (2x speedup)
AVX-512F: 512-bit vector operations (4x speedup on supported CPUs)
FMA: Fused multiply-add operations

Compilation Flags

The build system automatically enables SIMD optimizations:

Makefile:

make lib  # Automatically uses -march=native -msse4.2 -mavx2 -mavx512f -mfma

CMake:

cmake -B build  # Automatically detects and enables available SIMD features

Runtime Detection

GigaVector detects CPU features at runtime using gv_cpu_detect_features(). The library automatically uses the best available SIMD implementation.

Performance Impact

SSE4.2: 1.5-2x speedup over scalar code
AVX2: 2-3x speedup over scalar code
AVX-512F: 3-4x speedup over scalar code (when available)

Verification

Check which SIMD features are available:

unsigned int features = gv_cpu_detect_features();
if (gv_cpu_has_feature(GV_CPU_FEATURE_AVX2)) {
    // AVX2 is available
}

Memory Usage Optimization

Structure-of-Arrays (SoA) Layout

GigaVector uses SoA storage for efficient memory access patterns:

Better cache locality
SIMD-friendly memory layout
Reduced memory fragmentation

Memory Usage Estimates

KD-Tree:

Vectors: N * dimension * sizeof(float) bytes
Tree nodes: N * (2 * sizeof(void*) + sizeof(size_t) + sizeof(float)) bytes
Total: ~N * (dimension * 4 + 32) bytes

HNSW:

Vectors: N * dimension * sizeof(float) bytes
Graph structure: N * M * sizeof(size_t) bytes (M = connections per node)
Total: ~N * (dimension * 4 + M * 8) bytes

IVFPQ:

Compressed vectors: N * m * sizeof(uint8_t) bytes (m = subquantizers)
Centroids: nlist * dimension * sizeof(float) bytes
Codebooks: m * 256 * (dimension/m) * sizeof(float) bytes
Total: ~N * m + nlist * dimension * 4 + m * 256 * (dimension/m) * 4 bytes

Memory Optimization Tips

Use IVFPQ for large datasets: 10-100x memory reduction vs. exact storage
Enable scalar quantization: Additional 2-4x memory reduction
Reduce HNSW M parameter: Lower memory usage at cost of recall
Use binary quantization: Minimal memory overhead, significant speedup
Batch operations: Process vectors in batches to control peak memory

Memory Profiling

Monitor memory usage:

// Get current memory usage
size_t memory_bytes = gv_db_get_memory_usage(db);
printf("Memory usage: %zu bytes\n", memory_bytes);

Benchmark Results and Comparisons

Typical Performance Characteristics

KD-Tree:

Search latency: 0.1-1ms (1K-100K vectors)
Insertion: 0.01-0.1ms per vector
Memory: Low overhead
Recall: 100% (exact)

HNSW:

Search latency: 0.1-5ms (1M-1B vectors)
Insertion: 0.1-1ms per vector
Memory: Moderate overhead (M * 8 bytes per vector)
Recall: 95-99% (with proper efSearch tuning)

IVFPQ:

Search latency: 0.5-10ms (1M-1B vectors)
Insertion: 0.01-0.1ms per vector
Memory: Very low overhead (m bytes per vector)
Recall: 85-95% (with proper nprobe tuning)