Repository Layout
Project Structure
PipeANN/
├── src/ # Core implementation
│ ├── index.cpp # In-memory Vamana index
│ ├── ssd_index.cpp # On-disk index (load/save, graph mgmt, entry points)
│ ├── search/ # Search algorithms
│ │ ├── pipe_search.cpp # 🌟 PipeANN pipelined search
│ │ ├── pipe_search_common.h # Shared helpers for pipelined search
│ │ ├── spec_filter_search.cpp # 🌟 Speculative filtered search
│ │ ├── beam_search.cpp # DiskANN best-first search
│ │ ├── page_search.cpp # Starling page-based search
│ │ └── coro_search.cpp # Coroutine-based multi-query search
│ ├── update/ # Update operations
│ │ ├── direct_insert.cpp # 🌟 OdinANN direct insert
│ │ └── delete_merge.cpp # Delete and merge logic
│ ├── python/ # Python binding sources
│ │ ├── pybind.cpp # pybind11 module definitions
│ │ └── pyindex.cpp # IndexPipeANN C++ wrapper impl
│ └── utils/ # Utilities
│   ├── distance.cpp # Distance computation (L2/IP/cosine)
│   ├── linux_aligned_file_reader.cpp # io_uring/AIO support
│   ├── index_build_utils.cpp # Shared build helpers
│   ├── kmeans_utils.cpp # K-means for PQ codebooks
│   ├── partition.cpp # Dataset partitioning (PiPNN)
│   └── pipnn.cpp # PiPNN graph construction
├── include/ # Header files
│ ├── index.h # In-memory index interface
│ ├── ssd_index.h # On-disk index interface
│ ├── ssd_index_defs.h # On-disk index constants & layout macros
│ ├── dynamic_index.h # Dynamic index wrapper (search + update)
│ ├── pyindex.h # C++ wrapper exposed to Python
│ ├── nbr/ # Neighbor storage & quantization
│ │ ├── abstract_nbr.h # Base interface for neighbor codecs
│ │ ├── pq_nbr.h / pq_table.h # Product-quantized neighbors
│ │ ├── rabitq_nbr.h / rabitq/ # RaBitQ 1-bit / multi-bit quantization
│ │ └── dummy_nbr.h # No-op codec (debug)
│ ├── filter/ # Speculative filtering
│ │ ├── attribute.h # AttrIndex (label inverted index, range index)
│ │ ├── selector.h # Selector tree (LabelOr/And, Range, And/Or/Not)
│ │ └── filter_utils.h # Shared filter helpers
│ └── utils/ # Headers for utilities, containers, logging
│   ├── pipnn.h, partition.h # PiPNN build
│   ├── page_cache.h, journal.h # SSD page cache & update journal
│   ├── lock_table.h, concurrent_queue.h # Concurrency primitives
│   └── ...
├── tests/ # Test programs & benchmarks
│ ├── build_disk_index.cpp # Build on-disk index
│ ├── build_disk_index_filtered.cpp # Build on-disk index with attributes
│ ├── build_memory_index.cpp # Build in-memory index
│ ├── search_disk_index.cpp # Search benchmark (SSD)
│ ├── search_disk_index_mem.cpp # Search benchmark (SSD index loaded into RAM)
│ ├── search_disk_index_filtered.cpp # Filtered search benchmark
│ ├── test_insert_search.cpp # Insert-search benchmark
│ ├── overall_performance.cpp # Insert-delete-search benchmark (SSD)
│ ├── overall_perf_mem.cpp # Insert-delete-search benchmark (in-memory)
│ ├── pad_partition.cpp # Pad partition file (for Starling)
│ ├── normalize_data.cpp # Normalize vectors (for cosine/MIPS)
│ ├── test_cpu.cpp # CPU/SIMD sanity check
│ └── utils/ # Data utilities (vecs_to_bin, gt_update, ...)
├── tests_py/ # Python examples & tests
│ ├── collection_example.py # Collection API smoke tests
│ ├── test_filter.py # Python Selector pytest
│ ├── test_native_selector.py # Native selector composition pytest
│ ├── langchain_example.py # LangChain VectorStore smoke test
│ ├── qdrant_server_example.py # Qdrant-compatible server smoke test
│ ├── test_insert_attrs.py # Filtered build-vs-insert regression check
│ └── test_range_search.py # Range-search smoke test
├── pipeann/ # Python package
│ ├── __init__.py # Re-export public Python API
│ ├── client.py # Client (multi-collection manager, disk auto-discovery)
│ ├── collection.py # Collection (document + metadata layer, schema.json persistence)
│ ├── filter.py # Attributes, AttrsVec, Selector
│ ├── index.py # IndexPipeANN, Metric, VALID_DATA_TYPES
│ ├── langchain.py # LangChain VectorStore integration
│ └── qdrant_server.py # Qdrant-compatible HTTP server for Open WebUI
├── scripts/ # Evaluation scripts
└── third_party/ # Dependencies (liburing)
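To make the filter/selector.h entry more concrete, here is a purely illustrative Python model of a selector tree: leaf nodes test labels (any-of / all-of) or a numeric range, and inner nodes combine children with And / Or / Not. The class names mirror the node kinds listed above, but the Attrs type and every signature here are hypothetical; this is not the API of include/filter/selector.h or pipeann/filter.py.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List


@dataclass
class Attrs:
    """Hypothetical per-vector attributes: integer labels plus named numeric fields."""
    labels: FrozenSet[int]
    values: Dict[str, float]


class Selector:
    """Base node: every selector answers 'does this vector qualify?'."""
    def matches(self, a: Attrs) -> bool:
        raise NotImplementedError


@dataclass
class LabelOr(Selector):
    labels: FrozenSet[int]

    def matches(self, a: Attrs) -> bool:
        return bool(self.labels & a.labels)  # at least one label present


@dataclass
class LabelAnd(Selector):
    labels: FrozenSet[int]

    def matches(self, a: Attrs) -> bool:
        return self.labels <= a.labels  # every label present


@dataclass
class Range(Selector):
    field: str
    lo: float
    hi: float

    def matches(self, a: Attrs) -> bool:
        v = a.values.get(self.field)
        return v is not None and self.lo <= v <= self.hi  # inclusive bounds


@dataclass
class And(Selector):
    children: List[Selector]

    def matches(self, a: Attrs) -> bool:
        return all(c.matches(a) for c in self.children)


@dataclass
class Or(Selector):
    children: List[Selector]

    def matches(self, a: Attrs) -> bool:
        return any(c.matches(a) for c in self.children)


@dataclass
class Not(Selector):
    child: Selector

    def matches(self, a: Attrs) -> bool:
        return not self.child.matches(a)


# Example: label 7 or 9, and price NOT in [20, 100].
attrs = Attrs(labels=frozenset({1, 7}), values={"price": 12.5})
sel = And([LabelOr(frozenset({7, 9})), Not(Range("price", 20.0, 100.0))])
print(sel.matches(attrs))  # True
```

This sketch evaluates the tree once per vector only for clarity; an engine would more likely answer the leaves from an attribute index (compare AttrIndex in attribute.h) than by scanning every vector.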
Evaluation Scripts
The scripts/ directory reproduces the figures in our papers. Before running
anything, point the scripts at your data: they assume the dataset and index
layout below. If your environment differs, edit the hard-coded paths in
eval_f.sh / fig*.sh or create symlinks that match this layout.
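If you take the symlink route, a minimal Python sketch could look like the following. It assumes your files already carry the expected names and only their parent directory differs. link_datasets and missing_files are hypothetical helpers (not part of the repository), they cover only the dataset directory (not the index prefixes), and creating links under /mnt/nvme may require elevated permissions.

```python
from pathlib import Path
from typing import List

# Files the scripts expect under /mnt/nvme/data, taken from the
# "Expected Dataset & Index Layout" listing (dataset directory only).
EXPECTED_FILES = {
    "bigann": ["100M.bbin", "100M_gt.bin", "truth.bin",
               "bigann_200M.bbin", "bigann_query.bbin"],
    "deep": ["100M.fbin", "100M_gt.bin", "queries.fbin"],
    "SPACEV1B": ["100M.bin", "100M_gt.bin", "query.bin", "truth.bin"],
}


def link_datasets(src_root: str, dst_root: str = "/mnt/nvme/data") -> List[str]:
    """Symlink each <src_root>/<dataset> directory to <dst_root>/<dataset>.

    Datasets absent from src_root, and links that already exist, are skipped.
    Returns the link paths that were created.
    """
    dst = Path(dst_root)
    dst.mkdir(parents=True, exist_ok=True)
    created = []
    for name in EXPECTED_FILES:
        src = Path(src_root) / name
        link = dst / name
        if not src.is_dir() or link.exists() or link.is_symlink():
            continue
        link.symlink_to(src.resolve(), target_is_directory=True)
        created.append(str(link))
    return created


def missing_files(root: str = "/mnt/nvme/data") -> List[str]:
    """List expected '<dataset>/<file>' entries that are not found under root."""
    return [f"{d}/{f}" for d, files in EXPECTED_FILES.items()
            for f in files if not (Path(root) / d / f).exists()]
```

For example, running link_datasets("/data/ann") and then missing_files() would list whatever the figure scripts will still fail to find.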
Expected Dataset & Index Layout
/mnt/nvme/data/ # Dataset Directory
├── bigann/
│ ├── 100M.bbin # SIFT100M dataset
│ ├── 100M_gt.bin # SIFT100M ground truth
│ ├── truth.bin # SIFT1B ground truth
│ ├── bigann_200M.bbin # SIFT200M (for updates)
│ └── bigann_query.bbin # SIFT query
├── deep/
│ ├── 100M.fbin # DEEP100M dataset
│ ├── 100M_gt.bin # DEEP100M ground truth
│ └── queries.fbin # DEEP query
└── SPACEV1B/
  ├── 100M.bin # SPACEV100M dataset
  ├── 100M_gt.bin # SPACEV100M ground truth
  ├── query.bin # SPACEV query
  └── truth.bin # SPACEV1B ground truth
/mnt/nvme2/indices/ # Search-Only Indexes
├── bigann/100m # SIFT100M index prefix
├── deep/100M # DEEP100M index prefix
└── spacev/100M # SPACEV100M index prefix
/mnt/nvme/indices_upd/ # Search-Update Indexes
├── bigann/100M # SIFT100M index for updates
├── bigann_gnd_insert/ # GT for insert-search workload
└── bigann_gnd/ # GT for insert-delete-search workload
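To sanity-check the files above before a multi-day run, the stdlib-only sketch below peeks at them. It assumes the big-ann-benchmarks conventions, which the layout does not state: vector files start with an 8-byte header (point count and dimension as little-endian uint32) followed by row-major vectors (uint8 for .bbin, float32 for .fbin, int8 for SPACEV's .bin), and ground-truth files store (nq, K), then nq×K uint32 neighbor IDs, then nq×K float32 distances. Because .bin is used for both SPACEV vectors and ground truth, call the matching reader explicitly. The function names are hypothetical.

```python
import struct
import sys
from array import array
from pathlib import Path
from typing import List, Optional, Tuple

# Assumed element type per extension (array typecodes):
# 'B' = uint8 (.bbin), 'f' = float32 (.fbin), 'b' = int8 (SPACEV .bin vectors).
DTYPE_BY_EXT = {".bbin": "B", ".fbin": "f", ".bin": "b"}


def _little_endian(a: array) -> array:
    # Files are assumed little-endian; swap bytes on big-endian hosts.
    if sys.byteorder == "big":
        a.byteswap()
    return a


def read_header(path: str) -> Tuple[int, int]:
    """Return (npts, dim) from the assumed 8-byte uint32 header."""
    with open(path, "rb") as f:
        return struct.unpack("<II", f.read(8))


def read_vectors(path: str, count: int, typecode: Optional[str] = None) -> List[list]:
    """Read up to `count` row-major vectors; element type defaults from the extension."""
    tc = typecode or DTYPE_BY_EXT[Path(path).suffix]
    with open(path, "rb") as f:
        npts, dim = struct.unpack("<II", f.read(8))
        n = min(count, npts)
        data = array(tc)
        data.fromfile(f, n * dim)
    _little_endian(data)
    return [data[i * dim:(i + 1) * dim].tolist() for i in range(n)]


def read_groundtruth(path: str, count: Optional[int] = None) -> Tuple[List[list], List[list]]:
    """Return (ids, dists) rows for the first `count` queries of a GT file."""
    assert array("I").itemsize == 4 and array("f").itemsize == 4
    with open(path, "rb") as f:
        nq, k = struct.unpack("<II", f.read(8))
        n = nq if count is None else min(count, nq)
        ids = array("I")
        ids.fromfile(f, n * k)
        f.seek(8 + nq * k * 4)  # distances start after all nq*K IDs
        dists = array("f")
        dists.fromfile(f, n * k)
    _little_endian(ids)
    _little_endian(dists)

    def rows(a: array) -> List[list]:
        return [a[i * k:(i + 1) * k].tolist() for i in range(n)]

    return rows(ids), rows(dists)
```

A quick check is that read_header on the SIFT100M file reports 100M points of SIFT's 128 dimensions; if it does not, the format assumption above does not hold for your files.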
Script Layout
scripts/
├── tests-pipeann/ # PipeANN (OSDI'25) evaluation
│ ├── hello_world.sh # Quick functionality test
│ ├── fig11.sh ~ fig18.sh # Paper figure reproduction
│ ├── plotting.py # Generate figures
│ └── plotting.ipynb # Jupyter notebook for plotting
├── tests-odinann/ # OdinANN (FAST'26) evaluation
│ ├── hello_world.sh # Quick functionality test
│ ├── fig6.sh ~ fig12.sh # Paper figure reproduction
│ └── plotting.ipynb # Jupyter notebook for plotting
├── run_all_pipeann.sh # Run all PipeANN experiments
└── validate_index_structure.py # Index validation tool
Running the Scripts
Hello World (verify installation):
# PipeANN search-only test (~1 min)
bash scripts/tests-pipeann/hello_world.sh
# OdinANN update test (~1 min)
bash scripts/tests-odinann/hello_world.sh
Individual experiments:
- PipeANN (Search-Only):
  - fig11.sh: Latency vs Recall (100M datasets)
  - fig12.sh: Throughput vs Recall (100M datasets)
  - fig13.sh: Latency breakdown
  - fig14.sh ~ fig18.sh: Ablation, scalability, etc.
- OdinANN (Search-Update):
  - fig6.sh: Insert-search on SIFT100M (~4 days)
  - fig7.sh: Insert-search on DEEP100M (~4 days)
  - fig8.sh: Insert-search on SIFT1B (~8 days)
  - fig12.sh: Insert-delete-search (~6 days)
Plot results: