We ran 72 hours of benchmarks on Pinecone, Weaviate, and Qdrant using a 10M-vector dataset sourced from a production RAG deployment in the financial services sector. The vectors are 1536-dimensional OpenAI embeddings. The results surprised us in several ways — particularly around cost at scale and the developer experience gap, which is larger than the marketing materials suggest.
Benchmark Methodology
Hardware: identical c6i.8xlarge instances (32 vCPU, 64 GB RAM) on AWS us-east-1 for all self-hosted configurations. Pinecone was tested on its serverless tier (no instance choice) and its pod-based p2.x2 tier. All indexes used HNSW with ef_construction=200 and m=16, a standard production configuration.
Workloads: single-vector ANN queries, metadata-filtered queries (filtering to roughly 10% of the dataset), and hybrid queries (dense + sparse BM25).
We tested three query concurrency levels: 10, 100, and 500 simultaneous queries, representing light, moderate, and heavy production load patterns. Recall accuracy was measured against a brute-force ground truth computed offline. We report p50 and p95 latency for each scenario, along with monthly cost at 10M vectors with 1M queries per day.
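The percentile and recall computations behind these numbers are simple; here is a minimal sketch of both (function names and sample values are illustrative, not taken from our actual harness):

```python
import math

def percentile(latencies_ms, p):
    """Nearest-rank percentile of a list of latencies in milliseconds."""
    ranked = sorted(latencies_ms)
    # nearest-rank method: take the ceil(p/100 * n)-th value (1-indexed)
    k = max(1, math.ceil(p / 100 * len(ranked)))
    return ranked[k - 1]

def recall_at_k(retrieved_ids, ground_truth_ids, k=10):
    """Fraction of the exact (brute-force) top-k that the ANN index returned."""
    return len(set(retrieved_ids[:k]) & set(ground_truth_ids[:k])) / k

latencies = [8, 9, 10, 11, 12, 14, 15, 18, 25, 40]
print(percentile(latencies, 50))  # → 12 (p50)
print(percentile(latencies, 95))  # → 40 (p95)
print(recall_at_k([1, 2, 3, 4, 5, 6, 7, 8, 9, 11],
                  [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]))  # → 0.9
```

The ground-truth top-10 for each query was computed once offline by exact search over all 10M vectors, then reused across all three databases.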
Performance at 10M Vectors
Qdrant delivered the best raw QPS at all concurrency levels. At 100 concurrent queries, Qdrant achieved 3,200 QPS versus Weaviate's 2,100 and Pinecone serverless's 1,400. p95 latency told a similar story: Qdrant at 12ms, Weaviate at 19ms, Pinecone serverless at 34ms. Pinecone's pod-based tier (p2.x2) performed comparably to Weaviate, which makes sense given similar hardware.
Filtered queries changed the picture significantly. Weaviate's inverted index integration made filtered hybrid queries substantially faster — 1,800 QPS versus Qdrant's 1,200 QPS at 100 concurrency. For workloads that are primarily keyword-filtered semantic search, Weaviate's architecture is a genuine advantage. Pinecone's metadata filtering performance was the weakest of the three at high cardinality filter conditions.
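The brute-force ground truth for filtered queries is computed the same way as for unfiltered ones, just restricted to the records that pass the filter. A minimal sketch in pure Python with cosine similarity (record layout and field names are illustrative):

```python
import heapq
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def filtered_top_k(query, corpus, predicate, k=10):
    """Exact top-k over only the records whose metadata passes the filter."""
    candidates = ((cosine(query, rec["vector"]), rec["id"])
                  for rec in corpus if predicate(rec["metadata"]))
    return [doc_id for _, doc_id in heapq.nlargest(k, candidates)]

corpus = [
    {"id": 1, "vector": [1.0, 0.0], "metadata": {"sector": "finance"}},
    {"id": 2, "vector": [0.9, 0.1], "metadata": {"sector": "retail"}},
    {"id": 3, "vector": [0.8, 0.6], "metadata": {"sector": "finance"}},
]
print(filtered_top_k([1.0, 0.0], corpus,
                     lambda m: m["sector"] == "finance", k=2))  # → [1, 3]
```

Note that a 10% selectivity filter still leaves 1M candidates at our scale, which is why the engines' filter-index integration (rather than post-filtering ANN results) matters so much for these numbers.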
- Qdrant: highest raw QPS and lowest latency for unfiltered vector search
- Weaviate: fastest metadata-filtered and hybrid search due to inverted index
- Pinecone serverless: highest latency but lowest operational overhead
- All three achieved >95% recall@10 at standard HNSW configurations
Cost Analysis
At 10M vectors with 1M queries/day, the monthly cost breakdown:
- Qdrant self-hosted on r6i.2xlarge (single node, sufficient for this scale): $370/month in EC2 costs
- Weaviate self-hosted (equivalent instance): $370/month
- Weaviate Cloud (managed): $680/month
- Pinecone serverless: $430/month
- Pinecone pod-based (p2.x2): $900/month
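Cost per million queries follows directly from those figures (1M queries/day is roughly 30M/month). A quick comparison using the numbers above:

```python
# Monthly costs from the benchmark above (USD)
monthly_cost = {
    "Qdrant self-hosted": 370,
    "Weaviate self-hosted": 370,
    "Pinecone serverless": 430,
    "Weaviate Cloud": 680,
    "Pinecone pod (p2.x2)": 900,
}

queries_per_month = 1_000_000 * 30  # 1M queries/day, ~30 days

for name, cost in sorted(monthly_cost.items(), key=lambda kv: kv[1]):
    per_million = cost / (queries_per_month / 1_000_000)
    print(f"{name}: ${cost}/mo, ${per_million:.2f} per 1M queries")
```

At this volume even the priciest option works out to a few cents per thousand queries, so for most teams the deciding cost factor is the fixed monthly bill plus operational overhead, not per-query economics.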
The self-hosted options are meaningfully cheaper at scale but carry operational overhead: your team is responsible for upgrades, backup, and incident response. For teams with strong infrastructure capability, Qdrant self-hosted is the most cost-effective choice by a significant margin. For teams that need managed service simplicity, Weaviate Cloud offers the best performance-per-dollar in the managed tier.
Developer Experience and Our Recommendation
Developer experience is where Pinecone wins clearly. The SDK is polished, documentation is comprehensive with real code examples, and the serverless tier requires zero infrastructure decisions to get started. Weaviate's GraphQL query interface is powerful but has a learning curve — the schema definition and query syntax take time to internalize. Qdrant's Python SDK is excellent, but the documentation for advanced configurations (sparse vectors, quantization, collection aliases) required digging into GitHub issues to supplement the official docs.
Our recommendation by use case: Qdrant for teams with infrastructure capability who need maximum performance and cost efficiency at scale. Weaviate for workloads that are primarily metadata-filtered or require tight integration with its generative AI modules. Pinecone serverless for prototyping, teams without dedicated infrastructure, or use cases under 10M vectors where managed simplicity is worth the premium.
Conclusion
The vector database market has matured to the point where all three major players are production-ready. The choice between them is no longer about whether they work — it's about which trade-offs match your team's constraints. Performance, cost, and developer experience point to different winners. Define your primary constraint first, then choose accordingly.
Sarah Chen
Head of AI