
Eat Your Own Dog Food

Daniel Lopes

Modern applications need to find similar content: whether that's matching user queries to relevant documents, recommending products, or powering AI features like semantic search. The technology behind this (called vector databases) forces developers into an uncomfortable choice: pay premium prices for managed solutions, or manage complex self-hosted systems. A serverless vector database that eliminates infrastructure overhead while delivering competitive performance at a fraction of the cost changes this equation entirely.
Before diving into Turbopuffer's specific approach, it's important to understand why developers should care about vector databases. When you need to find "similar" content, like matching a user's question to relevant documentation, powering recommendation engines, or enabling AI chatbots to reference relevant context, traditional keyword search falls short. Vector databases convert content into mathematical representations (vectors) that capture semantic meaning, enabling applications to find truly relevant results rather than just keyword matches. This capability has become essential for everything from AI-powered search to recommendation systems to Retrieval-Augmented Generation (RAG) applications.
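To make that concrete, here is a toy sketch of vector similarity, with hand-made three-dimensional vectors standing in for the high-dimensional embeddings a real model would produce; cosine similarity is one common way to score semantic closeness:

import numpy as np

# Toy embeddings; a real embedding model produces hundreds of dimensions
query = np.array([0.1, 0.2, 0.3])
doc_a = np.array([0.11, 0.19, 0.31])  # semantically close to the query
doc_b = np.array([0.9, -0.4, 0.05])   # unrelated content

def cosine_similarity(a, b):
    # Cosine of the angle between two vectors: 1.0 means same direction
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine_similarity(query, doc_a))  # near 1.0: a strong semantic match
print(cosine_similarity(query, doc_b))  # much lower: a weak match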
Traditional vector databases force you to choose between convenience and cost: managed services offer simplicity at premium pricing, while self-hosted solutions demand significant operational overhead. Turbopuffer targets this trade-off directly, delivering vector search at roughly a 10x cost reduction compared to alternatives.
Turbopuffer is a serverless vector database that combines the operational simplicity of managed services with the cost efficiency of self-hosted solutions. Its hybrid architecture separates storage from compute, using object storage for durable persistence and NVMe SSD caching for query performance, so each layer scales independently.
Market Positioning
In the vector database landscape, Turbopuffer occupies a unique position as a serverless-first solution. Unlike traditional SaaS models or self-hosted approaches, Turbopuffer's architecture enables automatic scaling without capacity planning while maintaining cost predictability through usage-based billing.
Turbopuffer emerged from the recognition that existing vector database architectures weren't optimized for the elastic, cost-sensitive nature of modern AI applications. Traditional approaches tie storage and compute together, forcing you to pay for idle capacity or risk performance degradation during traffic spikes.
High-Level Architecture
The system employs a hybrid serverless design that fundamentally separates state management from compute operations.
This separation enables the system to scale compute resources independently while maintaining data durability through persistent object storage. The result is a database that can handle massive scale without requiring infrastructure management.
You can build search systems that are both fast and precise while filtering results by metadata, something many vector databases struggle with. This works through the SPFresh algorithm, a specialized indexing system designed to support efficient native filtering. Rather than treating filtering as an afterthought, Turbopuffer combines attribute and vector indexes seamlessly.
To find similar content quickly among millions of documents, Turbopuffer uses centroid-based approximate nearest neighbor (ANN) indexing, combined with a hybrid index architecture that joins attribute and vector indexes so complex queries run without performance degradation. The sketch below illustrates the centroid idea.
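As a rough illustration of centroid-based ANN (a self-contained toy, not Turbopuffer's SPFresh implementation): vectors are grouped around centroids at index time, and a query scans only the few clusters whose centroids are nearest, rather than every vector.

import numpy as np

rng = np.random.default_rng(0)
vectors = rng.normal(size=(5000, 32))  # stand-in document embeddings

# Build step: pick k centroids and assign each vector to its nearest one
k = 16
centroids = vectors[rng.choice(len(vectors), size=k, replace=False)]
assignments = np.argmin(
    np.linalg.norm(vectors[:, None, :] - centroids[None, :, :], axis=2), axis=1
)

def ann_query(query, n_probe=4, top_k=10):
    # Scan only the n_probe clusters whose centroids are closest to the query,
    # instead of all 5000 vectors
    nearest_clusters = np.argsort(np.linalg.norm(centroids - query, axis=1))[:n_probe]
    candidates = np.where(np.isin(assignments, nearest_clusters))[0]
    dists = np.linalg.norm(vectors[candidates] - query, axis=1)
    return candidates[np.argsort(dists)[:top_k]]

print(ann_query(rng.normal(size=32)))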
Use Case Example:
# Find similar documents with specific metadata constraints
results = ns.query(
    vector=[0.1, 0.2, 0.3],
    filters={'category': 'technical', 'date': {'$gte': '2024-01-01'}},
    top_k=10,
    include_attributes=True
)

You can drive all vector operations through a REST-based interface with JSON encoding, which supports both basic vector operations and advanced query patterns and is easy to call from TypeScript AI agent frameworks.
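Because the interface is plain HTTPS with JSON bodies, you can call it without an SDK. The sketch below is illustrative only: the exact endpoint path and payload shape are assumptions rather than details from this article, so verify them against the current API reference:

import os
import requests

namespace = "documents"
# Assumed v1-style endpoint layout (an assumption, not confirmed above)
url = f"https://api.turbopuffer.com/v1/namespaces/{namespace}/query"

response = requests.post(
    url,
    headers={"Authorization": f"Bearer {os.environ['TURBOPUFFER_API_KEY']}"},
    json={"vector": [0.1, 0.2, 0.3], "top_k": 10, "include_attributes": True},
    timeout=30,
)
response.raise_for_status()
print(response.json())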
Advanced Query Capabilities:
# Multi-query with different search types
results = ns.query([
    {'vector': [0.1, 0.2], 'top_k': 5},
    {'query': 'machine learning', 'top_k': 10},
    {'filters': {'type': 'research'}, 'top_k': 3}
])

You can build AI agents with Turbopuffer using official SDKs for Python 3.8+, TypeScript/JavaScript, Go, Java, and Ruby, plus a community-maintained Rust client, all with consistent APIs and the strong typing that TypeScript AI agent frameworks expect.
Python SDK Features: the package installs with an optional performance extra (pip install turbopuffer[fast]) and ships an AsyncTurbopuffer client for asynchronous workloads.

import turbopuffer as tpuf
# Initialize with automatic API key detection
client = tpuf.Turbopuffer()  # Uses TURBOPUFFER_API_KEY
ns = client.namespace("documents")

# Batch upsert with attributes
ns.upsert(
    ids=[1, 2, 3],
    vectors=[[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]],
    attributes={
        "title": ["Doc 1", "Doc 2", "Doc 3"],
        "category": ["tech", "business", "tech"]
    }
)

You can deploy vector search systems that scale automatically thanks to the platform's caching layer, which combines intelligent NVMe SSD and memory caching: for 1M vectors, cold queries come in at p90=444ms versus p50=8ms for warm queries.
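You can observe the cold/warm gap yourself with a simple timing check, assuming the ns handle from the examples above and any query vector:

import time

def timed_query_ms(ns, vector):
    # Time a single query round trip in milliseconds
    start = time.perf_counter()
    ns.query(vector=vector, top_k=10)
    return (time.perf_counter() - start) * 1000

vector = [0.1, 0.2, 0.3]
cold_ms = timed_query_ms(ns, vector)  # first touch may read from object storage
warm_ms = timed_query_ms(ns, vector)  # repeat is served from the NVMe/memory cache
print(f"cold={cold_ms:.0f}ms warm={warm_ms:.0f}ms")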
You can deploy Turbopuffer with minimal configuration thanks to its serverless design and official SDK support across multiple programming languages, making it straightforward to integrate with TypeScript AI agent frameworks.
1. Install the SDK
# Python with performance optimizations
pip install turbopuffer[fast]

# TypeScript/JavaScript
npm install @turbopuffer/turbopuffer

# Go
go get github.com/turbopuffer/turbopuffer-go

2. Authentication

export TURBOPUFFER_API_KEY="your_api_key_here"

3. Basic Implementation
import turbopuffer as tpuf

# Initialize client (auto-detects API key)
client = tpuf.Turbopuffer()
ns = client.namespace("quickstart")

# Add documents with vectors and metadata
ns.upsert(
    ids=[1, 2],
    vectors=[[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]],
    attributes={
        "text": ["First document", "Second document"],
        "category": ["demo", "example"]
    }
)

# Search for similar content
results = ns.query(
    vector=[0.1, 0.2, 0.3],
    top_k=5,
    include_attributes=True
)
for result in results:
    print(f"Score: {result.similarity}, Text: {result.attributes['text']}")

Beyond the basics, several developer-friendly features streamline the development process.
Production deployments require careful attention to tuning parameters and architectural decisions that can significantly affect both performance and cost in agentic workflows.
Query Concurrency Tuning:
# Configure for high-throughput scenarios
client = tpuf.Turbopuffer(
    max_concurrent_queries=16,  # Default per namespace
    max_wait_time_ms=800        # Maximum wait for a query slot
)

You can also tune cache management and indexing for better agent orchestration performance.
Optimizations like these enable sophisticated memory retrieval patterns for AI agents that need to access contextual information efficiently, as in the sketch below.
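For example, a minimal agent-memory retrieval pattern might look like the following, where embed_text is a hypothetical stand-in for whichever embedding model you use:

import turbopuffer as tpuf

client = tpuf.Turbopuffer()
memory = client.namespace("agent_memory")

def embed_text(text: str) -> list[float]:
    # Hypothetical: call your embedding model of choice here
    raise NotImplementedError

def recall(question: str, top_k: int = 5) -> list[str]:
    # Fetch the stored memories most relevant to the agent's current question
    results = memory.query(
        vector=embed_text(question),
        top_k=top_k,
        include_attributes=True,
    )
    return [r.attributes["text"] for r in results]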
Namespace Organization Strategy: Namespace design is the most critical architectural decision when building TypeScript AI agent frameworks on Turbopuffer, because per-namespace limits require careful planning:
# Recommended: Multiple smaller namespaces
user_docs = client.namespace(f"user_{user_id}_docs")
product_search = client.namespace("product_search_main")

# Avoid: Single large namespace approaching limits
# global_docs = client.namespace("all_documents")  # Will hit 250M doc limit

Efficient Filtering Patterns:
# Optimized: Specific filter patterns
results = ns.query(
    vector=query_vector,
    filters={'status': 'published', 'category': {'$in': ['tech', 'science']}},
    top_k=20
)

# Avoid: Expensive glob patterns in the middle
# filters={'title': {'$glob': '*report*'}}  # Triggers full scan

Batching for Discounts:
# Efficient: Batch up to 256 MB for cost savings
batch_ids = []
batch_vectors = []
batch_attributes = {}
for document in large_dataset:
    batch_ids.append(document.id)
    batch_vectors.append(document.vector)
    # ... collect batch data
    if len(batch_vectors) >= 1000:  # Or size-based threshold
        ns.upsert(ids=batch_ids, vectors=batch_vectors, attributes=batch_attributes)
        batch_ids.clear()
        batch_vectors.clear()
        batch_attributes.clear()

ID Type Optimization: Choose efficient ID formats to minimize storage overhead:
# Prefer compact integer IDs (e.g. user_id: int) over long UUID strings

You can see how Turbopuffer powers vector search for several high-profile applications; the most documented success story is Cursor's implementation for AI-powered code assistance.
Cursor, the AI-powered code editor, migrated to Turbopuffer to manage billions of vectors across millions of codebases.
Technical Implementation:
# Cursor's approach to namespace sharding for scale
def get_codebase_namespace(repo_id):
    shard = hash(repo_id) % 10  # Distribute across 10 namespaces
    return client.namespace(f"codebase_shard_{shard}")

# Index code embeddings with metadata
ns = get_codebase_namespace(repo_id)
ns.upsert(
    vectors=code_embeddings,
    attributes={
        'file_path': file_paths,
        'function_name': function_names,
        'language': languages
    }
)

Readwise: Powers AI search features in their knowledge management platform, enabling semantic search across user-saved articles and highlights.
Enterprise Scale: The platform currently serves over 1 trillion documents across all deployments, processing 10 million+ writes per second and serving 10,000+ queries per second.
Understanding Turbopuffer's constraints is crucial for architectural planning when building AI agents, as several fundamental limits can impact scalability.
Per-Namespace Document Limits: The most significant constraints apply per namespace rather than globally; as the sharding examples above note, a single namespace currently tops out around 250 million documents.
Document and Attribute Constraints:
# Maximum limits to consider in design
document = {
    'id': 'max_128_bytes',          # ID length limit
    'vector': [0.1] * 10752,        # Max dimensions (affects cost/latency)
    'attributes': {
        'large_text': 'max_8_MiB',  # Per-attribute value limit
        'filterable': 'max_4_KiB'   # Filterable values are much smaller
    }
}
# Maximum 256 attributes per namespace total

Cache-Dependent Performance: Cold queries can be roughly 50x slower than warm queries (p90=444ms cold versus p50=8ms warm), making cache warming strategies critical for consistent performance in agentic workflows.
Expensive Filter Patterns:
# Avoid: Glob patterns with wildcards in the middle
{'title': {'$glob': '*report*'}}  # Triggers full namespace scan

# Prefer: Prefix or suffix patterns
{'title': {'$glob': 'report*'}}   # Uses index efficiently

Namespace Sharding Complexity: Large applications require manual sharding across namespaces, adding operational complexity:
# Required pattern for scale beyond single namespace limits
import hashlib

def get_document_namespace(doc_id):
    # Use a stable hash: Python's built-in hash() is salted per process for strings
    shard = int(hashlib.md5(str(doc_id).encode()).hexdigest(), 16) % 10
    return client.namespace(f"docs_shard_{shard}")

# Must implement cross-shard querying manually
results = []
for shard in range(10):
    ns = client.namespace(f"docs_shard_{shard}")
    shard_results = ns.query(vector=query_vector, top_k=20)
    results.extend(shard_results)

# Merge and re-rank the per-shard candidates into a global top-k
results.sort(key=lambda r: r.similarity, reverse=True)
results = results[:20]

Minimum Commitments: Monthly minimums don't roll over, requiring careful capacity planning despite the serverless architecture.
Filterable Attribute Overhead: Making attributes filterable costs 50% more for indexing, requiring strategic decisions about which metadata needs filtering capability.
You can deploy Turbopuffer using two deployment models to address different organizational requirements and compliance needs.
The default deployment requires minimal setup and provides immediate access to the full platform capabilities:
# Simple cloud deployment setup
export TURBOPUFFER_API_KEY="your_api_key"
pip install turbopuffer[fast]

For organizations requiring data sovereignty or custom security controls, BYOC (Bring Your Own Cloud) deployment supports Kubernetes environments across AWS, GCP, and Azure.
Kubernetes Configuration:
# Example Helm values for production BYOC
turbopuffer:
  cache:
    diskBudget: "95%"
  performance:
    maxConcurrentQueries: 16
    maxWaitTimeMs: 800
  indexing:
    minUnindexed: 5000
    maxUnindexed: 50000

Turbopuffer's caching system provides predictable performance for frequently accessed data (p50=8ms warm queries), but cold queries can experience higher latency (p90=444ms). The system automatically warms caches based on access patterns, and production deployments should implement cache warming strategies for critical data paths in AI agent workflows.
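One simple warming strategy, sketched below, is to periodically replay a representative set of queries so hot data stays in the NVMe/memory cache; representative_queries is a hypothetical mapping you would build from your own traffic:

import time
import turbopuffer as tpuf

client = tpuf.Turbopuffer()

# Hypothetical: representative query vectors per critical namespace
representative_queries = {
    "docs_shard_0": [[0.1, 0.2, 0.3]],
    "docs_shard_1": [[0.4, 0.5, 0.6]],
}

def warm_caches(interval_seconds=300):
    # Replay representative queries on a schedule to keep caches warm
    while True:
        for name, vectors in representative_queries.items():
            ns = client.namespace(name)
            for vector in vectors:
                ns.query(vector=vector, top_k=1)
        time.sleep(interval_seconds)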
What happens when a namespace approaches its document limit?

You'll need to implement namespace sharding before reaching the limit. The recommended approach is horizontal partitioning based on logical boundaries (user IDs, content categories, time ranges). Turbopuffer is expanding limits to 1 billion documents per namespace, but architectural planning for sharding is still recommended for applications expecting massive scale.
How does Turbopuffer compare with self-hosting on cost?

While self-hosted solutions have lower direct costs, Turbopuffer eliminates infrastructure management, monitoring, backup, and scaling responsibilities. For variable workloads, Turbopuffer's usage-based pricing often results in lower total cost of ownership. Fixed, high-volume workloads may find self-hosted solutions more cost-effective if you have the operational expertise.
Can I migrate from an existing vector database?

Yes, you can run data migrations using Turbopuffer's REST API and SDKs with bulk import operations. For large migrations, implement batching strategies up to 256 MB per batch to optimize costs. The migration process typically involves extracting vectors and metadata from your existing system and using Turbopuffer's upsert operations with appropriate namespace organization; a minimal sketch of that loop follows.
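Here is a minimal version of that migration loop, assuming a hypothetical source_db client that can iterate records with an id, a vector, and text metadata:

import turbopuffer as tpuf

client = tpuf.Turbopuffer()
ns = client.namespace("migrated_docs")

BATCH_SIZE = 1000  # tune toward the 256 MB batch ceiling for write discounts
batch_ids, batch_vectors, batch_text = [], [], []

# source_db is a hypothetical client for the system you are migrating from
for record in source_db.iter_records():
    batch_ids.append(record.id)
    batch_vectors.append(record.vector)
    batch_text.append(record.text)
    if len(batch_ids) >= BATCH_SIZE:
        ns.upsert(ids=batch_ids, vectors=batch_vectors,
                  attributes={"text": batch_text})
        batch_ids, batch_vectors, batch_text = [], [], []

# Flush the final partial batch
if batch_ids:
    ns.upsert(ids=batch_ids, vectors=batch_vectors,
              attributes={"text": batch_text})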
Does Turbopuffer support hybrid search?

Yes: you can build hybrid search systems that natively combine vector similarity with BM25 full-text search. Store your text content in attributes and use multi-query operations to combine vector search results with full-text search results; the system handles scoring and ranking across both search types automatically.
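Using the multi-query form shown earlier, a hybrid request might pair a semantic query with a BM25 text query over the same namespace (embed_text is the hypothetical embedding helper from the agent-memory sketch):

# Hybrid search: semantic and BM25 full-text results in one call
results = ns.query([
    {'vector': embed_text("How do I rotate API keys?"), 'top_k': 10},
    {'query': 'rotate API keys', 'top_k': 10},
])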
How do I get the best query performance?

Focus on cache warming strategies, efficient namespace organization (10 namespaces of 1M documents rather than 1 namespace of 10M), and avoiding expensive filter patterns. Use shorter ID formats (integers rather than UUID strings), batch writes to maintain good indexing thresholds, and consider geographic deployment closer to your users.
What monitoring and observability are available?

Turbopuffer provides built-in metrics for cache hit rates, query performance, and index statistics. BYOC deployments integrate with standard Kubernetes observability platforms. Monitor cache hit rates, query concurrency levels, and indexing lag as key performance indicators for your AI agent workflows.
How should I organize namespaces?

Namespace design is critical for both performance and cost when building TypeScript AI agent frameworks. Multiple smaller namespaces (10 namespaces of 1M documents) generally perform better than a single large namespace thanks to indexing efficiency and cache locality. However, cross-namespace queries require application-level coordination and can add complexity to agentic workflows.
Turbopuffer represents a fundamental shift in vector database architecture, proving that you don't have to choose between operational simplicity and cost efficiency. By separating storage from compute and implementing intelligent caching, it delivers the ease of managed services with economics that scale naturally with usage.
Teams building TypeScript AI agent frameworks can use Turbopuffer for vector search with enterprise-grade performance and reliability, minus the enterprise-grade price tag and operational overhead. While the ecosystem remains smaller than established alternatives, the technical advantages and proven production success at companies like Cursor make it a serious contender for cost-conscious applications that need reliable vector search in agentic workflows.
The serverless approach isn't just about reducing costs: it's about removing infrastructure decisions from the critical path of AI application development, letting teams focus on building great user experiences rather than managing database clusters.