// llms · rag · forward-deployed engineering

Pradeep.

I turn research-grade AI into systems that work in the real world.

From research to production

I specialize in turning research-grade AI into production systems. My work spans the full stack: writing PyTorch training loops, deploying RAG pipelines inside startups and enterprise accounts, and sitting with the users who depend on them.

What interests me is the gap between what LLMs can do in a notebook and what they can reliably do in production. Closing that gap requires engineering rigor, product intuition, and the willingness to stay close to the problem long enough to understand it.

Currently focused on

  • Production RAG systems at enterprise scale
  • Efficient fine-tuning: LoRA, QLoRA, PEFT methods
  • LLM evaluation frameworks and reliability
  • Forward-deployed product strategy for AI companies

Selected work

Retrieval OS

A modular retrieval-augmented generation platform built to handle 10M+ tokens/day across heterogeneous document types. Designed with pluggable embedding backends, hybrid BM25+vector search, and a streaming inference layer. Deployed into 8 Fortune 500 accounts with <200ms P99 latency.

PyTorch · LangChain · Pinecone · FastAPI · TypeScript
retriever.py
# Hybrid retrieval with re-ranking
retriever = HybridRetriever(
    dense=EmbeddingIndex(model="text-embedding-3-large"),
    sparse=BM25Index(corpus=documents),
    reranker=CrossEncoderReranker(top_k=5),
)


async def search(query, tenant):
    # `await` is only valid inside a coroutine, so the query is wrapped here
    return await retriever.query(
        text=query,
        filters={"tenant_id": tenant},  # restrict results to one tenant
        alpha=0.7,  # dense weight
    )
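
The snippet above passes `alpha=0.7` as the dense weight but does not show how the two score sets are combined. One common approach, sketched here as an assumption and not as the Retrieval OS implementation, is to min-max normalize each score set and take a weighted sum. The function names `min_max_normalize` and `fuse_scores`, and the dict-of-scores input shape, are hypothetical.

```python
from typing import Dict, List, Tuple


def min_max_normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale scores to [0, 1] so dense and sparse values are comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All scores are equal: treat every document as equally relevant.
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}


def fuse_scores(
    dense: Dict[str, float],
    sparse: Dict[str, float],
    alpha: float = 0.7,
) -> List[Tuple[str, float]]:
    """Weighted sum of normalized scores; alpha is the dense weight (assumed)."""
    dense_n = min_max_normalize(dense)
    sparse_n = min_max_normalize(sparse)
    fused = {
        # A document missing from one retriever contributes 0 from that side.
        doc_id: alpha * dense_n.get(doc_id, 0.0)
        + (1 - alpha) * sparse_n.get(doc_id, 0.0)
        for doc_id in set(dense_n) | set(sparse_n)
    }
    # Highest fused score first.
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    dense_scores = {"a": 0.9, "b": 0.5, "c": 0.1}     # e.g. cosine similarity
    sparse_scores = {"b": 12.0, "c": 3.0, "d": 7.5}   # e.g. BM25
    for doc_id, score in fuse_scores(dense_scores, sparse_scores, alpha=0.7):
        print(doc_id, round(score, 3))
```

Under this scheme, `alpha=1.0` ranks by dense scores alone and `alpha=0.0` by BM25 alone. Normalizing first matters because raw BM25 scores are unbounded, while similarity scores usually fall in a fixed range.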

FineTune Bench

Reproducible fine-tuning evaluation framework for task-specific LLMs.

PyTorch · HuggingFace · LoRA · W&B · Python

LLM Proxy

Lightweight gateway for multi-provider LLM routing with cost tracking.

Go · Redis · PostgreSQL · Docker

PyTorch Training Recipes
GitHub →

Thoughts

2024 / 11

The hidden cost of RAG: why retrieval quality matters more than generation

Most RAG failures happen before the model sees a single token. Here is how to diagnose and fix retrieval quality at scale.

8 min read

2024 / 08

Forward-deployed engineering: what product managers get wrong about ML

Building AI into enterprise products requires more than good models. It requires sitting with the user until you understand what "good" actually means.

6 min read

2024 / 05

LoRA at scale: lessons from fine-tuning 40+ domain-specific models

Practical observations on rank selection, learning rate schedules, and when PEFT methods stop being enough.

10 min read

2024 / 02

Why I stopped using vector databases for most RAG workloads

The operational overhead of a dedicated vector store is rarely justified. A case for hybrid approaches with Postgres.

7 min read

2023 / 11

Building LLM products: a reading list for engineers moving into PM

The books, essays, and talks that shifted how I think about the product side of AI systems.

4 min read

Say hello