GPT-4o · Pinecone · tree-sitter

Ask your codebase anything.

Semantic search over yt-dlp's 120,000-line source. Ask in plain English. DevLens retrieves the exact functions and returns cited, grounded answers.


Under the hood

How DevLens works.

A three-stage pipeline from raw source code to cited, grounded answers, built on production-grade components.

Step 01

Ingest & Parse

Every Python file is parsed with tree-sitter to build a concrete syntax tree. Top-level functions and classes are extracted as self-contained semantic chunks, each with its full source and exact line range.

tree-sitter Python AST Chunking
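The chunking idea can be sketched in a few lines. The production pipeline uses tree-sitter; this illustrative stand-in uses Python's stdlib ast module, which yields the same top-level function/class chunks with exact line ranges (the sample file and path are made up for the demo):

```python
import ast

def extract_chunks(source: str, path: str) -> list[dict]:
    """Split a Python file into top-level function/class chunks.

    Sketch only: DevLens uses tree-sitter; ast illustrates the same
    idea of self-contained chunks with full source and line ranges.
    """
    tree = ast.parse(source)
    lines = source.splitlines()
    chunks = []
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            start, end = node.lineno, node.end_lineno
            chunks.append({
                "path": path,
                "name": node.name,
                "lines": (start, end),
                "source": "\n".join(lines[start - 1:end]),
            })
    return chunks

# Hypothetical sample file for illustration.
sample = '''\
import re

def slugify(title):
    return re.sub(r"\\W+", "-", title).strip("-").lower()

class Extractor:
    def run(self):
        return "ok"
'''
chunks = extract_chunks(sample, "yt_dlp/utils.py")
```

Each chunk carries everything the later stages need: a citation-ready path and line range plus the full source text that gets embedded.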

Step 02

Embed & Index

Each chunk is encoded into a 1,536-dimensional vector using OpenAI's text-embedding-3-small. Vectors are stored in Pinecone for low-latency approximate nearest-neighbour retrieval via semantic similarity, not keyword matching.

OpenAI Embeddings Pinecone 1536-dim
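A minimal in-memory sketch of what the index does, standing in for Pinecone: store id → (vector, metadata) pairs and return the top-k matches by cosine similarity. In production the vectors are 1,536-dimensional embeddings from text-embedding-3-small; the tiny 3-dimensional vectors here are hand-made for illustration:

```python
import math

class InMemoryIndex:
    """Toy stand-in for Pinecone: exact cosine-similarity search."""

    def __init__(self):
        self.vectors = {}  # id -> (vector, metadata)

    def upsert(self, id_, vector, metadata):
        self.vectors[id_] = (vector, metadata)

    def query(self, vector, top_k=8):
        def cosine(a, b):
            dot = sum(x * y for x, y in zip(a, b))
            na = math.sqrt(sum(x * x for x in a))
            nb = math.sqrt(sum(x * x for x in b))
            return dot / (na * nb)

        # Score every stored vector, highest similarity first.
        scored = sorted(
            ((cosine(vector, v), id_, meta)
             for id_, (v, meta) in self.vectors.items()),
            key=lambda t: t[0],
            reverse=True,
        )
        return [{"id": i, "score": s, "metadata": m} for s, i, m in scored[:top_k]]

# Hypothetical chunk embeddings for the demo.
index = InMemoryIndex()
index.upsert("chunk-1", [0.9, 0.1, 0.0], {"name": "download_video"})
index.upsert("chunk-2", [0.0, 0.1, 0.9], {"name": "parse_subtitles"})
hits = index.query([1.0, 0.0, 0.0], top_k=1)
```

Pinecone replaces the linear scan with an approximate nearest-neighbour index, which is what keeps retrieval fast at six-figure chunk counts.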

Step 03

Retrieve & Answer

Your question is embedded in real time. The top-8 most similar chunks are passed to GPT-4o as grounding context. It synthesises a precise answer backed exclusively by real source code, with file path and line number citations.

GPT-4o RAG Top-K Retrieval
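The grounding step reduces to prompt assembly: format each retrieved chunk with its citation and prepend an instruction to answer only from those chunks. A sketch, with a hypothetical hit shape and the model call left as a comment:

```python
def build_grounding_prompt(question: str, hits: list[dict]) -> str:
    """Assemble the grounded prompt sent to the chat model.

    Sketch only: `hits` is an assumed shape where each retrieved chunk
    carries its file path, line range, and source, so the model can
    cite "path:start-end" for every claim.
    """
    context = "\n\n".join(
        f"# {h['path']}:{h['start']}-{h['end']}\n{h['source']}" for h in hits
    )
    return (
        "Answer using ONLY the source chunks below. "
        "Cite file path and line numbers for every claim.\n\n"
        f"{context}\n\nQuestion: {question}"
    )

# Hypothetical retrieved chunk for the demo.
hits = [{
    "path": "yt_dlp/utils.py",
    "start": 3,
    "end": 4,
    "source": "def slugify(title): ...",
}]
prompt = build_grounding_prompt("How are titles slugified?", hits)

# In production the prompt goes to the chat model, e.g. (assuming the
# openai v1 client): client.chat.completions.create(model="gpt-4o",
#     messages=[{"role": "user", "content": prompt}])
```

Because every chunk arrives tagged with its path and line range, the citations in the answer are copied from retrieval metadata rather than invented by the model.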
Source Files (.py · 120k lines) → tree-sitter (AST chunks) → OpenAI API (embed-3-small) → Pinecone (vector store) → GPT-4o (answer + cite)