GPT-4o · Pinecone · tree-sitter
Semantic search over yt-dlp's 120,000-line source. Ask in plain English. DevLens retrieves the exact functions and returns cited, grounded answers.
Under the hood
A three-stage pipeline from raw source code to cited, grounded answers, built on production-grade components.
Step 01
Every Python file is parsed with tree-sitter to build a
concrete syntax tree. Top-level functions and classes are extracted as
self-contained semantic chunks, each with its full source and exact
line range.
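The chunking in Step 01 can be sketched in a few lines. This illustration uses Python's built-in ast module standing in for tree-sitter (the real pipeline walks a tree-sitter concrete syntax tree; the sample source and file name here are invented), but the idea is the same: pull out every top-level function and class as a self-contained chunk with its exact line range.

```python
import ast

# Tiny stand-in for a real yt-dlp source file (illustrative only).
SOURCE = '''\
import re

def extract_id(url):
    """Pull the video id out of a URL."""
    return re.search(r"v=(\\w+)", url).group(1)

class Downloader:
    def run(self):
        pass
'''

def chunk(source, path="example.py"):
    """Extract top-level functions and classes as semantic chunks."""
    tree = ast.parse(source)
    chunks = []
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            chunks.append({
                "path": path,
                "name": node.name,
                "start": node.lineno,        # exact line range, as in Step 01
                "end": node.end_lineno,
                "source": ast.get_source_segment(source, node),
            })
    return chunks

for c in chunk(SOURCE):
    print(c["name"], c["start"], c["end"])  # e.g. "extract_id 3 5"
```

Keeping whole functions and classes as chunks means each embedded unit is syntactically complete, which is what lets the answer stage cite a real, runnable span of code rather than an arbitrary window of lines.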
Step 02
Each chunk is encoded into a 1,536-dimensional vector using OpenAI's
text-embedding-3-small. Vectors are stored in Pinecone
for fast approximate nearest-neighbour retrieval via semantic similarity,
not keyword matching.
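What Step 02's nearest-neighbour retrieval does can be shown with a toy in-memory index. The three-dimensional vectors and chunk names below are made up (real embeddings come from text-embedding-3-small and have 1,536 dimensions, and Pinecone uses approximate rather than exhaustive search), but the ranking by cosine similarity is the same idea:

```python
import math

def cosine(a, b):
    """Cosine similarity: angle between vectors, not shared keywords."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Toy index: chunk name -> embedding (hand-made, 3-d for illustration).
index = {
    "extract_info": [0.9, 0.1, 0.0],
    "download_video": [0.2, 0.9, 0.1],
    "parse_playlist": [0.1, 0.2, 0.95],
}

def query(vec, top_k=2):
    """Return the top_k chunk names most similar to the query vector."""
    scored = sorted(index.items(), key=lambda kv: cosine(vec, kv[1]), reverse=True)
    return [name for name, _ in scored[:top_k]]

print(query([0.85, 0.15, 0.05]))  # → ['extract_info', 'download_video']
```

Because similarity is measured in embedding space, a question phrased with none of the code's identifiers can still land on the right chunk, which is the point of semantic rather than keyword search.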
Step 03
Your question is embedded in real time. The top-8 most similar
chunks are passed to GPT-4o as grounding context. It
synthesises a precise answer backed exclusively by real source code,
with file path and line number citations.
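The grounding step can be sketched as prompt assembly: the retrieved chunks, each tagged with its file path and line range, become the only context the model may answer from. The chunk data and prompt wording below are illustrative, not DevLens's actual prompt, and the call to GPT-4o itself is omitted:

```python
def build_prompt(question, chunks):
    """Format retrieved chunks into a citation-ready grounding prompt."""
    context = "\n\n".join(
        f"[{c['path']}:{c['start']}-{c['end']}]\n{c['source']}"
        for c in chunks
    )
    return (
        "Answer using ONLY the source code below. "
        "Cite file paths and line numbers in square brackets.\n\n"
        f"{context}\n\nQuestion: {question}"
    )

# Hypothetical retrieval result (path, lines, and source are invented).
chunks = [
    {"path": "yt_dlp/utils.py", "start": 120, "end": 134,
     "source": "def sanitize_filename(s):\n    ..."},
]
prompt = build_prompt("How are filenames sanitized?", chunks)
print(prompt.splitlines()[0])
```

Embedding the path:start-end tag directly in the context is what lets the model's answer carry verifiable citations back to real lines of source.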