- Resolve registry URL
- NPM:
https://registry.npmjs.org/{package_name}/{version}
- PyPI:
https://pypi.org/pypi/{name}/{version}/json
- Fetch README & docs
readme = github.get_file("README.md")
docs_tree = github.list_dir("/docs")
- Scrape extra docs via Apify
- Start URLs:
homepage
, docs_url
.
- Max pages: 30.
- StackOverflow Q&A (votes ≥ 10).
- API:
GET /2.3/questions?order=desc&sort=votes&tagged={pkg}&site=stackoverflow
- Chunking
- Algorithm: recursive text splitter 800 tokens overlap = 50.
- Embedding
embedding = openai.Embedding.create(input=chunk, model=env.EMBEDDING_MODEL)["data"][0]["embedding"]
- Supabase insert
INSERT INTO documentation (...)
- Profile summary (Gemini prompt)
- Sections: Install · Quickstart · API Surface.
- Cache
redis.setex(profile_key, 35*86400, json.dumps(summary))
- Cold store
boto3.put_object(Bucket=S3_BUCKET, Key=f"profiles/{...}.json")
(S3 lifecycle: delete after 60 d).