1. Resolve registry URL
    • NPM: https://registry.npmjs.org/{package_name}/{version}
    • PyPI: https://pypi.org/pypi/{name}/{version}/json
  2. Fetch README & docs
    • readme = github.get_file("README.md")
    • docs_tree = github.list_dir("/docs")
  3. Scrape extra docs via Apify
    • Start URLs: homepage, docs_url.
    • Max pages: 30.
  4. StackOverflow Q&A (votes ≥ 10).
    • API: GET /2.3/questions?order=desc&sort=votes&tagged={pkg}&site=stackoverflow
  5. Chunking
    • Algorithm: recursive text splitter 800 tokens overlap = 50.
  6. Embedding
    • embedding = openai.Embedding.create(input=chunk, model=env.EMBEDDING_MODEL)["data"][0]["embedding"]
  7. Supabase insert
    • INSERT INTO documentation (...)
  8. Profile summary (Gemini prompt)
    • Sections: Install · Quickstart · API Surface.
  9. Cache
    • redis.setex(profile_key, 35*86400, json.dumps(summary))
  10. Cold store
    • boto3.put_object(Bucket=S3_BUCKET, Key=f"profiles/{...}.json") (S3 lifecycle: delete after 60 d).