feat: semantic icon search with VLM descriptions and embeddings #117
Draft
mmacpherson wants to merge 1 commit into main
Natural language search for Lucide icons ("payment", "hard work", "bird")
using cosine similarity on Nomic embeddings, with Gemini-generated
descriptions providing rich semantic signal.
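The ranking step described above can be illustrated with a minimal sketch. It is not the PR's implementation: the 3-dimensional vectors are made-up toy data standing in for the 768d embeddings, and the helper names (`as_query`, `as_document`, `rank_icons`) are hypothetical. The `search_query:` / `search_document:` strings are the task prefixes that Nomic embedding models expect on query and document text.

```python
import math

# Task prefixes used by Nomic embedding models (mentioned in the PR description).
def as_query(text):
    return "search_query: " + text

def as_document(text):
    return "search_document: " + text

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 for zero vectors)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def rank_icons(query_vec, icon_vecs, limit=3):
    """Return the `limit` best (name, score) pairs, highest similarity first."""
    scored = [(name, cosine_similarity(query_vec, vec))
              for name, vec in icon_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:limit]

# Toy vectors only; real vectors would come from embedding
# as_document(description) for each icon and as_query(user_text) for the query.
TOY_ICONS = {
    "credit-card": [0.9, 0.1, 0.0],
    "banknote": [0.8, 0.2, 0.1],
    "bird": [0.0, 0.1, 0.9],
}

top = rank_icons([1.0, 0.0, 0.0], TOY_ICONS, limit=2)
```

With the toy data, `top` lists `credit-card` first and `banknote` second, because their vectors point most nearly in the same direction as the query vector.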
Architecture:
- Build time: Gemini 2.5 Flash Lite generates descriptions from rendered
icon PNGs + Lucide metadata, fastembed computes 768d Nomic embeddings
- Runtime: `pip install python-lucide[search]` adds only fastembed;
search DB (~8 MB) auto-downloads on first use from GitHub releases
- Search DB is NOT in the wheel — built at publish time and uploaded as
a GitHub release artifact (search-v{lucide_version})
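The download-on-first-use behaviour can be sketched roughly as follows. This is an assumption-laden illustration, not the code in this PR: the file name `lucide-search.db`, the per-tag subdirectory, and the `fetch` callable (a stand-in for the HTTP download from GitHub releases) are all hypothetical. Only the cache directory and the `search-v{lucide_version}` tag format come from the description.

```python
from pathlib import Path

# Cache location named in the PR description.
CACHE_DIR = Path.home() / ".cache" / "python-lucide"

def release_tag(lucide_version):
    """Release tag format from the commit message: search-v{lucide_version}."""
    return f"search-v{lucide_version}"

def ensure_search_db(lucide_version, fetch, cache_dir=CACHE_DIR,
                     filename="lucide-search.db"):
    """Return the cached DB path, calling `fetch` only if the file is missing.

    `fetch` is a hypothetical callable taking a release tag and returning the
    DB contents as bytes; `filename` is likewise an assumed name.
    """
    tag = release_tag(lucide_version)
    target = Path(cache_dir) / tag / filename
    if target.exists():
        return target  # already downloaded on an earlier call
    target.parent.mkdir(parents=True, exist_ok=True)
    # Write to a temporary name first so an interrupted download
    # never leaves a truncated file at the final path.
    tmp = target.parent / (target.name + ".part")
    tmp.write_bytes(fetch(tag))
    tmp.replace(target)
    return target
```

Passing the downloader in as a parameter keeps the caching logic testable without network access, and keying the cache path on the release tag means a new Lucide version would trigger a fresh download.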
Includes:
- `search_icons()` public API and `lucide search` CLI command
- `lucide describe`, `lucide build-search`, `lucide cluster` build tools
- UMAP + HDBSCAN cluster discovery with Gemini-generated theme names
- Embedding visualizations and cluster map tooling
- Pre-built descriptions for 1,694 Lucide v1.7.0 icons
- CI: publish workflow builds and uploads search DB to GitHub releases
- CI: weekly update workflow regenerates descriptions for new icons
- 62 tests (search, build pipeline, CLI)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Summary
Adds embedding-based semantic search so users can find Lucide icons by natural language queries instead of exact name matching.
- `search_icons("payment")` → dollar-sign, banknote, receipt, credit-card...
- `search_icons("a cozy cabin in the woods")` → tent-tree, armchair, tree-deciduous...
- `search_icons("ennui")` → annoyed, meh, frown...

How it works
- `search_query:` / `search_document:` prefixes

What's included
- `search.py`: public API (`search_icons()`, `search_available()`, `SearchResult`)
- `build_search.py`: VLM + embedding pipeline with JSONL as a durable intermediate
- `build_clusters.py`: HDBSCAN discovers 88 semantic themes, Gemini Flash names them
- `lucide` CLI: unified subcommands (`db`, `describe`, `build-search`, `search`, `cluster`, `version`)
- `icon_tags` (12,619), `icon_categories` (3,309), `icon_aliases` (248) in main DB

Install & try
Optional extra keeps the base package lightweight.
Search DB (~8 MB) downloads on first use and is cached in `~/.cache/python-lucide/`.

Test plan
🤖 Generated with Claude Code