feat: semantic icon search with VLM descriptions and embeddings #117

Draft
mmacpherson wants to merge 1 commit into main from feat/semantic-search

Summary

Adds embedding-based semantic search so users can find Lucide icons by natural language queries instead of exact name matching.

  • search_icons("payment") → dollar-sign, banknote, receipt, credit-card...
  • search_icons("a cozy cabin in the woods") → tent-tree, armchair, tree-deciduous...
  • search_icons("ennui") → annoyed, meh, frown...

How it works

  1. Build time: Gemini 2.5 Flash Lite generates text descriptions from rendered icon PNGs + Lucide metadata (tags, categories)
  2. Build time: nomic-embed-text-v1.5-Q computes 768d embeddings with asymmetric search_query:/search_document: prefixes
  3. Query time: The user's query is embedded locally via fastembed (ONNX, no GPU) and ranked by cosine similarity against the pre-computed vectors
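
The query-time step above can be sketched in a few lines. This is a minimal illustration, not the code in search.py: the helper names (`as_query`, `as_document`, `cosine_similarity`, `rank_icons`) and the 3-dimensional toy vectors are assumptions made for the example, and the real pipeline would obtain 768d vectors from fastembed rather than hard-coding them. Only the `search_query:` / `search_document:` prefixes come from the description above.

```python
import math

# Asymmetric prefixes used by Nomic embedding models: queries and
# documents are tagged differently before being embedded.
QUERY_PREFIX = "search_query: "
DOCUMENT_PREFIX = "search_document: "


def as_query(text):
    """Prefix a user query before embedding (hypothetical helper)."""
    return QUERY_PREFIX + text


def as_document(text):
    """Prefix an icon description before embedding (hypothetical helper)."""
    return DOCUMENT_PREFIX + text


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as matching nothing
    return dot / (norm_a * norm_b)


def rank_icons(query_vector, icon_vectors, limit=5):
    """Return the `limit` best (icon_name, score) pairs, best first."""
    scored = [
        (name, cosine_similarity(query_vector, vector))
        for name, vector in icon_vectors.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:limit]


# Toy 3-dimensional vectors standing in for pre-computed 768d embeddings.
toy_index = {
    "credit-card": [1.0, 0.0, 0.0],
    "tent-tree": [0.0, 1.0, 0.0],
    "receipt": [1.0, 1.0, 0.0],
}
top_two = rank_icons([1.0, 0.0, 0.0], toy_index, limit=2)
```

With roughly 1,700 icons a brute-force scan like this is cheap, so no approximate nearest-neighbor index is needed for the sketch to be representative of the approach.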

What's included

  • search.py: Public API — search_icons(), search_available(), SearchResult
  • build_search.py: VLM + embedding pipeline with JSONL as durable intermediate
  • build_clusters.py: HDBSCAN discovers 88 semantic themes, Gemini Flash names them
  • lucide CLI: Unified subcommands — db, describe, build-search, search, cluster, version
  • Relational metadata: icon_tags (12,619), icon_categories (3,309), icon_aliases (248) in main DB
  • Pre-built data: 1,703 icon descriptions + embeddings + cluster assignments for Lucide v0.577.0
  • Inline icon rendering: Kitty graphics protocol for Ghostty/kitty/WezTerm
  • UMAP + HDBSCAN visualizations: Interactive Plotly HTML maps of the embedding space
  • 62 tests, ruff + mypy clean

Install & try

pip install 'python-lucide[search]'
lucide search "love"
lucide search "hard work and determination" -v

Optional extra keeps base package lightweight

pip install python-lucide            # zero deps, 796KB — unchanged
pip install 'python-lucide[search]'  # adds fastembed (ONNX Runtime)

Search DB (~8MB) downloads on first use and is cached in ~/.cache/python-lucide/.
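
The download-on-first-use behavior could look roughly like the sketch below. It is an assumption-laden illustration rather than the package's actual code: `ensure_search_db`, the `download` callback, and the cached file name are all hypothetical, and the real release URL is deliberately left out. Only the cache directory comes from the sentence above.

```python
from pathlib import Path

# Cache location named in the PR description.
CACHE_DIR = Path.home() / ".cache" / "python-lucide"


def ensure_search_db(lucide_version, download, cache_dir=CACHE_DIR):
    """Return the cached search DB path, fetching it only if missing.

    `download` is a caller-supplied function that writes the DB to the
    path it is given (hypothetical; the real fetch logic is not shown).
    """
    # Hypothetical file name; the actual name in the package may differ.
    db_path = Path(cache_dir) / f"search-v{lucide_version}.db"
    if db_path.exists():
        return db_path  # cache hit: no network access needed

    Path(cache_dir).mkdir(parents=True, exist_ok=True)
    # Write to a temporary file first so an interrupted download
    # never leaves a truncated DB at the final path.
    tmp_path = db_path.with_suffix(".part")
    download(tmp_path)
    tmp_path.replace(db_path)
    return db_path
```

Passing the downloader in as a function keeps the caching logic testable without network access, which is the main reason for the shape chosen here.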

Test plan

  • 62 unit tests covering search API, build pipeline, CLI subcommands, clustering
  • Quality smoke test with 10-icon subset (positive/negative controls)
  • Full 1,703-icon index verified with diverse queries
  • All pre-commit hooks pass (ruff, mypy, pytest)
  • Test on clean install without dev dependencies
  • Verify search DB auto-download from GitHub release

🤖 Generated with Claude Code

mmacpherson force-pushed the feat/semantic-search branch from d8c0dac to 23a6efa on April 2, 2026 04:58
Natural language search for Lucide icons ("payment", "hard work", "bird")
using cosine similarity on Nomic embeddings, with Gemini-generated
descriptions providing rich semantic signal.

Architecture:
- Build time: Gemini 2.5 Flash Lite generates descriptions from rendered
  icon PNGs + Lucide metadata, fastembed computes 768d Nomic embeddings
- Runtime: `pip install python-lucide[search]` adds only fastembed;
  search DB (~8 MB) auto-downloads on first use from GitHub releases
- Search DB is NOT in the wheel — built at publish time and uploaded as
  a GitHub release artifact (search-v{lucide_version})

Includes:
- `search_icons()` public API and `lucide search` CLI command
- `lucide describe`, `lucide build-search`, `lucide cluster` build tools
- UMAP + HDBSCAN cluster discovery with Gemini-generated theme names
- Embedding visualizations and cluster map tooling
- Pre-built descriptions for 1,694 Lucide v1.7.0 icons
- CI: publish workflow builds and uploads search DB to GitHub releases
- CI: weekly update workflow regenerates descriptions for new icons
- 62 tests (search, build pipeline, CLI)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
mmacpherson force-pushed the feat/semantic-search branch from 9309c7f to 0966851 on April 2, 2026 05:11