Delve: AI-Powered Taxonomy Generation

Delve is a production-ready SDK and CLI for automatically generating taxonomies from your data using state-of-the-art language models.

📚 Read the full documentation →

Quick Start

Installation

pip install delve-taxonomy

# Set API keys
export ANTHROPIC_API_KEY="your-key-here"
export OPENAI_API_KEY="your-key-here"  # Required for classifier embeddings

CLI

# Basic usage (shows progress spinners)
delve run data.csv --text-column text

# With progress bars and ETA
delve run data.csv --text-column text -v

# Quiet mode (errors only)
delve run data.csv --text-column text -q

# JSON with nested data
delve run data.json --json-path "$.messages[*].content"

Python SDK

from delve import Delve, Verbosity

# Initialize client (silent by default - library best practice)
delve = Delve()

# Or with progress output
delve = Delve(verbosity=Verbosity.NORMAL)

# Run taxonomy generation
result = delve.run_sync("data.csv", text_column="text")

# Access results
print(f"Generated {len(result.taxonomy)} categories")
for category in result.taxonomy:
    print(f"  - {category.name}: {category.description}")

# Access labeled documents
for doc in result.labeled_documents[:5]:
    print(f"  [{doc.category}] {doc.content[:50]}...")

Binary Detection (Single Category)

For fast filtering when you know the category you're looking for:

from delve import Delve

# Find all refund-related documents (~$1-2 for 30K docs, runs in minutes)
result = Delve.find_matches(
    "data.csv",
    category={
        "name": "Refund Request",
        "description": "User asking for refund or money back",
        "keywords": ["refund", "money back", "cancel"],
    },
    text_column="text",
    threshold=0.6,
)

print(f"Found {result.stats['matches']} matches")
for doc in result.matched_documents[:5]:
    print(f"  [{doc.confidence:.2f}] {doc.content[:50]}...")

Features

Automated Taxonomy Generation - No manual category creation using Claude 3.5 Sonnet
Binary Detection - Fast, cheap single-category filtering with find_matches()
Multiple Data Sources - CSV, JSON/JSONL, LangSmith runs, pandas DataFrames
Smart Categorization - Iterative refinement with minibatch clustering
Flexible Exports - JSON, CSV, and Markdown reports

Requirements

Python 3.9+
Anthropic API key (for taxonomy generation)
OpenAI API key (for classifier embeddings when sample_size > 0)

Documentation

Development

# Install dependencies
uv sync

# Run tests
pytest tests/

# Run linting
ruff check src/

# Format code
ruff format src/

Documentation Development

To work on the documentation locally, you'll need Node.js 20.17+ (for Mintlify):

# If using nvm, the project includes .nvmrc
nvm use

# Install Mintlify CLI (if not already installed)
npm install -g mintlify

# Run the docs server
cd docs
mintlify dev

See the full documentation for more details on contributing and development.

Name		Name	Last commit message	Last commit date
Latest commit History 41 Commits
.github/workflows		.github/workflows
.langgraph_api		.langgraph_api
docs		docs
examples		examples
src		src
tests		tests
.DS_Store		.DS_Store
.codespellignore		.codespellignore
.env.example		.env.example
.gitignore		.gitignore
.nvmrc		.nvmrc
CHANGELOG.md		CHANGELOG.md
LICENSE		LICENSE
Makefile		Makefile
README.md		README.md
pyproject.toml		pyproject.toml
setup_venv.sh		setup_venv.sh

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Delve: AI-Powered Taxonomy Generation

Quick Start

Installation

CLI

Python SDK

Binary Detection (Single Category)

Features

Requirements

Documentation

Development

Documentation Development

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Delve: AI-Powered Taxonomy Generation

Quick Start

Installation

CLI

Python SDK

Binary Detection (Single Category)

Features

Requirements

Documentation

Development

Documentation Development

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages