
DazzleTreeLib - Universal Tree Traversal Library

Version Python License Platform

DazzleTreeLib is the first Python library to offer a universal adapter system for tree traversal, providing both synchronous and asynchronous traversal behind a single interface. It is currently optimized for high-performance filesystem operations, with a 4-5x caching speedup and production-grade error handling, but the architecture is designed to support any tree-like data structure - from game-development BSTs to JSON manipulation to hierarchical data processing.

⚠️ Pre-Alpha Release: This library is in active development. APIs may change between versions. We welcome feedback and contributions!

Why another tree library?

Have you ever needed to traverse different types of tree structures - filesystems, databases, API hierarchies, JSON documents - but ended up writing similar-but-different code for each one?

Or struggled with existing libraries that are either too limited (filesystem-only) or too complex (full graph theory) when you just need solid tree traversal with good performance?

What about when you need finer control - stopping at specific depths, filtering during traversal, caching results, or processing huge trees efficiently with async/await?

DazzleTreeLib solves these problems with a universal adapter system that works with ANY tree structure while providing powerful traversal controls.

Features

  • Universal Interface: One API for filesystem, database, API, or custom trees

  • Async Support: Full async/await implementation with built-in parallelism and batching (3.3x faster than sync)

  • Flexible Adapters: Easy integration with any tree-like data structure

  • Smart Traversal: Stop at any depth, filter during traversal, control breadth

  • Memory Efficient: Streaming iterators for handling large trees

  • Highly Extensible: Custom adapters, collectors, and traversal strategies

  • High-Performance Intelligent Caching: 4-5x speedup with completeness-aware caching

  • Error Resilient & Production Ready: Structured concurrency, proper error handling, streaming
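The universal-adapter idea can be sketched in plain Python: a generic traversal only ever asks an adapter for a node's children, so any structure becomes traversable once an adapter exists for it. The `DictAdapter` and `bfs` names below are hypothetical illustrations of the pattern, not DazzleTreeLib's actual API.

```python
from collections import deque

class DictAdapter:
    """Toy adapter for nested-dict trees; illustrative only."""
    def get_children(self, node):
        name, subtree = node
        return list(subtree.items()) if isinstance(subtree, dict) else []

def bfs(root, adapter):
    """Generic breadth-first traversal that only talks to the adapter."""
    queue = deque([(root, 0)])
    while queue:
        node, depth = queue.popleft()
        yield node, depth
        for child in adapter.get_children(node):
            queue.append((child, depth + 1))

tree = ("root", {"src": {"main.py": None}, "docs": {"index.md": None}})
names = [name for (name, _sub), _depth in bfs(tree, DictAdapter())]
print(names)  # ['root', 'src', 'docs', 'main.py', 'index.md']
```

Swapping in a different adapter (filesystem, database, API) changes the data source without touching the traversal logic.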

What Makes DazzleTreeLib Different?

Quick Comparison

Feature DazzleTreeLib anytree treelib NetworkX
Universal adapter system ✅ ❌ ❌ ❌
One API for any tree source ✅ ❌ ❌ ❌
Composable adapters ✅ ❌ ❌ ❌
Async/sync feature parity ✅ ❌ ❌ ❌
Built-in caching ✅ ❌ ❌ ❌

For more, see the detailed comparison in docs

✅ DazzleTreeLib is Perfect for:

  • Multi-source tree traversal (files + database + API)
  • Complex filtering and transformation logic
  • Async/await workflows with parallel processing
  • Large trees requiring streaming and caching
  • Custom tree structures needing standard traversal

❌ Consider alternatives for:

  • Simple filesystem-only tasks (use os.scandir - 6-7x faster)
  • Pure graph algorithms (use NetworkX)
  • In-memory-only trees (use anytree or treelib)
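For the simple filesystem-only case, the os.scandir approach recommended above looks roughly like this (plain stdlib, no adapters or async):

```python
import os
import tempfile

# Plain os.scandir recursion: the faster stdlib alternative suggested
# above for simple filesystem-only tasks.
def walk(path):
    with os.scandir(path) as entries:
        for entry in entries:
            yield entry.path
            if entry.is_dir(follow_symlinks=False):
                yield from walk(entry.path)

# Tiny demo tree in a temporary directory
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "sub"))
    open(os.path.join(root, "a.txt"), "w").close()
    found = sorted(os.path.basename(p) for p in walk(root))

print(found)  # ['a.txt', 'sub']
```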

Performance

Benchmark Assessment (Sept. 2025)

Comparison Performance Best Use Case
DazzleTree async vs sync 3.3x faster When using DazzleTreeLib
DazzleTree vs os.scandir 6-7x slower DazzleTree for flexibility, os.scandir for speed
Memory usage ~15MB base + 14MB/1K nodes Acceptable for most applications
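As a worked example of the memory figures above (assuming the growth is linear), a tree of 10,000 nodes would need roughly 15 + 14 × 10 = 155 MB:

```python
def estimated_memory_mb(node_count, base_mb=15, per_1k_mb=14):
    """Rough estimate from the benchmark table (assumed linear scaling)."""
    return base_mb + per_1k_mb * node_count / 1000

print(estimated_memory_mb(10_000))  # 155.0
```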

Quick Start

Installation

# Install from PyPI (recommended)
pip install dazzletreelib

# Or install from source for development:
git clone https://github.com/djdarcy/dazzle-tree-lib.git
cd dazzle-tree-lib
pip install -e .

Basic Usage - Synchronous

from dazzletreelib.sync import FileSystemNode, FileSystemAdapter, traverse_tree

# Simple filesystem traversal
root_node = FileSystemNode("/path/to/directory")
adapter = FileSystemAdapter()

for node, depth in traverse_tree(root_node, adapter):
    print(f"{'  ' * depth}{node.path.name}")

Basic Usage - Asynchronous (3x+ Faster!)

import asyncio
from dazzletreelib.aio import traverse_tree_async

async def main():
    # Async traversal with blazing speed
    async for node in traverse_tree_async("/path/to/directory"):
        print(f"Processing: {node.path}")
        
        # Access file metadata asynchronously
        size = await node.size()
        if size and size > 1_000_000:  # Files > 1MB
            print(f"  Large file: {size:,} bytes")

asyncio.run(main())

Real-World Examples

Find Large Files Efficiently

from dazzletreelib.aio import traverse_tree_async
import asyncio

async def find_large_files(root_path, min_size_mb=10):
    """Find all files larger than specified size."""
    large_files = []
    
    async for node in traverse_tree_async(root_path):
        if node.path.is_file():
            size = await node.size()
            if size and size > min_size_mb * 1024 * 1024:
                large_files.append((node.path, size))
    
    # Sort by size descending
    large_files.sort(key=lambda x: x[1], reverse=True)
    return large_files

# Usage
files = asyncio.run(find_large_files("/home/user", min_size_mb=100))
for path, size in files[:10]:  # Top 10 largest
    print(f"{size/1024/1024:.1f} MB: {path}")

Parallel Directory Analysis

from dazzletreelib.aio import get_tree_stats_async
import asyncio

async def analyze_projects(project_dirs):
    """Analyze multiple project directories in parallel."""
    tasks = [get_tree_stats_async(d) for d in project_dirs]
    stats = await asyncio.gather(*tasks)
    
    for project_dir, stat in zip(project_dirs, stats):
        print(f"\n{project_dir}:")
        print(f"  Files: {stat['file_count']:,}")
        print(f"  Directories: {stat['dir_count']:,}")
        print(f"  Total Size: {stat['total_size']/1024/1024:.1f} MB")
        print(f"  Largest: {stat['largest_file']}")

# Analyze multiple projects simultaneously
projects = ["/code/project1", "/code/project2", "/code/project3"]
asyncio.run(analyze_projects(projects))

Directory Timestamp Fixer (folder-datetime-fix use case)

from dazzletreelib.aio import traverse_tree_async
import asyncio
import os

async def fix_directory_timestamps(root_path):
    """Fix directory modification times to match their newest content."""
    directories = []
    
    # Collect all directories first (depth-first post-order)
    async for node in traverse_tree_async(root_path, strategy='dfs_post'):
        if node.path.is_dir():
            directories.append(node.path)
    
    # Process directories from deepest to shallowest
    for dir_path in reversed(directories):
        newest_time = 0
        
        # Find the newest modification time among the directory's entries
        for item in dir_path.iterdir():
            try:
                newest_time = max(newest_time, item.stat().st_mtime)
            except OSError:
                continue  # Skip entries we cannot stat (e.g. broken symlinks)
        
        # Update directory timestamp
        if newest_time > 0:
            os.utime(dir_path, (newest_time, newest_time))
            print(f"Updated: {dir_path}")

# Fix all directory timestamps
asyncio.run(fix_directory_timestamps("/path/to/fix"))

Migrating from Sync to Async

The async API mirrors the sync API closely, making migration straightforward:

Sync Version

from dazzletreelib.sync import traverse_tree, FileSystemNode, FileSystemAdapter

root = FileSystemNode(path)
adapter = FileSystemAdapter()
for node, depth in traverse_tree(root, adapter):
    process(node)

Async Version

from dazzletreelib.aio import traverse_tree_async

async for node in traverse_tree_async(path):
    await process_async(node)

Key differences:

  • No need to create node/adapter explicitly in async
  • Use async for instead of for
  • Await any async operations on nodes
  • Wrap in asyncio.run() or existing async function

Advanced Features

Batched Parallel Processing

The async implementation uses intelligent batching for optimal performance:

# Control parallelism with batch_size and max_concurrent
async for node in traverse_tree_async(
    root,
    batch_size=256,      # Process children in batches
    max_concurrent=100   # Limit concurrent I/O operations
):
    await process(node)
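The batching pattern itself can be illustrated with plain asyncio: a semaphore caps how many operations are in flight at once, analogous to the max_concurrent idea above, while gather fans out one batch of work. This is a generic sketch of the technique, not DazzleTreeLib's internals; the zero-second sleep stands in for real I/O.

```python
import asyncio

async def bounded_stat(path, sem):
    # The semaphore limits concurrent "I/O" operations.
    async with sem:
        await asyncio.sleep(0)  # placeholder for a real stat/readdir call
        return f"stat({path})"

async def main():
    sem = asyncio.Semaphore(100)               # like max_concurrent=100
    paths = [f"file_{i}" for i in range(256)]  # one "batch" of children
    return await asyncio.gather(*(bounded_stat(p, sem) for p in paths))

results = asyncio.run(main())
print(len(results))  # 256
```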

Depth Limiting

# Only traverse 3 levels deep
async for node in traverse_tree_async(root, max_depth=3):
    print(node.path)

Custom Filtering

from dazzletreelib.aio import filter_tree_async

# Custom predicate function
async def is_python_file(node):
    return node.path.suffix == '.py'

# Get all Python files
python_files = await filter_tree_async(root, predicate=is_python_file)

High-Performance Caching

DazzleTreeLib features a sophisticated completeness-aware caching system that provides 4-5x performance improvements with intelligent memory management.

from dazzletreelib.aio.adapters import CompletenessAwareCacheAdapter

# Safe mode (default) - with memory protection
cached_adapter = CompletenessAwareCacheAdapter(
    base_adapter,
    enable_oom_protection=True,
    max_entries=10000,
    validation_ttl_seconds=5
)

# Fast mode - maximum performance (4-5x faster on repeated traversals)
fast_adapter = CompletenessAwareCacheAdapter(
    base_adapter,
    enable_oom_protection=False
)

# First traversal: populates cache
async for node in traverse_tree_async(root, adapter=cached_adapter):
    process(node)

# Second traversal: uses cache (4-5x faster!)
async for node in traverse_tree_async(root, adapter=cached_adapter):
    process(node)

Key features:

  • Completeness tracking: Knows if subtree is fully or partially cached
  • Depth-based caching: Understands traversal depth patterns
  • Safe/Fast modes: Choose between safety and maximum performance
  • LRU eviction: Intelligent memory management with OrderedDict
  • TTL validation: Configurable freshness checks with mtime
  • 99% memory reduction: Recent optimization removed redundant tracking
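The LRU eviction mechanism named above can be sketched with a plain OrderedDict. This is a minimal illustration of the eviction idea, assuming a simple key-to-children mapping; it is not CompletenessAwareCacheAdapter's actual code.

```python
from collections import OrderedDict

class LRUCache:
    """Toy LRU cache illustrating OrderedDict-based eviction."""
    def __init__(self, max_entries):
        self.max_entries = max_entries
        self._data = OrderedDict()

    def get(self, key):
        if key not in self._data:
            return None
        self._data.move_to_end(key)  # mark as most recently used
        return self._data[key]

    def put(self, key, value):
        self._data[key] = value
        self._data.move_to_end(key)
        if len(self._data) > self.max_entries:
            self._data.popitem(last=False)  # evict least recently used

cache = LRUCache(max_entries=2)
cache.put("/a", ["a1"])
cache.put("/b", ["b1"])
cache.get("/a")          # touch /a, so /b becomes least recently used
cache.put("/c", ["c1"])  # exceeds capacity: /b is evicted
print(cache.get("/b"), cache.get("/a"))  # None ['a1']
```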

📖 Documentation: see the docs for more detail on caching configuration.

Architecture

DazzleTreeLib uses a clean, modular architecture:

dazzletreelib/
├── version.py     # Centralized version management
├── sync/          # Synchronous implementation
│   ├── core/      # Core abstractions (Node, Adapter, Collector)
│   ├── adapters/  # Tree adapters
│   │   ├── filesystem.py      # FileSystem traversal
│   │   ├── filtering.py       # FilteringWrapper
│   │   └── smart_caching.py   # Caching with tracking
│   └── api.py     # High-level sync API
├── aio/           # Asynchronous implementation
│   ├── core/      # Async abstractions with batching
│   ├── adapters/  # Async adapters
│   │   ├── filesystem.py      # Async filesystem with parallel I/O
│   │   ├── filtering.py       # Async filtering
│   │   └── smart_caching.py   # Async caching adapter
│   └── api.py     # High-level async API
└── _common/       # Shared configuration and constants
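The layout above suggests how adapters compose: a wrapper like filtering.py's FilteringWrapper stacks on top of a base adapter and intercepts child lookups. The toy classes below sketch that wrapper pattern in plain Python; they are illustrations only, not the library's real classes.

```python
class ListAdapter:
    """Toy base adapter over (name, children) tuples; illustrative only."""
    def get_children(self, node):
        return node[1]

class FilteringWrapper:
    """Toy wrapper: delegates to an inner adapter, dropping children
    that fail a predicate. Sketches the composable-adapter pattern."""
    def __init__(self, inner, predicate):
        self.inner = inner
        self.predicate = predicate

    def get_children(self, node):
        return [c for c in self.inner.get_children(node) if self.predicate(c)]

tree = ("root", [("keep.py", []), ("skip.txt", []), ("also.py", [])])
adapter = FilteringWrapper(ListAdapter(), lambda n: n[0].endswith(".py"))
print([c[0] for c in adapter.get_children(tree)])  # ['keep.py', 'also.py']
```

Because wrappers expose the same interface as the adapters they wrap, filtering and caching layers can be stacked in any order.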

Testing

Run the test suite:

# Recommended: Full test suite with proper isolation
python run_tests.py

# Run specific test categories
python run_tests.py --fast       # Quick tests only
python run_tests.py --isolated   # Interaction-sensitive tests
python run_tests.py --benchmarks # Performance benchmarks

# Manual pytest (for development)
pytest -m "not slow and not benchmark"  # Fast tests only
pytest -m benchmark                      # Benchmark tests only
pytest -m "not interaction_sensitive"    # Skip isolation-required tests
pytest --cov=dazzletreelib               # With coverage report

Benchmarks

Run performance benchmarks:

# Run all benchmarks
python benchmarks/accurate_performance_test.py

# Compare with native Python methods
python benchmarks/compare_file_search.py

# Run pytest benchmarks
pytest -m benchmark -v -s

Contributing

Contributions are welcome! Please ensure:

  • All tests pass (python run_tests.py)
  • Code is properly typed
  • Documentation is updated
  • Performance isn't regressed

Note: Git hooks are configured to:

  • Update version automatically on commit
  • Run fast tests before push
  • Block commits with private files on public branches

Like the project?

"Buy Me A Coffee"

Development Status

  • Stable: Sync implementation (v0.5.0)
  • Stable: Async implementation (v0.6.0)
  • Production Ready: Used in production systems (v0.10.0)
  • 🚧 Coming Soon: Additional adapters (S3, Database, API)

Related Projects

DazzleTreeLib is used in a growing set of tools:

  • folder-datetime-fix: Directory timestamp correction tool (uses DazzleTreeLib)
  • preserve: File tracking for easy location recovery & backup (with integrity and sync functionality)

Acknowledgments

  • Inspired by excellent tree/graph libraries:
    • anytree - Python tree data structures with visualization
    • treelib - Efficient tree structure and operations
    • NetworkX - Extensive graph algorithms
    • pathlib - Modern path handling in Python stdlib
    • graph-tool - C++-backed Python graph analysis toolkit
  • Uses aiofiles for async file operations
  • GitRepoKit - Automated version management system
  • Community contributors - Testing, feedback, and improvements

License

DazzleTreeLib Copyright (C) 2025 Dustin Darcy

MIT License - see LICENSE file for details.
