dwit is a file management and knowledge-base tool.
It scans files, identifies them by their content hash,
and allows you to create relationships (tags and links) between them.
⏱️ Benchmark: Scanned 447,181 files in 5,029 ms
git clone https://www.github.com/kitajusSus/DWIT.git
cd DWIT
zig build -Doptimize=ReleaseFast
sudo cp zig-out/bin/dwit /usr/local/bin/
dwit <command> [options]
dwit tag <file_path> <tag1> [tag2] ...
dwit scan <directory_path>
dwit list [--type <extension>]
dwit link <source_file> <target_file>
dwit find --tag <tag_name>
Working on project commands
# If you are running from source, prefix every command with `zig build run --`
# (and drop the leading `dwit`)
# To clean your database:
rm dwit.db 2>/dev/null
# When running from the source directory
zig build run -- <command> [options]
zig build run -- scan .
zig build run -- list --type pdf
## If you use Arch Linux, install the required package
sudo pacman -Syu graphviz
zig build run -- export-graph > graph.dot
# and render it with Graphviz
dot -Tpng graph.dot -o graph.png
Command Details
Scans a directory and builds/updates the file database.
dwit scan <directory_path>
Description:
Recursively scans the given directory (e.g., .). It hashes every file found and saves its hash and path to the dwit.db database.
The scanner automatically ignores common cache/VCS directories, such as:
.git, .zig-cache, zig-out, node_modules
Upon completion, it prints a summary of the total files scanned and the time taken.
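The exact hash algorithm dwit uses isn't stated in this README, but the idea of content-based identity can be illustrated with the standard `sha256sum` tool: two files with identical contents produce the same digest regardless of name or path.

```shell
# Illustration only (sha256sum stands in for dwit's internal hashing):
printf 'note about ZIG\n' > a.txt
printf 'note about ZIG\n' > b.txt
sha256sum a.txt b.txt   # same digest for both: content, not path, is the identity
```

This is why renaming or moving a file does not change how dwit identifies it.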
Lists all files known to the database.
dwit list [--type <extension>]
Description:
Prints a list of all file paths stored in dwit.db.
Options:
--type <extension>: (Optional) Filters the list to show only files with the specified extension (e.g., pdf or .pdf).
Adds one or more tags to a file.
dwit tag <file_path> <tag1> [tag2] ...
Description:
Associates one or more string tags with the file specified by <file_path>. The file must already exist in the database (run scan first).
Creates a directed link between two files.
dwit link <source_file> <target_file>
Description:
Creates a directional relationship from the <source_file> to the <target_file>. Both files must be known to the database.
Displays detailed information about a single file.
dwit info <file_path>
Description: Provides a complete summary for the specified file, including:
- The file's unique content hash.
- A list of all tags associated with the file.
- A list of all links, showing both outgoing (->) and incoming (<-) relationships.
Finds all files associated with a specific tag.
dwit find --tag <tag_name>
Description:
Searches the database and prints a list of all files that have been marked with the specified <tag_name>.
Exports the entire database structure as a .dot graph.
dwit export-graph
Description: Prints a graph definition in the DOT language (for use with Graphviz) to standard output. In this graph:
- Nodes are the unique file hashes.
- Node Labels are the human-readable file paths.
- Edges represent the links created with the link command.
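The README does not reproduce the exact output, but given the rules above, a hypothetical fragment of the emitted DOT (hashes and paths invented for illustration) might look like:

```dot
digraph dwit {
  "a1b2c3" [label="note.txt"];
  "d4e5f6" [label="memo.pdf"];
  "a1b2c3" -> "d4e5f6";  // created by: dwit link note.txt memo.pdf
}
```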
Example (Visualizing the graph):
# 1. Export the graph to a file
dwit export-graph > my_graph.dot
# 2. Use Graphviz (dot) to render it as an image
dot -Tpng my_graph.dot -o my_graph.png
Goal: Build a smart file manager in Zig
Building an intelligent file organizer that will eventually help with office document management - finding companies in contracts, analyzing invoices, visualizing document relationships.
As an intern, I've seen how these office/PDF tasks can be boring and laborious: a lot of time spent for a small reward.
Current Status: Have Zig basics from Ziglings, ready to tackle real project
"Starting with file system operations will give me the best foundation for this project"
- File System Operations (Priority 1)
- Directory traversal in Zig
- File metadata extraction
- Error handling for file operations
- CLI Design (Priority 2)
- Argument parsing patterns
- User-friendly help systems
- Data Management (Priority 3)
- Memory management for file lists
- Efficient data structures
Questions to explore:
- How does Zig handle large directory trees?
- What's the best pattern for CLI commands in Zig?
- How to handle permissions errors gracefully?
How do we create a custom config for each user? The config needs a struct of slices for skipping directories during scans:
- defaults
- reading json?
- supporting a .gitignore-style ignore file?
- adding full scan with this ignored_dirs
- adding a -b flag for benchmarking; check how it performs, and if it doesn't get worse, keep it
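A minimal sketch of what such a per-user config could look like. The filename `dwit.json` and both field names are assumptions for illustration, not an implemented format:

```json
{
  "ignored_dirs": [".git", ".zig-cache", "zig-out", "node_modules"],
  "benchmark": false
}
```

Defaults would mirror the hard-coded ignore list, with the file only overriding what the user cares about.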
Tests:
# clean old database if exists
rm dwit.db 2>/dev/null
# create test files
echo "note about ZIG" > note.txt
echo "PDF FILE ABOUT SKIBIDI TOILET" > memo.pdf
mkdir -p projects
echo "README" > projects/readme.md
Already implemented commands:
dwit scan ./documents # Scan directory
dwit list --type=pdf # Filter by type
dwit list # show every file from database
dwit search "contract" # Simple text search
Core Components:
src/
├── core/
│   ├── database.zig
│   ├── hashing.zig
│   └── types.zig
├── main.zig # Entry point
└── lib.zig
- Now working on Phase 2:
- OCR integration for scanned documents
- Entity extraction (company names, NIPs, amounts)
- Document relationship graphs
- Business intelligence features
- show only file names, not full paths
- auto-tag files based on config
- work in a file explorer / web UI
Problem: How to handle file permission errors gracefully?
Options:
- Fail fast - stop on first error
- Skip and continue - log errors but keep going
- Interactive - ask user what to do
Decision: Skip and continue (Option 2).
Reasoning: Real-world directories often have permission issues. Better to process what we can and report issues at the end.
fn scanDirectory(allocator: std.mem.Allocator, path: []const u8) !ScanResult {
    var result = ScanResult.init();
    var errors = std.ArrayList(FileError).init(allocator);
    // Process files, collect errors
    for (files) |file| {
        processFile(file) catch |err| {
            // `error` is a keyword in Zig, so the struct field is named `err`
            try errors.append(FileError{ .path = file, .err = err });
            continue; // Keep going
        };
    }
    result.errors = try errors.toOwnedSlice();
    return result;
}
Problem: JSON vs TOML vs custom format?
Decision: JSON (for now).
Reasoning: Zig has built-in JSON support, and it's simpler to start with. We can migrate to TOML later if needed.
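A minimal sketch of loading such a config with Zig's built-in `std.json`. The `Config` struct, field name, and size limit are assumptions; `parseFromSlice` is the API from roughly Zig 0.11 onward and may differ in other releases:

```zig
const std = @import("std");

const Config = struct {
    ignored_dirs: [][]const u8 = &.{},
};

fn loadConfig(allocator: std.mem.Allocator, path: []const u8) !std.json.Parsed(Config) {
    // Read at most 1 MiB of config; free the raw bytes once parsed
    const bytes = try std.fs.cwd().readFileAlloc(allocator, path, 1 << 20);
    defer allocator.free(bytes);
    // Caller owns the result and must call .deinit() on it
    return std.json.parseFromSlice(Config, allocator, bytes, .{});
}
```

Keeping defaults in the struct declaration means a missing or empty file still yields a usable config.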
- Looking for info
Instead of processing files sequentially (one by one), utilize a thread pool (e.g., 4 or 8 threads). This allows the application to perform tasks in parallel.
For example, while one thread is blocked waiting for a disk read (I/O-bound), another thread can be actively processing data in the CPU (CPU-bound), and a third can be writing metadata to the database. This parallel execution is the most important optimization for this type of mixed I/O and CPU workload, ensuring that system resources are used efficiently.
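A sketch of this idea using the standard library's `std.Thread.Pool` and `std.Thread.WaitGroup`. The paths and the body of `processPath` are placeholders, and the pool API (shown here as of roughly Zig 0.12/0.13) shifts between releases:

```zig
const std = @import("std");

// Hypothetical per-file job: read, hash, write metadata.
fn processPath(path: []const u8, wg: *std.Thread.WaitGroup) void {
    defer wg.finish();
    _ = path; // hashing/database work would go here
}

pub fn main() !void {
    var gpa = std.heap.GeneralPurposeAllocator(.{}){};
    defer _ = gpa.deinit();

    var pool: std.Thread.Pool = undefined;
    try pool.init(.{ .allocator = gpa.allocator(), .n_jobs = 4 });
    defer pool.deinit();

    var wg: std.Thread.WaitGroup = .{};
    const paths = [_][]const u8{ "note.txt", "memo.pdf", "projects/readme.md" };
    for (paths) |p| {
        wg.start();
        try pool.spawn(processPath, .{ p, &wg });
    }
    wg.wait(); // block until every queued job has called finish()
}
```

With 4 workers, a thread stalled on a disk read no longer blocks hashing or database writes happening on the others.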
- Looking for info
Async I/O is a more advanced paradigm that complements multithreading. It allows your program to initiate many operations at once without waiting for each one to finish.
Instead of a thread blocking (idling) while waiting for a file read, the program can "request" the operating system to read 100 files simultaneously.
The program is then free to perform other work. As each file read is completed, the operating system notifies the program, which can then process the ready data. This model prevents threads from being wasted on waiting and is highly effective at maximizing throughput, especially with fast storage devices (SSDs).
Directory Iteration Pattern:
var dir = try std.fs.openDirAbsolute(path, .{ .iterate = true });
defer dir.close();
var iterator = dir.iterate();
while (try iterator.next()) |entry| {
switch (entry.kind) {
.directory => {
// recurse into the subdirectory (omitted in this sketch)
},
.file => {
const file_info = try getFileInfo(entry.name);
},
else => continue, // Skip symlinks, etc.
}
}
Key Insights:
- Always defer dir.close()
- Handle errors at each level
- Use entry.kind to filter file types
Pattern 1: Owned vs Borrowed Strings
// BAD: returns a slice into the caller's input
fn getFileName(path: []const u8) []const u8 {
    return std.fs.path.basename(path); // Dangles if the input string is freed
}
// GOOD: Owned copy
fn getFileName(allocator: Allocator, path: []const u8) ![]const u8 {
return allocator.dupe(u8, std.fs.path.basename(path));
}
Goals:
- Basic directory scanning
- File metadata extraction
- Simple CLI interface
Completed:
- Project planning and architecture design
- Learning roadmap creation
In Progress:
- Setting up development environment
- Implementing basic file scanner
Next Steps:
- Create basic project structure
- Implement directory traversal
- Add file metadata extraction
- Build simple CLI interface
- Performance: How does Zig file I/O compare to Rust/Go for large directories?
- Cross-platform: What Windows-specific issues should I expect?
- Memory: What's the best pattern for handling large file lists?
- Testing: How to create reproducible test environments for file operations?
Zig Documentation:
Inspiration Projects:
Learning Materials:
- Ziglings exercises (completed ✅)
- Zig Language Docs (ziglang.org)
- "Zig in 100 Seconds" (I just love this guy)
Short-term (1 month):
- Can scan 10k+ files without crashing
- CLI feels intuitive to use
- Code is well-structured and documented
Medium-term (3 months):
- Handles edge cases gracefully
- Performance competitive with existing tools
- Ready for OCR/document analysis features
Long-term (6 months):
- Used daily for document management
- Can analyze business documents
- Visualize document relationships