Lucian Labs

Creator's Toolkit

How 20,000 scattered audio files became a searchable, playable catalog

Lucian Labs · March 2026 · Ongoing

20,770 files cataloged · 122 GB total audio · 310 hrs of sound · 6 sources

The Problem

After 15+ years of making music, audio files end up everywhere. OneDrive, iCloud, external drives, Ableton project folders, sample packs from 2008. You know you have that perfect vocal chop somewhere. You just can't find it.

Every producer hits this wall. The library grows faster than any organizational system can keep up with. Drag-and-drop DAW workflows encourage scattering. Cloud sync creates duplicates. Drive migrations leave orphans.

The Build

The Idea
2026-03-06
The question
"Where are all my audio files?" A simple question that launched the project. The answer: scattered across 6 drives and cloud services, with no single index.
2026-03-06
Architecture: Python + SQLite
Python for rapid prototyping. SQLite in WAL mode for the catalog: fast reads, a single-file database, no server. xxhash for content-based deduplication, hashing only the first and last 64 KB of each file rather than the whole file, which keeps scans fast on large audio. ffprobe for metadata extraction.
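A minimal sketch of those two pieces, assuming a hypothetical `files` table and a `quick_hash` helper (neither name comes from the project). Mixing the file size into the hash is also an assumption, added here to reduce collisions. If xxhash is not installed, the sketch falls back to a stdlib hash so it still runs.

```python
import hashlib
import os
import sqlite3

try:
    import xxhash  # third-party; the hash named in the stack above

    def _new_hasher():
        return xxhash.xxh64()
except ImportError:
    # Fallback so the sketch runs without xxhash installed.
    def _new_hasher():
        return hashlib.blake2b(digest_size=8)

CHUNK = 64 * 1024  # 64 KB read from each end of the file


def quick_hash(path):
    """Hash the file size plus the first and last 64 KB, not the whole file."""
    size = os.path.getsize(path)
    h = _new_hasher()
    h.update(str(size).encode())  # assumption: include size in the hash
    with open(path, "rb") as f:
        h.update(f.read(CHUNK))
        if size > 2 * CHUNK:
            f.seek(-CHUNK, os.SEEK_END)  # jump to the last 64 KB
            h.update(f.read(CHUNK))
        elif size > CHUNK:
            h.update(f.read())  # short file: hash whatever remains
    return h.hexdigest()


def open_catalog(db_path):
    """Open the single-file catalog in WAL mode (hypothetical schema)."""
    conn = sqlite3.connect(db_path)
    conn.execute("PRAGMA journal_mode=WAL")
    conn.execute(
        "CREATE TABLE IF NOT EXISTS files ("
        "path TEXT PRIMARY KEY, size INTEGER, mtime REAL, quick_hash TEXT)"
    )
    conn.execute("CREATE INDEX IF NOT EXISTS idx_hash ON files(quick_hash)")
    return conn
```

The trade-off: two files of equal size that differ only in the middle get the same hash, so a match is best treated as "very likely identical" rather than proven.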
Phase 1 — Catalog
2026-03-07
Scanner + CLI
Multi-threaded file crawler with incremental scanning. Commands: scan, stats, search, doctor. First run indexed 1,948 files.
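One way incremental scanning can work, as a sketch rather than the project's actual scanner: stat files on a thread pool and skip anything whose size and modification time match the previous run. The extension list, function names, and the shape of `known` are all assumptions.

```python
import os
from concurrent.futures import ThreadPoolExecutor

# Assumed extension list, for illustration only.
AUDIO_EXTS = {".wav", ".mp3", ".flac", ".aif", ".aiff", ".ogg", ".m4a"}


def walk_audio(root):
    """Yield every audio file path under root."""
    for dirpath, _dirs, names in os.walk(root):
        for name in names:
            if os.path.splitext(name)[1].lower() in AUDIO_EXTS:
                yield os.path.join(dirpath, name)


def stat_file(path):
    st = os.stat(path)
    return path, st.st_size, st.st_mtime


def scan(roots, known, workers=8):
    """Return (changed, unchanged_count).

    `known` maps path -> (size, mtime) from the previous scan. Files whose
    size and mtime both match are counted as unchanged and skipped.
    """
    changed, unchanged = [], 0
    paths = (p for root in roots for p in walk_audio(root))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for path, size, mtime in pool.map(stat_file, paths):
            if known.get(path) == (size, mtime):
                unchanged += 1
            else:
                changed.append((path, size, mtime))
    return changed, unchanged
```

Only the `changed` list would then need hashing and metadata extraction, which is where most of the scan time goes.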
2026-03-07
Web dashboard
Single-file HTML dashboard served by Python's ThreadingHTTPServer. Three tabs: Catalog (browse + search + filter), Sources (scan roots with stats), Duplicates (exact + near-match detection). Audio player bar for auditioning files.
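A rough sketch of how a search endpoint might sit on ThreadingHTTPServer. The `/api/search` route, the `files` table, and the 50-row limit are hypothetical. A fresh SQLite connection is opened per request because each request runs on its own thread.

```python
import json
import sqlite3
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from urllib.parse import parse_qs, urlparse


def make_handler(db_path):
    class Handler(BaseHTTPRequestHandler):
        def do_GET(self):
            url = urlparse(self.path)
            if url.path != "/api/search":  # hypothetical route
                self.send_error(404)
                return
            query = parse_qs(url.query).get("q", [""])[0]
            conn = sqlite3.connect(db_path)  # one connection per request thread
            try:
                rows = conn.execute(
                    "SELECT path FROM files WHERE path LIKE ? LIMIT 50",
                    (f"%{query}%",),
                ).fetchall()
            finally:
                conn.close()
            body = json.dumps([row[0] for row in rows]).encode()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, *args):
            pass  # keep the console quiet

    return Handler


def serve(db_path, port=0):
    """Build a server bound to localhost; port=0 picks a free port."""
    return ThreadingHTTPServer(("127.0.0.1", port), make_handler(db_path))
```

Calling `serve("catalog.db", 8000).serve_forever()` would start it; the dashboard page would then fetch `/api/search?q=kick`.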
2026-03-07
20,770 files indexed
Full scan across all 6 sources. 122.1 GB, 310 hours of audio. Discovered 760 duplicate paths across 547 groups — ~710 MB wasted.
Phase 2 — Classification
2026-03-08
PANNs audio classification
CNN14 model running on CUDA (Blackwell GPU). 527 AudioSet classes, 2048-dimensional embeddings per file. Automatic categorization: drums, synth, vocal, bass, guitar, fx, other.
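The write-up does not say how 527 AudioSet labels collapse into seven categories. One plausible approach, shown purely as an assumption, is an ordered keyword table applied to the top-scoring labels. The rules, the threshold, and the `categorize` name are all illustrative.

```python
# Ordered rules: earlier entries win, so "Bass drum" lands in drums
# and "Bass guitar" lands in bass. Keywords are illustrative guesses.
RULES = [
    ("drums", ("drum", "cymbal", "hi-hat", "snare", "percussion")),
    ("bass", ("bass",)),
    ("guitar", ("guitar",)),
    ("synth", ("synthesizer", "keyboard")),
    ("vocal", ("singing", "speech", "vocal", "choir")),
    ("fx", ("noise", "whoosh", "explosion", "effect")),
]


def categorize(labels_with_scores, threshold=0.2):
    """Map (AudioSet label, probability) pairs to one coarse category.

    Labels are checked from highest to lowest score; generic labels such
    as "Music" match no rule and are passed over.
    """
    ranked = sorted(labels_with_scores, key=lambda pair: pair[1], reverse=True)
    for label, score in ranked:
        if score < threshold:
            break  # everything after this is below the cutoff
        name = label.lower()
        for category, keywords in RULES:
            if any(keyword in name for keyword in keywords):
                return category
    return "other"
```

For example, `categorize([("Music", 0.9), ("Bass drum", 0.6)])` skips the generic "Music" label and returns `"drums"`.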
2026-03-08
System tray app
Background dashboard via pystray. Runs in the system tray, opens browser on click. Persistent access without a terminal window.
2026-03-08
Duplicate detection
Two-tier dedup: exact matches (same xxhash) and near-matches (same filename, similar duration). Lazy-loaded API keeps the dashboard fast even with thousands of groups.
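The two tiers can be sketched as below. The record shape (`path`, `quick_hash`, `duration`) and the 0.5-second tolerance are assumptions; the write-up only says "similar duration".

```python
import os
from collections import defaultdict


def exact_groups(files):
    """Tier 1: paths that share the same content hash."""
    by_hash = defaultdict(list)
    for f in files:
        by_hash[f["quick_hash"]].append(f["path"])
    return [sorted(paths) for paths in by_hash.values() if len(paths) > 1]


def near_groups(files, tolerance=0.5):
    """Tier 2: same filename, durations within `tolerance` seconds of the
    neighboring file, and at least two distinct hashes in the group."""
    by_name = defaultdict(list)
    for f in files:
        by_name[os.path.basename(f["path"]).lower()].append(f)

    groups = []

    def flush(run):
        # Runs with a single hash are already covered by exact_groups.
        if len({f["quick_hash"] for f in run}) > 1:
            groups.append(sorted(f["path"] for f in run))

    for same_name in by_name.values():
        ordered = sorted(same_name, key=lambda f: f["duration"])
        run = [ordered[0]]
        for f in ordered[1:]:
            if f["duration"] - run[-1]["duration"] <= tolerance:
                run.append(f)
            else:
                flush(run)
                run = [f]
        flush(run)
    return groups
```

Near-matches are candidates for review rather than safe deletes: a re-export or a trimmed bounce can share a name and a duration while sounding different.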
The Pivot — Going Native
2026-03-09
Browser audio breaks
The HTML5 <audio> element can't reliably play local files through an HTTP server. Range request issues, codec limitations, seeking bugs. Web audio is not the answer for a tool that needs to play 20,000+ files across every format.
2026-03-09
Native app architecture
Adopting the WaveLoop pattern: core foundational behaviors + customizable UI. The core owns audio playback (via miniaudio, a native C library), the catalog database, and scanning. The UI is a webview window that sends commands through a JS bridge. Users can customize or replace the UI entirely.
"Give me a way to have an app running, and if the UI requests 'play', play comes from the native app, not janky web audio."
2026-03-09
The behavior contract
A defined API that any UI can program against: ctk.audio.play(), ctk.catalog.search(), ctk.sources.scan(). The UI never touches the database directly. It sends commands, receives data. Same pattern as WaveLoop's tape engine — clear separation between behavior and presentation.
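On the native side, a contract like this could reduce to a small command dispatcher. The sketch below is an assumption about the shape, not the project's code: `Core`, the stub classes, and the `{"ok": ..., "data": ...}` envelope are hypothetical, and the stubs stand in for miniaudio playback and the SQLite catalog.

```python
class StubPlayer:
    """Stand-in for the native audio engine."""

    def __init__(self):
        self.now_playing = None

    def play(self, path):
        self.now_playing = path
        return {"playing": path}


class StubCatalog:
    """Stand-in for the catalog database."""

    def __init__(self, paths):
        self._paths = list(paths)

    def search(self, query):
        return [p for p in self._paths if query.lower() in p.lower()]

    def scan(self, root):
        return {"root": root, "queued": True}


class Core:
    """Owns behavior. A UI only sends commands and receives plain data."""

    def __init__(self, catalog, player):
        self._handlers = {
            "audio.play": lambda args: player.play(args["path"]),
            "catalog.search": lambda args: catalog.search(args.get("query", "")),
            "sources.scan": lambda args: catalog.scan(args["root"]),
        }

    def dispatch(self, command, args=None):
        handler = self._handlers.get(command)
        if handler is None:
            return {"ok": False, "error": f"unknown command: {command}"}
        try:
            return {"ok": True, "data": handler(args or {})}
        except Exception as exc:  # report failures to the UI as data
            return {"ok": False, "error": str(exc)}
```

A JS-side `ctk.audio.play(path)` would then be a thin wrapper that forwards `("audio.play", {"path": path})` across the bridge, so any replacement UI only needs to know the command names.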
What's Next
Batch operations
Move, delete, rename from the dashboard. Dedup cleanup with one click.
Cross-platform installers
Bundled app for Windows (.exe) and macOS (.app). No Python required.
Vibe-coded UI
Open the UI layer for community customization. Share your layout, fork someone else's. The data is yours — the view should be too.

Architecture

The toolkit follows a strict separation: core behaviors handle data and audio natively, while the UI layer is a webview that communicates through a JS bridge. This means the audio engine runs at the OS level (not in a browser sandbox), and the interface can be swapped, themed, or rebuilt without touching the core.

Built with Python, SQLite (WAL mode), miniaudio (native C audio), and pywebview (native OS windows). No Electron. No Rust. Just the simplest stack that solves the problem.

Read the full build guide: Build Your Own Audio Asset Dashboard with Claude