Creator's Toolkit

How 20,000 scattered audio files became a searchable, playable catalog

Lucian Labs · March 2026 · Ongoing

20,770

files cataloged

122 GB

total audio

310 hrs

of sound

6

sources

The Problem

After 15+ years of making music, audio files end up everywhere. OneDrive, iCloud, external drives, Ableton project folders, sample packs from 2008. You know you have that perfect vocal chop somewhere. You just can't find it.

Every producer hits this wall. The library grows faster than any organizational system can keep up with. Drag-and-drop DAW workflows encourage scattering. Cloud sync creates duplicates. Drive migrations leave orphans.

The Build

The Idea

2026-03-06

The question

"Where are all my audio files?" A simple question that launched the project. The answer: scattered across 6 drives and cloud services, with no single index.

2026-03-06

Architecture: Python + SQLite

Python for rapid prototyping. SQLite in WAL mode for the catalog: fast reads, a single-file database, no server. xxhash for content-based deduplication, hashing only the first and last 64 KB of each file rather than the whole file. ffprobe for metadata extraction.
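
A minimal sketch of the partial-hash idea and the WAL-mode catalog, under stated assumptions. The function names, the `files` table layout, and mixing the file size into the hash are all illustrative choices, not the toolkit's actual code. If the third-party `xxhash` package is missing, the sketch falls back to `hashlib.blake2b` so it still runs.

```python
import hashlib
import os
import sqlite3

try:
    import xxhash  # third-party; the hash library named in the post
except ImportError:  # fall back so the sketch runs without it
    xxhash = None

CHUNK = 64 * 1024  # 64 KB, per the post


def quick_fingerprint(path: str) -> str:
    """Hash the first and last 64 KB of a file, plus its size."""
    size = os.path.getsize(path)
    hasher = xxhash.xxh64() if xxhash else hashlib.blake2b(digest_size=8)
    # Assumption of this sketch: include the size so files that share a
    # head and tail but differ in length get different fingerprints.
    hasher.update(str(size).encode())
    with open(path, "rb") as f:
        hasher.update(f.read(CHUNK))
        if size > CHUNK:
            # Read the tail without re-reading bytes already in the head.
            f.seek(max(size - CHUNK, CHUNK))
            hasher.update(f.read(CHUNK))
    return hasher.hexdigest()


def open_catalog(db_path: str) -> sqlite3.Connection:
    """Open the catalog in WAL mode and create a hypothetical files table."""
    conn = sqlite3.connect(db_path)
    conn.execute("PRAGMA journal_mode=WAL")
    conn.execute(
        """CREATE TABLE IF NOT EXISTS files (
               path TEXT PRIMARY KEY,
               size INTEGER NOT NULL,
               mtime REAL NOT NULL,
               fingerprint TEXT NOT NULL
           )"""
    )
    return conn
```

The trade-off is speed against certainty: two files that differ only in the middle would share a fingerprint, so a full-file hash may still be worth running before anything is deleted.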

Phase 1 — Catalog

2026-03-07

Scanner + CLI

Multi-threaded file crawler with incremental scanning. Commands: scan, stats, search, doctor. First run indexed 1,948 files.
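
Incremental scanning could look roughly like the sketch below: crawl each root on its own thread, then keep only files whose size or modification time differs from the previous run. The extension list, the function names, and the `known` mapping are assumptions for illustration; the real scanner is not shown in this post.

```python
import os
from concurrent.futures import ThreadPoolExecutor

# Assumed extension list; the toolkit's actual list is not given.
AUDIO_EXTS = {".wav", ".mp3", ".flac", ".aif", ".aiff", ".ogg", ".m4a"}


def walk_audio(root):
    """Yield (path, size, mtime) for every audio file under root."""
    for dirpath, _dirs, names in os.walk(root):
        for name in names:
            if os.path.splitext(name)[1].lower() in AUDIO_EXTS:
                path = os.path.join(dirpath, name)
                try:
                    st = os.stat(path)
                except OSError:
                    continue  # file vanished or is unreadable; skip it
                yield path, st.st_size, st.st_mtime


def scan(roots, known, workers=4):
    """Return entries that are new or changed since the last scan.

    `known` maps path -> (size, mtime) as recorded by the previous run.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # One crawl task per root, so a slow drive does not block a fast one.
        batches = pool.map(lambda r: list(walk_audio(r)), roots)
        found = [entry for batch in batches for entry in batch]
    return [(p, s, m) for p, s, m in found if known.get(p) != (s, m)]
```

Only the entries returned by `scan` would need hashing and metadata extraction, which is what keeps repeat runs cheap.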

2026-03-07

Web dashboard

Single-file HTML dashboard served by Python's ThreadingHTTPServer. Three tabs: Catalog (browse + search + filter), Sources (scan roots with stats), Duplicates (exact + near-match detection). Audio player bar for auditioning files.
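
For a sense of how small such a server can be, here is a sketch using only the standard library. The `/api/search` endpoint, the in-memory `CATALOG` list, and the placeholder page are hypothetical stand-ins; the real dashboard would query the SQLite catalog.

```python
import json
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from urllib.parse import parse_qs, urlparse

# Stand-in data; a real handler would read from the catalog database.
CATALOG = [
    {"path": "drums/kick_01.wav", "category": "drums"},
    {"path": "vox/chop_a.wav", "category": "vocal"},
]

PAGE = b"<!doctype html><title>Catalog</title><h1>Catalog</h1>"


class DashboardHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        url = urlparse(self.path)
        if url.path == "/":
            self._send(200, "text/html; charset=utf-8", PAGE)
        elif url.path == "/api/search":  # hypothetical endpoint name
            q = parse_qs(url.query).get("q", [""])[0].lower()
            hits = [row for row in CATALOG if q in row["path"].lower()]
            self._send(200, "application/json", json.dumps(hits).encode())
        else:
            self._send(404, "text/plain", b"not found")

    def _send(self, status, content_type, body):
        self.send_response(status)
        self.send_header("Content-Type", content_type)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the console quiet


def make_server(port=0):
    """Bind to localhost only; port 0 lets the OS pick a free port."""
    return ThreadingHTTPServer(("127.0.0.1", port), DashboardHandler)
```

Calling `make_server(8765).serve_forever()` would serve the page at `http://127.0.0.1:8765/`; the port number here is arbitrary.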

2026-03-07

20,770 files indexed

Full scan across all 6 sources. 122.1 GB, 310 hours of audio. The scan found 760 duplicate paths across 547 groups, roughly 710 MB of wasted space.

Phase 2 — Classification

2026-03-08

PANNs audio classification

CNN14 model running on CUDA (Blackwell GPU). 527 AudioSet classes, 2048-dimensional embeddings per file. Automatic categorization: drums, synth, vocal, bass, guitar, fx, other.
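
How 527 fine-grained class scores collapse into seven buckets is not spelled out here. One plausible approach, sketched below, is keyword matching on the AudioSet label names with a probability threshold. The keyword table, the threshold, and the `categorize` function are all hypothetical; the toolkit's real rules may differ.

```python
# Hypothetical keyword rules mapping AudioSet label names to the
# categories named in the post. "other" is the fallback.
CATEGORY_KEYWORDS = {
    "drums": ("drum", "snare", "hi-hat", "cymbal", "percussion"),
    "synth": ("synthesizer", "electronic"),
    "vocal": ("speech", "singing", "vocal", "choir"),
    "bass": ("bass",),
    "guitar": ("guitar",),
    "fx": ("effect", "noise", "whoosh", "reverberation"),
}


def categorize(scores, threshold=0.2):
    """Pick a coarse category from a {label: probability} mapping."""
    totals = {}
    for label, prob in scores.items():
        if prob < threshold:
            continue  # ignore weak predictions
        name = label.lower()
        for category, keywords in CATEGORY_KEYWORDS.items():
            if any(k in name for k in keywords):
                # A label such as "Bass drum" counts toward both buckets;
                # the summed probability decides the winner.
                totals[category] = totals.get(category, 0.0) + prob
    if not totals:
        return "other"
    return max(totals, key=totals.get)
```

For example, `categorize({"Bass drum": 0.8, "Drum kit": 0.6})` lands in drums because the two drum labels together outweigh the single bass match.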

2026-03-08

System tray app

Background dashboard via pystray. Runs in the system tray, opens browser on click. Persistent access without a terminal window.

2026-03-08

Duplicate detection

Two-tier dedup: exact matches (same xxhash) and near-matches (same filename, similar duration). Lazy-loaded API keeps the dashboard fast even with thousands of groups.
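
The two tiers can be sketched as two grouping passes over catalog rows. The row shape (`path`, `fingerprint`, `duration`), the half-second tolerance, and the function names are assumptions for illustration, not the toolkit's schema.

```python
import os
from collections import defaultdict


def exact_groups(files):
    """Tier 1: group paths that share a content fingerprint."""
    by_hash = defaultdict(list)
    for f in files:
        by_hash[f["fingerprint"]].append(f["path"])
    return [sorted(paths) for paths in by_hash.values() if len(paths) > 1]


def near_groups(files, tolerance=0.5):
    """Tier 2: same filename, durations within `tolerance` seconds."""
    by_name = defaultdict(list)
    for f in files:
        by_name[os.path.basename(f["path"]).lower()].append(f)
    groups = []
    for same_name in by_name.values():
        same_name.sort(key=lambda f: f["duration"])
        # Chain files whose duration is close to the previous one.
        clusters = [[same_name[0]]]
        for f in same_name[1:]:
            if f["duration"] - clusters[-1][-1]["duration"] <= tolerance:
                clusters[-1].append(f)
            else:
                clusters.append([f])
        for cluster in clusters:
            # Skip clusters that tier 1 already reports as exact matches.
            if len({f["fingerprint"] for f in cluster}) > 1:
                groups.append(sorted(f["path"] for f in cluster))
    return groups
```

Near-matches are candidates for review rather than safe deletions: a re-export or a trimmed bounce can share a name and a duration with the original while holding different audio.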

The Pivot — Going Native

2026-03-09

Browser audio breaks

The HTML5 <audio> element couldn't reliably play local files served through the dashboard's HTTP server. Range request issues, codec limitations, seeking bugs. Browser audio is not the answer for a tool that needs to play 20,000+ files across every format.

2026-03-09

Native app architecture

Adopting the WaveLoop pattern: core foundational behaviors + customizable UI. The core owns audio playback (via miniaudio, a native C library), the catalog database, and scanning. The UI is a webview window that sends commands through a JS bridge. Users can customize or replace the UI entirely.

"Give me a way to have an app running, and if the UI requests 'play', play comes from the native app, not janky web audio."

2026-03-09

The behavior contract

A defined API that any UI can program against: ctk.audio.play(), ctk.catalog.search(), ctk.sources.scan(). The UI never touches the database directly. It sends commands, receives data. Same pattern as WaveLoop's tape engine — clear separation between behavior and presentation.
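
On the Python side, a contract like this can reduce to a single command dispatcher that the bridge calls into. The sketch below is one way to shape it; the `Core` class, the dotted command strings, the response envelope, and `FakePlayer` are hypothetical and only mirror the `ctk.*` names above.

```python
class Core:
    """Owns behavior; the UI only sends commands and receives data."""

    def __init__(self, player, catalog):
        self._player = player    # hypothetical audio backend
        self._catalog = catalog  # list of dicts standing in for the database
        self._commands = {
            "audio.play": self._audio_play,
            "catalog.search": self._catalog_search,
        }

    def handle(self, command, params=None):
        """Single entry point that a JS bridge could call."""
        handler = self._commands.get(command)
        if handler is None:
            return {"ok": False, "error": f"unknown command: {command}"}
        try:
            return {"ok": True, "data": handler(**(params or {}))}
        except (TypeError, KeyError, ValueError) as exc:
            # Bad or missing parameters come back as data, not a crash.
            return {"ok": False, "error": str(exc)}

    def _audio_play(self, path):
        self._player.play(path)
        return {"playing": path}

    def _catalog_search(self, query):
        q = query.lower()
        return [row for row in self._catalog if q in row["path"].lower()]


class FakePlayer:
    """Test double; a real backend would pass the path to native audio."""

    def __init__(self):
        self.played = []

    def play(self, path):
        self.played.append(path)
```

With pywebview, an object like this could plausibly be passed as the `js_api` argument to `webview.create_window`, which exposes its public methods to JavaScript. Because every request goes through `handle`, a replacement UI has no path to the database except the commands the core chooses to register.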

What's Next

Batch operations

Move, delete, rename from the dashboard. Dedup cleanup with one click.

Cross-platform installers

Bundled app for Windows (.exe) and macOS (.app). No Python required.

Vibe-coded UI

Open the UI layer for community customization. Share your layout, fork someone else's. The data is yours — the view should be too.

Architecture

The toolkit follows a strict separation: core behaviors handle data and audio natively, while the UI layer is a webview that communicates through a JS bridge. This means the audio engine runs at the OS level (not in a browser sandbox), and the interface can be swapped, themed, or rebuilt without touching the core.

Built with Python, SQLite (WAL mode), miniaudio (native C audio), and pywebview (native OS windows). No Electron. No Rust. Just the simplest stack that solves the problem.

Read the full build guide: Build Your Own Audio Asset Dashboard with Claude