Growing into the Valeon Ecosystem
There is a difference between a system that holds and a system that holds under load. The first kind is a hypothesis. It has been validated in the conditions it was built for, by the person who built it, who knew what they were doing and did not press hard in the wrong places. The second kind has been put under conditions that were not anticipated, has revealed exactly where the weakness was, and has been rebuilt around that knowledge. They can look identical from the outside. They are not the same thing.
VocaSync has spent the past several weeks becoming the second kind of system. Not a rebuild: the surface holds, the contracts haven't changed, and users haven't noticed a break in continuity. But the architecture running underneath is substantially different, and almost all of that change has been driven by what load revealed.
My personal VocaSync account is now approaching three hundred projects. Valeon's publication pipeline runs all one hundred and forty-one posts through the platform on each rebuild cycle. These are not enormous numbers in absolute terms, but they are real, sustained production loads on a system I built, and therefore on a system where I see everything that breaks. The alignment worker used to exhaust its memory on large audio jobs and die quietly. The synthesis worker held its breath waiting for each chunk before starting the next. The job claim loop ran on a timer, and when multiple workers were live they stampeded. None of these were fatal under light conditions. Under sustained load, all of them eventually were.
What follows is an account of how each of those breaks was addressed.
The Worker Layer
The architectural step that makes everything else possible: all four VocaSync services (synthesis, alignment, transcription, and translation) now run as proper workers with versioned images on GHCR, the GitHub Container Registry. This was already true of alignment. It is now true of all four. The deployment story is the same for every service: build, tag, push, pull, run. It is the same pattern that has driven Plutarc since early in its development, now applied across the full audio platform.
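To make the build, tag, push, pull, run cycle concrete, here is a minimal sketch that only assembles the Docker commands for each service. The registry namespace `ghcr.io/example-org`, the `vocasync-<service>` image names, the `./services/<service>` build contexts and the version string are all hypothetical; only the `docker` subcommands themselves are real.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical namespace and naming scheme; the real ones are not stated.
REGISTRY = "ghcr.io/example-org"
SERVICES = ("synthesis", "alignment", "transcription", "translation")


@dataclass(frozen=True)
class Release:
    service: str
    version: str  # assumed to be a semver string or git SHA

    @property
    def local_tag(self) -> str:
        return f"vocasync-{self.service}:build"

    @property
    def remote_tag(self) -> str:
        return f"{REGISTRY}/vocasync-{self.service}:{self.version}"


def build_and_push(release: Release) -> List[List[str]]:
    """Build machine: build, tag with the pinned version, push to GHCR."""
    return [
        ["docker", "build", "-t", release.local_tag, f"./services/{release.service}"],
        ["docker", "tag", release.local_tag, release.remote_tag],
        ["docker", "push", release.remote_tag],
    ]


def pull_and_run(release: Release) -> List[List[str]]:
    """Worker host: pull exactly that version, then run it detached."""
    return [
        ["docker", "pull", release.remote_tag],
        ["docker", "run", "-d", "--name", f"vocasync-{release.service}", release.remote_tag],
    ]


if __name__ == "__main__":
    for service in SERVICES:
        release = Release(service, "1.4.0")
        for argv in build_and_push(release) + pull_and_run(release):
            # Print only; hand each argv to subprocess.run(argv, check=True) to execute.
            print(" ".join(argv))
```

Because every service goes through the same two functions, "the same pattern across every service" becomes literal: a worker host only ever runs an image tag that was explicitly pushed.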
The resource discipline that comes with proper workers is something you cannot approximate without them. Each container now has hard memory and CPU limits. But container-level limits are only half the picture: the worker itself throttles each job type internally, with ceilings set below the container's. The separation matters. The container hard limit is a safety rail; the internal throttling is the actual operational ceiling. A runaway job cannot consume the container's full budget and starve everything else. More practically, multiple workers can be colocated on the same host without one starving another, which reduces infrastructure cost without reducing reliability.
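The two-level idea can be sketched as a startup headroom check plus a per-job-type concurrency throttle. This is an illustrative sketch, not VocaSync's implementation. The job types, limits and `JobThrottle` class are hypothetical. The container ceiling would be enforced outside the process (for example with Docker's `--memory` and `--cpus` flags). The sketch caps concurrency only and does not enforce per-job memory.

```python
import threading
from contextlib import contextmanager
from typing import Dict, Iterator, Optional

# Container hard limit: the safety rail, enforced outside the process, e.g.
#   docker run --memory=4g --cpus=2 ...
# All numbers and job types below are hypothetical.
CONTAINER_MEMORY_MB = 4096

# Internal per-job-type ceilings: the operational limit, set below the rail.
JOB_LIMITS: Dict[str, Dict[str, int]] = {
    "synthesis": {"max_concurrent": 2, "memory_mb": 1024},
    "alignment": {"max_concurrent": 1, "memory_mb": 1536},
}

# Headroom check at startup: with every slot busy, the worst case must
# still fit under the container limit.
WORST_CASE_MB = sum(c["max_concurrent"] * c["memory_mb"] for c in JOB_LIMITS.values())
if WORST_CASE_MB >= CONTAINER_MEMORY_MB:
    raise RuntimeError("internal limits must sit below the container limit")


class JobThrottle:
    """Caps concurrent jobs per type; excess jobs wait for a free slot."""

    def __init__(self, limits: Dict[str, Dict[str, int]]) -> None:
        self._slots = {
            job_type: threading.BoundedSemaphore(cfg["max_concurrent"])
            for job_type, cfg in limits.items()
        }

    @contextmanager
    def slot(self, job_type: str, timeout: Optional[float] = None) -> Iterator[None]:
        sem = self._slots[job_type]
        if not sem.acquire(timeout=timeout):
            raise TimeoutError(f"no free {job_type} slot within {timeout}s")
        try:
            yield
        finally:
            sem.release()
```

A job then runs inside `with throttle.slot("alignment"): ...`, so a second alignment job waits for the first instead of doubling the memory footprint and pushing the container into its hard limit.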
Synthesis chunking has been hardened. Previously, silence detection drove the chunking boundaries, which worked well for speech but produced poor results on music and ambient audio that doesn't exhibit natural silence patterns. The worker now has a fixed fallback for those cases, which means chunking is no longer a source of unexpected behaviour outside its intended domain. Translation pathway parsing has also been made more robust — the previous implementation was fragile against certain model output formats. Configuration for both synthesis and translation is now available through optional API endpoint fields and through the dashboard UI, meaning behaviour is tunable without deployment changes on either side.
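The fallback logic might look roughly like the following sketch, which assumes that "fixed fallback" means a fixed chunk length. The function name and the target, minimum and maximum lengths are hypothetical. It prefers a silence-based cut near the target length and cuts at a fixed length only when no silence falls in the allowed window.

```python
from typing import List


def chunk_starts(
    duration_s: float,
    silences_s: List[float],
    target_s: float = 30.0,
    min_s: float = 10.0,
    max_s: float = 45.0,
) -> List[float]:
    """Chunk start times in seconds, beginning at 0.0.

    silences_s: candidate cut points (e.g. midpoints of detected silence gaps).
    Cuts at the silence nearest target_s within [min_s, max_s] of the current
    chunk start; if none exists, falls back to a fixed cut at target_s.
    """
    if min_s <= 0 or target_s <= 0:
        raise ValueError("min_s and target_s must be positive so chunking advances")
    starts = [0.0]
    start = 0.0
    while duration_s - start > max_s:
        window = [s for s in silences_s if start + min_s <= s <= start + max_s]
        if window:
            # Speech-like audio: cut in the gap closest to the target length.
            cut = min(window, key=lambda s: abs((s - start) - target_s))
        else:
            # Music or ambient audio with no usable gaps: fixed-length fallback.
            cut = start + target_s
        starts.append(cut)
        start = cut
    return starts
```

For 100 seconds of audio with no detected silence, this returns `[0.0, 30.0, 60.0]`. When silences are present, cuts land in them. Both paths share one loop, so audio that mixes speech and music degrades per window rather than all at once.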
The Alignment Problem
The out-of-memory (OOM) issue in the alignment worker was the most significant reliability problem in VocaSync. Large audio files (long-form speech, full episodes, extended recordings) would exhaust the worker's available memory mid-run, and the job would die. The failure was quiet: it raised no obvious error state, so diagnosing it took active investigation. It was also intermittent and hard to reproduce consistently, because the threshold was not a fixed file size but a function of audio density, encoding, and whatever else was running at the time.
The fix is chunked alignment. The worker now uses ffmpeg to scan the audio for silence gaps and derives split points from them — natural breaks in the content that allow the alignment to be divided without cutting through speech. For each chunk, ffprobe retrieves the audio metadata directly, so the worker knows exactly what it is handling before alignment begins — no inference, no guessing, no drift between what the file claims and what it contains. The alignment runs on each chunk independently, and a two-stage reconciliation process stitches the outputs into a single artefact with timestamps that are precise to the original file, not relative to chunk boundaries.
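Here is a sketch of both ends of that pipeline, under stated assumptions. Split points are taken as the midpoints of gaps reported by ffmpeg's real `silencedetect` filter, whose log lines contain `silence_start:` and `silence_end:`. The noise floor and minimum gap in the comment are illustrative. Duration would come from a call such as `ffprobe -v error -show_entries format=duration -of default=noprint_wrappers=1:nokey=1 chunk.wav`, which is not shown running here. The two reconciliation stages below (offset shift, then seam cleanup) are a hypothetical reading of "two-stage", not the worker's actual code.

```python
import re
from typing import List, Tuple

# Parses stderr from something like:
#   ffmpeg -hide_banner -i input.wav -af silencedetect=noise=-35dB:d=0.4 -f null -
# The noise floor and minimum duration are assumptions, not VocaSync's values.
_START = re.compile(r"silence_start: (-?\d+(?:\.\d+)?)")
_END = re.compile(r"silence_end: (-?\d+(?:\.\d+)?)")

Word = Tuple[str, float, float]  # (token, start_s, end_s)


def split_points(silencedetect_log: str) -> List[float]:
    """Midpoint of each detected silence gap, in seconds from file start."""
    starts = [float(m) for m in _START.findall(silencedetect_log)]
    ends = [float(m) for m in _END.findall(silencedetect_log)]
    # zip drops a trailing silence_start with no end (silence running to EOF).
    return [(s + e) / 2 for s, e in zip(starts, ends)]


def reconcile(chunks: List[Tuple[float, List[Word]]]) -> List[Word]:
    """Stitch per-chunk alignments into one file-relative timeline.

    chunks: (chunk_offset_s, words with chunk-relative times), in any order.
    """
    # Stage 1: shift chunk-relative timestamps onto the original file's clock.
    shifted = [
        (tok, offset + start, offset + end)
        for offset, words in chunks
        for tok, start, end in words
    ]
    shifted.sort(key=lambda w: w[1])
    # Stage 2: keep the timeline monotonic at chunk seams (no overlapping words).
    merged: List[Word] = []
    for tok, start, end in shifted:
        if merged and start < merged[-1][2]:
            start = merged[-1][2]
        merged.append((tok, start, max(start, end)))
    return merged
```

Because every timestamp is shifted by its chunk's offset before merging, the stitched artefact is expressed against the original file no matter how many chunks there were or in what order they finished.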
This is not a workaround. It is the correct architecture for this problem. A single-pass alignment of an arbitrarily large audio file will always be bounded by available memory. A chunked approach that scales to the content rather than the hardware is the design that should have been there from the start. It is there now.
The Operational Surface
The most architecturally interesting change in this cycle is perhaps the least visible: the job claim loop has moved from polling to reactive Convex subscriptions with atomic claims. Workers used to poll on a timer for available jobs. With concurrent workers, this creates a race: several workers see the same job as claimable and try to claim it at the same moment. The solution is an atomic claim operation in Convex, where only one worker can succeed, combined with random jitter in claim timing that spreads the load without requiring coordination. The system does not need to know how many workers are running. Each worker reacts to the subscription and attempts its claim. The architecture distributes itself.
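The claim semantics can be simulated outside Convex. In this sketch a lock stands in for the atomicity Convex provides by running each mutation as a transaction: inside one mutation, "read the job, check it is still pending, mark it claimed" behaves like a compare-and-set. The table shape, status strings, jitter bound and function names are hypothetical.

```python
import random
import threading
import time
from typing import Dict, Optional


class JobStore:
    """In-memory stand-in for a jobs table.

    The lock plays the role of a transactional mutation: the read-check-write
    in claim() is atomic, so exactly one caller moves a job out of "pending".
    """

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._jobs: Dict[str, dict] = {}

    def add(self, job_id: str) -> None:
        with self._lock:
            self._jobs[job_id] = {"status": "pending", "worker": None}

    def claim(self, job_id: str, worker_id: str) -> bool:
        with self._lock:
            job = self._jobs.get(job_id)
            if job is None or job["status"] != "pending":
                return False  # another worker already won, or the job is gone
            job["status"] = "claimed"
            job["worker"] = worker_id
            return True

    def owner(self, job_id: str) -> Optional[str]:
        with self._lock:
            return self._jobs[job_id]["worker"]


def on_job_available(store: JobStore, job_id: str, worker_id: str,
                     max_jitter_s: float = 0.05) -> bool:
    """What a worker might do when its subscription reports a pending job.

    Random jitter spreads simultaneous attempts out; the atomic claim keeps
    the outcome correct even when attempts still collide.
    """
    time.sleep(random.uniform(0, max_jitter_s))
    return store.claim(job_id, worker_id)
```

If eight workers all react to the same job at once, exactly one `on_job_available` call returns `True` and the rest return `False` and go back to waiting. Nothing in the code needs to know how many workers exist.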
Out-of-vocabulary (OOV) dictionaries now follow the same pattern. Workers previously loaded dictionary files from S3 on startup, which fixed the vocabulary for the lifetime of the process. They now subscribe to the dictionaries through Convex and resolve them per job. OOV handling can therefore be updated without restarting workers and can vary by job, which is a more accurate model of how it needs to work in production.
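On the worker side, per-job resolution can be as simple as a cache that a subscription callback keeps current, with a snapshot taken when each job starts. This is a sketch only. `on_update` stands in for whatever the Convex subscription handler calls, and the dictionary ids and word-to-spelling entries are hypothetical.

```python
import threading
from typing import Dict, Mapping


class OovDictionaries:
    """Latest OOV dictionaries, kept current by a subscription callback."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._by_id: Dict[str, Dict[str, str]] = {}

    def on_update(self, dict_id: str, entries: Mapping[str, str]) -> None:
        # Replace the whole dictionary so readers never see a half-applied update.
        snapshot = dict(entries)
        with self._lock:
            self._by_id[dict_id] = snapshot

    def for_job(self, dict_id: str) -> Dict[str, str]:
        # Snapshot at job start: one job sees one consistent vocabulary,
        # while the next job picks up whatever has changed since.
        with self._lock:
            return dict(self._by_id.get(dict_id, {}))
```

The design choice here is to snapshot per job. An update that arrives mid-job does not change that job's output halfway through, but no restart is needed for the next job to see it.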
The employee dashboard has landed, with complete record isolation via a separate Clerk application and a custom role-based access control (RBAC) solution handled in Convex. Publishable key management is now available through the dashboard UI. It was API-only before, which was the right choice for the first implementation and the wrong one for the long term. Project search by UUID is a small addition that makes sense at scale: at three hundred projects, name-based search starts returning too many partial matches, and UUID search cuts through them cleanly. Error reporting is now close to parity with Plutarc, which set the operational standard the rest of the stack needed to reach.
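The search behaviour described here could be sketched as follows: if the query parses as a UUID, match exactly on the ID, and otherwise fall back to a partial name match. The `Project` shape and `search_projects` function are hypothetical, and the sketch assumes stored IDs are lowercase, hyphenated UUID strings. `uuid.UUID` is the standard-library parser.

```python
import uuid
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Project:
    id: str    # assumed: canonical lowercase, hyphenated UUID string
    name: str


def search_projects(projects: List[Project], query: str) -> List[Project]:
    q = query.strip()
    try:
        # Normalises case, braces, "urn:uuid:" prefixes and missing hyphens.
        wanted = str(uuid.UUID(q))
    except ValueError:
        # Not a UUID: case-insensitive partial match on the project name.
        needle = q.lower()
        return [p for p in projects if needle in p.name.lower()]
    return [p for p in projects if p.id == wanted]
```

With projects named "Episode 12" and "Episode 120", the name query `episode 12` returns both. Pasting either project's UUID, in any case or with braces, returns exactly one.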
What's Coming
The design language across all Valeon products has been unified. VocaSync, Plutarc, the Valeon platform itself — they look like they belong to the same thing because they do, and that should now be apparent without being told.
The architectural patterns from this cycle — the deployment model, the resource discipline, the worker operational controls — are going to Plutarc next. The minor bugs that remain in Valeon and VocaSync need to clear first. Once they do, that is where my attention turns.
ShipSpace (or whatever it inevitably ends up being called) has also moved. It is no longer a conceptual hypothesis. It is in architectural planning, which is a different kind of work: slower, less satisfying in the short term, and more consequential for everything that follows. The business model and the regulatory questions around multinational logistics compliance are on my docket for investigation before a line of code gets written. I expect that first line sometime in the next few months. My expertise is not in the legal domain, so I am not holding my breath on the timeline, but the direction is set.