The charts looked gorgeous. Hundreds and hundreds of tests were passing. And the first human sniff test failed almost immediately.
SuperSubset is like an embeddable version of Apache Superset. It’s an analytics builder and runtime for React apps. Library-first. No iframes. Host-owned persistence. A visual designer. A runtime renderer. I wanted the kind of analytics experience people expect from Superset, but as a library I could drop into my own app instead of a whole BI system I had to orbit around.
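Concretely, the shape I wanted looks something like the sketch below. The package import and component props are hypothetical, not SuperSubset’s actual API; the point is the shape: a plain React component, definitions as data, persistence owned by the host app.

```tsx
// Hypothetical integration sketch; the import and props are illustrative,
// not SuperSubset's actual API. The shape is the point: a plain React
// component, definitions as data, persistence owned by the host.
import React from "react";
import { DashboardDesigner } from "supersubset"; // hypothetical export

type DashboardDefinition = { id: string; title: string; charts: unknown[] };

export function AnalyticsPage({ definition }: { definition: DashboardDefinition }) {
  // The host decides where definitions live: its own database, a file, an API.
  const save = async (next: DashboardDefinition) => {
    await fetch(`/api/dashboards/${next.id}`, {
      method: "PUT",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(next),
    });
  };

  // No iframe and no separate BI server to orbit around, just a component in the app.
  return <DashboardDesigner definition={definition} onChange={save} />;
}
```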
I went looking for that and didn’t find exactly what I needed.
There were lots of close options. Apache Superset. Rill. Perspective. Dashbuilder. Good products. Mature products. But “close” wasn’t close enough. I needed chart definitions and dashboard definitions that could live as configuration. I needed a fixed backend data model delivered by the product itself. I needed the product boundary to be right.
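By “definitions that live as configuration” I mean something like this: a dashboard is plain data the host can version, diff, and store. The schema below is invented for illustration; it is not the format SuperSubset actually uses.

```typescript
// Illustrative only: not SuperSubset's real schema, just the kind of plain-data
// definition I wanted. A dashboard as configuration, not as rows locked inside
// somebody else's BI database.
const salesDashboard = {
  id: "sales-overview",
  title: "Sales Overview",
  charts: [
    {
      id: "revenue-by-region",
      type: "bar",                 // could be "line", "area", ...
      query: { table: "orders", groupBy: "region", metric: "sum(revenue)" },
      options: { stacked: false, legend: true },
    },
  ],
};
```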
Generating SuperSubset was an experiment.
The theory I wanted to test was simple: the bottleneck for software development is now specification, not implementation. If that’s true, then a robust open source product like Apache Superset can serve as a spec for a closely related project.
At first, it felt like the theory was working really well.
The initial deliverables were great. The charts looked gorgeous. The codebase had a nice shape. There were packages. Adapters. A designer. A renderer. A query layer. A docs site. It seemed to do everything I was hoping it would do.
And the tests were green. Lots of them.
The thing that broke wasn’t some deep technical edge case. It was the baby steps. Just trying to use the library end to end. Trying to do the obvious product workflow a human would do first. It just didn’t work.
A key part of SuperSubset is the dashboard designer. You need to be able to do normal chart-builder things. Change a bar chart to a line chart. Stack bars. Put them side by side. Adjust the point size on a dot chart. All the little controls that make a visual analytics tool feel real.
That only works if all the big pieces are actually talking to each other.
The stored definition has to be reflected properly in the editor. The editor has to talk to the backend data model in the right way. A change in configuration has to propagate through the system and show up in the live chart the way you’d expect. The workflow has to hold together.
That was the failure.
The agents were producing code that looked convincing locally. They were producing things that satisfied code-visible checks. But they weren’t really holding the product in their heads across the whole workflow.
They were thinking in terms of code completion and local correctness, not long-range product success.
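Reduced to code, the long-range invariant they kept missing is the round trip from stored definition to rendered chart. A rough sketch, with placeholder names rather than the real definition format:

```typescript
// A rough sketch of the round trip that has to hold together. The types and
// field names are placeholders, not SuperSubset's real definition format.
type ChartDefinition = {
  type: "bar" | "line";
  options: { stacked: boolean };
};

// A designer edit is just a change to the stored definition...
function toggleStacking(def: ChartDefinition): ChartDefinition {
  return { ...def, options: { ...def.options, stacked: !def.options.stacked } };
}

// ...but the edit only counts if it survives the whole pipeline:
// stored definition -> editor state -> query layer -> live chart.
// Each hop can look locally correct while the chain as a whole fails.
```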
The recovery was to make the spec more concrete and the verification more product-shaped.
I started spelling out the full end-to-end flows.
That uncovered new features along the way. One example was a probe feature so the backend model could be interrogated directly. That’s the kind of thing that emerges when you’re actually forcing the system through real usage instead of admiring a green test suite.
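In spirit, the probe is a small introspection surface over the backend model, something like the sketch below. The shipped feature’s interface may differ; treat the names as placeholders.

```typescript
// Hypothetical sketch of a probe for interrogating the backend data model.
// The real feature's interface may differ; the names here are placeholders.
interface DataModelProbe {
  listTables(): Promise<string[]>;
  describeTable(table: string): Promise<{ column: string; type: string }[]>;
  sampleRows(table: string, limit?: number): Promise<Record<string, unknown>[]>;
}

// Instead of trusting that a chart's query "should" work, the designer (or a
// test) can ask the model directly what actually exists.
async function columnExists(probe: DataModelProbe, table: string, column: string) {
  const columns = await probe.describeTable(table);
  return columns.some((c) => c.column === column);
}
```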
The other thing that really took the project over the hump was requiring documentation with screenshots showing the before and after effect of a setting. If stacking bars is supposed to work, show me the before and after. If a setting changes the chart, prove it from the product surface.
That does two useful things at once.
First, it forces the agent to exercise the product end to end. Second, it generates documentation that’s actually worth having.
Detailed test matrices helped for the same reason. They push the work back toward real behavior. They make “done” harder to gloss over.
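That requirement translates pretty directly into product-shaped tests. Here is a Playwright-style sketch of the stacked-bars check; Playwright is real, but the route, test id, and control label are assumptions about the app, not SuperSubset’s actual UI.

```typescript
// Product-shaped check in the spirit of "prove the setting from the product
// surface". Playwright is real; the route, test id, and label are assumptions
// about the app under test, not SuperSubset's actual UI.
import { test, expect } from "@playwright/test";

test("stacking bars visibly changes the chart", async ({ page }) => {
  await page.goto("/designer/sales-overview");                 // hypothetical route

  const chart = page.getByTestId("chart-revenue-by-region");   // hypothetical test id
  await chart.screenshot({ path: "docs/stacking-before.png" });

  await page.getByLabel("Stacked").check();                    // hypothetical control
  await expect(chart).toBeVisible();
  await chart.screenshot({ path: "docs/stacking-after.png" });

  // The screenshots double as documentation: before/after evidence that the
  // setting actually propagates to the live chart.
});
```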
Even with the extra work required, the “it’s about the spec, stupid” theory did hold up. The more I use spec-driven development, the more I realize that even when you think you have a tight spec, you often don’t.
The key question moves from “can I build this?” to “is this spec clear enough?”.
I saw the same thing in a completely different context when building Wandir. Wandir had a third-party chat system and Cloudinary for images in the stack. Normal decisions pre-AI. The kind of thing you accept because building around them from scratch used to be obviously not worth it.
Working software is a great spec. And with a great spec, it’s very cheap to generate working software.
I replaced the chat system and Cloudinary’s image service in a matter of days. That simplified the architecture, cut supply-chain cost, and removed some glitchy boundaries where the product had to awkwardly coordinate with outside systems. The problems that still needed to be ironed out were implementation details that were not tight in the spec.
Same basic idea as SuperSubset.
When implementation gets cheap enough, more things become replaceable. More “close enough” dependencies become candidates for replacement. More product teams can justify building the exact thing they want instead of adapting themselves to somebody else’s product boundary.
The art is now in the spec.