AI Writes My Code. It Also Breaks It. So I Built This
It started with a problem every developer knows
I was deep into a side project, using AI to write code faster than I ever had before. Claude, GPT, Copilot — the code was flowing.
But there was a catch.
Every time the AI changed something, I had to test it. Not just unit tests. I had to open the app, click through screens, eyeball layouts, and make sure nothing looked broken. Did that modal shift? Is the sidebar still aligned? Did this page suddenly lose spacing somewhere subtle?
I was doing the same visual checks again and again. Click here. Scroll there. Compare this screen to what I remembered it looked like yesterday. It was regression testing, except I was the test runner — and I was slow.
From duct tape to a real product
So I did what most developers do first: I built a quick internal tool.
It automated screenshots and comparisons for my app. It was tightly coupled, full of hardcoded paths, and absolutely not something I would have shown anyone. But it worked. I could run a suite, capture screenshots, compare them to a known-good baseline, and stop squinting at pixels.
Then the obvious realization hit me: this problem wasn’t unique to my app.
Every team shipping a frontend deals with visual regressions. Every team using AI to generate code deals with them even more, because changes happen faster and breakages can appear in places nobody thought to check.
So I started pulling the tool apart and rebuilding it as something general-purpose: a real project system, multi-browser support, an API, and a web UI for reviewing diffs.
That was the beginning of VisionTest AI.
AI didn’t just help build it — it shaped the product
As I was building VisionTest AI, I was also using AI to debug it.
When a test failed, I would feed the error and context to an LLM and ask: what’s the root cause, and how do I fix it? The results were good enough that a bigger idea started to form.
What if the platform itself could do that?
What if a failed visual regression test could trigger an investigation pipeline that identifies the likely cause in the code, proposes a fix, verifies it, and opens a pull request automatically?
That became the autonomous bug-fix pipeline.
It’s ambitious, but it comes with guardrails. AI-generated fixes never touch your main branch directly. They land on isolated branches with full diff review. You can control how much autonomy the system has, from “investigate only” to “open a PR.” The goal is not to remove human judgment. The goal is to remove the repetitive work around finding and fixing obvious breakages.
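To make the guardrails concrete, here is a minimal sketch of what an autonomy ladder could look like. The level names and the `allowed` helper are hypothetical illustrations, not VisionTest AI's actual configuration schema; the point is that each level includes everything below it and nothing ever writes to the main branch.

```typescript
// Hypothetical autonomy ladder for the bug-fix pipeline. Level names
// are illustrative; each level permits every action below it.
const LEVELS = ["investigate", "propose-fix", "verify-fix", "open-pr"] as const;
type Autonomy = (typeof LEVELS)[number];

// An action is allowed only if it sits at or below the configured level.
function allowed(configured: Autonomy, action: Autonomy): boolean {
  return LEVELS.indexOf(action) <= LEVELS.indexOf(configured);
}
```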
One useful feature turned into a platform

Once the visual regression engine was working, it opened the door to things I hadn’t planned at the outset.
AI-powered visual diffs. Pixel diffs are useful, but they are noisy. Anti-aliasing artifacts and tiny sub-pixel shifts can create false alarms. So I built a multi-stage cascade: pixel diff first, then SSIM, LPIPS, DINOv2, and finally a vision-language model that can explain what changed and why it matters in plain English. The result is a system that can auto-approve noise and escalate meaningful regressions.
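The shape of that cascade is easier to see in code. This is a simplified sketch with illustrative thresholds, not the platform's actual values: a cheap pixel comparison runs first, tiny diffs are approved as noise, large diffs are flagged, and only the ambiguous middle band escalates to the heavier stages (SSIM, LPIPS, DINOv2, the vision-language model), which are stubbed out here.

```typescript
// Sketch of a staged visual-diff cascade. Thresholds are illustrative.
type Verdict = "auto-approve" | "escalate" | "regression";

// Fraction of pixel channels whose difference exceeds a small tolerance,
// so anti-aliasing jitter doesn't count as change.
function pixelDiffRatio(a: Uint8Array, b: Uint8Array, tol = 8): number {
  let changed = 0;
  for (let i = 0; i < a.length; i++) {
    if (Math.abs(a[i] - b[i]) > tol) changed++;
  }
  return changed / a.length;
}

// Stage 1 of the cascade: approve sub-pixel noise, flag obvious
// regressions, and hand the gray zone to deeper (stubbed) stages.
function cascade(baseline: Uint8Array, candidate: Uint8Array): Verdict {
  const ratio = pixelDiffRatio(baseline, candidate);
  if (ratio < 0.001) return "auto-approve"; // anti-aliasing noise
  if (ratio > 0.05) return "regression";    // clearly changed
  return "escalate";                        // SSIM / LPIPS / DINOv2 / VLM
}
```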
Natural-language test creation. Writing end-to-end tests step by step is tedious. VisionTest AI can take plain-English instructions like “Go to the login page, enter test credentials, click submit, and verify the dashboard loads” and translate them into executable test steps. It is not magic, but it gets you most of the way there fast.
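In the actual platform the translation is LLM-driven, but a toy rule-based version shows the kind of structured steps an instruction like that compiles into. The step schema and regexes below are hypothetical, not VisionTest AI's real format.

```typescript
// Illustrative step schema; the real system's fields may differ.
type Step =
  | { action: "goto"; target: string }
  | { action: "type"; value: string }
  | { action: "click"; target: string }
  | { action: "expectVisible"; target: string };

// Map one plain-English clause to a structured step.
function parseInstruction(sentence: string): Step {
  const s = sentence.trim().toLowerCase();
  let m: RegExpMatchArray | null;
  if ((m = s.match(/^go to (?:the )?(.+?)(?: page)?$/)))
    return { action: "goto", target: m[1] };
  if ((m = s.match(/^enter (.+)$/)))
    return { action: "type", value: m[1] };
  if ((m = s.match(/^click (?:the )?(.+)$/)))
    return { action: "click", target: m[1] };
  if ((m = s.match(/^verify (?:the )?(.+?) (?:loads|is visible)$/)))
    return { action: "expectVisible", target: m[1] };
  throw new Error(`Unrecognized instruction: ${sentence}`);
}

// Split "do X, do Y, and do Z" into individual steps.
function parsePlan(plan: string): Step[] {
  return plan.split(/,\s*(?:and\s+)?/).filter(Boolean).map(parseInstruction);
}
```

Running it on the example from above yields a `goto`, a `type`, a `click`, and an `expectVisible` step, which an executor can then replay in a browser.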
Storybook integration. If you already use Storybook, VisionTest AI can discover your stories and generate visual regression coverage for components automatically. Connect it once, and as new components are added, they can be tested with minimal manual effort.
Self-healing selectors. Frontend tests often fail for boring reasons, like a renamed class or a shifted DOM structure. Instead of immediately failing, the system tries to repair the selector using cached patterns, DOM analysis, heuristics, and, if needed, an LLM. Tests that would otherwise be marked flaky can keep running.
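The first rung of that repair ladder, cached fallbacks, can be sketched in a few lines. The function names are mine, not the platform's, and the real system continues past this stage into DOM analysis, heuristics, and an LLM.

```typescript
// Minimal sketch of selector healing via cached alternates.
type Query = (selector: string) => boolean; // does this selector match the DOM?

interface HealResult {
  selector: string;
  healed: boolean;
}

// Try the recorded selector first; on failure, walk the cached
// alternates before giving up (where the real system would escalate
// to DOM analysis and an LLM).
function resolveSelector(
  primary: string,
  cachedAlternates: string[],
  query: Query
): HealResult {
  if (query(primary)) return { selector: primary, healed: false };
  for (const alt of cachedAlternates) {
    if (query(alt)) return { selector: alt, healed: true };
  }
  throw new Error(`Selector "${primary}" could not be healed`);
}
```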
Flaky test detection. Over time, the platform tracks test stability, calculates flakiness scores, and can quarantine tests that become unreliable. That means fewer teams living with the quiet assumption that a certain test always fails and should be ignored.
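One common flakiness heuristic, and an assumption here rather than necessarily VisionTest AI's exact formula, is to count pass/fail flips across the recent run history: a stable test flips rarely, a flaky one oscillates.

```typescript
// Flakiness as the fraction of pass/fail transitions in recent runs:
// 0 means perfectly stable, 1 means the result alternates every run.
function flakinessScore(history: boolean[]): number {
  if (history.length < 2) return 0;
  let flips = 0;
  for (let i = 1; i < history.length; i++) {
    if (history[i] !== history[i - 1]) flips++;
  }
  return flips / (history.length - 1);
}

// Quarantine tests whose score crosses a threshold (illustrative value).
function shouldQuarantine(history: boolean[], threshold = 0.3): boolean {
  return flakinessScore(history) > threshold;
}
```

A test that broke once and stayed broken scores low under this metric, which is the right behavior: that is a regression to fix, not flakiness to quarantine.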
Smart test selection. By mapping source files to the tests that cover them, VisionTest AI can run only the tests most relevant to a given change. That cuts CI time without blindly reducing coverage.
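The core of that idea is a coverage map and a set union. This sketch assumes a simple file-to-tests map; the real mapping can be built from coverage data or import graphs.

```typescript
// Select only the tests whose covered source files changed.
// The map shape (source file -> covering tests) is illustrative.
function selectTests(
  coverage: Map<string, string[]>,
  changedFiles: string[]
): string[] {
  const selected = new Set<string>();
  for (const file of changedFiles) {
    for (const test of coverage.get(file) ?? []) selected.add(test);
  }
  return [...selected].sort(); // deterministic order for the runner
}
```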
API testing. Visual testing is only part of the picture. I added REST and GraphQL API testing too, with assertions, environment management, and shared reporting infrastructure. The result is one platform that covers both UI and API workflows.
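To give a feel for the assertion side, here is a hypothetical shape for declarative API checks: status code plus selected body fields compared against expectations. The field names are illustrative, not VisionTest AI's actual assertion DSL.

```typescript
// Illustrative declarative assertion shape for an API response.
interface ApiAssertion {
  status?: number;
  bodyEquals?: Record<string, unknown>; // top-level fields to compare
}

interface ApiResponse {
  status: number;
  body: Record<string, unknown>;
}

// Returns a list of failure messages; an empty list means the
// assertion passed.
function checkAssertions(res: ApiResponse, expect: ApiAssertion): string[] {
  const failures: string[] = [];
  if (expect.status !== undefined && res.status !== expect.status)
    failures.push(`status: expected ${expect.status}, got ${res.status}`);
  for (const [key, want] of Object.entries(expect.bodyEquals ?? {})) {
    if (res.body[key] !== want)
      failures.push(`body.${key}: expected ${String(want)}, got ${String(res.body[key])}`);
  }
  return failures;
}
```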
Why I’m releasing it now
This is an alpha release.
There are bugs. There are rough edges. Some features are much more mature than others, and the documentation still has holes.
But I’ve been building this largely on my own, and I reached the point where shipping it imperfectly felt more useful than polishing it in isolation. Software like this gets better when real people use it, break it, question it, and contribute to it.
And this is real software.
VisionTest AI currently dogfoods itself: 47 internal tests pass at 100%, using its own visual testing engine. The application runs. The pages load. The AI pipeline connects. Natural-language parsing works. It is not a concept deck or a mockup. It is a working platform I use every day.
What makes it different
VisionTest AI is different in a few ways that matter to me.
It is self-hosted, which means your screenshots, your data, and your infrastructure stay under your control.
It is AI-native, not AI bolted on as a marketing checkbox. AI is built into test creation, diff analysis, failure investigation, and automated remediation.
It is multi-provider, so you can use Anthropic, OpenAI, OpenRouter, Gemini, or run local models through Ollama or llama.cpp.
And it is built around the full workflow — not just taking screenshots, but approvals, scheduling, webhooks, RBAC, audit logging, CI/CD integration, and the surrounding operational pieces a team actually needs.
It is also open source under the MIT license, because I want people to be able to inspect it, extend it, and improve it.
Try it
git clone https://github.com/jstuart0/visiontest-ai-oss.git
cd visiontest-ai-oss
./scripts/setup.sh
npm run dev
You can have it running locally in minutes, or deploy it to Kubernetes with the Helm chart.
If this sounds useful, give it a try. Open an issue. Submit a PR. Break it. Tell me what’s missing. That’s how it gets better.
— Jay