Overview
A centralized AI-driven testing automation platform under active development, built to solve a real pain that emerged with the rise of AI coding assistants. When AI changes features across dozens of files in a single PR, traditional test suites break in unpredictable ways. The platform tackles that problem from two angles: a browser extension that records and visually replays end-to-end user flows, and a code-level SDK that runs deterministic unit and integration tests with AI-assisted self-healing.
Problem Statement
AI-augmented development changes everything about code velocity: a single prompt can refactor a UI component, rename props across the codebase, and rewire backend handlers in seconds. The old testing workflow can't keep up. Selector-based E2E tests break because elements have moved. Snapshot tests fire false positives on cosmetic changes. Unit tests pass while the actual user-facing feature is broken. Manual QA becomes the bottleneck that kills the velocity AI gave us in the first place.
Existing tools (Playwright, Cypress, Selenium) require constant maintenance. CI fails not because code is broken, but because tests are brittle. Developers spend more time fixing tests than shipping features.
The goal: a testing layer that adapts to changes automatically, captures intent rather than exact selectors, and gives developers a single source of truth for "does the feature still work?" — across browser flows, API contracts, and component behavior.
Architecture
A multi-component system designed for the AI-augmented era:
- Browser extension (Chrome + Firefox + Edge via WebExtension API) — records user interactions visually, captures DOM snapshots with semantic anchors, replays flows on demand
- CLI + SDK (Python + TypeScript) — runs the same tests headlessly in CI, with AI-powered self-healing when selectors break
- Test orchestrator (FastAPI + PostgreSQL) — stores test definitions, runs, screenshots, video traces, and AI repair suggestions
- AI healer (LangGraph + Claude / GPT-4o) — when a test fails, the LLM diagnoses whether the failure is a real regression or a stale selector, and proposes a fix
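To make the orchestrator's role concrete, here is a minimal sketch of the kinds of records it persists. The type and field names are illustrative assumptions for the sake of the sketch, not the live PostgreSQL schema.

```typescript
// Illustrative shapes only; names are assumptions, not the real schema.
interface TestDefinition {
  id: string;
  name: string;
  source: "extension-recording" | "sdk"; // authored visually or in code
  artifact: unknown;                     // the portable JSON recording or SDK metadata
}

interface RepairSuggestion {
  diagnosis: "selector-drift" | "regression";
  proposedFix?: string; // e.g. an updated semantic selector
  approved: boolean;    // flipped by the one-click review UI
}

interface TestRun {
  testId: string;
  status: "passed" | "failed" | "healed";
  screenshotUrls: string[]; // pixel-diff baselines and failure captures, stored in S3
  videoTraceUrl?: string;
  repairSuggestion?: RepairSuggestion;
}
```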
Browser Extension — Visual Test Authoring
The extension installs in the browser and adds a record button to the toolbar. As the user clicks through a flow, the extension captures:
- Semantic selectors, not just CSS paths — element role, accessible name, surrounding text, visual position
- Network calls with request and response payloads (redacted for secrets)
- Visual snapshots at every step, used as pixel-diff baselines
- Console errors and warnings to flag silent failures
When done, the recording becomes a portable JSON test artifact that runs in CI without the extension installed. Re-recording an updated flow takes seconds; the AI healer merges the diff back into the existing test.
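As a rough sketch, a single recorded step inside that JSON artifact might look like the following, expressed as a TypeScript type for readability. The field names are assumptions, not the actual schema.

```typescript
// Illustrative only: field names are assumptions, not the platform's real schema.
interface RecordedStep {
  action: "click" | "type" | "navigate";
  semanticAnchor: {
    role: string;            // e.g. "button"
    accessibleName: string;  // e.g. "Place order"
    nearbyText: string[];    // surrounding text used as a fallback anchor
    boundingBox: { x: number; y: number; width: number; height: number };
  };
  cssPath: string;           // kept as a secondary hint, not the primary selector
  networkCalls: { method: string; url: string; status: number }[]; // payloads redacted
  screenshotRef: string;     // pixel-diff baseline stored by the orchestrator
  consoleErrors: string[];   // silent failures surfaced at this step
}

// One step of a recorded checkout flow could serialize roughly like this:
const step: RecordedStep = {
  action: "click",
  semanticAnchor: {
    role: "button",
    accessibleName: "Place order",
    nearbyText: ["Order summary", "Total: $42.00"],
    boundingBox: { x: 840, y: 612, width: 160, height: 44 },
  },
  cssPath: "#checkout > div.actions > button",
  networkCalls: [{ method: "POST", url: "/api/orders", status: 201 }],
  screenshotRef: "runs/123/step-07.png",
  consoleErrors: [],
};
```

The semantic anchor carries enough context (role, accessible name, nearby text, position) that replay can re-locate the element even after the CSS path changes.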
Code-Level Testing
For backend and component-level coverage, an SDK exposes simple primitives:
- describeFeature / it-style API in both TypeScript and Python (sketched below)
- Auto-generates test scaffolding from a natural-language description ("when a user submits the contact form with a valid email, an email should land in the inbox")
- Hooks into the team's test runner (Vitest, Jest, Pytest) so it slots straight into existing pipelines
The platform tracks which tests cover which files and routes, so when AI changes a file, it knows exactly which tests to prioritize re-running.
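To illustrate, a TypeScript test written against that describeFeature / it API might read like the following. The package name, the expectEventually primitive, and the helpers are hypothetical, since the public API surface is still settling.

```typescript
// Sketch only: the package name, `expectEventually`, and the helpers are hypothetical.
import { describeFeature, it, expectEventually } from "@platform/sdk";
import { submitContactForm, readInbox } from "./test-helpers";

describeFeature("Contact form", () => {
  it("delivers an email when a valid address is submitted", async () => {
    await submitContactForm({ email: "jane@example.com", message: "Hello!" });

    // Poll the test inbox until the message arrives or the step times out.
    await expectEventually(async () => {
      const inbox = await readInbox("jane@example.com");
      return inbox.some((mail) => mail.subject.includes("Contact form"));
    }, { timeoutMs: 10_000 });
  });
});
```

In this sketch the primitives delegate to the existing runner (Vitest, Jest, Pytest), which is what lets the SDK slot into current pipelines without a separate harness.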
AI Self-Healing
The headline feature: when a test breaks, the AI healer doesn't just report the failure — it tries to fix the test itself.
The flow:
- Test fails → orchestrator captures the failing selector, the new DOM, the diff
- LangGraph agent receives the failure context plus the latest code diff from Git
- LLM reasons: is this a regression in the product, or just a moved element?
- If selector drift → propose an updated selector and re-run to verify
- If real bug → open a PR comment with the diagnosis and a link to the failing trace
- Developer reviews the suggested fix in a one-click approve UI
This turns flaky test maintenance from a manual job into a review-and-approve workflow.
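A heavily simplified sketch of that triage step, for illustration only: the real healer is a LangGraph agent, and callLlm, replayStep, and the field names below are placeholders rather than production code.

```typescript
// Simplified sketch of the triage decision; not the actual LangGraph agent.
interface FailureContext {
  failingSelector: string;
  domSnapshot: string; // serialized DOM captured at the point of failure
  codeDiff: string;    // latest diff pulled from Git
}

type Diagnosis =
  | { kind: "selector-drift"; updatedSelector: string }
  | { kind: "regression"; explanation: string };

// Placeholders standing in for the orchestrator's LLM and replay services.
declare function callLlm(ctx: FailureContext): Promise<Diagnosis>;
declare function replayStep(ctx: FailureContext, selector: string): Promise<boolean>;

async function healOrReport(ctx: FailureContext): Promise<string> {
  // Ask the model whether the element moved or the feature actually broke.
  const diagnosis = await callLlm(ctx);

  if (diagnosis.kind === "selector-drift") {
    // Verify the proposed selector by replaying the failing step before surfacing it.
    const passed = await replayStep(ctx, diagnosis.updatedSelector);
    return passed
      ? `Selector fix proposed: ${diagnosis.updatedSelector} (awaiting one-click approval)`
      : "Proposed fix failed verification; escalating to a human reviewer.";
  }

  // Likely a real regression: surface the diagnosis on the PR with the failing trace.
  return `Regression suspected: ${diagnosis.explanation}`;
}
```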
Key Features (so far)
- Cross-browser extension (Chrome, Firefox, Edge) for visual flow recording
- Headless replay in CI on Linux, macOS, Windows runners
- Semantic selector engine that survives DOM refactors
- Pixel-diff visual regression with intelligent thresholds (ignores font hinting, anti-aliasing noise)
- Auto-redacting network capture (filters auth tokens, secrets, PII)
- LangGraph-driven AI healer with GPT-4o and Claude 3.5 Sonnet backends
- Native CI integrations (GitHub Actions, GitLab CI, CircleCI, Vercel preview hooks)
- Slack and Discord notifications with embedded screenshot diffs
- Open API for custom test generators
Stack
Python · FastAPI · TypeScript · Node.js · WebExtension API · Playwright · PostgreSQL · pgvector · Redis · LangGraph · LangChain · OpenAI GPT-4o · Anthropic Claude · Docker · GitHub Actions · AWS S3 (artifact storage)
Current Status
Active development. Browser extension MVP records and replays flows on Chrome and Firefox. CLI runs headless replays in GitHub Actions. AI healer prototype works on selector-drift failures; visual-regression healing is next. Targeting an early-access release with a handful of teams once the dashboard ships.
Why This Matters
AI-assisted coding without AI-assisted testing is half a workflow. The teams shipping fastest right now still pay the testing tax — features ship in an hour, tests get fixed for two days. Close that gap and the velocity gains compound. That's the bet this project is making.