April 29, 2026 · 5 min read
axe-core finds the violation. We suggest the fix.
V1 of AccessPulse adds three tiers of fix suggestions on top of axe-core's structural checks: deterministic templates, computed values for color contrast, and model-generated starting points for cases where the right answer needs human judgment. Here's how it works and what we measured against.
Run axe-core on a typical SaaS landing page. It tells you which images are missing alt. It tells you which form inputs don't have labels. It tells you which colors don't hit WCAG 1.4.3 contrast.
What it doesn't tell you is what to do about any of it.
A missing-alt error is a finding; the fix is content the developer has to write. A contrast failure is a finding; the fix is a color the developer has to pick. A button without an accessible name is a finding; the fix is text someone has to compose.
AccessPulse closes that loop. For each axe-core violation we can suggest a fix for, we suggest one.
Three tiers, picked deliberately
Not every axe-core finding has the same kind of right answer.
Some have exactly one. A <button> without a type attribute defaults to type="submit", which inside a form is almost never what you want. The fix is type="button". An <html> element missing lang should get lang="en" (or the document's actual language). For these, the fix is the rule's definition expressed as a string template. We call them Tier 1.
Some have a bounded computed answer. A color-contrast failure needs a specific color, computable from your existing palette and the WCAG 1.4.3 target of 4.5:1 (3:1 for large text). A target-size failure under WCAG 2.5.8 needs a specific minimum dimension: 24 by 24 CSS pixels, unless one of the criterion's exceptions applies. We compute these and surface the value. Tier 2.
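To make the contrast half of Tier 2 concrete, here is a sketch of the WCAG contrast-ratio formula plus one simple search: nudge the foreground toward black or white in 1% steps until it meets the target. The helper names (`suggestForeground` and friends) are illustrative, and the search strategy is a stand-in we chose for brevity; it does not try to land on a color from your existing palette, which a production version would.

```typescript
type RGB = [number, number, number]; // sRGB channels, 0–255

function parseHex(hex: string): RGB {
  const h = hex.replace(/^#/, "");
  if (!/^[0-9a-fA-F]{6}$/.test(h)) throw new Error(`expected #rrggbb, got ${hex}`);
  return [0, 2, 4].map((i) => parseInt(h.slice(i, i + 2), 16)) as RGB;
}

function toHex(rgb: RGB): string {
  return "#" + rgb.map((c) => c.toString(16).padStart(2, "0")).join("");
}

// WCAG relative luminance. 0.04045 is the sRGB linearization threshold; older
// WCAG text says 0.03928, but no 8-bit channel value falls between the two.
function relativeLuminance(rgb: RGB): number {
  const [r, g, b] = rgb.map((c) => {
    const s = c / 255;
    return s <= 0.04045 ? s / 12.92 : Math.pow((s + 0.055) / 1.055, 2.4);
  });
  return 0.2126 * r + 0.7152 * g + 0.0722 * b;
}

// Contrast ratio (L1 + 0.05) / (L2 + 0.05), where L1 is the lighter color.
function contrastRatio(a: RGB, b: RGB): number {
  const la = relativeLuminance(a);
  const lb = relativeLuminance(b);
  return (Math.max(la, lb) + 0.05) / (Math.min(la, lb) + 0.05);
}

// Move each channel of `from` a fraction t of the way toward `to`.
function mix(from: RGB, to: RGB, t: number): RGB {
  return from.map((c, i) => Math.round(c + (to[i] - c) * t)) as RGB;
}

// Illustrative search: the smallest step (1% increments) toward black or
// white that meets the target. Returns undefined if neither pole gets there.
function suggestForeground(fgHex: string, bgHex: string, target = 4.5): string | undefined {
  const fg = parseHex(fgHex);
  const bg = parseHex(bgHex);
  const poles: RGB[] = [[0, 0, 0], [255, 255, 255]];
  for (let step = 0; step <= 100; step++) {
    for (const pole of poles) {
      const candidate = mix(fg, pole, step / 100);
      if (contrastRatio(candidate, bg) >= target) return toHex(candidate);
    }
  }
  return undefined;
}

console.log(suggestForeground("#777777", "#ffffff")); // #767676 (about 4.54:1 on white)
```

Mid-gray `#777777` on white sits just under 4.5:1, so the first 1% step toward black is enough. Pass `target = 3` for large text.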
Some need judgment about content the deterministic algorithm can't see. Alt text for a missing-alt image needs to describe the image. A button name needs to describe what the button does. For these — and only these — we use a multimodal language model. Tier 3.
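Routing is the first step once axe-core has run. This sketch declares a small subset of the result shape that axe-core's `axe.run()` resolves with (a `violations` array, each entry with an `id` and `nodes`) and sorts failing nodes into tiers. The keys are real axe-core rule IDs, but the table is a hypothetical illustration of the three tiers above, not AccessPulse's actual mapping.

```typescript
// Minimal subset of the result shape axe-core's axe.run() resolves with.
// Only the fields this sketch reads are declared; `target` is simplified to
// plain CSS selectors.
interface AxeNode {
  html: string;
  target: string[];
}
interface AxeViolation {
  id: string;
  nodes: AxeNode[];
}

type Tier = 1 | 2 | 3;

// Hypothetical routing table. Keys are real axe-core rule IDs; the tier
// assignments follow this post, not a shipped configuration.
const TIER_BY_RULE: Partial<Record<string, Tier>> = {
  "html-has-lang": 1,
  "html-lang-valid": 1,
  "color-contrast": 2,
  "target-size": 2,
  "image-alt": 3, // treats every hit as a content image; decorative detection is out of scope
  label: 3,
  "button-name": 3,
};

interface FixJob {
  ruleId: string;
  tier: Tier;
  html: string;
  target: string[];
}

// One job per failing node. Rules with no tier are reported without a fix.
function routeViolations(violations: AxeViolation[]): { jobs: FixJob[]; unrouted: string[] } {
  const jobs: FixJob[] = [];
  const unrouted: string[] = [];
  for (const v of violations) {
    const tier = TIER_BY_RULE[v.id];
    if (tier === undefined) {
      unrouted.push(v.id);
      continue;
    }
    for (const node of v.nodes) {
      jobs.push({ ruleId: v.id, tier, html: node.html, target: node.target });
    }
  }
  return { jobs, unrouted };
}
```

Keeping unrouted rule IDs separate makes the "violation without a suggested fix" case explicit rather than silent.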
What ships and what doesn't
Tier 1 ships. Roughly 15–20 axe-core rule types covered: missing lang, decorative images that should be alt="", untyped buttons, missing viewport meta, malformed lang attributes, and similar. Deterministic templates. No ML in the loop. The template is the rule's definition, so accuracy is bounded by construction.
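A minimal sketch of a Tier 1 template table, assuming each template receives the failing element's opening tag and returns the fixed tag as a string. `html-has-lang` is a real axe-core rule ID; `button-type` is a hypothetical key for the untyped-button case, and `suggestTier1` and `addAttribute` are illustrative names, not AccessPulse's code.

```typescript
// A Tier 1 template takes the failing element's opening tag and returns the
// fixed tag. docLang is the page's language when known.
type Tier1Template = (openingTag: string, docLang?: string) => string;

// Insert an attribute just before the closing ">" (or "/>") of an opening tag.
function addAttribute(openingTag: string, attr: string): string {
  const trimmed = openingTag.trimEnd();
  const selfClosing = trimmed.endsWith("/>");
  const body = trimmed.slice(0, selfClosing ? -2 : -1).trimEnd();
  return `${body} ${attr}${selfClosing ? " />" : ">"}`;
}

// Hypothetical template table keyed by rule ID.
const TIER1_TEMPLATES: Partial<Record<string, Tier1Template>> = {
  // Real axe-core rule ID. "en" is only a fallback; the document's actual
  // language is the right value when we know it.
  "html-has-lang": (tag, docLang) => addAttribute(tag, `lang="${docLang ?? "en"}"`),
  // Illustrative key for an untyped <button>, which defaults to type="submit".
  "button-type": (tag) => addAttribute(tag, 'type="button"'),
};

function suggestTier1(ruleId: string, openingTag: string, docLang?: string): string | undefined {
  const template = TIER1_TEMPLATES[ruleId];
  return template ? template(openingTag, docLang) : undefined;
}

console.log(suggestTier1("html-has-lang", "<html>", "de")); // <html lang="de">
console.log(suggestTier1("button-type", '<button class="icon">')); // <button class="icon" type="button">
```

Because each template is a pure function of the tag, the same input always produces the same suggestion, which is what "accuracy bounded by construction" means in practice.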
Tier 2 ships. Color contrast and target size today. Algorithm, no model. The math gives you a passing value; whether it matches your design system is a human call. We surface the computed value next to the original so you can decide.
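For target size, surfacing "the computed value next to the original" might look like this sketch. It assumes the element's rendered box is already measured in CSS pixels and proposes symmetric padding as one possible fix; the names are hypothetical, and the 2.5.8 exceptions (spacing, equivalent control, inline, user-agent control, essential) are deliberately not evaluated.

```typescript
// WCAG 2.2 SC 2.5.8 Target Size (Minimum): at least 24 by 24 CSS pixels.
// The criterion's exceptions are not evaluated in this sketch.
const MIN_TARGET_PX = 24;

interface Box {
  width: number; // CSS pixels
  height: number;
}

interface TargetSizeSuggestion {
  original: Box;
  suggested: Box;
  paddingX: number; // extra padding per side, left and right
  paddingY: number; // extra padding per side, top and bottom
}

// Grow each short axis with symmetric padding, rounded up to whole pixels.
function suggestTargetSize(original: Box): TargetSizeSuggestion | undefined {
  if (original.width >= MIN_TARGET_PX && original.height >= MIN_TARGET_PX) return undefined;
  const paddingX = Math.ceil(Math.max(0, MIN_TARGET_PX - original.width) / 2);
  const paddingY = Math.ceil(Math.max(0, MIN_TARGET_PX - original.height) / 2);
  return {
    original,
    suggested: { width: original.width + 2 * paddingX, height: original.height + 2 * paddingY },
    paddingX,
    paddingY,
  };
}

const example = suggestTargetSize({ width: 16, height: 19 });
console.log(example?.suggested); // { width: 24, height: 25 }
```

Rounding up keeps the suggestion on the passing side (a 19px axis becomes 25px rather than 24.0 with half-pixel padding); whether padding, min-width, or spacing fits your design system stays a human call.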
Tier 3 ships, conditionally. We measure the model against a held-out evaluation set; if generated fixes hit our quality bar (≥70% useful starting points), Tier 3 ships at launch. If not, it is deferred to a future version, and you see the violation without a suggested fix in those cases.
What doesn't ship, at any tier, is anything that auto-applies fixes. accessiBe, whose overlay auto-injects “accessibility” into a page, was fined $1M by the FTC in 2025 over false compliance claims. Suggested fixes go to a human to review. Always.
The mechanism, plainly
For each axe-core violation that maps to a Tier 3 rule — missing alt on a content image, missing form label, button with only an icon — AccessPulse runs a second pass. We send the violation context (image bytes for missing-alt cases, surrounding DOM and ARIA attributes for missing labels and button names) to Gemini 2.5 Flash with a structured prompt. The model returns a suggested string and one sentence of plain-language reasoning.
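The request side depends on the Gemini client library, so we don't reproduce it here. The response side is easy to sketch: assuming the structured prompt asks for a JSON object with hypothetical `suggestion` and `reasoning` fields, validate it strictly and drop anything malformed, so a bad generation degrades to "violation with no suggested fix" rather than a misleading one.

```typescript
// Hypothetical shape the structured prompt asks the model to return.
interface Tier3Suggestion {
  suggestion: string; // alt text, label text, or button name
  reasoning: string; // one plain-language sentence
}

// Parse and validate raw model output. Anything malformed returns undefined,
// and the caller shows the violation without a suggested fix.
function parseTier3Output(raw: string): Tier3Suggestion | undefined {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return undefined;
  }
  if (typeof data !== "object" || data === null) return undefined;
  const { suggestion, reasoning } = data as Record<string, unknown>;
  if (typeof suggestion !== "string" || typeof reasoning !== "string") return undefined;
  const s = suggestion.trim();
  const r = reasoning.trim();
  if (s === "" || r === "") return undefined;
  return { suggestion: s, reasoning: r };
}
```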
We surface every Tier 3 suggestion as a copy-pasteable code block, alongside the model's reasoning and a “review before applying” disclaimer. Nothing is auto-applied.
The companion: alt text quality scoring
In addition to suggested fixes, AccessPulse runs an alt-text quality pass on every image that already has alt. axe-core's image-alt rule passes any non-empty value: alt="image", alt="DSC_4823.jpg", marketing copy on a photo of a cat — all pass. They all fail real screen reader users.
The quality pass uses the same multimodal model as Tier 3 to score each (image, alt) pair as good, marginal, or poor. Good pairs are suppressed from the report: a page with 50 well-described images and 3 issues should look like a 3-issue report. Marginal pairs are flagged with the model's reasoning but no replacement, because the existing alt isn't broken, just improvable. Poor pairs are flagged with reasoning and a suggested replacement.
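The verdict semantics map directly onto report entries. A sketch, with hypothetical type and function names, of how each verdict could translate into what the report shows:

```typescript
type Verdict = "good" | "marginal" | "poor";

interface AltJudgment {
  verdict: Verdict;
  reasoning: string;
  replacement?: string; // only surfaced for "poor"
}

interface ReportEntry {
  selector: string;
  verdict: Exclude<Verdict, "good">;
  reasoning: string;
  suggestedAlt?: string;
}

// good: suppressed. marginal: reasoning only. poor: reasoning plus replacement.
function toReportEntry(selector: string, j: AltJudgment): ReportEntry | undefined {
  switch (j.verdict) {
    case "good":
      return undefined;
    case "marginal":
      return { selector, verdict: "marginal", reasoning: j.reasoning };
    case "poor":
      return { selector, verdict: "poor", reasoning: j.reasoning, suggestedAlt: j.replacement };
  }
}
```

Dropping any replacement on marginal verdicts is intentional: the existing alt stays the developer's call.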
We wrote a separate piece on the eval methodology, the verdict semantics, and the limits — see Inside the ML alt-text-quality pass.
How we measured
Building an ML feature without an evaluation set is how you ship something that's right 30% of the time and never notice.
We evaluate the model against a held-out test set drawn from four sources: WebAIM canonical examples (third-party-verified ground truth, stable URLs over a decade old), Wikimedia Commons, NOAA scientific charts (for the complex-image bucket), and constructed edge cases targeting the patterns where alt text most commonly fails. Ground truth is assigned by human review with per-case rationale.
We track three things: verdict-level accuracy (does the model agree with the human label?), precision on the poor verdict, or “poor precision” (when the model says poor, is it actually poor? False positives erode trust faster than false negatives), and per-verdict precision and recall.
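All three numbers fall out of the same (human label, model verdict) pairs. A minimal sketch, assuming each eval case has already been reduced to those two labels; the type and function names are illustrative:

```typescript
type Verdict = "good" | "marginal" | "poor";
const VERDICTS: Verdict[] = ["good", "marginal", "poor"];

interface LabeledCase {
  human: Verdict; // ground-truth label from human review
  model: Verdict; // model's verdict on the same (image, alt) pair
}

interface Metrics {
  accuracy: number;
  precision: Record<Verdict, number>;
  recall: Record<Verdict, number>;
}

function computeMetrics(cases: LabeledCase[]): Metrics {
  if (cases.length === 0) throw new Error("empty eval set");
  const agree = cases.filter((c) => c.model === c.human).length;
  const precision = {} as Record<Verdict, number>;
  const recall = {} as Record<Verdict, number>;
  for (const v of VERDICTS) {
    const truePositives = cases.filter((c) => c.model === v && c.human === v).length;
    const predicted = cases.filter((c) => c.model === v).length;
    const actual = cases.filter((c) => c.human === v).length;
    // NaN marks an undefined metric (no predictions or no ground-truth cases of v).
    precision[v] = predicted === 0 ? NaN : truePositives / predicted;
    recall[v] = actual === 0 ? NaN : truePositives / actual;
  }
  return { accuracy: agree / cases.length, precision, recall };
}
```

"Poor precision" is `precision.poor`: of the cases the model called poor, the fraction a human also labeled poor.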
For the Tier 3 fix-generation pass, we use a different metric: the “useful starting point” rate. Each generated fix is judged by human review against one criterion: would a developer who accepts this suggestion meet their accessibility goal?
V1 ships only when verdict accuracy clears 80%, poor precision clears 70%, and the Tier 3 useful-starting-point rate clears 70%.
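The gate itself is a few comparisons. This sketch uses the thresholds stated above; reading "clears" as "greater than or equal to" is our assumption, and the input shape and names are hypothetical. It reports every failed gate rather than only the first, since the post-mortem needs all of them.

```typescript
interface GateInputs {
  verdictAccuracy: number; // fraction in [0, 1]
  poorPrecision: number; // precision of the model's "poor" verdicts
  tier3Reviews: boolean[]; // true = reviewer judged it a useful starting point
}

interface GateResult {
  passed: boolean;
  failures: string[];
}

// Thresholds from the post; ">=" for "clears" is an assumption.
const GATES = { verdictAccuracy: 0.8, poorPrecision: 0.7, usefulStartRate: 0.7 };

function usefulStartRate(reviews: boolean[]): number {
  return reviews.length === 0 ? NaN : reviews.filter(Boolean).length / reviews.length;
}

function checkShipGates(m: GateInputs): GateResult {
  const failures: string[] = [];
  // Comparisons are negated so that NaN (e.g. no Tier 3 reviews) fails the gate.
  if (!(m.verdictAccuracy >= GATES.verdictAccuracy)) failures.push("verdict accuracy");
  if (!(m.poorPrecision >= GATES.poorPrecision)) failures.push("poor precision");
  if (!(usefulStartRate(m.tier3Reviews) >= GATES.usefulStartRate)) {
    failures.push("Tier 3 useful-starting-point rate");
  }
  return { passed: failures.length === 0, failures };
}
```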
Live results
Numbers go here once we run V1 against the full eval set with the live model and pass our quality gates. We'll publish: held-out verdict accuracy, per-verdict precision and recall, Tier 3 useful-starting-point rate, mean latency, cost per evaluation. We'll also run V1 against ~100 SaaS landing pages and report the distribution of findings — that's the companion data piece.
The full eval set itself ships open-source on the day this post becomes the Show HN lead. Audit the methodology, the ground truth, the disagreements we flagged. If we missed a case category, file an issue.
If we don't hit our thresholds, V1 doesn't ship — and you'll see a different post at this URL explaining why.
What's next
V1 covers two layers: alt-text-quality scoring on existing alt, and suggested fixes for ~15–20 axe-core rule types. We chose those because alt-text issues are the highest-frequency gap in axe-core's coverage today, and the rule types covered by suggested fixes are the ones where the right answer is computable, bounded, or — for the ML-judgment cases — measurable against ground truth.
What's next on the roadmap, gated on V1 holding up in production: the GitHub Action (AccessPulse/scan@v1) posts suggested fixes as inline PR review comments. Think CodeRabbit's cadence: you review the fix where you're already reading the diff. We're shipping that with V1.
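One way an inline comment could carry a fix is GitHub's suggested-changes syntax: a fenced block tagged `suggestion` inside a review comment, which the reviewer can accept from the diff view. Whether AccessPulse/scan formats comments this way is an assumption, and `InlineFix` and `reviewCommentBody` are hypothetical names; posting the comment through the GitHub API is not shown. Accepting a suggestion is still a human action, which keeps the "never auto-applied" rule intact.

```typescript
// Three backticks, built at runtime so this example nests cleanly in Markdown.
const FENCE = "`".repeat(3);

interface InlineFix {
  ruleId: string;
  replacementLine: string; // replaces the line the comment is attached to
  reasoning?: string; // present for Tier 3 suggestions
}

// Build a review-comment body using GitHub's suggestion block.
function reviewCommentBody(fix: InlineFix): string {
  const lines = [`**${fix.ruleId}**: suggested fix, review before applying.`];
  if (fix.reasoning) lines.push("", fix.reasoning);
  lines.push("", `${FENCE}suggestion`, fix.replacementLine, FENCE);
  return lines.join("\n");
}

console.log(reviewCommentBody({ ruleId: "html-has-lang", replacementLine: '<html lang="en">' }));
```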
Free scan, no signup: accesspulse.dev. Methodology and eval set: accesspulse.dev/methodology. Action repo: github.com/AccessPulse/scan.
We'd love feedback from the a11y community, especially on the eval methodology. Happy to answer questions about the prompt design, edge cases, or specific WCAG criteria.
axe-core for the structural checks, plus three tiers of suggested fixes for what it finds. Free scan, no signup: accesspulse.dev.
Frequently asked questions
What's the difference between axe-core and AccessPulse?
axe-core is a deterministic rules engine that runs against the rendered page: it finds structural violations it can verify programmatically, such as missing alt attributes, insufficient contrast ratios, and absent labels. It tells you what is broken. AccessPulse runs on top of axe-core and adds two layers: a three-tier fix suggestion engine that tells you what to do about each violation, and an ML-powered alt-text-quality pass that catches degraded alt text axe-core cannot evaluate because it can't see the image.
Does AccessPulse make my site WCAG compliant?
No. AccessPulse does NOT make your site WCAG compliant. WCAG conformance is a legal and process determination: it requires formal evaluation, manual auditing, and remediation across the full set of WCAG success criteria. Automated tools, including axe-core and AccessPulse, catch only part of the picture, roughly 57% of issues by volume by Deque's estimate for axe-core; the rest needs manual review. The accessiBe overlay model claimed to achieve compliance automatically and was fined $1M by the FTC in 2025 for those false claims. We measure and recommend; we do not certify. For the gap automation cannot catch, we recommend engaging a manual auditor.
What is the three-tier fix suggestion model?
Tier 1 covers violations with exactly one correct fix expressible as a string template — a missing lang attribute, an untyped button, a decorative image that should have empty alt. Tier 2 covers violations with a bounded computed answer — color contrast failures (we compute a passing value from your palette) and target-size failures (we compute the minimum dimension). Tier 3 covers violations requiring judgment about content the algorithm cannot see — missing alt for content images, missing form labels, button names — where we use a multimodal model to generate a suggested starting point for human review.
Does AccessPulse use AI?
Yes, in two narrow places. The alt-text-quality pass uses Gemini 2.5 Flash to evaluate (image, alt) pairs — it cannot be done deterministically because you need to see the image. The Tier 3 fix suggestion pass also uses Gemini 2.5 Flash to generate starting-point text for missing alt, form labels, and button names. Both passes ship only when they clear a measured accuracy threshold on a held-out evaluation set. All other WCAG checks use axe-core's deterministic rule engine.
Where can I see the eval set?
The eval set ships open-source on the day this post becomes the Show HN lead. It includes all image-alt pairs, ground truth labels with per-case reviewer rationale, and the disagreements we flagged. For the Tier 3 fix-generation pass, the eval set includes the generated fix, the human reviewer's useful-starting-point judgment, and the criterion used. If you think we got a case wrong, file an issue. The methodology page documents how the sets were constructed and the thresholds required to ship.