Visual Regression Testing — Applitools & Percy Guide

What is Visual Regression Testing?

Visual regression testing is the practice of automatically capturing screenshots of your web application and comparing them against previously approved baseline images to detect unintended visual changes. While functional tests verify that a button works, visual tests verify that the button looks right — correct color, correct position, correct size, no overlapping elements.

Traditional functional test suites can all pass — every assertion green — while the UI is visibly broken. Imagine a CSS change that shifts a navigation bar 40px to the left, or a font-weight change that makes a hero headline barely readable, or a layout reflow on a 375px viewport that stacks elements on top of each other. None of these would fail a Selenium or Playwright assertion that only checks element presence or text content. Visual regression tests catch exactly these categories of defect.

Real examples of UI bugs that slip through functional tests

Overlapping text: A CSS z-index change causes a modal's close button to render behind the modal backdrop — the button is present in the DOM and clickable via script, but invisible to a real user.
Missing images: A broken CDN path causes product images to fall back to empty alt text boxes. The product name text still passes assertion.
Broken layout on small viewport: A flex container overflows at 375px, stacking elements vertically in an unintended order. Desktop tests pass; mobile users see a broken page.
Color contrast regression: A design system update changes a button's background from #2563eb to #60a5fa — both are "blue," but the lighter shade now fails WCAG AA contrast requirements against white text.
Font rendering: A web font fails to load, falling back to system serif — the page works functionally but looks completely wrong.
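
The contrast claim in the color example can be checked by hand. The following is a minimal sketch of the WCAG 2.x contrast-ratio formula; the function names and the `AA_NORMAL_TEXT` constant are illustrative and do not come from any visual testing SDK.

```python
def _linearize(channel: int) -> float:
    """Convert an 8-bit sRGB channel to linear light (WCAG 2.x formula)."""
    c = channel / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(hex_color: str) -> float:
    """Relative luminance of a #rrggbb color."""
    h = hex_color.lstrip("#")
    r, g, b = (int(h[i:i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * _linearize(r) + 0.7152 * _linearize(g) + 0.0722 * _linearize(b)


def contrast_ratio(foreground: str, background: str) -> float:
    """Contrast ratio between two colors, always >= 1."""
    lighter, darker = sorted(
        (relative_luminance(foreground), relative_luminance(background)),
        reverse=True,
    )
    return (lighter + 0.05) / (darker + 0.05)


AA_NORMAL_TEXT = 4.5  # WCAG AA threshold for normal-size text

for shade in ("#2563eb", "#60a5fa"):
    ratio = contrast_ratio("#ffffff", shade)
    verdict = "passes" if ratio >= AA_NORMAL_TEXT else "fails"
    print(f"white on {shade}: {ratio:.2f}:1 {verdict} AA")
```

White text on the darker blue clears the 4.5:1 threshold and white text on the lighter blue does not, which is the kind of regression a functional assertion on the button's text would never notice.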

Pixel diff vs AI-based comparison: Early visual testing tools used pure pixel-by-pixel diffing, so a one-pixel antialiasing difference between Chrome and Firefox could produce hundreds of "failures." Modern tools like Applitools Eyes use AI-based comparison that aims to flag only the differences a human reviewer would notice: it can ignore rendering noise caused by antialiasing, sub-pixel font rendering, and minor browser-level variations, while still catching genuine layout regressions. This distinction is what makes visual testing practical at scale.
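
To see why exact pixel matching is so noisy, here is a toy pixel diff in plain Python. It is only a sketch: images are modeled as nested lists of RGB tuples, and the fixed `tolerance` threshold is a crude stand-in for illustration, not a description of how Applitools' comparison works.

```python
Pixel = tuple[int, int, int]
Image = list[list[Pixel]]


def count_diff_pixels(baseline: Image, candidate: Image, tolerance: int = 0) -> int:
    """Count pixels whose largest per-channel difference exceeds `tolerance`."""
    if len(baseline) != len(candidate) or any(
        len(a) != len(b) for a, b in zip(baseline, candidate)
    ):
        raise ValueError("images must have identical dimensions")
    return sum(
        1
        for row_a, row_b in zip(baseline, candidate)
        for pa, pb in zip(row_a, row_b)
        if max(abs(x - y) for x, y in zip(pa, pb)) > tolerance
    )


# A 1x4 strip across a black-to-white edge, antialiased slightly differently.
baseline = [[(0, 0, 0), (120, 120, 120), (255, 255, 255), (255, 255, 255)]]
candidate = [[(0, 0, 0), (128, 128, 128), (250, 250, 250), (255, 255, 255)]]

strict_failures = count_diff_pixels(baseline, candidate)  # exact match
tolerant_failures = count_diff_pixels(baseline, candidate, tolerance=10)
print(strict_failures, tolerant_failures)
```

With zero tolerance, two of the four pixels count as failures even though the edge looks the same to a person; with a tolerance of 10 none do. A fixed threshold like this also hides small genuine changes, which is the gap perceptual comparison tries to close.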

Architecture Overview

Understanding how visual testing tools work under the hood helps you configure them correctly and debug failures effectively. The typical visual testing pipeline follows this flow:

Test Code (Selenium / Playwright / Cypress)
→ Visual Testing SDK (Eyes / Percy)
→ Screenshot Capture (Full page / viewport)
→ Baseline Storage (Cloud service)
→ Diff Engine (AI / pixel)
→ Pass / Fail Report (Dashboard / PR comment)

Your test code instructs the SDK to take a checkpoint screenshot at a named step. The SDK sends the screenshot to a cloud service that stores it and compares it against the baseline for that step name. The diff engine produces a result — pass if within tolerance, fail if visual changes are detected. Results are surfaced in the tool's dashboard and, when integrated with GitHub/GitLab, as PR comments or status checks.
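
The flow described above can be sketched with a small in-memory model. Everything here is hypothetical: `BaselineStore` stands in for the cloud service, and a SHA-256 equality check stands in for the diff engine, which in a real tool is far more forgiving.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class BaselineStore:
    """Hypothetical in-memory stand-in for the cloud baseline storage."""

    _baselines: dict[str, str] = field(default_factory=dict)

    def checkpoint(self, step_name: str, screenshot: bytes) -> str:
        """Compare a screenshot against the baseline stored under `step_name`."""
        digest = hashlib.sha256(screenshot).hexdigest()
        baseline = self._baselines.get(step_name)
        if baseline is None:
            # First run for this step: the screenshot becomes the baseline.
            self._baselines[step_name] = digest
            return "new"
        return "pass" if digest == baseline else "fail"


store = BaselineStore()
first = store.checkpoint("Login Page - Initial State", b"pixels-v1")
second = store.checkpoint("Login Page - Initial State", b"pixels-v1")
third = store.checkpoint("Login Page - Initial State", b"pixels-v2")
print(first, second, third)
```

The step name is the lookup key, which is why renaming a checkpoint in test code effectively starts a fresh baseline.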

Applitools Eyes Setup with Selenium Python

Applitools Eyes is the most feature-rich commercial visual testing platform. It offers AI-powered comparison, cross-browser rendering via Ultrafast Test Cloud, and a powerful dashboard for managing baselines.

Installation

pip install eyes-selenium

Basic test structure

import os

from selenium import webdriver
from applitools.selenium import Eyes, Target

class TestVisualLogin:

    def setup_method(self):
        self.driver = webdriver.Chrome()
        self.eyes = Eyes()
        # Read the key from the environment instead of committing it to source control
        self.eyes.api_key = os.environ["APPLITOOLS_API_KEY"]

    def test_login_page_visual(self):
        self.driver.get("https://example.com/login")

        # Open Eyes session — (app name, test name, viewport size)
        self.eyes.open(
            driver=self.driver,
            app_name="My Web App",
            test_name="Login Page Visual",
            viewport_size={"width": 1280, "height": 800}
        )

        # Take a checkpoint of the current window (use Target.window().fully() for the full page)
        self.eyes.check_window("Login Page - Initial State")

        self.driver.find_element("id", "email").send_keys("user@example.com")
        self.driver.find_element("id", "password").send_keys("Password123")

        # Take another checkpoint after filling the form
        self.eyes.check_window("Login Page - Form Filled")

        # Close Eyes and get the test result
        results = self.eyes.close(raise_ex=False)
        assert results.is_passed, f"Visual differences detected: {results.url}"

    def teardown_method(self):
        self.eyes.abort_if_not_closed()
        self.driver.quit()

The key methods are:

eyes.open() — Starts an Eyes test session, sets the baseline key (app name + test name + viewport)
eyes.check_window() — Takes a screenshot of the window at this checkpoint and compares it to the baseline; to capture below-the-fold content, use eyes.check() with Target.window().fully()
eyes.close(raise_ex=False) — Ends the session and returns results without throwing; you handle the assertion
eyes.abort_if_not_closed() — Cleanup in teardown — closes any open session in case of test failure

Checking a specific region

from applitools.selenium import Target, Region

# Check only the header element
self.eyes.check("Header Region",
    Target.region(self.driver.find_element("css selector", "header.site-header"))
)

# Check a specific coordinate region (x, y, width, height)
self.eyes.check("Banner",
    Target.region(Region(0, 0, 1280, 200))
)

Applitools AI Match Levels

The match level controls how strictly Applitools compares the screenshot to the baseline. Choosing the right level for each test is critical to avoiding both false positives and missed regressions.

Match Level	What It Checks	Best Used For	Tolerance
Exact	Pixel-perfect match — every pixel must be identical	Static images, fixed assets, canvas elements	Zero tolerance
Strict	Human-visible changes — catches anything a user would notice	Most UI components — buttons, forms, headers	Low — ignores sub-pixel antialiasing
Layout	Structure and layout only — ignores text content and colors	Pages with dynamic content (names, dates, prices)	High — content-agnostic
Ignore Colors (formerly Content)	Same checks as Strict, but color differences are ignored	Color-theme variants such as dark mode vs light mode	Medium

from applitools.selenium import MatchLevel

# Set match level on the Eyes instance (applies to all checks)
self.eyes.match_level = MatchLevel.LAYOUT

# Or set it per-checkpoint using Target
self.eyes.check("Dashboard",
    Target.window().fully().match_level(MatchLevel.STRICT)
)

Why Layout mode reduces flakiness: Pages with timestamps, user names, dynamic prices, or advertisement content will fail Strict comparison whenever the content differs from what the baseline captured, which in practice is nearly every run. Layout mode verifies that the structural elements are in the right positions without caring about what text they contain. This is the mode I reach for on most dashboard and listing pages where the data changes but the layout should not.
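
The difference between the two modes can be illustrated with a deliberately simplified model. This is a toy, not Applitools' algorithm: the `Element` class and both comparison functions are hypothetical, and real match levels work on rendered images rather than on a list of boxes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Element:
    selector: str
    box: tuple[int, int, int, int]  # x, y, width, height
    text: str


def strict_like_equal(a: list[Element], b: list[Element]) -> bool:
    """Toy 'Strict': positions and text must both match."""
    return a == b


def layout_like_equal(a: list[Element], b: list[Element]) -> bool:
    """Toy 'Layout': only the elements and their positions must match."""
    return [(e.selector, e.box) for e in a] == [(e.selector, e.box) for e in b]


monday = [
    Element("h1.title", (0, 0, 1280, 64), "Welcome back, Alice"),
    Element("span.updated", (0, 64, 400, 24), "Updated 09:14"),
]
tuesday = [
    Element("h1.title", (0, 0, 1280, 64), "Welcome back, Bob"),
    Element("span.updated", (0, 64, 400, 24), "Updated 17:52"),
]
shifted = [
    Element("h1.title", (0, 0, 1280, 64), "Welcome back, Bob"),
    Element("span.updated", (40, 64, 400, 24), "Updated 17:52"),
]

print(strict_like_equal(monday, tuesday))   # text changed
print(layout_like_equal(monday, tuesday))   # same structure
print(layout_like_equal(monday, shifted))   # element moved 40px
```

Changed text alone fails the strict-like check but passes the layout-like one, while a moved element fails both. That is the trade-off: Layout tolerates data churn at the cost of not noticing wrong content.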

Baseline Management

The baseline is the "approved correct" screenshot that all future runs compare against. Baseline management is where visual testing workflows live or die.

First run — establishing the baseline

The very first time Applitools encounters a new test name and viewport combination, it has no baseline. It accepts the screenshot automatically and creates the baseline. Subsequent runs compare against this accepted screenshot. This means the first run always "passes" — you should review it manually to confirm the initial state is correct before relying on it as a reference.

Accepting and rejecting diffs

When a visual difference is detected, the test shows as "Unresolved" in the Applitools dashboard. You review the diff side-by-side and either:

Accept (thumbs up): The change was intentional — update the baseline to the new screenshot
Reject (thumbs down): The change is a bug — mark as failed, team fixes the code

Branching baselines for feature branches

Applitools supports branching baselines that mirror your Git branches. Set the branch name in your test configuration:

# Set branch from environment variable (CI provides this)
import os
self.eyes.branch_name = os.environ.get("BRANCH_NAME", "main")
self.eyes.parent_branch_name = "main"

When a feature branch test first runs, Applitools copies the baseline from the parent branch. Changes made and accepted on the feature branch only affect that branch's baseline — merging the branch to main prompts a baseline merge as well. This prevents feature branches from polluting the main baseline.
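
A rough mental model of branch baselines, written as a sketch: the `BranchBaselines` class is hypothetical, and it resolves a missing baseline by falling back to the parent branch at lookup time rather than copying it, which is a simplification of what the service does.

```python
from typing import Optional


class BranchBaselines:
    """Toy model of per-branch baselines with a parent-branch fallback."""

    def __init__(self) -> None:
        self._images: dict[str, dict[str, str]] = {}

    def accept(self, branch: str, test_name: str, image_id: str) -> None:
        """Record an approved screenshot as the baseline on one branch only."""
        self._images.setdefault(branch, {})[test_name] = image_id

    def lookup(self, branch: str, parent: str, test_name: str) -> Optional[str]:
        """Use the branch's own baseline, else fall back to the parent's."""
        own = self._images.get(branch, {})
        if test_name in own:
            return own[test_name]
        return self._images.get(parent, {}).get(test_name)

    def merge(self, branch: str, into: str) -> None:
        """Promote a branch's accepted baselines into the target branch."""
        self._images.setdefault(into, {}).update(self._images.get(branch, {}))


store = BranchBaselines()
store.accept("main", "Login Page Visual", "img-v1")
inherited = store.lookup("feature/new-button", "main", "Login Page Visual")
store.accept("feature/new-button", "Login Page Visual", "img-v2")
main_before_merge = store.lookup("main", "main", "Login Page Visual")
store.merge("feature/new-button", into="main")
main_after_merge = store.lookup("main", "main", "Login Page Visual")
print(inherited, main_before_merge, main_after_merge)
```

Accepting a change on the feature branch leaves main's baseline untouched until the explicit merge step, which is the isolation the paragraph above describes.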

Percy (BrowserStack) Setup

Percy is BrowserStack's visual testing platform. It integrates tightly with GitHub and GitLab pull request workflows, making it a popular choice for teams already using BrowserStack for cross-browser testing.

Installation

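pip install percy-selenium

# The Python SDK relies on the Percy CLI (the `percy exec` command), which is installed from npm
npm install --save-dev @percy/cli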

Basic Percy test with Selenium

from selenium import webdriver
from percy import percy_snapshot

class TestPercyVisual:

    def setup_method(self):
        self.driver = webdriver.Chrome()

    def test_homepage_visual(self):
        self.driver.get("https://example.com")

        # Take a Percy snapshot — name is the baseline key
        percy_snapshot(self.driver, "Homepage")

        # Navigate to login
        self.driver.find_element("link text", "Sign In").click()
        percy_snapshot(self.driver, "Login Page")

    def teardown_method(self):
        self.driver.quit()

Percy requires the PERCY_TOKEN environment variable to be set. The SDK captures a DOM snapshot of the page and uploads it; Percy then renders it across browsers and compares the results in its cloud. There is no client-side baseline comparison: everything happens server-side, and results appear in the Percy dashboard and as PR comments.

Running Percy tests

# Set your Percy token
export PERCY_TOKEN=your_percy_token_here

# Percy wraps your test command
npx percy exec -- pytest tests/visual/

Percy GitHub Integration

Percy's most compelling feature for modern teams is its automatic GitHub pull request integration. Once you install the Percy GitHub App on your repository:

Every PR that triggers Percy tests gets a status check: percy/web
The status check shows the count of visual changes: "8 visual changes found"
Clicking "Details" opens the Percy dashboard filtered to that PR's build
Reviewers see the before/after diff inline and can approve changes with a click
Once all changes are approved, the Percy status check turns green
You can configure branch protection to require the Percy check before merging

This workflow integrates visual review into the code review process. A designer or QA engineer can review and approve visual changes in the Percy UI without touching the codebase — the developer gets a clear signal that the visual changes are intentional and approved.

Cypress + Percy Integration

Percy has a first-class Cypress integration that feels native to the Cypress ecosystem:

# Install Percy Cypress SDK
npm install --save-dev @percy/cypress @percy/cli

// In cypress/support/e2e.js (or index.js for older Cypress)
import '@percy/cypress';

// cypress/e2e/visual.cy.js
describe('Visual Regression Tests', () => {

    beforeEach(() => {
        cy.visit('https://example.com');
    });

    it('captures homepage visual snapshot', () => {
        cy.get('[data-testid="hero-section"]').should('be.visible');
        cy.percySnapshot('Homepage - Hero Section');
    });

    it('captures product listing visual snapshot', () => {
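        // Absolute URL, because a relative path only resolves when baseUrl is configured
        cy.visit('https://example.com/products');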
        cy.get('.product-grid').should('be.visible');
        cy.percySnapshot('Products Page', { widths: [375, 768, 1280] });
    });

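    it('captures navigation states', () => {
        // Use a mobile viewport so the nav toggle is actually rendered and clickable
        cy.viewport(375, 812);
        cy.get('.nav-toggle').click();
        cy.get('.site-nav').should('have.class', 'open');
        cy.percySnapshot('Navigation - Mobile Open', { widths: [375] });
    });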
});

# Run with Percy
npx percy exec -- cypress run

The widths option in cy.percySnapshot() tells Percy to render the snapshot at multiple viewport widths in a single run, giving you responsive coverage from a single test call.

Playwright + Applitools

Applitools has a dedicated Playwright SDK that can use the Ultrafast Grid: instead of capturing screenshots from your local browser, the SDK uploads a DOM snapshot of the page and Applitools renders it in their cloud across all configured browsers in parallel. One test run, multiple browser results.

pip install eyes-playwright

from playwright.sync_api import sync_playwright
from applitools.playwright import (
    Configuration, Eyes, RunnerOptions, Target, VisualGridRunner,
)
from applitools.common import BatchInfo, BrowserType

def test_playwright_visual():
    # The Ultrafast Grid is only used when Eyes is created with a VisualGridRunner;
    # a bare Eyes() uses the classic runner and takes local screenshots instead.
    runner = VisualGridRunner(RunnerOptions().test_concurrency(5))

    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()

        eyes = Eyes(runner)
        config = Configuration()
        config.batch = BatchInfo("Playwright Visual Batch")

        # Add browsers for Ultrafast Grid cross-browser rendering
        config.add_browser(1280, 800, BrowserType.CHROME)
        config.add_browser(1280, 800, BrowserType.FIREFOX)
        config.add_browser(375, 812, BrowserType.SAFARI)

        eyes.set_configuration(config)
        eyes.open(page, "My App", "Playwright Full Page Test")

        page.goto("https://example.com")
        # Check the entire page including below-the-fold content
        eyes.check("Full Page", Target.window().fully())

        page.click('[data-testid="cta-button"]')
        eyes.check("After CTA Click", Target.window())

        # Finish the test without blocking; rendering continues in the cloud
        eyes.close_async()
        browser.close()

    # Wait for every configured browser; raises if any has unresolved differences
    runner.get_all_test_results()

Responsive Visual Testing

Testing only at 1280×800 gives false confidence. Real users access your app on phones, tablets, and wide monitors. Both Applitools and Percy support multi-viewport testing in a single run.

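# Applitools: add multiple browsers/viewports to the Configuration object
# (`config` as created in the Playwright example; these entries are rendered by the Ultrafast Grid)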
from applitools.common import BrowserType, DeviceName, ScreenOrientation

config.add_browser(375, 812, BrowserType.CHROME)    # Mobile portrait
config.add_browser(768, 1024, BrowserType.CHROME)   # Tablet portrait
config.add_browser(1280, 800, BrowserType.CHROME)   # Desktop
config.add_browser(1920, 1080, BrowserType.CHROME)  # Wide desktop

# Add emulated mobile devices
config.add_device_emulation(DeviceName.iPhone_X, ScreenOrientation.PORTRAIT)
config.add_device_emulation(DeviceName.iPad_Pro, ScreenOrientation.LANDSCAPE)

# Percy — specify widths directly in snapshot call
percy_snapshot(driver, "Homepage Responsive",
    widths=[375, 768, 1024, 1280, 1920]
)

From Experience at Viasat: Viasat's satellite internet management portal is accessed by customers on a wide range of devices — from aging tablets in rural installations to modern smartphones. When I introduced visual regression testing to the QA pipeline, catching a broken 375px layout that was invisible at 1280px justified the entire tooling investment in the first sprint. Responsive visual testing is not a nice-to-have; it is a core coverage requirement for any consumer-facing web app.

Visual Testing in CI — GitHub Actions

Visual regression tests only add continuous value when they run on every pull request automatically. Here are complete GitHub Actions workflows for both Applitools and Percy.

Applitools in GitHub Actions

# .github/workflows/visual-applitools.yml
name: Visual Tests — Applitools

on: [pull_request]

jobs:
  visual-test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.11'

      - name: Install dependencies
        run: pip install pytest selenium eyes-selenium

      - name: Install Chrome
        uses: browser-actions/setup-chrome@latest

      - name: Run visual tests
        env:
          APPLITOOLS_API_KEY: ${{ secrets.APPLITOOLS_API_KEY }}
          BRANCH_NAME: ${{ github.head_ref }}
        run: pytest tests/visual/ -v --tb=short
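
One caveat with the BRANCH_NAME variable above: `github.head_ref` is only populated on pull request events, so on a push build the variable is set but empty, and `os.environ.get("BRANCH_NAME", "main")` returns the empty string rather than the default. A small helper along these lines can make the lookup more robust. The function name is hypothetical; `GITHUB_HEAD_REF` and `GITHUB_REF_NAME` are default environment variables provided by GitHub Actions.

```python
import os
from typing import Mapping


def resolve_branch_name(env: Mapping[str, str] = os.environ, default: str = "main") -> str:
    """Return the first non-empty branch variable, falling back to `default`."""
    # BRANCH_NAME is the variable this guide's workflow sets explicitly.
    # GITHUB_HEAD_REF is the PR source branch (empty on non-PR events).
    # GITHUB_REF_NAME is the short ref name of the branch or tag that triggered the run.
    for key in ("BRANCH_NAME", "GITHUB_HEAD_REF", "GITHUB_REF_NAME"):
        value = env.get(key, "").strip()
        if value:
            return value
    return default


print(resolve_branch_name({"BRANCH_NAME": "", "GITHUB_REF_NAME": "release/1.2"}))
```

In the test setup this would replace the direct environment lookup, for example `self.eyes.branch_name = resolve_branch_name()`.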

Percy in GitHub Actions

# .github/workflows/visual-percy.yml
name: Visual Tests — Percy

on: [pull_request]

jobs:
  percy-visual:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Set up Node.js
        uses: actions/setup-node@v4
        with:
          node-version: '20'

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.11'

      - name: Install dependencies
        run: |
          npm install -g @percy/cli
          pip install pytest selenium percy-selenium

      - name: Install Chrome
        uses: browser-actions/setup-chrome@latest

      - name: Run Percy visual tests
        env:
          PERCY_TOKEN: ${{ secrets.PERCY_TOKEN }}
        run: npx percy exec -- pytest tests/visual/ -v

Tool Comparison — Applitools vs Percy vs BackstopJS vs Chromatic

Feature	Applitools Eyes	Percy	BackstopJS	Chromatic
Comparison Engine	AI (Visual AI)	Pixel diff + rendering	Pixel diff (Resemble.js)	Pixel diff (Storybook-native)
Pricing (free tier)	Free: 1 user, limited checkpoints	Free: 5,000 screenshots/month	Free (self-hosted)	Free: 5,000 snapshots/month
Framework Support	Selenium, Playwright, Cypress, WebdriverIO, Appium	Selenium, Playwright, Cypress, WebdriverIO, Storybook	Puppeteer, Playwright, Selenium	Storybook (primary), Playwright
Cross-browser Cloud	Yes — Ultrafast Grid	Yes — BrowserStack cloud	No — local browsers only	Limited (Chrome/Firefox)
CI Integration	GitHub, GitLab, Jenkins, CircleCI	GitHub, GitLab, Bitbucket (native PR comments)	Any (local report generation)	GitHub, GitLab (Storybook PRs)
Dynamic Content Handling	Excellent — Layout/Content match levels	Good — ignore regions in config	Manual — ignore regions in JSON config	Limited — best for component isolation
Best For	Enterprise, full application visual QA	Teams on BrowserStack, PR-centric review	Budget-conscious teams, self-hosted	Storybook component libraries

Best Practices

Visual testing brings significant value but also unique challenges. These practices come from running visual test suites in production CI pipelines across multiple projects.

1. Use Layout match level for dynamic content

Any page with timestamps, user-generated content, live prices, or advertisements should use Layout match level. Strict comparison on dynamic content produces a constant stream of false positives that will erode team trust in the visual suite within weeks.

2. Define ignore regions for truly unavoidable dynamic elements

# Applitools — ignore a specific element
from applitools.selenium import MatchLevel, Target

self.eyes.check("Dashboard",
    Target.window()
    .ignore(self.driver.find_element("id", "live-price-ticker"))
    .ignore(self.driver.find_element("css selector", ".ad-banner"))
    .match_level(MatchLevel.STRICT)
)
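
Conceptually, an ignore region masks part of the image before differences are counted. The following toy sketch shows the idea on small grayscale grids; the function names and the `Region` tuple layout are assumptions made for illustration and are unrelated to the Applitools `Region` class.

```python
Region = tuple[int, int, int, int]  # left, top, width, height
GrayImage = list[list[int]]         # rows of 0-255 grayscale values


def _covered(x: int, y: int, regions: list[Region]) -> bool:
    """True if pixel (x, y) lies inside any of the regions."""
    return any(
        left <= x < left + width and top <= y < top + height
        for left, top, width, height in regions
    )


def diff_outside_regions(
    baseline: GrayImage, candidate: GrayImage, ignore: list[Region]
) -> list[tuple[int, int]]:
    """Return (x, y) for each differing pixel not covered by an ignore region."""
    return [
        (x, y)
        for y, (row_a, row_b) in enumerate(zip(baseline, candidate))
        for x, (a, b) in enumerate(zip(row_a, row_b))
        if a != b and not _covered(x, y, ignore)
    ]


baseline = [
    [10, 10, 10, 10],
    [10, 10, 10, 10],
]
candidate = [
    [10, 99, 99, 10],  # a "live ticker" changed at x=1..2, y=0
    [10, 10, 10, 77],  # an unexpected change at x=3, y=1
]
ticker_region: Region = (1, 0, 2, 1)

unmasked = diff_outside_regions(baseline, candidate, ignore=[])
masked = diff_outside_regions(baseline, candidate, ignore=[ticker_region])
print(unmasked, masked)
```

Masking the ticker leaves only the unexpected change at (3, 1). Keeping ignore regions as small as possible matters, because anything that breaks inside a masked area goes unreported.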

3. Run visual tests on every PR, not just main

Visual regressions are easiest to attribute and fix at PR time. If you only run visual tests on main after merge, you will spend significant time bisecting commits to find which change caused the regression. Catching it on the PR that introduced it costs 10 minutes; finding it after merge can cost hours.

4. Maintain separate baselines per environment

Your staging environment may have different test data, dark mode settings, or feature flags than production. Comparing screenshots from staging against a production baseline will produce false failures. Use Applitools' branch/environment configuration, or a separate Percy project per environment, to maintain separate baselines for dev, staging, and production environments.

5. Review visual diffs before approving PRs

Make visual review part of your PR review checklist — not just code review. A PR that changes CSS should have its Percy or Applitools dashboard link checked by a reviewer with design context, not just a developer looking at code diffs.

From Experience at Amazon: When working on device management UIs at Amazon, visual consistency across a wide device catalog was critical — a UI that looked correct on a Kindle Fire HD might be broken on a Fire TV Stick's web interface. Introducing visual regression testing with Applitools' Layout match level for dynamic device lists reduced visual bug reports from customers by a measurable margin within two release cycles. The most valuable insight was that Layout mode eliminated 90% of the false positives we saw with Strict mode on pages containing device names and status strings that changed constantly.

6. Integrate visual test results into your definition of done

Visual tests should be a required status check for merging PRs — alongside unit tests and integration tests. Treat an unresolved visual change the same as a failing unit test: the PR does not merge until it is reviewed and either fixed or intentionally accepted as a design change.

From Experience — Virtusa: Leading a team of 270 testers at Virtusa, we standardised on Appium for real Android device testing and Selenium WebDriver for web regression. The biggest challenge wasn't the tooling — it was consistency across a team that size. We enforced a strict Page Object Model convention and a pre-merge locator review checklist. Within two sprints, flaky test rates dropped significantly and the team achieved a 20% efficiency gain across regression cycles. At that scale, test architecture decisions matter far more than individual test quality.
