Database Testing for QA Engineers — SQL & Validation Guide

Why QA Engineers Need Database Skills

There is a common failure mode in software testing: a test checks the API response, the response looks correct, the test passes — and nobody notices that the database contains incorrect, corrupted, or incomplete data. The UI shows one thing; the database has another. This gap between presentation-layer verification and data-layer verification is where some of the most serious production bugs live.

Consider these real scenarios that UI-only testing misses:

Partial writes: An order creation API returns HTTP 200 with an order ID, but due to a bug the order_items table was never populated. The order exists; the items don't.
Missing cascade deletes: A user account deletion API deletes the user row but leaves orphaned records in sessions, preferences, and audit log tables — violating data integrity and leaking PII.
Timezone corruption: An event created at 10:00 AM IST is stored as 10:00 AM UTC, making it appear 5.5 hours later than intended. The API returns the stored timestamp; the UI displays it — both appear correct in isolation.
Duplicate records: A race condition in a high-traffic endpoint creates duplicate user records despite the UI showing only one. Without a UNIQUE constraint check, this goes undetected until downstream processes break on the duplicates.
Data truncation: A product description field is VARCHAR(255) but the application allows submissions up to 500 characters. The first 255 characters are saved silently; the truncated remainder is lost. The UI shows the saved version; no error is raised.

A QA engineer who can write SQL queries and connect to a database can verify not just that the application says data was saved correctly, but that it actually was.

Architecture Overview

Understanding the layers between a test action and database storage helps identify where to query for verification and where bugs are likely to originate.

Test Action
(API call / UI interaction)

→

Application
(backend service)

→

ORM / Query Layer
(SQLAlchemy / Hibernate / raw SQL)

→

Database Engine
(MySQL / PostgreSQL)

→

Tables & Data

Indexes

Constraints
(NOT NULL / FK / UNIQUE)

Bugs can originate at any layer: the application logic might compute the wrong value before persisting it; the ORM might generate an incorrect query; the database engine might enforce a constraint that the application doesn't handle gracefully. DB-level testing lets you bypass the application layer and verify the data state directly.

SQL Essentials for QA

You do not need to be a database administrator. You need to be fluent in read queries and basic data manipulation for test setup. Here are the SQL skills that cover 90% of QA database work.

SELECT with WHERE and comparison operators

-- Find a specific user
SELECT id, email, status, created_at
FROM users
WHERE email = 'test@example.com';

-- Find users created in the last 24 hours
SELECT id, email, created_at
FROM users
WHERE created_at >= NOW() - INTERVAL 24 HOUR;

-- Find orders with a specific status and minimum amount
SELECT id, user_id, status, total_amount
FROM orders
WHERE status = 'pending'
  AND total_amount > 100.00
ORDER BY created_at DESC
LIMIT 20;

JOINs — combining data from related tables

-- INNER JOIN — only rows with matching records in both tables
SELECT
    u.email,
    o.id AS order_id,
    o.status,
    o.total_amount
FROM users u
INNER JOIN orders o ON u.id = o.user_id
WHERE u.email = 'test@example.com';

-- LEFT JOIN — all users, with their orders (NULL if no orders)
SELECT
    u.email,
    COUNT(o.id) AS order_count,
    SUM(o.total_amount) AS total_spent
FROM users u
LEFT JOIN orders o ON u.id = o.user_id
WHERE u.status = 'active'
GROUP BY u.id, u.email
HAVING COUNT(o.id) > 0
ORDER BY total_spent DESC;

-- Multi-table JOIN — users, orders, and order_items
SELECT
    u.email,
    o.id AS order_id,
    oi.product_id,
    oi.quantity,
    oi.unit_price
FROM users u
INNER JOIN orders o ON u.id = o.user_id
INNER JOIN order_items oi ON o.id = oi.order_id
WHERE o.status = 'completed'
  AND o.created_at >= '2026-01-01';

Aggregation with GROUP BY and HAVING

-- Count orders by status — useful for verifying migration correctness
SELECT status, COUNT(*) AS count
FROM orders
GROUP BY status
ORDER BY count DESC;

-- Find products with more than 100 orders
SELECT product_id, COUNT(*) AS order_count
FROM order_items
GROUP BY product_id
HAVING COUNT(*) > 100;

-- Average order value by user type
SELECT
    u.user_type,
    ROUND(AVG(o.total_amount), 2) AS avg_order_value,
    COUNT(o.id) AS total_orders
FROM users u
INNER JOIN orders o ON u.id = o.user_id
GROUP BY u.user_type;

Data Validation Queries

These are the queries I run after API calls in my test suite to verify that the database state is correct.

-- 1. Verify row count before and after an INSERT
SELECT COUNT(*) AS user_count FROM users;
-- Run API call to create user
-- Then:
SELECT COUNT(*) AS user_count_after FROM users;
-- Expected: count_after = count + 1

-- 2. Verify exact values after an UPDATE
SELECT id, email, first_name, status, updated_at
FROM users
WHERE id = 42;

-- 3. Check for NULL values in required fields
SELECT COUNT(*) AS null_email_count
FROM users
WHERE email IS NULL;
-- Expected: 0

-- 4. Check for duplicate records (should return empty if UNIQUE constraint works)
SELECT email, COUNT(*) AS duplicates
FROM users
GROUP BY email
HAVING COUNT(*) > 1;

-- 5. Verify foreign key integrity (orphaned order_items)
SELECT oi.id, oi.order_id
FROM order_items oi
LEFT JOIN orders o ON oi.order_id = o.id
WHERE o.id IS NULL;
-- Expected: empty result set

-- 6. Verify cascade delete worked correctly
-- After deleting user with ID 42:
SELECT COUNT(*) FROM orders WHERE user_id = 42;
SELECT COUNT(*) FROM sessions WHERE user_id = 42;
SELECT COUNT(*) FROM user_preferences WHERE user_id = 42;
-- All expected: 0

Schema Testing

Schema tests verify that the database structure matches the application's expectations. These are particularly valuable after migrations — ensuring a migration was applied correctly before running functional tests.

-- Check column definitions using INFORMATION_SCHEMA (MySQL & PostgreSQL)
SELECT
    column_name,
    data_type,
    character_maximum_length,
    is_nullable,
    column_default
FROM information_schema.columns
WHERE table_schema = 'myapp_db'
  AND table_name = 'users'
ORDER BY ordinal_position;

-- Verify a specific column exists with correct type
SELECT column_name, data_type
FROM information_schema.columns
WHERE table_name = 'orders'
  AND column_name = 'total_amount';
-- Expected: numeric / decimal

-- Check NOT NULL constraints
SELECT column_name
FROM information_schema.columns
WHERE table_name = 'users'
  AND is_nullable = 'NO';
-- Should include: id, email, created_at

-- Verify UNIQUE constraints exist
SELECT
    tc.constraint_name,
    kcu.column_name
FROM information_schema.table_constraints tc
JOIN information_schema.key_column_usage kcu
    ON tc.constraint_name = kcu.constraint_name
WHERE tc.table_name = 'users'
  AND tc.constraint_type = 'UNIQUE';

-- Verify foreign key constraints
SELECT
    kcu.column_name,
    kcu.referenced_table_name,
    kcu.referenced_column_name
FROM information_schema.key_column_usage kcu
JOIN information_schema.referential_constraints rc
    ON kcu.constraint_name = rc.constraint_name
WHERE kcu.table_name = 'orders';

MySQL-Specific Commands

When working with a MySQL database, these commands are essential for quick exploration and debugging.

-- List all tables in the current database
SHOW TABLES;

-- Show column definitions for a table (simpler than INFORMATION_SCHEMA for quick checks)
DESCRIBE users;
-- or
DESC orders;

-- Show the full CREATE TABLE statement including constraints and indexes
SHOW CREATE TABLE users\G

-- Show current indexes on a table
SHOW INDEX FROM users;

-- Analyze a query's execution plan
EXPLAIN SELECT * FROM orders WHERE user_id = 42 AND status = 'pending';

-- Extended EXPLAIN for more detail
EXPLAIN FORMAT=JSON SELECT u.email, COUNT(o.id)
FROM users u
LEFT JOIN orders o ON u.id = o.user_id
GROUP BY u.id\G

-- Check slow query log status
SHOW VARIABLES LIKE 'slow_query_log%';

-- Show running processes (useful in test environments to detect lock contention)
SHOW PROCESSLIST;

PostgreSQL-Specific Commands

-- psql meta-commands (inside psql shell)
\dt                     -- List all tables in current schema
\d users                -- Describe the users table (columns, types, constraints)
\di                     -- List all indexes
\df                     -- List all functions
\dn                     -- List all schemas

-- PostgreSQL-equivalent of DESCRIBE via SQL
SELECT
    column_name,
    data_type,
    character_maximum_length,
    is_nullable,
    column_default
FROM information_schema.columns
WHERE table_name = 'users'
ORDER BY ordinal_position;

-- List indexes via pg_catalog
SELECT
    indexname,
    indexdef
FROM pg_indexes
WHERE tablename = 'users';

-- Check table sizes
SELECT
    relname AS table_name,
    pg_size_pretty(pg_total_relation_size(relid)) AS total_size
FROM pg_catalog.pg_statio_user_tables
ORDER BY pg_total_relation_size(relid) DESC;

-- EXPLAIN ANALYZE — actual execution stats (runs the query)
EXPLAIN ANALYZE
SELECT u.email, COUNT(o.id) AS order_count
FROM users u
LEFT JOIN orders o ON u.id = o.user_id
GROUP BY u.id, u.email;

-- Check active connections and locks
SELECT pid, state, query, wait_event_type, wait_event
FROM pg_stat_activity
WHERE state != 'idle';

Connecting from Python in Tests

pip install mysql-connector-python psycopg2-binary pytest

import pytest
import mysql.connector
import psycopg2
import os


# =========================================
# MySQL fixture
# =========================================
@pytest.fixture(scope="session")
def mysql_connection():
    """Session-scoped MySQL connection."""
    conn = mysql.connector.connect(
        host=os.environ.get("DB_HOST", "localhost"),
        port=int(os.environ.get("DB_PORT", 3306)),
        user=os.environ.get("DB_USER", "testuser"),
        password=os.environ.get("DB_PASSWORD", "testpassword"),
        database=os.environ.get("DB_NAME", "testdb")
    )
    yield conn
    conn.close()


@pytest.fixture(scope="session")
def pg_connection():
    """Session-scoped PostgreSQL connection."""
    conn = psycopg2.connect(
        host=os.environ.get("PG_HOST", "localhost"),
        port=int(os.environ.get("PG_PORT", 5432)),
        user=os.environ.get("PG_USER", "testuser"),
        password=os.environ.get("PG_PASSWORD", "testpassword"),
        dbname=os.environ.get("PG_NAME", "testdb")
    )
    conn.autocommit = False  # Use explicit transactions for test isolation
    yield conn
    conn.close()


# =========================================
# Test that verifies DB state after API call
# =========================================
import requests

def test_user_creation_persisted_in_db(mysql_connection):
    """Verify that creating a user via the API correctly persists data in the DB."""

    # Step 1 — Record the count before
    cursor = mysql_connection.cursor(dictionary=True)
    cursor.execute("SELECT COUNT(*) AS count FROM users WHERE email = 'apitest@example.com'")
    before_count = cursor.fetchone()["count"]
    assert before_count == 0, "Test precondition: user should not exist"

    # Step 2 — Make the API call
    response = requests.post(
        "http://localhost:8000/api/v1/users",
        json={
            "email": "apitest@example.com",
            "firstName": "API",
            "lastName": "Test",
            "password": "SecurePass123!"
        }
    )
    assert response.status_code == 201
    user_id = response.json()["id"]

    # Step 3 — Verify DB state
    cursor.execute(
        "SELECT id, email, first_name, last_name, status, created_at "
        "FROM users WHERE id = %s",
        (user_id,)
    )
    db_user = cursor.fetchone()

    assert db_user is not None, "User must exist in database"
    assert db_user["email"] == "apitest@example.com"
    assert db_user["first_name"] == "API"
    assert db_user["status"] == "active"
    assert db_user["created_at"] is not None

    # Step 4 — Cleanup
    cursor.execute("DELETE FROM users WHERE id = %s", (user_id,))
    mysql_connection.commit()
    cursor.close()

Test Data Setup and Teardown

The safest pattern for database test data management uses pytest fixtures with yield and guaranteed cleanup, or transaction rollback for complete isolation.

import pytest
import psycopg2


@pytest.fixture
def test_user(pg_connection):
    """Create a test user, yield its ID, delete it after the test."""
    cursor = pg_connection.cursor()

    cursor.execute("""
        INSERT INTO users (email, first_name, last_name, status, created_at)
        VALUES (%s, %s, %s, %s, NOW())
        RETURNING id
    """, ("fixture-user@test.com", "Fixture", "User", "active"))

    user_id = cursor.fetchone()[0]
    pg_connection.commit()

    yield user_id  # Test runs here

    # Teardown — runs regardless of test pass/fail
    cursor.execute("DELETE FROM users WHERE id = %s", (user_id,))
    pg_connection.commit()
    cursor.close()


@pytest.fixture
def db_transaction(pg_connection):
    """Wrap the entire test in a transaction that is always rolled back.
    The most reliable isolation pattern — leaves zero residue."""
    savepoint = "test_savepoint"
    cursor = pg_connection.cursor()
    cursor.execute(f"SAVEPOINT {savepoint}")
    cursor.close()

    yield pg_connection

    # Rollback to savepoint — undoes all changes made during the test
    cursor = pg_connection.cursor()
    cursor.execute(f"ROLLBACK TO SAVEPOINT {savepoint}")
    cursor.close()


def test_order_creation_with_rollback(db_transaction):
    """Test order creation — rolled back automatically, no cleanup needed."""
    cursor = db_transaction.cursor()

    cursor.execute("""
        INSERT INTO orders (user_id, status, total_amount, created_at)
        VALUES (1, 'pending', 99.99, NOW())
        RETURNING id
    """)
    order_id = cursor.fetchone()[0]
    assert order_id is not None

    # Verify the order exists within this transaction
    cursor.execute("SELECT status FROM orders WHERE id = %s", (order_id,))
    assert cursor.fetchone()[0] == "pending"

    # After test ends, db_transaction fixture rolls back — order never permanently saved
    cursor.close()

Data-Driven Testing from the Database

Instead of hardcoding test data in your test files, you can query test scenarios directly from a dedicated test data table in the database.

-- Create a test data table in your test database
CREATE TABLE test_login_scenarios (
    id          INT PRIMARY KEY AUTO_INCREMENT,
    username    VARCHAR(255) NOT NULL,
    password    VARCHAR(255) NOT NULL,
    expected_status INT NOT NULL,
    expected_role   VARCHAR(50),
    description VARCHAR(255)
);

INSERT INTO test_login_scenarios VALUES
(1, 'admin@example.com', 'Admin123!', 200, 'admin', 'Valid admin credentials'),
(2, 'user@example.com',  'User456!',  200, 'user',  'Valid standard user credentials'),
(3, 'locked@example.com','Pass789!',  403, NULL,    'Locked account attempt'),
(4, 'test@example.com',  'wrong',     401, NULL,    'Invalid password');

def get_login_scenarios(db_connection):
    """Load test scenarios from the database."""
    cursor = db_connection.cursor(dictionary=True)
    cursor.execute("SELECT * FROM test_login_scenarios ORDER BY id")
    scenarios = cursor.fetchall()
    cursor.close()
    return scenarios


@pytest.mark.parametrize("scenario", get_login_scenarios(get_db_conn()))
def test_login_scenarios_from_db(scenario):
    """Run login tests driven by DB test data."""
    response = requests.post(
        "http://localhost:8000/api/v1/auth/login",
        json={"email": scenario["username"], "password": scenario["password"]}
    )
    assert response.status_code == scenario["expected_status"], (
        f"Scenario '{scenario['description']}' failed: "
        f"expected {scenario['expected_status']}, got {response.status_code}"
    )
    if scenario["expected_role"]:
        assert response.json().get("role") == scenario["expected_role"]

Common Database Bugs QA Should Catch

From Experience at Virtusa: One of the most impactful testing practices I introduced was a post-action DB verification step in the API regression suite. Within the first sprint, it caught a bug where a user profile update API was updating the users table but silently failing to update the user_metadata table due to a missing transaction wrapper. The API returned HTTP 200, the unit tests passed, but the database had inconsistent state. Without the DB verification query, this would have reached production and caused silent data corruption for updated user profiles.

1. Duplicate records due to missing UNIQUE constraint

-- Test: submitting the same email twice should fail at the DB level
-- First insert — should succeed
INSERT INTO users (email, first_name, last_name)
VALUES ('duplicate@test.com', 'First', 'User');

-- Second insert — should fail with duplicate key error if UNIQUE exists
INSERT INTO users (email, first_name, last_name)
VALUES ('duplicate@test.com', 'Second', 'User');
-- If this succeeds, the UNIQUE constraint is missing

-- Verification query
SELECT email, COUNT(*) FROM users GROUP BY email HAVING COUNT(*) > 1;

2. Incorrect cascade delete behavior

-- Setup
INSERT INTO users (id, email) VALUES (999, 'cascade-test@example.com');
INSERT INTO orders (user_id, status) VALUES (999, 'pending');
INSERT INTO sessions (user_id, token) VALUES (999, 'abc123');

-- Delete user
DELETE FROM users WHERE id = 999;

-- Verify cascade — all related records should be gone
SELECT COUNT(*) FROM orders WHERE user_id = 999;    -- Expected: 0
SELECT COUNT(*) FROM sessions WHERE user_id = 999;  -- Expected: 0

3. Timezone issues

-- Check what timezone the database server is using
SELECT @@global.time_zone, @@session.time_zone;  -- MySQL
SHOW TIMEZONE;                                    -- PostgreSQL

-- Verify timestamps are stored in UTC
SELECT id, created_at, CONVERT_TZ(created_at, @@session.time_zone, 'UTC') AS utc_time
FROM events
WHERE id = 42;

-- Check for events that appear to have future timestamps (common timezone bug)
SELECT COUNT(*) AS suspicious_events
FROM events
WHERE created_at > UTC_TIMESTAMP() + INTERVAL 1 HOUR;

4. Truncated strings

-- Check field length against VARCHAR constraint
SELECT
    column_name,
    character_maximum_length
FROM information_schema.columns
WHERE table_name = 'products'
  AND column_name = 'description';

-- Test with maximum + 1 character string
-- If strict mode is not enabled, MySQL silently truncates
-- PostgreSQL raises an error

-- Verify no data was truncated by checking string length
SELECT id, LENGTH(description) AS stored_length
FROM products
WHERE id = 999;
-- Compare against the length you submitted

Database Testing in CI — GitHub Actions with Docker

# .github/workflows/db-tests.yml
name: Database Tests

on: [push, pull_request]

jobs:
  db-test:
    runs-on: ubuntu-latest

    services:
      mysql:
        image: mysql:8.0
        env:
          MYSQL_ROOT_PASSWORD: rootpassword
          MYSQL_DATABASE: testdb
          MYSQL_USER: testuser
          MYSQL_PASSWORD: testpassword
        ports:
          - 3306:3306
        options: >-
          --health-cmd="mysqladmin ping"
          --health-interval=10s
          --health-timeout=5s
          --health-retries=5

      postgres:
        image: postgres:15
        env:
          POSTGRES_USER: testuser
          POSTGRES_PASSWORD: testpassword
          POSTGRES_DB: testdb
        ports:
          - 5432:5432
        options: >-
          --health-cmd="pg_isready"
          --health-interval=10s
          --health-timeout=5s
          --health-retries=5

    steps:
      - uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.11'

      - name: Install dependencies
        run: pip install pytest mysql-connector-python psycopg2-binary requests

      - name: Apply DB schema
        run: |
          mysql -h 127.0.0.1 -u testuser -ptestpassword testdb < schema/mysql/schema.sql
          PGPASSWORD=testpassword psql -h 127.0.0.1 -U testuser testdb < schema/postgres/schema.sql

      - name: Run database tests
        env:
          DB_HOST: 127.0.0.1
          DB_USER: testuser
          DB_PASSWORD: testpassword
          DB_NAME: testdb
          PG_HOST: 127.0.0.1
          PG_USER: testuser
          PG_PASSWORD: testpassword
          PG_NAME: testdb
        run: pytest tests/database/ -v --tb=short

MySQL vs PostgreSQL vs SQLite for Testing

Feature	MySQL 8.0	PostgreSQL 15	SQLite 3
Primary Use Case	Web application backend, high-read workloads	Complex queries, data integrity, JSONB	Embedded, unit testing, file-based
ACID Compliance	Yes (InnoDB engine)	Yes (full)	Yes
INFORMATION_SCHEMA	Yes — standard SQL	Yes — plus pg_catalog	PRAGMA statements (different syntax)
Docker Image	`mysql:8.0`	`postgres:15`	No Docker needed — file or in-memory
Python Library	`mysql-connector-python`	`psycopg2`	`sqlite3` (stdlib)
Test Isolation	Transactions, savepoints	Transactions, savepoints, schemas	In-memory DB per test session
CI Setup Speed	~30s to start in Docker	~20s to start in Docker	Instant — no server
Best For QA	Testing against real production equivalent	Testing complex queries, constraints, JSONB	Fast unit test DB layer, ORM testing

Best Practices

1. Never test against the production database

This cannot be overstated. Production database testing means your test data becomes real data, your cleanup scripts can accidentally delete real records, and your queries can impact production performance. Always use dedicated test databases — either a separate environment database, a Docker container in CI, or an in-memory SQLite for fast unit tests.

2. Always use transactions for cleanup

Test fixtures should use either explicit DELETE cleanup or, better, transaction rollback to ensure no test data leaks between tests. Leaked test data is a primary cause of test order dependency and intermittent failures.

3. Verify both success paths and constraint violation paths

Schema tests should verify that constraints work correctly: attempting to insert a duplicate email should fail with a meaningful error, not silently succeed. Attempting to insert NULL into a NOT NULL column should raise a constraint violation. These negative paths are as important as the positive ones.

4. Use EXPLAIN for performance validation

Add EXPLAIN queries to your test suite for critical database operations. If a query execution plan shows a full table scan on a large table that should be using an index, that is a performance bug that belongs in your defect tracker.

From Experience at Viasat: At Viasat, satellite subscriber data was managed through a PostgreSQL database with complex relationships between subscribers, devices, service plans, and usage records. When the team migrated from PostgreSQL 12 to PostgreSQL 15, I built a database regression suite using pytest and psycopg2 that ran 60+ data validation queries before and after migration: verifying row counts, constraint integrity, index presence, foreign key validity, and critical business query results. The suite caught three data migration issues that would have been invisible to the API and UI tests — including a timezone conversion error in the usage_events table that would have caused billing calculation errors.

Back to Blog

From Experience — Virtusa: At Virtusa, I was embedded in Agile ceremonies from day one — sprint planning, daily standups, backlog refinement, and retrospectives. The most valuable QA contribution in sprint planning was consistently pushing back on stories with no acceptance criteria. A story without clear acceptance criteria is a story without a definition of done, and these invariably cause defect spikes in the final days of a sprint. Formalising a "no ticket enters sprint without AC" team norm eliminated an entire class of end-of-sprint scrambles.