Why QA Engineers Need Database Skills
There is a common failure mode in software testing: a test checks the API response, the response looks correct, the test passes — and nobody notices that the database contains incorrect, corrupted, or incomplete data. The UI shows one thing; the database has another. This gap between presentation-layer verification and data-layer verification is where some of the most serious production bugs live.
Consider these real scenarios that UI-only testing misses:
- Partial writes: An order creation API returns HTTP 200 with an order ID, but due to a bug the order_items table was never populated. The order exists; the items don't.
- Missing cascade deletes: A user account deletion API deletes the user row but leaves orphaned records in sessions, preferences, and audit log tables — violating data integrity and leaking PII.
- Timezone corruption: An event created at 10:00 AM IST is stored as 10:00 AM UTC, making it appear 5.5 hours later than intended. The API returns the stored timestamp; the UI displays it — both appear correct in isolation.
- Duplicate records: A race condition in a high-traffic endpoint creates duplicate user records despite the UI showing only one. Without a UNIQUE constraint check, this goes undetected until downstream processes break on the duplicates.
- Data truncation: A product description field is VARCHAR(255) but the application allows submissions up to 500 characters. The first 255 characters are saved silently; the truncated remainder is lost. The UI shows the saved version; no error is raised.
A QA engineer who can write SQL queries and connect to a database can verify not just that the application says data was saved correctly, but that it actually was.
Architecture Overview
Understanding the layers between a test action and database storage helps identify where to query for verification and where bugs are likely to originate.
(API call / UI interaction)
(backend service)
(SQLAlchemy / Hibernate / raw SQL)
(MySQL / PostgreSQL)
(NOT NULL / FK / UNIQUE)
Bugs can originate at any layer: the application logic might compute the wrong value before persisting it; the ORM might generate an incorrect query; the database engine might enforce a constraint that the application doesn't handle gracefully. DB-level testing lets you bypass the application layer and verify the data state directly.
SQL Essentials for QA
You do not need to be a database administrator. You need to be fluent in read queries and basic data manipulation for test setup. Here are the SQL skills that cover 90% of QA database work.
SELECT with WHERE and comparison operators
-- Find a specific user
SELECT id, email, status, created_at
FROM users
WHERE email = 'test@example.com';
-- Find users created in the last 24 hours
SELECT id, email, created_at
FROM users
WHERE created_at >= NOW() - INTERVAL 24 HOUR;
-- Find orders with a specific status and minimum amount
SELECT id, user_id, status, total_amount
FROM orders
WHERE status = 'pending'
AND total_amount > 100.00
ORDER BY created_at DESC
LIMIT 20;
JOINs — combining data from related tables
-- INNER JOIN — only rows with matching records in both tables
SELECT
u.email,
o.id AS order_id,
o.status,
o.total_amount
FROM users u
INNER JOIN orders o ON u.id = o.user_id
WHERE u.email = 'test@example.com';
-- LEFT JOIN — all users, with their orders (NULL if no orders)
SELECT
u.email,
COUNT(o.id) AS order_count,
SUM(o.total_amount) AS total_spent
FROM users u
LEFT JOIN orders o ON u.id = o.user_id
WHERE u.status = 'active'
GROUP BY u.id, u.email
HAVING COUNT(o.id) > 0
ORDER BY total_spent DESC;
-- Multi-table JOIN — users, orders, and order_items
SELECT
u.email,
o.id AS order_id,
oi.product_id,
oi.quantity,
oi.unit_price
FROM users u
INNER JOIN orders o ON u.id = o.user_id
INNER JOIN order_items oi ON o.id = oi.order_id
WHERE o.status = 'completed'
AND o.created_at >= '2026-01-01';
Aggregation with GROUP BY and HAVING
-- Count orders by status — useful for verifying migration correctness
SELECT status, COUNT(*) AS count
FROM orders
GROUP BY status
ORDER BY count DESC;
-- Find products with more than 100 orders
SELECT product_id, COUNT(*) AS order_count
FROM order_items
GROUP BY product_id
HAVING COUNT(*) > 100;
-- Average order value by user type
SELECT
u.user_type,
ROUND(AVG(o.total_amount), 2) AS avg_order_value,
COUNT(o.id) AS total_orders
FROM users u
INNER JOIN orders o ON u.id = o.user_id
GROUP BY u.user_type;
Data Validation Queries
These are the queries I run after API calls in my test suite to verify that the database state is correct.
-- 1. Verify row count before and after an INSERT
SELECT COUNT(*) AS user_count FROM users;
-- Run API call to create user
-- Then:
SELECT COUNT(*) AS user_count_after FROM users;
-- Expected: count_after = count + 1
-- 2. Verify exact values after an UPDATE
SELECT id, email, first_name, status, updated_at
FROM users
WHERE id = 42;
-- 3. Check for NULL values in required fields
SELECT COUNT(*) AS null_email_count
FROM users
WHERE email IS NULL;
-- Expected: 0
-- 4. Check for duplicate records (should return empty if UNIQUE constraint works)
SELECT email, COUNT(*) AS duplicates
FROM users
GROUP BY email
HAVING COUNT(*) > 1;
-- 5. Verify foreign key integrity (orphaned order_items)
SELECT oi.id, oi.order_id
FROM order_items oi
LEFT JOIN orders o ON oi.order_id = o.id
WHERE o.id IS NULL;
-- Expected: empty result set
-- 6. Verify cascade delete worked correctly
-- After deleting user with ID 42:
SELECT COUNT(*) FROM orders WHERE user_id = 42;
SELECT COUNT(*) FROM sessions WHERE user_id = 42;
SELECT COUNT(*) FROM user_preferences WHERE user_id = 42;
-- All expected: 0
Schema Testing
Schema tests verify that the database structure matches the application's expectations. These are particularly valuable after migrations — ensuring a migration was applied correctly before running functional tests.
-- Check column definitions using INFORMATION_SCHEMA (MySQL & PostgreSQL)
SELECT
column_name,
data_type,
character_maximum_length,
is_nullable,
column_default
FROM information_schema.columns
WHERE table_schema = 'myapp_db'
AND table_name = 'users'
ORDER BY ordinal_position;
-- Verify a specific column exists with correct type
SELECT column_name, data_type
FROM information_schema.columns
WHERE table_name = 'orders'
AND column_name = 'total_amount';
-- Expected: numeric / decimal
-- Check NOT NULL constraints
SELECT column_name
FROM information_schema.columns
WHERE table_name = 'users'
AND is_nullable = 'NO';
-- Should include: id, email, created_at
-- Verify UNIQUE constraints exist
SELECT
tc.constraint_name,
kcu.column_name
FROM information_schema.table_constraints tc
JOIN information_schema.key_column_usage kcu
ON tc.constraint_name = kcu.constraint_name
WHERE tc.table_name = 'users'
AND tc.constraint_type = 'UNIQUE';
-- Verify foreign key constraints
SELECT
kcu.column_name,
kcu.referenced_table_name,
kcu.referenced_column_name
FROM information_schema.key_column_usage kcu
JOIN information_schema.referential_constraints rc
ON kcu.constraint_name = rc.constraint_name
WHERE kcu.table_name = 'orders';
MySQL-Specific Commands
When working with a MySQL database, these commands are essential for quick exploration and debugging.
-- List all tables in the current database
SHOW TABLES;
-- Show column definitions for a table (simpler than INFORMATION_SCHEMA for quick checks)
DESCRIBE users;
-- or
DESC orders;
-- Show the full CREATE TABLE statement including constraints and indexes
SHOW CREATE TABLE users\G
-- Show current indexes on a table
SHOW INDEX FROM users;
-- Analyze a query's execution plan
EXPLAIN SELECT * FROM orders WHERE user_id = 42 AND status = 'pending';
-- Extended EXPLAIN for more detail
EXPLAIN FORMAT=JSON SELECT u.email, COUNT(o.id)
FROM users u
LEFT JOIN orders o ON u.id = o.user_id
GROUP BY u.id\G
-- Check slow query log status
SHOW VARIABLES LIKE 'slow_query_log%';
-- Show running processes (useful in test environments to detect lock contention)
SHOW PROCESSLIST;
PostgreSQL-Specific Commands
-- psql meta-commands (inside psql shell)
\dt -- List all tables in current schema
\d users -- Describe the users table (columns, types, constraints)
\di -- List all indexes
\df -- List all functions
\dn -- List all schemas
-- PostgreSQL-equivalent of DESCRIBE via SQL
SELECT
column_name,
data_type,
character_maximum_length,
is_nullable,
column_default
FROM information_schema.columns
WHERE table_name = 'users'
ORDER BY ordinal_position;
-- List indexes via pg_catalog
SELECT
indexname,
indexdef
FROM pg_indexes
WHERE tablename = 'users';
-- Check table sizes
SELECT
relname AS table_name,
pg_size_pretty(pg_total_relation_size(relid)) AS total_size
FROM pg_catalog.pg_statio_user_tables
ORDER BY pg_total_relation_size(relid) DESC;
-- EXPLAIN ANALYZE — actual execution stats (runs the query)
EXPLAIN ANALYZE
SELECT u.email, COUNT(o.id) AS order_count
FROM users u
LEFT JOIN orders o ON u.id = o.user_id
GROUP BY u.id, u.email;
-- Check active connections and locks
SELECT pid, state, query, wait_event_type, wait_event
FROM pg_stat_activity
WHERE state != 'idle';
Connecting from Python in Tests
pip install mysql-connector-python psycopg2-binary pytest
import pytest
import mysql.connector
import psycopg2
import os
# =========================================
# MySQL fixture
# =========================================
@pytest.fixture(scope="session")
def mysql_connection():
"""Session-scoped MySQL connection."""
conn = mysql.connector.connect(
host=os.environ.get("DB_HOST", "localhost"),
port=int(os.environ.get("DB_PORT", 3306)),
user=os.environ.get("DB_USER", "testuser"),
password=os.environ.get("DB_PASSWORD", "testpassword"),
database=os.environ.get("DB_NAME", "testdb")
)
yield conn
conn.close()
@pytest.fixture(scope="session")
def pg_connection():
"""Session-scoped PostgreSQL connection."""
conn = psycopg2.connect(
host=os.environ.get("PG_HOST", "localhost"),
port=int(os.environ.get("PG_PORT", 5432)),
user=os.environ.get("PG_USER", "testuser"),
password=os.environ.get("PG_PASSWORD", "testpassword"),
dbname=os.environ.get("PG_NAME", "testdb")
)
conn.autocommit = False # Use explicit transactions for test isolation
yield conn
conn.close()
# =========================================
# Test that verifies DB state after API call
# =========================================
import requests
def test_user_creation_persisted_in_db(mysql_connection):
"""Verify that creating a user via the API correctly persists data in the DB."""
# Step 1 — Record the count before
cursor = mysql_connection.cursor(dictionary=True)
cursor.execute("SELECT COUNT(*) AS count FROM users WHERE email = 'apitest@example.com'")
before_count = cursor.fetchone()["count"]
assert before_count == 0, "Test precondition: user should not exist"
# Step 2 — Make the API call
response = requests.post(
"http://localhost:8000/api/v1/users",
json={
"email": "apitest@example.com",
"firstName": "API",
"lastName": "Test",
"password": "SecurePass123!"
}
)
assert response.status_code == 201
user_id = response.json()["id"]
# Step 3 — Verify DB state
cursor.execute(
"SELECT id, email, first_name, last_name, status, created_at "
"FROM users WHERE id = %s",
(user_id,)
)
db_user = cursor.fetchone()
assert db_user is not None, "User must exist in database"
assert db_user["email"] == "apitest@example.com"
assert db_user["first_name"] == "API"
assert db_user["status"] == "active"
assert db_user["created_at"] is not None
# Step 4 — Cleanup
cursor.execute("DELETE FROM users WHERE id = %s", (user_id,))
mysql_connection.commit()
cursor.close()
Test Data Setup and Teardown
The safest pattern for database test data management uses pytest fixtures with yield and guaranteed cleanup, or transaction rollback for complete isolation.
import pytest
import psycopg2
@pytest.fixture
def test_user(pg_connection):
"""Create a test user, yield its ID, delete it after the test."""
cursor = pg_connection.cursor()
cursor.execute("""
INSERT INTO users (email, first_name, last_name, status, created_at)
VALUES (%s, %s, %s, %s, NOW())
RETURNING id
""", ("fixture-user@test.com", "Fixture", "User", "active"))
user_id = cursor.fetchone()[0]
pg_connection.commit()
yield user_id # Test runs here
# Teardown — runs regardless of test pass/fail
cursor.execute("DELETE FROM users WHERE id = %s", (user_id,))
pg_connection.commit()
cursor.close()
@pytest.fixture
def db_transaction(pg_connection):
"""Wrap the entire test in a transaction that is always rolled back.
The most reliable isolation pattern — leaves zero residue."""
savepoint = "test_savepoint"
cursor = pg_connection.cursor()
cursor.execute(f"SAVEPOINT {savepoint}")
cursor.close()
yield pg_connection
# Rollback to savepoint — undoes all changes made during the test
cursor = pg_connection.cursor()
cursor.execute(f"ROLLBACK TO SAVEPOINT {savepoint}")
cursor.close()
def test_order_creation_with_rollback(db_transaction):
"""Test order creation — rolled back automatically, no cleanup needed."""
cursor = db_transaction.cursor()
cursor.execute("""
INSERT INTO orders (user_id, status, total_amount, created_at)
VALUES (1, 'pending', 99.99, NOW())
RETURNING id
""")
order_id = cursor.fetchone()[0]
assert order_id is not None
# Verify the order exists within this transaction
cursor.execute("SELECT status FROM orders WHERE id = %s", (order_id,))
assert cursor.fetchone()[0] == "pending"
# After test ends, db_transaction fixture rolls back — order never permanently saved
cursor.close()
Data-Driven Testing from the Database
Instead of hardcoding test data in your test files, you can query test scenarios directly from a dedicated test data table in the database.
-- Create a test data table in your test database
CREATE TABLE test_login_scenarios (
id INT PRIMARY KEY AUTO_INCREMENT,
username VARCHAR(255) NOT NULL,
password VARCHAR(255) NOT NULL,
expected_status INT NOT NULL,
expected_role VARCHAR(50),
description VARCHAR(255)
);
INSERT INTO test_login_scenarios VALUES
(1, 'admin@example.com', 'Admin123!', 200, 'admin', 'Valid admin credentials'),
(2, 'user@example.com', 'User456!', 200, 'user', 'Valid standard user credentials'),
(3, 'locked@example.com','Pass789!', 403, NULL, 'Locked account attempt'),
(4, 'test@example.com', 'wrong', 401, NULL, 'Invalid password');
def get_login_scenarios(db_connection):
"""Load test scenarios from the database."""
cursor = db_connection.cursor(dictionary=True)
cursor.execute("SELECT * FROM test_login_scenarios ORDER BY id")
scenarios = cursor.fetchall()
cursor.close()
return scenarios
@pytest.mark.parametrize("scenario", get_login_scenarios(get_db_conn()))
def test_login_scenarios_from_db(scenario):
"""Run login tests driven by DB test data."""
response = requests.post(
"http://localhost:8000/api/v1/auth/login",
json={"email": scenario["username"], "password": scenario["password"]}
)
assert response.status_code == scenario["expected_status"], (
f"Scenario '{scenario['description']}' failed: "
f"expected {scenario['expected_status']}, got {response.status_code}"
)
if scenario["expected_role"]:
assert response.json().get("role") == scenario["expected_role"]
Common Database Bugs QA Should Catch
1. Duplicate records due to missing UNIQUE constraint
-- Test: submitting the same email twice should fail at the DB level
-- First insert — should succeed
INSERT INTO users (email, first_name, last_name)
VALUES ('duplicate@test.com', 'First', 'User');
-- Second insert — should fail with duplicate key error if UNIQUE exists
INSERT INTO users (email, first_name, last_name)
VALUES ('duplicate@test.com', 'Second', 'User');
-- If this succeeds, the UNIQUE constraint is missing
-- Verification query
SELECT email, COUNT(*) FROM users GROUP BY email HAVING COUNT(*) > 1;
2. Incorrect cascade delete behavior
-- Setup
INSERT INTO users (id, email) VALUES (999, 'cascade-test@example.com');
INSERT INTO orders (user_id, status) VALUES (999, 'pending');
INSERT INTO sessions (user_id, token) VALUES (999, 'abc123');
-- Delete user
DELETE FROM users WHERE id = 999;
-- Verify cascade — all related records should be gone
SELECT COUNT(*) FROM orders WHERE user_id = 999; -- Expected: 0
SELECT COUNT(*) FROM sessions WHERE user_id = 999; -- Expected: 0
3. Timezone issues
-- Check what timezone the database server is using
SELECT @@global.time_zone, @@session.time_zone; -- MySQL
SHOW TIMEZONE; -- PostgreSQL
-- Verify timestamps are stored in UTC
SELECT id, created_at, CONVERT_TZ(created_at, @@session.time_zone, 'UTC') AS utc_time
FROM events
WHERE id = 42;
-- Check for events that appear to have future timestamps (common timezone bug)
SELECT COUNT(*) AS suspicious_events
FROM events
WHERE created_at > UTC_TIMESTAMP() + INTERVAL 1 HOUR;
4. Truncated strings
-- Check field length against VARCHAR constraint
SELECT
column_name,
character_maximum_length
FROM information_schema.columns
WHERE table_name = 'products'
AND column_name = 'description';
-- Test with maximum + 1 character string
-- If strict mode is not enabled, MySQL silently truncates
-- PostgreSQL raises an error
-- Verify no data was truncated by checking string length
SELECT id, LENGTH(description) AS stored_length
FROM products
WHERE id = 999;
-- Compare against the length you submitted
Database Testing in CI — GitHub Actions with Docker
# .github/workflows/db-tests.yml
name: Database Tests
on: [push, pull_request]
jobs:
db-test:
runs-on: ubuntu-latest
services:
mysql:
image: mysql:8.0
env:
MYSQL_ROOT_PASSWORD: rootpassword
MYSQL_DATABASE: testdb
MYSQL_USER: testuser
MYSQL_PASSWORD: testpassword
ports:
- 3306:3306
options: >-
--health-cmd="mysqladmin ping"
--health-interval=10s
--health-timeout=5s
--health-retries=5
postgres:
image: postgres:15
env:
POSTGRES_USER: testuser
POSTGRES_PASSWORD: testpassword
POSTGRES_DB: testdb
ports:
- 5432:5432
options: >-
--health-cmd="pg_isready"
--health-interval=10s
--health-timeout=5s
--health-retries=5
steps:
- uses: actions/checkout@v4
- name: Set up Python
uses: actions/setup-python@v5
with:
python-version: '3.11'
- name: Install dependencies
run: pip install pytest mysql-connector-python psycopg2-binary requests
- name: Apply DB schema
run: |
mysql -h 127.0.0.1 -u testuser -ptestpassword testdb < schema/mysql/schema.sql
PGPASSWORD=testpassword psql -h 127.0.0.1 -U testuser testdb < schema/postgres/schema.sql
- name: Run database tests
env:
DB_HOST: 127.0.0.1
DB_USER: testuser
DB_PASSWORD: testpassword
DB_NAME: testdb
PG_HOST: 127.0.0.1
PG_USER: testuser
PG_PASSWORD: testpassword
PG_NAME: testdb
run: pytest tests/database/ -v --tb=short
MySQL vs PostgreSQL vs SQLite for Testing
| Feature | MySQL 8.0 | PostgreSQL 15 | SQLite 3 |
|---|---|---|---|
| Primary Use Case | Web application backend, high-read workloads | Complex queries, data integrity, JSONB | Embedded, unit testing, file-based |
| ACID Compliance | Yes (InnoDB engine) | Yes (full) | Yes |
| INFORMATION_SCHEMA | Yes — standard SQL | Yes — plus pg_catalog | PRAGMA statements (different syntax) |
| Docker Image | mysql:8.0 |
postgres:15 |
No Docker needed — file or in-memory |
| Python Library | mysql-connector-python |
psycopg2 |
sqlite3 (stdlib) |
| Test Isolation | Transactions, savepoints | Transactions, savepoints, schemas | In-memory DB per test session |
| CI Setup Speed | ~30s to start in Docker | ~20s to start in Docker | Instant — no server |
| Best For QA | Testing against real production equivalent | Testing complex queries, constraints, JSONB | Fast unit test DB layer, ORM testing |
Best Practices
1. Never test against the production database
This cannot be overstated. Production database testing means your test data becomes real data, your cleanup scripts can accidentally delete real records, and your queries can impact production performance. Always use dedicated test databases — either a separate environment database, a Docker container in CI, or an in-memory SQLite for fast unit tests.
2. Always use transactions for cleanup
Test fixtures should use either explicit DELETE cleanup or, better, transaction rollback to ensure no test data leaks between tests. Leaked test data is a primary cause of test order dependency and intermittent failures.
3. Verify both success paths and constraint violation paths
Schema tests should verify that constraints work correctly: attempting to insert a duplicate email should fail with a meaningful error, not silently succeed. Attempting to insert NULL into a NOT NULL column should raise a constraint violation. These negative paths are as important as the positive ones.
4. Use EXPLAIN for performance validation
Add EXPLAIN queries to your test suite for critical database operations. If a query execution plan shows a full table scan on a large table that should be using an index, that is a performance bug that belongs in your defect tracker.
Back to Blog