
Exploratory Testing — Session-Based QA Guide

Honnesh Muppala · May 5, 2026 · 13 min read

What is Exploratory Testing?

Exploratory testing is a simultaneous process of learning about the system under test, designing tests on the fly, and executing them — all in the same cognitive activity. It was defined by Cem Kaner as "a style of software testing that emphasises the personal freedom and responsibility of the individual tester to continually optimise the quality of his/her work by treating test-related learning, test design, test execution, and test result interpretation as mutually supportive activities that run in parallel throughout the project."

This is not ad-hoc testing. Ad-hoc has no structure, no accountability, and no documentation. Exploratory testing is structured — it uses charters, time-boxing, session notes, and debriefs. The difference is that test design happens during execution rather than weeks before, guided by what the tester learns in real time.

Exploratory testing fills the gaps that scripted test cases cannot reach. Scripts cover known risks. Exploratory testing surfaces the unknown unknowns — the bugs nobody thought to write a test case for because nobody imagined them possible.

When I tested Alexa device firmware at Amazon, scripted regression covered the happy paths and known edge cases. Exploratory sessions consistently surfaced the interactions nobody had thought to specify: what happens when you interrupt an in-progress Alexa response with another wake word? What does the device do when Wi-Fi drops during active audio playback? Those are the questions exploratory testing answers.

Scripted vs Exploratory vs Ad-hoc Testing

These three approaches are often conflated, but they sit at very different points on the structure spectrum. The comparison table near the end of this article sets them side by side.

The key distinction: Scripted tests tell you if the system does what you expected. Exploratory tests tell you if the system does what you didn't expect. Both are necessary. At Virtusa, our mobile test strategy used scripted regression for release sign-off and exploratory sessions in the days before release to catch the issues the scripts missed. That combination consistently delivered higher bug-find rates than either approach alone.

Testing Approach Layers — Architecture Diagram

The flow below shows how a session-based exploratory test moves from requirements to a risk map:

Requirements → Charter Design → Explore Session → [Observations | Bugs Found | Session Notes] → Debrief → Risk Map

Each box represents a deliberate phase. The charter focuses the session. The explore phase generates raw data. Observations, bugs, and notes are captured in parallel. The debrief synthesises findings. The risk map records what areas still need attention — it becomes the input for the next session's charters.

Session-Based Test Management (SBTM)

Session-Based Test Management (SBTM) was developed by Jonathan and James Bach to bring accountability and metrics to exploratory testing. It addresses the core criticism of exploratory testing — "you can't measure it, you can't manage it" — by structuring sessions around three pillars:

  1. The charter — a focused mission for the session. What area are you exploring? What resources do you have? What questions are you trying to answer?
  2. Time-boxing — sessions run for a fixed duration, typically 60–90 minutes. This prevents open-ended wandering and creates a predictable unit of work that managers can plan around.
  3. The debrief — a structured conversation between the tester and a test lead at the end of the session. It covers what was tested, what bugs were found, what risks remain, and what should be chartered next.

SBTM transforms exploratory testing from an invisible activity into a measurable one. You can track sessions per area, bugs per session, coverage against charters, and debrief-derived risk areas. This gives management the visibility they need while preserving the cognitive freedom that makes exploratory testing effective.
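The metrics named above can be derived from a plain session log. The following is a minimal sketch, assuming each session is recorded with its area, duration, and bug count; the `Session` fields and the sample data are hypothetical and not part of SBTM itself.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Session:
    area: str        # feature area named in the charter
    minutes: int     # time-boxed exploration time
    bugs_found: int  # bugs filed from this session


def summarise(sessions):
    """Return sessions, total bugs, and bugs per session for each area."""
    totals = defaultdict(lambda: {"sessions": 0, "bugs": 0})
    for s in sessions:
        totals[s.area]["sessions"] += 1
        totals[s.area]["bugs"] += s.bugs_found
    return {
        area: {**t, "bugs_per_session": t["bugs"] / t["sessions"]}
        for area, t in totals.items()
    }


# Hypothetical sample data, for illustration only.
log = [
    Session("checkout", 60, 3),
    Session("checkout", 70, 1),
    Session("login", 60, 0),
]
report = summarise(log)
```

An area with many sessions and few bugs may be well covered; an area with no sessions at all is a coverage gap worth a new charter.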

A typical SBTM session structure: 15 minutes setup, 60–70 minutes exploration, 15 minutes documentation, 20 minutes debrief. Total: 2 hours per session including debrief. A tester might run 2–3 focused sessions per day.
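The arithmetic behind those figures can be checked with a short sketch. The phase durations come from the paragraph above; the amount of testing time available per day is an assumption chosen for illustration.

```python
# Minutes per phase, taken from the session structure described above.
SETUP, DOCUMENTATION, DEBRIEF = 15, 15, 20
EXPLORATION_RANGE = (60, 70)


def session_length(exploration_minutes):
    """Total minutes for one session, including the debrief."""
    return SETUP + exploration_minutes + DOCUMENTATION + DEBRIEF


shortest = session_length(EXPLORATION_RANGE[0])
longest = session_length(EXPLORATION_RANGE[1])


def sessions_per_day(available_minutes, length=longest):
    """Whole sessions that fit into the available testing time."""
    return available_minutes // length


# Assumption: five to six hours of a working day are free for sessions.
low = sessions_per_day(5 * 60)
high = sessions_per_day(6 * 60)
```

Under that assumption the result lands in the same 2 to 3 sessions per day range, with meetings and bug follow-up taking the rest of the day.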

Writing Test Charters

The charter template: "Explore [area] with [resources/tools] to discover [information]."

Charters should be specific enough to focus attention but open enough to allow discovery. A charter that specifies every step is a test script. A charter that says "test the login page" is too vague.
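One way to keep charters consistent is to fill the template from three fields. This is a minimal sketch; the `Charter` class and its vagueness check are hypothetical helpers, not part of any SBTM tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Charter:
    area: str         # what to explore
    resources: str    # tools, data, or conditions to explore with
    information: str  # what the session is meant to discover

    def render(self):
        """Fill the 'Explore ... with ... to discover ...' template."""
        return (
            f"Explore {self.area} with {self.resources} "
            f"to discover {self.information}."
        )

    def is_too_vague(self):
        """A charter with no resources or no goal is just 'test the X'."""
        return not (self.resources.strip() and self.information.strip())


charter = Charter(
    area="the password reset flow",
    resources="valid and expired tokens on a mobile browser",
    information="state management issues or UI inconsistencies",
)
vague = Charter(area="the login page", resources="", information="")
```

The check only catches empty fields. Whether a charter is too detailed, to the point of becoming a script, still needs a human judgement.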

Five real charter examples across different feature types:

  1. Authentication: "Explore the password reset flow with valid and expired tokens using a mobile browser to discover any state management issues or UI inconsistencies."
  2. E-commerce checkout: "Explore the checkout flow with multiple items, mixed in-stock and out-of-stock, using promo codes to discover how the system handles partial fulfilment and pricing edge cases."
  3. Push notifications (mobile): "Explore notification delivery using background app state, low battery mode, and Do Not Disturb settings on Android 13 to discover any notification suppression or duplicate delivery issues."
  4. API contract: "Explore the /orders endpoint with malformed JSON payloads, missing required fields, and oversized request bodies using Postman to discover how the API handles bad input — status codes, error messages, and server stability."
  5. Alexa skill: "Explore Alexa multi-turn conversation handling with rapid follow-up intents during network latency to discover timeout behaviour, partial response delivery, and session context retention."

From Amazon: When testing Alexa-enabled devices, I used charter-based exploration specifically for network interruption scenarios. The device's behaviour during poor Wi-Fi (partial audio responses, queued commands, wake-word sensitivity changes) was too complex and context-dependent to fully cover with scripted tests. Charters like "explore device behaviour during Wi-Fi reconnection events across different Alexa command types" surfaced three separate bugs in a single 75-minute session that our 200+ automated tests had never caught.
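For the API contract charter, it helps to prepare the bad inputs before the session starts. The sketch below builds malformed, incomplete, and oversized request bodies that can be pasted into Postman or any HTTP client. The order fields are hypothetical, since the real /orders schema is not given here, and nothing is sent over the network.

```python
import json

# Hypothetical order shape; the real /orders schema is an assumption.
VALID_ORDER = {"customer_id": 42, "items": [{"sku": "ABC-1", "qty": 2}]}
REQUIRED_FIELDS = ["customer_id", "items"]


def bad_payloads(valid, required, oversize_bytes=1_000_000):
    """Return named request bodies covering the charter's three input classes."""
    good = json.dumps(valid)
    cases = {
        # Malformed JSON: drop the closing brace, or add a trailing comma.
        "truncated_json": good[:-1],
        "trailing_comma": good[:-1] + ",}",
    }
    # Missing required fields: remove one field at a time.
    for name in required:
        body = {k: v for k, v in valid.items() if k != name}
        cases[f"missing_{name}"] = json.dumps(body)
    # Oversized body: valid JSON padded with a large string field.
    cases["oversized_body"] = json.dumps(dict(valid, note="x" * oversize_bytes))
    return cases


cases = bad_payloads(VALID_ORDER, REQUIRED_FIELDS, oversize_bytes=1000)
```

During the session, record the status code and error message returned for each case, and watch whether the server stays responsive afterwards.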

Test Heuristics

Heuristics are mental models that guide exploratory testers toward fertile areas. They do not tell you what to test — they remind you what to think about.

SFDPOT (San Francisco Depot)

Developed by James Bach, SFDPOT covers six testing dimensions: Structure, Function, Data, Platform, Operations, and Time.

HICCUPS

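HICCUPS is used here as a mnemonic for common failure areas: Hardware, Interruptions, Configurations, Capacity, Users, Platforms, Software interactions. It is particularly useful for embedded and IoT device testing, where environmental factors dominate failure modes. It should not be confused with Bach and Bolton's HICCUPPS, a separate mnemonic for consistency oracles (history, image, comparable products, claims, user expectations, product, purpose, statutes).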

Goldilocks

Test with values that are too small, too large, and just right. For text fields: 0 characters, exactly 1 character, exactly the maximum allowed, and max+1 characters. Injection strings such as SQL fragments are worth trying in the same fields, although they probe content rather than size. For numbers: -1, 0, 1, max integer, max+1, and floats where integers are expected.
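The Goldilocks values are easy to generate rather than type by hand. A minimal sketch follows; the function names and the default 32-bit maximum are assumptions, so substitute the limits of the field under test.

```python
def text_lengths(max_len):
    """Lengths to try for a text field: empty, one, the limit, and one over."""
    return sorted({0, 1, max_len, max_len + 1})


def text_samples(max_len, fill="a"):
    """Strings of each boundary length, built from a single fill character."""
    return [fill * n for n in text_lengths(max_len)]


def integer_probes(max_int=2**31 - 1):
    """Numbers to try where an integer is expected, ending with a float."""
    return [-1, 0, 1, max_int, max_int + 1, 0.5]


lengths = text_lengths(255)
samples = text_samples(3)
probes = integer_probes(10)
```

The set in `text_lengths` removes duplicates when the limit itself is 1, so a one-character field still yields three distinct cases.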

Running an Exploratory Session — Step by Step

A well-run session has five phases. Four run in sequence and one, observation, runs throughout exploration. The allocations below suit a session of roughly 90 minutes:

  1. Setup (10 min): Review the charter. Open the environment. Prepare your screen recorder (start it now, not after you find something interesting). Open a notes document — Markdown works well. Set a timer for 60 minutes of exploration.
  2. Explore (60 min): Follow the charter. Begin with the most critical area first — if you run out of time, you will have covered the highest risk. As you explore, think aloud (this is especially valuable in paired sessions). Follow interesting threads but note when you are deviating from the charter so you can charter that thread separately.
  3. Observe (ongoing during exploration): Notice unexpected behaviour, slow responses, UI glitches, inconsistent error messages, data that does not persist, interactions between features. Note everything — even things you are not sure are bugs.
  4. Document (10 min): Immediately after exploration ends, write up your session report while it is fresh. Record: what you explored, what you found, what you did not get to, what questions remain.
  5. Debrief (10–20 min): Meet with your test lead or a peer. Walk through your notes. Determine which observations are bugs, which need more investigation, and which should become new charters. The debrief is where individual findings become team knowledge.
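The time allocations above can be turned into a clock schedule for a given start time. This is a minimal sketch: it uses the upper bound of the debrief range, and it gives Observe no slot of its own because that phase runs during exploration.

```python
from datetime import datetime, timedelta

# (phase, minutes) from the steps above; the debrief uses its 20-minute upper bound.
PHASES = [("Setup", 10), ("Explore", 60), ("Document", 10), ("Debrief", 20)]


def schedule(start):
    """Return (phase, start, end) tuples for one session beginning at `start`."""
    plan = []
    current = start
    for name, minutes in PHASES:
        end = current + timedelta(minutes=minutes)
        plan.append((name, current, end))
        current = end
    return plan


# Example start time chosen for illustration.
plan = schedule(datetime(2026, 1, 5, 9, 0))
for name, begin, end in plan:
    print(f"{begin:%H:%M}-{end:%H:%M}  {name}")
```

With the longer debrief the session runs 100 minutes; a 10-minute debrief brings it back to 90.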

Bug Bash Facilitation

A bug bash is a structured group exploratory testing event where the entire team — including developers, product managers, designers, and sometimes customer support — tests the product simultaneously for a defined period. It is one of the most effective ways to surface a wide range of bugs in a short time.

Setting up a bug bash

Inviting non-QA participants

Non-QA participants find bugs that QA misses because they use the product differently. Developers know where the implementation is fragile, but on code they just wrote their assumptions mirror the implementation, so they add the most on features they did not build. Product managers test against their own mental model of the feature, which surfaces gaps in how requirements were interpreted. Customer support tests the scenarios they know break (from customer tickets), which surfaces known pain points.

Brief non-QA participants on how to file bugs: steps to reproduce, expected vs actual, screenshot. Keep the filing process lightweight — if it is too formal, participants stop reporting.

Charter cards for non-QA participants

Hand each participant a charter card: a physical or digital card with a specific area to explore. This prevents 10 people all testing the same login page. Sample charters for a checkout feature bug bash: payment flow with international cards, order editing after placement, email confirmation accuracy, mobile cart behaviour on slow connection, accessibility keyboard navigation through checkout.
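Dealing the cards can be as simple as a round-robin over the charter list. The sketch below uses the five sample charters from this section; the participant names are placeholders.

```python
from itertools import cycle

CHARTERS = [
    "payment flow with international cards",
    "order editing after placement",
    "email confirmation accuracy",
    "mobile cart behaviour on slow connection",
    "accessibility keyboard navigation through checkout",
]


def deal_cards(participants, charters):
    """Assign charters in rotation so coverage spreads across all areas."""
    return dict(zip(participants, cycle(charters)))


# Placeholder names; any list of unique participant identifiers works.
people = [f"participant-{n}" for n in range(1, 8)]
cards = deal_cards(people, CHARTERS)
```

With more participants than charters, the rotation wraps around, so no area gets more than one extra tester compared with any other.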

Debrief and de-duplication

End the bug bash with a 30-minute group debrief. Each participant shares their top 3 findings. QA leads de-duplicate: compare reports, merge duplicates, assign severity. Within 24 hours, every filed bug should be triaged and either accepted or rejected with a comment. Fast triage maintains participant morale for future bug bashes.
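A first pass at de-duplication can be automated by grouping reports with near-identical titles, leaving the final merge decision to the QA lead. This is a rough sketch using the standard library's `difflib`; the 0.8 threshold and the sample titles are assumptions, and real duplicates are often worded too differently for title matching alone to catch.

```python
from difflib import SequenceMatcher


def similar(a, b, threshold=0.8):
    """True when two titles are close enough to be reviewed as duplicates."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold


def group_duplicates(titles, threshold=0.8):
    """Group each title with the first earlier title it resembles."""
    groups = []
    for title in titles:
        for group in groups:
            if similar(title, group[0], threshold):
                group.append(title)
                break
        else:
            groups.append([title])
    return groups


# Hypothetical bug titles, for illustration only.
titles = [
    "Promo code applied twice at checkout",
    "promo code applied twice at checkout",
    "Keyboard focus skips the pay button",
]
groups = group_duplicates(titles)
```

Groups with more than one entry are candidates for merging; single-entry groups go straight to severity assignment.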

From Viasat: During network equipment testing, we ran quarterly bug bashes that included network operations engineers alongside QA. The operations team's knowledge of real-world network conditions — their intuitions about what traffic patterns stress the equipment — surfaced failure modes in simulated throughput degradation that purely functional QA had never found. Cross-functional bug bashes consistently deliver higher value than QA-only sessions.

Tools for Exploratory Testing

Exploratory testing is deliberately low-tool — the primary tool is the tester's mind. But a few utilities make sessions significantly more effective:

Documenting Findings — The Session Report

Every exploratory session should produce a session report. This is the accountability mechanism that distinguishes exploratory testing from ad-hoc. The report does not need to be long — a well-structured half-page is sufficient.

Session report template:
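As one possible shape for such a report, the sketch below renders the four items listed under the Document step: what was explored, what was found, what was not covered, and what questions remain. The field names and layout are assumptions for illustration, not a fixed standard.

```python
from dataclasses import dataclass, field


@dataclass
class SessionReport:
    charter: str
    explored: list = field(default_factory=list)
    found: list = field(default_factory=list)
    not_covered: list = field(default_factory=list)
    open_questions: list = field(default_factory=list)

    def render(self):
        """Return the report as plain text, one section per heading."""
        sections = [
            ("Explored", self.explored),
            ("Found", self.found),
            ("Not covered", self.not_covered),
            ("Open questions", self.open_questions),
        ]
        lines = [f"Charter: {self.charter}"]
        for title, items in sections:
            lines.append(f"{title}:")
            # Show an explicit marker so empty sections are not mistaken for omissions.
            lines.extend(f"  - {item}" for item in (items or ["(none)"]))
        return "\n".join(lines)


report = SessionReport(
    charter="Explore the password reset flow with expired tokens",
    found=["Expired token still accepted after page refresh"],
)
text = report.render()
```

Rendering empty sections as "(none)" keeps the "not covered" and "open questions" headings visible, which is where the next charters usually come from.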

Tester credibility is built on documentation quality. A tester who files 10 well-documented bugs is more useful than one who files 30 poorly described ones. Developers can act on clear reports; vague reports get rejected or ignored.

Paired Exploratory Testing

Paired exploratory testing — two testers working together on the same session — delivers disproportionate value relative to the resource cost.

The driver/observer model works as follows: the driver operates the keyboard and mouse, making decisions about what to test next. The observer watches, takes notes, and verbalises observations — "did you notice the loading spinner disappeared before the data appeared?" The roles swap every 20–30 minutes.

Benefits of pairing:

Pairing is especially valuable when testing new feature areas where neither tester has deep context, or when testing high-stakes areas before a major release.

When to Use Exploratory Testing

Exploratory testing delivers the most value in specific situations:

Metrics for Exploratory Testing

SBTM provides the framework for tracking exploratory testing metrics at the session level:

Scripted vs Exploratory vs Ad-hoc Testing

Dimension | Scripted Testing | Exploratory Testing | Ad-hoc Testing
Structure | High: pre-written steps and expected results | Medium: charter, time-box, debrief | None: click and hope
Documentation | Full: test cases, execution logs, pass/fail records | Session notes, session reports, debrief notes | None: findings may not be recorded
Repeatability | High: same tester or different tester, same result | Medium: charter is repeatable; approach varies | Low: hard to reproduce exactly
Bug-finding rate | Low in mature areas; high for regression | High: especially for edge cases and unknown unknowns | Variable: depends entirely on tester luck and intuition
Skill required | Low-Medium: follow the script | High: heuristics, system knowledge, adaptability | Low: no formal method required
Best for | Regression, compliance, audit-ready testing | New features, pre-release, complex interactions | Quick sanity checks, personal exploration

Best Practices for Exploratory Testing


From Experience — Amazon: At Amazon's Device OS team, the bar for defect reports was exceptionally high. Every bug required the exact build number, reproduction rate (e.g. "3 out of 5 attempts"), full environment configuration, and a video wherever possible. It felt like overhead at first — but it meant any developer could pick up a ticket and reproduce the issue immediately without back-and-forth. During Alexa and Echo release cycles, this discipline directly reduced the triage loop from days to hours. A well-written bug report is not documentation overhead — it is the fastest path to a fix.
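The required fields described above lend themselves to a simple completeness check before a report is filed. This is a minimal sketch; the `BugReport` class and its field names are hypothetical and do not reflect any internal Amazon tooling.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BugReport:
    title: str
    build_number: str
    attempts: int        # how many times reproduction was tried
    reproductions: int   # how many of those attempts showed the bug
    environment: dict    # device, OS, network, and similar details
    video_url: Optional[str] = None  # optional, attached wherever possible

    def reproduction_rate(self):
        """Format the rate the way the text describes it."""
        return f"{self.reproductions} out of {self.attempts} attempts"

    def missing_fields(self):
        """List required fields that are empty, so the report can be fixed first."""
        missing = []
        if not self.build_number:
            missing.append("build_number")
        if self.attempts <= 0:
            missing.append("attempts")
        if not self.environment:
            missing.append("environment")
        return missing


# Hypothetical report with two required fields left empty.
draft = BugReport("Wake word ignored after Wi-Fi drop", "", 5, 3, {})
```

Running the check at filing time moves the back-and-forth from the developer's triage queue to the tester's desk, where the details are still fresh.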