Architecture¶
django-query-doctor is built around a four-stage pipeline: Intercept, Fingerprint, Analyze, Report. Every query that passes through the pipeline is captured, normalized, analyzed for optimization issues, and reported with actionable prescriptions.
High-Level Overview¶
flowchart TD
subgraph Entry["Entry Points"]
MW["Middleware"]
CM["Context Manager"]
DEC["Decorator"]
MC["Management Commands"]
PP["Pytest Plugin"]
end
Entry --> INT["Interceptor\n(execute_wrapper)"]
INT --> FP["Fingerprinter\n(normalize ā SHA-256)"]
FP --> TURBO_CHECK{"QueryTurbo\nenabled?"}
TURBO_CHECK -- No --> AR
TURBO_CHECK -- Yes --> CACHE_LOOKUP["Cache Lookup\n(fingerprint key)"]
CACHE_LOOKUP --> MISS{"Cache\nResult?"}
MISS -- Miss --> COMPILE["Full as_sql()\ncompilation"]
COMPILE --> STORE["Store in LRU Cache\n(UNTRUSTED)"]
STORE --> AR
MISS -- "Hit (UNTRUSTED)" --> VALIDATE["Validate cached SQL\nvs fresh as_sql()"]
VALIDATE -- Match --> PROMOTE["Increment validated_count\nā TRUSTED after threshold"]
VALIDATE -- Mismatch --> POISON["POISONED\n(permanent blacklist)"]
POISON --> COMPILE
PROMOTE --> AR
MISS -- "Hit (TRUSTED)" --> EXTRACT["Extract params\nfrom Query tree\n(skip as_sql)"]
EXTRACT --> PREP{"Prepared\nstatement?"}
PREP -- Yes --> PS["Execute with\nprepare=True"]
PREP -- No --> EXEC["Normal execute"]
PS --> AR
EXEC --> AR
MISS -- "Hit (POISONED)" --> COMPILE
AR["Analyzer Registry"] --> RR["Reporter Registry"]
subgraph Output["Output Formats"]
CON["Console (Rich)"]
JSON["JSON"]
HTML["HTML Dashboard"]
LOG["Python Logging"]
OTEL["OpenTelemetry"]
end
RR --> Output
Each entry point activates the same core pipeline. The only difference is how and when the pipeline is triggered:
| Entry Point | Scope | Use Case |
|---|---|---|
QueryDoctorMiddleware |
Per HTTP request | Development and staging servers |
diagnose_queries() context manager |
Arbitrary code block | Testing, scripts, ad-hoc analysis |
@diagnose decorator |
Single function/method | View-level analysis |
diagnose_queries management command |
Entire URL set or view list | Full project scan, CI/CD |
| Pytest plugin | Per test or test suite | Automated regression detection |
Stage 1: Intercept¶
The interceptor captures every SQL query executed within its scope using Django's connection.execute_wrapper() API.
Why execute_wrapper?¶
Django provides connection.execute_wrapper() as a public, stable API for wrapping database cursor execution. Unlike monkey-patching or reading connection.queries, this approach:
- Works with
DEBUG=False(production-safe) - Is officially supported and documented by Django
- Composes cleanly with other wrappers (e.g., Sentry, OpenTelemetry)
- Does not modify global state
See Background & Design for a detailed comparison of approaches.
What Gets Captured¶
For every SQL query, the interceptor records:
| Field | Type | Description |
|---|---|---|
sql |
str |
The raw SQL string with parameter placeholders |
params |
tuple |
The bound parameters for this query |
start_time |
float |
time.perf_counter() timestamp before execution |
end_time |
float |
time.perf_counter() timestamp after execution |
duration_ms |
float |
(end_time - start_time) * 1000 |
stack_trace |
list[FrameInfo] |
Filtered call stack leading to user code |
connection_alias |
str |
The database alias (e.g., "default") |
Stack Trace Filtering¶
Raw traceback.extract_stack() output includes many frames from Django internals, database drivers, and django-query-doctor itself. The stack_tracer module filters these down to only the frames that originate in user code:
# stack_tracer.py filters out frames from:
IGNORED_MODULES = [
"django/",
"query_doctor/",
"rest_framework/",
"site-packages/",
"importlib/",
"<frozen ",
]
The result is a clean stack trace pointing directly to the line of user code that triggered the query.
Interceptor Flow¶
sequenceDiagram
participant App as Django App
participant MW as Middleware
participant INT as Interceptor
participant DB as Database
MW->>INT: Install execute_wrapper
App->>INT: ORM query triggers SQL
INT->>INT: Record start_time
INT->>INT: Capture stack trace
INT->>DB: Forward SQL to database
DB-->>INT: Return results
INT->>INT: Record end_time
INT->>INT: Store CapturedQuery
INT-->>App: Return results unchanged
Note over MW,INT: At end of request/scope
MW->>INT: Collect all CapturedQueries
INT-->>MW: Return query list
Note: The interceptor never modifies queries, parameters, or results. It is purely observational. If the interceptor encounters an internal error, it logs a warning and allows the query to proceed normally.
Stage 2: Fingerprint¶
The fingerprinter normalizes each captured SQL query into a canonical form and generates a SHA-256 hash. This fingerprint is the foundation for N+1 and duplicate detection.
Normalization Steps¶
- Parameter replacement: All literal values (numbers, strings, lists) are replaced with
?placeholders. - Whitespace normalization: Multiple spaces, tabs, and newlines are collapsed to single spaces.
- Case normalization: SQL keywords are uppercased; identifiers are preserved.
- Comment removal: SQL comments (
-- ...and/* ... */) are stripped. - IN-list normalization:
IN (?, ?, ?, ?)is normalized toIN (?)regardless of the number of parameters.
Example¶
Input: SELECT "books_book"."id", "books_book"."title"
FROM "books_book"
WHERE "books_book"."author_id" = 42
ORDER BY "books_book"."title" ASC
Output: SELECT "books_book"."id", "books_book"."title"
FROM "books_book"
WHERE "books_book"."author_id" = ?
ORDER BY "books_book"."title" ASC
Hash: a1b2c3d4e5f6... (SHA-256 of normalized SQL)
Queries that differ only in their parameter values produce the same fingerprint. This is how the analyzer detects N+1 patterns: 50 queries for different author_id values all hash to the same fingerprint.
Stage 3: Analyze¶
The analyzer stage passes the list of captured queries (with fingerprints) through a registry of analyzer classes. Each analyzer detects one specific category of optimization issue.
Analyzer Registry¶
flowchart TD
QList["Captured Queries\n(with fingerprints)"]
QList --> NP["N+1 Analyzer"]
QList --> DUP["Duplicate Analyzer"]
QList --> MI["Missing Index Analyzer"]
QList --> FS["Fat SELECT Analyzer"]
QList --> CX["Complexity Analyzer"]
QList --> DRF["DRF Serializer Analyzer"]
QList --> QE["QuerySet Eval Analyzer"]
NP --> P1["Prescriptions"]
DUP --> P2["Prescriptions"]
MI --> P3["Prescriptions"]
FS --> P4["Prescriptions"]
CX --> P5["Prescriptions"]
DRF --> P6["Prescriptions"]
QE --> P7["Prescriptions"]
P1 --> AGG["Aggregated Results"]
P2 --> AGG
P3 --> AGG
P4 --> AGG
P5 --> AGG
P6 --> AGG
P7 --> AGG
Each analyzer implements the BaseAnalyzer abstract class:
class BaseAnalyzer(ABC):
"""Base class for all query analyzers."""
@abstractmethod
def analyze(self, queries: list[CapturedQuery]) -> list[Prescription]:
"""Analyze captured queries and return prescriptions."""
...
The Prescription Dataclass¶
Every issue detected by an analyzer is represented as a Prescription:
| Field | Type | Description |
|---|---|---|
severity |
Severity |
CRITICAL, WARNING, or INFO |
category |
str |
Issue category (e.g., "n_plus_one", "duplicate") |
description |
str |
Human-readable description of the issue |
file_path |
str \| None |
Absolute path to the source file |
line_number |
int \| None |
Line number in the source file |
current_code |
str \| None |
The code that causes the issue |
suggested_fix |
str \| None |
The code that resolves the issue |
query_count |
int |
Number of queries involved |
fingerprint |
str \| None |
The query fingerprint (for N+1 / duplicate) |
time_saved_ms |
float \| None |
Estimated time savings if fixed |
Analyzer Details¶
N+1 Analyzer (nplusone.py): Groups queries by fingerprint. If a fingerprint appears more than N_PLUS_ONE_THRESHOLD times and the SQL pattern matches a ForeignKey or ManyToMany access pattern (single-row lookup by PK/FK), it is flagged as N+1. The stack trace is used to identify the source location, and the fix suggests the appropriate select_related() or prefetch_related() call.
Duplicate Analyzer (duplicate.py): Detects two types of duplicates:
- Exact duplicates: Same SQL and same parameters, executed more than once.
- Near duplicates: Same fingerprint, similar parameters (e.g., fetching the same set of rows with slightly different orderings).
Missing Index Analyzer (missing_index.py): Examines WHERE and JOIN clauses for column references, then checks Django model meta information to determine if those columns are indexed. Suggests Meta.indexes with models.Index() for missing indexes.
Fat SELECT Analyzer (fat_select.py): Detects SELECT * patterns (or selecting all model fields) when the code only accesses a subset of fields. Suggests .only() or .values() calls.
Complexity Analyzer (complexity.py): Scores query complexity based on the number of JOINs, subqueries, CASE expressions, and aggregate functions. Flags queries above a configurable threshold.
DRF Serializer Analyzer (drf_serializer.py): Specialized N+1 detection that understands Django REST Framework serializer field access patterns. Identifies when serializer fields trigger lazy-loaded relationship access.
QuerySet Eval Analyzer (queryset_eval.py): Detects unnecessary queryset evaluations such as calling .count() on an already-evaluated queryset, or iterating over a queryset multiple times.
Stage 4: Report¶
The reporter stage takes the aggregated prescriptions and formats them for output. Multiple reporters can be active simultaneously.
Reporter Formats¶
| Reporter | Module | Output |
|---|---|---|
| Console | reporters/console.py |
Rich-formatted terminal output (falls back to plain text) |
| JSON | reporters/json_reporter.py |
Structured JSON for CI/CD and tooling |
| HTML | reporters/html_reporter.py |
Interactive HTML dashboard |
| Log | via Python logging |
Standard log output at configurable level |
| OpenTelemetry | via spans/events | Query metrics as OTel attributes |
The console reporter uses Rich for colored, formatted output when available:
============================================================
QUERY DOCTOR REPORT - GET /api/books/
Total queries: 151 | Time: 340ms
============================================================
CRITICAL N+1 Query Detected
47 queries match fingerprint: SELECT * FROM books_author WHERE id = ?
...
If Rich is not installed, it falls back to plain text with the same information.
Module Structure¶
src/query_doctor/
|-- __init__.py # Package init, version
|-- middleware.py # Django middleware entry point
|-- interceptor.py # execute_wrapper, CapturedQuery storage
|-- fingerprint.py # SQL normalization and hashing
|-- stack_tracer.py # Stack trace capture and filtering
|-- conf.py # Settings with defaults
|-- decorators.py # @diagnose, @query_budget
|-- context_managers.py # diagnose_queries() context manager
|-- exceptions.py # QueryDoctorError hierarchy
|-- types.py # Shared type definitions
|-- fixer.py # Auto-fix code generation
|-- diff_filter.py # Git diff-aware filtering for CI
|-- ignore.py # Query ignore rules
|-- plugin_api.py # Custom analyzer plugin registration
|-- project_diagnoser.py # Full-project scanning logic
|-- pytest_plugin.py # Pytest plugin for test-time analysis
|-- celery_integration.py # Celery task wrapper
|-- url_discovery.py # URL pattern discovery for management commands
|-- admin_panel.py # Django admin integration
|-- urls.py # URL configuration for HTML dashboard
|-- py.typed # PEP 561 marker
|-- analyzers/
| |-- __init__.py
| |-- base.py # BaseAnalyzer ABC, Prescription dataclass
| |-- nplusone.py # N+1 detection
| |-- duplicate.py # Duplicate query detection
| |-- missing_index.py # Missing index detection
| |-- fat_select.py # Fat SELECT detection
| |-- complexity.py # Query complexity scoring
| |-- drf_serializer.py # DRF serializer N+1 detection
| |-- queryset_eval.py # Unnecessary queryset evaluation
|-- reporters/
| |-- console.py # Rich/plain text console output
| |-- json_reporter.py # JSON output
| |-- html_reporter.py # HTML dashboard reporter
|-- management/
| |-- commands/
| |-- diagnose_queries.py # Full project scan command
|-- templates/
| |-- query_doctor/
| |-- dashboard.html # HTML dashboard template
Threading and Concurrency¶
Per-Instance ContextVar Storage¶
django-query-doctor stores all per-request state in contextvars.ContextVar instances. Each QueryInterceptor instance gets its own unique ContextVar, ensuring isolation across both threads and concurrent async coroutines.
# interceptor.py
import contextvars
_interceptor_counter = 0
class QueryInterceptor:
def __init__(self, capture_stack: bool = True) -> None:
global _interceptor_counter
_interceptor_counter += 1
self._queries_var: contextvars.ContextVar[list[CapturedQuery] | None] = (
contextvars.ContextVar(
f"query_doctor_queries_{_interceptor_counter}",
default=None,
)
)
self._queries_var.set([])
This ensures that concurrent requests in both multi-threaded WSGI servers (e.g., gunicorn with sync workers) and ASGI servers (e.g., uvicorn, daphne) do not interfere with each other.
Async Django Support¶
Both the query interceptor and QueryTurbo context managers use contextvars.ContextVar, making the full capture pipeline safe for ASGI deployments with concurrent coroutines on the same thread. Django's execute_wrapper() is per-connection, and the interceptor wraps the synchronous database calls within whatever thread Django uses.
Warning: If you use raw
asynciodatabase drivers that bypass Django's ORM (e.g., directasyncpgcalls), django-query-doctor will not capture those queries. It only intercepts queries that go through Django's database backend.
No Global State¶
The following guarantees hold:
- No module-level mutable variables (lists, dicts, sets) are used for query storage. Per-request state uses
contextvars.ContextVar. - Configuration is read once per request via
conf.get_config(), which reads from Django settings (immutable at runtime). - Analyzers are stateless: they receive a query list as input and return prescriptions as output. No state is retained between invocations.
- Reporters are stateless: they receive prescriptions and produce output.
Extension Points¶
django-query-doctor is designed to be extended without modifying its source code.
| Extension Point | Mechanism | Example |
|---|---|---|
| Custom analyzers | Subclass BaseAnalyzer, register via QUERY_DOCTOR["ANALYZERS"] or plugin_api.register_analyzer() |
Detect queries on deprecated tables |
| Custom reporters | Implement reporter interface, register via QUERY_DOCTOR["REPORTERS"] |
Send prescriptions to Datadog |
| Ignore rules | Configure QUERY_DOCTOR["IGNORE_PATTERNS"] or use @query_doctor_ignore |
Skip health check queries |
| Query budgets | Use @query_budget(max_queries=N) decorator or pytest markers |
Enforce per-view limits |
| Diff filtering | Set QUERY_DOCTOR["DIFF_BASE"] to a git ref |
Only report issues in changed code |
Custom Analyzer Example¶
from query_doctor.analyzers.base import BaseAnalyzer, Prescription, Severity
class DeprecatedTableAnalyzer(BaseAnalyzer):
"""Detect queries against deprecated database tables."""
DEPRECATED_TABLES = {"legacy_users", "old_orders", "temp_cache"}
def analyze(self, queries):
prescriptions = []
for query in queries:
for table in self.DEPRECATED_TABLES:
if table in query.sql.lower():
prescriptions.append(
Prescription(
severity=Severity.WARNING,
category="deprecated_table",
description=f"Query accesses deprecated table: {table}",
file_path=query.stack_trace[0].filename if query.stack_trace else None,
line_number=query.stack_trace[0].lineno if query.stack_trace else None,
suggested_fix=f"Migrate away from {table} to the new schema.",
query_count=1,
)
)
return prescriptions
For more on the design rationale behind these choices, see Background & Design. For performance characteristics, see Performance.