Performance¶
django-query-doctor is designed to have minimal impact on application performance. This document details the overhead model, memory usage, and strategies for controlling cost in different environments.
Overhead Model¶
The overhead of django-query-doctor comes from three sources:
- Query interception: Recording each SQL query as it executes
- Analysis: Processing the captured queries at the end of the request
- Reporting: Formatting and outputting the results
Query Interception Cost¶
For each SQL query, the interceptor performs:
| Operation | Typical Cost | Notes |
|---|---|---|
| `time.perf_counter()` (2 calls) | ~0.5 microseconds | One call before and one after query execution |
| `traceback.extract_stack()` | ~10-40 microseconds | Depends on call stack depth |
| Stack trace filtering | ~2-5 microseconds | Iterates frames, checks against ignore list |
| `CapturedQuery` construction | ~1-2 microseconds | Dataclass instantiation |
| List append | ~0.1 microseconds | Amortized O(1) |
| Total per query | ~15-50 microseconds | Excluding the SQL query itself |
For context, a typical Django ORM query to PostgreSQL takes 0.5-5 milliseconds. The interception overhead of 15-50 microseconds represents 0.3-10% of query execution time, depending on query complexity.
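To make the interception steps in the table concrete, here is a minimal sketch in the shape of Django's `connection.execute_wrapper()` hook. It illustrates the same timing, stack-capture, filtering, and append steps; it is not django-query-doctor's actual implementation, and it records a plain dict rather than a `CapturedQuery` dataclass:

```python
import time
import traceback

captured = []  # per-request list of query records

def intercept(execute, sql, params, many, context):
    start = time.perf_counter()                   # timing call 1
    result = execute(sql, params, many, context)  # run the actual query
    duration = time.perf_counter() - start        # timing call 2
    # The expensive part (~10-40 us): walking and materializing the call stack
    stack = traceback.extract_stack()
    # Filtering (~2-5 us): drop frames from framework internals
    frames = [f for f in stack if "django" not in f.filename]
    captured.append({"sql": sql, "duration": duration, "frames": frames})
    return result
```

In real usage the wrapper would be installed per-request with `with connection.execute_wrapper(intercept): ...`, and the `captured` list would live in a context variable rather than at module scope.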
Analysis Cost Per Request¶
Analysis runs once at the end of the request, not per query.
| Operation | Complexity | Typical Cost |
|---|---|---|
| Fingerprinting (all queries) | O(n) where n = query count | ~0.5-2ms for 100 queries |
| N+1 detection | O(n) grouping + O(g) pattern matching | ~0.2-1ms |
| Duplicate detection | O(n) grouping | ~0.1-0.5ms |
| Missing index analysis | O(n) SQL inspection | ~0.2-1ms |
| Fat SELECT analysis | O(n) SQL inspection | ~0.1-0.5ms |
| Complexity scoring | O(n) SQL inspection | ~0.1-0.3ms |
| DRF serializer analysis | O(n) stack trace inspection | ~0.1-0.5ms |
| QuerySet eval analysis | O(n) pattern matching | ~0.1-0.3ms |
| Total analysis | O(n) | ~1-5ms for 100 queries |
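Fingerprinting, the first analysis step above, can be sketched as follows: normalize away literal values so structurally identical queries hash to the same fingerprint, then group in O(n) with one dict insertion per query. The normalization regexes here are illustrative assumptions, not the analyzer's actual rules:

```python
import hashlib
import re

def fingerprint(sql: str) -> str:
    """Hash the structural shape of a query, ignoring literal values."""
    normalized = re.sub(r"'[^']*'", "?", sql)         # string literals -> ?
    normalized = re.sub(r"\b\d+\b", "?", normalized)  # numeric literals -> ?
    normalized = re.sub(r"\s+", " ", normalized).strip().lower()
    return hashlib.sha256(normalized.encode()).hexdigest()

def group_by_fingerprint(queries):
    # O(n) grouping: queries that differ only in parameter values
    # (the classic N+1 signature) end up in the same bucket.
    groups = {}
    for sql in queries:
        groups.setdefault(fingerprint(sql), []).append(sql)
    return groups
```

A bucket containing many queries with the same fingerprint is the raw signal that the N+1 and duplicate detectors then pattern-match against.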
Total Overhead by Request Type¶
| Request Type | Typical Query Count | Interception Overhead | Analysis Overhead | Total Overhead |
|---|---|---|---|---|
| Simple API endpoint | 3-10 | 0.05-0.5ms | 0.1-0.5ms | 0.15-1ms |
| List view (paginated) | 10-30 | 0.15-1.5ms | 0.3-1.5ms | 0.5-3ms |
| Dashboard page | 30-80 | 0.5-4ms | 1-3ms | 1.5-7ms |
| Complex report | 80-200 | 1.2-10ms | 2-5ms | 3-15ms |
| Unoptimized N+1 page | 200-500+ | 3-25ms | 3-8ms | 6-33ms |
Info: The overhead scales linearly with the number of queries. For requests that execute fewer than 50 queries, the total overhead is typically under 3ms --- well within noise for most applications.
Memory Usage¶
Per-Query Memory¶
Each `CapturedQuery` object consumes approximately:
| Component | Size |
|---|---|
| SQL string reference | ~50 bytes (pointer + small string, or shared with Django) |
| Parameters tuple | ~100-200 bytes (varies with parameter count) |
| Timing floats (2x) | ~16 bytes |
| Stack trace (filtered, ~3-5 frames) | ~300-500 bytes |
| Fingerprint hash string | ~64 bytes (SHA-256 hex) |
| Dataclass overhead | ~50 bytes |
| Total per query | ~600-900 bytes |
A rough estimate of ~800 bytes per query is useful for capacity planning.
Per-Request Memory¶
| Request Type | Query Count | Estimated Memory |
|---|---|---|
| Simple API endpoint | 5 | ~4 KB |
| List view | 20 | ~16 KB |
| Dashboard page | 50 | ~40 KB |
| Complex report | 150 | ~120 KB |
| Unoptimized N+1 page | 500 | ~400 KB |
All captured query data is released at the end of the request when the context variable is cleared. There is no accumulation across requests.
Tip: If memory is a concern for requests with very high query counts (500+), consider fixing the underlying query issues first --- that is exactly what django-query-doctor is for. A request with 500 queries has far bigger problems than 400 KB of diagnostic memory.
Zero-Overhead Mode¶
When django-query-doctor is disabled, the overhead is effectively zero. The middleware checks the ENABLED setting at the start of each request and short-circuits immediately:
```python
# middleware.py (simplified)
class QueryDoctorMiddleware:
    def __call__(self, request):
        config = get_config()
        if not config["ENABLED"]:
            return self.get_response(request)  # Zero-overhead path
        # ... interception and analysis ...
```
To disable, set `"ENABLED": False` in the `QUERY_DOCTOR` settings dict. The disabled path costs a single dictionary lookup and boolean check, roughly 0.1 microseconds per request.
Environment-Based Toggle¶
A common pattern is to enable django-query-doctor only in specific environments:
```python
# settings.py
import os

QUERY_DOCTOR = {
    "ENABLED": os.environ.get("QUERY_DOCTOR_ENABLED", "false").lower() == "true",
}
```
This allows you to deploy the same code everywhere but only activate diagnostics when needed:
```bash
# Enable for a staging deploy
QUERY_DOCTOR_ENABLED=true gunicorn myapp.wsgi

# Disable for production (default)
gunicorn myapp.wsgi
```
Stack Trace Optimization¶
Stack trace capture (`traceback.extract_stack()`) is the single most expensive per-query operation, accounting for roughly 60-80% of the interception overhead. If you need to reduce overhead, you can disable stack traces.
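With stack capture off, interception drops to the timing and record-keeping steps only. For example:

```python
# settings.py: keep analysis, skip the expensive stack capture
QUERY_DOCTOR = {
    "ENABLED": True,
    "CAPTURE_STACK_TRACES": False,
}
```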
Impact of Disabling Stack Traces¶
| Aspect | With Stack Traces | Without Stack Traces |
|---|---|---|
| Per-query overhead | ~15-50 microseconds | ~3-8 microseconds |
| Per-query memory | ~800 bytes | ~300 bytes |
| File:line in prescriptions | Yes | No |
| Suggested fix location | Exact file and line | General suggestion only |
| N+1 detection | Full accuracy | Full accuracy |
| Duplicate detection | Full accuracy | Full accuracy |
| DRF serializer analysis | Full accuracy | Reduced (cannot identify serializer source) |
Warning: Without stack traces, prescriptions will still identify the issue and suggest the fix, but they cannot tell you where in your code the fix needs to be applied. This makes django-query-doctor significantly less useful in large codebases where the same model is accessed from many locations.
Stack Depth Limiting¶
An alternative to fully disabling stack traces is to limit the capture depth. Capturing only the 10 most recent frames is usually sufficient to reach user code while reducing the cost of deep call stacks (common in DRF, Celery, and deeply nested view logic).
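Mechanically, depth limiting maps onto the `limit` argument of Python's `traceback.extract_stack()`, which materializes only the most recent frames. The sketch below shows the stdlib behavior; the corresponding `QUERY_DOCTOR` setting name is not stated here, so check the settings reference of your installed version:

```python
import traceback

def capture_limited_stack(limit=10):
    # Only the `limit` most recent frames are walked and materialized,
    # bounding the per-query cost on deep DRF/Celery call stacks.
    return traceback.extract_stack(limit=limit)
```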
Recommended Workflows¶
Local Development¶
Use the full feature set. The overhead is negligible compared to development server response times:
```python
# settings/local.py
QUERY_DOCTOR = {
    "ENABLED": True,
    "REPORT_FORMAT": "console",
    "CAPTURE_STACK_TRACES": True,
}
```
CI/CD Pipeline¶
Run the management command or pytest plugin to catch regressions. Overhead does not matter since these are batch jobs:
```yaml
# .github/workflows/ci.yml
- name: Query Doctor Check
  run: python manage.py diagnose_queries --format=json --fail-on-issues
```
Staging Environment¶
Enable with full features for targeted debugging. Consider using path-based filtering to reduce noise:
```python
# settings/staging.py
QUERY_DOCTOR = {
    "ENABLED": True,
    "REPORT_FORMAT": "json",
    "CAPTURE_STACK_TRACES": True,
    "INCLUDE_PATHS": ["/api/v2/", "/dashboard/"],  # Only analyze these paths
}
```
Production (Sampling)¶
If you need production visibility, enable with sampling to control overhead:
```python
# settings/production.py
QUERY_DOCTOR = {
    "ENABLED": True,
    "SAMPLE_RATE": 0.01,  # Analyze 1% of requests
    "REPORT_FORMAT": "json",
    "CAPTURE_STACK_TRACES": False,  # Minimize overhead
}
```
Note: Even at 1% sampling, you will quickly surface the most common query issues since they tend to appear on high-traffic endpoints. A single hour of 1% sampling on a 1000-rps application gives you ~36,000 analyzed requests.
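The sampling decision itself can be as simple as one random draw at the start of each request, skipping instrumentation entirely for unsampled requests. A sketch of the idea (not necessarily the middleware's exact logic):

```python
import random

def should_analyze(sample_rate: float) -> bool:
    # One uniform draw per request; at SAMPLE_RATE=0.01 roughly
    # 1 in 100 requests pays the capture and analysis cost.
    return random.random() < sample_rate
```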
Production (Zero Overhead)¶
If you do not need runtime diagnostics in production, disable django-query-doctor entirely and rely on CI/CD for regression detection.
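A minimal settings sketch for this mode:

```python
# settings/production.py: zero overhead; regressions are caught in CI instead
QUERY_DOCTOR = {
    "ENABLED": False,
}
```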
Benchmarking Your Own Overhead¶
You can measure django-query-doctor's overhead in your specific application:
```python
import time

from django.test import RequestFactory

from myapp.views import BookListView
from query_doctor.context_managers import diagnose_queries

factory = RequestFactory()
request = factory.get("/api/books/")

# Warm up
for _ in range(10):
    BookListView.as_view()(request)

# Measure without django-query-doctor
times_without = []
for _ in range(100):
    start = time.perf_counter()
    BookListView.as_view()(request)
    times_without.append(time.perf_counter() - start)

# Measure again with django-query-doctor enabled
# (toggle via settings or the context manager)
times_with = []
for _ in range(100):
    start = time.perf_counter()
    with diagnose_queries(report=False):  # Suppress output
        BookListView.as_view()(request)
    times_with.append(time.perf_counter() - start)

avg_without = sum(times_without) / len(times_without) * 1000
avg_with = sum(times_with) / len(times_with) * 1000
overhead = avg_with - avg_without

print(f"Without: {avg_without:.2f}ms")
print(f"With: {avg_with:.2f}ms")
print(f"Overhead: {overhead:.2f}ms ({overhead / avg_without * 100:.1f}%)")
```
QueryTurbo Compilation-Skip Benchmarks (v2.0)¶
When QueryTurbo is enabled and a query reaches the TRUSTED state, the `as_sql()` call is skipped entirely. Parameters are extracted directly from the Django `Query` tree.
Compilation-Skip Speedups¶
Measured on SQLite (compilation only, no database I/O). Run `python benchmarks/run.py` to reproduce.
| Query Pattern | Speedup | Saved per Query |
|---|---|---|
| Simple filter | 123x | 38.8 μs |
| Multi filter | 153x | 49.2 μs |
| select_related | 294x | 92.5 μs |
| Deep select_related | 374x | 121.1 μs |
| Annotate | 214x | 68.6 μs |
| Complex (JOINs + Q + annotate) | 1,050x | 337.9 μs |
Prepared Statement Savings (PostgreSQL + psycopg3)¶
On PostgreSQL with psycopg3, prepared statements provide additional savings of 0.5--5ms of query planner time per repeat query, with the greatest benefit on complex queries with multiple JOINs.
Database Backend Support¶
| Backend | Compilation Cache | Prepared Statements | Notes |
|---|---|---|---|
| PostgreSQL (psycopg3) | Yes | Yes | Full support |
| PostgreSQL (psycopg2) | Yes | No | Cache only |
| MySQL | Yes | No | Cache only |
| SQLite | Yes | No | Good for dev/test |
| Oracle | Yes | Implicit | Via cx_Oracle cursor cache |
QueryTurbo Memory Overhead¶
The compilation cache stores SQL templates and metadata. Each cache entry uses approximately 500-2000 bytes depending on SQL length. With the default MAX_SIZE of 1024 entries, the cache uses at most ~2 MB of memory.
The cache is process-local. In multi-process deployments (e.g., gunicorn with 4 workers), each worker maintains its own cache, so total memory usage is ~2 MB × worker_count.
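A back-of-envelope check of these bounds, using the default `MAX_SIZE` and the upper end of the per-entry range:

```python
MAX_SIZE = 1024          # default cache entries per process
BYTES_PER_ENTRY = 2000   # upper end of the 500-2000 byte range
WORKERS = 4              # e.g. gunicorn with 4 workers

per_worker_mb = MAX_SIZE * BYTES_PER_ENTRY / 1_000_000  # ~2 MB
total_mb = per_worker_mb * WORKERS                      # ~8 MB across workers
```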
See QueryTurbo for configuration details and the Benchmark Dashboard for monitoring cache performance.
For a detailed comparison of how this overhead compares to other tools, see Comparison. For the architectural reasons behind these performance characteristics, see Background & Design.