About StreamingHttpResponse and RAM Usage

While HttpResponse returns the whole document at once, StreamingHttpResponse lets you deliver the same document in chunks.
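The difference can be sketched without Django at all: an HttpResponse-style view builds the complete body as one string, while a StreamingHttpResponse-style view hands the framework a generator, and the client receives the same bytes piece by piece. A minimal illustration (not actual Django code):

```python
def full_document():
    # HttpResponse-style: the entire body is built in memory first
    return "".join(f"line {i}\n" for i in range(5))

def chunked_document():
    # StreamingHttpResponse-style: yield one chunk at a time
    for i in range(5):
        yield f"line {i}\n"

# Both approaches deliver identical bytes to the client
assert full_document() == "".join(chunked_document())
```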

Performance is roughly the same, but the streaming response reduces peak RAM usage for large data sets (e.g. from 16.77 MB at peak to 5.82 MB), since the full body is never held in memory at once. Moreover, the first byte reaches the browser sooner.
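A rough way to observe the effect is with tracemalloc: building the whole body at once drives peak memory up to the size of the body, while consuming it chunk by chunk keeps the peak near the size of a single chunk. A standalone sketch (the numbers will differ from the figures above, which come from a real view):

```python
import tracemalloc

N = 200_000
CHUNK = "x" * 100  # 100-byte chunk, reused to keep the demo fast

def peak_mb(fn):
    """Run fn and return the peak traced memory in MB."""
    tracemalloc.start()
    fn()
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return peak / 1024 / 1024

def whole_body():
    # HttpResponse-style: the full ~20 MB body exists at once
    return len("".join(CHUNK for _ in range(N)))

def streamed_body():
    # StreamingHttpResponse-style: only one chunk is alive at a time
    return sum(len(chunk) for chunk in (CHUNK for _ in range(N)))

peak_whole = peak_mb(whole_body)
peak_streamed = peak_mb(streamed_body)
print(f"whole: {peak_whole:.2f} MB, streamed: {peak_streamed:.2f} MB")
```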

The RAM savings are only meaningful for large responses, such as a JSON list of all Django tricks. Small pages see a negligible difference.

While HTML streams can be rendered chunk by chunk, for JSON responses the browser still loads the whole structure into memory on the client side, since it has to parse and validate the complete document before displaying it with the widget of choice.
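In other words, even though the JSON arrives in fragments, the client can only hand it to a parser once the fragments have been reassembled into one complete document. A plain-Python illustration of what the receiving side effectively does:

```python
import json

# Chunks as a streaming view might emit them
chunks = ['{"results": [', '{"title": "A"}', ',', '{"title": "B"}', ']}']

# No individual chunk is valid JSON; only the full concatenation is
document = "".join(chunks)
data = json.loads(document)
assert [item["title"] for item in data["results"]] == ["A", "B"]
```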

Here is an example of a JSON stream returning all published tricks:

import json
import asyncio
import tracemalloc
import logging
from asgiref.sync import sync_to_async
from django.http import StreamingHttpResponse

from tricks.models import Trick  # adjust the import path to your app

logger = logging.getLogger(__name__)

async def trick_list_json_stream(request):
    tracemalloc.start()

    def serialize(trick):
        return {
            "title": trick.title,
            "url": request.build_absolute_uri(trick.get_url_path()),
            "categories": [cat.slug for cat in trick.categories.all()],
            "technologies": [tech.slug for tech in trick.technologies.all()],
        }

    async def generate():
        yield '{"results": ['
        first = True

        chunk_size = 50
        offset = 0

        def fetch_chunk(offset_val):
            return list(
                Trick.objects.filter(
                    publishing_status__in=(
                        Trick.PUBLISHING_STATUS_PUBLISHED,
                        Trick.PUBLISHING_STATUS_PUBLISHED_OUTDATED,
                    )
                )
                .prefetch_related("categories", "technologies")
                [offset_val:offset_val + chunk_size]
            )

        get_chunk = sync_to_async(fetch_chunk)

        while True:
            chunk = await get_chunk(offset)

            if not chunk:
                break  # No more tricks

            # True streaming: one item at a time
            for trick in chunk:
                if not first:
                    yield ','
                yield json.dumps(serialize(trick))
                first = False
                await asyncio.sleep(0)  # Yield control

            # Explicitly delete chunk to free memory immediately
            del chunk
            offset += chunk_size

        yield ']}'

        # Calculate and log RAM usage
        current, peak = tracemalloc.get_traced_memory()
        tracemalloc.stop()
        logger.info("=== trick_list_json_stream RAM Usage ===")
        logger.info(f"Current memory: {current / 1024 / 1024:.2f} MB")
        logger.info(f"Peak memory: {peak / 1024 / 1024:.2f} MB")
        logger.info("========================================")

    return StreamingHttpResponse(generate(), content_type='application/json')
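The offset-based pagination inside generate() can be sketched framework-free: keep fetching fixed-size slices until an empty slice signals the end. Here a plain list stands in for the queryset:

```python
def fetch_chunk(items, offset, size):
    # Stand-in for the sliced queryset: items[offset:offset + size]
    return items[offset:offset + size]

def paginate(items, size):
    offset = 0
    while True:
        chunk = fetch_chunk(items, offset, size)
        if not chunk:
            break  # no more items
        yield from chunk
        offset += size

# Every item is yielded exactly once, in order
assert list(paginate(list(range(7)), 3)) == [0, 1, 2, 3, 4, 5, 6]
```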
