Using Python Asyncio for Concurrent ClickHouse Inserts

When a single-threaded Python writer feeds ClickHouse over HTTP, throughput collapses long before the network or the server is saturated: each INSERT blocks the interpreter until the round-trip completes, so one slow flush stalls every other pending batch. ClickHouse’s native HTTP interface is built for many concurrent batch writers, and Python’s asyncio is the cleanest way to supply them — non-blocking sockets, connection reuse, and a semaphore that pins client parallelism to the server’s max_concurrent_queries ceiling. This page shows how to build a bounded concurrent insert client with clickhouse-connect’s async interface, drive it from a producer-consumer queue, verify real throughput in system.query_log, and avoid the part-explosion and GIL traps that catch engineers who assume “async” means “faster CPU.”

Prerequisites

ClickHouse 23.3+ reachable over the HTTP interface (port 8123, or 8443 for TLS)
A destination MergeTree table, ideally fronted by a Buffer engine table (see async processing and buffer tables)
Python 3.11+ with a clickhouse-connect release that ships the async client (get_async_client was added during the 0.7 series, so a bare >=0.7 pin may be too loose) and, for the low-level variant, aiohttp>=3.9
The server’s max_concurrent_queries value and the number of ingestion nodes sharing it
INSERT privilege on the target table for the ingestion role
A decision on async_insert: keep it disabled server-wide and batch on the client (the pattern below), rather than double-buffering

How concurrency maps to the event loop

asyncio gives you cooperative concurrency on a single OS thread. That is exactly what an insert workload needs, because the expensive part — waiting for ClickHouse to acknowledge a flushed block — is I/O, not CPU. A producer-consumer topology keeps the loop busy: upstream sources (a Kafka consumer, an S3 event processor, a webhook handler) push records into an asyncio.Queue, and a bounded pool of consumer tasks drains the queue, accumulates rows into merge-friendly batches, and issues INSERT statements over persistent HTTP connections. A single asyncio.Semaphore caps how many inserts are in flight at once so the client never opens more concurrent queries than the server will schedule.

Three invariants keep this loop stable in production:

Connection reuse — HTTP keep-alive amortizes TLS handshakes and lets ClickHouse reuse query-execution contexts and memory arenas across inserts.
Bounded concurrency — the semaphore prevents connection exhaustion and aligns Python parallelism with the server’s scheduler.
Idempotent batching — deterministic batch boundaries plus explicit INSERT formatting make retries safe without duplicate writes.
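The bounded-concurrency invariant can be checked without a server. The following is a minimal sketch using only the standard library: fake_insert and run_demo are hypothetical names, and asyncio.sleep stands in for the ClickHouse round-trip. It records the highest number of simulated inserts that were in flight at the same time.

```python
import asyncio


async def fake_insert(semaphore: asyncio.Semaphore, stats: dict) -> None:
    """Stand-in for one INSERT round-trip; the sleep simulates the network wait."""
    async with semaphore:
        stats["in_flight"] += 1
        stats["peak"] = max(stats["peak"], stats["in_flight"])
        await asyncio.sleep(0.01)  # the loop runs other tasks during this wait
        stats["in_flight"] -= 1
        stats["done"] += 1


async def run_demo(batches: int = 50, cap: int = 8) -> dict:
    """Launch `batches` simulated inserts, never more than `cap` at once."""
    semaphore = asyncio.Semaphore(cap)
    stats = {"in_flight": 0, "peak": 0, "done": 0}
    await asyncio.gather(*(fake_insert(semaphore, stats) for _ in range(batches)))
    return stats


if __name__ == "__main__":
    print(asyncio.run(run_demo()))
```

All tasks are created up front, yet the peak never exceeds the cap. The real client relies on the same property to stay inside the server budget.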

Step 1 — Confirm the server concurrency ceiling

Client concurrency is meaningless in isolation; it is a fraction of a shared server budget. Read the live limit before sizing the semaphore:

sql

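-- Server-level limits live in system.server_settings, not in the
-- per-session system.settings table.
SELECT name, value
FROM system.server_settings
WHERE name IN ('max_concurrent_queries', 'max_concurrent_insert_queries');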

Example output from a server whose config.xml sets the limit to 100 (shipped defaults vary between releases, so read your own values):

text

┌─name───────────────────────────┬─value─┐
│ max_concurrent_queries         │ 100   │
│ max_concurrent_insert_queries  │ 0     │  -- 0 = unlimited, falls back to max_concurrent_queries
└────────────────────────────────┴───────┘

Set the client semaphore to max_concurrent_queries divided by the number of ingestion nodes, then subtract headroom for analytical SELECT traffic. With a limit of 100 across 4 writers, an 8–16 concurrency cap per writer is a safe starting point.
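The sizing rule above is plain integer arithmetic. Here is a small sketch, where per_writer_concurrency and the select_headroom_pct parameter are hypothetical names introduced for illustration and the headroom percentage is a value you choose, not something ClickHouse reports:

```python
def per_writer_concurrency(server_limit: int, writers: int,
                           select_headroom_pct: int = 50) -> int:
    """Semaphore size for one writer.

    server_limit        -- max_concurrent_queries read from the server (must be > 0)
    writers             -- number of ingestion nodes sharing that limit
    select_headroom_pct -- share of the limit reserved for SELECT traffic (0-99)
    """
    if server_limit <= 0:
        raise ValueError("0 means unlimited on the server; pick an explicit budget")
    if writers <= 0 or not 0 <= select_headroom_pct < 100:
        raise ValueError("writers must be positive and headroom within 0-99")
    insert_budget = server_limit * (100 - select_headroom_pct) // 100
    # Never return 0: a writer needs at least one slot to make progress.
    return max(1, insert_budget // writers)


if __name__ == "__main__":
    print(per_writer_concurrency(100, 4, select_headroom_pct=50))
```

With a limit of 100 and 4 writers, reserving between 36% and 68% for reads yields caps from 16 down to 8, which matches the starting range suggested above.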

Step 2 — Build a bounded async insert client

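The recommended path uses clickhouse-connect’s async client (get_async_client), which handles connection pooling, native-format serialization, and typed column binding while you keep control of batching and concurrency. In current releases this client wraps the synchronous client and runs each call in a thread-pool executor, so the awaitable API keeps the event loop free while the HTTP work happens on worker threads. The semaphore is the throttle; the client’s HTTP connection pool should be sized to match it.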

python

import asyncio
import logging
from typing import Sequence, Any
import clickhouse_connect

logger = logging.getLogger(__name__)


class AsyncBatchInserter:
    def __init__(
        self,
        host: str,
        database: str,
        table: str,
        column_names: Sequence[str],
        max_concurrency: int = 8,
        batch_size: int = 50_000,
        retries: int = 3,
    ):
        self.host = host
        self.database = database
        self.table = table
        self.column_names = list(column_names)
        self.batch_size = batch_size
        self.retries = retries
        self.max_concurrency = max_concurrency
        self.semaphore = asyncio.Semaphore(max_concurrency)
        self.client = None

    async def __aenter__(self):
        self.client = await clickhouse_connect.get_async_client(
            host=self.host,
            database=self.database,
            # Keep the HTTP pool at least as large as the concurrency cap.
            pool_mgr=clickhouse_connect.driver.httputil.get_pool_manager(
                maxsize=self.max_concurrency
            ),
            settings={"async_insert": 0},  # batch on the client, not the server
        )
        return self

    async def __aexit__(self, exc_type, exc_val, exc_tb):
        if self.client:
            await self.client.close()

    async def insert_batch(self, rows: list[list[Any]]) -> bool:
        if not rows:
            return True
        for attempt in range(self.retries):
            try:
                async with self.semaphore:
                    await self.client.insert(
                        self.table,
                        rows,
                        column_names=self.column_names,
                    )
                return True
            except clickhouse_connect.driver.exceptions.DatabaseError as exc:
                # Every DatabaseError is retried with backoff; 252 TOO_MANY_PARTS and
                # 241 MEMORY_LIMIT_EXCEEDED are the usual transient causes.
                logger.warning("insert attempt %d/%d failed: %s", attempt + 1, self.retries, exc)
                if attempt < self.retries - 1:
                    await asyncio.sleep(min(2 ** attempt + 0.5, 10.0))
        return False

Passing rows as native Python lists lets clickhouse-connect serialize straight to ClickHouse’s binary Native format, which is materially cheaper on the server than parsing text JSONEachRow.
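The retry loop in insert_batch retries every DatabaseError, including permanent ones such as a missing table. One way to narrow that is sketched below. It assumes the exception text carries the server message in the usual "Code: NNN. DB::Exception: ..." form, which depends on client and server settings, so treat the parsing as an assumption to verify. The names clickhouse_error_code, is_retryable, and RETRYABLE_CODES are hypothetical.

```python
import re
from typing import Optional

# Codes named in the retry comment: MEMORY_LIMIT_EXCEEDED and TOO_MANY_PARTS.
RETRYABLE_CODES = {241, 252}
_CODE_RE = re.compile(r"Code:\s*(\d+)")


def clickhouse_error_code(message: str) -> Optional[int]:
    """Return the first 'Code: NNN' number found in an error message, if any."""
    match = _CODE_RE.search(message)
    return int(match.group(1)) if match else None


def is_retryable(message: str) -> bool:
    """True only when the message carries one of the known transient codes."""
    return clickhouse_error_code(message) in RETRYABLE_CODES


if __name__ == "__main__":
    print(is_retryable("Code: 252. DB::Exception: Too many parts"))
```

Inside the except branch, a call such as is_retryable(str(exc)) could decide whether to sleep and retry or to fail fast. Errors with no recognizable code fall into the non-retryable branch here, which may be too strict for connection-level failures.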

Step 3 — Drive the client from a queue

A consumer task pulls from the shared asyncio.Queue, accumulates until it reaches batch_size or a flush timeout fires, and hands the batch to the inserter. The timeout guarantees latency-bounded delivery even when the stream goes quiet — a partial batch never sits in memory indefinitely.

python

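async def consume(inserter: AsyncBatchInserter, queue: asyncio.Queue,
                  stop: asyncio.Event, flush_seconds: float = 1.0) -> None:
    loop = asyncio.get_running_loop()
    buffer: list[list[Any]] = []
    deadline = loop.time() + flush_seconds

    async def flush() -> None:
        nonlocal buffer, deadline
        batch, buffer = buffer, []
        deadline = loop.time() + flush_seconds
        if batch and not await inserter.insert_batch(batch):
            # Placeholder: write `batch` to your dead-letter sink here.
            logger.critical("batch of %d rows failed after retries", len(batch))

    # Keep draining until shutdown is requested and the queue is empty.
    while not (stop.is_set() and queue.empty()):
        try:
            # Wait only until the flush deadline so a slow trickle still flushes.
            timeout = max(deadline - loop.time(), 0.0)
            row = await asyncio.wait_for(queue.get(), timeout=timeout)
        except asyncio.TimeoutError:
            await flush()
            continue
        buffer.append(row)
        if len(buffer) >= inserter.batch_size:
            await flush()
    await flush()  # final partial batch


async def main():
    queue: asyncio.Queue = asyncio.Queue(maxsize=500_000)
    stop = asyncio.Event()
    async with AsyncBatchInserter(
        host="clickhouse", database="analytics", table="events",
        column_names=["ts", "user_id", "event", "payload"],
        max_concurrency=8, batch_size=50_000,
    ) as inserter:
        workers = [asyncio.create_task(consume(inserter, queue, stop))
                   for _ in range(8)]
        # ... producers `await queue.put(row)` here and return when the source ends ...
        stop.set()  # no more input: workers drain the queue, flush, and exit
        await asyncio.gather(*workers)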

Bounding the queue with maxsize is what applies backpressure upstream: when consumers fall behind, await queue.put() suspends the producer coroutine until a slot frees up, instead of letting memory grow without limit.
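The backpressure effect is easy to observe in isolation. This is a minimal sketch with hypothetical names (producer, slow_consumer, run_demo) and a deliberately slow consumer. It tracks the deepest the queue ever gets while a fast producer tries to outrun it.

```python
import asyncio


async def producer(queue: asyncio.Queue, total: int, stats: dict) -> None:
    for i in range(total):
        await queue.put(i)  # suspends here whenever the queue is full
        stats["max_depth"] = max(stats["max_depth"], queue.qsize())
    await queue.put(None)  # sentinel: tells the consumer to stop


async def slow_consumer(queue: asyncio.Queue, stats: dict) -> None:
    while (item := await queue.get()) is not None:
        await asyncio.sleep(0.001)  # simulate a slow downstream insert
        stats["consumed"] += 1


async def run_demo(total: int = 200, maxsize: int = 10) -> dict:
    queue: asyncio.Queue = asyncio.Queue(maxsize=maxsize)
    stats = {"max_depth": 0, "consumed": 0}
    await asyncio.gather(producer(queue, total, stats),
                         slow_consumer(queue, stats))
    return stats


if __name__ == "__main__":
    print(asyncio.run(run_demo()))
```

The producer finishes only as fast as the consumer allows, and the observed depth stays at or below maxsize, so memory use is capped by the queue bound and not by the input rate.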

Step 4 — Low-level aiohttp variant for streaming JSONEachRow

When you need raw control over the wire format — for example streaming pre-serialized JSONEachRow bytes without building Python row lists — drop to aiohttp and post directly to the HTTP endpoint. This trades the typed client for a slightly leaner hot path.

python

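import asyncio, json, logging
import aiohttp

logger = logging.getLogger(__name__)

async def stream_insert(session, base_url, database, table,
                        rows, semaphore, retries=3):
    query = f"INSERT INTO {database}.{table} FORMAT JSONEachRow"
    payload = "\n".join(json.dumps(r, ensure_ascii=False) for r in rows).encode()
    for attempt in range(retries):
        try:
            async with semaphore:
                # Pass the statement via params so it is URL-encoded for us.
                async with session.post(base_url, params={"query": query},
                                        data=payload) as resp:
                    if resp.status == 200:
                        return True
                    logger.warning("HTTP %d: %s", resp.status, await resp.text())
        except aiohttp.ClientError as exc:  # e.g. the server closed an idle socket
            logger.warning("insert attempt %d/%d failed: %s", attempt + 1, retries, exc)
        if attempt < retries - 1:  # no pointless sleep after the last attempt
            await asyncio.sleep(min(2 ** attempt + 0.5, 10.0))
    return False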

Create the session once with a TCPConnector(limit=max_concurrency * 2, keepalive_timeout=...) so sockets are reused across the whole run. Set keepalive_timeout a few seconds below the server’s keep_alive_timeout (default 10s in older builds, 30s in recent ones); a larger client value, such as 60, lets the client try to reuse a socket the server has already closed.
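As a sketch of that connector setup: connector_kwargs and make_session are hypothetical helpers, the 2-second safety margin is an assumption, and server_keep_alive_s must come from your own server configuration. TCPConnector's limit and keepalive_timeout are standard aiohttp parameters.

```python
def connector_kwargs(max_concurrency: int, server_keep_alive_s: float,
                     margin_s: float = 2.0) -> dict:
    """Connector settings that keep the client idle timeout below the server's."""
    return {
        "limit": max_concurrency * 2,
        # Retire idle sockets before the server does; never go below 1 second.
        "keepalive_timeout": max(server_keep_alive_s - margin_s, 1.0),
    }


async def make_session(max_concurrency: int, server_keep_alive_s: float):
    """Create the one long-lived session shared by every stream_insert call."""
    # Imported lazily so the sizing helper stays usable without aiohttp installed.
    import aiohttp

    connector = aiohttp.TCPConnector(
        **connector_kwargs(max_concurrency, server_keep_alive_s)
    )
    return aiohttp.ClientSession(connector=connector)
```

Create the session inside a running event loop and close it with await session.close() when the run ends, so pooled sockets are released cleanly.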

Step 5 — Route through a Buffer table to decouple views

Every INSERT into a table with attached materialized views fires those views synchronously, so heavy aggregations or dictionary lookups add write amplification directly onto the event loop’s critical path. Point the async client at a Buffer engine table instead of the raw MergeTree: the buffer coalesces writes in memory and flushes large blocks asynchronously, so views observe smoothed, well-sized blocks rather than one execution per small batch. Mirror the destination column types exactly and tune the flush thresholds to your accumulation window — the full pattern, including flush-threshold math, lives in the parent async processing and buffer tables guide.

sql

CREATE TABLE analytics.events_buffer AS analytics.events
ENGINE = Buffer(analytics, events,
    /* num_layers */ 8,
    /* min/max_time  */ 5, 30,
    /* min/max_rows  */ 10000, 200000,
    /* min/max_bytes */ 1000000, 50000000);
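The threshold arguments combine by a simple rule: the Buffer engine flushes a buffer once all of the min conditions hold, or as soon as any single max condition holds, and it evaluates this separately for each of the num_layers buffers. The sketch below models that rule in Python for reasoning about settings. BufferThresholds and should_flush are hypothetical names, and the exact boundary comparisons (>= versus >) are an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BufferThresholds:
    min_time: float
    max_time: float
    min_rows: int
    max_rows: int
    min_bytes: int
    max_bytes: int


def should_flush(t: BufferThresholds, age_s: float, rows: int, size_bytes: int) -> bool:
    """Model of the flush decision for one buffer layer."""
    all_min = (age_s >= t.min_time and rows >= t.min_rows
               and size_bytes >= t.min_bytes)
    any_max = (age_s >= t.max_time or rows >= t.max_rows
               or size_bytes >= t.max_bytes)
    return all_min or any_max


# Values copied from the CREATE TABLE statement above.
EVENTS_BUFFER = BufferThresholds(5, 30, 10_000, 200_000, 1_000_000, 50_000_000)

if __name__ == "__main__":
    print(should_flush(EVENTS_BUFFER, age_s=6, rows=500, size_bytes=50_000))
```

A layer holding 500 rows after 6 seconds does not flush, because the row and byte minimums are unmet. It waits until the 30-second max_time forces it, which is why sparse streams see up to max_time of delay before rows reach the MergeTree.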

Verification

Confirm the writers actually ran concurrently and hit target throughput by reading completed inserts from system.query_log (flush the log first with SYSTEM FLUSH LOGS):

sql

SELECT
    toStartOfMinute(event_time) AS minute,
    count()                     AS insert_queries,
    sum(written_rows)           AS rows_written,
    round(avg(query_duration_ms)) AS avg_ms,
    max(memory_usage)           AS peak_mem
FROM system.query_log
WHERE type = 'QueryFinish'
  AND query_kind = 'Insert'
  AND event_time > now() - INTERVAL 10 MINUTE
GROUP BY minute
ORDER BY minute DESC;

A healthy run shows many inserts per minute with stable avg_ms, and rows_written rising roughly in proportion to concurrency until the server saturates. Cross-check live socket usage against your concurrency cap:

sql

SELECT metric, value
FROM system.metrics
WHERE metric IN ('HTTPConnection', 'Query', 'InsertQuery');

HTTPConnection should hover near your semaphore limit under load, not spike past it — if it does, a connection leak or an unbounded retry loop is opening sockets faster than they close.

Gotchas and edge cases

Async is not parallel CPU. The event loop runs on one thread under the GIL. Row serialization (JSON encoding, type coercion) still happens serially, so a CPU-bound producer can starve the loop even with high max_concurrency. Push heavy serialization off the loop and let the loop own only the I/O: asyncio.to_thread() keeps the loop responsive but still shares the GIL, while a ProcessPoolExecutor is what buys real CPU parallelism.
High concurrency with small batches causes TOO_MANY_PARTS. Concurrency multiplies part creation: 16 workers each flushing 500-row batches write parts far faster than background merges drain them, and ClickHouse rejects inserts with code 252. Larger batches beat more workers — see batch insert optimization and the server-side ceilings in threshold tuning and performance limits. Sizing max_insert_block_size correctly is covered in tuning max_insert_block_size for high throughput.
Client batching and server async_insert double-buffer. If you enable async_insert=1 on the server and batch on the client, rows are queued twice and flush timing becomes unpredictable. Pick one buffering layer; the client-side pattern here keeps flush boundaries deterministic and retryable.
HTTP gives you no transaction. ClickHouse does not support multi-statement transactions over HTTP, so a retried batch can double-write. Append an _insert_id UUID column and deduplicate with a ReplacingMergeTree(version) target, or filter on _insert_id NOT IN (...), to get effectively-once semantics under retries.
Buffer table reads are transparent but merges are deferred. A SELECT against the buffer reads in-memory rows unioned with the on-disk table, so fresh data is visible pre-flush — but the underlying MergeTree layout only changes on flush, so part-count and merge diagnostics lag behind what you just inserted.
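For the retry-safety gotcha above, the insert id only helps if a retried batch carries the same id as its first attempt. One way to get that is to derive the id from the batch contents, as in this minimal sketch. The namespace string, the function names, and the choice to append the id as the last column are all hypothetical, and the target table still needs the _insert_id column and a dedup strategy as described above.

```python
import json
import uuid
from typing import Any

# Hypothetical namespace; any fixed UUID works as long as every writer shares it.
INSERT_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "ingest.example.invalid")


def batch_insert_id(rows: list[list[Any]]) -> uuid.UUID:
    """Deterministic id: identical batch contents always map to the same UUID."""
    # default=str covers values json cannot encode natively, such as datetimes.
    canonical = json.dumps(rows, separators=(",", ":"), default=str)
    return uuid.uuid5(INSERT_NAMESPACE, canonical)


def stamp_rows(rows: list[list[Any]]) -> list[list[Any]]:
    """Return copies of the rows with the batch id appended as a final column."""
    insert_id = str(batch_insert_id(rows))
    return [[*row, insert_id] for row in rows]


if __name__ == "__main__":
    print(stamp_rows([[1, "click"], [2, "view"]]))
```

Stamp the batch once, before the first attempt, and pass the stamped rows to every retry. This relies on deterministic batch boundaries: if a retry regroups rows into different batches, the ids change and deduplication no longer lines up.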

Related

Async Processing & Buffer Tables — the parent pattern that absorbs write velocity in front of the destination table.
Batch Insert Optimization — sizing blocks so concurrency does not explode part counts.
Tuning max_insert_block_size for High Throughput — the block-size lever that governs part creation per insert.
Threshold Tuning & Performance Limits — the server ceilings your semaphore must respect.
MergeTree Engine Deep Dive — why small, frequent parts hurt and how background merges drain them.
