Tuning max_insert_block_size for High Throughput

max_insert_block_size controls the largest block ClickHouse forms while parsing an incoming INSERT before it is sorted, compressed, run through synchronous materialized views, and written as a part. On ClickHouse 24.x the default is 1048576 rows. Leaving it at the default is a frequent cause of MEMORY_LIMIT_EXCEEDED on wide rows or heavy materialized-view chains, and of oversized transformation passes that stall the ingest path. This guide walks through measuring your current block behaviour, choosing a value that fits available RAM and view complexity, and verifying the result. It is the throughput-tuning half of batch insert optimization.

Prerequisites

A MergeTree-family target table you can inspect in system.parts (needs SELECT on system.*).
Access to system.query_log (enabled by default) to read per-insert memory_usage and written_rows.
SET privilege for session tuning, or file access to users.d/ plus SYSTEM RELOAD CONFIG for profile/server changes.
Known max_memory_usage for the profile that runs your inserts — block sizing is meaningless without it.
clickhouse-connect installed (pip install clickhouse-connect) if you drive ingestion from Python.
A representative sample of production payload (row width, materialized-view fan-out) to test against — not synthetic narrow rows.

Which Way to Tune

The parameter is a two-way trade-off, not a “bigger is faster” knob. Larger blocks amortise per-block overhead and improve compression ratios, but memory use and the synchronous materialized-view transformation both scale linearly with block size. Smaller blocks cut peak memory and make views more responsive, but multiply part-finalisation CPU and disk I/O. Use the symptom you actually observe to pick a direction before touching any value.

Each block is sorted by the table ORDER BY key, compressed, and written as one immutable part that the MergeTree engine later consolidates through background merges — so the block boundary you set here directly determines how many parts a load produces and how large each synchronous view pass becomes.
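To make the link between block size and part count concrete, here is a minimal sketch that estimates how many initial parts one load creates. It assumes every block lands in a single partition and that no server-side squashing or merging has happened yet; initial_parts is a hypothetical helper, not a ClickHouse API.

```python
import math


def initial_parts(total_rows: int, block_size: int) -> int:
    """Estimate parts written before merges: one part per insert block.

    Assumption: each block touches a single partition. A block that spans
    several partitions is split into one part per partition, so the real
    count can be higher.
    """
    if total_rows < 0 or block_size <= 0:
        raise ValueError("total_rows must be >= 0 and block_size > 0")
    return math.ceil(total_rows / block_size)


# A 10-million-row load at two candidate block sizes.
for size in (262_144, 1_048_576):
    print(size, initial_parts(10_000_000, size))
```

Under these assumptions the smaller block yields roughly four times as many initial parts for the same load, which is the cost side of the memory saving.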

Step-by-Step Procedure

1. Measure current insert memory and block size

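Read the memory footprint and written-row count of recent inserts. The query does not report a block count directly: for an insert that arrives as a single block, written_rows is itself the effective block size, and for a multi-block insert, written_rows divided by the number of blocks approximates it.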

sql

SELECT
    event_time,
    query_duration_ms,
    formatReadableSize(memory_usage) AS peak_mem,
    read_rows,
    written_rows
FROM system.query_log
WHERE type = 'QueryFinish'
  AND query_kind = 'Insert'
  AND event_time > now() - INTERVAL 2 HOUR
ORDER BY memory_usage DESC
LIMIT 10;

Expected output — a peak-memory column you can compare against the profile limit:

text

┌──────────event_time─┬─query_duration_ms─┬─peak_mem──┬─read_rows─┬─written_rows─┐
│ 2026-07-03 11:04:12 │              1840 │ 7.42 GiB  │   1048576 │      1048576 │
│ 2026-07-03 11:03:58 │              1790 │ 7.31 GiB  │   1048576 │      1048576 │
└─────────────────────┴───────────────────┴───────────┴───────────┴──────────────┘

If peak_mem sits close to max_memory_usage while written_rows equals the default 1048576, the block is the pressure source and you should decrease it.
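One way to turn that observation into a first candidate value is to scale the observed block down to a memory budget. The sketch below assumes peak memory grows roughly linearly with rows per block, which is a simplification because some of the peak is fixed per-query overhead. The helper name candidate_block_size and the 2.5 GiB budget are illustrative assumptions.

```python
def candidate_block_size(peak_bytes: float, rows_in_block: int,
                         memory_budget_bytes: float) -> int:
    """Scale the observed block down to fit a memory budget.

    Assumes memory grows roughly linearly with rows per block, then rounds
    down to a power of two to leave extra headroom.
    """
    if peak_bytes <= 0 or rows_in_block <= 0 or memory_budget_bytes <= 0:
        raise ValueError("all inputs must be positive")
    bytes_per_row = peak_bytes / rows_in_block
    rows_that_fit = int(memory_budget_bytes / bytes_per_row)
    if rows_that_fit < 1:
        return 1
    # Largest power of two that does not exceed rows_that_fit.
    return 1 << (rows_that_fit.bit_length() - 1)


GIB = 2**30
# Observed: 7.42 GiB peak for a 1048576-row insert; budget: 2.5 GiB (assumed).
print(candidate_block_size(7.42 * GIB, 1_048_576, 2.5 * GIB))  # 262144
```

Treat the result as a starting point for the session-scope test in step 3, not as a final value.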

2. Correlate block size with part creation

A block that is too small for the insert rate manufactures many tiny parts. Check average rows per active part:

sql

SELECT
    table,
    count() AS part_count,
    sum(rows) AS total_rows,
    round(avg(rows)) AS avg_rows_per_part
FROM system.parts
WHERE active = 1 AND database = currentDatabase()
GROUP BY table
ORDER BY avg_rows_per_part ASC;

A low avg_rows_per_part paired with a high part_count is the “increase block size” signal; a healthy loaded table shows hundreds of thousands to millions of rows per part after merges settle.
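The two signals from steps 1 and 2 can be combined into a small decision helper. This is a sketch only: the function name tuning_direction and all three thresholds (70% of the memory limit, 100000 rows per part, 300 parts) are assumptions chosen for illustration, not ClickHouse defaults.

```python
def tuning_direction(peak_mem_bytes: float, max_memory_usage: float,
                     avg_rows_per_part: float, part_count: int,
                     mem_fraction: float = 0.7,
                     small_part_rows: int = 100_000,
                     many_parts: int = 300) -> str:
    """Map the two observed symptoms to a tuning direction.

    The three thresholds are illustrative assumptions; calibrate them
    against your own workload before relying on the answer.
    """
    memory_pressure = peak_mem_bytes >= mem_fraction * max_memory_usage
    tiny_parts = avg_rows_per_part < small_part_rows and part_count > many_parts
    if memory_pressure and tiny_parts:
        # Both symptoms at once: block size alone will not fix this.
        return "conflict"
    if memory_pressure:
        return "decrease"
    if tiny_parts:
        return "increase"
    return "keep"


GIB = 2**30
# Peak of 7.42 GiB against a 10 GiB limit, with healthy part sizes.
print(tuning_direction(7.42 * GIB, 10 * GIB, 900_000, 40))
```

A "conflict" result usually points at something other than block size, such as a heavy view chain or batches that are too small on the client.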

3. Apply a candidate value at the narrowest scope first

Start at session scope so a bad guess cannot affect other workloads. Session settings do not persist across connections, which is exactly what you want while experimenting.

sql

SET max_insert_block_size = 262144;
-- re-run a representative INSERT, then repeat step 1 to compare peak_mem

Once a value proves stable, promote it to the ingestion profile so every ETL connection inherits it without leaking into ad-hoc analytical sessions:

xml

<!-- /etc/clickhouse-server/users.d/insert_settings.xml -->
<clickhouse>
    <profiles>
        <etl_writer>
            <max_insert_block_size>262144</max_insert_block_size>
            <max_memory_usage>10737418240</max_memory_usage>
        </etl_writer>
    </profiles>
</clickhouse>

Precedence runs session override → user profile → server config, so the file above sets a default, not a hard limit: a session SET can still raise or lower it for a one-off bulk load.
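The precedence order can be modelled as a lookup that checks the narrowest scope first. This is a simplified sketch of the resolution order described here, not how ClickHouse implements it; effective_setting and the three dictionaries are hypothetical.

```python
from typing import Mapping, Optional


def effective_setting(name: str,
                      session: Mapping[str, int],
                      profile: Mapping[str, int],
                      server_default: Mapping[str, int]) -> Optional[int]:
    """Return the first value found, checking the narrowest scope first."""
    for scope in (session, profile, server_default):
        if name in scope:
            return scope[name]
    return None


server_default = {"max_insert_block_size": 1_048_576}
profile = {"max_insert_block_size": 262_144}

# No session override: the profile value applies.
print(effective_setting("max_insert_block_size", {}, profile, server_default))

# A session SET wins over the profile for that connection only.
session = {"max_insert_block_size": 524_288}
print(effective_setting("max_insert_block_size", session, profile, server_default))
```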

4. Reload configuration without a restart

Profile changes are picked up on the next connection after a config reload — no server bounce required:

bash

clickhouse-client --query "SYSTEM RELOAD CONFIG"
clickhouse-client --user etl_writer \
  --query "SELECT value FROM system.settings WHERE name = 'max_insert_block_size'"

Expected output confirms the new value is live for that profile:

text

262144

5. Align the Python client to the server boundary

Clients that stream one giant frame let the server form a full default-sized block in memory. Chunking the write on the client keeps each server-side block bounded and predictable under variable network conditions:

python

import pandas as pd
from clickhouse_connect import get_client

client = get_client(host="ch-cluster", port=8123, username="etl_writer", password="***")
df = pd.read_parquet("/data/events_stream.parquet")

# Match the client chunk to the server-side block ceiling.
CHUNK_SIZE = 262_144
for start in range(0, len(df), CHUNK_SIZE):
    chunk = df.iloc[start:start + CHUNK_SIZE]
    client.insert_df("analytics.events", chunk,
                     settings={"max_insert_block_size": CHUNK_SIZE})

Passing the setting per-request pins the boundary for that insert regardless of the connecting profile. Ensure DataFrame dtypes match the ClickHouse column types so no implicit casting inflates memory during block finalisation.

Verification

After the load, confirm the block boundary took effect in two places. First, that inserts no longer approach the memory ceiling:

sql

SELECT
    max(memory_usage) AS worst_peak,
    quantile(0.95)(memory_usage) AS p95_peak
FROM system.query_log
WHERE query_kind = 'Insert'
  AND type = 'QueryFinish'
  AND event_time > now() - INTERVAL 15 MINUTE;

worst_peak should now sit comfortably under max_memory_usage with headroom for concurrent inserts. Second, that part creation is sane: an avg_rows_per_part that climbs steadily as merges run confirms the block size and insert rate are balanced:

sql

SELECT table, count() AS parts, round(avg(rows)) AS avg_rows_per_part
FROM system.parts
WHERE active = 1 AND database = currentDatabase()
GROUP BY table;

Gotchas & Edge Cases

max_insert_block_size does not govern native-protocol client blocks. It caps blocks the server forms while parsing input: INSERT ... FORMAT, HTTP inserts, and file reads. It has no effect on INSERT ... SELECT, which inserts the same blocks the SELECT produces. Over the native protocol the client library forms the block, and the server instead squashes those blocks using min_insert_block_size_rows / min_insert_block_size_bytes. If tuning this setting seems to do nothing for a native-protocol loader, control block size on the client (step 5) and tune the min_insert_block_size_* pair on the server instead.

Synchronous materialized views multiply the true memory cost. A materialized view executes on the full insert block before the INSERT is acknowledged, so a JOIN, arrayJoin, or dictionary lookup inside the view can exhaust max_memory_usage at a block size a bare table tolerates easily. Where a heavy view chain dominates, decouple ingest from transformation with a staging layer — see asynchronous processing with buffer tables — rather than shrinking the block until throughput collapses. The companion ceiling for view-side part spread is covered in tuning max_partitions_per_insert_block for views.

async_insert reshapes the boundary entirely. With async_insert = 1, ClickHouse buffers small inserts server-side and flushes them on async_insert_max_data_size / async_insert_busy_timeout_ms, not on max_insert_block_size. Tuning block size on an async-insert path optimises the wrong stage; tune the async-insert flush thresholds instead.
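As a simplified model of that flush rule, the sketch below treats the buffer as flushed when either threshold is reached. The function should_flush is hypothetical, the real server applies further conditions, and the numbers in the example are illustrative values rather than ClickHouse defaults.

```python
def should_flush(buffered_bytes: int, ms_since_first_insert: int,
                 async_insert_max_data_size: int,
                 async_insert_busy_timeout_ms: int) -> bool:
    """Flush when the buffer is large enough or has waited long enough.

    Note that max_insert_block_size plays no part in this decision.
    """
    return (buffered_bytes >= async_insert_max_data_size
            or ms_since_first_insert >= async_insert_busy_timeout_ms)


# Illustrative thresholds only: 10 MiB of data or 200 ms of waiting.
MAX_DATA = 10 * 2**20
TIMEOUT_MS = 200

print(should_flush(2 * 2**20, 50, MAX_DATA, TIMEOUT_MS))    # small and recent
print(should_flush(12 * 2**20, 50, MAX_DATA, TIMEOUT_MS))   # size threshold hit
print(should_flush(2 * 2**20, 250, MAX_DATA, TIMEOUT_MS))   # timeout hit
```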

A value larger than the payload does nothing. Blocks never exceed the number of rows actually supplied, so raising max_insert_block_size above your real batch size has no effect. Fix small parts at the client by accumulating larger batches before you touch the server ceiling.
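Client-side accumulation can be as simple as a buffer that flushes once it reaches a target row count. The following is a minimal sketch, assuming rows arrive one at a time as tuples. BatchAccumulator and its flush_fn callback are hypothetical names, and the example uses a stand-in sink instead of a real ClickHouse client.

```python
from typing import Callable, List, Sequence


class BatchAccumulator:
    """Buffer rows and hand them to flush_fn in batches of target_rows."""

    def __init__(self, target_rows: int,
                 flush_fn: Callable[[List[Sequence]], None]) -> None:
        if target_rows <= 0:
            raise ValueError("target_rows must be positive")
        self.target_rows = target_rows
        self.flush_fn = flush_fn
        self._rows: List[Sequence] = []

    def add(self, row: Sequence) -> None:
        self._rows.append(row)
        if len(self._rows) >= self.target_rows:
            self.flush()

    def flush(self) -> None:
        """Send whatever is buffered; call once more at shutdown."""
        if self._rows:
            batch, self._rows = self._rows, []
            self.flush_fn(batch)


# Stand-in sink that records batch sizes; a real loader would issue
# one INSERT per batch here instead.
sizes: List[int] = []
acc = BatchAccumulator(target_rows=4, flush_fn=lambda b: sizes.append(len(b)))
for i in range(10):
    acc.add((i, f"event-{i}"))
acc.flush()  # drain the final partial batch
print(sizes)  # [4, 4, 2]
```

In production the target would be closer to the server-side ceiling, and a time-based flush is usually added so a slow stream does not hold rows indefinitely.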

Batch Insert Optimization — parent guide to sizing writes on client and server.
MergeTree engine deep dive — how blocks become parts and how merges consolidate them.
Asynchronous processing with buffer tables — decouple ingest from heavy synchronous views.
Tuning max_partitions_per_insert_block for views — the partition-spread ceiling that pairs with block sizing.
Real-Time Data Ingestion Pipeline Implementation — the full ingestion subsystem this setting sits inside.
