Automating Materialized View Deployment with Python

Applying a materialized view by hand — pasting CREATE MATERIALIZED VIEW into a client, re-running it after an edit, and hoping the target table survived — is where ClickHouse pipelines silently diverge: two environments end up with different SELECT projections, a redeploy leaves a stale .inner.* table behind, and nobody can say which definition is actually live. This page shows how to make deployment deterministic from Python using clickhouse-connect (ClickHouse 23.3+): a checksum-driven convergence gate so re-running a definition is a no-op, a dry-run validation path, idempotent DROP ... SYNC replacement, and a registry table you can reconcile against system.tables. It is a concrete procedure inside the broader materialized view creation patterns discipline, which decides how a view is structured; this page decides how that structure is rolled out safely and repeatably.

Prerequisites

ClickHouse 23.3 or later, with read access to system.tables, system.parts, and system.errors.
Python 3.9+ with the official driver: pip install clickhouse-connect (this page uses the HTTP driver, not the native-protocol clickhouse-driver).
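A DDL user holding CREATE VIEW, DROP VIEW, SELECT, and INSERT grants on the target database (SELECT covers the source tables the views read and the registry the deployer queries), plus CREATE TABLE for the registry and the target tables the views write into.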
View definitions stored as version-controlled .sql files, one CREATE MATERIALIZED VIEW ... TO ... per file, keyed by a stable view name.
Each view’s target table already created explicitly with its own PARTITION BY / ORDER BY (the decoupled TO-clause pattern), so deployment never relies on an implicit hidden table.
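As a minimal sketch of the file layout described above, the loader below builds a name-to-DDL map from a directory of .sql files. It assumes the file stem is the stable view name; the function name load_definitions and the sample file name are illustrative, not part of any library.

```python
from pathlib import Path
from tempfile import TemporaryDirectory
from typing import Dict


def load_definitions(root: Path) -> Dict[str, str]:
    """Map view name -> DDL text, one entry per *.sql file directly under root."""
    definitions = {}
    for path in sorted(root.glob("*.sql")):
        # Assumption: the file stem (name without .sql) is the stable view name.
        definitions[path.stem] = path.read_text(encoding="utf-8")
    return definitions


# Demonstration against a throwaway directory with one definition and one stray file.
with TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "mv_events_daily_agg.sql").write_text(
        "CREATE MATERIALIZED VIEW ... TO ... AS SELECT ...", encoding="utf-8"
    )
    (root / "README.md").write_text("not a definition", encoding="utf-8")
    loaded = load_definitions(root)

print(sorted(loaded))
```

Keying on the file stem keeps the registry key stable across reformatting of the SQL itself; renaming a file is then an explicit rename of the managed view.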

Deployment Convergence Flow

Manual DDL is imperative: you tell the server "do this now." Automated deployment is reconciliation: for each definition you compute a hash of the normalized SQL, compare it to what was last recorded as live, and only touch the server when the two differ. That single gate is what makes a redeploy idempotent and what lets the same script run in CI on every push without churning the server.

Ordering matters when views depend on each other’s targets: a view must not be created before the table it reads from exists. Deriving that order from a full cross-table dependency graph is what keeps a batch deploy from failing on a forward reference — resolve the definitions with a topological sort before handing them to the deployer below.
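The ordering step can be sketched with the standard library's graphlib (Python 3.9+). The dependency map here is hypothetical: each key is a view, and its value is the set of views whose target tables it reads from. In practice you would derive that map from your own definitions.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map: view name -> views whose targets it reads from.
depends_on = {
    "mv_events_daily_agg": set(),
    "mv_sessions_rollup": {"mv_events_daily_agg"},
    "mv_sessions_weekly": {"mv_sessions_rollup"},
}


def deploy_order(graph):
    """Return view names so that every view comes after the views it depends on."""
    # static_order() raises graphlib.CycleError if the graph contains a cycle.
    return list(TopologicalSorter(graph).static_order())


order = deploy_order(depends_on)
print(order)
```

A cycle surfaces as an exception before any DDL runs, which is the point of resolving the order up front rather than discovering a forward reference halfway through a batch.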

Step-by-Step Procedure

1. Provision the deployment registry

The registry is the source of truth for “what is currently live.” A ReplacingMergeTree keyed on the view name keeps only the newest row per view, so re-recording a deployment is itself idempotent.

sql

CREATE TABLE IF NOT EXISTS analytics.mv_deployment_registry
(
    `view_name`   LowCardinality(String),
    `sql_hash`    FixedString(64),
    `status`      Enum8('failed' = 0, 'success' = 1),
    `deployed_at` DateTime64(3, 'UTC')
)
ENGINE = ReplacingMergeTree(deployed_at)
PARTITION BY tuple()
ORDER BY view_name;

Because ReplacingMergeTree collapses duplicates only during background merges, always read the registry with FINAL so a not-yet-merged older row cannot mask the current hash.

2. Normalize and hash each definition

Whitespace and case differences must not count as drift, or every reformat would trigger a needless redeploy. Collapse the SQL to a canonical form before hashing.

python

import hashlib
import logging
import time
import clickhouse_connect
from clickhouse_connect.driver.exceptions import DatabaseError, OperationalError

logging.basicConfig(level=logging.INFO, format="%(asctime)s [%(levelname)s] %(message)s")
log = logging.getLogger("mv_deploy")


def normalize(sql: str) -> str:
    return " ".join(sql.strip().split()).lower()


def checksum(sql: str) -> str:
    return hashlib.sha256(normalize(sql).encode()).hexdigest()

3. Open a client and read current live hashes

clickhouse-connect speaks HTTP on port 8123 and binds parameters server-side with the {name:Type} syntax — safer than string interpolation into DDL-adjacent queries.

python

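from typing import Optional

client = clickhouse_connect.get_client(
    host="clickhouse.internal", port=8123,
    username="mv_deployer", password="…",
    database="analytics",
    settings={"max_execution_time": 300},
)


# Optional[str] instead of "str | None": the union syntax needs Python 3.10+.
def live_hash(view_name: str) -> Optional[str]:
    """Return the hash of the last successful deployment, or None."""
    rows = client.query(
        "SELECT sql_hash FROM analytics.mv_deployment_registry FINAL "
        "WHERE view_name = {v:String} AND status = 1",
        parameters={"v": view_name},
    ).result_rows
    if not rows:
        return None
    value = rows[0][0]
    # FixedString may come back as bytes depending on the driver's read format;
    # decode so the comparison against a hexdigest str is like-for-like.
    return value.decode() if isinstance(value, (bytes, bytearray)) else value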

4. Validate before you mutate (dry-run)

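You cannot EXPLAIN a CREATE MATERIALIZED VIEW statement, but you can validate the projection that actually does the work by running EXPLAIN SYNTAX over the view's inner SELECT. Catching a bad column reference here means a broken definition never reaches the point where it drops a working view. How deeply EXPLAIN SYNTAX analyzes the query depends on the server version and analyzer settings, so confirm on your own server that it rejects an unknown column; if it does not, a plain EXPLAIN, which has to build the query plan, is a stricter substitute.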

python

def dry_run(view_name: str, select_body: str) -> bool:
    try:
        client.command(f"EXPLAIN SYNTAX {select_body}")
        log.info("[dry-run] %s: projection valid", view_name)
        return True
    except DatabaseError as e:
        log.error("[dry-run] %s: %s", view_name, e)
        return False

Expected output on a healthy definition:

text

2026-07-04 09:12:04 [INFO] [dry-run] mv_events_daily_agg: projection valid
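The dry-run takes the view's inner SELECT as a separate string. One way to obtain it from the stored DDL is sketched below. It assumes the first AS that is directly followed by SELECT or WITH marks the start of the projection; that holds for the one-statement-per-file layout used here but is a heuristic, not a SQL parser. The helper name and the table names in the example are hypothetical.

```python
import re

# First "AS" that is immediately followed by SELECT or WITH (case-insensitive).
_AS_SELECT = re.compile(r"\bAS\s+(?=(?:SELECT|WITH)\b)", re.IGNORECASE)


def split_select_body(ddl: str) -> str:
    """Return the SELECT portion of a CREATE MATERIALIZED VIEW ... AS SELECT statement."""
    match = _AS_SELECT.search(ddl)
    if match is None:
        raise ValueError("no AS SELECT / AS WITH clause found in definition")
    # Drop surrounding whitespace and a trailing semicolon, if present.
    return ddl[match.end():].strip().rstrip(";").strip()


# Hypothetical definition used only to exercise the helper.
ddl = (
    "CREATE MATERIALIZED VIEW analytics.mv_events_daily_agg\n"
    "TO analytics.events_daily AS\n"
    "SELECT toDate(ts) AS day, count() AS c FROM analytics.events GROUP BY day;"
)
select_body = split_select_body(ddl)
print(select_body)
```

Column aliases such as AS day are skipped because the lookahead requires SELECT or WITH; a definition that aliases something to a name beginning with those keywords would need a real parser.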

5. Apply with the convergence gate and bounded retry

The apply path: skip if the hash already matches, otherwise DROP VIEW ... SYNC (SYNC makes the drop wait for queries still using the view and removes it immediately rather than deferring cleanup), then CREATE with exponential backoff so a transient TOO_MANY_PARTS or network blip does not abort the whole batch.

python

from datetime import datetime, timezone


def execute_with_retry(stmt: str, tries: int = 3) -> None:
    """Run one statement, retrying with exponential backoff (1s, 2s, ...)."""
    for attempt in range(tries):
        try:
            client.command(stmt)
            return
        # DatabaseError also covers non-transient failures such as syntax
        # errors; those are retried too and re-raised after the last attempt.
        except (OperationalError, DatabaseError) as e:
            if attempt == tries - 1:
                raise
            backoff = 2 ** attempt
            log.warning("transient error (attempt %d): %s; retry in %ds", attempt + 1, e, backoff)
            time.sleep(backoff)


def record(view_name: str, sql_hash: str, status: int) -> None:
    """Append one registry row; deployed_at is passed as a timezone-aware datetime."""
    client.insert(
        "analytics.mv_deployment_registry",
        [[view_name, sql_hash, status, datetime.now(timezone.utc)]],
        column_names=["view_name", "sql_hash", "status", "deployed_at"],
    )


def deploy(view_name: str, ddl: str, select_body: str, dry: bool = False) -> str:
    target = checksum(ddl)
    if live_hash(view_name) == target:
        log.info("%s: already converged, skipping", view_name)
        return "skipped"
    # Validate before anything is dropped, so a broken definition never
    # replaces a working view.
    if not dry_run(view_name, select_body):
        return "invalid"
    if dry:
        return "valid"

    try:
        execute_with_retry(f"DROP VIEW IF EXISTS {view_name} SYNC")
        execute_with_retry(ddl)
        record(view_name, target, 1)
        log.info("%s: deployed", view_name)
        return "success"
    except Exception as e:  # noqa: BLE001 (record the failure, keep the batch moving)
        record(view_name, target, 0)
        log.error("%s: deployment failed: %s", view_name, e)
        return "failed"

Driving the batch in dependency-resolved order (see Deployment Convergence Flow above), a first run reports every view as deployed and an immediate second run reports every view as skipped, which is the signature of a genuinely idempotent deploy:

text

2026-07-04 09:12:05 [INFO] mv_events_daily_agg: deployed
2026-07-04 09:12:05 [INFO] mv_sessions_rollup: deployed
# second invocation, no source change:
2026-07-04 09:14:20 [INFO] mv_events_daily_agg: already converged, skipping
2026-07-04 09:14:20 [INFO] mv_sessions_rollup: already converged, skipping
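A batch driver for this can be very small. The sketch below takes the deploy function as a parameter so it can be exercised without a server; run_batch and fake_deploy are illustrative names, and fake_deploy is only a stand-in that mimics the skip-on-second-run behavior of the real deploy().

```python
from collections import Counter


def run_batch(order, definitions, deploy_fn, dry=False):
    """Call deploy_fn for each view in order and tally the returned statuses."""
    tally = Counter()
    for name in order:
        ddl, select_body = definitions[name]
        tally[deploy_fn(name, ddl, select_body, dry)] += 1
    return tally


# Stand-in for deploy(): remembers what it has "deployed" so a repeat call skips.
_live = set()


def fake_deploy(name, ddl, select_body, dry=False):
    if name in _live:
        return "skipped"
    _live.add(name)
    return "success"


# Placeholder definitions: (ddl, select_body) per view, in dependency order.
definitions = {
    "mv_events_daily_agg": ("CREATE ...", "SELECT ..."),
    "mv_sessions_rollup": ("CREATE ...", "SELECT ..."),
}
order = ["mv_events_daily_agg", "mv_sessions_rollup"]

first = run_batch(order, definitions, fake_deploy)
second = run_batch(order, definitions, fake_deploy)
print(dict(first), dict(second))
```

In CI, a non-zero count under "failed" or "invalid" in the returned tally is a natural signal to exit with a failing status.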

Verification

Confirm the live cluster state matches what the registry claims. Start with the registry itself: read with FINAL, every managed view should appear exactly once, with status success and a recent deployed_at:

sql

SELECT view_name, sql_hash, status, deployed_at
FROM analytics.mv_deployment_registry FINAL
ORDER BY deployed_at DESC;

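Then check the server's own record of each view in system.tables. This is the drift check a reconciliation job runs on a schedule: the create_table_query column holds the DDL as ClickHouse stores it. The server reformats statements, so do not expect a hash of that column to equal the checksum of your source file. Compare the set of view names against the registry, and inspect the stored SELECT when a view's behavior is in doubt.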

sql

SELECT name, engine, create_table_query
FROM system.tables
WHERE database = 'analytics'
  AND engine = 'MaterializedView'
ORDER BY name;
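The name-level comparison a scheduled reconciliation job performs can be sketched as a pure function. It assumes you have already collected the view names from the registry query and from the system.tables query above; the function name and the sample view mv_adhoc_debug are hypothetical.

```python
from typing import Dict, Iterable, List


def reconcile(registry_views: Iterable[str], server_views: Iterable[str]) -> Dict[str, List[str]]:
    """Report views the registry claims but the server lacks, and the reverse."""
    registry, server = set(registry_views), set(server_views)
    return {
        "missing_on_server": sorted(registry - server),
        "unmanaged_on_server": sorted(server - registry),
    }


# registry_views: names with a success row in mv_deployment_registry.
# server_views: names from system.tables where engine = 'MaterializedView'.
report = reconcile(
    registry_views=["mv_events_daily_agg", "mv_sessions_rollup"],
    server_views=["mv_events_daily_agg", "mv_adhoc_debug"],
)
print(report)
```

Both lists being empty is the converged state; anything in either list is drift worth alerting on.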

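Finally, confirm no orphaned inner tables were left behind by an older implicit-engine deployment. A clean estate has no unmanaged inner tables shadowing your explicit targets. Implicit inner tables are named .inner.<view> in Ordinary databases and .inner_id.<uuid> in Atomic databases, so match on the shared .inner prefix:

sql

SELECT database, name
FROM system.tables
WHERE name LIKE '.inner%'
  AND database = 'analytics';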

An empty result set here means every view writes into a target you own and version — the precondition for evolving schemas without a full rebuild.

Gotchas & Edge Cases

DROP VIEW without SYNC can overlap with in-flight inserts. In an Atomic database, a drop without SYNC returns immediately and defers the actual removal, so inserts that started before the drop may still be writing through the old definition. If your CREATE lands in that window, rows from both definitions can reach the target. Always drop with SYNC in a redeploy path.
A redeploy does not backfill. Replacing a materialized view only changes what happens to future inserts; rows written by the previous definition stay as they were. If the projection changed, reprocess history explicitly through an incremental refresh strategy rather than assuming the new view rewrote the past.
Normalization can hide semantically real changes. Lowercasing the whole statement means a case-sensitive string literal or a quoted identifier that differs only in case will hash identically and be skipped. If your definitions contain case-significant literals, exclude quoted regions from the lowercasing step.
Batch deploys amplify part pressure. Redeploying many views at once, each with a POPULATE-style backfill, can push a partition past parts_to_throw_insert and surface as TOO_MANY_PARTS mid-batch. Stagger heavy deploys and respect the ceilings covered in threshold tuning and performance limits; the retry loop above absorbs a brief spike but will not fix a sustained one.
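For the normalization gotcha, one possible approach is to lowercase and collapse whitespace only outside quoted regions. The sketch below is an assumption about how you might do that, not part of the procedure above: it treats single-quoted literals, backtick identifiers, and double-quoted identifiers as opaque, and it does not handle quotes that appear inside SQL comments.

```python
import hashlib
import re

_QUOTED = re.compile(
    r"'(?:[^'\\]|\\.)*'"    # single-quoted string literal
    r"|`(?:[^`\\]|\\.)*`"   # backtick-quoted identifier
    r'|"(?:[^"\\]|\\.)*"'   # double-quoted identifier
)


def _fold(segment: str) -> str:
    """Collapse whitespace runs and lowercase an unquoted segment."""
    return re.sub(r"\s+", " ", segment).lower()


def normalize_preserving_quotes(sql: str) -> str:
    out, pos = [], 0
    for m in _QUOTED.finditer(sql):
        out.append(_fold(sql[pos:m.start()]))  # text before the quoted region
        out.append(m.group(0))                 # quoted region kept verbatim
        pos = m.end()
    out.append(_fold(sql[pos:]))
    return "".join(out).strip()


def checksum_preserving_quotes(sql: str) -> str:
    return hashlib.sha256(normalize_preserving_quotes(sql).encode()).hexdigest()


a = "SELECT  x FROM t WHERE kind = 'Click'"
b = "select x\nfrom t where kind = 'Click'"
c = "select x from t where kind = 'click'"
print(normalize_preserving_quotes(a))
```

Here a and b differ only in formatting and hash identically, while c changes the literal's case and produces a different hash, so it would no longer be skipped by the convergence gate.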

Related

Materialized View Creation Patterns: the decoupled TO-clause structure this deployment procedure assumes.
Mapping Cross-Table Dependencies for View Sync: how to derive the topological deploy order the batch runner needs.
Threshold Tuning & Performance Limits: the part-count and pool ceilings a batch deploy must stay under.
Incremental Refresh Strategies: how to reprocess history after a view definition changes.
MergeTree Engine Deep Dive: the target-table mechanics that make deployed views cheap to query and merge.
