Security & Access Control Boundaries

Without explicit access-control boundaries, a ClickHouse analytics pipeline fails in ways that are silent until they are catastrophic: a service account that holds ALL PRIVILEGES lets a routine ETL job rewrite a production reporting table, a materialized view created by an admin quietly exposes a masked column to every downstream reader, and an unbounded quota lets one runaway query saturate the merge threads that every other tenant depends on. These are not administrative afterthoughts — they are architectural constraints that shape query execution paths, memory allocation, and data-lineage integrity. Ownership is shared: the DevOps and analytics-platform teams own the role hierarchy, quotas, and network perimeter, while the Python ETL developers own credential lifecycle and connection hardening on the client side. This page defines the privilege model, the copy-ready DDL that backs it, a step-by-step rollout with verification at each stage, the tuning thresholds that keep automated workloads inside a predictable envelope, and the named failure modes you will actually hit.

The boundaries here build directly on the query-execution mechanics established in ClickHouse Core Architecture & Analytics Fundamentals. The engine checks privileges before a query starts executing, so every boundary described below is enforced before a single granule is read. That makes early-stage scoping the cheapest place to stop privilege escalation and resource contention.

Privilege Boundary Topology

ClickHouse enforces security through a declarative, SQL-driven privilege model that maps directly to database objects, query patterns, and resource quotas. The safest way to reason about it is to treat each pipeline stage — raw ingestion, staging, transformation, and consumption — as an isolated security domain with exactly one role that can act inside it. Roles never overlap, so a compromised or buggy ingestion job can touch raw tables and nothing else. The topology below shows how each role is confined to a single stage while data still flows forward through the pipeline.

The critical property is that a role’s reach is defined by object grants, not by network location or trust: even a client on the internal VPC with valid credentials can only execute what its role permits. That makes the grant graph the single source of truth for what any workload can do.
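
One way to keep the grant graph reviewable is to declare it as data and render the GRANT statements from it. The sketch below is a minimal illustration under that assumption: `STAGE_GRANTS`, `assert_no_overlap`, and `render_grants` are hypothetical names, not part of ClickHouse, and the role and database names mirror the DDL used on this page.

```python
# Hypothetical declaration of the grant graph: one role per stage,
# one database per role. Names mirror the DDL on this page.
STAGE_GRANTS = {
    "etl_ingest_role": {
        "database": "raw_events",
        "privileges": ["INSERT", "SELECT"],
    },
    "mv_automation_role": {
        "database": "staging",
        "privileges": ["SELECT", "CREATE VIEW", "CREATE TABLE", "ALTER", "INSERT"],
    },
    "analytics_read_role": {
        "database": "analytics",
        "privileges": ["SELECT"],
    },
}


def assert_no_overlap(grants):
    """Raise ValueError if two roles are scoped to the same database."""
    owner_by_database = {}
    for role, spec in grants.items():
        database = spec["database"]
        if database in owner_by_database:
            raise ValueError(
                f"{role} and {owner_by_database[database]} both reach {database}"
            )
        owner_by_database[database] = role


def render_grants(grants):
    """Render one GRANT statement per role, in role-name order."""
    assert_no_overlap(grants)
    return [
        f"GRANT {', '.join(spec['privileges'])} ON {spec['database']}.* TO {role};"
        for role, spec in sorted(grants.items())
    ]


if __name__ == "__main__":
    for statement in render_grants(STAGE_GRANTS):
        print(statement)
```

Because the statements are generated from one dictionary, a change to any role's reach shows up as a reviewable diff rather than an ad hoc GRANT run by hand.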

Core Privilege & Quota Reference

Ground the hierarchy in real objects. A tiered pipeline typically has a raw landing table, a staging layer that materialized views write into, and a curated analytics schema that reporting reads from. The DDL below is copy-ready ClickHouse SQL; each grant is scoped to the narrowest verb set a stage needs.

sql

-- Raw landing table the ingestion role writes into.
CREATE TABLE IF NOT EXISTS raw_events.clickstream
(
    event_id      UUID,
    event_time    DateTime64(3),
    event_type    LowCardinality(String),
    user_id       UInt64,
    region        LowCardinality(String),
    payload       String
)
ENGINE = MergeTree
PARTITION BY toYYYYMMDD(event_time)
ORDER BY (event_type, event_time, user_id);

-- Create one role per pipeline stage.
CREATE ROLE IF NOT EXISTS etl_ingest_role;
CREATE ROLE IF NOT EXISTS mv_automation_role;
CREATE ROLE IF NOT EXISTS analytics_read_role;
CREATE ROLE IF NOT EXISTS platform_admin_role;

-- Object-level grants (least privilege — no ALL PRIVILEGES anywhere).
GRANT INSERT, SELECT              ON raw_events.*  TO etl_ingest_role;
GRANT SELECT, CREATE VIEW,
      CREATE TABLE, ALTER, INSERT ON staging.*     TO mv_automation_role;
GRANT SELECT                      ON analytics.*   TO analytics_read_role;
GRANT ACCESS MANAGEMENT, SELECT   ON *.*           TO platform_admin_role;

-- Bound automated workloads with an explicit quota so a runaway
-- query cannot starve merge threads shared by every tenant.
CREATE QUOTA IF NOT EXISTS etl_ingest_quota
KEYED BY user_name
FOR RANDOMIZED INTERVAL 1 HOUR
    MAX queries = 500,
    MAX errors = 50,
    MAX execution_time = 300,
    MAX result_rows = 10000000,
    MAX read_rows = 500000000
TO etl_ingest_role;

The MAX read_rows ceiling matters most when a job scans wide tables, because it correlates directly with I/O pressure and column-decompression overhead — the same forces analysed in columnar storage & compression. Setting it too low breaks legitimate backfills; too high defeats the quota’s purpose. Size it against a real p99 scan, not a guess.
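
Sizing against a real p99 scan can be sketched as follows. This assumes you have already collected per-hour `read_rows` totals for the role (for example by aggregating `system.query_log`; that collection step is not shown). The function names and the `headroom` factor of 1.5 are illustrative assumptions, not recommended values.

```python
def p99(values):
    """Nearest-rank 99th percentile of a non-empty list of numbers."""
    if not values:
        raise ValueError("need at least one sample")
    ordered = sorted(values)
    # 1-based nearest rank: ceil(0.99 * n), computed with integers
    # to avoid floating-point rounding at the boundary.
    rank = (99 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def suggest_read_rows_ceiling(hourly_read_rows, headroom=1.5):
    """Suggest a MAX read_rows value: observed p99 times a headroom factor.

    headroom is an assumed safety margin so legitimate backfills that
    run slightly above the usual p99 are not rejected.
    """
    return int(p99(hourly_read_rows) * headroom)


if __name__ == "__main__":
    samples = [120_000_000, 180_000_000, 90_000_000, 310_000_000]
    print(suggest_read_rows_ceiling(samples))
```

With only a handful of samples the nearest-rank p99 is simply the maximum, so the estimate becomes meaningful only once it covers enough hours to include a typical backfill.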

Step-by-Step Rollout

Roll the boundary out stage by stage, verifying after each grant so a misconfiguration surfaces immediately rather than during an incident.

1. Create the roles and confirm they exist

Apply the CREATE ROLE statements above, then confirm the four roles are registered before granting anything:

sql

SELECT name, storage FROM system.roles ORDER BY name;
-- Expect: analytics_read_role, etl_ingest_role, mv_automation_role, platform_admin_role

2. Grant object-level privileges and verify the grant graph

Apply the GRANT statements, then read the effective grants back to confirm no role reaches beyond its stage:

sql

SELECT role_name, access_type, database, table
FROM system.grants
WHERE role_name IN ('etl_ingest_role','mv_automation_role','analytics_read_role')
ORDER BY role_name, database;
-- Verify etl_ingest_role has NO rows for the analytics database.
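
The same check can be automated once the rows are fetched. The sketch below assumes the rows arrive as tuples in the column order of the query above (for instance from a client's result rows); `ALLOWED_DATABASE` and `find_boundary_violations` are hypothetical names introduced for illustration.

```python
# Hypothetical allowlist: the only database each stage role may reach.
ALLOWED_DATABASE = {
    "etl_ingest_role": "raw_events",
    "mv_automation_role": "staging",
    "analytics_read_role": "analytics",
}


def find_boundary_violations(grant_rows, allowed=ALLOWED_DATABASE):
    """Return grant rows that reach outside the role's own stage.

    Each row is (role_name, access_type, database, table), matching the
    column order of the system.grants query. A database of None is
    treated as a global (*.*) grant and is always flagged.
    """
    violations = []
    for role_name, access_type, database, table in grant_rows:
        if role_name not in allowed:
            continue  # not a stage role; out of scope for this check
        if database != allowed[role_name]:
            violations.append((role_name, access_type, database, table))
    return violations


if __name__ == "__main__":
    rows = [
        ("etl_ingest_role", "INSERT", "raw_events", None),
        ("etl_ingest_role", "SELECT", "analytics", "events"),
    ]
    for violation in find_boundary_violations(rows):
        print("out of bounds:", violation)
```

Running this in CI after every grant change turns the manual read-back into a gate: an empty result means no stage role reaches beyond its own database.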

3. Bind roles to service users as default roles

Each workload authenticates as a user whose only default role is its stage role, so privileges are active without an explicit SET ROLE:

sql

CREATE USER IF NOT EXISTS svc_etl
    IDENTIFIED WITH sha256_password BY '{{from_secret_manager}}'
    HOST IP '10.0.0.0/24'
    DEFAULT ROLE etl_ingest_role;

GRANT etl_ingest_role TO svc_etl;

Verify the binding and that the host restriction took effect:

sql

SHOW CREATE USER svc_etl;
-- Confirm DEFAULT ROLE etl_ingest_role and HOST IP '10.0.0.0/24' are present.

4. Apply row-level filtering for regulated data

Where a stage must read a table but not every row, chain a row policy so the restriction is enforced in the engine rather than in application code:

sql

-- Analysts see only their own region. Policies are permissive by default,
-- so this one is OR-combined with any other permissive policy on the table;
-- only policies created AS RESTRICTIVE are AND-combined.
CREATE ROW POLICY IF NOT EXISTS analyst_region_policy ON analytics.events
FOR SELECT
USING region = getSetting('SQL_analyst_region')
TO analytics_read_role;

Verify the policy is active and attached to the right role:

sql

SELECT short_name, database, table, select_filter
FROM system.row_policies
WHERE table = 'events';
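
How several policies on one table combine is easy to get wrong, so it helps to model the rule before adding a second policy. The sketch below encodes the combination rule as ClickHouse documents it: a row is visible when at least one permissive policy matches and every restrictive policy matches. The function and the lambda stand-ins are hypothetical, and the sketch assumes at least one permissive policy applies to the reader.

```python
def row_is_visible(row, permissive, restrictive=()):
    """Any permissive policy matches AND all restrictive policies match."""
    return any(policy(row) for policy in permissive) and all(
        policy(row) for policy in restrictive
    )


# Stand-ins for policy filters; each takes a row dict and returns a bool.
in_eu_region = lambda row: row["region"] == "eu"   # like analyst_region_policy
allow_all = lambda row: True                       # an unscoped second policy

us_row = {"region": "us"}

# One permissive policy: the non-matching row is hidden.
print(row_is_visible(us_row, [in_eu_region]))

# A second permissive policy is OR-combined, so access widens.
print(row_is_visible(us_row, [in_eu_region, allow_all]))

# Making the region check restrictive narrows access again.
print(row_is_visible(us_row, [allow_all], restrictive=[in_eu_region]))
```

The second call is the case to watch for: adding a broad permissive policy for one purpose silently reopens rows that the region policy was meant to hide.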

5. Freeze the schema surface for automated writers

A materialized view automation role needs to create and write views, but it must not be able to silently drift the staging schema. Revoke the column-altering verbs after the initial grant:

sql

REVOKE ALTER ADD COLUMN, ALTER DROP COLUMN, ALTER MODIFY COLUMN
    ON staging.events FROM mv_automation_role;

Confirm the revoke landed:

sql

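SELECT access_type, is_partial_revoke FROM system.grants
WHERE role_name = 'mv_automation_role' AND table = 'events'
  AND access_type LIKE 'ALTER%';
-- The revoke is recorded as a partial revoke against the staging.* grant,
-- so the three ALTER ... COLUMN types should appear with is_partial_revoke = 1.
-- Any of them showing is_partial_revoke = 0 means the role can still alter columns.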

Materialized View Execution Context

Materialized views are the sharpest edge in the whole boundary because of a single, easily missed rule: an MV runs with the privileges of the user who created it, not the user whose INSERT triggers it. An admin who creates a view over a masked source table effectively hands every downstream reader the admin’s view of that data. This is why view creation must be decoupled from routine ingestion and pinned to the scoped mv_automation_role, whose grants are confined to the staging database in the DDL above; any source it reads outside staging needs its own explicit SELECT grant.

The creation and privilege patterns that keep this contract intact are covered end-to-end in materialized view management & sync automation; pin any view whose refresh cost is sensitive to source cardinality against the sizing guidance in threshold tuning & performance limits, because an MV that inherits both broad privileges and an unbounded quota is the classic cause of a refresh cycle triggering an OOM kill on concurrent readers.

Python ETL Integration & Credential Lifecycle

Python ETL developers integrate with ClickHouse without ever embedding a static credential in an orchestration script or CI job. Credentials arrive through environment-driven secret injection and are rotated on a fixed cycle; the client enforces TLS verification and disables introspection functions that could leak metadata. Use clickhouse-connect — the maintained client — not the legacy driver:

python

import os
import clickhouse_connect

# Connection hardened at construction time: TLS verified, introspection
# off, and per-session limits that mirror the server-side quota.
client = clickhouse_connect.get_client(
    host=os.environ["CLICKHOUSE_HOST"],
    port=int(os.environ["CLICKHOUSE_PORT"]),
    username=os.environ["CLICKHOUSE_USER"],
    password=os.environ["CLICKHOUSE_PASSWORD"],
    secure=True,
    verify=True,
    settings={
        "allow_introspection_functions": 0,
        "max_execution_time": 120,
        "max_memory_usage": 4_000_000_000,
    },
)

For rotation, generate service-account passwords with cryptographically secure randomness — Python’s secrets module, not random or a predictable hash — and store them in HashiCorp Vault or AWS Secrets Manager. Rotate ClickHouse service accounts on a 30–90 day cycle, issuing the new credential and updating IDENTIFIED WITH before revoking the old one so active pipeline connections drain without an ingestion gap. This hardened client is the same one that drives batch insert optimization and enforces schema validation & evolution on the way in — the security settings above compose with, rather than replace, those ingestion controls.
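
A minimal sketch of the generation step, assuming the rotation job sends ClickHouse a SHA-256 hash rather than the plaintext. `new_service_password`, `sha256_hex`, and `rotation_statement` are hypothetical helper names; writing the plaintext to the secret store and sequencing the change against the drain window are left out.

```python
import hashlib
import secrets


def new_service_password(nbytes=32):
    """Generate a URL-safe password from a cryptographically secure source."""
    return secrets.token_urlsafe(nbytes)


def sha256_hex(password):
    """Hex-encoded SHA-256 digest, the form expected by sha256_hash."""
    return hashlib.sha256(password.encode("utf-8")).hexdigest()


def rotation_statement(user, password):
    """Build an ALTER USER statement that carries only the hash.

    Assumes `user` is a trusted, pre-validated identifier; it is
    interpolated as-is. The digest is hex, so it needs no escaping.
    """
    return f"ALTER USER {user} IDENTIFIED WITH sha256_hash BY '{sha256_hex(password)}';"


if __name__ == "__main__":
    password = new_service_password()
    # Store `password` in the secret manager here (not shown), then apply:
    print(rotation_statement("svc_etl", password))
```

Sending the hash keeps the plaintext out of the statement text, which matters because statements can end up in query logs and shell history.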

Network Perimeter & Transport Hardening

Access boundaries extend past SQL privileges into the transport layer. ClickHouse must never be exposed to a public endpoint; deployments rely on VPC segmentation, private subnets, and security-group rules that admit only authorized orchestration nodes, BI gateways, and ETL runners. Bind listeners to internal interfaces and require TLS for every client:

xml

<clickhouse>
    <listen_host>::1</listen_host>
    <listen_host>10.0.0.1</listen_host>
    <tcp_port_secure>9440</tcp_port_secure>
    <https_port>8443</https_port>
    <openSSL>
        <server>
            <certificateFile>/etc/clickhouse-server/certs/server.crt</certificateFile>
            <privateKeyFile>/etc/clickhouse-server/certs/server.key</privateKeyFile>
            <verificationMode>strict</verificationMode>
            <loadDefaultCAFile>true</loadDefaultCAFile>
        </server>
    </openSSL>
</clickhouse>

Validate segmentation continuously: automated checks must confirm that only whitelisted CIDR blocks can reach tcp_port_secure and that the plaintext http_port (8123) is disabled or fronted by an authenticated reverse proxy. The full security-group and firewall rule set is worked through in configuring ClickHouse network security groups, which pairs the client host restriction from Step 3 with the ingress rules that back it.
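
A continuous check of this kind can be sketched as a pure function over exported firewall rules. The rule format (`(source_cidr, port)` pairs), the allowlist value, and the function name are assumptions for illustration; exporting the rules from a cloud provider or firewall is out of scope here.

```python
import ipaddress

ALLOWED_CIDRS = ("10.0.0.0/24",)   # assumed allowlist, taken from the Step 3 host restriction
SECURE_PORTS = {9440, 8443}        # tcp_port_secure and https_port from the config above
PLAINTEXT_PORTS = {8123, 9000}     # default HTTP and native ports without TLS


def audit_ingress(rules, allowed_cidrs=ALLOWED_CIDRS):
    """Flag ingress rules that expose ClickHouse ports too widely.

    rules is an iterable of (source_cidr, port) pairs. Any rule on a
    plaintext port is flagged; a rule on a secure port is flagged when
    its source range is not fully inside an allowed CIDR block.
    """
    allowed = [ipaddress.ip_network(cidr) for cidr in allowed_cidrs]
    findings = []
    for source_cidr, port in rules:
        source = ipaddress.ip_network(source_cidr)
        if port in PLAINTEXT_PORTS:
            findings.append((source_cidr, port, "plaintext port reachable"))
        elif port in SECURE_PORTS:
            inside = any(
                source.version == net.version and source.subnet_of(net)
                for net in allowed
            )
            if not inside:
                findings.append((source_cidr, port, "source outside allowlist"))
    return findings


if __name__ == "__main__":
    exported_rules = [("10.0.0.0/28", 9440), ("0.0.0.0/0", 9440)]
    for finding in audit_ingress(exported_rules):
        print(finding)
```

This sketch flags every plaintext rule, including ones that front an authenticated reverse proxy; a real check would need an explicit exception list for those.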

Audit Trails & Compliance Alignment

Boundaries are only trustworthy when paired with observability. The system.query_log, system.query_thread_log, and system.session_log tables expose privilege utilization, failed authorizations, and resource consumption; route them to a centralized SIEM or data lake for retention and anomaly detection. To align with SOC 2, HIPAA, or GDPR controls, restrict system.* reads to administrators and keep audit logging on:

sql

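-- Only platform admins may read system tables.
-- ClickHouse has no PUBLIC pseudo-role; default visibility of system.* is
-- governed by select_from_system_db_requires_grant (see Tuning Parameters).
-- Strip any explicit system.* reads from the stage roles, then grant admins.
REVOKE SELECT ON system.* FROM etl_ingest_role, mv_automation_role, analytics_read_role;
GRANT  SELECT ON system.* TO platform_admin_role;

-- Log every query, including ones rejected before they start, for the audit trail.
-- SET applies to the current session only; put these settings in a settings
-- profile attached to the service accounts to cover every workload.
SET log_queries = 1;
SET log_queries_min_type = 'QUERY_START';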

Audit role assignments on a schedule by comparing declared stage permissions against observed query patterns, then revoke stale grants and archive unused roles. Treating the grant graph as a code-managed artifact — reviewed, diffed, and version-controlled — is what keeps the boundary honest as the pipeline grows.
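
The comparison itself reduces to a set difference once both sides share a shape. The sketch below assumes grants and observed usage are both expressed as `(role, access_type, database)` tuples; how observed usage is derived from the query log is left to the audit job, and `diff_grants` is a hypothetical name.

```python
def diff_grants(declared, observed):
    """Compare declared grants with grants actually exercised.

    Both arguments are iterables of (role, access_type, database)
    string tuples. Returns grants that were declared but never used
    (candidates for revocation) and usage that no declaration covers
    (drift that needs review).
    """
    declared_set, observed_set = set(declared), set(observed)
    return {
        "stale": sorted(declared_set - observed_set),
        "undeclared": sorted(observed_set - declared_set),
    }


if __name__ == "__main__":
    declared = [
        ("etl_ingest_role", "INSERT", "raw_events"),
        ("etl_ingest_role", "SELECT", "raw_events"),
    ]
    observed = [("etl_ingest_role", "INSERT", "raw_events")]
    print(diff_grants(declared, observed))
```

A stale entry is a prompt for review rather than an automatic revoke: a grant used only by a quarterly backfill will look unused in any shorter observation window.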

Tuning Parameters

Setting	Default	Recommended (production)	Effect
`read_rows` (quota limit, set with `MAX read_rows`)	unbounded	`5×10^8` per hour	Caps rows a role may scan; the primary guard against a runaway job saturating I/O.
`max_execution_time`	`0` (unbounded)	`120–300` seconds	Kills long queries before they hold merge threads; short for ETL, longer for backfills.
`max_memory_usage`	`10 GB`	`4 GB` per ETL session	Per-query memory ceiling; prevents an MV refresh from OOM-killing concurrent readers.
`allow_introspection_functions`	`0`	`0`	Keep off for service accounts so `addressToLine`-style calls cannot leak host metadata.
`max_concurrent_queries_for_user`	`0` (unbounded)	`20–50`	Bounds a single service account’s fan-out so one client cannot exhaust the query pool.
`access_control_improvements/select_from_system_db_requires_grant`	`false`	`true`	Forces an explicit grant to read `system.*`, closing the default metadata-exposure gap.
`distributed_ddl/pool_size`	`1`	`1`	Serialize cluster-wide GRANT/REVOKE so access changes apply deterministically on every node.

Troubleshooting

Privilege escalation through a materialized view. Symptom: analytics_read_role users can read a column that a row policy or column grant should have hidden. Cause: the MV was created by an over-privileged user and inherits their access. Diagnose with SELECT create_table_query FROM system.tables WHERE name = '<mv_name>'; and, where the release records it, read the DEFINER clause in that statement to see whose privileges the view runs with, then inspect that user with SHOW GRANTS FOR <user>. Fix: drop and recreate the view under mv_automation_role, then confirm with SELECT * FROM system.grants WHERE role_name = 'mv_automation_role'; that the role holds nothing beyond its staging-scoped grants and SELECT on the source.

Quota silently ignored. Symptom: an ETL job scans far past its read_rows ceiling with no error. Cause: the quota is defined but not attached, or the role is not the user’s default. Diagnose with SELECT * FROM system.quotas_usage WHERE quota_name = 'etl_ingest_quota'; (system.quota_usage, singular, reports only the current user). If no row appears for svc_etl, or its read_rows counter never moves, the quota is not binding. Fix: attach it with ALTER QUOTA etl_ingest_quota TO etl_ingest_role, because re-running CREATE QUOTA IF NOT EXISTS is a no-op once the quota exists, and confirm the user’s default role with SHOW CREATE USER svc_etl.

Service account locked out after rotation. Symptom: pipeline connections fail with Authentication failed right after a rotation window. Cause: the old credential was revoked before in-flight connections drained. Diagnose with SELECT * FROM system.session_log WHERE user = 'svc_etl' AND type = 'LoginFailure' ORDER BY event_time DESC LIMIT 20;. Fix: issue the new IDENTIFIED WITH first, let existing sessions expire, then revoke — never revoke-then-issue.

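Unexpected system.* exposure. Symptom: a non-admin user reads system.query_log and sees other tenants’ query text. Cause: ClickHouse has no PUBLIC pseudo-role to revoke from; while select_from_system_db_requires_grant is left at false, any user can read system tables without a grant. Diagnose by checking that setting under access_control_improvements in the server config, and list explicit grants with SELECT * FROM system.grants WHERE access_type = 'SELECT' AND database = 'system';. Fix: set select_from_system_db_requires_grant = true, then revoke any explicit system.* SELECT held by non-admin roles.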

Row policy leaks after adding a second policy. Symptom: attaching a new permissive row policy widens rather than narrows access. Cause: ClickHouse OR-combines permissive policies, so an unscoped one overrides a strict one. Diagnose with SELECT short_name, is_restrictive, select_filter FROM system.row_policies WHERE table = 'events';. Fix: mark the guard policy AS RESTRICTIVE so it is AND-combined, ensuring every reader must satisfy it.

Configuring ClickHouse network security groups — the ingress rules and CIDR whitelists that back the transport hardening above
MergeTree engine deep dive — the storage engine whose scan cost your read_rows quota is sized against
Columnar storage & compression — decompression overhead that drives memory limits for scoped roles
Materialized view management & sync automation — the creation patterns that keep MV privilege inheritance safe
Batch insert optimization — the ingestion path the hardened clickhouse-connect client feeds
