
Antalya 26.1; Remote initiator improvements 2 #1608

Open
ianton-ru wants to merge 3 commits into antalya-26.1 from feature/antalya-26.1/remote_initiator_improvements_2


@ianton-ru

@ianton-ru ianton-ru commented Apr 1, 2026

Changelog category (leave one):

  • Improvement

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

Allow object_storage_remote_initiator to be used without object_storage_cluster being set on the initial node.

Documentation entry for user-facing changes

Solved #1607
A query with object_storage_remote_initiator and object_storage_remote_initiator_cluster can now be executed when the object_storage_cluster setting is defined only on the remote node.

SELECT * FROM datalake.`iceberg.table` SETTINGS object_storage_remote_initiator=1, object_storage_remote_initiator_cluster='swarm_external_name'

is converted to

SELECT * FROM remote('random_node_from_swarm_external_name', iceberg(...))

The remote node random_node_from_swarm_external_name then executes

SELECT * FROM iceberg(...)

either as a simple table function or as a cluster table function, depending on the object_storage_cluster setting, which is defined only on that node.
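The flow above can be modelled as two small decision functions, one per node. This is a hypothetical sketch: `QuerySettings`, `decideOnInitiator`, `rewriteForRemote` and `decideOnRemote` are illustrative names, not ClickHouse APIs; only the three setting names come from this PR, and the random host choice is replaced by an explicit index so the result is deterministic.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, simplified view of the settings involved (names taken from the PR).
struct QuerySettings
{
    bool object_storage_remote_initiator = false;
    std::string object_storage_remote_initiator_cluster;
    std::string object_storage_cluster;
};

enum class InitiatorAction { ExecuteHere, ForwardToRemote };
enum class RemoteAction { SimpleTableFunction, ClusterTableFunction };

// Initial node: forwarding depends only on the remote-initiator settings;
// a local object_storage_cluster is not required.
InitiatorAction decideOnInitiator(const QuerySettings & s)
{
    if (s.object_storage_remote_initiator && !s.object_storage_remote_initiator_cluster.empty())
        return InitiatorAction::ForwardToRemote;
    return InitiatorAction::ExecuteHere;
}

// Builds the rewritten query text; host_index stands in for the random host choice.
std::string rewriteForRemote(const std::vector<std::string> & hosts, std::size_t host_index,
                             const std::string & table_function)
{
    assert(!hosts.empty());
    const std::string & host = hosts[host_index % hosts.size()];
    return "SELECT * FROM remote('" + host + "', " + table_function + ")";
}

// Remote node: the forwarded table function runs in cluster mode only if this
// node defines object_storage_cluster and the storage supports clusters.
RemoteAction decideOnRemote(const QuerySettings & s, bool cluster_supported)
{
    if (cluster_supported && !s.object_storage_cluster.empty())
        return RemoteAction::ClusterTableFunction;
    return RemoteAction::SimpleTableFunction;
}
```

In this model the initial node never consults its own object_storage_cluster when deciding to forward; whether the forwarded iceberg(...) call runs in cluster mode is decided entirely by the remote node's settings.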

CI/CD Options

Exclude tests:

  • Fast test
  • Integration Tests
  • Stateless tests
  • Stateful tests
  • Performance tests
  • All with ASAN
  • All with TSAN
  • All with MSAN
  • All with UBSAN
  • All with Coverage
  • All with Aarch64
  • All Regression
  • Disable CI Cache

Regression jobs to run:

  • Fast suites (mostly <1h)
  • Aggregate Functions (2h)
  • Alter (1.5h)
  • Benchmark (30m)
  • ClickHouse Keeper (1h)
  • Iceberg (2h)
  • LDAP (1h)
  • Parquet (1.5h)
  • RBAC (1.5h)
  • SSL Server (1h)
  • S3 (2h)
  • S3 Export (2h)
  • Swarms (30m)
  • Tiered Storage (2h)

@ianton-ru
Author

@codex review

@ianton-ru ianton-ru changed the title from "Antalya 26.1; Remote initiator improvements 2" to "[WIP] Antalya 26.1; Remote initiator improvements 2" on Apr 1, 2026
@github-actions

github-actions bot commented Apr 1, 2026

Workflow [PR], commit [fb624cb]


@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 6a03617679

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

const auto & settings = context->getSettingsRef();

- is_cluster_function = !settings[Setting::object_storage_cluster].value.empty() && typename Base::Configuration().isClusterSupported();
+ is_cluster_function = typename Base::Configuration().isClusterSupported();

P1: Preserve simple-mode fallback when no cluster is configured

Assigning is_cluster_function from Base::Configuration().isClusterSupported() unconditionally forces s3(...)/iceberg(...) fallback functions into cluster mode even when object_storage_cluster is empty. In the CREATE TABLE ... AS <table_function> path, arguments are parsed before validateUseToCreateTable() is called, so this change now makes validateUseToCreateTable() throw for previously valid non-cluster statements like CREATE TABLE t AS s3(...). Please keep cluster mode gated by an actual configured cluster name (or a narrower remote-initiator condition) to avoid regressing non-cluster table-function creation.
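A reduced model of the regression described in this comment: with the unconditional assignment, any fallback function whose configuration supports clusters is parsed in cluster mode, so the CREATE TABLE ... AS s3(...) check starts rejecting it. The struct, its methods and the error text are hypothetical; `std::logic_error` stands in for ClickHouse's own exception type, and `validateUseToCreateTable()` is only mimicked.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for the fallback table function's parsed state.
struct FallbackFunction
{
    bool is_cluster_function = false;

    // Original behavior: cluster mode only when a cluster name is configured.
    void parseGated(const std::string & object_storage_cluster, bool cluster_supported)
    {
        is_cluster_function = !object_storage_cluster.empty() && cluster_supported;
    }

    // Behavior flagged in review: cluster mode whenever the configuration supports it.
    void parseUnconditional(bool cluster_supported)
    {
        is_cluster_function = cluster_supported;
    }

    // Mimics the restriction that cluster functions cannot back CREATE TABLE ... AS.
    void validateUseToCreateTable() const
    {
        if (is_cluster_function)
            throw std::logic_error("cluster table function cannot be used to create a table");
    }
};

// Returns true if CREATE TABLE t AS s3(...) would pass validation in this model.
bool createTableAllowed(const FallbackFunction & f)
{
    try
    {
        f.validateUseToCreateTable();
        return true;
    }
    catch (const std::logic_error &)
    {
        return false;
    }
}
```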


@ianton-ru
Author

@codex review


@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: ed463afe38


Comment on lines +118 to +119
is_cluster_function = (!settings[Setting::object_storage_cluster].value.empty() || !settings[Setting::object_storage_remote_initiator_cluster].value.empty())
&& typename Base::Configuration().isClusterSupported();

P2: Gate cluster fallback on the remote initiator being enabled

This condition now flips to cluster-mode parsing whenever object_storage_remote_initiator_cluster is set, even if object_storage_remote_initiator is disabled. That changes plain s3/iceberg/... table-function behavior globally for sessions that carry only the cluster-name setting (for example from a profile), and can trigger cluster-only restrictions such as validateUseToCreateTable() rejecting create-table usage unexpectedly.
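The two conditions can be compared on the case the reviewer is worried about: a session whose profile sets only object_storage_remote_initiator_cluster. Both functions are hypothetical helpers that return what is_cluster_function would become; the second is the narrower gate this comment suggests, not necessarily the condition the PR ended up using.

```cpp
#include <cassert>
#include <string>

// Hypothetical subset of session settings (names taken from the PR).
struct Settings
{
    std::string object_storage_cluster;
    bool object_storage_remote_initiator = false;
    std::string object_storage_remote_initiator_cluster;
};

// Condition in the reviewed commit: either cluster name switches to cluster mode.
bool clusterModeAsReviewed(const Settings & s, bool cluster_supported)
{
    return (!s.object_storage_cluster.empty() || !s.object_storage_remote_initiator_cluster.empty())
        && cluster_supported;
}

// Narrower gate suggested in review: the remote-initiator cluster name only
// counts when the remote initiator itself is enabled.
bool clusterModeGated(const Settings & s, bool cluster_supported)
{
    const bool remote_initiator_active = s.object_storage_remote_initiator
        && !s.object_storage_remote_initiator_cluster.empty();
    return (!s.object_storage_cluster.empty() || remote_initiator_active) && cluster_supported;
}
```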


@ianton-ru ianton-ru force-pushed the feature/antalya-26.1/remote_initiator_improvements_2 branch from ed463af to c60f71f on April 1, 2026 14:27
@ianton-ru
Author

@codex review

@chatgpt-codex-connector

Codex Review: Didn't find any major issues. What shall we delve into next?


@ianton-ru ianton-ru force-pushed the feature/antalya-26.1/remote_initiator_improvements_2 branch from c60f71f to b1efb77 on April 1, 2026 15:42
@ianton-ru
Author

@codex review


@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: b1efb77d2a


const auto & settings = context->getSettingsRef();
ASTPtr query_to_send = query_info.query;

if (!isClusterSupported() || cluster_name_from_settings.empty())

P2: Exclude non-cluster storages from remote-initiator rewrite

The new if (!isClusterSupported() || cluster_name_from_settings.empty()) branch now routes object_storage_remote_initiator=1 queries through updateQueryToSendIfNeeded(..., make_cluster_function=false) even when the storage explicitly reports isClusterSupported()==false. For datalake configs like StorageLocalPaimonConfiguration (used by PaimonLocal), StorageObjectStorageCluster::updateQueryForDistributedEngineIfNeeded cannot map the engine name (PaimonLocal) to a table function and throws a logical error before the remote call is built. Previously these queries fell back to pure local read; now they fail whenever remote initiator is enabled.
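A toy model of this failure: the rewrite has to translate the storage engine name into a table function name, and an engine without a counterpart (such as PaimonLocal) makes the lookup throw instead of falling back. The mapping contents, function names and error text are illustrative assumptions, not the real logic in StorageObjectStorageCluster; the guarded variant shows the behavior the reviewer asks for, where storages reporting isClusterSupported() == false skip the rewrite and the caller reads locally.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <stdexcept>
#include <string>

// Illustrative subset of an engine-name to table-function mapping (not the real list).
std::string tableFunctionForEngine(const std::string & engine)
{
    static const std::map<std::string, std::string> mapping = {
        {"S3", "s3"},
        {"Iceberg", "iceberg"},
    };
    const auto it = mapping.find(engine);
    if (it == mapping.end())
        throw std::logic_error("cannot map engine " + engine + " to a table function");
    return it->second;
}

// Guarded variant: storages without cluster support never reach the lookup;
// std::nullopt tells the caller to read locally instead.
std::optional<std::string> remoteTableFunctionIfSupported(const std::string & engine, bool cluster_supported)
{
    if (!cluster_supported)
        return std::nullopt;
    return tableFunctionForEngine(engine);
}
```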


@ianton-ru ianton-ru force-pushed the feature/antalya-26.1/remote_initiator_improvements_2 branch from b1efb77 to 5ea8d78 on April 1, 2026 16:32
@ianton-ru
Author

@codex review


@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 5ea8d78f3f


Comment on lines +462 to +463
std::vector<std::string> settings_to_restore = {"object_storage_remote_initiator", "object_storage_remote_initiator_cluster"};
new_context->resetSettingsToDefaultValue(settings_to_restore);

P2: Strip remote initiator cluster setting before forwarding

This path resets object_storage_remote_initiator_cluster only in new_context, but the forwarded SQL still keeps the query-level setting because only object_storage_remote_initiator is removed from ASTSetQuery. In the new flow where object_storage_remote_initiator_cluster is required when no local object_storage_cluster exists, every forwarded query can carry that setting to the remote host; mixed-version nodes (or profiles with setting constraints) can then fail on an otherwise valid query. Remove object_storage_remote_initiator_cluster from the forwarded query settings as well.
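The requested fix amounts to filtering the query-level SETTINGS with the same names that are already reset in new_context. In this sketch the SETTINGS clause is a plain vector of name/value pairs; `SettingChange`, `stripRemoteInitiatorSettings` and `renderSettings` are hypothetical helpers and do not reflect the ASTSetQuery interface.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical representation of one "name=value" entry of a SETTINGS clause.
using SettingChange = std::pair<std::string, std::string>;

// Same list the PR resets in the copied context.
const std::vector<std::string> settings_to_restore = {
    "object_storage_remote_initiator", "object_storage_remote_initiator_cluster"};

// Drops every remote-initiator setting from the forwarded SETTINGS clause,
// keeping the relative order of the remaining entries (erase-remove idiom).
void stripRemoteInitiatorSettings(std::vector<SettingChange> & changes)
{
    changes.erase(
        std::remove_if(changes.begin(), changes.end(),
            [](const SettingChange & change)
            {
                return std::find(settings_to_restore.begin(), settings_to_restore.end(), change.first)
                    != settings_to_restore.end();
            }),
        changes.end());
}

// Renders the clause; an empty list yields an empty string (no SETTINGS clause).
std::string renderSettings(const std::vector<SettingChange> & changes)
{
    std::string out;
    for (const auto & [name, value] : changes)
        out += (out.empty() ? " SETTINGS " : ", ") + name + "=" + value;
    return out;
}
```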


@ianton-ru ianton-ru force-pushed the feature/antalya-26.1/remote_initiator_improvements_2 branch from 5ea8d78 to fb624cb on April 1, 2026 18:42
@ianton-ru
Author

@codex review

@chatgpt-codex-connector

Codex Review: Didn't find any major issues. Keep it up!


@ianton-ru
Author

Audit: PR #1608 — Antalya 26.1: Remote initiator improvements 2

Source: Altinity/ClickHouse#1608
Base: antalya-26.1
Reviewed revision: branch including commit fb624cbad07 (Fix, current tip at audit time) and related feature commits (e.g. e1286a97214, merge 6a036176795 with antalya-26.1).

User-facing intent: Allow object_storage_remote_initiator with object_storage_remote_initiator_cluster when object_storage_cluster is defined only on the remote node (initiator runs s3/iceberg/… without a local cluster name; query is rewritten to remote(...)).


Confirmed defects

Medium — Remote initiator ignored when isClusterSupported() is false

  • Impact: For StorageObjectStorageCluster backed by a configuration with isClusterSupported() == false (e.g. DataLakeConfiguration with is_cluster_supported = false, such as local Paimon / local Delta Lake templates in DataLakeConfiguration.h), a query with object_storage_remote_initiator = 1 and a non-empty object_storage_remote_initiator_cluster no longer takes the convertToRemote path. The new early return calls readFallBackToPure on the initiator only, so execution is not delegated to the remote cluster. Previously, the remote-initiator branch lived inside if (!isClusterSupported() || cluster_name_from_settings.empty()), so it still ran when cluster support was disabled but the local cluster name was empty (which is always the case here because StorageObjectStorageCluster::getClusterName returns "" when !isClusterSupported()).
  • Anchor: IStorageCluster::read in src/Storages/IStorageCluster.cpp (the early if (!isClusterSupported()) { readFallBackToPure; return; } placed before the object_storage_remote_initiator block).
  • Trigger: object_storage_remote_initiator enabled, object_storage_remote_initiator_cluster set, storage configuration reports isClusterSupported() == false, local object_storage_cluster empty (typical for these configs).
  • Why defect: Control-flow regression versus the prior combined condition; remote-initiator settings are silently ignored instead of forwarding the query, which can change where the query runs (initiator vs chosen remote host).
  • Fix direction (short): Reorder or nest checks so object_storage_remote_initiator is evaluated for the empty-local-cluster case even when isClusterSupported() is false, or explicitly reject with BAD_ARGUMENTS if that combination is unsupported (fail closed with a clear error instead of silent local fallback).
  • Regression test direction (short): Integration test: table function / engine path with isClusterSupported() == false, remote initiator settings, assert query runs on remote (or document and assert explicit error if intentionally unsupported).
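The three orderings discussed in this finding can be compared for its trigger case, where the local object_storage_cluster is empty. This is a hypothetical sketch: the function names are invented, `remote_initiator_requested` stands for object_storage_remote_initiator = 1 together with a non-empty object_storage_remote_initiator_cluster, and `std::invalid_argument` stands in for a BAD_ARGUMENTS exception.

```cpp
#include <cassert>
#include <stdexcept>

enum class ReadOutcome { FallBackToPure, ConvertToRemote };

// Prior combined condition: with an empty local cluster, the remote-initiator
// branch was reached regardless of cluster support.
ReadOutcome readPriorOrder(bool /*cluster_supported*/, bool remote_initiator_requested)
{
    return remote_initiator_requested ? ReadOutcome::ConvertToRemote : ReadOutcome::FallBackToPure;
}

// Order at the audited tip: the early isClusterSupported() exit runs first,
// so remote-initiator settings are never examined for such storages.
ReadOutcome readAuditedOrder(bool cluster_supported, bool remote_initiator_requested)
{
    if (!cluster_supported)
        return ReadOutcome::FallBackToPure;
    return remote_initiator_requested ? ReadOutcome::ConvertToRemote : ReadOutcome::FallBackToPure;
}

// Fail-closed option from the audit: reject the unsupported combination
// with an explicit error instead of silently reading on the initiator.
ReadOutcome readFailClosed(bool cluster_supported, bool remote_initiator_requested)
{
    if (!cluster_supported && remote_initiator_requested)
        throw std::invalid_argument("object_storage_remote_initiator is not supported for this storage");
    return readAuditedOrder(cluster_supported, remote_initiator_requested);
}
```

Choosing between the silent fallback and the explicit error is a design decision rather than a correctness fix; the PR author considers the silent local fallback expected.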

Positive change in the same commit (not a defect)

  • convertToRemote: Query-level SETTINGS now remove both object_storage_remote_initiator and object_storage_remote_initiator_cluster when stripping the clause, and the copied context resets both via resetSettingsToDefaultValue. This avoids leaving object_storage_remote_initiator_cluster in the AST while the initiator flag was already cleared.

Previously identified issue (resolved, earlier in PR)

With an empty local object_storage_cluster, cluster_name_in_settings stayed false unless setClusterNameInSettings(true) ran for the remote-only Fallback path, which broke extractDynamicStorageType / named-collection handling for data-lake table functions.

  • Resolution: TableFunctionObjectStorageClusterFallback::executeImpl calls setClusterNameInSettings(true) for every StorageObjectStorageCluster produced via BaseCluster::executeImpl when is_cluster_function || is_remote.

Coverage summary

  • Scope reviewed: TableFunctionObjectStorageClusterFallback; IStorageCluster::read (new isClusterSupported early exit + empty-cluster + remote branches); IStorageCluster::convertToRemote (context + AST SETTINGS cleanup); StorageObjectStorageCluster::{getClusterName, readFallBackToPure, updateQueryToSendIfNeeded}; DataLakeConfiguration / is_cluster_supported; integration tests under tests/integration/test_s3_cluster/.
  • Categories failed: Remote-initiator path vs isClusterSupported() ordering (regression for isClusterSupported() == false).
  • Categories passed: Cluster-capable object storage + empty local cluster + remote initiator (unchanged vs prior split); setClusterNameInSettings for remote-only Fallback; stripping both remote-initiator-related settings from AST and context on convertToRemote; is_cluster_function still gated on non-empty object_storage_cluster + cluster support (CREATE TABLE / simple mode).
  • Assumptions / limits: Static review; integration tests not executed here. Blast radius of the Medium finding is narrow (configurations that override isClusterSupported() to false).

Notable code anchors

  • Early cluster-support exit: src/Storages/IStorageCluster.cpp, read(), lines ~319–323 (!isClusterSupported() → readFallBackToPure).
  • Remote initiator (cluster-capable, empty local cluster): same file — cluster_name_from_settings.empty() branch with object_storage_remote_initiator.
  • Fallback + cluster_name_in_settings: src/TableFunctions/TableFunctionObjectStorageClusterFallback.cpp, executeImpl.
  • getClusterName when cluster unsupported: src/Storages/ObjectStorage/StorageObjectStorageCluster.cpp — returns "" if !isClusterSupported().

Audit metadata

  • Method: Static review (call graph / control-flow diff vs previous antalya-26.1...HEAD, fault categories, spot-check for C++ / concurrency on touched paths).
  • Output rule: Confirmed defects only in the main list; improvements called out separately.

AI-generated audit note.

@ianton-ru
Author

Medium — Remote initiator ignored when isClusterSupported() is false

It is expected.

@ianton-ru ianton-ru changed the title from "[WIP] Antalya 26.1; Remote initiator improvements 2" to "Antalya 26.1; Remote initiator improvements 2" on Apr 1, 2026
