fix: use branch name + date + hash for CI schema names to prevent collisions #940
devin-ai-integration[bot] wants to merge 4 commits into master
Conversation
… collisions

The previous approach truncated the full ref string to 40 characters with `head -c 40`. For `databricks_catalog`, the warehouse prefix alone consumed 39 of those 40 characters, leaving only 1 character to differentiate branches. Any two branches starting with the same letter (e.g. `core-402-duckdb-support` and `core-397-clickhouse-support`) produced identical schema names, causing concurrent CI jobs to drop each other's schemas (`SCHEMA_NOT_FOUND` / `TABLE_OR_VIEW_NOT_FOUND`).

New approach: use `pr<number>_<8-char hash of concurrency group>` (or `run<id>_<hash>` for workflow_dispatch). This guarantees uniqueness across concurrent jobs while staying well within PostgreSQL's 63-char schema name limit (worst case: 46 chars).

Co-Authored-By: Itamar Hartstein <haritamar@gmail.com>
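An illustrative shell sketch of the suffix scheme this commit describes; the concurrency group value, PR number, and run id below are made-up examples, not the exact workflow code:

```shell
# Illustrative sketch of the pr<number>_<hash> suffix scheme.
# CONCURRENCY_GROUP, PR_NUMBER, and RUN_ID are example values.
CONCURRENCY_GROUP="tests_postgres_dbt_latest_official_core-402-duckdb-support"
PR_NUMBER="940"
RUN_ID="12345"

# 8-char SHA-256 prefix of the full concurrency group.
SHORT_HASH=$(printf '%s' "$CONCURRENCY_GROUP" | sha256sum | cut -c1-8)

if [ -n "$PR_NUMBER" ]; then
  SCHEMA_SUFFIX="pr${PR_NUMBER}_${SHORT_HASH}"
else
  # workflow_dispatch or other non-PR triggers fall back to the run id.
  SCHEMA_SUFFIX="run${RUN_ID}_${SHORT_HASH}"
fi
echo "$SCHEMA_SUFFIX"
```

Because the hash covers the whole concurrency group (warehouse type, dbt version, and branch), two concurrent jobs cannot collide even when their other components match.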
No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled
📒 Files selected for processing (1)
.github/workflows/test-warehouse.yml
📝 Walkthrough
Replaces substring-based schema naming with a collision-resistant scheme.

Changes
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
🚥 Pre-merge checks: ✅ Passed checks (3 passed)
🧹 Nitpick comments (1)
.github/workflows/test-warehouse.yml (1)
142-153: Optional: log the resolved schema suffix once for easier CI verification.

A single log line would simplify validating `workflow_call` propagation and diagnosing future schema issues.

💡 Minimal observability tweak
```diff
 if [ -n "$PR_NUMBER" ]; then
   SCHEMA_SUFFIX="pr${PR_NUMBER}_${SHORT_HASH}"
 else
   # workflow_dispatch or other non-PR triggers — use run id
   SCHEMA_SUFFIX="run${{ github.run_id }}_${SHORT_HASH}"
 fi
+echo "Using CI schema suffix: $SCHEMA_SUFFIX"
 python "${{ github.workspace }}/dbt-data-reliability/integration_tests/profiles/generate_profiles.py" \
   --template "${{ github.workspace }}/dbt-data-reliability/integration_tests/profiles/profiles.yml.j2" \
   --output ~/.dbt/profiles.yml \
   --schema-name "dbt_pkg_$SCHEMA_SUFFIX"
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.github/workflows/test-warehouse.yml around lines 142 - 153, Add a single echo log that prints the resolved SCHEMA_SUFFIX (constructed from PR_NUMBER/SHORT_HASH or run${{ github.run_id }}_SHORT_HASH) right after its assignment and before invoking the python generate_profiles.py step so CI shows the final schema name; reference the SCHEMA_SUFFIX variable and the PR_NUMBER/SHORT_HASH resolution logic and ensure the log runs in both PR and non-PR branches.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In @.github/workflows/test-warehouse.yml:
- Around line 142-153: Add a single echo log that prints the resolved
SCHEMA_SUFFIX (constructed from PR_NUMBER/SHORT_HASH or run${{ github.run_id
}}_SHORT_HASH) right after its assignment and before invoking the python
generate_profiles.py step so CI shows the final schema name; reference the
SCHEMA_SUFFIX variable and the PR_NUMBER/SHORT_HASH resolution logic and ensure
the log runs in both PR and non-PR branches.
ℹ️ Review info
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (1)
.github/workflows/test-warehouse.yml
…ee schemas

Per feedback: use `<truncated_branch>_<8-char hash of concurrency group>` instead of `pr<number>_<hash>`. This keeps schemas human-readable in the data platform so you can tell which branch a schema belongs to. The hash of the full concurrency group (warehouse type + dbt version + branch) guarantees uniqueness even when branch names are truncated.

Also adds a log line echoing the resolved schema suffix for CI debugging.

Co-Authored-By: Itamar Hartstein <haritamar@gmail.com>
Schema format is now: `dbt_pkg_<branch_24>_<YYMMDD>_<hash_8>`

Example: `dbt_pkg_core_402_duckdb_suppor_260227_a1b2c3d4`

The date stamp makes it easy to identify and clean up old schemas by age (e.g. DROP schemas older than 30 days).

Co-Authored-By: Itamar Hartstein <haritamar@gmail.com>
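A minimal shell sketch of how a suffix in this shape could be assembled; the variable names, the sanitization rule (`tr` to underscores), and the example branch are assumptions for illustration, not the exact workflow step:

```shell
# Illustrative sketch of the <branch_24>_<YYMMDD>_<hash_8> scheme.
# BRANCH_NAME, the sanitization rule, and CONCURRENCY_GROUP are
# example assumptions, not the real workflow code.
BRANCH_NAME="core-402-duckdb-support"
CONCURRENCY_GROUP="tests_postgres_dbt_latest_official_${BRANCH_NAME}"

# Lowercase alphanumerics only, truncated to 24 chars for readability.
SAFE_BRANCH=$(printf '%s' "$BRANCH_NAME" | tr -c 'a-z0-9' '_' | cut -c1-24)
DATE_STAMP=$(date -u +%y%m%d)                                   # YYMMDD
SHORT_HASH=$(printf '%s' "$CONCURRENCY_GROUP" | sha256sum | cut -c1-8)

SCHEMA_SUFFIX="${SAFE_BRANCH}_${DATE_STAMP}_${SHORT_HASH}"
echo "dbt_pkg_${SCHEMA_SUFFIX}"
```

The branch segment keeps the schema human-readable, the date stamp supports age-based cleanup, and the hash of the full concurrency group carries the uniqueness guarantee even after truncation.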
🧹 Nitpick comments (1)
.github/workflows/test-warehouse.yml (1)
145-148: Consider extracting the concurrency group pattern to avoid duplication.

The concurrency group string is defined in two places:

- Line 60: `tests_${{ inputs.warehouse-type }}_dbt_${{ inputs.dbt-version }}_${{ github.head_ref || github.ref_name }}`
- Line 145: `tests_${{ inputs.warehouse-type }}_dbt_${{ inputs.dbt-version }}_${BRANCH_NAME}`

While functionally equivalent (since `BRANCH_NAME` is set from the same GitHub expression), if someone modifies the concurrency group format in one location, they could forget the other, causing the hash to no longer match the actual concurrency group and defeating the collision-avoidance purpose.

One option to reduce this risk is to add a comment near line 60 pointing to the reconstruction here, or vice versa.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.github/workflows/test-warehouse.yml around lines 145 - 148, Extract the concurrency group construction into a single reusable variable and reference it in both places instead of reconstructing it: create a single step or env var named CONCURRENCY_GROUP (used by the existing CONCURRENCY_GROUP assignment and by the earlier occurrence at line ~60) and then derive SHORT_HASH, SAFE_BRANCH, and SCHEMA_SUFFIX from that single CONCURRENCY_GROUP/BRANCH_NAME source; alternatively, if you prefer minimal change, add a clear comment next to the first concurrency-group declaration pointing to the reconstruction here (and vice versa) so future edits keep both formats in sync.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In @.github/workflows/test-warehouse.yml:
- Around line 145-148: Extract the concurrency group construction into a single
reusable variable and reference it in both places instead of reconstructing it:
create a single step or env var named CONCURRENCY_GROUP (used by the existing
CONCURRENCY_GROUP assignment and by the earlier occurrence at line ~60) and then
derive SHORT_HASH, SAFE_BRANCH, and SCHEMA_SUFFIX from that single
CONCURRENCY_GROUP/BRANCH_NAME source; alternatively, if you prefer minimal
change, add a clear comment next to the first concurrency-group declaration
pointing to the reconstruction here (and vice versa) so future edits keep both
formats in sync.
.github/workflows/test-warehouse.yml
```yaml
# Build a short, collision-free schema identifier from the branch
# name and an 8-char hash of the full concurrency group.
#
# Previous approach used `head -c 40` on the full ref string, which
```
Explaining the history is not interesting. Replace with a short explanation about how it is computed.
Done — simplified the comment to just explain the format and budget, removed the history explanation. See c58c8c0.
Co-Authored-By: Itamar Hartstein <haritamar@gmail.com>
Summary
Fixes flaky Databricks CI failures (`SCHEMA_NOT_FOUND` / `TABLE_OR_VIEW_NOT_FOUND`) caused by cross-branch schema name collisions.

Root cause: The "Write dbt profiles" step truncated the schema identifier to 40 characters via `head -c 40`. For `databricks_catalog`, the warehouse prefix alone consumes 39 of those 40 characters, leaving only 1 character to differentiate branches. Any two branches starting with the same letter (e.g. `core-402-duckdb-support` and `core-397-clickhouse-support`) produced identical schemas. When their CI jobs ran concurrently (different concurrency groups, same schema), one job's `env.clear()` would drop the other's schemas mid-test.

Evidence: Runs 22466863627 and 22466754085 overlapped (00:08–00:42 UTC, Feb 27), both targeting schema `dbt_pkg_databricks_catalog_dbt_latest_official_c`, and both failed. The latest run with no concurrent Databricks job passed.

Fix: Replace the `head -c 40` truncation with a structured suffix: `<branch_24>_<YYMMDD>_<8-char SHA-256 of concurrency group>`.

- The date stamp (`YYMMDD`) makes it easy to identify and clean up old schemas by age.

Example schema: `dbt_pkg_core_402_duckdb_suppor_260227_a1b2c3d4`

Worst-case total: `dbt_pkg_` (8) + branch (24) + `_` (1) + date (6) + `_` (1) + hash (8) + `_elementary` (11) + `_gw7` (4) = 63 chars, exactly at PostgreSQL's limit (the most restrictive warehouse).

Review & Testing Checklist for Human

- Confirm the logged `SCHEMA_SUFFIX` matches the `<branch>_<YYMMDD>_<hash>` format.
- The `_elementary` and `_gw7` suffixes are appended by pytest-xdist and dbt; verify these haven't changed length.
- Old `dbt_pkg_databricks_catalog_dbt_latest_official_c*` schemas on Databricks/Snowflake/etc. won't be automatically cleaned up. Consider a one-time manual cleanup.

Notes

- The `cancel-in-progress` + `--clear-on-end` race condition (where a cancelled run's teardown drops the new run's schemas) is a separate, less frequent issue not addressed here.

Link to Devin run: https://app.devin.ai/sessions/e01c77a2322a476faf5f6162e9390351
Requested by: @haritamar
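As a quick sanity check on the 63-character worst case claimed in the summary, the budget arithmetic can be replayed in shell (component lengths are taken from the summary's breakdown; `_elementary` and `_gw7` are the downstream suffixes it mentions):

```shell
# Replay the worst-case schema name length budget from the summary.
PREFIX="dbt_pkg_"          # fixed schema prefix (8 chars)
BRANCH_MAX=24              # truncated branch segment
DATE_LEN=6                 # YYMMDD date stamp
HASH_LEN=8                 # 8-char SHA-256 prefix
ELEMENTARY="_elementary"   # suffix appended downstream (11 chars)
GW="_gw7"                  # pytest-xdist worker suffix (4 chars)

# Two underscore separators join branch/date/hash.
TOTAL=$(( ${#PREFIX} + BRANCH_MAX + 1 + DATE_LEN + 1 + HASH_LEN + ${#ELEMENTARY} + ${#GW} ))
echo "$TOTAL"   # prints 63, exactly PostgreSQL's identifier limit
```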