
fix: surface seed and init failures instead of swallowing them #938

Merged
haritamar merged 2 commits into master from
devin/1772105501-investigate-databricks-schema-failures
Feb 26, 2026

Conversation


@devin-ai-integration bot commented Feb 26, 2026

Summary

Previously, dbt seed and dbt run failures during test environment initialization were silently swallowed because:

  1. The dbt runner is created with raise_on_failure=False
  2. Both DbtDataSeeder.seed() and Environment.init() ignored the bool return values from the runner

This caused confusing cascading failures: for example, a SCHEMA_NOT_FOUND error during seed would be swallowed, and the test would later fail with TABLE_OR_VIEW_NOT_FOUND, masking the real root cause. This was observed in Databricks CI, where per-worker schema creation (pytest-xdist workers gw0-gw7 add schema suffixes) intermittently fails.

Changes:

  • data_seeder.py: Check dbt_runner.seed() return value; raise RuntimeError immediately on failure instead of proceeding to yield
  • env.py: Check return values of both dbt run --selector init and dbt run --select elementary; log specific errors and raise RuntimeError if either fails
  • conftest.py: Add worker ID and schema suffix to init log messages for diagnosing parallel test failures
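The fix in both files boils down to the same pattern: capture the runner's boolean result and raise instead of silently continuing. A minimal sketch of that pattern (the `run_checked` helper and its message are illustrative, not the actual code in this PR):

```python
def run_checked(run_step, step_name: str) -> None:
    """Invoke a dbt step and raise on failure instead of continuing.

    The dbt runner here is created with raise_on_failure=False, so a
    failed step is reported only through its boolean return value;
    ignoring that value is what previously masked SCHEMA_NOT_FOUND
    errors until a later TABLE_OR_VIEW_NOT_FOUND surfaced.
    """
    success = run_step()
    if not success:
        raise RuntimeError(
            f"{step_name} failed; check the dbt output above for the root cause."
        )
```

Called as, e.g., `run_checked(lambda: dbt_runner.run(selector="init"), "dbt run --selector init")`, so the failure is attributed to the step that actually broke.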

Review & Testing Checklist for Human

  • Verify RuntimeError in env.init() doesn't break xdist worker lifecycle: The raise will now fail the worker's session fixture. Confirm this correctly marks all tests in that worker as errors (expected behavior) rather than causing hangs or orphaned processes.
  • Verify RuntimeError in data_seeder.py propagates correctly through the context manager: The raise happens before yield inside a try/finally, so seed_path.unlink() in finally should still execute. Confirm no resource leaks.
  • Check if any existing CI jobs have init failures that were previously hidden: This PR intentionally surfaces failures that were silently swallowed. If any warehouse target has a flaky init, those tests will now fail loudly instead of producing confusing downstream errors.
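The second checklist item (raising before `yield` inside a try/finally) can be verified with a self-contained model of that control flow; the names below are illustrative stand-ins for `DbtDataSeeder.seed()`, not the actual code:

```python
from contextlib import contextmanager

cleanup_ran = []

@contextmanager
def seeded(seed_ok: bool):
    # The seed_path.unlink() analogue lives in the finally block, so it
    # runs whether the raise happens before the yield or the caller's
    # body fails later.
    try:
        if not seed_ok:
            # Mirrors the new behavior: raise before yield on seed failure.
            raise RuntimeError("dbt seed failed")
        yield "seeded"
    finally:
        cleanup_ran.append("unlink")
```

With `@contextmanager`, an exception raised before the first `yield` propagates out of `__enter__`, the `with` body never runs, and the generator's `finally` clause still executes, so no temp-file leak is expected from this shape.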

Recommended test plan: run the full CI matrix and compare results. Any new failures should be init/seed failures that were previously masked; verify that each shows the clear RuntimeError message pointing to the root cause.

Notes

Summary by CodeRabbit

  • Bug Fixes

    • Improved detection and reporting for database seed failures with clearer diagnostic messages and explicit failure propagation.
    • Strengthened test environment initialization to log errors and halt on critical setup failures rather than proceeding with invalid states.
  • Chores

    • Expanded logging during test setup to include worker and schema suffix details for better diagnostics and traceability.
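The worker/schema-suffix logging can lean on the environment variable pytest-xdist already exports. A rough sketch (the suffix derivation is an assumption based on the PR description; `worker_context` is a hypothetical helper):

```python
import os

def worker_context() -> str:
    # pytest-xdist sets PYTEST_XDIST_WORKER (e.g. "gw0") inside each
    # worker process; it is unset when tests run without -n.
    worker_id = os.environ.get("PYTEST_XDIST_WORKER", "master")
    # Assumed per-worker schema suffix scheme, e.g. "_gw0".
    schema_suffix = "" if worker_id == "master" else f"_{worker_id}"
    return f"worker={worker_id} schema_suffix={schema_suffix!r}"
```

Including this string in init log lines makes it possible to tell which worker's schema creation failed when gw0-gw7 run in parallel.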

- DbtDataSeeder.seed(): check return value and raise RuntimeError on failure
  (previously ignored the bool return, causing confusing TABLE_OR_VIEW_NOT_FOUND
  errors downstream when the real cause was SCHEMA_NOT_FOUND during seed)
- Environment.init(): check return values of dbt run commands and raise
  RuntimeError on failure (previously silently continued even if schema
  creation failed)
- conftest.py: log worker ID and schema_suffix during init for debugging
  parallel test runs (pytest-xdist workers gw0-gw7)

Co-Authored-By: unknown <>
@devin-ai-integration
Contributor Author

🤖 Devin AI Engineer

I'll be helping with this pull request! Here's what you should know:

✅ I will automatically:

  • Address comments on this PR. Add '(aside)' to your comment to have me ignore it.
  • Look at CI failures and help fix them

Note: I can only respond to comments from users who have write access to this repository.

⚙️ Control Options:

  • Disable automatic comment and CI monitoring

@github-actions
Contributor

👋 @devin-ai-integration[bot]
Thank you for raising your pull request.
Please make sure to add tests and document all user-facing changes.
You can do this by editing the docs files in the elementary repository.

@coderabbitai

coderabbitai bot commented Feb 26, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

  • Linear integration is disabled

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 08a6ebb and 6b4acf9.

📒 Files selected for processing (1)
  • integration_tests/tests/env.py

📝 Walkthrough

Walkthrough

Adds explicit error checking, logging, and RuntimeError propagation to test environment initialization and data seeding; also exposes PYTEST_XDIST_WORKER and SCHEMA_NAME_SUFFIX from conftest imports and introduces a module-level logger in env.py.

Changes

Cohort / File(s) Summary
Test setup imports & logging
integration_tests/tests/conftest.py
Added PYTEST_XDIST_WORKER and SCHEMA_NAME_SUFFIX to imports from dbt_project; enhanced initialization log messages to include worker and schema suffix.
Seed error handling
integration_tests/tests/data_seeder.py
Captured dbt seed result, log detailed error on failure, and raise RuntimeError when seed does not succeed.
Environment run checks & logger
integration_tests/tests/env.py
Added module-level logger; replaced sequential run calls with checked runs (selector="init" and select="elementary"), logging errors and raising RuntimeError on failure of either step.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐇 I hopped through logs both near and far,
I checked each seed and named each star.
When runs stumble, I loudly say—
"Raise the flag, don't sneak away!"
The warren's tests now stand up tall, hooray! 🎉

🚥 Pre-merge checks | ✅ 2 passed | ❌ 1 failed

❌ Failed checks (1 warning)

  • Docstring Coverage ⚠️ Warning: Docstring coverage is 0.00%, which is below the required threshold of 80.00%. Resolution: write docstrings for the functions that are missing them.

✅ Passed checks (2)

  • Description Check ✅ Passed: Check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check ✅ Passed: The title accurately summarizes the main change: adding proper error handling to surface seed and init failures instead of silently swallowing them, which aligns with the code changes across all three modified files.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


@coderabbitai bot left a comment

🧹 Nitpick comments (1)
integration_tests/tests/env.py (1)

28-46: Consider early exit on init failure to reduce log noise.

The current approach runs both commands and collects all failures before raising, which is reasonable for diagnostics. However, if init fails, the elementary run will almost certainly fail too (since it depends on init artifacts), producing additional error output that may obscure the root cause.

You might consider short-circuiting:

♻️ Optional: Early exit on init failure
     def init(self):
         init_success = self.dbt_runner.run(selector="init")
         if not init_success:
             logger.error(
                 "Environment init failed: 'dbt run --selector init' returned "
                 "failure. The target schema may not have been created. "
                 "Subsequent seeds and queries will likely fail with "
                 "SCHEMA_NOT_FOUND or TABLE_OR_VIEW_NOT_FOUND."
             )
+            raise RuntimeError(
+                "Test environment initialization failed during 'dbt run --selector init'. "
+                "Check the dbt output above for the root cause."
+            )
         elementary_success = self.dbt_runner.run(select="elementary")
         if not elementary_success:
             logger.error(
                 "Environment init failed: 'dbt run --select elementary' "
                 "returned failure. Elementary models may not be available."
             )
-        if not init_success or not elementary_success:
             raise RuntimeError(
-                "Test environment initialization failed. Check the dbt "
-                "output above for the root cause."
+                "Test environment initialization failed during 'dbt run --select elementary'. "
+                "Check the dbt output above for the root cause."
             )

If the current behavior of running both is intentional for completeness, that's also fine.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@integration_tests/tests/env.py` around lines 28 - 46, When init fails, avoid
running the dependent "elementary" dbt step to reduce noise: in the init logic
around self.dbt_runner.run(selector="init") check init_success and if False
immediately log the error and raise/return instead of proceeding to call
self.dbt_runner.run(select="elementary"); keep the existing error message for
init (and only run the elementary step when init_success is True), referencing
the init_success and elementary_success variables and the
self.dbt_runner.run(...) calls so the short-circuit is applied in the same
block.


📥 Commits

Reviewing files that changed from the base of the PR and between f0307f3 and 08a6ebb.

📒 Files selected for processing (3)
  • integration_tests/tests/conftest.py
  • integration_tests/tests/data_seeder.py
  • integration_tests/tests/env.py

When 'dbt run --selector init' fails, skip the dependent 'dbt run --select
elementary' step since it will almost certainly fail too, producing additional
noise that obscures the root cause.

Addresses CodeRabbit nitpick.

Co-Authored-By: unknown <>
@haritamar haritamar merged commit f3ca4da into master Feb 26, 2026
21 checks passed
@haritamar haritamar deleted the devin/1772105501-investigate-databricks-schema-failures branch February 26, 2026 18:30