Conversation
- DbtDataSeeder.seed(): check return value and raise RuntimeError on failure (previously ignored the bool return, causing confusing TABLE_OR_VIEW_NOT_FOUND errors downstream when the real cause was SCHEMA_NOT_FOUND during seed) - Environment.init(): check return values of dbt run commands and raise RuntimeError on failure (previously silently continued even if schema creation failed) - conftest.py: log worker ID and schema_suffix during init for debugging parallel test runs (pytest-xdist workers gw0-gw7) Co-Authored-By: unknown <>
🤖 Devin AI EngineerI'll be helping with this pull request! Here's what you should know: ✅ I will automatically:
Note: I can only respond to comments from users who have write access to this repository. ⚙️ Control Options:
|
|
👋 @devin-ai-integration[bot] |
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review infoConfiguration used: defaults Review profile: CHILL Plan: Pro Disabled knowledge base sources:
📒 Files selected for processing (1)
📝 WalkthroughWalkthroughAdds explicit error checking, logging, and RuntimeError propagation to test environment initialization and data seeding; also exposes PYTEST_XDIST_WORKER and SCHEMA_NAME_SUFFIX from conftest imports and introduces a module-level logger in env.py. Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches
🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
🧹 Nitpick comments (1)
integration_tests/tests/env.py (1)
28-46: Consider early exit on init failure to reduce log noise.The current approach runs both commands and collects all failures before raising, which is reasonable for diagnostics. However, if
initfails, theelementaryrun will almost certainly fail too (since it depends on init artifacts), producing additional error output that may obscure the root cause.You might consider short-circuiting:
♻️ Optional: Early exit on init failure
def init(self): init_success = self.dbt_runner.run(selector="init") if not init_success: logger.error( "Environment init failed: 'dbt run --selector init' returned " "failure. The target schema may not have been created. " "Subsequent seeds and queries will likely fail with " "SCHEMA_NOT_FOUND or TABLE_OR_VIEW_NOT_FOUND." ) + raise RuntimeError( + "Test environment initialization failed during 'dbt run --selector init'. " + "Check the dbt output above for the root cause." + ) elementary_success = self.dbt_runner.run(select="elementary") if not elementary_success: logger.error( "Environment init failed: 'dbt run --select elementary' " "returned failure. Elementary models may not be available." ) - if not init_success or not elementary_success: raise RuntimeError( - "Test environment initialization failed. Check the dbt " - "output above for the root cause." + "Test environment initialization failed during 'dbt run --select elementary'. " + "Check the dbt output above for the root cause." )If the current behavior of running both is intentional for completeness, that's also fine.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@integration_tests/tests/env.py` around lines 28 - 46, When init fails, avoid running the dependent "elementary" dbt step to reduce noise: in the init logic around self.dbt_runner.run(selector="init") check init_success and if False immediately log the error and raise/return instead of proceeding to call self.dbt_runner.run(select="elementary"); keep the existing error message for init (and only run the elementary step when init_success is True), referencing the init_success and elementary_success variables and the self.dbt_runner.run(...) calls so the short-circuit is applied in the same block.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@integration_tests/tests/env.py`:
- Around line 28-46: When init fails, avoid running the dependent "elementary"
dbt step to reduce noise: in the init logic around
self.dbt_runner.run(selector="init") check init_success and if False immediately
log the error and raise/return instead of proceeding to call
self.dbt_runner.run(select="elementary"); keep the existing error message for
init (and only run the elementary step when init_success is True), referencing
the init_success and elementary_success variables and the
self.dbt_runner.run(...) calls so the short-circuit is applied in the same
block.
ℹ️ Review info
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (3)
integration_tests/tests/conftest.pyintegration_tests/tests/data_seeder.pyintegration_tests/tests/env.py
When 'dbt run --selector init' fails, skip the dependent 'dbt run --select elementary' step since it will almost certainly fail too, producing additional noise that obscures the root cause. Addresses CodeRabbit nitpick. Co-Authored-By: unknown <>
Summary
Previously,
dbt seedanddbt runfailures during test environment initialization were silently swallowed because:raise_on_failure=FalseDbtDataSeeder.seed()andEnvironment.init()ignored theboolreturn values from the runnerThis caused confusing cascading failures — e.g., a
SCHEMA_NOT_FOUNDduring seed would be swallowed, and the test would later fail withTABLE_OR_VIEW_NOT_FOUND, masking the real root cause. This was observed in databricks CI where per-worker schema creation (pytest-xdist_gw0–_gw7suffixes) intermittently fails.Changes:
data_seeder.py: Checkdbt_runner.seed()return value; raiseRuntimeErrorimmediately on failure instead of proceeding to yieldenv.py: Check return values of bothdbt run --selector initanddbt run --select elementary; log specific errors and raiseRuntimeErrorif either failsconftest.py: Add worker ID and schema suffix to init log messages for diagnosing parallel test failuresReview & Testing Checklist for Human
RuntimeErrorinenv.init()doesn't break xdist worker lifecycle: The raise will now fail the worker's session fixture. Confirm this correctly marks all tests in that worker as errors (expected behavior) rather than causing hangs or orphaned processes.RuntimeErrorindata_seeder.pypropagates correctly through the context manager: The raise happens beforeyieldinside atry/finally, soseed_path.unlink()infinallyshould still execute. Confirm no resource leaks.Recommended test plan: Run the full CI matrix and compare results. Any new failures should be init/seed failures that were previously masked — verify they show the clear
RuntimeErrormessage pointing to the root cause.Notes
SCHEMA_NOT_FOUNDerrors in databricks CI (PR fix: use direct adapter connection for run_query (bypass log parsing) #936). Root cause analysis found thatraise_on_failure=False+ unchecked return values meant schema creation failures were invisible, leading to confusing cascading errors.Summary by CodeRabbit
Bug Fixes
Chores