feat: add DuckDB support with macro dispatches and CI configuration (#935)
devin-ai-integration[bot] wants to merge 14 commits into master from
Conversation
- Add duckdb to CI warehouse matrix in test-all-warehouses.yml
- Add DuckDB profiles generation in test-warehouse.yml
- Add 15+ duckdb__ macro overrides for adapter compatibility:
  - buckets_cte, target_database, timeadd, insert_rows, delete_and_insert
  - generate_elementary_profile_args, get_tables/columns_from_information_schema
  - data_type_list, has_temp_table_support, create_table_as
  - create_or_replace (with should_commit for ROLLBACK resilience)
  - replace_table_data (with should_commit for in-memory persistence)
  - day_of_week/hour_of_week (using dayname() instead of to_char)
  - edr_drop_schema for test cleanup
- Skip 9 tests incompatible with DuckDB in-memory mode:
  - Artifact upload tests that rely on dynamically created test metadata
  - Schema changes tests that require column state persistence across invocations

Co-Authored-By: unknown <>
📝 Walkthrough

Adds DuckDB support across the codebase: new DuckDB-specific macros for metadata, time utilities, data types, and table operations; updates integration tests and CI workflows to include DuckDB and generate in-memory DuckDB dbt profiles for testing.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant WF as Workflow
    participant FS as "Write DuckDB profiles"
    participant DBT as dbt
    participant DU as DuckDB
    WF->>FS: matrix includes duckdb -> run profile writer
    FS->>DBT: write in-memory DuckDB profiles (main/test)
    WF->>DBT: install dbt with duckdb extras
    DBT->>DU: run tests/queries using generated profiles
    DU-->>DBT: return query results
    DBT-->>WF: finish job (artifacts/logs)
```
The dbt debug step uses the root dbt_project.yml (profile: elementary) while the tests use integration_tests/dbt_project (profile: elementary_tests). Both profiles need to be present in profiles.yml for CI to work. Co-Authored-By: unknown <>
- Add adapter.commit() to duckdb__edr_drop_schema for consistency
- Add table_catalog filter to get_tables/columns_from_information_schema
- Remove redundant drop_first=true from duckdb__create_or_replace
- Add explanatory comments to create_table_as and create_or_replace
- Use placeholder+sed approach for profiles.yml schema variable
- Add both elementary and elementary_tests profiles for DuckDB CI

Co-Authored-By: unknown <>
Co-Authored-By: unknown <>
Actionable comments posted: 1
🧹 Nitpick comments (1)
.github/workflows/test-warehouse.yml (1)
141-141: Fragile indentation-based profile stripping.
sed -i 's/^ //' is coupled to the exact 10-space indentation of the heredoc lines as they appear in this workflow file. Any future reformatting (e.g., an auto-formatter changing the step's indentation) would silently produce a malformed profiles.yml without any CI failure until dbt debug runs.

A more robust alternative is to write the file with no leading indentation via a Python one-liner or a printf block, or to use a known delimiter (e.g., sed -i 's/^[[:space:]]*//' for full stripping, or sed -i 's/^ //' for a more targeted strip applied to a matching pattern).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.github/workflows/test-warehouse.yml at line 141, The current fragile indentation strip uses the exact token sed -i 's/^ //' which breaks if the heredoc indentation changes; replace that command with a whitespace-robust variant such as sed -i 's/^[[:space:]]*//' to remove any leading whitespace, or alternatively generate the profiles file without indentation (e.g., a Python one-liner that writes the heredoc content directly or a printf/heredoc block that has no leading spaces) so that the step that currently runs sed -i 's/^ //' will always produce a valid profiles.yml.
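As a sketch of the "no leading indentation" alternative suggested above, a small Python writer can use textwrap.dedent so no sed stripping is needed at all. The profile keys and the SCHEMA_PLACEHOLDER token below are illustrative, not the workflow's actual heredoc content:

```python
import textwrap
from pathlib import Path

# Hypothetical profile body; the real keys come from the workflow's heredoc.
PROFILES_TEMPLATE = textwrap.dedent("""\
    elementary_tests:
      target: duckdb
      outputs:
        duckdb:
          type: duckdb
          path: ":memory:"
          schema: SCHEMA_PLACEHOLDER
""")

def write_profiles(path: str, schema: str) -> None:
    # dedent removes the common leading indentation, so the written file
    # starts at column 0 regardless of how this source file is indented.
    Path(path).write_text(PROFILES_TEMPLATE.replace("SCHEMA_PLACEHOLDER", schema))
```

Because the indentation is stripped at string-construction time, reformatting the script cannot silently corrupt the emitted profiles.yml.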
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In @.github/workflows/test-warehouse.yml:
- Around line 117-118: The guard using string comparison is incorrect for
semantic versions; update the workflow step named "Write DuckDB profiles" to
either remove the brittle string comparison and rely on pip/dpkg installation
failure or replace it with a semantic-version check (compare parsed numeric
major/minor/patch from inputs.dbt-version against 1.8.0) before executing the
DuckDB-specific step, and/or add an explicit documentation comment stating
dbt-duckdb requires dbt-core >= 1.8.0; reference the workflow input
inputs.dbt-version and the step name "Write DuckDB profiles" to locate where to
implement the change.
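The semantic-version check described above can be sketched in Python. This is a minimal illustration, not the workflow's actual implementation; it assumes inputs.dbt-version arrives as a plain string:

```python
def parse_version(version: str) -> tuple:
    """Parse "1.10.0" into (1, 10, 0), ignoring suffixes like "1.8.0b1"."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits or "0"))
    return tuple(parts)

def supports_duckdb(dbt_version: str) -> bool:
    # dbt-duckdb requires dbt-core >= 1.8.0 (per the review comment above).
    return parse_version(dbt_version) >= (1, 8, 0)
```

A naive string comparison gets multi-digit components wrong: lexically "1.10.0" sorts before "1.8.0", while the tuple comparison handles it correctly.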
📒 Files selected for processing (7)
- .github/workflows/test-warehouse.yml
- integration_tests/dbt_project/macros/clear_env.sql
- integration_tests/tests/test_dbt_artifacts/test_groups.py
- macros/edr/metadata_collection/get_columns_from_information_schema.sql
- macros/edr/metadata_collection/get_tables_from_information_schema.sql
- macros/utils/table_operations/create_or_replace.sql
- macros/utils/table_operations/create_table_as.sql
🚧 Files skipped from review as they are similar to previous changes (2)
- macros/utils/table_operations/create_or_replace.sql
- macros/utils/table_operations/create_table_as.sql
…fact uploads

The delete_and_insert macro (used in diff-mode artifact caching) was missing the should_commit pattern that other DuckDB code paths use. Without it, dbt's post-on-run-end ROLLBACK undoes the DELETE+INSERT changes on DuckDB.

- Add should_commit parameter to delete_and_insert macro
- Call adapter.commit() after queries when should_commit=true
- Pass should_commit through from upload_artifacts_to_table
- Remove all skip_targets(["duckdb"]) markers from tests that now pass

Result: 184 passed, 9 skipped (dremio-only), 0 failures on DuckDB.

Co-Authored-By: unknown <>
…ters The previous commit passed should_commit generically through upload_artifacts_to_table, which caused Trino (and potentially other adapters) to fail. Instead, check target.type == 'duckdb' directly in the delete_and_insert macro to only commit for DuckDB. Co-Authored-By: unknown <>
Actionable comments posted: 1
🧹 Nitpick comments (1)
macros/utils/table_operations/delete_and_insert.sql (1)
187-187: Update DuckDB comment to match current implementation.

Line 187 says commit is handled "via should_commit", but commit is actually performed directly in delete_and_insert (line 27).

Proposed fix:

```diff
-{# DuckDB - separate queries without transaction wrapping (commit handled by caller via should_commit) #}
+{# DuckDB - separate queries without transaction wrapping (commit handled in delete_and_insert) #}
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@macros/utils/table_operations/delete_and_insert.sql` at line 187, The top-of-file DuckDB comment is outdated: it says commits are handled "via should_commit" but the commit is performed directly inside the delete_and_insert macro/function; update the comment to accurately state that delete_and_insert performs its own commit (and that callers should not expect should_commit to control commit behavior), e.g., replace the phrase "commit handled by caller via should_commit" with a brief note that delete_and_insert commits internally and callers should manage transactions accordingly; ensure the comment references the delete_and_insert macro name so future readers see the relationship.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@macros/utils/table_operations/delete_and_insert.sql`:
- Around line 25-28: After the cleanup DROP statements run they must be
protected with a second explicit commit for DuckDB to avoid orphaned
intermediate tables; update the block that currently calls adapter.commit()
inside the {% if target.type == 'duckdb' %} condition to add another {% do
adapter.commit() %} immediately after the cleanup DROP section (i.e., commit
once after the DML and again after the cleanup drops) so that the cleanup is
persisted even if dbt performs a post-run-end ROLLBACK.
📒 Files selected for processing (1)
macros/utils/table_operations/delete_and_insert.sql
Address CodeRabbit review:

- Add second adapter.commit() after cleanup DROP statements to prevent orphaned intermediate tables on DuckDB (since has_temp_table_support returns false, these are regular persistent tables)
- Fix outdated comment on duckdb__get_delete_and_insert_queries

Co-Authored-By: unknown <>
…pshots, transactionless cleanup

- Add duckdb__get_normalized_data_type macro to normalize DuckDB type synonyms (e.g., TEXT→VARCHAR, INT→INTEGER, DOUBLE PRECISION→DOUBLE)
- Add adapter.commit() in insert_data_monitoring_metrics and insert_schema_columns_snapshot so DuckDB inserts survive post-on-run-end ROLLBACK
- Add duckdb__get_clean_elementary_test_tables_queries to use transactionless DROP statements (avoids nested transaction errors on DuckDB)

Co-Authored-By: unknown <>
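The synonym normalization described in this commit can be illustrated with a small lookup table. The entries below are an assumed subset for illustration only; the macro's actual mapping should be checked against DuckDB's type documentation:

```python
# Assumed synonym subset; duckdb__get_normalized_data_type's real mapping
# may contain more (or different) entries.
DUCKDB_TYPE_SYNONYMS = {
    "TEXT": "VARCHAR",
    "STRING": "VARCHAR",
    "INT": "INTEGER",
    "INT4": "INTEGER",
    "DOUBLE PRECISION": "DOUBLE",
    "FLOAT8": "DOUBLE",
}

def normalize_data_type(data_type: str) -> str:
    # Case-insensitive lookup with whitespace collapsed;
    # unknown types pass through unchanged.
    normalized = " ".join(data_type.upper().split())
    return DUCKDB_TYPE_SYNONYMS.get(normalized, normalized)
```

Normalizing both sides before comparison is what prevents a TEXT-vs-VARCHAR mismatch from registering as a schema change.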
…ma_changes test DuckDB's split() function returns 1-indexed arrays, but the default full_name_split macro uses 0-indexed access. This caused the table_name field in schema change test results to contain the schema name instead of the actual table name, making _read_test_results unable to find the correct results. Added duckdb__full_name_split using split_part() with 1-indexed access (same pattern as postgres/athena). Removed debug code from test files. Co-Authored-By: unknown <>
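The indexing difference behind this fix can be sketched in Python. Here split_part mimics the SQL function's 1-indexed semantics, and full_name_split is a hypothetical mirror of the macro's behavior, not its actual code:

```python
def split_part(text: str, delimiter: str, position: int) -> str:
    # SQL-style split_part: 1-indexed, empty string when out of range.
    parts = text.split(delimiter)
    return parts[position - 1] if 1 <= position <= len(parts) else ""

def full_name_split(full_table_name: str):
    # 1-indexed access: positions 1/2/3 -> database/schema/table.
    return (
        split_part(full_table_name, ".", 1),
        split_part(full_table_name, ".", 2),
        split_part(full_table_name, ".", 3),
    )
```

In terms of this model, the original bug was an off-by-one: 0-indexed access shifts every field, so the "table" slot comes back holding the schema name.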
Actionable comments posted: 2
🧹 Nitpick comments (1)
macros/utils/table_operations/delete_and_insert.sql (1)
192-214: duckdb__get_delete_and_insert_queries is identical to several other adapter variants.

The body is byte-for-byte the same as redshift__, athena__, dremio__, and trino__. A shared helper macro (e.g., transactionless__get_delete_and_insert_queries) could reduce duplication, but given the team's stated preference for explicit per-adapter variants this is purely optional.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@macros/utils/table_operations/delete_and_insert.sql` around lines 192 - 214, The macro duckdb__get_delete_and_insert_queries is byte-for-byte identical to other adapter macros (redshift__, athena__, dremio__, trino__); to remove duplication, extract the shared logic into a new helper macro (e.g., transactionless__get_delete_and_insert_queries) that implements the delete/insert query assembly, then have duckdb__get_delete_and_insert_queries call that helper (or replace its body with a thin wrapper delegating to the new macro), ensuring you preserve the same parameters (relation, insert_relation, delete_relation, delete_column_key) and return behavior.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@macros/edr/tests/on_run_end/handle_tests_results.sql`:
- Around line 133-142: The DROP of temp_relation can be rolled back on DuckDB
because the code only commits once; after calling
elementary.fully_drop_relation(temp_relation) when target.type == 'duckdb' and
elementary.has_temp_table_support() is false, add a second explicit
adapter.commit() so the DROP is persisted; update the block around
insert_query/adapter.commit()/elementary.fully_drop_relation to call
adapter.commit() again (referencing adapter.commit(),
elementary.fully_drop_relation(temp_relation), and
elementary.has_temp_table_support()).
- Around line 187-196: The DROP of temp_relation in
insert_schema_columns_snapshot is executed after the initial commit but before
dbt's potential ROLLBACK, so for DuckDB you must commit again after calling
elementary.fully_drop_relation(temp_relation); update the on-run-end logic to
check target.type == 'duckdb' and call adapter.commit() immediately after the {%
do elementary.fully_drop_relation(temp_relation) %} (similar to the existing
commit after insert_query) so the DROP is persisted.
📒 Files selected for processing (6)

- integration_tests/tests/test_schema_changes.py
- macros/edr/system/system_utils/full_names.sql
- macros/edr/tests/on_run_end/handle_tests_results.sql
- macros/edr/tests/test_utils/clean_elementary_test_tables.sql
- macros/utils/data_types/get_normalized_data_type.sql
- macros/utils/table_operations/delete_and_insert.sql
…ults Addresses CodeRabbit review: fully_drop_relation runs in a new implicit transaction after the first adapter.commit(). Without a second commit, dbt's post-on-run-end ROLLBACK undoes the DROP, leaving orphaned temp tables. Added second adapter.commit() after cleanup in both insert_data_monitoring_metrics and insert_schema_columns_snapshot. Co-Authored-By: unknown <>
The duckdb__get_normalized_data_type mapping already normalizes TEXT to VARCHAR, so the baseline type comparison works correctly. No additional changes needed — just removing the skip. Co-Authored-By: unknown <>
… profiles to template Co-Authored-By: unknown <>
In .github/workflows/test-warehouse.yml (outdated):
```diff
   - name: Install Elementary
-    run: pip install "./elementary[${{ (inputs.warehouse-type == 'databricks_catalog' && 'databricks') || inputs.warehouse-type }}]"
+    run: pip install "./elementary[${{ (inputs.warehouse-type == 'databricks_catalog' && 'databricks') || (inputs.warehouse-type == 'duckdb' && 'postgres') || inputs.warehouse-type }}]"
```
Why does duckdb imply postgres?
The elementary Python package doesn't have a duckdb extra. Its available extras are: postgres, snowflake, bigquery, redshift, databricks, athena, spark, trino, clickhouse.
Each extra just pulls in the corresponding dbt adapter (e.g. elementary[postgres] → dbt-postgres). The dbt-duckdb adapter is already installed by the "Install dbt" step above (line 110-115), so the elementary install doesn't need to handle it.
However, elementary[postgres] pulls in dbt-postgres as an unnecessary extra dependency. We could simplify this to just elementary (no extra) since dbt-duckdb is already installed. Want me to change it to:
pip install "./elementary"
instead? Or if you'd prefer to add a duckdb extra to the elementary package, that would be the cleaner long-term fix.
```
{%- do elementary.edr_create_table_as(true, temp_relation, test_tables_union_query) %}
{% do elementary.run_query(insert_query) %}

{# DuckDB: commit so insert survives dbt's post-on-run-end ROLLBACK #}
```
Just checking- this is a duckdb-specific requirement?
For example, why doesn't redshift suffer from this rollback?
(You don't have to change if you think it is correct, but I want a clarification)
Yes, this is DuckDB-specific. Here's why Redshift (and Postgres) don't need it:
Redshift/Postgres path: default__get_delete_and_insert_queries (line 50-65) wraps the DELETE+INSERT in a single SQL string with explicit BEGIN TRANSACTION; ... COMMIT;. This COMMIT inside the SQL string persists the data before dbt's post-on-run-end ROLLBACK can touch it. The ROLLBACK becomes a no-op because there's nothing left to roll back.
DuckDB's problem: DuckDB doesn't support BEGIN TRANSACTION inside a multi-statement SQL string the same way — it errors or ignores it. So duckdb__get_delete_and_insert_queries returns separate query strings without transaction wrapping (same pattern as Redshift's redshift__get_delete_and_insert_queries which also returns separate strings — but Redshift's adapter auto-commits each statement). DuckDB's adapter runs under an implicit transaction, so dbt's post-hook ROLLBACK undoes everything.
The adapter.commit() call is the DuckDB equivalent of the COMMIT; inside the default SQL string — it's just done at the Python API level instead of inside SQL.
Note: Redshift also returns separate queries (line 123-144) without BEGIN/COMMIT wrapping, but Redshift's dbt adapter auto-commits each statement by default, so the ROLLBACK doesn't affect it. DuckDB's adapter doesn't auto-commit.
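The commit-before-ROLLBACK mechanism described above can be demonstrated with stdlib sqlite3 as an analogy (not DuckDB itself, and not the adapter's actual code): a connection running under an implicit transaction loses uncommitted DML when rollback() fires, and an explicit commit() beforehand makes the later rollback a no-op.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table metrics (id integer)")
conn.commit()

# Without an explicit commit, rollback() undoes the insert --
# the analogue of dbt's post-on-run-end ROLLBACK on DuckDB.
conn.execute("insert into metrics values (1)")
conn.rollback()
lost = conn.execute("select count(*) from metrics").fetchone()[0]

# With an explicit commit (the adapter.commit() analogue), the later
# rollback has nothing left to undo.
conn.execute("insert into metrics values (1)")
conn.commit()
conn.rollback()
kept = conn.execute("select count(*) from metrics").fetchone()[0]
```

Here `lost` is 0 and `kept` is 1, matching the described behavior: the commit, not the DML itself, is what survives the trailing ROLLBACK.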
```
{% do adapter.drop_relation(insert_relation) %}
{% endif %}

{# DuckDB: commit cleanup drops so they survive dbt's post-on-run-end ROLLBACK too #}
```
Again - checking as my other comment about why this is duckdb specific and not relevant to other DBs that commit like redshift
(And again, if you think this is correct please just explain)
Same mechanism as my answer on handle_tests_results.sql:
The default__ dispatch (used by Postgres) wraps everything in BEGIN TRANSACTION; ... COMMIT; inside the SQL string (line 50-65), so the data is committed before dbt's ROLLBACK fires.
Redshift has its own dispatch (redshift__get_delete_and_insert_queries, line 123-144) that also returns separate queries without BEGIN/COMMIT, but Redshift's dbt adapter auto-commits each individual statement execution. So dbt's ROLLBACK is effectively a no-op on Redshift.
DuckDB's adapter doesn't auto-commit — it runs under an implicit transaction. So when dbt issues ROLLBACK after on-run-end, it undoes the DELETE+INSERT. The adapter.commit() call explicitly persists the changes before that ROLLBACK can fire.
TL;DR: Postgres uses SQL-level COMMIT, Redshift auto-commits per statement, DuckDB needs Python-level adapter.commit().
```
{% endmacro %}

{% macro duckdb__has_temp_table_support() %}
{% do return(false) %}
```
It really doesn't, right?
DuckDB does support CREATE TEMPORARY TABLE syntactically. However, in this codebase, has_temp_table_support controls whether edr_create_table_as creates temp tables or regular tables.
The issue is that DuckDB temp tables are scoped to a single connection/cursor. In the test harness, dbt run operations and subsequent dbt run-operation calls may use different cursor instances on the shared DuckDB connection. A temp table created in one cursor isn't always visible to another, causing "Table does not exist" errors.
Returning false means the code uses regular (non-temp) tables with explicit cleanup via fully_drop_relation, which is more reliable. This is the same approach used by spark, trino, athena, dremio, and clickhouse — all return false for similar visibility reasons.
```
{# DuckDB - truncate and insert with commit to survive ROLLBACK on in-memory databases #}
{% macro duckdb__replace_table_data(relation, rows) %}
{% call statement('truncate_relation') -%}
    delete from {{ relation }} where 1=1
```
Just to verify - there's no truncate statement in duckdb?
DuckDB does support TRUNCATE TABLE. The delete from {{ relation }} where 1=1 pattern was copied from the spark__replace_table_data dispatch (line 20-21) for consistency, but I can change it to TRUNCATE TABLE {{ relation }} if you prefer — it's functionally equivalent. Want me to make that change?
Co-Authored-By: unknown <>
…p commits

DuckDB's dbt adapter reuses the same cursor (DuckDBConnectionWrapper.cursor() returns a cached instance), so temp tables are visible across all queries within a single dbt invocation. This means has_temp_table_support can return true.

Changes:
- duckdb__has_temp_table_support() now returns true
- duckdb__edr_get_create_table_as_sql() respects the temporary flag
- Removed redundant DuckDB commit-after-cleanup-drop blocks in handle_tests_results (temp tables are auto-dropped by dbt, no manual cleanup needed)

Co-Authored-By: unknown <>
Update on the temp-table question: I investigated this further and my previous answer about cursor visibility was wrong. DuckDB temp tables actually do work within a single dbt invocation, because the adapter returns the same cached cursor for every query.

So all queries within a dbt invocation share one cursor, and temp tables are visible across all of them. I verified this locally with both test_artifacts and test_schema_changes. The latest commit (2af51df) changes has_temp_table_support to return true.
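The cursor-caching behavior described above can be modeled with a toy wrapper. The class names below only mirror dbt-duckdb's described structure (DuckDBConnectionWrapper returning a cached cursor); this is a hypothetical model, not the adapter's real code:

```python
class FakeConnection:
    """Stand-in for a DB connection whose cursors carry private state."""
    def __init__(self):
        self.cursors_created = 0

    def new_cursor(self):
        self.cursors_created += 1
        # Temp tables are scoped to the cursor that created them.
        return {"temp_tables": set()}

class CachedCursorWrapper:
    # Mirrors the described behavior: cursor() always returns the one
    # cached instance, so temp tables created by an earlier query stay
    # visible to every later query in the same invocation.
    def __init__(self, connection):
        self._connection = connection
        self._cursor = connection.new_cursor()

    def cursor(self):
        return self._cursor
```

Under this model, only one cursor is ever created, which is exactly the condition that makes returning true from has_temp_table_support safe; if a future adapter version created a fresh cursor per call, the temp tables would become invisible again.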
feat: add DuckDB support to CI with macro dispatches
Summary
Adds DuckDB as a new CI warehouse target with 20+ duckdb__ macro dispatch overrides. DuckDB runs in :memory: mode. Key implementation decisions:

- Temp tables (has_temp_table_support → true): The dbt-duckdb adapter reuses a single cached cursor instance (DuckDBConnectionWrapper.cursor() returns self._cursor), so temp tables are visible across all queries within a dbt invocation.
- Explicit adapter.commit() for DuckDB after DML operations to survive dbt's post-on-run-end ROLLBACK. Applied in delete_and_insert, handle_tests_results (both insert_data_monitoring_metrics and insert_schema_columns_snapshot), create_or_replace, and replace_table_data. DuckDB can't use SQL-level BEGIN TRANSACTION; ... COMMIT; like Postgres, and doesn't auto-commit per statement like Redshift.
- CREATE OR REPLACE TABLE used instead of drop-then-create for atomicity.
- duckdb__full_name_split: DuckDB's split() returns 1-indexed arrays (unlike the default, which uses 0-indexed access). Uses split_part() with correct 1-indexed access for database/schema/table name extraction.
- duckdb__get_normalized_data_type: Maps DuckDB type synonyms (TEXT→VARCHAR, INT→INTEGER, DOUBLE PRECISION→DOUBLE, etc.) so schema change detection compares types correctly.
- dayname() instead of to_char() for day-of-week; EXTRACT(hour from ...) for hour extraction.
- duckdb__get_clean_elementary_test_tables_queries uses individual DROP statements (no BEGIN TRANSACTION wrapping) since DuckDB doesn't support multi-statement transaction blocks in a single SQL string.
- Fixed test_schema_changes and test_schema_changes_from_baseline. No DuckDB-specific skips remain.
Macro overrides added

buckets_cte, target_database, timeadd, escape_special_chars, delete_and_insert, generate_elementary_profile_args, get_tables_from_information_schema, get_columns_from_information_schema, data_type_list, get_normalized_data_type, has_temp_table_support, create_table_as, create_or_replace, replace_table_data, day_of_week, hour_of_week, edr_drop_schema, full_name_split, get_clean_elementary_test_tables_queries

Updates since last revision
- … (generate_profiles.py, adapter_query_runner.py, improved env.py error handling).
- profiles.yml.j2 template: removed the hardcoded "Write DuckDB profiles" workflow step and the if: inputs.warehouse-type != 'duckdb' guard. DuckDB now uses the same generate_profiles.py code path as all other adapters.
- test-all-warehouses.yml: the category now includes both Docker-based adapters (postgres, clickhouse, trino, dremio) and fully in-process adapters (duckdb). DuckDB runs in the test-local job (no secrets needed, no approval gate), while cloud adapters remain in test-cloud.
- replace_table_data now uses TRUNCATE TABLE instead of delete from ... where 1=1 for the DuckDB dispatch, aligning with standard truncation semantics.
- Removed the elementary[postgres] extras mapping for DuckDB in the workflow. DuckDB now installs as elementary (no extra) since dbt-duckdb is already installed by the "Install dbt" step. An elementary[duckdb] extra will be added to the elementary package in a follow-up PR.
- Temp tables (has_temp_table_support → true): Investigation revealed that dbt-duckdb reuses the same cursor instance for all queries within a dbt invocation, so temp tables are visible across queries. Updated duckdb__edr_get_create_table_as_sql to respect the temporary flag. Removed now-unnecessary cleanup commit blocks from handle_tests_results.sql (temp tables are auto-dropped by dbt).

Review & Testing Checklist for Human
- The has_temp_table_support=true change relies on dbt-duckdb's DuckDBConnectionWrapper.cursor() returning a cached cursor instance. If this implementation detail changes in future dbt-duckdb versions, temp tables could become invisible across queries. Consider adding a comment in the code or documentation about this dependency.
- Verify adapter.commit() appears after all DML operations that need to survive dbt's post-on-run-end ROLLBACK. Check delete_and_insert.sql (after DML + after cleanup drops), handle_tests_results.sql (after inserts only), create_or_replace.sql, and replace_table_data.sql.
- duckdb__full_name_split correctness: Confirm split_part(full_table_name, '.', 1/2/3) correctly extracts database/schema/table for DuckDB's 1-indexed arrays. Consider edge cases where full_table_name may not have all 3 parts.
- Verify profiles.yml.j2 renders correctly for DuckDB with the :memory: path and that generate_profiles.py handles DuckDB without requiring secrets (should work on fork PRs). Test locally: python integration_tests/profiles/generate_profiles.py --template integration_tests/profiles/profiles.yml.j2 --output /tmp/test_profiles.yml --schema-name test_schema and inspect the output.
- get_normalized_data_type completeness: Check the type synonym mapping against DuckDB's type docs. Missing synonyms could cause false-positive schema change alerts.

Notes
- DuckDB's split() function returns 1-indexed arrays (index 0 = None), but default__full_name_split used 0-indexed access. This caused the table_name field to be populated with the schema name instead of the table name, breaking the _read_test_results WHERE filter. Fixed by adding duckdb__full_name_split with split_part() (1-indexed, like Postgres).
- The duckdb__get_normalized_data_type mapping normalizes TEXT→VARCHAR, so the baseline type comparison works correctly even though STRING_JINJA evaluates to TEXT for DuckDB.
- :memory: mode: Using in-memory DuckDB for speed. Explicit adapter.commit() calls ensure data persists despite dbt's ROLLBACK behavior on test connections.
- CREATE OR REPLACE TABLE: DuckDB supports this natively, so we use it instead of drop-then-create for cleaner atomicity.
- Postgres wraps BEGIN TRANSACTION; ... COMMIT; inside query strings, and Redshift auto-commits each statement. DuckDB runs under an implicit transaction and doesn't support multi-statement BEGIN/COMMIT blocks, so we use Python-level adapter.commit() instead.
- Temp table support was initially false due to assumed cursor visibility issues. Investigation revealed that dbt-duckdb's LocalEnvironment.handle() creates one cursor and wraps it in DuckDBConnectionWrapper, which returns the same cached cursor for all connection.handle.cursor() calls. This means all queries share one cursor, so temp tables work correctly. Verified locally with test_artifacts (14 tests) and test_schema_changes (2 tests).

Link to Devin run: https://app.devin.ai/sessions/7f53c258c1cd4ad39cb015a8b8a38452
Requested by: @haritamar