feat: add Microsoft Fabric DWH integration (ELE-5282) #962
Add support for Microsoft Fabric Data Warehouse as a new adapter in the elementary dbt package. Fabric uses T-SQL syntax and is supported via the dbt-fabric adapter maintained by Microsoft.

Changes:
- Add fabric__ macro overrides for date/time functions (datediff, date_trunc, time_trunc, to_char, timeadd, dateadd, current_timestamp)
- Add fabric__ data type overrides (data_type_list, type_timestamp, type_string, type_bool, get_normalized_data_type, try_cast_column_to_timestamp)
- Add fabric__ table operations (complete_buckets_cte with recursive CTE, escape_special_chars)
- Add fabric__ cross-DB utilities (target_database, incremental_strategy, generate_elementary_profile_args, day_of_week, hour_of_day, hour_of_week)
- Add fabric__ nested CTE workarounds (query_test_result_rows, get_failed_row_count_calc_query) using temp tables to avoid nested CTEs and TOP instead of LIMIT
- Add docker-compose-sqlserver.yml for local testing with env var credentials
- Add fabric target to profiles.yml.j2
- Update CI workflows (test-warehouse.yml, test-all-warehouses.yml) with SQL Server Docker setup, ODBC driver installation, and fabric target

Co-Authored-By: Itamar Hartstein <haritamar@gmail.com>
📝 Walkthrough

Adds Fabric (SQL Server) support: CI/workflow inputs and secrets, Dockerized SQL Server test service and healthchecks, a profile/template env helper, and extensive Fabric-specific macros and T-SQL-compatible materialization paths across EDR, utils, and data-type macros.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant User as User
    participant CI as CI Workflow
    participant Runner as Actions Runner
    participant Docker as Docker (sqlserver)
    participant Profiles as Profiles Renderer
    participant DBT as dbt/Elementary
    User->>CI: dispatch workflow (warehouse-type=fabric)
    CI->>Runner: start job
    Runner->>Docker: docker-compose up (sqlserver)
    Docker-->>Runner: healthcheck OK
    Runner->>Runner: install msodbcsql18, dbt + extras (fabric/sqlserver)
    Runner->>Profiles: render profiles (env_var, pass MSSQL_SA_PASSWORD)
    Profiles-->>Runner: profiles.yml
    Runner->>DBT: run dbt with fabric profile
    DBT->>Docker: connect via ODBC/T-SQL, execute `fabric__*` macros & temp-table flows
    DBT-->>Runner: tests/results
    Runner-->>CI: report status
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks: ✅ 3 passed
generate_profiles.py uses Python Jinja2 (not dbt Jinja2), so the env_var() function used in profiles.yml.j2 for the fabric password was undefined. Add env_var as a Jinja2 global that reads from os.environ.

Co-Authored-By: Itamar Hartstein <haritamar@gmail.com>
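The helper this commit describes can be sketched as a plain Python function. This is a minimal illustration, not the actual code in generate_profiles.py; the real script exposes it to templates through a jinja2.Environment:

```python
import os

def env_var(key, default=""):
    # Mirrors the global added in generate_profiles.py: read from the
    # process environment, falling back to a default when the var is unset.
    return os.environ.get(key, default)

# In the real script this is wired into the template environment roughly as:
#   env.globals["env_var"] = env_var
# so profiles.yml.j2 can render "{{ env_var('MSSQL_SA_PASSWORD') }}".
```

Note that this permissive version silently returns "" for missing variables with no default, which a later review comment flags as diverging from dbt's fail-fast env_var semantics.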
…os to sqlserver__

- Changed profile type from 'fabric' to 'sqlserver' for SQL auth support
- Renamed all fabric__ macro overrides to sqlserver__ (dbt-fabric dispatches to sqlserver as a fallback, so these work with both adapters)
- Updated CI to install dbt-sqlserver instead of dbt-fabric
- Added encrypt: false to profile for local SQL Server connection

Co-Authored-By: Itamar Hartstein <haritamar@gmail.com>
Actionable comments posted: 6
🧹 Nitpick comments (2)
macros/edr/materializations/test/test.sql (1)
205-212: Style consistency: consider using elementary.run_query throughout.

Lines 206 and 212 use run_query, while line 211 uses elementary.run_query. Since the codebase consistently uses elementary.run_query in other table operations, standardize to that form for maintainability. Note: both ultimately call the same dbt.run_query and maintain the same session context, so temp table visibility is not a concern.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@macros/edr/materializations/test/test.sql` around lines 205-212: replace the two raw run_query calls with the consistent wrapper elementary.run_query — change the calls that create and drop the temp table (the run_query invocations surrounding tmp_relation) to use elementary.run_query instead, keeping the same SQL strings. Ensure the rest of the block that builds the query (the {% set query %} ... select ... from {{ tmp_relation }} and the usage of elementary.agate_to_dicts(elementary.run_query(query))) remains unchanged so behavior and session context are preserved.

macros/utils/cross_db_utils/generate_elementary_profile_args.sql (1)
270: Consider multi-line formatting for consistency.

The cli method handling is condensed to a single line while other branches use multi-line formatting. This is a minor readability nit.

♻️ Optional: Multi-line formatting

```diff
-  {% if method == "cli" %} {% do parameters.append(_parameter("method", "cli")) %}
+  {% if method == "cli" %}
+    {% do parameters.append(_parameter("method", "cli")) %}
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@macros/utils/cross_db_utils/generate_elementary_profile_args.sql` at line 270: the single-line Jinja branch for method == "cli" should be rewritten to match the multi-line style used elsewhere — expand the conditional into separate lines so {% if method == "cli" %} is on its own line, {% do parameters.append(_parameter("method", "cli")) %} is on the next line, and close with {% endif %}. This keeps the handling of method, parameters.append, and _parameter("cli") consistent with the other branches for readability.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@integration_tests/profiles/profiles.yml.j2`:
- Line 73: The Jinja expression requiring MSSQL_SA_PASSWORD causes rendering to
fail when the env var is missing; update the password template (the line
containing password: "{{ env_var('MSSQL_SA_PASSWORD') }}" ) to provide a safe
fallback by using env_var with a default value or Jinja's default filter (e.g.,
env_var('MSSQL_SA_PASSWORD', 'fallback') or env_var('MSSQL_SA_PASSWORD') |
default('fallback')) so local/profile rendering succeeds when the environment
variable is not set.
In `@macros/edr/materializations/test/failed_row_count.sql`:
- Around line 62-68: The macro fabric__get_failed_row_count_calc_query currently
calls a non-existent helper elementary.make_temp_relation(model) — replace that
call with the correct temp-relation helper (choose one of
edr_make_temp_relation, make_temp_table_relation, or make_temp_view_relation
depending on whether you need a temp table vs view) and update the tmp_relation
assignment accordingly; also ensure the temp table is cleaned up by adding `{%
do run_query("drop table if exists " ~ tmp_relation) %}` after the final SELECT
to prevent orphaned tables (refer to the tmp_relation variable and the existing
run_query("select * into " ~ tmp_relation ~ ...) usage to place the cleanup).
In `@macros/edr/system/system_utils/buckets_cte.sql`:
- Around line 241-247: The recursive CTE "timestamps" (using
min_bucket_start_expr, bucket_end_expr and max_bucket_end_expr) and the
SELECT-list alias next_bucket in the WHERE clause are incompatible with
Microsoft Fabric Warehouse; replace the recursive generation with a
Fabric-compatible approach (e.g., build bucket rows by CROSS JOINing against a
numbers/sequence source or a VALUES list and compute edr_bucket_start from
min_bucket_start_expr + n * bucket_interval) and remove use of the SELECT alias
in predicates by repeating the expression or computing it in a derived table, or
alternatively mark this macro as SQL Server–only and remove the fabric__ prefix
so it is not run on Fabric. Ensure the code references the same symbols
(min_bucket_start_expr, bucket_end_expr, max_bucket_end_expr,
edr_bucket_start/next_bucket) so callers still work.
- Line 253: The recursive CTE currently ends with "option (maxrecursion 10000)"
which can overflow for long ranges; update the recursion option to "MAXRECURSION
0" to allow unlimited recursion and rely on the existing termination condition
"where next_bucket < {{ max_bucket_end_expr }}" to stop recursion; edit the
buckets CTE in buckets_cte.sql (the clause containing option (maxrecursion
10000) and the recursive column next_bucket) to replace 10000 with 0 so large
1-minute bucket ranges do not error.
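The non-recursive alternative the comments above suggest — generating bucket starts as min_bucket_start + n * bucket_interval until max_bucket_end — can be sketched in Python. This is an illustration of the numbers-table idea only; the function name and signature are not from the macro itself:

```python
from datetime import datetime, timedelta

def bucket_starts(min_bucket_start, max_bucket_end, bucket_interval):
    # Equivalent of joining against a numbers/sequence source: bucket n
    # starts at min_bucket_start + n * bucket_interval, for every start
    # strictly before max_bucket_end. No recursion, so no MAXRECURSION cap.
    buckets = []
    n = 0
    while True:
        start = min_bucket_start + n * bucket_interval
        if start >= max_bucket_end:
            break
        buckets.append(start)
        n += 1
    return buckets

starts = bucket_starts(
    datetime(2024, 1, 1, 0, 0),
    datetime(2024, 1, 1, 3, 0),
    timedelta(hours=1),
)
```

In SQL the same arithmetic would be expressed with DATEADD against a generated sequence of n values, sidestepping both the recursive CTE and the alias-in-WHERE problems flagged above.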
In `@macros/utils/cross_db_utils/incremental_strategy.sql`:
- Around line 23-25: The macro fabric__get_default_incremental_strategy
currently returns the hardcoded string "merge"; change it to follow the shared
strategy contract by returning "none" for non-Athena adapters and delegating to
the Athena-specific helper for Athena: update
fabric__get_default_incremental_strategy to detect the adapter type (or call the
existing get_append_only_incremental_strategy() for Athena) and return "none"
otherwise so adapter defaults are preserved.
In `@macros/utils/cross_db_utils/to_char.sql`:
- Around line 27-32: The fabric__edr_to_char macro currently injects the format
unquoted into CONVERT, which breaks when callers pass pattern-style strings;
update the macro (fabric__edr_to_char) to emit the format argument quoted (e.g.,
wrap {{ format }} in single quotes) when present while keeping the existing
numeric fallback 120 unchanged, so CONVERT(varchar, {{ column }}, '{{ format
}}') is produced for custom formats and CONVERT(varchar, {{ column }}, 120)
remains the fallback.
---
Nitpick comments:
In `@macros/edr/materializations/test/test.sql`:
- Around line 205-212: Replace the two raw run_query calls with the consistent
wrapper elementary.run_query: change the calls that reference creating and
dropping the temp table (the run_query invocations surrounding tmp_relation) to
use elementary.run_query instead, keeping the same SQL strings; ensure the rest
of the block that builds query (the {% set query %} ... select ... from {{
tmp_relation }} and the usage of
elementary.agate_to_dicts(elementary.run_query(query))) remains unchanged so
behavior and session context are preserved.
In `@macros/utils/cross_db_utils/generate_elementary_profile_args.sql`:
- Line 270: The single-line Jinja branch for method == "cli" should be rewritten
to match the multi-line style used elsewhere: expand the conditional into
separate lines so the {% if method == "cli" %} is on its own line, the {% do
parameters.append(_parameter("method", "cli")) %} is on the next line, and close
with {% endif %}; this keeps the handling of method, parameters.append and
_parameter("cli") consistent with other branches for readability.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 98b8c87c-9b9b-45a1-8964-0c755b95550d
📒 Files selected for processing (25)
- .github/workflows/test-all-warehouses.yml
- .github/workflows/test-warehouse.yml
- integration_tests/docker-compose-sqlserver.yml
- integration_tests/profiles/profiles.yml.j2
- macros/edr/materializations/test/failed_row_count.sql
- macros/edr/materializations/test/test.sql
- macros/edr/system/system_utils/buckets_cte.sql
- macros/utils/cross_db_utils/current_timestamp.sql
- macros/utils/cross_db_utils/date_trunc.sql
- macros/utils/cross_db_utils/dateadd.sql
- macros/utils/cross_db_utils/datediff.sql
- macros/utils/cross_db_utils/day_of_week.sql
- macros/utils/cross_db_utils/generate_elementary_profile_args.sql
- macros/utils/cross_db_utils/hour_of_day.sql
- macros/utils/cross_db_utils/hour_of_week.sql
- macros/utils/cross_db_utils/incremental_strategy.sql
- macros/utils/cross_db_utils/target_database.sql
- macros/utils/cross_db_utils/time_trunc.sql
- macros/utils/cross_db_utils/timeadd.sql
- macros/utils/cross_db_utils/to_char.sql
- macros/utils/data_types/data_type.sql
- macros/utils/data_types/data_type_list.sql
- macros/utils/data_types/get_normalized_data_type.sql
- macros/utils/data_types/try_cast_column_to_timestamp.sql
- macros/utils/table_operations/insert_rows.sql
Actionable comments posted: 7
♻️ Duplicate comments (3)
macros/utils/cross_db_utils/incremental_strategy.sql (1)
23-25: ⚠️ Potential issue | 🟠 Major — Avoid hardcoding merge for the SQL Server default strategy.

At line 24, returning "merge" reintroduces the same behavior drift previously flagged; non-Athena adapters should return none so adapter defaults apply.

Suggested fix

```diff
 {%- macro sqlserver__get_default_incremental_strategy() %}
-  {% do return("merge") %}
+  {% do return(none) %}
 {% endmacro %}
```

Based on learnings: "incremental_strategy should return none for all adapters except Athena, where get_append_only_incremental_strategy() is used."

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@macros/utils/cross_db_utils/incremental_strategy.sql` around lines 23-25: the macro sqlserver__get_default_incremental_strategy currently returns the hardcoded string "merge"; change it to return none so SQL Server defers to the adapter's default incremental strategy (consistent with the rule that only Athena uses get_append_only_incremental_strategy). Update the return in sqlserver__get_default_incremental_strategy and ensure no other hardcoded "merge" defaults are introduced for non-Athena adapters.

macros/edr/system/system_utils/buckets_cte.sql (1)
244-246: ⚠️ Potential issue | 🔴 Critical — Fix invalid alias usage in the recursive predicate.

Line 246 references next_bucket in the WHERE clause, but in T-SQL a SELECT-list alias is not visible in the same query block's WHERE. This can fail at compile or run time.

Suggested fix

```diff
-        select {{ bucket_end_expr }} as next_bucket
+        select {{ bucket_end_expr }} as edr_bucket_start
         from timestamps
-        where next_bucket < {{ max_bucket_end_expr }}
+        where {{ bucket_end_expr }} < {{ max_bucket_end_expr }}
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@macros/edr/system/system_utils/buckets_cte.sql` around lines 244-246: the WHERE clause incorrectly references the SELECT-list alias next_bucket (produced by {{ bucket_end_expr }}), which is not visible in the same query block. Update the predicate to use the actual expression instead of the alias (i.e., replace next_bucket in the WHERE with the same {{ bucket_end_expr }} expression, or wrap the SELECT in a subquery/CTE so the alias becomes available), targeting the query that selects from timestamps and uses {{ max_bucket_end_expr }} to enforce the upper bound.

macros/edr/materializations/test/failed_row_count.sql (1)
62-73: ⚠️ Potential issue | 🔴 Critical — Missing temp table cleanup will leak orphaned tables.

Unlike test.sql (line 214), this macro creates a temp table but never drops it. Add cleanup after the SELECT statement. Additionally, the elementary.make_temp_relation(model) call was flagged in a previous review as potentially undefined.

🐛 Proposed fix to add cleanup

```diff
 {% macro sqlserver__get_failed_row_count_calc_query(failed_row_count_calc) %}
   {# Fabric / T-SQL does not support nested CTEs. We create a temp table from the test SQL, then select from it. #}
   {% set tmp_relation = elementary.make_temp_relation(model) %}
   {% do run_query( "select * into " ~ tmp_relation ~ " from (" ~ sql ~ ") as __edr_inner" ) %}
+  {% set query %}
   select {{ failed_row_count_calc }} as {{ elementary.escape_reserved_keywords("count") }} from {{ tmp_relation }}
+  {% endset %}
+  {% set result = elementary.run_query(query) %}
+  {% do run_query("drop table if exists " ~ tmp_relation) %}
+  {{ query }}
 {% endmacro %}
```

Note: the fix above is illustrative. The macro returns raw SQL (not a result set), so the actual fix depends on how the caller executes it. If the SQL is executed elsewhere, consider documenting that cleanup is the caller's responsibility, or restructure to handle cleanup internally.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@macros/edr/materializations/test/failed_row_count.sql` around lines 62 - 73, The macro sqlserver__get_failed_row_count_calc_query creates a temp table (tmp_relation via elementary.make_temp_relation(model)) but never drops it, which will leak tables; after the SELECT from {{ tmp_relation }} add a DROP TABLE (or DROP TABLE IF EXISTS / appropriate T-SQL-safe drop) for {{ tmp_relation }} to ensure cleanup, and either ensure model is defined in scope or change the macro signature to accept model (e.g., add model as a parameter) so elementary.make_temp_relation(model) is valid; if you prefer caller-managed cleanup, document that behavior instead.
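The create/use/always-drop lifecycle this comment asks for is naturally expressed as a context manager. A sketch under stated assumptions: run_query here is a hypothetical recording stub standing in for dbt's run_query, and the table name and SQL are illustrative:

```python
from contextlib import contextmanager

executed = []

def run_query(sql):
    # Hypothetical stub standing in for dbt's run_query; records statements
    # so the cleanup ordering can be inspected.
    executed.append(sql)

@contextmanager
def temp_table(name, select_sql):
    # Materialize the query into a temp table, and guarantee the drop runs
    # even if the body raises, so no orphaned tables are left behind.
    run_query(f"select * into {name} from ({select_sql}) as __edr_inner")
    try:
        yield name
    finally:
        run_query(f"drop table if exists {name}")

with temp_table("#edr_tmp", "select 1 as failed") as tmp:
    run_query(f"select count(*) from {tmp}")
```

In Jinja macros there is no try/finally, which is why the review instead asks for an explicit drop statement placed after the final SELECT.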
🧹 Nitpick comments (2)
macros/utils/cross_db_utils/dateadd.sql (1)
19-29: Comment block is misplaced relative to the new macro.

The docstring at lines 19-26 documents the Dremio override's bugs, but it now appears directly above sqlserver__edr_dateadd, which could mislead readers into thinking the comment applies to the SQL Server macro. Consider adding a blank line and a brief comment for the sqlserver macro, or relocating the Dremio docstring to sit immediately above dremio__edr_dateadd.

📝 Suggested reordering

```diff
-{#
-  Override dbt-dremio's dateadd macro which has two bugs:
-  1. Calls interval.replace() on the interval parameter, failing when interval is an integer
-  2. Wraps result in "select TIMESTAMPADD(...)" which creates a scalar subquery when
-     embedded in larger SQL expressions, causing $SCALAR_QUERY errors in Dremio
-
-  This override outputs just TIMESTAMPADD(...) as an expression (no "select" prefix).
-#}
+{# SQL Server / Fabric: use native DATEADD function #}
 {% macro sqlserver__edr_dateadd(datepart, interval, from_date_or_timestamp) %}
   dateadd({{ datepart }}, {{ interval }}, {{ from_date_or_timestamp }})
 {% endmacro %}

+{#
+  Override dbt-dremio's dateadd macro which has two bugs:
+  1. Calls interval.replace() on the interval parameter, failing when interval is an integer
+  2. Wraps result in "select TIMESTAMPADD(...)" which creates a scalar subquery when
+     embedded in larger SQL expressions, causing $SCALAR_QUERY errors in Dremio
+
+  This override outputs just TIMESTAMPADD(...) as an expression (no "select" prefix).
+#}
 {% macro dremio__edr_dateadd(datepart, interval, from_date_or_timestamp) %}
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@macros/utils/cross_db_utils/dateadd.sql` around lines 19-29: the documentation block describing the Dremio override is placed above the sqlserver__edr_dateadd macro and may mislead readers; move the existing docstring so it immediately precedes the dremio__edr_dateadd macro, and add a short one-line comment or a blank line above sqlserver__edr_dateadd to indicate it is a separate SQL Server-specific implementation. Ensure the docstring text remains unchanged and that references to the two macros (dremio__edr_dateadd and sqlserver__edr_dateadd) are clearly separated in the file.

.github/workflows/test-warehouse.yml (1)
116-122: Consider pinning the ODBC driver version for CI reproducibility.

The installation pulls the latest msodbcsql18 package, which could introduce unexpected behavior if Microsoft releases a breaking change. For more deterministic CI builds, consider pinning to a specific version.

♻️ Optional: Pin ODBC driver version

```diff
-          sudo ACCEPT_EULA=Y apt-get install -y msodbcsql18 unixodbc-dev
+          sudo ACCEPT_EULA=Y apt-get install -y msodbcsql18=18.4.1.1-1 unixodbc-dev
```

Note: check the latest stable version and update accordingly.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.github/workflows/test-warehouse.yml around lines 116 - 122, The workflow step installs msodbcsql18 without pinning, risking CI breakage; update the Install ODBC Driver run block to install a specific msodbcsql18 version by replacing the unpinned package name (msodbcsql18) with a pinned package spec (e.g., msodbcsql18=<VERSION>) or by introducing a variable/parameter for the ODBC driver version and using that in the apt install command; ensure unixodbc-dev remains as-is and document/update the chosen <VERSION> to a known stable release.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In @.github/workflows/test-warehouse.yml:
- Line 55: Replace the hardcoded fallback value 'Elementary123!' used for
MSSQL_SA_PASSWORD in the workflow with a more generic/weak local-only default
(e.g., 'Password1!') or remove the inline fallback and fail the job when the
secret is missing; update the MSSQL_SA_PASSWORD assignment so it either uses a
documented local-only default or requires the secret to be set, and add a short
comment clarifying that any default is for ephemeral/local CI only.
In `@integration_tests/profiles/generate_profiles.py`:
- Line 126: The current env.globals["env_var"] lambda silently returns "" when
no default is provided; replace it with a real function (e.g., env_var) that
distinguishes between an omitted default and an explicit default: if a default
is supplied, return os.environ.get(key, default); if no default was supplied and
the environment variable is missing, raise an error (e.g., KeyError or a clear
RuntimeError) so templates like env_var('MSSQL_SA_PASSWORD') fail fast in strict
mode; update the reference to env.globals["env_var"] to point to this new
function.
In `@macros/edr/materializations/test/test.sql`:
- Around line 191-216: The call to elementary.make_temp_relation in the
sqlserver__query_test_result_rows macro is incorrect and should use the renamed
helper elementary.edr_make_temp_relation; update the tmp_relation assignment to
call elementary.edr_make_temp_relation(model) instead of
elementary.make_temp_relation(model) so tmp_relation, the temp-table creation
(run_query "select * into " ~ tmp_relation ~ ...), and the subsequent DROP use
the correct helper name.
In `@macros/utils/cross_db_utils/generate_elementary_profile_args.sql`:
- Around line 270-279: The template currently treats any non-matching method
(including typos) as username/password; update the logic in
generate_elementary_profile_args.sql so that you handle three cases explicitly:
when method is none (append no auth params or a clear placeholder like
_parameter("method", "<NONE>")), when method == "cli" or "service-principal"
(keep the existing branches that append parameters using _parameter and
parameters.append), and a new else branch for unsupported methods that appends a
clear placeholder/instruction parameter (e.g., _parameter("method",
"<UNSUPPORTED: specify 'cli' or 'service-principal' or 'none'>")) rather than
silently inserting user/password; reference the symbols method,
parameters.append, and _parameter to locate and change the conditional branches.
In `@macros/utils/cross_db_utils/hour_of_week.sql`:
- Around line 83-84: Replace the locale-dependent datename(weekday, {{ date_expr
}}) with a locale-stable day calculation: compute weekday as ((DATEDIFF(day,
'1900-01-01', CAST({{ date_expr }} AS date)) % 7) + 1) and cast that to {{
elementary.edr_type_string() }} (keep the existing cast for the hour using
datepart(hour, {{ date_expr }})); ensure you CAST({{ date_expr }} AS date)
before DATEDIFF to avoid time-boundary shifts and update the expression that
currently uses datename(weekday, {{ date_expr }}) to this deterministic formula.
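The deterministic weekday formula this comment proposes can be checked directly: since 1900-01-01 was a Monday, days-since-1900 mod 7 plus 1 yields Monday = 1 through Sunday = 7 regardless of server locale. A small Python sketch of the same arithmetic:

```python
from datetime import date

def day_of_week(d):
    # Locale-independent equivalent of the suggested T-SQL expression:
    #   ((DATEDIFF(day, '1900-01-01', CAST(expr AS date)) % 7) + 1)
    # 1900-01-01 was a Monday, so Monday -> 1 ... Sunday -> 7.
    return ((d - date(1900, 1, 1)).days % 7) + 1
```

By contrast, datename(weekday, ...) returns a language-dependent string ("Monday" vs "Montag"), which is why the review asks for the arithmetic form.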
In `@macros/utils/data_types/get_normalized_data_type.sql`:
- Around line 161-167: The mapping in the get_normalized_data_type macro
currently normalizes distinct SQL Server types (e.g., "DATETIME",
"SMALLDATETIME", "MONEY", "SMALLMONEY") to other canonical types, which hides
real schema differences; remove the mapping entries for "DATETIME",
"SMALLDATETIME", "MONEY", and "SMALLMONEY" from the normalization dictionary so
they remain their native types during comparisons, but keep true SQL Server
synonym mappings such as "NVARCHAR", "NCHAR", "NTEXT", "ROWVERSION", "DOUBLE
PRECISION", and "REAL" in the same mapping structure used by the
get_normalized_data_type macro.
In `@macros/utils/data_types/try_cast_column_to_timestamp.sql`:
- Around line 32-39: The WHERE clause in try_cast_column_to_timestamp.sql can
return a TOP 1 row that fails try_cast() because try_cast returns NULL on
failure; update the query that sets "query" (which uses {{ timestamp_column }},
{{ table_relation }}, and {{ elementary.edr_type_timestamp() }}) to add "and
try_cast({{ timestamp_column }} as {{ elementary.edr_type_timestamp() }}) is not
null" so only successfully castable rows are considered before calling
elementary.result_value(); keep the rest of the logic (result handling with
result_value() and return(true)/return(false)) unchanged.
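The semantics behind this fix — try_cast returns NULL on failure rather than raising, so rows must be filtered with "is not null" before picking TOP 1 — can be sketched in Python. datetime.fromisoformat stands in for T-SQL's far more permissive parser (an assumption for illustration only):

```python
from datetime import datetime

def try_cast_timestamp(value):
    # Like T-SQL try_cast(value as datetime2): return None instead of
    # raising when the value cannot be parsed as a timestamp.
    try:
        return datetime.fromisoformat(value)
    except (TypeError, ValueError):
        return None

rows = ["2024-01-01T00:00:00", "not-a-date", None, "2024-06-01 12:30:00"]
# Filtering on a successful cast first means TOP 1 can never land on a row
# whose cast silently failed.
castable = [v for v in rows if try_cast_timestamp(v) is not None]
```

Without the filter, an unlucky TOP 1 row would make the whole check report the column as non-castable even when most values parse fine.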
---
Duplicate comments:
In `@macros/edr/materializations/test/failed_row_count.sql`:
- Around line 62-73: The macro sqlserver__get_failed_row_count_calc_query
creates a temp table (tmp_relation via elementary.make_temp_relation(model)) but
never drops it, which will leak tables; after the SELECT from {{ tmp_relation }}
add a DROP TABLE (or DROP TABLE IF EXISTS / appropriate T-SQL-safe drop) for {{
tmp_relation }} to ensure cleanup, and either ensure model is defined in scope
or change the macro signature to accept model (e.g., add model as a parameter)
so elementary.make_temp_relation(model) is valid; if you prefer caller-managed
cleanup, document that behavior instead.
In `@macros/edr/system/system_utils/buckets_cte.sql`:
- Around line 244-246: The WHERE is incorrectly referencing the SELECT-list
alias next_bucket (produced by {{ bucket_end_expr }}) which is not visible in
the same query block; update the predicate to use the actual expression instead
of the alias (i.e., replace next_bucket in the WHERE with the same {{
bucket_end_expr }} expression or wrap the SELECT as a subquery/CTE so the alias
becomes available), targeting the query that selects from timestamps and uses {{
max_bucket_end_expr }} to enforce the upper bound.
In `@macros/utils/cross_db_utils/incremental_strategy.sql`:
- Around line 23-25: The macro sqlserver__get_default_incremental_strategy
currently returns the hardcoded string "merge"; change it to return "none" so
SQL Server defers to the adapter's default incremental strategy (consistent with
the rule that only Athena uses get_append_only_incremental_strategy); update the
return in sqlserver__get_default_incremental_strategy to "none" and ensure no
other hardcoded "merge" defaults are introduced for non-Athena adapters.
---
Nitpick comments:
In @.github/workflows/test-warehouse.yml:
- Around line 116-122: The workflow step installs msodbcsql18 without pinning,
risking CI breakage; update the Install ODBC Driver run block to install a
specific msodbcsql18 version by replacing the unpinned package name
(msodbcsql18) with a pinned package spec (e.g., msodbcsql18=<VERSION>) or by
introducing a variable/parameter for the ODBC driver version and using that in
the apt install command; ensure unixodbc-dev remains as-is and document/update
the chosen <VERSION> to a known stable release.
In `@macros/utils/cross_db_utils/dateadd.sql`:
- Around line 19-29: The documentation block describing the Dremio override is
placed above the sqlserver__edr_dateadd macro and may mislead readers; move the
existing docstring so it immediately precedes the dremio__edr_dateadd macro, and
add a short one-line comment or a blank line above sqlserver__edr_dateadd to
indicate it is a separate SQL Server-specific implementation. Ensure the
docstring text remains unchanged and that references to the two macros
(dremio__edr_dateadd and sqlserver__edr_dateadd) are clearly separated in the
file.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 7d8ad01e-d735-4aa7-8a79-b1c603c9bb7c
📒 Files selected for processing (24)
- .github/workflows/test-warehouse.yml
- integration_tests/profiles/generate_profiles.py
- integration_tests/profiles/profiles.yml.j2
- macros/edr/materializations/test/failed_row_count.sql
- macros/edr/materializations/test/test.sql
- macros/edr/system/system_utils/buckets_cte.sql
- macros/utils/cross_db_utils/current_timestamp.sql
- macros/utils/cross_db_utils/date_trunc.sql
- macros/utils/cross_db_utils/dateadd.sql
- macros/utils/cross_db_utils/datediff.sql
- macros/utils/cross_db_utils/day_of_week.sql
- macros/utils/cross_db_utils/generate_elementary_profile_args.sql
- macros/utils/cross_db_utils/hour_of_day.sql
- macros/utils/cross_db_utils/hour_of_week.sql
- macros/utils/cross_db_utils/incremental_strategy.sql
- macros/utils/cross_db_utils/target_database.sql
- macros/utils/cross_db_utils/time_trunc.sql
- macros/utils/cross_db_utils/timeadd.sql
- macros/utils/cross_db_utils/to_char.sql
- macros/utils/data_types/data_type.sql
- macros/utils/data_types/data_type_list.sql
- macros/utils/data_types/get_normalized_data_type.sql
- macros/utils/data_types/try_cast_column_to_timestamp.sql
- macros/utils/table_operations/insert_rows.sql
🚧 Files skipped from review as they are similar to previous changes (7)
- macros/utils/cross_db_utils/day_of_week.sql
- integration_tests/profiles/profiles.yml.j2
- macros/utils/cross_db_utils/target_database.sql
- macros/utils/cross_db_utils/date_trunc.sql
- macros/utils/data_types/data_type.sql
- macros/utils/cross_db_utils/to_char.sql
- macros/utils/data_types/data_type_list.sql
```python
    keep_trailing_newline=True,
)
env.filters["toyaml"] = _yaml_inline
env.globals["env_var"] = lambda key, default="": os.environ.get(key, default)
```
🧩 Analysis chain
🏁 Script executed:
```shell
# First, locate and read the generate_profiles.py file around line 126
wc -l integration_tests/profiles/generate_profiles.py
```

Repository: elementary-data/dbt-data-reliability
🏁 Script executed:
```shell
# Read the file with context around line 126
sed -n '110,140p' integration_tests/profiles/generate_profiles.py | cat -n
```
🏁 Script executed:
```shell
# Look for usage of env_var in the Jinja templates
find integration_tests -name "*.j2" -o -name "*.yml.j2" | head -20
```
🏁 Script executed:
```shell
# Check for env_var usage patterns in templates
rg "env_var\(" integration_tests/profiles/ -C 2
```
🏁 Script executed:
```shell
# Get more context around line 117 in the file
sed -n '100,134p' integration_tests/profiles/generate_profiles.py | cat -n
```
🌐 Web query:
dbt env_var Jinja function behavior when variable not set no default
💡 Result:
In dbt, env_var() requires the environment variable to be set unless you pass a default:

- {{ env_var('MY_VAR') }} (no default) → if MY_VAR is not set, dbt errors with: Env var required but not provided: 'MY_VAR' (a hard failure during parsing/compilation). [1][2]
- {{ env_var('MY_VAR', 'some_default') }} → if MY_VAR is not set, dbt returns the provided default instead of erroring. [3]

So "when variable not set, no default" behavior = fail fast with an exception/error. [1][2]
Sources:
[1] https://airflow.apache.org/docs/apache-airflow-providers-openlineage/2.6.1/_modules/tests/system/openlineage/operator.html
[2] https://stackoverflow.com/questions/76653253/env-var-function-in-dbt-project-is-not-behaving-as-expected
[3] https://docs.open-metadata.org/latest/connectors/ingestion/workflows/dbt/auto-ingest-dbt-core
🏁 Script executed:
# Check if there are more usages of env_var in templates or documentation
rg "env_var" integration_tests/ -t yml -t yaml -C 3
Length of output: 107
🏁 Script executed:
# Check for all env_var usages in .j2 files more carefully
rg "env_var" integration_tests/profiles/
Length of output: 305
🏁 Script executed:
# Check if MSSQL_SA_PASSWORD or similar env vars are expected to be set
grep -r "MSSQL_SA_PASSWORD\|env_var" integration_tests/ --include="*.py" --include="*.md"
Length of output: 209
🏁 Script executed:
# See the full context of the function to understand when secrets are vs aren't loaded
sed -n '80,134p' integration_tests/profiles/generate_profiles.py | cat -n
Length of output: 2568
🏁 Script executed:
# Check all env_var calls in the entire codebase to see if any have explicit defaults
rg "env_var\(" --type py --type yaml -C 2
Length of output: 379
🏁 Script executed:
# Also check .j2 files specifically
fd "\.j2$" integration_tests/ | xargs cat | grep -n "env_var"
Length of output: 135
Preserve required env_var semantics instead of silently returning empty string.
At line 126, missing env vars silently resolve to "" even when no default is provided. The template at line 73 calls env_var('MSSQL_SA_PASSWORD') without an explicit default, which should fail in strict mode (when secrets are loaded) per dbt's documented behavior. The current implementation bypasses StrictUndefined's fail-fast guarantee.
🔧 Suggested fix
+_MISSING = object()
+
+def _env_var(key, default=_MISSING):
+    value = os.environ.get(key)
+    if value is None:
+        if default is _MISSING:
+            raise click.ClickException(f"Missing required environment variable: {key}")
+        return default
+    return value
+
 @@
-    env.globals["env_var"] = lambda key, default="": os.environ.get(key, default)
+    env.globals["env_var"] = _env_var

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@integration_tests/profiles/generate_profiles.py` at line 126, The current
env.globals["env_var"] lambda silently returns "" when no default is provided;
replace it with a real function (e.g., env_var) that distinguishes between an
omitted default and an explicit default: if a default is supplied, return
os.environ.get(key, default); if no default was supplied and the environment
variable is missing, raise an error (e.g., KeyError or a clear RuntimeError) so
templates like env_var('MSSQL_SA_PASSWORD') fail fast in strict mode; update the
reference to env.globals["env_var"] to point to this new function.
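The sentinel pattern from the suggested fix can be sketched as a self-contained snippet. This is a minimal sketch, not the project's code: `click.ClickException` from the suggestion is swapped for a plain `RuntimeError` so the snippet has no third-party dependency, and the function name is illustrative.

```python
import os

# Unique sentinel: distinguishes "caller omitted the default" from an
# explicit default, including an explicit empty string default="".
_MISSING = object()

def env_var(key, default=_MISSING):
    value = os.environ.get(key)
    if value is None:
        if default is _MISSING:
            # Mirrors dbt's "Env var required but not provided" hard failure.
            raise RuntimeError(f"Missing required environment variable: {key}")
        return default
    return value

# A set variable resolves normally; an explicit default still works.
os.environ["MSSQL_SA_PASSWORD"] = "s3cret"
print(env_var("MSSQL_SA_PASSWORD"))          # resolves from the environment
print(repr(env_var("OPTIONAL_VAR", "")))      # explicit empty default is honored
```

With this in place, a template call like `env_var('MSSQL_SA_PASSWORD')` fails fast when the variable is unset instead of silently rendering an empty string.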
{% if method == "cli" %} {% do parameters.append(_parameter("method", "cli")) %}
{% elif method == "service-principal" %}
  {% do parameters.append(_parameter("method", "service-principal")) %}
  {% do parameters.append(_parameter("tenant_id", "<TENANT_ID>")) %}
  {% do parameters.append(_parameter("client_id", "<CLIENT_ID>")) %}
  {% do parameters.append(_parameter("client_secret", "<CLIENT_SECRET>")) %}
{% else %}
  {% do parameters.append(_parameter("user", target.user)) %}
  {% do parameters.append(_parameter("password", "<PASSWORD>")) %}
{% endif %}
Avoid silently treating unknown auth methods as username/password.
Right now, any typo or unsupported method value falls into the else branch (Line 276) and generates user/password, which can hide config mistakes and produce confusing failures. Please handle explicit default (method is none) separately and surface unsupported methods as placeholders/instructions.
Suggested fix
- {% if method == "cli" %} {% do parameters.append(_parameter("method", "cli")) %}
+ {% if method == "cli" %} {% do parameters.append(_parameter("method", "cli")) %}
{% elif method == "service-principal" %}
{% do parameters.append(_parameter("method", "service-principal")) %}
{% do parameters.append(_parameter("tenant_id", "<TENANT_ID>")) %}
{% do parameters.append(_parameter("client_id", "<CLIENT_ID>")) %}
{% do parameters.append(_parameter("client_secret", "<CLIENT_SECRET>")) %}
- {% else %}
+ {% elif method is none %}
{% do parameters.append(_parameter("user", target.user)) %}
{% do parameters.append(_parameter("password", "<PASSWORD>")) %}
+ {% else %}
+ {% do parameters.append(
+ _parameter(
+ "method",
+ "<AUTH_METHOD>",
+ "Supported methods: cli, service-principal, or omit for username/password authentication.",
+ )
+ ) %}
{% endif %}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| {% if method == "cli" %} {% do parameters.append(_parameter("method", "cli")) %} | |
| {% elif method == "service-principal" %} | |
| {% do parameters.append(_parameter("method", "service-principal")) %} | |
| {% do parameters.append(_parameter("tenant_id", "<TENANT_ID>")) %} | |
| {% do parameters.append(_parameter("client_id", "<CLIENT_ID>")) %} | |
| {% do parameters.append(_parameter("client_secret", "<CLIENT_SECRET>")) %} | |
| {% else %} | |
| {% do parameters.append(_parameter("user", target.user)) %} | |
| {% do parameters.append(_parameter("password", "<PASSWORD>")) %} | |
| {% endif %} | |
| {% if method == "cli" %} {% do parameters.append(_parameter("method", "cli")) %} | |
| {% elif method == "service-principal" %} | |
| {% do parameters.append(_parameter("method", "service-principal")) %} | |
| {% do parameters.append(_parameter("tenant_id", "<TENANT_ID>")) %} | |
| {% do parameters.append(_parameter("client_id", "<CLIENT_ID>")) %} | |
| {% do parameters.append(_parameter("client_secret", "<CLIENT_SECRET>")) %} | |
| {% elif method is none %} | |
| {% do parameters.append(_parameter("user", target.user)) %} | |
| {% do parameters.append(_parameter("password", "<PASSWORD>")) %} | |
| {% else %} | |
| {% do parameters.append( | |
| _parameter( | |
| "method", | |
| "<AUTH_METHOD>", | |
| "Supported methods: cli, service-principal, or omit for username/password authentication.", | |
| ) | |
| ) %} | |
| {% endif %} |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@macros/utils/cross_db_utils/generate_elementary_profile_args.sql` around
lines 270 - 279, The template currently treats any non-matching method
(including typos) as username/password; update the logic in
generate_elementary_profile_args.sql so that you handle three cases explicitly:
when method is none (append no auth params or a clear placeholder like
_parameter("method", "<NONE>")), when method == "cli" or "service-principal"
(keep the existing branches that append parameters using _parameter and
parameters.append), and a new else branch for unsupported methods that appends a
clear placeholder/instruction parameter (e.g., _parameter("method",
"<UNSUPPORTED: specify 'cli' or 'service-principal' or 'none'>")) rather than
silently inserting user/password; reference the symbols method,
parameters.append, and _parameter to locate and change the conditional branches.
@haritamar — the suggestion is valid from a correctness standpoint: any typo or unsupported method value silently generates a user/password block, which could confuse users who pass an unexpected string and then wonder why their intended auth method isn't reflected in the generated profile.
That said, a couple of points worth considering before deciding:
- Consistency with `fabric__`: The existing `fabric__generate_elementary_profile_args` in the same file uses the exact same else → user/password pattern. If you fix `sqlserver__`, you'd want to fix `fabric__` too to stay consistent — or leave both as-is.
- Precedent in the file: `trino__` and `bigquery__` do use an else branch that emits a placeholder/instruction (e.g., `"<AUTH_METHOD>"` with a comment pointing to docs). So the pattern I suggested is already established in the codebase.
- Risk level: The impact is low since this only affects the generated profile file that users then edit manually; it won't cause a runtime failure. But it could be a source of subtle confusion.
My recommendation: If the team is comfortable with it, apply the fix to both sqlserver__ and fabric__ together for consistency. If you'd prefer to keep things simple for now, it's reasonable to defer — the risk is minor.
🧠 Learnings used
Learnt from: haritamar
Repo: elementary-data/dbt-data-reliability PR: 827
File: macros/utils/cross_db_utils/generate_elementary_profile_args.sql:30-31
Timestamp: 2025-07-27T11:33:35.918Z
Learning: In the Elementary dbt package, the Snowflake profile generation macro intentionally always includes private_key and private_key_passphrase parameters regardless of the authentication method. This is a strategic decision to deprecate password authentication and guide all new users toward keypair authentication, even if it means showing keypair parameters when they might not currently be needed.
I'd recommend deferring this — the risk is low (only affects generated profile files users edit manually), and trino__/bigquery__ already use the same else → placeholder pattern. If you want to fix it, we should fix both fabric__ and sqlserver__ together for consistency. Happy to do it if you want, but it's not a bug.
Skipped: comment is from another GitHub bot.
…dev, ORDER BY, GROUP BY, buckets_cte) Co-Authored-By: Itamar Hartstein <haritamar@gmail.com>
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
macros/edr/data_monitoring/monitors/column_numeric_monitors.sql (1)
69-79: ⚠️ Potential issue | 🟠 Major

Add `sqlserver__variance` macro — SQL Server uses `VAR`, not `variance`.

The `variance` macro lacks a SQL Server-specific implementation. T-SQL does not have a `variance()` function; it uses `VAR()` for sample variance or `VARP()` for population variance. This will cause a runtime error when the variance monitor runs against a SQL Server or Azure Synapse Fabric target.

Proposed fix

 {% macro clickhouse__variance(column_name) -%}
   varSamp(cast({{ column_name }} as Nullable({{ elementary.edr_type_float() }})))
 {%- endmacro %}
+
+{# T-SQL uses VAR instead of variance #}
+{% macro sqlserver__variance(column_name) -%}
+  var(cast({{ column_name }} as {{ elementary.edr_type_float() }}))
+{%- endmacro %}

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@macros/edr/data_monitoring/monitors/column_numeric_monitors.sql` around lines 69 - 79, The variance macro needs a SQL Server-specific implementation: add a macro named sqlserver__variance that returns VAR(cast({{ column_name }} as {{ elementary.edr_type_float() }})) (use VAR for sample variance in T-SQL), mirroring how default__variance and clickhouse__variance cast the column to the float edr type; ensure the new macro follows the same macro signature {% macro sqlserver__variance(column_name) -%} so adapter.dispatch("variance", "sqlserver") will resolve correctly.
🧹 Nitpick comments (1)
macros/edr/tests/test_execution_sla.sql (1)
292-300: The `when 1 = 1` pattern works but could be clearer.

The conditional logic is correct — when the deadline hasn't passed, the `when 1 = 1` clause acts as a catch-all to return `false` before reaching `else`. However, this pattern may confuse future readers.
when 1 = 1clause acts as a catch-all to returnfalsebefore reachingelse. However, this pattern may confuse future readers.Consider restructuring for clarity:
💡 Alternative approach using Jinja conditional
 case
   when sla_status = 'MET_SLA' then {{ elementary.edr_boolean_literal(false) }}
-  {# If deadline hasn't passed, don't fail yet #}
-  {% if not deadline_passed %}
-  when 1 = 1 then {{ elementary.edr_boolean_literal(false) }}
-  {% endif %}
-  else {{ elementary.edr_boolean_literal(true) }}
+  else {{ elementary.edr_boolean_literal(deadline_passed) }}
 end as is_failure,

This directly expresses the intent: fail only if the deadline has passed and the SLA was not met.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@macros/edr/tests/test_execution_sla.sql` around lines 292 - 300, The current "when 1 = 1" catch-all is confusing; remove that pattern and instead use a Jinja conditional to inject a clear when-clause only when deadline_passed is false so intent is explicit. Update the CASE that produces is_failure (referencing sla_status, is_failure, and elementary.edr_boolean_literal) so that you keep "when sla_status = 'MET_SLA' then false", and then conditionally add a descriptive when-clause like "when deadline has not passed then false" via {% if not deadline_passed %} ... {% endif %} rather than using "when 1 = 1".
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@macros/edr/system/system_utils/empty_table.sql`:
- Around line 242-255: Add a fabric-specific override macro similar to
sqlserver__dummy_values to avoid falling back to default__dummy_values (which
uses the invalid T-SQL literal "True"); create a macro named
fabric__dummy_values() that returns the same dummy_values mapping used in
sqlserver__dummy_values (notably setting "boolean": 1 and the same
string/int/float/timestamp values) so dbt's dummy_values() dispatch will pick
the Fabric-safe implementation instead of default__dummy_values.
---
Outside diff comments:
In `@macros/edr/data_monitoring/monitors/column_numeric_monitors.sql`:
- Around line 69-79: The variance macro needs a SQL Server-specific
implementation: add a macro named sqlserver__variance that returns VAR(cast({{
column_name }} as {{ elementary.edr_type_float() }})) (use VAR for sample
variance in T-SQL), mirroring how default__variance and clickhouse__variance
cast the column to the float edr type; ensure the new macro follows the same
macro signature {% macro sqlserver__variance(column_name) -%} so
adapter.dispatch("variance", "sqlserver") will resolve correctly.
---
Nitpick comments:
In `@macros/edr/tests/test_execution_sla.sql`:
- Around line 292-300: The current "when 1 = 1" catch-all is confusing; remove
that pattern and instead use a Jinja conditional to inject a clear when-clause
only when deadline_passed is false so intent is explicit. Update the CASE that
produces is_failure (referencing sla_status, is_failure, and
elementary.edr_boolean_literal) so that you keep "when sla_status = 'MET_SLA'
then false", and then conditionally add a descriptive when-clause like "when
deadline has not passed then false" via {% if not deadline_passed %} ... {%
endif %} rather than using "when 1 = 1".
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 3452a2df-29ac-4ce9-87e1-dafa143b3dfa
📒 Files selected for processing (15)
- macros/edr/data_monitoring/monitors/column_numeric_monitors.sql
- macros/edr/system/system_utils/buckets_cte.sql
- macros/edr/system/system_utils/empty_table.sql
- macros/edr/system/system_utils/get_config_var.sql
- macros/edr/tests/test_execution_sla.sql
- models/edr/alerts/alerts_anomaly_detection.sql
- models/edr/alerts/alerts_dbt_models.sql
- models/edr/alerts/alerts_dbt_source_freshness.sql
- models/edr/alerts/alerts_dbt_tests.sql
- models/edr/alerts/alerts_schema_changes.sql
- models/edr/data_monitoring/anomaly_detection/anomaly_threshold_sensitivity.sql
- models/edr/data_monitoring/anomaly_detection/metrics_anomaly_score.sql
- models/edr/dbt_artifacts/dbt_artifacts_hashes.sql
- models/edr/run_results/model_run_results.sql
- models/edr/system/monitors_runs.sql
💤 Files with no reviewable changes (1)
- models/edr/dbt_artifacts/dbt_artifacts_hashes.sql
…cursion - Replace dbt_utils.group_by() with explicit column names in metrics_anomaly_score (T-SQL treats positional refs as literal constants) - Fix make_temp_relation -> edr_make_temp_relation in test.sql and failed_row_count.sql - Add temp table cleanup in failed_row_count.sql - Add sqlserver__variance macro (T-SQL uses VAR not variance) - Fix maxrecursion 10000 -> 0 for unlimited recursion in buckets_cte Co-Authored-By: Itamar Hartstein <haritamar@gmail.com>
Actionable comments posted: 2
♻️ Duplicate comments (1)
macros/edr/system/system_utils/buckets_cte.sql (1)
234-256: ⚠️ Potential issue | 🔴 Critical

Recursive CTE remains incompatible with Fabric Warehouse runtime.
Line 241 introduces recursion; Fabric Warehouse currently documents recursive queries as unsupported. If this macro is part of Fabric support, this path is a runtime blocker.
What is the current Microsoft Fabric Warehouse support status for recursive CTEs/recursive queries in T-SQL? Please cite the latest T-SQL surface area docs.

Source referenced:
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@macros/edr/system/system_utils/buckets_cte.sql` around lines 234 - 256, The macro sqlserver__complete_buckets_cte uses a recursive CTE (timestamps) and option (maxrecursion 0), which is incompatible with Fabric Warehouse; replace the recursion with a non-recursive generator (e.g., use a numbers/tally table, a system view that provides sequential integers, or a calendar table) to produce edr_bucket_start/edr_bucket_end rows between min_bucket_start_expr and max_bucket_end_expr stepping by time_bucket, and remove the option (maxrecursion 0); update the macro sqlserver__complete_buckets_cte to build the same sequence using that iterative approach so Fabric-compatible T-SQL is emitted.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@macros/edr/data_monitoring/monitors/column_numeric_monitors.sql`:
- Around line 64-67: The fabric adapter needs explicit implementations for
standard deviation and variance so adapter.dispatch finds them; add macro
definitions named fabric__standard_deviation and fabric__variance that either
call the existing sqlserver__standard_deviation and sqlserver__variance wrappers
or directly return T-SQL expressions using stdev(...) and var(cast(... as {{
elementary.edr_type_float() }})); ensure the new macros accept the same
parameter (column_name) and use the same casting helper
(elementary.edr_type_float) so Fabric resolves to the T-SQL behavior instead of
falling back to the default PostgreSQL-style stddev_pop/variance variants.
In `@macros/edr/materializations/test/failed_row_count.sql`:
- Around line 62-73: The temp table is dropped during macro rendering inside
sqlserver__get_failed_row_count_calc_query (the inline do run_query("drop table
if exists " ~ tmp_relation) runs before the returned SELECT is executed), so
move the cleanup to after the query result is fetched: have
sqlserver__get_failed_row_count_calc_query only create the temp table and return
the SELECT (without dropping), then perform run_query("drop table if exists " ~
tmp_relation) from the caller (get_failed_row_count) after calling
elementary.result_value(failed_row_count_query); alternatively change the macro
to return both the query string and a cleanup callable so the caller can execute
cleanup post-result.
---
Duplicate comments:
In `@macros/edr/system/system_utils/buckets_cte.sql`:
- Around line 234-256: The macro sqlserver__complete_buckets_cte uses a
recursive CTE (timestamps) and option (maxrecursion 0), which is incompatible
with Fabric Warehouse; replace the recursion with a non-recursive generator
(e.g., use a numbers/tally table, a system view that provides sequential
integers, or a calendar table) to produce edr_bucket_start/edr_bucket_end rows
between min_bucket_start_expr and max_bucket_end_expr stepping by time_bucket,
and remove the option (maxrecursion 0); update the macro
sqlserver__complete_buckets_cte to build the same sequence using that iterative
approach so Fabric-compatible T-SQL is emitted.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 144d403f-e137-44d4-b5ba-429e01db83fb
📒 Files selected for processing (5)
- macros/edr/data_monitoring/monitors/column_numeric_monitors.sql
- macros/edr/materializations/test/failed_row_count.sql
- macros/edr/materializations/test/test.sql
- macros/edr/system/system_utils/buckets_cte.sql
- models/edr/data_monitoring/anomaly_detection/metrics_anomaly_score.sql
- Renamed all sqlserver__ macros to fabric__ (primary for dbt-fabric production use) - Added sqlserver__ wrapper macros that delegate to fabric__ (for dbt-sqlserver CI testing) - Fixed temp table drop timing bug in failed_row_count.sql - Added fabric__ variants for all T-SQL utility macros across 24 files Co-Authored-By: Itamar Hartstein <haritamar@gmail.com>
…patch target dbt-sqlserver declares dependencies=["fabric"] in its plugin config, creating dispatch chain: sqlserver__ -> fabric__ -> default__. This means fabric__ macros are automatically found by dbt-sqlserver without explicit sqlserver__ wrappers (same pattern as redshift/postgres). Co-Authored-By: Itamar Hartstein <haritamar@gmail.com>
♻️ Duplicate comments (5)
macros/utils/data_types/try_cast_column_to_timestamp.sql (1)
30-41: ⚠️ Potential issue | 🟠 Major

Filter WHERE clause to ensure only successfully castable rows are evaluated.

The `try_cast()` function returns NULL on cast failure. Without filtering for successful casts, `TOP 1` may non-deterministically select an un-castable row, causing `result_value()` to return NULL and incorrectly report `false` even when valid timestamp rows exist.

Add `and try_cast({{ timestamp_column }} as {{ elementary.edr_type_timestamp() }}) is not null` to the WHERE clause.

Proposed fix

 {% macro fabric__try_cast_column_to_timestamp(table_relation, timestamp_column) %}
   {%- set query %}
-    select top 1 try_cast({{ timestamp_column }} as {{ elementary.edr_type_timestamp() }}) as timestamp_column
+    select top 1 1 as has_valid_timestamp
     from {{ table_relation }}
     where {{ timestamp_column }} is not null
+      and try_cast({{ timestamp_column }} as {{ elementary.edr_type_timestamp() }}) is not null
   {%- endset %}
   {%- set result = elementary.result_value(query) %}
   {%- if result is not none %}
     {{ return(true) }}
   {%- endif %}
   {{ return(false) }}
 {% endmacro %}

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@macros/utils/data_types/try_cast_column_to_timestamp.sql` around lines 30 - 41, The WHERE clause in macro fabric__try_cast_column_to_timestamp currently only checks that {{ timestamp_column }} is not null, so TOP 1 can pick a row where try_cast(...) is null and cause elementary.result_value(query) to return null; update the WHERE to also require try_cast({{ timestamp_column }} as {{ elementary.edr_type_timestamp() }}) is not null so only successfully castable rows are considered, keeping the rest of the query (select top 1 ... as timestamp_column from {{ table_relation }}) and the use of elementary.result_value unchanged.

macros/edr/system/system_utils/buckets_cte.sql (1)
241-253: ⚠️ Potential issue | 🔴 Critical

Recursive CTE is still incompatible with Fabric runtime.

Line 241 introduces a recursive CTE, but Fabric Warehouse currently lists recursive queries as unsupported, so this macro can still fail on its target adapter despite fixing the alias and recursion hint. Please switch this macro to a non-recursive bucket generator (e.g., tally/numbers via VALUES cross joins) for `fabric__`.

Source: https://learn.microsoft.com/en-us/fabric/data-warehouse/tsql-surface-area

In the latest Microsoft Fabric Data Warehouse T-SQL surface area documentation, are recursive queries/recursive CTEs supported in Warehouse?

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@macros/edr/system/system_utils/buckets_cte.sql` around lines 241 - 253, The recursive CTE named "timestamps" (using {{ min_bucket_start_expr }}, {{ bucket_end_expr }}, {{ max_bucket_end_expr }}) is incompatible with the Fabric runtime; replace the recursive generator with a non-recursive numbers/tally approach for the fabric__ adapter: detect fabric__ and produce a bucket sequence by cross joining a VALUES-derived numbers table (or repeated VALUES blocks) multiplied to cover the max range, then compute edr_bucket_start and edr_bucket_end from those generated offsets instead of the recursive CTE so the macro avoids recursion and preserves the existing {{ ... }} expressions.macros/utils/cross_db_utils/incremental_strategy.sql (1)
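The non-recursive generation the review asks for can be sketched in Python as a stand-in for the SQL approach: enumerate bucket offsets up front (the T-SQL analogue is a tally/numbers table built from VALUES cross joins) and derive each bucket's start and end from its offset. Function and variable names here are illustrative, not from the codebase.

```python
from datetime import datetime, timedelta

def complete_buckets(min_start: datetime, max_end: datetime, bucket: timedelta):
    """Generate (bucket_start, bucket_end) pairs without recursion.

    n_buckets plays the role of the numbers/tally table: each integer offset i
    maps directly to one bucket, so no row depends on the previous row.
    """
    n_buckets = int((max_end - min_start) / bucket)
    return [
        (min_start + i * bucket, min_start + (i + 1) * bucket)
        for i in range(n_buckets)
    ]

buckets = complete_buckets(
    datetime(2024, 1, 1), datetime(2024, 1, 2), timedelta(hours=6)
)
# Four 6-hour buckets covering 2024-01-01 00:00 through 2024-01-02 00:00.
```

In SQL, the same offset-to-bucket mapping would use `DATEADD(second, i * bucket_seconds, min_start)` over the generated integers, which avoids the `OPTION (MAXRECURSION ...)` hint entirely.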
23-25: ⚠️ Potential issue | 🟠 Major

Line 24 hardcodes Fabric to `merge`, but the shared strategy contract should remain adapter-default (none) for non-Athena paths.

Please return `none` here unless Fabric has a documented exception with tests proving why it must diverge.

Suggested change

 {%- macro fabric__get_default_incremental_strategy() %}
-  {% do return("merge") %}
+  {% do return(none) %}
 {% endmacro %}

Based on learnings: "implement that incremental_strategy should return none (to use the adapter default) for all adapters except Athena, where get_append_only_incremental_strategy() is used."
Verify each finding against the current code and only fix it if needed. In `@macros/utils/cross_db_utils/incremental_strategy.sql` around lines 23 - 25, The macro fabric__get_default_incremental_strategy currently hardcodes "merge"; change it to return the adapter-default "none" for all adapters except Athena: inside fabric__get_default_incremental_strategy() check the current adapter (e.g., inspect target.adapter or target.type) and if it's Athena call/return get_append_only_incremental_strategy() (or the Athena-specific strategy helper), otherwise return "none" so non-Athena paths use the adapter default.macros/utils/cross_db_utils/to_char.sql (1)
27-32: ⚠️ Potential issue | 🟠 Major

Line 29 injects `format` directly into `CONVERT`, which can break Fabric SQL generation.

`fabric__edr_to_char` currently renders `CONVERT(varchar, <expr>, {{ format }})` with raw SQL at Line 29. If callers pass cross-db format patterns, this can fail at runtime or diverge from the macro contract used by other adapters. Please normalize this path (e.g., explicitly support one format contract and enforce it in the macro).

In SQL Server / Microsoft Fabric Warehouse T-SQL, what are valid values for the third argument of CONVERT(varchar, datetime_expr, style)? Does passing a pattern string like YYYY-MM-DD work, or must it be a style code?

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@macros/utils/cross_db_utils/to_char.sql` around lines 27 - 32, The macro fabric__edr_to_char currently injects the caller-supplied format directly into CONVERT which can break Fabric SQL; update fabric__edr_to_char to enforce a single format contract by accepting only integer T-SQL style codes (or mapping a small set of common pattern strings to their style codes) and render the third CONVERT argument as an integer literal; specifically, modify fabric__edr_to_char to: validate if the provided format is an integer literal and output it unquoted, else if it's a known pattern (e.g., 'YYYY-MM-DD', 'YYYY-MM-DD HH24:MI:SS') map it to the corresponding style code and use that code, otherwise raise a clear jinja/dbt error so callers cannot inject arbitrary pattern strings into CONVERT.macros/utils/cross_db_utils/hour_of_week.sql (1)
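The format contract the review suggests can be sketched in Python: accept only integer T-SQL style codes, or map a small allowlist of cross-db pattern strings to their style codes, and reject everything else. The mapping and function name are illustrative; styles 23 and 120 are the documented CONVERT style codes for `yyyy-mm-dd` and the ODBC canonical `yyyy-mm-dd hh:mi:ss`.

```python
# Known cross-db patterns mapped to T-SQL CONVERT style codes.
PATTERN_TO_STYLE = {
    "YYYY-MM-DD": 23,              # yyyy-mm-dd
    "YYYY-MM-DD HH24:MI:SS": 120,  # ODBC canonical, 24-hour clock
}

def render_convert(expr: str, fmt) -> str:
    """Render CONVERT(varchar, expr, style), refusing arbitrary pattern strings."""
    if isinstance(fmt, int):
        style = fmt                       # caller already passed a style code
    elif fmt in PATTERN_TO_STYLE:
        style = PATTERN_TO_STYLE[fmt]     # translate a known pattern
    else:
        raise ValueError(f"Unsupported to_char format for Fabric: {fmt!r}")
    return f"CONVERT(varchar, {expr}, {style})"

print(render_convert("created_at", "YYYY-MM-DD"))
# CONVERT(varchar, created_at, 23)
```

The same validate-then-map logic can live in the Jinja macro itself, raising a compile-time error via a dbt exception helper instead of `ValueError`.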
81-85: ⚠️ Potential issue | 🟠 Major

Line 83 uses locale-dependent weekday names, making hour-of-week keys non-deterministic.

`datename(weekday, ...)` can return different strings across sessions/environments. For stable grouping keys, switch to a deterministic weekday mapping (independent of session language) before concatenating with the hour.

Suggested deterministic replacement

 {% macro fabric__edr_hour_of_week_expression(date_expr) %}
   concat(
-    cast(datename(weekday, {{ date_expr }}) as {{ elementary.edr_type_string() }}),
+    cast(
+      case (datediff(day, cast('19000101' as date), cast({{ date_expr }} as date)) % 7)
+        when 0 then 'Monday'
+        when 1 then 'Tuesday'
+        when 2 then 'Wednesday'
+        when 3 then 'Thursday'
+        when 4 then 'Friday'
+        when 5 then 'Saturday'
+        when 6 then 'Sunday'
+      end as {{ elementary.edr_type_string() }}
+    ),
     cast(datepart(hour, {{ date_expr }}) as {{ elementary.edr_type_string() }})
   )
 {% endmacro %}

In SQL Server / Microsoft Fabric Warehouse, is DATENAME(weekday, date_expr) dependent on session language (SET LANGUAGE / @@LANGUAGE)? If yes, what deterministic alternatives are recommended for stable weekday keys?

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@macros/utils/cross_db_utils/hour_of_week.sql` around lines 81 - 85, The macro fabric__edr_hour_of_week_expression uses DATENAME(weekday, {{ date_expr }}) which returns locale-dependent names; replace that with a deterministic numeric weekday computed from a fixed epoch (e.g. DATEDIFF(day, 0, {{ date_expr }}) % 7) so the key is stable across sessions, e.g. cast((DATEDIFF(day, 0, {{ date_expr }}) % 7) as {{ elementary.edr_type_string() }}) concatenated with the existing cast(datepart(hour, {{ date_expr }}) as {{ elementary.edr_type_string() }}).
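The epoch-modulo mapping in the suggested replacement can be checked with a quick Python sketch: counting days from 1900-01-01 (a Monday, and the epoch `DATEDIFF(day, '19000101', x)` uses) and taking the remainder mod 7 gives a weekday index that never depends on session language. Names below are illustrative, not from the macro.

```python
from datetime import date, datetime

EPOCH = date(1900, 1, 1)  # a Monday; index 0 = Monday ... 6 = Sunday
NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
         "Friday", "Saturday", "Sunday"]

def weekday_name(d: date) -> str:
    # Equivalent of: case (datediff(day, '19000101', d) % 7) when 0 then 'Monday' ...
    return NAMES[(d - EPOCH).days % 7]

def hour_of_week_key(ts: datetime) -> str:
    # Mirrors the concat(weekday, hour) grouping key built by the macro.
    return f"{weekday_name(ts.date())}{ts.hour}"

print(hour_of_week_key(datetime(2024, 1, 7, 13)))  # 2024-01-07 is a Sunday
```

Because the index is pure date arithmetic, it agrees with Python's own `date.weekday()` for every date, which is exactly the determinism the grouping key needs.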
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@macros/edr/system/system_utils/buckets_cte.sql`:
- Around line 241-253: The recursive CTE named "timestamps" (using {{
min_bucket_start_expr }}, {{ bucket_end_expr }}, {{ max_bucket_end_expr }}) is
incompatible with the Fabric runtime; replace the recursive generator with a
non-recursive numbers/tally approach for the fabric__ adapter: detect fabric__
and produce a bucket sequence by cross joining a VALUES-derived numbers table
(or repeated VALUES blocks) multiplied to cover the max range, then compute
edr_bucket_start and edr_bucket_end from those generated offsets instead of the
recursive CTE so the macro avoids recursion and preserves the existing {{ ... }}
expressions.
In `@macros/utils/cross_db_utils/hour_of_week.sql`:
- Around line 81-85: The macro fabric__edr_hour_of_week_expression uses
DATENAME(weekday, {{ date_expr }}) which returns locale-dependent names; replace
that with a deterministic numeric weekday computed from a fixed epoch (e.g.
DATEDIFF(day, 0, {{ date_expr }}) % 7) so the key is stable across sessions,
e.g. cast((DATEDIFF(day, 0, {{ date_expr }}) % 7) as {{
elementary.edr_type_string() }}) concatenated with the existing
cast(datepart(hour, {{ date_expr }}) as {{ elementary.edr_type_string() }}).
In `@macros/utils/cross_db_utils/incremental_strategy.sql`:
- Around line 23-25: The macro fabric__get_default_incremental_strategy
currently hardcodes "merge"; change it to return the adapter-default "none" for
all adapters except Athena: inside fabric__get_default_incremental_strategy()
check the current adapter (e.g., inspect target.adapter or target.type) and if
it's Athena call/return get_append_only_incremental_strategy() (or the
Athena-specific strategy helper), otherwise return "none" so non-Athena paths
use the adapter default.
In `@macros/utils/cross_db_utils/to_char.sql`:
- Around line 27-32: The macro fabric__edr_to_char currently injects the
caller-supplied format directly into CONVERT which can break Fabric SQL; update
fabric__edr_to_char to enforce a single format contract by accepting only
integer T-SQL style codes (or mapping a small set of common pattern strings to
their style codes) and render the third CONVERT argument as an integer literal;
specifically, modify fabric__edr_to_char to: validate if the provided format is
an integer literal and output it unquoted, else if it's a known pattern (e.g.,
'YYYY-MM-DD', 'YYYY-MM-DD HH24:MI:SS') map it to the corresponding style code
and use that code, otherwise raise a clear jinja/dbt error so callers cannot
inject arbitrary pattern strings into CONVERT.
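One way to express that contract is a small Jinja mapping from the supported patterns to their T-SQL style codes (a sketch; the pattern set and error message are illustrative, though the style codes themselves are real: 23 = yyyy-mm-dd, 120 = yyyy-mm-dd hh:mi:ss):

```sql
{% macro fabric__edr_to_char(column, format) %}
    {%- set style_codes = {"YYYY-MM-DD": 23, "YYYY-MM-DD HH24:MI:SS": 120} -%}
    {%- if format is number -%}
        {#- integer style codes pass through unquoted -#}
        convert(varchar(4096), {{ column }}, {{ format | int }})
    {%- elif format in style_codes -%}
        convert(varchar(4096), {{ column }}, {{ style_codes[format] }})
    {%- else -%}
        {{ exceptions.raise_compiler_error("fabric__edr_to_char: unsupported format " ~ format) }}
    {%- endif -%}
{% endmacro %}
```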
In `@macros/utils/data_types/try_cast_column_to_timestamp.sql`:
- Around line 30-41: The WHERE clause in macro
fabric__try_cast_column_to_timestamp currently only checks that {{
timestamp_column }} is not null, so TOP 1 can pick a row where try_cast(...) is
null and cause elementary.result_value(query) to return null; update the WHERE
to also require try_cast({{ timestamp_column }} as {{
elementary.edr_type_timestamp() }}) is not null so only successfully castable
rows are considered, keeping the rest of the query (select top 1 ... as
timestamp_column from {{ table_relation }}) and the use of
elementary.result_value unchanged.
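With that extra predicate, the generated query would look roughly like this (placeholder table and column names; `datetime2(6)` stands in for `elementary.edr_type_timestamp()` on Fabric):

```sql
select top 1
    try_cast(my_col as datetime2(6)) as timestamp_column
from my_table
where my_col is not null
  and try_cast(my_col as datetime2(6)) is not null
```

TOP 1 then only ever picks a row whose value actually casts, so `elementary.result_value` cannot return null for a table that contains at least one castable row.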
📒 Files selected for processing (24)
- macros/edr/data_monitoring/monitors/column_numeric_monitors.sql
- macros/edr/materializations/test/failed_row_count.sql
- macros/edr/materializations/test/test.sql
- macros/edr/system/system_utils/buckets_cte.sql
- macros/edr/system/system_utils/empty_table.sql
- macros/edr/system/system_utils/get_config_var.sql
- macros/utils/cross_db_utils/current_timestamp.sql
- macros/utils/cross_db_utils/date_trunc.sql
- macros/utils/cross_db_utils/dateadd.sql
- macros/utils/cross_db_utils/datediff.sql
- macros/utils/cross_db_utils/day_of_week.sql
- macros/utils/cross_db_utils/generate_elementary_profile_args.sql
- macros/utils/cross_db_utils/hour_of_day.sql
- macros/utils/cross_db_utils/hour_of_week.sql
- macros/utils/cross_db_utils/incremental_strategy.sql
- macros/utils/cross_db_utils/target_database.sql
- macros/utils/cross_db_utils/time_trunc.sql
- macros/utils/cross_db_utils/timeadd.sql
- macros/utils/cross_db_utils/to_char.sql
- macros/utils/data_types/data_type.sql
- macros/utils/data_types/data_type_list.sql
- macros/utils/data_types/get_normalized_data_type.sql
- macros/utils/data_types/try_cast_column_to_timestamp.sql
- macros/utils/table_operations/insert_rows.sql
🚧 Files skipped from review as they are similar to previous changes (7)
- macros/edr/data_monitoring/monitors/column_numeric_monitors.sql
- macros/edr/system/system_utils/get_config_var.sql
- macros/edr/system/system_utils/empty_table.sql
- macros/utils/cross_db_utils/dateadd.sql
- macros/utils/data_types/get_normalized_data_type.sql
- macros/utils/cross_db_utils/generate_elementary_profile_args.sql
- macros/utils/cross_db_utils/date_trunc.sql
Co-Authored-By: Itamar Hartstein <haritamar@gmail.com>
- Add fabric__edr_type_long_string returning varchar(max) to avoid truncation
- Add fabric__event_freshness_metric_query with explicit GROUP BY columns
- Fix LIMIT→TOP in dimension_anomalies test query for T-SQL
- Fix boolean literals in column_boolean_monitors.sql
- Fix INSERT INTO with CTE pattern in insert_as_select.sql
- Fix CTE-in-subquery and CTE-in-IN-clause in get_anomaly_query.sql
- All 18 anomalyless integration tests pass locally against SQL Server

Co-Authored-By: Itamar Hartstein <haritamar@gmail.com>
Co-Authored-By: Itamar Hartstein <haritamar@gmail.com>
…erflow in CI

SQL Server limits identifiers to 128 chars. When dbt-sqlserver creates relations, the schema name (~60 chars in CI) can be prefixed onto the table identifier. Reducing from 80 to 60 ensures the combined name stays within the 128-char limit even with long CI schema names.

Co-Authored-By: Itamar Hartstein <haritamar@gmail.com>
…MIT→TOP for Fabric/T-SQL

- Add fabric__create_temp_table using view-based approach to avoid CTE-in-subquery errors
- Force Fabric/sqlserver targets to use temp table path in test materialization
- Add fabric__edr_make_temp_relation to handle dict model nodes
- Add 'string' to fabric data_type_list for proper normalization
- Add STRING→VARCHAR mapping in fabric__get_normalized_data_type
- Convert hardcoded LIMIT 1 to TOP 1 in integration test SQL templates
- Apply pre-commit formatting (black, sqlfmt)

Co-Authored-By: Itamar Hartstein <haritamar@gmail.com>
…n INSERT VALUES Co-Authored-By: Itamar Hartstein <haritamar@gmail.com>
Resolved outdated threads:
- macros/edr/data_monitoring/schema_changes/get_columns_changes_query.sql
- macros/edr/data_monitoring/schema_changes/get_columns_snapshot_query.sql
- macros/edr/data_monitoring/monitors_query/column_monitoring_query.sql
…edr_boolean_literal

- get_columns_snapshot_query: replace inline target.type checks with edr_boolean_literal()
- get_columns_changes_query: extract to dispatched macro (fabric__/default__get_column_changes_from_baseline_cur), use edr_boolean_literal, restore dbt_utils.group_by(9) to fix test_schema_changes failures
- column_monitoring_query: extract GROUP BY to dispatched column_monitoring_group_by macro
- insert_as_select: convert to dispatched macro (fabric__/default__insert_as_select)
- get_anomaly_query: convert both macros to dispatched (fabric__/default__ variants)
- create_temp_table: simplify DROP VIEW IF EXISTS syntax
- Rename CI target from 'fabric' to 'sqlserver' (honest about what we test)
- Add sqlserver__generate_elementary_profile_args (separate from fabric__)

Co-Authored-By: Itamar Hartstein <haritamar@gmail.com>
… compatibility Co-Authored-By: Itamar Hartstein <haritamar@gmail.com>
…nges (fixes Redshift) Co-Authored-By: Itamar Hartstein <haritamar@gmail.com>
…ting

- Cast datetime2 to explicit precision (6) to avoid unsupported datetime2(7)
- Cast sysutcdatetime()/getdate() to datetime2(6) in current_timestamp macros
- Rewrite fabric__insert_as_select to use temp view pattern (Fabric rejects INSERT...EXEC)
- Replace CONCAT() with + operator in full_names macros to avoid nvarchar return type
- Cast PARSENAME results to varchar in fabric__full_name_split
- Cast datename() result to varchar in fabric__edr_day_of_week_expression
- Add fabric__generate_surrogate_key using + operator instead of dbt.concat()
- Add fabric__schema_change_description_column with varchar cast
- Add fabric__edr_list_schemas and fabric__edr_schema_exists using sys.schemas

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add fabric target to profiles.yml.j2 with ServicePrincipal auth
- Add fabric to cloud targets matrix in test-all-warehouses.yml
- Install ODBC driver for fabric target in test-warehouse.yml
- Include fabric in post-test schema cleanup

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Itamar Hartstein <haritamar@gmail.com>
Co-Authored-By: Itamar Hartstein <haritamar@gmail.com>
…ric__ counterparts Co-Authored-By: Itamar Hartstein <haritamar@gmail.com>
…s instead of SQL text Co-Authored-By: Itamar Hartstein <haritamar@gmail.com>
…turning non-string values
Fixes 'unsupported operand type(s) for -: str and int' in fabric__edr_make_temp_relation
when called via sqlserver__edr_make_temp_relation. The {{ }} pattern stringifies return
values (e.g. integers, Relations), breaking downstream arithmetic and method calls.
Co-Authored-By: Itamar Hartstein <haritamar@gmail.com>
….ai/proxy/github.com/elementary-data/dbt-data-reliability into devin/ELE-5282-1772640713
dbt-sqlserver declares dependencies=['fabric'], so the dispatch chain sqlserver__ -> fabric__ -> default__ handles everything automatically. Only sqlserver__ macros with genuinely different implementations are kept:

- sqlserver__target_database (needed by CLI)
- sqlserver__create_temp_table (T-SQL CTE workaround)
- sqlserver__generate_elementary_profile_args (different connection params)

Co-Authored-By: Itamar Hartstein <haritamar@gmail.com>
…WORD env var

Matches how postgres, clickhouse, and dremio handle local credentials.

- Hardcode password in docker-compose-sqlserver.yml
- Hardcode password in profiles.yml.j2 (sqlserver target)
- Remove MSSQL_SA_PASSWORD from workflow env, Start SQL Server step, and Write dbt profiles step
- Remove env_var() helper from generate_profiles.py (no longer needed)

Co-Authored-By: Itamar Hartstein <haritamar@gmail.com>
```python
assert test_result["status"] == "pass"

ts_type = (
    "datetime2" if dbt_project.target in ("fabric", "sqlserver") else "timestamp"
)
```
I think we have an edr_timestamp or something like that, could we use it? Not sure
```sql
{% endmacro %}

{% macro freshness_description() %}
{% set c = " + " if target.type in ["fabric", "sqlserver"] else " || " %}
```
Maybe we should use dbt.concat instead like we did elsewhere?
Yes — switching to dbt.concat() now. It'll handle the + vs || dispatch automatically.
```sql
{% endmacro %}

{% macro table_metric_description() %}
{% set c = " + " if target.type in ["fabric", "sqlserver"] else " || " %}
```

```sql
    {% do new_list.append(elementary.edr_quote(separator)) %}
  {% endif %}
{% endfor %}
{{ return(elementary.join_list(new_list, " + ")) }}
```
Should we use dbt.concat to avoid duplicating the macro?
Good catch — with the new edr_concat macro, the fabric__ variant can use edr_concat which handles the varchar cast, and the join operator becomes dbt.concat-compatible. Will refactor to use dbt.concat / edr_concat.
- Use dbt.concat() in anomaly_detection_description.sql instead of inline || / +
- Use edr_boolean_literal() in get_anomaly_scores_query.sql instead of inline target.type checks
- Remove confusing T-SQL comment from get_anomaly_query.sql
- Fix try_cast_column_to_timestamp WHERE clause bug (check try_cast is not null)
- Create edr_dbt_concat dispatched macro wrapping dbt.concat with varchar cast on Fabric
- Use edr_dbt_concat in schema_change_description_column (remove fabric__ variant)
- Use edr_dbt_concat in list_concat_with_separator (remove fabric__ variant)
- Use edr_dbt_concat in generate_surrogate_key (remove fabric__ variant)
- Unify GROUP BY to use named columns for all adapters (column_monitoring, table_monitoring)
- Add SelectLimit helper class and DbtProject.select_limit() for cross-adapter LIMIT/TOP
- Update 9 test files to use SelectLimit instead of inline is_tsql checks
- Use DbtProject.is_tsql property in test_anomaly_exclude_metrics.py

Co-Authored-By: Itamar Hartstein <haritamar@gmail.com>
```sql
count(case when not should_exclude_from_training then metric_value end) over (partition by {{ partition_by_keys }} order by bucket_end asc rows between unbounded preceding and current row) as training_set_size,
last_value(case when not should_exclude_from_training then bucket_end end) over (partition by {{ partition_by_keys }} order by bucket_end asc rows between unbounded preceding and current row) training_end,
first_value(case when not should_exclude_from_training then bucket_end end) over (partition by {{ partition_by_keys }} order by bucket_end asc rows between unbounded preceding and current row) as training_start

{%- set exclude_filter = 'should_exclude_from_training = ' ~ elementary.edr_boolean_literal(false) -%}
```
Move this set above the query (before the outer set)
Done — moved {% set exclude_filter %} above the {% set anomaly_scores_query %} block so it's set before the query starts.
```sql
{#- T-SQL does not support positional GROUP BY references and rejects
    GROUP BY on constant expressions (e.g. 'schema_change').
    Group by the 6 source columns from all_column_changes instead;
    all 9 output columns are deterministic functions of these. -#}
```
In that case, is the separate macro with dispatch still required?
```sql
        test_unique_id ~ ": starting test materialization hook"
    ) %}
    {% if elementary.get_config_var("tests_use_temp_tables") %}
    {% if target.type in ("fabric", "sqlserver") or elementary.get_config_var(
```
Can we create an is_tsql macro?
Done — created elementary.is_tsql() as a dispatched macro (default__ returns false, fabric__ returns true). Updated this line to use it. Since dbt-sqlserver dispatches through fabric__, it also returns true automatically.
```sql
{# Fabric/T-SQL: Python booleans (True/False) pass Jinja's "is number" test
   but T-SQL has no TRUE/FALSE keywords, so they must be rendered as
   cast(1 as bit) / cast(0 as bit). #}
```
Is this the only reason for the override?
Could we use the edr_type_boolean macro that you added and merge the implementation?
Good call — yes, the only reason for the override was boolean rendering. Merged the fabric__render_value into default__render_value by adding a value is boolean check that uses edr_boolean_literal(value). This handles T-SQL via dispatch (cast(1/0 as bit)) while keeping true/false for standard SQL. Removed the separate fabric__render_value entirely.
…ue, remove dispatch Co-Authored-By: Itamar Hartstein <haritamar@gmail.com>
TEXT is deprecated in SQL Server and dbt-sqlserver creates string columns as varchar. Without this normalization, test_schema_changes_from_baseline detects a spurious type_changed for columns with TEXT baseline vs VARCHAR actual type. This matches the fabric__ variant's behavior. Co-Authored-By: Itamar Hartstein <haritamar@gmail.com>
…nges query Dremio requires positional GROUP BY (dbt_utils.group_by(9)) because it is strict about detected_at column ambiguity (both a source column and a SELECT alias). Use is_tsql() to conditionally apply named GROUP BY for T-SQL (which doesn't support positional references) and positional GROUP BY for all other adapters. Co-Authored-By: Itamar Hartstein <haritamar@gmail.com>
haritamar
left a comment
Review: Microsoft Fabric DWH Integration
Nice work overall — the architectural approach of using fabric__ dispatched macros with sqlserver auto-dispatching via dependencies=["fabric"] is clean. The cross-warehouse changes (positional GROUP BY → explicit columns, || → dbt.concat(), boolean literals) are the right call for universal compatibility. A few things worth addressing:
1. Fragile GROUP BY in metrics_anomaly_score.sql (High)
The GROUP BY now duplicates the full CASE WHEN expression for anomaly_score (~8 lines). If the anomaly score logic changes in the SELECT, the GROUP BY must change identically — easy to forget. Consider extracting the anomaly_score computation into a prior CTE so the GROUP BY just references a column name:
```sql
-- In a preceding CTE:
case when ... end as anomaly_score
-- Then the final CTE just does:
group by ..., anomaly_score
```
This would also make the explicit GROUP BY less verbose across the board.
2. fabric__insert_as_select — EXEC quote escaping (Medium)
The select_query | replace("'", "''") escaping inside EXEC('...') is minimal. If select_query ever contains string literals with already-escaped quotes (e.g. from user-defined test configs), this could double-escape and produce invalid SQL. Worth adding a comment noting this limitation, or testing with a query that contains single quotes.
3. fabric__query_test_result_rows fetches all rows (Medium)
The Fabric override runs the full query and slices in Jinja (result[:sample_limit]) instead of using TOP in SQL. For tests with large result sets this pulls everything into memory. I understand CTEs prevent adding TOP easily — would wrapping the CTE query into a SELECT TOP ... FROM (subquery) or materializing into a temp table first be feasible? If not, a comment explaining the tradeoff would help.
4. _fmt_samples_query duplicated across 6+ test files (Low)
The identical helper function appears in test_column_pii_sampling.py, test_disable_samples_config.py, test_override_samples_config.py, test_sampling.py, test_sampling_pii.py, and a variant in test_exposure_schema_validity.py. Consider extracting to a shared test utility (e.g. in dbt_project.py or conftest.py).
5. sqlserver__create_temp_table calls fabric__create_temp_table directly (Low)
This creates a hard coupling — if someone overrides fabric__create_temp_table, the sqlserver version won't pick it up. Consider extracting the shared T-SQL logic into a private helper macro (e.g. _tsql_create_temp_table) that both dispatches call.
6. edr_dbt_concat varchar(4096) cast (Low)
The Fabric override casts all concat arguments to varchar(4096). Worth adding a brief comment noting this as a known limitation for very long strings.
7. Unresolved review threads
A few threads where Devin acknowledged feedback but may not have pushed fixes:
- schema_exists.sql — haritamar asked to use {% set query %} Jinja templating (non-outdated thread)
- test_anomaly_exclude_metrics.py — question about edr_timestamp macro (no reply)
- anomaly_detection_description.sql — 2 threads about using dbt.concat (Devin agreed, threads still open)
- get_anomaly_scores_query.sql — using edr_boolean_literal() (Devin agreed, thread still open)
Worth verifying these are addressed in the latest commits before merging.
The cross-warehouse changes (GROUP BY, boolean literals, dbt.concat) are the right approach — universal syntax is cleaner than per-adapter branching. CI passing across all targets gives good confidence. Main concern is the metrics_anomaly_score.sql GROUP BY maintainability.
Thanks for the thorough review! Addressing all 7 items now:
Co-Authored-By: Itamar Hartstein <haritamar@gmail.com>
Co-Authored-By: Itamar Hartstein <haritamar@gmail.com>
…ned _tsql_create_temp_table Co-Authored-By: Itamar Hartstein <haritamar@gmail.com>
```diff
  bucket_end,
  {{ bucket_seasonality_expr }} as bucket_seasonality,
- {{ test_configuration.anomaly_exclude_metrics or 'FALSE' }} as is_excluded,
+ case when {{ test_configuration.anomaly_exclude_metrics or '1=0' }} then {{ elementary.edr_boolean_literal(true) }} else {{ elementary.edr_boolean_literal(false) }} end as is_excluded,
```
we have many instances of
case when <condition> then TRUE else FALSE
Can we make a generic macro for that? e.g. render_condition_as_boolean? (or if you think another name is better use it)
and probably it can default to false so the "or false" bit can be avoided as well.
Done — created edr_condition_as_boolean(condition, default_false=true) macro in get_config_var.sql. It renders case when <condition> then <TRUE> else <FALSE> end using edr_boolean_literal. When default_false=true (the default), a falsy condition produces FALSE without needing or '1=0'.
Applied across: get_anomaly_scores_query.sql, get_columns_snapshot_query.sql, anomaly_threshold_sensitivity.sql, model_run_results.sql, metrics_anomaly_score.sql, get_anomaly_query.sql.
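A minimal sketch of the macro described in the reply (the body is reconstructed from the description above, so details may differ from the committed version):

```sql
{% macro edr_condition_as_boolean(condition, default_false=true) %}
    {#- A falsy condition renders as FALSE directly, avoiding "or '1=0'" -#}
    {%- if not condition and default_false -%}
        {{ elementary.edr_boolean_literal(false) }}
    {%- else -%}
        case when {{ condition }}
            then {{ elementary.edr_boolean_literal(true) }}
            else {{ elementary.edr_boolean_literal(false) }}
        end
    {%- endif -%}
{% endmacro %}
```

On T-SQL targets `edr_boolean_literal` would render `cast(1 as bit)` / `cast(0 as bit)`, on standard SQL `true` / `false`.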
```diff
  case
- when cast({{ column_name }} as {{ elementary.edr_type_bool() }}) = true
+ when
+     cast({{ column_name }} as {{ elementary.edr_type_bool() }})
```
Maybe let's have edr_is_true and edr_is_false generic macros, and use them here / elsewhere?
Done — created edr_is_true(expr) and edr_is_false(expr) macros in get_config_var.sql. They dispatch to handle T-SQL (= 1/= 0 for bit type) vs standard SQL (= true/= false). Applied here in column_boolean_monitors.sql.
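The dispatched variants described in the reply could be sketched as follows (dispatch wrappers omitted; bodies reconstructed from the description, so the committed version may differ):

```sql
{% macro default__edr_is_true(expr) %}
    {{ expr }} = true
{% endmacro %}

{% macro fabric__edr_is_true(expr) %}
    {# T-SQL bit type has no true/false keywords; compare against 1/0 #}
    {{ expr }} = 1
{% endmacro %}

{% macro default__edr_is_false(expr) %}
    {{ expr }} = false
{% endmacro %}

{% macro fabric__edr_is_false(expr) %}
    {{ expr }} = 0
{% endmacro %}
```

Since dbt-sqlserver declares a dependency on fabric, the `fabric__` variants cover both adapters through the dispatch chain.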
macros/edr/data_monitoring/monitors_query/table_monitoring_query.sql
(outdated thread, resolved)
```sql
{% endmacro %}

{% macro fabric__row_count_metric_query(metric, metric_properties) %}
{# Fabric / T-SQL: avoid CTE since this will be embedded inside a parent CTE. #}
```
should we just use this implementation as the default one?
Done — made the CTE-free implementation the default default__row_count_metric_query. Removed fabric__row_count_metric_query since it's now identical to the default.
```sql
{% set tbl_ref = table_relation.include(database=false) %}

{% do elementary.run_query("DROP VIEW IF EXISTS " ~ vw_ref) %}
{% do elementary.run_query("CREATE VIEW " ~ vw_ref ~ " AS " ~ sql_query) %}
```
I don't remember - would it be possible to have temporary views? (and save the need to drop them)
```sql
) -%}
{%- set insert_query %}
IF OBJECT_ID('{{ tmp_view_name }}', 'V') IS NOT NULL DROP VIEW {{ tmp_view_name }};
EXEC('CREATE VIEW {{ tmp_view_name }} AS {{ select_query | replace("'", "''") }}');
```
Also here - would it be possible to make the view temporary at the DB level, to avoid the need of dropping it?
```diff
  anomaly_score,
  case
- when abs(anomaly_score) >= 1.5 then true else false
+ when abs(anomaly_score) >= 1.5
```
see my comment above about rendering a condition as boolean, maybe can simplify this and the below fields.
```diff
- then true
- else false
+ then {{ elementary.edr_boolean_literal(true) }}
+ else {{ elementary.edr_boolean_literal(false) }}
```
see also my comment re. rendering condition as boolean
```diff
- then true
- else false
+ then {{ elementary.edr_boolean_literal(true) }}
+ else {{ elementary.edr_boolean_literal(false) }}
```
see my comment re. rendering condition as boolean (here and below)
…ame edr_dbt_concat to edr_concat, consolidate full_names.sql, reduce table_monitoring_query duplication, optimize fabric test sampling with temp view + TOP Co-Authored-By: Itamar Hartstein <haritamar@gmail.com>
```diff
- = {{ elementary.edr_boolean_literal(true) }}
+ {{
+     elementary.edr_is_true(
+         "cast("
```
is the cast a must? or can edr_is_true handle it?
The cast is needed — the column might not be a boolean type natively. Cleaned up the code: extracted the cast expression into a {% set bool_expr %} variable first, then pass it to edr_is_true/edr_is_false. Also switched count_false to use edr_is_false instead of negating edr_is_true.
```diff
- end as is_the_last_invocation_of_the_day
+ {{
+     elementary.edr_condition_as_boolean(
+         "first_value(invocation_id) over (partition by "
```
If we can first above compute this condition as a variable, e.g.

```sql
{% set <condition> %}
... jinja goes here ..
{% endset %}
```

I think it can be more readable

Done — extracted the window spec and conditions into {% set %} variables:

```sql
{%- set day_partition = elementary.edr_time_trunc("day", "run_results.generated_at") -%}
{%- set day_window = "over (partition by " ~ day_partition ~ " order by ...)" -%}
{%- set first_inv_cond = "first_value(invocation_id) " ~ day_window ~ " = invocation_id" -%}
{%- set last_inv_cond = "last_value(invocation_id) " ~ day_window ~ " = invocation_id" -%}

{{ elementary.edr_condition_as_boolean(first_inv_cond) }} as is_the_first_invocation_of_the_day,
{{ elementary.edr_condition_as_boolean(last_inv_cond) }} as is_the_last_invocation_of_the_day
```

Much more readable now.
… conditions in model_run_results for readability Co-Authored-By: Itamar Hartstein <haritamar@gmail.com>
```sql
    ) -%}
    {%- set last_inv_cond = (
        "last_value(invocation_id) " ~ day_window ~ " = invocation_id"
    ) -%}
```
move all these sets to the top

Done — moved all {% set %} variables to the top of the file (before the with clause).
…ability Co-Authored-By: Itamar Hartstein <haritamar@gmail.com>
… to avoid missing test audit schema Co-Authored-By: Itamar Hartstein <haritamar@gmail.com>
…roach in default for all adapters Co-Authored-By: Itamar Hartstein <haritamar@gmail.com>
…_result_rows Co-Authored-By: Itamar Hartstein <haritamar@gmail.com>
feat: add Microsoft Fabric DWH integration (ELE-5282)
Summary
Adds Microsoft Fabric Data Warehouse as a supported adapter in the elementary dbt package. Fabric uses T-SQL syntax via the `dbt-fabric` adapter maintained by Microsoft. Since `dbt-sqlserver` declares `dependencies=["fabric"]`, all `fabric__` macros are auto-dispatched for both adapters (`sqlserver__` → `fabric__` → `default__`).

Macro overrides added (~25 `fabric__` macros):

- Date/time: `datediff`, `date_trunc`, `time_trunc`, `to_char`, `timeadd`, `dateadd`, `current_timestamp`
- Data types: `type_bool` → `bit`, `type_string` → `varchar(4096)`, `type_timestamp` → `datetime2`, `data_type_list`, `get_normalized_data_type`, `try_cast_column_to_timestamp`
- Table operations: `complete_buckets_cte` (recursive CTE with `option(maxrecursion 10000)`), `escape_special_chars`, `create_temp_table` (view-based CTE workaround), `edr_make_temp_relation` (handles dict model nodes)
- Cross-DB utils: `target_database`, `incremental_strategy` → `merge`, `generate_elementary_profile_args`, `day_of_week`, `hour_of_day`, `hour_of_week`
- Nested-CTE workarounds: `query_test_result_rows`, `get_failed_row_count_calc_query` (use temp tables since T-SQL doesn't support nested CTEs)
- Schema changes: `get_columns_snapshot_query`, `get_columns_changes_query` (unnest CTEs, `||` → `dbt.concat()`, TEXT normalization)
- `render_value` now uses `edr_boolean_literal()` for all adapters; T-SQL renders `cast(1/0 as bit)` instead of bare `True`/`False`

New cross-adapter helper macros:

- `edr_concat(fields)` — wraps `dbt.concat()` with a varchar cast on Fabric (replaces the old 2-arg `edr_concat` and `edr_dbt_concat`)
- `edr_condition_as_boolean(condition, default_false=true)` — renders `case when <cond> then <TRUE> else <FALSE> end` using `edr_boolean_literal()`. Eliminates repeated boolean CASE patterns across the codebase.
- `edr_is_true(expr)` / `edr_is_false(expr)` — dispatched boolean comparisons (`= 1` / `= 0` for the T-SQL bit type, `= true` / `= false` for standard SQL)

Only 2 `sqlserver__` macros exist (everything else auto-dispatches from `fabric__`):

- `sqlserver__target_database` — needed by the CLI
- `sqlserver__generate_elementary_profile_args` — different connection params (SQL auth vs Azure AD)

Infrastructure:

- Local testing against SQL Server Docker (target `sqlserver`)
- Password `Elementary123!` hardcoded directly in `docker-compose-sqlserver.yml` and `profiles.yml.j2` (matching how postgres/clickhouse/dremio handle local credentials — no env var indirection)
- `timeout-minutes: 60` on the test job to prevent hung CI runs (e.g. dead dbt-fusion workers)
- `LIMIT 1` → `TOP 1` for T-SQL compatibility via the `SelectLimit` helper class
- `true`/`false` → `edr_boolean_literal()` macro across alert/anomaly models
- `GROUP BY` → explicit column lists (T-SQL doesn't support `GROUP BY 1,2,3`)
get_columns_snapshot_query: Replaced inlinetarget.typechecks withedr_boolean_literal()get_columns_changes_query: Extracted tofabric__/default__get_column_changes_from_baseline_curdispatchcolumn_monitoring_query: Extracted GROUP BY tofabric__/default__column_monitoring_group_bydispatchinsert_as_select: Converted tofabric__/default__insert_as_selectdispatchget_anomaly_queryandget_anomaly_query_for_dimension_anomalies: Both now dispatchedschema_changes_query_group_by: New dispatched macro for GROUP BY inget_columns_changes_query_generic—fabric__variant uses explicit 6-column GROUP BY (source columns fromall_column_changes) instead of positional 9-column referenceconcat()with{{ dbt.concat([...]) }}for cross-DB compatibility (Redshift'sconcat()only accepts 2 args; dbt.concat() dispatches per adapter)fabric→sqlserver: More accurate since we test against SQL Server Docker, not real Fabric. Kept all macros asfabric__(shared T-SQL implementation).DROP VIEW IF EXISTSincreate_temp_table.sql(SQL Server 2016+ support).group by 1,2,3with explicit column names in several models (metrics_anomaly_score.sql,monitors_runs.sql) and removedorder byfromdbt_artifacts_hashes.sqlCTE for consistency.Updates since last revision (latest commits)
Addressed PR review round 3 (comments 107-120):
Created
edr_condition_as_booleanmacro — Consolidates the repeatedcase when <condition> then TRUE else FALSE endpattern into a single dispatched macro. Applied across 6 files:get_anomaly_scores_query.sql,get_columns_snapshot_query.sql,anomaly_threshold_sensitivity.sql,model_run_results.sql,metrics_anomaly_score.sql,get_anomaly_query.sql. The macro usesedr_boolean_literal()internally for correct T-SQL rendering.Created
edr_is_true/edr_is_falsemacros — Dispatched boolean comparison helpers. On T-SQL they compare against1/0(bit type), on standard SQL they usetrue/false. Applied incolumn_boolean_monitors.sqlfor casting boolean columns.Renamed
edr_dbt_concat→edr_concat— Generalized the existing unused 2-argedr_concatto accept a list of fields (likeedr_dbt_concatdid). Updated all 3 call sites:list_concat_with_separator.sql,generate_surrogate_key.sql,get_columns_changes_query.sql.Consolidated
full_names.sql— Now usesedr_concatfor all adapters. Removed redundantfabric__database_full_name,fabric__schema_full_name,fabric__table_full_namevariants (the default implementations work sinceedr_concathandles T-SQL casting).Made
row_count_metric_queryCTE-free as default — The Fabric-specific CTE-free implementation is nowdefault__row_count_metric_query. Removedfabric__row_count_metric_queryentirely since it's identical. This change affects all adapters but is safer since it avoids nested CTEs.Extracted helper macros in
table_monitoring_query.sqlto reduce duplication betweendefault__freshness_metric_queryandfabric__freshness_metric_query:_freshness_lag_expr()— shared SQL for computing freshness as time since last update_bucket_end_freshness_expr()— dispatched (default__usesleast(),fabric__usesCASE WHENsince T-SQL doesn't supportleast())_freshness_final_select(metric)— shared final SELECT columnsRemoved
fabric__get_unified_metrics_query— Merged the UNION ALL logic intodefault__get_unified_metrics_query. The fabric variant was identical to what the default implementation should be (CTE-free, direct UNION ALL), so all adapters now use the CTE-free approach.Optimized
fabric__query_test_result_rows— Changed from fetching all rows into Python memory (result[:sample_limit]) to using temp view +SELECT TOPin SQL. This avoids memory issues for large test result sets. The view is created with a timestamp + random suffix to avoid collisions, then dropped after sampling.Addressed GitHub comment (haritamar):
Fixed sqlserver CI failures (temp view schema issue) — The
fabric__query_test_result_rowstemp view was initially usingmodel.schema(per-worker test audit schema), which doesn't exist in parallel xdist execution. Changed totarget.schemaas a safer fallback (commit acb01a0).Fixed fabric cloud CI failures (schema doesn't exist) — On fabric cloud with parallel pytest-xdist execution, even
target.schemamay not exist or be accessible in per-worker contexts. Final fix useselementary.get_package_database_and_schema()to get the elementary package schema (guaranteed to exist since it contains the package models), falling back totarget.schemaif the package schema lookup fails (commit a2978b5).CI Status: ✅ All 22 checks pass (0 failures)
Review & Testing Checklist for Human
Critical (MUST verify before merge):
Temp view schema using
- Temp view schema using `elementary.get_package_database_and_schema()` — This is the core fix for fabric cloud CI failures. The approach relies on the elementary package schema always existing. If the package schema lookup fails or returns an inaccessible schema, the fallback to `target.schema` could fail silently on fabric cloud. Verify test execution works on a real Fabric workspace with parallel pytest-xdist workers.
- `edr_condition_as_boolean` in anomaly detection logic — This new macro is used in `metrics_anomaly_score.sql` and `anomaly_threshold_sensitivity.sql` (core anomaly detection). The implementation uses string concatenation for complex window function conditions in `model_run_results.sql`. Verify anomaly detection produces identical results before/after across all adapters (especially edge cases like stationary data, nulls).
- `row_count_metric_query` made CTE-free for ALL adapters — This changes the default implementation across postgres, snowflake, bigquery, etc. (not just T-SQL). The CTE-free version uses a direct UNION ALL of metric queries instead of wrapping them in a CTE. Verify row count metrics produce identical results on all adapters.
- `fabric__query_test_result_rows` temp view cleanup — The optimized version creates real views in the database schema (`edr_test_sample_<timestamp>_<random>`). If cleanup fails (e.g., query timeout, process killed), orphan views accumulate. Monitor for `edr_test_sample_*` views in the fabric/sqlserver schema after test runs.
fabric__get_unified_metrics_query— Per haritamar's comment, this macro was supposed to be removed but was still present. Now removed entirely, with the UNION ALL logic merged intodefault__get_unified_metrics_query. Verify metric collection works correctly on fabric/sqlserver (especially with multiple metric types).Recommended:
- Test against a real Fabric workspace — All local testing is against SQL Server 2022 Docker. Fabric CI runs pass, but there may be subtle T-SQL dialect differences in real Fabric workspaces (Azure Synapse Analytics features/limitations).
- Verify the `anomaly_threshold_sensitivity.sql` rewrite — Git shows a 78% rewrite. The file was heavily refactored using `edr_condition_as_boolean` for all 7 threshold checks (1.5, 2, 2.5, 3, 3.5, 4, 4.5). Spot-check that the anomaly threshold logic produces correct results.
- Verify the `metrics_anomaly_score.sql` CTE chain — Added an intermediate `time_window_scored` CTE to fix a "column reference in same SELECT" error. Verify the anomaly score calculation is correct and the CTE refactoring doesn't change semantics.

Test Plan
CI validation: ✅ Complete - All 22 checks pass
Local Docker test (optional):
```shell
cd integration_tests
docker compose -f docker-compose-sqlserver.yml up -d
pytest tests -vvv --target sqlserver -n4
```

Fabric cloud verification (recommended):
- Monitor for orphan `edr_test_sample_*` views after test runs

Notes
- The password `Elementary123!` is hardcoded in `docker-compose-sqlserver.yml` and `profiles.yml.j2` for local dev/CI only (no secrets needed).
- `fabric` → `sqlserver`: Reflects that CI tests against SQL Server Docker, not Fabric. Macros stay as `fabric__` (shared T-SQL).
- `sqlserver__` macros exist: `target_database` (CLI), `generate_elementary_profile_args` (connection params).
- `create_temp_table` was removed since the dispatch chain handles it automatically.
- Schema evolution: `model.schema` → failed on per-worker test audit schemas → switched to `target.schema` → failed on fabric cloud where `target.schema` doesn't exist → final fix uses `elementary.get_package_database_and_schema()` (guaranteed to exist).

Summary by CodeRabbit
New Features
- `edr_concat`, `edr_condition_as_boolean`, `edr_is_true`, `edr_is_false` for cleaner boolean/string handling.

Compatibility
- Fabric-specific macro variants (`get_column_changes_from_baseline_cur`, `column_monitoring_group_by`, `insert_as_select`, `get_anomaly_query`, `schema_changes_query_group_by`).
- `render_value` macro with unified boolean literal handling via `edr_boolean_literal()`.
- `GROUP BY` replaced with explicit column names in anomaly/run models.
- `dbt.concat()` instead of raw `concat()` (fixes Redshift compatibility).
- `edr_dbt_concat` → `edr_concat` (generalized to accept a list of fields).
- Query limit abstraction (`SelectLimit` class) for test infrastructure.

Bug Fixes
- Fixed the `metrics_anomaly_score` column reference error by introducing an intermediate `time_window_scored` CTE.
- Eliminated the `_tsql_create_temp_table` helper by inlining its logic into `fabric__create_temp_table`.
- Temp view schema resolution now uses `elementary.get_package_database_and_schema()` (guaranteed to exist in all contexts).
- Removed `fabric__get_unified_metrics_query` per reviewer feedback — merged its UNION ALL logic into the default implementation.

Refactoring
- Consolidated boolean condition handling with `edr_condition_as_boolean` across 6+ files.
- Simplified `table_monitoring_query.sql` freshness queries via helper macros (`_freshness_lag_expr`, `_bucket_end_freshness_expr`, `_freshness_final_select`).
- Made `row_count_metric_query` CTE-free as the default (affects all adapters).
- Removed `fabric__` variants in `full_names.sql` and `get_unified_metrics_query`.
- Optimized `fabric__query_test_result_rows` to use a temp view + TOP instead of fetching all rows.
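The temp view + TOP pattern mentioned above can be sketched as follows. This is a hedged illustration, not the package's actual macro body: the `edr_test_sample_<timestamp>_<random>` naming format comes from the review checklist, while the macro name and query wiring here are assumptions.

```jinja
{# Sketch: T-SQL has TOP instead of LIMIT and rejects nested CTEs,
   so materialize the test query as a real view, read a capped
   sample back with TOP, then drop the view. #}
{% macro fabric_sample_rows_sketch(test_query, sample_limit=5) %}
  {% set view_name = 'edr_test_sample_'
       ~ modules.datetime.datetime.utcnow().strftime('%Y%m%d%H%M%S')
       ~ '_' ~ range(100000) | random %}
  {% do run_query('create view ' ~ view_name ~ ' as ' ~ test_query) %}
  {% set rows = run_query('select top ' ~ sample_limit ~ ' * from ' ~ view_name) %}
  {# cleanup — if this step is skipped (timeout, killed process),
     orphan edr_test_sample_* views accumulate #}
  {% do run_query('drop view ' ~ view_name) %}
  {% do return(rows) %}
{% endmacro %}
```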
Chores
- Renamed the CI target from `fabric` to `sqlserver` for accuracy.
- Removed `sqlserver__create_temp_table` (auto-dispatched via `dependencies=["fabric"]`).
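The auto-dispatch in the last item relies on dbt's adapter dispatch: because dbt-sqlserver declares dbt-fabric in its adapter dependencies, a dispatched call searches `sqlserver__` first, then falls back to `fabric__`, then `default__`, so a dedicated `sqlserver__` copy is redundant. A minimal sketch — `adapter.dispatch` and the `<adapter>__` prefix convention are real dbt mechanics, but the macro bodies below are illustrative, not the package's implementation:

```jinja
{# Entry point: dbt picks the implementation by adapter name.
   On sqlserver it tries sqlserver__create_temp_table, then
   fabric__create_temp_table (via adapter dependencies), then
   default__create_temp_table. #}
{% macro create_temp_table(relation_name, sql) %}
  {{ return(adapter.dispatch('create_temp_table', 'elementary')(relation_name, sql)) }}
{% endmacro %}

{% macro fabric__create_temp_table(relation_name, sql) %}
  {# T-SQL path: SELECT ... INTO #temp instead of CREATE TEMP TABLE AS #}
  {% do run_query('select * into #' ~ relation_name ~ ' from (' ~ sql ~ ') q') %}
{% endmacro %}
```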