feat(pkg-r): Lazy SQL tibble sources #165
Conversation
```r
if (!is_missing(table_name)) {
  check_sql_table_name(table_name)
  self$table_name <- table_name
  use_cte <- identical(table_name, remote_name %||% remote_table)
```
remote_table isn't defined
Also, is the logic here inverted? That is, should it be `use_cte <- !identical(table_name, remote_name)`?
```r
# Collect various signals to infer the table name
obj_name <- deparse1(substitute(tbl))
```
I think we should be making `table_name` required and keep any `substitute()` magic further up in the call stack.
Somewhat relatedly, do you see any utility in exporting the DataSource implementations (i.e., DataFrameSource, etc.)?
```r
sprintf(
  "WITH %s AS (\n%s\n)\n%s",
  DBI::dbQuoteIdentifier(private$conn, self$table_name),
  private$tbl_cte,
  query
)
```
Very clever! 💯
```r
output$dt <- DT::renderDT({
  df <- qc_vals$df()
  if (inherits(df, "tbl_sql")) {
```
Check that `data_source` is a `TblLazySource` instead of sniffing the `df` result.
All of this isn't really necessary -- I've included it because dplyr is in Suggests, and I just want to make sure we've gone past a `check_installed("dplyr")` (which happens if you've created a `TblLazySource`).
Closes #51
How does this work?
Simple tbl source
First, we can create a new data source from the `tbl()` object, which returns a `tbl()` that can be chained into further dplyr operations.
Complicated tbl source
This same process even works for more complicated tibbles, like the result of a dplyr pipeline on SQL tibbles.
And again, the result is a `tbl()` that can be folded into further dplyr operations.
The way we make this work is by extracting the SQL for the dplyr pipeline up to the point where we create the data source, and then, for complicated queries at least, wrapping that SQL in a local CTE, letting the LLM write queries against the CTE as if it were a fixed table.
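The CTE-wrapping idea itself is language-agnostic. As an illustrative sketch (in Python with sqlite3 rather than the package's actual R/DBI code; the table and column names here are made up), wrapping a pipeline's SQL in a `WITH` clause lets later queries treat it as a plain table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (region TEXT, amount REAL);
    INSERT INTO sales VALUES ('east', 10), ('east', 20), ('west', 5);
""")

# SQL that a dplyr-style pipeline might have produced (hypothetical example).
pipeline_sql = "SELECT region, SUM(amount) AS total FROM sales GROUP BY region"

# Wrap the pipeline's SQL in a CTE named after the inferred table name,
# mirroring the WITH %s AS (...) construction in the diff above.
def with_cte(cte_name: str, cte_sql: str, query: str) -> str:
    return f"WITH {cte_name} AS (\n{cte_sql}\n)\n{query}"

# An LLM-written query can now target `totals` as if it were a fixed table.
sql = with_cte("totals", pipeline_sql, "SELECT * FROM totals WHERE total > 12")
print(conn.execute(sql).fetchall())  # → [('east', 30.0)]
```

The upstream query never has to be materialized; the database plans the CTE and the outer query together.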
Amazingly, we can even apply this strategy to get the schema of the CTE. This took a small amount of updating to `get_schema_impl()` to make it work, but the core logic is exactly the same.
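As a rough sketch of how a schema can be read through the same CTE trick (again in Python/sqlite3, not the package's actual `get_schema_impl()`; the names are hypothetical), a zero-row `SELECT` against the CTE exposes the column names without materializing the result:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")

# Hypothetical SQL produced by an upstream pipeline.
pipeline_sql = "SELECT region, SUM(amount) AS total FROM sales GROUP BY region"

def cte_schema(conn, cte_name, cte_sql):
    # Wrap the pipeline in a CTE, then select zero rows from it:
    # the cursor's description still carries the column names.
    cur = conn.execute(
        f"WITH {cte_name} AS (\n{cte_sql}\n)\nSELECT * FROM {cte_name} LIMIT 0"
    )
    return [col[0] for col in cur.description]

print(cte_schema(conn, "totals", pipeline_sql))  # → ['region', 'total']
```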