feat(pkg-r): Lazy SQL tibble sources #165
Conversation
```r
if (!is_missing(table_name)) {
  check_sql_table_name(table_name)
  self$table_name <- table_name
  use_cte <- identical(table_name, remote_name %||% remote_table)
```
remote_table isn't defined
Also, is the logic here inverted? That is, should it be `use_cte <- !identical(table_name, remote_name)`?
```r
# Collect various signals to infer the table name
obj_name <- deparse1(substitute(tbl))
```
I think we should be making `table_name` required and keep any `substitute()` magic further up in the call stack.
Somewhat relatedly, do you see any utility in exporting the DataSource implementations (i.e., DataFrameSource, etc.)?
```r
sprintf(
  "WITH %s AS (\n%s\n)\n%s",
  DBI::dbQuoteIdentifier(private$conn, self$table_name),
  private$tbl_cte,
  query
)
```
Very clever! 💯
```r
output$dt <- DT::renderDT({
  df <- qc_vals$df()
  if (inherits(df, "tbl_sql")) {
```
Check that `data_source` is a `TblLazySource` instead of sniffing the `df` result.
All of this isn't really necessary -- I've included it because dplyr is in Suggests, and I just want to make sure we've gone past a `check_installed("dplyr")` (which happens if you've created a `TblLazySource`).
Closes #51
How does this work?
Simple tbl source
First, we can create a new data source from the `tbl()` object, which returns a `tbl()` that can be chained into further dplyr operations.
Complicated tbl source
This same process even works for more complicated tibbles, like the result of a dplyr pipeline on SQL tibbles.
And again, the result is a `tbl()` that can be folded into further dplyr operations.
The way we make this work is by extracting the SQL for the dplyr pipeline up to the point where we create the data source, and then, for complicated queries at least, wrapping that SQL in a local CTE, letting the LLM write queries against the CTE as if it were a fixed table.
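The CTE-wrapping idea itself is language-agnostic. As an illustrative sketch (in Python with sqlite3 rather than the package's actual R/DBI code; the table and column names here are made up), wrapping a pipeline's SQL in a `WITH` clause lets later queries treat it as a plain table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (region TEXT, amount REAL);
    INSERT INTO sales VALUES ('east', 10), ('east', 20), ('west', 5);
""")

# SQL that a dplyr-style pipeline might have produced (hypothetical example).
pipeline_sql = "SELECT region, SUM(amount) AS total FROM sales GROUP BY region"

# Wrap the pipeline's SQL in a CTE named after the inferred table name,
# mirroring the WITH %s AS (...) construction in the diff above.
def with_cte(cte_name: str, cte_sql: str, query: str) -> str:
    return f"WITH {cte_name} AS (\n{cte_sql}\n)\n{query}"

# An LLM-written query can now target `totals` as if it were a fixed table.
sql = with_cte("totals", pipeline_sql, "SELECT * FROM totals WHERE total > 12")
print(conn.execute(sql).fetchall())  # → [('east', 30.0)]
```

The upstream query never has to be materialized; the database plans the CTE and the outer query together.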
Amazingly, we can even apply this strategy to get the schema of the CTE. This took a small amount of updating to `get_schema_impl()` to make it work, but the core logic is exactly the same.
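As a rough sketch of how a schema can be read through the same CTE trick (again in Python/sqlite3, not the package's actual `get_schema_impl()`; the names are hypothetical), a zero-row `SELECT` against the CTE exposes the column names without materializing the result:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")

# Hypothetical SQL produced by an upstream pipeline.
pipeline_sql = "SELECT region, SUM(amount) AS total FROM sales GROUP BY region"

def cte_schema(conn, cte_name, cte_sql):
    # Wrap the pipeline in a CTE, then select zero rows from it:
    # the cursor's description still carries the column names.
    cur = conn.execute(
        f"WITH {cte_name} AS (\n{cte_sql}\n)\nSELECT * FROM {cte_name} LIMIT 0"
    )
    return [col[0] for col in cur.description]

print(cte_schema(conn, "totals", pipeline_sql))  # → ['region', 'total']
```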