feat: add __arrow_c_stream__ function #11338
Draft
jules-ch wants to merge 9 commits into
Add a pyarrow capsule method to quickly convert a DataArray to polars. The function is mostly zero-copy; only the coordinate grid needs to be computed.
jules-ch
commented
May 14, 2026
Comment on lines +491 to +492:
    if not values.flags.c_contiguous:
        values = np.ascontiguousarray(values)
Author
I think we can just use `values.ravel()` further down to ensure a contiguous array.
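A small sketch of why `ravel()` alone may be enough here. It uses toy arrays rather than the PR's actual `values`, and relies only on documented NumPy behaviour: with the default C order, `ravel()` returns a view when the input is already C-contiguous and a contiguous copy otherwise.

```python
import numpy as np

# A C-contiguous array: ravel() returns a view, so no data is copied.
contiguous = np.arange(12, dtype="float64").reshape(3, 4)
flat_view = contiguous.ravel()
assert flat_view.flags.c_contiguous
assert np.shares_memory(flat_view, contiguous)

# A transposed (non-contiguous) array: ravel() has to copy, but the
# result is still a C-contiguous 1-D array in row-major order.
transposed = contiguous.T
flat_copy = transposed.ravel()
assert flat_copy.flags.c_contiguous
assert not np.shares_memory(flat_copy, transposed)
```

In both cases the flattened result is contiguous, so a separate `np.ascontiguousarray` call before flattening would be redundant under these assumptions.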
Author
A thought: the result will be sparse if the DataArrays do not have the same coords, but that's another PR altogether.
Edit: Well, that worked better than I thought it would:
pl.DataFrame(pa.table(ds))
shape: (2_082_966, 7)
┌───────┬───────┬──────────┬───────────┬───────────────┬──────┬──────┐
│ month ┆ level ┆ latitude ┆ longitude ┆ z             ┆ u    ┆ v    │
│ ---   ┆ ---   ┆ ---      ┆ ---       ┆ ---           ┆ ---  ┆ ---  │
│ i32   ┆ i32   ┆ f32      ┆ f32       ┆ f64           ┆ f64  ┆ f64  │
╞═══════╪═══════╪══════════╪═══════════╪═══════════════╪══════╪══════╡
│ 1     ┆ 200   ┆ 90.0     ┆ -180.0    ┆ 106837.512109 ┆ null ┆ null │
│ 1     ┆ 200   ┆ 90.0     ┆ -179.25   ┆ 106839.237136 ┆ null ┆ null │
│ 1     ┆ 200   ┆ 90.0     ┆ -178.5    ┆ 106837.512109 ┆ null ┆ null │
│ 1     ┆ 200   ┆ 90.0     ┆ -177.75   ┆ 106839.237136 ┆ null ┆ null │
│ 1     ┆ 200   ┆ 90.0     ┆ -177.0    ┆ 106837.512109 ┆ null ┆ null │
│ …     ┆ …     ┆ …        ┆ …         ┆ …             ┆ …    ┆ …    │
│ null  ┆ 200   ┆ null     ┆ null      ┆ null          ┆ null ┆ null │
│ null  ┆ 500   ┆ null     ┆ null      ┆ null          ┆ null ┆ null │
│ null  ┆ 850   ┆ null     ┆ null      ┆ null          ┆ null ┆ null │
│ 1     ┆ null  ┆ null     ┆ null      ┆ null          ┆ null ┆ null │
│ 7     ┆ null  ┆ null     ┆ null      ┆ null          ┆ null ┆ null │
└───────┴───────┴──────────┴───────────┴───────────────┴──────┴──────┘
I will make another PR if this one gets merged.
Description
Add a pyarrow capsule method to quickly convert a DataArray to polars.
The function is mostly zero-copy; only the coordinate grid needs to be computed.
I wanted to implement the `__arrow_c_array__` function to return a fixed_shape_tensor, but somehow polars prioritizes it over the `__arrow_c_stream__` method. So for convenience I am leaving this here for now.
Feel free to close this PR and discuss this further in a dedicated issue if you want.
We can go one step further to save memory with `pa.DictionaryArray`, using the index encoding that pyarrow supports out of the box; we just need to create the indices with numpy beforehand.
This enables:
Checklist
`.to_polars_df()` method (very similar to `.to_dataframe()`, which implicitly uses pandas) #10135
whats-new.rst
api.rst
AI Disclosure