Update cuDF Python user guides to current/pandas 3 behaviors #22720
mroeschke wants to merge 7 commits into
Walkthrough
Three cuDF docs updated for 26.08: copy-on-write is now the default and implicit; data-types guidance expanded with a new "Specifying data types" section and clarifications for …
Changes: cuDF 26.08 Documentation Updates
Actionable comments posted: 6
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
docs/cudf/source/cudf/data-types.md (1)
158-163: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win
Add explicit imports in this example block.
This section uses `pd` and `cudf` without showing imports in the same block. For runnable snippets, please include prerequisites locally (or add a short note that imports were defined earlier).
As per coding guidelines, documentation changes should prioritize completeness and clarity, including clear prerequisites.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@docs/cudf/source/cudf/data-types.md` around lines 158 - 163, The example block uses pd and cudf but doesn't show their imports; update the snippet to include explicit prerequisite imports (e.g., import pandas as pd and import cudf) at the top of the same fenced code block (or add a short note that imports were defined earlier) so the example using pd.Series and cudf constructs is runnable and self-contained; target the example that references pd and cudf in the data-types example.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@docs/cudf/source/cudf/copy-on-write.md`:
- Line 5: Fix the duplicated word in the intro sentence by replacing the phrase
"share the the same underlying data" with "share the same underlying data" in
the Copy-on-write documentation (the sentence beginning "Copy-on-write is a
memory management strategy...") so the wording is clear and free of the repeated
"the".
In `@docs/cudf/source/cudf/data-types.md`:
- Line 17: Fix the malformed datetime dtype in the table row by adding the
missing closing quote to `'datetime64[us]'` so the list reads `'datetime64[s]'`,
`'datetime64[ms]'`, `'datetime64[us]'`, `'datetime64[ns]'`; update the Datetime
table row (the string containing the datetime64 entries) to ensure all dtype
tokens are consistently quoted.
- Around line 3-6: Update the sentence that reads "cuDF also support data types
from the `[Arrow type system](...)`" to correct the grammar by changing
"support" to "supports" and fix the Arrow link formatting by removing the
surrounding backticks so it becomes a normal Markdown link (e.g., "cuDF also
supports data types from the [Arrow type system](...)"). Ensure the rest of the
sentence remains unchanged.
In `@docs/cudf/source/cudf/pandas-comparison.md`:
- Line 179: Replace the phrase "floating point results" with the hyphenated
compound adjective "floating-point results" in the sentence that begins "Series
of floats. If you need to compare floating point results, you" in the
documentation to improve readability and conform to the style guide.
- Around line 30-33: The sentence "cuDF all the data types in pandas..." is
missing the verb; update the sentence containing the phrase "cuDF all the data
types in pandas except for `pandas.PeriodDtype`, `pandas.SparseDtype`" to
include "supports" so it reads "cuDF supports all the data types in pandas
except for `pandas.PeriodDtype`, `pandas.SparseDtype` and third-party
`ExtensionDtype`s..." and keep the rest of the paragraph and links (e.g., "Data
Types") unchanged.
- Around line 145-147: The sentence mixes two different behaviors and should be
split and clarified: explain one case that to get a predictable (sorted) order
you can pass sort=True, and separately explain that to match pandas' default
behavior (which may be unsorted) you can enable mode.pandas_compatible or
explicitly use sort=False; update the text around the `sort=True` and
`sort=False` mentions and `mode.pandas_compatible` so each behavior and when to
use it is stated clearly and without contradiction.
📒 Files selected for processing (3)
- docs/cudf/source/cudf/copy-on-write.md
- docs/cudf/source/cudf/data-types.md
- docs/cudf/source/cudf/pandas-comparison.md
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
docs/cudf/source/cudf/data-types.md (1)
50-55: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win
Example output conflicts with the requested dtype.
The example creates `cudf.Series(..., dtype=pd.Float64Dtype())` but shows `dtype: Float32`. This undermines trust in the example's correctness.
Suggested doc fix:
     >>> s = cudf.Series([1, 2, 3], dtype=pd.Float64Dtype())
     >>> s
     0    1.0
     1    2.0
     2    3.0
    -dtype: Float32
    +dtype: Float64
As per coding guidelines, documentation changes should prioritize accuracy (verify that code examples compile and run correctly) and consistency.
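A mismatch like this one can be caught mechanically before review. The following is a minimal sketch using only the Python standard library, assuming the documented snippets follow the doctest layout quoted above; `dtype_mismatches` is a hypothetical helper written for illustration and is not part of cuDF, pandas, or the docs tooling.

```python
import re

# Hypothetical helper (not part of cuDF or pandas): compare the nullable dtype
# requested in a doctest-style snippet with the dtype printed in its output.
REQUESTED = re.compile(r"dtype=pd\.(\w+)Dtype\(\)")
PRINTED = re.compile(r"^dtype:\s*(\w+)\s*$", re.MULTILINE)


def dtype_mismatches(snippet: str) -> list[tuple[str, str]]:
    """Return (requested, printed) pairs that disagree within one snippet."""
    requested = REQUESTED.findall(snippet)
    printed = PRINTED.findall(snippet)
    # Pair constructors and outputs in order of appearance;
    # zip stops at the shorter of the two lists.
    return [(r, p) for r, p in zip(requested, printed) if r != p]


# The example as flagged in the review, and the same text after the suggested fix.
before = """>>> s = cudf.Series([1, 2, 3], dtype=pd.Float64Dtype())
>>> s
0    1.0
1    2.0
2    3.0
dtype: Float32
"""
after = before.replace("dtype: Float32", "dtype: Float64")

print(dtype_mismatches(before))  # [('Float64', 'Float32')]
print(dtype_mismatches(after))   # []
```

This only compares text; it does not run cuDF, so it complements rather than replaces executing the example on a GPU machine.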
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@docs/cudf/source/cudf/data-types.md` around lines 50 - 55, The example shows a mismatch between the requested dtype and the printed dtype: the call to cudf.Series(..., dtype=pd.Float64Dtype()) should produce a Float64 dtype but the output shows Float32; update the example so the shown output matches the created Series (either change the constructor to use pd.Float32Dtype() or change the displayed dtype to Float64) and verify with cudf.Series(...) that the printed dtype and values are accurate; reference the cudf.Series call and the dtype=pd.Float64Dtype() token when making the correction.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@docs/cudf/source/cudf/data-types.md`:
- Line 19: Update the Timedelta (duration) row to use the canonical dtype
strings by replacing `'timedelta[s]'`, `'timedelta[ms]'`, `'timedelta[us]'`,
`'timedelta[ns]'` with `'timedelta64[s]'`, `'timedelta64[ms]'`,
`'timedelta64[us]'`, `'timedelta64[ns]'`; ensure the table cell under the
"Timedelta (duration)" entry and any nearby examples or references use the
`timedelta64[...]` form for consistency with the documented/accepted dtype
style.
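Both dtype-table findings (the missing closing quote on `'datetime64[us]'` and the `timedelta[...]` spelling) are the kind of thing a small text check can flag. The sketch below uses only the standard library; `noncanonical_tokens` is a hypothetical helper, and the table rows are illustrative reconstructions of the rows described in the review comments, not copies of the actual file contents.

```python
import re

# Canonical spelling named in the review: datetime64[unit] / timedelta64[unit].
CANONICAL = re.compile(r"^(datetime64|timedelta64)\[(s|ms|us|ns)\]$")
# Any backtick-delimited token in a Markdown table cell.
TOKEN = re.compile(r"`([^`]*)`")


def noncanonical_tokens(row: str) -> list[str]:
    """Hypothetical doc-lint helper: list dtype tokens that are misquoted or misspelled."""
    bad = []
    for token in TOKEN.findall(row):
        # Only look at datetime/timedelta tokens; ignore other inline code.
        if "datetime" not in token and "timedelta" not in token:
            continue
        # Each token is expected to look like 'timedelta64[ns]', quotes included.
        quoted = len(token) >= 2 and token[0] == "'" and token[-1] == "'"
        if not quoted or not CANONICAL.match(token[1:-1]):
            bad.append(token)
    return bad


# Illustrative rows (assumed layout), based on the review comments.
timedelta_row = (
    "| Timedelta (duration) | `'timedelta[s]'`, `'timedelta[ms]'`, "
    "`'timedelta[us]'`, `'timedelta[ns]'` |"
)
fixed_timedelta_row = timedelta_row.replace("timedelta[", "timedelta64[")
datetime_row = (
    "| Datetime | `'datetime64[s]'`, `'datetime64[ms]'`, "
    "`'datetime64[us]`, `'datetime64[ns]'` |"
)

print(noncanonical_tokens(timedelta_row))        # all four timedelta[...] tokens
print(noncanonical_tokens(fixed_timedelta_row))  # []
print(noncanonical_tokens(datetime_row))         # ["'datetime64[us]"]
```

The unit list is limited to the four resolutions mentioned in the review (`s`, `ms`, `us`, `ns`); whether cuDF accepts other spellings is not checked here.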
✅ Files skipped from review due to trivial changes (1)
- docs/cudf/source/cudf/pandas-comparison.md
🚧 Files skipped from review as they are similar to previous changes (1)
- docs/cudf/source/cudf/copy-on-write.md
vyasr left a comment:
Mainly need to figure out what to do with copy-on-write.
Should we drop this section altogether? We need dev docs for how CoW is implemented, but as far as user-facing behavior goes, now that this is the default and only behavior of pandas, I don't know if we need to discuss it in cudf's user-facing docs at all anymore.
and dictionary-like data.
cuDF largely uses the same [data type objects](https://pandas.pydata.org/docs/user_guide/basics.html#dtypes) supported by pandas, including
numeric, datetime, timedelta, and string data types. cuDF also supports
data types from the [Arrow type system](https://arrow.apache.org/docs/format/CDataInterface.html#data-type-description-format-strings) such as decimals, list,
Should we also mention the pandas nullable types?
The CI failure is https://github.com/rapidsai/cudf/actions/runs/26669664425/job/78612813093?pr=22720#step:13:5928
Description
Similar to #22689, this updates several user guides to the 26.08 behavior, where pandas 3 is supported.
Checklist