Update cuDF Python user guides to current/pandas 3 behaviors by mroeschke · Pull Request #22720 · rapidsai/cudf

mroeschke · 2026-05-29T21:54:59Z

Description

Similar to #22689, this PR updates several user guides to reflect 26.08 behavior, where pandas 3 is supported.

Checklist

I am familiar with the Contributing Guidelines.
New or existing tests cover these changes.
The documentation is up to date with these changes.


coderabbitai · 2026-05-29T21:58:36Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: e2eab8a0-3fe3-46d6-b753-e46807bf8d77

📥 Commits

Reviewing files that changed from the base of the PR and between 3660638 and 2b1c0b1.

📒 Files selected for processing (1)

docs/cudf/source/cudf/data-types.md

✅ Files skipped from review due to trivial changes (1)

docs/cudf/source/cudf/data-types.md

📝 Walkthrough

Summary by CodeRabbit

Documentation
- Clarified that copy-on-write is the default since 26.08 (aligned with pandas 3.0), removed opt-in instructions, and shortened guidance on defensive copies.
- Expanded supported data types: updated unsigned integer defaults, timezone-aware datetimes, decimals, lists/structs; added a "Specifying dtypes" section with examples and clarified "object" is string-only.
- Reworked pandas/cudf compatibility guide: clarified dtype exclusions, missing-value semantics (cudf.NA), iteration and GPU→CPU→GPU workflow, and ordering/nondeterminism guidance.

Walkthrough

Three cuDF docs updated for 26.08: copy-on-write is now the default and implicit; data-types guidance expanded with a new "Specifying data types" section and clarifications for object, decimal, and nested types; and the pandas-compatibility guide reorganized with clearer behavioral guidance.
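The copy-on-write behavior mentioned in the walkthrough can be illustrated with a toy sketch in pure Python. This is not cuDF's or pandas' implementation; the `CowBuffer` class and its methods are hypothetical names invented for this illustration. It shows only the general idea: two objects share the same underlying data until one of them is written to, at which point the writer gets its own copy.

```python
class CowBuffer:
    """Toy copy-on-write wrapper around a Python list (illustration only)."""

    def __init__(self, data, _shared=None):
        self._data = data
        # One-element list counting how many wrappers share `data`.
        self._refs = _shared if _shared is not None else [1]

    def view(self):
        """Return a new wrapper that shares the same underlying list."""
        self._refs[0] += 1
        return CowBuffer(self._data, self._refs)

    def __getitem__(self, i):
        return self._data[i]

    def __setitem__(self, i, value):
        # Copy only when the data is shared and a write actually happens.
        if self._refs[0] > 1:
            self._refs[0] -= 1
            self._data = list(self._data)
            self._refs = [1]
        self._data[i] = value

    def shares_memory_with(self, other):
        return self._data is other._data


original = CowBuffer([1, 2, 3])
alias = original.view()
shared_before_write = original.shares_memory_with(alias)  # no copy made yet
alias[0] = 99  # the first write triggers the copy
shared_after_write = original.shares_memory_with(alias)
```

Because the copy happens lazily on the first write, the caller does not need a defensive `copy()` to protect `original`, which matches the guidance described for the updated user guide.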

Changes

cuDF 26.08 Documentation Updates

| Layer / File(s) | Summary |
| --- | --- |
| Copy-on-write default behavior `docs/cudf/source/cudf/copy-on-write.md` | Introduction and conclusion rewritten to state copy-on-write is the default since 26.08 and that defensive `copy()` calls are no longer needed. |
| Supported data types table and defaults `docs/cudf/source/cudf/data-types.md` | Supported data types overview and default dtype table updated (including unsigned integer defaults); note retained that `pandas.PeriodDtype`/`pandas.SparseDtype` are unsupported; footer reference removed. |
| Specifying data types and decimal intro `docs/cudf/source/cudf/data-types.md` | New "Specifying data types" section added documenting accepted pandas-like and Arrow dtype arguments with runnable examples; decimal lead-in wording adjusted. |
| Object dtype and nested types `docs/cudf/source/cudf/data-types.md` | `object` dtype clarified as string-only in cuDF with updated example; nested (`List`/`Struct`) intro rewritten to describe child-element typing and adjusted `py:class` references. |
| pandas/cudf comparison introduction `docs/cudf/source/cudf/pandas-comparison.md` | Document header/intro rewritten to clarify compatibility scope, dtype eligibility, and that missing values use `cudf.NA`; transitional sentence added. |
| pandas/cudf behavioral guidance `docs/cudf/source/cudf/pandas-comparison.md` | Iteration guidance updated to recommend GPU→CPU→GPU conversion (`.to_arrow()`/`.to_pandas()` then reconstruct), result-ordering and floating-point nondeterminism rephrased, duplicate-column limitation restated, and `.apply()` limitations relocated. |

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

rapidsai/cudf#22352: Overlapping changes to pandas-compatibility/nested-data documentation and py:class references in docs/cudf/source/cudf/data-types.md.

Suggested labels

Python

Suggested reviewers

wence-
rjzamora
bdice

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title accurately summarizes the main change: updating cuDF Python user guides to reflect current behavior and pandas 3 compatibility, which aligns with all three modified documentation files. |
| Description check | ✅ Passed | The description clearly relates to the changeset by referencing similar PR 22689 and stating the purpose of updating user guides to 26.08 behavior for pandas 3 support, matching the documentation updates in the PR. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

coderabbitai

Actionable comments posted: 6

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

docs/cudf/source/cudf/data-types.md (1)
158-163: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Add explicit imports in this example block.

This section uses `pd` and `cudf` without showing imports in the same block. For runnable snippets, please include prerequisites locally (or add a short note that imports were defined earlier).

As per coding guidelines, documentation changes should prioritize completeness and clarity, including clear prerequisites.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/cudf/source/cudf/data-types.md` around lines 158 - 163, The example
block uses pd and cudf but doesn't show their imports; update the snippet to
include explicit prerequisite imports (e.g., import pandas as pd and import
cudf) at the top of the same fenced code block (or add a short note that imports
were defined earlier) so the example using pd.Series and cudf constructs is
runnable and self-contained; target the example that references pd and cudf in
the data-types example.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@docs/cudf/source/cudf/copy-on-write.md`:
- Line 5: Fix the duplicated word in the intro sentence by replacing the phrase
"share the the same underlying data" with "share the same underlying data" in
the Copy-on-write documentation (the sentence beginning "Copy-on-write is a
memory management strategy...") so the wording is clear and free of the repeated
"the".

In `@docs/cudf/source/cudf/data-types.md`:
- Line 17: Fix the malformed datetime dtype in the table row by adding the
missing closing quote to `'datetime64[us]'` so the list reads `'datetime64[s]'`,
`'datetime64[ms]'`, `'datetime64[us]'`, `'datetime64[ns]'`; update the Datetime
table row (the string containing the datetime64 entries) to ensure all dtype
tokens are consistently quoted.
- Around line 3-6: Update the sentence that reads "cuDF also support data types
from the `[Arrow type system](...)`" to correct the grammar by changing
"support" to "supports" and fix the Arrow link formatting by removing the
surrounding backticks so it becomes a normal Markdown link (e.g., "cuDF also
supports data types from the [Arrow type system](...)"). Ensure the rest of the
sentence remains unchanged.

In `@docs/cudf/source/cudf/pandas-comparison.md`:
- Line 179: Replace the phrase "floating point results" with the hyphenated
compound adjective "floating-point results" in the sentence that begins "Series
of floats. If you need to compare floating point results, you" in the
documentation to improve readability and conform to the style guide.
- Around line 30-33: The sentence "cuDF all the data types in pandas..." is
missing the verb; update the sentence containing the phrase "cuDF all the data
types in pandas except for `pandas.PeriodDtype`, `pandas.SparseDtype`" to
include "supports" so it reads "cuDF supports all the data types in pandas
except for `pandas.PeriodDtype`, `pandas.SparseDtype` and third-party
`ExtensionDtype`s..." and keep the rest of the paragraph and links (e.g., "Data
Types") unchanged.
- Around line 145-147: The sentence mixes two different behaviors and should be
split and clarified: explain one case that to get a predictable (sorted) order
you can pass sort=True, and separately explain that to match pandas' default
behavior (which may be unsorted) you can enable mode.pandas_compatible or
explicitly use sort=False; update the text around the `sort=True` and
`sort=False` mentions and `mode.pandas_compatible` so each behavior and when to
use it is stated clearly and without contradiction.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: eaf3394a-dac8-4f7d-a4a2-38500659d301

📥 Commits

Reviewing files that changed from the base of the PR and between b99f73f and 30bae30.

📒 Files selected for processing (3)

docs/cudf/source/cudf/copy-on-write.md
docs/cudf/source/cudf/data-types.md
docs/cudf/source/cudf/pandas-comparison.md

coderabbitai

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

docs/cudf/source/cudf/data-types.md (1)
50-55: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Example output conflicts with the requested dtype.

The example creates `cudf.Series(..., dtype=pd.Float64Dtype())` but shows `dtype: Float32`. This undermines trust in the example’s correctness.
Suggested doc fix
 >>> s = cudf.Series([1, 2, 3], dtype=pd.Float64Dtype())
 >>> s
 0    1.0
 1    2.0
 2    3.0
-dtype: Float32
+dtype: Float64
As per coding guidelines, documentation changes should prioritize accuracy (verify that code examples compile and run correctly) and consistency.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/cudf/source/cudf/data-types.md` around lines 50 - 55, The example shows
a mismatch between the requested dtype and the printed dtype: the call to
cudf.Series(..., dtype=pd.Float64Dtype()) should produce a Float64 dtype but the
output shows Float32; update the example so the shown output matches the created
Series (either change the constructor to use pd.Float32Dtype() or change the
displayed dtype to Float64) and verify with cudf.Series(...) that the printed
dtype and values are accurate; reference the cudf.Series call and the
dtype=pd.Float64Dtype() token when making the correction.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@docs/cudf/source/cudf/data-types.md`:
- Line 19: Update the Timedelta (duration) row to use the canonical dtype
strings by replacing `'timedelta[s]'`, `'timedelta[ms]'`, `'timedelta[us]'`,
`'timedelta[ns]'` with `'timedelta64[s]'`, `'timedelta64[ms]'`,
`'timedelta64[us]'`, `'timedelta64[ns]'`; ensure the table cell under the
"Timedelta (duration)" entry and any nearby examples or references use the
`timedelta64[...]` form for consistency with the documented/accepted dtype
style.
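The difference between the two spellings can be checked against NumPy's dtype parser. This is a sketch under the assumption that NumPy is installed and that its parsing is a reasonable proxy for the canonical strings; it does not exercise cuDF itself, and the helper name `is_valid_numpy_dtype` is hypothetical.

```python
import numpy as np


def is_valid_numpy_dtype(name):
    """Return True if NumPy accepts `name` as a dtype string."""
    try:
        np.dtype(name)
    except TypeError:
        return False
    return True


units = ("s", "ms", "us", "ns")
canonical = [f"timedelta64[{unit}]" for unit in units]
shortened = [f"timedelta[{unit}]" for unit in units]

# NumPy accepts the `timedelta64[...]` form and rejects `timedelta[...]`.
canonical_ok = [is_valid_numpy_dtype(name) for name in canonical]
shortened_ok = [is_valid_numpy_dtype(name) for name in shortened]

print(canonical_ok, shortened_ok)
```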

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 338fdd55-ba76-4e8d-9167-ecfaa19a7318

📥 Commits

Reviewing files that changed from the base of the PR and between 30bae30 and 008c8c4.

📒 Files selected for processing (3)

docs/cudf/source/cudf/copy-on-write.md
docs/cudf/source/cudf/data-types.md
docs/cudf/source/cudf/pandas-comparison.md

✅ Files skipped from review due to trivial changes (1)

docs/cudf/source/cudf/pandas-comparison.md

🚧 Files skipped from review as they are similar to previous changes (1)

docs/cudf/source/cudf/copy-on-write.md


vyasr

Mainly need to figure out what to do with copy-on-write.

vyasr · 2026-05-30T01:48:51Z

Should we drop this section altogether? We need dev docs for how CoW is implemented, but as far as user-facing behavior goes, now that this is the default and only behavior of pandas, I don't know if we need to discuss it in cudf's user-facing docs at all anymore.

vyasr · 2026-05-30T01:49:18Z

-and dictionary-like data.
+cuDF largely uses the same [data type objects](https://pandas.pydata.org/docs/user_guide/basics.html#dtypes) supported by pandas, including
+numeric, datetime, timedelta, and string data types. cuDF also supports
+data types from the [Arrow type system](https://arrow.apache.org/docs/format/CDataInterface.html#data-type-description-format-strings) such as decimals, list,


Should we also mention the pandas nullable types?

vyasr · 2026-05-30T01:54:00Z

The CI failure is https://github.com/rapidsai/cudf/actions/runs/26669664425/job/78612813093?pr=22720#step:13:5928

/__w/cudf/cudf/docs/cudf/source/cudf/data-types.md:25: WARNING: py:class reference target not found: cudf.core.dtypes.IntervalDtype [ref.class]

mroeschke added 3 commits May 29, 2026 21:30

Revist copy-on-write user guide post pandas 3.0

876ee19

Update pandas-comparisons.md

736292e

Merge remote-tracking branch 'upstream/main' into cudf/doc/user_guide…

30bae30

…_ref

mroeschke self-assigned this May 29, 2026

mroeschke added doc Documentation non-breaking Non-breaking change labels May 29, 2026

coderabbitai Bot reviewed May 29, 2026


Coderabbit review

008c8c4

coderabbitai Bot reviewed May 29, 2026


Comment thread docs/cudf/source/cudf/data-types.md Outdated

mroeschke added 3 commits May 29, 2026 22:12

timedelta64

3660638

Merge remote-tracking branch 'upstream/main' into cudf/doc/user_guide…

8762fd4

…_ref

Update py:class references

2b1c0b1

vyasr requested changes May 30, 2026

