feat: support nested STRUCT and ARRAY data display in anywidget mode #2359

shuoweil · 2025-12-29T17:59:37Z

Implements flattening and expansion for complex data types in the interactive display for anywidget mode.

Key Features:

Automatic Flattening: STRUCT columns are flattened into intuitive dot-notation columns (e.g., user.name).
Array Expansion: ARRAY columns are expanded into multiple rows with visual grouping.
Visual Continuity: Continuation rows for arrays are styled for better parent-row context.

Verified in:

VS Code notebook: screen/3ST4m9xN9w3iqD9
Colab notebook: screen/7NG4LiTEPuAC27F

Fixes #<438181139> 🦕


review-notebook-app · 2025-12-29T17:59:42Z

Check out this pull request on ReviewNB.

See visual diffs & provide feedback on Jupyter Notebooks.

Powered by ReviewNB

tswast · 2026-01-05T22:54:33Z

bigframes/display/_flatten.py

+
+def flatten_nested_data(
+    dataframe: pd.DataFrame,
+) -> tuple[pd.DataFrame, dict[str, list[int]], list[str], set[str]]:


Tuple is hard to understand. Can we use a frozen dataclass, instead?
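A frozen dataclass in the spirit of this suggestion might look like the sketch below. The field names follow the `FlattenResult` docstrings quoted later in this review; the types and defaults are assumptions, and the other members of the original tuple are left out because their meaning is not shown here.

```python
from __future__ import annotations

import dataclasses

import pandas as pd


@dataclasses.dataclass(frozen=True)
class FlattenResult:
    """The result of flattening a DataFrame for display (sketch).

    Attributes:
        dataframe: The flattened DataFrame. Original index labels are kept and
            repeated on every row produced from the same source row.
        continuation_rows: 0-based positions, after flattening, of rows that hold
            the 2nd and later elements of an exploded array.
        cleared_on_continuation: Columns whose repeated values are blanked on
            continuation rows when rendered.
    """

    dataframe: pd.DataFrame
    continuation_rows: set[int] = dataclasses.field(default_factory=set)
    cleared_on_continuation: list[str] = dataclasses.field(default_factory=list)
```

Named attributes make call sites such as `result.continuation_rows` self-describing, and `frozen=True` stops callers from rebinding fields by accident.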

tswast · 2026-01-06T22:32:01Z

bigframes/display/_flatten.py

+            )
+
+            new_cols_to_add[new_col_name] = pd.Series(
+                new_list_array.to_pylist(),


to_pylist() can be quite expensive to call. If we already have a pyarrow array, I don't think it's necessary to convert it.

Done. I've removed the .to_pylist() calls and now pass the Arrow arrays directly to pandas for better performance.


tswast · 2026-01-06T22:38:23Z

bigframes/display/_flatten.py

+
+            new_cols_to_add[new_col_name] = pd.Series(
+                new_list_array.to_pylist(),
+                dtype=pd.ArrowDtype(pa.list_(field.type)),


I'm confused. Why are we creating a list type here? Could you explain in comments what the purpose is? I thought we were flattening based on the function name.

Good point. I've added a comment to clarify that the function is transforming an array<struct<...>> into separate array columns.

tswast · 2026-01-06T22:40:38Z

bigframes/display/_flatten.py

+    for orig_idx in dataframe.index:
+        non_array_data = non_array_df.loc[orig_idx].to_dict()
+        array_values = {}
+        max_len_in_row = 0
+        non_na_array_found = False
+
+        for col_name in array_columns:
+            val = dataframe.loc[orig_idx, col_name]


This is looping through each value in Python, which is going to be very slow. Please use native code such as https://arrow.apache.org/docs/python/generated/pyarrow.compute.list_flatten.html to avoid such loops.

Thanks for the suggestion. I've refactored the array explosion logic to use a much faster vectorized approach with pandas.explode and merge, which removes the Python loops entirely.
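Below is a hedged sketch of the explode-and-merge idea. It is not the PR's code and assumes list-valued columns stored as Python lists (object dtype). Each array column is exploded separately and tagged with the element's position inside its source row. The pieces are then outer-merged on (row, position), so arrays of different lengths line up. Names such as `explode_array_columns`, `_row`, `_pos`, and `_continuation` are hypothetical.

```python
import pandas as pd


def explode_array_columns(df: pd.DataFrame, array_cols: list[str]) -> pd.DataFrame:
    """Hypothetical: one row per array element, keeping the original index labels."""
    if not array_cols:
        return df.copy()
    row_id = pd.RangeIndex(len(df), name="_row")
    parts = None
    for col in array_cols:
        # explode() is vectorized; an empty or missing array becomes a single NaN.
        s = pd.Series(df[col].to_numpy(), index=row_id, name=col).explode()
        pos = s.groupby(level=0).cumcount().to_numpy()
        part = s.to_frame().assign(_pos=pos).reset_index()
        # An outer merge on (row, position) aligns arrays of different lengths.
        parts = part if parts is None else parts.merge(part, on=["_row", "_pos"], how="outer")
    parts = parts.sort_values(["_row", "_pos"]).reset_index(drop=True)

    # Repeat the scalar columns and the original index labels for each element.
    result = df.drop(columns=array_cols).iloc[parts["_row"].to_numpy()].copy()
    for col in array_cols:
        result[col] = parts[col].to_numpy()
    result["_continuation"] = parts["_pos"].to_numpy() > 0
    return result


df = pd.DataFrame(
    {"id": [10, 20], "a": [[1, 2], [3]], "b": [["x"], ["y", "z", "w"]]},
    index=pd.Index(["r0", "r1"], name="key"),
)
out = explode_array_columns(df, ["a", "b"])
```

Every source row yields at least one output row because `Series.explode()` turns an empty list into a single NaN. The original index labels are repeated rather than reset.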

tswast · 2026-01-06T22:41:18Z

bigframes/display/_flatten.py

+            continue
+
+        # Create one row per array element, up to max_len_in_row
+        for array_idx in range(max_len_in_row):


This is looping through each element of each array in Python, which is going to be even slower.

I have completely refactored _explode_array_columns to use a vectorized approach with pandas.explode and merge. This eliminated all Python loops, including the slow inner loop you pointed out, significantly improving performance.

- Replaced Python-based row explosion with optimized PyArrow computation for nested arrays.
- Cleaned up comments in to strictly adhere to the Google Python Style Guide (focused on 'why', removed redundant 'what').
- Renamed variable to for clarity.
- Verified changes with Python unit tests and JavaScript frontend tests.

tswast · 2026-01-13T14:36:09Z

bigframes/display/_flatten.py

+                return "struct"
+            if pa.types.is_list(pa_type):
+                return (
+                    "array_of_struct"
+                    if pa.types.is_struct(pa_type.value_type)
+                    else "array"
+                )
+        return "clear"


These magic strings worry me. Could you create an enum for category, instead?

https://docs.python.org/3/library/enum.html

Done. I've replaced the strings with a private _ColumnCategory Enum.

tswast · 2026-01-13T14:37:04Z

bigframes/display/_flatten.py

+        continuation_rows: A set of row indices that are continuation rows.
+        cleared_on_continuation: A list of column names that should be cleared on continuation rows.


It's not 100% clear to me what is meant by "continuation". I assume that it means rows post-flattening that correspond to the second element of an array and beyond? Please expand these docstrings further.

You are right. I've updated the docstrings in FlattenResult to explicitly clarify that "continuation rows" refer to the 2nd element onwards of an exploded array, and "cleared" columns are those (typically scalars) that are replicated but shouldn't be visually repeated.
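The following small sketch makes the two terms concrete. It assumes the flattening step records each row's element position within its source array. Positions greater than zero mark continuation rows, and the cleared columns are blanked only in a display copy. Both helper names are hypothetical.

```python
import numpy as np
import pandas as pd


def continuation_rows_from_positions(positions) -> set[int]:
    """Hypothetical: post-flatten row numbers holding the 2nd, 3rd, ... element."""
    return set(np.flatnonzero(np.asarray(positions) > 0).tolist())


def blank_cleared_columns(
    df: pd.DataFrame, continuation_rows: set[int], cleared: list[str]
) -> pd.DataFrame:
    """Hypothetical: hide repeated values on continuation rows, for display only."""
    shown = df.astype(object)  # a copy, so the flattened data itself is untouched
    is_continuation = np.isin(np.arange(len(shown)), sorted(continuation_rows))
    shown.loc[is_continuation, cleared] = ""
    return shown


rows = continuation_rows_from_positions([0, 1, 0, 1, 2])
flat = pd.DataFrame({"id": [10, 10, 20, 20, 20], "tag": ["a", "b", "c", "d", "e"]})
shown = blank_cleared_columns(flat, rows, ["id"])
```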

tswast · 2026-01-13T14:40:04Z

bigframes/display/_flatten.py

+    """The result of flattening a DataFrame.
+
+    Attributes:
+        dataframe: The flattened DataFrame.


Please add some comments about what happens to the original index columns. Based on the description of the other fields, I assume that a unique index is created post-flatten?

I've updated the docstrings and the implementation. The original index (including named Index and MultiIndex) is preserved and duplicated across the exploded rows. This serves as the visual grouping key for the table display.
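One way to keep the original labels, including MultiIndex levels and names, is `Index.repeat` with the number of display rows each source row expands to. This is a sketch under that assumption, not necessarily how the PR does it. `repeat_index` is hypothetical, and empty arrays are counted as one row so the parent row still appears.

```python
import numpy as np
import pandas as pd


def repeat_index(index: pd.Index, element_counts) -> pd.Index:
    """Hypothetical: repeat each original label once per display row it expands to."""
    # Empty or null arrays still occupy one display row, so count them as 1.
    repeats = np.maximum(np.asarray(element_counts, dtype=np.int64), 1)
    return index.repeat(repeats)


mi = pd.MultiIndex.from_tuples([("a", 1), ("b", 2)], names=["k1", "k2"])
repeated = repeat_index(mi, [2, 0])
```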

tswast · 2026-01-13T14:40:53Z

bigframes/display/_flatten.py

+
+
+@dataclasses.dataclass(frozen=True)
+class ColumnClassification:


Please put a leading _ in front of class names that aren't intended to be used outside of this module.

tswast · 2026-01-13T14:43:19Z

bigframes/display/html.py

+    continuation_rows: set[int] | None,
+    clear_on_continuation: list[str],


Same here, add some more explanation to the docstrings. To keep it shorter, you could reference bigframes/display/_flatten.py so that folks can look there for the complete explanation.

Done. I updated the docstrings to reference bigframes.display._flatten.FlattenResult for the detailed definitions.
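The hypothetical `render_body` below illustrates one way `html.py` might consume these two parameters. It tags continuation rows with a CSS class and leaves cleared cells empty. Its docstrings point to `FlattenResult`, as the review suggested. The function name and the `continuation` class name are assumptions.

```python
from __future__ import annotations

import html

import pandas as pd


def render_body(
    df: pd.DataFrame,
    continuation_rows: set[int] | None,
    clear_on_continuation: list[str],
) -> str:
    """Render the <tbody> of a flattened DataFrame (hypothetical sketch).

    Args:
        df: The flattened DataFrame.
        continuation_rows: Row positions holding the 2nd and later elements of an
            exploded array. See bigframes.display._flatten.FlattenResult.
        clear_on_continuation: Columns left blank on continuation rows. See
            bigframes.display._flatten.FlattenResult.
    """
    continuation_rows = continuation_rows or set()
    cleared = set(clear_on_continuation)
    rows = []
    for pos, values in enumerate(df.itertuples(index=False, name=None)):
        is_continuation = pos in continuation_rows
        css = ' class="continuation"' if is_continuation else ""
        cells = "".join(
            "<td></td>"
            if is_continuation and col in cleared
            else f"<td>{html.escape(str(value))}</td>"
            for col, value in zip(df.columns, values)
        )
        rows.append(f"<tr{css}>{cells}</tr>")
    return "<tbody>" + "".join(rows) + "</tbody>"
```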

tswast · 2026-01-13T14:44:08Z

bigframes/display/table_widget.js

Neat feature!

tswast · 2026-01-13T14:46:16Z

bigframes/display/_flatten.py

Please create a test_flatten.py file with a few tests that check some of the flattening logic directly without the HTML rendering part. Specifically, let's focus on what happens to index/multiindex columns, as that's my main worry / question.

Done. I created tests/unit/display/test_flatten.py. I moved the logic-specific tests there and added dedicated test cases (test_flatten_preserves_original_index, test_flatten_preserves_multiindex) to verify that indices are correctly preserved and duplicated during the flattening process.

shuoweil added 4 commits December 29, 2025 16:36

refactor(display): use CSS classes in HTML tables

f20cde5

refactor(display): use CSS classes in HTML tables

19e2c4f

feat(display): support nested STRUCT and ARRAY data in interactive tables

4b68243

Merge branch 'main' into shuowei-anywidget-nested-strcut-array

8a7609a

shuoweil self-assigned this Dec 29, 2025

shuoweil requested review from a team as code owners December 29, 2025 17:59

shuoweil requested a review from tswast December 29, 2025 17:59

product-auto-label bot added the size: l (Pull request size is large) and api: bigquery (Issues related to the googleapis/python-bigquery-dataframes API) labels Dec 29, 2025

shuoweil added 2 commits December 29, 2025 18:09

chore: remove unreached code

ceca74d

refactor: code refactor

63e4a3c

product-auto-label bot added the size: xl (Pull request size is extra large) label and removed the size: l (Pull request size is large) label Dec 29, 2025

shuoweil added 4 commits December 29, 2025 22:06

refactor: reuse pandas struct.explode()

3affd92

refactor: revert the refactor

c53da80

Merge branch 'main' into shuowei-anywidget-ui-improve

fa37000

test: merge notebook

60785f3

shuoweil force-pushed the shuowei-anywidget-nested-strcut-array branch from f583833 to 60785f3 January 2, 2026 21:28

tswast reviewed Jan 5, 2026


shuoweil added 3 commits January 6, 2026 00:32

Merge branch 'main' into shuowei-anywidget-nested-strcut-array

0a88b10

feat: use dataclass for flatten_nested_data

f32a53f

feat: Refactor HTML rendering and document JS tests

3944249

shuoweil force-pushed the shuowei-anywidget-nested-strcut-array branch from 2bb97d3 to 3944249 January 6, 2026 03:40

shuoweil requested a review from tswast January 6, 2026 03:44

Merge branch 'main' into shuowei-anywidget-nested-strcut-array

ce59668

tswast requested changes Jan 6, 2026


Fix: Improve performance of nested data flattening

41df7b3

shuoweil requested a review from tswast January 7, 2026 00:47

shuoweil added 2 commits January 8, 2026 01:32

test: rerun notebook

21a5d5c

fix(display): add row hover effect for nested data rows

36a9a37

shuoweil marked this pull request as draft January 8, 2026 18:39

shuoweil added 14 commits January 8, 2026 22:39

refactor: code refactor

4d46e3c

refactor: improve _flatten readability and table widget styles

0f48f82

docs: move implementation details from docstrings to block comments

a8a39dc

docs: remove redundant comments in _flatten.py

dfe5fec

refactor: simplify flattening logic in _flatten.py

15bdf54

refactor: use mutable ColumnClassification object in _flatten.py

59c3a2a

fix: resolve bug in _classify_columns logic and enable functional updates

6d28d28

refactor: simplify _classify_columns logic in _flatten.py

09635e6

fix: resolve NameError for ExplodeResult and formatting

2de5a3c

refactor(anywidget): optimize array flattening using pyarrow

9a19966

test: rerun notebook

9886e5f

refactor: remove nested loop

b2166ed

Merge branch 'main' into shuowei-anywidget-nested-strcut-array

a34802e

shuoweil marked this pull request as ready for review January 9, 2026 21:45

shuoweil added 2 commits January 12, 2026 22:11

Merge main to shuowei-anywidget-nested-strcut-array

7763818

test: rerun notebook to verify the merge

27ae231

tswast reviewed Jan 13, 2026


shuoweil added 4 commits January 13, 2026 19:27

Merge commit '798af4a30' into shuowei-anywidget-nested-strcut-array

f74f82a

refactor: replace magic strings for col categories with a private Enum

03eba5e

refactor: replace magic strings for col categories with a private Enum

eea0a87

test: rerun notebook

ca19957

shuoweil force-pushed the shuowei-anywidget-nested-strcut-array branch from 8eb7211 to ca19957 January 13, 2026 19:59

shuoweil added 2 commits January 13, 2026 20:02

Merge branch 'main' into shuowei-anywidget-nested-strcut-array

4e9eaa4

docs: rerun test

cb7ae87

shuoweil requested a review from tswast January 13, 2026 20:06

test: update year

fb2d029
