feat: add collections validation and GFQL support #874

lmeyerov · 2025-12-29T15:01:28Z

Summary

add collections validator with strict/autofix behavior and GFQL wire normalization
expose collections API with validate/warn and plot-time URL param validation
add collections tests and clarify plan template location
move collections helpers to graphistry/collections.py and introduce typed models in graphistry/models/collections.py

Testing

python -m pytest graphistry/tests/test_collections.py graphistry/tests/test_dataset_id_invalidation.py
./bin/lint.sh
./bin/mypy.sh


mj3cheun · 2026-01-12T18:00:08Z

graphistry/PlotterBase.py

+        show_collections: Optional[bool] = None,
+        collections_global_node_color: Optional[str] = None,
+        collections_global_edge_color: Optional[str] = None,
+        encode: bool = True,


in the interest of simplifying, would it be difficult to detect pre-encoded strings?

yeah this surprised me: the reason it's like this is bc the base REST api takes AABBCC instead of ~css colors like we started to do elsewhere i believe (js color(xyz) normalizes to rgba('...'), handling '#AABBCC', 'silver', etc)

so maybe fix is to ship better colors in REST, and then this upgrades to that?
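For illustration, a client-side shim for the hex cases could look roughly like the sketch below. It assumes the REST API wants six hex digits with no leading `#`, as described above. The helper name `to_rest_hex` is hypothetical and not part of this PR, and named colors such as 'silver' are deliberately left out.

```python
import re

# Assumption: the REST endpoint accepts six hex digits without a leading '#'.
_HEX6 = re.compile(r"^#?([0-9a-fA-F]{6})$")
_HEX3 = re.compile(r"^#?([0-9a-fA-F]{3})$")


def to_rest_hex(color: str) -> str:
    """Hypothetical helper: normalize '#AABBCC', 'aabbcc', or '#abc' to 'AABBCC'."""
    s = color.strip()
    m = _HEX6.match(s)
    if m:
        return m.group(1).upper()
    m = _HEX3.match(s)
    if m:
        # Expand CSS shorthand: each digit is doubled ('abc' -> 'aabbcc').
        return "".join(ch * 2 for ch in m.group(1)).upper()
    # Named colors and rgba(...) would need a lookup table or server support.
    raise ValueError(f"unsupported color: {color!r}")
```

Anything beyond hex (named colors, rgba) is probably better handled once in REST, as suggested above, than duplicated in every client.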

i think that's a great idea to ship better colours in REST, will add that to the list

i was talking about the encode: bool = True parameter though, which is documented as "If True, JSON-minify and URL-encode collections. Use False for pre-encoded strings."

i was thinking we could just detect pre-encoded strings (which i assume are URL encoded and can be detected by attempting a parse). then we can remove this parameter
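A minimal sketch of the detect-by-parsing idea, assuming pre-encoded input is URL-encoded JSON. The function name `encode_collections` is hypothetical and this is not the PR's implementation. Strings are URL-decoded and parsed, then everything is re-emitted in one canonical form, so already-encoded input passes through unchanged.

```python
import json
from typing import Any
from urllib.parse import quote, unquote


def encode_collections(collections: Any) -> str:
    """Hypothetical helper: return minified, URL-encoded JSON for any accepted input."""
    if isinstance(collections, str):
        # A string is either raw JSON or URL-encoded JSON. unquote() leaves
        # text without percent-escapes untouched, so one parse path covers both.
        try:
            parsed = json.loads(unquote(collections))
        except ValueError as exc:
            raise ValueError("collections string is not (URL-encoded) JSON") from exc
    else:
        parsed = collections
    minified = json.dumps(parsed, separators=(",", ":"))
    return quote(minified, safe="")
```

One caveat with this approach: a raw JSON string whose values contain a literal percent-escape (for example "%41") would be decoded by mistake, which may be the one case where an explicit flag is still useful.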

mj3cheun · 2026-01-12T18:09:24Z

graphistry/validate/validate_collections.py

+    validate: ValidationParam = 'autofix',
+    warn: bool = True
+) -> List[Dict[str, Any]]:
+    validate_mode, warn = normalize_validation_params(validate, warn)


nitpick: this seems to already be called everywhere that uses normalize_collections, can either remove this extra call here or remove the normalize_validation_params everywhere else

mj3cheun · 2026-01-12T18:23:12Z

graphistry/validate/validate_collections.py

+    if isinstance(collections, list):
+        return _coerce_collection_list(collections, validate_mode, warn)
+    if isinstance(collections, dict):
+        return _coerce_collection_list(collections, validate_mode, warn)


nitpick: would just get rid of the _coerce_collection_list function and put its contents in here; that would avoid checking list vs dict twice and make this easier to read
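A rough sketch of what the inlined version might look like. The body of `_coerce_collection_list` is not shown in this thread, so the behavior below (wrap a single dict, skip non-dict entries unless strict) is an assumption, and `coerce_collections` is a hypothetical name.

```python
from typing import Any, Dict, List, Union


def coerce_collections(
    collections: Union[List[Any], Dict[str, Any]],
    validate_mode: str = "autofix",
) -> List[Dict[str, Any]]:
    """Hypothetical inlined coercion: one type dispatch, done in one place."""
    # A single dict is treated as a one-element list.
    items = [collections] if isinstance(collections, dict) else collections
    if not isinstance(items, list):
        if validate_mode == "strict":
            raise ValueError("collections must be a list or a dict")
        return []
    out: List[Dict[str, Any]] = []
    for index, item in enumerate(items):
        if isinstance(item, dict):
            out.append(item)
        elif validate_mode == "strict":
            raise ValueError(f"collection at index {index} must be a dict")
        # In non-strict mode, non-dict entries are skipped.
    return out
```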

mj3cheun · 2026-01-12T18:49:27Z

graphistry/validate/validate_collections.py

+            if validate_mode == 'autofix':
+                collection_type = str(collection_type)


if collection_type was not already a string, im not sure that just typecasting it is going to produce a valid type string given this string must be either set or intersection

actually nevermind, just saw the if collection_type not in ('set', 'intersection'): below, im thinking maybe combine this check with the check below, if collection_type is not string its all but guaranteed to fail and hit continue in the section below
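The combined check could be sketched as below. The name `check_collection_type` and the return convention (None means "skip this collection") are assumptions for illustration. Only the two valid values, 'set' and 'intersection', come from the discussion above.

```python
from typing import Any, Optional

VALID_COLLECTION_TYPES = ("set", "intersection")


def check_collection_type(collection_type: Any, validate_mode: str = "autofix") -> Optional[str]:
    """Hypothetical single check: must be a string and one of the known types."""
    if isinstance(collection_type, str) and collection_type in VALID_COLLECTION_TYPES:
        return collection_type
    if validate_mode == "strict":
        raise ValueError(f"invalid collection type: {collection_type!r}")
    # Non-strict: signal the caller to skip this collection. Casting with str()
    # first would not help, since str(1) or str(None) is never a valid type.
    return None
```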

mj3cheun · 2026-01-12T19:15:02Z

general comment, validate_mode = 'autofix' seems more likely to produce invalid output and/or result in a collection getting skipped than otherwise. to me its biggest value is typecasting stuff like id=1 to id="1" etc. not sure how much value there is in making it explicit, maybe we just have warn true/false and otherwise handle things silently?

lmeyerov · 2026-01-12T19:33:26Z

hmm, if they do something like g.plot(validate_mode='autofix'), what do we want to happen, or not happen, w/ collections?

that was mostly added b/c people keep having dirty data that fails arrow conversion, and would rather see it load than fix their data/cfg

mj3cheun · 2026-01-12T19:33:58Z

graphistry/validate/validate_collections.py

+            return raw
+        return raw
+
+    return _normalize_ops_list(_extract_ops_value(gfql_ops))


i feel this could be simplified by just feeding the GFQL into graphistry.compute.chain and seeing if it parses properly rather than setting up stuff here to determine if GFQL is valid or not
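The "parse it and see" approach could be sketched as follows. To keep the example self-contained, the parser is passed in as a callable. In the library it might be something like the chain parser the commit titles refer to (Chain.from_json), but its exact signature is not shown in this thread, so `toy_parse` below is only a stand-in.

```python
from typing import Any, Callable, Optional


def normalize_gfql_ops(
    gfql_ops: Any,
    parse: Callable[[Any], Any],
    validate_mode: str = "autofix",
) -> Optional[Any]:
    """Hypothetical wrapper: GFQL is valid exactly when the real parser accepts it."""
    try:
        return parse(gfql_ops)
    except Exception as exc:
        if validate_mode == "strict":
            raise ValueError(f"invalid GFQL: {exc}") from exc
        # Non-strict: report invalid by returning None so the caller can drop it.
        return None


def toy_parse(ops: Any) -> Any:
    """Stand-in parser for this example only, not the library's parser."""
    if not isinstance(ops, list) or not all(isinstance(op, dict) and "type" in op for op in ops):
        raise TypeError("expected a list of typed op dicts")
    return ops
```

The appeal of this design is that validation rules live in one place, so the collections code cannot drift from what the GFQL parser actually accepts.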

mj3cheun · 2026-01-12T19:36:20Z

graphistry/validate/validate_collections.py

+    if expr_type != 'intersection':
+        _issue(
+            'Intersection expr type must be "intersection"',
+            {'index': entry_index, 'value': expr_type},
+            validate_mode,
+            warn
+        )
+        return None


this function is only called if type is intersection, so this condition will never be true

mj3cheun · 2026-01-12T19:45:35Z

@lmeyerov regarding general comment above: im thinking maybe instead of validate we call it strict with "true" or "false" where strict true will throw errors and false will pass over non-compliant collections

autofix sorta implies to me that we are "correcting" the data when invalid but we arent really i think? more just passing it over if there are any mistakes

if we feel we need to type cast we might want to just do it anyway strict true or not

lmeyerov · 2026-01-12T20:01:33Z

the issue with strict true/false is that's closer to what we had before, and users were complaining that they just wanted it to 'work', hence autofix (coerce++). validate=true/false is closer to what you're thinking, while 'autofix' (coerce) is "it'll run, but may not be what you want, but you said you wanted something that runs"

We therefore have leeway in what autofix does --- we just need to warn (if warn=auto/true), and do what. So the q is... what should it do wrt diff collections errors, if strong opinions about any params in any direction?

My default intuition is probably:

drop collections with invalid gfql
colors: random-but-deterministic? or neutral grey / transparent? (default black looks buggy)
others: ??? disable / see if there's default values ?

mj3cheun · 2026-01-12T20:08:19Z

agree with that, i think we do the 3 bullets listed if strict (or whatever we want to call it) is false and its almost what we are doing right now. theres only 1 point i would make about the current implementation

instead of just trying to typecast stuff (which a lot of the time doesnt work), i would prefer we do default values or in the case of colours, random colours

in summary its only the approach i think might need to change, not the intention

EDIT: if we do the above it actually gets closer to a true autofix than what it was before, so i dont have an issue with the name anymore
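A small sketch of the "defaults instead of typecasts" direction discussed above, including the random-but-deterministic color idea. The field name `node_color`, the six-hex-digit color format, and both function names are assumptions for illustration, not the PR's schema.

```python
import hashlib
import string
from typing import Any, Dict

_HEX_DIGITS = set(string.hexdigits)


def deterministic_color(key: str) -> str:
    """Derive a stable six-hex-digit color from a key, so reruns look the same."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return digest[:6].upper()


def autofix_collection(collection: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical autofix: cast ids to str, replace bad colors with a default."""
    fixed = dict(collection)  # do not mutate the caller's dict
    if "id" in fixed and not isinstance(fixed["id"], str):
        fixed["id"] = str(fixed["id"])  # e.g. id=1 becomes id="1"
    color = fixed.get("node_color")
    is_hex6 = isinstance(color, str) and len(color) == 6 and all(c in _HEX_DIGITS for c in color)
    if not is_hex6:
        # Missing or invalid color: pick one deterministically from the id.
        fixed["node_color"] = deterministic_color(str(fixed.get("id", "")))
    return fixed
```

Hashing the id rather than drawing from a random generator keeps a collection's color the same across plots, which avoids the "default black looks buggy" problem without surprising users on reload.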

lmeyerov added 8 commits December 29, 2025 06:55

chore: clarify plan location in template

3f528c9

feat: add collections validation and gfql support

49e4cd3

fix: satisfy mypy in collections validation

1454f11

feat: add collections helper constructors

8265a63

feat: wrap collection set expr to gfql chain

2e92668

Refine collections types and helpers

f6b47ef

Fix collections typing for mypy

e4a1d61

Validate collections settings inputs

82f5961

lmeyerov commented Jan 7, 2026

graphistry/PlotterBase.py

lmeyerov added 9 commits January 7, 2026 11:02

Simplify collections typing

0083f19

Reuse gfql chain normalization in collections helpers

dded25c

Refine collections gfql normalization for mypy

0adf38d

Normalize collections GFQL via Chain and reject Let

adcc18f

Avoid Chain.from_json in collections normalization

99b8a41

Allow Let in collections normalization

aaae047

Simplify collections gfql wrapping

51f9e20

Slim collections validation helpers

06c0a67

Simplify collections input parsing

1554f53

mj3cheun reviewed Jan 12, 2026


fix: canonicalize collections validation and encoding

a0e1c10

refactor: simplify collections normalization

3cf9e9d

