-
Notifications
You must be signed in to change notification settings - Fork 218
feat: add collections validation and GFQL support #874
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
graphistry/PlotterBase.py
Outdated
| show_collections: Optional[bool] = None, | ||
| collections_global_node_color: Optional[str] = None, | ||
| collections_global_edge_color: Optional[str] = None, | ||
| encode: bool = True, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in the interest of simplifying, would it be difficult to detect pre-encoded strings?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah this surprised me: the reason it's like this is bc the base REST api takes AABBCC instead of ~css colors like we started to do elsewhere i believe (js color(xyz) normalizes to rgba('...'), handling '#AABBCC', 'silver', etc)
so maybe fix is to ship better colors in REST, and then this upgrades to that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think thats a great idea to ship better colours in REST will add that to the list
i was talking about the encode: bool = True, though, which is If True, JSON-minify and URL-encode collections. Use False for pre-encoded strings.
i was thinking we could just detect pre-encoded strings (which i assume are URL encoded and can be detected by attempting a parse). then we can remove this parameter
| validate: ValidationParam = 'autofix', | ||
| warn: bool = True | ||
| ) -> List[Dict[str, Any]]: | ||
| validate_mode, warn = normalize_validation_params(validate, warn) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nitpick: this seems to already be called everywhere that uses normalize_collections, can either remove this extra call here or remove the normalize_validation_params everywhere else
| if isinstance(collections, list): | ||
| return _coerce_collection_list(collections, validate_mode, warn) | ||
| if isinstance(collections, dict): | ||
| return _coerce_collection_list(collections, validate_mode, warn) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nitpick: would just get rid of the _coerce_collection_list function and put the contents in here, would get rid of checking is list is dict etc twice and make easier to read
| if validate_mode == 'autofix': | ||
| collection_type = str(collection_type) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if collection_type was not already a string, im not sure that just typecasting it is going to produce a valid type string given this string must be either set or intersection
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
actually nevermind, just saw the if collection_type not in ('set', 'intersection'): below, im thinking maybe combine this check with the check below, if collection_type is not string its all but guaranteed to fail and hit continue in the section below
|
general comment, validate_mode = 'autofix' seems more likely to produce invalid output and/or result in a collection getting skipped than otherwise. to me its biggest value is typecasting stuff like id=1 to id="1" etc. not sure how much value there is in making it explicit, maybe we just have warn true/false and otherwise handle things silently? |
|
hmm, if they do something like that was mostly added b/c people keep having dirty data that fails arrow conversion and rather seeing it load vs fixing their data/cfg |
| return raw | ||
| return raw | ||
|
|
||
| return _normalize_ops_list(_extract_ops_value(gfql_ops)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i feel this could be simplified by just feeding the GFQL into graphistry.compute.chain and seeing if it parses properly rather than setting up stuff here to determine if GFQL is valid or not
| if expr_type != 'intersection': | ||
| _issue( | ||
| 'Intersection expr type must be "intersection"', | ||
| {'index': entry_index, 'value': expr_type}, | ||
| validate_mode, | ||
| warn | ||
| ) | ||
| return None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this function is only called if type is intersection, so this if statement will never return true
|
@lmeyerov regarding general comment above: im thinking maybe instead of
if we feel we need to type cast we might want to just do it anyway strict true or not |
|
the issue with strict true/false is that's closer to what we had before, and users were complaining that they just wanted it to 'work', hence autofix (coerce++). validate=true/false is closer to what you're thinking, while 'autofix' (coerce) is "it'll run, but may not be what you want, but you said wanted soemthing that runs" We therefore have leeway in what autofix does --- we just need to warn (if warn=auto/true), and do what. So the q is... what should it do wrt diff collections errors, if strong opinions about any params in any direction? My default intuition is probably:
|
|
agree with that, i think we do the 3 bullets listed if strict (or whatever we want to call it) is false and its almost what we are doing right now. theres only 1 point i would make about the current implementation
in summary its only the approach i think might need to change, not the intention EDIT: if we do the above it actually gets closer to a true autofix than what it was before, so i dont have an issue with the name anymore |
Summary
graphistry/collections.pyand introduce typed models ingraphistry/models/collections.pyTesting