Commit 3ffb136

Optimize id()/start_id()/end_id() with direct column access
NOTE: This PR was created with AI tools and a human.

Add optimization to avoid rebuilding full vertex/edge objects when only the id, start_id, or end_id is needed. Instead of calling age_id() on a reconstructed _agtype_build_vertex/_agtype_build_edge, directly access the underlying graphid column and wrap it with graphid_to_agtype().

Implementation:
- Export hidden columns (_age_default_varname_id_*, _age_default_varname_start_id_*, _age_default_varname_end_id_*) when entities pass between clauses via export_entity_hidden_columns()
- Store Var references (id_var, start_id_var, end_id_var) in transform_entity
- try_optimize_id_funcs() checks for these optimizable patterns:
  - Cross-clause: Use stored Var from previous clause export
  - Current-clause: Extract graphid Var from entity's build expression
- Validate Var references to prevent stale references after WITH clauses

Optimized patterns:
  MATCH (p) RETURN id(p)               -- cross-clause in RETURN
  MATCH (p) WHERE id(p) > 0 RETURN p   -- current-clause in WHERE
  MATCH (p) MATCH (q) WHERE id(p) > 0  -- cross-clause in WHERE

Query plan improvement:
  Before: Filter: age_id(_agtype_build_vertex(p.id, _label_name(...), p.properties))
  After:  Filter: graphid_to_agtype(p.id)

All original regression tests passed. Added additional regression tests.

Files changed:
- src/include/parser/cypher_transform_entity.h: Add id_var, start_id_var, end_id_var, props_var fields to transform_entity struct
- src/backend/parser/cypher_clause.c: Add export_entity_hidden_columns() and integrate with handle_prev_clause()
- src/backend/parser/cypher_expr.c: Add try_optimize_id_funcs(), extract_id_var_from_entity_expr(), find_entity_in_current_cpstate()
- regress/sql/cypher_match.sql: Add 13 WHERE optimization tests
- regress/sql/expr.sql: Add cross-clause optimization tests
- regress/expected/*.out: Update expected output

modified: regress/expected/cypher_match.out
modified: regress/expected/cypher_with.out
modified: regress/sql/cypher_match.sql
modified: regress/sql/cypher_with.sql
modified: src/backend/parser/cypher_clause.c
modified: src/backend/parser/cypher_expr.c
modified: src/backend/parser/cypher_transform_entity.c
modified: src/include/parser/cypher_transform_entity.h
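
The current-clause case described in the commit message can be pictured with a toy model. The sketch below is not AGE's implementation: `Expr`, `graphid_arg_index`, and `try_optimize_id_call` are hypothetical names, the `Expr` struct only stands in for PostgreSQL's parse nodes, and both the edge argument order (id, start_id, end_id, label, properties) and the accessor names `age_start_id`/`age_end_id` are assumptions made for illustration. It shows the core idea only: when an id accessor wraps a build expression, replace the call with `graphid_to_agtype()` applied to the matching column.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy expression nodes; these are NOT PostgreSQL's Node/Var/FuncExpr types. */
typedef enum { EXPR_COLUMN, EXPR_FUNC } ExprKind;

typedef struct Expr {
    ExprKind kind;
    const char *name;       /* column name or function name */
    struct Expr *args[5];   /* function arguments (unused for columns) */
    int nargs;
} Expr;

/*
 * Return the position of the graphid argument that `accessor` needs inside
 * a call to `builder`, or -1 if the pair is not optimizable.
 * Assumed layouts: vertex(id, label, props), edge(id, start, end, label, props).
 */
static int graphid_arg_index(const char *accessor, const char *builder)
{
    int is_vertex = strcmp(builder, "_agtype_build_vertex") == 0;
    int is_edge = strcmp(builder, "_agtype_build_edge") == 0;

    if (strcmp(accessor, "age_id") == 0 && (is_vertex || is_edge))
        return 0;
    if (strcmp(accessor, "age_start_id") == 0 && is_edge)
        return 1;
    if (strcmp(accessor, "age_end_id") == 0 && is_edge)
        return 2;
    return -1;
}

/*
 * Rewrite accessor(builder(...)) in place to graphid_to_agtype(column).
 * Returns 1 if the call was rewritten, 0 if it was left unchanged.
 */
static int try_optimize_id_call(Expr *call)
{
    Expr *inner;
    int idx;

    assert(call != NULL);
    if (call->kind != EXPR_FUNC || call->nargs != 1)
        return 0;

    inner = call->args[0];
    if (inner == NULL || inner->kind != EXPR_FUNC)
        return 0;

    idx = graphid_arg_index(call->name, inner->name);
    if (idx < 0 || idx >= inner->nargs || inner->args[idx] == NULL)
        return 0;
    if (inner->args[idx]->kind != EXPR_COLUMN)
        return 0;   /* only a plain column reference is safe to reuse */

    call->name = "graphid_to_agtype";
    call->args[0] = inner->args[idx];
    return 1;
}
```

In this model, building `age_id(_agtype_build_vertex(p.id, label, p.properties))` and passing the outer call to `try_optimize_id_call` leaves `graphid_to_agtype(p.id)`, which mirrors the Before/After filter shown in the commit message. The cross-clause path and the stale-Var validation are deliberately left out.
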
1 parent 48fca83 commit 3ffb136

8 files changed

Lines changed: 1337 additions & 6 deletions

regress/expected/cypher_match.out

Lines changed: 238 additions & 0 deletions
@@ -3533,6 +3533,244 @@ SELECT * FROM cypher('test_enable_containment', $$ EXPLAIN (costs off) MATCH (x:
 Filter: ((agtype_access_operator(VARIADIC ARRAY[properties, '"school"'::agtype]) = '{"name": "XYZ College", "program": {"major": "Psyc", "degree": "BSc"}}'::agtype) AND (agtype_access_operator(VARIADIC ARRAY[properties, '"phone"'::agtype]) = '[123456789, 987654321, 456987123]'::agtype))
 (2 rows)

+--
+-- Test: WHERE clause id(), start_id(), end_id() optimizations in current clause
+-- These tests verify that id/start_id/end_id calls in WHERE clauses use direct
+-- column access (graphid_to_agtype) instead of rebuilding the full vertex/edge.
+--
+SELECT create_graph('test_where_opt');
+NOTICE: graph "test_where_opt" has been created
+create_graph
+--------------
+
+(1 row)
+
+-- Create test data
+SELECT * FROM cypher('test_where_opt', $$
+CREATE (:Person {name: 'Alice'})-[:KNOWS {since: 2020}]->(:Person {name: 'Bob'})
+$$) as (a agtype);
+a
+---
+(0 rows)
+
+-- Test 1: WHERE with id(vertex) in current clause - should use graphid_to_agtype
+SELECT * FROM cypher('test_where_opt', $$
+MATCH (p:Person)
+WHERE id(p) > 0
+RETURN p.name
+$$) as (name agtype);
+name
+---------
+"Alice"
+"Bob"
+(2 rows)
+
+-- Test 2: EXPLAIN to verify optimization (graphid_to_agtype instead of age_id)
+SELECT * FROM cypher('test_where_opt', $$
+EXPLAIN (VERBOSE, COSTS OFF)
+MATCH (p:Person)
+WHERE id(p) > 0
+RETURN p.name
+$$) as (plan agtype);
+QUERY PLAN
+-----------------------------------------------------------------------------------------------------------------------------------------------
+Seq Scan on test_where_opt."Person" p
+Output: agtype_access_operator(VARIADIC ARRAY[_agtype_build_vertex(p.id, _label_name('20334'::oid, p.id), p.properties), '"name"'::agtype])
+Filter: (graphid_to_agtype(p.id) > '0'::agtype)
+(3 rows)
+
+-- Test 3: WHERE with id(edge) in current clause
+SELECT * FROM cypher('test_where_opt', $$
+MATCH (p:Person)-[e:KNOWS]->(q:Person)
+WHERE id(e) > 0
+RETURN p.name, q.name
+$$) as (name1 agtype, name2 agtype);
+name1 | name2
+---------+-------
+"Alice" | "Bob"
+(1 row)
+
+-- Test 4: WHERE with start_id(edge) in current clause
+SELECT * FROM cypher('test_where_opt', $$
+MATCH (p:Person)-[e:KNOWS]->(q:Person)
+WHERE start_id(e) > 0
+RETURN p.name, q.name
+$$) as (name1 agtype, name2 agtype);
+name1 | name2
+---------+-------
+"Alice" | "Bob"
+(1 row)
+
+-- Test 5: WHERE with end_id(edge) in current clause
+SELECT * FROM cypher('test_where_opt', $$
+MATCH (p:Person)-[e:KNOWS]->(q:Person)
+WHERE end_id(e) > 0
+RETURN p.name, q.name
+$$) as (name1 agtype, name2 agtype);
+name1 | name2
+---------+-------
+"Alice" | "Bob"
+(1 row)
+
+-- Test 6: EXPLAIN to verify edge optimization (all three: id, start_id, end_id)
+SELECT * FROM cypher('test_where_opt', $$
+EXPLAIN (VERBOSE, COSTS OFF)
+MATCH (p:Person)-[e:KNOWS]->(q:Person)
+WHERE id(e) > 0 AND start_id(e) > 0 AND end_id(e) > 0
+RETURN p.name
+$$) as (plan agtype);
+QUERY PLAN
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+Hash Join
+Output: agtype_access_operator(VARIADIC ARRAY[_agtype_build_vertex(p.id, _label_name('20334'::oid, p.id), p.properties), '"name"'::agtype])
+Hash Cond: (q.id = e.end_id)
+-> Seq Scan on test_where_opt."Person" q
+Output: q.id, q.properties
+-> Hash
+Output: p.id, p.properties, e.end_id
+-> Hash Join
+Output: p.id, p.properties, e.end_id
+Hash Cond: (p.id = e.start_id)
+-> Seq Scan on test_where_opt."Person" p
+Output: p.id, p.properties
+-> Hash
+Output: e.start_id, e.end_id
+-> Seq Scan on test_where_opt."KNOWS" e
+Output: e.start_id, e.end_id
+Filter: ((graphid_to_agtype(e.id) > '0'::agtype) AND (graphid_to_agtype(e.start_id) > '0'::agtype) AND (graphid_to_agtype(e.end_id) > '0'::agtype))
+(17 rows)
+
+-- Test 7: Combined WHERE with multiple id() calls on different entities
+SELECT * FROM cypher('test_where_opt', $$
+MATCH (p:Person)-[e:KNOWS]->(q:Person)
+WHERE id(p) > 0 AND id(q) > 0 AND id(e) > 0
+RETURN p.name, q.name
+$$) as (name1 agtype, name2 agtype);
+name1 | name2
+---------+-------
+"Alice" | "Bob"
+(1 row)
+
+-- Test 8: WHERE with id() comparison between entities
+SELECT * FROM cypher('test_where_opt', $$
+MATCH (p:Person)-[e:KNOWS]->(q:Person)
+WHERE start_id(e) = id(p) AND end_id(e) = id(q)
+RETURN p.name, q.name
+$$) as (name1 agtype, name2 agtype);
+name1 | name2
+---------+-------
+"Alice" | "Bob"
+(1 row)
+
+-- Test 9: WHERE with id() in complex expression
+SELECT * FROM cypher('test_where_opt', $$
+MATCH (p:Person)
+WHERE id(p) > 0 AND id(p) < 9223372036854775807
+RETURN p.name
+$$) as (name agtype);
+name
+---------
+"Alice"
+"Bob"
+(2 rows)
+
+-- Test 10: Cross-clause WHERE still works (entity from previous MATCH)
+SELECT * FROM cypher('test_where_opt', $$
+MATCH (p:Person)
+MATCH (q:Person)
+WHERE id(p) > 0
+RETURN p.name, q.name
+$$) as (name1 agtype, name2 agtype);
+name1 | name2
+---------+---------
+"Alice" | "Alice"
+"Bob" | "Alice"
+"Alice" | "Bob"
+"Bob" | "Bob"
+(4 rows)
+
+-- Test 11: EXPLAIN cross-clause to verify optimization
+SELECT * FROM cypher('test_where_opt', $$
+EXPLAIN (VERBOSE, COSTS OFF)
+MATCH (p:Person)
+MATCH (q:Person)
+WHERE id(p) > 0
+RETURN p.name
+$$) as (plan agtype);
+QUERY PLAN
+-----------------------------------------------------------------------------------------------------------------------------------------------
+Nested Loop
+Output: agtype_access_operator(VARIADIC ARRAY[_agtype_build_vertex(p.id, _label_name('20334'::oid, p.id), p.properties), '"name"'::agtype])
+-> Seq Scan on test_where_opt."Person" q
+Output: q.id, q.properties
+-> Materialize
+Output: p.id, p.properties
+-> Seq Scan on test_where_opt."Person" p
+Output: p.id, p.properties
+Filter: (graphid_to_agtype(p.id) > '0'::agtype)
+(9 rows)
+
+-- Test 12: Combined cross-clause and current-clause WHERE optimization
+-- p is from previous clause (cross-clause), q and e are from current clause (intra-clause)
+SELECT * FROM cypher('test_where_opt', $$
+MATCH (p:Person)
+MATCH (q:Person)-[e:KNOWS]->(r:Person)
+WHERE id(p) > 0 AND id(q) > 0 AND id(e) > 0 AND start_id(e) > 0
+RETURN p.name, q.name, r.name
+$$) as (name1 agtype, name2 agtype, name3 agtype);
+name1 | name2 | name3
+---------+---------+-------
+"Alice" | "Alice" | "Bob"
+"Bob" | "Alice" | "Bob"
+(2 rows)
+
+-- Test 13: EXPLAIN combined cross-clause and current-clause WHERE
+SELECT * FROM cypher('test_where_opt', $$
+EXPLAIN (VERBOSE, COSTS OFF)
+MATCH (p:Person)
+MATCH (q:Person)-[e:KNOWS]->(r:Person)
+WHERE id(p) > 0 AND id(q) > 0 AND id(e) > 0 AND start_id(e) > 0
+RETURN p.name
+$$) as (plan agtype);
+QUERY PLAN
+-----------------------------------------------------------------------------------------------------------------------------------------------
+Nested Loop
+Output: agtype_access_operator(VARIADIC ARRAY[_agtype_build_vertex(p.id, _label_name('20334'::oid, p.id), p.properties), '"name"'::agtype])
+-> Seq Scan on test_where_opt."Person" p
+Output: p.id, p.properties
+Filter: (graphid_to_agtype(p.id) > '0'::agtype)
+-> Materialize
+-> Nested Loop
+Inner Unique: true
+-> Hash Join
+Output: e.end_id
+Inner Unique: true
+Hash Cond: (e.start_id = q.id)
+-> Seq Scan on test_where_opt."KNOWS" e
+Output: e.id, e.start_id, e.end_id, e.properties
+Filter: ((graphid_to_agtype(e.id) > '0'::agtype) AND (graphid_to_agtype(e.start_id) > '0'::agtype))
+-> Hash
+Output: q.id
+-> Seq Scan on test_where_opt."Person" q
+Output: q.id
+Filter: (graphid_to_agtype(q.id) > '0'::agtype)
+-> Index Only Scan using "Person_pkey" on test_where_opt."Person" r
+Output: r.id
+Index Cond: (r.id = e.end_id)
+(23 rows)
+
+SELECT drop_graph('test_where_opt', true);
+NOTICE: drop cascades to 4 other objects
+DETAIL: drop cascades to table test_where_opt._ag_label_vertex
+drop cascades to table test_where_opt._ag_label_edge
+drop cascades to table test_where_opt."Person"
+drop cascades to table test_where_opt."KNOWS"
+NOTICE: graph "test_where_opt" has been dropped
+drop_graph
+------------
+
+(1 row)
+
 --
 -- Clean up
 --

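The EXPLAIN tests in this hunk (Tests 2, 6, 11, 13) confirm the optimization by the shape of the `Filter:` line: it should call `graphid_to_agtype()` on a column and should no longer call `age_id()` on a rebuilt entity. As a rough illustration of that check, here is a minimal sketch in C; `filter_is_optimized` is a hypothetical helper and not part of AGE or its regression harness, which compares the full expected output instead.

```c
#include <string.h>

/*
 * Hypothetical helper: decide whether one "Filter:" line of EXPLAIN output
 * shows the optimized form. Returns 1 when the line uses graphid_to_agtype()
 * and contains neither age_id( nor an _agtype_build_* call, else 0.
 * Intended for Filter lines only; Output lines may still build entities.
 */
static int filter_is_optimized(const char *filter_line)
{
    if (filter_line == NULL)
        return 0;
    if (strstr(filter_line, "graphid_to_agtype(") == NULL)
        return 0;   /* no direct column access at all */
    if (strstr(filter_line, "age_id(") != NULL)
        return 0;   /* still goes through the id accessor */
    if (strstr(filter_line, "_agtype_build_") != NULL)
        return 0;   /* still rebuilds a vertex or edge */
    return 1;
}
```

Applied to the Before and After filters quoted in the commit message, the helper returns 0 and 1 respectively. A substring check like this is only a quick sanity test; it does not parse the plan.
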