Taxonomic Inference: 4. Quality Control Queries
This page gives useful queries for quality control of resources that contain branch-painting directives ('start points' and 'stop points'). You can run these queries by pasting them into the EOL Cypher query form or by using the EOL web services API.
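For the web services route, the Python sketch below sends a query and turns the result into one dict per row. It rests on several assumptions that should be checked against the EOL API documentation before use: the service is at `https://eol.org/service/cypher`, it takes the query in a `query` URL parameter, it expects an `Authorization: JWT <token>` header, and it returns JSON of the form `{"columns": [...], "data": [[...], ...]}`. The helper names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Assumed endpoint; verify against the current EOL web services documentation.
CYPHER_URL = "https://eol.org/service/cypher"


def build_request(query: str, token: str) -> urllib.request.Request:
    """Build a GET request that sends `query` to the (assumed) EOL Cypher service."""
    url = CYPHER_URL + "?" + urllib.parse.urlencode({"query": query})
    # Assumed auth scheme: "JWT <token>" in the Authorization header.
    return urllib.request.Request(url, headers={"Authorization": "JWT " + token})


def rows_as_dicts(payload: dict) -> list:
    """Turn an assumed {"columns": [...], "data": [[...], ...]} payload into dicts."""
    columns = payload["columns"]
    return [dict(zip(columns, row)) for row in payload["data"]]


def run_cypher(query: str, token: str, timeout: float = 60.0) -> list:
    """Send the query and return one dict per result row (performs a network call)."""
    with urllib.request.urlopen(build_request(query, token), timeout=timeout) as resp:
        return rows_as_dicts(json.load(resp))
```

Keeping request construction and response parsing apart from the network call makes both easy to check offline.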
A start point id that does not resolve to a DH Page represents a missed opportunity, since it means some data from the resource will be lost. A stop point id that does not resolve, but whose start point does resolve, indicates a high likelihood of an incorrect inference, since the stop point taxon could easily have been lumped in with another taxon, elided from the hierarchy, or given a new identifier. In any of these cases, painting will incorrectly extend into the descendants of the former stop point.
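The rule above can be written as a small decision function. This is a hypothetical triage helper, not part of EOL; `stop_resolves` is `None` when a directive has no stop point.

```python
from typing import Optional


def assess_directive(start_resolves: bool, stop_resolves: Optional[bool]) -> str:
    """Classify a branch-painting directive by whether its ids resolve to DH Pages."""
    if not start_resolves:
        # Nothing can be painted from this start point, so the resource's data is lost.
        return "missed opportunity"
    if stop_resolves is False:
        # The stop no longer blocks painting, which spills into the former stop's descendants.
        return "likely incorrect inference"
    return "ok"
```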
Find all branch-painting 'start points' that either aren't in the DH or don't have Page nodes at all. A Page with no parent is treated as not being in the DH.
Change the resource id (635, below) to the id of the resource that you are checking.
Change starts_at to stops_at to check for stop points not in the DH, which is the more important case.
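Instead of editing the query by hand, it can be generated. The sketch below fills a resource id and directive kind into the query that follows; the function and template names are hypothetical, and the predicate URIs are the ones used on this page.

```python
# Predicate URIs copied from the queries on this page.
PREDICATES = {
    "starts_at": "https://eol.org/schema/terms/starts_at",
    "stops_at": "https://eol.org/schema/terms/stops_at",
}

# Doubled braces are literal Cypher braces; single braces are format fields.
TEMPLATE = """MATCH (r:Resource {{resource_id: {resource_id}}})<-[:supplier]-
(:Trait)-[:metadata]->
(m:MetaData)-[:predicate]->
(:Term {{uri: '{predicate_uri}'}})
WITH DISTINCT toInteger(m.measurement) AS point_id
OPTIONAL MATCH (point:Page {{page_id: point_id}})
OPTIONAL MATCH (point)-[:parent]->(parent:Page)
WITH point_id, point, parent
WHERE parent IS NULL
RETURN point_id, point.canonical
ORDER BY point.page_id, point_id
LIMIT {limit}"""


def unresolved_points_query(resource_id: int, kind: str = "starts_at", limit: int = 1000) -> str:
    """Fill in the unresolved-points query for one resource and directive kind."""
    if kind not in PREDICATES:
        raise ValueError(f"kind must be one of {sorted(PREDICATES)}")
    # int() keeps arbitrary text from being spliced into the query.
    return TEMPLATE.format(
        resource_id=int(resource_id), predicate_uri=PREDICATES[kind], limit=int(limit)
    )
```

For example, `unresolved_points_query(635, "stops_at")` produces the stop-point variant for resource 635.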
MATCH (r:Resource {resource_id: 635})<-[:supplier]-
(:Trait)-[:metadata]->
(m:MetaData)-[:predicate]->
(:Term {uri: 'https://eol.org/schema/terms/starts_at'})
WITH DISTINCT toInteger(m.measurement) AS point_id
OPTIONAL MATCH (point:Page {page_id: point_id})
OPTIONAL MATCH (point)-[:parent]->(parent:Page)
WITH point_id, point, parent
WHERE parent IS NULL
RETURN point_id, point.canonical
ORDER BY point.page_id, point_id
LIMIT 1000
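To make the query's logic concrete, here is a hypothetical in-memory model of it on a toy graph. A Page counts as in the DH when it has a parent, a point with no Page comes back with a null (`None`) canonical name, and results are ordered by Page id with missing Pages last, since Cypher sorts nulls last in ascending order.

```python
def unresolved_points(point_ids, pages, parents):
    """Return (point_id, canonical) for points lacking a Page or a DH parent.

    point_ids: page ids taken from the resource's directives (may repeat).
    pages: dict of page_id -> canonical name for Page nodes that exist.
    parents: dict of page_id -> parent page_id for Pages placed in the DH.
    """
    # WITH DISTINCT, then ORDER BY point.page_id (missing Pages last), point_id
    ordered = sorted(set(point_ids), key=lambda pid: (pid not in pages, pid))
    return [
        (pid, pages.get(pid))  # canonical is None when no Page exists
        for pid in ordered
        if pid not in pages or pid not in parents  # WHERE parent IS NULL
    ]
```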
The following query is intended to find stop points that are missing or not in the DH, when they descend from start points that are in the DH:
MATCH (r:Resource {resource_id: 635})<-[:supplier]-
(t:Trait)-[:metadata]->
(m:MetaData)-[:predicate]->
(:Term {uri: 'https://eol.org/schema/terms/starts_at'}),
(t)-[:metadata]->
(m2:MetaData)-[:predicate]->
(:Term {uri: 'https://eol.org/schema/terms/stops_at'})
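WITH toInteger(m.measurement) AS start_id,
toInteger(m2.measurement) AS stop_id
// Require the start point to be in the DH (it has a parent). Do not require the
// stop point to descend from it: a stop point without a parent never can.
MATCH (start:Page {page_id: start_id})-[:parent]->(:Page)
OPTIONAL MATCH (stop:Page {page_id: stop_id})
OPTIONAL MATCH (stop)-[:parent]->(stop_parent:Page)
WITH DISTINCT stop_id, stop, stop_parent
WHERE stop_parent IS NULL
RETURN stop_id, stop.canonical
ORDER BY stop.page_id, stop_id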
LIMIT 1000
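The check described above, stop points that are missing or outside the DH while their start points are in it, can likewise be modeled on a toy hierarchy. In this hypothetical sketch each directive is a `(start_id, stop_id)` pair from one trait, and a point counts as in the DH when it has a Page with a parent.

```python
def unresolved_stops(directives, pages, parents):
    """Return (stop_id, canonical) for stop points that are missing or not in the DH
    while the start point from the same trait is in the DH.

    directives: iterable of (start_id, stop_id) pairs, one per trait.
    pages: dict of page_id -> canonical name for Page nodes that exist.
    parents: dict of page_id -> parent page_id for Pages placed in the DH.
    """

    def in_dh(pid):
        return pid in pages and pid in parents

    flagged = {stop for start, stop in directives if in_dh(start) and not in_dh(stop)}
    # Order by Page id, with stop ids that have no Page at the end.
    return [(stop, pages.get(stop)) for stop in sorted(flagged, key=lambda s: (s not in pages, s))]
```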
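The following query finds stop points that resolve to Pages, and whose start point in the same trait also resolves, but that do not descend from that start point. It returns each trait's eol_pk so the offending record can be located (this example uses resource 640):
MATCH (r:Resource {resource_id: 640})<-[:supplier]-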
(t:Trait)-[:metadata]->
(m2:MetaData)-[:predicate]->
(:Term {uri: 'https://eol.org/schema/terms/stops_at'})
OPTIONAL MATCH
(t)-[:metadata]->
(m1:MetaData)-[:predicate]->
(:Term {uri: 'https://eol.org/schema/terms/starts_at'})
WITH toInteger(m2.measurement) AS stop_id,
toInteger(m1.measurement) AS start_id,
t
MATCH (stop:Page {page_id: stop_id})
MATCH (start:Page {page_id: start_id})
OPTIONAL MATCH (stop)-[z:parent*1..]->(start)
WITH stop_id, stop, t, z
WHERE z IS NULL
RETURN stop_id, stop.canonical, t.eol_pk
ORDER BY stop.page_id, stop_id
LIMIT 1000
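A toy model of this last check walks up the parent links from each stop point and flags it when its start point is never reached. The function names and the `(trait_id, start_id, stop_id)` input shape are hypothetical. As with `[:parent*1..]`, a stop point equal to its own start point is flagged, and directives whose start or stop has no Page are skipped.

```python
def is_descendant(node, ancestor, parents):
    """True if `ancestor` is reached by following parent links one or more times."""
    seen = set()
    current = parents.get(node)
    while current is not None and current not in seen:  # guard against cycles
        if current == ancestor:
            return True
        seen.add(current)
        current = parents.get(current)
    return False


def misplaced_stops(directives, pages, parents):
    """Return (stop_id, canonical, trait_id) for stops with Pages whose start has a Page
    but which do not descend from that start point.

    directives: (trait_id, start_id, stop_id) triples; start_id may be None.
    pages: dict of page_id -> canonical name; parents: dict of page_id -> parent id.
    """
    rows = []
    for trait_id, start, stop in directives:
        # Both MATCHes must succeed, then the OPTIONAL MATCH on the path must fail.
        if stop in pages and start in pages and not is_descendant(stop, start, parents):
            rows.append((stop, pages[stop], trait_id))
    return sorted(rows, key=lambda row: row[0])  # ORDER BY stop.page_id
```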