
Taxonomic Inference: 4. Quality Control Queries

Katja Schulz edited this page Mar 2, 2023 · 2 revisions

This page gives useful queries for quality control on resources that contain branch painting directives ('start points' and 'stop points'). You can run these queries by pasting them into the EOL Cypher query form, or by sending them through the EOL web services API.

A start point id that does not resolve to a DH Page represents a missed opportunity, since some data from the resource will be lost. A stop point id that does not resolve, but whose start point does resolve, signals a high likelihood of an incorrect inference: the stop point taxon could easily have been lumped in with another taxon, elided from the hierarchy, or given a new identifier. In all of these cases, painting will continue into the descendants of the former stop point, and the trait will be incorrectly inferred for them.
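
To see why an unresolved stop point is worse than an unresolved start point, here is a minimal Python sketch of branch painting over a toy hierarchy. It assumes a page is painted when it is the start point or one of its descendants, unless a stop point lies on its lineage below the start point. The page ids, the `PARENT` map and the `lineage` and `paint` functions are hypothetical illustrations, not EOL code.

```python
# Toy hierarchy (hypothetical page ids): child page id -> parent page id.
PARENT = {2: 1, 3: 1, 4: 2, 5: 4, 6: 3}


def lineage(page_id, parent):
    """Return [page_id, parent, grandparent, ...] up to the root."""
    chain = [page_id]
    while chain[-1] in parent:
        chain.append(parent[chain[-1]])
    return chain


def paint(start_id, stop_ids, parent):
    """Return the set of pages that inherit a trait painted from start_id.

    Assumption for this sketch: a page is painted if start_id is in its
    lineage, unless a stop point appears in the part of the lineage that
    lies below start_id.
    """
    pages = set(parent) | set(parent.values())
    painted = set()
    for page in pages:
        chain = lineage(page, parent)
        if start_id not in chain:
            continue  # not under the start point (or start point unresolved)
        below_start = chain[: chain.index(start_id)]
        if not any(p in stop_ids for p in below_start):
            painted.add(page)
    return painted


# Stop point 2 resolves: pages 2, 4 and 5 are correctly left unpainted.
print(sorted(paint(1, {2}, PARENT)))     # [1, 3, 6]
# Stop point id 99 does not resolve: painting leaks into 2, 4 and 5.
print(sorted(paint(1, {99}, PARENT)))    # [1, 2, 3, 4, 5, 6]
# Start point id 42 does not resolve: nothing is painted, so the data is lost.
print(sorted(paint(42, set(), PARENT)))  # []
```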

Find invalid start and stop point ids

Find all branch painting 'start points' that either aren't in the DH or don't have Page nodes at all. The query treats a point as being in the DH only if its Page has a parent.

Change the resource id (635, below) to the id of the resource that you are checking.

Change starts_at to stops_at to check for stop points not in the DH, which is the more important case.

MATCH (r:Resource {resource_id: 635})<-[:supplier]-
      (:Trait)-[:metadata]->
      (m:MetaData)-[:predicate]->
      (:Term {uri: 'https://eol.org/schema/terms/starts_at'})
// The point id is stored as the measurement of the metadata node
WITH DISTINCT toInteger(m.measurement) AS point_id
// Both matches are optional, so ids with no Page (or no parent) are kept as NULLs
OPTIONAL MATCH (point:Page {page_id: point_id})
OPTIONAL MATCH (point)-[:parent]->(parent:Page)
WITH point_id, point, parent
// No parent link means the point is missing or not in the DH
WHERE parent IS NULL
RETURN point_id, point.canonical
ORDER BY point.page_id, point_id
LIMIT 1000
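
The logic of this query can be sketched offline in Python. This is a minimal illustration, assuming the Page nodes and their parent links have already been loaded into two dictionaries (the hypothetical `PAGES` and `PARENT`); the `invalid_points` function is likewise hypothetical and not part of the EOL tooling.

```python
# Hypothetical toy data: Page id -> canonical name, and child id -> parent id.
PAGES = {1: "Root", 2: "Taxon A", 3: "Taxon B", 7: "Orphan taxon"}
PARENT = {2: 1, 3: 1}  # page 7 exists but has no parent, so it is not in the DH


def invalid_points(point_ids, pages, parent):
    """Return (point_id, canonical) for ids with no Page or no parent link.

    Mirrors the query: look up the Page and its parent optionally, and keep
    the id when no parent is found. canonical is None when there is no Page.
    """
    rows = []
    for point_id in sorted(set(point_ids)):  # de-duplicate, like WITH DISTINCT
        if point_id not in parent:           # like WHERE parent IS NULL
            rows.append((point_id, pages.get(point_id)))
    return rows


print(invalid_points([2, 3, 7, 99, 2], PAGES, PARENT))
# [(7, 'Orphan taxon'), (99, None)]
```

Because the test is simply "has a parent", the root of the hierarchy (assuming it has no parent link) would also be reported if a resource used it as a start or stop point; the Cypher query behaves the same way.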

Find critical stop point ids (invalid ones whose start point ids are valid)

The following query finds stop points that are missing or not in the DH, when the start point given in the same trait is in the DH:

MATCH (r:Resource {resource_id: 635})<-[:supplier]-
      (t:Trait)-[:metadata]->
      (m:MetaData)-[:predicate]->
      (:Term {uri: 'https://eol.org/schema/terms/starts_at'}),
      (t)-[:metadata]->
      (m2:MetaData)-[:predicate]->
      (:Term {uri: 'https://eol.org/schema/terms/stops_at'})
WITH DISTINCT toInteger(m.measurement) AS start_id,
     toInteger(m2.measurement) AS stop_id
// The start point must resolve to a Page that is in the DH (has a parent)
MATCH (start:Page {page_id: start_id})-[:parent]->(:Page)
// The stop point may have no Page at all, or a Page with no parent
OPTIONAL MATCH (stop:Page {page_id: stop_id})
OPTIONAL MATCH (stop)-[:parent]->(stop_parent:Page)
WITH DISTINCT start_id, stop_id, stop, stop_parent
WHERE stop_parent IS NULL
RETURN start_id, stop_id, stop.canonical
ORDER BY stop_id, start_id
LIMIT 1000
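
The same check can be modelled in Python as a minimal sketch. It assumes each trait contributes one (start id, stop id) pair, and that a page counts as being in the DH if it has a parent link. `PAGES`, `PARENT`, `in_dh` and `critical_stops` are hypothetical names used only for illustration.

```python
# Hypothetical toy data: Page id -> canonical name, and child id -> parent id.
PAGES = {1: "Root", 2: "Taxon A", 3: "Taxon B", 4: "Taxon C", 7: "Orphan taxon"}
PARENT = {2: 1, 3: 1, 4: 2}


def in_dh(page_id, parent):
    """A page counts as being in the DH here if it has a parent link."""
    return page_id in parent


def critical_stops(pairs, pages, parent):
    """Return (start_id, stop_id, stop canonical) for each (start, stop) pair
    from the same trait where the start is in the DH but the stop is not."""
    rows = set()
    for start_id, stop_id in pairs:
        if in_dh(start_id, parent) and not in_dh(stop_id, parent):
            rows.add((start_id, stop_id, pages.get(stop_id)))
    # Sort on the integer ids only, since the canonical name may be None.
    return sorted(rows, key=lambda row: (row[1], row[0]))


pairs = [(2, 4), (2, 7), (3, 99), (99, 98)]
print(critical_stops(pairs, PAGES, PARENT))
# [(2, 7, 'Orphan taxon'), (3, 99, None)]
```

The pair (99, 98) is not reported because its start point is itself unresolved; that case is a missed opportunity rather than a likely incorrect inference.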

Find stop points that are not under their start points

This query finds stop points that resolve to Pages but do not descend from the start point given in the same trait. Such a stop point cannot block any painting from that start point. Traits whose start or stop point has no Page are skipped; the queries above cover those cases.

MATCH (r:Resource {resource_id: 635})<-[:supplier]-
      (t:Trait)-[:metadata]->
      (m2:MetaData)-[:predicate]->
      (:Term {uri: 'https://eol.org/schema/terms/stops_at'}),
      (t)-[:metadata]->
      (m1:MetaData)-[:predicate]->
      (:Term {uri: 'https://eol.org/schema/terms/starts_at'})
WITH toInteger(m2.measurement) AS stop_id,
     toInteger(m1.measurement) AS start_id,
     t
MATCH (stop:Page {page_id: stop_id})
MATCH (start:Page {page_id: start_id})
// Keep the row only if no chain of parent links leads from stop up to start
WHERE NOT (stop)-[:parent*1..]->(start)
RETURN stop_id, stop.canonical, start_id, t.eol_pk
ORDER BY stop.page_id, stop_id
LIMIT 1000
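
As a minimal Python sketch of this check, the ancestor test can be written as a walk up the parent links. It assumes traits are available as hypothetical (trait key, start id, stop id) tuples, where the trait key stands in for the trait's `eol_pk`. `PAGES`, `PARENT`, `is_descendant` and `stops_not_under_start` are illustrative names, not EOL code.

```python
# Hypothetical toy data: Page id -> canonical name, and child id -> parent id.
PAGES = {1: "Root", 2: "Taxon A", 3: "Taxon B", 4: "Taxon C"}
PARENT = {2: 1, 3: 1, 4: 2}


def is_descendant(page_id, ancestor_id, parent):
    """True if ancestor_id is reached from page_id by one or more parent links
    (the equivalent of the Cypher pattern -[:parent*1..]->)."""
    seen = set()
    while page_id in parent and page_id not in seen:
        seen.add(page_id)  # guard against accidental cycles in the data
        page_id = parent[page_id]
        if page_id == ancestor_id:
            return True
    return False


def stops_not_under_start(traits, pages, parent):
    """Return (stop_id, canonical, start_id, trait_pk) for stops outside their start's branch."""
    rows = []
    for trait_pk, start_id, stop_id in traits:
        if start_id not in pages or stop_id not in pages:
            continue  # both Page lookups must succeed, as with the two MATCH clauses
        if not is_descendant(stop_id, start_id, parent):
            rows.append((stop_id, pages[stop_id], start_id, trait_pk))
    return sorted(rows)


traits = [("t1", 1, 4), ("t2", 3, 4), ("t3", 2, 2), ("t4", 2, 99)]
print(stops_not_under_start(traits, PAGES, PARENT))
# [(2, 'Taxon A', 2, 't3'), (4, 'Taxon C', 3, 't2')]
```

A stop point that is identical to its own start point (trait `t3`) is also reported, because a path of one or more parent links can never lead from a page back to itself.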
