High-Performance Apache AGE Integration for Gap Analysis (#488) #801
Open
Khushi281300 wants to merge 4 commits into OWASP:main
Overview
This PR introduces Apache AGE as a high-performance alternative to Neo4j for OpenCRE's gap analysis. It directly addresses the performance bottlenecks identified in Issue #488, where complex graph traversals were taking excessive time on standard hardware.
Following a deep dive into the previous implementation attempts, I have resolved critical stability and connectivity issues that previously led to "timeout" or "hanging" reports, making Apache AGE a production-ready alternative backend.
Key Improvements & Bug Fixes
Synchronous Connection Lifecycle (Stability)
Problem: Previous attempts suffered from a race condition where the application would attempt to populate the database before the background connection thread was established, leading to "No AGE connection" errors.
Fix: Refactored AGEDB.populate_DB to use instance_blocking(timeout=30), ensuring the database is fully ready before execution.
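The blocking wait described above can be illustrated with a minimal sketch. Only the name instance_blocking and its timeout=30 argument come from this PR; the ConnectionHolder class, set_connection, and populate are hypothetical stand-ins, and a plain list plays the role of the database connection.

```python
import threading


class ConnectionHolder:
    """Hypothetical stand-in for the object that owns the AGE connection."""

    def __init__(self):
        self._ready = threading.Event()
        self._conn = None

    def set_connection(self, conn):
        # Called by the background thread once the connection is established.
        self._conn = conn
        self._ready.set()

    def instance_blocking(self, timeout=30):
        # Block until the connection exists; fail loudly instead of hanging.
        if not self._ready.wait(timeout):
            raise TimeoutError(f"No AGE connection after {timeout}s")
        return self._conn


def populate(holder, rows, timeout=30):
    # Wait for the connection first, then write (a list stands in for the DB).
    conn = holder.instance_blocking(timeout=timeout)
    conn.extend(rows)
    return len(conn)
```

With this shape, a caller that starts before the background thread has connected simply waits, and a connection that never arrives surfaces as a TimeoutError rather than a silent hang.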
Multi-Backend Safe Property Access (Correctness)
Problem: Graph result mapping in gap_analysis used Neo4j-specific attribute access (.id), causing AttributeError when switching to AGE.
Fix: Implemented a backend-agnostic property getter, getattr(node, "id", None) or node.get("id"), to ensure seamless switching between databases.
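As an illustration of the backend-agnostic getter, here is a small sketch. The helper name node_id is hypothetical and not taken from the PR. It assumes nodes arrive either as objects with an id attribute or as mappings with an "id" key, and it uses a sentinel rather than `or` so that a falsy id such as 0 is still returned.

```python
from collections.abc import Mapping

_MISSING = object()  # sentinel: distinguishes "no attribute" from falsy values


def node_id(node, default=None):
    """Return a node's id whether it is attribute-style or dict-style."""
    value = getattr(node, "id", _MISSING)
    if value is not _MISSING:
        return value  # attribute-style node (e.g. an object exposing .id)
    if isinstance(node, Mapping):
        return node.get("id", default)  # dict-style node
    return default
```

Checking for a mapping before calling .get also avoids a second AttributeError on objects that have neither an id attribute nor a get method.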
AGE Execution Environment Fix (Query logic)
Problem: Verification and benchmarking scripts were failing because the age extension wasn't being explicitly loaded in the database session.
Fix: Updated check_age_stats.py and internal query handlers to auto-load the AGE extension and set the correct ag_catalog search path.
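The session setup described above can be sketched as follows. The two SQL statements are the per-session setup that Apache AGE documents; the helper name prepare_age_session is hypothetical, and the cursor is assumed to be any DB-API style cursor with an execute method. How the PR's scripts actually issue these statements is not shown here.

```python
# Per-session setup statements for Apache AGE.
AGE_SESSION_SETUP = (
    "LOAD 'age';",
    'SET search_path = ag_catalog, "$user", public;',
)


def prepare_age_session(cursor):
    """Run the AGE setup statements on a fresh DB-API cursor."""
    for statement in AGE_SESSION_SETUP:
        cursor.execute(statement)
    return cursor
```

With psycopg2, for example, this would be called as prepare_age_session(conn.cursor()) right after connecting and before any cypher query is sent.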
Code Quality & CI
Fix: Reformatted the entire modified codebase with black to ensure compliance with project styling and CI (Super-Linter) requirements.
Performance & Verification Evidence
Database Population
The migration flow is now robust and loads over 2,000 nodes and 5,000 links into Apache AGE in seconds. (Screenshot: Population Success)
Data Parity & Metrics
Verification scripts show identical path counts and node structures between backends, ensuring no data loss during the transition. (Screenshot: Graph Metrics)
End-to-End API Resilience
Automated endpoint testing confirms the /rest/v1/map_analysis API is fully operational with the AGE backend. (Screenshot: Resilience Test)
How to Use
1. Ensure your Docker stack is running: docker start cre-age cre-redis-stack.
2. Update your .env file: GRAPH_DB_TYPE=age.
3. Populate the graph: python cre.py --populate_graph_db.
4. Verify stats: python check_age_stats.py.
Conclusion
By resolving the underlying connection and mapping bugs, Apache AGE now provides a stable and significantly more responsive gap analysis experience for OpenCRE users.