High-Performance Apache AGE Integration for Gap Analysis (#488) #801

Open
Khushi281300 wants to merge 4 commits into OWASP:main from Khushi281300:feature/age-integration-fix

Overview

This PR introduces Apache AGE as a high-performance alternative to Neo4j for OpenCRE's gap analysis. It directly addresses the performance bottlenecks identified in Issue #488, where complex graph traversals were taking excessive time on standard hardware.

Following a deep dive into the previous implementation attempts, I have resolved critical stability and connectivity issues that previously led to "timeout" or "hanging" reports, making Apache AGE a production-ready alternative backend.

Key Improvements & Bug Fixes

  1. Synchronous Connection Lifecycle (Stability)
    Problem: Previous attempts suffered from a race condition where the application would attempt to populate the database before the background connection thread was established, leading to "No AGE connection" errors.
    Fix: Refactored AGEDB.populate_DB to use instance_blocking(timeout=30), ensuring the database is fully ready before execution.
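The blocking pattern described above can be sketched as follows. This is a minimal illustration, not the code in this PR: `BackgroundConnection`, `wait_until_ready`, and `populate` are hypothetical names, and the real `instance_blocking` may be implemented differently. The sketch only shows how waiting on a readiness signal with a timeout removes the race.

```python
import threading
import time


class BackgroundConnection:
    """Hypothetical stand-in for a connection opened on a background thread."""

    def __init__(self, connect_delay: float = 0.05):
        self._ready = threading.Event()
        self._conn = None
        self._delay = connect_delay
        threading.Thread(target=self._connect, daemon=True).start()

    def _connect(self) -> None:
        time.sleep(self._delay)  # simulates the handshake with the database
        self._conn = object()    # placeholder for a real connection handle
        self._ready.set()        # signal that the connection can be used

    def wait_until_ready(self, timeout: float = 30.0):
        """Block until connected; raise rather than continue without a connection."""
        if not self._ready.wait(timeout):
            raise TimeoutError(f"no connection after {timeout} seconds")
        return self._conn


def populate(db: BackgroundConnection, timeout: float = 30.0) -> str:
    conn = db.wait_until_ready(timeout)  # blocks here, so population cannot start early
    # ... write nodes and links using conn ...
    return "populated"
```

Raising on timeout, instead of returning `None`, turns a silent hang into an explicit failure that the caller can log or retry.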

  2. Multi-Backend Safe Property Access (Correctness)
    Problem: Graph result mapping in gap_analysis used Neo4j-specific attribute access (.id), causing AttributeError when switching to AGE.
    Fix: Implemented a backend-agnostic property getter, getattr(node, "id", None) or node.get("id"), to ensure seamless switching between databases.
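As a rough sketch of the same idea, the helper below reads a property from either an attribute-style node or a mapping-style node. The name `node_property` is hypothetical and does not appear in this PR. One difference from the inline expression is deliberate: it tests `is not None` rather than using `or`, because `or` would also skip falsy values such as `0` or an empty string.

```python
from collections.abc import Mapping
from typing import Any, Optional


def node_property(node: Any, name: str, default: Optional[Any] = None) -> Any:
    """Read `name` from an attribute-style or mapping-style node (hypothetical helper)."""
    # Attribute-style access first (objects exposing node.id).
    value = getattr(node, name, None)
    if value is not None:
        return value
    # Mapping-style access second (dict-like rows).
    if isinstance(node, Mapping):
        return node.get(name, default)
    return default
```

A call such as `node_property(node, "id")` then works regardless of which backend produced `node`, and returns `None` when the property is absent instead of raising.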

  3. AGE Execution Environment Fix (Query logic)
    Problem: Verification and benchmarking scripts were failing because the age extension wasn't being explicitly loaded in the database session.
    Fix: Updated check_age_stats.py and internal query handlers to auto-load the AGE extension and set the correct ag_catalog search path.
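For reference, the per-session setup that Apache AGE expects looks roughly like the sketch below. The two SQL statements follow the AGE documentation; the function name `prepare_age_session` and the constant `AGE_SESSION_SETUP` are hypothetical and are not taken from this PR. The function accepts any DB-API style cursor.

```python
# Statements that AGE needs in every new session before Cypher queries run.
AGE_SESSION_SETUP = (
    "LOAD 'age';",
    'SET search_path = ag_catalog, "$user", public;',
)


def prepare_age_session(cursor) -> None:
    """Load the AGE extension and set the search path on a DB-API cursor."""
    for statement in AGE_SESSION_SETUP:
        cursor.execute(statement)
```

With a driver such as psycopg2 (an assumption about the driver in use), this would be called once per new connection, before any Cypher query is issued on it.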

  4. Code Quality & CI
    Fix: Reformatted the entire modified codebase with black to ensure compliance with project styling and CI (Super-Linter) requirements.

Performance & Verification Evidence

  1. Database Population
    The migration flow is now robust and loads over 2,000 nodes and 5,000 links into Apache AGE in seconds. (Image: Population Success)

  2. Data Parity & Metrics
    Verification scripts show identical path counts and node structures between backends, ensuring no data loss during the transition. (Image: Graph Metrics)

  3. End-to-End API Resilience
    Automated endpoint testing confirms the /rest/v1/map_analysis API is fully operational with the AGE backend. (Image: Resilience Test)

Results:

  1. Screenshot 2026-03-13 230938
  2. Screenshot 2026-03-13 231003
  3. Screenshot 2026-03-13 232941
  4. Screenshot 2026-03-13 234529

How to Use

  1. Ensure your Docker stack is running: docker start cre-age cre-redis-stack
  2. Update your .env file: GRAPH_DB_TYPE=age
  3. Populate the graph: python cre.py --populate_graph_db
  4. Verify stats: python check_age_stats.py

Conclusion

By resolving the underlying connection and mapping bugs, Apache AGE now provides a stable and significantly more responsive gap analysis experience for OpenCRE users.
