fix(cross-repo): emit HTTP_CALLS for unindexed client libs and normalize URLs for route matching (#523) #536
|
Built this branch on macOS (arm64) and traced resolve_single_call to see exactly what happens. The two halves behave differently:

pass_cross_repo.c (URL normalize + template match): works. Holding an HTTP_CALLS edge constant (url_path = the full URL http://order-service:8080/v2/orders/123), main links nothing and this branch links it to the templated route /v2/orders/{id}. Clean before/after.

pass_calls.c (emit without the client lib indexed): doesn't fire for a genuinely external requests. The call reaches resolve_single_call and first_string_arg holds the URL, so that part is fine. But with requests not installed or vendored anywhere indexable, the import map is empty (imp_count=0), so cbm_registry_resolve returns an empty qualified_name and the function returns early. The moment requests is locally resolvable (a vendored stub or an installed venv), imp_count=1, res_qn becomes '...requests.get', svc=1, and it emits, but that is the resolved path, which main emits on too.

So the emit-without-target path seems to help only when the callee resolves to a QN that has no node, not when the external client resolves to nothing. Was requests pip-installed in your WSL repro? If so, cross_http_calls: 1 is the matcher fix plus a resolvable call, and the index-just-my-service case (the #523 scenario) is still 0. Happy to share the repro. |
|
Good catch, you were exactly right. The emit sat after the empty-QN early return, so a genuinely external Fixed in the latest commit: the detection now lives in the empty-QN branch and classifies from the raw callee name (
One caveat worth flagging separately: a single-file provider still returns 0, but for an unrelated reason FastAPI route extraction ( thanks for tracing this on your end ;) |
|
Re-validated 0a8a44f on macOS (arm64), with a genuinely external requests (no stub, no install, consumer indexes only its own service):
So the empty-QN path is fixed. Confirmed on my end. On the single-file caveat: I'm not reproducing it here. My provider is a single app.py with two routes (@app.get + @app.post), both Route nodes extracted fine, which is why the end-to-end run above links. So the no-route-on-single-file behavior may be platform-specific or a narrower trigger than file count, rather than a general <50-file thing. Didn't block the cross-repo case for me, but happy to share details if you open a separate issue for it. Nice work turning this around so fast. |
|
Thanks for re-validating ;) You're right to push back on the single-file theory if your single Appreciate the thorough trace throughout; it made both fixes tighter. |
|
Thanks @RithvikReddy0-0 — the unindexed-client + URL-normalization direction is right. Two things before this can land:
Also, |
…DeusData#523)

Addresses review on DeusData#536.

insert_cross_edge now skips insertion when an identical (source_id, target_id, type) edge already exists. The pass reaches the same caller/route pair from both directions and emit_cross_route_bidirectional writes both DBs, so without this guard the same CROSS_HTTP_CALLS pair was re-emitted and inflated http_edges.

Verified idempotent: repeated runs and runs from either project side both yield cross_http_calls: 1 with exactly one edge per DB.

Documented why emit_http_async_edge is called with source_node as both source and target in the unindexed-external-client path.

Signed-off-by: RithvikReddy0-0 <rithvikreddymukkara@gmail.com>
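For readers following the idempotency guard described in this commit, here is a minimal sketch of the idea, not the code from the PR. It uses a hypothetical in-memory edge table (`sketch_edge_store_t`) in place of the real database, and every `sketch_` name is invented for illustration. It assumes the guard is an existence check on the (source_id, target_id, type) triple; returning a flag for whether a row was added is a choice made in this sketch so that callers can count real inserts.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory stand-in for an edge table; the real pass writes to a DB. */
typedef struct {
    long source_id;
    long target_id;
    char type[32];
} sketch_edge_t;

#define SKETCH_MAX_EDGES 64

typedef struct {
    sketch_edge_t edges[SKETCH_MAX_EDGES];
    size_t count;
} sketch_edge_store_t;

/* True when an identical (source_id, target_id, type) edge is already stored. */
static bool sketch_edge_exists(const sketch_edge_store_t *s, long src, long dst,
                               const char *type) {
    for (size_t i = 0; i < s->count; i++) {
        if (s->edges[i].source_id == src && s->edges[i].target_id == dst &&
            strcmp(s->edges[i].type, type) == 0) {
            return true;
        }
    }
    return false;
}

/* Inserts the edge only if it is new. Returns true when a row was actually
 * added, so a caller can count real inserts rather than match attempts. */
static bool sketch_insert_edge(sketch_edge_store_t *s, long src, long dst,
                               const char *type) {
    if (strlen(type) >= sizeof(s->edges[0].type)) return false; /* type too long */
    if (s->count >= SKETCH_MAX_EDGES) return false;             /* store is full */
    if (sketch_edge_exists(s, src, dst, type)) return false;    /* duplicate: skip */
    sketch_edge_t *e = &s->edges[s->count++];
    e->source_id = src;
    e->target_id = dst;
    strcpy(e->type, type);
    return true;
}
```

With this shape, reaching the same caller/route pair from both directions calls the insert twice but stores one edge, and a counter that is incremented only on a true return stays at 1.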
|
Thanks for the review. Addressed all three in 4817d79. 1. Duplicate edges. Fixed via an idempotency guard in
One thing worth raising on the "single direction" suggestion: I don't think dropping the reverse Tradeoff I want to flag: the guard adds a 3. Self-pass comment. Added documents that the external client has no graph node, so 2. Test. This is the one I'd like a pointer on. |
|
Thanks @RithvikReddy0-0 — the two-pronged diagnosis (emit HTTP_CALLS for unindexed clients + bidirectional route matching) is right, and this is close. A few things before it can land:
Solid work overall — looking forward to the revision. |
…s, dedupe edges (DeusData#523)

Two root causes of cross-repo-intelligence returning 0 edges:

pass_calls.c: external HTTP/async clients (requests, httpx, axios) that resolve to no QN because their library isn't indexed were dropped before classification. Detect them from the raw callee name in the empty-QN branch and emit against the source node, guarded on a URL/topic-shaped arg so a non-URL arg can't emit a source->source self-edge.

pass_cross_repo.c, match_http_routes:
- cr_url_path() strips scheme+host+port so a consumer's full URL matches a provider's bare route path.
- cr_path_matches_template() + find_route_handler_fuzzy() match concrete paths (/v2/orders/123) against templated routes (/v2/orders/{id}).
- reverse-direction match so a provider-initiated run finds the consumer's HTTP_CALLS (the provider has no outbound calls of its own).
- insert_cross_edge / emit_cross_route_bidirectional are idempotent and return whether a row was inserted; match loops count real inserts, so a pair matched from both directions does not duplicate or inflate counts.

Signed-off-by: RithvikReddy0-0 <rithvikreddymukkara@gmail.com>
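The pass_calls.c decision described in this commit can be sketched roughly as follows. This is an illustration under stated assumptions, not the PR's code: the prefix table, the `sketch_` function names, and the exact meaning of "URL-shaped" are all hypothetical, and the real pass classifies through its own service-pattern matcher. Topic-shaped arguments for async clients are left out of this sketch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical prefix table; the real pass uses its own pattern matcher. */
static const char *const SKETCH_HTTP_CLIENTS[] = {"requests.", "httpx.", "axios."};

/* True when the raw callee text (e.g. "requests.get") starts with a known
 * client-library prefix. No resolved qualified name is needed for this. */
static bool sketch_is_http_client_callee(const char *callee) {
    if (callee == NULL) return false;
    size_t n = sizeof(SKETCH_HTTP_CLIENTS) / sizeof(SKETCH_HTTP_CLIENTS[0]);
    for (size_t i = 0; i < n; i++) {
        size_t len = strlen(SKETCH_HTTP_CLIENTS[i]);
        if (strncmp(callee, SKETCH_HTTP_CLIENTS[i], len) == 0) return true;
    }
    return false;
}

/* True when the first string argument looks like a URL or a bare path.
 * Assumption: "URL-shaped" means an http(s) URL or a string starting with '/'. */
static bool sketch_is_url_shaped(const char *arg) {
    if (arg == NULL || arg[0] == '\0') return false;
    if (arg[0] == '/') return true;
    return strncmp(arg, "http://", 7) == 0 || strncmp(arg, "https://", 8) == 0;
}

/* Decision for the empty-QN branch: emit only when the callee did not resolve,
 * the raw name is a known client, and the argument is URL-shaped. The last
 * check keeps a non-URL argument from producing a self-edge. */
static bool sketch_should_emit_http_call(const char *resolved_qn, const char *callee,
                                         const char *first_arg) {
    bool unresolved = (resolved_qn == NULL || resolved_qn[0] == '\0');
    return unresolved && sketch_is_http_client_callee(callee) &&
           sketch_is_url_shaped(first_arg);
}
```

In this sketch a call such as `requests.get("http://order-service:8080/v2/orders/123")` with an empty resolved QN passes all three checks, while the same callee with a non-URL argument does not.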
4817d79 to 55fa3e1
|
Rebased onto latest main and adapted to the new
On the rebase: main's For the regression test — |
Fixes two root causes of cross-repo-intelligence returning 0 edges (#523).
pass_calls.c
HTTP client calls (requests, httpx, axios, etc.) were silently dropped when
the client library wasn't indexed (external pip/npm dep). The callee resolved
to a QN but cbm_gbuf_find_by_qn returned NULL, so the call was discarded
before HTTP classification.
Fix: detect known HTTP/async patterns via cbm_service_pattern_match and emit
the edge even without a target node in the graph.
pass_cross_repo.c
Three issues in match_http_routes:
1. Consumer url_path carried the full URL (scheme+host+port); the provider Route has a bare path. Added cr_url_path() to strip scheme+authority before QN lookup.
2. Concrete paths (/v2/orders/123) never matched templated routes (/v2/orders/{id}). Added cr_path_matches_template() and find_route_handler_fuzzy() for segment-level template matching.
3. match_http_routes only searched HTTP_CALLS in the src project. When cross-repo is run from the provider side, HTTP_CALLS live in the consumer DB. Added a reverse-direction call so both orientations are covered.
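The URL normalization and segment-level template matching described for match_http_routes can be illustrated with a short sketch. The `sketch_` functions below are hypothetical stand-ins, not the bodies of cr_url_path() or cr_path_matches_template(). They assume that a template parameter is a whole segment wrapped in braces and that query strings and trailing-slash normalization are out of scope.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Returns a pointer into `url` at the start of the path, skipping
 * "scheme://authority". A string without "://" is treated as a bare path. */
static const char *sketch_url_path(const char *url) {
    const char *sep = strstr(url, "://");
    if (sep == NULL) return url;
    const char *slash = strchr(sep + 3, '/');
    return slash != NULL ? slash : "/"; /* no path after the host: use root */
}

/* Length of the segment starting at p, up to the next '/' or end of string. */
static size_t sketch_seg_len(const char *p) {
    const char *end = strchr(p, '/');
    return end != NULL ? (size_t)(end - p) : strlen(p);
}

/* Compares a concrete path against a template one segment at a time.
 * A "{name}" segment matches any non-empty concrete segment; every other
 * segment must match exactly. Both must have the same number of segments. */
static bool sketch_path_matches_template(const char *path, const char *tmpl) {
    while (*path == '/' && *tmpl == '/') {
        path++;
        tmpl++;
        size_t pl = sketch_seg_len(path);
        size_t tl = sketch_seg_len(tmpl);
        bool is_param = tl >= 2 && tmpl[0] == '{' && tmpl[tl - 1] == '}';
        if (is_param) {
            if (pl == 0) return false; /* a parameter must bind something */
        } else if (pl != tl || strncmp(path, tmpl, pl) != 0) {
            return false;
        }
        path += pl;
        tmpl += tl;
    }
    return *path == '\0' && *tmpl == '\0';
}
```

Applied to the example from this thread, the full URL http://order-service:8080/v2/orders/123 reduces to /v2/orders/123, which then matches the templated route /v2/orders/{id} because the first two segments are equal and the third is a parameter.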
Repro
Verified manually: FastAPI provider + requests consumer, cross-repo-intelligence
now returns cross_http_calls: 1 where it previously returned 0.
Checklist
(git commit -s): required, CI rejects unsigned commits (DCO, see CONTRIBUTING.md)
(make -f Makefile.cbm test)
(make -f Makefile.cbm lint-ci)