
Phase 4 Advanced Instrumentation

Ivan P edited this page Jun 5, 2026 · 2 revisions

Phase 4 covers custom spans, DIDComm context propagation, and wallet/cryptographic operation tracing. It has no timeline: work is done opportunistically, per service, when a specific operational or performance question warrants deeper visibility than auto-instrumentation provides.

Phase 4 issues are created on-demand when a concrete need is identified.


Blueprint: acapy-agent Custom Instrumentation

The following is a concrete example of Phase 4 instrumentation for acapy-agent. It demonstrates the three most valuable custom instrumentation targets and can serve as a pattern for other services.

1. Wallet Cryptography Tracing

Auto-instrumentation misses Askar wallet operations because aries-askar is a Python wrapper around a native Rust library, so there is no Python DB driver to hook into. Manual spans expose the boundary between agent logic and cryptographic operations.

Target: acapy_agent/wallet/askar.py (pack, unpack, sign operations)

from opentelemetry import trace

tracer = trace.get_tracer("acapy_agent.wallet")

async def pack_message(self, message, recipient_keys, routing_keys, sender_key):
    with tracer.start_as_current_span("askar.pack_message") as span:
        span.set_attribute("recipient_count", len(recipient_keys))
        span.set_attribute("has_routing", bool(routing_keys))
        span.set_attribute("wallet.type", "askar")
        return await self._pack_message_inner(message, recipient_keys, routing_keys, sender_key)

async def unpack_message(self, enc_message):
    with tracer.start_as_current_span("askar.unpack_message") as span:
        span.set_attribute("wallet.type", "askar")
        result = await self._unpack_message_inner(enc_message)
        span.set_attribute("message.type", result.message.get("@type", "unknown"))
        return result

Value: Separates agent processing latency from cryptographic operation time. Makes it immediately visible when latency spikes are caused by key operations vs. network vs. business logic.
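
The same wrapping can be applied to the sign operations named in the target without repeating the `with` block in every method. The following is a minimal sketch under stated assumptions, not code from acapy-agent: `traced`, `FakeWallet`, and the `askar.sign_message` span name are hypothetical, and the no-op fallback exists only so the snippet also runs where the OpenTelemetry API package is not installed.

```python
import asyncio
import functools
from contextlib import contextmanager

try:
    from opentelemetry import trace
    _tracer = trace.get_tracer("acapy_agent.wallet")
except ImportError:
    # OpenTelemetry is not installed: instrumentation becomes a no-op.
    _tracer = None


class _NoopSpan:
    """Stand-in span used when OpenTelemetry is unavailable."""

    def set_attribute(self, key, value):
        pass


@contextmanager
def _span(name):
    """Yield a real span if a tracer is available, else a no-op span."""
    if _tracer is None:
        yield _NoopSpan()
    else:
        with _tracer.start_as_current_span(name) as span:
            yield span


def traced(name, attributes=None):
    """Hypothetical decorator: run an async function inside a named span."""
    def decorator(func):
        @functools.wraps(func)
        async def wrapper(*args, **kwargs):
            with _span(name) as span:
                for key, value in (attributes or {}).items():
                    span.set_attribute(key, value)
                return await func(*args, **kwargs)
        return wrapper
    return decorator


class FakeWallet:
    """Hypothetical stand-in for the Askar wallet class."""

    @traced("askar.sign_message", attributes={"wallet.type": "askar"})
    async def sign_message(self, message: bytes) -> bytes:
        # Placeholder for the real signing call; reverses the bytes.
        return message[::-1]


if __name__ == "__main__":
    print(asyncio.run(FakeWallet().sign_message(b"abc")))
```

A decorator keeps the original method bodies untouched, so no `_inner` variants are needed. The explicit `with` form shown above remains the better fit when attributes depend on the call's arguments or result.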


2. DIDComm Message Loop Tracing

The inbound message handler in conductor.py is the entry point for all DIDComm protocol traffic. Wrapping it with a span captures the full lifecycle of every received message, including protocol dispatch time.

Target: acapy_agent/core/conductor.py

from opentelemetry import trace

tracer = trace.get_tracer("acapy_agent.conductor")

async def queue_message(self, inbound_message):
    with tracer.start_as_current_span("conductor.inbound_message") as span:
        msg_type = inbound_message.payload.get("@type", "unknown") if inbound_message.payload else "unknown"
        span.set_attribute("message.type", msg_type)
        span.set_attribute("connection.id", inbound_message.connection_id or "unconnected")
        span.set_attribute("transport", inbound_message.transport_type or "unknown")
        await self._inbound_queue.put(inbound_message)

async def dispatch_message(self, inbound_message):
    with tracer.start_as_current_span("conductor.dispatch") as span:
        # Guard against a missing payload, as queue_message does above
        payload = inbound_message.payload or {}
        span.set_attribute("message.type", payload.get("@type", "unknown"))
        await self._dispatcher.dispatch(inbound_message)

Value: Reveals dispatch bottlenecks and a per-message-type latency breakdown. Queue depth is not captured by these spans unless it is recorded as an explicit attribute.
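
The two spans time the enqueue and dispatch calls, but they do not by themselves record how long a message sat in the queue or how many messages were waiting behind it. One possible way to capture that is to stamp each message on enqueue and compute attributes at dispatch time. This is a sketch only: `QueuedMessage`, `queue_attributes`, and the `queue.wait_ms` / `queue.depth` attribute names are hypothetical and do not come from acapy-agent.

```python
import asyncio
import time
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class QueuedMessage:
    """Hypothetical wrapper that stamps a message when it is enqueued."""

    payload: Optional[dict]
    enqueued_at: float = field(default_factory=time.monotonic)


def queue_attributes(item: QueuedMessage, queue: asyncio.Queue) -> dict:
    """Build span attributes for queue wait time and remaining depth."""
    payload = item.payload or {}
    return {
        "message.type": payload.get("@type", "unknown"),
        "queue.wait_ms": (time.monotonic() - item.enqueued_at) * 1000.0,
        "queue.depth": queue.qsize(),
    }


async def demo() -> dict:
    """Enqueue two messages, dequeue one, and describe it."""
    queue: asyncio.Queue = asyncio.Queue()
    await queue.put(QueuedMessage({"@type": "example/1.0/ping"}))
    await queue.put(QueuedMessage(None))
    await asyncio.sleep(0.01)  # simulate time spent waiting in the queue
    item = await queue.get()
    return queue_attributes(item, queue)


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Inside the dispatch span, the resulting dictionary could be attached with `span.set_attributes(...)`.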


3. Decentralized Trace Context Propagation

Standard OTel trace context propagation uses HTTP headers (traceparent, tracestate). DIDComm messages travel over multiple transports and may not carry standard HTTP headers. Injecting W3C trace context into DIDComm message decorators bridges traces across independent agent instances.

Target: acapy_agent/transport/outbound/manager.py

from opentelemetry import trace
from opentelemetry.propagate import inject

tracer = trace.get_tracer("acapy_agent.transport")

async def send_outbound_message(self, context, message, endpoint):
    with tracer.start_as_current_span("transport.outbound") as span:
        span.set_attribute("endpoint", endpoint.uri if endpoint else "unknown")

        # Inject W3C trace context into DIDComm message decorator
        trace_headers = {}
        inject(trace_headers)
        if trace_headers:
            message.set_decorator("~trace", trace_headers)

        await self._outbound_transport.handle_message(context, message, endpoint)

On the receiving end, extract the context before dispatching:

from opentelemetry import trace
from opentelemetry.propagate import extract

tracer = trace.get_tracer("acapy_agent.transport")

async def receive_inbound_message(self, message_body, transport_type):
    # Extract trace context from DIDComm decorator if present
    trace_decorator = message_body.get("~trace", {})
    ctx = extract(trace_decorator) if trace_decorator else None

    with tracer.start_as_current_span("transport.inbound", context=ctx) as span:
        span.set_attribute("transport", transport_type)
        await self.queue_message(message_body)

Value: Stitches together a single distributed trace that follows a credential issuance flow across the Holder agent → Issuer agent boundary — making cross-agent latency visible as a single trace in Tempo rather than two disconnected traces.
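
With the default W3C propagator, the dictionary written into the `~trace` decorator carries a `traceparent` value of the form `version-traceid-parentid-flags`. Because that value arrives from an independent agent, a receiver may want to check it before logging or acting on it. The sketch below uses only the standard library and illustrates the four-field layout; `parse_traceparent` and `TraceParent` are hypothetical names, and OpenTelemetry's own `extract` performs its own parsing, so this is not a replacement for it.

```python
import re
from typing import NamedTuple, Optional

# Four-field layout: 2-hex version, 32-hex trace id, 16-hex parent id, 2-hex flags.
_TRACEPARENT_RE = re.compile(
    r"^(?P<version>[0-9a-f]{2})-(?P<trace_id>[0-9a-f]{32})-"
    r"(?P<parent_id>[0-9a-f]{16})-(?P<flags>[0-9a-f]{2})$"
)


class TraceParent(NamedTuple):
    trace_id: str
    parent_id: str
    sampled: bool


def parse_traceparent(decorator: object) -> Optional[TraceParent]:
    """Return the parsed traceparent from a decorator dict, or None if invalid."""
    if not isinstance(decorator, dict):
        return None
    value = decorator.get("traceparent")
    if not isinstance(value, str):
        return None
    match = _TRACEPARENT_RE.match(value.strip())
    if match is None or match["version"] == "ff":
        return None  # malformed, or the reserved invalid version
    # All-zero trace or parent ids are invalid per W3C Trace Context.
    if set(match["trace_id"]) == {"0"} or set(match["parent_id"]) == {"0"}:
        return None
    sampled = bool(int(match["flags"], 16) & 0x01)
    return TraceParent(match["trace_id"], match["parent_id"], sampled)


if __name__ == "__main__":
    example = {
        "traceparent": "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"
    }
    print(parse_traceparent(example))
```

Returning `None` for anything malformed matches the behaviour of the receive path above: without a usable context, the inbound span simply starts a new trace.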


Per-Service Phase 4 Targets (Placeholders)

Issues to be created when operational need is identified.

acapy-vc-authn-oidc

  • Authentication handshake spans in oidc-controller/api/core/oidc/provider.py (token signing, OIDC cryptographic operations)
  • SSE stream lifetime tracing in oidc-controller/api/routers/sse.py

traction

  • Reservation state machine transitions in traction_innkeeper/v1_0/innkeeper/tenant_manager.py (pending → approved → active)
  • W3C traceparent injection via Axios interceptors in services/tenant-ui/src/routes/router.ts

acapy-endorser-service

  • Webhook lifecycle spans in endorser/api/services/webhook_handlers.py
  • endorse_transaction custom attributes (Schema ID, Transaction ID, Connection ID) in endorser/api/services/endorse.py

didwebvh-server-py

  • SCID generation + log verification spans in server/app/plugins/didwebvh.py
  • Background task monitoring in server/app/tasks.py via @tracer.start_as_current_span decorator
  • Askar secure storage latency in server/app/plugins/askar.py

credo-ts

  • Dispatcher.dispatch() spans for DIDComm message hop tracking
  • Indy/Cheqd VDR module spans to isolate public ledger network latency
  • Key generation, signing, and packing operations

didcomm-mediator-credo

  • Queue contention spans in PostgresMessagePickupRepository.ts around takeFromQueue and addMessage
  • Firebase Push Notification dispatch tracing in PushNotificationsFcmService.ts

bc-wallet-demo

  • Traction API wrapper spans in server/src/utils/tractionHelper.ts
  • Socket.io real-time credential sync spans in server/src/index.ts
  • CredentialController business metric attributes (showcase.name, credential.type)