Phase 4: Advanced Instrumentation
Scope: custom spans, DIDComm trace context propagation, and wallet/cryptographic operation tracing. This phase has no timeline. It is implemented opportunistically, per service, when a specific operational or performance question needs more visibility than auto-instrumentation provides.
Phase 4 issues are created on-demand when a concrete need is identified.
The following is a concrete example of Phase 4 instrumentation for acapy-agent. It demonstrates the three most valuable custom instrumentation targets and can serve as a pattern for other services.
Auto-instrumentation misses Askar wallet operations. The aries-askar Python package is a thin wrapper around a native Rust library called over a C FFI, so there is no Python database driver for auto-instrumentation to hook into. Manual spans expose the boundary between agent logic and cryptographic operations.
Target: acapy_agent/wallet/askar.py (pack, unpack, sign operations)
from opentelemetry import trace

tracer = trace.get_tracer("acapy_agent.wallet")

# Illustrative wallet methods: _pack_message_inner and _unpack_message_inner are
# placeholders for the existing pack/unpack implementation being wrapped.
async def pack_message(self, message, recipient_keys, routing_keys, sender_key):
    # start_as_current_span records exceptions and sets an error status by default
    with tracer.start_as_current_span("askar.pack_message") as span:
        span.set_attribute("recipient_count", len(recipient_keys))
        span.set_attribute("has_routing", bool(routing_keys))
        span.set_attribute("wallet.type", "askar")
        return await self._pack_message_inner(message, recipient_keys, routing_keys, sender_key)

async def unpack_message(self, enc_message):
    with tracer.start_as_current_span("askar.unpack_message") as span:
        span.set_attribute("wallet.type", "askar")
        result = await self._unpack_message_inner(enc_message)
        # The message type is only known after decryption succeeds
        span.set_attribute("message.type", result.message.get("@type", "unknown"))
        return result

Value: Separates agent processing latency from cryptographic operation time. This shows at a glance whether a latency spike comes from key operations, the network, or business logic.
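When the `askar.*` spans are children of a request-level span, a trace backend can split that request into wallet time and agent-side "self time." The stdlib-only sketch below shows that arithmetic on exported span records. `SpanRecord`, `self_time_ms`, `wallet_time_ms`, and the `handle_request` span name are hypothetical and are not part of the OpenTelemetry SDK. The calculation also assumes that sibling child spans do not overlap, which holds for sequential awaits.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SpanRecord:
    """Hypothetical minimal view of an exported span (not the OTel SDK type)."""

    span_id: str
    name: str
    start_ms: float
    end_ms: float
    parent_id: Optional[str] = None

    @property
    def duration_ms(self) -> float:
        return self.end_ms - self.start_ms


def _children(parent: SpanRecord, spans: List[SpanRecord]) -> List[SpanRecord]:
    # Direct children only; grandchildren already fall inside a child's duration.
    return [s for s in spans if s.parent_id == parent.span_id]


def wallet_time_ms(parent: SpanRecord, spans: List[SpanRecord], prefix: str = "askar.") -> float:
    """Total duration of direct child spans whose name starts with `prefix`."""
    return sum(s.duration_ms for s in _children(parent, spans) if s.name.startswith(prefix))


def self_time_ms(parent: SpanRecord, spans: List[SpanRecord]) -> float:
    """Parent duration minus direct child durations (assumes children don't overlap)."""
    return parent.duration_ms - sum(s.duration_ms for s in _children(parent, spans))


# Example: one request that unpacks an inbound message and packs a reply.
request = SpanRecord("r1", "handle_request", 0.0, 120.0)
spans = [
    request,
    SpanRecord("c1", "askar.unpack_message", 5.0, 35.0, parent_id="r1"),
    SpanRecord("c2", "askar.pack_message", 90.0, 110.0, parent_id="r1"),
]
print(wallet_time_ms(request, spans))  # 50.0 ms inside Askar
print(self_time_ms(request, spans))    # 70.0 ms outside any child span
```

Self time is everything not covered by a child span, so it also absorbs any uninstrumented I/O. That is why the list of instrumented children matters when reading it.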
The inbound message handler in conductor.py is the entry point for all DIDComm protocol traffic. Wrapping both the enqueue and dispatch steps in spans captures the lifecycle of every received message, including protocol dispatch time.
Target: acapy_agent/core/conductor.py
from opentelemetry import trace

tracer = trace.get_tracer("acapy_agent.conductor")

async def queue_message(self, inbound_message):
    with tracer.start_as_current_span("conductor.inbound_message") as span:
        payload = inbound_message.payload or {}
        span.set_attribute("message.type", payload.get("@type", "unknown"))
        span.set_attribute("connection.id", inbound_message.connection_id or "unconnected")
        span.set_attribute("transport", inbound_message.transport_type or "unknown")
        await self._inbound_queue.put(inbound_message)

async def dispatch_message(self, inbound_message):
    with tracer.start_as_current_span("conductor.dispatch") as span:
        # Guard against a missing payload, as in queue_message
        payload = inbound_message.payload or {}
        span.set_attribute("message.type", payload.get("@type", "unknown"))
        await self._dispatcher.dispatch(inbound_message)

Value: Reveals dispatch bottlenecks and a protocol-level latency breakdown. Comparing enqueue and dispatch timestamps also shows how queueing delay changes over time.
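The pair of spans above has one caveat. If `dispatch_message` runs in a queue-consumer task rather than in the task that called `queue_message`, the dispatch span will not automatically join the inbound message's trace. OpenTelemetry Python keeps the active context in `contextvars`, so the general fix is to capture the producer's context together with the queued item.

The stdlib-only sketch below demonstrates the effect using a stand-in context variable. `active_trace`, `enqueue`, and `consume_one` are hypothetical names. In the agent itself, the equivalent would be one of two approaches:

* Store `opentelemetry.context.get_current()` with the message and pass it as `context=` to `tracer.start_as_current_span`.
* Record the producer's span as a span link, if each dispatch should remain its own trace.

```python
import asyncio
import contextvars

# Stand-in for the active trace context (hypothetical name). OpenTelemetry
# Python's default runtime context is also stored in a ContextVar.
active_trace = contextvars.ContextVar("active_trace", default=None)


async def enqueue(queue: asyncio.Queue, message: dict) -> None:
    """Producer side: plays the role of queue_message running inside its span."""
    active_trace.set("trace-from-inbound")
    # Capture the producer's context and queue it together with the message.
    await queue.put((message, contextvars.copy_context()))


async def consume_one(queue: asyncio.Queue) -> tuple:
    """Consumer side: plays the role of the loop that calls dispatch_message."""
    message, producer_ctx = await queue.get()
    seen_in_consumer = active_trace.get()  # the consumer task's own context
    seen_in_producer_ctx = producer_ctx.run(active_trace.get)  # captured context
    return message, seen_in_consumer, seen_in_producer_ctx


async def main() -> tuple:
    queue: asyncio.Queue = asyncio.Queue()
    # Start the consumer first, as a long-running dispatch loop would be.
    consumer = asyncio.create_task(consume_one(queue))
    await asyncio.create_task(enqueue(queue, {"@type": "example/1.0/ping"}))
    return await consumer


result = asyncio.run(main())
print(result)  # ({'@type': 'example/1.0/ping'}, None, 'trace-from-inbound')
```

The consumer sees `None` in its own context and sees the producer's value only through the captured copy. This mirrors a dispatch span that would otherwise start a new, disconnected trace.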
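Standard OTel trace context propagation uses HTTP headers (traceparent, tracestate). DIDComm messages travel over multiple transports, not only HTTP, and may be relayed through mediators. As a result, HTTP headers set by the sender cannot be relied on to reach the recipient agent. Injecting W3C trace context into a DIDComm message decorator carries it inside the message itself, which bridges traces across independent agent instances.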
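With OpenTelemetry's default W3C propagator, `inject()` writes a `traceparent` entry of the form `version-traceid-parentid-flags`. It adds a `tracestate` entry only when one is set. The stdlib-only sketch below builds and validates version `00` values, which can help when checking what a peer agent actually received. The function names are hypothetical, and the sketch is not a substitute for OTel's own propagator.

```python
import re
import secrets
from typing import NamedTuple, Optional

# Version 00: 2-hex version, 32-hex trace-id, 16-hex parent-id, 2-hex flags (lowercase).
_TRACEPARENT_V00 = re.compile(r"00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})")


class TraceParent(NamedTuple):
    trace_id: str
    parent_id: str
    sampled: bool


def make_traceparent(
    trace_id: Optional[str] = None,
    parent_id: Optional[str] = None,
    sampled: bool = True,
) -> str:
    """Build a version-00 traceparent, generating random IDs when none are given."""
    trace_id = trace_id or secrets.token_hex(16)  # 16 bytes -> 32 hex chars
    parent_id = parent_id or secrets.token_hex(8)  # 8 bytes -> 16 hex chars
    flags = "01" if sampled else "00"  # bit 0 is the "sampled" flag
    return f"00-{trace_id}-{parent_id}-{flags}"


def parse_traceparent(value: str) -> Optional[TraceParent]:
    """Return the parsed header, or None if it is not a valid version-00 value."""
    match = _TRACEPARENT_V00.fullmatch(value.strip())
    if match is None:
        return None
    trace_id, parent_id, flags = match.groups()
    if trace_id == "0" * 32 or parent_id == "0" * 16:
        return None  # all-zero IDs are invalid
    return TraceParent(trace_id, parent_id, bool(int(flags, 16) & 0x01))


# Example value from the W3C Trace Context specification.
header = "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"
print(parse_traceparent(header))
```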
Target: acapy_agent/transport/outbound/manager.py
from opentelemetry import trace
from opentelemetry.propagate import inject

tracer = trace.get_tracer("acapy_agent.transport")

async def send_outbound_message(self, context, message, endpoint):
    with tracer.start_as_current_span("transport.outbound") as span:
        span.set_attribute("endpoint", endpoint.uri if endpoint else "unknown")
        # Inject W3C trace context into a DIDComm message decorator. This must
        # happen while the message is still plaintext, before it is packed.
        trace_headers = {}
        inject(trace_headers)  # writes nothing if no valid span is active
        if trace_headers:
            message.set_decorator("~trace", trace_headers)
        await self._outbound_transport.handle_message(context, message, endpoint)

On the receiving end, extract the context after the message has been unpacked and before dispatching it. The decorator travels inside the encrypted envelope, so it is only readable once the message has been decrypted:
from opentelemetry.propagate import extract

async def receive_inbound_message(self, message_body, transport_type):
    # message_body is assumed to be the unpacked (plaintext) DIDComm message.
    # Extract trace context from the DIDComm decorator if present.
    trace_decorator = message_body.get("~trace", {})
    ctx = extract(trace_decorator) if trace_decorator else None
    # With context=None, the span falls back to the current context.
    with tracer.start_as_current_span("transport.inbound", context=ctx) as span:
        span.set_attribute("transport", transport_type)
        await self.queue_message(message_body)

Value: Stitches together a single distributed trace that follows a credential issuance flow across the Holder agent → Issuer agent boundary. Cross-agent latency then appears in Tempo as one trace rather than two disconnected traces.
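Aries RFC 0034 (Message Tracing) already defines a `~trace` decorator with its own fields. Writing W3C headers under that name may therefore collide with agents that implement it. The stdlib-only sketch below keeps the W3C carrier under a separate decorator key and shows the round trip on plaintext message dicts. `~otel_trace` and both helper names are hypothetical and are not part of ACA-Py or any Aries RFC.

In the agent, the two helpers would fit into the send and receive paths like this:

* The carrier passed to `attach_trace_context` would come from `inject()`, before packing.
* The dict returned by `read_trace_context` would be handed to `extract()`, after unpacking.

```python
import copy
from collections.abc import Mapping
from typing import Any, Dict

# Hypothetical decorator key, chosen to avoid RFC 0034's "~trace".
OTEL_DECORATOR = "~otel_trace"
# Keys written by OpenTelemetry's W3C Trace Context propagator.
W3C_KEYS = ("traceparent", "tracestate")


def attach_trace_context(message: Mapping[str, Any], carrier: Mapping[str, str]) -> Dict[str, Any]:
    """Return a copy of a plaintext DIDComm message with W3C headers attached.

    Call this before the message is packed; the original dict is not modified.
    """
    headers = {key: carrier[key] for key in W3C_KEYS if key in carrier}
    out = copy.deepcopy(dict(message))
    if headers:
        out[OTEL_DECORATOR] = headers
    return out


def read_trace_context(message: Mapping[str, Any]) -> Dict[str, str]:
    """Return the W3C carrier from an unpacked message, or {} if absent or malformed."""
    decorator = message.get(OTEL_DECORATOR)
    if not isinstance(decorator, Mapping):
        return {}
    return {k: v for k, v in decorator.items() if k in W3C_KEYS and isinstance(v, str)}


# Example round trip with a carrier shaped like the output of inject().
outbound = {"@type": "https://didcomm.org/basicmessage/1.0/message", "content": "hello"}
carrier = {"traceparent": "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"}
sent = attach_trace_context(outbound, carrier)
received = read_trace_context(sent)  # pass this to extract() on the receiving side
```

Filtering to the two W3C keys keeps unrelated or attacker-supplied fields in the decorator from reaching the propagator.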
Other candidate targets (issues to be created when an operational need is identified):
- Authentication handshake spans in oidc-controller/api/core/oidc/provider.py (token signing, OIDC cryptographic operations)
- SSE stream lifetime tracing in oidc-controller/api/routers/sse.py
- Reservation state machine transitions in traction_innkeeper/v1_0/innkeeper/tenant_manager.py (pending → approved → active)
- W3C traceparent injection via Axios interceptors in services/tenant-ui/src/routes/router.ts
- Webhook lifecycle spans in endorser/api/services/webhook_handlers.py
- endorse_transaction custom attributes (Schema ID, Transaction ID, Connection ID) in endorser/api/services/endorse.py
- SCID generation and log verification spans in server/app/plugins/didwebvh.py
- Background task monitoring in server/app/tasks.py via the @tracer.start_as_current_span decorator
- Askar secure storage latency in server/app/plugins/askar.py
- Dispatcher.dispatch() spans for DIDComm message hop tracking
- Indy/Cheqd VDR module spans to isolate public ledger network latency
- Key generation, signing, and packing operations
- Queue contention spans in PostgresMessagePickupRepository.ts around takeFromQueue and addMessage
- Firebase push notification dispatch tracing in PushNotificationsFcmService.ts
- Traction API wrapper spans in server/src/utils/tractionHelper.ts
- Socket.io real-time credential sync spans in server/src/index.ts
- CredentialController business metric attributes (showcase.name, credential.type)