feat: Add Workflow Analytics Dashboards with OpenSearch integration#229
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 42044b8c24
Force-pushed from 0284482 to 8c83d0b.
Force-pushed from 7afae76 to bd71e89.
Question: Scope of User Stories
Looking at
However,
Question: Are all 15 user stories expected to be completed in this PR, or is US-013 (the backend API) intentionally deferred to a later PR? If it's expected to be complete, the frontend may need to be wired up to the backend endpoints that were reportedly implemented.
Force-pushed from 30b1504 to 5d92c8d.
This was a relic from the hottest trend in AI, of
Force-pushed from b8d9c3a to bd98d61.
- Add nginx reverse proxy for unified entry point at http://localhost
- Routes: / (frontend), /api (backend), /analytics (OpenSearch Dashboards)
- Configure OpenSearch Dashboards with /analytics base path
- Add production deployment with TLS and security plugin
- SaaS multitenancy with per-customer tenant isolation
- Certificate generation script (just generate-certs)
- New commands: just dev, just prod-secure
Signed-off-by: Aseem Shrey <LuD1161@users.noreply.github.com>
- Add STALE status for orphaned run records (DB/Temporal mismatch)
- Improve status inference from trace events when Temporal not found
- Use correct TraceEventType values for status detection
- Add amber badge color for STALE status
- Extract WorkflowNode into modular directory structure
- Document all execution statuses with transition diagram
Signed-off-by: Aseem Shrey <LuD1161@users.noreply.github.com>
…gration
Analytics Sink Component (core.analytics.sink):
- Index output data from any upstream node to OpenSearch
- Auto-detect asset correlation keys (host, domain, url, ip, etc.)
- Fire-and-forget with retry logic (3 attempts, exponential backoff)
- Configurable index suffix and fail-on-error modes
OpenSearch Integration:
- Daily index rotation: security-findings-{orgId}-{YYYY.MM.DD}
- Index template with standard metadata fields
- Multi-tenant data isolation per organization
Analytics API:
- POST /api/v1/analytics/query with OpenSearch DSL support
- Auto-scope queries to organization's index pattern
- Rate limiting: 100 req/min per user
- Protected routes require authentication
- Session cookie support for analytics route auth
UI Integration:
- Analytics Settings page with tier-based retention
- Dashboards link in sidebar (opens in new tab)
- View Analytics button uses Discover app with proper URL state
- Uses .keyword fields for exact match filtering
Component SDK Extensions:
- generateFindingHash() for deduplication
- Workflow context (workflowId, workflowName, organizationId)
- Results output port on nuclei, trufflehog, supabase-scanner
- Support for optional inputs in components
Bug fixes:
- Fix webhook URLs to include global API prefix (ENG-115)
- Add proper connectionType for list variable types
- Handle invalid_value errors for placeholder fields
Signed-off-by: Aseem Shrey <LuD1161@users.noreply.github.com>
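The daily index rotation and the organization-scoped query pattern described in the commit above can be sketched roughly as follows. This is only an illustration: the helper names (`formatIndexDate`, `dailyIndexName`, `orgIndexPattern`) are hypothetical, and formatting the date in UTC is an assumption, not something the commit states.

```typescript
// Hypothetical helpers; the names are illustrative and not taken from the PR.

/** Formats a date as YYYY.MM.DD. Using UTC here is an assumption. */
function formatIndexDate(date: Date): string {
  const yyyy = String(date.getUTCFullYear());
  const mm = String(date.getUTCMonth() + 1).padStart(2, "0");
  const dd = String(date.getUTCDate()).padStart(2, "0");
  return `${yyyy}.${mm}.${dd}`;
}

/** Daily index for one organization: security-findings-{orgId}-{YYYY.MM.DD}. */
function dailyIndexName(orgId: string, date: Date): string {
  return `security-findings-${orgId}-${formatIndexDate(date)}`;
}

/** Wildcard pattern that covers every daily index of a single organization. */
function orgIndexPattern(orgId: string): string {
  return `security-findings-${orgId}-*`;
}
```

Writers would target `dailyIndexName(orgId, new Date())`, while user-facing queries would be pinned to `orgIndexPattern(orgId)` so that a request can never name another organization's indices.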
…ovisioning
Signed-off-by: Aseem Shrey <LuD1161@users.noreply.github.com>
Document the OpenSearch tenant identity resolution flow, Clerk active org session vs membership distinction, tenant provisioning details, and security guarantees. Add troubleshooting entry for workspace-user fallback with screenshots and diagnostic commands. Signed-off-by: Aseem Shrey <LuD1161@users.noreply.github.com>
…objects
Two-layer SaaS lockdown for OpenSearch Dashboards:
1. nginx whitelist: PCRE negative lookahead blocks non-whitelisted /analytics/app/* routes (returns 403). Allowed: Discover, Visualize, Dashboards, Alerting, Dev Tools, Data Explorer, Home. Blocked: ISM, Security, Management, Anomaly Detection, Maps, etc. Admin retains full access via direct Dashboards port (5601).
2. Role permissions: Replace ISM cluster permissions with Alerting permissions (monitor CRUD, alerts, destinations) for tenant roles. Add indices:data/write/bulk cluster permission required for Dashboards saved objects (visualizations, dashboards, saved searches). Without this, multitenancy's kibana_all_write grant is never reached.
3. Default landing page set to Discover instead of Home (which exposes all plugin links including blocked ones).
Signed-off-by: Aseem Shrey <LuD1161@users.noreply.github.com>
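The negative-lookahead whitelist from item 1 lives in the nginx config, but the same matching logic can be illustrated in TypeScript, whose regular expressions also support lookahead. This is a sketch under assumptions: the app slugs in `ALLOWED_APPS` are guesses at how the allowed apps appear in the URL, and `statusFor` is a hypothetical helper, not code from the PR.

```typescript
// Assumed URL slugs for the allowed Dashboards apps; the real config may differ.
const ALLOWED_APPS = [
  "discover",
  "visualize",
  "dashboards",
  "alerting",
  "dev_tools",
  "data-explorer",
  "home",
];

// Matches /analytics/app/<name> only when <name> is NOT one of the allowed apps.
// The inner (?:[/?#]|$) makes sure "discoverx" is not mistaken for "discover".
const BLOCKED_ROUTE = new RegExp(
  `^/analytics/app/(?!(?:${ALLOWED_APPS.join("|")})(?:[/?#]|$))`
);

/** Returns the HTTP status the proxy would answer with for a given path. */
function statusFor(path: string): 200 | 403 {
  return BLOCKED_ROUTE.test(path) ? 403 : 200;
}
```

Paths outside `/analytics/app/` never match the pattern, so they fall through to whatever other rules apply; only app routes are subject to the allowlist.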
Signed-off-by: Aseem Shrey <LuD1161@users.noreply.github.com>
…e in prod
Base compose configs (infra.yml, full.yml) now use `expose` instead of `ports` for all internal services. Dev-ports overlay binds everything to 127.0.0.1. Only nginx port 80 remains publicly accessible.
Signed-off-by: Aseem Shrey <LuD1161@users.noreply.github.com>
…ermission
Signed-off-by: Aseem Shrey <LuD1161@users.noreply.github.com>
- Merge nginx.full.conf into nginx.prod.conf (95% identical, prod has better proxy_redirect)
- Consolidate DB init scripts: merge temporal DB creation into 01-create-instance-databases.sh
- Remove orphaned scripts: dev-instance-manager.sh, instance-bootstrap.sh (unreferenced)
- Remove deprecated opensearch-security/whitelist.yml (superseded by allowlist.yml)
- Update docker-compose.full.yml and docs to reference nginx.prod.conf
Signed-off-by: Aseem Shrey <LuD1161@users.noreply.github.com>
The AnalyticsModule's controller and services depend on ConfigService and OpenSearchClient which aren't available in the MCP test module. Use overrideModule to replace the entire AnalyticsModule with mocks. Also add explicit ConfigModule import to AnalyticsModule. Signed-off-by: Aseem Shrey <LuD1161@users.noreply.github.com>
Force-pushed from 000fe03 to 2918958.
- Fix PM2 --only filter to use instance-suffixed names (shipsec-backend-0)
- Fix Kafka broker port from 19092 to 9092 (matches single-listener Redpanda)
- Add whitelist.yml required by securityadmin.sh alongside allowlist.yml
Signed-off-by: Aseem Shrey <LuD1161@users.noreply.github.com>
Signed-off-by: Aseem Shrey <LuD1161@users.noreply.github.com>
Summary
This PR adds a Security Analytics platform to ShipSec Studio that enables users to index workflow output data into OpenSearch and visualize it through dashboards. It also includes multi-tenant security, a unified developer experience, and component SDK improvements.
Key Features
- Analytics Sink Component: New workflow node (`core.analytics.sink`) that indexes output data from any upstream node to OpenSearch
- OpenSearch Integration: `security-findings-{orgId}-{YYYY.MM.DD}`
- Multi-Tenant OpenSearch Security:
- Analytics API: `POST /api/v1/analytics/query` endpoint supporting OpenSearch DSL
- UI Integration:
- Nginx Reverse Proxy: `http://localhost/` (frontend), `/api` (backend), `/analytics` (OpenSearch Dashboards)
- Unified `just dev` command: … `CLERK_SECRET_KEY` in `backend/.env` … `dev-insecure` command
- Workflow Status Improvements: `STALE` status for orphaned run records (DB/Temporal mismatch)
- Component SDK Extensions: `generateFindingHash()` utility for deduplication
Commands
Test Results
Justfile + OSD Verification Matrix
- `just dev`, local auth (no Clerk): `http://localhost/analytics` (nginx); `security-findings-*` pre-created.
- `just dev`, secure mode (Clerk): `http://localhost/analytics` (nginx)
- `just prod`, standard (no certs): `http://localhost/analytics` (nginx)
- `just prod`, secure (with certs)
- `just dev stop`
- `just prod stop`
End-to-End Workflow Analytics Test (Secure Mode)
- `hackerone.com`
Automated Checks
- `bun typecheck`
- `bun lint`
Bug Found & Fixed During Testing
- `just dev` crashes when `CLERK_SECRET_KEY` commented out: `grep` returns exit code 1 under `set -euo pipefail`; `|| true` added to the grep pipeline (justfile line 50)
Screenshots
PR #229 — Workflow Analytics Dashboards: File Journey Walkthrough
Stage 1: Developer Starts the App (`just dev`)
Auth Mode Auto-Detection
When a developer runs `just dev`, the justfile is the entry point. It reads `backend/.env` and checks whether `CLERK_SECRET_KEY` is set.
Files involved:
- `justfile`
- `pm2.config.cjs`: … `OPENSEARCH_SECURITY_ENABLED=true|false` from the shell env set by justfile
- `backend/.env.example`
- `worker/.env.example`
- `frontend/.env.example`: … `VITE_CLERK_PUBLISHABLE_KEY`, `VITE_OPENSEARCH_DASHBOARDS_URL`
Infrastructure Boots Up
The justfile composes Docker files depending on the mode:
- `docker-compose.infra.yml` (base services) + `docker-compose.dev-ports.yml` (expose ports for host-based PM2)
- `docker-compose.dev-secure.yml` (TLS, security plugin, proxy auth)
Files involved:
- `docker/docker-compose.infra.yml`
- `docker/docker-compose.dev-ports.yml`
- `docker/docker-compose.dev-secure.yml`: … `opensearch-dashboards.prod.yml` (with proxy auth settings), swaps OpenSearch entrypoint to `docker-entrypoint-security.sh`
- `docker/docker-compose.full.yml`
- `docker/docker-compose.prod.yml`
- `docker/certs/.gitignore`
- `docker/scripts/generate-certs.sh`
OpenSearch Security Bootstrap (Secure Mode Only)
When the security overlay is active, OpenSearch starts with a custom entrypoint that templates the proxy auth config and runs `securityadmin.sh`:
- `docker/opensearch-security/docker-entrypoint-security.sh`: … `__INTERNAL_PROXIES__` placeholder in `config.yml` with actual Docker network CIDR regex `(172|192|10)\.\d+\.\d+\.\d+`, then runs `securityadmin.sh` in the background (with a marker file to skip on restarts), finally execs the real OpenSearch entrypoint
- `docker/opensearch-security/config.yml`: … `x-proxy-user` and `x-proxy-roles` headers from nginx), and a basic auth fallback for admin API access
- `docker/opensearch-security/roles.yml`: … `admin`, `dashboards_readwrite`, and `customer_template_ro` (a template for per-tenant read-only roles that include `indices:data/write/bulk`, which is critical for Dashboards saved objects)
- `docker/opensearch-security/roles_mapping.yml`: … `admin` → admin user, `dashboards_readwrite` → dashboards_server
- `docker/opensearch-security/internal_users.yml`: … `admin` (full access) and `dashboards_server` (for Dashboards → OpenSearch communication)
- `docker/opensearch-security/tenants.yml`: … `global_tenant` (base); per-customer tenants are created dynamically at runtime
- `docker/opensearch-security/action_groups.yml`
- `docker/opensearch-security/audit.yml`
- `docker/opensearch-security/allowlist.yml`
- `docker/opensearch-security/whitelist.yml`
- `docker/opensearch-security/nodes_dn.yml`
- `docker/scripts/security-init.sh`
- `docker/scripts/hash-password.sh`: … `internal_users.yml`
OpenSearch Dashboards Custom Image
Dashboards uses a custom Dockerfile to remove plugins that could let SaaS tenants escape their sandbox:
- `docker/opensearch-dashboards.Dockerfile`: … `enabled` config schema: setting `plugin.enabled: false` causes a fatal `Unknown configuration key` error. Must physically remove them.
- `docker/opensearch-dashboards.yml`: … `server.basePath: "/analytics"`, `rewriteBasePath: true`, default route to Discover
- `docker/opensearch-dashboards.prod.yml`: … `requestHeadersAllowlist: ["securitytenant", "Authorization", "x-forwarded-for"]` (x-forwarded-for is critical for proxy auth), disables security UI (`opensearch_security.readonly_mode.roles: [customer_*]`)
- `docker/opensearch-init.sh`: … `security-findings-*` index pattern in Dashboards so Discover works immediately. In secure mode, skips this; index patterns are created per-tenant on first access.
Stage 2: User Logs In
Frontend Auth Provider Selection
When the browser loads `http://localhost` (via nginx), the frontend determines which auth mode to use:
- `frontend/src/auth/AuthProvider.tsx`: … `VITE_AUTH_PROVIDER` env var → Clerk key availability → defaults. In dev mode defaults to `local` unless Clerk is explicitly configured. `LocalAuthProvider` stores admin credentials in Zustand, creates a `basic-{base64}` token, and sets `shipsec_session` cookie via the backend login endpoint. `ClerkAuthProvider` delegates to Clerk's session management.
- `frontend/src/components/auth/AdminLoginForm.tsx`: … `/api/v1/auth/login`
- `frontend/src/config/env.ts`: … `VITE_OPENSEARCH_DASHBOARDS_URL` (controls whether the Dashboards sidebar link appears) and `VITE_AUTH_PROVIDER`
Backend Auth Validation
- `backend/src/auth/providers/clerk-auth.provider.ts`: … `organizationId` from Clerk's org membership. The org ID becomes the tenant key for all analytics scoping.
- `backend/src/auth/providers/local-auth.provider.ts`: … `AuthContext` with `organizationId: 'local-dev'`
- `backend/src/auth/session.utils.ts`: `createSessionToken()` → HMAC-SHA256 signed `{username, ts}.signature` encoded as base64. `verifySessionToken()` → timing-safe comparison to prevent timing attacks, 7-day TTL.
- `backend/src/app.controller.ts`: Login endpoint (`POST /auth/login`): validates credentials, calls `createSessionToken()`, sets HTTP-only `shipsec_session` cookie. Logout endpoint (`POST /auth/logout`): clears cookie.
- `backend/src/main.ts`
Stage 3: User Sees the Dashboard Sidebar
Once authenticated, the user lands on the main app. The sidebar shows a "Dashboards" link if the env var is configured:
- `frontend/src/components/layout/AppLayout.tsx`: … `VITE_OPENSEARCH_DASHBOARDS_URL` (typically `/analytics/app/discover`). Marked as `external: true` so it opens in the same window but via the nginx proxy. Also adds an "Analytics Settings" link under the Settings section.
- `frontend/src/components/layout/AppTopBar.tsx`
- `frontend/src/components/layout/TopBar.tsx`
- `frontend/src/pages/AnalyticsSettingsPage.tsx`
- `frontend/src/App.tsx`: … `/settings/analytics` route pointing to `AnalyticsSettingsPage`
Stage 4: User Builds a Workflow with Analytics
The Analytics Sink Component
Users can add an "Analytics Sink" node to any workflow. It collects output from upstream scanner nodes and indexes it to OpenSearch.
- `worker/src/components/core/analytics-sink.ts`: … (`core.analytics.sink`): accepts multiple configurable data inputs (each with a label and sourceTag), aggregates documents from all inputs, validates workflow context (orgId, workflowId, workflowName are required), then calls the indexer. Supports two modes: lenient (default, fire-and-forget, skips missing inputs) and strict (fails on any error).
- `frontend/src/components/workflow/AnalyticsInputsEditor.tsx`: … `sourceTag` from label names for filtering in Dashboards.
- `worker/src/components/index.ts`: … `analytics-sink` in the component registry
- `backend/src/dsl/validator.ts`
Component SDK Extensions
The component SDK was extended to support analytics:
- `packages/component-sdk/src/analytics.ts`: `analyticsResultSchema()`: Zod schema contract for indexed documents (scanner, finding_hash, severity, asset_key). `generateFindingHash()`: creates stable 16-char SHA-256 dedup keys from field values.
- `packages/component-sdk/src/context.ts`: `ExecutionContext` now includes `workflowId`, `workflowName`, and `organizationId`; these are passed from the backend through Temporal to every component execution.
- `packages/component-sdk/src/types.ts`: … `ExecutionContext` interface with the new fields
- `packages/component-sdk/src/index.ts`
Workflow Context Injection
For analytics to work, the backend must pass org/workflow identity through to the worker:
- `backend/src/workflows/workflows.service.ts`: … `organizationId` from the authenticated user's context and passes it (along with `workflowId`, `workflowName`) to the Temporal workflow input. This is how the worker knows which org's index to write to.
- `backend/src/workflows/workflows.controller.ts`: … `AuthContext` to the service layer so org ID is available
Stage 5: Workflow Runs → Data Gets Indexed
When a workflow runs, the Analytics Sink component calls the OpenSearch indexer:
- `worker/src/utils/opensearch-indexer.ts` (not in PR file list but referenced): `bulkIndex()`: (1) ensures tenant is provisioned by calling `POST /api/v1/analytics/ensure-tenant` with the internal service token, (2) builds index name `security-findings-{orgId}-{YYYY.MM.DD}`, (3) enriches each document with `@timestamp` and `shipsec` metadata block (org_id, workflow_id, run_id, component_id, asset_key), (4) sends bulk request with 3x retry + exponential backoff, (5) reports partial failures.
- `worker/package.json`: … `@opensearch-project/opensearch` dependency
Backend Analytics API
The backend exposes endpoints for both the worker (internal) and the frontend (user-facing):
- `backend/src/analytics/analytics.module.ts`: … `OpenSearchModule`, provides `SecurityAnalyticsService`, `OpenSearchTenantService`, `OrganizationSettingsService`
- `backend/src/analytics/analytics.controller.ts`: `POST /analytics/query` (user-facing, auto-scoped to org's index pattern, rate-limited 100 req/min), `GET /analytics/settings` + `PUT /analytics/settings` (retention config), `POST /analytics/ensure-tenant` (internal, validates `X-Internal-Token`, idempotent tenant provisioning)
- `backend/src/analytics/security-analytics.service.ts`: `query()` builds an OpenSearch query scoped to `security-findings-{orgId}-*`, preventing cross-tenant data access at the application layer
- `backend/src/analytics/dto/analytics-query.dto.ts`
- `backend/src/analytics/dto/analytics-settings.dto.ts`
- `backend/src/analytics/organization-settings.service.ts`: … `ensureTenantExists()` on first access
- `backend/src/app.module.ts`: … `AnalyticsModule` in the app
OpenSearch Configuration
- `backend/src/config/opensearch.config.ts`: … `OPENSEARCH_URL`, `OPENSEARCH_USERNAME`, `OPENSEARCH_PASSWORD` from env
- `backend/src/config/opensearch.client.ts`
- `backend/src/config/opensearch.module.ts`: … `OpenSearchClient` across the app
- `backend/scripts/setup-opensearch.ts`
Database Schema
- `backend/src/database/schema/organization-settings.ts`: `organization_settings` table: org_id (PK), subscription_tier, retention_days, timestamps
- `backend/src/database/schema/index.ts`
- `backend/src/database/migration.guard.ts`
Stage 6: User Clicks "Dashboards" → nginx Auth Gateway
This is where it all comes together. When a user clicks the "Dashboards" link, the browser navigates to `/analytics/app/discover`. Here's the request flow:
Step 1: nginx intercepts `/analytics/*`
nginx's `auth_request` directive fires an internal subrequest.
Step 2: Backend validates the session
The backend reads the `shipsec_session` cookie (or Clerk token), verifies it, and returns org identity in response headers.
Step 3: nginx injects tenant isolation headers
nginx captures these headers and sets proxy auth headers before forwarding to Dashboards:
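The real mapping is done in the nginx config; as a rough TypeScript sketch of the same idea, the function below derives the three proxy auth headers from the organization ID returned by the auth subrequest. The `customer_{orgId}_ro` role format follows the `customer_acme-corp_ro` example in this walkthrough, and the empty-ID rejection mirrors the `return 403` check mentioned for the nginx config. The values used for `x-proxy-user` and `securitytenant`, and the function name itself, are assumptions made for illustration.

```typescript
// Headers the proxy forwards to OpenSearch Dashboards for proxy auth.
interface ProxyAuthHeaders {
  "x-proxy-user": string;
  "x-proxy-roles": string;
  securitytenant: string;
}

/**
 * Hypothetical helper: maps an organization ID to proxy auth headers.
 * Returns null when no organization ID is available, in which case the
 * caller should reject the request (the nginx config answers 403).
 */
function buildProxyAuthHeaders(orgId: string): ProxyAuthHeaders | null {
  if (orgId.trim() === "") {
    return null;
  }
  return {
    "x-proxy-user": `org_${orgId}`, // assumption: the actual user value is not shown in the PR text
    "x-proxy-roles": `customer_${orgId}_ro`, // per-tenant read-only role
    securitytenant: orgId, // assumption: tenant is named after the org ID
  };
}
```

Because the role name is derived only from the server-validated organization ID, a client cannot pick its own role or tenant by sending these headers itself, provided the proxy always overwrites them.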
Step 4: OpenSearch Security enforces isolation
The security plugin reads these headers (via proxy auth config), maps the role `customer_acme-corp_ro` to index pattern `security-findings-acme-corp-*`, and restricts all queries to that namespace.
Files involved:
- `docker/nginx/nginx.dev.conf`: `/_auth` internal location proxies to backend `/api/v1/auth/validate`. `/analytics/` location uses `auth_request /_auth`, captures `$auth_org_id` from response headers, sets `x-proxy-user`, `x-proxy-roles`, `securitytenant` headers, proxies to `opensearch-dashboards:5601`. Critical gotcha: `proxy_set_header` in a location block OVERRIDES all parent-level headers, so `Host`, `X-Forwarded-For` etc. must be repeated.
- `docker/nginx/nginx.full.conf`: … `if ($auth_org_id = "") { return 403; }`), upstreams point to container names instead of `host.docker.internal`
- `docker/nginx/nginx.prod.conf`
- `backend/src/app.controller.ts`: `/auth/validate` endpoint: validates session, sets `X-Auth-Organization-Id` header, triggers fire-and-forget tenant provisioning via `Map<string, Promise<boolean>>` (concurrent requests share the same in-flight promise; failed provisioning is removed from cache to allow retry)
- `backend/src/analytics/opensearch-tenant.service.ts`: `ensureTenantExists()`, the 6-step provisioning sequence: (1) create OpenSearch Security tenant, (2) create customer read-only role with `indices:data/write/bulk` for saved objects, (3) create role mapping, (4) create index template with field mappings, (5) create seed index (so Dashboards can resolve fields before real data arrives), (6) create index pattern in Dashboards API. All steps are idempotent with 3x retry + exponential backoff.
Stage 7: Workflow Status & Execution Tracking
- `packages/shared/src/execution.ts`: … `STALE` workflow status for orphaned run records (DB says running but Temporal has no matching workflow; detected during status sync)
- `frontend/src/store/runStore.ts`: … `STALE` status in the run store
- `frontend/src/utils/statusBadgeStyles.ts`: … `STALE` status (grey/warning appearance)
- `frontend/src/features/workflow-builder/WorkflowBuilder.tsx`
- `frontend/src/features/workflow-builder/hooks/useWorkflowImportExport.ts`
- `frontend/src/vite.config.ts`: … `/api` and `/analytics` in dev mode so the Vite dev server forwards correctly
Stage 8: Testing & Documentation
E2E Test
- `e2e-tests/analytics.test.ts`: … `sign_in_tokens` API → imports a Subfinder workflow with Analytics Sink → runs it against `hackerone.com` → waits for 32 docs to appear in OpenSearch → verifies `shipsec` metadata fields → checks Dashboards index pattern exists
Documentation
- `docs/analytics.md`
- `docs/development/workflow-analytics.mdx`
- `docs/development/analytics.mdx`
- `docs/development/component-development.mdx`
- `docs/components/core.mdx`: … `core.analytics.sink`
- `docs/installation.mdx`: … `just dev` modes
- `docs/workflows/execution-status.md`: … `STALE` status
- `docs/docs.json`
- `docs/media/clerk-user-local-org.png`
- `docs/media/clerk-user-test-org.png`
- `docs/media/opensearch-tenant-org-id.png`
- `docs/media/opensearch-tenant-workspace-fallback.png`
- `docker/README.md`
- `docker/PRODUCTION.md`
- `docker/SECURE-DEV-MODE.md`
- `.ai/analytics-output-port-design.md`
Security Component Test Fixes
- `worker/src/components/security/__tests__/dnsx.test.ts`
- `worker/src/components/security/__tests__/httpx.test.ts`
Other
- `Dockerfile`
- `backend/package.json`: … `@opensearch-project/opensearch`, `cookie-parser`
- `bun.lock`
Architecture Summary
Security Model (Defense in Depth)
- `opensearch-dashboards.Dockerfile`
- `nginx.dev.conf`, `nginx.full.conf`
- `session.utils.ts`, `app.controller.ts`
- `roles.yml`, `config.yml`, `opensearch-tenant.service.ts`
- `security-analytics.service.ts`
Key Design Decisions
- Fire-and-forget provisioning: Tenant setup happens async after auth validation returns 200. Uses `Map<string, Promise<boolean>>` so concurrent requests share the same in-flight promise. Failed provisioning is removed from cache to allow retry.
- Seed indices: Index patterns in Dashboards need at least one backing index to resolve field types. A seed index with explicit mappings is created during provisioning so the `@timestamp` column is available before any real data arrives.
- `indices:data/write/bulk` at cluster level: The `cluster_composite_ops_ro` action group does NOT include bulk write. Without explicit `indices:data/write/bulk` in cluster_permissions, the multitenancy plugin's `kibana_all_write` index-level grant is never reached, causing 403 on all `.kibana_*` saves (column preferences, default index pattern, etc.).
- Plugin removal via Dockerfile: OSD 2.x plugins that don't register an `enabled` config schema cause fatal errors when you try `pluginId.enabled: false`. The only safe path is physical removal at the Docker image level.
- nginx header inheritance: A `proxy_set_header` in any location block OVERRIDES ALL parent-level `proxy_set_header` directives. The `/analytics/` block must repeat standard headers (`Host`, `X-Forwarded-For`) alongside the custom proxy auth headers.
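The fire-and-forget provisioning decision above relies on sharing one in-flight promise per organization. A minimal sketch of that pattern is shown below; `ensureProvisioned` and the injected `provision` callback are hypothetical stand-ins for the real tenant service, and the exact cache handling in the PR may differ.

```typescript
// One entry per organization whose provisioning is in flight or has succeeded.
const inFlight = new Map<string, Promise<boolean>>();

/**
 * Hypothetical helper: runs `provision` at most once per orgId at a time.
 * Concurrent callers receive the same promise; a failed attempt (false or
 * rejection) is evicted so that a later request can retry.
 */
function ensureProvisioned(
  orgId: string,
  provision: (orgId: string) => Promise<boolean>
): Promise<boolean> {
  const existing = inFlight.get(orgId);
  if (existing) {
    return existing;
  }

  const attempt = provision(orgId).then(
    (ok) => {
      if (!ok) inFlight.delete(orgId); // allow retry after a soft failure
      return ok;
    },
    (err) => {
      inFlight.delete(orgId); // allow retry after a thrown error
      throw err;
    }
  );

  inFlight.set(orgId, attempt);
  return attempt;
}
```

A caller on the auth path would typically not await the result, for example `void ensureProvisioned(orgId, provisionTenant).catch(() => {})` with `provisionTenant` being whatever performs the actual setup, so the validation response is not delayed by provisioning.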