feat: Add Workflow Analytics Dashboards with OpenSearch integration#229

Open
LuD1161 wants to merge 14 commits into main from eng-42/workflow-analytics-dashboards

@LuD1161 LuD1161 commented Jan 22, 2026

Summary

This PR adds a Security Analytics platform to ShipSec Studio that enables users to index workflow output data into OpenSearch and visualize it through dashboards. It also includes multi-tenant security, a unified developer experience, and component SDK improvements.

Key Features

  • Analytics Sink Component: New workflow node (core.analytics.sink) that indexes output data from any upstream node to OpenSearch

    • Supports array and object inputs with automatic bulk indexing
    • Auto-detects asset correlation keys (host, domain, subdomain, url, ip, etc.)
    • Configurable index suffix and fail-on-error modes
    • Fire-and-forget by default with retry logic (3 attempts with exponential backoff)
  • OpenSearch Integration:

    • Daily index rotation pattern: security-findings-{orgId}-{YYYY.MM.DD}
    • Index template with standard metadata fields
    • Multi-tenant data isolation per organization
  • Multi-Tenant OpenSearch Security:

    • TLS encryption for OpenSearch transport and HTTP layers
    • Security plugin with role-based access control
    • SaaS multitenancy with per-customer tenant isolation via proxy auth
    • Dynamic tenant provisioning with seed indices for Dashboards
    • Dashboards locked down for SaaS tenants (read-only, scoped to org data)
    • ISM policy permissions for automated index lifecycle management
  • Analytics API:

    • POST /api/v1/analytics/query endpoint supporting OpenSearch DSL
    • Auto-scopes queries to organization's index pattern
    • Rate limiting: 100 requests/minute per user
  • UI Integration:

    • "Dashboards" link in sidebar (opens OpenSearch Dashboards in new tab)
    • "Analytics Settings" page for tier-based retention configuration
    • "View Analytics" button on workflow detail page
  • Nginx Reverse Proxy:

    • Unified entry point at http://localhost
    • Routes: / (frontend), /api (backend), /analytics (OpenSearch Dashboards)
  • Unified just dev command:

    • Auto-detects auth mode from CLERK_SECRET_KEY in backend/.env
    • If Clerk creds present → secure mode (Clerk auth + OpenSearch Security + nginx)
    • If no Clerk creds → local auth mode (simpler, faster startup, analytics still work)
    • No need for separate dev-insecure command
  • Workflow Status Improvements:

    • New STALE status for orphaned run records (DB/Temporal mismatch)
    • Improved status inference from trace events
  • Component SDK Extensions:

    • generateFindingHash() utility for deduplication
    • Workflow context (workflowId, workflowName, organizationId) passed to components
    • Results output port added to nuclei, trufflehog, and supabase-scanner
    • Support for optional inputs in components
    • Dynamic Args Pattern for amass and subfinder
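
The retry behaviour listed above for the Analytics Sink (3 attempts with exponential backoff) can be sketched roughly as follows. This is a minimal illustration, not the PR's code: `backoffDelays`, `withRetry`, and the 200 ms base delay are hypothetical names and values chosen for the example.

```typescript
// Delays to wait between attempts: base, 2*base, 4*base, ...
// For maxAttempts = 3 there are two waits (after attempt 1 and after attempt 2).
function backoffDelays(maxAttempts: number, baseDelayMs: number): number[] {
  const delays: number[] = [];
  for (let attempt = 1; attempt < maxAttempts; attempt++) {
    delays.push(baseDelayMs * 2 ** (attempt - 1));
  }
  return delays;
}

// Hypothetical helper: runs `operation`, retrying on rejection.
async function withRetry<T>(
  operation: () => Promise<T>,
  maxAttempts = 3, // "3 attempts" per the PR description
  baseDelayMs = 200, // assumed value, not stated in the PR
): Promise<T> {
  const delays = backoffDelays(maxAttempts, baseDelayMs);
  let lastError: unknown = new Error("withRetry called with maxAttempts < 1");
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await operation();
    } catch (error) {
      lastError = error;
      const delay = delays[attempt];
      // No wait after the final failed attempt.
      if (delay !== undefined) {
        await new Promise<void>((resolve) => setTimeout(resolve, delay));
      }
    }
  }
  throw lastError;
}
```

In fire-and-forget mode the caller would presumably log the final error instead of propagating it, while fail-on-error mode would let it surface and fail the node.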

Commands

just dev              # Start dev (auto-detects: Clerk creds → secure mode, otherwise local auth)
just dev stop         # Stop everything
just dev clean        # Stop and remove all data
just prod             # Start production (auto-detects: TLS certs → secure mode)
just prod stop        # Stop production
just generate-certs   # Generate TLS certificates for production

Test Results

Justfile + OSD Verification Matrix

| # | Scenario | Justfile | OSD Access | OSD Result | Notes |
| --- | --- | --- | --- | --- | --- |
| 1 | just dev (local auth, no Clerk) | PASS | http://localhost/analytics (nginx) | PASS | Session cookie auth via nginx. Index pattern security-findings-* pre-created. |
| 2 | just dev (secure mode, Clerk) | PASS | http://localhost/analytics (nginx) | PASS | Proxy auth + tenant isolation. 32 hits in Discover after workflow run. |
| 3 | just prod (standard, no certs) | PASS | http://localhost/analytics (nginx) | 403 (known) | Login endpoint not in pre-built Docker image yet. |
| 4 | just prod (secure, with certs) | PASS | OSD returns 401 | Expected | Infra-only mode. Needs app layer for proxy auth. |
| 5 | just dev stop | PASS | | | Clean PM2 + Docker shutdown. |
| 6 | just prod stop | PASS | | | Clean Docker shutdown. |

End-to-End Workflow Analytics Test (Secure Mode)

| Step | Result |
| --- | --- |
| Clerk auth via sign_in_tokens API | PASS |
| Import Subfinder workflow (3 nodes) | PASS |
| Run workflow with hackerone.com | PASS (9.9s) |
| Analytics Sink indexes 32 docs | PASS |
| OSD Discover shows 32 hits with filters | PASS |
| Sidebar "Dashboards" link opens OSD | PASS |

Automated Checks

| Check | Result |
| --- | --- |
| bun typecheck | PASS (0 errors) |
| bun lint | PASS |
| Unit tests | PASS |
| DCO (Signed-off-by) | PASS |

Bug Found & Fixed During Testing

Bug: just dev crashes when CLERK_SECRET_KEY is commented out
Root cause: grep returns exit code 1 under set -euo pipefail
Fix: Added || true to grep pipeline (justfile line 50)

Screenshots

[Two screenshots]

PR #229 — Workflow Analytics Dashboards: File Journey Walkthrough

A reviewer's guide to how every file in this PR connects — told as the story of a user going from just dev to seeing security findings in OpenSearch Dashboards.


Stage 1: Developer Starts the App (just dev)

Auth Mode Auto-Detection

When a developer runs just dev, the justfile is the entry point. It reads backend/.env and checks whether CLERK_SECRET_KEY is set:

CLERK_KEY=$(grep -E '^CLERK_SECRET_KEY=' backend/.env | cut -d= -f2- ... || true)
  • Clerk key present → Secure mode: Clerk auth + OpenSearch Security plugin + TLS + multi-tenant isolation
  • Clerk key absent → Local auth mode: simpler startup, admin/password login, no multitenancy
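
For reviewers who read TypeScript more easily than shell, the detection rule can be mirrored as a small function. This is only a sketch of the logic the grep line implies: `detectAuthMode` is a hypothetical name, and treating an empty value as "no Clerk creds" is an assumption.

```typescript
type AuthMode = "secure" | "local";

// Hypothetical mirror of the justfile check; operates on the text of backend/.env.
function detectAuthMode(envFileContents: string): AuthMode {
  const prefix = "CLERK_SECRET_KEY=";
  for (const line of envFileContents.split(/\r?\n/)) {
    // Same anchoring as the grep pattern '^CLERK_SECRET_KEY=': the key must
    // start the line, so a commented-out "# CLERK_SECRET_KEY=..." never matches.
    if (!line.startsWith(prefix)) continue;
    const value = line.slice(prefix.length).trim();
    // Assumption: an empty value is treated the same as a missing key.
    if (value !== "") return "secure";
  }
  return "local";
}
```

A commented-out key yields `"local"`, which is the case that made grep exit with code 1 and crash `just dev` before the `|| true` fix.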

Files involved:

File Role
justfile Orchestrates everything — detects auth mode, composes Docker files, starts PM2
pm2.config.cjs Defines backend/frontend/worker processes; passes OPENSEARCH_SECURITY_ENABLED=true|false from the shell env set by justfile
backend/.env.example Documents all env vars (OpenSearch creds, Clerk keys, session secret)
worker/.env.example Documents worker-specific vars (OpenSearch URL, internal service token)
frontend/.env.example Documents VITE_CLERK_PUBLISHABLE_KEY, VITE_OPENSEARCH_DASHBOARDS_URL

Infrastructure Boots Up

The justfile composes Docker files depending on the mode:

  • Always: docker-compose.infra.yml (base services) + docker-compose.dev-ports.yml (expose ports for host-based PM2)
  • Secure mode adds: docker-compose.dev-secure.yml (TLS, security plugin, proxy auth)

Files involved:

File Role
docker/docker-compose.infra.yml Core services: PostgreSQL, Temporal, Redis, OpenSearch (security disabled), Dashboards (custom Dockerfile), nginx (port 80)
docker/docker-compose.dev-ports.yml Exposes container ports to host so PM2-based backend/frontend/worker can reach them
docker/docker-compose.dev-secure.yml Secure overlay — enables OpenSearch security plugin, mounts TLS certs, mounts security config, overrides Dashboards to use opensearch-dashboards.prod.yml (with proxy auth settings), swaps OpenSearch entrypoint to docker-entrypoint-security.sh
docker/docker-compose.full.yml Production all-in-one: runs backend/frontend/worker as containers too (not used in dev, but shares the same nginx/OpenSearch patterns)
docker/docker-compose.prod.yml Production overlay with TLS termination
docker/certs/.gitignore Keeps generated TLS certs out of version control
docker/scripts/generate-certs.sh Generates root CA + node + admin TLS certificates for OpenSearch

OpenSearch Security Bootstrap (Secure Mode Only)

When the security overlay is active, OpenSearch starts with a custom entrypoint that templates the proxy auth config and runs securityadmin.sh:

File Role
docker/opensearch-security/docker-entrypoint-security.sh First-boot bootstrap: replaces __INTERNAL_PROXIES__ placeholder in config.yml with actual Docker network CIDR regex (172|192|10)\.\d+\.\d+\.\d+, then runs securityadmin.sh in the background (with a marker file to skip on restarts), finally execs the real OpenSearch entrypoint
docker/opensearch-security/config.yml Configures the security plugin: enables XFF (X-Forwarded-For) parsing, defines proxy auth domain (reads x-proxy-user and x-proxy-roles headers from nginx), and a basic auth fallback for admin API access
docker/opensearch-security/roles.yml Defines RBAC roles: admin, dashboards_readwrite, and customer_template_ro (a template for per-tenant read-only roles that include indices:data/write/bulk — critical for Dashboards saved objects)
docker/opensearch-security/roles_mapping.yml Maps roles to users/backend_roles: admin → admin user, dashboards_readwrite → dashboards_server
docker/opensearch-security/internal_users.yml Defines built-in users: admin (full access) and dashboards_server (for Dashboards → OpenSearch communication)
docker/opensearch-security/tenants.yml Declares the global_tenant (base); per-customer tenants are created dynamically at runtime
docker/opensearch-security/action_groups.yml Custom action groups for fine-grained permissions
docker/opensearch-security/audit.yml Audit logging config (disabled in dev for performance)
docker/opensearch-security/allowlist.yml API allowlist config
docker/opensearch-security/whitelist.yml Legacy whitelist (kept for compatibility)
docker/opensearch-security/nodes_dn.yml Node distinguished names for TLS certificate validation
docker/scripts/security-init.sh Standalone security init script (alternative to entrypoint-based init)
docker/scripts/hash-password.sh Utility to hash passwords for internal_users.yml
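
The placeholder templating done by docker-entrypoint-security.sh is a plain string substitution. The entrypoint itself is a shell script; the TypeScript below only illustrates the transformation, and `renderSecurityConfig` plus the sample YAML fragment are hypothetical.

```typescript
const INTERNAL_PROXIES_PLACEHOLDER = "__INTERNAL_PROXIES__";

// Hypothetical helper: substitutes every occurrence of the placeholder.
// split/join is used instead of String.replace so that characters in the
// regex are never interpreted as replacement patterns.
function renderSecurityConfig(template: string, internalProxiesRegex: string): string {
  return template.split(INTERNAL_PROXIES_PLACEHOLDER).join(internalProxiesRegex);
}

// The CIDR regex quoted in the walkthrough above.
const dockerNetworkRegex = "(172|192|10)\\.\\d+\\.\\d+\\.\\d+";

// Assumed shape of the templated line in config.yml.
const rendered = renderSecurityConfig(
  "internalProxies: '__INTERNAL_PROXIES__'",
  dockerNetworkRegex,
);
```

Running the substitution a second time is a no-op because the placeholder is already gone, which fits the marker-file approach of skipping the bootstrap on restarts.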

OpenSearch Dashboards Custom Image

Dashboards uses a custom Dockerfile to remove plugins that could let SaaS tenants escape their sandbox:

File Role
docker/opensearch-dashboards.Dockerfile Removes 9 plugins: queryWorkbench, reports, anomalyDetection, customImportMap, securityAnalytics, searchRelevance, mlCommons, indexManagement, observability. Why not config? OSD 2.x plugins don't register an enabled config schema — setting plugin.enabled: false causes a fatal Unknown configuration key error. Must physically remove them.
docker/opensearch-dashboards.yml Base Dashboards config: server.basePath: "/analytics", rewriteBasePath: true, default route to Discover
docker/opensearch-dashboards.prod.yml Secure Dashboards config: adds requestHeadersAllowlist: ["securitytenant", "Authorization", "x-forwarded-for"] (x-forwarded-for is critical for proxy auth), disables security UI (opensearch_security.readonly_mode.roles: [customer_*])
docker/opensearch-init.sh Post-boot initialization: In insecure mode, creates a global security-findings-* index pattern in Dashboards so Discover works immediately. In secure mode, skips this — index patterns are created per-tenant on first access.

Stage 2: User Logs In

Frontend Auth Provider Selection

When the browser loads http://localhost (via nginx), the frontend determines which auth mode to use:

File Role
frontend/src/auth/AuthProvider.tsx Auth mode selection: checks VITE_AUTH_PROVIDER env var → Clerk key availability → defaults. In dev mode defaults to local unless Clerk is explicitly configured. LocalAuthProvider stores admin credentials in Zustand, creates a basic-{base64} token, and sets shipsec_session cookie via the backend login endpoint. ClerkAuthProvider delegates to Clerk's session management.
frontend/src/components/auth/AdminLoginForm.tsx The login form for local auth mode — username/password fields that POST to /api/v1/auth/login
frontend/src/config/env.ts Exposes VITE_OPENSEARCH_DASHBOARDS_URL (controls whether the Dashboards sidebar link appears) and VITE_AUTH_PROVIDER

Backend Auth Validation

File Role
backend/src/auth/providers/clerk-auth.provider.ts Clerk auth: validates Clerk session tokens, extracts organizationId from Clerk's org membership. The org ID becomes the tenant key for all analytics scoping.
backend/src/auth/providers/local-auth.provider.ts Local auth: validates Basic auth credentials against configured admin user, returns a synthetic AuthContext with organizationId: 'local-dev'
backend/src/auth/session.utils.ts Session token management: createSessionToken() → HMAC-SHA256 signed {username, ts}.signature encoded as base64. verifySessionToken() → timing-safe comparison to prevent timing attacks, 7-day TTL.
backend/src/app.controller.ts Login endpoint (POST /auth/login): validates credentials, calls createSessionToken(), sets HTTP-only shipsec_session cookie. Logout endpoint (POST /auth/logout): clears cookie.
backend/src/main.ts Enables cookie-parser middleware (required for session cookie handling), configures CORS
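
The session token scheme described for session.utils.ts (HMAC-SHA256 signature, timing-safe comparison, 7-day TTL) could look roughly like the sketch below. The function names come from the walkthrough, but the signatures, the exact token layout, and the hex-encoded signature are assumptions made for illustration.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

const SESSION_TTL_MS = 7 * 24 * 60 * 60 * 1000; // 7-day TTL per the walkthrough

function sign(payload: string, secret: string): string {
  return createHmac("sha256", secret).update(payload).digest("hex");
}

// Assumed layout: base64( JSON({username, ts}) + "." + hexSignature )
function createSessionToken(username: string, secret: string, now: number = Date.now()): string {
  const payload = JSON.stringify({ username, ts: now });
  return Buffer.from(`${payload}.${sign(payload, secret)}`, "utf8").toString("base64");
}

function verifySessionToken(
  token: string,
  secret: string,
  now: number = Date.now(),
): { username: string } | null {
  const decoded = Buffer.from(token, "base64").toString("utf8");
  // The hex signature contains no ".", so the last "." is the separator.
  const separator = decoded.lastIndexOf(".");
  if (separator === -1) return null;
  const payload = decoded.slice(0, separator);
  const provided = Buffer.from(decoded.slice(separator + 1), "utf8");
  const expected = Buffer.from(sign(payload, secret), "utf8");
  // timingSafeEqual throws on length mismatch, so compare lengths first.
  if (provided.length !== expected.length || !timingSafeEqual(provided, expected)) return null;
  try {
    const parsed = JSON.parse(payload) as { username?: unknown; ts?: unknown };
    if (typeof parsed.username !== "string" || typeof parsed.ts !== "number") return null;
    if (now - parsed.ts > SESSION_TTL_MS) return null; // expired
    return { username: parsed.username };
  } catch {
    return null; // payload was not valid JSON
  }
}
```

The signature is checked before the payload is parsed, so unauthenticated input never reaches `JSON.parse` with a trusted status.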

Stage 3: User Sees the Dashboard Sidebar

Once authenticated, the user lands on the main app. The sidebar shows a "Dashboards" link if the env var is configured:

File Role
frontend/src/components/layout/AppLayout.tsx Sidebar: conditionally renders a "Dashboards" link (BarChart3 icon) pointing to VITE_OPENSEARCH_DASHBOARDS_URL (typically /analytics/app/discover). Marked as external: true so it opens in the same window but via the nginx proxy. Also adds an "Analytics Settings" link under the Settings section.
frontend/src/components/layout/AppTopBar.tsx Top bar — shows org context
frontend/src/components/layout/TopBar.tsx Workflow-level top bar — adds a "View Analytics" button that links to Dashboards filtered by the current workflow
frontend/src/pages/AnalyticsSettingsPage.tsx Analytics Settings page: shows retention period configuration (30/90/180/365 days) based on subscription tier. Currently a UI scaffold — API integration planned for a future ticket.
frontend/src/App.tsx Registers the /settings/analytics route pointing to AnalyticsSettingsPage

Stage 4: User Builds a Workflow with Analytics

The Analytics Sink Component

Users can add an "Analytics Sink" node to any workflow. It collects output from upstream scanner nodes and indexes it to OpenSearch.

File Role
worker/src/components/core/analytics-sink.ts The core analytics component (core.analytics.sink): accepts multiple configurable data inputs (each with a label and sourceTag), aggregates documents from all inputs, validates workflow context (orgId, workflowId, workflowName are required), then calls the indexer. Supports two modes: lenient (default, fire-and-forget, skips missing inputs) and strict (fails on any error).
frontend/src/components/workflow/AnalyticsInputsEditor.tsx Config panel UI for the Analytics Sink: lets users add/remove/rename data inputs dynamically, auto-generates sourceTag from label names for filtering in Dashboards.
worker/src/components/index.ts Registers analytics-sink in the component registry
backend/src/dsl/validator.ts Validates workflow DSL — updated to allow the analytics sink's dynamic input ports

Component SDK Extensions

The component SDK was extended to support analytics:

File Role
packages/component-sdk/src/analytics.ts analyticsResultSchema() — Zod schema contract for indexed documents (scanner, finding_hash, severity, asset_key). generateFindingHash() — creates stable 16-char SHA-256 dedup keys from field values.
packages/component-sdk/src/context.ts ExecutionContext now includes workflowId, workflowName, and organizationId — these are passed from the backend through Temporal to every component execution.
packages/component-sdk/src/types.ts Updated ExecutionContext interface with the new fields
packages/component-sdk/src/index.ts Re-exports the new analytics utilities
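
A plausible shape for generateFindingHash(), based only on the description above (a stable 16-character SHA-256 key derived from field values), is sketched below. The variadic signature, the "|" separator, and the trim/lowercase normalisation are assumptions; the SDK's actual implementation may differ.

```typescript
import { createHash } from "node:crypto";

// Sketch: the same field values always produce the same 16-char key.
function generateFindingHash(...fields: Array<string | number | null | undefined>): string {
  // Assumed normalisation so that casing and stray whitespace do not
  // produce distinct hashes for the same finding.
  const normalized = fields
    .map((field) => String(field ?? "").trim().toLowerCase())
    .join("|");
  // Keep the first 16 hex characters of the SHA-256 digest.
  return createHash("sha256").update(normalized).digest("hex").slice(0, 16);
}
```

A scanner component might call it as `generateFindingHash("nuclei", templateId, host)` (argument choice hypothetical) and store the result in `finding_hash`, so re-running a workflow yields the same key for the same finding.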

Workflow Context Injection

For analytics to work, the backend must pass org/workflow identity through to the worker:

File Role
backend/src/workflows/workflows.service.ts When starting a workflow run, extracts organizationId from the authenticated user's context and passes it (along with workflowId, workflowName) to the Temporal workflow input. This is how the worker knows which org's index to write to.
backend/src/workflows/workflows.controller.ts Passes AuthContext to the service layer so org ID is available

Stage 5: Workflow Runs → Data Gets Indexed

When a workflow runs, the Analytics Sink component calls the OpenSearch indexer:

File Role
worker/src/utils/opensearch-indexer.ts (not in PR file list but referenced) The indexing engine — singleton OpenSearch client in the worker. bulkIndex(): (1) ensures tenant is provisioned by calling POST /api/v1/analytics/ensure-tenant with the internal service token, (2) builds index name security-findings-{orgId}-{YYYY.MM.DD}, (3) enriches each document with @timestamp and shipsec metadata block (org_id, workflow_id, run_id, component_id, asset_key), (4) sends bulk request with 3x retry + exponential backoff, (5) reports partial failures.
worker/package.json Adds @opensearch-project/opensearch dependency
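
Steps (2) and (3) of bulkIndex(), the daily index name and the document enrichment, can be illustrated as below. The helper names, the `SinkContext` type, the use of UTC dates, and the rule that `asset_key` holds the first non-empty candidate field value are all assumptions for this sketch.

```typescript
interface SinkContext {
  organizationId: string;
  workflowId: string;
  runId: string;
  componentId: string;
}

// security-findings-{orgId}-{YYYY.MM.DD}; UTC is assumed for the date part.
function buildIndexName(orgId: string, date: Date): string {
  const yyyy = date.getUTCFullYear();
  const mm = String(date.getUTCMonth() + 1).padStart(2, "0");
  const dd = String(date.getUTCDate()).padStart(2, "0");
  return `security-findings-${orgId}-${yyyy}.${mm}.${dd}`;
}

// Candidate fields from the PR summary; the priority order is an assumption.
const ASSET_KEY_FIELDS = ["host", "domain", "subdomain", "url", "ip"] as const;

function detectAssetKey(doc: Record<string, unknown>): string | undefined {
  for (const field of ASSET_KEY_FIELDS) {
    const value = doc[field];
    if (typeof value === "string" && value !== "") return value;
  }
  return undefined;
}

// Adds @timestamp and the shipsec metadata block without mutating the input.
function enrichDocument(
  doc: Record<string, unknown>,
  ctx: SinkContext,
  now: Date,
): Record<string, unknown> {
  return {
    ...doc,
    "@timestamp": now.toISOString(),
    shipsec: {
      org_id: ctx.organizationId,
      workflow_id: ctx.workflowId,
      run_id: ctx.runId,
      component_id: ctx.componentId,
      asset_key: detectAssetKey(doc),
    },
  };
}
```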

Backend Analytics API

The backend exposes endpoints for both the worker (internal) and the frontend (user-facing):

File Role
backend/src/analytics/analytics.module.ts NestJS module wiring: imports OpenSearchModule, provides SecurityAnalyticsService, OpenSearchTenantService, OrganizationSettingsService
backend/src/analytics/analytics.controller.ts Endpoints: POST /analytics/query (user-facing, auto-scoped to org's index pattern, rate-limited 100 req/min), GET /analytics/settings + PUT /analytics/settings (retention config), POST /analytics/ensure-tenant (internal, validates X-Internal-Token, idempotent tenant provisioning)
backend/src/analytics/security-analytics.service.ts query() — builds OpenSearch query scoped to security-findings-{orgId}-*, preventing cross-tenant data access at the application layer
backend/src/analytics/dto/analytics-query.dto.ts Request/response DTOs for the query endpoint (supports OpenSearch DSL passthrough)
backend/src/analytics/dto/analytics-settings.dto.ts DTOs for analytics settings (tier, retention days)
backend/src/analytics/organization-settings.service.ts Manages per-org settings in PostgreSQL; triggers ensureTenantExists() on first access
backend/src/app.module.ts Registers the AnalyticsModule in the app
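
The application-layer scoping performed by query() amounts to pinning the index pattern on the server and never taking it from the request. A minimal sketch under that reading follows; `buildScopedSearch`, `ScopedSearchRequest`, and the ID validation rule are hypothetical additions, not taken from the PR.

```typescript
interface ScopedSearchRequest {
  index: string;
  body: Record<string, unknown>;
}

// The org ID comes from the authenticated context; only the DSL body comes
// from the caller.
function buildScopedSearch(
  organizationId: string,
  dsl: Record<string, unknown>,
): ScopedSearchRequest {
  // Assumed safeguard: reject IDs containing wildcards, commas, or whitespace,
  // any of which could widen the index pattern.
  if (!/^[A-Za-z0-9_-]+$/.test(organizationId)) {
    throw new Error("invalid organization id");
  }
  return {
    index: `security-findings-${organizationId}-*`,
    body: dsl,
  };
}
```

The returned object matches the `{ index, body }` shape that a search call would typically receive, so the user-supplied DSL can filter within the org's indices but cannot select other ones.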

OpenSearch Configuration

File Role
backend/src/config/opensearch.config.ts NestJS config factory: reads OPENSEARCH_URL, OPENSEARCH_USERNAME, OPENSEARCH_PASSWORD from env
backend/src/config/opensearch.client.ts Injectable OpenSearch client wrapper — initializes with auth if credentials provided, skips TLS verification in dev
backend/src/config/opensearch.module.ts Global module that provides OpenSearchClient across the app
backend/scripts/setup-opensearch.ts Standalone script to manually bootstrap OpenSearch (index templates, seed data) — useful for debugging

Database Schema

File Role
backend/src/database/schema/organization-settings.ts Drizzle schema for organization_settings table: org_id (PK), subscription_tier, retention_days, timestamps
backend/src/database/schema/index.ts Re-exports the new schema
backend/src/database/migration.guard.ts Updated to handle the new table

Stage 6: User Clicks "Dashboards" → nginx Auth Gateway

This is where it all comes together. When a user clicks the "Dashboards" link, the browser navigates to /analytics/app/discover. Here's the request flow:

Step 1: nginx intercepts /analytics/*

Browser → nginx (port 80) → /analytics/app/discover

nginx's auth_request directive fires an internal subrequest:

nginx → /_auth → backend /api/v1/auth/validate

Step 2: Backend validates the session

The backend reads the shipsec_session cookie (or Clerk token), verifies it, and returns org identity in response headers:

X-Auth-Organization-Id: acme-corp
X-Auth-User-Id: user-123

Step 3: nginx injects tenant isolation headers

nginx captures these headers and sets proxy auth headers before forwarding to Dashboards:

x-proxy-user: acme-corp
x-proxy-roles: customer_acme-corp_ro
securitytenant: acme-corp

Step 4: OpenSearch Security enforces isolation

The security plugin reads these headers (via proxy auth config), maps the role customer_acme-corp_ro to index pattern security-findings-acme-corp-*, and restricts all queries to that namespace.
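
The header mapping in Steps 2 to 4 is done in nginx configuration, but the rule itself is simple enough to express as a function. The sketch below is illustrative only: `buildProxyAuthHeaders` is a hypothetical name, and returning `null` stands in for the fail-closed 403 that nginx.full.conf returns when the org ID is empty.

```typescript
// Maps the org ID returned by /auth/validate to the proxy auth headers
// that OpenSearch Security reads.
function buildProxyAuthHeaders(orgId: string): Record<string, string> | null {
  // Fail closed: without an org identity there is no tenant to scope to.
  if (orgId === "") return null;
  return {
    "x-proxy-user": orgId,
    "x-proxy-roles": `customer_${orgId}_ro`,
    securitytenant: orgId,
  };
}
```

For `acme-corp` this yields the three header values shown in Step 3, and the role name is the one the security plugin maps to `security-findings-acme-corp-*`.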

Files involved:

File Role
docker/nginx/nginx.dev.conf Dev routing: /_auth internal location proxies to backend /api/v1/auth/validate. /analytics/ location uses auth_request /_auth, captures $auth_org_id from response headers, sets x-proxy-user, x-proxy-roles, securitytenant headers, proxies to opensearch-dashboards:5601. Critical gotcha: proxy_set_header in a location block OVERRIDES all parent-level headers — must repeat Host, X-Forwarded-For etc.
docker/nginx/nginx.full.conf Production routing: same auth_request pattern, fail-closed (if ($auth_org_id = "") { return 403; }), upstreams point to container names instead of host.docker.internal
docker/nginx/nginx.prod.conf Standalone production routing with TLS termination
backend/src/app.controller.ts /auth/validate endpoint: validates session, sets X-Auth-Organization-Id header, triggers fire-and-forget tenant provisioning via Map<string, Promise<boolean>> (concurrent requests share the same in-flight promise; failed provisioning is removed from cache to allow retry)
backend/src/analytics/opensearch-tenant.service.ts ensureTenantExists() — the 6-step provisioning sequence: (1) create OpenSearch Security tenant, (2) create customer read-only role with indices:data/write/bulk for saved objects, (3) create role mapping, (4) create index template with field mappings, (5) create seed index (so Dashboards can resolve fields before real data arrives), (6) create index pattern in Dashboards API. All steps are idempotent with 3x retry + exponential backoff.
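
The in-flight promise sharing used by /auth/validate can be sketched as follows. Only the `Map<string, Promise<boolean>>` idea is taken from the PR; the `TenantProvisioner` class and its method names are hypothetical, and keeping successful results in the map is an assumption.

```typescript
class TenantProvisioner {
  private readonly inFlight = new Map<string, Promise<boolean>>();
  private readonly provision: (orgId: string) => Promise<boolean>;

  constructor(provision: (orgId: string) => Promise<boolean>) {
    this.provision = provision;
  }

  // Concurrent callers for the same org receive the same promise.
  ensureTenant(orgId: string): Promise<boolean> {
    const existing = this.inFlight.get(orgId);
    if (existing !== undefined) return existing;

    const pending = this.provision(orgId).then(
      (ok) => {
        // A failed run is evicted so the next request can retry.
        if (!ok) this.inFlight.delete(orgId);
        return ok;
      },
      () => {
        this.inFlight.delete(orgId);
        return false;
      },
    );
    this.inFlight.set(orgId, pending);
    return pending;
  }
}
```

In a fire-and-forget caller the returned promise would simply not be awaited. The rejection handler above turns errors into `false`, so an un-awaited call cannot produce an unhandled rejection.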

Stage 7: Workflow Status & Execution Tracking

File Role
packages/shared/src/execution.ts Adds STALE workflow status for orphaned run records (DB says running but Temporal has no matching workflow — detected during status sync)
frontend/src/store/runStore.ts Handles the new STALE status in the run store
frontend/src/utils/statusBadgeStyles.ts Adds badge styling for STALE status (grey/warning appearance)
frontend/src/features/workflow-builder/WorkflowBuilder.tsx Workflow builder updates to support analytics sink node configuration
frontend/src/features/workflow-builder/hooks/useWorkflowImportExport.ts Import/export handles the new analytics sink component
frontend/src/vite.config.ts Adds proxy rules for /api and /analytics in dev mode so the Vite dev server forwards correctly
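
The STALE rule from the first row above (the DB says running but Temporal has no matching workflow) can be expressed as a small reconciliation function. This is a sketch under assumptions: the status names other than STALE, the `reconcileStatus` name, and the precedence given to a trace-inferred status are not confirmed by the PR.

```typescript
// Only STALE is named in the PR; the other status values are assumed.
type RunStatus = "RUNNING" | "COMPLETED" | "FAILED" | "STALE";

function reconcileStatus(
  dbStatus: RunStatus,
  temporalWorkflowFound: boolean,
  inferredFromTrace?: RunStatus, // optional status inferred from trace events
): RunStatus {
  if (dbStatus === "RUNNING" && !temporalWorkflowFound) {
    // Assumption: prefer a status inferred from trace events when one exists,
    // and fall back to STALE for a truly orphaned record.
    return inferredFromTrace ?? "STALE";
  }
  return dbStatus;
}
```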

Stage 8: Testing & Documentation

E2E Test

File Role
e2e-tests/analytics.test.ts Full end-to-end test: authenticates via Clerk sign_in_tokens API → imports a Subfinder workflow with Analytics Sink → runs it against hackerone.com → waits for 32 docs to appear in OpenSearch → verifies shipsec metadata fields → checks Dashboards index pattern exists

Documentation

File Role
docs/analytics.md Architecture overview: multi-tenant model, index naming, provisioning flow, troubleshooting
docs/development/workflow-analytics.mdx Developer guide: how to add Analytics Sink to workflows, index patterns, querying
docs/development/analytics.mdx Analytics development reference
docs/development/component-development.mdx Updated with analytics SDK utilities
docs/components/core.mdx Component catalog — documents core.analytics.sink
docs/installation.mdx Updated install docs with just dev modes
docs/workflows/execution-status.md Documents the new STALE status
docs/docs.json Docs navigation — adds analytics section
docs/media/clerk-user-local-org.png Screenshot: Clerk user with local org
docs/media/clerk-user-test-org.png Screenshot: Clerk user with test org
docs/media/opensearch-tenant-org-id.png Screenshot: OpenSearch tenant using org ID
docs/media/opensearch-tenant-workspace-fallback.png Screenshot: workspace fallback
docker/README.md Docker setup documentation
docker/PRODUCTION.md Production deployment guide
docker/SECURE-DEV-MODE.md Secure dev mode setup guide
.ai/analytics-output-port-design.md Design doc: how the analytics output port pattern was designed

Security Component Test Fixes

File Role
worker/src/components/security/__tests__/dnsx.test.ts Test fixture updated for new component SDK context fields
worker/src/components/security/__tests__/httpx.test.ts Same — test fixtures updated

Other

File Role
Dockerfile Updated for production build — includes OpenSearch client dependency
backend/package.json Adds @opensearch-project/opensearch, cookie-parser
bun.lock Lockfile updated with new dependencies

Architecture Summary

┌─────────────────────────────────────────────────────────────────┐
│                        just dev                                  │
│  Detects CLERK_SECRET_KEY → secure mode or local auth            │
│  Composes: infra.yml [+ dev-secure.yml] + dev-ports.yml          │
│  Starts PM2: frontend, backend, worker                           │
└──────────┬──────────────────────────────────────────────────────┘
           │
           ▼
┌──────────────────────┐     ┌──────────────────────────────────┐
│   nginx (port 80)    │     │     Frontend (Vite, port 5173)   │
│                      │     │                                  │
│  / → frontend        │◄────│  AuthProvider: Clerk or local    │
│  /api → backend      │     │  Sidebar: "Dashboards" link      │
│  /analytics → OSD    │     │  AnalyticsInputsEditor           │
│    ↓ auth_request    │     │  AnalyticsSettingsPage            │
│    → /_auth          │     └──────────────────────────────────┘
│    → backend/validate│
│    → set proxy hdrs  │     ┌──────────────────────────────────┐
│    → forward to OSD  │     │     Backend (NestJS, port 3211)  │
└──────────────────────┘     │                                  │
                              │  /auth/login, /auth/validate     │
                              │  /analytics/query (org-scoped)   │
                              │  /analytics/ensure-tenant        │
                              │  OpenSearchTenantService          │
                              │    → 6-step provisioning          │
                              └──────────────────────────────────┘

┌──────────────────────┐     ┌──────────────────────────────────┐
│  OpenSearch + OSD    │     │   Worker (Temporal)               │
│                      │     │                                  │
│  Security plugin:    │◄────│  analytics-sink component         │
│    proxy auth        │     │    → aggregates scanner data     │
│    per-tenant roles  │     │    → opensearch-indexer           │
│    index isolation   │     │      → bulkIndex() with retry    │
│                      │     │      → document enrichment       │
│  Index pattern:      │     │        (@timestamp, shipsec.*)   │
│  security-findings-  │     │      → tenant provisioning       │
│    {orgId}-{date}    │     │        (1-hour cache)            │
└──────────────────────┘     └──────────────────────────────────┘

Security Model (Defense in Depth)

| Layer | Mechanism | Files |
| --- | --- | --- |
| Image level | Remove dangerous Dashboards plugins | opensearch-dashboards.Dockerfile |
| Proxy level | nginx auth_request + tenant header injection | nginx.dev.conf, nginx.full.conf |
| Auth level | Session token verification (HMAC-SHA256) | session.utils.ts, app.controller.ts |
| Data level | OpenSearch Security: per-tenant roles, index isolation | roles.yml, config.yml, opensearch-tenant.service.ts |
| Application level | Query endpoint scopes to org's index pattern | security-analytics.service.ts |

Key Design Decisions

  1. Fire-and-forget provisioning — Tenant setup happens async after auth validation returns 200. Uses Map<string, Promise<boolean>> so concurrent requests share the same in-flight promise. Failed provisioning is removed from cache to allow retry.

  2. Seed indices — Index patterns in Dashboards need at least one backing index to resolve field types. A seed index with explicit mappings is created during provisioning so @timestamp column is available before any real data arrives.

  3. indices:data/write/bulk at cluster level — The cluster_composite_ops_ro action group does NOT include bulk write. Without explicit indices:data/write/bulk in cluster_permissions, the multitenancy plugin's kibana_all_write index-level grant is never reached, causing 403 on all .kibana_* saves (column preferences, default index pattern, etc.).

  4. Plugin removal via Dockerfile — OSD 2.x plugins that don't register an enabled config schema cause fatal errors when you try pluginId.enabled: false. The only safe path is physical removal at the Docker image level.

  5. nginx header inheritance — A proxy_set_header in any location block OVERRIDES ALL parent-level proxy_set_header directives. The /analytics/ block must repeat standard headers (Host, X-Forwarded-For) alongside the custom proxy auth headers.

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 42044b8c24

@LuD1161 LuD1161 force-pushed the eng-42/workflow-analytics-dashboards branch 12 times, most recently from 0284482 to 8c83d0b Compare January 23, 2026 02:39
@LuD1161 LuD1161 requested a review from betterclever January 23, 2026 02:44
@LuD1161 LuD1161 force-pushed the eng-42/workflow-analytics-dashboards branch 9 times, most recently from 7afae76 to bd71e89 Compare January 27, 2026 04:49
@betterclever

Question: Scope of User Stories

Looking at tasks/prd.json, I see all user stories (US-001 through US-015) are marked with "passes": true, including:

  • US-012: Analytics settings page UI ✅
  • US-013: Retention settings API endpoints ✅

However, frontend/src/pages/AnalyticsSettingsPage.tsx still contains:

  • Mock data for current tier and retention (line 42)
  • TODO comments referencing US-013 for API integration (lines 58, 70)

Question: Are all 15 user stories expected to be completed in this PR, or is US-013 (the backend API) intentionally deferred to a later PR? If it's expected to be complete, the frontend may need to be wired up to the backend endpoints that were reportedly implemented.

@LuD1161 LuD1161 force-pushed the eng-42/workflow-analytics-dashboards branch from 30b1504 to 5d92c8d Compare January 29, 2026 19:41
LuD1161 commented Jan 29, 2026

This was a relic of ralph, the hottest trend in AI. It has since been removed :)

@LuD1161 LuD1161 force-pushed the eng-42/workflow-analytics-dashboards branch 2 times, most recently from b8d9c3a to bd98d61 Compare January 30, 2026 04:59
- Add nginx reverse proxy for unified entry point at http://localhost
- Routes: / (frontend), /api (backend), /analytics (OpenSearch Dashboards)
- Configure OpenSearch Dashboards with /analytics base path
- Add production deployment with TLS and security plugin
- SaaS multitenancy with per-customer tenant isolation
- Certificate generation script (just generate-certs)
- New commands: just dev, just prod-secure

Signed-off-by: Aseem Shrey <LuD1161@users.noreply.github.com>
- Add STALE status for orphaned run records (DB/Temporal mismatch)
- Improve status inference from trace events when Temporal not found
- Use correct TraceEventType values for status detection
- Add amber badge color for STALE status
- Extract WorkflowNode into modular directory structure
- Document all execution statuses with transition diagram

Signed-off-by: Aseem Shrey <LuD1161@users.noreply.github.com>
…gration

Analytics Sink Component (core.analytics.sink):
- Index output data from any upstream node to OpenSearch
- Auto-detect asset correlation keys (host, domain, url, ip, etc.)
- Fire-and-forget with retry logic (3 attempts, exponential backoff)
- Configurable index suffix and fail-on-error modes
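
The retry behaviour described above (3 attempts, exponential backoff) can be sketched roughly as follows. This is a minimal illustration and not the component's actual code: `withRetry`, `baseDelayMs`, and the injectable `sleep` parameter are hypothetical names, and the 100 ms base delay is an assumption.

```typescript
// Hypothetical helper: retry an async operation with exponential backoff.
// The attempt count matches the commit message; the delay values are assumed.
async function withRetry<T>(
  op: () => Promise<T>,
  maxAttempts: number = 3,
  baseDelayMs: number = 100,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await op();
    } catch (err) {
      lastError = err;
      // Wait 100 ms, 200 ms, ... between attempts; no wait after the last one.
      if (attempt < maxAttempts) {
        await sleep(baseDelayMs * 2 ** (attempt - 1));
      }
    }
  }
  throw lastError;
}

// Hypothetical wrapper showing the two modes: fire-and-forget swallows the
// final error, while fail-on-error rethrows it to the workflow.
async function indexWithMode(
  bulkIndex: () => Promise<number>,
  failOnError: boolean = false,
  sleep?: (ms: number) => Promise<void>,
): Promise<number> {
  try {
    return await withRetry(bulkIndex, 3, 100, sleep);
  } catch (err) {
    if (failOnError) throw err;
    return 0; // nothing indexed, but the workflow continues
  }
}
```

Injecting `sleep` keeps the helper testable without real timers.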

OpenSearch Integration:
- Daily index rotation: security-findings-{orgId}-{YYYY.MM.DD}
- Index template with standard metadata fields
- Multi-tenant data isolation per organization
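
As an illustration of the rotation pattern above, a daily index name and the matching per-organization wildcard could be built like this. The function names are hypothetical. Using UTC dates and lowercasing the org ID are assumptions (OpenSearch requires lowercase index names, but the PR may normalize IDs differently).

```typescript
// Hypothetical helper: build security-findings-{orgId}-{YYYY.MM.DD}.
// Assumes UTC for the date portion.
function dailyIndexName(orgId: string, date: Date = new Date()): string {
  const yyyy = date.getUTCFullYear();
  const mm = String(date.getUTCMonth() + 1).padStart(2, "0");
  const dd = String(date.getUTCDate()).padStart(2, "0");
  return `security-findings-${orgId.toLowerCase()}-${yyyy}.${mm}.${dd}`;
}

// Hypothetical helper: wildcard covering every daily index of one organization,
// which is what a query would be scoped to.
function orgIndexPattern(orgId: string): string {
  return `security-findings-${orgId.toLowerCase()}-*`;
}
```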

Analytics API:
- POST /api/v1/analytics/query with OpenSearch DSL support
- Auto-scope queries to organization's index pattern
- Rate limiting: 100 req/min per user
- Protected routes require authentication
- Session cookie support for analytics route auth
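
For readers unfamiliar with per-user limits such as the 100 req/min above, here is a minimal fixed-window sketch. It is not the PR's implementation, which may rely on a framework throttler; the class name and the in-memory `Map` are assumptions made for illustration.

```typescript
// Hypothetical in-memory fixed-window rate limiter, keyed by user ID.
class FixedWindowRateLimiter {
  private readonly windows = new Map<string, { start: number; count: number }>();
  private readonly limit: number;
  private readonly windowMs: number;

  constructor(limit: number = 100, windowMs: number = 60_000) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // Returns true if the request is allowed, false if the user is over the limit.
  allow(userId: string, now: number = Date.now()): boolean {
    const current = this.windows.get(userId);
    if (!current || now - current.start >= this.windowMs) {
      // First request, or the previous window has expired: start a new window.
      this.windows.set(userId, { start: now, count: 1 });
      return true;
    }
    if (current.count >= this.limit) {
      return false;
    }
    current.count++;
    return true;
  }
}
```

A fixed window is the simplest scheme. It permits short bursts at window boundaries, which a sliding window or token bucket would smooth out.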

UI Integration:
- Analytics Settings page with tier-based retention
- Dashboards link in sidebar (opens in new tab)
- View Analytics button uses Discover app with proper URL state
- Uses .keyword fields for exact match filtering

Component SDK Extensions:
- generateFindingHash() for deduplication
- Workflow context (workflowId, workflowName, organizationId)
- Results output port on nuclei, trufflehog, supabase-scanner
- Support for optional inputs in components
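
To show how a finding hash can support deduplication, here is a rough sketch. The SDK's real `generateFindingHash()` signature is not shown in this PR description, so the helper below uses a different, hypothetical name; the normalization rules, the `|` separator, and the 16-character truncation are all assumptions.

```typescript
import { createHash } from "node:crypto";

// Hypothetical stand-in for a finding hash: normalize the identifying parts,
// join them, and take a truncated SHA-256 digest.
function sketchFindingHash(
  ...parts: Array<string | number | null | undefined>
): string {
  const canonical = parts
    .map((p) => (p === null || p === undefined ? "" : String(p).trim().toLowerCase()))
    .join("|");
  return createHash("sha256").update(canonical).digest("hex").slice(0, 16);
}

// The same finding seen in two runs yields the same hash, so a consumer
// can treat it as a duplicate.
const first = sketchFindingHash("CVE-2024-0001", "Example.com", 443);
const second = sketchFindingHash("cve-2024-0001", "example.com ", 443);
console.log(first === second);
```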

Bug fixes:
- Fix webhook URLs to include global API prefix (ENG-115)
- Add proper connectionType for list variable types
- Handle invalid_value errors for placeholder fields

Signed-off-by: Aseem Shrey <LuD1161@users.noreply.github.com>
…ovisioning

Signed-off-by: Aseem Shrey <LuD1161@users.noreply.github.com>
Document the OpenSearch tenant identity resolution flow, Clerk active
org session vs membership distinction, tenant provisioning details,
and security guarantees. Add troubleshooting entry for workspace-user
fallback with screenshots and diagnostic commands.

Signed-off-by: Aseem Shrey <LuD1161@users.noreply.github.com>
…objects

Two-layer SaaS lockdown for OpenSearch Dashboards:

1. nginx whitelist: PCRE negative lookahead blocks non-whitelisted
   /analytics/app/* routes (returns 403). Allowed: Discover, Visualize,
   Dashboards, Alerting, Dev Tools, Data Explorer, Home.
   Blocked: ISM, Security, Management, Anomaly Detection, Maps, etc.
   Admin retains full access via direct Dashboards port (5601).

2. Role permissions: Replace ISM cluster permissions with Alerting
   permissions (monitor CRUD, alerts, destinations) for tenant roles.
   Add indices:data/write/bulk cluster permission required for
   Dashboards saved objects (visualizations, dashboards, saved searches).
   Without this, multitenancy's kibana_all_write grant is never reached.

3. Default landing page set to Discover instead of Home (which exposes
   all plugin links including blocked ones).
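
The negative-lookahead idea behind the nginx whitelist can be illustrated in TypeScript, whose lookahead syntax matches PCRE here. This is a sketch and not the actual nginx rule: the app IDs below (`dev_tools`, `data-explorer`, etc.) are assumed from the allowed list above, and `isBlockedAnalyticsRoute` is a hypothetical name.

```typescript
// Matches /analytics/app/<id> only when <id> is NOT one of the allowed apps.
// The trailing (?:[/?#]|$) stops an allowed ID from matching as a mere prefix.
const BLOCKED_APP_ROUTE =
  /^\/analytics\/app\/(?!(?:discover|visualize|dashboards|alerting|dev_tools|data-explorer|home)(?:[/?#]|$))/;

// Hypothetical helper: true means the proxy would answer 403.
function isBlockedAnalyticsRoute(path: string): boolean {
  return BLOCKED_APP_ROUTE.test(path);
}
```

Paths outside `/analytics/app/` never match the pattern, so API and asset routes are untouched by this layer.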

Signed-off-by: Aseem Shrey <LuD1161@users.noreply.github.com>
Signed-off-by: Aseem Shrey <LuD1161@users.noreply.github.com>
…e in prod

Base compose configs (infra.yml, full.yml) now use `expose` instead of
`ports` for all internal services. Dev-ports overlay binds everything to
127.0.0.1. Only nginx port 80 remains publicly accessible.

Signed-off-by: Aseem Shrey <LuD1161@users.noreply.github.com>
…ermission

Signed-off-by: Aseem Shrey <LuD1161@users.noreply.github.com>
- Merge nginx.full.conf into nginx.prod.conf (95% identical, prod has better proxy_redirect)
- Consolidate DB init scripts: merge temporal DB creation into 01-create-instance-databases.sh
- Remove orphaned scripts: dev-instance-manager.sh, instance-bootstrap.sh (unreferenced)
- Remove deprecated opensearch-security/whitelist.yml (superseded by allowlist.yml)
- Update docker-compose.full.yml and docs to reference nginx.prod.conf

Signed-off-by: Aseem Shrey <LuD1161@users.noreply.github.com>
@LuD1161 LuD1161 added the enhancement, core platform, and security engineering labels Feb 6, 2026
The AnalyticsModule's controller and services depend on ConfigService
and OpenSearchClient which aren't available in the MCP test module.
Use overrideModule to replace the entire AnalyticsModule with mocks.
Also add explicit ConfigModule import to AnalyticsModule.

Signed-off-by: Aseem Shrey <LuD1161@users.noreply.github.com>
@LuD1161 LuD1161 force-pushed the eng-42/workflow-analytics-dashboards branch from 000fe03 to 2918958 Compare February 6, 2026 21:06
- Fix PM2 --only filter to use instance-suffixed names (shipsec-backend-0)
- Fix Kafka broker port from 19092 to 9092 (matches single-listener Redpanda)
- Add whitelist.yml required by securityadmin.sh alongside allowlist.yml

Signed-off-by: Aseem Shrey <LuD1161@users.noreply.github.com>
Signed-off-by: Aseem Shrey <LuD1161@users.noreply.github.com>
Keep deletion of scripts/dev-instance-manager.sh from feature branch —
script was removed as orphaned in 283d37a; main's bash-compat fix
(f100991) is no longer needed.

Signed-off-by: Aseem Shrey <LuD1161@users.noreply.github.com>