
docs: Review and update all documentation for accuracy and clarity #8

Merged
ricardozanini merged 14 commits into main from docs/review-and-update
Apr 28, 2026

Conversation

@ricardozanini

Summary

Comprehensive review and update of all Data Index documentation to ensure:

  • Script references are accurate (parameters, paths, functionality)
  • Implementation details are removed from user-facing docs
  • Simple terminology (postgresql/elasticsearch instead of MODE 1/2)
  • All features are documented
  • Instructions are clear for users and operators

Key Changes

1. Flyway Migration Documentation (Commit 2dfb49e)

Fixed a critical error: the docs stated that Flyway runs automatically in production, which it does not.

Changes:

  • postgresql.adoc: Removed "automatic migrations", added manual SQL execution instructions
  • deploy-data-index.sh: Fixed schema file path to actual migration location
  • kind-local.adoc: Changed "Flyway migrations" → "executes SQL migration script"
  • Removed Flyway environment variables from deployment examples

Reality:

  • Dev mode: Flyway auto-runs via dev-flyway Maven profile
  • Production: -DskipFlyway=true excludes Flyway, manual SQL execution required
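
For the production case, manual execution could look roughly like the sketch below. It is not taken from `deploy-data-index.sh`: the host, database and user values are placeholders, `apply_schema` is a hypothetical helper, and the migrations path is written relative to wherever the `data-index-storage-migrations` module lives in your checkout. By default it only prints the `psql` command, so nothing touches a database until `DRY_RUN=0` is set.

```shell
#!/usr/bin/env bash
# Sketch only: apply the initial schema by hand with psql.
# Connection values passed in are placeholders (assumptions).
MIGRATIONS_DIR="${MIGRATIONS_DIR:-data-index-storage-migrations/src/main/resources/db/migration}"

apply_schema() {
  local host="$1" db="$2" user="$3"
  local sql="$MIGRATIONS_DIR/V1__initial_schema.sql"
  if [ "${DRY_RUN:-1}" = "1" ]; then
    # Print the command instead of running it.
    echo "psql -h $host -U $user -d $db -v ON_ERROR_STOP=1 -f $sql"
  else
    # ON_ERROR_STOP makes psql exit non-zero on the first failing statement.
    # psql reads the password from PGPASSWORD if it is set in the environment.
    psql -h "$host" -U "$user" -d "$db" -v ON_ERROR_STOP=1 -f "$sql"
  fi
}

apply_schema localhost dataindex postgres
```

To run it for real, export `PGPASSWORD` and call the function with `DRY_RUN=0` against the target database.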

2. Multi-Module Maven Structure (Commits a934570, 5a90d28)

Updated documentation to reflect actual code structure.

Old (incorrect):

mvn quarkus:dev -Dquarkus.profile=postgresql

New (correct):

cd data-index-service-postgresql
mvn quarkus:dev

Changes:

  • configuration.adoc: Documented parent aggregator with separate modules
  • data-index-service README: Updated structure, build instructions, configuration
  • Backend selection is no longer profile-based; navigate to the backend's module instead
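
A small wrapper can make the module navigation explicit. The sketch below is hypothetical and not part of the repository: `module_dir` is an invented helper, and the repository-root-relative path for the Elasticsearch module is assumed by analogy with the PostgreSQL path used in the commit messages.

```shell
#!/usr/bin/env bash
# Hypothetical helper: map a backend name to its Maven module directory,
# relative to the repository root.
module_dir() {
  case "$1" in
    postgresql|elasticsearch)
      echo "data-index/data-index-service/data-index-service-$1" ;;
    *)
      echo "unsupported backend: $1" >&2
      return 1 ;;
  esac
}

# Print the directory for the PostgreSQL backend.
module_dir postgresql
```

With this in place, development mode becomes `(cd "$(module_dir postgresql)" && mvn quarkus:dev)` and a production build becomes `(cd "$(module_dir postgresql)" && mvn clean package -DskipFlyway=true -DskipTests)`.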

3. Troubleshooting Guides Split (Commits 8ebe92b, 93c5a92, fd6c2dc)

Split single guide into two audience-specific guides.

developers/troubleshooting.adoc - For Quarkus Flow app developers:

  • Build issues (Maven profiles, quarkus-kind)
  • Logging configuration
  • Local development

deployment/troubleshooting.adoc - For operators/administrators:

  • Deployment issues (pods, OOM, images)
  • Event collection (FluentBit, namespaces)
  • Data query issues (GraphQL, database)

4. GraphQL Schema Completion (Commit a6fc8b2)

Added all missing fields to GraphQL documentation.

Missing fields added:

  • WorkflowInstance: version, lastUpdate, error (complete nested type)
  • TaskExecution: errorMessage
  • WorkflowInstanceError: complete new type (type, title, detail, status, instance)

Status enum corrections:

  • FAILED → FAULTED
  • TERMINATED → CANCELLED
  • Added SUSPENDED
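
For client code written against the old documentation, the enum corrections amount to a two-entry rename. The following is a minimal sketch of such a translation; `normalize_status` is a hypothetical helper, not part of Data Index, and the accepted values are the five that the commit message reports from `WorkflowInstanceStatus.java`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: translate status names from the old docs to the
# values the enum actually defines, and reject anything else.
normalize_status() {
  case "$1" in
    FAILED)     echo "FAULTED" ;;     # old doc name -> actual value
    TERMINATED) echo "CANCELLED" ;;   # old doc name -> actual value
    RUNNING|COMPLETED|FAULTED|CANCELLED|SUSPENDED)
      echo "$1" ;;                    # already a valid value
    *)
      echo "unknown status: $1" >&2
      return 1 ;;
  esac
}

normalize_status FAILED
```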

5. FluentBit Directory Renaming (Commit 1bab3e9)

Removed "MODE 1"/"MODE 2" terminology from the directory structure.

Changes:

  • mode1-postgresql-triggers/ → postgresql/
  • mode2-elasticsearch/ → elasticsearch/
  • Updated 17 files with new paths

6. Remove Unused Infrastructure (Commit 24ac6d0)

Removed NGINX Ingress Controller and Kafka references.

Changes:

  • setup-cluster.sh: Removed ingress installation, unused ports (8080, 8443, 30092)
  • Only kept NodePort mappings: 30080 (GraphQL), 30432 (PostgreSQL), 30920 (Elasticsearch)

7. Script Path Fixes (Commit 6f68495)

Fixed PROJECT_ROOT resolution in KIND scripts.

Problem:
Scripts at data-index/scripts/kind/ used ../.. which resolved to data-index/ instead of repository root.

Fix:
Changed to ../../.. and updated all paths to include data-index/ prefix.
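
The off-by-one can be reproduced in isolation. The sketch below is not the project's script: it rebuilds the layout `logic-apps/data-index/scripts/kind/` (the root name comes from the commit message) in a temporary directory and compares the two relative paths. `resolve_root` is an illustrative helper name.

```shell
#!/usr/bin/env bash
# Illustrative only: mimic logic-apps/data-index/scripts/kind/ in a
# temporary directory and see where each relative path lands.

# Resolve a path relative to a script directory into an absolute path.
resolve_root() {
  local script_dir="$1" relative="$2"
  (cd "$script_dir/$relative" && pwd -P)
}

tmp="$(mktemp -d)"
script_dir="$tmp/logic-apps/data-index/scripts/kind"
mkdir -p "$script_dir"

old_root="$(resolve_root "$script_dir" "../..")"      # two levels up
new_root="$(resolve_root "$script_dir" "../../..")"   # three levels up

echo "../..    -> $(basename "$old_root")"   # prints: ../..    -> data-index
echo "../../.. -> $(basename "$new_root")"   # prints: ../../.. -> logic-apps

rm -rf "$tmp"
```

Two levels up stops at `data-index/`, which is why paths without the `data-index/` prefix appeared to work until Maven needed the reactor at the repository root.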

Testing

E2E Test Passed

Event Collection:
  - Raw workflow events: 8
  - Raw task events: 6
  - Normalized workflows: 1
  - Normalized tasks: 4

✅ All tests passed!

Test execution:

  1. Deleted KIND cluster
  2. Re-created cluster from scratch
  3. Deployed PostgreSQL, Data Index, FluentBit, workflow app
  4. Executed test workflow
  5. Verified event collection and normalization
  6. Tested idempotency

Documentation Structure

data-index-docs/
├── api/
│   └── graphql-overview.adoc (UPDATED: complete schema)
├── architecture/
│   └── postgresql-mode.adoc (UPDATED: removed Flyway production claim)
├── deployment/
│   ├── kind-local.adoc (UPDATED: fixed terminology, added features)
│   ├── postgresql.adoc (FIXED: manual migration required)
│   └── troubleshooting.adoc (NEW: ops/deployment troubleshooting)
└── developers/
    ├── configuration.adoc (UPDATED: multi-module structure)
    ├── troubleshooting.adoc (UPDATED: developer-focused)
    └── quarkus-flow-apps.adoc

Commits

  1. ce41aed - Fix terminology and script references
  2. db3e7df - Remove implementation details, add features
  3. 513181 - Add missing features, fix status enums
  4. 1bab3e9 - Rename FluentBit directories (remove MODE terminology)
  5. 24ac6d0 - Remove unused ingress and Kafka
  6. 2dfb49e - Fix Flyway production documentation (critical)
  7. a934570 - Update configuration.adoc for multi-module
  8. 5a90d28 - Update service README for multi-module
  9. fd6c2dc - Clarify build issues in troubleshooting
  10. 93c5a92 - Clarify troubleshooting is for Quarkus Flow apps
  11. 8ebe92b - Split troubleshooting into dev/deployment guides
  12. a6fc8b2 - Complete GraphQL schema with all fields
  13. 6f68495 - Fix script path resolution

Checklist

  • All script parameters are correct
  • All paths are valid
  • Scripts do what they're supposed to do
  • Implementation details removed from user docs
  • Simple terminology used (postgresql/elasticsearch)
  • All features documented
  • Instructions clear for users and operators
  • Ingress and Kafka references removed
  • E2E test passes
  • Documentation synchronized with code

ricardozanini and others added 14 commits April 28, 2026 12:24
Remove "MODE 1/MODE 2" terminology in favor of "postgresql/elasticsearch":
- Replace all "MODE 1" with "PostgreSQL mode" in docs and scripts
- Replace "MODE 2" with "Elasticsearch mode"
- Remove trigger/polling implementation details from user-facing docs
- Use "real-time normalization" instead of exposing trigger internals

Fix script documentation issues:
- Correct Maven module paths in KIND scripts (add data-index/ prefix)
- Fix generate-configmap.sh usage examples (add required arguments)
- Include all required files in ConfigMap examples (fluent-bit.conf, parsers.conf, flatten-event.lua)
- Update script examples in Antora docs to match actual implementation
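
As an illustration of the "all required files" point, a pre-flight check along these lines could run before a ConfigMap is generated. This is a sketch under assumptions: `check_configmap_inputs` is a hypothetical helper and is not part of `generate-configmap.sh`; only the three file names come from the list above.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check: verify that a FluentBit config directory
# contains every file the ConfigMap examples are expected to include.
check_configmap_inputs() {
  local dir="$1" missing=0 f
  for f in fluent-bit.conf parsers.conf flatten-event.lua; do
    if [ ! -f "$dir/$f" ]; then
      echo "missing: $dir/$f" >&2
      missing=1
    fi
  done
  return "$missing"   # 0 when all three files are present
}
```

Calling `check_configmap_inputs <config-dir>` returns non-zero and names each absent file on stderr.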

Update FluentBit documentation:
- Simplify architecture descriptions (remove trigger implementation details)
- Focus on user benefits rather than internal mechanisms
- Fix deployment script parameters and examples

Update KIND scripts:
- Fix Maven build paths for modular service structure
- Update terminology in test-mode1-e2e.sh
- Remove Quarkus profile usage (now using direct module builds)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Architecture documentation cleanup:
- Remove PostgreSQL trigger implementation details from postgresql-mode.adoc
- Remove "Why FluentBit Forces Our Design" rationale from overview.adoc
- Remove raw SQL DDL schemas from user docs
- Replace technical terms with user-focused language
- Focus on benefits (real-time processing, idempotency) not mechanisms

Add missing documentation:
- GraphQL UI access guide in getting-started.adoc (http://localhost:8080/q/graphql-ui/)
- Lua event processing section in fluentbit-config.adoc
- Explain what flatten-event.lua does and when to modify it
- Debugging tips for Lua script errors

Keep what matters to users:
- High-level data flow diagrams
- Performance characteristics
- When to use each mode
- Comparison tables and decision guidance

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Add missing documentation features:
- Reset without deleting cluster procedure in kind-local.adoc
- GraphQL schema troubleshooting in troubleshooting.adoc
- Covers common issue: schema empty or returns null
- Debug steps for database connection, Flyway, and Quarkus indexing

Fix inaccurate references:
- Correct WorkflowInstanceStatus enum values in graphql-overview.adoc
- Change FAILED → FAULTED (actual enum value)
- Change TERMINATED → CANCELLED (actual enum value)
- Add SUSPENDED status (was missing)

All fixes verified against actual source code:
- WorkflowInstanceStatus.java confirms: RUNNING, COMPLETED, FAULTED, CANCELLED, SUSPENDED

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Rename directories to match simple postgresql/elasticsearch terminology:
- mode1-postgresql-triggers/ → postgresql/
- mode2-elasticsearch/ → elasticsearch/

Remove all remaining MODE references from documentation and code:
- Update all script references to use new directory names
- Clean up README files to use "PostgreSQL mode" instead of "MODE 1"
- Remove trigger/polling implementation details from module READMEs
- Update docs/README.md to remove MODE 3 (Kafka) references

Verified no Kafka references remain in the codebase.

Benefits:
- Simpler, clearer directory names
- Consistent with postgresql/elasticsearch terminology throughout
- No confusing MODE 1/2/3 numbering
- Easier for users to understand and navigate

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Remove NGINX Ingress Controller installation:
- Not used anywhere in the project (no Ingress resources defined)
- E2E tests use port-forward instead
- GraphQL API exposed via NodePort (30080)
- Simplifies cluster setup and reduces resource overhead

Remove unused port mappings:
- Remove HTTP ingress (8080) and HTTPS ingress (8443)
- Remove Kafka port (30092) - Kafka mode was deprecated
- Keep only what we actually use: GraphQL (30080), PostgreSQL (30432), Elasticsearch (30920)

Remove unnecessary node label:
- Remove "ingress-ready=true" label (was only for ingress controller)

Update documentation:
- Remove ingress references from kind-local.adoc
- Clarify that we use NodePort, not Ingress
- Clean up port mapping descriptions

Benefits:
- Faster cluster creation (no ingress controller wait)
- Less confusing for users (no unused ports)
- Lower resource usage in KIND cluster

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
**Critical fixes:**

1. **postgresql.adoc**: Removed incorrect statement that Flyway runs
   automatically in production. Documented manual migration requirement
   using SQL scripts from data-index-storage-migrations module.

2. **deploy-data-index.sh**: Fixed schema initialization to use correct
   path to V1__initial_schema.sql migration file. Changed from
   non-existent scripts/schema.sql to actual migration location.
   Added PGPASSWORD to psql command.

3. **kind-local.adoc**: Updated to reflect manual SQL execution instead
   of claiming Flyway migrations run automatically.

4. **configuration.adoc**:
   - Clarified dev-flyway profile is separate and deactivated by -DskipFlyway=true
   - Updated Dev Services section to note Flyway is dev mode only
   - Removed Flyway migration status from health checks (not in production)

5. **troubleshooting.adoc**: Changed verification step from checking
   flyway_schema_history table to checking actual schema tables.

6. **postgresql-mode.adoc**: Removed quarkus.flyway.migrate-at-start
   from configuration example, added note about manual production migrations.

**Why these changes matter:**

Production builds use -DskipFlyway=true flag, which deactivates the
dev-flyway Maven profile. This excludes Flyway and migration dependencies
from the container image. Schema must be initialized manually using SQL
scripts from data-index-storage-migrations/src/main/resources/db/migration/.

Development mode (mvn quarkus:dev) includes Flyway via dev-flyway profile
for convenience with Dev Services.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
**Critical architecture changes:**

The Data Index service now uses a **multi-module Maven structure** instead
of profile-based backend selection:

**Old approach (incorrect documentation):**
- Single data-index-service module with Maven profiles
- Backend selected via `-Dquarkus.profile=postgresql`
- Configuration in `application-postgresql.properties`

**New approach (actual structure):**
- Separate modules: data-index-service-postgresql, data-index-service-elasticsearch
- Each module has its own pom.xml with hardcoded dependencies
- Each module has its own application.properties
- No Quarkus profiles for backend selection
- Navigate to the module and run `mvn quarkus:dev` directly

**Development:**
```
cd data-index/data-index-service/data-index-service-postgresql
mvn quarkus:dev
```

**Production:**
```
cd data-index/data-index-service/data-index-service-postgresql
mvn clean package -DskipFlyway=true -DskipTests
```

**Maven profiles:**
Only one profile exists: `dev-flyway` (controlled by `-DskipFlyway` flag)

**Configuration files:**
- data-index-service-core/src/main/resources/application.properties (common)
- data-index-service-postgresql/src/main/resources/application.properties (PostgreSQL)
- data-index-service-elasticsearch/src/main/resources/application.properties (Elasticsearch)

Updated all sections:
- Storage Backend Selection
- Development Mode
- Production Mode
- Container image build instructions

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Updated README to reflect the actual multi-module Maven structure:

**Module Structure:**
- Parent aggregator POM with three modules
- data-index-service-core (common code)
- data-index-service-postgresql (PostgreSQL backend)
- data-index-service-elasticsearch (Elasticsearch backend, future)

**Development:**
```
cd data-index-service-postgresql
mvn quarkus:dev
```

**Production:**
```
cd data-index-service-postgresql
mvn clean package -DskipFlyway=true -DskipTests
```

**Key changes:**
- Removed profile-based backend selection instructions
- Updated module navigation instructions
- Clarified configuration file locations
- Updated project structure diagram
- Fixed testing instructions

**Container images:**
- kubesmarts/data-index-service-postgresql:999-SNAPSHOT
- kubesmarts/data-index-service-elasticsearch:999-SNAPSHOT

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
**Problem:**
The "Build Issues" section was confusing because it mixed Data Index
build issues with Quarkus Flow app build issues. Users couldn't tell
if the problems were in Data Index itself or in their workflow apps.

**Changes:**

1. Split "Build Issues" into two sections:
   - "Data Index Build Issues" - actual Data Index service build problems
   - "Quarkus Flow App Issues" - workflow app integration problems

2. Added Data Index-specific build troubleshooting:
   - Parent POM dependency resolution
   - Container image build failures

3. Clarified Quarkus Flow app issues:
   - Made clear these are for workflow apps, not Data Index
   - Kept KIND deployment and quarkus-kind extension issues here

**Result:**
Users can now immediately identify whether the issue is in:
- Data Index service build (dependency/container issues)
- Quarkus Flow app build (KIND/Kubernetes deployment issues)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
**Problem:**
The troubleshooting guide mixed Data Index build issues with Quarkus
Flow app integration issues, making it unclear who the audience was.

**Solution:**
Made it crystal clear that this guide is for **Quarkus Flow app developers**
integrating their workflow applications with Data Index.

**Changes:**

1. Updated introduction with bold statement:
   "Common issues and solutions when integrating **Quarkus Flow workflow
   applications** with Data Index."

2. Added IMPORTANT callout:
   "This guide is for **Quarkus Flow app developers** integrating their
   workflow applications with Data Index. If you're developing Data Index
   itself, see the contributor documentation."

3. Removed "Data Index Build Issues" section:
   - "Cannot resolve dependencies" (Data Index developer issue)
   - "Container image build fails" (Data Index developer issue)
   - These don't belong in user-facing docs

4. Simplified section structure:
   - Removed "Quarkus Flow App Issues" heading (redundant now)
   - Kept sections: Build Issues, Deployment Issues, Logging Issues, etc.

**Result:**
Users immediately know this is about their Quarkus Flow apps, not about
building Data Index itself.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
…uides

**Problem:**
Single troubleshooting guide mixed development issues (build, logging config)
with deployment/operations issues (pods, runtime, data queries), making it
hard for users to find relevant information.

**Solution:**
Split into two audience-specific guides:

**1. developers/troubleshooting.adoc** - For Quarkus Flow app developers:
   - Build issues (Maven profiles, quarkus-kind extension)
   - Logging configuration (structured logging setup)
   - File handler errors
   - Local development (Dev Services, port conflicts)
   - References operations guide for runtime issues

**2. deployment/troubleshooting.adoc** - For operators/administrators:
   - Deployment issues (ImagePullBackOff, OOM, pod crashes)
   - Event collection (FluentBit, namespace mismatches, delays)
   - Data query issues (GraphQL schema, duplicates, missing data)
   - Component health checks
   - Service connectivity diagnostics
   - References developer guide for build issues

**Key improvements:**
- Clear audience targeting with IMPORTANT callouts
- Cross-references between guides
- Deployment guide under deployment/ (not operations/)
- Developer guide remains under developers/
- Each guide is self-contained but points to the other when needed

**Result:**
Developers find build/config issues quickly.
Operators find deployment/runtime issues quickly.
No confusion about which guide to use.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
**Problem:**
GraphQL overview documentation was missing several important fields that
exist in the actual domain model, making the API documentation incomplete.

**Missing fields added:**

**WorkflowInstance:**
- `version` - Workflow version
- `lastUpdate` - Timestamp of last status change
- `error` - Complete error object with nested fields (type, title, detail, status, instance)

**TaskExecution:**
- `errorMessage` - Error message for failed tasks

**WorkflowInstanceError type:**
Complete new type definition with all error fields:
- `type` - Error type classification
- `title` - Short error summary
- `detail` - Detailed error message
- `status` - HTTP status code (if applicable)
- `instance` - Error instance identifier

**Status enum corrections:**
- WorkflowInstanceStatus: FAILED → FAULTED, TERMINATED → CANCELLED, added SUSPENDED
- Removed TaskExecutionStatus enum (status is String, not enum)

**Improvements:**
- Added field descriptions for all fields in both types
- Added new query example showing error retrieval
- Updated filter example to include version and lastUpdate fields
- Clarified taskPosition is a JSONPointer (e.g., "/do/0", "/do/1/then/0")

**Result:**
Documentation now matches the complete domain model and shows users
how to access all available data, including error information.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
**Problem:**
Scripts in data-index/scripts/kind/ were using PROJECT_ROOT="../.." which
resolved to data-index/ instead of the repository root, causing Maven builds
to fail with "Could not find the selected project in the reactor".

**Root cause:**
Scripts are at: logic-apps/data-index/scripts/kind/
Using ../.. goes: kind/ → scripts/ → data-index/ (WRONG)
Should go: kind/ → scripts/ → data-index/ → logic-apps/ (CORRECT)

**Fix:**
Changed PROJECT_ROOT from "../.." to "../../.." in:
- deploy-data-index.sh
- test-mode1-e2e.sh

**Also fixed:**
Updated all paths to include data-index/ prefix:
- data-index-storage/... → data-index/data-index-storage/...
- scripts/fluentbit → data-index/scripts/fluentbit
- workflow-test-app → data-index/workflow-test-app

**Verification:**
E2E test now passes:
```
✅ All tests passed!
Event Collection:
  - Raw workflow events: 8
  - Raw task events: 6
  - Normalized workflows: 1
  - Normalized tasks: 4
```

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
**Problem:**
CI pipeline failing with:
```
[ERROR] Mode directory not found: mode1-postgresql-triggers
Error: Process completed with exit code 1.
```

**Cause:**
GitHub Actions workflow still using old directory names after renaming:
- mode1-postgresql-triggers/ → postgresql/
- mode2-elasticsearch/ → elasticsearch/

**Fix:**
Updated .github/workflows/data-index-integration-tests.yml:
- Line 122: generate-configmap.sh mode1-postgresql-triggers → postgresql
- Line 126: fluentbit/mode1-postgresql-triggers/kubernetes/configmap.yaml → postgresql/kubernetes/configmap.yaml
- Line 127: fluentbit/mode1-postgresql-triggers/kubernetes/daemonset.yaml → postgresql/kubernetes/daemonset.yaml

**Note:**
Kept label selector as `app=workflows-fluent-bit-mode1` because the DaemonSet
still uses that label name (consistent with existing deployments).
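
A stale reference of this kind can be caught locally by searching the tree for the old directory names. The helper below is a sketch and not part of this PR; `find_stale_refs` is a hypothetical name. The pattern deliberately does not match the retained label `workflows-fluent-bit-mode1`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list lines that still mention the pre-rename
# FluentBit directory names. Returns 0 when the tree is clean.
find_stale_refs() {
  local root="${1:-.}"
  # -r recurse, -n show line numbers, -I skip binary files, -E extended regex
  if grep -rnIE --exclude-dir=.git \
      'mode1-postgresql-triggers|mode2-elasticsearch' "$root"; then
    return 1   # matches were printed: stale references remain
  fi
  return 0
}
```

For example, `find_stale_refs .github/workflows` would have flagged the three workflow lines fixed in this commit.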

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
@ricardozanini ricardozanini merged commit 0915fbe into main Apr 28, 2026
2 checks passed