Skip to content
Draft
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
21 commits
Select commit Hold shift + click to select a range
cf8fdd4
Initial pass of moving from gorm to golang-migrate + sqlc
iplay88keys Mar 27, 2026
fbde2c7
Remove unnecessary timout change
iplay88keys Mar 27, 2026
7f5d714
Use pgx instead of lib/sql
iplay88keys Mar 30, 2026
0e3e1e9
Merge branch 'main' into iplay88keys/replace-gorm-with-sqlc-migrations
iplay88keys Mar 30, 2026
3421e26
Simplify migrations by requiring kagent 0.8.0+ before upgrade
iplay88keys Mar 30, 2026
9951ef0
Better matching of what GORM produces for initial migrations
iplay88keys Mar 30, 2026
2707135
Merge branch 'main' into iplay88keys/replace-gorm-with-sqlc-migrations
iplay88keys Mar 30, 2026
81c041d
Remove github workflow check for migration changes on 0.7.x release b…
iplay88keys Mar 30, 2026
e9aa20d
Add new db migrations skill info and add force command to migrator
iplay88keys Mar 31, 2026
e4bd217
Remove temporary upgrade test
iplay88keys Mar 31, 2026
1d5564d
Merge branch 'main' into iplay88keys/replace-gorm-with-sqlc-migrations
iplay88keys Mar 31, 2026
7034b1c
Switch to using in-app migration with extension
iplay88keys Apr 1, 2026
bb1281d
Updates based on review and sqlc generate
iplay88keys Apr 1, 2026
6ddc959
Add additional checks and update skills for migration safety
iplay88keys Apr 2, 2026
d0b1a53
Merge branch 'main' into iplay88keys/replace-gorm-with-sqlc-migrations
iplay88keys Apr 3, 2026
5d08355
Updates to simplify
iplay88keys Apr 3, 2026
217a91b
Remove unused code
iplay88keys Apr 3, 2026
b093c8b
Merge branch 'main' into iplay88keys/replace-gorm-with-sqlc-migrations
iplay88keys Apr 3, 2026
46f9fb7
Updates based on review
iplay88keys Apr 3, 2026
ffcafc0
Small update
iplay88keys Apr 3, 2026
f54084a
Split out a few column modifications to bring gorm in-line with clean…
iplay88keys Apr 3, 2026
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
8 changes: 6 additions & 2 deletions .claude/skills/kagent-dev/SKILL.md
Original file line number Diff line number Diff line change
Expand Up @@ -18,6 +18,9 @@ make helm-install # Builds images and deploys to Kind
make controller-manifests # generate + copy CRDs to helm (recommended)
make -C go generate # DeepCopy methods only

# sqlc (after editing go/core/internal/database/queries/*.sql)
cd go/core/internal/database && sqlc generate # regenerate gen/ — commit both

# Build & test
make -C go test # Unit tests (includes golden file checks)
make -C go e2e # E2E tests (needs KAGENT_URL)
Expand All @@ -43,7 +46,7 @@ kagent/
│ ├── api/ # Shared types module
│ │ ├── v1alpha2/ # Current CRD types (agent_types.go, etc.)
│ │ ├── adk/ # ADK config types (types.go) — flows to Python runtime
│ │ ├── database/ # GORM models
│ │ ├── database/ # database models
│ │ ├── httpapi/ # HTTP API types
│ │ └── config/crd/bases/ # Generated CRD YAML
│ ├── core/ # Infrastructure module
Expand Down Expand Up @@ -231,7 +234,7 @@ curl -v $KAGENT_URL/healthz # Controller reach

**Reproducing locally (without cluster):** Follow `go/core/test/e2e/README.md` — extract agent config, start mock LLM server, run agent with `kagent-adk test`. Much faster iteration than full cluster.

**CI-specific:** E2E runs in matrix (`sqlite` + `postgres`). If only one database variant fails, it's likely database-related. If both fail, it's infrastructure. Most common CI-only failure: mock LLM unreachability because `KAGENT_LOCAL_HOST` detection fails on Linux.
**CI-specific:** Most common CI-only failure: mock LLM unreachability because `KAGENT_LOCAL_HOST` detection fails on Linux.

See `references/e2e-debugging.md` for comprehensive debugging techniques.

Expand Down Expand Up @@ -349,3 +352,4 @@ Don't use Go template syntax (`{{ }}`) in doc comments — Helm will try to pars
- `references/translator-guide.md` - Translator patterns, `deployments.go` and `adk_api_translator.go`
- `references/e2e-debugging.md` - Comprehensive E2E debugging, local reproduction
- `references/ci-failures.md` - CI failure patterns and fixes
- `references/database-migrations.md` - Migration authoring rules, sqlc workflow, multi-instance safety, expand/contract pattern
2 changes: 2 additions & 0 deletions .claude/skills/kagent-dev/references/ci-failures.md
Original file line number Diff line number Diff line change
Expand Up @@ -7,6 +7,7 @@ Common GitHub Actions CI failures and how to fix them.
| Failure | Likely Cause | Quick Fix |
|---------|--------------|-----------|
| manifests-check | CRD manifests out of date | `make -C go generate && cp go/api/config/crd/bases/*.yaml helm/kagent-crds/templates/` |
| sqlc-generate-check | `gen/` out of sync with queries | `cd go/core/internal/database && sqlc generate`, commit `gen/` |
| go-lint depguard | Forbidden package used | Replace with allowed alternative (e.g., `slices.Sort` not `sort.Strings`) |
| test-e2e timeout | Agent not starting or KAGENT_URL wrong | Check pod status, verify KAGENT_URL setup in CI |
| golden files mismatch | Translator output changed | `UPDATE_GOLDEN=true make -C go test` and commit |
Expand Down Expand Up @@ -520,6 +521,7 @@ make init-git-hooks
Before submitting PR:

- [ ] Ran `make -C go generate` after CRD changes
- [ ] Ran `cd go/core/internal/database && sqlc generate` after query changes, committed `gen/`
- [ ] Ran `make lint` and fixed issues
- [ ] Ran `make -C go test` and all pass
- [ ] Regenerated golden files if translator changed
Expand Down
155 changes: 155 additions & 0 deletions .claude/skills/kagent-dev/references/database-migrations.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,155 @@
# Database Migrations Guide

kagent uses [golang-migrate](https://github.com/golang-migrate/migrate) with embedded SQL files and [sqlc](https://sqlc.dev/) for type-safe query generation. Migrations run **in-app at startup** — the controller applies them before accepting traffic.

## Structure

```
go/core/pkg/migrations/
├── migrations.go # Embeds the FS (go:embed); exports FS for downstream consumers
├── runner.go # RunUp (applies pending migrations at startup)
├── core/ # Core schema (tracked in schema_migrations table)
│ ├── 000001_initial.up.sql / .down.sql
│ ├── 000002_add_session_source.up.sql / .down.sql
│ └── ...
└── vector/ # pgvector schema (tracked in vector_schema_migrations table)
├── 000001_vector_support.up.sql / .down.sql
└── ...

go/core/internal/database/
├── queries/ # Hand-written SQL queries (source of truth)
│ ├── sessions.sql
│ ├── memory.sql
│ └── ...
├── gen/ # sqlc-generated Go code — DO NOT edit manually
│ ├── db.go
│ ├── models.go
│ └── *.sql.go
└── sqlc.yaml # sqlc configuration
```

Migrations manage two independent tracks — `core` and `vector` — and roll back both if either fails. The `--database-vector-enabled` flag (default `true`) controls whether the vector track runs.

## sqlc Workflow

When you add or change a SQL query:

1. Edit (or add) a `.sql` file under `go/core/internal/database/queries/`
2. Regenerate:
```bash
cd go/core/internal/database && sqlc generate
```
3. Commit both the query file and the updated `gen/` files together.

A CI check (`.github/workflows/sqlc-generate-check.yaml`) fails the PR if `gen/` is out of sync with the queries. Never edit `gen/` by hand.

**sqlc annotations used:**
- `:one` — returns a single row
- `:many` — returns a slice
- `:exec` — returns only error (use for INSERT/UPDATE/DELETE that don't need the result)

## Writing Migrations

### Backward-compatible schema changes

During a rolling deploy, old pods will be reading and writing a schema that has already been upgraded. **Every migration must be backward-compatible with the previous version's code.**

| Change | Old code behavior | Safe? |
|--------|------------------|-------|
| Add nullable column | SELECT ignores it; INSERT omits it (goes NULL) | ✅ |
| Add column with `DEFAULT x` | INSERT omits it; DB fills default | ✅ |
| Add NOT NULL column **without** default | Old INSERT missing the column → error | ❌ |
| Add index | Invisible to application code | ✅ |
| Add foreign key | Old INSERT may fail constraint | ❌ |
| Drop/rename column old code references | Old SELECT/INSERT errors | ❌ |
| Change compatible type (e.g. `int` → `bigint`) | Usually fine | ⚠️ |

**Expand-then-contract pattern for schema changes:**
1. **Version N (Expand)**: add the new column/table (nullable or with default); old code still works
2. **Version N (Deploy)**: ship new code that uses the new structure
3. **Version N+1 (Contract)**: drop the old column/table once version N is fully deployed and no pods run version N-1

### Idempotency and cross-track safety

All DDL statements must use `IF EXISTS` / `IF NOT EXISTS` guards:

```sql
-- Up
CREATE TABLE IF NOT EXISTS foo (...);
ALTER TABLE foo ADD COLUMN IF NOT EXISTS bar TEXT;

-- Down
DROP TABLE IF EXISTS foo;
ALTER TABLE foo DROP COLUMN IF EXISTS bar;
```

Guards provide defense-in-depth for crash recovery and dirty-state cleanup, where a partially-applied migration may be re-run or rolled back.

### Naming

Files must follow `NNNNNN_description.up.sql` / `NNNNNN_description.down.sql` with zero-padded 6-digit sequence numbers.

### Down migrations

Every `.up.sql` must have a corresponding `.down.sql` that exactly reverses it. Down migrations are used for rollbacks and by automatic rollback on migration failure. They must be **idempotent** — the two-track rollback logic (roll back core if vector fails) may call them more than once in failure scenarios.

## Multi-Instance Safety

### How the advisory lock works

The migration runner acquires a PostgreSQL **session-level** advisory lock (`pg_advisory_lock`) before running.

### Rolling deploy concurrency

If multiple pods start simultaneously (e.g., rolling deploy with replicas > 1):
1. One controller acquires the advisory lock and runs migrations.
2. Others block on `pg_advisory_lock`.
3. When the winner finishes and its connection closes, the next waiter acquires the lock, calls `Up()`, gets `ErrNoChange`, and exits immediately.

This is safe. The only risk is if the winning controller crashes mid-migration (see Dirty State below).

### Dirty state recovery

If the controller crashes mid-migration, the migration runner records the version as `dirty = true` in the tracking table. The next startup detects dirty state and calls `rollbackToVersion`, which:
1. Calls `mg.Force(version - 1)` to clear the dirty flag.
2. Runs the down migration to restore the previous clean state.
3. Re-runs the failed up migration.

**Requirement**: down migrations must be idempotent and correctly reverse their up migration. A missing or broken down migration requires manual recovery.

### Rollout strategy

For backward-compatible migrations a rolling update is safe:

1. New pod starts → migration runner applies pending migrations (advisory lock serializes concurrent runs)
2. New pod passes readiness probe → old pod terminates
3. Backward-compatible schema means old pods continue operating during the window

For a migration that is **not** backward-compatible, restructure it using the expand-then-contract pattern (add new column/table in version N, ship code that uses it, drop the old column in version N+1).

## Static Analysis Enforcement

The policies above are enforced by static analysis tests in `go/core/pkg/migrations/cross_track_test.go`. These run against the embedded SQL files — no database required.

| Test | What it enforces |
|------|-----------------|
| `TestNoCrossTrackDDL` | No track may `ALTER TABLE` or `CREATE INDEX ON` a table owned by another track |
| `TestMigrationGuards` | Up migrations must use `IF NOT EXISTS` on all `CREATE`/`ADD COLUMN`; down migrations must use `IF EXISTS` on all `DROP` statements |

**Adding a new track**: add the track directory name to the `tracks` slice in each test so the new track is covered by the same checks.

These tests catch policy violations at PR time without needing a running database. They complement the integration tests in `runner_test.go`, which verify the runner's rollback and concurrency behavior against a real Postgres instance.

## Downstream Extension Model

The migration layer is designed for downstream consumers to extend with their own migrations alongside OSS. The extension points are:

1. **SQL files as the contract.** The migration files in `go/core/pkg/migrations/core/` and `vector/` are the stable interface. Downstream consumers sync these files into their own repos and build their own migration runners. Don't move or reorganize migration file paths without considering downstream impact.

2. **`MigrationRunner` DI callback.** Downstream consumers pass a custom `MigrationRunner` to `app.Start` to take full ownership of the migration process — running OSS migrations alongside their own in whatever order they need. The signature `func(ctx context.Context, url string, vectorEnabled bool) error` is stable.

3. **Vector track stays separate.** The vector track is conditionally applied and has its own tracking table. Downstream extensions should not modify vector-owned tables (enforced by `TestNoCrossTrackDDL`).

### What this means for OSS development

- **Migration immutability is cross-repo.** Once a migration file is merged and tagged, downstream consumers may have synced it. Modifying it breaks their tracking table state.
35 changes: 35 additions & 0 deletions .github/workflows/migration-immutability.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,35 @@
name: Migration Immutability

on:
pull_request:
branches: [main]
paths:
- "go/core/pkg/migrations/**"

jobs:
check:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
with:
fetch-depth: 0

- name: Fail if any existing migration file was modified
run: |
# List files under go/core/pkg/migrations/ that were changed relative
# to the merge base of this PR. We only care about modifications (M)
# and renames (R); additions (A) are fine.
BASE=$(git merge-base HEAD origin/${{ github.base_ref }})
MODIFIED=$(git diff --name-only --diff-filter=MR "$BASE" HEAD \
-- 'go/core/pkg/migrations/**/*.sql')

if [ -n "$MODIFIED" ]; then
echo "ERROR: The following migration files were modified."
echo "Migration files are immutable once merged."
echo "Fix bugs with a new migration instead."
echo ""
echo "$MODIFIED"
exit 1
fi

echo "OK: no existing migration files were modified."
38 changes: 38 additions & 0 deletions .github/workflows/sqlc-generate-check.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,38 @@
name: sqlc Generate Check

on:
pull_request:
branches: [main]
paths:
- "go/core/internal/database/queries/**"
- "go/core/internal/database/sqlc.yaml"
- "go/core/pkg/migrations/**"

jobs:
check:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4

- uses: actions/setup-go@v6
with:
go-version: "1.26"
cache: true
cache-dependency-path: go/go.sum

- name: Install sqlc
run: go install github.com/sqlc-dev/sqlc/cmd/sqlc@v1.30.0

- name: Run sqlc generate
working-directory: go/core/internal/database
run: sqlc generate

- name: Fail if generated files differ
run: |
if ! git diff --quiet go/core/internal/database/gen/; then
echo "ERROR: sqlc generate produced changes. Run sqlc generate locally and commit the result."
echo ""
git diff go/core/internal/database/gen/
exit 1
fi
echo "OK: generated files are up to date."
1 change: 0 additions & 1 deletion Makefile
Original file line number Diff line number Diff line change
Expand Up @@ -47,7 +47,6 @@ APP_IMAGE_TAG ?= $(VERSION)
KAGENT_ADK_IMAGE_TAG ?= $(VERSION)
GOLANG_ADK_IMAGE_TAG ?= $(VERSION)
SKILLS_INIT_IMAGE_TAG ?= $(VERSION)

CONTROLLER_IMG ?= $(DOCKER_REGISTRY)/$(DOCKER_REPO)/$(CONTROLLER_IMAGE_NAME):$(CONTROLLER_IMAGE_TAG)
UI_IMG ?= $(DOCKER_REGISTRY)/$(DOCKER_REPO)/$(UI_IMAGE_NAME):$(UI_IMAGE_TAG)
APP_IMG ?= $(DOCKER_REGISTRY)/$(DOCKER_REPO)/$(APP_IMAGE_NAME):$(APP_IMAGE_TAG)
Expand Down
4 changes: 2 additions & 2 deletions docs/architecture/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -149,7 +149,7 @@ The controller uses SQLite (default) or PostgreSQL for persistent state that sup
**Why a separate DB?** The Kubernetes API is not designed for high-frequency read patterns like listing conversations or searching tools. The DB provides fast lookups for the HTTP API and UI, while the CRDs remain the source of truth for agent configuration.

**Key files:**
- `go/api/database/models.go` — GORM models
- `go/api/database/models.go` — database models
- `go/core/internal/database/client.go` — Database client implementation
- `go/core/internal/database/service.go` — Business logic with atomic upserts

Expand Down Expand Up @@ -398,7 +398,7 @@ go/
├── go.work
├── api/ # github.com/kagent-dev/kagent/go/api
│ ├── v1alpha2/ # CRD type definitions
│ ├── database/ # GORM database models
│ ├── database/ # database models
│ ├── httpapi/ # HTTP API request/response types
│ ├── client/ # REST client SDK for the HTTP API
│ └── config/crd/ # Generated CRD manifests
Expand Down
2 changes: 1 addition & 1 deletion go/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -31,7 +31,7 @@ go/
│ ├── v1alpha1/ # Legacy CRD types
│ ├── v1alpha2/ # Current CRD types
│ ├── adk/ # ADK config & model types
│ ├── database/ # GORM model structs & Client interface
│ ├── database/ # database model structs & Client interface
│ ├── httpapi/ # HTTP API request/response types
│ ├── client/ # REST HTTP client SDK
│ ├── utils/ # Shared utility functions
Expand Down
1 change: 1 addition & 0 deletions go/adk/pkg/a2a/converter_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -125,6 +125,7 @@ func TestMessageToGenAIContent_TextPart(t *testing.T) {
}
if content == nil {
t.Fatal("expected non-nil content")
return
}
if len(content.Parts) != 1 {
t.Fatalf("expected 1 part, got %d", len(content.Parts))
Expand Down
3 changes: 3 additions & 0 deletions go/adk/pkg/a2a/hitl_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -481,13 +481,15 @@ func TestBuildResumeHITLMessage(t *testing.T) {
resume := BuildResumeHITLMessage(storedTask, incoming)
if resume == nil {
t.Fatal("BuildResumeHITLMessage() returned nil")
return
}
if len(resume.Parts) != 1 {
t.Fatalf("resume parts len = %d, want 1", len(resume.Parts))
}
dp := asDataPart(resume.Parts[0])
if dp == nil {
t.Fatal("resume part is not a DataPart")
return
}
if dp.Data[PartKeyName] != "adk_request_confirmation" {
t.Fatalf("resume FunctionResponse name = %#v", dp.Data[PartKeyName])
Expand Down Expand Up @@ -516,6 +518,7 @@ func TestProcessHitlDecision(t *testing.T) {
dp := asDataPart(parts[0])
if dp == nil {
t.Fatal("part is not DataPart")
return
}
if dp.Data[PartKeyName] != "adk_request_confirmation" {
t.Errorf("name = %v", dp.Data[PartKeyName])
Expand Down
2 changes: 2 additions & 0 deletions go/adk/pkg/config/config_loader_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -45,6 +45,7 @@ func TestLoadAgentConfig(t *testing.T) {

if config == nil {
t.Fatal("LoadAgentConfig() returned nil config")
return
}

// Check that model was loaded
Expand Down Expand Up @@ -92,6 +93,7 @@ func TestLoadAgentCard(t *testing.T) {

if card == nil {
t.Fatal("LoadAgentCard() returned nil card")
return
}

if card.Name != "test-agent" {
Expand Down
Loading
Loading