22 changes: 22 additions & 0 deletions .RALPH/README.md
@@ -0,0 +1,22 @@
# .RALPH – moltworker Pattern Library

This is the project-level knowledge base for `moltworker` (the OpenClaw-based Cloudflare
Worker + Sandbox project). Every validated pattern, recurring problem, and architectural
decision from this project is logged here.

> For the future **nanoworker** project, see `nanoworker/.RALPH/` — it inherits and
> extends many of these patterns.

## Rules for agents

1. **Before trying an approach**, check `patterns.md` and `problems.md`.
2. **After solving a non-trivial problem**, add an entry to `patterns.md` or `problems.md`.
3. **After making an architectural decision**, log it in `decisions.md`.

## Files

| File | Purpose |
|------|---------|
| `patterns.md` | Validated reusable implementation strategies |
| `problems.md` | Recurring problems and their confirmed solutions |
| `decisions.md` | Architectural decisions with rationale |
71 changes: 71 additions & 0 deletions .RALPH/decisions.md
@@ -0,0 +1,71 @@
# Architectural Decisions – moltworker

---

## ADR-001 – Config patcher runs unconditionally on every container boot

**Date**: 2026-03-03
**Status**: Accepted

**Context**: Should the Node.js config patcher in `start-openclaw.sh` run only when
no config exists (i.e. first boot), or unconditionally?

**Decision**: Unconditionally, after any R2 restore and before the gateway starts.

**Rationale**: Running it conditionally means that changing a Cloudflare secret requires
manually deleting the R2 config to force re-onboard. This is error-prone and was the
direct cause of PROB-001 and PROB-002 in production. Running it unconditionally means
`wrangler secret put` + `npm run deploy` is always sufficient to propagate new secret values.

**Trade-offs**: Startup adds a small overhead (~50 ms for the Node.js one-shot). Manual
in-container edits to patched fields (provider apiKey, channel tokens, gateway token) will
be overwritten on next restart. This is documented and acceptable.

---

## ADR-002 – Use rclone (not rsync or s3fs) for R2 persistence

**Date**: 2026-03-03
**Status**: Accepted

**Context**: The container needs to persist OpenClaw config and workspace to R2 across restarts.

**Decision**: rclone with `--fast-list --s3-no-check-bucket`, not rsync or s3fs mount.

**Rationale**: R2 does not support setting file timestamps. `rsync -a` (which preserves
timestamps) fails with I/O errors against R2 (PROB-004). rclone works correctly with R2
by default and does not attempt to set timestamps.

---

## ADR-003 – CF AI Gateway requires `CF_AI_GATEWAY_MODEL` to be explicitly set

**Date**: 2026-03-03
**Status**: Accepted

**Context**: Should the config patcher try to infer the model from other config,
or require an explicit `CF_AI_GATEWAY_MODEL` env var?

**Decision**: Require explicit `CF_AI_GATEWAY_MODEL` (format: `{provider}/{model}`).

**Rationale**: Inferring the model is ambiguous and error-prone. An explicit var makes
the configuration unambiguous, testable, and easy to change without touching code.
The format `{provider}/{model}` allows the patcher to construct the correct gateway base URL
and set the correct `api` mode (`anthropic-messages` vs `openai-completions`).

---

## ADR-004 – `MOLTBOT_GATEWAY_TOKEN` is mapped to `OPENCLAW_GATEWAY_TOKEN` in the container

**Date**: 2026-03-03
**Status**: Accepted

**Context**: The Worker-facing secret is named `MOLTBOT_GATEWAY_TOKEN` (worker-level
naming convention). The OpenClaw container expects `OPENCLAW_GATEWAY_TOKEN`.

**Decision**: `buildEnvVars()` maps `MOLTBOT_GATEWAY_TOKEN` → `OPENCLAW_GATEWAY_TOKEN`.
The `start-openclaw.sh` script reads `OPENCLAW_GATEWAY_TOKEN` internally.

**Rationale**: Keeps the Worker env namespace decoupled from the container's internal
naming. If OpenClaw is ever replaced, only `buildEnvVars()` needs to change, not the
Worker-facing secret name.
121 changes: 121 additions & 0 deletions .RALPH/patterns.md
@@ -0,0 +1,121 @@
# Validated Patterns – moltworker

---

## P-001 – Inline config patcher (always runs on every container boot)

**Date**: 2026-03-03
**Context**: OpenClaw reads provider config from `~/.openclaw/openclaw.json`. Secrets live
in Cloudflare Worker env and must reach the container. The container may have a persisted
config from R2 which must not be fully overwritten.

**Approach**: In `start-openclaw.sh`, after the R2 restore, run an inline Node.js heredoc
that reads the existing config, writes/overrides only the sections it owns (provider entry,
gateway auth, channels), and writes it back. This runs **unconditionally** — not just on
first boot.

**Location**: `start-openclaw.sh` lines 141–265
**Result**: ✅ Validated. Fixes stale R2 config issues (PROB-002). Ensures new secrets
take effect on next container restart after redeploy.

**Caveats**:
- Patcher must not write fields that fail OpenClaw's strict config validation (PROB-006).
- Patcher must be idempotent (running twice produces the same output).
- Test: run `openclaw status` after patching; non-zero exit = bad config.
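
The "owned sections only" and idempotency requirements can be illustrated with a minimal sketch. It assumes the config is plain JSON and uses only the two paths named in this library (`models.providers` and `agents.defaults.model.primary`); the function name `patchConfig`, the `workspace` key, and the sample values are hypothetical and are not taken from `start-openclaw.sh`.

```typescript
type Json = Record<string, unknown>;

// Treat anything that is not a plain object as an empty object.
function asObject(value: unknown): Json {
  return typeof value === "object" && value !== null && !Array.isArray(value)
    ? (value as Json)
    : {};
}

// Hypothetical helper: returns a new config in which only the owned
// sections (one provider entry and the primary model) are replaced.
function patchConfig(
  existing: Json,
  providerName: string,
  providerEntry: Json,
  primary: string,
): Json {
  const models = asObject(existing.models);
  const providers = asObject(models.providers);
  const agents = asObject(existing.agents);
  const defaults = asObject(agents.defaults);
  const model = asObject(defaults.model);

  return {
    ...existing, // everything restored from R2 is preserved
    models: { ...models, providers: { ...providers, [providerName]: providerEntry } },
    agents: { ...agents, defaults: { ...defaults, model: { ...model, primary } } },
  };
}

// Illustrative data only; real configs contain more fields.
const restored: Json = {
  workspace: { dir: "/data" },
  models: { providers: { old: { apiKey: "stale" } } },
};
const entry: Json = {
  baseUrl: "https://example.invalid",
  apiKey: "k",
  api: "openai-completions",
  models: [{ id: "some-model" }],
};
const once = patchConfig(restored, "cf-ai-gw-openai", entry, "cf-ai-gw-openai/some-model");
const twice = patchConfig(once, "cf-ai-gw-openai", entry, "cf-ai-gw-openai/some-model");
```

Applying the patch to its own output leaves the config unchanged, which is the property the idempotency caveat asks for.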

---

## P-002 – CF AI Gateway provider injection via config patcher

**Date**: 2026-03-03
**Context**: Using Cloudflare AI Gateway as the model provider. Requires building a provider
entry in `openclaw.json` with a `baseUrl`, `apiKey`, and `models` array.

**Approach**: In the patcher, detect `CF_AI_GATEWAY_MODEL` (format: `{provider}/{model}`).
Extract the provider prefix and model ID. Build the base URL:
```
https://gateway.ai.cloudflare.com/v1/{CF_AI_GATEWAY_ACCOUNT_ID}/{CF_AI_GATEWAY_GATEWAY_ID}/{provider}
```
Write a provider entry named `cf-ai-gw-{provider}` with:
- `baseUrl`: gateway URL
- `apiKey`: value of `CLOUDFLARE_AI_GATEWAY_API_KEY`
- `api`: `"anthropic-messages"` for Anthropic provider, `"openai-completions"` otherwise
- `models`: array with the single specified model

Set `agents.defaults.model.primary` to `cf-ai-gw-{provider}/{modelId}`.

**Location**: `start-openclaw.sh` lines 183–219
**Result**: ✅ Validated. This is the working path for CF AI Gateway models.

**Caveats**:
- For `workers-ai` provider, append `/v1` to the base URL.
- All four env vars must be set together: `CLOUDFLARE_AI_GATEWAY_API_KEY`,
`CF_AI_GATEWAY_ACCOUNT_ID`, `CF_AI_GATEWAY_GATEWAY_ID`, `CF_AI_GATEWAY_MODEL`.
- `apiKey` must be non-empty — do not write an empty string.
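
A minimal sketch of the provider construction described above, under stated assumptions: the function name `buildGatewayProvider` is hypothetical, the model string is split at the first `/`, the Anthropic provider prefix is assumed to be `anthropic`, and the shape of each `models` element (`{ id }`) is a guess rather than OpenClaw's documented schema.

```typescript
interface GatewayEnv {
  CLOUDFLARE_AI_GATEWAY_API_KEY?: string;
  CF_AI_GATEWAY_ACCOUNT_ID?: string;
  CF_AI_GATEWAY_GATEWAY_ID?: string;
  CF_AI_GATEWAY_MODEL?: string;
}

interface ProviderPatch {
  providerName: string;
  primary: string;
  entry: { baseUrl: string; apiKey: string; api: string; models: { id: string }[] };
}

// Returns null unless all four variables are set and the model is well-formed,
// so an entry with an empty apiKey is never produced.
function buildGatewayProvider(env: GatewayEnv): ProviderPatch | null {
  const apiKey = env.CLOUDFLARE_AI_GATEWAY_API_KEY;
  const accountId = env.CF_AI_GATEWAY_ACCOUNT_ID;
  const gatewayId = env.CF_AI_GATEWAY_GATEWAY_ID;
  const model = env.CF_AI_GATEWAY_MODEL;
  if (!apiKey || !accountId || !gatewayId || !model) return null;

  // Assumption: split at the first slash; the model ID may itself contain slashes.
  const slash = model.indexOf("/");
  if (slash <= 0 || slash === model.length - 1) return null;
  const provider = model.slice(0, slash);
  const modelId = model.slice(slash + 1);

  let baseUrl = `https://gateway.ai.cloudflare.com/v1/${accountId}/${gatewayId}/${provider}`;
  if (provider === "workers-ai") baseUrl += "/v1"; // caveat from this pattern

  const api = provider === "anthropic" ? "anthropic-messages" : "openai-completions";
  const providerName = `cf-ai-gw-${provider}`;
  return {
    providerName,
    primary: `${providerName}/${modelId}`,
    entry: { baseUrl, apiKey, api, models: [{ id: modelId }] }, // element shape is assumed
  };
}
```

Returning `null` instead of a partial entry keeps the "all four env vars together" rule in one place.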

---

## P-003 – Worker WebSocket proxy with token injection

**Date**: 2026-03-03
**Context**: Cloudflare Workers proxy WebSocket connections to Sandbox containers.
CF Access redirects strip query parameters, losing the `?token=` needed by the gateway.

**Approach**: In the WS proxy handler (`src/index.ts`):
1. Check if `MOLTBOT_GATEWAY_TOKEN` is set and URL lacks `?token=`.
2. If so, clone the URL and inject the token as `?token={value}`.
3. Use the modified URL for `sandbox.wsConnect()`.
4. Create a `WebSocketPair`, accept both ends, wire `message`/`close`/`error` relays.
5. Return `new Response(null, { status: 101, webSocket: clientWs })`.

**Location**: `src/index.ts` lines 283–429
**Result**: ✅ Validated. Fixes PROB-005.

**Caveats**:
- WS close reasons must be ≤ 123 bytes (WebSocket spec); truncate if longer.
- `containerWs` may be null if container not ready; handle gracefully.
- Error messages from the gateway can be transformed before relaying to the client.
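
Two parts of this pattern are pure functions and easy to sketch in isolation: the token injection (steps 1 and 2) and the close-reason truncation from the first caveat. The helper names `withGatewayToken` and `truncateCloseReason` are hypothetical and are not the names used in `src/index.ts`.

```typescript
// Adds ?token=<value> only when a token is configured and the URL has none.
function withGatewayToken(requestUrl: string, token: string | undefined): string {
  const url = new URL(requestUrl);
  if (token && !url.searchParams.has("token")) {
    url.searchParams.set("token", token);
  }
  return url.toString();
}

// Truncates a close reason to at most maxBytes of UTF-8 without splitting
// a multi-byte character.
function truncateCloseReason(reason: string, maxBytes = 123): string {
  const encoder = new TextEncoder();
  if (encoder.encode(reason).length <= maxBytes) return reason;

  let result = "";
  let used = 0;
  for (const ch of reason) { // iterates by code point, not UTF-16 unit
    const size = encoder.encode(ch).length;
    if (used + size > maxBytes) break;
    result += ch;
    used += size;
  }
  return result;
}
```

Counting bytes rather than characters matters because the 123-byte limit applies to the encoded reason, so a reason made of non-ASCII characters hits the limit sooner.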

---

## P-004 – rclone for R2 config sync (not rsync)

**Date**: 2026-03-03
**Context**: Container config and workspace must persist across restarts via R2.

**Approach**: Use `rclone` (not `rsync`) with these flags:
```bash
rclone sync "$LOCAL_DIR/" "r2:${R2_BUCKET}/{prefix}/" \
--transfers=16 --fast-list --s3-no-check-bucket \
--exclude='*.lock' --exclude='*.log' --exclude='*.tmp' --exclude='.git/**'
```
Background sync loop checks for changed files every 30 s via `find -newer {marker}`.

**Location**: `start-openclaw.sh` lines 270–310
**Result**: ✅ Validated. Avoids PROB-004 (timestamp errors on R2).

**Caveats**:
- Never use `rsync -a` or `rsync --times` against R2.
- Update the marker file (`touch "$MARKER"`) after each sync, not before.
- The sync loop runs in background (`&`); do not wait for it before starting gateway.

---

## P-005 – `buildEnvVars()` — Worker env → container env mapping

**Date**: 2026-03-03
**Context**: Worker secrets must be forwarded to the container as process env vars.

**Approach**: A dedicated `buildEnvVars(env: MoltbotEnv): Record<string, string>` function
in `src/gateway/env.ts` handles all mapping logic:
- Conditionally includes only vars that are set (no empty strings).
- Handles provider priority: CF AI Gateway > Anthropic (with legacy AI Gateway as override).
- Maps `MOLTBOT_GATEWAY_TOKEN` → `OPENCLAW_GATEWAY_TOKEN` (container-internal name).

**Location**: `src/gateway/env.ts`
**Result**: ✅ Validated. Well-tested (see `src/gateway/env.test.ts`).

**Caveats**:
- Never log secret values from `buildEnvVars()` output. Log `Object.keys(envVars)` only.
- Legacy AI Gateway path (`AI_GATEWAY_API_KEY` + `AI_GATEWAY_BASE_URL`) overrides direct
Anthropic key when both are set — this is intentional but can be surprising.
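
The conditional-inclusion and renaming rules can be sketched as follows. This is not the code in `src/gateway/env.ts`: the function name `sketchBuildEnvVars` and the reduced `SketchEnv` type are hypothetical, and the provider-priority logic is deliberately left out because its details are not described here.

```typescript
// Hypothetical subset of the Worker env, for illustration only.
interface SketchEnv {
  MOLTBOT_GATEWAY_TOKEN?: string;
  ANTHROPIC_API_KEY?: string;
  CF_AI_GATEWAY_MODEL?: string;
}

function sketchBuildEnvVars(env: SketchEnv): Record<string, string> {
  const out: Record<string, string> = {};
  // Only forward variables that are set; never write empty strings.
  const put = (name: string, value: string | undefined): void => {
    if (value) out[name] = value;
  };

  put("ANTHROPIC_API_KEY", env.ANTHROPIC_API_KEY);
  put("CF_AI_GATEWAY_MODEL", env.CF_AI_GATEWAY_MODEL);
  // Worker-facing name is mapped to the container-internal name.
  put("OPENCLAW_GATEWAY_TOKEN", env.MOLTBOT_GATEWAY_TOKEN);
  return out;
}

const envVars = sketchBuildEnvVars({
  MOLTBOT_GATEWAY_TOKEN: "dev-token",
  ANTHROPIC_API_KEY: "",
});
// Log names only, never values.
console.log("Container env keys:", Object.keys(envVars));
```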
98 changes: 98 additions & 0 deletions .RALPH/problems.md
@@ -0,0 +1,98 @@
# Recurring Problems – moltworker

---

## PROB-001 – `"x-api-key header is required"` on model calls

**Date**: 2026-03-03
**Symptom**:
```json
{ "type": "error", "error": { "type": "authentication_error", "message": "x-api-key header is required" } }
```
**Root causes (ordered by likelihood)**:

1. **`CF_AI_GATEWAY_MODEL` not set** — Without this var, the inline Node.js config patcher
in `start-openclaw.sh` never creates the `cf-ai-gw-{provider}` provider entry with `apiKey`.
Fix: `wrangler secret put CF_AI_GATEWAY_MODEL` (format: `{provider}/{model}`) → redeploy.

2. **API key secret missing from deployed worker** — Key only exists in `.dev.vars`, not
set via `wrangler secret put`. Fix: `wrangler secret put ANTHROPIC_API_KEY` → redeploy.

3. **Stale R2 config** — First deploy ran with no key; a keyless provider entry was written to R2.
Subsequent boots skip `openclaw onboard` and load the stale config. The inline Node patcher
(which always runs) should overwrite this — if it doesn't, check that `CF_AI_GATEWAY_MODEL`
is set so the patcher block is triggered.

4. **Two provider entries — agent using the keyless one** — Config has both the stale keyless
`cloudflare-ai-gateway` provider AND the correctly keyed `cf-ai-gw-anthropic` provider,
but `agents.defaults.model.primary` points to the keyless one. Fix: verify
`/debug/container-config` and ensure `agents.defaults.model.primary` matches the entry
with a non-empty `apiKey`.

5. **Deploy cancelled (Ctrl-C)** — Secret was set but deploy never completed. Old worker
version is still running. Fix: run `npm run deploy` again and let it complete.

**Verification**: The `/_admin/` UI does not show this. Request `/debug/container-config` and inspect
`models.providers.{name}.apiKey` — must be non-empty.
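
Root cause 4 and the verification step can be checked mechanically. The sketch below assumes the JSON returned by `/debug/container-config` has the same shape as `openclaw.json` and that the provider name is the part of `primary` before the first `/`; the function name `checkPrimaryProvider` is hypothetical.

```typescript
// Only the fields needed for the check are modelled here.
interface ContainerConfig {
  models?: { providers?: Record<string, { apiKey?: string }> };
  agents?: { defaults?: { model?: { primary?: string } } };
}

// Returns "ok" or a short description of what is wrong.
function checkPrimaryProvider(config: ContainerConfig): string {
  const primary = config.agents?.defaults?.model?.primary;
  if (!primary) return "agents.defaults.model.primary is not set";

  const slash = primary.indexOf("/");
  const providerName = slash === -1 ? primary : primary.slice(0, slash);

  const provider = config.models?.providers?.[providerName];
  if (!provider) return `primary points to unknown provider "${providerName}"`;
  if (!provider.apiKey) return `provider "${providerName}" has an empty apiKey`;
  return "ok";
}
```

A result other than `"ok"` for a config that contains both a keyless and a keyed provider points at root cause 4.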

---

## PROB-002 – Stale R2 config not updated after adding new secrets

**Date**: 2026-03-03
**Symptom**: After setting new Cloudflare secrets and redeploying, the container behaves as
if the secrets are not there. `/debug/container-config` shows old values.
**Cause**: `start-openclaw.sh` only runs `openclaw onboard` if no config exists. R2-persisted
config survives redeploy. Onboard is skipped; new secrets are never applied.
**Fix**: The inline Node patcher in `start-openclaw.sh` always runs and overwrites provider
entries from the current env. Ensure the patcher logic covers the field you changed.
If the patcher doesn't cover it, add it.

---

## PROB-003 – Deploy interrupted by Ctrl-C; new secrets not live

**Date**: 2026-03-03
**Symptom**: Secret added via `wrangler secret put` but issue persists after what looks like
a deploy. `wrangler tail` shows `Has ANTHROPIC_API_KEY: false`.
**Cause**: `npm run deploy` was interrupted. The old worker version is still serving.
`wrangler secret put` succeeds independently of deploy; the worker must be redeployed to
pick up the new secret.
**Fix**: `npm run deploy` — let it run to completion. Verify with `wrangler tail`.

---

## PROB-004 – `rsync` fails with "Input/output error" on R2

**Date**: 2026-03-03
**Symptom**: R2 sync exits non-zero with timestamp-related errors.
**Cause**: R2 does not support setting file timestamps. `rsync -a` preserves timestamps
and fails.
**Fix**: Use `rclone sync` with `--transfers=16 --fast-list --s3-no-check-bucket`.
Never use `rsync -a` or `rsync --times` against R2.

---

## PROB-005 – WebSocket drops immediately after CF Access redirect

**Date**: 2026-03-03
**Symptom**: User authenticates via CF Access and is redirected, but WebSocket connections
fail with code 1006 or 4001.
**Cause**: CF Access redirects strip query parameters. `?token=` is lost.
**Fix**: In `src/index.ts` WS proxy handler, inject the token server-side before calling
`sandbox.wsConnect()` — already implemented. Confirm `MOLTBOT_GATEWAY_TOKEN` is set as
a Worker secret.

---

## PROB-006 – OpenClaw config validation fails after manual edits or patcher bugs

**Date**: 2026-03-03
**Symptom**: Gateway fails to start; logs show config parsing/validation error from OpenClaw.
**Common causes**:
- `agents.defaults.model` set to a bare string instead of `{ "primary": "provider/model" }`.
- Provider entry missing `models` array or `api` field.
- Channel config containing stale keys from an old backup.
- Empty string written for `apiKey` (some OpenClaw versions reject this).

**Fix**: Use `/debug/container-config` to inspect the config. Fix `start-openclaw.sh`
patcher to not write the offending field, or write it correctly.
5 changes: 5 additions & 0 deletions .dev.vars.example
@@ -40,3 +40,8 @@ MOLTBOT_GATEWAY_TOKEN=dev-token-change-in-prod
# CDP (Chrome DevTools Protocol) configuration for browser automation
# CDP_SECRET=shared-secret-for-cdp-auth
# WORKER_URL=https://your-worker.example.com

# Trading bridge (optional)
# TRADING_ENABLED=true
# TRADE_BRIDGE_URL=https://trade-bridge.internal
# TRADE_BRIDGE_HMAC_SECRET=replace-with-shared-secret
1 change: 1 addition & 0 deletions Dockerfile
@@ -3,6 +3,7 @@ FROM docker.io/cloudflare/sandbox:0.7.0
# Install Node.js 22 (required by OpenClaw) and rclone (for R2 persistence)
# The base image has Node 20, we need to replace it with Node 22
# Using direct binary download for reliability

ENV NODE_VERSION=22.13.1
RUN ARCH="$(dpkg --print-architecture)" \
&& case "${ARCH}" in \
51 changes: 51 additions & 0 deletions README.md
@@ -181,6 +181,8 @@ https://your-worker.workers.dev/?token=YOUR_TOKEN
wss://your-worker.workers.dev/ws?token=YOUR_TOKEN
```



**Note:** Even with a valid token, new devices still require approval via the admin UI at `/_admin/` (see Device Pairing above).

For local development only, set `DEV_MODE=true` in `.dev.vars` to skip Cloudflare Access authentication and enable `allowInsecureAuth` (bypasses device pairing entirely).
@@ -438,6 +440,55 @@ The previous `AI_GATEWAY_API_KEY` + `AI_GATEWAY_BASE_URL` approach is still supp
| `SLACK_APP_TOKEN` | No | Slack app token |
| `CDP_SECRET` | No | Shared secret for CDP endpoint authentication (see [Browser Automation](#optional-browser-automation-cdp)) |
| `WORKER_URL` | No | Public URL of the worker (required for CDP) |
| `TRADING_ENABLED` | No | Set to `true` to enable admin trading controls that call trade-bridge |
| `TRADE_BRIDGE_URL` | No | Base URL for the trade-bridge service (e.g. private tunnel URL) |
| `TRADE_BRIDGE_HMAC_SECRET` | No | Shared HMAC secret used to sign outbound requests to trade-bridge |


## Trade Bridge Integration

`moltworker` never talks to exchange APIs directly. Instead, the admin routes call an external `trade-bridge` service that is responsible for risk checks and Freqtrade execution.

### Connection Flow

1. Operator calls a protected admin endpoint in this worker (Cloudflare Access auth already enforced for `/api/admin/*`).
2. Worker checks feature/config gates:
   - `TRADING_ENABLED` must be `true`
   - `TRADE_BRIDGE_URL` and `TRADE_BRIDGE_HMAC_SECRET` must be set
3. Worker signs the outbound request with HMAC-SHA256 using the canonical string:
   - `{timestamp}.{nonce}.{method}.{path}.{jsonBody}`
4. Worker sends the request to `TRADE_BRIDGE_URL` with these headers:
   - `X-Molt-Timestamp`
   - `X-Molt-Nonce`
   - `X-Molt-Signature`
   - `X-Molt-Skew-Ms`
5. `trade-bridge` validates signature + timestamp + nonce replay protection before executing anything.
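
The signing step can be sketched as below. Several details are assumptions, not part of the documented contract: the timestamp is taken as Unix epoch milliseconds, the nonce is a random UUID, the method is upper-case, and the signature is hex-encoded. `X-Molt-Skew-Ms` is left out because its exact meaning is not specified here. The sketch uses Node's `node:crypto` for brevity; in a Worker the same HMAC could be computed with Web Crypto (`crypto.subtle`). The function name `signBridgeRequest` is hypothetical.

```typescript
import { createHmac, randomUUID } from "node:crypto";

interface MoltSignatureHeaders {
  "X-Molt-Timestamp": string;
  "X-Molt-Nonce": string;
  "X-Molt-Signature": string;
}

function signBridgeRequest(
  secret: string,
  method: string,
  path: string,
  jsonBody: string,
  timestamp: string = Date.now().toString(), // assumption: epoch milliseconds
  nonce: string = randomUUID(), // assumption: any unique string is accepted
): MoltSignatureHeaders {
  // Canonical string as documented: {timestamp}.{nonce}.{method}.{path}.{jsonBody}
  const canonical = `${timestamp}.${nonce}.${method}.${path}.${jsonBody}`;
  const signature = createHmac("sha256", secret)
    .update(canonical)
    .digest("hex"); // assumption: hex encoding
  return {
    "X-Molt-Timestamp": timestamp,
    "X-Molt-Nonce": nonce,
    "X-Molt-Signature": signature,
  };
}

const signalBody = JSON.stringify({
  symbol: "TON/USDT",
  action: "buy",
  strategy: "manual-test",
  notional: 25,
});
const signalHeaders = signBridgeRequest("replace-with-shared-secret", "POST", "/signals", signalBody);
```

The body must be serialized once and the same string used for both signing and sending, since any difference in the bytes produces a different signature.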

### Admin API -> Trade Bridge API Mapping

| Moltworker endpoint | Bridge endpoint | Purpose |
|---|---|---|
| `POST /api/admin/trading/signal` | `POST /signals` | Submit a signed trading signal payload (for example `TON/USDT`). |
| `GET /api/admin/trading/status` | `GET /status` | Read bridge/trading mode and health status. |
| `POST /api/admin/trading/pause` | `POST /pause` | Pause new trade execution. |
| `POST /api/admin/trading/kill-switch` | `POST /kill-switch` | Trigger global emergency stop. |

### Example Signal Request

```json
{
  "symbol": "TON/USDT",
  "action": "buy",
  "strategy": "manual-test",
  "notional": 25
}
```

### Deployment Notes

- Keep `TRADE_BRIDGE_URL` private (Cloudflare Tunnel / WireGuard / private network).
- Keep `TRADE_BRIDGE_HMAC_SECRET` unique per environment (`local`, `staging`, `prod`).
- Leave `TRADING_ENABLED` unset or `false` by default; enable only where intended.

## Security Considerations
