Add import-mbox (supports HEY.com exports) with resume/checkpoints #103
Support importing email exports from MBOX files or .zip bundles, including HEY.com. Includes streaming MBOX reader, importer with resume checkpoints, and CLI docs/tests.
- Track absolute MBOX offsets after Seek() so checkpoints resume correctly.
- Start multi-file zip imports from the active checkpoint file to avoid discarding progress.
ext := strings.ToLower(filepath.Ext(abs))
switch ext {
case ".mbox", ".mbx":
	return []string{abs}, nil
🚨 Path traversal via unvalidated zip extraction (high severity)
extractMboxFromZip uses filepath.Base() to flatten paths but does not validate the result against directory traversal sequences before joining with destDir. A malicious zip could contain entries with names like '..%2f..%2fetc%2fpasswd' that survive filepath.Base() encoding. Use filepath.Clean() and check that the result does not escape destDir via filepath.Rel() or strings.HasPrefix() validation after cleaning.
Automated security review by Claude 4.5 Sonnet - Human review still required
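The check this comment asks for can be sketched as follows. This is a minimal illustration, not the PR's `extractMboxFromZip`: the helper name `safeExtractPath` and the `imports` directory are hypothetical. It flattens the entry to its base name, rejects names that carry no file component, and then confirms with `filepath.Rel` that the joined path stays inside the destination. After flattening, the `Rel` check is defense in depth rather than the primary guard.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// safeExtractPath (hypothetical helper) maps a zip entry name to a path
// directly inside destDir, or returns an error if no safe name can be derived.
func safeExtractPath(destDir, entryName string) (string, error) {
	// Zip names should use "/", but a crafted archive may use "\";
	// normalise first so Base() behaves the same on every OS.
	normalized := strings.ReplaceAll(entryName, `\`, "/")
	name := filepath.Base(filepath.FromSlash(normalized))
	if name == "." || name == ".." || name == string(filepath.Separator) {
		return "", fmt.Errorf("zip entry %q has no usable file name", entryName)
	}

	full := filepath.Join(destDir, name)

	// Defense in depth: the joined path must not climb out of destDir.
	rel, err := filepath.Rel(destDir, full)
	if err != nil {
		return "", fmt.Errorf("zip entry %q: %w", entryName, err)
	}
	if rel == ".." || strings.HasPrefix(rel, ".."+string(filepath.Separator)) {
		return "", fmt.Errorf("zip entry %q escapes %q", entryName, destDir)
	}
	return full, nil
}

func main() {
	for _, name := range []string{"export/inbox.mbox", "../../etc/passwd.mbox", ".."} {
		p, err := safeExtractPath("imports", name)
		fmt.Printf("entry=%q path=%q err=%v\n", name, p, err)
	}
}
```

Flattening means two entries in different directories can map to the same output name, so a real extractor also needs collision handling.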
if err == nil && len(files) > 0 {
	return files, nil
}
// Sentinel exists but no files found; fall through to re-extract.
extractMboxFromZip does not validate the size of individual zip entries or the total extracted size. A malicious zip could exhaust disk space or cause denial of service. Add limits on individual file size (e.g., check zf.UncompressedSize64) and cumulative extracted bytes before writing.
Automated security review by Claude 4.5 Sonnet - Human review still required
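A per-entry size cap of the kind suggested here might look like the sketch below. The name `copyWithLimit` appears later in this thread, but this body is an assumption about its shape, not the PR's code. `errTooLarge` and `copyStringWithLimit` are hypothetical. It copies at most `limit` bytes, then probes for one more byte to tell "exactly at the limit" from "over the limit", which is why it may consume one extra byte from the source.

```go
package main

import (
	"errors"
	"fmt"
	"io"
	"strings"
)

// errTooLarge is a hypothetical sentinel for entries over the cap.
var errTooLarge = errors.New("zip entry exceeds size limit")

// copyWithLimit copies up to limit bytes from src to dst. If src holds more
// than limit bytes it returns errTooLarge after writing exactly limit bytes.
func copyWithLimit(dst io.Writer, src io.Reader, limit int64) (int64, error) {
	n, err := io.CopyN(dst, src, limit)
	if err == io.EOF {
		return n, nil // src ended before reaching the limit
	}
	if err != nil {
		return n, err
	}
	// Exactly limit bytes were copied; probe one byte to detect overflow.
	var probe [1]byte
	if _, err := io.ReadFull(src, probe[:]); err == nil {
		return n, errTooLarge
	} else if err != io.EOF {
		return n, err
	}
	return n, nil
}

// copyStringWithLimit is a small in-memory wrapper for demonstration.
func copyStringWithLimit(s string, limit int64) (string, error) {
	var out strings.Builder
	_, err := copyWithLimit(&out, strings.NewReader(s), limit)
	return out.String(), err
}

func main() {
	for _, limit := range []int64{4, 5, 6} {
		got, err := copyStringWithLimit("hello", limit)
		fmt.Printf("limit=%d copied=%q err=%v\n", limit, got, err)
	}
}
```

A cumulative cap across all entries can reuse the same function by passing the remaining budget as `limit` for each entry.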
internal/importer/mbox_import.go
Outdated
func storeAttachment(st *store.Store, attachmentsDir string, messageID int64, att *mime.Attachment) error {
	if attachmentsDir == "" || len(att.Content) == 0 || att.ContentHash == "" {
		return nil
	}
storeAttachment creates parent directories with os.MkdirAll(..., 0755), making them world-readable. Attachment content is sensitive (documents, images from 20+ years of email). Use fileutil.SecureMkdirAll(..., 0700) instead to match OAuth token directory permissions and prevent local multi-user disclosure.
Automated security review by Claude 4.5 Sonnet - Human review still required
	continue
}
name := filepath.Base(zf.Name)
ext := strings.ToLower(filepath.Ext(name))
🚨 Path traversal in zip extraction via malicious entry names (high severity)
The code flattens zip entry names to base name but still uses zf.Name in the collision hash, allowing an attacker to craft a zip with entries like '../../../.ssh/authorized_keys.mbox' that would be extracted to the intended directory but could cause issues. While filepath.Join prevents traversal above destDir, the logic should validate that zf.Name contains no path separators before processing. Use filepath.Base(zf.Name) consistently or reject entries with directory components.
Automated security review by Claude 4.5 Sonnet - Human review still required
internal/importer/mbox_import.go
Outdated
storagePath := filepath.Join(subdir, att.ContentHash)
fullPath := filepath.Join(attachmentsDir, storagePath)

if err := os.MkdirAll(filepath.Dir(fullPath), 0755); err != nil {
The storeAttachment function uses att.ContentHash[:2] as a subdirectory without validating that ContentHash is non-empty or has sufficient length, though line 386 checks len>0. However, if ContentHash is exactly 1 character (malformed), this would panic. Add explicit length validation: if len(att.ContentHash) < 2, return an error before using it for path construction.
Automated security review by Claude 4.5 Sonnet - Human review still required
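One way to make the `ContentHash[:2]` slice safe is to validate the whole digest before building the path. The sketch below assumes the hash is a hex-encoded SHA-256 (64 characters) and uses a hypothetical helper name, `attachmentStoragePath`. It is an illustration of the check, not the PR's `storeAttachment`.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// attachmentStoragePath (hypothetical) validates a hex SHA-256 digest and
// returns a sharded relative path of the form "ab/abcdef...".
func attachmentStoragePath(contentHash string) (string, error) {
	h := strings.ToLower(contentHash)
	if len(h) != 64 {
		return "", fmt.Errorf("content hash has length %d, want 64", len(h))
	}
	for _, c := range h {
		isDigit := c >= '0' && c <= '9'
		isHexLetter := c >= 'a' && c <= 'f'
		if !isDigit && !isHexLetter {
			return "", fmt.Errorf("content hash contains non-hex character %q", c)
		}
	}
	// The length check above guarantees h[:2] cannot panic.
	return path.Join(h[:2], h), nil
}

func main() {
	for _, h := range []string{
		"e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855",
		"a",
		"../../etc/passwd",
	} {
		p, err := attachmentStoragePath(h)
		fmt.Printf("hash=%q path=%q err=%v\n", h, p, err)
	}
}
```

Restricting the hash to hex characters also rules out path separators, so the digest cannot steer the write outside the attachments directory.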
Addressed the automated security review items:
All checks re-run: `go test -tags fts5 ./...`
Follow-up security hardening pushed:
Re-ran: `make test`, `go vet ./...`, `make lint`.
Addressed the latest automated note about mutable globals for zip limits:
Re-ran: `make test`, `go vet ./...`, `make lint`.
I'll look at this more closely tomorrow, but thank you for working on this! Highly recommend using https://www.roborev.io/ to help you develop faster
great tip, doing it right now :)
- Validate From separator lines by parsing the date suffix to avoid incorrectly splitting on unescaped "From " lines in message bodies
- Normalize checkpoint file paths to absolute before comparison to handle relative/absolute mismatches during resume
- Ensure extracted zip files use absolute destination paths
- Compute attachment content hash when missing instead of silently skipping storage
- Set HasAttachments/AttachmentCount to false/0 when attachments are disabled
- Improve e2e test isolation by running through full CLI invocation
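Validating a separator by its date suffix could be sketched as below. The function name `isFromSeparator` and the list of layouts are assumptions for illustration. The actual reader in `internal/mbox` may accept a different set of date variants. A line counts as a separator only if the tokens after the envelope sender parse as a date.

```go
package main

import (
	"fmt"
	"strings"
	"time"
)

// fromDateLayouts lists date shapes this sketch accepts (an assumed set).
var fromDateLayouts = []string{
	"Mon Jan 2 15:04:05 2006",       // asctime style
	"Mon Jan 2 15:04:05 MST 2006",   // named timezone
	"Mon Jan 2 15:04:05 -0700 2006", // numeric offset
}

// isFromSeparator reports whether line looks like "From <sender> <date>".
func isFromSeparator(line string) bool {
	if !strings.HasPrefix(line, "From ") {
		return false
	}
	// Fields collapses runs of spaces, so "Jan  2" and "Jan 2" both work.
	fields := strings.Fields(line[len("From "):])
	// fields[0] is the envelope sender; the date needs 5 or 6 tokens.
	for n := 5; n <= 6; n++ {
		if len(fields) < 1+n {
			continue
		}
		date := strings.Join(fields[1:1+n], " ")
		for _, layout := range fromDateLayouts {
			if _, err := time.Parse(layout, date); err == nil {
				return true
			}
		}
	}
	return false
}

func main() {
	for _, line := range []string{
		"From alice@example.com Thu Jan  1 00:00:00 2015",
		"From alice@example.com Thu Jan 1 00:00:00 PST 2015",
		"From the desk of Bob, with regards",
	} {
		fmt.Printf("%-55q separator=%v\n", line, isFromSeparator(line))
	}
}
```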
Changes:
- Fix `internal/mbox` reader to handle long lines (`bufio.ErrBufferFull`) by accumulating partial reads with a max-line cap.
- Accept `"From "` separators with named timezones (adds `"Mon Jan 2 15:04:05 MST 2006"`), and mirror this in importer date parsing.
- Route `import-mbox` output through Cobra’s configured stdout/stderr, and warn if the ZIP extraction sentinel can’t be written.
- Update importer to correct `has_attachments`/`attachment_count` based on attachments actually stored.
- Add tests for long-line MBOX parsing and named-timezone separators (plus importer date parsing).
Proof:
- `GOCACHE=/tmp/go-build go build ./...`
- `GOCACHE=/tmp/go-build go test ./...`
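The long-line handling described in this commit can be illustrated with `bufio.Reader.ReadSlice`, which returns `bufio.ErrBufferFull` when a line does not fit in the buffer. The sketch below is an assumed shape, not the code in `internal/mbox`. `readLongLine`, `firstLine`, and `errLineTooLong` are hypothetical names.

```go
package main

import (
	"bufio"
	"errors"
	"fmt"
	"strings"
)

// errLineTooLong is a hypothetical sentinel for lines over the cap.
var errLineTooLong = errors.New("mbox line exceeds maximum length")

// readLongLine reads one line, including its trailing '\n' if present,
// stitching together partial reads when the line overflows the buffer.
func readLongLine(r *bufio.Reader, maxLen int) ([]byte, error) {
	var line []byte
	for {
		chunk, err := r.ReadSlice('\n')
		// chunk aliases the reader's buffer, so copy it before reading again.
		line = append(line, chunk...)
		if len(line) > maxLen {
			return nil, errLineTooLong
		}
		if errors.Is(err, bufio.ErrBufferFull) {
			continue // more of the same line is still unread
		}
		// err is nil for a complete line, or io.EOF at end of input
		// (possibly with a final unterminated line in line).
		return line, err
	}
}

// firstLine is an in-memory wrapper used for demonstration.
func firstLine(s string, bufSize, maxLen int) (string, error) {
	r := bufio.NewReaderSize(strings.NewReader(s), bufSize)
	line, err := readLongLine(r, maxLen)
	return string(line), err
}

func main() {
	input := strings.Repeat("a", 50) + "\nnext\n"
	line, err := firstLine(input, 16, 1024)
	fmt.Printf("len=%d err=%v\n", len(line), err)
	_, err = firstLine(input, 16, 20)
	fmt.Printf("capped err=%v\n", err)
}
```

The cap matters because an input with no newlines would otherwise grow the accumulated slice without bound.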
Changes:
- Prevent attachment DB upserts when attachment file `Stat` fails (importer + sync) to avoid records pointing at unwritten files
- Ensure zip extraction collision disambiguation loops until the output name is unique to avoid clobbering on crafted exports
- Batch MBOX import existence checks to avoid per-message `MessageExistsBatch(...[]{id})` DB round trips
- Clarify that Unix `fileutil.Secure*` helpers are best-effort wrappers (no symlink/TOCTOU hardening)
- Add tests for crafted zip name collisions, attachment stat errors, and resume across multi-mbox zip exports
Build/tests:
- `GOCACHE=/tmp/go-build go build ./...`
- `GOCACHE=/tmp/go-build go test ./...`
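The "loop until the output name is unique" rule from the list above could be sketched like this. The `_N` suffix scheme, the case-folded key, and the name `uniqueName` are assumptions for illustration. The PR's extractor may derive its disambiguator differently. The point of the loop is that a crafted archive can already contain the first candidate name, so a single rename attempt is not enough.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// uniqueName returns name unchanged if it is unused, otherwise appends
// _1, _2, ... before the extension until no case-folded collision remains.
// The chosen name is recorded in used.
func uniqueName(used map[string]bool, name string) string {
	ext := filepath.Ext(name)
	base := strings.TrimSuffix(name, ext)
	candidate := name
	// Keys are lower-cased so "Inbox.mbox" and "inbox.mbox" collide, as
	// they would on a case-insensitive filesystem.
	for i := 1; used[strings.ToLower(candidate)]; i++ {
		candidate = fmt.Sprintf("%s_%d%s", base, i, ext)
	}
	used[strings.ToLower(candidate)] = true
	return candidate
}

func main() {
	used := map[string]bool{}
	for _, n := range []string{"Inbox.mbox", "inbox.mbox", "inbox_1.mbox", "INBOX.mbox"} {
		fmt.Printf("%s -> %s\n", n, uniqueName(used, n))
	}
}
```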
Changes:
- Validate MBOX inputs up front and fail the sync run on non-mbox files (`internal/importer/mbox_import.go`).
- Mark `sync_runs` as `failed` (not `completed`) when an import finishes with `errors_count > 0` (`internal/importer/mbox_import.go`).
- Print `Import complete (with errors).` when any errors occurred (`cmd/msgvault/cmd/import_mbox.go`).
- Dedupe per-flush pending messages by `SourceMsg` to avoid over-counting `MessagesAdded` for duplicates in the same batch (`internal/importer/mbox_import.go`).
- Add best-effort symlink checks for zip extraction dirs and attachment paths (`cmd/msgvault/cmd/import_mbox.go`, `internal/importer/mbox_import.go`, `internal/sync/sync.go`) plus targeted tests (`internal/importer/mbox_import_test.go`, `cmd/msgvault/cmd/import_mbox_test.go`).
Verification:
- `GOCACHE=/tmp/go-build go build ./...`
- `GOCACHE=/tmp/go-build go test ./...`
Roborev:
Changes:
- Prevent permanent “skips” after partial ingest by skipping only when `message_raw` exists (MBOX importer + Gmail sync).
- Fix `UpsertMessage` to always return the correct message ID after upserts (don’t rely on `LastInsertId` when an upsert resolves to UPDATE).
- `import-mbox`: reject non-regular input paths and sanitize extracted ZIP entry filenames for Windows.
- `import-mbox`: print `Imported (partial)` for the last file when interrupted mid-file.
- Centralize attachment content-hash validation on `export.ValidateContentHash` and avoid nested `append` chains when building address lists.
- Add tests for “partial ingest then rerun repairs raw data”, Windows-invalid ZIP names, and non-regular export path rejection.
- Verified `GOCACHE=/tmp/go-build go build ./...` and `GOCACHE=/tmp/go-build go test ./...`.
Changes:
- Harden zip extraction to avoid writing into pre-populated extraction dirs and to create extracted files exclusively (no clobber/write-through-symlink).
- Reject symlinks/special files when reusing cached zip extractions (via `.done`).
- Enforce a max per-message size in the MBOX reader and use it during import to avoid unbounded memory growth.
- Log a warning (and count an error) if the initial checkpoint save fails.
Verification:
- `GOCACHE=/tmp/go-build go build ./...`
- `GOCACHE=/tmp/go-build go test ./...`
Proof:
- Zip fixes: `cmd/msgvault/cmd/import_mbox.go:320`, `cmd/msgvault/cmd/import_mbox.go:393`, `cmd/msgvault/cmd/import_mbox.go:537`
Changes:
- ImportMbox: on existence-check errors, attempt ingest instead of silently skipping messages.
- ImportMbox: stop advancing (and saving) checkpoints past ingest failures so a resumed run retries the failed message.
- Zip extraction: handle name collisions case-insensitively on macOS/Windows and fix disambiguation base/ext handling.
- Attachments: allow symlinked attachments directories by resolving the symlink target (still reject symlinked subdirs/files).
Proof:
- `GOCACHE=/tmp/go-build go build ./...`
- `GOCACHE=/tmp/go-build go test ./...`
Changes:
- Fix incremental label updates/removals when a message exists but `message_raw` is missing by re-ingesting raw before applying labels.
- Prevent `import-mbox` checkpoint offsets from freezing after the first ingest error (resume offsets continue advancing) and log failed `source_msg`/offset.
- Add a regression test for label removal behavior when raw is missing.
Changes:
- Normalize saved MBOX checkpoint paths and match resume files via `os.SameFile` (importer + `import-mbox` CLI).
- Make MBOX `source_message_id` include an offset disambiguator (raw SHA-256 still used for thread fallback).
- Eliminate attachment write TOCTOU by using atomic `O_EXCL` creates (importer + sync), validating existing files are regular.
- Update MBOX import tests to cover symlink/realpath resume and the new `source_message_id` behavior.
- Document attachment permission expectations in `SECURITY.md`.
- Verified: `GOCACHE=/tmp/go-build go build ./...` and `GOCACHE=/tmp/go-build go test ./...`.
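The `O_EXCL` technique mentioned in this commit can be shown in a few lines. The sketch below is an illustration under assumed names (`writeExclusive`, `demoExclusive`), not the importer's code. Opening with `O_CREATE|O_EXCL` makes "does it exist?" and "create it" a single atomic step, so there is no window between a check and a write. If the file already exists, the sketch confirms via `Lstat` that it is a regular file rather than a symlink or special file.

```go
package main

import (
	"errors"
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
)

// writeExclusive creates path with data only if nothing exists there.
// It returns created=false, err=nil when a regular file is already present.
func writeExclusive(path string, data []byte) (bool, error) {
	f, err := os.OpenFile(path, os.O_WRONLY|os.O_CREATE|os.O_EXCL, 0o600)
	if errors.Is(err, fs.ErrExist) {
		// Lstat does not follow symlinks, so a symlink is reported as such.
		info, statErr := os.Lstat(path)
		if statErr != nil {
			return false, statErr
		}
		if !info.Mode().IsRegular() {
			return false, fmt.Errorf("%s exists but is not a regular file", path)
		}
		return false, nil
	}
	if err != nil {
		return false, err
	}
	if _, err := f.Write(data); err != nil {
		f.Close()
		os.Remove(path) // do not leave a partial file behind
		return false, err
	}
	return true, f.Close()
}

// demoExclusive writes the same path twice in a temp dir and reports
// whether each call created the file.
func demoExclusive() (bool, bool, error) {
	dir, err := os.MkdirTemp("", "attachments-*")
	if err != nil {
		return false, false, err
	}
	defer os.RemoveAll(dir)
	p := filepath.Join(dir, "blob")
	first, err := writeExclusive(p, []byte("payload"))
	if err != nil {
		return false, false, err
	}
	second, err := writeExclusive(p, []byte("payload"))
	return first, second, err
}

func main() {
	first, second, err := demoExclusive()
	fmt.Println(first, second, err)
}
```

Finding an existing regular file does not prove its contents are correct, so a content-addressed store would still need to verify it before reuse.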
Changes:
- Prevent `import-mbox` checkpoints from advancing past a failed ingest so resume won’t skip failed messages
- Count attachment storage failures in import error totals/checkpoints (without aborting message ingest)
- Compute missing Gmail attachment SHA-256 hashes in `storeAttachment` before validating
- Harden zip extraction cleanup by resolving/import-dir symlink checks and validating extraction paths
- Add focused tests for checkpoint-on-failure and “missing attachment hash” behavior
Changes:
- Added shared `export.StoreAttachmentFile` used by importer/sync to validate existing attachment files (size + SHA-256) before reusing them, preventing silent corruption on dedupe.
- Fixed `import-mbox` signal handler defer order so `signal.Stop` runs before closing the `done` channel (avoids late-signal exit race during shutdown).
- Hardened zip extraction cache reuse by validating cached extracted `.mbox/.mbx` files against the zip’s expected output names + sizes, and rejecting unexpected cached files.
- Made zip filename collision handling always case-fold keys (covers case-insensitive Linux mounts too).
- Added tests for invalid attachment `ContentHash` in sync and for existing-file hash mismatch in attachment storage; verified with `GOCACHE=/tmp/go-build go build ./...` and `GOCACHE=/tmp/go-build go test ./...`.
Changes:
- Resume multi-file zip imports from the next file after the last completed sync (avoids rescanning already-finished files after an in-between-files interrupt).
- Clear the importer’s pending batch slice on flush to avoid retaining large `Raw` buffers longer than needed.
- Update the MCP export-attachment test to use a temp `HOME` with `Downloads/` so it doesn’t try to write to the real `~/Downloads`.
Verification:
- `GOCACHE=/tmp/go-build go build ./...`
- `GOCACHE=/tmp/go-build go test ./...`
Changes:
- Make `import-mbox` return a non-nil error when the import completes with `totalErrors > 0`.
- Advance the saved resume checkpoint past skippable MBOX reader errors (and log `next_offset`) to avoid re-hitting the same failing region after an interrupt.
- Store attachment `storage_path` with forward slashes (`/`) and convert with `filepath.FromSlash` when joining on disk.
- Fix `resolveMboxExport` unsupported-format error text to include `.mbx`.
- Document `copyWithLimit` overflow behavior (may consume one extra byte from `src`).
- Add tests covering non-zero `import-mbox` exit on partial failure and checkpoint advancement on reader error + interrupt.
- Verified `GOCACHE=/tmp/go-build go build ./...` and `GOCACHE=/tmp/go-build go test ./...`.
Changes:
- Only mark MBOX import sync runs as `failed` on hard ingest errors; complete runs with non-fatal errors so resume-between-files can use the last successful sync.
- Skip strict zip cache size validation when a zip entry reports unknown `UncompressedSize64` (0).
- Move MBOX importer test knobs (`MaxMessageBytes`, `IngestFunc`) into `MboxImportOptions` to avoid global mutation/data races.
- Harden attachment validation by hashing via a single opened FD (uses `O_NOFOLLOW` on Unix) to close the TOCTOU window.
- Add targeted tests for “soft errors still complete sync” and “zip cache validation with unknown size”.
Build: `GOCACHE=/tmp/go-build go build ./...`
Tests: `GOCACHE=/tmp/go-build go test ./...`
Changes:
- Extract ZIP MBOX exports into a fresh temp dir and `os.Rename` into place (avoids `RemoveAll(destDir)` TOCTOU)
- Broaden MBOX `From ` separator detection to accept more common date variants and ignore trailing tokens like `remote from ...` (plus tests)
- Remove redundant extension local in ZIP name collision disambiguation
- Document that `StoreAttachmentFile` resolves a symlinked `attachmentsDir`
Changes:
- Harden zip extraction cache validation: treat empty `.mbox/.mbx` entries as size-known and reject extra unexpected cache files (beyond `.done`).
- Add tests for empty-entry cache size enforcement and polluted cache directories.
- Strengthen attachment storage base path handling by creating `attachmentsDir`, resolving with `EvalSymlinks`, and validating the resolved directory.
- Fix `import-mbox` signal handler teardown to avoid late `os.Exit(130)` from queued signals (close `done` first, drain `sigChan`).
- Verified `GOCACHE=/tmp/go-build go build ./...` and `GOCACHE=/tmp/go-build go test ./...`.
Changes:
- Refused reuse of cached ZIP extraction when a zip entry’s uncompressed size is unknown (prevents accepting truncated/corrupt cached files).
- Fixed multi-file skip logic to only skip a file when `checkpointOffset == fileSize` (ignore last-sync cursor when `checkpointOffset > fileSize`).
- Tracked “repaired” ingests as `MessagesUpdated` (via `MessageExistsBatch`) instead of counting them as `MessagesAdded`, and surfaced `Updated` in CLI output.
- Normalized `attachmentsDir` with `filepath.Abs` before `EvalSymlinks`/writes to avoid surprising relative-path behavior.
- Verified: `GOCACHE=/tmp/go-build go build ./...` and `GOCACHE=/tmp/go-build go test ./...`.
Changes:
- Remove the ineffective symlink rejection after resolving `attachmentsDir` symlinks (still validates the resolved dir). (`internal/export/store_attachment.go`)
- Avoid hashing entire `.zip` exports on resume by using a size+mtime cache key for the extraction directory. (`cmd/msgvault/cmd/import_mbox.go`)
- Make `parseFromLineDate` parse only the expected 5/6-token prefix so `remote from ...` suffixes don’t break fallback date parsing. (`internal/importer/mbox_import.go`)
- Use SQLite `RETURNING id` in `UpsertMessage` to avoid an extra `SELECT id` per message. (`internal/store/messages.go`)
Proof:
- Relevant code: `internal/export/store_attachment.go#L45`, `cmd/msgvault/cmd/import_mbox.go#L365`, `internal/importer/mbox_import.go#L513`, `internal/store/messages.go#L120`
- Build: `GOCACHE=/tmp/go-build go build ./...`
- Tests: `GOCACHE=/tmp/go-build go test ./...`
Changes:
- Fix `StoreAttachmentFile` concurrent-writer false hash/size mismatches by writing to a temp file and renaming into place.
- Add a concurrent-writer unit test for attachment storage de-duping.
- Make `import-mbox` zip extraction cache key depend on zip entry names/sizes/CRC32 to avoid reusing stale extractions.
- Share mbox `"From "` separator date parsing via `mbox.ParseFromSeparatorDate` and use it for importer fallback dates.
- Add a fallback in `UpsertMessage` when SQLite doesn’t support `RETURNING` (Exec + SELECT).
Proof:
- `GOCACHE=/tmp/go-build go build ./...`
- `GOCACHE=/tmp/go-build go test ./...`
Changes:
- Recompute SHA-256 in `StoreAttachmentFile` and reject provided `ContentHash` values that don’t match the attachment bytes (plus a unit test).
- Make `copyWithLimit` fail fast on `(0, nil)` reads (returns `io.ErrNoProgress`) to avoid hangs (plus a unit test).
- Strengthen zip extraction cache validation by checking cached mbox CRC32 (plus a unit test).
- Verified `GOCACHE=/tmp/go-build go build ./...` and `GOCACHE=/tmp/go-build go test ./...`.
Changes:
- Make zip extracted-mbox cache validation O(1) by default (size/limit checks; optional CRC via `MSGVAULT_ZIP_CACHE_VALIDATE_CRC32`)
- Reuse cached extractions even when the zip central directory omits uncompressed sizes (no hard-fail on “unknown size”)
- Strengthen zip cache key by including zip file size + modtime in addition to entry metadata
- Fix zip name disambiguation to derive the extension from the sanitized output name (fallback to original entry)
Changes:
- Make attachment store failures abort ingest before writing `message_raw`, so reruns retry instead of skipping.
- Return `context.Canceled` for `import-mbox` interruptions and map cancellation to exit code 130.
- Enable CRC32 validation by default for small, size-known extracted ZIP cache files; add regression tests for cache corruption, interrupt status, and attachment retry.
Proof:
- Build: `GOCACHE=/tmp/go-build go build ./...`
- Tests: `GOCACHE=/tmp/go-build go test ./...`
- Added tests: `TestImportMbox_RerunRetriesAttachmentsAfterStoreFailure`, `TestImportMboxCmd_ReturnsCanceledWhenContextCanceled`, `TestExtractMboxFromZip_CacheValidationRejectsSameSizeCRCMismatchByDefault`
Changes:
- Run CRC32 cache validation for extracted `.mbox/.mbx` even when the zip central directory reports unknown uncompressed size (when under the CRC threshold), and update the unknown-size cache test to expect tamper rejection.
- Remove zip `ModTime` from `zipMboxCacheKey` to avoid redundant extraction dirs; add a test that touching the zip doesn’t invalidate the cache.
- Clarify the attachment-store rename race comment and document the Windows `openNoFollow` limitation.
Proof:
- `GOCACHE=/tmp/go-build go build ./...` (ok)
- `GOCACHE=/tmp/go-build go test ./...` (ok)
Changes:
- Treat zip entry `Close()` (CRC/integrity) errors as extraction failures and delete the partially written output.
- Warn when cached zip extraction CRC32 validation is skipped due to size.
- Add `ParseFromSeparatorDateStrict` and use it for sent_at fallback to avoid incorrect UTC from unknown TZ abbreviations.
- Add tests for zip checksum corruption and strict From-line TZ parsing.
- Verified: `GOCACHE=/tmp/go-build go build ./...`; `GOCACHE=/tmp/go-build go test ./...`.
Changes:
- Prefix MBOX `source_message_id` with a per-file discriminator to avoid cross-file collisions.
- Treat `import-mbox` CLI as non-failing for soft errors; exit non-zero only when the import reports hard ingest failures.
- On `os.Rename` failure during attachment storage, validate and reuse an existing destination file (better concurrent-writer handling on Windows).
- Document that permissive `From ` separator detection can mis-split unescaped body lines in edge cases.
Verification:
- `GOCACHE=/tmp/go-build go build ./...`
- `GOCACHE=/tmp/go-build go test ./...`
Changes:
- Normalize provided attachment `ContentHash` to lowercase before validation/compare so uppercase hex digests are accepted and canonicalized.
- Keep `messages.has_attachments` / `attachment_count` reflecting MIME reality even under `--no-attachments` (skip storing attachment rows/files) and clarify the flag help text.
- Remove the unused `attachmentErrors` return path by simplifying the MBOX ingest hook to return only `error`, updating call sites/tests.
- Refine ZIP cache `SizeKnown` heuristic to avoid relying on `CRC32 == 0`, and update the related cache validation test.
- Add regression tests for uppercase `ContentHash` acceptance and `--no-attachments` attachment-metadata behavior; verified with `GOCACHE=/tmp/go-build go build ./...` and `GOCACHE=/tmp/go-build go test ./...`.
Changes:
- `internal/importer/mbox_import.go`, `internal/sync/sync.go`: use `len(att.Content)` when upserting attachments so the DB size matches the validated/stored bytes.
- `internal/importer/mbox_import.go`: generate `source_message_id` using a per-message ordinal instead of byte offsets to avoid duplicates from tiny offset shifts.
- `cmd/msgvault/cmd/import_mbox.go`: always CRC32-validate cached extracted `.mbox/.mbx` files by default (remove size-based skip) so large cached files can’t be silently tampered with at same size.
- `internal/export/open_nofollow_unix.go`, `internal/export/open_nofollow_other.go`: narrow the Unix build tag and add a non-Unix fallback to avoid non-Windows/non-Unix build breakage.
Changes:
- Make MBOX `source_message_id` stable across resume by using `nextOffset` (file position) instead of a monotonic ordinal.
- Derive `fileDisc` from a cheap content fingerprint (size + first/last 64KiB) so re-imports are idempotent across path changes.
- Remove redundant `seenOK` duplicate-guarding in `flushPending`, and add tests covering resume-after-hard-error and path-change idempotency.
Proof:
- `GOCACHE=/tmp/go-build go build ./...`
- `GOCACHE=/tmp/go-build go test ./...`
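A "size + first/last 64KiB" fingerprint of the kind described here could be sketched as follows. This is an assumed construction, not the PR's `mboxFileDiscriminator`. The names `fingerprint` and `fingerprintBytes`, the use of SHA-256, and the encoding of the size are illustrative choices. The fingerprint is cheap because it reads at most 128 KiB regardless of file size, and it is path-independent because only content goes into the hash.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"encoding/binary"
	"encoding/hex"
	"fmt"
	"io"
)

const edgeBytes = 64 * 1024 // bytes sampled from each end of the file

// fingerprint hashes the size plus the first and last edgeBytes of r.
func fingerprint(r io.ReaderAt, size int64) (string, error) {
	h := sha256.New()

	var sizeBuf [8]byte
	binary.BigEndian.PutUint64(sizeBuf[:], uint64(size))
	h.Write(sizeBuf[:])

	headLen := size
	if headLen > edgeBytes {
		headLen = edgeBytes
	}
	if _, err := io.Copy(h, io.NewSectionReader(r, 0, headLen)); err != nil {
		return "", err
	}
	// Small files are fully covered by the head; only sample the tail
	// when there is content beyond it (the ranges may overlap).
	if size > edgeBytes {
		tail := io.NewSectionReader(r, size-edgeBytes, edgeBytes)
		if _, err := io.Copy(h, tail); err != nil {
			return "", err
		}
	}
	return hex.EncodeToString(h.Sum(nil)), nil
}

// fingerprintBytes is an in-memory wrapper used for demonstration.
func fingerprintBytes(b []byte) string {
	fp, err := fingerprint(bytes.NewReader(b), int64(len(b)))
	if err != nil {
		panic(err)
	}
	return fp
}

func main() {
	data := make([]byte, 200*1024)
	fmt.Println(fingerprintBytes(data))
}
```

The trade-off is that two files of the same size that differ only in the unsampled middle produce the same fingerprint, so this identifies a file heuristically rather than proving content identity.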
Changes:
- Validate resumed MBOX checkpoint offsets against file size and fail sync when the offset is beyond EOF (with test).
- Reduce zip cache validation I/O by skipping CRC32 checks for size-known cached mbox files unless explicitly enabled (tests updated).
- Fix symlink rejection for the zip extraction imports dir by checking the pre-`EvalSymlinks` path (with test).
- Add an option to disable mboxrd `>From ` unescaping (with test) and document the behavior.
- Clarify in `import-mbox --no-attachments` help that attachments won’t be backfilled on later reruns.
Changes:
- Always CRC32-validate zip extraction cache hits to avoid reusing corrupted extracted `.mbox` content.
- Tighten attachment directory permissions on existing directories and cache successful attachment validations to avoid repeated full-file SHA256 re-hashing.
- Make `import-mbox` `source_message_id` stable by using a per-file message sequence number and persist that sequence in checkpoints for correct resume behavior.
Verification: `GOCACHE=/tmp/go-build go build ./...` and `GOCACHE=/tmp/go-build go test ./...`.
Changes:
- Avoid full-file CRC32 rescans on cached ZIP reruns by default; still CRC-validate when ZIP uncompressed size is unknown, and allow opt-in CRC validation via `MSGVAULT_ZIP_CACHE_VALIDATE_CRC32`.
- Make attachment validation cache safer by including file `mtime` in the cache key so rewrites trigger re-validation.
- Reword MBOX discriminator comment to reflect it’s a content-derived fingerprint, not true file identity.
- Only exit with code 130 on `context.Canceled` when the top-level signal context is actually canceled.
Security Review: 4 High/Medium Issues Found
Claude's automated security review identified potential security concerns. Please review each finding below. Note: 1 low severity issue(s) were omitted to reduce noise.
🚨 Potential path traversal via mboxFileDiscriminator fallback (high)
Location:
When mboxFileDiscriminator() fails, it falls back to a hash of cpFile (line 122-124). If cpFile contains attacker-controlled symlinks or path components, this could allow path confusion attacks. The fileDisc is later used in source_message_id generation, which could enable collision attacks if an attacker can influence the file path. Always validate that cpFile is within expected boundaries and consider using only the resolved absolute path for hashing.
@wesm it did 31 review rounds and still not done, but at this point it seems like it's doing a whole review of the codebase which honestly seems to fall outside the scope of this PR ;) feel free to merge at any step
Oh, did you just run refine without any guardrails? You need to be a little more in the loop than that. I’ll take a look when I can.
roborev: Combined Review
Agent: codex | Type: security | Status: done
Summary
Findings
Agent: gemini | Type: security | Status: done
This The review found that the new feature has been implemented with significant attention to security. Furthermore, a refactoring related to the new functionality resolved a pre-existing high-severity TOCTOU (Time-of-Check, Time-of-Use) vulnerability in a
Findings
No new issues found.
Agent: codex | Type: review | Status: done
Findings
Summary
Open Questions / Assumptions
Agent: gemini | Type: review | Status: done
This change adds an
Review Findings
I haven't forgotten about this. I have been in triage mode this week and will work on reviewing and testing this with my own .mbox files when I am able, hopefully this weekend!
Why
msgvault currently supports Gmail sync, but many email providers (notably HEY.com) don’t offer IMAP/POP access. They do offer exports as MBOX (often delivered as a `.zip` containing one or more `.mbox` files). This PR adds an offline import path so those accounts can be brought into msgvault with the same local-first guarantees.
What’s in this PR
- `msgvault import-mbox <identifier> <export-file>`
- `.mbox/.mbx` or a `.zip` containing `.mbox` files.
- `--source-type` (example: `--source-type hey`).
- `--label` to apply a label to newly imported messages.
- `--no-resume` to start fresh.
- `dataDir/imports/mbox/<sha256(zip)>` with a `.done` sentinel and stable ordering.
- Streaming MBOX reader (`internal/mbox`) with mboxrd-style unescaping.
- Importer (`internal/importer`) that stores:
- `{file, offset}` and uses absolute offsets.
How to test
Automated:
- `make test`
- `go vet ./...`
- `make lint`
End-to-end CLI test added:
- `cmd/msgvault/cmd/import_mbox_e2e_test.go` writes a real `.mbox` file to disk, runs the cobra command, and asserts DB + attachment file results.
Manual example (HEY.com export):
Notes
- `--no-attachments` disables attachment storage (disk + DB). Existing-message fast-skip means reruns won’t backfill attachments/labels for already-imported messages.
- `review-loop-decisions.md`.