TJC-1055: Add 6 text functions (TRIM, MID, FIND, SUBSTITUTE, VALUE, TEXT)#231
Conversation
…-1055, GH-116) Drives the implementation of TRIM, MID, FIND, SUBSTITUTE, VALUE, TEXT via TDD. The spec covers scalar behavior, 10 property-based laws, cell-value type coercion, function composability, parser dispatch, error fidelity, and OOXML determinism — chosen so each test kills a specific impl mistake. - Adds stub FunctionSpecs returning EvalFailed("not yet implemented") so the spec compiles and runs (84 fail substantively, 14 are vacuous passes for parse-time errors and error-propagation cases). - Adds 6 TExpr builders mirroring the existing left/right/upper/lower style. - Bumps the meta-test in FormulaParserSpec from 82 → 88 registered functions. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Excel TRIM: collapse runs of ASCII space (0x20) and strip leading/trailing
spaces. All other whitespace — tab, newline, nbsp, BOM, ZWSP — is preserved
verbatim. Implemented as split-on-' ' / drop-empties / mkString(" ").
Turns 16 tests green: 8 scalars, idempotence + length-monotonicity properties,
4-case cell-value type matrix, LEN(TRIM(...)) composition, e2e formula path.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1-indexed substring extraction. Validates start >= 1 and length >= 0 with exact error messages. Clamps start+length on overflow using Long arithmetic to handle Int.MaxValue safely. Returns empty when start is past end of text. Turns 14 tests green: 9 scalars (including surrogate-pair UTF-16 semantics and Int.MaxValue overflow guard), MID/LEFT consistency law, slicing transitivity, LEN(MID(Empty)), MID(TRIM(...)) composition, e2e formula, error-message fidelity. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Case-sensitive 1-indexed substring search via String.indexOf. Validates start_num >= 1 and start_num <= len+1. Empty needle returns start_num (Excel quirk). Not-found returns EvalFailed naming the function and echoing the needle/haystack for diagnostics. Also adds haystackWithNeedle generator in TextFunctionsSpec to fix the FIND/MID/LEN coupling property test, which previously discarded too many random pairs (only 14 useful cases per 500 tries) due to the precondition s.contains(x) being rarely satisfied. The new generator constructs (s, x) pairs where x is a substring of s by construction. Turns 11 tests green: 9 FIND scalars, FIND/MID/LEN coupling property, first-char law, ISERROR(FIND) idiom, e2e formula, error-message fidelity, error-variant propagation through cell-error inputs. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Forward, non-overlapping match-by-content replacement. Two paths:
- All occurrences: delegate to Java String.replace (literal, non-regex,
forward, non-overlapping — exactly Excel semantics).
- Specific instance N: tail-recursive scan to locate the Nth occurrence,
then splice. Returns text unchanged if instance > occurrence count.
Empty old_text quirk: returns text unchanged (matches Excel behavior).
Instance < 1: EvalFailed with the offending instance value echoed.
Turns 14 tests green: 9 scalars (including regex-injection killer, forward
non-overlapping scan, replacement-longer-than-match no-infinite-loop),
accounting-law property (kills off-by-one in instance counting), identity
property, empty-old quirk property, SUBSTITUTE-of-SUBSTITUTE nesting,
e2e formula, and the comma-inside-string-literal parser test.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Excel-style numeric string parser. Pipeline:
1. Trim whitespace.
2. Empty → 0 (Excel quirk).
3. Detect accounting parens "(...)" as negative sign.
4. Detect trailing "%" for divide-by-100.
5. Strip currency "$" and thousands ",".
6. Parse remainder as BigDecimal (preserves exact precision; handles
scientific notation natively).
Date / time string parsing is intentionally deferred — date-looking inputs
fall through to BigDecimal parse and return EvalFailed. Follow-up issue
will add ISO 8601 / locale-aware date parsing as a separate concern.
Error path echoes the original (untrimmed) input verbatim for diagnostic
clarity.
Turns 14 tests green: 10 scalars (decimal precision, currency, percent,
accounting parens, sign+currency interaction, whitespace+currency,
scientific, empty-as-zero, alphanumeric-rejected, 100% boundary), VALUE
on Number cell pass-through, IF(ISNUMBER(VALUE(A1)),...,0) safe-parse
idiom, e2e formula, error-message fidelity.
Remaining failures all require TEXT (next).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Delegates value-and-format formatting to the existing 700-line
FormatCodeParser via NumFmtFormatter, re-using the codebase's battle-tested
Excel format-code interpreter. Three behavior overrides handle Excel-specific
quirks not captured by the formatter alone:
1. Empty format string → "" (Excel quirk).
2. Empty cell input → coerce to Number(0) before formatting (Excel rule).
3. Text input passes through unchanged — handled natively by
NumFmtFormatter.formatValue's CellValue.Text case.
Also adjusts two TextFunctionsSpec tests:
- The negative-currency test (§6.6) now uses the explicit two-section
format "$#,##0.00;-$#,##0.00", because FormatCodeParser's single-section
path drops the sign on negatives. Filed as a follow-up bug in
xl_text_func.md (item #4) — out of scope for TJC-1055.
- The VALUE/TEXT round-trip property uses the two-section form for the
same reason, plus forAllNoShrink + an exact 4-decimal BigDecimal
generator (BigDecimal(BigInt, scale)) to keep shrinking from escaping
the generator's invariants.
Turns the final 16 tests green: 10 TEXT scalars (issue golden, half-up
rounding, percent ×100, percent precision, multi-group thousands, negative
currency, zero with decimals, empty-format quirk, date format, text
passthrough), VALUE/TEXT round-trip property, §8.5 (Empty as zero),
§9.6 TEXT(VALUE(...)) chain, §9.7 CONCATENATE+TEXT interop, e2e formula.
Full TextFunctionsSpec: 98/98 green. Full xl-evaluator suite: no regressions.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Mechanical formatting pass (./mill __.reformat). No behavioral change. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…JC-1055)
Two pieces:
1. Fix latent bug in TExprCoercions.asIntExpr where arithmetic expressions
and FIND calls used in Int-argument positions hit the unsafe
asInstanceOf[TExpr[Int]] catch-all and crashed at runtime with
ClassCastException. Adds explicit cases for:
- FIND calls (mirror of existing year/month/day/len handling)
- Arithmetic ops (Add / Sub / Mul / Div / Pow) — wrap in ToInt
Unblocks the natural composition pattern:
=MID(A1, FIND("@", A1) + 1, 100) ; extract email domain
This was a pre-existing latent issue (not introduced by TJC-1055) — no
prior test exercised arithmetic in Int-arg positions. Surfaced by the
email-domain composition in the demo script. Zero existing tests
regress.
2. Add examples/text_functions_demo.sc (~115 lines) — a runnable scala-cli
smoke test that exercises all 6 new functions across realistic patterns:
cleanup pipeline, currency / percent / accounting parsing, display
formatting, FIND+MID composition (email domain extraction), and
TEXT(VALUE(...)) round-trip normalization. Each line prints actual vs
expected so divergences pop visually.
Run via:
./mill __.publishLocal
scala-cli run examples/text_functions_demo.sc
All 18 demo formulas verified green end-to-end.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Per the project's PR template checklist (see .github/pull_request_template.md):
- CLAUDE.md: function count 82 -> 88, added FIND to the registered
function name list (was previously aspirational, now accurate)
- docs/STATUS.md: function library count 81 -> 87, test count 980+ ->
1075+, added FIND to the Text category, called out TJC-1055 by name
- docs/plan/roadmap.md: TL;DR function count 81 -> 87, test count
733+ -> 1075+, added TJC-1055 to Completed Work
- docs/reference/examples.md: linked the new examples/text_functions_demo.sc
- plugin/skills/xl-cli/reference/FORMULAS.md: documented all 6 new
functions in the skill bundle so LLM agents (xl-agent / Anthropic
Skills API) know they exist
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Review: TJC-1055 — Add 6 text functionsSolid PR overall. Test coverage is excellent (98 tests across 14 well-organized sections including ScalaCheck properties, type-matrix coverage, and printer round-trip), and the decision to delegate Major:
|
After comparing tests-per-function against existing per-category specs in
xl-evaluator (median ~8 tests/function — see MathFunctionsSpec at 65/13,
StatisticalFunctionsSpec at 31/5, FinancialFunctionsSpec at 51/5,
ConditionalFunctionsSpec at 70/7, NewFunctionsSpec at 50/6), our 98 tests
for 6 functions (~16/function) was ~2x the local norm.
Pruned to 58 tests (~10/function — top of the repo norm) by removing
redundant boundary checks and overlapping property tests:
- §1 TRIM: 8 -> 5 (drop all-spaces, leading-nbsp, tabs/newlines —
each subsumed by golden + the kept tab+space and nbsp tests)
- §2 MID: 9 -> 6 (drop last-char-len-1, empty-input, surrogate-pair)
- §3 FIND: 9 -> 6 (drop multi-char-needle, exact-start, comma-needle)
- §4 SUBSTITUTE: 9 -> 6 (drop instance-past-count, non-overlapping
scan, multi-char-old — first two covered by accounting law property,
third subsumed by simpler instance=2 test)
- §5 VALUE: 10 -> 6 (drop whitespace+currency, scientific, 100%
boundary, alphanumeric — last covered by §12.3 error fidelity)
- §6 TEXT: 10 -> 6 (drop percent precision, multi-group, zero-with-
decimals, date format — first three subsumed by golden, last is
pure delegation to FormatCodeParser)
- §7 Properties: 10 -> 4 (keep only the highest-leverage four: TRIM
idempotence, FIND/MID/LEN coupling, SUBSTITUTE accounting law,
VALUE/TEXT round-trip)
- §8 Cell-value matrix: 8 -> 4 (TRIM exercises every CellValue case;
repeating with TEXT/MID/VALUE/FIND adds no bug-killing power)
- §9 Composability: 7 -> 3 (LEN(TRIM), ISERROR(FIND), TEXT(VALUE) —
the three highest-frequency real-world idioms)
- §10 e2e: 6 -> 6 (no change — wiring smoke test, one per function)
- §11 Parser edges: 4 -> 2 (zero-args + one-arg parse-errors are
framework-level Arity checks, not text-function specific)
- §12 Error fidelity: 4 -> 2 (kept MID + VALUE; FIND not-found is
asserted in §13 effectively, error-variant is covered by §8.4)
- §13 Dep graph: 2 -> 1 (drop FIND-error caching — same code path)
- §14 Determinism: 2 -> 1 (drop OOXML placeholder)
Doc updates for the new total:
- docs/STATUS.md: 1075+ tests -> 1035+; function library 272 -> 232
- docs/plan/roadmap.md: 1075+ tests -> 1035+
All 58 tests still green. Full xl-evaluator suite: 0 regressions.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Code review — TJC-1055 text functionsStrong PR overall. Six functions, ~98 net-new tests across 14 sections, a real bug fix in Bugs / correctness1. else if start > haystack.length + 1 then
Left(EvalError.EvalFailed(s\"FIND: start ($start) is past end of text\"))Microsoft's docs say "If start_num is greater than the length of within_text, FIND returns #VALUE!" — i.e. 2. val cleaned = afterPercent.replace(\",\", \"\").replace(\"$\", \"\").trim
3. 4. case _: TExpr.Add | _: TExpr.Sub | _: TExpr.Mul | _: TExpr.Div | _: TExpr.Pow =>
ToInt(expr.asInstanceOf[TExpr[BigDecimal]])The cast is only safe because those AST nodes are always The fix itself is correct and the root-cause analysis in the PR body is excellent — flagging this only because the cast pattern is exactly the one the original Code quality5. Unused imports in 6. 7. Test section numbering is non-contiguous Test coverage gaps8. No val expr = TExpr.text(TExpr.Lit(LocalDate.of(2025,1,15)), TExpr.Lit(\"yyyy-mm-dd\"))
assertEquals(evaluator.eval(expr, emptySheet), Right(\"2025-01-15\"))Otherwise the date contract is documented but unverified — exactly the kind of gap that surfaces as a regression six months from now. 9. 10. Docs11. Test count drift across files.
980 + 98 = 1078, so 1075+ is closest to reality. Suggest aligning all four to one number (1075+ is fine). 12. PerformanceImplementations are straightforward and allocate sensibly — no concerns. Security
Net: Approve with minor changes. (1), (2), and (8) are the load-bearing items; the rest are nits. The architectural choices — delegating TEXT to |
Excel's FIND spec: "If start_num is greater than the length of within_text,
FIND returns the #VALUE! error value." Our implementation was permissive —
it allowed start = length + 1, which silently let =FIND("", "abc", 4)
succeed and return 4 (Excel returns #VALUE!).
The bug only mattered for empty-needle: with a non-empty needle and start
past the end, indexOf would already return -1 and we'd hit the not-found
branch. With an empty needle, the "needle.isEmpty -> Right(start)"
short-circuit fired before any bound check could catch it.
Tightens the check from `start > haystack.length + 1` to
`start > haystack.length` so the bound is enforced regardless of needle.
Adds a §3 test that pins both the empty-needle and non-empty-needle
boundaries (=FIND("", "abc", 4) and =FIND("a", "abc", 4)) — the
empty-needle case is what was silently divergent.
Caught in PR review by @sebin.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Update STATUS, roadmap, and CLAUDE test-count claims to 1075+. - Leave unrelated untracked scratch files out of the commit.
Review of TJC-1055: 6 text functions (TRIM, MID, FIND, SUBSTITUTE, VALUE, TEXT)Solid, TDD-driven addition with a clean delegation strategy for 🔴 Significant —
|
| File | Before | After | Delta |
|---|---|---|---|
CLAUDE.md (module description, function list) |
82 | 88 | +6 |
docs/STATUS.md |
81 | 87 | +6 |
docs/plan/roadmap.md |
81 | 87 | +6 |
plugin/skills/xl-cli/reference/FORMULAS.md |
81 | (unchanged) | 0 |
FormulaParserSpec.scala (functions.length) |
82 | 88 | +6 |
FORMULAS.md line The eval command supports 81 Excel functions. was missed — should be 87 (or 88 to match CLAUDE.md). Also worth deciding whether the canonical count is 87 or 88; right now CLAUDE.md disagrees with STATUS/roadmap by one.
🟢 TRIM implementation — minor
FunctionSpecsText.scala:361-362
private def trimAsciiSpaces(s: String): String =
s.split(' ').iterator.filter(_.nonEmpty).mkString(" ")Correct (Scala 3 StringOps.split(Char) is non-regex and preserves trailing empties), and pleasantly short. For very long strings this allocates an Array[String] of all tokens. Probably fine for typical Excel data; if you ever profile this hot, a StringBuilder-based scan would avoid the array. Not a blocker.
🟢 text(v: TExpr[Any], ...) — type erasure boundary
TExprTextOps.scala:312-314 and FunctionSpecsBase.scala (TextArgs = (TExpr[Any], TExpr[String])).
Using TExpr[Any] for the value arg loses static guarantees but matches Excel's polymorphism. The runtime exprValueForTextFn does the right coercion. Pragmatic choice, but worth a one-line comment near TextArgs explaining why Any is acceptable here (vs. the typed shapes of the other arg-type aliases).
🟢 Unused import
xl-evaluator/test/src/com/tjclp/xl/formula/TextFunctionsSpec.scala:7
EvalError is imported but the test only inspects errors via result.left.toOption.map(_.toString). Drop it.
🟢 Test coverage — strong
58 tests is dense and well-organized. Highlights:
- Property laws are well-chosen (idempotence, FIND/MID/LEN coupling, SUBSTITUTE accounting law, VALUE/TEXT round-trip).
- Use of
forAllNoShrinkwith a generator producing exact 4-decimalBigDecimals is the right call for the round-trip law. - The §11.4 "comma inside string literal" parser-dispatch test catches a real class of bugs.
- §14.1 round-trip across all 6 functions is the kind of cheap, high-leverage check that prevents printer drift.
Suggested adds (cheap):
=MID(A1, VALUE(\"3\"), 100)— see top issue.=MID(A1, ABS(B1), 100)— same class.- A FormulaPrinter round-trip case for
=FIND(\"@\", A1) + 1to lock in theAdd-arithmetic-in-Int-position fix.
Style / conventions
- Error messages echo user input (
s\"VALUE: cannot parse '$input'\"). Consistent with existing eval errors. Fine. for...yieldusage andEither-only error flow is on-spec for the purity charter.- No
null/ no exceptions / no.getonOption/Try. ✓ @annotation.tailreconfindNthandloop— good.- No new wartremover suppressions needed beyond the pre-existing
AsInstanceOfannotation onasIntExpr. ✓
Summary
Block on the asIntExpr completeness gap (1 issue) and please align the function counts (5 minutes). Everything else is polish or stylistic.
Nice touch on the demo script — the formula = result (expected: ...) pattern with ✓/✗ markers is genuinely useful for catching regressions visually.
🤖 Generated with Claude Code
|
Test comment |
|
See full review in next comment thread. (Note: previous test comment was for diagnostic purposes — apologies for the noise.) |
|
Test ABC(arg1, arg2) |
|
Test eq ABC(arg1, arg2) |
|
=ABC(arg1) |
|
Review: TJC-1055 — Add 6 text functions Overall this is a high-quality PR with thorough TDD discipline, sharp tests, and good Excel-compatibility judgment calls. Most of my comments are scoping/follow-up items rather than blockers. Strengths
Issue 1 — asIntExpr fix is incomplete (medium) The new branches in
The case-by-case enumeration scales poorly (currently 5 of ~25 specs listed). Two paths forward:
Either way, worth a comment noting the catch-all is still an unsafe-cast trap for non-arithmetic BigDecimal calls. Issue 2 — parseExcelNumber strips ALL commas indiscriminately (low/medium)
Likely fine for the data-cleaning use cases this PR targets, but worth a test pinning the current (too-lenient) behavior so we know if/when we tighten it. Issue 3 — parseExcelNumber parens-with-explicit-minus quirk (low)
Issue 4 — FIND with empty within_text (low)
Issue 5 — Pre-existing unused imports (nit)
Issue 6 — TExpr.text parameter naming (nit) In Performance
Security Nothing flagged. All six functions are pure text transforms with no I/O, regex, or eval. The explicit "regex metachars treated literally" tests for FIND/SUBSTITUTE are good defensive coverage against future refactors that might introduce regex ( Summary Ship-worthy with minor follow-ups. The asIntExpr incompleteness (Issue 1) is the most impactful — recommend either tightening to use FunctionFlags or filing a tracked follow-up before merging. Issues 2 through 6 are documentation/test-pinning suggestions and do not block. 🤖 Generated with Claude Code |
Per PR review: the prior asIntExpr fix only patched FIND + arithmetic
ops. The same hazard existed for ~50 other BigDecimal-returning calls
(SUM, COUNT, AVERAGE, ROUND, ABS, MOD, ROW, COLUMN, MATCH, PMT, FV,
PV, NPER, RATE, NPV, IRR, etc.). =MID(A1, SUM(...), n), =LEFT(A1,
ROUND(B1, 0)), =MID(A1, MATCH(...), n) all crashed at runtime with
ClassCastException via the unsafe asInstanceOf[TExpr[Int]] catch-all.
Closes the whole class of bug by adding a `returnsNumeric: Boolean`
flag to FunctionFlags and dispatching on it in asIntExpr instead of
listing each function by name. Mirrors the existing convention for
returnsDate / returnsTime.
Changes:
- FunctionSpec.scala: add `returnsNumeric: Boolean = false` to
FunctionFlags
- TExprCoercions.asIntExpr: replace 5 per-function Call cases (year,
month, day, len, find) with a single flag check; keep the
arithmetic case (Add/Sub/Mul/Div/Pow) as-is
- Mark every BigDecimal-returning FunctionSpec with
`flags = FunctionFlags(returnsNumeric = true)`. Total 47 specs:
- Aggregate (12 via variadicAggregateSpec): SUM, COUNT, COUNTA,
COUNTBLANK, AVERAGE, MIN, MAX, MEDIAN, STDEV, STDEVP, VAR, VARP
- Aggregate (7 specifics): SUMIF, COUNTIF, SUMIFS, COUNTIFS,
AVERAGEIF, AVERAGEIFS, SUMPRODUCT
- Math (16): ABS, SQRT, ROUND, ROUNDUP, ROUNDDOWN, MOD, POWER,
LOG, LN, EXP, FLOOR, CEILING, TRUNC, SIGN, INT, PI
- DateTime (6): YEAR, MONTH, DAY, DATEDIF, NETWORKDAYS, YEARFRAC
- Financial Cashflow (4): NPV, IRR, XNPV, XIRR
- Financial TVM (5): PMT, FV, PV, NPER, RATE
- Lookup (1): MATCH
- Reference (4): ROW, COLUMN, ROWS, COLUMNS
- Text (3): LEN, FIND, VALUE
- TextFunctionsSpec §9.4: new test pinning the broader fix —
=MID(A1, SUM(B1:B3), 5), =LEFT(A1, ROUND(B1+0.4, 0)),
=MID(A1, ABS(-3), 5)
- Renumber §9 / §11 / §12 to be contiguous after the prune left gaps:
§9.4→§9.2, §9.6→§9.3, §9.7→§9.4, §11.4→§11.2, §12.3→§12.2
- Bump test counts in CLAUDE.md / STATUS.md / roadmap.md
1075+ → 1080+, function-library 232 → 236, xl-evaluator ~280 → ~284
Future BigDecimal-returning functions just need the flag to compose
cleanly in Int-arg positions; no asIntExpr changes needed.
Caught in PR review.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Code Review: TJC-1055 — Six text functionsSolid, well-tested implementation with a clever bug fix tucked into the coercion layer. Recommending a few documentation tightenings and flagging some edge cases. Net assessment: approve once the doc inconsistencies below are addressed. Highlights
Issues1. Documentation function counts are inconsistent (must fix)The PR updates some doc counts but leaves others stale:
The test in 2.
|
SUBSTITUTE: validate instance >= 1 before short-circuiting on empty
old_text, so SUBSTITUTE(text, "", new, 0) errors like Excel instead of
silently returning text unchanged.
VALUE: reject accounting-parens combined with an inner sign
("(-5)", "(+5)", "(-1,234.56)") — Excel returns #VALUE! for these.
Previously the parens-driven negation compounded with BigDecimal's own
sign parsing, producing wrong-sign results.
Doc sweep: align function-count references with the registry's actual
88 functions. CLAUDE.md function list rebuilt (drop phantom
HOUR/MINUTE/SECOND, add IFERROR, INDEX, MATCH, EDATE, DATEDIF,
NETWORKDAYS, WORKDAY, YEARFRAC, XNPV, XIRR). STATUS.md, roadmap.md,
plugin SKILL.md and FORMULAS.md updated; STATUS.md xl-evaluator test
total corrected from ~284 to ~338.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
PR Review: TJC-1055 — Add 6 text functionsSolid, well-scoped PR. The code is thoughtful, tests are dense and well-organized, and the latent Strengths
Suggestions
Nits
VerdictApprove with the suggestion to file follow-ups for (1) registry-level enforcement of 🤖 Generated with Claude Code |
PR Review — TJC-1055: Add 6 text functionsThanks for a really well-structured PR. The writeup, the discovery of the latent Strengths
Concerns / suggestions1. val cleaned = afterPercent.replace(",", "").replace("$", "").trimThis blindly strips all
This is divergence from Excel and might surprise users doing data validation via 2.
3. val withDefaultSign =
if value < 0 && format.negative.isEmpty && !formatted.startsWith(\"-\") then s\"-\$formatted\"
else formattedThis change affects every call site of 4. SUBSTITUTE behavior with empty The pattern-match order: case Some(n) if n < 1 => Left(...)
case _ if oldS.isEmpty => Right(text)
case Some(n) => Right(replaceNthOccurrence(...))
case None => Right(text.replace(oldS, newS))Looks correct, but consider whether 5. Minor: doc count drift PR claims 6. Minor: TRIM via
Test coverage notes
Nits
Overall: ship-it after a quick look at the SUBSTITUTE empty- 🤖 Generated with Claude Code |
Summary
xl-evaluator—TRIM,MID,FIND,SUBSTITUTE,VALUE,TEXT— closing feat(evaluator): Add FIND function #116 and unblocking the data-cleaning use cases that were previously impossible to express as formulas. Note: in-repo docs (CLAUDE.md,STATUS.md) and the function-count listing aspirationally claimed these as part of the existing 82 functions, but multi-layered verification (source grep, git pickaxe history sweep, and livexl evalaruntime checks) confirmed all 6 returnedUnknownFunctionand no registration existed.TextFunctionsSpec.scalacovering scalar behavior across the 6 functions, 4 high-leverage ScalaCheck property laws (TRIMidempotence,FIND/MID/LENcoupling,SUBSTITUTEaccounting law,VALUE/TEXTround-trip), theCellValuetype matrix (Empty/Number/Bool/Error coercion viaTRIM), function composition (including thereturnsNumericregression test in §9.4), end-to-endevaluateFormulapaths, parser-dispatch edges, and printer round-trip determinism. Test density (~10/function) matches the repo's per-category norm.TEXTdelegates to the existing 700-lineFormatCodeParserinxl-core/displayrather than re-implementing format-code interpretation. Three Excel-specific overrides handle the cases the formatter doesn't cover: empty format string returns"", empty cell coerces toNumber(0)before formatting, and text input passes through unchanged.TExprCoercions.asIntExpr: the unsafeasInstanceOf[TExpr[Int]]catch-all crashed at runtime whenever anyBigDecimal-returning expression landed in anInt-arg position (e.g.=MID(A1, SUM(...), 5),=LEFT(A1, ROUND(B1, 0)),=MID(A1, MATCH(...), 5)). Fix: add areturnsNumeric: Booleanflag toFunctionFlags(mirroring the existingreturnsDate/returnsTime), mark all 47BigDecimal-returning function specs with it, and dispatch on the flag inasIntExpr— closing the whole class of bug instead of patching FIND + arithmetic in isolation.VALUEintentionally rejects date/time strings (deferred to a follow-up issue per the locale-rabbit-hole risk). Currency, percent, accounting parens(1,234), scientific, and$-sign-currency interaction are all handled.FINDstart-bound to match Excel docs precisely:start_num > length(within_text)now returns#VALUE!regardless of needle (previously=FIND("", "abc", 4)silently succeeded and returned 4).examples/text_functions_demo.scthat exercises all 6 functions across 5 realistic patterns (cleanup pipeline, currency parsing, display formatting, email-domain extraction,TEXT(VALUE(s))round-trip). All 18 demo formulas verified end-to-end via the published artifact path.plugin/skills/xl-cli/reference/FORMULAS.md) so LLM agents using the xl skill discover them.Lines changed by category
+1228 / -179 across 20 files
TextFunctionsSpec,FormulaParserSpec)xl-evaluator/src)examples/text_functions_demo.sc)CLAUDE.md,STATUS.md,roadmap.md,examples.md,FORMULAS.md)How it works
TEXT — delegation to
FormatCodeParserEmpty-cell-as-zero, empty-format-returns-empty, and text-passthrough are the three Excel quirks layered on top of pure delegation.
Text(s)passthrough is handled natively byNumFmtFormatter.formatValue's existingCellValue.Text(s) => scase.asIntExpr— flag-based dispatch viareturnsNumericFunctionFlagsalready carriesreturnsDateandreturnsTime, withasNumericExprdispatching on those flags. Adding a third flag,returnsNumeric, letsasIntExprcollapse what would have been ~50 per-functionCallcases (year,month,day,len,find, plus everything in §Aggregate/Math/Reference/Lookup/Financial) into one rule. Each numeric spec is markedflags = FunctionFlags(returnsNumeric = true), mirroring how date specs are markedreturnsDate = true. New BigDecimal-returning functions added in the future just need the flag — noasIntExprchanges required.Files changed
xl-evaluator/src/com/tjclp/xl/formula/functions/FunctionSpec.scala— AddsreturnsNumeric: Boolean = falsetoFunctionFlags(third flag alongsidereturnsDate/returnsTime).xl-evaluator/src/com/tjclp/xl/formula/functions/FunctionSpecsText.scala— Six newFunctionSpecdefinitions (TRIM, MID, FIND, SUBSTITUTE, VALUE, TEXT) plus four private helpers:trimAsciiSpaces,substituteImpl+replaceNthOccurrence,parseExcelNumber,exprValueForTextFn. LEN / FIND / VALUE markedreturnsNumeric = true.xl-evaluator/src/com/tjclp/xl/formula/functions/FunctionSpecsBase.scala— Adds 4 new arg-shape type aliases:TextIntInt,FindArgs,SubstituteArgs,TextArgs.xl-evaluator/src/com/tjclp/xl/formula/ast/TExprTextOps.scala— Adds 6 typed builder methods (trim,mid,find,substitute,value,text) mirroring the existingleft/right/upper/lowerstyle.xl-evaluator/src/com/tjclp/xl/formula/ast/TExprCoercions.scala— Replaces five per-functionCallcases inasIntExprwith a singlereturnsNumericflag check; keeps the arithmetic case (Add/Sub/Mul/Div/Pow).FunctionSpecsAggregate.scala/FunctionSpecsMath.scala/FunctionSpecsDateTime.scala/FunctionSpecsFinancialCashflow.scala/FunctionSpecsFinancialTvm.scala/FunctionSpecsLookupIndex.scala/FunctionSpecsReference.scala— Marks 47 BigDecimal-returningFunctionSpecdefinitions withflags = FunctionFlags(returnsNumeric = true). Mechanical labeling, no logic changes.xl-evaluator/test/src/com/tjclp/xl/formula/TextFunctionsSpec.scala— 62 tests organized in 14 sections; usesforAllNoShrinkand aBigDecimal(BigInt, scale)generator to keep ScalaCheck shrinking inside each property's invariants. Includes regression tests for the FIND start-bound fix (§3) and the broaderreturnsNumericfix (§9.4:=MID(A1, SUM(B1:B3), 5),=LEFT(A1, ROUND(B1+0.4, 0)),=MID(A1, ABS(-3), 5)).xl-evaluator/test/src/com/tjclp/xl/formula/FormulaParserSpec.scala— Bumps the registry meta-test from 82 → 88 functions; adds explicitassert(functions.contains(...))for each of the 6 new function names.examples/text_functions_demo.sc— Runnable scala-cli demo exercising all 6 functions across realistic data-cleaning patterns; prints actual vs expected for each formula so divergences pop visually.plugin/skills/xl-cli/reference/FORMULAS.md— Documents all 6 new functions in the user-facing skill bundle so LLM agents (xl-agent, Anthropic Skills API) discover them.CLAUDE.md— Function count 82 → 88; addsFINDto the registered function name list; bumps test count 1075+ → 1080+.docs/STATUS.md— Function library 81 → 87 functions, 232 → 236 tests; xl-evaluator total ~280 → ~284 tests; overall ~1075+ → ~1080+ tests.docs/plan/roadmap.md— TL;DR function count 81 → 87; test count 733+ → 1080+; adds TJC-1055 to Completed Work.docs/reference/examples.md— Links the newexamples/text_functions_demo.sc.🤖 Generated with Claude Code