Skip to content

fix(vi): use Vi-standard word boundary rules for w/b/e motions#1059

Draft
sim590 wants to merge 8 commits into
nushell:mainfrom
sim590:vim-word-boundaries-fix
Draft

fix(vi): use Vi-standard word boundary rules for w/b/e motions#1059
sim590 wants to merge 8 commits into
nushell:mainfrom
sim590:vim-word-boundaries-fix

Conversation

@sim590
Copy link
Copy Markdown

@sim590 sim590 commented Apr 21, 2026

Summary

Fixes Vi small-word motions (w, b, e) and their operator variants (dw, de, db, cw, ce, cb, yw, ye, yb) to use standard Vi word boundary rules instead of Unicode UAX #29 segmentation.

Additionally, refactors all 28 flat word-related EditCommand variants into 3 parameterized ones (MoveWord, CutWord, CopyWord) using WordFlavor, WordSize, and MotionTarget enums, as suggested by @kronberger-droid.

Problem

The w, e, and b motions in Vi mode shared the same word boundary functions as Emacs mode, which use unicode_segmentation::split_word_bound_indices() (UAX #29). This treats punctuation inconsistently — for example, in foo.bar, Vi should see three words (foo, ., bar), but reedline treated it as one.

Additionally, inclusive Vi operator commands (de, dE, ce, cE, ye, yE) shared EditCommand variants with exclusive Emacs commands (Alt+d), causing incorrect cut/copy ranges.

Solution

Word boundary logic

  • Introduced a three-class character classification (Keyword, Punctuation, Whitespace) matching the Vim/POSIX standard, and a vi_word_segments() iterator that segments text accordingly.
  • Added vi_word_* functions (vi_word_right_index, vi_word_right_start_index, vi_word_right_end_index, vi_word_left_index, vi_current_word_range) using Vi three-class segmentation.
  • Added emacs_* aliases (emacs_word_right_index, emacs_word_right_start_index, etc.) for the existing UAX added a slightly-fancy prompt #29-based functions. The unprefixed word_* variants are deprecated to encourage callers to choose explicitly.

EditCommand refactoring

Replaced 28 flat word-related EditCommand variants with 3 parameterized ones:

MoveWord { flavor: WordFlavor, size: WordSize, target: MotionTarget, select: bool }
CutWord  { flavor: WordFlavor, size: WordSize, target: MotionTarget }
CopyWord { flavor: WordFlavor, size: WordSize, target: MotionTarget }
  • WordFlavor: Emacs or Vi (extensible to Helix later)
  • WordSize: Word or BigWord
  • MotionTarget: Left, Right, RightStart, RightEnd, RightToNext

BigWord dispatch uses a wildcard on flavor since word/WORD semantics are identical across Vi and Emacs. Invalid combinations are caught with unreachable!().

Other changes

  • Updated Vi and Emacs command dispatch throughout.
  • No changes to Emacs behavior — all Emacs commands continue to use the original UAX added a slightly-fancy prompt #29 word boundaries.

Tests

  • 7 unit tests for vi_word_segments()
  • 16 parameterized cases for Vi word motions with punctuation (w, e, b)
  • 24 basic/edge-case tests for vi_word_right_start_index, vi_word_right_end_index, vi_word_left_index, vi_word_right_index, and vi_current_word_range
  • 14 cases for Vi cut/copy operations in editor (de, dE, db, ye, yE, yb)
  • All existing Emacs tests unchanged and passing (904 pass, 0 fail)

Note on other open PRs

This PR introduces a structural change to how word motions are handled — word-related EditCommand variants have been parameterized and Vi/Emacs word boundary paths are now fully decoupled. Open PRs that touch word motions or EditCommand variants (notably #1016 and #960) may need to rebase after this lands. PRs that only touch other areas are unaffected.

Closes #563, closes #667, addresses #788 (point 2)

sim590 added 4 commits April 21, 2026 00:27
Replace unicode_segmentation word boundaries with a Vi-compatible
three-class system (keyword, punctuation, whitespace) for w/e/b
motions. Separate inclusive Vi cut/copy (de/dE/ce/cE/ye/yE) from
exclusive Emacs word commands (Alt+d) by adding dedicated EditCommand
variants.

Closes: nushell#563, nushell#667
Addresses: nushell#788 (point 2)
…iases

Add emacs_word_right_index, emacs_word_right_end_index,
emacs_word_right_start_index, emacs_word_left_index, and
emacs_current_word_range as the canonical Emacs-path functions.
Deprecate the unprefixed word_* variants to encourage callers to
choose explicitly between emacs_* and vi_* word boundaries.

All internal callers migrated to emacs_* variants.
Cover basic cases (spaces, single word, empty string, edge positions)
for vi_word_right_start_index, vi_word_right_end_index,
vi_word_left_index, vi_word_right_index, and vi_current_word_range.
Add Vi prefix to CutWordRightEnd, CutBigWordRightEnd,
CopyWordRightEnd, and CopyBigWordRightEnd for consistency with
other Vi-specific EditCommand variants.
@fdncred
Copy link
Copy Markdown
Contributor

fdncred commented Apr 21, 2026

What LLM did you use? The description is verbose and repetitive which makes me concerned that the code will be the same.

@sim590
Copy link
Copy Markdown
Author

sim590 commented Apr 21, 2026

I'm not sure where the description is repetitive. But the LLM used is Claude Opus 4.6.

@kronberger-droid
Copy link
Copy Markdown
Contributor

I have a thought about this fix, motivated from when I was working on my helix mode POC.

The Vi/Emacs boundary decoupling in LineBuffer is correct and necessary.

One concern I have on the EditCommand side is that this adds 9 new variants but the same fix would be possible with 0 new variants by parameterizing the existing ones, e.g.:

MoveWordLeft { select: bool, flavor: WordFlavor }
CutWordRight { flavor: WordFlavor, inclusive: bool }

Same LineBuffer work either way.

Long-term there's something I see coming anyways, which is decoupling motions from operators entirely. Which might come up naturally in #960 once we bring in the keybindings state machine for helix.

I raise this now since helix will add a lot more to EditCommand if we decide to stay flat. So this essentially is quite similar to my question in #960, so it would make sense to think about it together.

I don't want to block the bug fix. If the consensus is to stay flat, this is fine.

@fdncred fdncred marked this pull request as draft April 23, 2026 17:17
@fdncred
Copy link
Copy Markdown
Contributor

fdncred commented Apr 23, 2026

I'd rather wait until the helix changes are in. Marking this draft.

@sim590
Copy link
Copy Markdown
Author

sim590 commented Apr 23, 2026

Thanks for the feedback @kronberger-droid and @fdncred.

To summarize the situation: this PR fixes Vi word boundary behavior (w/b/e), which has been broken since the beginning (see #563, #667, #788, nushell/nushell#17286). The core fix lives in LineBuffer — new vi_word_* functions using the standard three-class rule (keyword/punctuation/whitespace), separate from the existing Emacs word functions. This part is uncontroversial.

The discussion point is how the new EditCommand variants are structured. Currently we added 9 flat Vi-prefixed variants. @kronberger-droid suggested a parameterized approach instead, e.g. MoveWordRight { flavor: WordFlavor }.

Having looked at #960, I note that the Helix mode takes a similar "separate commands per mode" approach — HxWordMotion { target, movement, count } and other Hx*-prefixed variants, fully independent from Vi/Emacs commands. So in practice, each mode already has its own set of commands.

Regarding the longer-term idea of decoupling motions from operators entirely: if that refactor is coming anyway, then whatever approach we pick today (flat or parameterized) is equally temporary — it'll all be reworked when that happens. So there's little benefit in over-engineering the interim solution.

That said, we're open to restructuring the EditCommand variants if you have a specific approach in mind. Would a parameterized approach like MoveWordLeft { select: bool, flavor: WordFlavor } work, or is the current flat approach acceptable given that it mirrors how Helix handles its own commands?

This bug has been open for a long time and affects everyday Vi users. We'd like to avoid blocking on #960 if possible, but we're happy to adapt the implementation to your preferences.

@kronberger-droid
Copy link
Copy Markdown
Contributor

@fdncred @schlich @sim590
After now some time sitting on it I think parameterized would be my preference.

Concrete shape:

EditCommand::MoveWord { flavor: WordFlavor, target: MotionTarget, select: bool }
EditCommand::CutWord  { flavor: WordFlavor, target: MotionTarget }

WordFlavor::{Emacs, Vi} to start, Helix added later. The 9 new variants collapse to 2. Same LineBuffer work and only dispatch and variants move.

I believe this lines up with where a later motion/operator decoupling would land rather than competing with it, since (flavor, target, select) are the same semantic axes that refactor would use, so it should carry forward into it.
Happy to help either way.

@fdncred
Copy link
Copy Markdown
Contributor

fdncred commented May 8, 2026

ok, sounds good to me

sim590 added 2 commits May 8, 2026 21:44
Add tests for cut_vi_big_word_right_end (dE), copy_vi_word_right_end
(ye), and copy_vi_big_word_right_end (yE) which contain non-trivial
inclusive-end logic via grapheme_right_index_from_pos.
Replace 28 flat word-related EditCommand variants with 3
parameterized ones: MoveWord, CutWord, CopyWord, each taking
WordFlavor, WordSize, and MotionTarget fields.

Add WordFlavor (Emacs/Vi), WordSize (Word/BigWord), and
MotionTarget (Left/Right/RightStart/RightEnd/RightToNext) enums.

BigWord commands use wildcard flavor matching since word/WORD
semantics are identical across Vi and Emacs. Invalid flavor/target
combinations are caught with unreachable!().

Fix Emacs cut/copy_word_right_to_next incorrectly using
vi_word_right_start_index instead of emacs_word_right_start_index.
@sim590 sim590 force-pushed the vim-word-boundaries-fix branch from 7fba736 to 799f070 Compare May 9, 2026 21:46
@sim590
Copy link
Copy Markdown
Author

sim590 commented May 9, 2026

Pushed a new commit that implements the parameterized approach as discussed. The 28 flat word-related EditCommand variants have been replaced with 3 parameterized ones:

MoveWord { flavor: WordFlavor, target: MotionTarget, size: WordSize, select: bool }
CutWord  { flavor: WordFlavor, target: MotionTarget, size: WordSize }
CopyWord { flavor: WordFlavor, target: MotionTarget, size: WordSize }

WordFlavor is Emacs or Vi (extensible to Helix later). WordSize is Word or BigWord. MotionTarget is Left, Right, RightStart, RightEnd, or RightToNext.

BigWord dispatch uses a wildcard on flavor since word/WORD semantics are identical across Vi and Emacs. Invalid combinations are caught with unreachable!().

All tests pass (904/904), clippy clean, fmt clean.

@kronberger-droid does this match what you had in mind?

@kronberger-droid
Copy link
Copy Markdown
Contributor

Walking back the parameterization I suggested.
Over three things I noticed:

  • Every call site hardcodes flavor
  • Three unreachable!() arms in editor.rs
  • 33 dispatch arms routing static info to differently-named helpers.

Flat variants fit better, then mode picks the command, editor dispatches 1:1:

EditCommand::MoveEmacsWordLeft { select: bool },
EditCommand::MoveViWordLeft { select: bool },
EditCommand::MoveBigWordLeft { select: bool },   // shared

~37 statically-valid variants vs 3 with unreachable!()s.
The many variants were turning me off before, but reading it now changed my mind.

The fully mode-agnostic version (EditMode::resolve_word_motion callback, flavor-free EditCommand) is its own PR.
flat-rename should migrate to it as cleanly as parameterized would.

The Vi word-boundary fix itself (line_buffer.rs helpers + tests) is independent of the dispatch shape.
So only the dispatch and emit sites would change.

Sorry for the churn.

@kronberger-droid
Copy link
Copy Markdown
Contributor

Also one thing:
the new vi_word_*_index helpers fit the same pattern as #1068.
Pure functions on &str, with thin wrappers on LineBuffer.

sim590 added 2 commits May 10, 2026 15:43
Rename word-related EditCommand variants to make the mode explicit:
MoveWordLeft -> MoveEmacsWordLeft, CutWordRight -> CutEmacsWordRight,
CopyWordRightToNext -> CopyEmacsWordRightToNext, etc.

BigWord variants remain unprefixed since word/WORD semantics are
identical across Vi and Emacs modes.

Update all dispatch, keybinding, and call sites accordingly.
@sim590
Copy link
Copy Markdown
Author

sim590 commented May 10, 2026

Thanks for the honest feedback @kronberger-droid — the unreachable!() arms were a fair signal that the parameterized approach was fighting the type system rather than working with it.

Reverted the parameterization and went with flat variants instead, with explicit mode prefixes:

  • MoveEmacsWordLeft, CutEmacsWordRight, CopyEmacsWordRightToNext, etc. for Emacs-specific motions
  • MoveViWordLeft, CutViWordRightEnd, etc. for Vi-specific motions (already in place)
  • MoveBigWordLeft, CutBigWordRight, etc. remain unprefixed since BigWord semantics are shared

All 904 tests pass, clippy clean, fmt clean.

Happy to adjust further if anything still looks off.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

vi mode word definition shouldn't include the period . Incorrect word boundary in VI mode

3 participants