Avoid calling `length` on chunks in lazy `splitAt` #676

dolio · 2025-12-15T21:14:37Z

This PR tweaks the implementation of the lazy `splitAt` to avoid calling `length` on the strict `Text` chunks.

While experimenting, I found that producing a lazy text with a very large chunk had disastrous effects on a parser consuming that text. The reason is that the parser was calling `splitAt` to chop up the input, and each such call counts all the characters in the input chunks again.

The strict `splitAt` uses the `measureOff` function to split, which either reports where to partition the underlying array, or tells you how many characters are in the `Text` if it isn't long enough (with a negated result). So, I switched to using this in the lazy loop.
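To make the idea concrete, here is a minimal sketch on a simplified model, where a lazy text is a list of `String` chunks. It is not the library's code: `measureModel` and `splitAtChunks` are hypothetical names, and in this model the reported offset is simply a character count, whereas the real function reports a position in the underlying array.

```haskell
-- Hypothetical stand-in for a measuring function (not the library's code).
-- It walks at most n characters of the chunk. If the chunk holds at least
-- n characters it returns the split offset (non-negative); otherwise it
-- returns the negated number of characters in the chunk.
measureModel :: Int -> String -> Int
measureModel n = go 0
  where
    go k rest
      | k >= n = k
      | otherwise = case rest of
          []       -> negate k
          (_ : cs) -> go (k + 1) cs

-- Split a chunked text after n characters without ever asking a chunk
-- for its full length: each chunk is inspected only as far as needed.
splitAtChunks :: Int -> [String] -> ([String], [String])
splitAtChunks n chunks
  | n <= 0 = ([], chunks)
splitAtChunks _ [] = ([], [])
splitAtChunks n (c : cs)
  | m > 0 =
      -- The split point lies inside (or at the end of) this chunk.
      case splitAt m c of
        (pre, suf) -> ([pre], if null suf then cs else suf : cs)
  | otherwise =
      -- Chunk too short: m is the negated character count, so n + m is
      -- the number of characters still to be taken from later chunks.
      let (xs, ys) = splitAtChunks (n + m) cs
      in (c : xs, ys)
  where
    m = measureModel n c
```

For example, `splitAtChunks 4 ["ab", "cde", "f"]` consumes `"ab"` whole, learns from the negated result that two characters remain to be taken, and then splits `"cde"` after two characters.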

I haven't done extensive benchmarking, but it seems like even on reasonably chunked input, my parser runs faster with this than with the old version.

Bodigrim · 2025-12-15T22:01:14Z

This fails the 32-bit CI jobs; we need to take more care with `int64ToInt`. Namely, if `intToInt64 (int64ToInt n) /= n`, then `n` is too large to fit in a 32-bit `Int`, and we can certainly loop forward, because `len` cannot exceed the `Int` range on a 32-bit system.
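The round-trip test can be tried on any platform by simulating the narrow type with `Int32`. This is only an illustrative sketch: `fitsNarrow` is a hypothetical helper, and plain `fromIntegral` is assumed here in place of the `intToInt64` and `int64ToInt` conversions mentioned above.

```haskell
import Data.Int (Int32, Int64)

-- Hypothetical helper: does an Int64 count survive a round trip through
-- a 32-bit integer? fromIntegral wraps on narrowing, so any value outside
-- the Int32 range comes back changed.
fitsNarrow :: Int64 -> Bool
fitsNarrow n = fromIntegral (fromIntegral n :: Int32) == n

-- A few sample counts around the 32-bit boundary.
samples :: [(Int64, Bool)]
samples =
  [ (n, fitsNarrow n)
  | n <- [0, 2 ^ (31 :: Int) - 1, 2 ^ (31 :: Int), 2 ^ (32 :: Int) + 5]
  ]
```

When the round trip fails, the requested count is larger than any chunk length representable on that platform, so the whole chunk belongs to the prefix.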

dolio · 2025-12-15T22:08:44Z

Yeah. Just trying to decide how to structure it for 32-bit.

Here's a question. Do you care about the corner case where there is actually a maxBound :: Int32 sized chunk? Or just that the logic works on somewhat reasonably sized chunks?

Bodigrim · 2025-12-15T22:45:19Z

Can we add as the very first pattern guard

  | intToInt64 (int64ToInt n) /= n -> 
    let (ts', ts'') = loop (n - intToInt64 (T.length t)) ts
    in (Chunk t ts', ts'')

?
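As a rough illustration of how such a leading guard could sit in the loop, here is a sketch on a simplified model with `String` chunks and a simulated 32-bit index type. All names (`toNarrow`, `toWide`, `splitChunks64`) are hypothetical, and the last branch uses the list `splitAt` for brevity rather than a measuring function.

```haskell
import Data.Int (Int32, Int64)

-- Hypothetical conversions simulating a platform with a 32-bit Int.
toNarrow :: Int64 -> Int32
toNarrow = fromIntegral

toWide :: Int32 -> Int64
toWide = fromIntegral

-- Split a chunked text after n characters, with n counted in 64 bits.
splitChunks64 :: Int64 -> [String] -> ([String], [String])
splitChunks64 _ [] = ([], [])
splitChunks64 n (c : cs)
  | n <= 0 = ([], c : cs)
  -- n does not survive narrowing, so it exceeds any possible chunk
  -- length: keep the whole chunk and continue with the remainder.
  | toWide (toNarrow n) /= n =
      let (xs, ys) = splitChunks64 (n - fromIntegral (length c)) cs
      in (c : xs, ys)
  -- n fits in the narrow type, so it is safe to use as a chunk index.
  | otherwise =
      case splitAt (fromIntegral (toNarrow n)) c of
        (pre, []) ->
          let (xs, ys) = splitChunks64 (n - fromIntegral (length pre)) cs
          in (pre : xs, ys)
        (pre, suf) -> ([pre], suf : cs)
```

Handling the overflow case first means the remaining branches can assume the narrowed count is exact.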

dolio · 2025-12-15T22:59:53Z

Yeah, perhaps just testing that case separately is better.

Bodigrim · 2025-12-16T20:08:14Z

Thanks!

Lysxia approved these changes Dec 15, 2025

Avoid calling length on chunks in lazy splitAt

f3a572f

This instead uses the `measureOff` function used in the strict `splitAt` to count only as many characters as are needed.

dolio force-pushed the topic/splitAt branch from b4aabc3 to f3a572f Compare December 15, 2025 22:47

Check for maxBound in lazy splitAt in its own case

2091e7e

Bodigrim approved these changes Dec 16, 2025

Bodigrim merged commit 0c6b026 into haskell:master Dec 16, 2025
24 of 26 checks passed
