Skip to content

Conversation

@dolio
Copy link
Contributor

@dolio dolio commented Dec 15, 2025

This PR tweaks the implementation of the lazy splitAt to avoid calling length on the strict Text chunks.

While doing some experimenting, I found that producing a lazy text with a very large chunk had disastrous effects for a parser consuming that text. The reason is that the parser was calling splitAt to chop up the input, and that repeatedly counts all the characters in the input chunks.

The strict splitAt uses the measureOff function to split, which either reports where to partition the underlying array, or tells you how many characters are in the Text if it isn't long enough (with a negated result). So, I switched to using this in the lazy loop.

I haven't done extensive benchmarking, but it seems like even on reasonably chunked input, my parser runs faster with this than with the old version.

@Bodigrim
Copy link
Contributor

This fails 32-bit CI jobs, we need to take more care with int64ToInt. Namely, if intToInt64 (int64ToInt n) /= n then n > 2^32 and we can certainly loop forward because len cannot exceed 2^32 on 32-bit system.

@dolio
Copy link
Contributor Author

dolio commented Dec 15, 2025

Yeah. Just trying to decide how to structure it for 32-bit.

Here's a question. Do you care about the corner case where there is actually a maxBound :: Int32 sized chunk? Or just that the logic works on somewhat reasonably sized chunks?

This instead uses the `measureOff` function used in the strict `splitAt`
to count only as many characters as are needed.
@Bodigrim
Copy link
Contributor

Bodigrim commented Dec 15, 2025

Can we add as the very first pattern guard

  | intToInt64 (int64ToInt n) /= n -> 
    let (ts', ts'') = loop (n - intToInt64 (T.length t)) ts
    in (Chunk t ts', ts'')

?

@dolio
Copy link
Contributor Author

dolio commented Dec 15, 2025

Yeah, perhaps just testing that case separately is better.

@Bodigrim Bodigrim merged commit 0c6b026 into haskell:master Dec 16, 2025
24 of 26 checks passed
@Bodigrim
Copy link
Contributor

Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants