Open
Description
This is a long-term thing so I don't forget about it.
This is related to #4: the standard `regex::Regex` is fast, convenient, and feature-rich, and I think it makes a good default. But there's no denying that, with the number of regexes you'd put in a regex-filtered set, it can get rather memory-intensive. So specific users may want to trade performance and / or convenience for lower memory use. Possibilities here are:
- `regex-lite`: that is "Switch from regex crate to regex-lite" (#4)'s attempt. The memory savings are tremendous (for about the same features minus rich Unicode support), but performance is terrible unless the prefilter has extremely high discriminatory power. For more resource-constrained uses, or for users who are already on `regex-lite` (and don't mind the lower performance), it could be a nice option.
- `regex::bytes`: the memory savings are much smaller than with `regex-lite` but can still be quite respectable. This trades away a lot of convenience, as you get bytes out.
- lazy compilation of any of those, using `std::sync::LazyLock` (or `once_cell::sync::Lazy` for lower MSRV). This targets highly biased sets, which have a very small number of "hot" regexes and a much larger set of regexes which are essentially never used: the engine would keep a much more compact `String` or even `&str` around until the regex is actually needed for post-filtering and matching. The trade-off is less consistent behaviour (memory will grow over time, and any match can take arbitrarily long if it triggers the compilation of several regexes). This is especially attractive when the regex set is static and embedded in the binary (so the source strings are "free").
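A minimal sketch of the lazy-compilation idea, not a proposed design. It uses `std::sync::OnceLock` rather than `LazyLock` so that the pattern source can sit next to the cell without a boxed closure. `LazyCompiled` and `StubMatcher` are hypothetical names, and `StubMatcher` is a substring matcher standing in for a real regex type so the example needs only the standard library.

```rust
use std::sync::OnceLock;

/// Hypothetical wrapper: holds only the pattern source until first use.
pub struct LazyCompiled<T> {
    source: &'static str,
    compile: fn(&str) -> T,
    compiled: OnceLock<T>,
}

impl<T> LazyCompiled<T> {
    pub fn new(source: &'static str, compile: fn(&str) -> T) -> Self {
        Self { source, compile, compiled: OnceLock::new() }
    }

    /// Compiles on the first call, then returns the cached value.
    pub fn get(&self) -> &T {
        self.compiled.get_or_init(|| (self.compile)(self.source))
    }

    /// Whether the compilation cost has already been paid.
    pub fn is_compiled(&self) -> bool {
        self.compiled.get().is_some()
    }
}

/// Stand-in for a regex object (assumption: a real set would plug in
/// something like `|s| regex::Regex::new(s).unwrap()` instead).
pub struct StubMatcher {
    needle: String,
}

impl StubMatcher {
    pub fn compile(source: &str) -> Self {
        Self { needle: source.to_owned() }
    }

    pub fn is_match(&self, haystack: &str) -> bool {
        haystack.contains(&self.needle)
    }
}
```

Until `get` is first called, each entry costs a `&'static str`, a function pointer, and an empty cell, which is the "compact until needed" behaviour described above. The first match against a cold entry pays the compilation cost.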
This would likely require a trait per crate:
- `regex-filtered` needs to parameterize on
  - the "regex" object being stored, which may be lifetime-parameterized
  - how to construct it
  - the interface to match it
- `ua-parser` further needs the extracted data kind and a way to mix that and the replacement values
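As a rough illustration of the `regex-filtered` side only, the three requirements could map onto a trait like the one below. Every name here (`RegexEngine`, `Literal`, `compile_all`, `matching`) is hypothetical and not part of either crate, and `Literal` is a borrowing substring matcher used in place of a real engine so the sketch stays standard-library-only. The extracted data and replacement handling that `ua-parser` needs is not covered.

```rust
/// Hypothetical trait: the stored "regex" object, how to build it,
/// and how to match with it. `'a` lets an implementation borrow
/// from the pattern source instead of owning it.
pub trait RegexEngine<'a>: Sized {
    type Error;

    fn compile(pattern: &'a str) -> Result<Self, Self::Error>;
    fn is_match(&self, haystack: &str) -> bool;
}

/// Stand-in engine: a literal substring match borrowing its pattern.
pub struct Literal<'a>(&'a str);

impl<'a> RegexEngine<'a> for Literal<'a> {
    type Error = String;

    fn compile(pattern: &'a str) -> Result<Self, Self::Error> {
        if pattern.is_empty() {
            return Err("empty pattern".to_owned());
        }
        Ok(Literal(pattern))
    }

    fn is_match(&self, haystack: &str) -> bool {
        haystack.contains(self.0)
    }
}

/// Compiles every pattern with whichever engine the caller picks.
pub fn compile_all<'a, R: RegexEngine<'a>>(
    patterns: &[&'a str],
) -> Result<Vec<R>, R::Error> {
    patterns.iter().map(|&p| R::compile(p)).collect()
}

/// Returns the indices of all compiled entries matching `haystack`.
pub fn matching<'a, R: RegexEngine<'a>>(regexes: &[R], haystack: &str) -> Vec<usize> {
    regexes
        .iter()
        .enumerate()
        .filter(|(_, r)| r.is_match(haystack))
        .map(|(i, _)| i)
        .collect()
}
```

With a shape like this, a `regex::Regex`, `regex::bytes::Regex`, or `regex-lite` backend would each be one more `impl`, although a bytes-based engine would probably want the haystack type to be an associated type as well.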