fix: parse year-before-month-name expressions (e.g. "2024 Aug") #648
Open
chatman-media wants to merge 1 commit into
"2024 Aug" was parsed as August 2023 instead of August 2024. The year
token before the month name was not recognized by ENMonthNameParser, so
the month fell back to the year closest to the reference date. Worse,
ENMonthNameLittleEndianParser split the contiguous 4-digit number into a
bogus day range (20-24 Aug), which won overlap resolution.
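The closest-year fallback described above can be illustrated with a small, self-contained sketch. The `closestYear` helper is hypothetical and is not the library's `findYearClosestToRef`; the reference date of 15 January 2024 is also an assumption, chosen only to show why a January reference lands on the previous August.

```typescript
// Hypothetical stand-in for a "year closest to the reference date" fallback.
// It only illustrates the idea; it is not the library's findYearClosestToRef.
function closestYear(ref: Date, day: number, month: number): number {
    const refYear = ref.getUTCFullYear();
    let best = refYear;
    let bestDistance = Infinity;
    // Compare the same day and month in the previous, current, and next year.
    for (const year of [refYear - 1, refYear, refYear + 1]) {
        const distance = Math.abs(Date.UTC(year, month - 1, day) - ref.getTime());
        if (distance < bestDistance) {
            best = year;
            bestDistance = distance;
        }
    }
    return best;
}

// Assumed reference date: 15 January 2024.
const januaryRef = new Date(Date.UTC(2024, 0, 15));
// 1 August 2023 is about 167 days away, 1 August 2024 about 199 days away.
console.log(closestYear(januaryRef, 1, 8)); // 2023
```

With no year token recognized, nothing in the input can override this choice, which is why the explicit "2024" was effectively ignored.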
- ENMonthNameParser: recognize an optional leading 4-digit year so
"yyyy MMM" ("2024 Aug", "2024-Aug", "2012 January") is parsed as a
single, year-certain result.
- ENMonthNameLittleEndianParser: require a real separator between the two
day numbers of a range, so a 4-digit year is no longer split into a
spurious day-to-day range. Valid ranges ("10 - 22 August",
"10 to 22 August", "10-22 August", space-only "10 22 August") are
unaffected.
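A minimal sketch of the separator requirement, assuming heavily simplified patterns: `LOOSE_RANGE`, `STRICT_RANGE`, and `matchDays` are illustrative names, only one month is supported, and the real parser's regular expression is larger than this.

```typescript
// Simplified, hypothetical patterns; a single month keeps the sketch short.
const MONTH = "(aug(?:ust)?)";

// Before: the connector between the two day numbers is optional, so it may be empty.
const LOOSE_RANGE = new RegExp(
    `\\b(\\d{1,2})(?:\\s*(?:to|-|\\u2013)?\\s*(\\d{1,2}))?\\s*${MONTH}`,
    "i",
);

// After: a connector or at least one whitespace character must separate the numbers.
const STRICT_RANGE = new RegExp(
    `\\b(\\d{1,2})(?:(?:\\s*(?:to|-|\\u2013)\\s*|\\s+)(\\d{1,2}))?\\s*${MONTH}`,
    "i",
);

// Returns the matched day numbers, or null when the pattern does not match.
function matchDays(pattern: RegExp, text: string): number[] | null {
    const m = pattern.exec(text);
    if (m === null) return null;
    const days = [Number(m[1])];
    if (m[2] !== undefined) days.push(Number(m[2]));
    return days;
}

console.log(matchDays(LOOSE_RANGE, "2024 Aug"));      // [20, 24]
console.log(matchDays(STRICT_RANGE, "2024 Aug"));     // null
console.log(matchDays(STRICT_RANGE, "10 22 August")); // [10, 22]
console.log(matchDays(STRICT_RANGE, "10-22 August")); // [10, 22]
```

In this sketch the leading `\b` also keeps the strict pattern from matching "24 Aug" in the middle of "2024 Aug", so the input is left for the month-name parser to handle.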
Closes #639
Problem
Parsing "2024 Aug" returns August 2023 instead of August 2024 (the yyyy MMM order is a common format). Reproduced with a pinned reference date.
Two parsers were involved:
- ENMonthNameParser only recognized a trailing year (MMM yyyy), so for "2024 Aug" it matched just "Aug", found no year, and fell back to findYearClosestToRef, which, from a January reference, picks the previous August.
- ENMonthNameLittleEndianParser then matched the contiguous 2024 as a bogus day range: 20 + (empty connector) + 24 → "the 20th–24th of August". Because both results spanned the same text, overlap resolution kept this wrong one.

The reverse order ("Aug 2024") already worked, which is why this looked inconsistent.
Fix
- ENMonthNameParser: recognize an optional leading 4-digit year, so "2024 Aug", "2024-Aug", "2024.Aug", and "2012 January" parse as a single result with a certain year. A 4-digit year is required (not 2-digit) to avoid clashing with the day-month format ("09 Aug" must stay "9 August"). A guard rejects supplying a year on both sides ("2012 January 2013").
- ENMonthNameLittleEndianParser: require a real separator between the two day numbers of a range, so a contiguous 4-digit number is no longer split into a spurious day-to-day range. Existing ranges are unaffected: "10 - 22 August", "10 to 22 August", "10-22 August", and space-only "10 22 August" all still parse as ranges.

Tests
Added a "Year-Month expression (year before month)" block in test/en/en_month.test.ts covering "2024 Aug", "2024 August", "2024-Aug", and "2012 January" with pinned reference dates. These fail on master (Expected: 2024, Received: 2023) and pass with the fix.
Full suite green: 615 passed.
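For readers who want to experiment with the leading-year rule outside the library, here is a rough standalone sketch run against the inputs listed above. `YEAR_MONTH`, `MONTHS`, and `parseYearMonth` are hypothetical names, and the pattern and guard are assumptions about how such a rule could look, not the code in this PR.

```typescript
const MONTHS: Record<string, number> = {
    jan: 1, feb: 2, mar: 3, apr: 4, may: 5, jun: 6,
    jul: 7, aug: 8, sep: 9, oct: 10, nov: 11, dec: 12,
};
const MONTH_NAME =
    "(jan(?:uary)?|feb(?:ruary)?|mar(?:ch)?|apr(?:il)?|may|june?|july?|aug(?:ust)?|" +
    "sep(?:tember)?|oct(?:ober)?|nov(?:ember)?|dec(?:ember)?)";

// Hypothetical pattern: optional leading yyyy, month name, optional trailing yyyy.
// The leading year must be exactly four digits, so "09 Aug" is not read as a year.
const YEAR_MONTH = new RegExp(
    `\\b(?:(\\d{4})[\\s.\\-]+)?${MONTH_NAME}\\b(?:[\\s,.\\-]+(\\d{4})\\b)?`,
    "i",
);

interface YearMonth {
    year?: number;
    month: number;
}

function parseYearMonth(text: string): YearMonth | null {
    const m = YEAR_MONTH.exec(text);
    if (m === null) return null;
    const leadingYear: string | undefined = m[1];
    const trailingYear: string | undefined = m[3];
    // Guard: a year on both sides ("2012 January 2013") is rejected.
    if (leadingYear !== undefined && trailingYear !== undefined) return null;
    const year = leadingYear ?? trailingYear;
    // Every alternative starts with a three-letter key present in MONTHS.
    const month = MONTHS[m[2].toLowerCase().slice(0, 3)];
    return year === undefined ? { month } : { year: Number(year), month };
}

console.log(parseYearMonth("2024 Aug"));          // { year: 2024, month: 8 }
console.log(parseYearMonth("2024-Aug"));          // { year: 2024, month: 8 }
console.log(parseYearMonth("2012 January"));      // { year: 2012, month: 1 }
console.log(parseYearMonth("09 Aug"));            // { month: 8 }
console.log(parseYearMonth("2012 January 2013")); // null
```

When no year is captured, the sketch leaves `year` unset; in the library that is the case where the closest-year fallback would apply.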