Make Char.isLower and Char.isUpper Unicode-aware #970
Janiczek wants to merge 1 commit into elm:master from Janiczek:patch-1
This allows people with non-ASCII alphabets to work with `Char.isLower` and `Char.isUpper`. It uses `toUpper` and `toLower` underneath, which use JavaScript's `String.prototype.toLower/UpperCase()`. The second condition in the functions is there to distinguish between characters that have an upper/lower-case pairing and those that don't (`'0' == Char.toLower '0'`, but we don't want `isLower '0'` to be true).
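For readers skimming the thread, the two-condition check described above could look roughly like the following. This is a minimal sketch based only on the description, not the PR's actual diff; the module name `CaseSketch` and the function names `isUpperSketch` and `isLowerSketch` are hypothetical.

```elm
module CaseSketch exposing (isLowerSketch, isUpperSketch)

-- Hypothetical names; a sketch of the check described in the PR text,
-- not the code that was submitted.


isUpperSketch : Char -> Bool
isUpperSketch char =
    -- First condition: upper-casing leaves the character unchanged.
    -- Second condition: lower-casing changes it, so a case pairing exists.
    (char == Char.toUpper char) && (char /= Char.toLower char)


isLowerSketch : Char -> Bool
isLowerSketch char =
    -- Mirror image: already lower case, and an upper-case variant exists.
    (char == Char.toLower char) && (char /= Char.toUpper char)
```

Under this sketch, `isUpperSketch 'É'` should be `True` because `'É'` has a distinct lower-case form, while `isLowerSketch '0'` should be `False` because `'0'` is unchanged by both conversions.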
|
Related to #385.
|
What's considered an uppercase character depends on your locale. This PR is still a major improvement. Related to #942.
|
For future reference, the
So it seems that I do not want us to theorize about these things here. The next step is to find nice links that describe:
I would prefer to understand the problem more completely before changing things.
|
From my cursory googling and research:

1. How is "upper case" defined?

I think this FAQ is the link you want. In short, yes, there is a big table. Three, in fact.
Here is the relevant section of the standard. It has some sense of inter-version stability between the Unicode versions.

2. How is "locale" defined?

Again, a Unicode FAQ; and this time there's a whole homepage. You can download the current version; there are a lot of XML files inside with various data (casing of dates / languages / ..., etc.), to be interpreted according to LDML. They are also transformed from XML into JSON, which might be a better fit for Elm?
|
We might try to be extra-pure and host the big table, in Elm format, ourselves, but that would make elm/core very big, I imagine. The browser already has that cached in the form of The I mean, even

```elm
import Date
import Html exposing (Html)


main : Html msg
main =
    "2018-05-10"
        |> Date.fromString
        |> toString
        |> Html.text
```

shows
|
I ran into the same issue when doing the exercise in the forms section, namely checking the password for an uppercase character. At first glance, I thought this was a serious omission. Then I wondered whether it is even worth letting users set passwords containing Unicode. In any case, that is not decided at the front-end stage, but much earlier. So this limitation may frustrate developers from non-English-speaking regions, and it hurts the use of Elm as the main front-end stack in enterprise environments, and with it the popularity and development of the language. Still, I believe such an annoying flaw will not remain a problem for Elm.
|
FYI, this package can currently be used to deal with Unicode strings: https://package.elm-lang.org/packages/BrianHicks/elm-string-graphemes/latest/
|
Lines 84 to 85 in e47edeb Wouldn't
Same applies to (I tried using code comments, didn't work, idk why)

Edit: The one scenario where this might make a difference is if there's a "middle case" character that has both an upper and a lower case variant. But I don't think such a character exists, and even if it does, should
This does not fully solve the problem of detecting case in Unicode, as it can also vary by locale. This does make the isUpper/Lower and toUpper/Lower functions consistent.
|
|
@miniBill, good to know. So, would you argue that
|
https://ellie-app.com/gBmgVVFzhbRa1 contains a table of all the 1441 code points that give wrong results with the current proposal.

I personally think the proposal is the best compromise between accuracy and size/speed. Getting better results belongs in external packages (like elm-unicode).
|
This PR causes a compile error when it is used: elm-janitor/apply-patches#1
|
@rupertlssmith I can't edit this PR's code anymore; see #1138 for compilable code.
EDIT: I can't update the code of this PR anymore; #1138 has a fix for the `==` and `/=` import.