Description
- Are you running the latest version?
- Have you included sample input, output, error, and expected output?
- Have you checked if you are using correct configuration?
- Did you try online tool?
- Have you checked the docs for helpful APIs and examples?
Description
I encountered a case where text content such as <...> inside a tag is handled incorrectly by the parser. The parser tries to interpret it as a tag name and breaks the tree structure. The same applies to the U+2026 Horizontal Ellipsis character (…).
What does <...> mean?
In text, <...> usually means that something has been intentionally left out. It is a placeholder for omitted words, or for text that is not being shown, quoted, or repeated.
Note that the . character can be used in an XML tag name, but only if it is not the first character (see NameStartChar in the spec). The ellipsis character appears to be invalid as a NameChar as well. See the Names and Tokens section at https://www.w3.org/TR/xml/#sec-common-syn
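To make the naming rule concrete, here is a minimal sketch of the Name production from the XML 1.0 spec, transcribed into a regular expression. The helper name isValidXmlName is hypothetical and exists only for illustration; it is not an API of the parser.

```javascript
// Sketch of the XML 1.0 Name production: NameStartChar followed by NameChar*.
// isValidXmlName is a hypothetical helper for illustration, not a parser API.
const NAME_START =
  ":A-Z_a-z\\u00C0-\\u00D6\\u00D8-\\u00F6\\u00F8-\\u02FF\\u0370-\\u037D" +
  "\\u037F-\\u1FFF\\u200C-\\u200D\\u2070-\\u218F\\u2C00-\\u2FEF" +
  "\\u3001-\\uD7FF\\uF900-\\uFDCF\\uFDF0-\\uFFFD\\u{10000}-\\u{EFFFF}";

// NameChar additionally allows "-", ".", digits, U+00B7 and two small ranges.
const NAME_EXTRA = "\\-.0-9\\u00B7\\u0300-\\u036F\\u203F-\\u2040";

const XML_NAME = new RegExp(
  `^[${NAME_START}][${NAME_START}${NAME_EXTRA}]*$`,
  "u"
);

function isValidXmlName(name) {
  return XML_NAME.test(name);
}

console.log(isValidXmlName("..."));    // false: "." cannot start a name
console.log(isValidXmlName("\u2026")); // false: U+2026 is in neither character class
console.log(isValidXmlName("a.b"));    // true: "." is allowed after the first character
```

Under this rule neither <...> nor <…> can be the start of an element, which is why I would expect both to stay in the text content.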
Also, while writing this issue, I tried using only the angle bracket characters individually and saw strange behavior. I've described this below as a second case.
I assume that the correct parser behavior, if the angle brackets don't form a valid tag name according to the XML specification, would be to leave the angle bracket characters in the #text element or to escape them as entities.
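As a rough illustration of the "escape them as entities" option, the sketch below pre-processes the input before it reaches the parser. The function escapeStrayLessThan is hypothetical, and it assumes tag names start with an ASCII letter, "_" or ":" (a simplification of NameStartChar). It does not handle < inside comments or CDATA sections, and text such as "a <b" would still look like a tag.

```javascript
// Hypothetical pre-processing helper, not part of any parser API.
// Assumption: a "<" that opens real markup is followed by an ASCII letter,
// "_" or ":" (start tag), "/" (end tag), "!" (comment, CDATA, DOCTYPE)
// or "?" (processing instruction). Any other "<" is treated as text.
function escapeStrayLessThan(xml) {
  return xml.replace(/<(?![A-Za-z_:\/!?])/g, "&lt;");
}

console.log(escapeStrayLessThan("<p>if (1 < 3) return text;</p>"));
// <p>if (1 &lt; 3) return text;</p>

console.log(escapeStrayLessThan("<p>ultricies <...> Sed</p>"));
// <p>ultricies &lt;...> Sed</p>
```

This is only a workaround sketch on the caller's side; the point of the issue is that the parser itself could apply a similar check.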
Input
Code
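// Assumed import: XMLParser and XMLBuilder come from the fast-xml-parser package.
const { XMLParser, XMLBuilder } = require("fast-xml-parser");

const xmlParser = new XMLParser({
  preserveOrder: true,
  allowBooleanAttributes: true,
  ignoreAttributes: false,
  ignoreDeclaration: true,
});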
const firstCase = `<?xml version="1.0"?>
<root>
<p>Lorem ipsum dolor sit amet, consectetur adipiscing elit. Integer pretium odio non ex hendrerit, eu convallis sapien ultricies <...> Sed sagittis at est auctor varius. Donec sit amet nibh sodales, varius nunc eu, tempus turpis. <...> Nulla gravida erat a tortor sollicitudin laoreet.</p>
<foo></foo>
</root>`;
const secondCase = `<?xml version="1.0"?>
<root>
<p>if (1 < 3) return text;</p>
</root>
`;
const jObj = xmlParser.parse(firstCase); // check first and second case
const xmlBuilder = new XMLBuilder({
  ignoreAttributes: false,
  preserveOrder: true,
});
const xmlContent = xmlBuilder.build(jObj);
console.log(xmlContent);
Output
In the first case:
jObj
[
  {
    "root": [
      {
        "p": [
          {
            "#text": "Lorem ipsum dolor sit amet, consectetur adipiscing elit. Integer pretium odio non ex hendrerit, eu convallis sapien ultricies"
          },
          {
            "...": [
              {
                "#text": "Sed sagittis at est auctor varius. Donec sit amet nibh sodales, varius nunc eu, tempus turpis."
              },
              {
                "...": [
                  {
                    "#text": "Nulla gravida erat a tortor sollicitudin laoreet."
                  }
                ]
              },
              {
                "foo": []
              }
            ]
          }
        ]
      }
    ]
  }
]
<root><p>Lorem ipsum dolor sit amet, consectetur adipiscing elit. Integer pretium odio non ex hendrerit, eu convallis sapien ultricies<...>Sed sagittis at est auctor varius. Donec sit amet nibh sodales, varius nunc eu, tempus turpis.<...>Nulla gravida erat a tortor sollicitudin laoreet.</...><foo></foo></...></p></root>
In the second case:
jObj
[
  {
    "root": [
      {
        "p": [
          {
            "#text": "if (1"
          },
          {
            "": [],
            ":@": {
              "@_3)": true,
              "@_return": true,
              "@_text;": true,
              "@_</p": true
            }
          }
        ]
      }
    ]
  }
]
<root><p>if (1< 3) return text; </p></></p></root>
expected data
In the first case:
<root><p>Lorem ipsum dolor sit amet, consectetur adipiscing elit. Integer pretium odio non ex hendrerit, eu convallis sapien ultricies <...> Sed sagittis at est auctor varius. Donec sit amet nibh sodales, varius nunc eu, tempus turpis. <...> Nulla gravida erat a tortor sollicitudin laoreet.</p><foo></foo></root>
In the second case:
<root><p>if (1 < 3) return text;</p></root>
Would you like to work on this issue?
- Yes
- No