@barijaona
Member

Fix issue #2002: Vienna crashes when the NSXMLDocument parser tries to parse a particular HTML page with the NSXMLDocumentTidyXML option.

Also fixes issue #1545: problem parsing feeds which contain HTML tags instead of their XHTML equivalents, without reopening issue #1073: crash when attempting to fetch https://news.ycombinator.com/item?id=16264662

@barijaona
Member Author

This also works around a problem I noticed in the JoyOfTech feed https://www.joyoftech.com/joyoftech/jotblog/atom.xml
caused by <item> tag mismatches.

However, this led me to realize that splitting the XML feed parsing into multiple classes, as we did in c41991e, also made the parser less tolerant of ill-formed feeds: when a feed confuses Atom and RSS tags (for instance, using <description> instead of <content>, or vice versa), the former version was able to retrieve the information, while the current one is not.

I much prefer the prior behavior.
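
For illustration only, the kind of tolerance described above can be sketched in Python. This is not Vienna's Objective-C code: the function `entry_body`, the helper `local_name`, and the tag priority order are all hypothetical. The idea is simply that an Atom entry is searched for its native tags first and for the RSS equivalent only as a fallback.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "{http://www.w3.org/2005/Atom}"


def local_name(tag):
    """Strip any '{namespace}' prefix from an ElementTree tag name."""
    return tag.rsplit("}", 1)[-1]


def entry_body(entry, preferred=("content", "summary"), fallback=("description",)):
    """Return the text of the first matching child element.

    Native Atom tags are tried first; the RSS tag is only a fallback.
    The priority order is an assumption made for this sketch.
    """
    children = {}
    for child in entry:
        # Keep the first occurrence of each local tag name.
        children.setdefault(local_name(child.tag), child)
    for name in preferred + fallback:
        node = children.get(name)
        if node is not None and node.text:
            return node.text.strip()
    return None


# An Atom feed whose entry wrongly uses the RSS <description> tag.
feed = ET.fromstring("""<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <title>Example</title>
    <description>Body placed in an RSS tag</description>
  </entry>
</feed>""")
entry = feed.find(ATOM_NS + "entry")
body = entry_body(entry)
```

Keeping the fallback list separate from the preferred list means a well-formed entry is never affected by the lenient path.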

@Eitot
Contributor

Eitot commented Aug 26, 2025

> However, this led me to realize that splitting the XML feed parsing into multiple classes, as we did in c41991e, also made the parser less tolerant of ill-formed feeds: when a feed confuses Atom and RSS tags (for instance, using <description> instead of <content>, or vice versa), the former version was able to retrieve the information, while the current one is not.

I don't recall that there was mixed parsing. The prior RichXMLParser class had separate code paths depending on the root element. That is why I chose to split the class into multiple classes in the first place. The actual parsing code didn't change.
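
As a rough illustration of choosing a code path from the root element, here is a minimal Python sketch. It is not the actual RichXMLParser logic: the function name `feed_kind` and the returned labels are hypothetical, and real feeds may need more cases.

```python
import xml.etree.ElementTree as ET


def feed_kind(xml_text):
    """Classify a feed by the local name of its root element."""
    root = ET.fromstring(xml_text)
    # Drop any '{namespace}' prefix and compare case-insensitively.
    name = root.tag.rsplit("}", 1)[-1].lower()
    if name == "rss":
        return "rss"
    if name == "feed":
        return "atom"
    if name == "rdf":
        return "rdf"
    return None  # Unknown root element: not a recognized feed type.
```

Each label would then select its own parser, so a tag belonging to another feed type is simply not looked for unless a parser is explicitly made more lenient.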

I would also not be in favour of mixing the parsing code. It leads to more complexity for what presumably is a rare problem. I am sympathetic to having a lenient parser, but there should be limits to this. Parsing known elements from other feed types is a bit too much, in my opinion.

@barijaona
Member Author

Added more flexibility in each parser

@barijaona barijaona changed the title from "Improve and secure parsing of XML/XHTML mixes" to "Secure parsing of XML/XHTML mixes or Atom/RSS confusions" on Aug 27, 2025
Fix issue ViennaRSS#2002: Vienna crashes when the NSXMLDocument parser tries to
parse a particular HTML page with the NSXMLDocumentTidyXML option.

Also fixes issue ViennaRSS#1545: problem parsing feeds which contain HTML tags
instead of their XHTML equivalents, without reopening issue ViennaRSS#1073: crash
when attempting to fetch https://news.ycombinator.com/item?id=16264662
Some feeds confuse RSS and Atom specifications
@barijaona barijaona merged commit b3bee1a into ViennaRSS:master Sep 5, 2025
2 checks passed
@barijaona barijaona deleted the betterFeedParser branch September 5, 2025 04:26