Secure parsing of XML/XHTML mixes or Atom/RSS confusions #2008

barijaona · 2025-08-24T14:08:18Z

Fix issue #2002: Vienna crashes when the NSXMLDocument parser tries to parse a particular HTML page with the NSXMLDocumentTidyXML option.

Also fixes issue #1545: problem parsing feeds which contain HTML tags instead of their XHTML equivalents, without reopening issue #1073: crash when attempting to fetch https://news.ycombinator.com/item?id=16264662

barijaona · 2025-08-26T08:46:40Z

This also works around a problem I noticed in the JoyOfTech feed https://www.joyoftech.com/joyoftech/jotblog/atom.xml
caused by <item> tag mismatches.

However, this led me to realize that the split of XML feed parsing into multiple classes we did in c41991e also made the parser less tolerant of malformed feeds: when the feed structure confuses Atom tags and RSS tags (for instance using <description> instead of <content> or vice versa), the former version was able to retrieve the information, while the current one is not.

I much prefer the prior behavior.
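
To make the kind of tolerance discussed above concrete, here is a minimal sketch in Python rather than Vienna's own code. It assumes a hypothetical `first_text` helper and a hypothetical preference list, `RSS_TEXT_CANDIDATES`: the parser tries the element its own format expects first, then falls back to the equivalent element from the other format.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "{http://www.w3.org/2005/Atom}"

# Hypothetical preference order for an RSS item: the native tag first,
# then Atom-style tags (namespaced or not) as fallbacks.
RSS_TEXT_CANDIDATES = [
    "description",
    ATOM_NS + "content",
    ATOM_NS + "summary",
    "content",
    "summary",
]


def first_text(element, candidates):
    """Return the text of the first candidate child with non-empty text, else None."""
    for tag in candidates:
        child = element.find(tag)
        if child is not None and child.text and child.text.strip():
            return child.text.strip()
    return None


# An RSS item that wrongly carries an Atom-style <content> element.
confused_item = ET.fromstring(
    "<item><title>Hello</title><content>Body text</content></item>"
)
print(first_text(confused_item, RSS_TEXT_CANDIDATES))
```

Keeping the native tag at the head of the list means well-formed feeds are parsed exactly as before; the fallbacks only matter when the expected element is missing or empty.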

Eitot · 2025-08-26T15:11:39Z

> However, this led me to realize that the split of XML feed parsing into multiple classes we did in c41991e also made the parser less tolerant of malformed feeds: when the feed structure confuses Atom tags and RSS tags (for instance using <description> instead of <content> or vice versa), the former version was able to retrieve the information, while the current one is not.

I don't recall that there was mixed parsing. The prior RichXMLParser class had separate code paths depending on the root element. That is why I chose to split the class into multiple classes in the first place. The actual parsing code didn't change.

I would also not be in favour of mixing the parsing code. It leads to more complexity for what presumably is a rare problem. I am sympathetic to having a lenient parser, but there should be limits to this. Parsing known elements from other feed types is a bit too much, in my opinion.
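
For readers unfamiliar with the design being described, choosing a code path from the root element can be sketched as follows. This is an illustrative Python sketch, not the RichXMLParser code; `detect_feed_type` and `local_name` are hypothetical names, and the mapping assumes the conventional roots (`rss` for RSS 2.0, `feed` for Atom, `rdf:RDF` for RSS 1.0).

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip a leading '{namespace}' from an ElementTree tag name."""
    return tag.rsplit("}", 1)[-1]


def detect_feed_type(xml_text):
    """Return 'rss', 'atom' or 'rdf' based on the root element, else None."""
    root = ET.fromstring(xml_text)
    name = local_name(root.tag)
    if name == "rss":
        return "rss"
    if name == "feed":
        return "atom"
    if name == "RDF":
        return "rdf"
    return None


print(detect_feed_type('<rss version="2.0"><channel/></rss>'))
print(detect_feed_type('<feed xmlns="http://www.w3.org/2005/Atom"/>'))
```

Under this arrangement each format gets its own parser, and any leniency towards foreign tags has to be added inside each parser separately.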

barijaona · 2025-08-27T09:53:57Z

Added more flexibility in each parser

Fix issue ViennaRSS#2002: Vienna crashes when the NSXMLDocument parser tries to parse a particular HTML page with the NSXMLDocumentTidyXML option. Also fixes issue ViennaRSS#1545: problem parsing feeds which contain HTML tags instead of their XHTML equivalents, without reopening issue ViennaRSS#1073: crash when attempting to fetch https://news.ycombinator.com/item?id=16264662

Some feeds confuse RSS and Atom specifications

barijaona changed the title ~~Improve and secure parsing of XML/XHTML mixes~~ Secure parsing of XML/XHTML mixes or Atom/RSS confusions Aug 27, 2025

barijaona added 3 commits August 28, 2025 11:42

Be flexible on parsing potential publication dates

649a8b6

Be flexible on parsing potential article texts

51d2372

Some feeds confuse RSS and Atom specifications
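
As an illustration of what "flexible on parsing potential publication dates" can mean in practice, the sketch below accepts both the RFC 3339 timestamps used by Atom and the RFC 822 style dates used by RSS. It is written in Python, is not taken from these commits, and `parse_feed_date` is a hypothetical helper name.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def parse_feed_date(text):
    """Parse an Atom (RFC 3339) or RSS (RFC 822) date string into UTC, else None."""
    text = text.strip()
    # Older Python versions reject a trailing 'Z', so spell out the UTC offset.
    iso_text = text[:-1] + "+00:00" if text.endswith("Z") else text
    try:
        parsed = datetime.fromisoformat(iso_text)
    except ValueError:
        try:
            parsed = parsedate_to_datetime(text)
        except (TypeError, ValueError):
            return None
    # Assumption for this sketch: a date without a zone is treated as UTC.
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed.astimezone(timezone.utc)


print(parse_feed_date("2025-08-26T08:46:40Z"))
print(parse_feed_date("Tue, 26 Aug 2025 08:46:40 GMT"))
```

Trying both formats regardless of the feed type is what lets an RSS feed with Atom-style dates (or the reverse) still yield a usable publication date.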

barijaona force-pushed the betterFeedParser branch from ba45c58 to 51d2372 August 28, 2025 09:42

barijaona merged commit b3bee1a into ViennaRSS:master Sep 5, 2025
2 checks passed

barijaona deleted the betterFeedParser branch September 5, 2025 04:26
