@karfau karfau commented Jan 19, 2025

When we introduced stricter DOCTYPE parsing and support for more complex DTDs, we were not aware that the HTML spec allows any casing in the DOCTYPE but only permits certain values.

To make sure this change doesn't cause more harm, xmldom will still parse anything besides DTD syntax in HTML, and will report a warning if the systemId contains anything unexpected. (One example of this is parsing XHTML as HTML.)

closes #817

https://html.spec.whatwg.org/multipage/syntax.html#the-doctype
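
To illustrate the rule from the linked spec section, here is a minimal sketch of a case-insensitive DOCTYPE check that warns instead of failing. `checkHtmlDoctype` and `HTML_DOCTYPE` are hypothetical names made up for this example; this is not xmldom's implementation, and it deliberately ignores the obsolete permitted doctypes that use PUBLIC identifiers.

```javascript
// Hypothetical helper, NOT part of xmldom: classifies an HTML doctype string.
// Assumed grammar (from the HTML spec section linked above):
//   "<!DOCTYPE" (any casing), whitespace, "html" (any casing),
//   optionally: whitespace, "SYSTEM" (any casing), whitespace,
//   and a quoted system identifier, then optional whitespace and ">".
const WS = '[\\t\\n\\f\\r ]'; // ASCII whitespace as defined for HTML
const HTML_DOCTYPE = new RegExp(
  `^<!doctype${WS}+html(?:${WS}+system${WS}+(["'])(.*?)\\1)?${WS}*>$`,
  'i'
);

function checkHtmlDoctype(source) {
  const match = HTML_DOCTYPE.exec(source.trim());
  if (!match) {
    // e.g. an XHTML doctype with a PUBLIC identifier lands here
    return { level: 'warning', message: 'unexpected doctype syntax' };
  }
  const systemId = match[2]; // undefined when no SYSTEM part is present
  if (systemId !== undefined && systemId !== 'about:legacy-compat') {
    // the only system identifier this sketch accepts is the legacy string
    return { level: 'warning', message: `unexpected systemId: ${systemId}` };
  }
  return { level: 'ok', message: '' };
}
```

With this sketch, `checkHtmlDoctype('<!doctype html>')` and `checkHtmlDoctype('<!DOCTYPE HTML>')` both return level `'ok'`, while a doctype carrying an XHTML DTD URL as its system identifier returns level `'warning'` rather than raising an error.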

@karfau karfau added the xml:well-formed (https://www.w3.org/TR/xml11/#dt-wellformed), spec:HTML, and doctype (everything related to doctype parsing and DTD) labels Jan 19, 2025
@karfau karfau added this to the 0.9.7 milestone Jan 19, 2025
@karfau karfau requested a review from shunkica January 19, 2025 15:00
codecov bot commented Jan 19, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 95.12%. Comparing base (c090f50) to head (b87ffc2).
Report is 1 commit behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #819      +/-   ##
==========================================
+ Coverage   95.08%   95.12%   +0.04%     
==========================================
  Files           8        8              
  Lines        2177     2195      +18     
  Branches      571      577       +6     
==========================================
+ Hits         2070     2088      +18     
  Misses        107      107              


@karfau karfau merged commit f3d67d1 into master Jan 19, 2025
38 checks passed
@karfau karfau deleted the fix-html-doctype-casing branch January 19, 2025 18:26

Development

Successfully merging this pull request may close these issues.

ParseError: Not well-formed XML starting with "<!" when doctype is lowercased
