-
-
Notifications
You must be signed in to change notification settings - Fork 739
fix(core): text expression parsing and LSP file opening #7901
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
🦋 Changeset detectedLatest commit: 737e103 The changes in this PR will be included in the next version bump. This PR includes changesets to release 13 packages
Not sure what this means? Click here to learn what changesets are. Click here if you're a maintainer who wants to add another changeset to this PR |
CodSpeed Performance ReportMerging #7901 will not alter performanceComparing Summary
Footnotes
|
WalkthroughThis PR fixes a crash in the HTML parser when opening Astro files with experimental full HTML support enabled. It refactors the lexer's Possibly related PRs
Suggested labels
Suggested reviewers
Pre-merge checks and finishing touches✅ Passed checks (5 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🧹 Nitpick comments (1)
crates/biome_html_parser/src/lexer/mod.rs (1)
225-228: Consider using a depth counter instead of a stack.Since you only need to track nesting depth (not the actual brace positions), a simple
usizecounter would be more idiomatic thanVec<()>:- let mut brackets_stack = Vec::<()>::new(); + let mut depth = 0; if self.previous_byte() == Some(b'{') { - brackets_stack.push(()); + depth += 1; }Then update the match arms accordingly (
depth += 1for{,depth -= 1for}, checkdepth == 0).
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (4)
crates/biome_html_parser/tests/html_specs/error/astro/nested_expression.astro.snapis excluded by!**/*.snapand included by**crates/biome_html_parser/tests/html_specs/ok/astro/issue_7837.astro.snapis excluded by!**/*.snapand included by**crates/biome_html_parser/tests/html_specs/ok/astro/multiple_nested_expression.astro.snapis excluded by!**/*.snapand included by**crates/biome_html_parser/tests/html_specs/ok/astro/nested_expression.astro.snapis excluded by!**/*.snapand included by**
📒 Files selected for processing (11)
.changeset/proud-peaches-hide.md(1 hunks).changeset/wild-eels-send.md(1 hunks)crates/biome_html_parser/src/lexer/mod.rs(2 hunks)crates/biome_html_parser/src/syntax/mod.rs(1 hunks)crates/biome_html_parser/tests/html_specs/error/astro/nested_expression.astro(1 hunks)crates/biome_html_parser/tests/html_specs/ok/astro/issue_7837.astro(1 hunks)crates/biome_html_parser/tests/html_specs/ok/astro/multiple_nested_expression.astro(1 hunks)crates/biome_html_parser/tests/html_specs/ok/astro/nested_expression.astro(1 hunks)crates/biome_lsp/src/handlers/text_document.rs(0 hunks)crates/biome_parser/src/lexer.rs(1 hunks)crates/biome_service/src/workspace/server.rs(2 hunks)
💤 Files with no reviewable changes (1)
- crates/biome_lsp/src/handlers/text_document.rs
🧰 Additional context used
📓 Path-based instructions (4)
.changeset/*.md
📄 CodeRabbit inference engine (CONTRIBUTING.md)
.changeset/*.md: In changeset files, only use #### or ##### headers
Changesets should describe user-facing changes; internal-only changes do not need changesets
Use past tense for what you did in the changeset description and present tense for current behavior
For bug fixes, start the changeset description with a link to the issue (e.g., Fixed #1234: ...)
When referencing a rule or assist in a changeset, include a link to the rule/assist page on the website
Include a code block in the changeset when applicable to illustrate the change
End every sentence in a changeset with a full stop (.)
Files:
.changeset/proud-peaches-hide.md.changeset/wild-eels-send.md
crates/biome_service/src/workspace/server.rs
📄 CodeRabbit inference engine (crates/biome_service/CONTRIBUTING.md)
WorkspaceServer must maintain workspace state and is used in the daemon and CLI daemonless mode
Files:
crates/biome_service/src/workspace/server.rs
**/*.{rs,toml}
📄 CodeRabbit inference engine (CONTRIBUTING.md)
Format Rust and TOML files before committing (e.g., via
just f)
Files:
crates/biome_service/src/workspace/server.rscrates/biome_parser/src/lexer.rscrates/biome_html_parser/src/syntax/mod.rscrates/biome_html_parser/src/lexer/mod.rs
**/*.rs
📄 CodeRabbit inference engine (CONTRIBUTING.md)
Document rules, assists, and their options with inline rustdoc in the Rust source
Files:
crates/biome_service/src/workspace/server.rscrates/biome_parser/src/lexer.rscrates/biome_html_parser/src/syntax/mod.rscrates/biome_html_parser/src/lexer/mod.rs
🧠 Learnings (1)
📚 Learning: 2025-10-15T09:24:31.042Z
Learnt from: CR
PR: biomejs/biome#0
File: crates/biome_parser/CONTRIBUTING.md:0-0
Timestamp: 2025-10-15T09:24:31.042Z
Learning: Lexer must implement the biome_parser::Lexer trait
Applied to files:
crates/biome_parser/src/lexer.rs
🧬 Code graph analysis (2)
crates/biome_service/src/workspace/server.rs (2)
packages/@biomejs/backend-jsonrpc/src/workspace.ts (1)
DocumentFileSource(9474-9482)crates/biome_service/src/file_handlers/mod.rs (1)
from_path(283-285)
crates/biome_parser/src/lexer.rs (2)
crates/biome_html_parser/src/lexer/mod.rs (1)
position(836-838)crates/biome_css_parser/src/lexer/mod.rs (1)
position(115-117)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (18)
- GitHub Check: Test Node.js API
- GitHub Check: Documentation
- GitHub Check: Test (depot-windows-2022-16)
- GitHub Check: Test (depot-ubuntu-24.04-arm-16)
- GitHub Check: Check Dependencies
- GitHub Check: Lint project (depot-windows-2022)
- GitHub Check: autofix
- GitHub Check: Bench (biome_css_parser)
- GitHub Check: Bench (biome_css_formatter)
- GitHub Check: Bench (biome_css_analyze)
- GitHub Check: Bench (biome_graphql_parser)
- GitHub Check: Bench (biome_js_analyze)
- GitHub Check: Bench (biome_js_formatter)
- GitHub Check: Bench (biome_graphql_formatter)
- GitHub Check: Bench (biome_js_parser)
- GitHub Check: Bench (biome_json_formatter)
- GitHub Check: Bench (biome_json_parser)
- GitHub Check: Bench (biome_json_analyze)
🔇 Additional comments (10)
crates/biome_service/src/workspace/server.rs (2)
308-324: The source rewriting logic correctly addresses the LSP crash.This fix properly handles the case where the LSP provides a
document_file_sourcethat doesn't account for experimental full HTML support. The conditional rewrite ensures Astro, Vue, Svelte, and HTML files get the correct source determination.
401-401: Verify whether this debug log is intentional.This debug statement dumps the entire parse tree before embedded language parsing. If this was added for debugging the fix, it should be removed. If it's intended to remain, please add a comment explaining why.
.changeset/proud-peaches-hide.md (1)
1-5: LGTM!The changeset correctly describes the parsing fix and follows all guidelines.
crates/biome_html_parser/tests/html_specs/ok/astro/issue_7837.astro (1)
1-16: Appropriate test fixture for issue #7837.This test file correctly represents the scenario that was causing the crash, providing good regression coverage.
crates/biome_html_parser/tests/html_specs/ok/astro/nested_expression.astro (1)
1-5: Good test coverage for nested expressions.This test ensures the parser correctly handles nested curly brackets within mapped expressions, which was the root cause of the parsing failure.
crates/biome_html_parser/tests/html_specs/error/astro/nested_expression.astro (1)
1-5: Good error case coverage.This test ensures the parser properly reports errors for malformed nested expressions (missing closing parenthesis), complementing the success case.
crates/biome_html_parser/src/syntax/mod.rs (1)
566-575: Text expression parsing updated to align with lexer changes.The change from
HtmlLexContext::TextExpression(self.kind)toHtmlLexContext::InsideTagfor single text expressions correctly reflects the updated lexer behavior, whereconsume_single_text_expressionnow handles nested braces internally with a bracket stack.crates/biome_html_parser/tests/html_specs/ok/astro/multiple_nested_expression.astro (1)
1-17: Test coverage looks solid.The test file appropriately exercises the nested brace parsing logic with both expected and error cases. The comments clearly document the intent, which will help maintainers understand the edge cases being tested.
crates/biome_html_parser/src/lexer/mod.rs (2)
779-782: Call site correctly updated.The call to
consume_single_text_expression()properly reflects the signature change.
226-228: This review comment is incorrect—the code already handles the scenario safely.The function
consume_single_text_expression()can only be called after the parser has verified and consumed an opening{(see syntax/mod.rs:512). At that point, position ≥ 1, soprevious_byte()will never returnNone.Even hypothetically, if the brackets stack started empty, the code is safe:
Vec::pop()returnsNonewithout panicking, and the subsequentis_empty()check correctly identifies the condition, causing the loop to break on the first}encountered.Likely an incorrect or invalid review comment.
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
crates/biome_html_parser/src/lexer/mod.rs(2 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.{rs,toml}
📄 CodeRabbit inference engine (CONTRIBUTING.md)
Format Rust and TOML files before committing (e.g., via
just f)
Files:
crates/biome_html_parser/src/lexer/mod.rs
**/*.rs
📄 CodeRabbit inference engine (CONTRIBUTING.md)
Document rules, assists, and their options with inline rustdoc in the Rust source
Files:
crates/biome_html_parser/src/lexer/mod.rs
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
- GitHub Check: End-to-end tests
- GitHub Check: Test (depot-ubuntu-24.04-arm-16)
- GitHub Check: Documentation
- GitHub Check: Test (depot-windows-2022-16)
- GitHub Check: Check Dependencies
- GitHub Check: Lint project (depot-windows-2022)
- GitHub Check: Lint project (depot-ubuntu-24.04-arm-16)
- GitHub Check: Test Node.js API
- GitHub Check: autofix
- GitHub Check: Validate PR title
🔇 Additional comments (1)
crates/biome_html_parser/src/lexer/mod.rs (1)
781-781: Call site correctly updated.The call to
consume_single_text_expression()now matches the updated signature.
| fn consume_single_text_expression(&mut self) -> HtmlSyntaxKind { | ||
| let mut brackets_stack = 0; | ||
| if self.prev_byte() == Some(b'{') { | ||
| brackets_stack += 1; | ||
| } | ||
| while let Some(current) = self.current_byte() { | ||
| match current { | ||
| b'}' => { | ||
| brackets_stack -= 1; | ||
| if brackets_stack == 0 { | ||
| break; | ||
| } else { | ||
| self.advance(1); | ||
| } | ||
| } | ||
| HTML_LITERAL | ||
| b'{' => { | ||
| brackets_stack += 1; | ||
| self.advance(1); | ||
| } | ||
|
|
||
| _ => { | ||
| self.advance(1); | ||
| } | ||
| } | ||
| } | ||
|
|
||
| HTML_LITERAL | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion | 🟠 Major
Clarify the brackets_stack initialisation to prevent potential underflow.
The initialisation logic on lines 226-228 conditionally sets brackets_stack = 1 only if prev_byte() returns Some(b'{'). If this precondition isn't met, the stack remains at 0, and the first } encountered would decrement it to -1 (since Rust infers i32). The condition brackets_stack == 0 would then never be true, causing the function to consume bytes well beyond the intended expression boundary until EOF.
Whilst the precondition likely always holds in the current implementation (since this function is called after consuming an opening brace), making this assumption explicit would improve robustness and clarity, especially against future refactoring.
Consider one of these approaches:
Option 1: Add a debug assertion to verify the precondition
fn consume_single_text_expression(&mut self) -> HtmlSyntaxKind {
+ debug_assert_eq!(self.prev_byte(), Some(b'{'), "consume_single_text_expression must be called immediately after consuming '{'");
let mut brackets_stack = 0;
if self.prev_byte() == Some(b'{') {
brackets_stack += 1;
}Option 2: Unconditionally initialise to 1 (if the precondition always holds)
fn consume_single_text_expression(&mut self) -> HtmlSyntaxKind {
- let mut brackets_stack = 0;
- if self.prev_byte() == Some(b'{') {
- brackets_stack += 1;
- }
+ // We're always called after consuming an opening '{', so start at depth 1
+ let mut brackets_stack = 1;Option 3: Use an unsigned type and handle underflow explicitly
- let mut brackets_stack = 0;
+ let mut brackets_stack: u32 = 0;
if self.prev_byte() == Some(b'{') {
brackets_stack += 1;
}
while let Some(current) = self.current_byte() {
match current {
b'}' => {
+ if brackets_stack == 0 {
+ // Unexpected closing brace without matching opening
+ break;
+ }
brackets_stack -= 1;📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| fn consume_single_text_expression(&mut self) -> HtmlSyntaxKind { | |
| let mut brackets_stack = 0; | |
| if self.prev_byte() == Some(b'{') { | |
| brackets_stack += 1; | |
| } | |
| while let Some(current) = self.current_byte() { | |
| match current { | |
| b'}' => { | |
| brackets_stack -= 1; | |
| if brackets_stack == 0 { | |
| break; | |
| } else { | |
| self.advance(1); | |
| } | |
| } | |
| HTML_LITERAL | |
| b'{' => { | |
| brackets_stack += 1; | |
| self.advance(1); | |
| } | |
| _ => { | |
| self.advance(1); | |
| } | |
| } | |
| } | |
| HTML_LITERAL | |
| } | |
| fn consume_single_text_expression(&mut self) -> HtmlSyntaxKind { | |
| debug_assert_eq!(self.prev_byte(), Some(b'{'), "consume_single_text_expression must be called immediately after consuming '{'"); | |
| let mut brackets_stack = 0; | |
| if self.prev_byte() == Some(b'{') { | |
| brackets_stack += 1; | |
| } | |
| while let Some(current) = self.current_byte() { | |
| match current { | |
| b'}' => { | |
| brackets_stack -= 1; | |
| if brackets_stack == 0 { | |
| break; | |
| } else { | |
| self.advance(1); | |
| } | |
| } | |
| b'{' => { | |
| brackets_stack += 1; | |
| self.advance(1); | |
| } | |
| _ => { | |
| self.advance(1); | |
| } | |
| } | |
| } | |
| HTML_LITERAL | |
| } |
| fn consume_single_text_expression(&mut self) -> HtmlSyntaxKind { | |
| let mut brackets_stack = 0; | |
| if self.prev_byte() == Some(b'{') { | |
| brackets_stack += 1; | |
| } | |
| while let Some(current) = self.current_byte() { | |
| match current { | |
| b'}' => { | |
| brackets_stack -= 1; | |
| if brackets_stack == 0 { | |
| break; | |
| } else { | |
| self.advance(1); | |
| } | |
| } | |
| HTML_LITERAL | |
| b'{' => { | |
| brackets_stack += 1; | |
| self.advance(1); | |
| } | |
| _ => { | |
| self.advance(1); | |
| } | |
| } | |
| } | |
| HTML_LITERAL | |
| } | |
| fn consume_single_text_expression(&mut self) -> HtmlSyntaxKind { | |
| // We're always called after consuming an opening '{', so start at depth 1 | |
| let mut brackets_stack = 1; | |
| while let Some(current) = self.current_byte() { | |
| match current { | |
| b'}' => { | |
| brackets_stack -= 1; | |
| if brackets_stack == 0 { | |
| break; | |
| } else { | |
| self.advance(1); | |
| } | |
| } | |
| b'{' => { | |
| brackets_stack += 1; | |
| self.advance(1); | |
| } | |
| _ => { | |
| self.advance(1); | |
| } | |
| } | |
| } | |
| HTML_LITERAL | |
| } |
| fn consume_single_text_expression(&mut self) -> HtmlSyntaxKind { | |
| let mut brackets_stack = 0; | |
| if self.prev_byte() == Some(b'{') { | |
| brackets_stack += 1; | |
| } | |
| while let Some(current) = self.current_byte() { | |
| match current { | |
| b'}' => { | |
| brackets_stack -= 1; | |
| if brackets_stack == 0 { | |
| break; | |
| } else { | |
| self.advance(1); | |
| } | |
| } | |
| HTML_LITERAL | |
| b'{' => { | |
| brackets_stack += 1; | |
| self.advance(1); | |
| } | |
| _ => { | |
| self.advance(1); | |
| } | |
| } | |
| } | |
| HTML_LITERAL | |
| } | |
| fn consume_single_text_expression(&mut self) -> HtmlSyntaxKind { | |
| let mut brackets_stack: u32 = 0; | |
| if self.prev_byte() == Some(b'{') { | |
| brackets_stack += 1; | |
| } | |
| while let Some(current) = self.current_byte() { | |
| match current { | |
| b'}' => { | |
| if brackets_stack == 0 { | |
| // Unexpected closing brace without matching opening | |
| break; | |
| } | |
| brackets_stack -= 1; | |
| if brackets_stack == 0 { | |
| break; | |
| } else { | |
| self.advance(1); | |
| } | |
| } | |
| b'{' => { | |
| brackets_stack += 1; | |
| self.advance(1); | |
| } | |
| _ => { | |
| self.advance(1); | |
| } | |
| } | |
| } | |
| HTML_LITERAL | |
| } |
🤖 Prompt for AI Agents
In crates/biome_html_parser/src/lexer/mod.rs around lines 224-251, the
brackets_stack is conditionally initialized which can underflow if prev_byte()
isn't '{'; replace the conditional init with an unconditional let mut
brackets_stack = 1 and add a debug_assert!(self.prev_byte() == Some(b'{'))
before the loop to document the precondition and prevent silent underflow while
keeping the existing loop logic unchanged.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@coderabbitai While this is technically true, the function is called always after the { is consumed, so it's safe to assume that the previous by is {
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ematipico, thank you for clarifying! You're right—if the function is always called immediately after consuming {, the current initialization is safe. I appreciate the context about the call sites.
Summary
Closes #7837
I wanted to fix the panic too, however, that would require more work and knowledge that I don't have at the moment. With
tokio, there'scatch_unwindfunction, however it raises various panics, https://stackoverflow.com/a/78509280/273355We can do this later once we find the right types to use.
Test Plan
Added new tests, and manually tested the LSP issue.
Docs
N/A