Skip to content

Conversation

@ematipico
Copy link
Member

Summary

Closes #7837

I wanted to fix the panic too, however, that would require more work and knowledge that I don't have at the moment. With tokio, there's catch_unwind function, however it raises various panics, https://stackoverflow.com/a/78509280/273355

We can do this later once we find the right types to use.

Test Plan

Added new tests, and manually tested the LSP issue.

Docs

N/A

@changeset-bot
Copy link

changeset-bot bot commented Oct 27, 2025

🦋 Changeset detected

Latest commit: 737e103

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 13 packages
Name Type
@biomejs/biome Patch
@biomejs/cli-win32-x64 Patch
@biomejs/cli-win32-arm64 Patch
@biomejs/cli-darwin-x64 Patch
@biomejs/cli-darwin-arm64 Patch
@biomejs/cli-linux-x64 Patch
@biomejs/cli-linux-arm64 Patch
@biomejs/cli-linux-x64-musl Patch
@biomejs/cli-linux-arm64-musl Patch
@biomejs/wasm-web Patch
@biomejs/wasm-bundler Patch
@biomejs/wasm-nodejs Patch
@biomejs/backend-jsonrpc Patch

Not sure what this means? Click here to learn what changesets are.

Click here if you're a maintainer who wants to add another changeset to this PR

@ematipico ematipico requested review from a team October 27, 2025 14:23
@github-actions github-actions bot added A-Project Area: project A-Parser Area: parser A-LSP Area: language server protocol L-HTML Language: HTML labels Oct 27, 2025
@codspeed-hq
Copy link

codspeed-hq bot commented Oct 27, 2025

CodSpeed Performance Report

Merging #7901 will not alter performance

Comparing fix/parsing-text-expressions (239a712) with main (d0dcc7e)

Summary

✅ 53 untouched
⏩ 85 skipped1

Footnotes

  1. 85 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Oct 27, 2025

Walkthrough

This PR fixes a crash in the HTML parser when opening Astro files with experimental full HTML support enabled. It refactors the lexer's consume_single_text_expression method to use a bracket-stack mechanism for tracking nested curly brackets, updates syntax parsing logic accordingly, and adjusts the workspace server's file source selection. Debug logging is cleaned up and test files for nested expressions are added.

Possibly related PRs

Suggested labels

A-Core

Suggested reviewers

  • dyc3
  • siketyan

Pre-merge checks and finishing touches

✅ Passed checks (5 passed)
Check name Status Explanation
Title Check ✅ Passed The title "fix(core): text expression parsing and LSP file opening" clearly and concisely describes the two main aspects of this PR. It accurately reflects both the nested curly bracket parsing fix in the HTML lexer and the LSP file opening fix for astro file crashes. The title is specific enough that someone scanning commit history would quickly understand the nature of these changes without relying on emoji or vague terminology.
Linked Issues Check ✅ Passed The PR successfully addresses all coding requirements from issue #7837. The lexer modifications fix nested curly bracket handling through a brackets_stack mechanism in consume_single_text_expression, the parser syntax changes adjust single text expression remapping, and the workspace server modifications ensure experimental HTML support is properly applied to astro/vue/svelte/html files. These changes directly resolve the reported downcast error and prevent the crash that occurred when opening astro files. Test files provide coverage for the fixed scenarios.
Out of Scope Changes Check ✅ Passed All changes in the PR remain in scope for fixing issue #7837. The lexer and parser modifications directly address the nested expression parsing crash, the workspace server changes ensure experimental HTML support is properly enabled for astro files, and the test files verify the fixes work correctly. The debug logging removal and addition are minor ancillary changes that don't introduce unrelated functionality. No out-of-scope changes detected.
Description Check ✅ Passed The description is directly related to the changeset. It references issue #7837, explains the crash problem when opening astro files with experimental HTML support enabled, outlines the fixes implemented, and documents the testing approach (new tests added and manual LSP testing). While concise, the description provides sufficient context about the problem being solved and avoids being off-topic or vague.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch fix/parsing-text-expressions

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (1)
crates/biome_html_parser/src/lexer/mod.rs (1)

225-228: Consider using a depth counter instead of a stack.

Since you only need to track nesting depth (not the actual brace positions), a simple usize counter would be more idiomatic than Vec<()>:

-    let mut brackets_stack = Vec::<()>::new();
+    let mut depth = 0;
     if self.previous_byte() == Some(b'{') {
-        brackets_stack.push(());
+        depth += 1;
     }

Then update the match arms accordingly (depth += 1 for {, depth -= 1 for }, check depth == 0).

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d0dcc7e and 5b85e3f.

⛔ Files ignored due to path filters (4)
  • crates/biome_html_parser/tests/html_specs/error/astro/nested_expression.astro.snap is excluded by !**/*.snap and included by **
  • crates/biome_html_parser/tests/html_specs/ok/astro/issue_7837.astro.snap is excluded by !**/*.snap and included by **
  • crates/biome_html_parser/tests/html_specs/ok/astro/multiple_nested_expression.astro.snap is excluded by !**/*.snap and included by **
  • crates/biome_html_parser/tests/html_specs/ok/astro/nested_expression.astro.snap is excluded by !**/*.snap and included by **
📒 Files selected for processing (11)
  • .changeset/proud-peaches-hide.md (1 hunks)
  • .changeset/wild-eels-send.md (1 hunks)
  • crates/biome_html_parser/src/lexer/mod.rs (2 hunks)
  • crates/biome_html_parser/src/syntax/mod.rs (1 hunks)
  • crates/biome_html_parser/tests/html_specs/error/astro/nested_expression.astro (1 hunks)
  • crates/biome_html_parser/tests/html_specs/ok/astro/issue_7837.astro (1 hunks)
  • crates/biome_html_parser/tests/html_specs/ok/astro/multiple_nested_expression.astro (1 hunks)
  • crates/biome_html_parser/tests/html_specs/ok/astro/nested_expression.astro (1 hunks)
  • crates/biome_lsp/src/handlers/text_document.rs (0 hunks)
  • crates/biome_parser/src/lexer.rs (1 hunks)
  • crates/biome_service/src/workspace/server.rs (2 hunks)
💤 Files with no reviewable changes (1)
  • crates/biome_lsp/src/handlers/text_document.rs
🧰 Additional context used
📓 Path-based instructions (4)
.changeset/*.md

📄 CodeRabbit inference engine (CONTRIBUTING.md)

.changeset/*.md: In changeset files, only use #### or ##### headers
Changesets should describe user-facing changes; internal-only changes do not need changesets
Use past tense for what you did in the changeset description and present tense for current behavior
For bug fixes, start the changeset description with a link to the issue (e.g., Fixed #1234: ...)
When referencing a rule or assist in a changeset, include a link to the rule/assist page on the website
Include a code block in the changeset when applicable to illustrate the change
End every sentence in a changeset with a full stop (.)

Files:

  • .changeset/proud-peaches-hide.md
  • .changeset/wild-eels-send.md
crates/biome_service/src/workspace/server.rs

📄 CodeRabbit inference engine (crates/biome_service/CONTRIBUTING.md)

WorkspaceServer must maintain workspace state and is used in the daemon and CLI daemonless mode

Files:

  • crates/biome_service/src/workspace/server.rs
**/*.{rs,toml}

📄 CodeRabbit inference engine (CONTRIBUTING.md)

Format Rust and TOML files before committing (e.g., via just f)

Files:

  • crates/biome_service/src/workspace/server.rs
  • crates/biome_parser/src/lexer.rs
  • crates/biome_html_parser/src/syntax/mod.rs
  • crates/biome_html_parser/src/lexer/mod.rs
**/*.rs

📄 CodeRabbit inference engine (CONTRIBUTING.md)

Document rules, assists, and their options with inline rustdoc in the Rust source

Files:

  • crates/biome_service/src/workspace/server.rs
  • crates/biome_parser/src/lexer.rs
  • crates/biome_html_parser/src/syntax/mod.rs
  • crates/biome_html_parser/src/lexer/mod.rs
🧠 Learnings (1)
📚 Learning: 2025-10-15T09:24:31.042Z
Learnt from: CR
PR: biomejs/biome#0
File: crates/biome_parser/CONTRIBUTING.md:0-0
Timestamp: 2025-10-15T09:24:31.042Z
Learning: Lexer must implement the biome_parser::Lexer trait

Applied to files:

  • crates/biome_parser/src/lexer.rs
🧬 Code graph analysis (2)
crates/biome_service/src/workspace/server.rs (2)
packages/@biomejs/backend-jsonrpc/src/workspace.ts (1)
  • DocumentFileSource (9474-9482)
crates/biome_service/src/file_handlers/mod.rs (1)
  • from_path (283-285)
crates/biome_parser/src/lexer.rs (2)
crates/biome_html_parser/src/lexer/mod.rs (1)
  • position (836-838)
crates/biome_css_parser/src/lexer/mod.rs (1)
  • position (115-117)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (18)
  • GitHub Check: Test Node.js API
  • GitHub Check: Documentation
  • GitHub Check: Test (depot-windows-2022-16)
  • GitHub Check: Test (depot-ubuntu-24.04-arm-16)
  • GitHub Check: Check Dependencies
  • GitHub Check: Lint project (depot-windows-2022)
  • GitHub Check: autofix
  • GitHub Check: Bench (biome_css_parser)
  • GitHub Check: Bench (biome_css_formatter)
  • GitHub Check: Bench (biome_css_analyze)
  • GitHub Check: Bench (biome_graphql_parser)
  • GitHub Check: Bench (biome_js_analyze)
  • GitHub Check: Bench (biome_js_formatter)
  • GitHub Check: Bench (biome_graphql_formatter)
  • GitHub Check: Bench (biome_js_parser)
  • GitHub Check: Bench (biome_json_formatter)
  • GitHub Check: Bench (biome_json_parser)
  • GitHub Check: Bench (biome_json_analyze)
🔇 Additional comments (10)
crates/biome_service/src/workspace/server.rs (2)

308-324: The source rewriting logic correctly addresses the LSP crash.

This fix properly handles the case where the LSP provides a document_file_source that doesn't account for experimental full HTML support. The conditional rewrite ensures Astro, Vue, Svelte, and HTML files get the correct source determination.


401-401: Verify whether this debug log is intentional.

This debug statement dumps the entire parse tree before embedded language parsing. If this was added for debugging the fix, it should be removed. If it's intended to remain, please add a comment explaining why.

.changeset/proud-peaches-hide.md (1)

1-5: LGTM!

The changeset correctly describes the parsing fix and follows all guidelines.

crates/biome_html_parser/tests/html_specs/ok/astro/issue_7837.astro (1)

1-16: Appropriate test fixture for issue #7837.

This test file correctly represents the scenario that was causing the crash, providing good regression coverage.

crates/biome_html_parser/tests/html_specs/ok/astro/nested_expression.astro (1)

1-5: Good test coverage for nested expressions.

This test ensures the parser correctly handles nested curly brackets within mapped expressions, which was the root cause of the parsing failure.

crates/biome_html_parser/tests/html_specs/error/astro/nested_expression.astro (1)

1-5: Good error case coverage.

This test ensures the parser properly reports errors for malformed nested expressions (missing closing parenthesis), complementing the success case.

crates/biome_html_parser/src/syntax/mod.rs (1)

566-575: Text expression parsing updated to align with lexer changes.

The change from HtmlLexContext::TextExpression(self.kind) to HtmlLexContext::InsideTag for single text expressions correctly reflects the updated lexer behavior, where consume_single_text_expression now handles nested braces internally with a bracket stack.

crates/biome_html_parser/tests/html_specs/ok/astro/multiple_nested_expression.astro (1)

1-17: Test coverage looks solid.

The test file appropriately exercises the nested brace parsing logic with both expected and error cases. The comments clearly document the intent, which will help maintainers understand the edge cases being tested.

crates/biome_html_parser/src/lexer/mod.rs (2)

779-782: Call site correctly updated.

The call to consume_single_text_expression() properly reflects the signature change.


226-228: This review comment is incorrect—the code already handles the scenario safely.

The function consume_single_text_expression() can only be called after the parser has verified and consumed an opening { (see syntax/mod.rs:512). At that point, position ≥ 1, so previous_byte() will never return None.

Even hypothetically, if the brackets stack started empty, the code is safe: Vec::pop() returns None without panicking, and the subsequent is_empty() check correctly identifies the condition, causing the loop to break on the first } encountered.

Likely an incorrect or invalid review comment.

ematipico and others added 2 commits October 27, 2025 22:12
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
@ematipico ematipico requested a review from dyc3 October 28, 2025 08:22
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 239a712 and 737e103.

📒 Files selected for processing (1)
  • crates/biome_html_parser/src/lexer/mod.rs (2 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.{rs,toml}

📄 CodeRabbit inference engine (CONTRIBUTING.md)

Format Rust and TOML files before committing (e.g., via just f)

Files:

  • crates/biome_html_parser/src/lexer/mod.rs
**/*.rs

📄 CodeRabbit inference engine (CONTRIBUTING.md)

Document rules, assists, and their options with inline rustdoc in the Rust source

Files:

  • crates/biome_html_parser/src/lexer/mod.rs
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
  • GitHub Check: End-to-end tests
  • GitHub Check: Test (depot-ubuntu-24.04-arm-16)
  • GitHub Check: Documentation
  • GitHub Check: Test (depot-windows-2022-16)
  • GitHub Check: Check Dependencies
  • GitHub Check: Lint project (depot-windows-2022)
  • GitHub Check: Lint project (depot-ubuntu-24.04-arm-16)
  • GitHub Check: Test Node.js API
  • GitHub Check: autofix
  • GitHub Check: Validate PR title
🔇 Additional comments (1)
crates/biome_html_parser/src/lexer/mod.rs (1)

781-781: Call site correctly updated.

The call to consume_single_text_expression() now matches the updated signature.

Comment on lines +224 to 251
fn consume_single_text_expression(&mut self) -> HtmlSyntaxKind {
let mut brackets_stack = 0;
if self.prev_byte() == Some(b'{') {
brackets_stack += 1;
}
while let Some(current) = self.current_byte() {
match current {
b'}' => {
brackets_stack -= 1;
if brackets_stack == 0 {
break;
} else {
self.advance(1);
}
}
HTML_LITERAL
b'{' => {
brackets_stack += 1;
self.advance(1);
}

_ => {
self.advance(1);
}
}
}

HTML_LITERAL
}
Copy link
Contributor

@coderabbitai coderabbitai bot Oct 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion | 🟠 Major

Clarify the brackets_stack initialisation to prevent potential underflow.

The initialisation logic on lines 226-228 conditionally sets brackets_stack = 1 only if prev_byte() returns Some(b'{'). If this precondition isn't met, the stack remains at 0, and the first } encountered would decrement it to -1 (since Rust infers i32). The condition brackets_stack == 0 would then never be true, causing the function to consume bytes well beyond the intended expression boundary until EOF.

Whilst the precondition likely always holds in the current implementation (since this function is called after consuming an opening brace), making this assumption explicit would improve robustness and clarity, especially against future refactoring.

Consider one of these approaches:

Option 1: Add a debug assertion to verify the precondition

 fn consume_single_text_expression(&mut self) -> HtmlSyntaxKind {
+    debug_assert_eq!(self.prev_byte(), Some(b'{'), "consume_single_text_expression must be called immediately after consuming '{'");
     let mut brackets_stack = 0;
     if self.prev_byte() == Some(b'{') {
         brackets_stack += 1;
     }

Option 2: Unconditionally initialise to 1 (if the precondition always holds)

 fn consume_single_text_expression(&mut self) -> HtmlSyntaxKind {
-    let mut brackets_stack = 0;
-    if self.prev_byte() == Some(b'{') {
-        brackets_stack += 1;
-    }
+    // We're always called after consuming an opening '{', so start at depth 1
+    let mut brackets_stack = 1;

Option 3: Use an unsigned type and handle underflow explicitly

-    let mut brackets_stack = 0;
+    let mut brackets_stack: u32 = 0;
     if self.prev_byte() == Some(b'{') {
         brackets_stack += 1;
     }
     while let Some(current) = self.current_byte() {
         match current {
             b'}' => {
+                if brackets_stack == 0 {
+                    // Unexpected closing brace without matching opening
+                    break;
+                }
                 brackets_stack -= 1;
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
fn consume_single_text_expression(&mut self) -> HtmlSyntaxKind {
let mut brackets_stack = 0;
if self.prev_byte() == Some(b'{') {
brackets_stack += 1;
}
while let Some(current) = self.current_byte() {
match current {
b'}' => {
brackets_stack -= 1;
if brackets_stack == 0 {
break;
} else {
self.advance(1);
}
}
HTML_LITERAL
b'{' => {
brackets_stack += 1;
self.advance(1);
}
_ => {
self.advance(1);
}
}
}
HTML_LITERAL
}
fn consume_single_text_expression(&mut self) -> HtmlSyntaxKind {
debug_assert_eq!(self.prev_byte(), Some(b'{'), "consume_single_text_expression must be called immediately after consuming '{'");
let mut brackets_stack = 0;
if self.prev_byte() == Some(b'{') {
brackets_stack += 1;
}
while let Some(current) = self.current_byte() {
match current {
b'}' => {
brackets_stack -= 1;
if brackets_stack == 0 {
break;
} else {
self.advance(1);
}
}
b'{' => {
brackets_stack += 1;
self.advance(1);
}
_ => {
self.advance(1);
}
}
}
HTML_LITERAL
}
Suggested change
fn consume_single_text_expression(&mut self) -> HtmlSyntaxKind {
let mut brackets_stack = 0;
if self.prev_byte() == Some(b'{') {
brackets_stack += 1;
}
while let Some(current) = self.current_byte() {
match current {
b'}' => {
brackets_stack -= 1;
if brackets_stack == 0 {
break;
} else {
self.advance(1);
}
}
HTML_LITERAL
b'{' => {
brackets_stack += 1;
self.advance(1);
}
_ => {
self.advance(1);
}
}
}
HTML_LITERAL
}
fn consume_single_text_expression(&mut self) -> HtmlSyntaxKind {
// We're always called after consuming an opening '{', so start at depth 1
let mut brackets_stack = 1;
while let Some(current) = self.current_byte() {
match current {
b'}' => {
brackets_stack -= 1;
if brackets_stack == 0 {
break;
} else {
self.advance(1);
}
}
b'{' => {
brackets_stack += 1;
self.advance(1);
}
_ => {
self.advance(1);
}
}
}
HTML_LITERAL
}
Suggested change
fn consume_single_text_expression(&mut self) -> HtmlSyntaxKind {
let mut brackets_stack = 0;
if self.prev_byte() == Some(b'{') {
brackets_stack += 1;
}
while let Some(current) = self.current_byte() {
match current {
b'}' => {
brackets_stack -= 1;
if brackets_stack == 0 {
break;
} else {
self.advance(1);
}
}
HTML_LITERAL
b'{' => {
brackets_stack += 1;
self.advance(1);
}
_ => {
self.advance(1);
}
}
}
HTML_LITERAL
}
fn consume_single_text_expression(&mut self) -> HtmlSyntaxKind {
let mut brackets_stack: u32 = 0;
if self.prev_byte() == Some(b'{') {
brackets_stack += 1;
}
while let Some(current) = self.current_byte() {
match current {
b'}' => {
if brackets_stack == 0 {
// Unexpected closing brace without matching opening
break;
}
brackets_stack -= 1;
if brackets_stack == 0 {
break;
} else {
self.advance(1);
}
}
b'{' => {
brackets_stack += 1;
self.advance(1);
}
_ => {
self.advance(1);
}
}
}
HTML_LITERAL
}
🤖 Prompt for AI Agents
In crates/biome_html_parser/src/lexer/mod.rs around lines 224-251, the
brackets_stack is conditionally initialized which can underflow if prev_byte()
isn't '{'; replace the conditional init with an unconditional let mut
brackets_stack = 1 and add a debug_assert!(self.prev_byte() == Some(b'{'))
before the loop to document the precondition and prevent silent underflow while
keeping the existing loop logic unchanged.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@coderabbitai While this is technically true, the function is called always after the { is consumed, so it's safe to assume that the previous by is {

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ematipico, thank you for clarifying! You're right—if the function is always called immediately after consuming {, the current initialization is safe. I appreciate the context about the call sites.

@ematipico ematipico merged commit 0d17b05 into main Oct 28, 2025
13 checks passed
@ematipico ematipico deleted the fix/parsing-text-expressions branch October 28, 2025 13:48
@github-actions github-actions bot mentioned this pull request Oct 28, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

A-LSP Area: language server protocol A-Parser Area: parser A-Project Area: project L-HTML Language: HTML

Projects

None yet

Development

Successfully merging this pull request may close these issues.

🐛 Biome crashed while open astro file

3 participants