Skip to content

Conversation

@dyc3
Copy link
Contributor

@dyc3 dyc3 commented Dec 20, 2025

Summary

This PR makes it so we avoid touching the basename store if we can trivially check to see that a delimiter comes immediately after. This only works for ascii, but since tailwind is majority ascii characters, this works great.

On my machine, this results in anywhere from +2-18% throughput for uncached benches, and +2-36% for cached benches. No idea why codspeed isn't showing that.

Test Plan

Docs

@changeset-bot
Copy link

changeset-bot bot commented Dec 20, 2025

⚠️ No Changeset found

Latest commit: a020914

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types

Click here to learn what changesets are, and how to add one.

Click here if you're a maintainer who wants to add a changeset to this PR

Copy link
Contributor Author

dyc3 commented Dec 20, 2025

@github-actions github-actions bot added A-Parser Area: parser L-Tailwind Language: Tailwind CSS labels Dec 20, 2025
@dyc3 dyc3 requested review from a team December 20, 2025 20:50
@dyc3 dyc3 marked this pull request as ready for review December 20, 2025 20:51
@dyc3 dyc3 changed the base branch from dyc3/tw-parser-fixes to graphite-base/8528 December 20, 2025 20:51
@dyc3 dyc3 force-pushed the graphite-base/8528 branch from 77eeab3 to b8d3bcb Compare December 20, 2025 20:52
@dyc3 dyc3 force-pushed the dyc3/tw-parser-shortcircuit-trie branch from 8d85823 to 4ea9996 Compare December 20, 2025 20:52
@graphite-app graphite-app bot changed the base branch from graphite-base/8528 to main December 20, 2025 20:53
@dyc3 dyc3 force-pushed the dyc3/tw-parser-shortcircuit-trie branch from 4ea9996 to a020914 Compare December 20, 2025 20:53
@codspeed-hq
Copy link

codspeed-hq bot commented Dec 20, 2025

CodSpeed Performance Report

Merging #8528 will not alter performance

Comparing dyc3/tw-parser-shortcircuit-trie (a020914) with main (b8d3bcb)

Summary

✅ 10 untouched
⏩ 145 skipped1

Footnotes

  1. 145 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Dec 20, 2025

Walkthrough

This PR optimises the Tailwind basename tokenisation by introducing an ASCII fast-path that scans for dashes and delimiters before consulting the trie-based BASENAME_STORE. The is_delimiter function in base_name_store.rs is made accessible at crate level via pub(crate) visibility, and the lexer's basename matching logic now leverages this for early termination. The handling of the "data" keyword (emitting DATA_KW) is preserved, with fallback to the existing dashed-basename trie when the fast-path doesn't yield a definitive result.

Possibly related PRs

Suggested reviewers

  • ematipico

Pre-merge checks and finishing touches

✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately describes the main performance optimisation: avoiding unnecessary trie lookups by checking for delimiters first in ASCII strings.
Description check ✅ Passed The PR description clearly explains the motivation (avoiding unnecessary trie lookups for ASCII delimiters), the approach (ASCII fast-path check), and reports benchmark improvements.
✨ Finishing touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch dyc3/tw-parser-shortcircuit-trie

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
crates/biome_tailwind_parser/src/lexer/mod.rs (1)

125-161: Add explicit test case for the data keyword.

The fast-path logic is exercised through existing tests (e.g., "block", "hover:") and the property-based losslessness test. However, there's no explicit test for the "data" keyword, which receives special handling in consume_base. Add a test case for "data" as a standalone token to document this behaviour.

🧹 Nitpick comments (2)
crates/biome_tailwind_parser/src/lexer/mod.rs (2)

131-150: Nice optimisation! Fast-path logic looks sound.

The ASCII scan correctly handles undashed basenames before consulting the trie, with proper fallback when hitting a dash. The early exits for "data" and simple basenames are well-positioned.

One clarification: the boundary check on line 147 intentionally excludes dash (unlike the trie's is_boundary_byte), ensuring dashed basenames like "border-t" fall through to the trie. Consider adding an inline comment noting this distinction for future maintainers.

💡 Optional: Add clarifying comment
+        // Note: We only accept delimiter or end-of-input here (not dash).
+        // If we hit a dash, we defer to the trie to match dashed basenames like "border-t".
         if end > 0 && (end == slice.len() || is_delimiter(slice[end])) {
             self.advance(end);
             return TW_BASE;
         }

142-145: Duplicate "data" check—consider clarifying intent.

The "data" keyword is checked at lines 142–145 (fast-path) and again at lines 156–157 (trie fallback). The second check appears redundant in typical cases since the fast-path catches "data" before a dash or delimiter. If it's defensive for edge cases, a brief comment would help future readers.

💡 Optional: Add comment explaining redundancy
         // Fallback to dashed-basename trie matching for cases with '-' inside the basename
         let dashed_end = BASENAME_STORE.matcher(slice).base_end();
         self.advance(dashed_end);
 
+        // Redundant safety check: the fast-path above typically handles "data",
+        // but we recheck here in case of unusual trie-fallback edge cases.
         if dashed_end == 4 && &slice[..dashed_end] == b"data" {
             DATA_KW
         } else {
             TW_BASE
         }

Also applies to: 156-157

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b8d3bcb and a020914.

📒 Files selected for processing (2)
  • crates/biome_tailwind_parser/src/lexer/base_name_store.rs (1 hunks)
  • crates/biome_tailwind_parser/src/lexer/mod.rs (2 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
crates/**/*.rs

📄 CodeRabbit inference engine (CONTRIBUTING.md)

Update inline rustdoc documentation for rules, assists, and their options when adding new features or changing existing features in Rust crates

Files:

  • crates/biome_tailwind_parser/src/lexer/base_name_store.rs
  • crates/biome_tailwind_parser/src/lexer/mod.rs
🧠 Learnings (13)
📓 Common learnings
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_analyze/CONTRIBUTING.md:0-0
Timestamp: 2025-12-19T12:53:30.399Z
Learning: Applies to crates/biome_analyze/**/*analyze/src/**/*.rs : Avoid string allocations by comparing against `&str` or using `TokenText`
📚 Learning: 2025-11-24T18:06:03.545Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_parser/CONTRIBUTING.md:0-0
Timestamp: 2025-11-24T18:06:03.545Z
Learning: Applies to crates/biome_parser/**/src/**/*.rs : Parse rule functions must be prefixed with `parse_` and use the name defined in the grammar file, e.g., `parse_for_statement` or `parse_expression`

Applied to files:

  • crates/biome_tailwind_parser/src/lexer/base_name_store.rs
📚 Learning: 2025-11-24T18:06:03.545Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_parser/CONTRIBUTING.md:0-0
Timestamp: 2025-11-24T18:06:03.545Z
Learning: Applies to crates/biome_parser/**/language_kind.rs : Add a new language prefix to the `LANGUAGE_PREFIXES` constant in `language_kind.rs` file

Applied to files:

  • crates/biome_tailwind_parser/src/lexer/base_name_store.rs
  • crates/biome_tailwind_parser/src/lexer/mod.rs
📚 Learning: 2025-12-19T12:53:30.399Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_analyze/CONTRIBUTING.md:0-0
Timestamp: 2025-12-19T12:53:30.399Z
Learning: Applies to crates/biome_analyze/**/*analyze/src/**/*.rs : Avoid string allocations by comparing against `&str` or using `TokenText`

Applied to files:

  • crates/biome_tailwind_parser/src/lexer/mod.rs
📚 Learning: 2025-11-24T18:06:03.545Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_parser/CONTRIBUTING.md:0-0
Timestamp: 2025-11-24T18:06:03.545Z
Learning: Applies to crates/biome_parser/**/lexer/mod.rs : Implement a `Lexer` trait from `biome_parser` crate for the lexer struct that consumes characters from source code and emits tokens

Applied to files:

  • crates/biome_tailwind_parser/src/lexer/mod.rs
📚 Learning: 2025-11-24T18:06:03.545Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_parser/CONTRIBUTING.md:0-0
Timestamp: 2025-11-24T18:06:03.545Z
Learning: Applies to crates/biome_parser/**/src/**/*.rs : Implement a token source struct that wraps the lexer and implements `TokenSourceWithBufferedLexer` and `LexerWithCheckpoint` for lookahead and re-lexing capabilities

Applied to files:

  • crates/biome_tailwind_parser/src/lexer/mod.rs
📚 Learning: 2025-12-19T12:53:30.399Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_analyze/CONTRIBUTING.md:0-0
Timestamp: 2025-12-19T12:53:30.399Z
Learning: Applies to crates/biome_analyze/**/*analyze/src/**/*.rs : Prefix line with `#` in documentation code examples sparingly; prefer concise complete snippets

Applied to files:

  • crates/biome_tailwind_parser/src/lexer/mod.rs
📚 Learning: 2025-12-19T12:53:30.399Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_analyze/CONTRIBUTING.md:0-0
Timestamp: 2025-12-19T12:53:30.399Z
Learning: Applies to crates/biome_analyze/**/biome_rule_options/lib/**/*.rs : Use `rename_all = "camelCase"` in serde derive macro for rule options

Applied to files:

  • crates/biome_tailwind_parser/src/lexer/mod.rs
📚 Learning: 2025-11-24T18:06:03.545Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_parser/CONTRIBUTING.md:0-0
Timestamp: 2025-11-24T18:06:03.545Z
Learning: Applies to crates/biome_parser/**/src/**/*.rs : Use `ParseSeparatedList` and `ParseNodeList` for parsing lists with error recovery to avoid infinite loops

Applied to files:

  • crates/biome_tailwind_parser/src/lexer/mod.rs
📚 Learning: 2025-11-24T18:06:03.545Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_parser/CONTRIBUTING.md:0-0
Timestamp: 2025-11-24T18:06:03.545Z
Learning: Applies to crates/biome_parser/**/src/**/*.rs : Use `p.eat(token)` for optional tokens, `p.expect(token)` for required tokens, `parse_rule(p).ok(p)` for optional nodes, and `parse_rule(p).or_add_diagnostic(p, error)` for required nodes

Applied to files:

  • crates/biome_tailwind_parser/src/lexer/mod.rs
📚 Learning: 2025-11-24T18:06:12.048Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_service/CONTRIBUTING.md:0-0
Timestamp: 2025-11-24T18:06:12.048Z
Learning: Applies to crates/biome_service/src/workspace/watcher.tests.rs : Implement watcher tests for workspace methods in watcher.tests.rs and end-to-end tests in LSP tests

Applied to files:

  • crates/biome_tailwind_parser/src/lexer/mod.rs
📚 Learning: 2025-11-24T18:06:03.545Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_parser/CONTRIBUTING.md:0-0
Timestamp: 2025-11-24T18:06:03.545Z
Learning: Applies to crates/biome_parser/**/src/**/*.rs : A parser struct must implement the `Parser` trait and save the token source, parser context, and optional parser options

Applied to files:

  • crates/biome_tailwind_parser/src/lexer/mod.rs
📚 Learning: 2025-11-24T18:06:03.545Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_parser/CONTRIBUTING.md:0-0
Timestamp: 2025-11-24T18:06:03.545Z
Learning: Applies to crates/biome_parser/**/src/**/*.rs : Implement error recovery in list parsing using `or_recover()` to wrap unparseable tokens in a `BOGUS_*` node and consume tokens until a recovery token is found

Applied to files:

  • crates/biome_tailwind_parser/src/lexer/mod.rs
🧬 Code graph analysis (1)
crates/biome_tailwind_parser/src/lexer/mod.rs (1)
crates/biome_tailwind_parser/src/lexer/base_name_store.rs (1)
  • is_delimiter (152-158)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
  • GitHub Check: Lint project (depot-ubuntu-24.04-arm-16)
  • GitHub Check: Test (depot-ubuntu-24.04-arm-16)
  • GitHub Check: Test (depot-windows-2022-16)
  • GitHub Check: Check Dependencies
  • GitHub Check: End-to-end tests
  • GitHub Check: Lint project (depot-windows-2022)
  • GitHub Check: Documentation
  • GitHub Check: Test Node.js API
  • GitHub Check: Bench (biome_tailwind_parser)
  • GitHub Check: autofix
🔇 Additional comments (2)
crates/biome_tailwind_parser/src/lexer/base_name_store.rs (1)

152-152: LGTM! Visibility change enables the fast-path optimisation.

Exposing is_delimiter at crate level allows the lexer to perform boundary checks during the ASCII fast-path without duplicating delimiter logic.

crates/biome_tailwind_parser/src/lexer/mod.rs (1)

4-4: LGTM!

Import addition enables the fast-path delimiter checks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

A-Parser Area: parser L-Tailwind Language: Tailwind CSS

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants