-
-
Notifications
You must be signed in to change notification settings - Fork 791
perf(parse/tw): avoid going into basename store trie when not needed #8528
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
This stack of pull requests is managed by Graphite. Learn more about stacking. |
77eeab3 to
b8d3bcb
Compare
8d85823 to
4ea9996
Compare
4ea9996 to
a020914
Compare
CodSpeed Performance ReportMerging #8528 will not alter performanceComparing Summary
Footnotes
|
WalkthroughThis PR optimises the Tailwind basename tokenisation by introducing an ASCII fast-path that scans for dashes and delimiters before consulting the trie-based BASENAME_STORE. The Possibly related PRs
Suggested reviewers
Pre-merge checks and finishing touches✅ Passed checks (2 passed)
✨ Finishing touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
crates/biome_tailwind_parser/src/lexer/mod.rs (1)
125-161: Add explicit test case for thedatakeyword.The fast-path logic is exercised through existing tests (e.g., "block", "hover:") and the property-based losslessness test. However, there's no explicit test for the "data" keyword, which receives special handling in
consume_base. Add a test case for "data" as a standalone token to document this behaviour.
🧹 Nitpick comments (2)
crates/biome_tailwind_parser/src/lexer/mod.rs (2)
131-150: Nice optimisation! Fast-path logic looks sound.The ASCII scan correctly handles undashed basenames before consulting the trie, with proper fallback when hitting a dash. The early exits for "data" and simple basenames are well-positioned.
One clarification: the boundary check on line 147 intentionally excludes dash (unlike the trie's
is_boundary_byte), ensuring dashed basenames like "border-t" fall through to the trie. Consider adding an inline comment noting this distinction for future maintainers.💡 Optional: Add clarifying comment
+ // Note: We only accept delimiter or end-of-input here (not dash). + // If we hit a dash, we defer to the trie to match dashed basenames like "border-t". if end > 0 && (end == slice.len() || is_delimiter(slice[end])) { self.advance(end); return TW_BASE; }
142-145: Duplicate "data" check—consider clarifying intent.The "data" keyword is checked at lines 142–145 (fast-path) and again at lines 156–157 (trie fallback). The second check appears redundant in typical cases since the fast-path catches "data" before a dash or delimiter. If it's defensive for edge cases, a brief comment would help future readers.
💡 Optional: Add comment explaining redundancy
// Fallback to dashed-basename trie matching for cases with '-' inside the basename let dashed_end = BASENAME_STORE.matcher(slice).base_end(); self.advance(dashed_end); + // Redundant safety check: the fast-path above typically handles "data", + // but we recheck here in case of unusual trie-fallback edge cases. if dashed_end == 4 && &slice[..dashed_end] == b"data" { DATA_KW } else { TW_BASE }Also applies to: 156-157
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
crates/biome_tailwind_parser/src/lexer/base_name_store.rs(1 hunks)crates/biome_tailwind_parser/src/lexer/mod.rs(2 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
crates/**/*.rs
📄 CodeRabbit inference engine (CONTRIBUTING.md)
Update inline rustdoc documentation for rules, assists, and their options when adding new features or changing existing features in Rust crates
Files:
crates/biome_tailwind_parser/src/lexer/base_name_store.rscrates/biome_tailwind_parser/src/lexer/mod.rs
🧠 Learnings (13)
📓 Common learnings
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_analyze/CONTRIBUTING.md:0-0
Timestamp: 2025-12-19T12:53:30.399Z
Learning: Applies to crates/biome_analyze/**/*analyze/src/**/*.rs : Avoid string allocations by comparing against `&str` or using `TokenText`
📚 Learning: 2025-11-24T18:06:03.545Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_parser/CONTRIBUTING.md:0-0
Timestamp: 2025-11-24T18:06:03.545Z
Learning: Applies to crates/biome_parser/**/src/**/*.rs : Parse rule functions must be prefixed with `parse_` and use the name defined in the grammar file, e.g., `parse_for_statement` or `parse_expression`
Applied to files:
crates/biome_tailwind_parser/src/lexer/base_name_store.rs
📚 Learning: 2025-11-24T18:06:03.545Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_parser/CONTRIBUTING.md:0-0
Timestamp: 2025-11-24T18:06:03.545Z
Learning: Applies to crates/biome_parser/**/language_kind.rs : Add a new language prefix to the `LANGUAGE_PREFIXES` constant in `language_kind.rs` file
Applied to files:
crates/biome_tailwind_parser/src/lexer/base_name_store.rscrates/biome_tailwind_parser/src/lexer/mod.rs
📚 Learning: 2025-12-19T12:53:30.399Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_analyze/CONTRIBUTING.md:0-0
Timestamp: 2025-12-19T12:53:30.399Z
Learning: Applies to crates/biome_analyze/**/*analyze/src/**/*.rs : Avoid string allocations by comparing against `&str` or using `TokenText`
Applied to files:
crates/biome_tailwind_parser/src/lexer/mod.rs
📚 Learning: 2025-11-24T18:06:03.545Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_parser/CONTRIBUTING.md:0-0
Timestamp: 2025-11-24T18:06:03.545Z
Learning: Applies to crates/biome_parser/**/lexer/mod.rs : Implement a `Lexer` trait from `biome_parser` crate for the lexer struct that consumes characters from source code and emits tokens
Applied to files:
crates/biome_tailwind_parser/src/lexer/mod.rs
📚 Learning: 2025-11-24T18:06:03.545Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_parser/CONTRIBUTING.md:0-0
Timestamp: 2025-11-24T18:06:03.545Z
Learning: Applies to crates/biome_parser/**/src/**/*.rs : Implement a token source struct that wraps the lexer and implements `TokenSourceWithBufferedLexer` and `LexerWithCheckpoint` for lookahead and re-lexing capabilities
Applied to files:
crates/biome_tailwind_parser/src/lexer/mod.rs
📚 Learning: 2025-12-19T12:53:30.399Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_analyze/CONTRIBUTING.md:0-0
Timestamp: 2025-12-19T12:53:30.399Z
Learning: Applies to crates/biome_analyze/**/*analyze/src/**/*.rs : Prefix line with `#` in documentation code examples sparingly; prefer concise complete snippets
Applied to files:
crates/biome_tailwind_parser/src/lexer/mod.rs
📚 Learning: 2025-12-19T12:53:30.399Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_analyze/CONTRIBUTING.md:0-0
Timestamp: 2025-12-19T12:53:30.399Z
Learning: Applies to crates/biome_analyze/**/biome_rule_options/lib/**/*.rs : Use `rename_all = "camelCase"` in serde derive macro for rule options
Applied to files:
crates/biome_tailwind_parser/src/lexer/mod.rs
📚 Learning: 2025-11-24T18:06:03.545Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_parser/CONTRIBUTING.md:0-0
Timestamp: 2025-11-24T18:06:03.545Z
Learning: Applies to crates/biome_parser/**/src/**/*.rs : Use `ParseSeparatedList` and `ParseNodeList` for parsing lists with error recovery to avoid infinite loops
Applied to files:
crates/biome_tailwind_parser/src/lexer/mod.rs
📚 Learning: 2025-11-24T18:06:03.545Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_parser/CONTRIBUTING.md:0-0
Timestamp: 2025-11-24T18:06:03.545Z
Learning: Applies to crates/biome_parser/**/src/**/*.rs : Use `p.eat(token)` for optional tokens, `p.expect(token)` for required tokens, `parse_rule(p).ok(p)` for optional nodes, and `parse_rule(p).or_add_diagnostic(p, error)` for required nodes
Applied to files:
crates/biome_tailwind_parser/src/lexer/mod.rs
📚 Learning: 2025-11-24T18:06:12.048Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_service/CONTRIBUTING.md:0-0
Timestamp: 2025-11-24T18:06:12.048Z
Learning: Applies to crates/biome_service/src/workspace/watcher.tests.rs : Implement watcher tests for workspace methods in watcher.tests.rs and end-to-end tests in LSP tests
Applied to files:
crates/biome_tailwind_parser/src/lexer/mod.rs
📚 Learning: 2025-11-24T18:06:03.545Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_parser/CONTRIBUTING.md:0-0
Timestamp: 2025-11-24T18:06:03.545Z
Learning: Applies to crates/biome_parser/**/src/**/*.rs : A parser struct must implement the `Parser` trait and save the token source, parser context, and optional parser options
Applied to files:
crates/biome_tailwind_parser/src/lexer/mod.rs
📚 Learning: 2025-11-24T18:06:03.545Z
Learnt from: CR
Repo: biomejs/biome PR: 0
File: crates/biome_parser/CONTRIBUTING.md:0-0
Timestamp: 2025-11-24T18:06:03.545Z
Learning: Applies to crates/biome_parser/**/src/**/*.rs : Implement error recovery in list parsing using `or_recover()` to wrap unparseable tokens in a `BOGUS_*` node and consume tokens until a recovery token is found
Applied to files:
crates/biome_tailwind_parser/src/lexer/mod.rs
🧬 Code graph analysis (1)
crates/biome_tailwind_parser/src/lexer/mod.rs (1)
crates/biome_tailwind_parser/src/lexer/base_name_store.rs (1)
is_delimiter(152-158)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
- GitHub Check: Lint project (depot-ubuntu-24.04-arm-16)
- GitHub Check: Test (depot-ubuntu-24.04-arm-16)
- GitHub Check: Test (depot-windows-2022-16)
- GitHub Check: Check Dependencies
- GitHub Check: End-to-end tests
- GitHub Check: Lint project (depot-windows-2022)
- GitHub Check: Documentation
- GitHub Check: Test Node.js API
- GitHub Check: Bench (biome_tailwind_parser)
- GitHub Check: autofix
🔇 Additional comments (2)
crates/biome_tailwind_parser/src/lexer/base_name_store.rs (1)
152-152: LGTM! Visibility change enables the fast-path optimisation.Exposing
is_delimiterat crate level allows the lexer to perform boundary checks during the ASCII fast-path without duplicating delimiter logic.crates/biome_tailwind_parser/src/lexer/mod.rs (1)
4-4: LGTM!Import addition enables the fast-path delimiter checks.

Summary
This PR makes it so we avoid touching the basename store if we can trivially check to see that a delimiter comes immediately after. This only works for ascii, but since tailwind is majority ascii characters, this works great.
On my machine, this results in anywhere from +2-18% throughput for uncached benches, and +2-36% for cached benches. No idea why codspeed isn't showing that.
Test Plan
Docs