A Tree-sitter-based code parser that extracts structural information from source files across multiple programming languages.
- Java (.java)
- JavaScript (.js)
- TypeScript (.ts)
- Python (.py)
- C# (.cs)
- PHP (.php)
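The extension mapping above can be pictured as a simple lookup table. The sketch below is illustrative only: the enum constants, the class name `ExtensionLookupSketch`, and the case-insensitive matching are assumptions, not the project's actual `Language` or `LanguageDetector` code.

```java
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

public class ExtensionLookupSketch {
    // Hypothetical enum for illustration; the project's real Language enum may differ.
    enum Language { JAVA, JAVASCRIPT, TYPESCRIPT, PYTHON, CSHARP, PHP }

    // Extension-to-language table taken from the supported-languages list.
    private static final Map<String, Language> BY_EXTENSION = Map.of(
            "java", Language.JAVA,
            "js", Language.JAVASCRIPT,
            "ts", Language.TYPESCRIPT,
            "py", Language.PYTHON,
            "cs", Language.CSHARP,
            "php", Language.PHP);

    // Returns the language for a file name, or empty when the extension is unknown.
    static Optional<Language> detect(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0 || dot == fileName.length() - 1) {
            return Optional.empty(); // no extension at all
        }
        // Assumption: extensions are compared case-insensitively.
        String ext = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        return Optional.ofNullable(BY_EXTENSION.get(ext));
    }

    public static void main(String[] args) {
        System.out.println(detect("src/Example.java")); // Optional[JAVA]
        System.out.println(detect("README.md"));        // Optional.empty
    }
}
```

Files with an unrecognized extension yield an empty `Optional`, so a caller can skip them without special-casing.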
For each supported language, CodeFrame extracts:
- Type Information
  - Class/Interface declarations
  - Base classes (extends)
  - Implemented interfaces
- Method Information
  - Method/Function names
  - Parameters
  - Local variables
  - Method calls with object context
./gradlew build
CodeFrame requires two arguments: `<input-path>` and `<output-file>`.
# Gradle
./gradlew run --args="<input-path> <output-file>"
# Direct JAR
java -jar codeframe.jar <input-path> <output-file>
Examples:
# Analyze a single file, write to codeframe-out/analysis.jsonl
./gradlew run --args="src/main/java/org/example/MyClass.java codeframe-out/analysis.jsonl"
# Analyze an entire directory
./gradlew run --args="src/main/java codeframe-out/analysis.jsonl"
# Analyze the entire project
./gradlew run --args=". codeframe-out/analysis.jsonl"
# Run directly via java
java -jar codeframe.jar src codeframe-out/analysis.jsonl
Use separate folders in the container:
- `/workspace`: the CodeFrame project (bind-mounted to your repo)
- `/src`: the codebase to analyze (mounted read-only)
- Results are written under `/workspace/.out` (persisted on your host via the `/workspace` bind mount; `.out/` is gitignored)
docker build -t codeframe-dev .
- Windows (PowerShell):
docker run --rm -it `
-v "$PWD:/workspace" `
-v "C:\data\repos\my-project\src:/src:ro" `
-w /workspace `
codeframe-dev
- Linux/macOS:
docker run --rm -it \
-v "$PWD:/workspace" \
-v "/absolute/path/to/your/repo:/src:ro" \
-w /workspace \
codeframe-dev
Optional debug port:
docker run --rm -it -p 5005:5005 \
-e "JAVA_TOOL_OPTIONS=-agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=*:5005" \
-v "$PWD:/workspace" \
-v "/absolute/path/to/your/repo:/src:ro" \
-w /workspace \
codeframe-dev
./gradlew clean run --args="/src /workspace/.out/analysis.jsonl"
The analysis results are written to the path you pass as the second argument (e.g., `/workspace/.out/analysis.jsonl`) in JSONL format (JSON Lines: one JSON object per line). Parent directories for the output file are created automatically, and `.out/` is gitignored by default.
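As a rough illustration of how an output file with missing parent directories can be handled, here is a minimal sketch of a line-oriented writer. The class `JsonlWriterSketch` and its methods are hypothetical and are not taken from CodeFrame's source; it only shows one way to create the parent folders first and then write one JSON object per line.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class JsonlWriterSketch implements AutoCloseable {
    private final BufferedWriter writer;

    JsonlWriterSketch(Path outputFile) throws IOException {
        // Create missing parent directories before opening the file.
        Path parent = outputFile.toAbsolutePath().getParent();
        if (parent != null) {
            Files.createDirectories(parent); // does nothing if the directory exists
        }
        this.writer = Files.newBufferedWriter(outputFile, StandardCharsets.UTF_8);
    }

    // synchronized so that concurrent callers cannot interleave partial lines.
    synchronized void writeLine(String json) throws IOException {
        writer.write(json);
        writer.write('\n');
    }

    @Override
    public synchronized void close() throws IOException {
        writer.close();
    }

    public static void main(String[] args) throws IOException {
        // ".out" does not exist yet inside the fresh temp directory.
        Path out = Files.createTempDirectory("codeframe-sketch").resolve(".out/analysis.jsonl");
        try (JsonlWriterSketch w = new JsonlWriterSketch(out)) {
            w.writeLine("{\"kind\":\"run\",\"input_path\":\"src\"}");
            w.writeLine("{\"kind\":\"done\",\"files_analyzed\":0}");
        }
        System.out.println(Files.readAllLines(out).size()); // 2
    }
}
```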
- Location: project root `.ignore` (included in releases).
- Default contents:
  - `**node_modules**`
  - `**.git**`
  - `**.Designer.cs**`
  - `**.Designer.vb**`
- Syntax:
  - Blank lines and lines starting with `#` are ignored.
  - Globs supported: `*` (within a segment), `**` (across segments).
  - Paths are matched against normalized project paths relative to the input root.
- Examples:
  - `**node_modules**` → ignore anything under any node_modules folder.
  - `**.Designer.cs` → ignore files ending with `.Designer.cs` anywhere.
  - `src/generated/**` → ignore everything under `src/generated/`.
How it works:
- CodeFrame loads `.ignore` at startup using `dx-ignore` and filters files before analysis.
- If `.ignore` is missing, no files are excluded by ignore rules.
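To make the glob semantics concrete, the following sketch translates each rule into a regular expression, with `*` staying inside one path segment and `**` crossing segments. It is not the `dx-ignore` implementation, whose exact rules (anchoring, negation, directory handling) may differ; `IgnoreGlobSketch`, `toRegex`, and `isIgnored` are hypothetical names.

```java
import java.util.List;
import java.util.regex.Pattern;

public class IgnoreGlobSketch {
    // Translate a glob into a regex: "**" may cross "/" boundaries, "*" may not.
    static Pattern toRegex(String glob) {
        StringBuilder regex = new StringBuilder();
        for (int i = 0; i < glob.length(); i++) {
            char c = glob.charAt(i);
            if (c == '*') {
                boolean doubleStar = i + 1 < glob.length() && glob.charAt(i + 1) == '*';
                regex.append(doubleStar ? ".*" : "[^/]*");
                if (doubleStar) {
                    i++; // consume the second '*'
                }
            } else {
                regex.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return Pattern.compile(regex.toString());
    }

    // True when any non-comment, non-blank rule matches the whole relative path.
    static boolean isIgnored(String relativePath, List<String> rules) {
        String normalized = relativePath.replace('\\', '/'); // normalize Windows separators
        for (String rule : rules) {
            String trimmed = rule.trim();
            if (trimmed.isEmpty() || trimmed.startsWith("#")) {
                continue;
            }
            if (toRegex(trimmed).matcher(normalized).matches()) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> rules = List.of("# generated code", "", "**node_modules**",
                "**.Designer.cs", "src/generated/**");
        System.out.println(isIgnored("web/node_modules/lib/index.js", rules)); // true
        System.out.println(isIgnored("src\\Form1.Designer.cs", rules));        // true
        System.out.println(isIgnored("src/generated/a/B.java", rules));        // true
        System.out.println(isIgnored("src/main/App.java", rules));             // false
    }
}
```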
- Memory efficient: Constant memory usage regardless of codebase size
- Streamable: Process results line-by-line without loading entire file
- Resumable: Can stop/restart analysis without losing progress
- Parallel-friendly: Multiple threads can write safely
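The streaming property can be illustrated with a small consumer that reads the output one line at a time and tallies record kinds. This is a sketch under assumptions: it relies on `kind` being the first key of run, error, and completion records (as in the sample records in this document), treats every other line as a file analysis, and uses a regular expression only to stay dependency-free. A real consumer would more likely use a JSON parser. `JsonlKindCounterSketch` is a hypothetical name.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class JsonlKindCounterSketch {
    // Assumption: metadata records start with {"kind":"..."; nested "kind" keys
    // inside a file analysis (e.g. types[].kind) are deliberately not matched.
    private static final Pattern LEADING_KIND =
            Pattern.compile("^\\{\\s*\"kind\"\\s*:\\s*\"([^\"]+)\"");

    // Reads line by line, so memory use does not grow with the file size.
    static Map<String, Integer> countKinds(BufferedReader reader) throws IOException {
        Map<String, Integer> counts = new TreeMap<>();
        String line;
        while ((line = reader.readLine()) != null) {
            if (line.isBlank()) {
                continue;
            }
            Matcher m = LEADING_KIND.matcher(line);
            String kind = m.find() ? m.group(1) : "file";
            counts.merge(kind, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        String sample = String.join("\n",
                "{\"kind\":\"run\",\"input_path\":\"src\",\"total_files\":2}",
                "{\"filePath\":\"src/Example.java\",\"language\":\"java\",\"types\":[{\"kind\":\"class\"}]}",
                "{\"kind\":\"error\",\"file\":\"src/Bad.java\",\"error\":\"Parse error\"}",
                "{\"kind\":\"done\",\"files_analyzed\":1,\"files_with_errors\":1}");
        try (BufferedReader reader = new BufferedReader(new StringReader(sample))) {
            System.out.println(countKinds(reader)); // {done=1, error=1, file=1, run=1}
        }
    }
}
```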
Each line is a separate JSON object with a kind field:
Line 1 - Run metadata:
{"kind":"run","started_at":"2025-09-30T11:00:00Z","input_path":"src","total_files":1000}
Lines 2-N - File analyses:
{"filePath":"src/Example.java","language":"java","packageName":"com.example","types":[{"kind":"class","name":"Example","visibility":"public","modifiers":["public"],"annotations":["@Component"],"extendsType":"BaseClass","implementsInterfaces":["Interface1"]}],"fields":[{"name":"service","type":"MyService","visibility":"private","modifiers":["private","final"],"annotations":["@Autowired"]}],"methods":[{"name":"processData","returnType":"Result","visibility":"public","modifiers":["public"],"annotations":["@Override"],"parameters":[{"name":"input","type":"String"}],"localVariables":["result"],"methodCalls":[{"methodName":"validate","objectType":"String","objectName":"input","callCount":1}]}],"imports":["import com.example.MyService;"]}
Error records (if any):
{"kind":"error","file":"src/Bad.java","language":"java","error":"Parse error"}
Last line - Completion metadata:
{"kind":"done","ended_at":"2025-09-30T11:00:05Z","files_analyzed":998,"files_with_errors":2,"duration_seconds":5}
- `Language` - Enum defining supported languages
- `LanguageDetector` - Detects language from file extension
- `LanguageAnalyzer` - Interface for language-specific analyzers
- `FileAnalysis` - Model containing analysis results
Each language has a dedicated analyzer:
- `JavaAnalyzer` - Parses Java classes, interfaces, methods
- `TypeScriptAnalyzer` - Parses TypeScript classes, interfaces, functions
- `JavaScriptAnalyzer` - Parses JavaScript classes and functions
- `PythonAnalyzer` - Parses Python classes and functions
- `CSharpAnalyzer` - Parses C# classes, interfaces, methods
- `PHPAnalyzer` - Parses PHP classes, interfaces, functions
The project uses Tree-sitter grammar libraries:
- `tree-sitter-java`
- `tree-sitter-javascript`
- `tree-sitter-typescript`
- `tree-sitter-python`
- `tree-sitter-c-sharp`
- `tree-sitter-php`
- Incremental, robust parsing: Tree-sitter provides concrete syntax trees with stable node types across languages, suitable for structural extraction (`types`, `methods`, `fields`, `calls`).
- Multi-language, consistent API: A single parsing approach across Java, JS/TS, Python, C#, PHP simplifies analyzer design and maintenance.
- Performance and memory: Fast parsing with small memory footprint; aligns with our streaming JSONL output to keep RAM low for large repos.
- Runtime constraints: In constrained runners/containers, we need deterministic, offline-friendly tooling. Tree-sitter grammars are shipped as Maven artifacts, avoiding runtime downloads or external CLIs.
- Bundled native libraries: The `io.github.bonede:tree-sitter` artifacts include native binaries for Windows/Linux/macOS. This removes the need for a local C toolchain or building native libs during CI/runtime.
- Cross-OS compatibility: Works the same on developer machines, Docker (Linux), and Windows hosts, which is critical for heterogeneous environments.
- Runtime constraints: In sandboxed environments we cannot install system packages or compile natives. Bonede’s prebuilt natives make the analyzer portable and ready-to-run without extra steps.
- Add the Tree-sitter grammar dependency to `build.gradle`
- Add the language to the `Language` enum
- Update `LanguageDetector` with file extension mapping
- Create a new analyzer implementing `LanguageAnalyzer`
- Register the language and analyzer in `App.java`
// 1. Add to Language enum
GO("go")

// 2. Update LanguageDetector
if (fileName.endsWith(".go")) {
    return Optional.of(Language.GO);
}

// 3. Create GoAnalyzer.java
public class GoAnalyzer implements LanguageAnalyzer {
    @Override
    public FileAnalysis analyze(String filePath, String sourceCode, TSNode rootNode) {
        // Implementation: walk rootNode and populate a FileAnalysis
        throw new UnsupportedOperationException("Not implemented yet");
    }
}

// 4. Register in App.java
TREE_SITTER_LANGUAGES.put(Language.GO, new TreeSitterGo());
ANALYZERS.put(Language.GO, new GoAnalyzer());
- Java 11+
- Gradle 8.x
- No native toolchain required (Tree-sitter natives are bundled via Maven artifacts)
This project uses Tree-sitter and its language grammars, which are licensed under MIT.
- Top-level fields/constants (for languages that support them, e.g., JavaScript, TypeScript, Python, PHP) are not emitted as entries in the analysis output. The analyzer focuses on types (classes/interfaces/enums/records where applicable) and functions/methods.
- Destructured parameter extraction is leaf-only. For a signature like `fn({ data: { user, settings }, meta: { timestamp } })`, the parameters emitted are `user`, `settings`, `timestamp` (not `data`, `meta`).
- Generator functions are marked using syntax-like modifiers:
  - Top-level functions: `"function*"` (e.g., `export function* name()`)
  - Class methods: `"*"` (e.g., `*methodName()`)
- Dynamic import expressions `import("path")` are not modeled as method calls and are currently ignored in `methodCalls`.
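The leaf-only rule for destructured parameters can be sketched as a recursive walk that keeps only names with no nested pattern beneath them. The model here is deliberately simplified and hypothetical: a destructuring pattern is represented as a nested map (a `null` value marks a plain binding), not as a Tree-sitter node, and `LeafParametersSketch` is not part of CodeFrame.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class LeafParametersSketch {
    // Each key maps to null (plain binding) or to a nested pattern (another map).
    static List<String> leafNames(Map<String, Object> pattern) {
        List<String> names = new ArrayList<>();
        collect(pattern, names);
        return names;
    }

    @SuppressWarnings("unchecked")
    private static void collect(Map<String, Object> pattern, List<String> out) {
        for (Map.Entry<String, Object> entry : pattern.entrySet()) {
            if (entry.getValue() instanceof Map) {
                // Intermediate name (e.g. data, meta): descend, do not emit it.
                collect((Map<String, Object>) entry.getValue(), out);
            } else {
                out.add(entry.getKey()); // leaf binding
            }
        }
    }

    public static void main(String[] args) {
        // Models fn({ data: { user, settings }, meta: { timestamp } })
        Map<String, Object> data = new LinkedHashMap<>();
        data.put("user", null);
        data.put("settings", null);
        Map<String, Object> meta = new LinkedHashMap<>();
        meta.put("timestamp", null);
        Map<String, Object> root = new LinkedHashMap<>();
        root.put("data", data);
        root.put("meta", meta);

        System.out.println(leafNames(root)); // [user, settings, timestamp]
    }
}
```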
- Called constructors and fields are not captured
  - Current call extraction focuses on method invocations and property accessors. Constructor calls (e.g., `new Type(...)` and `base(...)`/`this(...)`) and direct field reads/writes are not emitted in `methodCalls`.
- Loop local variables are not captured
  - Variables declared in loop headers (e.g., `for (var i = 0; ...)`, `foreach (var x in ...)`) are not added to `localVariables`.
  - See `src/test/resources/samples/csharp/LoopLocalsSample.cs` for examples.
- Events are not handled
  - Event declarations/subscriptions/raises are not modeled.
  - See `src/test/resources/samples/csharp/DelegatesEventsLambdasSample.cs`
- Constructor calls are not captured
  - Constructor invocations (e.g., `new ClassName(...)`) are not emitted in `methodCalls`.
  - See `src/test/resources/samples/java/MultipleClasses.java` for an example (`new ExtraClass()`).
- Loop header locals are not captured
  - Variables declared in loop headers (e.g., `for (int i = 0; ...)`) are not added to `localVariables`.
  - See `src/test/resources/samples/java/MultipleClasses.java` for an example (`for (int i = 0; i < times; i++)`).
- Local and anonymous classes are not extracted as separate types
  - Bodies are analyzed within the enclosing method or type, and their method calls are recorded.
  - The classes themselves do not appear as distinct `types` entries.
  - See `src/test/resources/samples/java/AnonymousInnerClassesSample.java`.
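For orientation, here is a small, hypothetical Java input that exercises all three limitations at once. It is not one of the project's sample files; the comments describe what the notes above lead one to expect from the analyzer, not verified output.

```java
import java.util.ArrayList;
import java.util.List;

public class JavaLimitationsSketch {
    static List<String> labels(int times) {
        // Constructor call: per the notes above, `new ArrayList<>()` is not
        // expected to appear in methodCalls.
        List<String> result = new ArrayList<>();

        // Loop header local: `i` is not expected to appear in localVariables.
        for (int i = 0; i < times; i++) {
            final int index = i;
            // Anonymous class: no separate `types` entry is expected, while the
            // String.valueOf call in its body should be recorded under the
            // enclosing method or type.
            Object label = new Object() {
                @Override
                public String toString() {
                    return "item-" + String.valueOf(index);
                }
            };
            result.add(label.toString());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(labels(2)); // [item-0, item-1]
    }
}
```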
- ApprovalTests-based strategy
  - We use ApprovalTests-Java to snapshot analysis results. Each test verifies the pretty-printed JSON using an approved artifact.
  - When output changes, a `.received.txt` is generated next to the test class; review and promote it to `.approved.txt` if correct.
- Running tests
  - All tests: `./gradlew test`
  - Single test method, e.g. Java generics: `./gradlew test --tests "*JavaAnalyzeApprovalTest.analyze_Java_GenericsSample"`
- Workflow
  - Make a change → run tests → inspect `.received.txt` → approve if expected → commit both code and updated `.approved.txt`.
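The received/approved cycle can be sketched with the standard library alone. This is not how ApprovalTests-Java is implemented or invoked; `ApprovalWorkflowSketch`, `verify`, and `approve` are hypothetical names that only mirror the workflow described above: compare against the approved file, write a received file on mismatch, and promote it once reviewed.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class ApprovalWorkflowSketch {
    // Returns true when `actual` equals <name>.approved.txt; otherwise writes
    // <name>.received.txt so the difference can be reviewed.
    static boolean verify(Path dir, String name, String actual) throws IOException {
        Path approved = dir.resolve(name + ".approved.txt");
        Path received = dir.resolve(name + ".received.txt");
        String expected = Files.exists(approved) ? Files.readString(approved) : null;
        if (actual.equals(expected)) {
            Files.deleteIfExists(received); // clean up a stale received file
            return true;
        }
        Files.writeString(received, actual);
        return false;
    }

    // Promotes the reviewed received file to be the new approved snapshot.
    static void approve(Path dir, String name) throws IOException {
        Files.move(dir.resolve(name + ".received.txt"),
                dir.resolve(name + ".approved.txt"),
                StandardCopyOption.REPLACE_EXISTING);
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("approval-sketch");
        String output = "{\n  \"language\": \"java\"\n}";

        System.out.println(verify(dir, "GenericsSample", output)); // false: nothing approved yet
        approve(dir, "GenericsSample");                            // reviewer accepts the output
        System.out.println(verify(dir, "GenericsSample", output)); // true
    }
}
```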