feat: improve token calc #4

gulivan · 2025-09-26T11:35:46Z

Summary by CodeRabbit

Bug Fixes
- Improved token counting by building complete, service-aware tool representations (including parameters and sensible schema defaults) for more accurate estimates.
- Tool names normalized per service context to align with external tokenization expectations.
- Estimation errors now log a warning and gracefully fallback to 0 to avoid verification failures.
Chores
- Applied a temporary adjustment factor to token estimates pending further tuning.

coderabbitai · 2025-09-26T11:35:53Z

Walkthrough

Change updates internal token counting in src/core/mcpClient.ts: the private method now accepts serverName, builds a Claude-facing tool representation (adds $schema, moves inputSchema under parameters, applies object-schema defaults, prefixes tool name with mcp__{serverName}__{tool.name}), serializes and counts tokens (×3), and falls back to 0 with a warning on errors.

Changes

Cohort / File(s)	Summary
MCP tool token counting `src/core/mcpClient.ts`	Signature changed to `calculateToolTokens(tools, serverName)`. For each tool build a Claude-style wrapper (`claudeToolRepresentation`) with `$schema`, optional `description`, `name` prefixed `mcp__{serverName}__{tool.name}`, and move `inputSchema` into `parameters` applying object-schema defaults (`type`, `properties`, `required`, `additionalProperties`). Wrap representation in `<function>…</function>` for counting, multiply token result by 3 (temporary), and on errors emit a warning and use 0 tokens. Public call updated to pass `server.name`.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant Caller
  participant MCPClient
  participant Builder as Tool Representation Builder
  participant Tokenizer

  Caller->>MCPClient: request token counts for tools (passes server.name)
  MCPClient->>Builder: for each tool -> build claudeToolRepresentation
  note right of Builder #DDF2E9: add `$schema`,\nmove inputSchema -> parameters,\napply object defaults,\nprefix name with mcp__{serverName}__
  Builder-->>MCPClient: complete tool representation
  MCPClient->>Tokenizer: wrap in <function>…</function>, serialize and count tokens (×3 adjustment)
  alt success
    Tokenizer-->>MCPClient: token count
  else error
    Tokenizer-->>MCPClient: error (MCPClient warns, uses 0)
  end
  MCPClient-->>Caller: return per-tool counts and total

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

I twitch my whiskers, schemas bright,
$schema tucked beneath moonlight.
Parameters march in tidy rows,
Names prefixed where the river flows.
I count the tokens—hop!—and write. 🐇

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title Check	✅ Passed	The title “feat: improve token calc” follows a conventional commit style and concisely indicates that the pull request enhances the token calculation logic, which is the main change introduced in the PR. It directly relates to the modifications in calculateToolTokens and the integration of serverName‐aware token counts. Although somewhat generic, it still conveys the primary intent of improving token calculation.
Docstring Coverage	✅ Passed	No functions found in the changes. Docstring coverage check skipped.

✨ Finishing touches

📝 Generate Docstrings

🧪 Generate unit tests

Create PR with unit tests
Post copyable unit tests in a comment
Commit unit tests in branch feat/improve_token_calc

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9b39277 and 107a8d7.

📒 Files selected for processing (1)

src/core/mcpClient.ts (1 hunks)

🧰 Additional context used

📓 Path-based instructions (1)

**/*.{ts,tsx}

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.{ts,tsx}: Write clean, maintainable code following TypeScript best practices
Use TypeScript for type safety when applicable
Follow consistent naming conventions
Add comments for complex logic
Maintain consistency with existing code style
Add appropriate error handling
Never expose sensitive information in code or logs
Follow security best practices for TypeScript
Validate all inputs
Use environment variables for configuration
Document all public APIs, functions, and classes with clear descriptions

Files:

src/core/mcpClient.ts

src/core/mcpClient.ts

- Only apply object-specific defaults (properties, required, additionalProperties) when schema is actually an object type - Correctly handle primitive types (string, number, array) without forcing object structure - Support union types that include object - Conditionally include description field only when it exists - Prevents token count inflation for non-object schemas 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]>

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (1)

src/core/mcpClient.ts (1)
83-87: Avoid pretty‑printed JSON to prevent token inflation

Pretty printing adds whitespace that inflates token counts. Minify the JSON for a closer estimate to what models actually tokenize, unless you know Claude receives pretty JSON.

Apply this diff:
-        const toolText = JSON.stringify(toolForCounting, null, 2);
+        const toolText = JSON.stringify(toolForCounting);
Please confirm whether Claude’s tool context is pretty‑printed; if so, keep the current formatting.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 107a8d7 and 6f85f8b.

📒 Files selected for processing (1)

src/core/mcpClient.ts (1 hunks)

🧰 Additional context used

📓 Path-based instructions (1)

**/*.{ts,tsx}

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.{ts,tsx}: Write clean, maintainable code following TypeScript best practices
Use TypeScript for type safety when applicable
Follow consistent naming conventions
Add comments for complex logic
Maintain consistency with existing code style
Add appropriate error handling
Never expose sensitive information in code or logs
Follow security best practices for TypeScript
Validate all inputs
Use environment variables for configuration
Document all public APIs, functions, and classes with clear descriptions

Files:

src/core/mcpClient.ts

🧬 Code graph analysis (1)

src/core/mcpClient.ts (2)

src/types/index.ts (1)

MCPTool (189-193)

tests/__mocks__/contextcalc.js (2)

countTokens (4-13)

countTokens (4-13)

🔇 Additional comments (3)

src/core/mcpClient.ts (3)

43-44: Docstring clarity LGTM

Clearer intent for token calculation in Claude context.

79-81: Selective description inclusion LGTM

Including description only when defined avoids token bloat.

53-76: Non‑object schemas still receive object defaults when schema is present; refine guard

If a tool provides a non‑object schema (e.g., boolean, string), rawSchema becomes undefined, schemaType stays undefined, and the code applies object defaults. This regresses token accuracy for legitimate non‑object schemas and repeats the earlier concern. Apply defaults only when schema is missing OR the provided schema is actually an object (including unions that contain "object").

Apply this diff:

-        const rawSchema =
-          tool.inputSchema && typeof tool.inputSchema === "object"
-            ? tool.inputSchema
-            : undefined;
-        const schemaType = rawSchema?.type;
-        const parameters = {
-          $schema: "http://json-schema.org/draft-07/schema#",
-          ...(rawSchema ?? {}),
-        };
-
-        const isObjectSchema =
-          schemaType === "object" ||
-          (Array.isArray(schemaType) && schemaType.includes("object")) ||
-          schemaType === undefined;
-
-        // Only apply object-specific defaults when the schema is actually an object type
-        if (isObjectSchema) {
+        const hasSchema = tool.inputSchema !== undefined;
+        const rawSchema =
+          hasSchema &&
+          typeof tool.inputSchema === "object" &&
+          tool.inputSchema !== null &&
+          !Array.isArray(tool.inputSchema)
+            ? tool.inputSchema
+            : undefined;
+        const schemaType = rawSchema?.type;
+        const parameters: Record<string, any> = {
+          $schema: "http://json-schema.org/draft-07/schema#",
+          ...(rawSchema ?? {}),
+        };
+
+        const isObjectType =
+          schemaType === "object" ||
+          (Array.isArray(schemaType) && schemaType.includes("object")) ||
+          (schemaType === undefined && rawSchema !== undefined);
+        const applyObjectDefaults = !hasSchema || isObjectType;
+
+        // Only apply defaults when schema is missing OR actually an object
+        if (applyObjectDefaults) {
           if (!("type" in parameters)) parameters.type = "object";
           if (!("properties" in parameters)) parameters.properties = {};
           if (!("required" in parameters)) parameters.required = [];
           if (!("additionalProperties" in parameters))
             parameters.additionalProperties = false;
         }

As per coding guidelines: prefer explicit typing for complex objects.

Remove pretty-printing from token calculation to match how Claude actually receives the JSON schema. Pretty-printed JSON with whitespace inflates token counts and doesn't reflect the real context usage. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]>

Update MCP tool token calculation to better match Claude's actual context representation: - Add server prefixes to tool names (mcp__server__toolname format) - Include function wrapper tags (<function>...</function>) in calculation - Use Claude's actual JSON schema format for function definitions - Escape descriptions properly in the JSON representation Results in ~160 tokens per tool average (up from ~60), much closer to Claude's reported 396-468 range. While still not exact, this provides users with much more realistic estimates of MCP tool context usage. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]>

coderabbitai

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6f85f8b and c7c5137.

📒 Files selected for processing (1)

src/core/mcpClient.ts (2 hunks)

🧰 Additional context used

📓 Path-based instructions (1)

**/*.{ts,tsx}

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.{ts,tsx}: Write clean, maintainable code following TypeScript best practices
Use TypeScript for type safety when applicable
Follow consistent naming conventions
Add comments for complex logic
Maintain consistency with existing code style
Add appropriate error handling
Never expose sensitive information in code or logs
Follow security best practices for TypeScript
Validate all inputs
Use environment variables for configuration
Document all public APIs, functions, and classes with clear descriptions

Files:

src/core/mcpClient.ts

🧬 Code graph analysis (1)

src/core/mcpClient.ts (2)

src/types/index.ts (1)

MCPTool (189-193)

tests/__mocks__/contextcalc.js (2)

countTokens (4-13)

countTokens (4-13)

src/core/mcpClient.ts

coderabbitai

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c7c5137 and e7d3ef9.

📒 Files selected for processing (1)

src/core/mcpClient.ts (2 hunks)

🧰 Additional context used

📓 Path-based instructions (1)

**/*.{ts,tsx}

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.{ts,tsx}: Write clean, maintainable code following TypeScript best practices
Use TypeScript for type safety when applicable
Follow consistent naming conventions
Add comments for complex logic
Maintain consistency with existing code style
Add appropriate error handling
Never expose sensitive information in code or logs
Follow security best practices for TypeScript
Validate all inputs
Use environment variables for configuration
Document all public APIs, functions, and classes with clear descriptions

Files:

src/core/mcpClient.ts

🧬 Code graph analysis (1)

src/core/mcpClient.ts (2)

src/types/index.ts (1)

MCPTool (189-193)

tests/__mocks__/contextcalc.js (2)

countTokens (4-13)

countTokens (4-13)

src/core/mcpClient.ts

coderabbitai

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e7d3ef9 and b4a63e7.

📒 Files selected for processing (1)

src/core/mcpClient.ts (2 hunks)

🧰 Additional context used

📓 Path-based instructions (1)

**/*.{ts,tsx}

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.{ts,tsx}: Write clean, maintainable code following TypeScript best practices
Use TypeScript for type safety when applicable
Follow consistent naming conventions
Add comments for complex logic
Maintain consistency with existing code style
Add appropriate error handling
Never expose sensitive information in code or logs
Follow security best practices for TypeScript
Validate all inputs
Use environment variables for configuration
Document all public APIs, functions, and classes with clear descriptions

Files:

src/core/mcpClient.ts

🧬 Code graph analysis (1)

src/core/mcpClient.ts (2)

src/types/index.ts (1)

MCPTool (189-193)

tests/__mocks__/contextcalc.js (2)

countTokens (4-13)

countTokens (4-13)

coderabbitai · 2025-09-26T17:18:36Z

src/core/mcpClient.ts

+        const rawSchema =
+          tool.inputSchema && typeof tool.inputSchema === "object"
+            ? tool.inputSchema
+            : undefined;
+        const schemaType = rawSchema?.type;
+        const parameters = {
+          $schema: "http://json-schema.org/draft-07/schema#",
+          ...(rawSchema ?? {}),
+        };
+
+        const hasObjectHints =
+          rawSchema !== undefined &&
+          ["properties", "required", "additionalProperties", "patternProperties"].some(
+            (key) => key in rawSchema
+          );
+
+        const isObjectSchema =
+          rawSchema === undefined ||
+          schemaType === "object" ||
+          (Array.isArray(schemaType) && schemaType.includes("object")) ||
+          hasObjectHints;
+
+        // Only apply object-specific defaults when the schema is actually an object type
+        if (isObjectSchema) {
+          if (!("type" in parameters)) parameters.type = "object";
+          if (!("properties" in parameters)) parameters.properties = {};
+          if (!("required" in parameters)) parameters.required = [];
+          if (!("additionalProperties" in parameters))
+            parameters.additionalProperties = false;
+        }
+
+        // Create the prefixed tool name as it appears in Claude's context
+        const prefixedToolName = `mcp__${serverName}__${tool.name}`;
+
        const toolForCounting = {
-          name: tool.name,
-          description: tool.description || '',
-          inputSchema: tool.inputSchema || {}
+          name: prefixedToolName,
+          ...(tool.description !== undefined ? { description: tool.description } : {}),
+          parameters,
        };


⚠️ Potential issue | 🟠 Major

Handle boolean schemas without forcing object defaults

If a tool advertises a boolean JSON Schema (true/false), we currently drop into the rawSchema === undefined branch and apply the object defaults (type: "object", properties: {}, etc.). That materially changes the schema Claude would see, bloats the serialized payload, and throws off the very token-count accuracy this PR is trying to improve.

Please preserve boolean schemas verbatim and only layer defaults when we truly have an object (or no schema at all). One way to do that:

- const rawSchema = - tool.inputSchema && typeof tool.inputSchema === "object" - ? tool.inputSchema - : undefined; - const schemaType = rawSchema?.type; - const parameters = { - $schema: "http://json-schema.org/draft-07/schema#", - ...(rawSchema ?? {}), - }; - - const hasObjectHints = - rawSchema !== undefined && - ["properties", "required", "additionalProperties", "patternProperties"].some( - (key) => key in rawSchema - ); - - const isObjectSchema = - rawSchema === undefined || - schemaType === "object" || - (Array.isArray(schemaType) && schemaType.includes("object")) || - hasObjectHints; - - // Only apply object-specific defaults when the schema is actually an object type - if (isObjectSchema) { - if (!("type" in parameters)) parameters.type = "object"; - if (!("properties" in parameters)) parameters.properties = {}; - if (!("required" in parameters)) parameters.required = []; - if (!("additionalProperties" in parameters)) - parameters.additionalProperties = false; - } + const inputSchema = tool.inputSchema; + let parameters: unknown; + + if (typeof inputSchema === "boolean") { + parameters = inputSchema; + } else { + const rawSchema = + inputSchema && typeof inputSchema === "object" + ? inputSchema + : undefined; + const schemaType = rawSchema?.type; + const parameterObject = { + $schema: "http://json-schema.org/draft-07/schema#", + ...(rawSchema ?? {}), + }; + + const hasObjectHints = + rawSchema !== undefined && + ["properties", "required", "additionalProperties", "patternProperties"].some( + (key) => key in rawSchema + ); + + const isObjectSchema = + rawSchema === undefined || + schemaType === "object" || + (Array.isArray(schemaType) && schemaType.includes("object")) || + hasObjectHints; + + if (isObjectSchema) { + if (!("type" in parameterObject)) parameterObject.type = "object"; + if (!("properties" in parameterObject)) parameterObject.properties = {}; + if (!("required" in parameterObject)) parameterObject.required = []; + if (!("additionalProperties" in parameterObject)) + parameterObject.additionalProperties = false; + } + + parameters = parameterObject; + }

This keeps no-arg tools (or intentionally unsatisfiable ones) accurate while still giving object schemas the safe defaults.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

const rawSchema =

tool.inputSchema && typeof tool.inputSchema === "object"

? tool.inputSchema

: undefined;

const schemaType = rawSchema?.type;

const parameters = {

$schema: "http://json-schema.org/draft-07/schema#",

...(rawSchema ?? {}),

};

const hasObjectHints =

rawSchema !== undefined &&

["properties", "required", "additionalProperties", "patternProperties"].some(

(key) => key in rawSchema

);

const isObjectSchema =

rawSchema === undefined ||

schemaType === "object" ||

(Array.isArray(schemaType) && schemaType.includes("object")) ||

hasObjectHints;

// Only apply object-specific defaults when the schema is actually an object type

if (isObjectSchema) {

if (!("type" in parameters)) parameters.type = "object";

if (!("properties" in parameters)) parameters.properties = {};

if (!("required" in parameters)) parameters.required = [];

if (!("additionalProperties" in parameters))

parameters.additionalProperties = false;

}

// Create the prefixed tool name as it appears in Claude's context

const prefixedToolName = `mcp__${serverName}__${tool.name}`;

const toolForCounting = {

name: tool.name,

description: tool.description || '',

inputSchema: tool.inputSchema || {}

name: prefixedToolName,

...(tool.description !== undefined ? { description: tool.description } : {}),

parameters,

};

const inputSchema = tool.inputSchema;

let parameters: unknown;

if (typeof inputSchema === "boolean") {

parameters = inputSchema;

} else {

const rawSchema =

inputSchema && typeof inputSchema === "object"

? inputSchema

: undefined;

const schemaType = rawSchema?.type;

const parameterObject = {

$schema: "http://json-schema.org/draft-07/schema#",

...(rawSchema ?? {}),

};

const hasObjectHints =

rawSchema !== undefined &&

["properties", "required", "additionalProperties", "patternProperties"].some(

(key) => key in rawSchema

);

const isObjectSchema =

rawSchema === undefined ||

schemaType === "object" ||

(Array.isArray(schemaType) && schemaType.includes("object")) ||

hasObjectHints;

if (isObjectSchema) {

if (!("type" in parameterObject)) parameterObject.type = "object";

if (!("properties" in parameterObject)) parameterObject.properties = {};

if (!("required" in parameterObject)) parameterObject.required = [];

if (!("additionalProperties" in parameterObject))

parameterObject.additionalProperties = false;

}

parameters = parameterObject;

}

// Create the prefixed tool name as it appears in Claude's context

const prefixedToolName = `mcp__${serverName}__${tool.name}`;

const toolForCounting = {

name: prefixedToolName,

...(tool.description !== undefined ? { description: tool.description } : {}),

parameters,

};

🤖 Prompt for AI Agents

In src/core/mcpClient.ts around lines 53 to 91, the code treats a boolean JSON Schema (true/false) like undefined and applies object defaults; instead detect boolean schemas and preserve them verbatim: if typeof rawSchema === "boolean" set parameters to rawSchema (do not build an object with $schema or spread), and ensure subsequent checks (hasObjectHints and isObjectSchema) only treat rawSchema as an object when typeof rawSchema === "object" (so boolean schemas do not trigger object defaults); then only apply the object-specific default fields when isObjectSchema is true and parameters is an object.

github-actions · 2025-09-26T17:20:30Z

🎉 This PR is included in version 1.4.0 🎉

The release is available on:

Your semantic-release bot 📦🚀

feat: improve token calc

107a8d7