Chat LLM is a Swiss Army knife for LLM-powered tasks: a simple, zero-dependency tool for chatting with a large language model (LLM) that also provides advanced features such as agent orchestration, context management, memory systems, task automation, and enterprise-grade monitoring.
It works seamlessly with both cloud-based LLM services (e.g., OpenAI GPT, Groq, OpenRouter) and locally hosted LLMs (e.g., llama.cpp, LM Studio, Ollama).
Chat LLM is accessible via the terminal or through its minimalist web interface.
Current Version: 2.2.0 (Development)
To run Chat LLM, ensure that Node.js (v18 or higher) or Bun is installed.
./chat-llm.js
To obtain quick responses, pipe a question directly:
echo "Top travel destinations in Indonesia?" | ./chat-llm.js
For specific tasks:
echo "Translate 'thank you' into German" | ./chat-llm.js
Chat LLM also includes a minimalist front-end web interface. To launch it, specify the environment variable HTTP_PORT, for example:
HTTP_PORT=5000 ./chat-llm.js
Then, open a web browser and go to localhost:5000.
Chat LLM v2 introduces powerful multi-purpose agent capabilities:
Specialized agents for different tasks:
- Researcher - Information gathering and synthesis
- Coder - Programming and debugging
- Writer - Content creation and editing
- Analyst - Data analysis and insights
- Tutor - Educational explanations
- Solver - Problem-solving methodology
- Support - Customer service
./chat-llm.js agent-list # View all agents
./chat-llm.js agent-activate coder # Activate agent
./chat-llm.js agent-stats # View statistics
Work with custom data and knowledge bases:
./chat-llm.js context-create research # Create context
./chat-llm.js context-list # List contexts
./chat-llm.js context-activate research # Activate context
Pre-built templates for common tasks with variable substitution and conditionals:
./chat-llm.js prompt-list # View templates
./chat-llm.js prompt-render analysis # Display template
Persistent conversation memory with automatic summarization:
./chat-llm.js memory-list # List conversations
./chat-llm.js memory-stats # Memory usage
Queue tasks, manage workflows, and batch process:
./chat-llm.js task-list # View tasks
./chat-llm.js task-stats # Queue statistics
Built-in sentiment analysis, request logging, and statistics:
./chat-llm.js sentiment "text" # Analyze sentiment
./chat-llm.js stats # Request statistics
./chat-llm.js export json # Export logs
Chat LLM v2.2 introduces enterprise-grade monitoring, webhooks, and observability:
Export system metrics in Prometheus or JSON format for monitoring dashboards:
./chat-llm.js metrics-summary # Quick metrics overview
./chat-llm.js metrics-export prometheus # Prometheus format
./chat-llm.js metrics-export json # JSON format
Metrics include:
- System uptime and performance
- Request/response statistics
- Cache hit/miss ratios
- Agent usage patterns
- Memory and resource utilization
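To make the Prometheus option concrete, here is a minimal sketch of rendering a metrics snapshot in the Prometheus text exposition format. The chatllm_* metric names and the snapshot object are illustrative assumptions, not the tool's actual output:
// Hypothetical sketch: render a metrics snapshot in Prometheus text exposition format.
// The metric names and the snapshot object are illustrative, not Chat LLM's real output.
function toPrometheus(metrics) {
  const lines = [];
  for (const [name, value] of Object.entries(metrics)) {
    lines.push(`# TYPE chatllm_${name} gauge`);
    lines.push(`chatllm_${name} ${value}`);
  }
  return lines.join('\n') + '\n';
}

const snapshot = { uptime_seconds: 3600, requests_total: 42, cache_hit_ratio: 0.8 };
console.log(toPrometheus(snapshot));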
Trigger HTTP webhooks for events and integrate with external systems:
./chat-llm.js webhook-list # List webhooks
./chat-llm.js webhook-register event url # Register webhook
./chat-llm.js webhook-stats # Delivery statistics
Features:
- Event-based webhooks with pattern matching
- Retry logic with exponential backoff
- HMAC signatures for security
- Delivery tracking and logs
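As a rough sketch of how a delivery pipeline with HMAC signing and exponential backoff can work (the header name, secret handling, and retry counts below are assumptions for illustration, not Chat LLM's actual implementation):
// Illustrative sketch only: sign a webhook payload with HMAC-SHA256 and retry
// failed deliveries with exponential backoff. Header name and retry parameters
// are assumptions, not Chat LLM's actual code.
const crypto = require('node:crypto');

async function deliverWebhook(url, payload, secret, attempts = 3) {
  const body = JSON.stringify(payload);
  const signature = crypto.createHmac('sha256', secret).update(body).digest('hex');
  for (let attempt = 0; attempt < attempts; attempt++) {
    try {
      const res = await fetch(url, {
        method: 'POST',
        headers: { 'Content-Type': 'application/json', 'X-Signature': signature },
        body,
      });
      if (res.ok) return true;
    } catch (err) {
      // network error: fall through to backoff
    }
    if (attempt < attempts - 1) {
      await new Promise((r) => setTimeout(r, 2 ** attempt * 1000)); // 1s, 2s, 4s, ...
    }
  }
  return false;
}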
Comprehensive test suites for quality assurance:
node tests/e2e-tests.js # End-to-end tests
node tests/test-suite.js # Unit tests
For detailed v2 features and examples, see QUICK_START.md and DEVELOPMENT.md.
For v2.2 development roadmap, see V2_2_DEVELOPMENT_PLAN.md.
Chat LLM automatically caches responses for 24 hours to avoid repeated calls to your LLM API. Cached results live in ./cache and are reused instantly in the terminal and web UI. This keeps latency low and saves API credits when you revisit the same prompt during a debugging or evaluation session.
./chat-llm.js cache-stats # Inspect memory/disk cache usage
./chat-llm.js cache-clear # Purge all cached responses
./chat-llm.js config-get caching.enabled
./chat-llm.js config-set caching.enabled false # Disable caching
./chat-llm.js config-set caching.enabled true # Re-enable caching
When caching is disabled via the config command, Chat LLM immediately falls back to live responses without touching the cache. Re-enabling restores the 24-hour TTL without restarting the app.
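Conceptually, the disk cache behaves like a TTL-bounded key/value store keyed by a hash of the prompt. A minimal sketch of that idea (the hashing scheme, file names, and directory layout here are illustrative assumptions, not Chat LLM's actual on-disk format):
// Minimal sketch of a prompt-keyed disk cache with a 24-hour TTL.
// The hashing scheme and file layout are illustrative assumptions.
const fs = require('node:fs');
const path = require('node:path');
const crypto = require('node:crypto');

const CACHE_DIR = './cache';
const TTL_MS = 24 * 60 * 60 * 1000; // 24 hours

function cachePath(prompt) {
  const key = crypto.createHash('sha256').update(prompt).digest('hex');
  return path.join(CACHE_DIR, `${key}.json`);
}

function readCache(prompt) {
  const file = cachePath(prompt);
  if (!fs.existsSync(file)) return null;
  const entry = JSON.parse(fs.readFileSync(file, 'utf8'));
  return Date.now() - entry.timestamp < TTL_MS ? entry.response : null;
}

function writeCache(prompt, response) {
  fs.mkdirSync(CACHE_DIR, { recursive: true });
  fs.writeFileSync(cachePath(prompt), JSON.stringify({ timestamp: Date.now(), response }));
}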
Activating an agent (./chat-llm.js agent-activate coder) now feeds that persona’s system prompt straight into every chat request, so the CLI/web UI instantly adopts the tone and capabilities of the selected specialist. The active context (context-create, context-activate) is summarized and injected as another system message, giving the LLM a compact view of your tagged data and uploaded documents before it answers.
./chat-llm.js agent-list
./chat-llm.js agent-activate researcher
./chat-llm.js context-create customer-success
./chat-llm.js context-activate customer-success
./chat-llm.js memory-list
./chat-llm.js memory-stats
Terminal and web sessions persist into the new Memory Manager (./memory/), so memory-list can replay full transcripts even across restarts. Each cache hit is streamed via the existing delegate, logged with metadata (agent, context, model), and recorded in memory so you can audit automated workflows.
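In other words, each request is assembled from the active persona and context before the user turn is appended. A hedged sketch of that message assembly (the function name and object fields are hypothetical, not the tool's internals):
// Hypothetical sketch of how a chat request could be assembled from the active
// agent persona and the summarized context; names are illustrative only.
function buildMessages(activeAgent, activeContext, history, userInput) {
  const messages = [];
  if (activeAgent) {
    messages.push({ role: 'system', content: activeAgent.systemPrompt });
  }
  if (activeContext) {
    messages.push({ role: 'system', content: `Context summary:\n${activeContext.summary}` });
  }
  return [...messages, ...history, { role: 'user', content: userInput }];
}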
The Prompt Manager ships with battle-tested templates for analysis, coding, research, and more. You can now render them directly from the CLI with inline variables:
./chat-llm.js prompt-run analysis data="Q4 sales dipped 14%" focus="root-cause, mitigation"
Combine this with prompt-list and prompt-render to inspect or extend the templates before handing the generated instructions to the chat runtime.
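Under the hood, this kind of rendering amounts to substituting named variables into a stored prompt. A minimal sketch of the idea (the {{variable}} syntax shown here is an assumption, not necessarily the tool's exact template grammar):
// Sketch of simple {{variable}} substitution; the template syntax is assumed.
function renderTemplate(template, vars) {
  return template.replace(/\{\{(\w+)\}\}/g, (match, name) =>
    (name in vars ? String(vars[name]) : match));
}

const template = 'Analyze the following data: {{data}}. Focus on: {{focus}}.';
console.log(renderTemplate(template, {
  data: 'Q4 sales dipped 14%',
  focus: 'root-cause, mitigation',
}));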
Chat LLM is capable of conversing in multiple languages beyond English. It consistently responds in the same language as the question posed. Additionally, it supports seamless language switching between queries, as illustrated in the following example:
>> Which planet in our solar system is the largest?
Jupiter is the largest planet in our solar system.
>> ¿Y el más caliente?
Venus es el planeta más caliente, con hasta 475 grados Celsius.
The continuous integration workflows for Chat LLM include evaluation tests in English, Spanish, German, French, Italian, and Indonesian. All tests are conducted with LLMs that have a context window of at least 128K tokens.
Supported local LLM servers include llama.cpp, Jan, Ollama, LocalAI, LM Studio, and Msty.
To utilize llama.cpp locally with its inference engine, load a quantized model like Llama-3.2 3B or Gemma-3 4B. Then set the LLM_API_BASE_URL environment variable:
/path/to/llama-server -m Llama-3.2-3B-Instruct-Q4_K_M.gguf
export LLM_API_BASE_URL=http://127.0.0.1:8080/v1
To use Jan with its local API server, refer to its documentation. Load a model like Llama-3.2 3B or Gemma-3 4B, and set the following environment variables:
export LLM_API_BASE_URL=http://127.0.0.1:1337/v1
export LLM_CHAT_MODEL='llama3-8b-instruct'
To use Ollama locally, load a model and configure the environment variable LLM_API_BASE_URL:
ollama pull llama3.2
export LLM_API_BASE_URL=http://127.0.0.1:11434/v1
export LLM_CHAT_MODEL='llama3.2'
For LocalAI, initiate its container and adjust the environment variable LLM_API_BASE_URL:
docker run -ti -p 8080:8080 localai/localai llama-3.2-3b-instruct:q4_k_m
export LLM_API_BASE_URL=http://localhost:8080/v1
For LM Studio, pick a model (e.g., Llama-3.2 3B). Next, go to the Developer tab, select the model to load, and click the Start Server button. Then, set the LLM_API_BASE_URL environment variable, noting that the server by default runs on port 1234:
export LLM_API_BASE_URL=http://127.0.0.1:1234/v1
For Msty, choose a model (e.g., Llama-3.2 3B) and ensure the local AI is running. Go to the Settings menu, under Local AI, and note the Service Endpoint (which defaults to port 10002). Then set the LLM_API_BASE_URL environment variable accordingly:
export LLM_API_BASE_URL=http://127.0.0.1:10002/v1
Supported LLM services include Cerebras, Deep Infra, DeepSeek, Fireworks, Glama, Groq, Hyperbolic, Mistral, Nebius, Novita, OpenAI, OpenRouter, and Together.
For configuration specifics, refer to the relevant section. The quality of answers can vary based on the model's performance.
Cerebras:
export LLM_API_BASE_URL=https://api.cerebras.ai/v1
export LLM_API_KEY="yourownapikey"
export LLM_CHAT_MODEL="llama3.1-8b"
Deep Infra:
export LLM_API_BASE_URL=https://api.deepinfra.com/v1/openai
export LLM_API_KEY="yourownapikey"
export LLM_CHAT_MODEL="meta-llama/Meta-Llama-3.1-8B-Instruct"
DeepSeek:
export LLM_API_BASE_URL=https://api.deepseek.com/v1
export LLM_API_KEY="yourownapikey"
export LLM_CHAT_MODEL="deepseek-chat"
Fireworks:
export LLM_API_BASE_URL=https://api.fireworks.ai/inference/v1
export LLM_API_KEY="yourownapikey"
export LLM_CHAT_MODEL="accounts/fireworks/models/qwen3-30b-a3b"
Glama:
export LLM_API_BASE_URL=https://glama.ai/api/gateway/openai/v1
export LLM_API_KEY="yourownapikey"
export LLM_CHAT_MODEL='ministral-3b-2410'
Groq:
export LLM_API_BASE_URL=https://api.groq.com/openai/v1
export LLM_API_KEY="yourownapikey"
export LLM_CHAT_MODEL="llama-3.1-8b-instant"
Hyperbolic:
export LLM_API_BASE_URL=https://api.hyperbolic.xyz/v1
export LLM_API_KEY="yourownapikey"
export LLM_CHAT_MODEL="meta-llama/Meta-Llama-3.1-8B-Instruct"
Mistral:
export LLM_API_BASE_URL=https://api.mistral.ai/v1
export LLM_API_KEY="yourownapikey"
export LLM_CHAT_MODEL="mistral-small-latest"
Nebius:
export LLM_API_BASE_URL=https://api.studio.nebius.ai/v1
export LLM_API_KEY="yourownapikey"
export LLM_CHAT_MODEL="meta-llama/Meta-Llama-3.1-8B-Instruct"
Novita:
export LLM_API_BASE_URL=https://api.novita.ai/v3/openai
export LLM_API_KEY="yourownapikey"
export LLM_CHAT_MODEL="meta-llama/llama-3.1-8b-instruct"
OpenAI:
export LLM_API_BASE_URL=https://api.openai.com/v1
export LLM_API_KEY="yourownapikey"
export LLM_CHAT_MODEL="gpt-5-nano"
OpenRouter:
export LLM_API_BASE_URL=https://openrouter.ai/api/v1
export LLM_API_KEY="yourownapikey"
export LLM_CHAT_MODEL="meta-llama/llama-3.1-8b-instruct"
Together:
export LLM_API_BASE_URL=https://api.together.xyz/v1
export LLM_API_KEY="yourownapikey"
export LLM_CHAT_MODEL="meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo"
If there is a text file containing pairs of User and Assistant messages, it can be evaluated with Chat LLM:
User: Which planet is the largest?
Assistant: The largest planet is /Jupiter/.
User: and the smallest?
Assistant: The smallest planet is /Mercury/.
Assuming the above content is in qa.txt, executing the following command will initiate a multi-turn conversation with the LLM, asking questions sequentially and verifying answers using regular expressions:
./chat-llm.js qa.txt
For additional examples, please refer to the tests/ subdirectory.
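The evaluation format is simple enough to sketch: each User line becomes a prompt, and the /.../ fragments in the following Assistant line become regular expressions that the reply must match. A rough illustration of that parsing and checking logic (the askLLM helper and the strict line pairing are assumptions, not the tool's internals):
// Rough sketch of the qa.txt convention: ask each User line, then check the
// reply against the /regex/ fragments in the Assistant line. The askLLM helper
// is hypothetical.
const fs = require('node:fs');

async function evaluate(file, askLLM) {
  const lines = fs.readFileSync(file, 'utf8').split('\n').filter(Boolean);
  for (let i = 0; i < lines.length - 1; i += 2) {
    const question = lines[i].replace(/^User:\s*/, '');
    const expected = lines[i + 1].match(/\/([^/]+)\//g) || [];
    const answer = await askLLM(question);
    for (const pattern of expected) {
      const regex = new RegExp(pattern.slice(1, -1), 'i');
      console.log(regex.test(answer) ? 'PASS' : 'FAIL', question);
    }
  }
}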
Chat LLM v2 includes a comprehensive configuration system for managing application settings and profiles:
# Get configuration values
./chat-llm.js config-get models.temperature
./chat-llm.js config-get caching.enabled
# Set configuration values
./chat-llm.js config-set models.temperature 0.8
./chat-llm.js config-set caching.ttl 3600000
# List available profiles
./chat-llm.js config-list
Default configuration includes:
- Model settings (temperature, max tokens)
- Caching behavior (TTL: 24 hours)
- Logging configuration
- API timeout and retry settings
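The dotted keys accepted by config-get and config-set resolve into a nested settings object. A minimal sketch of how such dotted-path access can be implemented (illustrative only, not the tool's actual config module):
// Illustrative sketch of dotted-path config access, e.g. "models.temperature".
function getConfig(config, dottedKey) {
  return dottedKey.split('.').reduce(
    (node, key) => (node == null ? undefined : node[key]), config);
}

function setConfig(config, dottedKey, value) {
  const keys = dottedKey.split('.');
  const last = keys.pop();
  let node = config;
  for (const key of keys) {
    if (typeof node[key] !== 'object' || node[key] === null) node[key] = {};
    node = node[key];
  }
  node[last] = value;
}

const config = { models: { temperature: 0.7 }, caching: { enabled: true, ttl: 86400000 } };
setConfig(config, 'models.temperature', 0.8);
console.log(getConfig(config, 'models.temperature')); // 0.8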
Intelligent response caching reduces API calls and improves performance:
# View cache statistics
./chat-llm.js cache-stats
# Clear cache
./chat-llm.js cache-clear
The cache automatically stores responses with a 24-hour TTL and can be enabled/disabled via configuration.
All requests are logged automatically for monitoring and debugging:
# View statistics
./chat-llm.js stats
# Export logs
./chat-llm.js export json > logs.json
./chat-llm.js export csv > logs.csv
Logs include:
- Request timestamps
- Operation types
- Response times
- Request/response content (truncated)
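To make the export concrete, a single JSON log entry could look roughly like the following. The field names are illustrative assumptions, not a guaranteed schema:
// Illustrative shape of one exported log entry; field names are assumptions.
const exampleLogEntry = {
  timestamp: '2025-12-08T10:15:00.000Z',
  operation: 'chat',
  durationMs: 412,
  request: 'Which planet is the largest?',      // truncated for long inputs
  response: 'Jupiter is the largest planet...', // truncated for long outputs
  metadata: { agent: 'researcher', model: 'llama-3.1-8b-instant' },
};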
Built-in sentiment analysis for understanding user input and conversation tone:
./chat-llm.js sentiment "This is amazing!"Returns sentiment classification (positive, negative, neutral) with scores.
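A lexicon-based scorer is one common way to produce such a classification with zero dependencies; the word lists and thresholds below are illustrative, not the tool's actual lexicon:
// Minimal sketch of lexicon-based sentiment scoring; word lists and thresholds
// are illustrative assumptions.
const POSITIVE = new Set(['amazing', 'great', 'good', 'love', 'excellent']);
const NEGATIVE = new Set(['bad', 'terrible', 'hate', 'awful', 'poor']);

function analyzeSentiment(text) {
  const words = text.toLowerCase().match(/[a-z']+/g) || [];
  let score = 0;
  for (const word of words) {
    if (POSITIVE.has(word)) score += 1;
    if (NEGATIVE.has(word)) score -= 1;
  }
  const label = score > 0 ? 'positive' : score < 0 ? 'negative' : 'neutral';
  return { label, score };
}

console.log(analyzeSentiment('This is amazing!')); // { label: 'positive', score: 1 }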
Test the UI and features without API credentials:
LLM_DEMO_MODE=1 HTTP_PORT=5000 ./chat-llm.js
Demo mode simulates intelligent responses for testing and development.
Chat LLM v2 is built with the following components:
- Core: Zero-dependency chat interface
- Cache: Automatic response caching (memory + disk)
- Config: Settings and profile management
- Logger: Request tracking and analytics
- Monitor: Performance metrics collection
- Tools: Sentiment analysis and utilities
- Agents: Multi-purpose agent orchestration
- Context: Custom data and knowledge management
- Prompts: Advanced template system
- Memory: Conversation history and persistence
- Tasks: Workflow and batch processing
All components are optional and can be disabled via configuration for minimal resource usage.
- README.md - Overview and quick start guide (this file)
- QUICK_START.md - Quick reference for v2 features
- DEVELOPMENT.md - Development guide and architecture details
- API_REFERENCE.md - Complete API documentation for all modules
- EXAMPLES.md - Practical examples and real-world use cases
- ROADMAP.md - Current development roadmap and enhancements
- FUTURE_FEATURES.md - Future feature proposals and ideas
- RELEASE_NOTES_V2.md - Version 2 release notes
- tests/ - Evaluation tests in multiple languages
Chat LLM v2 includes comprehensive improvements:
All public APIs validate inputs with descriptive error messages:
// Example: Type checking and validation
if (typeof text !== 'string' || text.trim().length === 0) {
throw new TypeError('Text must be a non-empty string');
}
Automatic limits prevent memory overflow:
- Request Logger: 10,000 in-memory logs
- Response Cache: 1,000 memory entries
- Performance Monitor: Configurable limit (default: 10,000)
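One simple way to enforce such a cap is to drop the oldest entries whenever the buffer exceeds its limit. The class below is a sketch of that pattern; only the 10,000-entry default mirrors the figures listed above:
// Sketch of a capped in-memory log buffer that drops the oldest entries.
// The class is illustrative; only the 10,000-entry cap mirrors the default above.
class BoundedLog {
  constructor(maxEntries = 10000) {
    this.maxEntries = maxEntries;
    this.entries = [];
  }

  push(entry) {
    this.entries.push(entry);
    if (this.entries.length > this.maxEntries) {
      this.entries.splice(0, this.entries.length - this.maxEntries); // drop oldest
    }
  }
}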
All modules include comprehensive error handling:
try {
const result = await operation();
logger.logRequest('operation', input, result, duration);
} catch (error) {
logger.logRequest('operation', input, error.message, duration, { error: true });
throw error;
}
Built-in metrics tracking:
- P95/P99 latency percentiles
- Cache hit rates
- Memory usage monitoring
- Operation statistics
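Percentile latencies such as P95 and P99 are typically computed from a sorted sample of recorded durations. A minimal nearest-rank sketch (the sampling and interpolation details are simplified assumptions):
// Sketch of nearest-rank percentile calculation over recorded durations (ms);
// interpolation details are simplified.
function percentile(durations, p) {
  if (durations.length === 0) return 0;
  const sorted = [...durations].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length) - 1;
  return sorted[Math.max(0, rank)];
}

const latencies = [120, 95, 300, 210, 180, 90, 400, 150];
console.log('p95:', percentile(latencies, 95));
console.log('p99:', percentile(latencies, 99));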
Contributions are welcome! Please:
- Read the DEVELOPMENT.md guide
- Review FUTURE_FEATURES.md for feature ideas
- Check existing issues and PRs
- Follow the code quality standards
- Add tests for new features
- Update documentation
- Issues: Report bugs or request features on GitHub
- Discussions: Share ideas and ask questions
- Examples: Check EXAMPLES.md for common patterns
- API Docs: See API_REFERENCE.md for detailed API info
See LICENSE file for details.
Chat LLM v2 is built with zero external dependencies, leveraging only Node.js built-in modules for maximum portability and minimal overhead.
Version: 2.0.0
Last Updated: December 8, 2025
Maintainer: yonikashi432