This project is based on tbphp/gpt-load, extended with Model Aggregation & Intelligent Routing for aggregate groups.
- Repository: https://github.com/tbphp/gpt-load
- Description: High-performance, enterprise-grade AI API transparent proxy service
- Core Features: Intelligent key management, load balancing, distributed deployment, request monitoring, etc.
For detailed deployment and usage instructions, please refer to the original project documentation: https://www.gpt-load.com/docs
- Automatically select sub-groups based on the requested model parameter
- Auto-aggregate model lists from sub-groups
- Support weighted load balancing
- Intelligent /v1/models endpoint interception
- One-click export of group configurations (including group info, model lists, keys, sub-groups)
- Quick import of group configurations for migration and backup
- JSON format support for version control and sharing
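For illustration only, an exported group configuration might look like the sketch below; the field names here are hypothetical, and the authoritative schema is whatever the export endpoint actually emits:

```json
{
  "group": { "name": "ai-mix", "group_type": "aggregate" },
  "models": ["gpt-4", "claude-3-opus"],
  "keys": ["sk-..."],
  "sub_groups": [
    { "name": "sub-a", "weight": 50 },
    { "name": "sub-b", "weight": 30 }
  ]
}
```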
- Standard groups: Auto-fetch model lists from upstream /v1/models API
- Aggregate groups: Auto-aggregate model lists from all sub-groups
- Support both manual configuration and automatic refresh
Automatically route to sub-groups that support the requested model:
```
Request: {"model": "gpt-4", ...}          → routed to the sub-group supporting gpt-4
Request: {"model": "claude-3-opus", ...}  → routed to the sub-group supporting Claude
```
When multiple sub-groups support the same model, distribute requests based on sub-group weights.
When accessing an aggregate group's /v1/models, the aggregated model list is returned directly without forwarding to upstream.
```bash
# 1. Clone the repository
git clone https://github.com/alhza/GPT-Load.git
cd GPT-Load

# 2. Configure environment variables
cp .env.example .env
# Edit the .env file and set the necessary options

# 3. Build and start
docker-compose up -d
# Or use the Makefile
make docker-compose-up
```

```bash
# 1. Build the image
docker build -t gpt-load:1.3.0 .
# Or use the Makefile
make docker-build

# 2. Run the container
docker run -d \
  -p 3001:3001 \
  -v $(pwd)/data:/app/data \
  --env-file .env \
  --name gpt-load \
  gpt-load:1.3.0
# Or use the Makefile
make docker-run
```

```bash
make docker-build           # Build Docker image
make docker-build-no-cache  # Build without cache
make docker-run             # Run container
make docker-stop            # Stop container
make docker-push            # Push to registry
make docker-compose-up      # Start docker-compose
make docker-compose-down    # Stop docker-compose
```

For detailed build instructions, please refer to the original project documentation:
- Build from Source: https://www.gpt-load.com/docs/build
- Configuration: https://www.gpt-load.com/docs/configuration
Aggregate group intelligent routing allows you to create an aggregate group containing multiple sub-groups. The system automatically selects the sub-group that supports the requested model based on the model parameter, enabling intelligent routing and load balancing.
- Automatic Model Aggregation: Auto-aggregate all supported model lists from sub-groups
- Intelligent Routing: Route to sub-groups based on the requested model parameter
- Multi-Channel Aggregation: Support cross-channel sub-groups (OpenAI/Gemini/Anthropic, etc.) and automatically convert request paths based on each sub-group's channel type (see the sketch after this list)
- Weighted Load Balancing: Distribute requests by weight among sub-groups supporting the same model
- Model List Management: Support auto-fetching from upstream API or manual configuration
- Transparent Proxy: /v1/models endpoint returns aggregated model list
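The path conversion mentioned above can be pictured with a minimal sketch. The function name and mapping below are illustrative assumptions, not this project's actual code, though the upstream endpoint paths (OpenAI's /v1/chat/completions, Anthropic's /v1/messages, Gemini's /v1beta/models/{model}:generateContent) are the standard ones:

```go
package main

import "fmt"

// convertPath is an illustrative sketch (not the project's actual function)
// of mapping a request to the upstream path for a sub-group's channel type.
func convertPath(channelType, model string) string {
	switch channelType {
	case "gemini":
		// Gemini's native API puts the model name in the URL path.
		return "/v1beta/models/" + model + ":generateContent"
	case "anthropic":
		return "/v1/messages"
	default: // openai-compatible channels keep the standard path
		return "/v1/chat/completions"
	}
}

func main() {
	fmt.Println(convertPath("gemini", "gemini-pro"))
	// Output: /v1beta/models/gemini-pro:generateContent
}
```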
Create aggregate group ai-mix with multiple sub-groups:
```yaml
Aggregate Group: ai-mix
  Sub-group A (weight: 50)
    Supported models: gpt-4, gpt-3.5-turbo, gpt-4-turbo
  Sub-group B (weight: 30)
    Supported models: claude-3-opus, claude-3-sonnet
  Sub-group C (weight: 20)
    Supported models: gemini-pro, gemini-pro-vision
```
Intelligent Routing Effect:
```bash
# Routed to a sub-group that supports gpt-4 (sub-group A)
curl -X POST http://localhost:3001/proxy/ai-mix/v1/chat/completions \
  -H "Authorization: Bearer your-proxy-key" \
  -H "Content-Type: application/json" \
  -d '{"model": "gpt-4", "messages": [...]}'

# Routed to a sub-group that supports Claude (sub-group B)
curl -X POST http://localhost:3001/proxy/ai-mix/v1/chat/completions \
  -H "Authorization: Bearer your-proxy-key" \
  -H "Content-Type: application/json" \
  -d '{"model": "claude-3-opus", "messages": [...]}'

# Routed to a sub-group that supports Gemini (sub-group C)
curl -X POST http://localhost:3001/proxy/ai-mix/v1/chat/completions \
  -H "Authorization: Bearer your-proxy-key" \
  -H "Content-Type: application/json" \
  -d '{"model": "gemini-pro", "messages": [...]}'
```
Create aggregate group openai-cluster with multiple OpenAI instances:
```yaml
Aggregate Group: openai-cluster
  Instance A (weight: 60) - us-east
  Instance B (weight: 30) - eu-west
  Instance C (weight: 10) - ap-south
```
All instances support the same models, so the system distributes requests purely by weight: out of every 100 requests, roughly 60 go to Instance A, 30 to Instance B, and 10 to Instance C.
Method 1: Auto-fetch (Recommended)
- Navigate to sub-group details page
- Click the "Refresh Models" button
- System auto-fetches model list from upstream /v1/models API
Method 2: Manual Configuration
Use API to manually set model list:
```bash
curl -X PUT http://localhost:3001/api/groups/{groupId}/models \
  -H "Authorization: Bearer your-auth-key" \
  -H "Content-Type: application/json" \
  -d '{"models": ["gpt-4", "gpt-3.5-turbo"]}'
```
- Create a new group in Web UI
- Select "aggregate" as the group type
- Add sub-groups and set weights
- Navigate to aggregate group details page
- Click the "Refresh Models" button
- System auto-aggregates model lists from all sub-groups
```bash
GET /api/groups/{groupId}/models
```
Response example:
```json
{
  "models": ["gpt-4", "gpt-3.5-turbo", "claude-3-opus", "gemini-pro"]
}
```
```bash
POST /api/groups/{groupId}/models/refresh
```
- Standard groups: Fetch from upstream API
- Aggregate groups: Aggregate from all sub-groups
```bash
PUT /api/groups/{groupId}/models
Content-Type: application/json

{"models": ["gpt-4", "gpt-3.5-turbo"]}
```
When accessing an aggregate group's /v1/models endpoint, the system returns the aggregated model list directly without forwarding to upstream:
```bash
curl http://localhost:3001/proxy/ai-mix/v1/models
```
Returns:
```json
{
  "object": "list",
  "data": [
    {
      "id": "gpt-4",
      "object": "model",
      "created": 1728700800,
      "owned_by": "ai-mix"
    },
    {
      "id": "claude-3-opus",
      "object": "model",
      "created": 1728700800,
      "owned_by": "ai-mix"
    }
    // ... more models
  ]
}
```
- Extract Model Parameter: Extract model field from request body
- Filter Sub-groups: Find all sub-groups supporting the model
- Weighted Selection: Load balance by weight among filtered results
- Forward Request: Use selected sub-group configuration to forward request
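A minimal, self-contained sketch of steps 1–3, with illustrative type and function names (the real logic lives in internal/services/subgroup_manager.go and internal/proxy/server.go):

```go
package main

import (
	"encoding/json"
	"fmt"
	"math/rand"
)

// subGroup is a simplified stand-in for the project's internal types.
type subGroup struct {
	Name   string
	Weight int
	Models map[string]bool
}

// extractModel pulls the "model" field out of the JSON request body (step 1).
func extractModel(body []byte) (string, error) {
	var req struct {
		Model string `json:"model"`
	}
	if err := json.Unmarshal(body, &req); err != nil {
		return "", err
	}
	if req.Model == "" {
		return "", fmt.Errorf("missing model parameter")
	}
	return req.Model, nil
}

// selectSubGroup filters sub-groups by model (step 2) and picks one by
// weight (step 3); forwarding (step 4) is left to the proxy layer.
// Assumes sub-group weights are positive.
func selectSubGroup(groups []subGroup, model string) (*subGroup, error) {
	var candidates []*subGroup
	total := 0
	for i := range groups {
		if groups[i].Models[model] { // exact, case-sensitive match
			candidates = append(candidates, &groups[i])
			total += groups[i].Weight
		}
	}
	if len(candidates) == 0 {
		// The proxy surfaces this condition as a 503 error.
		return nil, fmt.Errorf("no sub-group supports model %q", model)
	}
	n := rand.Intn(total)
	for _, c := range candidates {
		if n < c.Weight {
			return c, nil
		}
		n -= c.Weight
	}
	return candidates[len(candidates)-1], nil
}

func main() {
	groups := []subGroup{
		{Name: "sub-a", Weight: 50, Models: map[string]bool{"gpt-4": true, "gpt-3.5-turbo": true}},
		{Name: "sub-b", Weight: 30, Models: map[string]bool{"claude-3-opus": true}},
	}
	model, err := extractModel([]byte(`{"model": "gpt-4", "messages": []}`))
	if err != nil {
		panic(err)
	}
	g, err := selectSubGroup(groups, model)
	if err != nil {
		panic(err)
	}
	fmt.Println("routed to:", g.Name)
}
```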
- Ensure each sub-group has correct model list configured
- Model names must match exactly (case-sensitive)
- A 503 error is returned if no sub-group supports the requested model
- Sub-groups must have valid API keys; otherwise requests cannot be forwarded even when the model is supported
Backend (8 files):
- internal/migrations/009_add_models_to_groups.go - Database migration
- internal/services/model_collector.go - Model collection service
- internal/handler/model_handler.go - Model management API handler
- internal/services/subgroup_manager.go - Intelligent routing logic (modified)
- internal/proxy/server.go - Proxy request handling (modified)
- internal/handler/handler.go - Handler registration (modified)
- internal/container/container.go - Dependency injection (modified)
- internal/router/router.go - Route registration (modified)
Frontend (5 files):
- web/src/api/keys.ts - API calls (modified)
- web/src/components/keys/GroupInfoCard.vue - UI component (modified)
- web/src/locales/zh-CN.ts - Chinese i18n (modified)
- web/src/locales/en-US.ts - English i18n (modified)
- web/src/locales/ja-JP.ts - Japanese i18n (modified)
```go
// Fetch the model list from the upstream API
func FetchModelsFromAPI(ctx context.Context, group *models.Group, apiKey string) ([]string, error)

// Aggregate models from sub-groups
func AggregateModelsFromSubGroups(subGroups []*models.Group) []string
```
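A plausible shape for the aggregation step, sketched against a simplified stand-in for models.Group (the actual implementation may differ):

```go
package main

import (
	"fmt"
	"sort"
)

// Group is a simplified stand-in for the project's models.Group.
type Group struct {
	Name   string
	Models []string
}

// aggregateModels merges the model lists of all sub-groups and
// de-duplicates entries; model names are compared case-sensitively,
// matching the routing rules above.
func aggregateModels(subGroups []*Group) []string {
	seen := make(map[string]bool)
	var out []string
	for _, g := range subGroups {
		for _, m := range g.Models {
			if !seen[m] {
				seen[m] = true
				out = append(out, m)
			}
		}
	}
	sort.Strings(out) // stable ordering for the /v1/models response
	return out
}

func main() {
	subs := []*Group{
		{"sub-a", []string{"gpt-4", "gpt-3.5-turbo"}},
		{"sub-b", []string{"claude-3-opus", "gpt-4"}},
	}
	fmt.Println(aggregateModels(subs)) // [claude-3-opus gpt-3.5-turbo gpt-4]
}
```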
```go
// Select a sub-group that supports the specified model
func SelectSubGroup(group *models.Group, requestedModel string) (string, error)

// Filter sub-groups supporting the model
func filterByModel(requestedModel string) []int

// Weighted load balancing
func selectByWeightFromCandidates(candidateIndices []int) *subGroupItem
```
```go
// Intercept /v1/models requests
if c.Request.Method == "GET" && c.Param("path") == "/v1/models" {
	if ps.handleModelsRequest(c, originalGroup) {
		return // Return the aggregated model list directly
	}
}
```
Issues and Pull Requests are welcome!
MIT License - See LICENSE file for details
Thanks to tbphp/gpt-load for providing the excellent foundation!