
GPT-Load - Model Aggregation & Intelligent Routing Extension

English | 中文 | 日本語


About This Project

This project is based on tbphp/gpt-load with added Model Aggregation & Intelligent Routing functionality for aggregate groups.

Original Project

  • Repository: https://github.com/tbphp/gpt-load
  • Description: High-performance, enterprise-grade AI API transparent proxy service
  • Core Features: Intelligent key management, load balancing, distributed deployment, request monitoring, etc.

For detailed deployment and usage instructions, please refer to the original project documentation: https://www.gpt-load.com/docs

New Features in This Fork

Model Aggregation & Intelligent Routing for Aggregate Groups

  • Automatically select sub-groups based on the requested model parameter
  • Auto-aggregate model lists from sub-groups
  • Support weighted load balancing
  • Intelligent /v1/models endpoint interception

📦 Group Import/Export

  • One-click export of group configurations (including group info, model lists, keys, sub-groups)
  • Quick import of group configurations for migration and backup
  • JSON format support for version control and sharing
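As a rough illustration only, an exported group file might look like the sketch below. The actual schema is whatever the fork emits; every field name here is a guess based on the bullet list above, not the real format:

```json
{
  "name": "ai-mix",
  "group_type": "aggregate",
  "models": ["gpt-4", "claude-3-opus"],
  "keys": ["sk-..."],
  "sub_groups": [
    { "name": "openai-a", "weight": 50 },
    { "name": "anthropic-b", "weight": 30 }
  ]
}
```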

Feature Details

1. Automatic Model Aggregation

  • Standard groups: Auto-fetch model lists from upstream /v1/models API
  • Aggregate groups: Auto-aggregate model lists from all sub-groups
  • Support both manual configuration and automatic refresh
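Conceptually, aggregating from sub-groups is a de-duplicated union of their model lists. The sketch below illustrates the idea with made-up helper names; the fork's actual implementation is `AggregateModelsFromSubGroups` in `model_collector.go` (see Technical Implementation below), which may differ in detail:

```go
package main

import (
	"fmt"
	"sort"
)

// aggregateModels unions the model lists of all sub-groups,
// de-duplicated and sorted. Illustrative sketch, not the fork's code.
func aggregateModels(subGroupModels [][]string) []string {
	seen := map[string]bool{}
	var out []string
	for _, models := range subGroupModels {
		for _, m := range models {
			if !seen[m] {
				seen[m] = true
				out = append(out, m)
			}
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	fmt.Println(aggregateModels([][]string{
		{"gpt-4", "gpt-3.5-turbo"},
		{"claude-3-opus"},
		{"gpt-4", "gemini-pro"}, // duplicate gpt-4 is kept once
	}))
	// [claude-3-opus gemini-pro gpt-3.5-turbo gpt-4]
}
```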

2. Intelligent Model Routing

Automatically route to sub-groups that support the requested model:

```
Request: {"model": "gpt-4", ...}          → routed to a sub-group supporting gpt-4
Request: {"model": "claude-3-opus", ...}  → routed to a sub-group supporting Claude
```

3. Weighted Load Balancing

When multiple sub-groups support the same model, distribute requests based on sub-group weights.
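A weight-proportional pick can be sketched as follows. `subGroup` and `pickWeighted` are illustrative names, not the fork's actual types (its internal version lives in `subgroup_manager.go`):

```go
package main

import (
	"fmt"
	"math/rand"
)

// subGroup is a hypothetical stand-in for the fork's internal sub-group record.
type subGroup struct {
	Name   string
	Weight int
}

// pickWeighted selects one sub-group with probability proportional to its
// weight. Assumes candidates is non-empty and total weight is positive.
func pickWeighted(candidates []subGroup) subGroup {
	total := 0
	for _, sg := range candidates {
		total += sg.Weight
	}
	n := rand.Intn(total) // 0 <= n < total
	for _, sg := range candidates {
		if n < sg.Weight {
			return sg
		}
		n -= sg.Weight
	}
	return candidates[len(candidates)-1] // unreachable when total > 0
}

func main() {
	groups := []subGroup{{"us-east", 60}, {"eu-west", 30}, {"ap-south", 10}}
	counts := map[string]int{}
	for i := 0; i < 10000; i++ {
		counts[pickWeighted(groups).Name]++
	}
	fmt.Println(counts) // roughly 6000 / 3000 / 1000
}
```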

4. /v1/models Endpoint Interception

When accessing an aggregate group's /v1/models, directly return the aggregated model list without forwarding to upstream.

Deployment

Docker Deployment

Using Docker Compose (Recommended)

```bash
# 1. Clone the repository
git clone https://github.com/alhza/GPT-Load.git
cd GPT-Load

# 2. Configure environment variables
cp .env.example .env
# Edit the .env file and set the necessary configuration

# 3. Build and start
docker-compose up -d

# Or use the Makefile
make docker-compose-up
```

Using Docker Commands

```bash
# 1. Build the image
docker build -t gpt-load:1.3.0 .

# Or use the Makefile
make docker-build

# 2. Run the container
docker run -d \
  -p 3001:3001 \
  -v $(pwd)/data:/app/data \
  --env-file .env \
  --name gpt-load \
  gpt-load:1.3.0

# Or use the Makefile
make docker-run
```

Makefile Commands

```bash
make docker-build          # Build Docker image
make docker-build-no-cache # Build without cache
make docker-run            # Run container
make docker-stop           # Stop container
make docker-push           # Push to registry
make docker-compose-up     # Start docker-compose
make docker-compose-down   # Stop docker-compose
```

Build from Source

For detailed build instructions, please refer to the original project documentation: https://www.gpt-load.com/docs

Aggregate Group Features

Overview

Aggregate group intelligent routing allows you to create an aggregate group containing multiple sub-groups. The system automatically selects the sub-group that supports the requested model based on the model parameter, enabling intelligent routing and load balancing.

Core Features

  • Automatic Model Aggregation: Auto-aggregate all supported model lists from sub-groups
  • Intelligent Routing: Route to sub-groups based on the requested model parameter
  • Multi-Channel Aggregation: Support cross-channel sub-groups (OpenAI/Gemini/Anthropic, etc.), automatically convert request paths based on sub-group channel type
  • Weighted Load Balancing: Distribute requests by weight among sub-groups supporting the same model
  • Model List Management: Support auto-fetching from upstream API or manual configuration
  • Transparent Proxy: /v1/models endpoint returns aggregated model list

Use Cases

Scenario 1: Multi-Vendor Model Integration

Create an aggregate group ai-mix with multiple sub-groups:

```yaml
Aggregate Group: ai-mix
  Sub-group A (weight: 50)
    Supported models: gpt-4, gpt-3.5-turbo, gpt-4-turbo
  Sub-group B (weight: 30)
    Supported models: claude-3-opus, claude-3-sonnet
  Sub-group C (weight: 20)
    Supported models: gemini-pro, gemini-pro-vision
```

Intelligent Routing Effect:

```bash
# Request GPT-4 → routed to Sub-group A
curl -X POST http://localhost:3001/proxy/ai-mix/v1/chat/completions \
  -H "Authorization: Bearer your-proxy-key" \
  -d '{"model": "gpt-4", "messages": [...]}'

# Request Claude → routed to Sub-group B
curl -X POST http://localhost:3001/proxy/ai-mix/v1/chat/completions \
  -H "Authorization: Bearer your-proxy-key" \
  -d '{"model": "claude-3-opus", "messages": [...]}'

# Request Gemini → routed to Sub-group C
curl -X POST http://localhost:3001/proxy/ai-mix/v1/chat/completions \
  -H "Authorization: Bearer your-proxy-key" \
  -d '{"model": "gemini-pro", "messages": [...]}'
```

Scenario 2: Same Model Multi-Instance Load Balancing

Create aggregate group openai-cluster with multiple OpenAI instances:

```yaml
Aggregate Group: openai-cluster
  Instance A (weight: 60) - us-east
  Instance B (weight: 30) - eu-west
  Instance C (weight: 10) - ap-south
```

All instances support the same models. The system distributes requests by weight.

Configuration Steps

1. Configure Model Lists for Sub-groups

Method 1: Auto-fetch (Recommended)

  1. Navigate to sub-group details page
  2. Click the "Refresh Models" button
  3. System auto-fetches model list from upstream /v1/models API

Method 2: Manual Configuration

Use API to manually set model list:

```bash
curl -X PUT http://localhost:3001/api/groups/{groupId}/models \
  -H "Authorization: Bearer your-auth-key" \
  -H "Content-Type: application/json" \
  -d '{"models": ["gpt-4", "gpt-3.5-turbo"]}'
```

2. Create Aggregate Group

  1. Create a new group in Web UI
  2. Select "aggregate" as the group type
  3. Add sub-groups and set weights

3. Refresh Aggregate Group Models

  1. Navigate to aggregate group details page
  2. Click the "Refresh Models" button
  3. System auto-aggregates model lists from all sub-groups

API Endpoints

Get Group Model List

```bash
GET /api/groups/{groupId}/models
```

Response example:

```json
{
  "models": ["gpt-4", "gpt-3.5-turbo", "claude-3-opus", "gemini-pro"]
}
```

Refresh Model List

```bash
POST /api/groups/{groupId}/models/refresh
```

  • Standard groups: Fetch from upstream API
  • Aggregate groups: Aggregate from all sub-groups

Manually Update Model List

```http
PUT /api/groups/{groupId}/models
Content-Type: application/json

{"models": ["gpt-4", "gpt-3.5-turbo"]}
```

/v1/models Endpoint Behavior

When accessing an aggregate group's /v1/models endpoint, the system returns the aggregated model list directly without forwarding to upstream:

```bash
curl http://localhost:3001/proxy/ai-mix/v1/models
```

Returns:

```json
{
  "object": "list",
  "data": [
    {
      "id": "gpt-4",
      "object": "model",
      "created": 1728700800,
      "owned_by": "ai-mix"
    },
    {
      "id": "claude-3-opus",
      "object": "model",
      "created": 1728700800,
      "owned_by": "ai-mix"
    }
    // ... more models
  ]
}
```

Routing Logic

  1. Extract Model Parameter: Extract model field from request body
  2. Filter Sub-groups: Find all sub-groups supporting the model
  3. Weighted Selection: Load balance by weight among filtered results
  4. Forward Request: Use selected sub-group configuration to forward request
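Steps 1 and 2 can be sketched as follows. `routeGroup` and `extractModel` are illustrative names (the fork's real routing lives in `subgroup_manager.go` and uses `filterByModel` over internal indices); the sketch only shows the model extraction and case-sensitive filtering described above:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// routeGroup is a hypothetical sub-group with its configured model list and weight.
type routeGroup struct {
	Name   string
	Weight int
	Models []string
}

// extractModel pulls the "model" field out of an OpenAI-style request body (step 1).
func extractModel(body []byte) (string, error) {
	var req struct {
		Model string `json:"model"`
	}
	if err := json.Unmarshal(body, &req); err != nil {
		return "", err
	}
	if req.Model == "" {
		return "", fmt.Errorf("missing model field")
	}
	return req.Model, nil
}

// filterByModel keeps only sub-groups whose model list contains model,
// matching exactly and case-sensitively (step 2).
func filterByModel(groups []routeGroup, model string) []routeGroup {
	var out []routeGroup
	for _, g := range groups {
		for _, m := range g.Models {
			if m == model {
				out = append(out, g)
				break
			}
		}
	}
	return out
}

func main() {
	groups := []routeGroup{
		{"openai-a", 50, []string{"gpt-4", "gpt-3.5-turbo"}},
		{"anthropic-b", 30, []string{"claude-3-opus"}},
	}
	model, _ := extractModel([]byte(`{"model": "gpt-4", "messages": []}`))
	candidates := filterByModel(groups, model)
	fmt.Println(model, len(candidates)) // gpt-4 1
	// Steps 3 (weighted selection among candidates) and 4 (forwarding)
	// would follow; a 503 is returned when candidates is empty.
}
```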

Notes

  • Ensure each sub-group has correct model list configured
  • Model names must match exactly (case-sensitive)
  • Returns 503 error if no sub-group supports the requested model
  • Sub-groups should have valid API Keys, otherwise requests cannot be forwarded even if model is supported

Technical Implementation

New Files

Backend (8 files):

  1. internal/migrations/009_add_models_to_groups.go - Database migration
  2. internal/services/model_collector.go - Model collection service
  3. internal/handler/model_handler.go - Model management API handler
  4. internal/services/subgroup_manager.go - Intelligent routing logic (modified)
  5. internal/proxy/server.go - Proxy request handling (modified)
  6. internal/handler/handler.go - Handler registration (modified)
  7. internal/container/container.go - Dependency injection (modified)
  8. internal/router/router.go - Route registration (modified)

Frontend (5 files):

  1. web/src/api/keys.ts - API calls (modified)
  2. web/src/components/keys/GroupInfoCard.vue - UI component (modified)
  3. web/src/locales/zh-CN.ts - Chinese i18n (modified)
  4. web/src/locales/en-US.ts - English i18n (modified)
  5. web/src/locales/ja-JP.ts - Japanese i18n (modified)

Core Implementation

1. Model Collection (model_collector.go)

```go
// Fetch model list from upstream API
func FetchModelsFromAPI(ctx context.Context, group *models.Group, apiKey string) ([]string, error)

// Aggregate models from sub-groups
func AggregateModelsFromSubGroups(subGroups []*models.Group) []string
```

2. Intelligent Routing (subgroup_manager.go)

```go
// Select sub-group supporting the specified model
func SelectSubGroup(group *models.Group, requestedModel string) (string, error)

// Filter sub-groups supporting the model
func filterByModel(requestedModel string) []int

// Weighted load balancing
func selectByWeightFromCandidates(candidateIndices []int) *subGroupItem
```

3. /v1/models Interception (proxy/server.go)

```go
// Intercept /v1/models requests
if c.Request.Method == "GET" && c.Param("path") == "/v1/models" {
    if ps.handleModelsRequest(c, originalGroup) {
        return // Directly return the aggregated model list
    }
}
```

Contributing

Issues and Pull Requests are welcome!

License

MIT License - See LICENSE file for details

Acknowledgments

Thanks to tbphp/gpt-load for providing the excellent foundation!
