Arks

Arks is an end-to-end framework for managing LLM-based applications within Kubernetes clusters. It provides a robust and extensible infrastructure tailored for deploying, orchestrating, and scaling LLM inference workloads in cloud-native environments.

Key Features

Distributed Inference

Multi-node scheduling: Run inference across multiple compute nodes.
Heterogeneous computing support: Works across different hardware types (CPU, GPU, etc.).
Multi-engine compatibility: Supports vLLM, SGLang, and Dynamo.
Auto service discovery & load balancing: Dynamically register and balance application traffic.
Automatic weight adjustment: Adapt to traffic and resource demands in real-time.
Horizontal Pod Autoscaling (HPA): Autoscale applications based on workload.

Model Management

Model caching & optimization: Efficiently download and cache models to reduce cold-start latency.
Model sharing: Share models across inference nodes to save bandwidth and memory.
Accelerated loading: Leverage local cache and preloaded strategies for fast startup.

Multi-Tenant Management

Fine-grained API Token control: Issue and manage tokens with scoped permissions.
Flexible quota strategies: Enforce usage limits by total token count or pricing-based policies.
Request throttling: Support rate limiting by TPM (tokens per minute) and RPM (requests per minute) and more rate limiting stategies.

Architecture

Arks consists of the following major components:

Gateway Layer: Acts as the unified entry point for all external traffic. It handles request routing and enforces access policies.
- ArksToken: Provides fine-grained multi-tenant access control with support for:
  - API token-based authentication
  - Quota enforcement (based on token usage or pricing)
  - Rate limiting (TPM, RPM)
- ArksEndpoint: Dynamically manages routing rules and traffic distribution across different ArksApplication instances.
  - Supports dynamic weight-based routing
  - Enables automatic application discovery
  - Adjusts traffic flow in real-time based on load or policies
Workload Layer: Each ArksApplication contains one or more runtime instances. Supported runtimes include vLLM, SGLang, Dynamo. Each runtime is deployed as a Kubernetes workload and benefits from:
- Distributed inference across multiple nodes
- Support for heterogeneous computing environments
- Autoscaling via Kubernetes HPA, based on predefined SLOs
Storage Layer: Using ArksModel to manage model storage.
- Supports auto caching of models to reduce cold start time
- Enables model sharing across applications and nodes
- Designed for high-throughput model loading and reuse

More docs:

Quick Start

Prerequisites

Kubernetes cluster (v1.20+)
kubectl configured to access your cluster

Installation

git clone https://github.com/scitix/arks.git
cd arks

# Install envoy gateway, lws dependencies
kubectl create -f dist/dependency.yaml

# Install arks operator
kubectl create -f dist/operator.yaml

# Install arks gateway plugins
kubectl create -f dist/gateway.yaml

verification:

# Check all component status, should be ready
kubectl get deployment -n arks-operator-system
---
NAME                               READY   UP-TO-DATE   AVAILABLE   AGE
arks-gateway-plugins               1/1     1            1           22h
arks-operator-controller-manager   1/1     1            1           22h
arks-redis-master                  1/1     1            1           22h

# Check Envoy Gateway status
kubectl get deployment -n   envoy-gateway-system
--- 
NAME                                          READY   UP-TO-DATE   AVAILABLE   AGE
envoy-arks-operator-system-arks-eg-abcedefg   1/1     1            1           22h
envoy-gateway                                 1/1     1            1           22h

Examples

Install with:

kubectl create -f examples/quickstart/quickstart.yaml

Check resources ready:

# Check all ARKS custom resources
kubectl get arksapplication,arksendpoint,arksmodel,arksquota,arkstoken,httproute -owide
---
# REPLICAS should equals to READY, PHASE should be Running
NAME                                      PHASE     REPLICAS   READY   AGE   MODEL     RUNTIME   DRIVER
arksapplication.arks.ai/app-qwen   Running   1          1       21m   qwen-7b   sglang

NAME                                  AGE   DEFAULT WEIGHT
arksendpoint.arks.ai/qwen-7b   21m   5

# PHASE should be Ready
NAME                               AGE   MODEL                         PHASE
arksmodel.arks.ai/qwen-7b   21m   Qwen/Qwen2.5-7B-Instruct-1M   Ready

NAME                                   AGE
arksquota.arks.ai/basic-quota   21m

NAME                                     AGE
arkstoken.arks.ai/example-token   21m

NAME                                          HOSTNAMES   AGE
httproute.gateway.networking.k8s.io/qwen-7b               21m

Testing

Get the gateway IP:

# Option 1: Kubernetes cluster with LoadBalancer support
LB_IP=$(kubectl get svc -n envoy-gateway-system --selector=gateway.envoyproxy.io/owning-gateway-name=arks-eg -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
ENDPOINT="${LB_IP}:80"

# Option 2: Dev environment without LoadBalancer support. Use port forwarding way instead
ENVOY_SERVICE=$(kubectl get svc -n envoy-gateway-system --selector=gateway.envoyproxy.io/owning-gateway-name=arks-eg -o jsonpath='{.items[0].metadata.name}')
kubectl -n envoy-gateway-system port-forward service/${ENVOY_SERVICE} 8888:80 &
ENDPOINT="localhost:8888"

Curl the example app through Envoy proxy:

curl http://${ENDPOINT}/v1/chat/completions -k \
  -H "Authorization: Bearer sk-test123456" \
  -H "Content-Type: application/json" \
  -d '{"model": "qwen-7b", "messages": [{"role": "user", "content": "Hello, who are you?"}]}'

Expected response

{
  "id":"xxxxxxxxx",
  "object":"chat.completion",
  "created": 12332454,
  "model":"qwen-7b",
  "choices":[{
    "index":0,
    "message":{
      "role":"assistant",
      "content":"I'm a large language model created by Alibaba Cloud. I go by the name Qwen.",
      "reasoning_content":null,
      "tool_calls":null
    },
    "logprobs":null,
    "finish_reason":"stop",
    "matched_stop":151645
  }],
  "usage":{
    "prompt_tokens":25,
    "total_tokens":45,
    "completion_tokens":20,
    "prompt_tokens_details":null
  }
}

Clean-Up

kubectl delete -f examples/quickstart/quickstart.yaml --ignore-not-found=true
kubectl delete -f dist/gateway.yaml
kubectl delete -f dist/operator.yaml
kubectl delete -f dist/dependency.yaml

Build

It is recommended to compile ARKS using Docker. Here are the relevant commands:

make docker-build-operator
make docker-build-gateway
make docker-build-scripts

License

Arks is licensed under the Apache 2.0 License.

Community, discussion, contribution, and support

For feedback, questions, or contributions, feel free to:

Open an issue on GitHub
Submit a pull request

Name		Name	Last commit message	Last commit date
Latest commit History 134 Commits
.github		.github
api/v1		api/v1
cmd		cmd
config		config
dist		dist
dockerfiles		dockerfiles
docs		docs
examples/quickstart		examples/quickstart
hack		hack
internal/controller		internal/controller
pkg/gateway		pkg/gateway
scripts		scripts
test		test
.dockerignore		.dockerignore
.gitignore		.gitignore
LICENSE		LICENSE
Makefile		Makefile
PROJECT		PROJECT
README-zh.md		README-zh.md
README.md		README.md
go.mod		go.mod
go.sum		go.sum

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

Arks

Key Features

Distributed Inference

Model Management

Multi-Tenant Management

Architecture

Quick Start

Prerequisites

Installation

Examples

Testing

Clean-Up

Build

License

Community, discussion, contribution, and support

About

Uh oh!

Releases 9

Packages

Uh oh!

Contributors 5

Uh oh!

Languages

License

scitix/arks

Folders and files

Latest commit

History

Repository files navigation

Arks

Key Features

Distributed Inference

Model Management

Multi-Tenant Management

Architecture

Quick Start

Prerequisites

Installation

Examples

Testing

Clean-Up

Build

License

Community, discussion, contribution, and support

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases 9

Packages 0

Uh oh!

Contributors 5

Uh oh!

Languages

Packages