
Commit 57b0d1b

Merge pull request #39 from Nimbus318/docs/update-readme-with-chinese-version
docs: Add Chinese README and improve documentation
2 parents a3fd698 + cb3e07d commit 57b0d1b

File tree: 2 files changed, +144 −17 lines changed

README.md

Lines changed: 40 additions & 17 deletions
# HAMi-core —— Hook library for CUDA Environments

English | [中文](README_CN.md)

## Introduction

HAMi-core is the in-container GPU resource controller; it has been adopted by [HAMi](https://github.com/Project-HAMi/HAMi) and [volcano](https://github.com/volcano-sh/devices).

<img src="./docs/images/hami-arch.png" width = "600" />

## Features

HAMi-core has the following features:
1. Virtualize device memory
2. Limit device utilization by self-implemented time slicing (see the toy sketch below)
3. Real-time device utilization monitoring

![image](docs/images/sample_nvidia-smi.png)

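A note on mechanism for feature 2: time-slice limiting works by accounting for device busy time within a fixed window and delaying further launches once the configured share is exhausted. The sketch below is a toy model of that windowed rate-limiting idea only, not HAMi-core's actual scheduler; every name and constant in it is illustrative.

```c
#include <unistd.h>

/* Toy model of windowed rate limiting (illustrative only). */
static double used_ms = 0;              /* estimated busy time in the window */
static const double window_ms = 100.0;  /* accounting window length */

/* Hypothetical hook called before forwarding a kernel launch;
   share is e.g. 0.5 when CUDA_DEVICE_SM_LIMIT=50. */
static void throttle(double share, double est_kernel_ms) {
    if (used_ms + est_kernel_ms > window_ms * share) {
        /* Over quota for this window: sleep until the next one. */
        usleep((useconds_t)(window_ms * 1000));
        used_ms = 0;
    }
    used_ms += est_kernel_ms;
}
```
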
## Design

HAMi-core operates by hijacking the API calls between CUDA-Runtime (libcudart.so) and CUDA-Driver (libcuda.so), as shown in the figure below:

<img src="./docs/images/hami-core-position.png" width = "400" />
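
To make the hijacking concrete, here is a minimal LD_PRELOAD interposition sketch in C. It is illustrative, not HAMi-core's actual code: cuMemAlloc_v2 is a real driver symbol, but the quota check here is a placeholder, and newer CUDA runtimes resolve driver entry points through cuGetProcAddress, which a production hook such as HAMi-core must intercept as well.

```c
#define _GNU_SOURCE
#include <dlfcn.h>
#include <stdio.h>
#include <cuda.h>

/* Wrapper exported under the same name as the driver-API symbol. */
CUresult cuMemAlloc_v2(CUdeviceptr *dptr, size_t bytesize) {
    static CUresult (*real_alloc)(CUdeviceptr *, size_t) = NULL;
    if (real_alloc == NULL)
        real_alloc = (CUresult (*)(CUdeviceptr *, size_t))
            dlsym(RTLD_NEXT, "cuMemAlloc_v2");
    /* A real limiter would consult a shared usage counter here and
       return CUDA_ERROR_OUT_OF_MEMORY when the quota is exceeded. */
    fprintf(stderr, "[hook] cuMemAlloc_v2: %zu bytes\n", bytesize);
    return real_alloc(dptr, bytesize);
}
```

Built with `gcc -shared -fPIC hook.c -o hook.so -ldl` and loaded via LD_PRELOAD, such a wrapper sits in front of libcuda.so exactly where the figure places libvgpu.so.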

## Build in Docker

```bash
make build-in-docker
```

## Usage

_CUDA_DEVICE_MEMORY_LIMIT_ indicates the upper limit of device memory (eg 1g, 1024m, 1048576k, 1073741824)

_CUDA_DEVICE_SM_LIMIT_ indicates the SM utilization percentage of each device

```bash
# Add 1GiB memory limit and set max SM utilization to 50% for all devices
export LD_PRELOAD=./libvgpu.so
export CUDA_DEVICE_MEMORY_LIMIT=1g
export CUDA_DEVICE_SM_LIMIT=50
```

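For illustration, a value such as 1g, 1024m, or 1048576k could be mapped to bytes as below; parse_size is a hypothetical helper, and HAMi-core's actual parser may differ.

```c
#include <ctype.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical parser for limit strings like "1g", "1024m", "1048576k". */
static uint64_t parse_size(const char *s) {
    char *end;
    uint64_t value = strtoull(s, &end, 10);
    switch (tolower((unsigned char)*end)) {
        case 'g': return value << 30;  /* GiB */
        case 'm': return value << 20;  /* MiB */
        case 'k': return value << 10;  /* KiB */
        default:  return value;        /* plain bytes, e.g. 1073741824 */
    }
}
```

All four example spellings above resolve to the same 1GiB figure: parse_size("1g"), parse_size("1024m"), and parse_size("1048576k") each return 1073741824.
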
## Docker Images

```bash
# Build docker image
docker build . -f=dockerfiles/Dockerfile -t cuda_vmem:tf1.8-cu90

# Configure GPU device and library mounts for container
export DEVICE_MOUNTS="--device /dev/nvidia0:/dev/nvidia0 --device /dev/nvidia-uvm:/dev/nvidia-uvm --device /dev/nvidiactl:/dev/nvidiactl"
export LIBRARY_MOUNTS="-v /usr/cuda_files:/usr/cuda_files -v $(which nvidia-smi):/bin/nvidia-smi"

# Run container and check nvidia-smi output
docker run ${LIBRARY_MOUNTS} ${DEVICE_MOUNTS} -it \
    -e CUDA_DEVICE_MEMORY_LIMIT=2g \
    -e LD_PRELOAD=/libvgpu/build/libvgpu.so \
    cuda_vmem:tf1.8-cu90 \
    nvidia-smi
```

After running, you will see nvidia-smi output similar to the following, showing memory limited to 2GiB:

```
...
[HAMI-core Msg(1:140235494377280:libvgpu.c:836)]: Initializing.....
Mon Dec  2 04:38:12 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.107.02             Driver Version: 550.107.02     CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 3060        Off |   00000000:03:00.0 Off |                  N/A |
| 30%   36C    P8              7W /  170W |       0MiB /   2048MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
+-----------------------------------------------------------------------------------------+
[HAMI-core Msg(1:140235494377280:multiprocess_memory_limit.c:497)]: Calling exit handler 1
```

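The limit can also be checked from code rather than nvidia-smi: under the preload, the totals reported by the CUDA runtime should reflect the configured cap. The following is a hypothetical check, not part of HAMi-core's test suite; compile it with nvcc and run it with LD_PRELOAD and CUDA_DEVICE_MEMORY_LIMIT set.

```c
#include <stdio.h>
#include <cuda_runtime.h>

int main(void) {
    size_t free_bytes = 0, total_bytes = 0;
    /* With CUDA_DEVICE_MEMORY_LIMIT=2g, total should report ~2048 MiB. */
    if (cudaMemGetInfo(&free_bytes, &total_bytes) != cudaSuccess) {
        fprintf(stderr, "cudaMemGetInfo failed\n");
        return 1;
    }
    printf("free: %zu MiB, total: %zu MiB\n",
           free_bytes >> 20, total_bytes >> 20);
    return 0;
}
```
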
## Log

Use environment variable LIBCUDA_LOG_LEVEL to set the visibility of logs

| LIBCUDA_LOG_LEVEL | Description |
| ----------------- | ----------- |
| 0 | errors only |
| 1 (default), 2 | errors, warnings, messages |
| 3 | infos, errors, warnings, messages |
| 4 | debugs, errors, warnings, messages |

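Level gating of this kind is typically implemented with a cached environment lookup plus thin macros. The sketch below is hypothetical; HAMi-core's real logging helpers may be named and structured differently.

```c
#include <stdio.h>
#include <stdlib.h>

/* Read LIBCUDA_LOG_LEVEL once and cache it (documented default: 1). */
static int log_level(void) {
    static int lvl = -1;
    if (lvl < 0) {
        const char *env = getenv("LIBCUDA_LOG_LEVEL");
        lvl = env ? atoi(env) : 1;
    }
    return lvl;
}

/* Warnings show from level 1 up, infos from level 3 up (per the table). */
#define LOG_WARN(fmt, ...) \
    do { if (log_level() >= 1) \
        fprintf(stderr, "[HAMI-core Warn] " fmt "\n", ##__VA_ARGS__); } while (0)
#define LOG_INFO(fmt, ...) \
    do { if (log_level() >= 3) \
        fprintf(stderr, "[HAMI-core Info] " fmt "\n", ##__VA_ARGS__); } while (0)
```
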
## Test Raw APIs

```bash
./test/test_alloc
```

README_CN.md

Lines changed: 104 additions & 0 deletions
# HAMi-core —— Hook library for CUDA environments

[English](README.md) | 中文

## Introduction

HAMi-core is an in-container GPU resource controller; it has been adopted by projects such as [HAMi](https://github.com/Project-HAMi/HAMi) and [volcano](https://github.com/volcano-sh/devices).

<img src="./docs/images/hami-arch.png" width = "600" />

## Features

HAMi-core has the following features:
1. GPU memory virtualization
2. Device utilization limiting via self-implemented time slicing
3. Real-time device utilization monitoring

![image](docs/images/sample_nvidia-smi.png)

## Design

HAMi-core works by hijacking the API calls between CUDA-Runtime (libcudart.so) and CUDA-Driver (libcuda.so), as shown in the figure below:

<img src="./docs/images/hami-core-position.png" width = "400" />

## Build in Docker

```bash
make build-in-docker
```

## Usage

_CUDA_DEVICE_MEMORY_LIMIT_ specifies the upper limit of device memory (e.g. 1g, 1024m, 1048576k, 1073741824)

_CUDA_DEVICE_SM_LIMIT_ specifies the SM utilization percentage of each device

```bash
# Add a 1GiB memory limit and set max SM utilization to 50% for the mounted devices
export LD_PRELOAD=./libvgpu.so
export CUDA_DEVICE_MEMORY_LIMIT=1g
export CUDA_DEVICE_SM_LIMIT=50
```

## Docker Images

```bash
# Build the Docker image
docker build . -f=dockerfiles/Dockerfile -t cuda_vmem:tf1.8-cu90

# Configure GPU device and library mounts for the container
export DEVICE_MOUNTS="--device /dev/nvidia0:/dev/nvidia0 --device /dev/nvidia-uvm:/dev/nvidia-uvm --device /dev/nvidiactl:/dev/nvidiactl"
export LIBRARY_MOUNTS="-v /usr/cuda_files:/usr/cuda_files -v $(which nvidia-smi):/bin/nvidia-smi"

# Run the container and check the nvidia-smi output
docker run ${LIBRARY_MOUNTS} ${DEVICE_MOUNTS} -it \
    -e CUDA_DEVICE_MEMORY_LIMIT=2g \
    -e LD_PRELOAD=/libvgpu/build/libvgpu.so \
    cuda_vmem:tf1.8-cu90 \
    nvidia-smi
```

After running, you will see nvidia-smi output similar to the following, with memory limited to 2GiB:

```
...
[HAMI-core Msg(1:140235494377280:libvgpu.c:836)]: Initializing.....
Mon Dec  2 04:38:12 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.107.02             Driver Version: 550.107.02     CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 3060        Off |   00000000:03:00.0 Off |                  N/A |
| 30%   36C    P8              7W /  170W |       0MiB /   2048MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
+-----------------------------------------------------------------------------------------+
[HAMI-core Msg(1:140235494377280:multiprocess_memory_limit.c:497)]: Calling exit handler 1
```

## Log Levels

Use the environment variable LIBCUDA_LOG_LEVEL to set log visibility

| LIBCUDA_LOG_LEVEL | Description |
| ----------------- | ----------- |
| 0 | errors only |
| 1 (default), 2 | errors, warnings, and messages |
| 3 | infos, errors, warnings, and messages |
| 4 | debugs, errors, warnings, and messages |

## Test Raw APIs

```bash
./test/test_alloc
```
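
test_alloc's source is not shown in this diff; as a rough idea of what a raw driver-API allocation test looks like, here is a hypothetical sketch that allocates until the virtualized limit is hit (link with -lcuda):

```c
#include <stdio.h>
#include <cuda.h>

int main(void) {
    CUdevice dev;
    CUcontext ctx;
    CUdeviceptr ptr;
    const size_t chunk = 256 << 20;  /* 256 MiB per allocation */
    int n = 0;

    cuInit(0);
    cuDeviceGet(&dev, 0);
    cuCtxCreate(&ctx, 0, dev);

    /* Keep allocating until the (virtualized) memory limit is reached. */
    while (cuMemAlloc(&ptr, chunk) == CUDA_SUCCESS)
        n++;
    printf("allocated %d x 256 MiB before hitting the limit\n", n);

    cuCtxDestroy(ctx);
    return 0;
}
```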
