Commit 931b610

gabcoyne, baxen, and damienrj authored
Add VertexAgent for flow runs on GCP vertex (PrefectHQ#4989)
- VertexAgent class and tests
- VertexRun for configuring job spec such as machine type
- Documentation

Co-authored-by: Bradley Axen <[email protected]>
Co-authored-by: Damien Ramunno-Johnson <[email protected]>
1 parent 0ed6012 commit 931b610

15 files changed, +911 -2 lines changed

changes/pr4989.yml

Lines changed: 6 additions & 0 deletions
@@ -0,0 +1,6 @@
+enhancement:
+  - "Add Vertex Agent and RunConfig - [#4989](https://github.com/PrefectHQ/prefect/pull/4989)"
+
+contributor:
+  - "[Bradley Axen](https://github.com/baxen)"
+  - "[Damien Ramunno-Johnson](https://github.com/damienrj)"

docs/.vuepress/config.js

Lines changed: 1 addition & 0 deletions
@@ -240,6 +240,7 @@ module.exports = {
   'agents/local',
   'agents/docker',
   'agents/kubernetes',
+  'agents/vertex',
   'agents/ecs',
   'agents/fargate'
 ]

docs/orchestration/agents/overview.md

Lines changed: 3 additions & 0 deletions
@@ -29,6 +29,9 @@ platforms.
 - **Kubernetes**: The [Kubernetes Agent](./kubernetes.md) executes flow runs as
   [Kubernetes Jobs](https://kubernetes.io/docs/concepts/workloads/controllers/job/).
 
+- **GCP Vertex**: The [Vertex Agent](./vertex.md) executes flow runs as
+  [Vertex Custom Jobs](https://cloud.google.com/vertex-ai/docs/training/create-custom-job).
+
 - **AWS ECS**: The [ECS Agent](./ecs.md) executes flow runs as [AWS ECS
   tasks](https://aws.amazon.com/ecs/) (on either ECS or Fargate).

Lines changed: 106 additions & 0 deletions
@@ -0,0 +1,106 @@
+# Vertex Agent
+
+The Vertex Agent executes flow runs as [Vertex Custom Jobs](https://cloud.google.com/vertex-ai/docs/training/create-custom-job).
+Vertex describes these as "training" jobs, but they can be used to run any kind of flow.
+
+## Requirements
+
+The required dependencies for the Vertex Agent aren't [installed by
+default](/core/getting_started/installation.md). If you're a `pip` user you'll
+need to add the `gcp` extra. Likewise, with `conda` you'll need to install
+`google-cloud-aiplatform`:
+
+:::: tabs
+::: tab Pip
+
+```bash
+pip install prefect[gcp]
+```
+
+:::
+::: tab Conda
+
+```bash
+conda install -c conda-forge prefect google-cloud-aiplatform
+```
+
+:::
+::::
+
+::: warning Prefect Server
+In order to use this agent with Prefect Server the server's GraphQL API
+endpoint must be accessible. This _may_ require changes to your Prefect Server
+deployment and/or [configuring the Prefect API
+address](./overview.md#prefect-api-address) on the agent.
+:::
+
+## Flow Configuration
+
+The Vertex Agent will deploy flows using either a
+[UniversalRun](/orchestration/flow_config/run_configs.md#universalrun) (the
+default) or [VertexRun](/orchestration/flow_config/run_configs.md#vertexrun)
+`run_config`. Using a `VertexRun` object lets you customize the deployment
+environment for a flow (exposing `env`, `image`, `machine_type`, etc.):
+
+```python
+from prefect.run_configs import VertexRun
+
+# Configure extra environment variables for this flow,
+# and set a custom image and machine type
+flow.run_config = VertexRun(
+    env={"SOME_VAR": "VALUE"},
+    image="my-custom-image",
+    machine_type="e2-highmem-16",
+)
+```
+
+See the [VertexRun](/orchestration/flow_config/run_configs.md#vertexrun)
+documentation for more information.
+
+## Agent Configuration
+
+The Vertex agent can be started from the Prefect CLI as:
+
+```bash
+prefect agent vertex start
+```
+
+::: tip API Keys <Badge text="Cloud"/>
+When using Prefect Cloud, this will require a service account API key; see
+[here](./overview.md#api_keys) for more information.
+:::
+
+Below we cover a few common configuration options; see the [CLI
+docs](/api/latest/cli/agent.md#vertex-start) for a full list of options.
+
+### Project
+
+By default the agent will deploy flow run tasks into the current project (as defined by [google.auth.default](https://google-auth.readthedocs.io/en/latest/reference/google.auth.html)).
+You can specify a different project using the `--project` option:
+
+```bash
+prefect agent vertex start --project my-project
+```
+
+This can be a different project than the agent is running in, as long as the account has permissions
+to start Vertex Custom Jobs in the specified project.
+
+### Region
+
+Vertex requires a region in which to run the flow, and will default to `us-central1`.
+You can specify a different region using the `--region-name` option:
+
+```bash
+prefect agent vertex start --region-name us-east1
+```
+
+### Service Account
+
+Vertex jobs can run as a specified service account. Vertex provides a default, but specifying a specific
+account can give you more control over what resources the flow runs are allowed to access.
+You can specify a non-default account using the `--service-account` option:
+
+```bash
+prefect agent vertex start --service-account [email protected]
+```
+
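
The `--project`, `--region-name`, and `--service-account` options documented in this file can be combined on one command line. Below is a minimal sketch of that composition, assuming a hypothetical helper named `build_vertex_agent_command` that is not part of Prefect. It only assembles the argument list and leaves out unset options, so the agent falls back to the defaults described above.

```python
import shlex
from typing import List, Optional


def build_vertex_agent_command(
    project: Optional[str] = None,
    region_name: Optional[str] = None,
    service_account: Optional[str] = None,
) -> List[str]:
    """Assemble the `prefect agent vertex start` argument list.

    Hypothetical helper for illustration only; it is not part of Prefect.
    Options left as None are omitted so the agent applies its own defaults.
    """
    command = ["prefect", "agent", "vertex", "start"]
    options = [
        ("--project", project),
        ("--region-name", region_name),
        ("--service-account", service_account),
    ]
    for flag, value in options:
        if value is not None:
            command.extend([flag, value])
    return command


if __name__ == "__main__":
    args = build_vertex_agent_command(project="my-project", region_name="us-east1")
    # Print a shell-safe rendering of the command instead of running it.
    print(shlex.join(args))
```

The returned list can be handed to `subprocess.run(args, check=True)` if the agent should be launched from Python rather than from a shell.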

docs/orchestration/flow_config/run_configs.md

Lines changed: 68 additions & 0 deletions
@@ -237,3 +237,71 @@ for this flow, stored in S3:
 ```python
 flow.run_config = ECSRun(task_definition_path="s3://bucket/path/to/definition.yaml")
 ```
+
+### VertexRun
+
+[VertexRun](/api/latest/run_configs.md#vertexrun) configures flow runs
+deployed as Vertex CustomJobs with a VertexAgent.
+
+#### Examples
+
+Use the defaults set on the agent:
+
+```python
+from prefect.run_configs import VertexRun
+
+flow.run_config = VertexRun()
+```
+
+Set an environment variable in the flow run container:
+
+```python
+flow.run_config = VertexRun(env={"SOME_VAR": "value"})
+```
+
+Specify an [image](./docker.md) to use, if not using `Docker` storage. If you're using `Docker` storage, then
+that image will be used. If you do not use `Docker` storage or provide an image in the run config, then the run
+will use the default `prefect` image.
+
+```python
+flow.run_config = VertexRun(image="example/image-name:with-tag")
+```
+
+Specify the machine type for this flow:
+
+```python
+flow.run_config = VertexRun(machine_type='e2-highmem-16')
+```
+
+Set a specific service account or network ID for this flow run:
+```python
+flow.run_config = VertexRun(service_account='[email protected]', network="my-network")
+```
+
+Use the [scheduling](https://cloud.google.com/vertex-ai/docs/reference/rest/v1/CustomJobSpec#Scheduling) option to set a timeout for the CustomJob:
+```python
+flow.run_config = VertexRun(scheduling={'timeout': '3600s'})
+```
+
+
+Customize the full [worker pool specs](https://cloud.google.com/vertex-ai/docs/reference/rest/v1/CustomJobSpec#workerpoolspec),
+which can be used for more advanced setups:
+
+```python
+worker_pool_specs = [
+    {"machine_spec": {"machine_type": "e2-standard-4"}, "replica_count": 1},
+    {
+        "machine_spec": {"machine_type": "e2-highmem-16"},
+        "replica_count": 3,
+        "container_spec": {"image": "my-image"},
+    },
+]
+
+flow.run_config = VertexRun(worker_pool_specs=worker_pool_specs)
+```
+
+::: warning Container Spec
+Prefect will always control the container spec on the 0th entry in the worker pool spec,
+which is the pool that is reserved to run the flow. You will need to provide a container
+spec for any other worker pool specs.
+:::
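
The Container Spec warning above implies two rules: entry 0 of `worker_pool_specs` has its container spec set by Prefect, and every other entry must supply its own. Below is a minimal sketch of those rules, assuming a hypothetical helper named `prepare_worker_pool_specs` and a placeholder flow container spec. It illustrates the documented behavior and is not Prefect's actual implementation. The keys inside each container spec mirror the example above and are treated as opaque.

```python
import copy
from typing import Any, Dict, List


def prepare_worker_pool_specs(
    worker_pool_specs: List[Dict[str, Any]],
    flow_container_spec: Dict[str, Any],
) -> List[Dict[str, Any]]:
    """Return a copy of the specs in which entry 0 runs the flow container.

    Hypothetical helper for illustration; not Prefect's implementation.
    """
    if not worker_pool_specs:
        raise ValueError("at least one worker pool spec is required")

    # Work on a deep copy so the caller's list is left untouched.
    specs = copy.deepcopy(worker_pool_specs)

    # Entry 0 is reserved for the flow run, so any user-provided
    # container spec on it is replaced.
    specs[0]["container_spec"] = copy.deepcopy(flow_container_spec)

    # Every additional pool must state what it runs.
    for index, spec in enumerate(specs[1:], start=1):
        if "container_spec" not in spec:
            raise ValueError(f"worker pool spec {index} is missing a container_spec")
    return specs


worker_pool_specs = [
    {"machine_spec": {"machine_type": "e2-standard-4"}, "replica_count": 1},
    {
        "machine_spec": {"machine_type": "e2-highmem-16"},
        "replica_count": 3,
        "container_spec": {"image": "my-image"},
    },
]

# Placeholder value: in a real run the flow's image comes from its storage
# or run config, not from a literal like this one.
prepared = prepare_worker_pool_specs(
    worker_pool_specs,
    flow_container_spec={"image": "example/image-name:with-tag"},
)
```

Validating the extra pools up front surfaces a missing container spec before any job is submitted.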

docs/outline.toml

Lines changed: 6 additions & 1 deletion
@@ -267,7 +267,7 @@ classes = {Executor=[], LocalExecutor=[], LocalDaskExecutor=[], DaskExecutor=[]}
 [pages.run_configs]
 title = "Run Configuration"
 module = "prefect.run_configs"
-classes = ["RunConfig", "UniversalRun", "LocalRun", "DockerRun", "KubernetesRun", "ECSRun"]
+classes = ["RunConfig", "UniversalRun", "LocalRun", "DockerRun", "KubernetesRun", "ECSRun", "VertexRun"]
 
 [pages.storage]
 title = "Storage"
@@ -637,6 +637,11 @@ title = "ECS Agent"
 module = "prefect.agent.ecs"
 classes = {ECSAgent = ["start"]}
 
+[pages.agent.vertex]
+title = "Vertex Agent"
+module = "prefect.agent.vertex"
+classes = {VertexAgent = ["start"]}
+
 [pages.artifacts.artifacts]
 title = "Artifacts"
 module = "prefect.artifacts"

setup.py

Lines changed: 2 additions & 0 deletions
@@ -35,6 +35,8 @@ def run(self):
     "gcp": [
         "google-cloud-secret-manager >= 2.4.0",
         "google-cloud-storage >= 1.13, < 2.0",
+        "google-cloud-aiplatform >= 1.4.0, < 2.0",
+        "google-auth >= 2.0, < 3.0",
     ],
     "git": ["dulwich >= 0.19.7"],
     "github": ["PyGithub >= 1.51, < 2.0"],

src/prefect/agent/__init__.py

Lines changed: 1 addition & 0 deletions
@@ -7,3 +7,4 @@
 import prefect.agent.kubernetes
 import prefect.agent.local
 import prefect.agent.ecs
+import prefect.agent.vertex
Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+from prefect.agent.vertex.agent import VertexAgent
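
This commit adds `google-cloud-aiplatform` and `google-auth` to the optional `gcp` extra in `setup.py`, so an environment installed without `prefect[gcp]` will lack them. Below is a minimal sketch for checking this before starting a Vertex agent, assuming hypothetical helpers named `installed_versions` and `missing_distributions` that are not part of Prefect. It uses the standard library's `importlib.metadata` (Python 3.8+) and reports presence only; it does not evaluate the version bounds.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Iterable, List, Optional

# Distribution names taken from the `gcp` extra added in this commit.
VERTEX_REQUIREMENTS = ("google-cloud-aiplatform", "google-auth")


def installed_versions(names: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if absent."""
    found: Dict[str, Optional[str]] = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


def missing_distributions(names: Iterable[str] = VERTEX_REQUIREMENTS) -> List[str]:
    """Return the names that are not installed (hypothetical helper)."""
    return [name for name, ver in installed_versions(names).items() if ver is None]


if __name__ == "__main__":
    missing = missing_distributions()
    if missing:
        print("Missing for the Vertex agent:", ", ".join(missing))
    else:
        print("All Vertex agent dependencies are installed.")
```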
