This repo contains the libraries for writing custom job operators such as tf-operator and pytorch-operator. To write a custom operator, users need to take the following steps:
- Generate the operator skeleton using kube-builder or operator-sdk.
- Define the job CRD and reuse the common API. Check test_job for a full example.
```go
import (
	commonv1 "github.com/kubeflow/common/pkg/apis/common/v1"
)

// Reuse the commonv1 API in your types.go, e.g. as fields of your job's spec
// struct (the name TestJobSpec here is illustrative).
type TestJobSpec struct {
	RunPolicy        *commonv1.RunPolicy                       `json:"runPolicy,omitempty"`
	TestReplicaSpecs map[TestReplicaType]*commonv1.ReplicaSpec `json:"testReplicaSpecs"`
}
```
- Write a custom controller that implements the controller interface, such as TestJobController, and instantiate a testJobController object.
```go
testJobController := TestJobController{
	...
}
```
- Instantiate a JobController struct object and pass in the custom controller written in the previous step as a parameter.
```go
import "github.com/kubeflow/common/pkg/controller.v1/common"

jobController := common.JobController{
	Controller: testJobController, // the custom controller from the previous step
	Config:     v1.JobControllerConfiguration{EnableGangScheduling: false},
	Recorder:   recorder, // an event recorder created elsewhere in your operator
}
```
- Within your main reconcile loop, call the JobController.ReconcileJobs method.
```go
reconcile(...) {
	// Your main reconcile loop.
	...
	jobController.ReconcileJobs(...)
	...
}
```
Note that this repo is still under construction; API compatibility is not guaranteed at this point.
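To make the division of labor in these steps concrete, here is a self-contained toy sketch of the pattern: a generic JobController holds the shared reconcile logic and delegates job-specific work to the custom controller plugged into it. It does not import kubeflow/common. ControllerInterface, ReplicaSpec, CreatePod, and the simplified ReconcileJobs signature are hypothetical stand-ins; the real interfaces in interface.go are richer than this.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
)

// ReplicaSpec is a hypothetical, simplified stand-in for commonv1.ReplicaSpec:
// only the desired replica count is modelled.
type ReplicaSpec struct {
	Replicas int
}

// ControllerInterface is a hypothetical, cut-down stand-in for the interface a
// custom controller implements.
type ControllerInterface interface {
	// ControllerName identifies the custom controller (used in error messages here).
	ControllerName() string
	// CreatePod asks the custom controller to create one replica pod.
	CreatePod(jobName, rtype string, index int) error
}

// JobController holds the shared logic and delegates job-specific work
// to the plugged-in custom controller.
type JobController struct {
	Controller ControllerInterface
}

// ReconcileJobs (simplified sketch): for every replica type, ask the custom
// controller to create the desired number of pods.
func (jc *JobController) ReconcileJobs(jobName string, replicas map[string]*ReplicaSpec) error {
	if jc.Controller == nil {
		return errors.New("no custom controller configured")
	}
	// Visit replica types in sorted order so the result is deterministic.
	rtypes := make([]string, 0, len(replicas))
	for rtype := range replicas {
		rtypes = append(rtypes, rtype)
	}
	sort.Strings(rtypes)
	for _, rtype := range rtypes {
		spec := replicas[rtype]
		if spec == nil {
			continue // nothing requested for this replica type
		}
		for i := 0; i < spec.Replicas; i++ {
			if err := jc.Controller.CreatePod(jobName, rtype, i); err != nil {
				return fmt.Errorf("%s: creating %s-%d: %w", jc.Controller.ControllerName(), rtype, i, err)
			}
		}
	}
	return nil
}

// TestJobController is a toy custom controller that records pod names
// instead of calling a Kubernetes API server.
type TestJobController struct {
	Created []string
}

func (c *TestJobController) ControllerName() string { return "test-job-controller" }

func (c *TestJobController) CreatePod(jobName, rtype string, index int) error {
	c.Created = append(c.Created, fmt.Sprintf("%s-%s-%d", jobName, rtype, index))
	return nil
}

func main() {
	testJobController := &TestJobController{}
	jobController := JobController{Controller: testJobController}

	err := jobController.ReconcileJobs("mnist", map[string]*ReplicaSpec{
		"worker": {Replicas: 2},
		"master": {Replicas: 1},
	})
	fmt.Println(err, testJobController.Created)
	// prints: <nil> [mnist-master-0 mnist-worker-0 mnist-worker-1]
}
```

In this sketch the shared controller owns iteration over replica types and indices, while the plugged-in controller owns the side effect, which is the kind of split that lets several operators reuse one reconcile loop.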
Please refer to the API documentation.
The API files are located under pkg/apis/common/v1:
- constants.go: the constants such as label keys.
- interface.go: the interfaces to be implemented by custom controllers.
- controller.go: the main JobController that contains the ReconcileJobs API method to be invoked by users. This is the entrypoint of the JobController logic. The rest of the code under the job_controller/ folder contains the core logic for the JobController to work, such as creating and managing worker pods, services, etc.
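As a rough illustration of the bookkeeping involved in managing worker pods, the sketch below compares the desired replica count with the pods that already exist and decides which replica indices to create and which pods to delete. It assumes, hypothetically, that each pod's replica index has already been read from its labels; Pod and diffReplicas are illustrative names, not part of this repo, and the real logic also covers services and job status.

```go
package main

import (
	"fmt"
	"sort"
)

// Pod is a hypothetical, minimal view of an existing pod: its name plus the
// replica index assumed to have been parsed from its labels.
type Pod struct {
	Name  string
	Index int
}

// diffReplicas returns the replica indices that must be created and the names
// of surplus pods that should be deleted, given the desired replica count.
func diffReplicas(desired int, existing []Pod) (toCreate []int, toDelete []string) {
	seen := make(map[int]bool, len(existing))
	for _, p := range existing {
		if p.Index < 0 || p.Index >= desired || seen[p.Index] {
			// Out-of-range index, or a duplicate of an index already covered.
			toDelete = append(toDelete, p.Name)
			continue
		}
		seen[p.Index] = true
	}
	// Every index in [0, desired) without a pod needs one.
	for i := 0; i < desired; i++ {
		if !seen[i] {
			toCreate = append(toCreate, i)
		}
	}
	sort.Strings(toDelete) // deterministic order for callers and logs
	return toCreate, toDelete
}

func main() {
	existing := []Pod{
		{Name: "mnist-worker-0", Index: 0},
		{Name: "mnist-worker-3", Index: 3},
	}
	create, remove := diffReplicas(3, existing)
	fmt.Println(create, remove)
	// prints: [1 2] [mnist-worker-3]
}
```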