Request Type
Feature Request
Problem Description
Cortex can run analyzers and responders (collectively, neurons, if I'm using the term properly) as subprocesses (ProcessJobRunnerSrv) or using Docker (DockerJobRunnerSrv). When processes are used, all the neuron code ends up in the same filesystem, process, and network namespace as Cortex itself. When Docker is used, both Cortex itself and each neuron's code can run in their own containers. This is a maintainability triumph.
But in order to do this, Cortex's container has to have access to the Docker socket, and it has to share a directory with the neuron containers it runs. (With the DockerJobRunnerSrv, this filesystem sharing happens via access to a directory in the host OS's filesystem.) Granting access to the Docker socket is not a security best practice, because it is equivalent to local root; it also hinders scalability and flexibility in running neurons, and it ties Cortex specifically to Docker rather than to any software that can run containers.
Kubernetes
Kubernetes offers APIs for running containers which are scalable to multi-node clusters, securable, and supported by multiple implementations. Some live natively in public clouds (EKS, AKS, GKE, etc.), some are trivial single-node clusters for development (minikube, KIND), and some are lightweight but production-grade (k3s, MicroK8s). Kubernetes has extension points for storage, networking, and container running, so these functions can be provided by plugins.
The net effect is that while there is a dizzying array of choices about how to set up a Kubernetes cluster, applications consuming the Kubernetes APIs don't need to make those choices, only to signify what they need. The cluster will make the right thing appear, subject to the choices of its operators. And the people using the cluster need not be the same people as the operators: public clouds support quick deployments with but few questions, I've heard.
Jobs
One of the patterns Kubernetes supports for using containers to get work done is the Job. (I'll capitalize it here to avoid confusion with Cortex jobs.) You create a Job with a Pod spec (which in turn specifies some volumes and some containers), and the Job will create and run the Pod, retrying until it succeeds, subject to limits on time, number of tries, rate limits, and the like. Upon succeeding it remains until deleted, or until a configured timeout expires, etc.
Running a Job with the Kubernetes API would be not unlike running a container with the Docker API, except that it would be done using a different client, and filesystem sharing would be accomplished in a different way.
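To make that concrete: assuming the fabric8 client were chosen, building and submitting such a Job might look roughly like the sketch below. The object name, Job name, image, and limits are placeholders of mine, and the exact builder and DSL calls differ a little between client versions.

```scala
import io.fabric8.kubernetes.api.model.batch.v1.JobBuilder
import io.fabric8.kubernetes.client.DefaultKubernetesClient

// Sketch only: build a Job whose Pod runs one container, with retry and
// clean-up limits, then submit it to the cluster.
object JobSketch extends App {
  val client = new DefaultKubernetesClient() // kubeconfig or in-cluster config

  val job = new JobBuilder()
    .withNewMetadata()
      .withName("cortex-neuron-example")        // placeholder name
    .endMetadata()
    .withNewSpec()
      .withBackoffLimit(3)                      // retry at most 3 times
      .withActiveDeadlineSeconds(600L)          // give up after 10 minutes
      .withTtlSecondsAfterFinished(3600)        // garbage-collect an hour after it finishes
      .withNewTemplate()
        .withNewSpec()
          .withRestartPolicy("Never")
          .addNewContainer()
            .withName("neuron")
            .withImage("example/neuron:latest") // placeholder analyzer/responder image
          .endContainer()
        .endSpec()
      .endTemplate()
    .endSpec()
    .build()

  client.batch().v1().jobs().inNamespace("cortex").createOrReplace(job)
  client.close()
}
```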
Sharing files with a job
With the Docker job runner, a directory on the host is specified; it's mounted into the Cortex container; and when Cortex creates a neuron container, it's mounted into that neuron container too. This implicitly assumes that the Cortex container and the neuron container run on the same host and that they can access the same filesystem at the same time, and it relies on that host directory for persistent storage. None of these assumptions necessarily holds under Kubernetes.
Under Kubernetes, a PersistentVolumeClaim can be created, which backs a volume that can be mounted into a container (as signified in the spec for the container). That claim can have ReadWriteMany as its accessModes setting, which signifies to the cluster a requirement that multiple nodes should be able to read and write files in the PersistentVolume which the cluster provides to satisfy the claim. On a trivial, single-node cluster, the persistent volume can be a hostPath: the same way everything happens now with Docker, but more complicated. But different clusters can provide other kinds of volumes to satisfy such a claim: self-hosted clusters may use Longhorn or Rook to provide redundant, fault-tolerant storage; or public clouds may provide volumes of other types they have devised themselves (Elastic Filesystem, Azure Shared Disks, etc). The PersistentVolumeClaim doesn't care.
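Expressed with the fabric8 builders (again assuming that client), such a claim would look roughly like this. The claim name, size, and storage class are placeholders, and in practice the claim would probably be created by whatever deploys Cortex rather than by Cortex itself.

```scala
import io.fabric8.kubernetes.api.model.{PersistentVolumeClaimBuilder, Quantity}

object JobStorage {
  // Sketch only: a ReadWriteMany claim for the shared job base directory.
  val jobsClaim = new PersistentVolumeClaimBuilder()
    .withNewMetadata()
      .withName("cortex-jobs")              // placeholder claim name
    .endMetadata()
    .withNewSpec()
      .withAccessModes("ReadWriteMany")     // several nodes may read and write at once
      .withStorageClassName("longhorn")     // placeholder; whatever the cluster provides
      .withNewResources()
        .addToRequests("storage", new Quantity("10Gi"))
      .endResources()
    .endSpec()
    .build()
}
```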
So the Cortex container is created with a volume, backed by a ReadWriteMany PersistentVolumeClaim, mounted as its job base directory. When Cortex runs a neuron, it creates a directory for the job and then creates a Job whose Pod mounts job-specific subPaths of that same PersistentVolumeClaim as the /job/input (readOnly: true) and /job/output directories. How the files actually get shared is up to the cluster. The Job can see and use only the input and output directories belonging to the Cortex job it serves. When the Job finishes, the output files are visible in the Cortex container under the job base directory, as with the other job-running methods.
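The volume wiring for a single Cortex job might then look roughly like the sketch below (still assuming fabric8). The function and resource names are hypothetical; only the /job/input and /job/output paths, the readOnly flag, and the job-specific subPaths come from the description above.

```scala
import io.fabric8.kubernetes.api.model.batch.v1.{Job, JobBuilder}

object NeuronJobs {
  // Sketch only: one Job per Cortex job, mounting job-specific subPaths of the
  // shared claim as /job/input (read-only) and /job/output.
  def neuronJob(jobId: String, image: String, claimName: String): Job =
    new JobBuilder()
      .withNewMetadata()
        .withName(s"neuron-$jobId")
      .endMetadata()
      .withNewSpec()
        .withBackoffLimit(1)
        .withNewTemplate()
          .withNewSpec()
            .withRestartPolicy("Never")
            .addNewVolume()
              .withName("job-directory")
              .withNewPersistentVolumeClaim()
                .withClaimName(claimName)            // the ReadWriteMany claim above
              .endPersistentVolumeClaim()
            .endVolume()
            .addNewContainer()
              .withName("neuron")
              .withImage(image)
              .addNewVolumeMount()
                .withName("job-directory")
                .withMountPath("/job/input")
                .withSubPath(s"$jobId/input")        // only this job's input...
                .withReadOnly(true)
              .endVolumeMount()
              .addNewVolumeMount()
                .withName("job-directory")
                .withMountPath("/job/output")
                .withSubPath(s"$jobId/output")       // ...and output are visible
              .endVolumeMount()
            .endContainer()
          .endSpec()
        .endTemplate()
      .endSpec()
      .build()
}
```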
How to implement
- Choose a Kubernetes client.
  - skuber exists specifically for Scala, but appears to have been last updated in September 2020, and is not automatically generated from the Kubernetes API definition, so it takes manual work to make updates.
  - io.fabric8:kubernetes-client is generally for Java, and it's automatically generated, with the last update this month.
- Write a KubernetesJobRunnerSrv, which takes information about a persistent volume claim passed in, and uses it to create a Kubernetes Job for a Cortex job, hewing closely to the DockerJobRunnerSrv (see the sketch after this list).
- Follow dependencies and add code as necessary until Cortex can be configured to run jobs this way, and can run the jobs.
- Document several use cases.
  - The simplest way to run everything on a single machine.
  - A simple self-hosted setup.
- Write a Helm chart or Operator, which will make Cortex deployment quick and easy, given a cluster.
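As a rough idea of the shape such a service could take, here is a hypothetical skeleton only, assuming the fabric8 client and reusing the neuronJob builder sketched above. It is not the actual DockerJobRunnerSrv interface, and the namespace and claim name would come from Cortex configuration rather than the defaults shown here.

```scala
import java.util.concurrent.TimeUnit

import io.fabric8.kubernetes.client.{DefaultKubernetesClient, KubernetesClient}

// Hypothetical skeleton: create a Job for a Cortex job, wait for it, clean up.
class KubernetesJobRunnerSrv(
    client: KubernetesClient = new DefaultKubernetesClient(), // kubeconfig or in-cluster config
    namespace: String = "cortex",      // would come from Cortex configuration
    claimName: String = "cortex-jobs"  // the ReadWriteMany claim, also from configuration
) {

  /** Run one neuron image for one Cortex job and block until it finishes or times out. */
  def run(jobId: String, image: String, timeoutSeconds: Long): Unit = {
    val jobs = client.batch().v1().jobs().inNamespace(namespace)

    // Build and submit the Job with the /job/input and /job/output subPath mounts,
    // using the neuronJob() builder sketched earlier in this issue.
    jobs.createOrReplace(NeuronJobs.neuronJob(jobId, image, claimName))

    // Wait for the Job to report success, then delete it; Cortex then finds the
    // output files under its job base directory, as with the other runners.
    jobs.withName(s"neuron-$jobId").waitUntilCondition(
      j => Option(j.getStatus).exists(s => Option(s.getSucceeded).exists(_ >= 1)),
      timeoutSeconds, TimeUnit.SECONDS
    )
    jobs.withName(s"neuron-$jobId").delete()
  }
}
```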