Version: v2.2.0

Helm Charts

Documentation for deploying Dragonfly on Kubernetes using Helm.

For more integrations such as Docker, CRI-O, Singularity/Apptainer, Nydus, eStargz, Harbor, Git LFS, Hugging Face, TorchServe, Triton Server, etc., refer to Integrations.

Prerequisites

Name                 Version    Document
Kubernetes cluster   1.20+      kubernetes.io
Helm                 v3.8.0+    helm.sh
containerd           v1.5.0+    containerd.io

Set up Kubernetes cluster

Kind is recommended if no Kubernetes cluster is available for testing.

Create a kind multi-node cluster configuration file named kind-config.yaml with the following content:

kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
- role: worker
- role: worker

Create a kind multi-node cluster using the configuration file:

kind create cluster --config kind-config.yaml

Switch the kubectl context to the kind cluster:

kubectl config use-context kind-kind
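
Before continuing, it can help to confirm that all three nodes have registered and are Ready. The following is a minimal sketch, not part of the official steps; the function name wait_for_nodes and the 120s default timeout are assumptions chosen for illustration.

```shell
#!/usr/bin/env bash
# Block until every node in the current kubectl context reports Ready.
# Argument: timeout passed to kubectl (assumed default: 120s).
wait_for_nodes() {
  local timeout=${1:-120s}
  kubectl wait --for=condition=Ready nodes --all --timeout="$timeout"
}

# Usage:
#   wait_for_nodes        # wait up to 120s
#   wait_for_nodes 300s   # wait up to 300s
```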

Load Dragonfly images into kind

Pull the latest Dragonfly images:

docker pull dragonflyoss/scheduler:latest
docker pull dragonflyoss/manager:latest
docker pull dragonflyoss/client:latest
docker pull dragonflyoss/dfinit:latest

Load the latest Dragonfly images into the kind cluster:

kind load docker-image dragonflyoss/scheduler:latest
kind load docker-image dragonflyoss/manager:latest
kind load docker-image dragonflyoss/client:latest
kind load docker-image dragonflyoss/dfinit:latest
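
The pull and load commands above can also be expressed as a single loop. This is a sketch only: the function name load_dragonfly_images is hypothetical, and it pulls and loads each image in turn rather than pulling all four first.

```shell
#!/usr/bin/env bash
# Pull each Dragonfly image and load it into the default kind cluster.
# Argument: image tag (defaults to "latest", the tag used in this guide).
load_dragonfly_images() {
  local tag=${1:-latest} component image
  for component in scheduler manager client dfinit; do
    image="dragonflyoss/${component}:${tag}"
    # Stop at the first failure so a missing image is not silently skipped.
    docker pull "$image" || return 1
    kind load docker-image "$image" || return 1
  done
}

# Usage:
#   load_dragonfly_images
```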

Create Dragonfly cluster based on Helm charts

Create the Helm Charts configuration file values.yaml, and set the container runtime to containerd. Please refer to the configuration documentation for details.

manager:
  image:
    repository: dragonflyoss/manager
    tag: latest
  metrics:
    enable: true
  config:
    verbose: true
    pprofPort: 18066

scheduler:
  image:
    repository: dragonflyoss/scheduler
    tag: latest
  metrics:
    enable: true
  config:
    verbose: true
    pprofPort: 18066

seedClient:
  image:
    repository: dragonflyoss/client
    tag: latest
  metrics:
    enable: true
  config:
    verbose: true

client:
  image:
    repository: dragonflyoss/client
    tag: latest
  metrics:
    enable: true
  config:
    verbose: true
  dfinit:
    enable: true
    image:
      repository: dragonflyoss/dfinit
      tag: latest
    config:
      containerRuntime:
        containerd:
          configPath: /etc/containerd/config.toml
          registries:
            - hostNamespace: docker.io
              serverAddr: https://index.docker.io
              capabilities: ['pull', 'resolve']

Create a Dragonfly cluster using the configuration file:

$ helm repo add dragonfly https://dragonflyoss.github.io/helm-charts/
$ helm install --wait --create-namespace --namespace dragonfly-system dragonfly dragonfly/dragonfly -f values.yaml
NAME: dragonfly
LAST DEPLOYED: Thu Apr 18 19:26:39 2024
NAMESPACE: dragonfly-system
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
1. Get the scheduler address by running these commands:
export SCHEDULER_POD_NAME=$(kubectl get pods --namespace dragonfly-system -l "app=dragonfly,release=dragonfly,component=scheduler" -o jsonpath={.items[0].metadata.name})
export SCHEDULER_CONTAINER_PORT=$(kubectl get pod --namespace dragonfly-system $SCHEDULER_POD_NAME -o jsonpath="{.spec.containers[0].ports[0].containerPort}")
kubectl --namespace dragonfly-system port-forward $SCHEDULER_POD_NAME 8002:$SCHEDULER_CONTAINER_PORT
echo "Visit http://127.0.0.1:8002 to use your scheduler"

2. Get the dfdaemon port by running these commands:
export DFDAEMON_POD_NAME=$(kubectl get pods --namespace dragonfly-system -l "app=dragonfly,release=dragonfly,component=dfdaemon" -o jsonpath={.items[0].metadata.name})
export DFDAEMON_CONTAINER_PORT=$(kubectl get pod --namespace dragonfly-system $DFDAEMON_POD_NAME -o jsonpath="{.spec.containers[0].ports[0].containerPort}")
You can use $DFDAEMON_CONTAINER_PORT as a proxy port in Node.

3. Configure runtime to use dragonfly:
https://d7y.io/docs/getting-started/quick-start/kubernetes/

Check that Dragonfly is deployed successfully:

$ kubectl get po -n dragonfly-system
NAME                                 READY   STATUS    RESTARTS      AGE
dragonfly-client-gvspg               1/1     Running   0             34m
dragonfly-client-kxrhh               1/1     Running   0             34m
dragonfly-manager-864774f54d-6t79l   1/1     Running   0             34m
dragonfly-mysql-0                    1/1     Running   0             34m
dragonfly-redis-master-0             1/1     Running   0             34m
dragonfly-redis-replicas-0           1/1     Running   0             34m
dragonfly-redis-replicas-1           1/1     Running   0             32m
dragonfly-redis-replicas-2           1/1     Running   0             32m
dragonfly-scheduler-0                1/1     Running   0             34m
dragonfly-seed-client-0              1/1     Running   5 (21m ago)   34m
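
Rather than scanning the listing by eye, the same output can be filtered for pods that are not yet healthy. A minimal sketch, assuming the default column layout shown above (NAME, READY, STATUS); the function name not_ready_pods is hypothetical, and a pod in a terminal state such as Completed would also be reported.

```shell
#!/usr/bin/env bash
# Read `kubectl get po` output on stdin and print the name of every pod
# whose STATUS is not Running or whose READY column is not n/n.
not_ready_pods() {
  awk 'NR > 1 {
    split($2, ready, "/")
    if ($3 != "Running" || ready[1] != ready[2]) print $1
  }'
}

# Usage:
#   kubectl get po -n dragonfly-system | not_ready_pods
```

An empty result means every listed pod is Running and fully ready.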

Containerd downloads images through Dragonfly

Pull the alpine:3.19 image on the kind-worker node:

docker exec -i kind-worker /usr/local/bin/crictl pull alpine:3.19

Verify

Run the following commands to check whether the alpine:3.19 image was distributed via Dragonfly.

# Find pod name.
export POD_NAME=$(kubectl get pods --namespace dragonfly-system -l "app=dragonfly,release=dragonfly,component=client" -o=jsonpath='{.items[?(@.spec.nodeName=="kind-worker")].metadata.name}' | head -n 1 )

# Find task id.
export TASK_ID=$(kubectl -n dragonfly-system exec ${POD_NAME} -- sh -c "grep -hoP 'library/alpine.*task_id=\"\K[^\"]+' /var/log/dragonfly/dfdaemon/* | head -n 1")

# Check logs.
kubectl -n dragonfly-system exec -it ${POD_NAME} -- sh -c "grep ${TASK_ID} /var/log/dragonfly/dfdaemon/* | grep 'download task succeeded'"

# Download logs.
kubectl -n dragonfly-system exec ${POD_NAME} -- sh -c "grep ${TASK_ID} /var/log/dragonfly/dfdaemon/*" > dfdaemon.log

The expected output is as follows:

{
  2024-04-19T02:44:09.259458Z INFO
  "download_task":"dragonfly-client/src/grpc/dfdaemon_download.rs:276":: "download task succeeded"
  "host_id": "172.18.0.3-kind-worker",
  "task_id": "a46de92fcb9430049cf9e61e267e1c3c9db1f1aa4a8680a048949b06adb625a5",
  "peer_id": "172.18.0.3-kind-worker-86e48d67-1653-4571-bf01-7e0c9a0a119d"
}
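
The task id lookup above relies on grep -P, which some grep builds (BusyBox, for example) do not support. The sketch below does the same extraction with sed on the local machine. It assumes, as the grep pattern above does, that matching log lines contain library/alpine followed by task_id="..."; the function name extract_task_id is hypothetical.

```shell
#!/usr/bin/env bash
# Read dfdaemon log lines on stdin and print the first task id found on a
# line that mentions library/alpine before a task_id="..." field.
extract_task_id() {
  sed -n 's|.*library/alpine.*task_id="\([^"]*\)".*|\1|p' | head -n 1
}

# Usage:
#   kubectl -n dragonfly-system exec ${POD_NAME} -- \
#     sh -c "cat /var/log/dragonfly/dfdaemon/*" | extract_task_id
```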

Performance testing

Containerd pulls image back-to-source for the first time through Dragonfly

Pull the alpine:3.19 image on the kind-worker node:

time docker exec -i kind-worker /usr/local/bin/crictl pull alpine:3.19

When the image is pulled back-to-source for the first time through Dragonfly, downloading alpine:3.19 takes 28.82s.

Containerd image pull hits the cache of a remote peer

Delete the client pod running on the kind-worker node to clear the cache of the local Dragonfly peer.

# Find pod name.
export POD_NAME=$(kubectl get pods --namespace dragonfly-system -l "app=dragonfly,release=dragonfly,component=client" -o=jsonpath='{.items[?(@.spec.nodeName=="kind-worker")].metadata.name}' | head -n 1 )

# Delete pod.
kubectl delete pod ${POD_NAME} -n dragonfly-system

Delete the alpine:3.19 image on the kind-worker node:

docker exec -i kind-worker /usr/local/bin/crictl rmi alpine:3.19

Pull the alpine:3.19 image on the kind-worker node:

time docker exec -i kind-worker /usr/local/bin/crictl pull alpine:3.19

When the pull hits the cache of a remote peer, downloading alpine:3.19 takes 12.524s.

Containerd image pull hits the cache of the local peer

Delete the alpine:3.19 image on the kind-worker node:

docker exec -i kind-worker /usr/local/bin/crictl rmi alpine:3.19

Pull the alpine:3.19 image on the kind-worker node:

time docker exec -i kind-worker /usr/local/bin/crictl pull alpine:3.19

When the pull hits the cache of the local peer, downloading alpine:3.19 takes 7.432s.

Preheat image

Expose the manager's port 8080:

kubectl --namespace dragonfly-system port-forward service/dragonfly-manager 8080:8080

Create a personal access token before calling the Open API, and select job as the access scope. For details, refer to personal-access-tokens.

Use the Open API to preheat the alpine:3.19 image to the Seed Peer. For details, refer to preheat.

curl --location --request POST 'http://127.0.0.1:8080/oapi/v1/jobs' \
  --header 'Content-Type: application/json' \
  --header 'Authorization: Bearer your_personal_access_token' \
  --data-raw '{
    "type": "preheat",
    "args": {
      "type": "image",
      "url": "https://index.docker.io/v2/library/alpine/manifests/3.19",
      "filteredQueryParams": "Expires&Signature",
      "username": "your_registry_username",
      "password": "your_registry_password"
    }
  }'

The response contains the preheat job ID in the id field:

{
  "id": 1,
  "created_at": "2024-04-18T08:51:55Z",
  "updated_at": "2024-04-18T08:51:55Z",
  "task_id": "group_2717f455-ff0a-435f-a3a7-672828d15a2a",
  "type": "preheat",
  "state": "PENDING",
  "args": {
    "filteredQueryParams": "Expires&Signature",
    "headers": null,
    "password": "",
    "pieceLength": 4194304,
    "platform": "",
    "tag": "",
    "type": "image",
    "url": "https://index.docker.io/v2/library/alpine/manifests/3.19",
    "username": ""
  },
  "scheduler_clusters": [
    {
      "id": 1,
      "created_at": "2024-04-18T08:29:15Z",
      "updated_at": "2024-04-18T08:29:15Z",
      "name": "cluster-1"
    }
  ]
}

Poll the preheat status using the job ID:

curl --request GET 'http://127.0.0.1:8080/oapi/v1/jobs/1' \
  --header 'Content-Type: application/json' \
  --header 'Authorization: Bearer your_personal_access_token'

If the state is SUCCESS, the preheat has succeeded:

{
  "id": 1,
  "created_at": "2024-04-18T08:51:55Z",
  "updated_at": "2024-04-18T08:51:55Z",
  "task_id": "group_2717f455-ff0a-435f-a3a7-672828d15a2a",
  "type": "preheat",
  "state": "SUCCESS",
  "args": {
    "filteredQueryParams": "Expires&Signature",
    "headers": null,
    "password": "",
    "pieceLength": 4194304,
    "platform": "",
    "tag": "",
    "type": "image",
    "url": "https://index.docker.io/v2/library/alpine/manifests/3.19",
    "username": ""
  },
  "scheduler_clusters": [
    {
      "id": 1,
      "created_at": "2024-04-18T08:29:15Z",
      "updated_at": "2024-04-18T08:29:15Z",
      "name": "cluster-1"
    }
  ]
}
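
The polling request above can be wrapped in a loop that stops once the job reaches a final state. This is a sketch under stated assumptions: the function names job_state and wait_for_job, the attempt count, and the delay are hypothetical; FAILURE is assumed to be the failed state (only PENDING and SUCCESS appear in this guide); and the sed expression assumes the response contains a single "state" field, as in the examples above.

```shell
#!/usr/bin/env bash
# Print the value of the "state" field from a job response read on stdin.
job_state() {
  sed -n 's/.*"state"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
}

# Poll a job until it succeeds, fails, or the attempts run out.
# Arguments: job id, access token, max attempts (default 30),
# delay between attempts in seconds (default 2).
# Returns 0 on SUCCESS, 1 on FAILURE, 2 on timeout.
wait_for_job() {
  local job_id=$1 token=$2 attempts=${3:-30} delay=${4:-2} state="" i
  for ((i = 0; i < attempts; i++)); do
    state=$(curl --silent --request GET "http://127.0.0.1:8080/oapi/v1/jobs/${job_id}" \
      --header 'Content-Type: application/json' \
      --header "Authorization: Bearer ${token}" | job_state)
    case "$state" in
      SUCCESS) echo "$state"; return 0 ;;
      FAILURE) echo "$state"; return 1 ;;
    esac
    sleep "$delay"
  done
  echo "timeout (last state: ${state:-unknown})"
  return 2
}

# Usage:
#   wait_for_job 1 your_personal_access_token
```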

Pull the alpine:3.19 image on the kind-worker node:

time docker exec -i kind-worker /usr/local/bin/crictl pull alpine:3.19

When the pull hits the preheat cache, downloading alpine:3.19 takes 11.030s.
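
To put the four timings side by side, the speedup of each cached pull over the 28.82s back-to-source pull can be computed as baseline divided by measured time. The sketch below does this with awk; the function name speedup is hypothetical, and because each timing comes from a single run, the ratios are indicative only.

```shell
#!/usr/bin/env bash
# Print how many times faster a pull is than the baseline pull.
# Arguments: baseline seconds, measured seconds.
speedup() {
  awk -v base="$1" -v t="$2" 'BEGIN { printf "%.2f\n", base / t }'
}

speedup 28.82 12.524   # remote peer cache
speedup 28.82 7.432    # local peer cache
speedup 28.82 11.030   # preheat cache
```

With the timings reported in this section, this prints 2.30, 3.88, and 2.61.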