
17 posts tagged with "nydus"


Dragonfly v2.2.0 has been released

· 8 min read

CNCF projects highlighted in this post, and migrated by mingcheng.

Dragonfly v2.2.0 is released! 🎉🎉🎉 Thanks to the contributors who made this release happen, and welcome to visit the d7y.io website.

Features

Client written in Rust

The client is written in Rust, offering advantages such as memory safety and improved performance. The client is a submodule of Dragonfly; refer to dragonflyoss/client.

scheduler schema

second scheduler schema

Client supports bandwidth rate limiting for prefetching

Client now supports rate limiting for prefetch requests, which can prevent network overload and reduce competition with other active download tasks, thereby enhancing overall system performance. Refer to the documentation to configure the proxy.prefetchRateLimit option.

code

The following diagram illustrates the usage of the download rate limit, upload rate limit, and prefetch rate limit for the client.

rate limit
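
As a rough sketch, the three limits might sit together in the client's configuration file (for example dfdaemon.yaml) as below. Only proxy.prefetchRateLimit is named in the documentation referenced above; the download and upload field names and all values here are assumptions for illustration, so check the client documentation for the exact keys.

download:
  # Assumed key: overall rate limit for download traffic.
  rateLimit: 20GiB
upload:
  # Assumed key: overall rate limit for upload traffic.
  rateLimit: 20GiB
proxy:
  # Rate limit applied only to prefetch requests triggered through the proxy.
  prefetchRateLimit: 2GiB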

Client supports leeching

If the user configures the client to disable sharing, it becomes a leech: it downloads from other peers but does not serve content to them.

code
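
A hypothetical sketch of what this could look like in the client configuration; the option name below is an assumption, not the documented key, so consult the client documentation for the real setting.

upload:
  # Hypothetical option: when sharing is disabled, the peer only downloads from
  # others and never serves pieces to them, i.e. it behaves as a leech.
  disableShared: true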

Optimize the client's performance for handling a large number of small I/Os from Nydus

  • Add the X-Dragonfly-Prefetch HTTP header. If X-Dragonfly-Prefetch is set to true and the request is a range request, the client will prefetch the entire task. This feature allows Nydus to control which requests need prefetching (see the example after this list).
  • The client’s HTTP proxy adds an independent cache to reduce requests to the gRPC server, thereby reducing request latency.
  • Increase the memory cache size in RocksDB and enable prefix search for quickly searching piece metadata.
  • Use the CRC-32-Castagnoli algorithm with hardware acceleration to reduce the hash calculation cost for piece content.
  • Reuse the gRPC connections for downloading and optimize the download logic.
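
For example, a range request sent through the client's HTTP proxy could opt into prefetching as sketched below; the proxy address follows the 65001 port used elsewhere in this post, and the URL is a placeholder.

# Range request through the dfdaemon HTTP proxy (addresses are placeholders).
# X-Dragonfly-Prefetch: true asks the client to prefetch the whole task in the background.
curl -x http://127.0.0.1:65001 \
  -H "X-Dragonfly-Prefetch: true" \
  -H "Range: bytes=0-4194303" \
  -o first-chunk.bin \
  http://example.com/models/model.bin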

Defines the V2 of the P2P transfer protocol

Define the V2 of the P2P transfer protocol to make it more standard, clearer, and better performing, refer to dragonflyoss/api.

Enhanced Harbor Integration with P2P Preheating

Dragonfly improves its integration with Harbor v2.13 for preheating images, including the following enhancements:

  • Support for preheating multi-architecture images.
  • Users can select the preheat scope for multi-granularity preheating (Single Seed Peer, All Seed Peers, All Peers).
  • Users can specify the scheduler cluster IDs for preheating images to the desired Dragonfly clusters.

Refer to documentation for more details.

create P2P Provider policy

Task Manager

Users can search all peers of a cached task by task ID or download URL, and delete the cache on the selected peers; refer to the documentation.

dragonfly dashboard

Peer Manager

The Manager regularly synchronizes peers' information and also allows manual refreshes. Additionally, it displays peers' information on the Manager Console.

dragonfly dashboard

Add hostname regexes and CIDRs to cluster scopes for matching clients

When the client starts, it reports its hostname and IP to the Manager. The Manager then returns the best matching cluster (including schedulers and seed peers) to the client based on the cluster scopes configuration.

Creating a cluster on the Dragonfly dashboard
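
As an illustration of the idea (the scopes are normally edited in the Manager console, and the field names below are assumptions), a cluster scope combining CIDRs and hostname regexes might look like:

scopes:
  # Clients whose reported IP falls into these CIDRs match this cluster.
  cidrs:
    - 10.0.0.0/8
    - 192.168.0.0/16
  # Clients whose reported hostname matches these regexes match this cluster.
  hostnames:
    - gpu-node-.*
    - edge-.*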

Supports distributed rate limiting for creating jobs across different clusters

Users can configure rate limiting for job creation across different clusters in the Manager Console.

creating a cluster on the dragonfly dashboard

Support preheating images using self-signed certificates

Preheating requires calling the container registry to parse the image manifest and construct the URLs for downloading blobs. If the container registry uses a self-signed certificate, users can configure the self-signed certificate in the Manager's config for calls to the container registry.

code
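
A minimal sketch of the idea, assuming the certificate is configured under the Manager's preheat job settings; the key names and path below are assumptions, so check the Manager configuration reference for the exact fields.

job:
  preheat:
    tls:
      # Assumed key: CA certificate used to trust the registry's self-signed certificate.
      caCert: /etc/certs/registry-ca.crt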

Support mTLS for gRPC calls between services

By setting self-signed certificates in the configurations of the Manager, Scheduler, Seed Peer, and Peer, gRPC calls between services will use mTLS.

Observability

Dragonfly recommends using Prometheus for monitoring. Prometheus and Grafana configurations are maintained in the dragonflyoss/monitoring repository.

Grafana dashboards are listed below:

| Name | ID | Link | Description |
| --- | --- | --- | --- |
| Dragonfly Manager | 15945 | https://grafana.com/grafana/dashboards/15945 | Grafana dashboard for Dragonfly Manager. |
| Dragonfly Scheduler | 15944 | https://grafana.com/grafana/dashboards/15944 | Grafana dashboard for Dragonfly Scheduler. |
| Dragonfly Client | 21053 | https://grafana.com/grafana/dashboards/21053 | Grafana dashboard for Dragonfly Client and Dragonfly Seed Client. |
| Dragonfly Seed Client | 21054 | https://grafana.com/grafana/dashboards/21054 | Grafana dashboard for Dragonfly Seed Client. |

dashboard

creating a cluster on the dragonfly dashboard

Nydus

Nydus v2.3.0 is released, refer to Nydus Image Service v2.3.0 for more details.

  • builder: support --parent-bootstrap for merge.
  • builder/nydusd: support batch chunks mergence.
  • nydusify/nydus-snapshotter: support OCI reference types.
  • nydusify: support export/import for remote images.
  • nydusify: support --push-chunk-size for large size image.
  • nydusd/nydus-snapshotter: support basic failover and hot upgrade.
  • nydusd: support overlay writable mount for fusedev.

Console

Console v0.2.0 is released, featuring a redesigned UI and an improved interaction flow. Additionally, more functional pages have been added, such as preheating, task manager, PATs(Personal Access Tokens) manager, etc. Refer to the documentation for more details.

cluster overview

deeper dive image into cluster-1 on dashboard

Document

Refactor the website documentation to make Dragonfly simpler and more practical for users, refer to d7y.io.

dragonfly website

Significant bug fixes

The following content only highlights the significant bug fixes in this release.

  • Fix the thread safety issue that occurs when constructing the DAG(Directed Acyclic Graph) during scheduling.
  • Fix the memory leak caused by the OpenTelemetry library.
  • Avoid hot reloading when dynconfig refreshes data from the Manager.
  • Prevent concurrent download requests from causing failures in state machine transitions.
  • Use context.Background() to avoid the stream being canceled by dfdaemon.
  • Fix the database performance issue caused by clearing expired jobs when there are too many job records.
  • Reuse the gRPC connection pool to prevent redundant request construction.

AI Infrastructure

Model Spec

The Dragonfly community is collaboratively defining the OCI Model Specification, which aims to provide a standard way to package, distribute, and run AI models in a cloud-native environment. The goal of this specification is to package models in an OCI artifact to take advantage of OCI distribution and ensure efficient model deployment; refer to CloudNativeAI/model-spec for more details.

OCI Model Specification image

node

Support accelerated distribution of AI models in Hugging Face Hub(Git LFS)

Distribute large files downloaded via the Git LFS protocol through Dragonfly P2P, refer to the documentation.

hugging face hub clusters

Maintainers

The community has added four new Maintainers, hoping to help more contributors participate in the community.

  • Han Jiang: He works for Kuaishou and will focus on the engineering work for Dragonfly.
  • Yuan Yang: He works for Alibaba Group and will focus on the engineering work for Dragonfly.

Other

You can see CHANGELOG for more details.

Triton Server accelerates distribution of models based on Dragonfly

· 10 min read

CNCF projects highlighted in this post, and migrated by mingcheng.

Project post by Yufei Chen, Miao Hao, and Min Huang, Dragonfly project

This document will help you experience how to use Dragonfly with Triton Server. When downloading models, the files are large and many services download them at the same time, so the bandwidth of the storage reaches its limit and downloads become slow.

Diagram flow showing nodes in Triton Server in Cluster A and Cluster B to Model Registry

Dragonfly can be used to eliminate the bandwidth limit of the storage through P2P technology, thereby accelerating file downloading.

Diagram flow showing Cluster A and Cluster B Peer to Root Peer to Model Registry

Installation

By integrating the Dragonfly Repository Agent into Triton, download traffic goes through Dragonfly to pull models stored in S3, OSS, GCS, and ABS, and the models are then registered in Triton. The Dragonfly Repository Agent lives in the dragonfly-repository-agent repository.

Prerequisites

| Name | Version | Document |
| --- | --- | --- |
| Kubernetes cluster | 1.20+ | kubernetes.io |
| Helm | 3.8.0+ | helm.sh |
| Triton Server | 23.08-py3 | Triton Server |

Notice: Kind is recommended if no kubernetes cluster is available for testing.

Dragonfly Kubernetes Cluster Setup

For detailed installation documentation, please refer to  quick-start-kubernetes.

Prepare Kubernetes Cluster

Create kind multi-node cluster configuration file kind-config.yaml, configuration content is as follows:

kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
  - role: control-plane
  - role: worker
  - role: worker

Create a kind multi-node cluster using the configuration file:

kind create cluster --config kind-config.yaml

Switch the context of kubectl to kind cluster:

kubectl config use-context kind-kind

Kind loads dragonfly image

Pull dragonfly latest images:

docker pull dragonflyoss/scheduler:latest
docker pull dragonflyoss/manager:latest
docker pull dragonflyoss/dfdaemon:latest

Kind cluster loads dragonfly latest images:

kind load docker-image dragonflyoss/scheduler:latest
kind load docker-image dragonflyoss/manager:latest
kind load docker-image dragonflyoss/dfdaemon:latest

Create dragonfly cluster based on helm charts

Create the helm charts configuration file charts-config.yaml and set dfdaemon.config.proxy.proxies.regx to match the download path of the object storage. Example: add regx: .*models.* to match download requests from the object storage bucket models. Configuration content is as follows:

scheduler:
  image: dragonflyoss/scheduler
  tag: latest
  replicas: 1
  metrics:
    enable: true
  config:
    verbose: true
    pprofPort: 18066

seedPeer:
  image: dragonflyoss/dfdaemon
  tag: latest
  replicas: 1
  metrics:
    enable: true
  config:
    verbose: true
    pprofPort: 18066

dfdaemon:
  image: dragonflyoss/dfdaemon
  tag: latest
  metrics:
    enable: true
  config:
    verbose: true
    pprofPort: 18066
    proxy:
      defaultFilter: 'Expires&Signature&ns'
      security:
        insecure: true
        cacert: ''
        cert: ''
        key: ''
      tcpListen:
        namespace: ''
        port: 65001
      registryMirror:
        url: https://index.docker.io
        insecure: true
        certs: []
        direct: false
      proxies:
        - regx: blobs/sha256.*
        # Proxy all HTTP download requests of the model bucket path.
        - regx: .*models.*

manager:
  image: dragonflyoss/manager
  tag: latest
  replicas: 1
  metrics:
    enable: true
  config:
    verbose: true
    pprofPort: 18066

jaeger:
  enable: true

Create a dragonfly cluster using the configuration file:

helm repo add dragonfly https://dragonflyoss.github.io/helm-charts/
helm install --wait --create-namespace --namespace dragonfly-system dragonfly dragonfly/dragonfly -f charts-config.yaml

Example output:

LAST DEPLOYED: Wed Nov 29 21:23:48 2023
NAMESPACE: dragonfly-system
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
1. Get the scheduler address by running these commands:
export SCHEDULER_POD_NAME=$(kubectl get pods --namespace dragonfly-system -l "app=dragonfly,release=dragonfly,component=scheduler" -o jsonpath={.items[0].metadata.name})
export SCHEDULER_CONTAINER_PORT=$(kubectl get pod --namespace dragonfly-system $SCHEDULER_POD_NAME -o jsonpath="{.spec.containers[0].ports[0].containerPort}")
kubectl --namespace dragonfly-system port-forward $SCHEDULER_POD_NAME 8002:$SCHEDULER_CONTAINER_PORT
echo "Visit http://127.0.0.1:8002 to use your scheduler"

2. Get the dfdaemon port by running these commands:
export DFDAEMON_POD_NAME=$(kubectl get pods --namespace dragonfly-system -l "app=dragonfly,release=dragonfly,component=dfdaemon" -o jsonpath={.items[0].metadata.name})
export DFDAEMON_CONTAINER_PORT=$(kubectl get pod --namespace dragonfly-system $DFDAEMON_POD_NAME -o jsonpath="{.spec.containers[0].ports[0].containerPort}")
You can use $DFDAEMON_CONTAINER_PORT as a proxy port in Node.

3. Configure runtime to use dragonfly:
https://d7y.io/docs/getting-started/quick-start/kubernetes/

4. Get Jaeger query URL by running these commands:
export JAEGER_QUERY_PORT=$(kubectl --namespace dragonfly-system get services dragonfly-jaeger-query -o jsonpath="{.spec.ports[0].port}")
kubectl --namespace dragonfly-system port-forward service/dragonfly-jaeger-query 16686:$JAEGER_QUERY_PORT
echo "Visit http://127.0.0.1:16686/search?limit=20&lookback=1h&maxDuration&minDuration&service=dragonfly to query download events"

Check that dragonfly is deployed successfully:

kubectl get pods -n dragonfly-system
NAME                                 READY   STATUS    RESTARTS       AGE
dragonfly-dfdaemon-8qcpd             1/1     Running   4 (118s ago)   2m45s
dragonfly-dfdaemon-qhkn8             1/1     Running   4 (108s ago)   2m45s
dragonfly-jaeger-6c44dc44b9-dfjfv    1/1     Running   0              2m45s
dragonfly-manager-549cd546b9-ps5tf   1/1     Running   0              2m45s
dragonfly-mysql-0                    1/1     Running   0              2m45s
dragonfly-redis-master-0             1/1     Running   0              2m45s
dragonfly-redis-replicas-0           1/1     Running   0              2m45s
dragonfly-redis-replicas-1           1/1     Running   0              2m7s
dragonfly-redis-replicas-2           1/1     Running   0              101s
dragonfly-scheduler-0                1/1     Running   0              2m45s
dragonfly-seed-peer-0                1/1     Running   1 (52s ago)    2m45s

Expose the Proxy service port

Create the dfstore.yaml configuration file to expose the port on which the Dragonfly Peer's HTTP proxy listens. The default port is 65001; set targetPort to 65001.

kind: Service
apiVersion: v1
metadata:
  name: dfstore
spec:
  selector:
    app: dragonfly
    component: dfdaemon
    release: dragonfly
  ports:
    - protocol: TCP
      port: 65001
      targetPort: 65001
  type: NodePort

Create service:

kubectl --namespace dragonfly-system apply -f dfstore.yaml

Forward request to Dragonfly Peer’s HTTP proxy:

kubectl --namespace dragonfly-system port-forward service/dfstore 65001:65001

Install Dragonfly Repository Agent

Set Dragonfly Repository Agent configuration

Create the dragonfly_config.json configuration file; the configuration is as follows:

{
  "proxy": "http://127.0.0.1:65001",
  "header": {},
  "filter": [
    "X-Amz-Algorithm",
    "X-Amz-Credential",
    "X-Amz-Date",
    "X-Amz-Expires",
    "X-Amz-SignedHeaders",
    "X-Amz-Signature"
  ]
}

  • proxy: The address of Dragonfly Peer’s HTTP Proxy.
  • header: Adds a request header to the request.
  • filter: Used to generate unique tasks and filter unnecessary query parameters in the URL.

In the filter of the configuration, set different values when using different object storage:

| Type | Value |
| --- | --- |
| OSS | ["Expires","Signature","ns"] |
| S3 | ["X-Amz-Algorithm", "X-Amz-Credential", "X-Amz-Date", "X-Amz-Expires", "X-Amz-SignedHeaders", "X-Amz-Signature"] |
| OBS | ["X-Amz-Algorithm", "X-Amz-Credential", "X-Amz-Date", "X-Obs-Date", "X-Amz-Expires", "X-Amz-SignedHeaders", "X-Amz-Signature"] |

Set Model Repository configuration

Create the cloud_credential.json cloud storage credential file; the configuration is as follows:

{
  "gs": {
    "": "PATH_TO_GOOGLE_APPLICATION_CREDENTIALS",
    "gs://gcs-bucket-002": "PATH_TO_GOOGLE_APPLICATION_CREDENTIALS_2"
  },
  "s3": {
    "": {
      "secret_key": "AWS_SECRET_ACCESS_KEY",
      "key_id": "AWS_ACCESS_KEY_ID",
      "region": "AWS_DEFAULT_REGION",
      "session_token": "",
      "profile": ""
    },
    "s3://s3-bucket-002": {
      "secret_key": "AWS_SECRET_ACCESS_KEY_2",
      "key_id": "AWS_ACCESS_KEY_ID_2",
      "region": "AWS_DEFAULT_REGION_2",
      "session_token": "AWS_SESSION_TOKEN_2",
      "profile": "AWS_PROFILE_2"
    }
  },
  "as": {
    "": {
      "account_str": "AZURE_STORAGE_ACCOUNT",
      "account_key": "AZURE_STORAGE_KEY"
    },
    "as://Account-002/Container": {
      "account_str": "",
      "account_key": ""
    }
  }
}

In order to pull the model through Dragonfly, the following code needs to be added to the model's config.pbtxt file:

model_repository_agents {
  agents [
    {
      name: "dragonfly",
    }
  ]
}

The densenet_onnx example contains the modified configuration and model file. The modified config.pbtxt looks like this:

name: "densenet_onnx"
platform: "onnxruntime_onnx"
max_batch_size : 0
input [
  {
    name: "data_0"
    data_type: TYPE_FP32
    format: FORMAT_NCHW
    dims: [ 3, 224, 224 ]
    reshape { shape: [ 1, 3, 224, 224 ] }
  }
]
output [
  {
    name: "fc6_1"
    data_type: TYPE_FP32
    dims: [ 1000 ]
    reshape { shape: [ 1, 1000, 1, 1 ] }
    label_filename: "densenet_labels.txt"
  }
]
model_repository_agents {
  agents [
    {
      name: "dragonfly",
    }
  ]
}

Triton Server integrates Dragonfly Repository Agent plugin

Install Triton Server with Docker

Pull the dragonflyoss/dragonfly-repository-agent image, which integrates the Dragonfly Repository Agent plugin into Triton Server; refer to the Dockerfile.

docker pull dragonflyoss/dragonfly-repository-agent:latest

Run the container and mount the configuration directory:

docker run --network host --rm \
  -v ${path-to-config-dir}:/home/triton/ \
  dragonflyoss/dragonfly-repository-agent:latest tritonserver \
  --model-repository=${model-repository-path}

  • path-to-config-dir: The directory containing dragonfly_config.json and cloud_credential.json.
  • model-repository-path: The path of remote model repository.

The correct output is as follows:

=============================== Triton Inference Server ===============================
successfully loaded 'densenet_onnx'
I1130 09:43:22.595672 1 server.cc:604]
+------------------+------------------------------------------------------------------------+
| Repository Agent | Path |
+------------------+------------------------------------------------------------------------+
| dragonfly | /opt/tritonserver/repoagents/dragonfly/libtritonrepoagent_dragonfly.so |
+------------------+------------------------------------------------------------------------+

I1130 09:43:22.596011 1 server.cc:631]
+-------------+-----------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Backend | Path | Config |
+-------------+-----------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------+
| pytorch | /opt/tritonserver/backends/pytorch/libtriton_pytorch.so | {} |
| onnxruntime | /opt/tritonserver/backends/onnxruntime/libtriton_onnxruntime.so | {"cmdline":{"auto-complete-config":"true","backend-directory":"/opt/tritonserver/backends","min-compute-capability":"6.000000","default-max-batch-size":"4"}} |
+-------------+-----------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------+

I1130 09:43:22.596112 1 server.cc:674]
+---------------+---------+--------+
| Model | Version | Status |
+---------------+---------+--------+
| densenet_onnx | 1 | READY |
+---------------+---------+--------+

I1130 09:43:22.598318 1 metrics.cc:703] Collecting CPU metrics
I1130 09:43:22.599373 1 tritonserver.cc:2435]
+----------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Option | Value |
+----------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| server_id | triton |
| server_version | 2.37.0 |
| server_extensions | classification sequence model_repository model_repository(unload_dependents) schedule_policy model_configuration system_shared_memory cuda_shared_memory binary_tensor_data parameters statistics trace logging |
| model_repository_path[0] | s3://192.168.36.128:9000/models |
| model_control_mode | MODE_NONE |
| strict_model_config | 0 |
| rate_limit | OFF |
| pinned_memory_pool_byte_size | 268435456 |
| min_supported_compute_capability | 6.0 |
| strict_readiness | 1 |
| exit_timeout | 30 |
| cache_enabled | 0 |
+----------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

I1130 09:43:22.610334 1 grpc_server.cc:2451] Started GRPCInferenceService at 0.0.0.0:8001
I1130 09:43:22.612623 1 http_server.cc:3558] Started HTTPService at 0.0.0.0:8000
I1130 09:43:22.695843 1 http_server.cc:187] Started Metrics Service at 0.0.0.0:8002

Execute the following command to check the Dragonfly logs:

kubectl exec -it -n dragonfly-system dragonfly-dfdaemon-<id> -- tail -f /var/log/dragonfly/daemon/core.log

Check that the download succeeded through Dragonfly:

{"level":"info","ts":"2024-02-02 05:28:02.631","caller":"peer/peertask_conductor.go:1349","msg":"peer task done, cost: 352ms","peer":"10.244.2.3-1-4398a429-d780-423a-a630-57d765f1ccfc","task":"974aaf56d4877cc65888a4736340fb1d8fecc93eadf7507f531f9fae650f1b4d","component":"PeerTask","trace":"4cca9ce80dbf5a445d321cec593aee65"}

Verify

Call inference API:

docker run -it --rm --net=host nvcr.io/nvidia/tritonserver:23.08-py3-sdk /workspace/install/bin/image_client -m densenet_onnx -c 3 -s INCEPTION /workspace/images/mug.jpg

Check that the response is successful:

Request 01
Image '/workspace/images/mug.jpg':
    15.349563 (504) = COFFEE MUG
    13.227461 (968) = CUP
    10.424893 (505) = COFFEEPOT

Performance testing

Test the performance of single-machine model download through the Triton API after the integration of Dragonfly P2P. Due to the influence of the network environment of the machine itself, the actual download time is not important, but the proportion of download speed in different scenarios is more meaningful:

Bar chart showing time to download large Triton API; Triton API & Dragonfly Cold Boot; Hit Dragonfly Remote Peer Cache; Hit, Dragonfly Local Peer Cache

  • Triton API: Use signed URL provided by Object Storage to download the model directly.
  • Triton API & Dragonfly Cold Boot: Use the Triton Server API to download the model via the Dragonfly P2P network with no cache hits.
  • Hit Remote Peer: Use the Triton Server API to download the model via the Dragonfly P2P network and hit the remote peer cache.
  • Hit Local Peer: Use the Triton Server API to download the model via the Dragonfly P2P network and hit the local peer cache.

Test results show that the Triton and Dragonfly integration can effectively reduce the file download time. Note that this test was a single-machine test, which means that in the case of cache hits, the performance limitation is on the disk. If Dragonfly is deployed on multiple machines for P2P download, the model download speed will be faster.

Resources

Dragonfly Community

NVIDIA Triton Inference Server

Dragonfly accelerates distribution of large files with Git LFS

· 11 min read

CNCF projects highlighted in this post, and migrated by mingcheng.

What is Git LFS?

Git LFS (Large File Storage) is an open-source extension for Git that enables users to handle large files more efficiently in Git repositories. Git is a version control system designed primarily for text files such as source code and it can become less efficient when dealing with large binary files like audio, videos, datasets, graphics and other large assets. These files can significantly increase the size of a repository and make cloning and fetching operations slow.

Diagram flow showing Remote to Large File Storage

Git LFS addresses this issue by storing these large files on a separate server and replacing them in the Git repository with small placeholder files (pointers). When a user clones or pulls from the repository, Git LFS fetches the large files from the LFS server as needed rather than downloading all the large files with the initial clone of the repository. For specifications, please refer to the Git LFS Specification. The server is implemented based on the HTTP protocol, refer to Git LFS API. Usually Git LFS’s content storage uses object storage to store large files.
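
For reference, the placeholder stored in the Git repository is just a small text file. A pointer for the object that appears in the trace log later in this post would look like this:

version https://git-lfs.github.com/spec/v1
oid sha256:c036cbb7553a909f8b8877d4461924307f27ecb66cff928eeeafd569c3887e29
size 5242880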

Git LFS Usage

Git LFS manages large files

GitHub and GitLab usually manage large files based on Git LFS.
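
As a quick reminder of the workflow on such platforms, files are opted into Git LFS per repository; the patterns below are only examples.

# Install the Git LFS hooks for the repository.
git lfs install

# Track large file patterns; the patterns are recorded in .gitattributes.
git lfs track "*.bin" "*.onnx"

# Commit the tracking configuration together with the large files.
git add .gitattributes
git commit -m "Track large binaries with Git LFS"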

Git LFS manages AI models and AI datasets

Large files of models and datasets in AI are usually managed based on Git LFS. Hugging Face Hub and ModelScope Hub manage models and datasets based on Git LFS.

Hugging Face Hub's Python library uses Git LFS to download models and datasets. To accelerate the distribution of models and datasets through Hugging Face Hub's Python library, refer to Hugging Face accelerates distribution of models and datasets based on Dragonfly.

Dragonfly eliminates the bandwidth limit of Git LFS’s content storage

This document will help you experience how to use Dragonfly with Git LFS. When downloading large files, the files are large and many services download them at the same time, so the bandwidth of the storage reaches its limit and downloads become slow.

Diagram flow showing Cluster A and Cluster B  to Large File Storage

Dragonfly can be used to eliminate the bandwidth limit of the storage through P2P technology, thereby accelerating large files downloading.

Diagram flow showing Cluster A and Cluster B  to Large File Storage using Peer and Root Peer

Dragonfly accelerates downloads with Git LFS

By proxying the HTTP protocol file download request of Git LFS to Dragonfly Peer Proxy, the file download traffic is forwarded to the P2P network. The following documentation is based on GitHub LFS.

Get the Content Storage address of Git LFS

Add GIT_CURL_VERBOSE=1 to print verbose logs of git clone and get the address of content storage of Git LFS.

GIT_CURL_VERBOSE=1 git clone git@github.com:{YOUR-USERNAME}/{YOUR-REPOSITORY}.git

Look for the trace git-lfs keyword in the logs to see the Git LFS download log. Pay attention to the actions and download fields in the log.

15:31:04.848308 trace git-lfs: HTTP: {"objects":[{"oid":"c036cbb7553a909f8b8877d4461924307f27ecb66cff928eeeafd569c3887e29","size":5242880,"actions":{"download":{"href":"https://github-cloud.githubusercontent.com/alambic/media/376919987/c0/36/c036cbb7553a909f8b8877d4461924307f27ecb66cff928eeeafd569c3887e29?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAIMWPLRQEC4XCWWPA%2F20231221%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20231221T073104Z&X-Amz-Expires=3600&X-Amz-Signature=4dc757dff0ac96eac3f0cd2eb29ca887035d3a6afba41cb10200ed0aa22812fa&
15:31:04.848403 trace git-lfs: HTTP: X-Amz-SignedHeaders=host&actor_id=15955374&key_id=0&repo_id=392935134&token=1","expires_at":"2023-12-21T08:31:04Z","expires_in":3600}}}]}

The download URL can be found in actions.download.href in the objects. You can find that the content storage of GitHub LFS is actually stored at github-cloud.githubusercontent.com. And query parameters include X-Amz-Algorithm, X-Amz-Credential, X-Amz-Date, X-Amz-Expires, X-Amz-Signature and X-Amz-SignedHeaders. The query parameters are AWS Authenticating Requests parameters. The keys of query parameters will be used later when configuring Dragonfly Peer Proxy.
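
To illustrate why these keys matter for Dragonfly later on: the signed query parameters change on every request, so they must be filtered out when the task ID is generated. The two abbreviated URLs below (illustrative values) refer to the same object and, after filtering, map to the same task.

# Same object, different signatures; identical task ID once the signed parameters are filtered.
https://github-cloud.githubusercontent.com/alambic/media/.../c036cbb7...?X-Amz-Date=20231221T073104Z&X-Amz-Signature=aaaa...
https://github-cloud.githubusercontent.com/alambic/media/.../c036cbb7...?X-Amz-Date=20231222T091500Z&X-Amz-Signature=bbbb...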

Information about Git LFS:

  1. The content storage address of Git LFS is github-cloud.githubusercontent.com.
  2. The query parameters of the download URL include X-Amz-Algorithm, X-Amz-Credential, X-Amz-Date, X-Amz-Expires, X-Amz-Signature and X-Amz-SignedHeaders.

Installation

Prerequisites

Notice: Kind is recommended if no kubernetes cluster is available for testing.

Install dragonfly

For detailed installation documentation based on kubernetes cluster, please refer to quick-start-kubernetes.

Setup kubernetes cluster

Create kind multi-node cluster configuration file kind-config.yaml, configuration content is as follows:

kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
  - role: control-plane
  - role: worker
    extraPortMappings:
      - containerPort: 30950
        hostPort: 65001
  - role: worker

Create a kind multi-node cluster using the configuration file:

kind create cluster --config kind-config.yaml

Switch the context of kubectl to kind cluster:

kubectl config use-context kind-kind

Kind loads dragonfly image

Pull dragonfly latest images:

docker pull dragonflyoss/scheduler:latest
docker pull dragonflyoss/manager:latest
docker pull dragonflyoss/dfdaemon:latest

Kind cluster loads dragonfly latest images:

kind load docker-image dragonflyoss/scheduler:latest
kind load docker-image dragonflyoss/manager:latest
kind load docker-image dragonflyoss/dfdaemon:latest

Create dragonfly cluster based on helm charts

Create the helm charts configuration file charts-config.yaml. Add the github-cloud.githubusercontent.com rule to dfdaemon.config.proxy.proxies.regx to forward the HTTP file downloads from the content storage of Git LFS to the P2P network. Also add the X-Amz-Algorithm, X-Amz-Credential, X-Amz-Date, X-Amz-Expires, X-Amz-Signature, and X-Amz-SignedHeaders parameters to dfdaemon.config.proxy.defaultFilter to filter the query parameters. Dragonfly generates a unique task ID based on the URL, so it is necessary to filter the signed query parameters to generate a consistent task ID. Configuration content is as follows:

scheduler:
  image: dragonflyoss/scheduler
  tag: latest
  replicas: 1
  metrics:
    enable: true
  config:
    verbose: true
    pprofPort: 18066

seedPeer:
  image: dragonflyoss/dfdaemon
  tag: latest
  replicas: 1
  metrics:
    enable: true
  config:
    verbose: true
    pprofPort: 18066

dfdaemon:
  image: dragonflyoss/dfdaemon
  tag: latest
  metrics:
    enable: true
  config:
    verbose: true
    pprofPort: 18066
    proxy:
      defaultFilter: "X-Amz-Algorithm&X-Amz-Credential&X-Amz-Date&X-Amz-Expires&X-Amz-Signature&X-Amz-SignedHeaders"
      security:
        insecure: true
        cacert: ""
        cert: ""
        key: ""
      tcpListen:
        namespace: ""
        port: 65001
      registryMirror:
        url: https://index.docker.io
        insecure: true
        certs: []
        direct: false
      proxies:
        - regx: blobs/sha256.*
        - regx: github-cloud.githubusercontent.com.*

manager:
  image: dragonflyoss/manager
  tag: latest
  replicas: 1
  metrics:
    enable: true
  config:
    verbose: true
    pprofPort: 18066

jaeger:
  enable: true

Create a dragonfly cluster using the configuration file:

helm repo add dragonfly https://dragonflyoss.github.io/helm-charts/
helm install --wait --create-namespace --namespace dragonfly-system dragonfly dragonfly/dragonfly -f charts-config.yaml

Output:

NAME: dragonfly
LAST DEPLOYED: Thu Dec 21 17:24:37 2023
NAMESPACE: dragonfly-system
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
1. Get the scheduler address by running these commands:
export SCHEDULER_POD_NAME=$(kubectl get pods --namespace dragonfly-system -l "app=dragonfly,release=dragonfly,component=scheduler" -o jsonpath={.items[0].metadata.name})
export SCHEDULER_CONTAINER_PORT=$(kubectl get pod --namespace dragonfly-system $SCHEDULER_POD_NAME -o jsonpath="{.spec.containers[0].ports[0].containerPort}")
kubectl --namespace dragonfly-system port-forward $SCHEDULER_POD_NAME 8002:$SCHEDULER_CONTAINER_PORT
echo "Visit http://127.0.0.1:8002 to use your scheduler"

2. Get the dfdaemon port by running these commands:
export DFDAEMON_POD_NAME=$(kubectl get pods --namespace dragonfly-system -l "app=dragonfly,release=dragonfly,component=dfdaemon" -o jsonpath={.items[0].metadata.name})
export DFDAEMON_CONTAINER_PORT=$(kubectl get pod --namespace dragonfly-system $DFDAEMON_POD_NAME -o jsonpath="{.spec.containers[0].ports[0].containerPort}")
You can use $DFDAEMON_CONTAINER_PORT as a proxy port in Node.

3. Configure runtime to use dragonfly:
https://d7y.io/docs/getting-started/quick-start/kubernetes/

4. Get Jaeger query URL by running these commands:
export JAEGER_QUERY_PORT=$(kubectl --namespace dragonfly-system get services dragonfly-jaeger-query -o jsonpath="{.spec.ports[0].port}")
kubectl --namespace dragonfly-system port-forward service/dragonfly-jaeger-query 16686:$JAEGER_QUERY_PORT
echo "Visit http://127.0.0.1:16686/search?limit=20&lookback=1h&maxDuration&minDuration&service=dragonfly to query download events"

Check that dragonfly is deployed successfully:

kubectl get po -n dragonfly-system
NAME                                 READY   STATUS    RESTARTS       AGE
dragonfly-dfdaemon-cttxz             1/1     Running   4 (116s ago)   2m51s
dragonfly-dfdaemon-k62vd             1/1     Running   4 (117s ago)   2m51s
dragonfly-jaeger-84dbfd5b56-mxpfs    1/1     Running   0              2m51s
dragonfly-manager-5c598d5754-fd9tf   1/1     Running   0              2m51s
dragonfly-mysql-0                    1/1     Running   0              2m51s
dragonfly-redis-master-0             1/1     Running   0              2m51s
dragonfly-redis-replicas-0           1/1     Running   0              2m51s
dragonfly-redis-replicas-1           1/1     Running   0              106s
dragonfly-redis-replicas-2           1/1     Running   0              78s
dragonfly-scheduler-0                1/1     Running   0              2m51s
dragonfly-seed-peer-0                1/1     Running   1 (37s ago)    2m51s

Create peer service configuration file peer-service-config.yaml, configuration content is as follows:

apiVersion: v1
kind: Service
metadata:
  name: peer
  namespace: dragonfly-system
spec:
  type: NodePort
  ports:
    - name: http-65001
      nodePort: 30950
      port: 65001
  selector:
    app: dragonfly
    component: dfdaemon
    release: dragonfly

Create a peer service using the configuration file:

kubectl apply -f peer-service-config.yaml

Git LFS downloads large files via dragonfly

Proxy Git LFS download requests to the Dragonfly Peer Proxy (http://127.0.0.1:65001) through Git configuration. The Git configuration includes the http.proxy, lfs.transfer.enablehrefrewrite, and url.{YOUR-LFS-CONTENT-STORAGE}.insteadOf properties.

git config --global http.proxy http://127.0.0.1:65001
git config --global lfs.transfer.enablehrefrewrite true
git config --global url.http://github-cloud.githubusercontent.com/.insteadOf https://github-cloud.githubusercontent.com/

Forward Git LFS download requests to the P2P network via Dragonfly Peer Proxy and Git clone the large files.

git clone git@github.com:{YOUR-USERNAME}/{YOUR-REPOSITORY}.git
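
If you set these options globally only for this test, you may want to undo them afterwards so later Git operations no longer go through the proxy; a possible cleanup:

# Remove the proxy-related Git settings added above.
git config --global --unset http.proxy
git config --global --unset lfs.transfer.enablehrefrewrite
git config --global --unset url.http://github-cloud.githubusercontent.com/.insteadOf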

Verify large files download with Dragonfly

Execute the command:

# find pods
kubectl -n dragonfly-system get pod -l component=dfdaemon

# find logs
pod_name=dfdaemon-xxxxx
kubectl -n dragonfly-system exec -it ${pod_name} -- grep "peer task done" /var/log/dragonfly/daemon/core.log

Example output:

2023-12-21T16:55:20.495+0800    INFO    peer/peertask_conductor.go:1326    peer task done, cost: 2238ms    {"peer": "30.54.146.131-15874-f6729352-950e-412f-b876-0e5c8e3232b1", "task": "70c644474b6c986e3af27d742d3602469e88f8956956817f9f67082c6967dc1a", "component": "PeerTask", "trace": "35c801b7dac36eeb0ea43a58d1c82e77"}

Performance testing

Test the performance of single-machine large file downloads after the integration of Git LFS and Dragonfly P2P. Due to the influence of the network environment of the machine itself, the actual download time is not important, but the ratio of download times in different scenarios is very important.

Bar chart showing time to download large files (512M and 1G) between Git LFS, Git LFS & Dragonfly Cold Boot, Hit Dragonfly Remote Peer Cache and Hit Dragonfly Local Peer Cache

  • Git LFS: Use Git LFS to download large files directly.
  • Git LFS & Dragonfly Cold Boot: Use Git LFS to download large files via Dragonfly P2P network and no cache hits.
  • Hit Dragonfly Remote Peer Cache: Use Git LFS to download large files via Dragonfly P2P network and hit the remote peer cache.
  • Hit Dragonfly Local Peer Cache: Use Git LFS to download large files via Dragonfly P2P network and hit the local peer cache.

Test results show that the Git LFS and Dragonfly P2P integration can effectively reduce the file download time. Note that this test was a single-machine test, which means that in the case of cache hits, the performance limitation is on the disk. If Dragonfly is deployed on multiple machines for P2P download, the large file download speed will be faster.

Dragonfly community

Git LFS

TorchServe accelerates the distribution of models based on Dragonfly

· 14 min read

CNCF projects highlighted in this post, and migrated by mingcheng.

This document will help you experience how to use Dragonfly with TorchServe. When downloading models, the files are large and many services download them at the same time, so the bandwidth of the storage reaches its limit and downloads become slow.

Diagram flow showing Model Registry flow from Cluster A and Cluster B

Dragonfly can be used to eliminate the bandwidth limit of the storage through P2P technology, thereby accelerating file downloading.

Diagram flow showing Model Registry flow from Cluster A and Cluster B

Architecture

Dragonfly Endpoint architecture

Dragonfly Endpoint plugin forwards TorchServe download model requests to the Dragonfly P2P network.

Dragonfly Endpoint architecture

The models download steps:

  1. TorchServe sends a model download request and the request is forwarded to the Dragonfly Peer.
  2. The Dragonfly Peer registers tasks with the Dragonfly Scheduler.
  3. Return the candidate parents to Dragonfly Peer.
  4. Dragonfly Peer downloads model from candidate parents.
  5. After downloading the model, TorchServe will register the model.

Installation

By integrating the Dragonfly Endpoint into TorchServe, download traffic goes through Dragonfly to pull models stored in S3, OSS, GCS, and ABS, and the models are then registered in TorchServe. The Dragonfly Endpoint plugin lives in the dragonfly-endpoint repository.

Prerequisites

| Name | Version | Document |
| --- | --- | --- |
| Kubernetes cluster | 1.20+ | kubernetes.io |
| Helm | 3.8.0+ | helm.sh |
| TorchServe | 0.4.0+ | pytorch.org/serve/ |

Notice: Kind is recommended if no kubernetes cluster is available for testing.

Dragonfly Kubernetes Cluster Setup

For detailed installation documentation, please refer to  quick-start-kubernetes.

Prepare Kubernetes Cluster

Create kind multi-node cluster configuration file kind-config.yaml, configuration content is as follows:

kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
  - role: control-plane
  - role: worker
  - role: worker

Create a kind multi-node cluster using the configuration file:

kind create cluster --config kind-config.yaml

Switch the context of kubectl to kind cluster:

kubectl config use-context kind-kind

Kind loads dragonfly image

Pull dragonfly latest images:

docker pull dragonflyoss/scheduler:latest
docker pull dragonflyoss/manager:latest
docker pull dragonflyoss/dfdaemon:latest

Kind cluster loads dragonfly latest images:

kind load docker-image dragonflyoss/scheduler:latest
kind load docker-image dragonflyoss/manager:latest
kind load docker-image dragonflyoss/dfdaemon:latest

Create dragonfly cluster based on helm charts

Create the helm charts configuration file charts-config.yaml and set dfdaemon.config.proxy.proxies.regx to match the download path of the object storage; configuration content is as follows:

scheduler:
  replicas: 1
  metrics:
    enable: true
  config:
    verbose: true
    pprofPort: 18066

seedPeer:
  replicas: 1
  metrics:
    enable: true
  config:
    verbose: true
    pprofPort: 18066

dfdaemon:
  hostNetwork: true
  metrics:
    enable: true
  config:
    verbose: true
    pprofPort: 18066
    proxy:
      defaultFilter: "Expires&Signature&ns"
      security:
        insecure: true
        cacert: ""
        cert: ""
        key: ""
      tcpListen:
        namespace: ""
        port: 65001
      registryMirror:
        url: https://index.docker.io
        insecure: true
        certs: []
        direct: false
      proxies:
        - regx: blobs/sha256.*
        - regx: .*amazonaws.*

manager:
  replicas: 1
  metrics:
    enable: true
  config:
    verbose: true
    pprofPort: 18066

jaeger:
  enable: true

Create a dragonfly cluster using the configuration file:

$ helm repo add dragonfly https://dragonflyoss.github.io/helm-charts/
$ helm install --wait --create-namespace --namespace dragonfly-system dragonfly dragonfly/dragonfly -f charts-config.yaml
LAST DEPLOYED: Mon Sep 4 10:24:55 2023
NAMESPACE: dragonfly-system
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
1. Get the scheduler address by running these commands:
export SCHEDULER_POD_NAME=$(kubectl get pods --namespace dragonfly-system -l "app=dragonfly,release=dragonfly,component=scheduler" -o jsonpath={.items[0].metadata.name})
export SCHEDULER_CONTAINER_PORT=$(kubectl get pod --namespace dragonfly-system $SCHEDULER_POD_NAME -o jsonpath="{.spec.containers[0].ports[0].containerPort}")
kubectl --namespace dragonfly-system port-forward $SCHEDULER_POD_NAME 8002:$SCHEDULER_CONTAINER_PORT
echo "Visit http://127.0.0.1:8002 to use your scheduler"
2. Get the dfdaemon port by running these commands:
export DFDAEMON_POD_NAME=$(kubectl get pods --namespace dragonfly-system -l "app=dragonfly,release=dragonfly,component=dfdaemon" -o jsonpath={.items[0].metadata.name})
export DFDAEMON_CONTAINER_PORT=$(kubectl get pod --namespace dragonfly-system $DFDAEMON_POD_NAME -o jsonpath="{.spec.containers[0].ports[0].containerPort}")
You can use $DFDAEMON_CONTAINER_PORT as a proxy port in Node.
3. Configure runtime to use dragonfly:
https://d7y.io/docs/getting-started/quick-start/kubernetes/
4. Get Jaeger query URL by running these commands:
export JAEGER_QUERY_PORT=$(kubectl --namespace dragonfly-system get services dragonfly-jaeger-query -o jsonpath="{.spec.ports[0].port}")
kubectl --namespace dragonfly-system port-forward service/dragonfly-jaeger-query 16686:$JAEGER_QUERY_PORT
echo "Visit http://127.0.0.1:16686/search?limit=20&lookback=1h&maxDuration&minDuration&service=dragonfly to query download events"

Check that dragonfly is deployed successfully:

$ kubectl get po -n dragonfly-system
NAME READY STATUS RESTARTS AGE
dragonfly-dfdaemon-7r2cn 1/1 Running 0 3m31s
dragonfly-dfdaemon-fktl4 1/1 Running 0 3m31s
dragonfly-jaeger-c7947b579-2xk44 1/1 Running 0 3m31s
dragonfly-manager-5d4f444c6c-wq8d8 1/1 Running 0 3m31s
dragonfly-mysql-0 1/1 Running 0 3m31s
dragonfly-redis-master-0 1/1 Running 0 3m31s
dragonfly-redis-replicas-0 1/1 Running 0 3m31s
dragonfly-redis-replicas-1 1/1 Running 0 3m5s
dragonfly-redis-replicas-2 1/1 Running 0 2m44s
dragonfly-scheduler-0 1/1 Running 0 3m31s
dragonfly-seed-peer-0 1/1 Running 0 3m31s

Expose the Proxy service port

Create the dfstore.yaml configuration file to expose the port on which the Dragonfly Peer's HTTP proxy listens. The default port is 65001; set targetPort to 65001.

kind: Service
apiVersion: v1
metadata:
  name: dfstore
spec:
  selector:
    app: dragonfly
    component: dfdaemon
    release: dragonfly
  ports:
    - protocol: TCP
      port: 65001
      targetPort: 65001
  type: NodePort

Create service:

kubectl --namespace dragonfly-system apply -f dfstore.yaml

Forward request to Dragonfly Peer’s HTTP proxy:

kubectl --namespace dragonfly-system port-forward service/dfstore 65001:65001

Install Dragonfly Endpoint plugin

Set environment variables for Dragonfly Endpoint configuration

Create the config.json configuration file, and set the DRAGONFLY_ENDPOINT_CONFIG environment variable to the config.json file path.

export DRAGONFLY_ENDPOINT_CONFIG=/etc/dragonfly-endpoint/config.json

The default configuration path is:

  • linux: /etc/dragonfly-endpoint/config.json
  • darwin: ~/.dragonfly-endpoint/config.json

Dragonfly Endpoint configuration

Create the config.json configuration to configure the Dragonfly Endpoint for S3, the configuration is as follows:

{
  "addr": "http://127.0.0.1:65001",
  "header": {},
  "filter": [
    "X-Amz-Algorithm",
    "X-Amz-Credential",
    "X-Amz-Date",
    "X-Amz-Expires",
    "X-Amz-SignedHeaders",
    "X-Amz-Signature"
  ],
  "object_storage": {
    "type": "s3",
    "bucket_name": "your_s3_bucket_name",
    "region": "your_s3_region",
    "access_key": "your_s3_access_key",
    "secret_key": "your_s3_secret_key"
  }
}

In the filter of the configuration, set different values when using different object storage:

| Type | Value |
| --- | --- |
| OSS | "Expires&Signature&ns" |
| S3 | "X-Amz-Algorithm&X-Amz-Credential&X-Amz-Date&X-Amz-Expires&X-Amz-SignedHeaders&X-Amz-Signature" |
| OBS | "X-Amz-Algorithm&X-Amz-Credential&X-Amz-Date&X-Obs-Date&X-Amz-Expires&X-Amz-SignedHeaders&X-Amz-Signature" |

Object storage configuration

In addition to S3, Dragonfly Endpoint plugin also supports OSS, GCS and ABS. Different object storage configurations are as follows:

OSS(Object Storage Service)

{
  "addr": "http://127.0.0.1:65001",
  "header": {},
  "filter": ["Expires", "Signature"],
  "object_storage": {
    "type": "oss",
    "bucket_name": "your_oss_bucket_name",
    "endpoint": "your_oss_endpoint",
    "access_key_id": "your_oss_access_key_id",
    "access_key_secret": "your_oss_access_key_secret"
  }
}

GCS(Google Cloud Storage)

{
  "addr": "http://127.0.0.1:65001",
  "header": {},
  "object_storage": {
    "type": "gcs",
    "bucket_name": "your_gcs_bucket_name",
    "project_id": "your_gcs_project_id",
    "service_account_path": "your_gcs_service_account_path"
  }
}

ABS(Azure Blob Storage)

{
  "addr": "http://127.0.0.1:65001",
  "header": {},
  "object_storage": {
    "type": "abs",
    "account_name": "your_abs_account_name",
    "account_key": "your_abs_account_key",
    "container_name": "your_abs_container_name"
  }
}

TorchServe integrates Dragonfly Endpoint plugin

For detailed installation documentation, please refer to TorchServe document.

Binary installation

The Prerequisites

| Name | Version | Document |
| --- | --- | --- |
| Python | 3.8.0+ | https://www.python.org/ |
| TorchServe | 0.4.0+ | pytorch.org/serve/ |
| Java | 11 | https://openjdk.org/projects/jdk/11/ |

Install TorchServe dependencies and torch-model-archiver:

python ./ts_scripts/install_dependencies.py
conda install torchserve torch-model-archiver torch-workflow-archiver -c pytorch

Clone TorchServe repository:

git clone https://github.com/pytorch/serve.git
cd serve

Create model-store directory to store the models:

mkdir model-store
chmod 777 model-store

Create plugins-path directory to store the binaries of the plugin:

mkdir plugins-path

Package Dragonfly Endpoint plugin

Clone dragonfly-endpoint repository:

git clone https://github.com/dragonflyoss/dragonfly-endpoint.git

Build the dragonfly-endpoint project to generate the JAR file in the build/libs directory:

cd ./dragonfly-endpoint
gradle shadowJar

Note: Due to the limitations of TorchServe’s JVM, the best Java version for Gradle is 11, as a higher version will cause the plugin to fail to parse.

Move the Jar file into the plugins-path directory:

mv build/libs/dragonfly_endpoint-1.0-all.jar <your plugins-path>

Prepare the plugin configuration config.json, and use S3 as the object storage:

{
  "addr": "http://127.0.0.1:65001",
  "header": {},
  "filter": [
    "X-Amz-Algorithm",
    "X-Amz-Credential",
    "X-Amz-Date",
    "X-Amz-Expires",
    "X-Amz-SignedHeaders",
    "X-Amz-Signature"
  ],
  "object_storage": {
    "type": "s3",
    "bucket_name": "your_s3_bucket_name",
    "region": "your_s3_region",
    "access_key": "your_s3_access_key",
    "secret_key": "your_s3_secret_key"
  }
}

Set the environment variables for the configuration:

export DRAGONFLY_ENDPOINT_CONFIG=/etc/dragonfly-endpoint/config.json

--model-store sets the previously created directory to store the models and --plugins-path sets the previously created directory to store the plugins. Start TorchServe with the Dragonfly Endpoint plugin:

torchserve --start --model-store <path-to-model-store-file> --plugins-path=<path-to-plugin-jars>

Verify

Prepare the model. Download a model from the Model ZOO or package a model for TorchServe; refer to Torch Model archiver. Use the squeezenet1_1_scripted.mar model to verify:

wget https://torchserve.pytorch.org/mar_files/squeezenet1_1_scripted.mar

Upload the model to object storage. For details on uploading the model to S3, please refer to S3.

# Download the command line tool
pip install awscli

# Configure the key as prompted
aws configure

# Upload file
aws s3 cp <local file path> s3://<bucket name>/<Target path>

The TorchServe plugin is named dragonfly; please refer to the TorchServe Register API for details of the plugin API. The url parameter is not supported; instead, add the file_name parameter, which is the model file name to download.

Download the model:

curl -X POST  "http://localhost:8081/dragonfly/models?file_name=squeezenet1_1.mar"

Verify that the model downloaded successfully:

{"Status": "Model \"squeezenet1_1\" Version: 1.0 registered with 0 initial workers. Use scale workers API to add workers for the model."}

Add a model worker for inference:

curl -v -X PUT "http://localhost:8081/models/squeezenet1_1?min_worker=1"

Check that the number of workers has increased:

* About to connect() to localhost port 8081 (#0)
* Trying ::1...
* Connected to localhost (::1) port 8081 (#0)
> PUT /models/squeezenet1_1?min_worker=1 HTTP/1.1
> User-Agent: curl/7.29.0
> Host: localhost:8081
> Accept: */*
>
< HTTP/1.1 202 Accepted
< content-type: application/json
< x-request-id: 66761b5a-54a7-4626-9aa4-12041e0e4e63
< Pragma: no-cache
< Cache-Control: no-cache; no-store, must-revalidate, private
< Expires: Thu, 01 Jan 1970 00:00:00 UTC
< content-length: 47
< connection: keep-alive
<
{ "status": "Processing worker updates..."}
* Connection #0 to host localhost left intact

Call inference API:

# Prepare pictures that require reasoning
curl -O https://raw.githubusercontent.com/pytorch/serve/master/docs/images/kitten_small.jpg
curl -O https://raw.githubusercontent.com/pytorch/serve/master/docs/images/dogs-before.jpg

# Call inference API
curl http://localhost:8080/predictions/squeezenet1_1 -T kitten_small.jpg -T dogs-before.jpg

Check that the response is successful:

{
"lynx": 0.5455784201622009,
"tabby": 0.2794168293476105,
"Egyptian_cat": 0.10391931980848312,
"tiger_cat": 0.062633216381073,
"leopard": 0.005019133910536766
}

Install TorchServe with Docker

Docker configuration

Pull the dragonflyoss/dragonfly-endpoint image, which includes the plugin. The following is an example of the CPU version of TorchServe; refer to the Dockerfile.

docker pull dragonflyoss/dragonfly-endpoint

Create model-store directory to store the model files:

mkdir model-store
chmod 777 model-store

Prepare the plugin configuration config.json, and use S3 as the object storage:

{
  "addr": "http://127.0.0.1:65001",
  "header": {},
  "filter": [
    "X-Amz-Algorithm",
    "X-Amz-Credential",
    "X-Amz-Date",
    "X-Amz-Expires",
    "X-Amz-SignedHeaders",
    "X-Amz-Signature"
  ],
  "object_storage": {
    "type": "s3",
    "bucket_name": "your_s3_bucket_name",
    "region": "your_s3_region",
    "access_key": "your_s3_access_key",
    "secret_key": "your_s3_secret_key"
  }
}

Set the environment variables for the configuration:

export DRAGONFLY_ENDPOINT_CONFIG=/etc/dragonfly-endpoint/config.json

Mount the model-store and dragonfly-endpoint configuration directory. Run the container:

sudo docker run --rm -it --network host \
-v $(pwd)/model-store:/home/model-server/model-store \
-v ${DRAGONFLY_ENDPOINT_CONFIG}:${DRAGONFLY_ENDPOINT_CONFIG} \
dragonflyoss/dragonfly-endpoint:latest

How to Verify

Prepare the model. Download a model from the Model ZOO or package a model for TorchServe; refer to Torch Model archiver. Use the squeezenet1_1_scripted.mar model to verify:

wget https://torchserve.pytorch.org/mar_files/squeezenet1_1_scripted.mar

Upload the model to object storage. For details on uploading the model to S3, please refer to S3.

# Download the command line tool
pip install awscli

# Configure the key as prompted
aws configure

# Upload file
aws s3 cp <local file path> s3://<bucket name>/<Target path>

The TorchServe plugin is named dragonfly; please refer to the TorchServe Register API for details of the plugin API. The url parameter is not supported; instead, add the file_name parameter, which is the model file name to download.

Download a model:

curl -X POST  "http://localhost:8081/dragonfly/models?file_name=squeezenet1_1.mar"

Verify that the model downloaded successfully:

{"Status": "Model \"squeezenet1_1\" Version: 1.0 registered with 0 initial workers. Use scale workers API to add workers for the model."}

Add a model worker for inference:

curl -v -X PUT "http://localhost:8081/models/squeezenet1_1?min_worker=1"

Check that the number of workers has increased:

* About to connect() to localhost port 8081 (#0)
* Trying ::1...
* Connected to localhost (::1) port 8081 (#0)
> PUT /models/squeezenet1_1?min_worker=1 HTTP/1.1
> User-Agent: curl/7.29.0
> Host: localhost:8081
> Accept: */*
>
< HTTP/1.1 202 Accepted
< content-type: application/json
< x-request-id: 66761b5a-54a7-4626-9aa4-12041e0e4e63
< Pragma: no-cache
< Cache-Control: no-cache; no-store, must-revalidate, private
< Expires: Thu, 01 Jan 1970 00:00:00 UTC
< content-length: 47
< connection: keep-alive
<
{ "status": "Processing worker updates..."}
* Connection #0 to host localhost left intact

Call inference API:

# Prepare pictures that require reasoning
curl -O https://raw.githubusercontent.com/pytorch/serve/master/docs/images/kitten_small.jpg
curl -O https://raw.githubusercontent.com/pytorch/serve/master/docs/images/dogs-before.jpg

# Call inference API
curl http://localhost:8080/predictions/squeezenet1_1 -T kitten_small.jpg -T dogs-before.jpg

Check that the response is successful:

{
"lynx": 0.5455784201622009,
"tabby": 0.2794168293476105,
"Egyptian_cat": 0.10391931980848312,
"tiger_cat": 0.062633216381073,
"leopard": 0.005019133910536766
}

Performance testing

Test the performance of single-machine model download through the TorchServe API after the integration of Dragonfly P2P. Due to the influence of the network environment of the machine itself, the actual download time is not important, but the ratio of download times in different scenarios is very important.

Bar chart showing TorchServe API, TorchServe API & Dragonfly Cold Boot, Hit Dragonfly Remote Peer Cache and Hit Dragonfly Local Peer Cache performance based on time to download

  • TorchServe API: Use signed URL provided by Object Storage to download the model directly.
  • TorchServe API & Dragonfly Cold Boot: Use TorchServe API to download model via Dragonfly P2P network and no cache hits.
  • Hit Remote Peer: Use TorchServe API to download model via Dragonfly P2P network and hit the remote peer cache.
  • Hit Local Peer: Use TorchServe API to download model via Dragonfly P2P network and hit the local peer cache.

Test results show that the TorchServe and Dragonfly integration can effectively reduce the file download time. Note that this test was a single-machine test, which means that in the case of cache hits, the performance limitation is on the disk. If Dragonfly is deployed on multiple machines for P2P download, the model download speed will be faster.

Dragonfly community

Pytorch

TorchServe Github Repo: https://github.com/pytorch/serve

Hugging Face accelerates distribution of models and datasets based on Dragonfly

· 10 min read

CNCF projects highlighted in this post, and migrated by mingcheng.

This document will help you experience how to use Dragonfly with Hugging Face. When downloading datasets or models, the files are large and many services download them at the same time, so the bandwidth of the storage reaches its limit and downloads become slow.

Diagram flow showing Hugging Face Hub flow from Cluster A and Cluster B

Dragonfly can be used to eliminate the bandwidth limit of the storage through P2P technology, thereby accelerating file downloading.

Diagram flow showing Hugging Face Hub flow from Cluster A and Cluster B

Prerequisites

Notice: Kind is recommended if no kubernetes cluster is available for testing.

Install dragonfly

For detailed installation documentation based on kubernetes cluster, please refer to quick-start-kubernetes.

Setup kubernetes cluster

Create kind multi-node cluster configuration file kind-config.yaml, configuration content is as follows:

kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
  - role: control-plane
  - role: worker
    extraPortMappings:
      - containerPort: 30950
        hostPort: 65001
  - role: worker

Create a kind multi-node cluster using the configuration file:

kind create cluster --config kind-config.yaml

Switch the context of kubectl to kind cluster:

kubectl config use-context kind-kind

Kind loads dragonfly image

Pull dragonfly latest images:

docker pull dragonflyoss/scheduler:latest
docker pull dragonflyoss/manager:latest
docker pull dragonflyoss/dfdaemon:latest

Kind cluster loads dragonfly latest images:

kind load docker-image dragonflyoss/scheduler:latest
kind load docker-image dragonflyoss/manager:latest
kind load docker-image dragonflyoss/dfdaemon:latest

Create dragonfly cluster based on helm charts

Create helm charts configuration file charts-config.yaml and set dfdaemon.config.proxy.registryMirror.url to the address of the Hugging Face Hub’s LFS server, configuration content is as follows:

scheduler:
  replicas: 1
  metrics:
    enable: true
  config:
    verbose: true
    pprofPort: 18066

seedPeer:
  replicas: 1
  metrics:
    enable: true
  config:
    verbose: true
    pprofPort: 18066

dfdaemon:
  metrics:
    enable: true
  hostNetwork: true
  config:
    verbose: true
    pprofPort: 18066
    proxy:
      defaultFilter: 'Expires&Key-Pair-Id&Policy&Signature'
      security:
        insecure: true
      tcpListen:
        listen: 0.0.0.0
        port: 65001
      registryMirror:
        # When enable, using header "X-Dragonfly-Registry" for remote instead of url.
        dynamic: true
        # URL for the registry mirror.
        url: https://cdn-lfs.huggingface.co
        # Whether to ignore https certificate errors.
        insecure: true
        # Optional certificates if the remote server uses self-signed certificates.
        certs: []
        # Whether to request the remote registry directly.
        direct: false
        # Whether to use proxies to decide if dragonfly should be used.
        useProxies: true
      proxies:
        - regx: repos.*
          useHTTPS: true

manager:
  replicas: 1
  metrics:
    enable: true
  config:
    verbose: true
    pprofPort: 18066

Create a dragonfly cluster using the configuration file:

$ helm repo add dragonfly https://dragonflyoss.github.io/helm-charts/
$ helm install --wait --create-namespace --namespace dragonfly-system dragonfly dragonfly/dragonfly -f charts-config.yaml
NAME: dragonfly
LAST DEPLOYED: Wed Oct 19 04:23:22 2022
NAMESPACE: dragonfly-system
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
1. Get the scheduler address by running these commands:
export SCHEDULER_POD_NAME=$(kubectl get pods --namespace dragonfly-system -l "app=dragonfly,release=dragonfly,component=scheduler" -o jsonpath={.items[0].metadata.name})
export SCHEDULER_CONTAINER_PORT=$(kubectl get pod --namespace dragonfly-system $SCHEDULER_POD_NAME -o jsonpath="{.spec.containers[0].ports[0].containerPort}")
kubectl --namespace dragonfly-system port-forward $SCHEDULER_POD_NAME 8002:$SCHEDULER_CONTAINER_PORT
echo "Visit http://127.0.0.1:8002 to use your scheduler"
2. Get the dfdaemon port by running these commands:
export DFDAEMON_POD_NAME=$(kubectl get pods --namespace dragonfly-system -l "app=dragonfly,release=dragonfly,component=dfdaemon" -o jsonpath={.items[0].metadata.name})
export DFDAEMON_CONTAINER_PORT=$(kubectl get pod --namespace dragonfly-system $DFDAEMON_POD_NAME -o jsonpath="{.spec.containers[0].ports[0].containerPort}")
You can use $DFDAEMON_CONTAINER_PORT as a proxy port in Node.
3. Configure runtime to use dragonfly:
https://d7y.io/docs/getting-started/quick-start/kubernetes/

Check that dragonfly is deployed successfully:

$ kubectl get po -n dragonfly-system
NAME READY STATUS RESTARTS AGE
dragonfly-dfdaemon-rhnr6 1/1 Running 4 (101s ago) 3m27s
dragonfly-dfdaemon-s6sv5 1/1 Running 5 (111s ago) 3m27s
dragonfly-manager-67f97d7986-8dgn8 1/1 Running 0 3m27s
dragonfly-mysql-0 1/1 Running 0 3m27s
dragonfly-redis-master-0 1/1 Running 0 3m27s
dragonfly-redis-replicas-0 1/1 Running 1 (115s ago) 3m27s
dragonfly-redis-replicas-1 1/1 Running 0 95s
dragonfly-redis-replicas-2 1/1 Running 0 70s
dragonfly-scheduler-0 1/1 Running 0 3m27s
dragonfly-seed-peer-0 1/1 Running 2 (95s ago) 3m27s

Create peer service configuration file peer-service-config.yaml, configuration content is as follows:

apiVersion: v1
kind: Service
metadata:
  name: peer
  namespace: dragonfly-system
spec:
  type: NodePort
  ports:
    - name: http-65001
      nodePort: 30950
      port: 65001
  selector:
    app: dragonfly
    component: dfdaemon
    release: dragonfly

Create a peer service using the configuration file:

kubectl apply -f peer-service-config.yaml

Use the Hub Python Library to download files and distribute traffic through Dragonfly

Any API in the Hub Python Library that uses the Requests library to download files can distribute the download traffic through the P2P network by mounting a DragonflyAdapter on the requests Session.

Download a single file with Dragonfly

A single file can be downloaded using hf_hub_download, distributing traffic through the Dragonfly peer.

Create the hf_hub_download_dragonfly.py file. It uses DragonflyAdapter to forward the LFS download requests to the Dragonfly HTTP proxy so that the P2P network can distribute the files. The content is as follows:

import requests
from requests.adapters import HTTPAdapter
from urllib.parse import urlparse
from huggingface_hub import hf_hub_download
from huggingface_hub import configure_http_backend

class DragonflyAdapter(HTTPAdapter):
    def get_connection(self, url, proxies=None):
        # Change the schema of the LFS request to download large files from https:// to http://,
        # so that Dragonfly HTTP proxy can be used.
        if url.startswith('https://cdn-lfs.huggingface.co'):
            url = url.replace('https://', 'http://')
        return super().get_connection(url, proxies)

    def add_headers(self, request, kwargs):
        super().add_headers(request, kwargs)
        # If there are multiple different LFS repositories, you can override the
        # default repository address by adding X-Dragonfly-Registry header.
        if request.url.find('example.com') != -1:
            request.headers["X-Dragonfly-Registry"] = 'https://example.com'

# Create a factory function that returns a new Session.
def backend_factory() -> requests.Session:
    session = requests.Session()
    session.mount('http://', DragonflyAdapter())
    session.mount('https://', DragonflyAdapter())
    session.proxies = {'http': 'http://127.0.0.1:65001'}
    return session

# Set it as the default session factory
configure_http_backend(backend_factory=backend_factory)

hf_hub_download(repo_id="tiiuae/falcon-rw-1b", filename="pytorch_model.bin")

Download a single file of the LFS protocol with Dragonfly:

$ python3 hf_hub_download_dragonfly.py
(…)YkNX13a46FCg__&Key-Pair-Id=KVTP0A1DKRTAX: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2.62G/2.62G [00:52<00:00, 49.8MB/s]

Verify a single file download with Dragonfly

Execute the command:

# find pods
kubectl -n dragonfly-system get pod -l component=dfdaemon

# find logs
pod_name=dfdaemon-xxxxx
kubectl -n dragonfly-system exec -it ${pod_name} -- grep "peer task done" /var/log/dragonfly/daemon/core.log

Example output:

peer task done, cost: 28349ms   {"peer": "89.116.64.101-77008-a95a6918-a52b-47f5-9b18-cec6ada03daf", "task": "2fe93348699e07ab67823170925f6be579a3fbc803ff3d33bf9278a60b08d901", "component": "PeerTask", "trace": "b34ed802b7afc0f4acd94b2cedf3fa2a"}

Download a snapshot of the repo with Dragonfly

A snapshot of the repo can be downloaded using snapshot_download, distributing traffic through the Dragonfly peer.

Create the snapshot_download_dragonfly.py file. It uses DragonflyAdapter to forward the LFS download requests to the Dragonfly HTTP proxy so that the P2P network can distribute the files. Only files downloaded over the LFS protocol are distributed through the Dragonfly P2P network. The content is as follows:

import requests
from requests.adapters import HTTPAdapter
from urllib.parse import urlparse
from huggingface_hub import snapshot_download
from huggingface_hub import configure_http_backend

class DragonflyAdapter(HTTPAdapter):
    def get_connection(self, url, proxies=None):
        # Change the schema of the LFS request to download large files from https:// to http://,
        # so that Dragonfly HTTP proxy can be used.
        if url.startswith('https://cdn-lfs.huggingface.co'):
            url = url.replace('https://', 'http://')
        return super().get_connection(url, proxies)

    def add_headers(self, request, kwargs):
        super().add_headers(request, kwargs)
        # If there are multiple different LFS repositories, you can override the
        # default repository address by adding X-Dragonfly-Registry header.
        if request.url.find('example.com') != -1:
            request.headers["X-Dragonfly-Registry"] = 'https://example.com'

# Create a factory function that returns a new Session.
def backend_factory() -> requests.Session:
    session = requests.Session()
    session.mount('http://', DragonflyAdapter())
    session.mount('https://', DragonflyAdapter())
    session.proxies = {'http': 'http://127.0.0.1:65001'}
    return session

# Set it as the default session factory
configure_http_backend(backend_factory=backend_factory)

snapshot_download(repo_id="tiiuae/falcon-rw-1b")

Download a snapshot of the repo with Dragonfly:

$ python3 snapshot_download_dragonfly.py
(…)03165eb22f0a867d4e6a64d34fce19/README.md: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 7.60k/7.60k [00:00<00:00, 374kB/s]
(…)7d4e6a64d34fce19/configuration_falcon.py: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 6.70k/6.70k [00:00<00:00, 762kB/s]
(…)f0a867d4e6a64d34fce19/modeling_falcon.py: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 56.9k/56.9k [00:00<00:00, 5.35MB/s]
(…)3165eb22f0a867d4e6a64d34fce19/merges.txt: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 456k/456k [00:00<00:00, 9.07MB/s]
(…)867d4e6a64d34fce19/tokenizer_config.json: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 234/234 [00:00<00:00, 106kB/s]
(…)eb22f0a867d4e6a64d34fce19/tokenizer.json: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2.11M/2.11M [00:00<00:00, 27.7MB/s]
(…)3165eb22f0a867d4e6a64d34fce19/vocab.json: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 798k/798k [00:00<00:00, 19.7MB/s]
(…)7d4e6a64d34fce19/special_tokens_map.json: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 99.0/99.0 [00:00<00:00, 45.3kB/s]
(…)67d4e6a64d34fce19/generation_config.json: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 115/115 [00:00<00:00, 5.02kB/s]
(…)165eb22f0a867d4e6a64d34fce19/config.json: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1.05k/1.05k [00:00<00:00, 75.9kB/s]
(…)eb22f0a867d4e6a64d34fce19/.gitattributes: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1.48k/1.48k [00:00<00:00, 171kB/s]
(…)t-oSSW23tawg__&Key-Pair-Id=KVTP0A1DKRTAX: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2.62G/2.62G [00:50<00:00, 52.1MB/s]
Fetching 12 files: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 12/12 [00:50<00:00, 4.23s/it]

Verify a snapshot of the repo download with Dragonfly

Execute the command:

# find pods
kubectl -n dragonfly-system get pod -l component=dfdaemon

# find logs
pod_name=dfdaemon-xxxxx
kubectl -n dragonfly-system exec -it ${pod_name} -- grep "peer task done" /var/log/dragonfly/daemon/core.log

Example output:

peer task done, cost: 28349ms   {"peer": "89.116.64.101-77008-a95a6918-a52b-47f5-9b18-cec6ada03daf", "task": "2fe93348699e07ab67823170925f6be579a3fbc803ff3d33bf9278a60b08d901", "component": "PeerTask", "trace": "b34ed802b7afc0f4acd94b2cedf3fa2a"}

Performance testing

Test the performance of single-machine file download through the hf_hub_download API after integrating the Hugging Face Python Library with Dragonfly P2P. Because the machine's own network environment affects the absolute numbers, the actual download time matters less than the relative change in download time across the different scenarios.

Bar chart showing performance testing result

  • Hugging Face Python Library: Use hf_hub_download API to download models directly.
  • Hugging Face Python Library & Dragonfly Cold Boot: Use hf_hub_download API to download models via Dragonfly P2P network and no cache hits.
  • Hit Dragonfly Remote Peer Cache: Use hf_hub_download API to download models via Dragonfly P2P network and hit the remote peer cache.
  • Hit Dragonfly Local Peer Cache: Use hf_hub_download API to download models via Dragonfly P2P network and hit the local peer cache.
  • Hit Hugging Face Cache: Use hf_hub_download API to download models via Dragonfly P2P network and hit the Hugging Face local cache.

Test results show that the Hugging Face Python Library and Dragonfly P2P integration can effectively reduce the file download time. Note that this was a single-machine test, which means that in the case of cache hits the performance is limited by the disk. If Dragonfly is deployed on multiple machines for P2P download, the model download speed will be even faster.

Dragonfly community

Hugging Face

Dragonfly completes security audit!

· 3 min read

This summer, over four engineer weeks, Trail of Bits and OSTIF collaborated on a security audit of dragonfly. A CNCF Incubating Project, dragonfly is a file distribution system based on peer-to-peer technology. Included in the scope was the sub-project Nydus’s repository, which handles image distribution. The engagement was outlined and framed around several goals relevant to the security and longevity of the project as it moves towards graduation.

The Trail of Bits audit team approached the audit by using static and manual testing with automated and manual processes. By introducing semgrep and CodeQL tooling, performing a manual review of client, scheduler, and manager code, and fuzz testing on the gRPC handlers, the audit team was able to identify a variety of findings for the project to improve their security. In focusing efforts on high-level business logic and externally accessible endpoints, the Trail of Bits audit team was able to direct their focus during the audit and provide guidance and recommendations for dragonfly’s future work.

Recorded in the audit report are 19 findings. Five of the findings were ranked as high, one as medium, four low, five informational, and four were considered undetermined. Nine of the findings were categorized as Data Validation, three of which were high severity. Ranked and reviewed as well was dragonfly’s Codebase Maturity, comprising eleven aspects of project code which are analyzed individually in the report.

This is a large project and could not be reviewed in total due to time constraints and scope; multiple specialized features were outside the scope of this audit for those reasons. This project is a great opportunity for continued audit work to improve and elevate code and harden security before graduation. Ongoing security efforts are critical, as security is a moving target.

We would like to thank the Trail of Bits team, particularly Dan Guido, Jeff Braswell, Paweł Płatek, and Sam Alws for their work on this project. Thank you to the dragonfly maintainers and contributors, specifically Wenbo Qi, for their ongoing work and contributions to this engagement. Finally, we are grateful to the CNCF for funding this audit and supporting open source security efforts.

OSTIF & Trail of Bits

Dragonfly community

Nydus community

Using dragonfly to distribute images and files for multi-cluster kubernetes

· 14 min read

Posted on September 1, 2023

CNCF projects highlighted in this post, and migrated by mingcheng.

Dragonfly provides efficient, stable, and secure file distribution and image acceleration based on P2P technology, and aims to be the best practice and standard solution in cloud native architectures. It is hosted by the Cloud Native Computing Foundation (CNCF) as an Incubating Level Project.

This article introduces the deployment of dragonfly for multi-cluster kubernetes. A dragonfly cluster manages the clusters within a single network. If you have two clusters with disconnected networks, you can use two dragonfly clusters to manage them separately.

The recommended deployment for multi-cluster kubernetes is to use one dragonfly cluster to manage each kubernetes cluster, and a centralized manager service to manage multiple dragonfly clusters. Because a peer can only transmit data within its own dragonfly cluster, each kubernetes cluster that deploys a dragonfly cluster forms its own P2P network, and internal peers can only schedule and transmit data within that kubernetes cluster.

Screenshot showing diagram flow between Network A / Kubernetes Cluster A and Network B / Kubernetes Cluster B towards Manager

Setup kubernetes cluster

Kind is recommended if no Kubernetes cluster is available for testing.

Create kind cluster configuration file kind-config.yaml, configuration content is as follows:

kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
  - role: control-plane
  - role: worker
    extraPortMappings:
      - containerPort: 30950
        hostPort: 8080
    labels:
      cluster: a
  - role: worker
    labels:
      cluster: a
  - role: worker
    labels:
      cluster: b
  - role: worker
    labels:
      cluster: b

Create cluster using the configuration file:

kind create cluster --config kind-config.yaml

Switch the context of kubectl to kind cluster A:

kubectl config use-context kind-kind

Kind loads dragonfly image

Pull dragonfly latest images:

docker pull dragonflyoss/scheduler:latest
docker pull dragonflyoss/manager:latest
docker pull dragonflyoss/dfdaemon:latest

Kind cluster loads dragonfly latest images:

kind load docker-image dragonflyoss/scheduler:latest
kind load docker-image dragonflyoss/manager:latest
kind load docker-image dragonflyoss/dfdaemon:latest

Create dragonfly cluster A

To create dragonfly cluster A, the schedulers, seed peers, peers, and centralized manager included in the cluster should be installed using helm.

Create dragonfly cluster A based on helm charts

Create dragonfly cluster A charts configuration file charts-config-cluster-a.yaml, configuration content is as follows:

containerRuntime:
  containerd:
    enable: true
    injectConfigPath: true
    registries:
      - 'https://ghcr.io'
scheduler:
  image: dragonflyoss/scheduler
  tag: latest
  nodeSelector:
    cluster: a
  replicas: 1
  metrics:
    enable: true
  config:
    verbose: true
    pprofPort: 18066
seedPeer:
  image: dragonflyoss/dfdaemon
  tag: latest
  nodeSelector:
    cluster: a
  replicas: 1
  metrics:
    enable: true
  config:
    verbose: true
    pprofPort: 18066
dfdaemon:
  image: dragonflyoss/dfdaemon
  tag: latest
  nodeSelector:
    cluster: a
  metrics:
    enable: true
  config:
    verbose: true
    pprofPort: 18066
manager:
  image: dragonflyoss/manager
  tag: latest
  nodeSelector:
    cluster: a
  replicas: 1
  metrics:
    enable: true
  config:
    verbose: true
    pprofPort: 18066
jaeger:
  enable: true

Create dragonfly cluster A using the configuration file:

$ helm repo add dragonfly https://dragonflyoss.github.io/helm-charts/
$ helm install --wait --create-namespace --namespace cluster-a dragonfly dragonfly/dragonfly -f charts-config-cluster-a.yaml
NAME: dragonfly
LAST DEPLOYED: Mon Aug 7 22:07:02 2023
NAMESPACE: cluster-a
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
1. Get the scheduler address by running these commands:
export SCHEDULER_POD_NAME=$(kubectl get pods --namespace cluster-a -l "app=dragonfly,release=dragonfly,component=scheduler" -o jsonpath={.items[0].metadata.name})
export SCHEDULER_CONTAINER_PORT=$(kubectl get pod --namespace cluster-a $SCHEDULER_POD_NAME -o jsonpath="{.spec.containers[0].ports[0].containerPort}")
kubectl --namespace cluster-a port-forward $SCHEDULER_POD_NAME 8002:$SCHEDULER_CONTAINER_PORT
echo "Visit http://127.0.0.1:8002 to use your scheduler"

2. Get the dfdaemon port by running these commands:
export DFDAEMON_POD_NAME=$(kubectl get pods --namespace cluster-a -l "app=dragonfly,release=dragonfly,component=dfdaemon" -o jsonpath={.items[0].metadata.name})
export DFDAEMON_CONTAINER_PORT=$(kubectl get pod --namespace cluster-a $DFDAEMON_POD_NAME -o jsonpath="{.spec.containers[0].ports[0].containerPort}")
You can use $DFDAEMON_CONTAINER_PORT as a proxy port in Node.

3. Configure runtime to use dragonfly:
https://d7y.io/docs/getting-started/quick-start/kubernetes/

4. Get Jaeger query URL by running these commands:
export JAEGER_QUERY_PORT=$(kubectl --namespace cluster-a get services dragonfly-jaeger-query -o jsonpath="{.spec.ports[0].port}")
kubectl --namespace cluster-a port-forward service/dragonfly-jaeger-query 16686:$JAEGER_QUERY_PORT
echo "Visit http://127.0.0.1:16686/search?limit=20&lookback=1h&maxDuration&minDuration&service=dragonfly to query download events"

Check that dragonfly cluster A is deployed successfully:

$ kubectl get po -n cluster-a
NAME READY STATUS RESTARTS AGE
dragonfly-dfdaemon-7t6wc 1/1 Running 0 3m18s
dragonfly-dfdaemon-r45bk 1/1 Running 0 3m18s
dragonfly-jaeger-84dbfd5b56-fmhh6 1/1 Running 0 3m18s
dragonfly-manager-75f4c54d6d-tr88v 1/1 Running 0 3m18s
dragonfly-mysql-0 1/1 Running 0 3m18s
dragonfly-redis-master-0 1/1 Running 0 3m18s
dragonfly-redis-replicas-0 1/1 Running 1 (2m ago) 3m18s
dragonfly-redis-replicas-1 1/1 Running 0 96s
dragonfly-redis-replicas-2 1/1 Running 0 45s
dragonfly-scheduler-0 1/1 Running 0 3m18s
dragonfly-seed-peer-0 1/1 Running 1 (37s ago) 3m18s

Create NodePort service of the manager REST service

Create the manager REST service configuration file manager-rest-svc.yaml, configuration content is as follows:

apiVersion: v1
kind: Service
metadata:
  name: manager-rest
  namespace: cluster-a
spec:
  type: NodePort
  ports:
    - name: http
      nodePort: 30950
      port: 8080
  selector:
    app: dragonfly
    component: manager
    release: dragonfly

Create manager REST service using the configuration file:

kubectl apply -f manager-rest-svc.yaml -n cluster-a

Visit manager console

Visit localhost:8080 to see the manager console. Sign in to the console with the default root user; the username is root and the password is dragonfly.

Screenshot showing Dragonfly welcome back page

Screenshot showing Dragonfly cluster page

By default, Dragonfly will automatically create the dragonfly cluster A record in the manager when it is installed for the first time. You can click dragonfly cluster A to view the details.

Screenshot showing Cluster-1 page on Dragonfly

Create dragonfly cluster B

To create dragonfly cluster B, you need to create a dragonfly cluster record in the manager console first; the schedulers, seed peers, and peers included in the dragonfly cluster should be installed using helm.

Create dragonfly cluster B in the manager console

Visit manager console and click the ADD CLUSTER button to add dragonfly cluster B record. Note that the IDC is set to cluster-2 to match the peer whose IDC is cluster-2.

Screenshot showing Create Cluster page on Dragonfly

Create dragonfly cluster B record successfully.

Screenshot showing Cluster page on Dragonfly

Use scopes to distinguish different dragonfly clusters

The dragonfly cluster needs to serve a scope. It will provide scheduler services and seed peer services to the peers in that scope. The scopes of the dragonfly cluster are configured when the cluster is created or updated in the console. The scopes of the peer are configured in the peer YAML config; the fields are host.idc, host.location and host.advertiseIP, refer to dfdaemon config.

If the peer scopes match the dragonfly cluster scopes, then the peer will use the dragonfly cluster’s scheduler and seed peer first, and if there is no matching dragonfly cluster then use the default dragonfly cluster.

Location: The dragonfly cluster needs to serve all peers in the location. When the location in the peer configuration matches the location in the dragonfly cluster, the peer will preferentially use the scheduler and the seed peer of the dragonfly cluster. It is separated by “|”, for example “area|country|province|city”.

IDC: The dragonfly cluster needs to serve all peers in the IDC. When the IDC in the peer configuration matches the IDC in the dragonfly cluster, the peer will preferentially use the scheduler and the seed peer of the dragonfly cluster. IDC has higher priority than location in the scopes.

CIDRs: The dragonfly cluster needs to serve all peers in the CIDRs. The advertise IP is reported in the peer configuration when the peer starts, and if the advertise IP is empty in the peer configuration, the peer will automatically use its exposed IP as the advertise IP. When the advertise IP of the peer matches the CIDRs in the dragonfly cluster, the peer will preferentially use the scheduler and the seed peer of the dragonfly cluster. CIDRs have higher priority than IDC in the scopes.
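
To make the precedence of these scopes concrete, the following is a simplified Python sketch of the matching order described above (CIDRs first, then IDC, then location). It is an illustration only, not the manager's actual implementation, and the peer and cluster records used here are assumptions for the example:

import ipaddress

# Illustrative only: simplified scope matching with the precedence CIDRs > IDC > location.
def match_cluster(peer, clusters, default_cluster):
    # 1. CIDRs have the highest priority: match the peer's advertise IP.
    for cluster in clusters:
        if any(
            ipaddress.ip_address(peer["advertise_ip"]) in ipaddress.ip_network(cidr)
            for cidr in cluster.get("cidrs", [])
        ):
            return cluster
    # 2. IDC is matched next.
    for cluster in clusters:
        if peer["idc"] and peer["idc"] == cluster.get("idc"):
            return cluster
    # 3. Location is matched last, e.g. "area|country|province|city".
    for cluster in clusters:
        if peer["location"] and peer["location"] == cluster.get("location"):
            return cluster
    # 4. Fall back to the default dragonfly cluster.
    return default_cluster

peer = {"advertise_ip": "10.0.1.23", "idc": "cluster-2", "location": "asia|china|shanghai"}
clusters = [
    {"name": "cluster-1", "cidrs": ["10.0.0.0/16"]},
    {"name": "cluster-2", "idc": "cluster-2"},
]
# The CIDR match wins even though the IDC also matches cluster-2.
print(match_cluster(peer, clusters, {"name": "default"})["name"])  # cluster-1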

Create dragonfly cluster B based on helm charts

Create charts configuration with cluster information in the manager console.

Screenshot showing Cluster-2 page on Dragonfly

  • scheduler.config.manager.schedulerClusterID uses the scheduler cluster ID from the cluster-2 information in the manager console.
  • scheduler.config.manager.addr is the address of the manager GRPC server.
  • seedPeer.config.scheduler.manager.seedPeer.clusterID uses the seed peer cluster ID from the cluster-2 information in the manager console.
  • seedPeer.config.scheduler.manager.netAddrs[0].addr is the address of the manager GRPC server.
  • dfdaemon.config.host.idc uses the IDC from the cluster-2 information in the manager console.
  • dfdaemon.config.scheduler.manager.netAddrs[0].addr is the address of the manager GRPC server.
  • externalManager.host is the host of the manager GRPC server.
  • externalRedis.addrs[0] is the address of Redis.

Create dragonfly cluster B charts configuration file charts-config-cluster-b.yaml, configuration content is as follows:

containerRuntime:
  containerd:
    enable: true
    injectConfigPath: true
    registries:
      - 'https://ghcr.io'
scheduler:
  image: dragonflyoss/scheduler
  tag: latest
  nodeSelector:
    cluster: b
  replicas: 1
  config:
    manager:
      addr: dragonfly-manager.cluster-a.svc.cluster.local:65003
      schedulerClusterID: 2
seedPeer:
  image: dragonflyoss/dfdaemon
  tag: latest
  nodeSelector:
    cluster: b
  replicas: 1
  config:
    scheduler:
      manager:
        netAddrs:
          - type: tcp
            addr: dragonfly-manager.cluster-a.svc.cluster.local:65003
        seedPeer:
          enable: true
          clusterID: 2
dfdaemon:
  image: dragonflyoss/dfdaemon
  tag: latest
  nodeSelector:
    cluster: b
  config:
    host:
      idc: cluster-2
    scheduler:
      manager:
        netAddrs:
          - type: tcp
            addr: dragonfly-manager.cluster-a.svc.cluster.local:65003
manager:
  enable: false
externalManager:
  enable: true
  host: dragonfly-manager.cluster-a.svc.cluster.local
  restPort: 8080
  grpcPort: 65003
redis:
  enable: false
externalRedis:
  addrs:
    - dragonfly-redis-master.cluster-a.svc.cluster.local:6379
  password: dragonfly
mysql:
  enable: false
jaeger:
  enable: true

Create dragonfly cluster B using the configuration file:

$ helm install --wait --create-namespace --namespace cluster-b dragonfly dragonfly/dragonfly -f charts-config-cluster-b.yaml
NAME: dragonfly
LAST DEPLOYED: Mon Aug 7 22:13:51 2023
NAMESPACE: cluster-b
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
1. Get the scheduler address by running these commands:
export SCHEDULER_POD_NAME=$(kubectl get pods --namespace cluster-b -l "app=dragonfly,release=dragonfly,component=scheduler" -o jsonpath={.items[0].metadata.name})
export SCHEDULER_CONTAINER_PORT=$(kubectl get pod --namespace cluster-b $SCHEDULER_POD_NAME -o jsonpath="{.spec.containers[0].ports[0].containerPort}")
kubectl --namespace cluster-b port-forward $SCHEDULER_POD_NAME 8002:$SCHEDULER_CONTAINER_PORT
echo "Visit http://127.0.0.1:8002 to use your scheduler"

2. Get the dfdaemon port by running these commands:
export DFDAEMON_POD_NAME=$(kubectl get pods --namespace cluster-b -l "app=dragonfly,release=dragonfly,component=dfdaemon" -o jsonpath={.items[0].metadata.name})
export DFDAEMON_CONTAINER_PORT=$(kubectl get pod --namespace cluster-b $DFDAEMON_POD_NAME -o jsonpath="{.spec.containers[0].ports[0].containerPort}")
You can use $DFDAEMON_CONTAINER_PORT as a proxy port in Node.

3. Configure runtime to use dragonfly:
https://d7y.io/docs/getting-started/quick-start/kubernetes/

4. Get Jaeger query URL by running these commands:
export JAEGER_QUERY_PORT=$(kubectl --namespace cluster-b get services dragonfly-jaeger-query -o jsonpath="{.spec.ports[0].port}")
kubectl --namespace cluster-b port-forward service/dragonfly-jaeger-query 16686:$JAEGER_QUERY_PORT
echo "Visit http://127.0.0.1:16686/search?limit=20&lookback=1h&maxDuration&minDuration&service=dragonfly to query download events"

Check that dragonfly cluster B is deployed successfully:

$ kubectl get po -n cluster-b
NAME READY STATUS RESTARTS AGE
dragonfly-dfdaemon-q8bsg 1/1 Running 0 67s
dragonfly-dfdaemon-tsqls 1/1 Running 0 67s
dragonfly-jaeger-84dbfd5b56-rg5dv 1/1 Running 0 67s
dragonfly-scheduler-0 1/1 Running 0 67s
dragonfly-seed-peer-0 1/1 Running 0 67s

Create dragonfly cluster B successfully.

Screenshot showing Cluster-2 page on Dragonfly

Using dragonfly to distribute images for multi-cluster kubernetes

Containerd pulls the image back-to-source for the first time through dragonfly in cluster A

Pull ghcr.io/dragonflyoss/dragonfly2/scheduler:v2.0.5 image in kind-worker node:

docker exec -i kind-worker /usr/local/bin/crictl pull ghcr.io/dragonflyoss/dragonfly2/scheduler:v2.0.5

Expose jaeger’s port 16686:

kubectl --namespace cluster-a port-forward service/dragonfly-jaeger-query 16686:16686

Visit the Jaeger page at http://127.0.0.1:16686/search and search for traces with the tag http.url="/v2/dragonflyoss/dragonfly2/scheduler/blobs/sha256:82cbeb56bf8065dfb9ff5a0c6ea212ab3a32f413a137675df59d496e68eaf399?ns=ghcr.io":

Screenshot showing Jaeger page (dragonfly-dfget)

Tracing details:

Screenshot showing dragonfly-dfget tracing details on Jaeger UI

When the image is pulled back-to-source for the first time through dragonfly, the peer uses cluster-a’s scheduler and seed peer. It takes 1.47s to download the 82cbeb56bf8065dfb9ff5a0c6ea212ab3a32f413a137675df59d496e68eaf399 layer.

Containerd pulls the image and hits the cache of a remote peer in cluster A

Pull ghcr.io/dragonflyoss/dragonfly2/scheduler:v2.0.5 image in kind-worker2 node:

docker exec -i kind-worker2 /usr/local/bin/crictl pull ghcr.io/dragonflyoss/dragonfly2/scheduler:v2.0.5

Expose jaeger’s port 16686:

kubectl --namespace cluster-a port-forward service/dragonfly-jaeger-query 16686:16686

Visit the Jaeger page at http://127.0.0.1:16686/search and search for traces with the tag http.url="/v2/dragonflyoss/dragonfly2/scheduler/blobs/sha256:82cbeb56bf8065dfb9ff5a0c6ea212ab3a32f413a137675df59d496e68eaf399?ns=ghcr.io":

Screenshot showing dragonfly-dfget on Jaeger UI

Tracing details:

Screenshot showing dragonfly-dfget Tracing Details on Jaeger UI

When the image pull hits the cache of a remote peer, the peer uses cluster-a’s scheduler and seed peer. It takes 37.48ms to download the 82cbeb56bf8065dfb9ff5a0c6ea212ab3a32f413a137675df59d496e68eaf399 layer.

Containerd pulls the image back-to-source for the first time through dragonfly in cluster B

Pull ghcr.io/dragonflyoss/dragonfly2/scheduler:v2.0.5 image in kind-worker3 node:

docker exec -i kind-worker3 /usr/local/bin/crictl pull ghcr.io/dragonflyoss/dragonfly2/scheduler:v2.0.5

Expose jaeger’s port 16686:

kubectl --namespace cluster-b port-forward service/dragonfly-jaeger-query 16686:16686

Visit the Jaeger page at http://127.0.0.1:16686/search and search for traces with the tag http.url="/v2/dragonflyoss/dragonfly2/scheduler/blobs/sha256:82cbeb56bf8065dfb9ff5a0c6ea212ab3a32f413a137675df59d496e68eaf399?ns=ghcr.io":

Screenshot showing dragonfly-dfget on Jaeger UI

Tracing details:

Screenshot showing dragonfly-dfget Tracing Details on Jaeger UI

When the image is pulled back-to-source for the first time through dragonfly, the peer uses cluster-b’s scheduler and seed peer. It takes 4.97s to download the 82cbeb56bf8065dfb9ff5a0c6ea212ab3a32f413a137675df59d496e68eaf399 layer.

Containerd pulls the image and hits the cache of a remote peer in cluster B

Pull ghcr.io/dragonflyoss/dragonfly2/scheduler:v2.0.5 image in kind-worker4 node:

docker exec -i kind-worker4 /usr/local/bin/crictl pull ghcr.io/dragonflyoss/dragonfly2/scheduler:v2.0.5

Expose jaeger’s port 16686:

kubectl --namespace cluster-b port-forward service/dragonfly-jaeger-query 16686:16686

Visit the Jaeger page at http://127.0.0.1:16686/search and search for traces with the tag http.url="/v2/dragonflyoss/dragonfly2/scheduler/blobs/sha256:82cbeb56bf8065dfb9ff5a0c6ea212ab3a32f413a137675df59d496e68eaf399?ns=ghcr.io":

Screenshot showing dragonfly-dfget on Jaeger UI

Tracing details:

Screenshot showing dragonfly-dfget Tracing Details on Jaeger UI

When the image pull hits the cache of a remote peer, the peer uses cluster-b’s scheduler and seed peer. It takes 14.53ms to download the 82cbeb56bf8065dfb9ff5a0c6ea212ab3a32f413a137675df59d496e68eaf399 layer.

Dragonfly Website: https://d7y.io/

Dragonfly Github Repo: https://github.com/dragonflyoss/dragonfly

Dragonfly Slack Channel: #dragonfly on CNCF Slack

Dragonfly Discussion Group: dragonfly-discuss@googlegroups.com

Dragonfly Twitter: @dragonfly_oss

Nydus Website: https://nydus.dev/

Nydus Github Repo: https://github.com/dragonflyoss/image-service

Dragonfly v2.1.0 is released!

· 6 min read

CNCF projects highlighted in this post, and migrated by mingcheng.

Dragonfly v2.1.0 is released! 🎉🎉🎉 Thanks to Xinxin Zhao[1] for helping to refactor the console[2]; the manager now provides a new console for users to operate Dragonfly. Welcome to visit the d7y.io[3] website.

Announcement screenshot from Github mentioning "Dragonfly v2.1.0 is released!"

Features

  • Console v1.0.0[4] is released and it provides a new console for users to operate Dragonfly.
  • Add network topology feature and it can probe the network latency between peers, providing better scheduling capabilities.
  • Provides the ability to control the features of the scheduler in the manager. If the scheduler preheat feature is not in feature flags, then it will stop providing the preheating in the scheduler.
  • dfstore adds GetObjectMetadatas and CopyObject to support using Dragonfly as the JuiceFS backend.
  • Add the personal access tokens feature in the manager; a personal access token contains your security credentials for the RESTful open API.
  • Add TLS config to manager rest server.
  • Fix dfdaemon fails to start when there is no available scheduler address.
  • Add cluster in the manager and the cluster contains a scheduler cluster and a seed peer cluster.
  • Fix object downloads failing in dfstore when concurrent download is enabled in dfdaemon.
  • Scheduler adds database field in config and moves the redis config to database field.
  • Replace net.Dial with grpc health check in dfdaemon.
  • Fix filtering and evaluation in scheduling: because the final length of the filter result was used as the candidateParentLimit, the parents obtained after filtering were wrong.
  • Fix storage can not write records to file when bufferSize is zero.
  • Hiding sensitive information in logs, such as the token in the header.
  • Use unscoped delete when destroying the manager’s resources.
  • Add uk_scheduler index and uk_seed_peer index in the table of the database.
  • Remove security domain feature and security feature in the manager.
  • Add advertise port config to manager and scheduler.
  • Fix the FSM failing to change state when registering a task.

Break Change

  • The M:N relationship model between the scheduler cluster and the seed peer cluster is no longer supported. In the future, a P2P cluster will be a cluster in the manager, and a cluster will only include a scheduler cluster and a seed peer cluster.

Console

Screenshot showing Dragonfly Console welcome back page

You can see Manager Console[5] for more details.

AI Infrastructure

  • Triton Inference Server[6] uses Dragonfly to distribute model files, refer to #2185[7]. If there are developers who are interested in the dragonfly repository agent[8] project, please contact gaius.qi@gmail.com.
  • TorchServe[9] uses Dragonfly to distribute model files. Developers have already participated in the dragonfly endpoint[10] project, and the feature will be released in v2.1.1.
  • Fluid[11] downloads data through Dragonfly when running based on JuiceFS[12], the feature will be released in v2.1.1.
  • Dragonfly helps Volcano Engine AIGC inference to accelerate image through p2p technology[13].
  • There have been many cases in the community of using Dragonfly to distribute data in AI scenarios based on P2P technology. In the inference stage, concurrent model downloads by the inference service can effectively relieve the bandwidth pressure on the model registry through Dragonfly and improve the download speed. The community will share the topic 《Dragonfly: Intro, Updates and AI Model Distribution in the Practice of Kuaishou – Wenbo Qi, Ant Group & Zekun Liu, Kuaishou Technology》[14] with Kuaishou[15] at KubeCon + CloudNativeCon + Open Source Summit China 2023[16], please follow it if interested.

Maintainers

The community has added four new Maintainers, hoping to help more contributors participate in the community.

  • Yiyang Huang[17]: He works for Volcano Engine and will focus on the engineering work for Dragonfly.
  • Manxiang Wen[18]: He works for Baidu and will focus on the engineering work for Dragonfly.
  • Mohammed Farooq[19]: He works for Intel and will focus on the engineering work for Dragonfly.
  • Zhou Xu[20]: He is a PhD student at Dalian University of Technology and will focus on the intelligent scheduling algorithms.

Others

You can see CHANGELOG[21] for more details.

Ant Group security technology’s Nydus and Dragonfly image acceleration practices

· 24 min read

CNCF projects highlighted in this post, and migrated by mingcheng.

Introduction

ZOLOZ is a global security and risk management platform under Ant Group. Through biometric, big data analysis, and artificial intelligence technologies, ZOLOZ provides safe and convenient security and risk management solutions for users and institutions. ZOLOZ has provided security and risk management technology support for more than 70 partners in 14 countries and regions, including China, Indonesia, Malaysia, and the Philippines. It has already covered areas such as finance, insurance, securities, credit, telecommunications, and public services, and has served over 1.2 billion users.

With the explosion of Kubernetes and cloud-native, ZOLOZ applications have begun to be deployed on a large scale on public clouds using containerization. The images of ZOLOZ applications have been maintained and updated for a long time, and both the number of layers and the overall size have reached a large scale (hundreds of MBs or several GBs). In particular, the basic image size of ZOLOZ’s AI algorithm inference application is much larger than that of general application images (PyTorch/PyTorch:1.13.1-CUDA 11.6-cuDNN 8-Runtime on Docker Hub is 4.92GB, compared to CentOS:latest with only about 234MB).

For container cold start, i.e., when there is no image locally, the image needs to be downloaded from the registry before creating the container. In the production environment, container cold start often takes several minutes, and as the scale increases, the registry may be unable to download images quickly due to network congestion within the cluster. Such large images have brought many challenges to application updates and scaling. With the continuous promotion of containerization on public clouds, ZOLOZ applications mainly face three challenges:

  1. The algorithm image is large, and pushing it to the cloud image repository takes a long time. During the development process, when testing in the testing environment, developers often hope to iterate quickly and verify quickly. However, every time a branch is modified and released for verification, it takes several tens of minutes, which is very inefficient.
  2. Pulling the algorithm image takes a long time, and pulling many image files during cluster expansion can easily cause the cluster network card to be flooded and affect the normal operation of the business.
  3. The cluster machine takes a long time to start up, making it difficult to meet the needs of sudden traffic increases and elastic automatic scaling.

Although various compromise solutions have been attempted, these solutions all have their shortcomings. Now, in collaboration with multiple technical teams such as Ant Group, Alibaba Cloud, and ByteDance, a more universal solution on public clouds has been developed, which has low transformation costs and good performance, and currently appears to be an ideal solution.

Terminology

OCI: Open Container Initiative, a Linux Foundation project initiated by Docker in June 2015, aimed at designing open standards for operating system-level virtualization, most importantly Linux containers.

OCI Manifest: An artifact that follows the OCI Image Spec.

BuildKit: A new generation Docker build tool produced by Docker that is more efficient, Dockerfile-independent, and more suitable for cloud-native applications.

Image: In this article, the image refers to OCI Manifest, including Helm Chart and other OCI Manifests.

Image Repository: An artifact repository implemented in accordance with the OCI Distribution Spec.

ECS: A resource collection consisting of CPUs, memory, and cloud disks, with each type of resource logically corresponding to a computing hardware entity in a data center.

ACR: Alibaba Cloud’s image repository service.

ACK: Alibaba Cloud Container Service Kubernetes version provides high-performance and scalable container application management capabilities and supports full lifecycle management of enterprise-level containerized applications.

ACI: Ant Continuous Integration, is a CI/CD efficiency product under the Ant Group’s research and development efficiency umbrella, which is centered around pipelines. With intelligent automated construction, testing, and deployment, it provides a lightweight continuous delivery solution based on code flow to improve the work efficiency of team development.

Private Zone: A private DNS service based on the Virtual Private Cloud (VPC) environment. This service allows private domain names to be mapped to IP addresses in one or more custom VPCs.

P2P: Peer-to-peer technology, when a Peer in a P2P network downloads data from the server, it can also act as a server for other Peers to download after downloading the data. When a large number of nodes are downloading simultaneously, subsequent data downloads can be obtained without downloading from the server. This can reduce the pressure on the server.

Dragonfly: Dragonfly is a file distribution and image acceleration system based on P2P technology and is the standard solution and best practice in the field of image acceleration in cloud-native architecture. It is now hosted by the Cloud Native Computing Foundation (CNCF) as an incubation-level project.

Nydus: Nydus is a sub-project of Dragonfly’s image acceleration framework that provides on-demand loading of container images and supports millions of accelerated image container creations in production environments every day. It has significant advantages over OCIv1 in terms of startup performance, image space optimization, end-to-end data consistency, kernel-level support, etc.

LifseaOS: A lightweight, fast, secure, and image-atomic management container optimization operating system launched by Alibaba Cloud for container scenarios. Compared with traditional operating systems, the number of software packages is reduced by 60%, and the image size is reduced by 70%. The first-time startup time is reduced from over 1 minute to around 2 seconds. It supports image read-only and OSTree technology, versioning management of OS images, and updating software packages or fixed configurations on the operating system on an image-level basis.

Solution

1: Large image size

Reduce the size of the base image

The basic OS is changed from CentOS 7 to AnolisOS 8, and the installation of maintenance tools is streamlined. Only a list of essential tools (basic maintenance tools, runtime dependencies, log cleaning, security baselines, etc.) is installed by default, and the configuration of security hardening is simplified. The base image is reduced from 1.63GB to 300MB.

AnolisOS Repository: https://hub.docker.com/r/openanolis/anolisos/tags

Dockerfile optimization

Reduce unnecessary build resources and time through Dockerfile writing constraints, image inspection, and other means.

Dockerfile Best Practices: https://docs.docker.com/develop/develop-images/dockerfile_best-practices/

Parallel building and build caching

Ant Group’s Build Center uses the Nydus community-optimized version of BuildKit, which supports layer-level caching. Previous artifacts are accurately referenced and cached, and for multistage Dockerfiles, BuildKit can achieve parallel execution between the different stages.

2: Slow pushing image

Use Nydus images for block-level data deduplication

In traditional OCI images, the smallest unit that can be shared between different images is a layer, so the efficiency of deduplication is very low. There may be a lot of duplicate data between layers; even layers with only slight differences are treated as completely different layers. Because of the way the OCI Image Spec handles deleted files and hard links, files that have been deleted in an upper layer may still exist in a lower layer and be included in the image. In addition, OCI images use the tar+gzip format to express the layers in the image, and the tar format does not guarantee the order of archive entries. This means that users who build the same image on different machines may get different images because they use different file systems, even though the substantive content of those images is completely identical, which leads to a sharp increase in the amount of uploaded and downloaded data.

Issues with OCIv1 and OCIv2 proposals: https://hackmd.io/@cyphar/ociv2-brainstorm

Nydus image files are divided into file chunks, and the metadata layer is flattened (removing intermediate layers). Each chunk is only saved once in the image, and a base image can be specified as a chunk dictionary for other Nydus images. Based on chunk-level deduplication, it provides low-cost data deduplication capabilities between different images, greatly reducing the amount of uploaded and downloaded data for the images.

nydus-image-files

As shown in the figure above, Nydus image 1 and image 2 have the same data blocks B2, C, E1, and F. Image 2 adds E2, G1, H1, and H2. If image 1 already exists in the image repository, image 2 can be built based on image 1: only E2, G1, H1, and H2 need to be built into one layer, and only this layer needs to be uploaded to the image repository. This achieves the effect of uploading and pulling only the file differences, which shortens the development cycle.
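
As a rough illustration of chunk-level deduplication (a simplified Python sketch, not the Nydus build logic or its on-disk format), one can hash fixed-size chunks and upload only the chunks the repository has never seen:

import hashlib

CHUNK_SIZE = 1024 * 1024  # assume 1 MiB chunks for this illustration

def chunk_digests(data: bytes):
    # Split the data into fixed-size chunks and return the digest of each chunk.
    return [
        hashlib.sha256(data[offset:offset + CHUNK_SIZE]).hexdigest()
        for offset in range(0, len(data), CHUNK_SIZE)
    ]

# Chunks already stored in the repository, e.g. from image 1 acting as a chunk dictionary.
existing_chunks = set(chunk_digests(b"A" * CHUNK_SIZE + b"B" * CHUNK_SIZE))

# Image 2 shares the "A" chunk with image 1 and adds a new "C" chunk.
image2_chunks = chunk_digests(b"A" * CHUNK_SIZE + b"C" * CHUNK_SIZE)

# Only chunks the repository has never seen need to be uploaded.
chunks_to_upload = [digest for digest in image2_chunks if digest not in existing_chunks]
print(f"{len(chunks_to_upload)} of {len(image2_chunks)} chunks need uploading")  # 1 of 2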

Directly building Nydus images

Currently, in most landing scenarios for acceleration images, the production of acceleration images is based on image conversion. The following two Nydus conversion schemes are currently in place:

i. Repository conversion

After a traditional image is built and pushed to the image repository, the conversion action of the image repository is triggered to complete the image conversion. The disadvantage of this approach is that the build and conversion are often done on different machines. After the image is built and pushed, it needs to be pulled to the conversion machine and the output needs to be pushed to the image repository, which adds a complete image circulation process and causes high latency. Also, it occupies the network resources of the image repository. Before the acceleration image conversion is complete, application deployment cannot enjoy the acceleration effect and still needs to pull the entire image.

ii. Double version building

After the traditional image is built, it is converted directly on the local build machine. To improve efficiency, the conversion of each layer can be started immediately after the construction of that layer, which can significantly reduce the delay in generating acceleration images. With this approach, conversion can begin without waiting for traditional image upload, and because it is local conversion, compared to approach 1, the cost of transfer between the conversion machine and the image repository can be saved. If the accelerated image corresponding to the base image does not exist, it will be converted; if it exists, pulling can be ignored, but inevitably, pushing always requires twice the data.

iii. Direct building

Compared with directly building Nydus acceleration images, the two conversion-based schemes above have obvious delays in producing the acceleration image. First, OCI-based image construction is significantly slower than Nydus image construction. Second, conversion is an after-the-fact step, so it always adds some delay. Third, both conversion schemes involve additional data transmission. Direct building, on the other hand, has fewer steps, is faster, and saves resources.

With direct building, the steps and the data transmission volume for building acceleration images are significantly reduced. As soon as the build is complete, the capabilities of the acceleration image can be used directly, and the speed of application deployment is greatly improved.

3: Slow container startup

Nydus images load on demand

The actual usage rate of the image data is very low. For example, CERN's paper mentions that only 6% of the content of a general image is actually used. The purpose of on-demand loading is to allow the container runtime to selectively download and extract files from the image layers in the Blob, but the OCI/Docker image specifications package all image layers into a tar or tar.gz archive. This means that even if you want to extract a single file, you still have to scan the entire Blob. If the image is compressed using gzip, it is even more difficult to extract specific files.

nydus-images-load-on-demand

The RAFS image format is an archive compression format proposed by Nydus. It separates the data (Blobs) and metadata (Bootstrap) of the container image file system, so that the original image layers only store the data part of the files. Furthermore, the files are divided into chunks according to a certain granularity, and the corresponding chunk data is stored in each layer of Blob. Using chunk granularity refines the deduplication granularity, and allows easier sharing of data between layers and images, and easier on-demand loading. The original image layers only store the data part of the files (i.e. the Blob layer in the figure).

The Blob layer stores the chunk files, which are chunks of file data. For example, a 10MB file can be sliced into 10 1MB blocks, and the offset of each chunk can be recorded in an index. When requesting part of the data from a file, the container runtime can selectively obtain the file from the image repository by combining with the HTTP Range Request supported by the OCI/Docker image repository specification, thus saving unnecessary network overhead. For more details about the Nydus image format, please refer to the Nydus Image Service project.
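
For illustration, fetching a single chunk on demand with an HTTP Range request could look like the Python sketch below. The blob URL, chunk offset, and chunk size are placeholders; in Nydus they come from the RAFS metadata rather than being hard-coded:

import requests

# Hypothetical blob URL and chunk location; real values are recorded in the
# RAFS metadata (Bootstrap), which stores each chunk's offset and size in the Blob.
blob_url = "https://registry.example.com/v2/app/blobs/sha256:<digest>"
chunk_offset = 3 * 1024 * 1024   # assumed: the chunk starts at 3 MiB
chunk_size = 1024 * 1024         # assumed: a 1 MiB chunk

# Ask the registry for only the bytes of the chunk that is actually needed.
headers = {"Range": f"bytes={chunk_offset}-{chunk_offset + chunk_size - 1}"}
response = requests.get(blob_url, headers=headers, timeout=30)

# 206 Partial Content means the registry returned just the requested byte range.
assert response.status_code == 206
chunk_data = response.content
print(f"fetched {len(chunk_data)} bytes of chunk data")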

The metadata and chunk indexes are combined to form the Meta layer in the figure above, which is the entire filesystem structure that the container can see after all image layers are stacked. It includes the directory tree structure, file metadata, chunk information (block size and offset, as well as metadata such as file name, file type, owner, etc. for each file). With Meta, the required files can be extracted without scanning the entire archive file. In addition, the Meta layer contains a hash tree and the hash of each chunk data block, which ensures that the entire file tree can be verified at runtime, and the signature of the entire Meta layer can be checked to ensure that the runtime data can be detected even if it is tampered with.

nydus-meta-layer

Nydus uses the user-mode file system implementation FUSE to implement on-demand loading by default. The user-mode Nydus daemon process mounts the Nydus image mount point as the container RootFS directory. When the container generates a file system IO such as read(fd, count), the kernel-mode FUSE driver adds the request to the processing queue. The user-mode Nydus daemon reads and processes the request through the FUSE Device, pulls the corresponding number of Chunk data blocks from the remote Registry, and finally replies to the container through the kernel-mode FUSE. Nydus also implements a layer of local cache, where chunks that have been pulled from the remote are uncompressed and cached locally. The cache can be shared between images on a layer-by-layer basis, or at a chunk level.

nydus-uses-the-user-mode-file-system

After using Nydus for image acceleration, the startup time of different applications has made a qualitative leap, enabling applications to be launched in a very short time, meeting the requirements of rapid scaling in the cloud.

Read-only file system EROFS

When there are many files in the container image, frequent file operations generate a large number of FUSE requests, which causes frequent context switching between kernel-space and user-space, resulting in performance bottlenecks. Based on the kernel-space EROFS file system (originating from Linux 4.19), Nydus has made a series of improvements and enhancements to expand its capabilities in the image scenario. The final result is a kernel-space container image format, Nydus RAFS (Registry Acceleration File System) v6. Compared with the previous format, it has the advantages of block data alignment, more concise metadata, high scalability, and high performance. When all image data is downloaded locally, the FUSE user-space solution can cause the process that accesses the file to frequently trap to user-space, and involves memory copies between kernel-space and user-space.

Furthermore, Nydus supports the EROFS over FS-Cache scheme (Linux 5.19-rc1), where the user-space Nydusd directly writes downloaded chunks into the FS-Cache cache. When the container accesses the data, it can directly read the data through the kernel-space FS-Cache without trapping to user-space. In the container image scenario, this achieves almost lossless performance and stability, outperforming the FUSE user-space solution, and comparable to native file systems (without on-demand loading).

  • e2e startup wordpress: OCI 11.704s, 11.651s, 11.330s; Fuse + rafsv5 5.237s, 5.489s, 5.337s; Fuse + rafsv6 5.094s, 5.382s, 5.314s; Fscache + rafsv6 10.167s, 9.999s, 9.884s; Fscache + rafsv6 + opt patch 4.659s, 4.541s, 4.658s
  • e2e startup Hello bench java: OCI 9.2186s, 8.9132s, 8.8412s; Fuse + rafsv5 2.8325s, 2.7671s, 2.7671s; Fuse + rafsv6 2.7543s, 2.8104s, 2.8692s; Fscache + rafsv6 4.6904s, 4.7012s, 4.6654s; Fscache + rafsv6 + opt patch 2.9691s, 3.0485s, 3.0294s

Currently Nydus has supported this scheme in building, running, and kernel-space (Linux 5.19-rc1). For detailed usage, please refer to the Nydus EROFS FS-Cache user guide. If you want to learn more about the implementation details of Nydus in kernel-space, you can refer to Nydus Image Acceleration: The Evolutionary Road of the Kernel.

the-evolution-of-the-nydus-kernel

Dragonfly P2P Accelerates Image Downloading

Both the image repository service and the underlying storage have bandwidth and QPS limitations. If we rely solely on the bandwidth and QPS provided by the server, it is easy to fail to meet the demand. Therefore, P2P needs to be introduced to relieve server pressure, thereby meeting the demand for large-scale concurrent image pulling. In scenarios where large-scale image pulling is required, using Dragonfly&Nydus can save more than 90% of container startup time compared to using OCIv1.

dragonfly-p2p-accelerates-image-downloading

The shorter startup time after using Nydus is due to the lazy loading feature of the image: only a small portion of metadata needs to be pulled for the Pod to start. In large-scale scenarios, the number of back-to-source downloads made through Dragonfly is very small, whereas in the OCIv1 scenario every image pull must go back to the source, so the peak back-to-source requests and back-to-source traffic with Dragonfly are much lower than in the OCIv1 scenario. Furthermore, after using Dragonfly, the peak back-to-source requests and traffic do not increase significantly as the concurrency increases.

Test with a 1GB random file:

| Concurrency | Completion time for OCI image | Completion time for Nydus + Dragonfly image | Performance improvement ratio |
| --- | --- | --- | --- |
| 1 | 63s | 41s | 53% |
| 5 | 63s | 51s | 23% |
| 50 | 145s | 65s | 123% |

4: Slow cluster scaling

ACR Image Repository Global Synchronization

To meet customer demand for a high-quality experience as well as data compliance requirements, ZOLOZ deploys to multiple cloud sites around the globe. With the cross-border synchronization acceleration of the ACR image repository, images are synchronized across regions worldwide to improve the efficiency of container image distribution. Image uploads and downloads are performed within the local data center, so even in countries with poor network conditions, deployments behave as if they were in a local data center, truly achieving one-click deployment of applications around the world.

Use ContainerOS for High-Speed Startup

With cloud native, customers can scale resources rapidly and use elasticity to reduce costs. On the cloud, virtual machines need to be provisioned quickly and added to the cluster. ContainerOS simplifies the OS startup process and pre-installs the container images required by cluster management components, reducing the time spent pulling images during node startup, greatly improving OS startup speed, and shortening node scaling time in the ACK link. ContainerOS is optimized in the following ways:

  • ContainerOS streamlines the OS boot process to effectively reduce OS startup time. ContainerOS is positioned as an operating system for virtual machines in the cloud, so it does not need many hardware drivers; the essential kernel driver modules are built into the kernel, initramfs is removed, and the udev rules are greatly simplified, which significantly improves OS boot speed. For example, on an ecs.g7.large ECS instance, the first boot of LifseaOS takes about 2 seconds, while Alinux3 takes more than 1 minute.
  • ContainerOS pre-installs the container images required by cluster management components to reduce the time spent pulling images during node startup. After an ECS node boots, several component container images must be pulled to perform basic work in the ACK scenario; for example, the Terway component is responsible for the network, and a node only becomes Ready once the Terway container is ready. Because the long tail of network pulls is costly, pre-installing these components in the OS lets them be loaded directly from the local directory, avoiding the cost of pulling images over the network.
  • ContainerOS also improves node elasticity by working together with optimizations in the ACK control link.

Finally, the end-to-end P90 time of scaling out from an empty ACK node pool was measured, from the moment the scale-out request was issued until 90% of the nodes were Ready, and compared with the CentOS and Alinux2 Optimized-OS solutions. ContainerOS shows a significant performance advantage.

The overall solution

the-overall-solution

  1. By using a streamlined base image and following Dockerfile conventions, we can reduce the size of our images.
  2. We can use the BuildKit provided by Ant Group for multi-stage and parallel image building, and use caching to speed up repeated builds. When building Nydus accelerated images directly, we can deduplicate by analyzing the overlap between images and upload only the differing blocks to remote image repositories (a conversion-based alternative is sketched after this list).
  3. By utilizing ACR’s global acceleration synchronization capability, we can distribute our images to repositories around the world for faster pulling.
  4. We can use the Dragonfly P2P network to accelerate the on-demand pulling of Nydus image blocks.
  5. We can use the ContainerOS operating system on our nodes to improve both OS and image startup speed.
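As a concrete illustration of point 2, one common way to produce an accelerated image is to convert an existing OCI image with the nydusify tool and push the result alongside the original tag. This is a conversion-based sketch rather than the direct BuildKit build described above, and the registry and image names are placeholders:

$ # registry.example.com and the tags below are placeholders for your own repositories
$ nydusify convert \
    --source registry.example.com/app/demo:v1.0.0 \
    --target registry.example.com/app/demo:v1.0.0-nydus

Keeping the original tag means runtimes without Nydus support can still pull the OCI image, while the -nydus tag can be pulled lazily through Dragonfly.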

improve-the-startup-speed-of-the-image

| Time (3GB image as an example) | Build Image | Push Image | Schedule Node | Pull Image |
| --- | --- | --- | --- | --- |
| Before | 180s | 60s | 506s | 4m15s |
| After | 130s | 1s | 56s | 560ms |

*Schedule Node: This refers to the time from creating an ECS instance on Alibaba Cloud to the node joining the K8s cluster and becoming Ready. Thanks to the optimization of ContainerOS, this time is reduced significantly.

Through deep optimization of each stage in the entire R&D process, both R&D efficiency and online stability have improved qualitatively. The entire solution has been deployed on both Alibaba Cloud and AWS and has been running stably for three months. In the future, standard deployment environments will be provided by cloud vendors to meet the needs of more types of business scenarios.

Usage Guide

Dragonfly installation

$ helm repo add dragonfly https://dragonflyoss.github.io/helm-charts/
$ helm install --wait --timeout 10m --dependency-update --create-namespace --namespace dragonfly-system dragonfly dragonfly/dragonfly --set dfdaemon.config.download.prefetch=true,seedPeer.config.download.prefetch=true
NAME: dragonfly
LAST DEPLOYED: Fri Apr 7 10:35:12 2023
NAMESPACE: dragonfly-system
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
1. Get the scheduler address by running these commands:
export SCHEDULER_POD_NAME=$(kubectl get pods --namespace dragonfly-system -l "app=dragonfly,release=dragonfly,component=scheduler" -o jsonpath={.items[0].metadata.name})
export SCHEDULER_CONTAINER_PORT=$(kubectl get pod --namespace dragonfly-system $SCHEDULER_POD_NAME -o jsonpath="{.spec.containers[0].ports[0].containerPort}")
kubectl --namespace dragonfly-system port-forward $SCHEDULER_POD_NAME 8002:$SCHEDULER_CONTAINER_PORT
echo "Visit http://127.0.0.1:8002 to use your scheduler"

2. Get the dfdaemon port by running these commands:
export DFDAEMON_POD_NAME=$(kubectl get pods --namespace dragonfly-system -l "app=dragonfly,release=dragonfly,component=dfdaemon" -o jsonpath={.items[0].metadata.name})
export DFDAEMON_CONTAINER_PORT=$(kubectl get pod --namespace dragonfly-system $DFDAEMON_POD_NAME -o jsonpath="{.spec.containers[0].ports[0].containerPort}")
You can use $DFDAEMON_CONTAINER_PORT as a proxy port in Node.

3. Configure runtime to use dragonfly:
https://d7y.io/docs/getting-started/quick-start/kubernetes/

For more details, please refer to the Dragonfly helm-charts installation guide.
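To make the node runtime actually pull images through Dragonfly, containerd can be pointed at the local dfdaemon proxy. A minimal sketch, assuming the dfdaemon HTTP proxy listens on 127.0.0.1:65001 (adjust to the proxy port of your deployment) and a containerd version that still honors the registry.mirrors configuration:

$ # 127.0.0.1:65001 is an assumed dfdaemon proxy address; adjust to your deployment
$ cat <<'EOF' | sudo tee -a /etc/containerd/config.toml
[plugins."io.containerd.grpc.v1.cri".registry.mirrors."docker.io"]
  endpoint = ["http://127.0.0.1:65001", "https://registry-1.docker.io"]
EOF
$ sudo systemctl restart containerd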

Nydus installation

$ curl -fsSL -o config-nydus.yaml https://raw.githubusercontent.com/dragonflyoss/Dragonfly2/main/test/testdata/charts/config-nydus.yaml
$ helm install --wait --timeout 10m --dependency-update --create-namespace --namespace nydus-snapshotter nydus-snapshotter dragonfly/nydus-snapshotter -f config-nydus.yaml
NAME: nydus-snapshotter
LAST DEPLOYED: Fri Apr 7 10:40:50 2023
NAMESPACE: nydus-snapshotter
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
Thank you for installing nydus-snapshotter.
Your release is named nydus-snapshotter.
To learn more about the release, try:
$ helm status nydus-snapshotter
$ helm get all nydus-snapshotter

For more details, please refer to: https://github.com/dragonflyoss/helm-charts/blob/main/INSTALL.md
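Once both charts are installed, a quick sanity check is to confirm that the snapshotter pods are running and then start a workload from a Nydus-converted image. The image reference below is a placeholder for an image converted in your own registry:

$ kubectl -n nydus-snapshotter get pods
$ # the image below is a placeholder for a Nydus-converted image in your registry
$ kubectl run demo-nydus --image=registry.example.com/app/demo:v1.0.0-nydus
$ kubectl get pod demo-nydus -w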

ContainerOS

ContainerOS has implemented high-speed scaling for elastic expansion scenarios in ACK cluster node pools through the following optimizations:

OS Startup Speed Improvement

LifseaOS significantly improves OS startup speed by:

  • Removing unnecessary hardware drivers for cloud scenarios
  • Modifying essential kernel driver modules to built-in mode
  • Removing initramfs
  • Simplifying udev rules

These optimizations reduce first boot time from over 1 minute (traditional OS) to approximately 2 seconds.

ACK-Specific Optimizations

ContainerOS is customized for ACK environments:

  • Pre-installed container images for cluster management components eliminate network pull time
  • ACK control link optimizations:
    • Adjusted detection frequency for critical logic
    • Modified system bottleneck thresholds under high load

How to Use ContainerOS

When creating a managed node pool for an ACK cluster in the Alibaba Cloud console:

  1. Go to the ECS instance OS configuration menu
  2. Select ContainerOS from the dropdown menu
  3. Note: The version number (e.g., "1.24.6") in the OS image name corresponds to your cluster's Kubernetes version

For optimal node scaling performance, refer to ContainerOS documentation on high-speed node scaling.

Nydus Project: https://nydus.dev/

Dragonfly Project: https://d7y.io/

Reference

[1]ZOLOZ: https://www.zoloz.com/

[2]BuildKit: https://github.com/moby/buildkit/blob/master/docs/nydus.md

[3]Paper by Cern: https://indico.cern.ch/event/567550/papers/2627182/files/6153-paper.pdf

[4]OCI: https://github.com/opencontainers/image-spec/

[5]Docker: https://github.com/moby/moby/blob/master/image/spec/v1.2.md

[6]RAFS image format: https://d7y.io/blog/2022/06/06/evolution-of-nydus/

[7]Nydus Image Service project: https://github.com/dragonflyoss/image-service

[8]FUSE: https://www.kernel.org/doc/html/latest/filesystems/fuse.html

[9]Nydus EROFS fscache user guide: https://github.com/dragonflyoss/image-service/blob/master/docs/nydus-fscache.md

[10]The path of kernel evolution for Nydus image acceleration: https://d7y.io/blog/2022/06/06/evolution-of-nydus/

[11]Alinux2 Optimized-OS: https://www.alibabacloud.com/help/en/container-service-for-kubernetes/latest/containeros-overview

Volcano Engine, distributed image acceleration practice based on Dragonfly

· 11 min read

CNCF projects highlighted in this post, and migrated by mingcheng.

Terms and definitions

| Term | Definition |
| --- | --- |
| OCI | The Open Container Initiative is a Linux Foundation project launched by Docker in June 2015 to design open standards for operating-system-level virtualization (most importantly Linux containers). |
| OCI Artifact | A product that follows the OCI image spec. |
| Image | The image in this article refers to an OCI Artifact. |
| Image Distribution | A product distribution implemented according to the OCI distribution spec. |
| ECS | A collection of resources composed of CPU, memory, and cloud drives, each of which logically corresponds to a computing hardware entity in the data center infrastructure. |
| CR | Volcano Engine image distribution service. |
| VKE | Volcano Engine deeply integrates the new generation of cloud native technology to provide high-performance Kubernetes container cluster management services with containers as the core, helping users quickly build containerized applications. |
| VCI | A serverless, containerized computing service on Volcano Engine. VCI seamlessly integrates with the container service VKE to provide Kubernetes orchestration capabilities. With VCI, you can focus on building the application itself without buying and managing underlying infrastructure, and pay only for the resources the container actually consumes. VCI also supports second-level startup, high-concurrency creation, sandboxed container security isolation, and more. |
| TOS | Volcano Engine’s massive, secure, low-cost, easy-to-use, highly reliable, and highly available distributed cloud storage service. |
| Private Zone | A private DNS service based on the VPC (Virtual Private Cloud) environment. It allows private domain names to be mapped to IP addresses within one or more custom VPCs. |
| P2P | Peer-to-peer technology. When a peer in a P2P network downloads data from the server, it can also serve that data to other peers after downloading it. When a large number of nodes download at the same time, later downloads no longer need to come from the server side, reducing the pressure on the server. |
| Dragonfly | A P2P-based file distribution and image acceleration system, and the standard solution and best practice for image acceleration in cloud native architecture. It is now hosted by the Cloud Native Computing Foundation (CNCF) as an incubating project. |
| Nydus | The Nydus acceleration framework implements a content-addressable filesystem that accelerates container image startup through lazy loading. It has supported the creation of millions of accelerated image containers daily and is deeply integrated with the Linux kernel’s erofs and fscache, enabling in-kernel support for image acceleration. |

Background

The Volcano Engine image repository CR uses TOS to store container images. Currently this meets the demand for large-scale concurrent image pulling to a certain extent; however, the achievable pull concurrency is ultimately limited by the bandwidth and QPS of TOS.

Here is a brief introduction to the two scenarios currently encountered in large-scale image pulling:

  1. The number of clients keeps increasing and images keep getting larger, so the bandwidth of TOS will eventually be insufficient.
  2. If clients use Nydus to convert the image format, the request volume to TOS increases by an order of magnitude, and the QPS limit of the TOS API cannot meet the demand.

Whether it is the image repository service itself or the underlying storage, there are ultimately bandwidth and QPS limits. Relying solely on the bandwidth and QPS provided by the server side easily falls short of demand, so P2P needs to be introduced to reduce server pressure and meet the demand for large-scale concurrent image pulling.

Investigation of image distribution system based on P2P technology

There are several P2P projects in the open source community. Here is a brief introduction to these projects.

Dragonfly

Architecture

Diagram flow showing Dragonfly architecture

Manager

  • Stores dynamic configuration for consumption by the seed peer cluster, scheduler cluster, and dfdaemon.
  • Maintains the relationship between seed peer clusters and scheduler clusters.
  • Provides asynchronous task management for image preheating, integrated with Harbor.
  • Keeps alive with scheduler instances and seed peer instances.
  • Filters the optimal scheduler cluster for dfdaemon.
  • Provides a visual console that helps users manage the P2P cluster.

Scheduler

  • Selects the optimal parent peer based on a multi-feature intelligent scheduling system.
  • Builds a scheduling directed acyclic graph for the P2P cluster.
  • Removes abnormal peers based on multi-feature peer evaluation results.
  • Notifies the peer to fall back to back-to-source download when scheduling fails.

Dfdaemon

  • Serves gRPC for dfget with downloading features, and provides adaptation to different source protocols.
  • Can act as a seed peer. With seed peer mode enabled, it serves as the back-to-source download peer in a P2P cluster, that is, the root peer for downloads in the entire cluster.
  • Serves as a proxy for container registry mirrors and any other HTTP backends (see the sketch after this list).
  • Downloads objects via http, https, and other custom protocols.
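To make the proxy role concrete, the sketch below downloads an arbitrary file through a local dfdaemon HTTP proxy so that the transfer is served by the P2P network. The proxy address 127.0.0.1:65001 and the URL are assumptions for illustration; use the proxy port and source of your own deployment:

$ # proxy address and URL are placeholders for illustration
$ curl -x http://127.0.0.1:65001 -o model.bin https://example.com/models/model.bin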

Kraken

Architecture

Diagram flow showing Kraken architecture

Agent

  • Is a peer node in a P2P network and needs to be deployed on each node
  • Implemented the docker registry interface
  • Notify the tracker of the data they own
  • Download the data of other agents (the tracker will tell the agent which agent to download this data from)

Origin

  • Responsible for reading data from storage for seeding
  • Support for different storage
  • High availability in the form of a hash ring

Tracker

  • A coordinator in a P2P network, tracking who is a peer and who is a seeder
  • Track data owned by peers
  • Provide ordered peer nodes for peers to download data
  • High availability in the form of a hash ring

Proxy

  • Implemented the docker registry interface
  • Pass the image layer to the Origin component
  • Pass the tag to the build-index component

Build-Index

  • Maps tags to digests; when the agent downloads data for a tag, it obtains the corresponding digest from Build-Index.
  • Replicates images between clusters.
  • Saves tag data in storage.
  • High availability in the form of a hash ring.

Dragonfly vs Kraken

| | Dragonfly | Kraken |
| --- | --- | --- |
| High availability | Scheduler consistent hash ring supports high availability | Tracker consistent hash ring, multiple replicas ensure high availability |
| Containerd support | Support | Support |
| HTTPS image repository | Support | Support |
| Community active level | Active | Inactive |
| Number of users | More | Less |
| Maturity | High | High |
| Is it optimized for Nydus | Yes | No |
| Architecture complexity | Middle | Middle |

Summary

Based on overall project maturity, community activity, number of users, architecture complexity, whether it is optimized for Nydus, future development trends, and other factors, Dragonfly is the best choice among the P2P projects.

Proposal

For Volcano Engine, the main consideration is that VKE and VCI pull images through CR.

  • VKE is Kubernetes deployed on ECS, so it is well suited to running dfdaemon on every node, making full use of each node’s bandwidth and therefore of the P2P capability.
  • VCI provides a small number of resource-rich virtual nodes at the bottom layer, and the upper-layer services run as Pods, so dfdaemon cannot be deployed on every node as in VKE. Instead, several dfdaemon instances are deployed as a cache, relying on their caching capability.
  • VKE or VCI clients pull images that have been converted to the Nydus format. In this scenario, dfdaemon is used as a cache, and not too many cache nodes should be used, to avoid putting excessive scheduling pressure on the Scheduler.

Based on Volcano Engine’s requirements for the above products, combined with Dragonfly’s characteristics, a deployment scheme that accommodates all of these factors is needed. The Dragonfly deployment scheme is designed as follows.

Architecture

Diagram flow showing Volcano Engine architecture combined with Dragonfly's characteristics

  • Volcano Engine resources belong to a primary account. The P2P control components are isolated at the primary-account level, with one set of control components per primary account. A P2PManager controller is implemented on the server side and manages the control plane of all P2P components.
  • The P2P control components are deployed in the CR data-plane VPC and exposed to user clusters through a load balancer (LB).
  • On VKE clusters, Dfdaemon is deployed as a DaemonSet, with one Dfdaemon per node.
  • On VCI, Dfdaemon is deployed as a Deployment.
  • Containerd on ECS accesses the Dfdaemon on the same node via 127.0.0.1:65001.
  • A controller component deployed in the user cluster uses the PrivateZone function to generate a <clusterid>.p2p.volces.com domain name for the cluster. The controller selects Dfdaemon pods on specific nodes (covering both VKE and VCI) according to certain rules and resolves them to this domain name as A records (see the check sketched after this list).
  • Nydusd on ECS accesses Dfdaemon via the <clusterid>.p2p.volces.com domain name.
  • The image service client and Nydusd on VCI access Dfdaemon via the <clusterid>.p2p.volces.com domain name.
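As a hedged way to verify the PrivateZone setup, the A records of the generated domain can be compared with the IPs of the Dfdaemon pods selected by the controller. The cluster ID, namespace, and label below are placeholders that depend on the actual deployment:

$ CLUSTER_ID=your-cluster-id   # placeholder
$ dig +short ${CLUSTER_ID}.p2p.volces.com
$ # namespace and label are placeholders; use those of your Dfdaemon deployment
$ kubectl -n dragonfly-system get pods -l component=dfdaemon -o wide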

Benchmark

Environment

Container Repository: bandwidth 10Gbit/s

Dragonfly Scheduler: 2 replicas, request 1C2G, limit 4C8G, bandwidth 6Gbit/s

Dragonfly Manager: 2 replicas, request 1C2G, limit 4C8G, bandwidth 6Gbit/s

Dragonfly Peer: limit 2C6G, bandwidth 6Gbit/s, SSD

Image

Nginx(500M)

TensorFlow(3G)

Component Version

Dragonfly v2.0.8

POD Creation to Container Start

Time from creation to startup of all Nginx pods at concurrency levels of 50, 100, 200, and 500:

Bar chart showing Nginx Pod Creation to Container Start divided by 50 Pods, 100 Pods, 200 Pods and 500 Pods in OCI v1, Dragonfly, Dragonfly & Nydus

Time from creation to startup of all TensorFlow pods at concurrency levels of 50, 100, 200, and 500:

Bar chart showing TensorFlow Pod Creation to Container Start divided by 50 Pods, 100 Pods, 200 Pods and 500 Pods in OCI v1, Dragonfly and Dragonfly & Nydus

In large-scale image scenarios, the Dragonfly and Dragonfly & Nydus setups save more than 90% of container startup time compared with the OCIv1 setup. The shorter startup time with Nydus comes from its lazy-load feature, which only needs to pull a small portion of the metadata for a Pod to start.

Back-to-source Peak Bandwidth on Container Registry

Peak back-to-source bandwidth on the container registry for 50, 100, 200, and 500 concurrent Nginx pods:

Bar Chart showing impact of Nginx on Container Registry divided in 50 Pods, 100 Pods, 200 Pods and 500 Pods in OCI v1, Dragonfly

Peak back-to-source bandwidth on the container registry for 50, 100, 200, and 500 concurrent TensorFlow pods:

Bar Chart showing impact of TensorFlow on Container Registry divided in 50 Pods, 100 Pods, 200 Pods and 500 Pods in OCI v1, Dragonfly

Back-to-source Traffic on Container Registry

Back-to-source traffic on the container registry for 50, 100, 200, and 500 concurrent Nginx pods:

Bar Chart showing impact of Nginx on Container Registry divided in 50 Pods, 100 Pods, 200 Pods and 500 Pods in OCI v1, Dragonfly

Back-to-source traffic on the container registry for 50, 100, 200, and 500 concurrent TensorFlow pods:

Bar Chart showing impact of TensorFlow on Container Registry divided in 50 Pods, 100 Pods, 200 Pods and 500 Pods in OCI v1, Dragonfly

In large-scale scenarios, only a small number of image pulls go back to the source with Dragonfly, whereas every image pull in the OCIv1 scenario must go back to the source, so both the back-to-source peak bandwidth and the back-to-source traffic with Dragonfly are much lower than with OCIv1. Moreover, with Dragonfly the back-to-source peak and traffic do not increase significantly as concurrency grows.

Reference

Volcano Engine https://www.volcengine.com/

Volcano Engine VKE https://www.volcengine.com/product/vke

Volcano Engine CR https://www.volcengine.com/product/cr

Dragonfly https://d7y.io/

Dragonfly GitHub Repo https://github.com/dragonflyoss/dragonfly

Nydus https://nydus.dev/

Nydus GitHub Repo https://github.com/dragonflyoss/image-service