Pip
This document will help you experience how to use Dragonfly with Pip. When installing large Python packages, simultaneous downloads from multiple services can saturate storage bandwidth, resulting in significantly reduced download speeds.

Dragonfly can be used to eliminate the bandwidth limit of the storage through P2P technology, thereby accelerating file downloading.

Prerequisites
| Name | Version | Document | 
|---|---|---|
| Kubernetes cluster | 1.20+ | kubernetes.io | 
| Helm | 3.8.0+ | helm.sh | 
| Python | 3.8.0+ | python.org | 
| Pip | 23.0.1+ | pypi.org | 
Prepare Kubernetes Cluster
Kind is recommended if no Kubernetes cluster is available for testing.
Create kind multi-node cluster configuration file kind-config.yaml, configuration content is as follows:
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
  - role: control-plane
  - role: worker
  - role: worker
Create a kind multi-node cluster using the configuration file:
kind create cluster --config kind-config.yaml
Switch the context of kubectl to kind cluster:
kubectl config use-context kind-kind
Kind loads Dragonfly image
Pull Dragonfly latest images:
docker pull dragonflyoss/scheduler:latest
docker pull dragonflyoss/manager:latest
docker pull dragonflyoss/client:latest
Kind cluster loads Dragonfly latest images:
kind load docker-image dragonflyoss/scheduler:latest
kind load docker-image dragonflyoss/manager:latest
kind load docker-image dragonflyoss/client:latest
Create Dragonfly cluster based on helm charts
Create helm charts configuration file charts-config.yaml.
Add files.pythonhosted.org/packages/.*\.(whl|tar.gz|zip), pypi.python.org/.*\.(whl|tar.gz|zip)
and pypi.org/.*\.(whl|tar.gz|zip) rules to client.config.proxy.rules.regex
to forward HTTP file downloads of Python packages to the P2P network.
manager:
  image:
    repository: dragonflyoss/manager
    tag: latest
  metrics:
    enable: true
scheduler:
  image:
    repository: dragonflyoss/scheduler
    tag: latest
  metrics:
    enable: true
seedClient:
  image:
    repository: dragonflyoss/client
    tag: latest
  metrics:
    enable: true
client:
  image:
    repository: dragonflyoss/client
    tag: latest
  hostNetwork: true
  metrics:
    enable: true
  config:
    proxy:
      server:
        port: 4001
      registryMirror:
        addr: https://index.docker.io
      rules:
        - regex: blobs/sha256.*
        - regex: files.pythonhosted.org/packages(?::\d+)?/.*\.(whl|tar.gz|zip)
          useTLS: true
        - regex: pypi.python.org(?::\d+)?/.*\.(whl|tar.gz|zip)
          useTLS: true
        - regex: pypi.org(?::\d+)?/.*\.(whl|tar.gz|zip)
          useTLS: true
Create a Dragonfly cluster using the configuration file:
$ helm repo add dragonfly https://dragonflyoss.github.io/helm-charts/
$ helm install --wait --create-namespace --namespace dragonfly-system dragonfly dragonfly/dragonfly -f charts-config.yaml
NAME: dragonfly
LAST DEPLOYED: Wed Jan 22 20:17:14 2025
NAMESPACE: dragonfly-system
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
1. Get the manager address by running these commands:
  export MANAGER_POD_NAME=$(kubectl get pods --namespace dragonfly-system -l "app=dragonfly,release=dragonfly,component=manager" -o jsonpath={.items[0].metadata.name})
  export MANAGER_CONTAINER_PORT=$(kubectl get pod --namespace dragonfly-system $MANAGER_POD_NAME -o jsonpath="{.spec.containers[0].ports[0].containerPort}")
  kubectl --namespace dragonfly-system port-forward $MANAGER_POD_NAME 8080:$MANAGER_CONTAINER_PORT
  echo "Visit http://127.0.0.1:8080 to use your manager"
1. Get the scheduler address by running these commands:
  export SCHEDULER_POD_NAME=$(kubectl get pods --namespace dragonfly-system -l "app=dragonfly,release=dragonfly,component=scheduler" -o jsonpath={.items[0].metadata.name})
  export SCHEDULER_CONTAINER_PORT=$(kubectl get pod --namespace dragonfly-system $SCHEDULER_POD_NAME -o jsonpath="{.spec.containers[0].ports[0].containerPort}")
  kubectl --namespace dragonfly-system port-forward $SCHEDULER_POD_NAME 8002:$SCHEDULER_CONTAINER_PORT
  echo "Visit http://127.0.0.1:8002 to use your scheduler"
1. Configure runtime to use dragonfly:
  https://d7y.io/docs/getting-started/quick-start/kubernetes/
Check that Dragonfly is deployed successfully:
$ kubectl get po -n dragonfly-system
NAME                                READY   STATUS    RESTARTS      AGE
dragonfly-client-dfjrj              1/1     Running   0             17m
dragonfly-client-kzskj              1/1     Running   0             17m
dragonfly-manager-89df99478-gf6bm   1/1     Running   1 (17m ago)   17m
dragonfly-manager-89df99478-lh5wm   1/1     Running   0             17m
dragonfly-manager-89df99478-ljt2r   1/1     Running   0             17m
dragonfly-mysql-0                   1/1     Running   0             17m
dragonfly-redis-master-0            1/1     Running   0             17m
dragonfly-redis-replicas-0          1/1     Running   0             17m
dragonfly-redis-replicas-1          1/1     Running   0             17m
dragonfly-redis-replicas-2          1/1     Running   0             17m
dragonfly-scheduler-0               1/1     Running   0             17m
dragonfly-scheduler-1               1/1     Running   0             17m
dragonfly-scheduler-2               1/1     Running   0             17m
dragonfly-seed-client-0             1/1     Running   0             17m
dragonfly-seed-client-1             1/1     Running   0             17m
dragonfly-seed-client-2             1/1     Running   0             17m
Install Python packages using pip and distribute traffic through Draognfly
Create a configuration file for pip
Create a ~/.pip/pip.conf file and use global.proxy to forward Python package
download requests to the Dragonfly HTTP proxy so that it can distribute files via the P2P network.
The configuration content is as follows:
Notice: Replace the
proxyaddress with your actual address.
[global]
proxy = http://$NODE_IP:4001
index-url = https://pypi.org/simple/
[install]
default-timeout = 500
trusted-host = pypi.python.org
               pypi.org
               files.pythonhosted.org
Pip downloads Python package through Dragonfly
Install torch via pip:
pip install torch
The expected output is as follows:
Looking in indexes: https://pypi.org/simple/
Collecting torch
  Using cached torch-2.5.1-cp311-cp311-manylinux2014_aarch64.whl (91.9 MB)
Collecting filelock
  Using cached filelock-3.17.0-py3-none-any.whl (16 kB)
Collecting typing-extensions>=4.8.0
  Downloading typing_extensions-4.12.2-py3-none-any.whl (37 kB)
Collecting networkx
  Downloading networkx-3.4.2-py3-none-any.whl (1.7 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.7/1.7 MB 195.8 MB/s eta 0:00:00
Collecting jinja2
  Downloading jinja2-3.1.5-py3-none-any.whl (134 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 134.6/134.6 kB 38.5 MB/s eta 0:00:00
Collecting fsspec
  Downloading fsspec-2024.12.0-py3-none-any.whl (183 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 183.9/183.9 kB 37.5 MB/s eta 0:00:00
Collecting sympy==1.13.1
  Downloading sympy-1.13.1-py3-none-any.whl (6.2 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 6.2/6.2 MB 245.3 MB/s eta 0:00:00
Collecting mpmath<1.4,>=1.1.0
  Downloading mpmath-1.3.0-py3-none-any.whl (536 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 536.2/536.2 kB 170.5 MB/s eta 0:00:00
Collecting MarkupSafe>=2.0
  Downloading MarkupSafe-3.0.2-cp311-cp311-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (23 kB)
Installing collected packages: mpmath, typing-extensions, sympy, networkx, MarkupSafe, fsspec, filelock, jinja2, torch
Successfully installed MarkupSafe-3.0.2 filelock-3.17.0 fsspec-2024.12.0 jinja2-3.1.5 mpmath-1.3.0 networkx-3.4.2 sympy-1.13.1 torch-2.5.1 typing-extensions-4.12.2
Verify
Execute the command:
# Find pod name.
export POD_NAME=$(kubectl get pods --namespace dragonfly-system -l "app=dragonfly,release=dragonfly,
component=client" -o=jsonpath='{.items[?(@.spec.nodeName=="kind-worker")].metadata.name}' | head -n 1 )
# Check logs.
kubectl exec -n dragonfly-system ${POD_NAME} -- sh -c 'grep "download task succeeded" /var/log/dragonfly/dfdaemon/*'
The expected output is as follows:
{
  2025-01-23T05:39:15.828741839+00:00  INFO
  download_task: dragonfly-client/src/grpc/dfdaemon_upload.rs:395: download task succeeded
  host_id="10.244.2.15-dragonfly-seed-client-2-seed"
  task_id="92d8d9e20c608edcca38790dfc76b99746e65592321923b552afe6337b275257"
  peer_id="10.244.2.15-dragonfly-seed-client-2-01f2e222-f517-47d8-98b4-18540e4c5be5-seed"
}
{
  2025-01-23T05:39:18.955058382+00:00
  INFO download_task: dragonfly-client/src/grpc/dfdaemon_upload.rs:395: download task succeeded
  host_id="10.244.2.15-dragonfly-seed-client-2-seed"
  task_id="7f684c300a629072a61536344a0369df2e049829ab8ab703e6de145cef18eb14"
  peer_id="10.244.2.15-dragonfly-seed-client-2-0f27a38b-97af-45c7-9e57-43cbc2bcd8d9-seed"
}