Tensor Fusion Deployment for Kubernetes
Prerequisites
- Create a Kubernetes cluster with NVIDIA GPU nodes added
- Install the NVIDIA Device Plugin and Container Toolkit. This step is optional for most cloud vendors' Kubernetes distributions, where they are built in. Otherwise, use the following command to install the device plugin:
helm upgrade --install --create-namespace --namespace nvidia-device-plugin --repo https://nvidia.github.io/k8s-device-plugin/ nvdp nvidia-device-plugin
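To confirm the device plugin is running and GPUs are advertised to the scheduler, a quick sanity check (the namespace matches the install command above):
kubectl -n nvidia-device-plugin get pods
kubectl describe nodes | grep nvidia.com/gpu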
Step 1. Install TensorFusion
Sign up for an account and then go to the TensorFusion Console.
Then, copy and run the provided command to onboard your existing Kubernetes cluster. If you want to customize the Helm Chart values, see the Helm Chart Reference.
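If you do customize values, the standard Helm override pattern applies to the onboarding command. A minimal sketch, where my-values.yaml is a hypothetical file holding your overrides, and the release name, chart, and namespace must be taken from the command you copied (shown here as placeholders):
helm upgrade --install <release> <chart> --namespace <namespace> -f my-values.yaml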
Step 2. Apply the Custom Resources
For the TensorFusion cloud installation, once the agent is ready, click the Preview and then the Deploy button to apply the manifests from the cloud in one click.
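Once deployed, you can check that the custom resources exist. A quick sanity check, assuming TensorFusion registers CRDs under the tensor-fusion.ai API group (the GPUPool kind is suggested by the gpupool annotation used in the next step; exact kinds may differ in your installation):
kubectl get crds | grep tensor-fusion
kubectl get gpupools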
Step 3. Deploy and Verify TensorFusion
When the status is ready, click "Deploy an Inference App" to start a simple PyTorch container and verify TensorFusion.
Here is the simple PyTorch deployment with TensorFusion enabled and GPU resources specified; commands to apply and verify it follow the manifest.
# simple-pytorch.yaml
# kubectl apply -f simple-pytorch.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: pytorch-example
  namespace: default
  labels:
    app: pytorch-example
    tensor-fusion.ai/enabled: 'true'
spec:
  replicas: 1
  selector:
    matchLabels:
      app: pytorch-example
  template:
    metadata:
      labels:
        app: pytorch-example
        tensor-fusion.ai/enabled: 'true'
      annotations:
        tensor-fusion.ai/generate-workload: 'true'
        tensor-fusion.ai/gpupool: shared-tensor-fusion-cluster-shared
        tensor-fusion.ai/inject-container: python
        tensor-fusion.ai/replicas: '1'
        tensor-fusion.ai/tflops-limit: '20'
        tensor-fusion.ai/tflops-request: '10'
        tensor-fusion.ai/vram-limit: 4Gi
        tensor-fusion.ai/vram-request: 4Gi
        tensor-fusion.ai/workload: pytorch-example
    spec:
      containers:
        - name: python
          image: pytorch/pytorch:2.6.0-cuda12.4-cudnn9-runtime
          command:
            - sh
            - '-c'
            - sleep 1d
      restartPolicy: Always
      terminationGracePeriodSeconds: 0
      dnsPolicy: ClusterFirst
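After applying the manifest, check that both Pods come up. The label selector comes from the manifest above; the shadow GPU worker Pod's name is generated by TensorFusion, so listing all Pods in the namespace is the simplest way to spot it:
kubectl apply -f simple-pytorch.yaml
kubectl get pods -n default -l app=pytorch-example
kubectl get pods -n default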
Then you should see a PyTorch Pod and the corresponding shadow GPU worker Pod start (don't worry, it's super lightweight). Use kubectl exec to get into the PyTorch Pod, where you can run nvidia-smi to see the limited GPU memory and utilization:
kubectl exec -it deploy/pytorch-example -- nvidia-smi
Finally, run python3 to start a Python REPL and test a simple Google T5 model inference. The following code should translate the English "Hello" to the German "Hallo" within seconds.
# transformers (and sentencepiece, needed by T5) may not be preinstalled in the image: pip install transformers sentencepiece
from transformers import pipeline
# load the T5 translation pipeline onto the (TensorFusion-virtualized) GPU
pipe = pipeline("translation_en_to_de", model="google-t5/t5-base", device="cuda:0")
pipe("Hello")  # expected output: [{'translation_text': 'Hallo'}]
Option #2: Non-Cloud Installation
If you need a pure local installation and don't want the advanced features, you can install TensorFusion locally; however, you cannot use the TensorFusion Console for centralized management in this mode. A minimal sketch of such an install follows.
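This sketch assumes the chart is served from a Helm repository as documented in the Helm Chart Reference; the repo URL placeholder and the release and namespace names below are illustrative assumptions, not confirmed values:
helm repo add tensor-fusion <repo-url-from-helm-chart-reference>
helm upgrade --install tensor-fusion tensor-fusion/tensor-fusion --create-namespace --namespace tensor-fusion-sys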