Create a Workload and WorkloadProfile
Step 1. Analyze Computing Resource Requirements and QoS Level
Calculate Initial Resource Requests
You can use TensorFusion cloud to get resource recommendations, or you can estimate the TFlops/VRAM requests yourself with the following methods:
VRAM:
- For FP8-precision inference, each 1B parameters needs about 1 GiB of VRAM.
- For LLMs, each 1K tokens of context window adds about 1 GiB of VRAM per concurrent user.
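For example (hypothetical numbers for illustration): serving an 8B-parameter model at FP8 needs about 8 GiB for weights, plus roughly 4 GiB per user at a 4K context window; with 4 concurrent users that is about 8 + 4 × 4 = 24 GiB, so a vram-request of 24Gi (or a bit more, for headroom) would be a reasonable starting point.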
TFlops estimation is harder because training and inference frameworks differ widely, and so do different types of AI models. One practical approach is to run a baseline case on a single GPU and monitor GPU utilization, scale that measurement linearly to the expected number of users or dataset size, and then adjust the value over time or enable TFlops auto-scaling.
Refer: Common GPU Information
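For example (hypothetical numbers): if a single-user baseline on a GPU rated at 100 TFlops shows about 20% average utilization, the baseline consumes roughly 20 TFlops; scaling linearly to 5 concurrent users suggests a tflops-request of around 100, which you can then trim based on observed usage or cover with TFlops auto-scaling.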
Choose QoS Levels
- low: Best for training and labs. Ensures capacity but not latency. Accumulates credits for bursts when GPUs are available. VRAM cools down quickly.
- medium: Ideal for offline tasks like embedding. Ensures capacity with bursts, preempting low QoS tasks. No latency guarantee. VRAM cools down moderately.
- high: Suited for non-latency-sensitive online tasks like inference. Ensures capacity, preempts medium QoS tasks. VRAM stays at requested levels.
- critical: For real-time, latency-critical tasks like live translation. Ensures capacity and low latency, preempts most tasks. VRAM remains at requested levels.
Step 2. Create Workload with Annotations
Add Pod Annotations
```yaml
tensor-fusion.ai/generate-workload: 'true'
tensor-fusion.ai/gpupool: default-pool
tensor-fusion.ai/inject-container: python
tensor-fusion.ai/replicas: '1'
tensor-fusion.ai/tflops-limit: '20'
tensor-fusion.ai/tflops-request: '10'
tensor-fusion.ai/vram-limit: 4Gi
tensor-fusion.ai/vram-request: 4Gi
tensor-fusion.ai/qos: medium
tensor-fusion.ai/workload: pytorch-example
```
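These annotations belong under the Pod template's metadata. Below is a minimal Deployment sketch showing their placement; the image, command, and app label are placeholders for illustration, and the `tensor-fusion.ai/enabled: 'true'` label from the profile example further down is assumed to be required here as well.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: pytorch-example
spec:
  replicas: 1
  selector:
    matchLabels:
      app: pytorch-example
  template:
    metadata:
      labels:
        app: pytorch-example
        tensor-fusion.ai/enabled: 'true'
      annotations:
        tensor-fusion.ai/generate-workload: 'true'
        tensor-fusion.ai/gpupool: default-pool
        tensor-fusion.ai/inject-container: python
        tensor-fusion.ai/replicas: '1'
        tensor-fusion.ai/tflops-limit: '20'
        tensor-fusion.ai/tflops-request: '10'
        tensor-fusion.ai/vram-limit: 4Gi
        tensor-fusion.ai/vram-request: 4Gi
        tensor-fusion.ai/qos: medium
        tensor-fusion.ai/workload: pytorch-example
    spec:
      containers:
        - name: python                      # must match tensor-fusion.ai/inject-container
          image: pytorch/pytorch:latest     # placeholder image for illustration
          command: ["python", "train.py"]   # placeholder command for illustration
```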
Use the WorkloadProfile
You can also create a WorkloadProfile
and reference it in the Pod annotations, e.g. `tensor-fusion.ai/workload-profile: default-profile`,
to use advanced features.
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: <...>
spec:
  template:
    metadata:
      labels:
        tensor-fusion.ai/enabled: 'true'
      annotations:
        tensor-fusion.ai/workload-profile: template-for-small-model
```
See all configuration options in Workload Configuration.
Step 3. Verify the App Status
[WIP]