# Compare with Run.AI
Run.AI is a closed-source, proprietary GPU management platform offering GPU pooling and fractional GPU capabilities. Its feature set resembles that of HAMi, and it doesn't support remote GPU sharing or VRAM tiering. Notably, NVIDIA acquired Run.AI in December 2024.
TensorFusion has open-sourced most of its code and offers an end-to-end GPU virtualization and pooling solution. It's newer and vendor-agnostic, designed for AI infrastructure teams, with a completely different technical architecture and feature set.
## Features
Feature | TensorFusion | Run.AI |
---|---|---|
**Basic Features** | | |
Fractional GPU | ✅ | ✅ |
GPU Pooling | ✅ | ✅ |
GPU Scheduling & Allocation | ✅ | ✅ |
Remote GPU Sharing | ✅ | ❌ |
**Advanced Features** | | |
Seamless Onboarding for Existing Workloads | ✅ | ❌ |
Monitoring & Alert | ✅ | ✅ |
GPU Resource Oversubscription | ✅ | ❌ |
GPU VRAM Expansion and hot/warm/cold tiering | ✅ | ❌ |
GPU-first Autoscaling Policies | ✅ | ❌ |
Support different QoS levels | 🚧 | ✅ |
Request Multiple vGPUs | 🚧 | ✅ |
GPU Node Auto Provisioning/Termination | ✅ | ❌ |
GPU Compaction/Bin-packing | 🚧 | ✅ |
Dynamic MIG (Multi-Instance GPU) | 🎉 | ✅ |
IDE Extensions & Plugins | 🚧 | ❌ |
Centralized Dashboard & Control Plane | ✅ | ✅ |
Support AMD GPU | 🚧 | ❌ |
Support Huawei Ascend/Cambricon and other GPU/NPU | 🚧 | ❌ |
**Enterprise Features** | | |
GPU Live Migration | 🚧 | ❌ |
Advanced observability, CUDA Call Profiling/Tracing | 🚧 | ❌ |
AI Model Preloading | 🚧 | ❌ |
Advanced auto-scaling policies, scale to zero, rebalancing | 🚧 | ❌ |
Monetization of your GPU cluster | 🚧 | ❌ |
Notes:
- ✅ means supported
- ❌ means not supported
- 🚧 means work in progress
- ❓ means unknown
- 🎉 means not necessary anymore
In summary, Run.AI is a proprietary solution that offers command-line tools, a user interface, and APIs to manage GPU pools and GPU workloads.
## Deploy & Usage
Run.AI doesn't offer self-service onboarding; you can only "Book a Demo" and contact a salesperson to get started. Run.AI also wraps workloads in higher-layer proprietary Custom Resources such as "InferenceWorkload". It's not a seamless solution and will impact existing workloads.
TensorFusion has fewer dependencies and offers a full-fledged control plane to operate the GPU/NPU cluster for both community and commercial users, with self-service onboarding.
```yaml
# Example Run.AI InferenceWorkload to obtain GPU resources and run deployments
apiVersion: run.ai/v2alpha1
kind: InferenceWorkload
metadata:
  name: inference1
  namespace: default
spec:
  name:
    value: inference1
  gpu:
    value: "0.5"
  image:
    value: "gcr.io/run-ai-demo/example-triton-server"
  minScale:
    value: 1
  maxScale:
    value: 2
  metric:
    value: concurrency
  target:
    value: 80
  ports:
    items:
      port1:
        value:
          container: 8000
```
TensorFusion is much more open than Run.AI and offers self-service onboarding. For end users, it's just a matter of adding labels and annotations to the PodTemplate, which is much simpler and more flexible.
```yaml
# TensorFusion
metadata:
  labels:
    tensor-fusion.ai/enabled: 'true'
  annotations:
    tensor-fusion.ai/workload-profile: example-workload-profile // [!code highlight]
    # you can override profile fields
    tensor-fusion.ai/vram-limit: 4Gi // [!code highlight]
```
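For context, here's a minimal sketch of what this looks like on a complete Deployment, assuming a workload profile named `example-workload-profile` already exists in your cluster; the app name and container image below are placeholders:

```yaml
# A minimal sketch (not an official example): enabling TensorFusion on an
# ordinary Deployment by adding the label and annotations to its PodTemplate.
# The app name, image, and workload profile name are placeholders.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: llm-inference
spec:
  replicas: 1
  selector:
    matchLabels:
      app: llm-inference
  template:
    metadata:
      labels:
        app: llm-inference
        tensor-fusion.ai/enabled: 'true'   # opt this Pod into TensorFusion
      annotations:
        tensor-fusion.ai/workload-profile: example-workload-profile
        # override individual profile fields as needed
        tensor-fusion.ai/vram-limit: 4Gi
    spec:
      containers:
        - name: server
          image: my-registry/llm-inference:latest   # placeholder image
          ports:
            - containerPort: 8000
```

Because the integration is plain Pod metadata, it layers onto existing Deployments, StatefulSets, or Helm charts without introducing a new resource schema, which is what keeps onboarding seamless.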
## Total Cost of Ownership
The TCO of Run.AI is much higher than that of TensorFusion due to its:
- High pricing
- Not open source, with hidden logic
- Proprietary schema definition, incompatible with existing workloads
- Limited autoscaling policies
- Vendor lock-in
In comparison, TensorFusion is vendor-neutral and open source, and supports seamless onboarding and flexible autoscaling. It's free for small teams, and charges less than 4% of computing cost for medium and large teams to achieve 50%+ cost savings.