Overview
What is TensorFusion?
TensorFusion is a cutting-edge GPU virtualization and pooling platform designed to maximize GPU utilization, seamlessly scale AI applications, and automate AI infrastructure management.
Core Features
- Fractional GPU with Single TFlops/MiB Precision
- Battle-tested GPU-over-IP Remote GPU Sharing
- GPU-first Scheduling and Auto-scaling
- Computing Oversubscription and GPU VRAM Expansion
- GPU Pooling, Monitoring, Live Migration, AI Model Preloading, and more
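To make "fractional GPU with TFlops/MiB precision" concrete, the sketch below divides one GPU's compute and VRAM capacity among several workload requests. The classes, names, and numbers are illustrative assumptions for this document, not TensorFusion's actual API:

```python
# Illustrative sketch of fractional GPU allocation at TFlops/MiB granularity.
# GPU/Request and all numbers are hypothetical, not TensorFusion's real API.
from dataclasses import dataclass

@dataclass
class GPU:
    tflops: float   # total compute capacity
    vram_mib: int   # total VRAM in MiB

@dataclass
class Request:
    name: str
    tflops: float
    vram_mib: int

def allocate(gpu: GPU, requests: list[Request]) -> dict[str, bool]:
    """Grant each request only if the remaining fractional capacity covers it."""
    free_tflops, free_vram = gpu.tflops, gpu.vram_mib
    granted = {}
    for r in requests:
        ok = r.tflops <= free_tflops and r.vram_mib <= free_vram
        if ok:
            free_tflops -= r.tflops
            free_vram -= r.vram_mib
        granted[r.name] = ok
    return granted

gpu = GPU(tflops=312.0, vram_mib=81920)      # an A100-class card, for example
reqs = [Request("svc-a", 100.0, 20480),
        Request("svc-b", 150.0, 40960),
        Request("svc-c", 100.0, 40960)]      # exceeds what remains
print(allocate(gpu, reqs))                   # svc-c is rejected
```

A real pool would apply this check per GPU across the cluster and enforce the granted fractions at runtime; the point here is only that requests are expressed and accounted in TFlops and MiB rather than whole devices.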
Demo
Fractional vGPU & GPU-over-IP & Resource Allocation
Comprehensive AI Infra Console
GPU Live Migration
Why TensorFusion?
TensorFusion is the one-stop solution for AI Infra teams, enabling more AI applications with fewer GPUs. Its core values are:
- Reduce GPU/NPU costs: achieve 40% to 90% cost savings through GPU sharing, pooling, and oversubscription, with less than 4% performance impact.
- Increase AI application elasticity: GPU-first scheduling and allocation let AI applications scale in seconds; imagine using GPUs like NFS (Network File System)!
- Reduce AI Infra management complexity: a full-fledged, automated AI infrastructure management solution.
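The savings claim above comes from the fact that most AI workloads leave dedicated GPUs mostly idle, so pooling consolidates them onto fewer cards. Here is a back-of-the-envelope sketch; the utilization figures are illustrative assumptions, not measured TensorFusion results:

```python
# Back-of-the-envelope sketch of GPU savings from sharing/oversubscription.
# All numbers are illustrative assumptions, not measured TensorFusion results.
import math

def gpus_needed(n_apps: int, avg_util: float, target_util: float = 0.8) -> int:
    """GPUs required when apps share pooled capacity up to a target utilization."""
    return math.ceil(n_apps * avg_util / target_util)

dedicated = 20                                   # one GPU per app
shared = gpus_needed(n_apps=20, avg_util=0.15)   # apps average 15% utilization
savings = 1 - shared / dedicated
print(shared, f"{savings:.0%}")                  # 4 GPUs, 80% savings
```

With lower average utilization or a higher safe target utilization, the savings move toward the upper end of the quoted 40% to 90% range.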
Quick Start
- Deploy in Kubernetes cluster
- Create new cluster in VM/BareMetal
- Learn Essential Concepts & Architecture
Applicable Scenarios
- Multi-model serving scenario. Typical scenarios include: Model as a Service (MaaS) platforms; IaaS or PaaS cloud vendors offering GPU rentals; AI SaaS platforms running multiple AI models.
- Hands-on lab scenario. Create temporary lab environments with local/remote virtual GPUs for developers, students, or researchers. Typical scenarios include: AI teaching experiments, AI application development, AI research, and on-demand scientific computing.
Inapplicable Scenarios
TensorFusion currently doesn't support AI models with intensive GPU communication or parameter sizes larger than a single GPU's capacity. Examples include large-scale distributed training and deploying FP8-precision LLMs with 405B or 671B parameters. We're planning to add support for these ultra-large AI models in the future.
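A quick way to tell whether a model falls outside this limit is to compare its weight footprint with a single GPU's VRAM. At FP8 (1 byte per parameter), a 671B-parameter model needs roughly 671 GB for weights alone, far beyond an 80 GB card; the helper below is a rough illustrative check that ignores activations and KV cache:

```python
# Rough check of whether a model's weights fit on one GPU (weights only;
# activations and KV cache need additional memory). Numbers are illustrative.
def fits_on_gpu(params_billions: float, bytes_per_param: float, vram_gb: float) -> bool:
    weight_gb = params_billions * bytes_per_param   # 1e9 params * 1 byte = 1 GB
    return weight_gb <= vram_gb

print(fits_on_gpu(7, 1.0, 80))     # 7B at FP8  ->   7 GB: True
print(fits_on_gpu(671, 1.0, 80))   # 671B at FP8 -> 671 GB: False
```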
Compare with Other Solutions
Feature Comparison
TensorFusion is the only solution that delivers all of the following features in a one-stop AI Infra solution:
- True GPU virtualization, achieving virtual memory addressing, error isolation, resource oversubscription, etc.
- Zero-intrusion GPU remote sharing (GPU-over-IP), with less than 5% performance loss
- GPU memory hot/warm/cold tiering, second-level swapping between GPU memory and host memory
- Fully automated GPU/NPU pool management, monitoring, alerting, bin-packing etc.
- Customizable QoS levels, usage measurement, and AI computing monetization (work in progress)
- Distributed live migration of GPU contexts, AI model preloading (work in progress)
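The bin-packing mentioned above can be illustrated with the classic first-fit-decreasing heuristic, which consolidates workloads onto as few devices as possible. This is a generic textbook sketch, not TensorFusion's actual scheduler:

```python
# First-fit-decreasing bin-packing: the classic heuristic behind consolidating
# workloads onto fewer GPUs. A generic sketch, not TensorFusion's scheduler.
def first_fit_decreasing(demands: list[int], capacity: int) -> list[list[int]]:
    """Pack integer demands (e.g. VRAM in MiB) into the fewest bins that fit."""
    bins: list[list[int]] = []
    for d in sorted(demands, reverse=True):
        for b in bins:                      # try each open bin in order
            if sum(b) + d <= capacity:
                b.append(d)
                break
        else:                               # no open bin fits: open a new one
            bins.append([d])
    return bins

# Six workloads' VRAM demands (MiB) packed onto 80 GiB GPUs
packed = first_fit_decreasing([40960, 24576, 16384, 16384, 32768, 8192],
                              capacity=81920)
print(len(packed), packed)                  # 2 GPUs suffice
```

Integer MiB values are used deliberately so capacity comparisons are exact; a production scheduler would also weigh compute demand, affinity, and fragmentation, not VRAM alone.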
Price Comparison
The TensorFusion community version is free for small teams, and the paid commercial version is priced below other commercial solutions such as Run.AI, NVIDIA vGPU, and VirtAI OrionX.
- For users with up to 10 GPUs, the TensorFusion community version is free.
- For users with more than 10 GPUs, TensorFusion charges below 4% of the computing cost while achieving more than 50% savings, far below the prices of Run.AI, NVIDIA vGPU, and VirtAI OrionX.
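As a worked example of how the fee relates to the savings, assume an illustrative baseline GPU spend and read the 4% fee against the pooled (post-savings) computing cost; the baseline figure is an assumption for this sketch, not a published price:

```python
# Worked example of the pricing claim: a fee below 4% of computing cost
# alongside >50% savings. The baseline spend is an illustrative assumption.
baseline = 100_000          # monthly GPU spend without pooling, in USD
savings_rate = 0.50         # cost reduction from sharing/oversubscription
fee_rate = 0.04             # fee as a fraction of the pooled computing cost

pooled_cost = baseline * (1 - savings_rate)
fee = pooled_cost * fee_rate
net = pooled_cost + fee
print(pooled_cost, fee, net)   # 50000.0 2000.0 52000.0
```

Even with the fee included, net spend in this example is roughly 48% below the dedicated-GPU baseline.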
Other Differences
- Open Source. TensorFusion's pooling, scheduling, and GPU partitioning core components are open source, while NVIDIA vGPU, Run.AI, and VirtAI OrionX commercial solutions are closed source.
- Lightweight. TensorFusion requires neither a Kubernetes DevicePlugin nor the NVIDIA GPU Operator, while other solutions such as HAMi introduce more components, making maintenance more complex.
- Unparalleled Performance. The virtualization layer, crafted in Rust and C++, is meticulously optimized for NVIDIA GPUs. Remarkably, in over 50% of benchmark tests, performance surpasses that of running directly on the physical GPU.
Detailed Comparison Report
- TensorFusion vs. MIG/MPS/Timeslicing
- TensorFusion vs. NVIDIA vGPU
- TensorFusion vs. Run.AI
- TensorFusion vs. HAMi
- TensorFusion vs. VirtAI OrionX
Reference Documentation
FAQ
Q: What are the success cases of TensorFusion?
Q: Is TensorFusion open source?
Yes. TensorFusion has open sourced most of its code, including the core components for pooling, scheduling, and the GPU worker hypervisor, while the client stub and worker code are temporarily not open sourced. The Worker-ClientStub implementation originates from rCUDA but is much more powerful.
Q: In what cases is TensorFusion free?
For users with up to 10 managed GPUs, TensorFusion is completely free for both commercial and non-commercial purposes, unless you need to enable enterprise features, which are not important for startups and small teams. For users with more than 10 managed GPUs, please contact us to obtain a commercial or educational license.
Q: Where is the development team of TensorFusion?
The TensorFusion product and related GitHub projects are developed and operated by NexusGPU PTE.LTD., headquartered in Singapore, with team members distributed across the United States, China, Singapore, and possibly other countries in the future.
Q: Which vendors and versions of GPUs does TensorFusion support?
TensorFusion supports all NVIDIA GPU series from the Volta architecture onward, with NVIDIA driver versions starting from 530.x and CUDA versions ranging from 11.8 to the latest. AMD GPU support is currently being planned.