TensorFusion

Boundless Computing, Limitless Intelligence
© 2026 NexusGPU PTE. LTD. All Rights Reserved.

Roadmap

Explore our future plans and upcoming features

Discuss on GitHub
Backlog (6)

Mount multiple remote GPUs from different hosts

Aggregate remote GPUs across hosts and expose them as a single logical pool to workloads.

GPU Go · Tensor Net · feature · architecture

MetaX GPU support

Vendor integration for discovery, telemetry, and isolation modes where available.

Tensor Engine · Tensor OS · ecosystem

AWS Neuron support

Support Neuron devices for scheduling, monitoring, and isolation templates where applicable.

Tensor Engine · Tensor OS · Tensor Net · ecosystem · architecture

TensorNet cross-cluster GPU scheduling

Schedule and route workloads across clusters/regions with "compute to data" policies and global quotas.

Tensor Net · feature · architecture

GPU Go cloud sync & multi-device

Cross-device GPU resource sync for GPU Go personal/team plans.

GPU Go · feature

AI model registry & preloading

Build your own private MaaS (Model-as-a-Service) with model caching and preloading.

Tensor OS · Tensor Net · feature
In progress (5)

Ascend NPU Soft-isolation

Production-ready limiter workflow and observability for Ascend NPU oversubscription scenarios.

Tensor Engine · Tensor OS · feature · ecosystem
2026 Q1–Q2

AMD GPU Soft-isolation

Hook-based time-sharing isolation for AMD GPUs, aligned with TensorFusion quota + scheduler.

Tensor Engine · feature · ecosystem
2026 Q1

Gang scheduling

First-class support for multi-vGPU / multi-accelerator workloads requiring atomic placement.

Tensor Engine · Tensor Net · feature · architecture
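For illustration, the atomic-placement requirement can be sketched as an all-or-nothing fit check. The greedy first-fit strategy and all names below are hypothetical, not TensorFusion's actual scheduler:

```python
def gang_place(free: dict, demands: dict):
    """Atomically place a gang of workers, or place nothing at all.

    `free` maps node -> free accelerators; `demands` maps worker -> count.
    Greedy first-fit sketch; a real scheduler would also weigh topology,
    priorities, and reservations.
    """
    remaining = dict(free)
    placement = {}
    for worker, need in demands.items():
        # find any node with enough free accelerators for this worker
        node = next((n for n, f in remaining.items() if f >= need), None)
        if node is None:
            return None  # one member doesn't fit: the whole gang is rejected
        remaining[node] -= need
        placement[worker] = node
    return placement
```

The key property is the early `return None`: partial placements never leak, so a gang either runs in full or stays queued.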

Topology-aware scheduling

Place workloads with awareness of NUMA/NVLink/PCIe/IB topology to maximize performance and stability.

Tensor Engine · Tensor Net · architecture
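At its simplest, topology awareness reduces to scoring candidate placements by interconnect quality. The link types and scores below are hypothetical stand-ins for a discovered topology, not TensorFusion's actual model:

```python
# Hypothetical link ranking for candidate GPU pairs; a real scheduler would
# derive this from the discovered NUMA/NVLink/PCIe/IB topology.
LINK_SCORE = {"NVLINK": 3, "SAME_NUMA": 2, "PCIE": 1, "CROSS_NUMA": 0}

def best_gpu_pair(pairs):
    """Pick the GPU pair with the best interconnect for a 2-GPU workload.

    `pairs` is a list of (gpu_a, gpu_b, link_type) tuples.
    """
    return max(pairs, key=lambda p: LINK_SCORE[p[2]])
```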

Benchmark matrix

Standard benchmark suite across vendors, isolation modes, transport (Ethernet/RDMA), and frameworks.

Tensor Engine · performance
Released (33)

AMD GPU remoting

Remote GPU support for AMD GPUs with TensorFusion scheduling and telemetry.

Tensor Engine · Tensor OS · feature · ecosystem
2026-01

Hygon DCU remoting

Remote GPU path for Hygon DCU devices with unified scheduling integration.

Tensor Engine · Tensor OS · ecosystem · feature
2025-12

NPU virtualization templates

Standardized partition/isolation templates for NPUs to accelerate onboarding and operations.

Tensor OS · Tensor Engine · architecture · feature
2025-12

Heterogeneous device support

Support multiple GPU/NPU vendors in the same cluster with unified scheduling.

Tensor Engine · Tensor OS · architecture · ecosystem
2025-12

Hard isolation: spatial-division sharing

Space-sharing mode for stronger isolation guarantees (no oversubscription).

Tensor Engine · architecture
2025-11

Partitioned scheduling (MIG-like)

Hardware-partitioned isolation scheduling for MIG and similar technologies.

Tensor Engine · architecture · feature
2025-11

Device controller

Dedicated controller for managing accelerator lifecycle and health.

Tensor Engine · architecture
2025-11

Soft/hard/shared isolation modes

Three isolation modes for compute-percent scheduling, each with different trade-offs.

Tensor Engine · architecture · feature
2025-10

Elastic rate limiter

Adaptive compute throttling with a PID controller for smooth resource sharing.

Tensor Engine · performance · architecture
2025-10
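As a rough sketch of the idea, a proportional-integral-derivative loop can nudge a worker's compute quota toward a utilization target. All names and gains below are hypothetical, not TensorFusion's actual limiter:

```python
# Hypothetical sketch of a PID-driven elastic limiter: the controller adjusts
# a worker's compute quota based on how far measured utilization drifts from
# the target share.

def make_pid(kp: float, ki: float, kd: float):
    """Return a stateful PID step function: (target, measured) -> correction."""
    state = {"integral": 0.0, "prev_error": None}

    def step(target: float, measured: float) -> float:
        error = target - measured
        state["integral"] += error
        derivative = 0.0 if state["prev_error"] is None else error - state["prev_error"]
        state["prev_error"] = error
        return kp * error + ki * state["integral"] + kd * derivative

    return step


def throttle_loop(quota: float, target: float, samples, pid) -> float:
    """Adjust a compute quota (0..1) after each utilization sample, clamped."""
    for measured in samples:
        quota = min(1.0, max(0.0, quota + pid(target, measured)))
    return quota
```

An over-consuming worker (measured above target) is throttled down gradually rather than cut off, which is what makes the sharing feel smooth.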

VRAM hard-isolation

Strict memory enforcement for GPU workloads requiring hard memory limits.

Tensor Engine · architecture
2025-10

GPU workload autoscaling

Auto-scale GPU workloads based on utilization and pending demand.

Tensor Engine · Tensor OS · feature
2025-09
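The scaling decision can be sketched with the familiar HPA-style ratio, extended with pending demand. Function and parameter names here are illustrative assumptions, not the shipped autoscaler:

```python
import math

def desired_replicas(current: int, utilization: float, target: float,
                     pending: int, max_replicas: int) -> int:
    """HPA-style sketch: desired = ceil(current * utilization / target),
    plus headroom for pending workers, clamped to [1, max_replicas]."""
    desired = math.ceil(current * utilization / target)
    return max(1, min(max_replicas, desired + pending))
```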

Karpenter node expansion

Auto-expand GPU nodes when pods are pending, integrated with Karpenter.

Tensor Engine · Tensor OS · feature · ecosystem
2025-09

GPU worker preemption

Preempt lower-priority GPU workers to improve scheduling fairness.

Tensor Engine · feature
2025-09

RDMA transport support

RDMA path for low-latency/high-throughput remote GPU access and scheduling.

Tensor Engine · Tensor Net · performance · architecture
2025-08

Hypervisor health probes

Healthz/readyz APIs for hypervisor liveness and readiness monitoring.

Tensor Engine · architecture
2025-08
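The probe endpoints follow the standard Kubernetes liveness/readiness pattern, which can be sketched as a minimal HTTP handler. This is an illustrative sketch; the hypervisor's real endpoints and responses may differ:

```python
from http.server import BaseHTTPRequestHandler

class ProbeHandler(BaseHTTPRequestHandler):
    """Minimal healthz/readyz endpoints in the Kubernetes probe style."""

    # toggled by the hypervisor once its workers are initialized
    ready = False

    def do_GET(self):
        if self.path == "/healthz":
            self._reply(200, b"ok")  # liveness: the process is up
        elif self.path == "/readyz":
            # readiness: only report 200 once initialization has finished
            code = 200 if ProbeHandler.ready else 503
            self._reply(code, b"ok" if code == 200 else b"not ready")
        else:
            self._reply(404, b"not found")

    def _reply(self, code, body):
        self.send_response(code)
        self.send_header("Content-Type", "text/plain")
        self.end_headers()
        self.wfile.write(body)
```

Kubelet restarts a container whose `/healthz` fails, while a failing `/readyz` merely removes it from service endpoints, which is why the two are kept separate.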

Large-scale benchmark & optimization

Performance optimization for high-GPU-count clusters based on benchmarking.

Tensor Engine · performance
2025-08

GPUNodeClaim & Karpenter integration

Cloud vendor integration and Karpenter auto-scaling for GPU nodes.

Tensor Engine · Tensor OS · feature · ecosystem
2025-07

Progressive migration from NVIDIA operator

Migrate from existing NVIDIA operator/device-plugin setups incrementally.

Tensor Engine · feature
2025-07

Kubernetes device plugin integration

Native K8s device plugin integration in hypervisor for standard resource management.

Tensor Engine · architecture
2025-07

Hypervisor TUI monitoring

Real-time terminal UI for monitoring workers and GPU state.

Tensor Engine · feature
2025-07

Full-fledged NVIDIA remoting

Production-grade GPU-over-IP for NVIDIA, including Windows vGPU and Remote GPU.

Tensor Engine · Tensor OS · feature · architecture
2025-06

K8s scheduler framework refactor

Refactored to Kubernetes scheduler framework for advanced scheduling policies.

Tensor Engine · architecture
2025-06

Alertmanager integration

Integrated alerting with Prometheus Alertmanager for GPU cluster monitoring.

Tensor Engine · Tensor OS · ecosystem
2025-06

Multi-GPU requests

Allow workloads to request multiple GPUs with model filtering.

Tensor Engine · feature
2025-05

Per-GPU UUID limits

Set CUDA limits per GPU using device UUIDs or indices.

Tensor Engine · feature
2025-05

Weighted scheduling

Weighted scheduler for fair GPU resource distribution.

Tensor Engine · feature
2025-05
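One way to picture weighted fair distribution is splitting a GPU pool across tenants in proportion to their weights, with largest-remainder rounding so integer shares still sum to the pool size. This is an illustrative sketch, not the actual TensorFusion scheduler:

```python
def weighted_shares(total_gpus: int, weights: dict) -> dict:
    """Split a GPU pool across tenants in proportion to their weights.

    Largest-remainder rounding keeps the integer shares summing exactly
    to the pool size.
    """
    total_w = sum(weights.values())
    raw = {t: total_gpus * w / total_w for t, w in weights.items()}
    shares = {t: int(r) for t, r in raw.items()}
    leftover = total_gpus - sum(shares.values())
    # hand the remaining GPUs to the largest fractional remainders
    for t in sorted(raw, key=lambda t: raw[t] - shares[t], reverse=True)[:leftover]:
        shares[t] += 1
    return shares
```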

Canary/gray rollout for TF Pods

Gradual rollout support for TensorFusion-enabled Pods.

Tensor Engine · feature
2025-04

CUDA memory hooks (cuMemCreate)

Hook CUDA memory APIs for strict memory limit enforcement.

Tensor Engine · architecture
2025-04

TFLOPs-based resource limiting

Limit GPU resources based on TFLOPs for fine-grained control.

Tensor Engine · feature
2025-03
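Conceptually, a TFLOPs request maps onto a fraction of a device's peak throughput. The conversion below is a hypothetical sketch of that idea, not the exact mechanism:

```python
def compute_percent(requested_tflops: float, device_tflops: float) -> float:
    """Translate a TFLOPs request into a device compute-share percentage.

    E.g. requesting 19.8 TFLOPs on a ~99-TFLOPs device maps to a 20%
    compute share. Clamped at 100% of one device.
    """
    return min(100.0, 100.0 * requested_tflops / device_tflops)
```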

Distribution controls (maxSkew)

Control workload distribution across nodes with maxSkew parameter.

Tensor Engine · feature
2025-03
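Following the Kubernetes topology-spread semantics, `maxSkew` bounds the difference between the most- and least-loaded nodes. A minimal sketch of the check (names hypothetical):

```python
def violates_max_skew(counts: dict, candidate: str, max_skew: int) -> bool:
    """Would scheduling one more replica on `candidate` exceed maxSkew?

    Skew = (replicas on the most-loaded node) - (replicas on the least-
    loaded node), evaluated after the hypothetical placement.
    """
    after = dict(counts)
    after[candidate] = after.get(candidate, 0) + 1
    return max(after.values()) - min(after.values()) > max_skew
```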

GPU temperature monitoring

Monitor GPU temperature for thermal management and alerting.

Tensor Engine · feature
2025-03

GPU metrics foundation

TFLOPs/VRAM metrics pipeline across controller and engine.

Tensor Engine · architecture
2025-01

GPU pool management

Manage GPU resources as pools with component configuration.

Tensor Engine · feature
2025-01