
# Command Line Reference

This document provides a comprehensive reference for all command line interfaces in TensorFusion.

## Operator & Scheduler CLI

### CLI Parameters

| Parameter | Description | Default |
| --- | --- | --- |
| `-enable-http2` | Enables HTTP/2 for the metrics and webhook servers | - |
| `-health-probe-bind-address` | The address the probe endpoint binds to | `:8081` |
| `-kubeconfig` | Path to a kubeconfig file (only required if running out-of-cluster) | - |
| `-leader-elect` | Enable leader election for the controller manager to ensure only one active instance | - |
| `-metrics-bind-address` | The address the metrics endpoint binds to | `0` (disabled) |
| `-metrics-secure` | Serve the metrics endpoint securely via HTTPS (use `--metrics-secure=false` for HTTP) | - |
| `-zap-devel` | Use development mode for logging | `true` |
| `-zap-encoder` | Zap log encoding format (`json` or `console`) | - |
| `-zap-log-level` | Log verbosity level (`debug`, `info`, `error`, or any integer value > 0) | - |
| `-zap-stacktrace-level` | Level at which stacktraces are captured (`info`, `error`, `panic`) | - |
| `-zap-time-encoding` | Time encoding format (`epoch`, `millis`, `nano`, `iso8601`, `rfc3339`, `rfc3339nano`) | `epoch` |
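As an illustration, these flags are typically passed as container arguments in the operator Deployment. The snippet below is a hypothetical sketch: the container name, image placeholder, and the specific flag values chosen are illustrative, not values shipped by TensorFusion.

```yaml
# Hypothetical operator Deployment fragment; name/image are placeholders
containers:
- name: tensor-fusion-operator
  image: <operator-image>
  args:
  - -leader-elect
  - -health-probe-bind-address=:8081
  - -zap-encoder=json
  - -zap-log-level=info
```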

### Environment Variables

| Variable | Description | Example |
| --- | --- | --- |
| `INITIAL_GPU_NODE_LABEL_SELECTOR` | Initial label selector for GPU nodes | `nvidia.com/gpu.present=true` |
| `ENABLE_WEBHOOKS` | Enable webhook functionality | `true` |
| `OPERATOR_NAMESPACE` | Namespace for the operator | `tensor-fusion-sys` |
| `KUBECONFIG` | Path to a kubeconfig file | `<kubeconfig>` |
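For example, these variables can be set on the operator container in its pod spec; the values below are the examples from the table, shown in context:

```yaml
# Illustrative env block for the operator container
env:
- name: OPERATOR_NAMESPACE
  value: tensor-fusion-sys
- name: ENABLE_WEBHOOKS
  value: "true"
- name: INITIAL_GPU_NODE_LABEL_SELECTOR
  value: nvidia.com/gpu.present=true
```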

## Hypervisor CLI

### CLI Parameters

| Parameter | Description | Default |
| --- | --- | --- |
| `--sock_path` | Worker Unix socket path | `/tensor-fusion/worker/sock` |
| `--gpu_metrics_file` | GPU metrics file location | `/logs/metrics.log` |
| `--scheduler` | Scheduling policy for multiple processes on a single GPU node (when GPU load is high) | Options: `FIFO` (simple first-in, first-out) or `MLFQ` (multi-level feedback queue) |
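Putting the flags together, a hypervisor container might be configured as below. This is a sketch: the paths match the documented defaults, and choosing `FIFO` here is just one of the two documented options.

```yaml
# Illustrative hypervisor args; FIFO chosen as an example policy
args:
- --sock_path=/tensor-fusion/worker/sock
- --gpu_metrics_file=/logs/metrics.log
- --scheduler=FIFO
```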

## Node Discovery CLI

### CLI Parameters

| Parameter | Description | Example |
| --- | --- | --- |
| `--hostname` | Custom hostname for binding the current node with a GPUNode custom resource | `<hostname>` |
| `--gpu-info-config` | Path to the GPU info configuration file | See example below |
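A minimal invocation sketch, assuming the GPU info config has been mounted into the container; the mount path `/etc/tensor-fusion/gpu-info.yaml` is a hypothetical choice, not a documented default.

```yaml
# Illustrative node discovery args; the config path is an assumption
args:
- --hostname=$(HOSTNAME)
- --gpu-info-config=/etc/tensor-fusion/gpu-info.yaml
```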

### GPU Info Config Example

```yaml
- model: RTX5090
  fullModelName: "NVIDIA GeForce RTX 5090"
  vendor: NVIDIA
  costPerHour: 0.65
  fp16TFlops: 419
```

### Environment Variables

| Variable | Description | Example |
| --- | --- | --- |
| `HOSTNAME` | Node hostname | `<hostname>` |
| `KUBECONFIG` | Path to a kubeconfig file | `<kubeconfig>` |
| `NODE_DISCOVERY_REPORT_GPU_NODE` | GPU node custom resource name | `<gpu-node-custom-resource-name>` |
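In a pod spec, `HOSTNAME` can be derived from the node the pod lands on via the downward API rather than hard-coded; the fragment below is one way to wire this up, shown as an illustration:

```yaml
# Illustrative env block for the node discovery container
env:
- name: HOSTNAME
  valueFrom:
    fieldRef:
      fieldPath: spec.nodeName
- name: NODE_DISCOVERY_REPORT_GPU_NODE
  value: <gpu-node-custom-resource-name>
```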

## Worker CLI

### CLI Parameters

| Parameter | Description | Default/Notes |
| --- | --- | --- |
| `-n` | Network protocol | Currently only `native` (native TCP communication) |
| `-p` | Worker port | Random value assigned by the TensorFusion Operator-Scheduler |
| `-s` | Unix socket path folder | Should be `/tensor-fusion/worker/sock/` in Kubernetes |
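A hypothetical worker invocation might look like the following. The binary name and the exact flag/value syntax (`-n native` vs. `-n=native`) are assumptions, and the port shown is only illustrative since it is normally assigned by the Operator-Scheduler:

```yaml
# Illustrative worker container config; command name and flag syntax are assumptions
command: ["tensor-fusion-worker"]
args:
- -n
- native
- -p
- "18080"   # illustrative; normally assigned by the Operator-Scheduler
- -s
- /tensor-fusion/worker/sock/
```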

### Environment Variables

| Variable | Description | Value |
| --- | --- | --- |
| `TF_ENABLE_LOG` | Enable logging | `1` |

## GPU Client Stub

The GPU Client Stub consists of two libraries injected via `LD_PRELOAD`, so they are loaded into every process started inside the container or server:

- `libadd_path.so`: Adds additional library search paths for AI application environments (e.g., a hooked NVML)
- `libcuda.so`: Hooks into the CUDA runtime

Example configuration in a worker template:

```yaml
env:
- name: LD_PRELOAD
  value: /tensor-fusion/libadd_path.so:/tensor-fusion/libcuda.so
```

### Environment Variables

| Variable | Description | Value/Notes |
| --- | --- | --- |
| `TF_PATH` | Appended to the `PATH` environment variable | `/tensor-fusion` |
| `TF_LD_PRELOAD` | Appended to `LD_PRELOAD` | Varies |
| `TF_LD_LIBRARY_PATH` | Appended to `LD_LIBRARY_PATH` | `/tensor-fusion` |
| `TF_ENABLE_LOG` | Enable/disable logging (disabled by default) | `0` |