GPUPool

Kubernetes Resource Information

| Field | Value |
| --- | --- |
| API Version | tensor-fusion.ai/v1 |
| Kind | GPUPool |
| Scope | Cluster |

Spec

GPUPoolSpec defines the desired state of GPUPool.

| Property | Type | Constraints | Description |
| --- | --- | --- | --- |
| capacityConfig | object | | |
| componentConfig | object | | Customize system components for seamless onboarding. |
| nodeManagerConfig | object | | |
| qosConfig | object | | Define different QoS and their price. |
| schedulingConfigTemplate | string | | |

capacityConfig

Properties

| Property | Type | Constraints | Description |
| --- | --- | --- | --- |
| maxResources | object | | |
| minResources | object | | |
| oversubscription | object | | |
| warmResources | object | | |

maxResources

Properties

| Property | Type | Constraints | Description |
| --- | --- | --- | --- |
| cpu | any | pattern: Regex | CPU/Memory is only available when CloudVendor connection is enabled |
| memory | any | pattern: Regex | |
| tflops | any | pattern: Regex | |
| vram | any | pattern: Regex | |

minResources

Properties

| Property | Type | Constraints | Description |
| --- | --- | --- | --- |
| cpu | any | pattern: Regex | CPU/Memory is only available when CloudVendor connection is enabled |
| memory | any | pattern: Regex | |
| tflops | any | pattern: Regex | |
| vram | any | pattern: Regex | |

oversubscription

Properties

| Property | Type | Constraints | Description |
| --- | --- | --- | --- |
| tflopsOversellRatio | integer<int32> | min: 100, max: 100000 | TFlops oversell multiplier, expressed as a percentage; 500 means 5x oversell. Default: 500 |
| vramExpandToHostDisk | integer<int32> | min: 0, max: 100 | Percentage of host disk appended to GPU VRAM. Default: 70 |
| vramExpandToHostMem | integer<int32> | min: 0, max: 100 | Percentage of host RAM appended to GPU VRAM. Default: 50 |

warmResources

Properties

| Property | Type | Constraints | Description |
| --- | --- | --- | --- |
| cpu | any | pattern: Regex | CPU/Memory is only available when CloudVendor connection is enabled |
| memory | any | pattern: Regex | |
| tflops | any | pattern: Regex | |
| vram | any | pattern: Regex | |
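
The capacityConfig fields above could be combined as in the following sketch. It is illustrative only: the pool name and every quantity are assumptions, the values are assumed to follow Kubernetes quantity syntax (the reference only says they must match a pattern), and the comments on minResources, maxResources and warmResources reflect the usual reading of those names rather than documented behavior. cpu and memory are omitted because they are only available when the CloudVendor connection is enabled.

```yaml
# Illustrative GPUPool manifest; the name and all quantities are assumptions.
apiVersion: tensor-fusion.ai/v1
kind: GPUPool
metadata:
  name: example-pool            # hypothetical name
spec:
  capacityConfig:
    minResources:               # assumed meaning: lower bound for the pool
      tflops: "100"
      vram: "128Gi"
    maxResources:               # assumed meaning: upper bound for the pool
      tflops: "2000"
      vram: "2Ti"
    warmResources:              # assumed meaning: capacity kept ready in advance
      tflops: "200"
      vram: "256Gi"
    oversubscription:
      tflopsOversellRatio: 500  # documented default, 5x oversell
      vramExpandToHostMem: 50   # documented default
      vramExpandToHostDisk: 70  # documented default
```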

componentConfig

Customize system components for seamless onboarding.

Properties

| Property | Type | Constraints | Description |
| --- | --- | --- | --- |
| client | object | | |
| hypervisor | object | | |
| nodeDiscovery | object | | |
| worker | object | | |

client

Properties

| Property | Type | Constraints | Description |
| --- | --- | --- | --- |
| embeddedModeImage | string | | |
| operatorEndpoint | string | | |
| patchEmbeddedWorkerToPod | object | | |
| patchToContainer | object | | |
| patchToEmbeddedWorkerContainer | object | | |
| patchToPod | object | | |
| remoteModeImage | string | | |

hypervisor

Properties

| Property | Type | Constraints | Description |
| --- | --- | --- | --- |
| enableVector | boolean | | |
| image | string | | |
| podTemplate | object | | |
| portNumber | integer<int32> | min: 0, max: 65535 | Default: 8000 |
| vectorImage | string | | |

nodeDiscovery

Properties

| Property | Type | Constraints | Description |
| --- | --- | --- | --- |
| image | string | | |
| podTemplate | object | | |

worker

Properties

| Property | Type | Constraints | Description |
| --- | --- | --- | --- |
| image | string | | |
| podTemplate | object | | |
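
A componentConfig block might look like the sketch below. The field names come from the tables above; the registry, image tags and operator endpoint are hypothetical placeholders, not real TensorFusion images or addresses.

```yaml
# Illustrative only; every image reference and the endpoint are hypothetical.
apiVersion: tensor-fusion.ai/v1
kind: GPUPool
metadata:
  name: example-pool   # hypothetical name
spec:
  componentConfig:
    hypervisor:
      image: registry.example.com/tensor-fusion/hypervisor:example
      portNumber: 8000   # documented default
      enableVector: false
    worker:
      image: registry.example.com/tensor-fusion/worker:example
    nodeDiscovery:
      image: registry.example.com/tensor-fusion/node-discovery:example
    client:
      operatorEndpoint: http://operator.example.svc:8080
      remoteModeImage: registry.example.com/tensor-fusion/client:example
```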

nodeManagerConfig

Properties

| Property | Type | Constraints | Description |
| --- | --- | --- | --- |
| nodeCompaction | object | | |
| nodePoolRollingUpdatePolicy | object | | |
| nodeProvisioner | object | | NodeProvisioner and NodeSelector are mutually exclusive. NodeSelector is for existing GPUs; NodeProvisioner is for Karpenter-like automatic management. |
| nodeSelector | object | | A node selector represents the union of the results of one or more label queries over a set of nodes; that is, it represents the OR of the selectors represented by the node selector terms. |
| provisioningMode | string | enum: Provisioned, AutoSelect, Karpenter | Default: AutoSelect |

nodeCompaction

Properties

| Property | Type | Constraints | Description |
| --- | --- | --- | --- |
| period | string | | Default: 5m |

nodePoolRollingUpdatePolicy

Properties

| Property | Type | Constraints | Description |
| --- | --- | --- | --- |
| autoUpdate | boolean | | Default: true |
| batchInterval | string | | Default: 10m |
| batchPercentage | integer<int32> | min: 0, max: 100 | Default: 100 |
| maintenanceWindow | object | | |
| maxDuration | string | | Default: 10m |

maintenanceWindow

Properties

| Property | Type | Constraints | Description |
| --- | --- | --- | --- |
| includes | array | | Crontab syntax. |
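
The compaction and rolling-update settings could be written as in this sketch. The batchPercentage of 25 and the cron expression are illustrative choices, and it is an assumption that each includes entry is a single five-field crontab string.

```yaml
# Illustrative only; values other than the documented defaults are assumptions.
apiVersion: tensor-fusion.ai/v1
kind: GPUPool
metadata:
  name: example-pool   # hypothetical name
spec:
  nodeManagerConfig:
    nodeCompaction:
      period: 5m             # documented default
    nodePoolRollingUpdatePolicy:
      autoUpdate: true       # documented default
      batchPercentage: 25    # illustrative; the documented default is 100
      batchInterval: 10m     # documented default
      maxDuration: 10m       # documented default
      maintenanceWindow:
        includes:
          - "0 2 * * 6"      # assumed format: 02:00 every Saturday
```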

nodeProvisioner

NodeProvisioner and NodeSelector are mutually exclusive.
NodeSelector is for existing GPUs; NodeProvisioner is for Karpenter-like automatic management.

Properties

| Property | Type | Constraints | Description |
| --- | --- | --- | --- |
| budget | object | | NodeProvisioner starts virtual billing based on public or customized pricing. If the VM costs exceed any budget constraint, no new VM is created and alerts are generated. |
| cpuNodeLabels | object | | |
| cpuRequirements | array | | |
| cpuTaints | array | | |
| gpuNodeAnnotations | object | | |
| gpuNodeLabels | object | | |
| gpuRequirements | array | | |
| gpuTaints | array | | |
| karpenterNodeClassRef | object | | Karpenter NodeClass name |
| nodeClass | string | | TensorFusion GPUNodeClass name |

budget

NodeProvisioner starts virtual billing based on public or customized pricing. If the VM costs exceed any budget constraint, no new VM is created and alerts are generated.

Properties

| Property | Type | Constraints | Description |
| --- | --- | --- | --- |
| budgetExceedStrategy | string | enum: AlertOnly, AlertAndTerminateVM | Default: AlertOnly |
| budgetPerDay | string | | Default: 100 |
| budgetPerMonth | string | | Default: 1000 |
| budgetPerQuarter | string | | Default: 3000 |
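
A budget block under nodeProvisioner could look like the following sketch. The budget values repeat the documented defaults and are quoted because the fields are typed as strings; the GPUNodeClass name is hypothetical, and pairing provisioningMode: Provisioned with nodeProvisioner is an assumption.

```yaml
# Illustrative only; the nodeClass name and the mode pairing are assumptions.
apiVersion: tensor-fusion.ai/v1
kind: GPUPool
metadata:
  name: example-pool   # hypothetical name
spec:
  nodeManagerConfig:
    provisioningMode: Provisioned          # assumed to pair with nodeProvisioner
    nodeProvisioner:
      nodeClass: example-gpu-node-class    # hypothetical GPUNodeClass name
      budget:
        budgetPerDay: "100"                # documented default
        budgetPerMonth: "1000"             # documented default
        budgetPerQuarter: "3000"           # documented default
        budgetExceedStrategy: AlertOnly    # or AlertAndTerminateVM
```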

cpuRequirements (items)

Properties

| Property | Type | Constraints | Description |
| --- | --- | --- | --- |
| key | string | enum: node.kubernetes.io/instance-type, kubernetes.io/arch, kubernetes.io/os, topology.kubernetes.io/region, topology.kubernetes.io/zone, karpenter.sh/capacity-type, tensor-fusion.ai/gpu-vendor, tensor-fusion.ai/gpu-instance-family, tensor-fusion.ai/gpu-instance-size | |
| operator | string | enum: In, Exists, DoesNotExist, Gt, Lt | A node selector operator is the set of operators that can be used in a node selector requirement. Default: In |
| values | array | | |

cpuTaints (items)

Properties

| Property | Type | Constraints | Description |
| --- | --- | --- | --- |
| effect | string | enum: NoSchedule, NoExecute, PreferNoSchedule | Default: NoSchedule |
| key | string | | |
| value | string | | |

gpuRequirements (items)

Properties

| Property | Type | Constraints | Description |
| --- | --- | --- | --- |
| key | string | enum: node.kubernetes.io/instance-type, kubernetes.io/arch, kubernetes.io/os, topology.kubernetes.io/region, topology.kubernetes.io/zone, karpenter.sh/capacity-type, tensor-fusion.ai/gpu-vendor, tensor-fusion.ai/gpu-instance-family, tensor-fusion.ai/gpu-instance-size | |
| operator | string | enum: In, Exists, DoesNotExist, Gt, Lt | A node selector operator is the set of operators that can be used in a node selector requirement. Default: In |
| values | array | | |

gpuTaints (items)

Properties

| Property | Type | Constraints | Description |
| --- | --- | --- | --- |
| effect | string | enum: NoSchedule, NoExecute, PreferNoSchedule | Default: NoSchedule |
| key | string | | |
| value | string | | |
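
The requirement and taint items above might be combined as in this sketch. The requirement keys and operators are taken from the documented enums; the region, capacity type, taint key and label are illustrative assumptions.

```yaml
# Illustrative only; all values, the taint key and the label are assumptions.
apiVersion: tensor-fusion.ai/v1
kind: GPUPool
metadata:
  name: example-pool   # hypothetical name
spec:
  nodeManagerConfig:
    nodeProvisioner:
      gpuRequirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["on-demand"]
        - key: topology.kubernetes.io/region
          operator: In
          values: ["us-east-1"]
      gpuTaints:
        - key: example.com/gpu-only       # hypothetical taint key
          value: "true"
          effect: NoSchedule              # documented default
      gpuNodeLabels:
        example.com/pool: example-pool    # hypothetical label
```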

karpenterNodeClassRef

Karpenter NodeClass name

Properties

| Property | Type | Constraints | Description |
| --- | --- | --- | --- |
| group | string | | |
| kind | string | | |
| name | string | | |
| version | string | | |
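
As a sketch, a karpenterNodeClassRef pointing at an AWS EC2NodeClass could look like this. The group, version and kind are the ones Karpenter's AWS provider uses; the NodeClass name is hypothetical, and pairing it with provisioningMode: Karpenter is an assumption.

```yaml
# Illustrative only; assumes an AWS cluster running Karpenter.
apiVersion: tensor-fusion.ai/v1
kind: GPUPool
metadata:
  name: example-pool   # hypothetical name
spec:
  nodeManagerConfig:
    provisioningMode: Karpenter      # assumed to pair with karpenterNodeClassRef
    nodeProvisioner:
      karpenterNodeClassRef:
        group: karpenter.k8s.aws
        version: v1
        kind: EC2NodeClass
        name: example-gpu-nodeclass  # hypothetical NodeClass name
```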

nodeSelector

A node selector represents the union of the results of one or more label queries
over a set of nodes; that is, it represents the OR of the selectors represented
by the node selector terms.

Properties

| Property | Type | Constraints | Description |
| --- | --- | --- | --- |
| nodeSelectorTerms | array | | Required. A list of node selector terms. The terms are ORed. |

nodeSelectorTerms (items)

Required. A list of node selector terms. The terms are ORed.

Properties

| Property | Type | Constraints | Description |
| --- | --- | --- | --- |
| matchExpressions | array | | A list of node selector requirements by node's labels. |
| matchFields | array | | A list of node selector requirements by node's fields. |

matchExpressions (items)

A list of node selector requirements by node's labels.

Properties

| Property | Type | Constraints | Description |
| --- | --- | --- | --- |
| key | string | | The label key that the selector applies to. |
| operator | string | | Represents a key's relationship to a set of values. Valid operators are In, NotIn, Exists, DoesNotExist, Gt, and Lt. |
| values | array | | An array of string values. If the operator is In or NotIn, the values array must be non-empty. If the operator is Exists or DoesNotExist, the values array must be empty. If the operator is Gt or Lt, the values array must have a single element, which will be interpreted as an integer. This array is replaced during a strategic merge patch. |

matchFields (items)

A list of node selector requirements by node's fields.

Properties

| Property | Type | Constraints | Description |
| --- | --- | --- | --- |
| key | string | | The label key that the selector applies to. |
| operator | string | | Represents a key's relationship to a set of values. Valid operators are In, NotIn, Exists, DoesNotExist, Gt, and Lt. |
| values | array | | An array of string values. If the operator is In or NotIn, the values array must be non-empty. If the operator is Exists or DoesNotExist, the values array must be empty. If the operator is Gt or Lt, the values array must have a single element, which will be interpreted as an integer. This array is replaced during a strategic merge patch. |
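
A nodeSelector that follows the rules above could be sketched as below. The two terms are ORed, so a node matching either one is selected. The label keys and values are hypothetical.

```yaml
# Illustrative only; label keys and values are hypothetical.
apiVersion: tensor-fusion.ai/v1
kind: GPUPool
metadata:
  name: example-pool   # hypothetical name
spec:
  nodeManagerConfig:
    nodeSelector:
      nodeSelectorTerms:
        # Term 1: nodes labeled example.com/gpu-pool=shared
        - matchExpressions:
            - key: example.com/gpu-pool
              operator: In
              values: ["shared"]     # In requires a non-empty values array
        # Term 2: nodes that carry the example.com/gpu-node label at all
        - matchExpressions:
            - key: example.com/gpu-node
              operator: Exists       # Exists requires values to be empty
```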

qosConfig

Define different QoS and their price.

Properties

| Property | Type | Constraints | Description |
| --- | --- | --- | --- |
| defaultQoS | string | enum: low, medium, high, critical | |
| definitions | array | | |
| pricing | array | | |

definitions (items)

Properties

| Property | Type | Constraints | Description |
| --- | --- | --- | --- |
| description | string | | |
| name | string | enum: low, medium, high, critical | |
| priority | integer | | |

pricing (items)

Properties

| Property | Type | Constraints | Description |
| --- | --- | --- | --- |
| limitsOverRequests | string | | By default, the requests price and the limitsOverRequests price are the same, which represents normal on-demand serverless GPU usage. For a low-QoS case such as a hands-on lab, limitsOverRequests should be lower, so that users can get burstable GPU resources at very low cost. Default: 1 |
| qos | string | enum: low, medium, high, critical | |
| requests | object | | The default pricing is based on per-second pricing from https://modal.com/pricing, with Tensor/CUDA Core : HBM = 2:1 |

requests

The default pricing is based on per-second pricing from https://modal.com/pricing,
with Tensor/CUDA Core : HBM = 2:1

Properties

| Property | Type | Constraints | Description |
| --- | --- | --- | --- |
| perFP16TFlopsPerHour | string | | Default: $0.0069228 |
| perGBOfVRAMPerHour | string | | Default: $0.01548 |
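
A qosConfig with two levels might be written as in this sketch. The request prices repeat the documented defaults; the descriptions, the priority numbers (including which direction counts as higher priority) and the limitsOverRequests value of 0.5 for the low level are illustrative assumptions.

```yaml
# Illustrative only; descriptions, priorities and the 0.5 ratio are assumptions.
apiVersion: tensor-fusion.ai/v1
kind: GPUPool
metadata:
  name: example-pool   # hypothetical name
spec:
  qosConfig:
    defaultQoS: medium
    definitions:
      - name: low
        description: Burstable workloads such as hands-on labs
        priority: 1
      - name: medium
        description: Normal on-demand workloads
        priority: 2
    pricing:
      - qos: low
        limitsOverRequests: "0.5"              # lower than the default of 1
        requests:
          perFP16TFlopsPerHour: "$0.0069228"   # documented default
          perGBOfVRAMPerHour: "$0.01548"       # documented default
      - qos: medium
        limitsOverRequests: "1"                # documented default
        requests:
          perFP16TFlopsPerHour: "$0.0069228"
          perGBOfVRAMPerHour: "$0.01548"
```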

Status

GPUPoolStatus defines the observed state of GPUPool.

| Property | Type | Constraints | Description |
| --- | --- | --- | --- |
| allocatedTFlopsPercent | string | | |
| allocatedVRAMPercent | string | | |
| availableTFlops | any | pattern: Regex | |
| availableVRAM | any | pattern: Regex | |
| budgetExceeded | string | | If the budget is exceeded, this is set to a comma-separated string indicating which periods were exceeded. If this field is not empty, the scheduler will not schedule new AI workloads and will stop scale-up checks. Default: empty string |
| cluster | string | | |
| componentStatus | object | | When any component version or config is updated, the pool controller performs a rolling update. The status is updated periodically (every 5s by default), with progress ranging from 0 to 100. When the progress reaches 100, the component version or config is fully updated. |
| conditions | array | | |
| lastCompactionTime | string<date-time> | | |
| notReadyNodes | integer<int32> | | |
| phase | string | enum: Pending, Running, Updating, Destroying, Unknown | Default: Pending |
| potentialSavingsPerMonth | string | | |
| provisioningPhase | string | enum: None, Initializing, Provisioning, Completed | Default: None |
| readyNodes | integer<int32> | | |
| runningAppsCnt | integer<int32> | | |
| savedCostsPerMonth | string | | |
| totalGPUs | integer<int32> | | |
| totalNodes | integer<int32> | | |
| totalTFlops | any | pattern: Regex | |
| totalVRAM | any | pattern: Regex | |
| utilizedTFlopsPercent | string | | |
| utilizedVRAMPercent | string | | |
| virtualAvailableTFlops | any | pattern: Regex | |
| virtualAvailableVRAM | any | pattern: Regex | |
| virtualTFlops | any | pattern: Regex | |
| virtualVRAM | any | pattern: Regex | |

componentStatus

When any component version or config is updated, the pool controller performs a rolling update.
The status is updated periodically (every 5s by default), with progress ranging from 0 to 100.
When the progress reaches 100, the component version or config is fully updated.

Properties

| Property | Type | Constraints | Description |
| --- | --- | --- | --- |
| client | string | | |
| clientConfigSynced | boolean | | |
| clientUpdateProgress | integer<int32> | | |
| hypervisor | string | | |
| hypervisorConfigSynced | boolean | | |
| hypervisorUpdateProgress | integer<int32> | | |
| worker | string | | |
| workerConfigSynced | boolean | | |
| workerUpdateProgress | integer<int32> | | |

conditions (items)

Properties

| Property | Type | Constraints | Description |
| --- | --- | --- | --- |
| lastTransitionTime | string<date-time> | | lastTransitionTime is the last time the condition transitioned from one status to another. This should be when the underlying condition changed. If that is not known, then using the time when the API field changed is acceptable. |
| message | string | maxLength: 32768 | message is a human readable message indicating details about the transition. This may be an empty string. |
| observedGeneration | integer<int64> | min: 0 | observedGeneration represents the .metadata.generation that the condition was set based upon. For instance, if .metadata.generation is currently 12, but the .status.conditions[x].observedGeneration is 9, the condition is out of date with respect to the current state of the instance. |
| reason | string | minLength: 1, maxLength: 1024, pattern: Regex | reason contains a programmatic identifier indicating the reason for the condition's last transition. Producers of specific condition types may define expected values and meanings for this field, and whether the values are considered a guaranteed API. The value should be a CamelCase string. This field may not be empty. |
| status | string | enum: True, False, Unknown | status of the condition, one of True, False, Unknown. |
| type | string | maxLength: 316, pattern: Regex | type of condition in CamelCase or in foo.example.com/CamelCase. |