GPUPool ​

Kubernetes Resource Information ​

| Field | Value |
| --- | --- |
| API Version | tensor-fusion.ai/v1 |
| Kind | GPUPool |
| Scope | Cluster |

Spec ​

GPUPoolSpec defines the desired state of GPUPool.

| Property | Type | Constraints | Description |
| --- | --- | --- | --- |
| capacityConfig | object | | |
| componentConfig | object | | Customize system components for seamless onboarding. |
| nodeManagerConfig | object | | |
| observabilityConfig | object | | |
| qosConfig | object | | Define QoS levels and their prices. |
| schedulingConfig | object | | Place workloads on the right nodes and scale smartly. |
| schedulingConfigTemplate | string | | |

capacityConfig ​

Properties ​

| Property | Type | Constraints | Description |
| --- | --- | --- | --- |
| maxResources | object | | |
| minResources | object | | |
| oversubscription | object | | |
| warmResources | object | | |

maxResources ​

Properties ​

| Property | Type | Constraints | Description |
| --- | --- | --- | --- |
| cpu | any | pattern: Regex | CPU/Memory is only available when CloudVendor connection is enabled |
| memory | any | pattern: Regex | |
| tflops | any | pattern: Regex | |
| vram | any | pattern: Regex | |

minResources ​

Properties ​

| Property | Type | Constraints | Description |
| --- | --- | --- | --- |
| cpu | any | pattern: Regex | CPU/Memory is only available when CloudVendor connection is enabled |
| memory | any | pattern: Regex | |
| tflops | any | pattern: Regex | |
| vram | any | pattern: Regex | |

oversubscription ​

Properties ​

| Property | Type | Constraints | Description |
| --- | --- | --- | --- |
| tflopsOversellRatio | integer<int32> | min: 100, max: 100000 | TFLOPS oversell multiplier, expressed as a percentage. The default of 500% means 5x oversell. Default: 500 |
| vramExpandToHostDisk | integer<int32> | min: 0, max: 100 | Percentage of host disk appended to GPU VRAM. Default: 70 |
| vramExpandToHostMem | integer<int32> | min: 0, max: 100 | Percentage of host RAM appended to GPU VRAM. Default: 50 |
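
The arithmetic implied by these three fields can be sketched as follows. This is a minimal illustration, not the operator's actual formula: it assumes the oversell ratio is a plain percentage multiplier on physical TFLOPS and that the two expansion percentages are applied to the total host RAM and host disk sizes. The function name and its parameters are hypothetical.

```python
def virtual_capacity(tflops, vram_gib, host_mem_gib, host_disk_gib,
                     tflops_oversell_ratio=500,
                     vram_expand_to_host_mem=50,
                     vram_expand_to_host_disk=70):
    """Return (virtual_tflops, virtual_vram_gib) under the assumed reading.

    The ratio and percentages default to the values documented above.
    """
    # 500 means 500%, i.e. 5x the physical TFLOPS.
    virtual_tflops = tflops * tflops_oversell_ratio / 100
    # Assumption: expansion is additive on top of physical VRAM.
    virtual_vram = (vram_gib
                    + host_mem_gib * vram_expand_to_host_mem / 100
                    + host_disk_gib * vram_expand_to_host_disk / 100)
    return virtual_tflops, virtual_vram


if __name__ == "__main__":
    print(virtual_capacity(tflops=100, vram_gib=80,
                           host_mem_gib=256, host_disk_gib=1000))
```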

warmResources ​

Properties ​

| Property | Type | Constraints | Description |
| --- | --- | --- | --- |
| cpu | any | pattern: Regex | CPU/Memory is only available when CloudVendor connection is enabled |
| memory | any | pattern: Regex | |
| tflops | any | pattern: Regex | |
| vram | any | pattern: Regex | |

componentConfig ​

Customize system components for seamless onboarding.

Properties ​

| Property | Type | Constraints | Description |
| --- | --- | --- | --- |
| client | object | | |
| hypervisor | object | | |
| nodeDiscovery | object | | |
| worker | object | | |

client ​

Properties ​

| Property | Type | Constraints | Description |
| --- | --- | --- | --- |
| operatorEndpoint | string | | |
| patchToContainer | object | | |
| patchToPod | object | | |

hypervisor ​

Properties ​

| Property | Type | Constraints | Description |
| --- | --- | --- | --- |
| podTemplate | object | | |

nodeDiscovery ​

Properties ​

| Property | Type | Constraints | Description |
| --- | --- | --- | --- |
| podTemplate | object | | |

worker ​

Properties ​

| Property | Type | Constraints | Description |
| --- | --- | --- | --- |
| podTemplate | object | | |

nodeManagerConfig ​

Properties ​

| Property | Type | Constraints | Description |
| --- | --- | --- | --- |
| nodeCompaction | object | | |
| nodePoolRollingUpdatePolicy | object | | |
| nodeProvisioner | object | | NodeProvisioner and NodeSelector are mutually exclusive. NodeSelector is for existing GPUs; NodeProvisioner is for Karpenter-like automatic management. |
| nodeSelector | object | | A node selector represents the union of the results of one or more label queries over a set of nodes; that is, it represents the OR of the selectors represented by the node selector terms. |
| provisioningMode | string | Provisioned, AutoSelect | Default: AutoSelect |

nodeCompaction ​

Properties ​

| Property | Type | Constraints | Description |
| --- | --- | --- | --- |
| period | string | | Default: 5m |

nodePoolRollingUpdatePolicy ​

Properties ​

| Property | Type | Constraints | Description |
| --- | --- | --- | --- |
| autoUpdate | boolean | | Default: true |
| batchInterval | string | | Default: 10m |
| batchPercentage | integer<int32> | min: 0, max: 100 | Default: 100 |
| maintenanceWindow | object | | |
| maxDuration | string | | Default: 10m |
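
To make batchPercentage concrete, here is a hedged sketch of how a node list could be split into update batches. The source does not describe the controller's batching logic, so the rounding rule (round up, at least one node per batch) and the function name are assumptions made for illustration only.

```python
import math


def plan_batches(nodes, batch_percentage=100):
    """Split nodes into consecutive batches of batch_percentage each.

    Assumption: batch size is rounded up and is never smaller than one node.
    A value of 0 is rejected here because its meaning is not documented.
    """
    if not 0 < batch_percentage <= 100:
        raise ValueError("batch_percentage must be in (0, 100]")
    size = max(1, math.ceil(len(nodes) * batch_percentage / 100))
    return [nodes[i:i + size] for i in range(0, len(nodes), size)]


if __name__ == "__main__":
    # With the default of 100, every node lands in a single batch.
    print(plan_batches(["n1", "n2", "n3", "n4", "n5"], 40))
```

Under this reading, the controller would wait batchInterval between consecutive batches.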

maintenanceWindow ​

Properties ​
| Property | Type | Constraints | Description |
| --- | --- | --- | --- |
| includes | array | | crontab syntax. |

nodeProvisioner ​

NodeProvisioner and NodeSelector are mutually exclusive.
NodeSelector is for existing GPUs; NodeProvisioner is for Karpenter-like automatic management.

Properties ​

| Property | Type | Constraints | Description |
| --- | --- | --- | --- |
| budget | object | | NodeProvisioner starts a virtual billing process based on public or customized pricing. If the VM costs exceed any budget constraint, no new VM is created and alerts are generated. |
| cpuNodeLabels | object | | |
| cpuRequirements | array | | |
| cpuTaints | array | | |
| gpuNodeLabels | object | | |
| gpuRequirements | array | | |
| gpuTaints | array | | |
| mode | string | Native, Karpenter | Mode is either Karpenter or Native. In Karpenter mode, the node provisioner starts dummy nodes to provision and warm up GPU nodes and does nothing for CPU nodes. In Native mode, the provisioner creates or compacts GPU and CPU nodes based on current pods. Default: Native |
| nodeClass | string | | |

budget ​

NodeProvisioner starts a virtual billing process based on public or customized pricing. If the VM costs exceed any budget constraint, no new VM is created and alerts are generated.

Properties ​
| Property | Type | Constraints | Description |
| --- | --- | --- | --- |
| budgetExceedStrategy | string | AlertOnly, AlertAndTerminateVM | Default: AlertOnly |
| budgetPerDay | string | | Default: 100 |
| budgetPerMonth | string | | Default: 1000 |
| budgetPerQuarter | string | | Default: 3000 |
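
The budget check described above can be sketched as a comparison of accumulated spend against each period's limit. This is an illustration only: the period labels ("day", "month", "quarter"), the input shape, and the function name are hypothetical, and the source does not specify the exact values written to the budgetExceeded status field.

```python
from decimal import Decimal

# Defaults from the table above; the budget fields are strings in the CRD.
DEFAULT_BUDGET = {"day": "100", "month": "1000", "quarter": "3000"}


def exceeded_periods(spend, budget=DEFAULT_BUDGET):
    """Return a comma-separated string of the periods whose spend is over budget.

    An empty string means no budget constraint is exceeded.
    """
    hit = [period for period in ("day", "month", "quarter")
           if Decimal(spend[period]) > Decimal(budget[period])]
    return ",".join(hit)


if __name__ == "__main__":
    print(exceeded_periods({"day": "120.50", "month": "900", "quarter": "3100"}))
```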

cpuRequirements (items) ​

Properties ​
| Property | Type | Constraints | Description |
| --- | --- | --- | --- |
| key | string | node.kubernetes.io/instance-type, kubernetes.io/arch, kubernetes.io/os, topology.kubernetes.io/region, topology.kubernetes.io/zone, karpenter.sh/capacity-type, tensor-fusion.ai/gpu-arch, tensor-fusion.ai/gpu-instance-family, tensor-fusion.ai/gpu-instance-size | |
| operator | string | In, Exists, DoesNotExist, Gt, Lt | A node selector operator is the set of operators that can be used in a node selector requirement. Default: In |
| values | array | | |

cpuTaints (items) ​

Properties ​
| Property | Type | Constraints | Description |
| --- | --- | --- | --- |
| effect | string | NoSchedule, NoExecute, PreferNoSchedule | Default: NoSchedule |
| key | string | | |
| value | string | | |

gpuRequirements (items) ​

Properties ​
| Property | Type | Constraints | Description |
| --- | --- | --- | --- |
| key | string | node.kubernetes.io/instance-type, kubernetes.io/arch, kubernetes.io/os, topology.kubernetes.io/region, topology.kubernetes.io/zone, karpenter.sh/capacity-type, tensor-fusion.ai/gpu-arch, tensor-fusion.ai/gpu-instance-family, tensor-fusion.ai/gpu-instance-size | |
| operator | string | In, Exists, DoesNotExist, Gt, Lt | A node selector operator is the set of operators that can be used in a node selector requirement. Default: In |
| values | array | | |

gpuTaints (items) ​

Properties ​
| Property | Type | Constraints | Description |
| --- | --- | --- | --- |
| effect | string | NoSchedule, NoExecute, PreferNoSchedule | Default: NoSchedule |
| key | string | | |
| value | string | | |

nodeSelector ​

A node selector represents the union of the results of one or more label queries
over a set of nodes; that is, it represents the OR of the selectors represented
by the node selector terms.

Properties ​

| Property | Type | Constraints | Description |
| --- | --- | --- | --- |
| nodeSelectorTerms | array | | Required. A list of node selector terms. The terms are ORed. |

nodeSelectorTerms (items) ​

Required. A list of node selector terms. The terms are ORed.

Properties ​
| Property | Type | Constraints | Description |
| --- | --- | --- | --- |
| matchExpressions | array | | A list of node selector requirements by node's labels. |
| matchFields | array | | A list of node selector requirements by node's fields. |

matchExpressions (items) ​

A list of node selector requirements by node's labels.

Properties ​
| Property | Type | Constraints | Description |
| --- | --- | --- | --- |
| key | string | | The label key that the selector applies to. |
| operator | string | | Represents a key's relationship to a set of values. Valid operators are In, NotIn, Exists, DoesNotExist, Gt, and Lt. |
| values | array | | An array of string values. If the operator is In or NotIn, the values array must be non-empty. If the operator is Exists or DoesNotExist, the values array must be empty. If the operator is Gt or Lt, the values array must have a single element, which will be interpreted as an integer. This array is replaced during a strategic merge patch. |

matchFields (items) ​

A list of node selector requirements by node's fields.

Properties ​
| Property | Type | Constraints | Description |
| --- | --- | --- | --- |
| key | string | | The label key that the selector applies to. |
| operator | string | | Represents a key's relationship to a set of values. Valid operators are In, NotIn, Exists, DoesNotExist, Gt, and Lt. |
| values | array | | An array of string values. If the operator is In or NotIn, the values array must be non-empty. If the operator is Exists or DoesNotExist, the values array must be empty. If the operator is Gt or Lt, the values array must have a single element, which will be interpreted as an integer. This array is replaced during a strategic merge patch. |
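
The OR-of-terms semantics described in this section can be illustrated with a small evaluator. This is a sketch that follows the standard Kubernetes node selector rules (requirements inside one term are ANDed, terms are ORed, an empty term selects nothing); it only handles matchExpressions and ignores matchFields. The function names are hypothetical and this is not the operator's code.

```python
def requirement_matches(labels, req):
    """Evaluate one node selector requirement against a node's labels."""
    key, op = req["key"], req["operator"]
    values = req.get("values", [])
    if op == "In":
        return key in labels and labels[key] in values
    if op == "NotIn":
        # A node without the label also satisfies NotIn.
        return key not in labels or labels[key] not in values
    if op == "Exists":
        return key in labels
    if op == "DoesNotExist":
        return key not in labels
    if op in ("Gt", "Lt"):
        try:
            actual, bound = int(labels[key]), int(values[0])
        except (KeyError, IndexError, ValueError):
            return False  # missing label or non-integer value never matches
        return actual > bound if op == "Gt" else actual < bound
    raise ValueError(f"unknown operator: {op}")


def term_matches(labels, term):
    """All requirements in a term must hold (AND)."""
    exprs = term.get("matchExpressions", [])
    return bool(exprs) and all(requirement_matches(labels, r) for r in exprs)


def node_matches(labels, node_selector):
    """A node matches if any term matches (OR)."""
    return any(term_matches(labels, t)
               for t in node_selector.get("nodeSelectorTerms", []))
```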

observabilityConfig ​

Properties ​

| Property | Type | Constraints | Description |
| --- | --- | --- | --- |
| alert | object | | |
| monitor | object | | |

alert ​

Properties ​

| Property | Type | Constraints | Description |
| --- | --- | --- | --- |
| expression | object | | |

monitor ​

Properties ​

| Property | Type | Constraints | Description |
| --- | --- | --- | --- |
| interval | string | | |

qosConfig ​

Define QoS levels and their prices.

Properties ​

| Property | Type | Constraints | Description |
| --- | --- | --- | --- |
| defaultQoS | string | low, medium, high, critical | |
| definitions | array | | |
| pricing | array | | |

definitions (items) ​

Properties ​

| Property | Type | Constraints | Description |
| --- | --- | --- | --- |
| description | string | | |
| name | string | low, medium, high, critical | |
| priority | integer | | |

pricing (items) ​

Properties ​

| Property | Type | Constraints | Description |
| --- | --- | --- | --- |
| limitsOverRequests | string | | By default, requests and limitsOverRequests are priced the same, which indicates normal on-demand serverless GPU usage. In the hands-on lab low-QoS case, limitsOverRequests should be cheaper; for example, for Low QoS the ratio should be 0.5. Default: 1 |
| qos | string | low, medium, high, critical | |
| requests | object | | The default pricing is based on second-level pricing from https://modal.com/pricing with Tensor/CUDA Core : HBM = 2:1 |

requests ​

The default pricing based on second level pricing from https://modal.com/pricing
with Tensor/CUDA Core : HBM = 2:1

Properties ​
| Property | Type | Constraints | Description |
| --- | --- | --- | --- |
| perFP16TFlopsPerHour | string | | Default: $0.0069228 |
| perGBOfVRAMPerHour | string | | Default: $0.01548 |
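
As a worked example of the default request pricing, the sketch below estimates the hourly cost of a workload from its requested TFLOPS and VRAM. It assumes the two rates are simply added and ignores limitsOverRequests; the function name is hypothetical and the real billing logic may differ.

```python
from decimal import Decimal

# Default rates from the table above, in USD per hour.
PER_FP16_TFLOPS_PER_HOUR = Decimal("0.0069228")
PER_GB_OF_VRAM_PER_HOUR = Decimal("0.01548")


def hourly_request_cost(tflops, vram_gb):
    """Estimated hourly cost of the requested resources (assumed additive)."""
    return (Decimal(str(tflops)) * PER_FP16_TFLOPS_PER_HOUR
            + Decimal(str(vram_gb)) * PER_GB_OF_VRAM_PER_HOUR)


if __name__ == "__main__":
    # 100 FP16 TFLOPS and 40 GB of VRAM requested.
    print(hourly_request_cost(100, 40))
```

Decimal is used instead of float so that small per-unit rates do not accumulate rounding error.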

schedulingConfig ​

Place workloads on the right nodes and scale smartly.

Properties ​

| Property | Type | Constraints | Description |
| --- | --- | --- | --- |
| autoScaling | object | | Scale the workload based on usage and traffic. |
| hypervisor | object | | Multi-process queuing and fair scheduling on a single GPU device, with QoS constraints. |
| placement | object | | Place the client or worker on the best-matched nodes. |
| reBalancer | object | | Avoid hot GPU devices and continuously balance the workload. Implemented by triggering a simulated scheduling run and advising the scheduler of better GPU nodes. |

autoScaling ​

scale the workload based on the usage and traffic

Properties ​

| Property | Type | Constraints | Description |
| --- | --- | --- | --- |
| autoSetLimits | object | | layer 1 vertical auto-scaling, turbo burst to existing GPU cards quickly |
| autoSetReplicas | object | | layer 2 horizontal auto-scaling, scale up to more GPU cards if max limits threshold hit |
| autoSetRequests | object | | layer 3 adjusting, to match the actual usage in the long run |
| scaleToZero | object | | additional layer to save VRAM, auto-freeze memory and cool down to RAM and Disk |

autoSetLimits ​

layer 1 vertical auto-scaling, turbo burst to existing GPU cards quickly

Properties ​
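| Property | Type | Constraints | Description |
| --- | --- | --- | --- |
| evaluationPeriod | string | | |
| extraTFlopsBufferRatio | string | | |
| ignoredDeltaRange | string | | |
| maxRatioToRequests | string | | Multiplier of requests that caps the limit so it is not set too high, for example 5.0. |
| prediction | object | | |
| scaleUpStep | string | | |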
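
The cap expressed by maxRatioToRequests can be illustrated in one line. This sketch assumes the field is parsed as a decimal multiplier and that the cap is a simple upper bound on the proposed limit; the function name is hypothetical.

```python
def cap_limit(proposed_limit, requests, max_ratio_to_requests=5.0):
    """Clamp a proposed limit to at most requests * max_ratio_to_requests."""
    return min(proposed_limit, requests * max_ratio_to_requests)


if __name__ == "__main__":
    # With requests of 20 TFLOPS and a ratio of 5.0, limits never exceed 100.
    print(cap_limit(120.0, 20.0))
    print(cap_limit(60.0, 20.0))
```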

prediction ​

Properties ​
| Property | Type | Constraints | Description |
| --- | --- | --- | --- |
| enable | boolean | | |
| historyDataPeriod | string | | |
| model | string | | |
| predictionPeriod | string | | |

autoSetReplicas ​

layer 2 horizontal auto-scaling, scale up to more GPU cards if max limits threshold hit

Properties ​
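| Property | Type | Constraints | Description |
| --- | --- | --- | --- |
| enable | boolean | | |
| evaluationPeriod | string | | |
| scaleDownCoolDownTime | string | | |
| scaleDownStep | string | | |
| scaleUpCoolDownTime | string | | |
| scaleUpStep | string | | |
| targetTFlopsOfLimits | string | | |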

autoSetRequests ​

layer 3 adjusting, to match the actual usage in the long run

Properties ​
| Property | Type | Constraints | Description |
| --- | --- | --- | --- |
| aggregationPeriod | string | | |
| evaluationPeriod | string | | |
| extraBufferRatio | string | | The request buffer ratio. For example, if actual usage is 1.0, a 10% buffer gives 1.1 as the final preferred requests. |
| percentileForAutoRequests | string | | |
| prediction | object | | |
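
The extraBufferRatio example from the description works out as follows. The sketch assumes the ratio is given as a decimal fraction (0.10 for 10%); the string format the field actually expects is not stated in the source, and the function name is hypothetical.

```python
def preferred_requests(actual_usage, extra_buffer_ratio):
    """Add a proportional safety buffer on top of observed usage."""
    return actual_usage * (1 + extra_buffer_ratio)


if __name__ == "__main__":
    # Usage of 1.0 with a 10% buffer gives preferred requests of about 1.1.
    print(round(preferred_requests(1.0, 0.10), 4))
```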

prediction ​

Properties ​
| Property | Type | Constraints | Description |
| --- | --- | --- | --- |
| enable | boolean | | |
| historyDataPeriod | string | | |
| model | string | | |
| predictionPeriod | string | | |

scaleToZero ​

additional layer to save VRAM, auto-freeze memory and cool down to RAM and Disk

Properties ​
| Property | Type | Constraints | Description |
| --- | --- | --- | --- |
| autoFreeze | array | | |
| intelligenceWarmup | object | | |

autoFreeze (items) ​

Properties ​
| Property | Type | Constraints | Description |
| --- | --- | --- | --- |
| enable | boolean | | |
| freezeToDiskTTL | string | | |
| freezeToMemTTL | string | | |
| qos | string | low, medium, high, critical | |

intelligenceWarmup ​

Properties ​
| Property | Type | Constraints | Description |
| --- | --- | --- | --- |
| enable | boolean | | |
| historyDataPeriod | string | | |
| model | string | | |
| predictionPeriod | string | | |

hypervisor ​

single GPU device multi-process queuing and fair scheduling with QoS constraint

Properties ​

| Property | Type | Constraints | Description |
| --- | --- | --- | --- |
| multiProcessQueuing | object | | |

multiProcessQueuing ​

Properties ​
| Property | Type | Constraints | Description |
| --- | --- | --- | --- |
| enable | boolean | | |
| interval | string | | |
| queueLevelTimeSlices | array | | |

placement ​

place the client or worker to best matched nodes

Properties ​

| Property | Type | Constraints | Description |
| --- | --- | --- | --- |
| allowUsingLocalGPU | boolean | | Default: true |
| gpuFilters | array | | |
| mode | string | CompactFirst, LowLoadFirst | Default: CompactFirst |

gpuFilters (items) ​

Properties ​
| Property | Type | Constraints | Description |
| --- | --- | --- | --- |
| params | object | | |
| type | string | | |

reBalancer ​

Avoid hot GPU devices and continuously balance the workload.
Implemented by triggering a simulated scheduling run and advising the scheduler of better GPU nodes.

Properties ​

| Property | Type | Constraints | Description |
| --- | --- | --- | --- |
| internal | string | | |
| reBalanceCoolDownTime | string | | |
| threshold | object | | |

threshold ​

Properties ​
| Property | Type | Constraints | Description |
| --- | --- | --- | --- |
| matchAny | object | | |

Status ​

GPUPoolStatus defines the observed state of GPUPool.

| Property | Type | Constraints | Description |
| --- | --- | --- | --- |
| allocatedTFlopsPercent | string | | updated with interval |
| allocatedVRAMPercent | string | | |
| availableTFlops | any | pattern: Regex | |
| availableVRAM | any | pattern: Regex | |
| budgetExceeded | string | | If the budget is exceeded, this is set to a comma-separated string indicating which periods were exceeded. If this field is not empty, the scheduler will not schedule new AI workloads and stops scale-up checks. Default: `` |
| cluster | string | | |
| componentStatus | object | | When any component version or config is updated, the pool controller performs a rolling update. The status is updated periodically (every 5s by default), with progress from 0 to 100. When progress reaches 100, the component version or config is fully updated. |
| conditions | array | | |
| lastCompactionTime | string<date-time> | | |
| notReadyNodes | integer<int32> | | |
| phase | string | Pending, Running, Updating, Destroying, Unknown | Default: Pending |
| potentialSavingsPerMonth | string | | |
| readyNodes | integer<int32> | | |
| savedCostsPerMonth | string | | aggregated with interval |
| totalGPUs | integer<int32> | | |
| totalNodes | integer<int32> | | |
| totalTFlops | any | pattern: Regex | |
| totalVRAM | any | pattern: Regex | |
| utilizedTFlopsPercent | string | | calculated every 5m average |
| utilizedVRAMPercent | string | | |
| virtualAvailableTFlops | any | pattern: Regex | |
| virtualAvailableVRAM | any | pattern: Regex | |
| virtualTFlops | any | pattern: Regex | |
| virtualVRAM | any | pattern: Regex | |

componentStatus ​

When any component version or config is updated, the pool controller performs a rolling update.
The status is updated periodically (every 5s by default), with progress from 0 to 100.
When progress reaches 100, the component version or config is fully updated.

Properties ​

| Property | Type | Constraints | Description |
| --- | --- | --- | --- |
| client | string | | |
| clientConfigSynced | boolean | | |
| clientUpdateProgress | integer<int32> | | |
| hypervisor | string | | |
| hypervisorConfigSynced | boolean | | |
| hypervisorUpdateProgress | integer<int32> | | |
| worker | string | | |
| workerConfigSynced | boolean | | |
| workerUpdateProgress | integer<int32> | | |
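
A consumer of this status could check rollout completion along these lines. The sketch takes the componentStatus object as a plain dict (for example, parsed from the resource's JSON) and treats a missing progress field as 0, which is an assumption; the function name is hypothetical.

```python
COMPONENTS = ("client", "hypervisor", "worker")


def pending_components(component_status):
    """Return the components whose update progress has not reached 100."""
    return [name for name in COMPONENTS
            if component_status.get(f"{name}UpdateProgress", 0) < 100]


if __name__ == "__main__":
    status = {
        "clientUpdateProgress": 100,
        "hypervisorUpdateProgress": 60,
        "workerUpdateProgress": 100,
    }
    # An empty list would mean every component is fully updated.
    print(pending_components(status))
```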

conditions (items) ​

Properties ​

| Property | Type | Constraints | Description |
| --- | --- | --- | --- |
| lastTransitionTime | string<date-time> | | lastTransitionTime is the last time the condition transitioned from one status to another. This should be when the underlying condition changed. If that is not known, then using the time when the API field changed is acceptable. |
| message | string | maxLength: 32768 | message is a human readable message indicating details about the transition. This may be an empty string. |
| observedGeneration | integer<int64> | min: 0 | observedGeneration represents the .metadata.generation that the condition was set based upon. For instance, if .metadata.generation is currently 12, but the .status.conditions[x].observedGeneration is 9, the condition is out of date with respect to the current state of the instance. |
| reason | string | minLength: 1, maxLength: 1024, pattern: Regex | reason contains a programmatic identifier indicating the reason for the condition's last transition. Producers of specific condition types may define expected values and meanings for this field, and whether the values are considered a guaranteed API. The value should be a CamelCase string. This field may not be empty. |
| status | string | True, False, Unknown | status of the condition, one of True, False, Unknown. |
| type | string | maxLength: 316, pattern: Regex | type of condition in CamelCase or in foo.example.com/CamelCase. |