GPUPool
Kubernetes Resource Information
Field | Value |
---|---|
API Version | tensor-fusion.ai/v1 |
Kind | GPUPool |
Scope | Cluster |
Spec
GPUPoolSpec defines the desired state of GPUPool.
Property | Type | Constraints | Description |
---|---|---|---|
capacityConfig | object | | |
componentConfig | object | | Customize system components for seamless onboarding. |
nodeManagerConfig | object | | |
observabilityConfig | object | | |
qosConfig | object | | Define different QoS levels and their prices. |
schedulingConfig | object | | Place workloads on the right nodes and scale smartly. |
schedulingConfigTemplate | string | | |
capacityConfig
Properties
Property | Type | Constraints | Description |
---|---|---|---|
maxResources | object | | |
minResources | object | | |
oversubscription | object | | |
warmResources | object | | |
maxResources
Properties
Property | Type | Constraints | Description |
---|---|---|---|
cpu | any | pattern: Regex | CPU and memory are only available when a CloudVendor connection is enabled. |
memory | any | pattern: Regex | |
tflops | any | pattern: Regex | |
vram | any | pattern: Regex | |
minResources
Properties
Property | Type | Constraints | Description |
---|---|---|---|
cpu | any | pattern: Regex | CPU and memory are only available when a CloudVendor connection is enabled. |
memory | any | pattern: Regex | |
tflops | any | pattern: Regex | |
vram | any | pattern: Regex | |
oversubscription
Properties
Property | Type | Constraints | Description |
---|---|---|---|
tflopsOversellRatio | integer<int32> | min: 100 max: 100000 | TFlops oversell multiplier, expressed as a percentage. The default of 500 (500%) means TFlops are oversold 5 times. Default: 500 |
vramExpandToHostDisk | integer<int32> | min: 0 max: 100 | Percentage of host disk appended to GPU VRAM. Default: 70 |
vramExpandToHostMem | integer<int32> | min: 0 max: 100 | Percentage of host RAM appended to GPU VRAM. Default: 50 |
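The three oversubscription percentages can be made concrete with a minimal Python sketch of how they might combine into a node's virtual capacity. It assumes virtual TFlops are physical TFlops scaled by tflopsOversellRatio / 100. It also assumes virtual VRAM is physical VRAM plus the configured percentages of host RAM and host disk. The class and function names are hypothetical, and this reference does not state the exact formula the pool controller uses.

```python
from dataclasses import dataclass


@dataclass
class Oversubscription:
    # Hypothetical mirror of spec.capacityConfig.oversubscription (values are percentages).
    tflops_oversell_ratio: int = 500    # tflopsOversellRatio
    vram_expand_to_host_mem: int = 50   # vramExpandToHostMem
    vram_expand_to_host_disk: int = 70  # vramExpandToHostDisk


def virtual_capacity(physical_tflops: float, physical_vram_gib: float,
                     host_mem_gib: float, host_disk_gib: float,
                     cfg: Oversubscription) -> tuple[float, float]:
    """Return (virtual TFlops, virtual VRAM in GiB) under the assumed model."""
    # Oversell compute: a ratio of 100 (the minimum) means no oversell.
    virtual_tflops = physical_tflops * cfg.tflops_oversell_ratio / 100
    # Extend VRAM with a share of host RAM and host disk.
    virtual_vram = (physical_vram_gib
                    + host_mem_gib * cfg.vram_expand_to_host_mem / 100
                    + host_disk_gib * cfg.vram_expand_to_host_disk / 100)
    return virtual_tflops, virtual_vram
```

Consider a node with 312 TFlops, 80 GiB of VRAM, 256 GiB of host RAM and 1000 GiB of host disk. Under this model the defaults yield 1560 virtual TFlops and 908 GiB of virtual VRAM.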
componentConfig
Customize system components for seamless onboarding.
Properties
Property | Type | Constraints | Description |
---|---|---|---|
client | object | | |
hypervisor | object | | |
nodeDiscovery | object | | |
worker | object | | |
nodeManagerConfig
Properties
Property | Type | Constraints | Description |
---|---|---|---|
nodeCompaction | object | | |
nodePoolRollingUpdatePolicy | object | | |
nodeProvisioner | object | | Mutually exclusive with nodeSelector. Use nodeSelector for existing GPU nodes and nodeProvisioner for Karpenter-like automatic node management. |
nodeSelector | object | | A node selector represents the union of the results of one or more label queries over a set of nodes; that is, it represents the OR of the selectors represented by the node selector terms. |
provisioningMode | string | Provisioned AutoSelect | Default: AutoSelect |
nodePoolRollingUpdatePolicy
Properties
Property | Type | Constraints | Description |
---|---|---|---|
autoUpdate | boolean | Default: true | |
batchInterval | string | Default: 10m | |
batchPercentage | integer<int32> | min: 0 max: 100 | Default: 100 |
maintenanceWindow | object | | |
maxDuration | string | Default: 10m | |
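The batchPercentage field can be read as the share of the pool updated per batch. Under that reading, a rolling update splits the pool into batches as in the following hypothetical Python helper. It assumes each batch holds ceil(nodes × batchPercentage / 100) nodes and that batchInterval is waited between batches, which is not modeled here. Treating 0% as one node per batch is also an assumption.

```python
import math


def plan_batches(nodes: list[str], batch_percentage: int = 100) -> list[list[str]]:
    """Split nodes into rolling-update batches (hypothetical helper, not the controller's code).

    Each batch holds ceil(len(nodes) * batch_percentage / 100) nodes.
    """
    if not 0 <= batch_percentage <= 100:
        raise ValueError("batchPercentage must be between 0 and 100")
    if not nodes:
        return []
    # Assumption: 0% means "one node per batch" rather than "never update".
    size = max(1, math.ceil(len(nodes) * batch_percentage / 100))
    return [nodes[i:i + size] for i in range(0, len(nodes), size)]
```

For example, with ten nodes and batchPercentage set to 30, the batches contain 3, 3, 3 and 1 nodes. The default of 100 updates the whole pool in one batch.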
nodeProvisioner
nodeProvisioner and nodeSelector are mutually exclusive.
Use nodeSelector for existing GPU nodes and nodeProvisioner for Karpenter-like automatic node management.
Properties
Property | Type | Constraints | Description |
---|---|---|---|
budget | object | | The node provisioner runs virtual billing based on public or customized pricing. If VM costs exceed any budget constraint, new VMs are not created and alerts are generated. |
cpuNodeLabels | object | | |
cpuRequirements | array | | |
cpuTaints | array | | |
gpuNodeLabels | object | | |
gpuRequirements | array | | |
gpuTaints | array | | |
mode | string | Native Karpenter | Either Karpenter or Native. In Karpenter mode, the node provisioner starts dummy nodes to provision and warm up GPU nodes, and does nothing for CPU nodes. In Native mode, the provisioner creates or compacts GPU and CPU nodes based on the current pods. Default: Native |
nodeClass | string | | |
budget
The node provisioner runs virtual billing based on public or customized pricing. If VM costs exceed any budget constraint, new VMs are not created and alerts are generated.
Properties
Property | Type | Constraints | Description |
---|---|---|---|
budgetExceedStrategy | string | AlertOnly AlertAndTerminateVM | Default: AlertOnly |
budgetPerDay | string | Default: 100 | |
budgetPerMonth | string | Default: 1000 | |
budgetPerQuarter | string | Default: 3000 | |
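The budget rule above states that no new VM is created once costs exceed any budget. It can be sketched as a hypothetical Python helper that also produces a comma-separated list of the exceeded periods, similar in shape to status.budgetExceeded. The helper assumes costs are tracked per day, month and quarter in the same currency as the budgets. The period labels "day", "month" and "quarter" are illustrative; this reference does not specify the exact strings the controller writes. Alerting and budgetExceedStrategy are not modeled.

```python
from decimal import Decimal

# Defaults from the budget table above (the spec stores them as strings).
DEFAULT_BUDGETS = {
    "day": Decimal("100"),       # budgetPerDay
    "month": Decimal("1000"),    # budgetPerMonth
    "quarter": Decimal("3000"),  # budgetPerQuarter
}


def exceeded_periods(costs: dict[str, Decimal],
                     budgets: dict[str, Decimal] = DEFAULT_BUDGETS) -> str:
    """Hypothetical helper: comma-separated periods whose cost exceeds the budget, or ""."""
    return ",".join(
        period for period in ("day", "month", "quarter")
        if costs.get(period, Decimal(0)) > budgets[period]
    )


def may_create_vm(costs: dict[str, Decimal],
                  budgets: dict[str, Decimal] = DEFAULT_BUDGETS) -> bool:
    # A new VM is only created while no budget period is exceeded.
    return exceeded_periods(costs, budgets) == ""
```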
cpuRequirements (items)
Properties
Property | Type | Constraints | Description |
---|---|---|---|
key | string | node.kubernetes.io/instance-type kubernetes.io/arch kubernetes.io/os topology.kubernetes.io/region topology.kubernetes.io/zone karpenter.sh/capacity-type tensor-fusion.ai/gpu-arch tensor-fusion.ai/gpu-instance-family tensor-fusion.ai/gpu-instance-size | |
operator | string | In Exists DoesNotExist Gt Lt | A node selector operator is the set of operators that can be used in a node selector requirement. Default: In |
values | array | | |
cpuTaints (items)
Properties
Property | Type | Constraints | Description |
---|---|---|---|
effect | string | NoSchedule NoExecute PreferNoSchedule | Default: NoSchedule |
key | string | | |
value | string | | |
gpuRequirements (items)
Properties
Property | Type | Constraints | Description |
---|---|---|---|
key | string | node.kubernetes.io/instance-type kubernetes.io/arch kubernetes.io/os topology.kubernetes.io/region topology.kubernetes.io/zone karpenter.sh/capacity-type tensor-fusion.ai/gpu-arch tensor-fusion.ai/gpu-instance-family tensor-fusion.ai/gpu-instance-size | |
operator | string | In Exists DoesNotExist Gt Lt | A node selector operator is the set of operators that can be used in a node selector requirement. Default: In |
values | array | | |
nodeSelector
A node selector represents the union of the results of one or more label queries
over a set of nodes; that is, it represents the OR of the selectors represented
by the node selector terms.
Properties
Property | Type | Constraints | Description |
---|---|---|---|
nodeSelectorTerms | array | Required | A list of node selector terms. The terms are ORed. |
nodeSelectorTerms (items)
Required. A list of node selector terms. The terms are ORed.
Properties
Property | Type | Constraints | Description |
---|---|---|---|
matchExpressions | array | | A list of node selector requirements by node's labels. |
matchFields | array | | A list of node selector requirements by node's fields. |
matchExpressions (items)
A list of node selector requirements by node's labels.
Properties
Property | Type | Constraints | Description |
---|---|---|---|
key | string | | The label key that the selector applies to. |
operator | string | | Represents a key's relationship to a set of values. Valid operators are In, NotIn, Exists, DoesNotExist, Gt, and Lt. |
values | array | | An array of string values. If the operator is In or NotIn, the values array must be non-empty. If the operator is Exists or DoesNotExist, the values array must be empty. If the operator is Gt or Lt, the values array must have a single element, which will be interpreted as an integer. This array is replaced during a strategic merge patch. |
matchFields (items)
A list of node selector requirements by node's fields.
Properties
Property | Type | Constraints | Description |
---|---|---|---|
key | string | | The label key that the selector applies to. |
operator | string | | Represents a key's relationship to a set of values. Valid operators are In, NotIn, Exists, DoesNotExist, Gt, and Lt. |
values | array | | An array of string values. If the operator is In or NotIn, the values array must be non-empty. If the operator is Exists or DoesNotExist, the values array must be empty. If the operator is Gt or Lt, the values array must have a single element, which will be interpreted as an integer. This array is replaced during a strategic merge patch. |
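The matching semantics described above (terms ORed, Gt/Lt compared as integers) can be illustrated with a small Python evaluator for label-based matchExpressions. It follows the standard Kubernetes node selector rules. In particular, the requirements within one term are ANDed, a missing label satisfies NotIn, and an empty term matches no nodes. matchFields is not modeled. The label example.com/gpu-count used in the checks is hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Requirement:
    """One matchExpressions entry (illustrative model, not a client library type)."""
    key: str
    operator: str  # In, NotIn, Exists, DoesNotExist, Gt, Lt
    values: list[str] = field(default_factory=list)


def requirement_matches(labels: dict[str, str], req: Requirement) -> bool:
    present = req.key in labels
    value = labels.get(req.key)
    if req.operator == "In":
        return present and value in req.values
    if req.operator == "NotIn":
        # A node without the label satisfies NotIn.
        return not present or value not in req.values
    if req.operator == "Exists":
        return present
    if req.operator == "DoesNotExist":
        return not present
    if req.operator in ("Gt", "Lt"):
        # Gt/Lt take exactly one value, compared as an integer.
        if not present or len(req.values) != 1:
            return False
        try:
            actual, bound = int(value), int(req.values[0])
        except ValueError:
            return False
        return actual > bound if req.operator == "Gt" else actual < bound
    raise ValueError(f"unsupported operator: {req.operator}")


def node_matches(labels: dict[str, str], terms: list[list[Requirement]]) -> bool:
    """Terms are ORed; requirements inside one term are ANDed; an empty term matches nothing."""
    return any(term and all(requirement_matches(labels, r) for r in term)
               for term in terms)
```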
observabilityConfig
Properties
Property | Type | Constraints | Description |
---|---|---|---|
alert | object | | |
monitor | object | | |
qosConfig
Define different QoS levels and their prices.
Properties
Property | Type | Constraints | Description |
---|---|---|---|
defaultQoS | string | low medium high critical | |
definitions | array | | |
pricing | array | | |
definitions (items)
Properties
Property | Type | Constraints | Description |
---|---|---|---|
description | string | | |
name | string | low medium high critical | |
priority | integer | | |
pricing (items)
Properties
Property | Type | Constraints | Description |
---|---|---|---|
limitsOverRequests | string | | Price ratio of limits over requests. By default, limits over requests are priced the same as requests (ratio 1), which indicates normal on-demand serverless GPU usage. For low-QoS cases such as hands-on labs, limits over requests should be cheaper; for example, with low QoS the ratio should be 0.5. Default: 1 |
qos | string | low medium high critical | |
requests | object | | Default pricing, derived from the per-second pricing at https://modal.com/pricing, with Tensor/CUDA Core : HBM = 2:1. |
requests
Default pricing, derived from the per-second pricing at https://modal.com/pricing,
with Tensor/CUDA Core : HBM = 2:1.
Properties
Property | Type | Constraints | Description |
---|---|---|---|
perFP16TFlopsPerHour | string | Default: $0.0069228 | |
perGBOfVRAMPerHour | string | Default: $0.01548 | |
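The following hypothetical Python helper works through the default prices by estimating an hourly cost from requested and limit TFlops and VRAM. It assumes requests are billed at the requests rates. It also assumes the portion of limits above requests is billed at limitsOverRequests times those rates. That reading of limitsOverRequests is an interpretation, not something this reference defines. The "$" prefix of the default strings is dropped, and Decimal avoids float rounding.

```python
from decimal import Decimal

# Default unit prices from the requests table (USD per hour).
PER_FP16_TFLOPS_PER_HOUR = Decimal("0.0069228")  # perFP16TFlopsPerHour
PER_GB_VRAM_PER_HOUR = Decimal("0.01548")        # perGBOfVRAMPerHour


def hourly_cost(req_tflops: Decimal, req_vram_gb: Decimal,
                lim_tflops: Decimal, lim_vram_gb: Decimal,
                limits_over_requests: Decimal = Decimal("1")) -> Decimal:
    """Estimate one workload's hourly price under an assumed pricing model (hypothetical)."""
    if lim_tflops < req_tflops or lim_vram_gb < req_vram_gb:
        raise ValueError("limits must be >= requests")
    requested = req_tflops * PER_FP16_TFLOPS_PER_HOUR + req_vram_gb * PER_GB_VRAM_PER_HOUR
    headroom = ((lim_tflops - req_tflops) * PER_FP16_TFLOPS_PER_HOUR
                + (lim_vram_gb - req_vram_gb) * PER_GB_VRAM_PER_HOUR)
    # Assumed: the headroom above requests is scaled by limitsOverRequests.
    return requested + headroom * limits_over_requests
```

With these defaults, requesting 100 TFlops and 20 GB of VRAM costs about $1.00 per hour. Doubling the TFlops limit at a ratio of 0.5 adds about $0.35.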
schedulingConfig
Place workloads on the right nodes and scale smartly.
Properties
Property | Type | Constraints | Description |
---|---|---|---|
autoScaling | object | | Scale workloads based on usage and traffic. |
hypervisor | object | | Multi-process queuing and fair scheduling on a single GPU device, subject to QoS constraints. |
placement | object | | Place clients or workers on the best-matched nodes. |
reBalancer | object | | Avoid hot GPU devices and continuously rebalance workloads by triggering simulated scheduling and recommending better GPU nodes to the scheduler. |
autoScaling
Scale workloads based on usage and traffic.
Properties
Property | Type | Constraints | Description |
---|---|---|---|
autoSetLimits | object | | Layer 1, vertical auto-scaling: quickly burst onto existing GPU cards. |
autoSetReplicas | object | | Layer 2, horizontal auto-scaling: scale out to more GPU cards when the maximum limit threshold is hit. |
autoSetRequests | object | | Layer 3, long-term adjustment: adjust requests to match actual usage. |
scaleToZero | object | | Additional layer to save VRAM: automatically freeze memory and cool it down to RAM and disk. |
autoSetLimits
Layer 1, vertical auto-scaling: quickly burst onto existing GPU cards.
Properties
Property | Type | Constraints | Description |
---|---|---|---|
evaluationPeriod | string | | |
extraTFlopsBufferRatio | string | | |
ignoredDeltaRange | string | | |
maxRatioToRequests | string | | Multiplier of requests that caps the limit so it is not set too high, for example 5.0. |
prediction | object | | |
scaleUpStep | string | | |
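The following hypothetical Python sketch shows how maxRatioToRequests could bound a vertically scaled TFlops limit. Only the cap at requests × maxRatioToRequests comes from the table above. The other behavior is assumed: adding extraTFlopsBufferRatio as headroom over the observed peak, and skipping changes within ignoredDeltaRange as a relative delta. scaleUpStep, evaluationPeriod and prediction are not modeled.

```python
def next_tflops_limit(current_limit: float, requests: float, observed_peak: float,
                      extra_buffer_ratio: float = 0.0,
                      ignored_delta_range: float = 0.0,
                      max_ratio_to_requests: float = 5.0) -> float:
    """Propose a new TFlops limit for layer-1 vertical scaling (hypothetical helper)."""
    if requests <= 0:
        raise ValueError("requests must be positive")
    # Assumed meaning of extraTFlopsBufferRatio: headroom on top of the observed peak.
    target = observed_peak * (1 + extra_buffer_ratio)
    # maxRatioToRequests: never let the limit exceed requests * ratio.
    target = min(target, requests * max_ratio_to_requests)
    # Keep the limit at or above requests.
    target = max(target, requests)
    # Assumed meaning of ignoredDeltaRange: ignore small relative changes.
    if current_limit > 0 and abs(target - current_limit) / current_limit <= ignored_delta_range:
        return current_limit
    return target
```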
autoSetReplicas
Layer 2, horizontal auto-scaling: scale out to more GPU cards when the maximum limit threshold is hit.
Properties
Property | Type | Constraints | Description |
---|---|---|---|
enable | boolean | | |
evaluationPeriod | string | | |
scaleDownCoolDownTime | string | | |
scaleDownStep | string | | |
scaleUpCoolDownTime | string | | |
scaleUpStep | string | | |
targetTFlopsOfLimits | string | | |
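Horizontal scaling toward a utilization target is commonly computed with the Kubernetes HPA rule, ceil(current × actual / target). The following Python sketch applies that rule under stated assumptions. It treats targetTFlopsOfLimits as a target percentage of TFlops limits and reads scaleUpStep and scaleDownStep as the maximum number of replicas added or removed per evaluation. Both readings are assumptions, as is the function name. Cooldowns and evaluationPeriod are not modeled.

```python
import math


def desired_replicas(current: int, usage_pct_of_limits: float, target_pct_of_limits: float,
                     scale_up_step: int = 1, scale_down_step: int = 1,
                     min_replicas: int = 1) -> int:
    """HPA-style proportional scaling toward a utilization target (hypothetical helper)."""
    if current < 1 or target_pct_of_limits <= 0:
        raise ValueError("need at least one replica and a positive target")
    # Proportional rule: ceil(current * actual / target).
    raw = math.ceil(current * usage_pct_of_limits / target_pct_of_limits)
    if raw > current:
        # Cap how many replicas are added in one evaluation.
        return min(raw, current + scale_up_step)
    if raw < current:
        # Cap how many replicas are removed, and never go below the floor.
        return max(raw, current - scale_down_step, min_replicas)
    return current
```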
autoSetRequests
Layer 3, long-term adjustment: adjust requests to match actual usage.
Properties
Property | Type | Constraints | Description |
---|---|---|---|
aggregationPeriod | string | | |
evaluationPeriod | string | | |
extraBufferRatio | string | | Request buffer ratio. For example, if actual usage is 1.0, a 10% buffer gives 1.1 as the final preferred request. |
percentileForAutoRequests | string | | |
prediction | object | | |
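The following hypothetical Python helper reproduces the extraBufferRatio example above, where usage of 1.0 with a 10% buffer becomes 1.1. It assumes percentileForAutoRequests is a fraction such as 0.95 and uses the nearest-rank percentile method; both are choices made for illustration. aggregationPeriod, evaluationPeriod and prediction are not modeled.

```python
import math


def recommend_request(samples: list[float], percentile: float = 0.95,
                      extra_buffer_ratio: float = 0.1) -> float:
    """Recommend a request from usage samples (hypothetical helper).

    Takes the nearest-rank percentile of the samples and adds the buffer ratio.
    """
    if not samples:
        raise ValueError("need at least one usage sample")
    if not 0 < percentile <= 1:
        raise ValueError("percentile must be in (0, 1]")
    ordered = sorted(samples)
    rank = math.ceil(percentile * len(ordered))  # nearest-rank method, 1-based
    return ordered[rank - 1] * (1 + extra_buffer_ratio)
```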
scaleToZero
Additional layer to save VRAM: automatically freeze memory and cool it down to RAM and disk.
Properties
Property | Type | Constraints | Description |
---|---|---|---|
autoFreeze | array | | |
intelligenceWarmup | object | | |
hypervisor
Multi-process queuing and fair scheduling on a single GPU device, subject to QoS constraints.
Properties
Property | Type | Constraints | Description |
---|---|---|---|
multiProcessQueuing | object | | |
placement
Place clients or workers on the best-matched nodes.
Properties
Property | Type | Constraints | Description |
---|---|---|---|
allowUsingLocalGPU | boolean | Default: true | |
gpuFilters | array | | |
mode | string | CompactFirst LowLoadFirst | Default: CompactFirst |
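The two placement modes can be contrasted with a hypothetical Python sketch. It assumes CompactFirst behaves like bin packing, choosing the fitting node with the least free capacity. It assumes LowLoadFirst spreads load, choosing the fitting node with the most free capacity. Scoring nodes by free TFlops and then free VRAM is an illustrative choice, not the scheduler's documented algorithm, and gpuFilters is not modeled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    name: str
    free_tflops: float
    free_vram_gib: float


def _free_capacity(node: Node) -> tuple[float, float]:
    # Illustrative score: free TFlops first, then free VRAM.
    return (node.free_tflops, node.free_vram_gib)


def pick_node(nodes: list[Node], tflops: float, vram_gib: float,
              mode: str = "CompactFirst") -> Optional[Node]:
    """Choose a node under either placement mode (hypothetical helper)."""
    fits = [n for n in nodes if n.free_tflops >= tflops and n.free_vram_gib >= vram_gib]
    if not fits:
        return None
    if mode == "CompactFirst":
        # Bin packing: fill the busiest node that still fits.
        return min(fits, key=_free_capacity)
    if mode == "LowLoadFirst":
        # Spreading: prefer the node with the most free capacity.
        return max(fits, key=_free_capacity)
    raise ValueError(f"unknown placement mode: {mode}")
```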
Status
GPUPoolStatus defines the observed state of GPUPool.
Property | Type | Constraints | Description |
---|---|---|---|
allocatedTFlopsPercent | string | | Updated periodically. |
allocatedVRAMPercent | string | | |
availableTFlops | any | pattern: Regex | |
availableVRAM | any | pattern: Regex | |
budgetExceeded | string | | If the budget is exceeded, a comma-separated list of the periods that caused it. When this field is not empty, the scheduler does not schedule new AI workloads and stops scale-up checks. Default: empty string |
cluster | string | | |
componentStatus | object | | When any component version or config is updated, the pool controller performs a rolling update. The status is refreshed periodically (every 5s by default) and progress ranges from 0 to 100; at 100, the component version or config is fully updated. |
conditions | array | | |
lastCompactionTime | string<date-time> | | |
notReadyNodes | integer<int32> | | |
phase | string | Pending Running Updating Destroying Unknown | Default: Pending |
potentialSavingsPerMonth | string | | |
readyNodes | integer<int32> | | |
savedCostsPerMonth | string | | Aggregated periodically. |
totalGPUs | integer<int32> | | |
totalNodes | integer<int32> | | |
totalTFlops | any | pattern: Regex | |
totalVRAM | any | pattern: Regex | |
utilizedTFlopsPercent | string | | Calculated as a 5-minute average. |
utilizedVRAMPercent | string | | |
virtualAvailableTFlops | any | pattern: Regex | |
virtualAvailableVRAM | any | pattern: Regex | |
virtualTFlops | any | pattern: Regex | |
virtualVRAM | any | pattern: Regex | |
componentStatus
When any component version or config is updated, the pool controller performs a rolling update.
The status is refreshed periodically (every 5s by default), and progress ranges from 0 to 100.
When progress reaches 100, the component version or config is fully updated.
Properties
Property | Type | Constraints | Description |
---|---|---|---|
client | string | | |
clientConfigSynced | boolean | | |
clientUpdateProgress | integer<int32> | | |
hypervisor | string | | |
hypervisorConfigSynced | boolean | | |
hypervisorUpdateProgress | integer<int32> | | |
worker | string | | |
workerConfigSynced | boolean | | |
workerUpdateProgress | integer<int32> | | |
conditions (items)
Properties
Property | Type | Constraints | Description |
---|---|---|---|
lastTransitionTime | string<date-time> | | lastTransitionTime is the last time the condition transitioned from one status to another. This should be when the underlying condition changed. If that is not known, then using the time when the API field changed is acceptable. |
message | string | maxLength: 32768 | message is a human readable message indicating details about the transition. This may be an empty string. |
observedGeneration | integer<int64> | min: 0 | observedGeneration represents the .metadata.generation that the condition was set based upon. For instance, if .metadata.generation is currently 12, but the .status.conditions[x].observedGeneration is 9, the condition is out of date with respect to the current state of the instance. |
reason | string | minLength: 1 maxLength: 1024 pattern: Regex | reason contains a programmatic identifier indicating the reason for the condition's last transition. Producers of specific condition types may define expected values and meanings for this field, and whether the values are considered a guaranteed API. The value should be a CamelCase string. This field may not be empty. |
status | string | True False Unknown | status of the condition, one of True, False, Unknown. |
type | string | maxLength: 316 pattern: Regex | type of condition in CamelCase or in foo.example.com/CamelCase. |
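The observedGeneration rule above says a condition recorded for an older generation is out of date. A client can apply that rule before trusting a condition, as in this minimal Python sketch. The condition type "Ready" used in the checks is hypothetical, since this reference does not list the condition types GPUPool sets.

```python
from dataclasses import dataclass


@dataclass
class Condition:
    type: str
    status: str               # "True", "False" or "Unknown"
    observed_generation: int


def is_condition_current(cond: Condition, metadata_generation: int) -> bool:
    # A condition set for an older .metadata.generation is stale.
    return cond.observed_generation >= metadata_generation


def condition_true(conditions: list[Condition], cond_type: str,
                   metadata_generation: int) -> bool:
    """True only if the named condition exists, has status "True", and is up to date."""
    for cond in conditions:
        if cond.type == cond_type:
            return cond.status == "True" and is_condition_current(cond, metadata_generation)
    return False
```

With .metadata.generation at 12 and a condition observed at generation 9, condition_true returns False even though the status is "True".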