WorkloadProfile ​

WorkloadProfile is the Schema for the workloadprofiles API.

Kubernetes Resource Information ​

| Field | Value |
| --- | --- |
| API Version | tensor-fusion.ai/v1 |
| Kind | WorkloadProfile |
| Scope | Namespaced |

Spec ​

WorkloadProfileSpec defines the desired state of WorkloadProfile.

| Property | Type | Constraints | Description |
| --- | --- | --- | --- |
| autoScalingConfig ↓ | object | | AutoScalingConfig set here overrides the Pool's schedulingConfig. This field cannot be fully expressed through annotations; to enable auto-scaling via annotations, set tensor-fusion.ai/auto-limits, tensor-fusion.ai/auto-requests, or tensor-fusion.ai/auto-replicas to 'true'. |
| gpuCount | integer<int32> | | The number of GPUs used by the workload. Defaults to 1. |
| gpuModel | string | | GPUModel specifies the required GPU model (e.g., "A100", "H100"). |
| isLocalGPU | boolean | | Schedule the workload onto the same GPU server that runs the vGPU worker for best performance. Defaults to false. |
| nodeAffinity ↓ | object | | NodeAffinity specifies the node affinity requirements for the workload. |
| poolName | string | | |
| qos | string | low, medium, high, critical | Qos defines the quality of service level for the client. |
| replicas | integer<int32> | | If replicas is not set, the count is dynamic, based on pending Pods. If isLocalGPU is true, replicas must be dynamic and this field is ignored. |
| resources ↓ | object | | |
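
As a rough illustration of how the Spec fields fit together, a WorkloadProfile manifest might look like the sketch below. The resource name, namespace, pool name, and all values are hypothetical. The tflops and vram value formats are assumptions, since the validation pattern is not shown in this reference.

```yaml
apiVersion: tensor-fusion.ai/v1
kind: WorkloadProfile
metadata:
  name: example-profile      # hypothetical name
  namespace: default         # WorkloadProfile is a namespaced resource
spec:
  poolName: example-pool     # hypothetical pool name
  gpuCount: 1                # defaults to 1 when omitted
  gpuModel: "A100"
  qos: medium                # one of: low, medium, high, critical
  isLocalGPU: false          # defaults to false when omitted
  replicas: 2                # omit to let replicas follow pending Pods
  resources:
    requests:
      tflops: "10"           # assumed value format
      vram: "4Gi"            # assumed value format
    limits:
      tflops: "20"           # assumed value format
      vram: "8Gi"            # assumed value format
```

Setting isLocalGPU to true would make the replicas value above irrelevant, because replicas must then be dynamic.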

autoScalingConfig ​

AutoScalingConfig set here overrides the Pool's schedulingConfig.
This field cannot be fully expressed through annotations. To enable auto-scaling via annotations,
set tensor-fusion.ai/auto-limits, tensor-fusion.ai/auto-requests, or tensor-fusion.ai/auto-replicas to 'true'.
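
The annotation route mentioned above could look like the metadata fragment below. This is a sketch only: the three annotation keys are an expansion of the shorthand in the description, and attaching them to the workload's Pod metadata is an assumption.

```yaml
metadata:
  annotations:
    # Each key switches on one auto-scaling layer; the values are the string 'true'.
    tensor-fusion.ai/auto-limits: 'true'     # layer 1, vertical (limits)
    tensor-fusion.ai/auto-replicas: 'true'   # layer 2, horizontal (replicas)
    tensor-fusion.ai/auto-requests: 'true'   # layer 3, long-run requests
```

Finer-grained settings such as evaluation periods or step sizes are not available through these annotations, so they would need the autoScalingConfig object.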

Properties ​

| Property | Type | Constraints | Description |
| --- | --- | --- | --- |
| autoSetLimits ↓ | object | | Layer 1, vertical auto-scaling: bursts quickly onto existing GPU cards. VPA-like; aggregates metrics over windows shorter than 1 minute. |
| autoSetReplicas ↓ | object | | Layer 2, horizontal auto-scaling: scales out to more GPU cards when the maximum limits threshold is hit. HPA-like; aggregates metrics over 1 minute to 1 hour. When tf-worker scales up, the replica count of the client Pod's owner (a Deployment, for example) should also increase; whether this works with KNative still needs to be checked. |
| autoSetRequests ↓ | object | | Layer 3, long-run adjustment: aligns baseline requests with actual usage over a longer period, such as 1 day to 2 weeks. Only for N:M remote vGPU mode; not implemented yet. |

autoSetLimits ​

Layer 1, vertical auto-scaling: bursts quickly onto existing GPU cards.
VPA-like; aggregates metrics over windows shorter than 1 minute.

Properties ​

| Property | Type | Constraints | Description |
| --- | --- | --- | --- |
| enable | boolean | | |
| evaluationPeriod | string | | |
| extraTFlopsBufferRatio | string | | |
| ignoredDeltaRange | string | | |
| maxRatioToRequests | string | | Multiplier applied to requests that caps how high limits can be set, for example 5.0. |
| prediction ↓ | object | | |
| scaleUpStep | string | | |
| targetResource | string | | Target resource whose limits are scaled: "tflops", "vram", or "all" (the default). |
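
To make the role of maxRatioToRequests concrete, the fragment below sketches an autoSetLimits block. Only the "5.0" and "tflops" values come from the descriptions above; the evaluationPeriod format is an assumption, and the remaining fields are left out because their formats are not documented here.

```yaml
spec:
  autoScalingConfig:
    autoSetLimits:
      enable: true
      targetResource: "tflops"    # "tflops", "vram", or "all" (the default)
      maxRatioToRequests: "5.0"   # limits are capped at 5x the requests
      evaluationPeriod: "30s"     # assumed duration format; layer 1 works on <1m windows
```

Under this sketch, a workload requesting 10 TFLOPS would not have its limit raised above 50 TFLOPS.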

prediction ​

Properties ​

| Property | Type | Constraints | Description |
| --- | --- | --- | --- |
| enable | boolean | | |
| historyDataPeriod | string | | |
| model | string | | |
| predictionPeriod | string | | |

autoSetReplicas ​

Layer 2, horizontal auto-scaling: scales out to more GPU cards when the maximum limits threshold is hit.
HPA-like; aggregates metrics over 1 minute to 1 hour. When tf-worker scales up, the replica count of the client Pod's owner (a Deployment, for example) should also increase; whether this works with KNative still needs to be checked.

Properties ​

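| Property | Type | Constraints | Description |
| --- | --- | --- | --- |
| enable | boolean | | |
| evaluationPeriod | string | | |
| scaleDownCoolDownTime | string | | |
| scaleDownStep | string | | |
| scaleUpCoolDownTime | string | | |
| scaleUpStep | string | | |
| targetTFlopsOfLimits | string | | |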
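
The fields above carry no descriptions, so the fragment below is only a guess at how an autoSetReplicas block might be filled in. Every value and value format is an assumption chosen for illustration, not documented behavior.

```yaml
spec:
  autoScalingConfig:
    autoSetReplicas:
      enable: true
      evaluationPeriod: "5m"          # assumed format; layer 2 aggregates over 1m to 1h
      scaleUpStep: "1"                # assumed: replicas added per scale-up
      scaleDownStep: "1"              # assumed: replicas removed per scale-down
      scaleUpCoolDownTime: "2m"       # assumed duration format
      scaleDownCoolDownTime: "10m"    # assumed duration format
      targetTFlopsOfLimits: "0.8"     # assumed: target TFLOPS usage as a fraction of limits
```

A longer scale-down cooldown than scale-up cooldown is a common choice in HPA-style controllers to avoid flapping; whether the same trade-off applies here is not stated in this reference.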

autoSetRequests ​

Layer 3, long-run adjustment: aligns baseline requests with actual usage over a longer period, such as 1 day to 2 weeks.
Only for N:M remote vGPU mode; not implemented yet.

Properties ​

| Property | Type | Constraints | Description |
| --- | --- | --- | --- |
| aggregationPeriod | string | | |
| enable | boolean | | |
| evaluationPeriod | string | | |
| extraBufferRatio | string | | Buffer ratio added on top of actual usage. For example, with actual usage of 1.0 and a 10% buffer, the final preferred request is 1.1. |
| percentileForAutoRequests | string | | |
| prediction ↓ | object | | |
| targetResource | string | | Target resource whose requests are scaled: "tflops", "vram", or "all" (the default). |
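
The buffer arithmetic in extraBufferRatio can be sketched as follows. Writing a 10% buffer as "0.1" and the period as "24h" are both assumptions about the value formats, and the block as a whole is illustrative, since this layer is described as not implemented yet.

```yaml
spec:
  autoScalingConfig:
    autoSetRequests:
      enable: true
      targetResource: "all"      # "tflops", "vram", or "all" (the default)
      aggregationPeriod: "24h"   # assumed duration format; layer 3 looks at 1 day to 2 weeks
      extraBufferRatio: "0.1"    # assumed notation for 10%: usage 1.0 -> preferred requests 1.1
```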

prediction ​

Properties ​

| Property | Type | Constraints | Description |
| --- | --- | --- | --- |
| enable | boolean | | |
| historyDataPeriod | string | | |
| model | string | | |
| predictionPeriod | string | | |

nodeAffinity ​

NodeAffinity specifies the node affinity requirements for the workload

Properties ​

| Property | Type | Constraints | Description |
| --- | --- | --- | --- |
| preferredDuringSchedulingIgnoredDuringExecution ↓ | array | | The scheduler will prefer to schedule pods to nodes that satisfy the affinity expressions specified by this field, but it may choose a node that violates one or more of the expressions. The node that is most preferred is the one with the greatest sum of weights, i.e. for each node that meets all of the scheduling requirements (resource request, requiredDuringScheduling affinity expressions, etc.), compute a sum by iterating through the elements of this field and adding "weight" to the sum if the node matches the corresponding matchExpressions; the node(s) with the highest sum are the most preferred. |
| requiredDuringSchedulingIgnoredDuringExecution ↓ | object | | If the affinity requirements specified by this field are not met at scheduling time, the pod will not be scheduled onto the node. If the affinity requirements specified by this field cease to be met at some point during pod execution (e.g. due to an update), the system may or may not try to eventually evict the pod from its node. |

preferredDuringSchedulingIgnoredDuringExecution (items) ​

The scheduler will prefer to schedule pods to nodes that satisfy
the affinity expressions specified by this field, but it may choose
a node that violates one or more of the expressions. The node that is
most preferred is the one with the greatest sum of weights, i.e.
for each node that meets all of the scheduling requirements (resource
request, requiredDuringScheduling affinity expressions, etc.),
compute a sum by iterating through the elements of this field and adding
"weight" to the sum if the node matches the corresponding matchExpressions; the
node(s) with the highest sum are the most preferred.

Properties ​

| Property | Type | Constraints | Description |
| --- | --- | --- | --- |
| preference ↓ | object | | A node selector term, associated with the corresponding weight. |
| weight | integer<int32> | | Weight associated with matching the corresponding nodeSelectorTerm, in the range 1-100. |

preference ​

A node selector term, associated with the corresponding weight.

Properties ​

| Property | Type | Constraints | Description |
| --- | --- | --- | --- |
| matchExpressions ↓ | array | | A list of node selector requirements by node's labels. |
| matchFields ↓ | array | | A list of node selector requirements by node's fields. |

matchExpressions (items) ​

A list of node selector requirements by node's labels.

Properties ​

| Property | Type | Constraints | Description |
| --- | --- | --- | --- |
| key | string | | The label key that the selector applies to. |
| operator | string | | Represents a key's relationship to a set of values. Valid operators are In, NotIn, Exists, DoesNotExist, Gt, and Lt. |
| values | array | | An array of string values. If the operator is In or NotIn, the values array must be non-empty. If the operator is Exists or DoesNotExist, the values array must be empty. If the operator is Gt or Lt, the values array must have a single element, which will be interpreted as an integer. This array is replaced during a strategic merge patch. |

matchFields (items) ​

A list of node selector requirements by node's fields.

Properties ​

| Property | Type | Constraints | Description |
| --- | --- | --- | --- |
| key | string | | The label key that the selector applies to. |
| operator | string | | Represents a key's relationship to a set of values. Valid operators are In, NotIn, Exists, DoesNotExist, Gt, and Lt. |
| values | array | | An array of string values. If the operator is In or NotIn, the values array must be non-empty. If the operator is Exists or DoesNotExist, the values array must be empty. If the operator is Gt or Lt, the values array must have a single element, which will be interpreted as an integer. This array is replaced during a strategic merge patch. |
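
The weight-sum rule for preferred affinity is easier to follow with a concrete case. The sketch below uses two weighted terms; both label keys and their values are hypothetical.

```yaml
spec:
  nodeAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 80                                  # must be in the range 1-100
        preference:
          matchExpressions:
            - key: example.com/gpu-generation       # hypothetical node label
              operator: In
              values: ["hopper"]                    # In requires a non-empty list
      - weight: 20
        preference:
          matchExpressions:
            - key: example.com/nvlink               # hypothetical node label
              operator: Exists                      # Exists takes no values
# Scoring: a node matching both terms sums to 80 + 20 = 100,
# a node matching only the second term sums to 20,
# and a node matching neither sums to 0 but can still be chosen.
```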

requiredDuringSchedulingIgnoredDuringExecution ​

If the affinity requirements specified by this field are not met at
scheduling time, the pod will not be scheduled onto the node.
If the affinity requirements specified by this field cease to be met
at some point during pod execution (e.g. due to an update), the system
may or may not try to eventually evict the pod from its node.

Properties ​

| Property | Type | Constraints | Description |
| --- | --- | --- | --- |
| nodeSelectorTerms ↓ | array | | Required. A list of node selector terms. The terms are ORed. |

nodeSelectorTerms (items) ​

Required. A list of node selector terms. The terms are ORed.

Properties ​

| Property | Type | Constraints | Description |
| --- | --- | --- | --- |
| matchExpressions ↓ | array | | A list of node selector requirements by node's labels. |
| matchFields ↓ | array | | A list of node selector requirements by node's fields. |

matchExpressions (items) ​

A list of node selector requirements by node's labels.

Properties ​

| Property | Type | Constraints | Description |
| --- | --- | --- | --- |
| key | string | | The label key that the selector applies to. |
| operator | string | | Represents a key's relationship to a set of values. Valid operators are In, NotIn, Exists, DoesNotExist, Gt, and Lt. |
| values | array | | An array of string values. If the operator is In or NotIn, the values array must be non-empty. If the operator is Exists or DoesNotExist, the values array must be empty. If the operator is Gt or Lt, the values array must have a single element, which will be interpreted as an integer. This array is replaced during a strategic merge patch. |

matchFields (items) ​

A list of node selector requirements by node's fields.

Properties ​

| Property | Type | Constraints | Description |
| --- | --- | --- | --- |
| key | string | | The label key that the selector applies to. |
| operator | string | | Represents a key's relationship to a set of values. Valid operators are In, NotIn, Exists, DoesNotExist, Gt, and Lt. |
| values | array | | An array of string values. If the operator is In or NotIn, the values array must be non-empty. If the operator is Exists or DoesNotExist, the values array must be empty. If the operator is Gt or Lt, the values array must have a single element, which will be interpreted as an integer. This array is replaced during a strategic merge patch. |
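
A required affinity block combining ORed terms with the operator rules above might look like this sketch. The zone values, the gpu-memory label, and the node name are hypothetical. It assumes the usual Kubernetes node selector semantics, in which requirements inside a single term are ANDed.

```yaml
spec:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
        # Term 1: both expressions must hold for the term to match.
        - matchExpressions:
            - key: topology.kubernetes.io/zone
              operator: In
              values: ["zone-a", "zone-b"]       # hypothetical zone names
            - key: example.com/gpu-memory-gb     # hypothetical node label
              operator: Gt
              values: ["40"]                     # Gt/Lt take exactly one value, read as an integer
        # Term 2: ORed with term 1, selects a node by field instead of label.
        - matchFields:
            - key: metadata.name
              operator: In
              values: ["gpu-node-1"]             # hypothetical node name
```

A node is eligible if it satisfies either term; a node that satisfies neither is never selected.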

resources ​

Properties ​

| Property | Type | Constraints | Description |
| --- | --- | --- | --- |
| limits ↓ | object | | |
| requests ↓ | object | | |

limits ​

Properties ​

| Property | Type | Constraints | Description |
| --- | --- | --- | --- |
| tflops | any | pattern: Regex | |
| vram | any | pattern: Regex | |

requests ​

Properties ​

| Property | Type | Constraints | Description |
| --- | --- | --- | --- |
| tflops | any | pattern: Regex | |
| vram | any | pattern: Regex | |

Status ​

WorkloadProfileStatus defines the observed state of WorkloadProfile.