
WorkloadProfile

WorkloadProfile is the Schema for the workloadprofiles API.

Kubernetes Resource Information

| Field | Value |
| --- | --- |
| API Version | tensor-fusion.ai/v1 |
| Kind | WorkloadProfile |
| Scope | Namespaced |

Spec

WorkloadProfileSpec defines the desired state of WorkloadProfile.

| Property | Type | Constraints | Description |
| --- | --- | --- | --- |
| autoScalingConfig | object | | Auto-scaling settings configured here override the Pool's schedulingConfig. Annotations cannot fully express this field. To enable auto-scaling through annotations, set tensor-fusion.ai/auto-limits, tensor-fusion.ai/auto-requests, or tensor-fusion.ai/auto-replicas to 'true'. |
| gpuCount | integer<int32> | | Number of GPUs used by the workload. Defaults to 1. |
| gpuModel | string | | Required GPU model (e.g., "A100", "H100"). |
| isLocalGPU | boolean | | Schedule the workload on the same GPU server that runs the vGPU worker for best performance. Defaults to false. |
| nodeAffinity | object | | Node affinity requirements for the workload. |
| poolName | string | | |
| qos | string | low, medium, high, critical | Quality of service level for the client. |
| replicas | integer<int32> | | If not set, the replica count is dynamic, based on pending Pods. If isLocalGPU is true, replicas must be dynamic and this field is ignored. |
| resources | object | | |
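
As a rough illustration of how the spec fields fit together, the sketch below builds a WorkloadProfile manifest as a plain Python dict and checks two rules stated in the table: qos must be one of the four listed values, and replicas is ignored when isLocalGPU is true. The helper name `check_spec`, the metadata values, and the pool name are hypothetical; this is not the operator's actual validation logic.

```python
# Allowed qos values, taken from the Constraints column above.
QOS_LEVELS = {"low", "medium", "high", "critical"}


def check_spec(spec):
    """Return a list of findings for a WorkloadProfile spec.

    Hypothetical helper, for illustration only.
    """
    findings = []
    qos = spec.get("qos")
    if qos is not None and qos not in QOS_LEVELS:
        findings.append(f"qos {qos!r} is not one of {sorted(QOS_LEVELS)}")
    # The table states that replicas is ignored when isLocalGPU is true.
    if spec.get("isLocalGPU", False) and "replicas" in spec:
        findings.append("replicas is ignored because isLocalGPU is true")
    return findings


profile = {
    "apiVersion": "tensor-fusion.ai/v1",
    "kind": "WorkloadProfile",
    "metadata": {"name": "example-profile", "namespace": "default"},  # assumed names
    "spec": {
        "poolName": "example-pool",  # assumed pool name
        "gpuCount": 1,
        "gpuModel": "A100",
        "isLocalGPU": True,
        "qos": "medium",
        "replicas": 2,
    },
}

findings = check_spec(profile["spec"])
print(findings)
```

Serializing `profile` to YAML or JSON would give a manifest in the shape the table describes, assuming the pool named in poolName exists in the cluster.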

autoScalingConfig

Auto-scaling settings configured here override the Pool's schedulingConfig.
Annotations cannot fully express this field. To enable auto-scaling through annotations,
set tensor-fusion.ai/auto-limits, tensor-fusion.ai/auto-requests, or tensor-fusion.ai/auto-replicas to 'true'.

Properties

| Property | Type | Constraints | Description |
| --- | --- | --- | --- |
| autoSetLimits | object | | Layer 1: vertical auto-scaling that bursts quickly onto existing GPU cards. VPA-like; aggregates metrics data over less than 1 minute. |
| autoSetReplicas | object | | Layer 2: horizontal auto-scaling that scales out to more GPU cards when the maximum limits threshold is hit. HPA-like; aggregates metrics data over 1 minute to 1 hour. (When tf-worker scales up, the replica count of the client Pod's owner, such as a Deployment, should also increase; whether this works with Knative is still to be checked.) |
| autoSetRequests | object | | Layer 3: adjusts baseline requests to match actual usage over a longer period, such as 1 day to 2 weeks. Applies only to N:M remote vGPU mode; not implemented yet. |

autoSetLimits

Layer 1: vertical auto-scaling that bursts quickly onto existing GPU cards.
VPA-like; aggregates metrics data over less than 1 minute.

Properties

| Property | Type | Constraints | Description |
| --- | --- | --- | --- |
| enable | boolean | | |
| evaluationPeriod | string | | |
| extraTFlopsBufferRatio | string | | |
| ignoredDeltaRange | string | | |
| maxRatioToRequests | string | | Multiplier applied to requests that caps the limit so it is not set too high, e.g. 5.0. |
| prediction | object | | |
| scaleUpStep | string | | |
| targetResource | string | | Resource whose limits are scaled: "tflops", "vram", or "all" (the default). |
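
The maxRatioToRequests cap can be read as a simple multiplication: the limit may not exceed requests times the ratio. The sketch below assumes that reading; `cap_limit` is a hypothetical helper, and the string-typed ratio is parsed with `Decimal` to avoid floating-point drift.

```python
from decimal import Decimal


def cap_limit(requests, proposed_limit, max_ratio_to_requests):
    """Clamp a proposed limit to requests * maxRatioToRequests (assumed semantics)."""
    ceiling = Decimal(requests) * Decimal(max_ratio_to_requests)
    return min(Decimal(proposed_limit), ceiling)


# With requests of 10 and a ratio of "5.0", the ceiling is 50.
capped = cap_limit("10", "80", "5.0")      # proposal above the ceiling is clamped
unchanged = cap_limit("10", "30", "5.0")   # proposal below the ceiling passes through
print(capped, unchanged)
```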

prediction

Properties
| Property | Type | Constraints | Description |
| --- | --- | --- | --- |
| enable | boolean | | |
| historyDataPeriod | string | | |
| model | string | | |
| predictionPeriod | string | | |

autoSetReplicas

Layer 2: horizontal auto-scaling that scales out to more GPU cards when the maximum limits threshold is hit.
HPA-like; aggregates metrics data over 1 minute to 1 hour. (When tf-worker scales up, the replica count of the client Pod's owner, such as a Deployment, should also increase; whether this works with Knative is still to be checked.)

Properties

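| Property | Type | Constraints | Description |
| --- | --- | --- | --- |
| enable | boolean | | |
| evaluationPeriod | string | | |
| scaleDownCoolDownTime | string | | |
| scaleDownStep | string | | |
| scaleUpCoolDownTime | string | | |
| scaleUpStep | string | | |
| targetTFlopsOfLimits | string | | |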

autoSetRequests

Layer 3: adjusts requests to match actual usage in the long run. Applies only to N:M remote vGPU mode; not implemented yet.
Baseline requests are adjusted to match actual usage over a longer period, such as 1 day to 2 weeks.

Properties

| Property | Type | Constraints | Description |
| --- | --- | --- | --- |
| aggregationPeriod | string | | |
| enable | boolean | | |
| evaluationPeriod | string | | |
| extraBufferRatio | string | | Buffer ratio added on top of requests. For example, if actual usage is 1.0, a 10% buffer gives 1.1 as the final preferred requests. |
| percentileForAutoRequests | string | | |
| prediction | object | | |
| targetResource | string | | Resource whose requests are scaled: "tflops", "vram", or "all" (the default). |
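
The extraBufferRatio example in the table (usage 1.0, 10% buffer, result 1.1) works out as usage times one plus the ratio. The sketch below reproduces that arithmetic; `preferred_requests` is a hypothetical helper, and it assumes the ratio is written as a fraction such as "0.1".

```python
from decimal import Decimal


def preferred_requests(actual_usage, extra_buffer_ratio):
    """Apply the buffer as usage * (1 + ratio); the fraction form is an assumption."""
    return Decimal(actual_usage) * (Decimal(1) + Decimal(extra_buffer_ratio))


buffered = preferred_requests("1.0", "0.1")
print(buffered)
```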

prediction

Properties
| Property | Type | Constraints | Description |
| --- | --- | --- | --- |
| enable | boolean | | |
| historyDataPeriod | string | | |
| model | string | | |
| predictionPeriod | string | | |

nodeAffinity

NodeAffinity specifies the node affinity requirements for the workload

Properties

| Property | Type | Constraints | Description |
| --- | --- | --- | --- |
| preferredDuringSchedulingIgnoredDuringExecution | array | | The scheduler will prefer to schedule pods to nodes that satisfy the affinity expressions specified by this field, but it may choose a node that violates one or more of the expressions. The node that is most preferred is the one with the greatest sum of weights, i.e. for each node that meets all of the scheduling requirements (resource request, requiredDuringScheduling affinity expressions, etc.), compute a sum by iterating through the elements of this field and adding "weight" to the sum if the node matches the corresponding matchExpressions; the node(s) with the highest sum are the most preferred. |
| requiredDuringSchedulingIgnoredDuringExecution | object | | If the affinity requirements specified by this field are not met at scheduling time, the pod will not be scheduled onto the node. If the affinity requirements specified by this field cease to be met at some point during pod execution (e.g. due to an update), the system may or may not try to eventually evict the pod from its node. |

preferredDuringSchedulingIgnoredDuringExecution (items)

The scheduler will prefer to schedule pods to nodes that satisfy
the affinity expressions specified by this field, but it may choose
a node that violates one or more of the expressions. The node that is
most preferred is the one with the greatest sum of weights, i.e.
for each node that meets all of the scheduling requirements (resource
request, requiredDuringScheduling affinity expressions, etc.),
compute a sum by iterating through the elements of this field and adding
"weight" to the sum if the node matches the corresponding matchExpressions; the
node(s) with the highest sum are the most preferred.
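
The weight summation described above can be sketched as follows. This is an illustration of the scoring rule only, not the scheduler's implementation: the function names and node labels are made up, and only the In and Exists operators are handled.

```python
def matches(labels, expressions):
    """True if every expression matches; only In and Exists are covered here."""
    for expr in expressions:
        key, op = expr["key"], expr["operator"]
        if op == "In":
            if labels.get(key) not in expr["values"]:
                return False
        elif op == "Exists":
            if key not in labels:
                return False
        else:
            raise ValueError(f"operator {op} is not covered by this sketch")
    return True


def preference_score(labels, preferred_terms):
    """Add each term's weight when the node matches that term's matchExpressions."""
    total = 0
    for term in preferred_terms:
        if matches(labels, term["preference"]["matchExpressions"]):
            total += term["weight"]
    return total


preferred = [
    {"weight": 80, "preference": {"matchExpressions": [
        {"key": "gpu", "operator": "In", "values": ["a100"]}]}},
    {"weight": 20, "preference": {"matchExpressions": [
        {"key": "zone", "operator": "Exists"}]}},
]

# Hypothetical nodes that already satisfy the hard scheduling requirements.
nodes = {
    "node-a": {"gpu": "a100", "zone": "z1"},
    "node-b": {"gpu": "a100"},
    "node-c": {},
}

scores = {name: preference_score(labels, preferred) for name, labels in nodes.items()}
best = max(scores, key=scores.get)
print(scores, best)
```

A node that matches no preference still gets a score of 0 and remains schedulable, which is what distinguishes this field from requiredDuringSchedulingIgnoredDuringExecution.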

Properties

| Property | Type | Constraints | Description |
| --- | --- | --- | --- |
| preference | object | | A node selector term, associated with the corresponding weight. |
| weight | integer<int32> | | Weight associated with matching the corresponding nodeSelectorTerm, in the range 1-100. |

preference

A node selector term, associated with the corresponding weight.

Properties
| Property | Type | Constraints | Description |
| --- | --- | --- | --- |
| matchExpressions | array | | A list of node selector requirements by node's labels. |
| matchFields | array | | A list of node selector requirements by node's fields. |

matchExpressions (items)

A list of node selector requirements by node's labels.

Properties
| Property | Type | Constraints | Description |
| --- | --- | --- | --- |
| key | string | | The label key that the selector applies to. |
| operator | string | | Represents a key's relationship to a set of values. Valid operators are In, NotIn, Exists, DoesNotExist, Gt, and Lt. |
| values | array | | An array of string values. If the operator is In or NotIn, the values array must be non-empty. If the operator is Exists or DoesNotExist, the values array must be empty. If the operator is Gt or Lt, the values array must have a single element, which will be interpreted as an integer. This array is replaced during a strategic merge patch. |
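
To make the operator rules concrete, here is a minimal sketch of how a single requirement could be validated and evaluated against a node's labels. The function names are hypothetical and this is not the Kubernetes implementation. Treating a missing key as a match for NotIn follows upstream label-selector behaviour.

```python
def validate_values(operator, values):
    """Check the shape rules for the values array described above."""
    if operator in ("In", "NotIn"):
        return len(values) > 0
    if operator in ("Exists", "DoesNotExist"):
        return len(values) == 0
    if operator in ("Gt", "Lt"):
        return len(values) == 1
    return False  # unknown operator


def requirement_matches(labels, key, operator, values=()):
    """Evaluate one requirement against a dict of node labels."""
    if not validate_values(operator, values):
        raise ValueError(f"invalid values {list(values)!r} for operator {operator!r}")
    present = key in labels
    if operator == "In":
        return present and labels[key] in values
    if operator == "NotIn":
        return not present or labels[key] not in values
    if operator == "Exists":
        return present
    if operator == "DoesNotExist":
        return not present
    # Gt / Lt: both sides are interpreted as integers.
    if not present:
        return False
    try:
        actual, bound = int(labels[key]), int(values[0])
    except ValueError:
        return False
    return actual > bound if operator == "Gt" else actual < bound


labels = {"gpu": "a100", "gpu-count": "8"}  # hypothetical node labels
print(requirement_matches(labels, "gpu", "In", ["a100", "h100"]))
print(requirement_matches(labels, "gpu-count", "Gt", ["4"]))
```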

matchFields (items)

A list of node selector requirements by node's fields.

Properties
| Property | Type | Constraints | Description |
| --- | --- | --- | --- |
| key | string | | The label key that the selector applies to. |
| operator | string | | Represents a key's relationship to a set of values. Valid operators are In, NotIn, Exists, DoesNotExist, Gt, and Lt. |
| values | array | | An array of string values. If the operator is In or NotIn, the values array must be non-empty. If the operator is Exists or DoesNotExist, the values array must be empty. If the operator is Gt or Lt, the values array must have a single element, which will be interpreted as an integer. This array is replaced during a strategic merge patch. |

requiredDuringSchedulingIgnoredDuringExecution

If the affinity requirements specified by this field are not met at
scheduling time, the pod will not be scheduled onto the node.
If the affinity requirements specified by this field cease to be met
at some point during pod execution (e.g. due to an update), the system
may or may not try to eventually evict the pod from its node.

Properties

| Property | Type | Constraints | Description |
| --- | --- | --- | --- |
| nodeSelectorTerms | array | | Required. A list of node selector terms. The terms are ORed. |

nodeSelectorTerms (items)

Required. A list of node selector terms. The terms are ORed.
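
The OR across terms can be sketched as below. Within a single term, upstream Kubernetes requires all matchExpressions to hold (they are ANDed) and treats an empty term as matching nothing; the sketch follows that behaviour. The function names and labels are hypothetical, and only the In operator is handled to keep the example short.

```python
def term_matches(labels, term):
    """All matchExpressions in a term must hold; an empty term matches nothing."""
    exprs = term.get("matchExpressions", [])
    return bool(exprs) and all(labels.get(e["key"]) in e["values"] for e in exprs)


def node_is_eligible(labels, node_selector_terms):
    """The terms are ORed: one matching term is enough."""
    return any(term_matches(labels, term) for term in node_selector_terms)


terms = [
    {"matchExpressions": [
        {"key": "gpu", "operator": "In", "values": ["a100"]},
        {"key": "zone", "operator": "In", "values": ["z1"]},
    ]},
    {"matchExpressions": [
        {"key": "gpu", "operator": "In", "values": ["h100"]},
    ]},
]

print(node_is_eligible({"gpu": "h100"}, terms))                 # second term matches
print(node_is_eligible({"gpu": "a100", "zone": "z2"}, terms))   # no term fully matches
```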

Properties
| Property | Type | Constraints | Description |
| --- | --- | --- | --- |
| matchExpressions | array | | A list of node selector requirements by node's labels. |
| matchFields | array | | A list of node selector requirements by node's fields. |

matchExpressions (items)

A list of node selector requirements by node's labels.

Properties
| Property | Type | Constraints | Description |
| --- | --- | --- | --- |
| key | string | | The label key that the selector applies to. |
| operator | string | | Represents a key's relationship to a set of values. Valid operators are In, NotIn, Exists, DoesNotExist, Gt, and Lt. |
| values | array | | An array of string values. If the operator is In or NotIn, the values array must be non-empty. If the operator is Exists or DoesNotExist, the values array must be empty. If the operator is Gt or Lt, the values array must have a single element, which will be interpreted as an integer. This array is replaced during a strategic merge patch. |

matchFields (items)

A list of node selector requirements by node's fields.

Properties
| Property | Type | Constraints | Description |
| --- | --- | --- | --- |
| key | string | | The label key that the selector applies to. |
| operator | string | | Represents a key's relationship to a set of values. Valid operators are In, NotIn, Exists, DoesNotExist, Gt, and Lt. |
| values | array | | An array of string values. If the operator is In or NotIn, the values array must be non-empty. If the operator is Exists or DoesNotExist, the values array must be empty. If the operator is Gt or Lt, the values array must have a single element, which will be interpreted as an integer. This array is replaced during a strategic merge patch. |

resources

Properties

| Property | Type | Constraints | Description |
| --- | --- | --- | --- |
| limits | object | | |
| requests | object | | |

limits

Properties

| Property | Type | Constraints | Description |
| --- | --- | --- | --- |
| tflops | any | pattern: Regex | |
| vram | any | pattern: Regex | |

requests

Properties

| Property | Type | Constraints | Description |
| --- | --- | --- | --- |
| tflops | any | pattern: Regex | |
| vram | any | pattern: Regex | |

Status

WorkloadProfileStatus defines the observed state of WorkloadProfile.