WorkloadProfile
WorkloadProfile is the Schema for the workloadprofiles API.
Kubernetes Resource Information
Field | Value |
---|---|
API Version | tensor-fusion.ai/v1 |
Kind | WorkloadProfile |
Scope | Namespaced |
Spec
WorkloadProfileSpec defines the desired state of WorkloadProfile.
Property | Type | Constraints | Description |
---|---|---|---|
autoScalingConfig | object | | AutoScalingConfig configured here overrides the Pool's schedulingConfig. This field cannot be fully expressed through annotations; to enable auto-scaling via annotations, set tensor-fusion.ai/auto-limits, tensor-fusion.ai/auto-requests, or tensor-fusion.ai/auto-replicas to 'true'. |
gpuCount | integer<int32> | | Number of GPUs used by the workload. Defaults to 1. |
gpuModel | string | | Required GPU model (e.g., "A100", "H100"). |
isLocalGPU | boolean | | Schedule the workload onto the same GPU server that runs the vGPU worker, for best performance. Defaults to false. |
nodeAffinity | object | | Node affinity requirements for the workload. |
poolName | string | | |
qos | string | Enum: low, medium, high, critical | Quality of service level for the client. |
replicas | integer<int32> | | If not set, the replica count is determined dynamically from pending Pods. If isLocalGPU is true, replicas must be dynamic and this field is ignored. |
resources | object | | |
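To make the defaulting rules in this table concrete, here is a minimal Go sketch. The WorkloadProfileSpec struct is a hypothetical, trimmed mirror of the documented fields (JSON tags follow the property names), not the project's actual type, and fixedReplicas and effectiveGPUCount are illustrative helpers. They encode only what the table states: gpuCount defaults to 1, an unset replicas means the count is dynamic, and replicas is ignored when isLocalGPU is true. The pool name is a placeholder.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// WorkloadProfileSpec is a hypothetical, trimmed-down mirror of the documented
// spec fields. Pointer fields distinguish "unset" from an explicit zero.
type WorkloadProfileSpec struct {
	PoolName   string `json:"poolName,omitempty"`
	GPUCount   *int32 `json:"gpuCount,omitempty"`
	GPUModel   string `json:"gpuModel,omitempty"`
	IsLocalGPU bool   `json:"isLocalGPU,omitempty"`
	Qos        string `json:"qos,omitempty"`
	Replicas   *int32 `json:"replicas,omitempty"`
}

// effectiveGPUCount applies the documented default of 1 GPU.
func effectiveGPUCount(s WorkloadProfileSpec) int32 {
	if s.GPUCount == nil {
		return 1
	}
	return *s.GPUCount
}

// fixedReplicas returns the replica count and true when it is fixed, or
// (0, false) when it is dynamic: replicas is unset, or isLocalGPU is true
// (in which case the field is ignored).
func fixedReplicas(s WorkloadProfileSpec) (int32, bool) {
	if s.IsLocalGPU || s.Replicas == nil {
		return 0, false
	}
	return *s.Replicas, true
}

func main() {
	two := int32(2)
	// "default-pool" is a placeholder pool name.
	spec := WorkloadProfileSpec{PoolName: "default-pool", GPUModel: "A100", Qos: "high", Replicas: &two}
	n, fixed := fixedReplicas(spec)
	fmt.Println(effectiveGPUCount(spec), n, fixed) // 1 2 true

	spec.IsLocalGPU = true
	n, fixed = fixedReplicas(spec)
	fmt.Println(n, fixed) // 0 false

	// Shows how the Go fields map onto manifest keys.
	out, err := json.Marshal(spec)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```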
autoScalingConfig
AutoScalingConfig configured here overrides the Pool's schedulingConfig.
This field cannot be fully expressed through annotations. To enable auto-scaling via annotations,
set tensor-fusion.ai/auto-limits, tensor-fusion.ai/auto-requests, or tensor-fusion.ai/auto-replicas to 'true'.
Properties
Property | Type | Constraints | Description |
---|---|---|---|
autoSetLimits | object | | Layer 1: vertical auto-scaling (VPA-like). Quickly bursts limits on the existing GPU cards; aggregates metrics over windows shorter than 1m. |
autoSetReplicas | object | | Layer 2: horizontal auto-scaling (HPA-like). Scales out to more GPU cards when the maximum limits threshold is hit; aggregates metrics over 1m to 1h. When the tf-worker scales up, this should also increase the replicas of the client Pod's owner (Deployment, etc.); Knative compatibility is still to be verified. |
autoSetRequests | object | | Layer 3: adjusts baseline requests to match actual usage over a longer period, such as 1 day to 2 weeks. Only for N:M remote vGPU mode; not yet implemented. |
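Because annotations can only switch the auto-scaling layers on, a controller might read them roughly as sketched below. This is a hypothetical Go helper, not TensorFusion's code. It assumes the shorthand tensor-fusion.ai/auto-limits|requests|replicas expands to three separate annotation keys and that only the exact string "true" enables a layer. Tuning fields such as evaluationPeriod have no annotation equivalent in this sketch, which is one plausible reading of why annotations cannot fully express autoScalingConfig.

```go
package main

import "fmt"

// Annotation keys, assumed to be the expansion of the documented shorthand
// "tensor-fusion.ai/auto-limits|requests|replicas".
const (
	annoAutoLimits   = "tensor-fusion.ai/auto-limits"
	annoAutoRequests = "tensor-fusion.ai/auto-requests"
	annoAutoReplicas = "tensor-fusion.ai/auto-replicas"
)

// AutoScalingToggles is a hypothetical summary of which layers are enabled.
type AutoScalingToggles struct {
	Limits, Requests, Replicas bool
}

// togglesFromAnnotations enables a layer only when its annotation is exactly "true".
// Reading a missing key (or a nil map) yields "", which leaves the layer off.
func togglesFromAnnotations(annotations map[string]string) AutoScalingToggles {
	on := func(key string) bool { return annotations[key] == "true" }
	return AutoScalingToggles{
		Limits:   on(annoAutoLimits),
		Requests: on(annoAutoRequests),
		Replicas: on(annoAutoReplicas),
	}
}

func main() {
	annotations := map[string]string{
		annoAutoLimits:   "true",
		annoAutoReplicas: "false",
	}
	fmt.Printf("%+v\n", togglesFromAnnotations(annotations))
	// {Limits:true Requests:false Replicas:false}
}
```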
autoSetLimits
Layer 1: vertical auto-scaling, bursting quickly on the existing GPU cards.
VPA-like; aggregates metrics data over windows shorter than 1m.
Properties
Property | Type | Constraints | Description |
---|---|---|---|
enable | boolean | ||
evaluationPeriod | string | ||
extraTFlopsBufferRatio | string | ||
ignoredDeltaRange | string | ||
maxRatioToRequests | string | | Upper bound on limits, expressed as a multiplier of requests (e.g. 5.0), to keep limits from being set too high. |
prediction | object | | |
scaleUpStep | string | | |
targetResource | string | | Resource whose limits are scaled: "tflops", "vram", or "all" (the default). |
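The following Go sketch illustrates how maxRatioToRequests and targetResource could bound a layer 1 burst. capLimit and resourcesToScale are hypothetical helpers. The sketch assumes the ratio string parses as a plain decimal (as in the documented example 5.0) and treats an empty targetResource as the documented default "all". How the proposed limit itself is computed (evaluationPeriod, scaleUpStep, prediction) is not described in this reference and is left out.

```go
package main

import (
	"fmt"
	"math"
	"strconv"
)

// capLimit bounds a proposed limit to requests * maxRatioToRequests.
// The CRD types the ratio as a string, so it is parsed here.
func capLimit(requests, proposed float64, maxRatioToRequests string) (float64, error) {
	ratio, err := strconv.ParseFloat(maxRatioToRequests, 64)
	if err != nil {
		return 0, fmt.Errorf("invalid maxRatioToRequests %q: %w", maxRatioToRequests, err)
	}
	return math.Min(proposed, requests*ratio), nil
}

// resourcesToScale expands targetResource; an empty value means the default "all".
func resourcesToScale(targetResource string) []string {
	switch targetResource {
	case "tflops", "vram":
		return []string{targetResource}
	default: // "" or "all"; this sketch also treats unknown values as "all"
		return []string{"tflops", "vram"}
	}
}

func main() {
	// Hypothetical numbers: 10 TFLOPS requested, a burst proposes 80.
	limit, err := capLimit(10, 80, "5.0")
	if err != nil {
		panic(err)
	}
	fmt.Println(limit, resourcesToScale("")) // 50 [tflops vram]
}
```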
autoSetReplicas
Layer 2: horizontal auto-scaling, scaling out to more GPU cards when the maximum limits threshold is hit.
HPA-like; aggregates metrics data over 1m to 1h. When the tf-worker scales up, this should also increase the replicas of the client Pod's owner (Deployment, etc.); Knative compatibility is still to be verified.
Properties
Property | Type | Constraints | Description |
---|---|---|---|
enable | boolean | ||
evaluationPeriod | string | ||
scaleDownCoolDownTime | string | ||
scaleDownStep | string | ||
scaleUpCoolDownTime | string | ||
scaleUpStep | string | ||
targetTFlopsOfLimits | string | | |
autoSetRequests
Layer 3: adjusts baseline requests to match actual usage over the long run, such as 1 day to 2 weeks.
Only for N:M remote vGPU mode; not yet implemented.
Properties
Property | Type | Constraints | Description |
---|---|---|---|
aggregationPeriod | string | ||
enable | boolean | ||
evaluationPeriod | string | ||
extraBufferRatio | string | | Buffer ratio added on top of actual usage to get the preferred requests; for example, with actual usage of 1.0, a 10% buffer yields 1.1. |
percentileForAutoRequests | string | | |
prediction | object | | |
targetResource | string | | Resource whose requests are scaled: "tflops", "vram", or "all" (the default). |
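A plausible reading of percentileForAutoRequests and extraBufferRatio is: take a high percentile of observed usage, then add the buffer. The Go sketch below assumes the percentile is given on a 0-100 scale, uses the nearest-rank method, and takes already-parsed float values even though the CRD types both fields as strings; percentile and recommendRequest are hypothetical helpers. Since this layer is documented as not yet implemented, treat the sketch as illustrative only. The buffer arithmetic reproduces the documented example (usage 1.0 with a 10% buffer gives 1.1).

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// percentile returns the p-th percentile (0-100) of samples using the
// nearest-rank method. It returns 0 for an empty slice.
func percentile(samples []float64, p float64) float64 {
	if len(samples) == 0 {
		return 0
	}
	sorted := append([]float64(nil), samples...) // copy so the caller's slice is untouched
	sort.Float64s(sorted)
	rank := int(math.Ceil(p / 100 * float64(len(sorted))))
	if rank < 1 {
		rank = 1
	}
	if rank > len(sorted) {
		rank = len(sorted)
	}
	return sorted[rank-1]
}

// recommendRequest applies extraBufferRatio on top of the usage percentile.
func recommendRequest(usage []float64, pct, extraBufferRatio float64) float64 {
	return percentile(usage, pct) * (1 + extraBufferRatio)
}

func main() {
	usage := []float64{0.6, 0.8, 1.0, 0.7, 0.9} // hypothetical TFLOPS samples
	fmt.Printf("%.2f\n", recommendRequest(usage, 95, 0.1)) // 1.10
}
```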
nodeAffinity
NodeAffinity specifies the node affinity requirements for the workload
Properties
Property | Type | Constraints | Description |
---|---|---|---|
preferredDuringSchedulingIgnoredDuringExecution | array | | The scheduler will prefer to schedule pods to nodes that satisfy the affinity expressions specified by this field, but it may choose a node that violates one or more of the expressions. The node that is most preferred is the one with the greatest sum of weights, i.e. for each node that meets all of the scheduling requirements (resource request, requiredDuringScheduling affinity expressions, etc.), compute a sum by iterating through the elements of this field and adding "weight" to the sum if the node matches the corresponding matchExpressions; the node(s) with the highest sum are the most preferred. |
requiredDuringSchedulingIgnoredDuringExecution | object | | If the affinity requirements specified by this field are not met at scheduling time, the pod will not be scheduled onto the node. If the affinity requirements specified by this field cease to be met at some point during pod execution (e.g. due to an update), the system may or may not try to eventually evict the pod from its node. |
preferredDuringSchedulingIgnoredDuringExecution (items)
The scheduler will prefer to schedule pods to nodes that satisfy
the affinity expressions specified by this field, but it may choose
a node that violates one or more of the expressions. The node that is
most preferred is the one with the greatest sum of weights, i.e.
for each node that meets all of the scheduling requirements (resource
request, requiredDuringScheduling affinity expressions, etc.),
compute a sum by iterating through the elements of this field and adding
"weight" to the sum if the node matches the corresponding matchExpressions; the
node(s) with the highest sum are the most preferred.
Properties
Property | Type | Constraints | Description |
---|---|---|---|
preference | object | | A node selector term, associated with the corresponding weight. |
weight | integer<int32> | | Weight associated with matching the corresponding nodeSelectorTerm, in the range 1-100. |
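The weighting rule described for preferredDuringSchedulingIgnoredDuringExecution can be sketched in Go as follows. Node, PreferredTerm, and the labelEquals predicate are hypothetical stand-ins (a real preference evaluates matchExpressions and matchFields), and the label keys are placeholders. The sketch scores only this field in isolation, on nodes assumed to have already passed the required checks; a real scheduler may normalize these scores and combine them with other factors.

```go
package main

import "fmt"

// Node is a hypothetical, simplified node with a name and labels.
type Node struct {
	Name   string
	Labels map[string]string
}

// PreferredTerm pairs a weight (1-100) with a predicate standing in for the
// preference's matchExpressions/matchFields.
type PreferredTerm struct {
	Weight  int32
	Matches func(labels map[string]string) bool
}

// labelEquals is an illustrative predicate: the label must exist and equal value.
func labelEquals(key, value string) func(map[string]string) bool {
	return func(labels map[string]string) bool {
		v, ok := labels[key]
		return ok && v == value
	}
}

// preferenceScore adds a term's weight for every term the node matches.
func preferenceScore(labels map[string]string, terms []PreferredTerm) int32 {
	var sum int32
	for _, term := range terms {
		if term.Matches(labels) {
			sum += term.Weight
		}
	}
	return sum
}

// mostPreferred returns the node(s) with the highest summed weight, in input order.
func mostPreferred(nodes []Node, terms []PreferredTerm) []string {
	var best []string
	bestScore := int32(-1)
	for _, n := range nodes {
		s := preferenceScore(n.Labels, terms)
		switch {
		case s > bestScore:
			bestScore, best = s, []string{n.Name}
		case s == bestScore:
			best = append(best, n.Name)
		}
	}
	return best
}

func main() {
	terms := []PreferredTerm{
		{Weight: 80, Matches: labelEquals("gpu-model", "H100")},
		{Weight: 20, Matches: labelEquals("zone", "a")},
	}
	nodes := []Node{
		{Name: "node-1", Labels: map[string]string{"gpu-model": "A100", "zone": "a"}}, // 20
		{Name: "node-2", Labels: map[string]string{"gpu-model": "H100", "zone": "b"}}, // 80
		{Name: "node-3", Labels: map[string]string{"gpu-model": "H100", "zone": "a"}}, // 100
	}
	fmt.Println(mostPreferred(nodes, terms)) // [node-3]
}
```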
preference
A node selector term, associated with the corresponding weight.
Properties
Property | Type | Constraints | Description |
---|---|---|---|
matchExpressions | array | | A list of node selector requirements by node's labels. |
matchFields | array | | A list of node selector requirements by node's fields. |
matchExpressions (items)
A list of node selector requirements by node's labels.
Properties
Property | Type | Constraints | Description |
---|---|---|---|
key | string | | The label key that the selector applies to. |
operator | string | | Represents a key's relationship to a set of values. Valid operators are In, NotIn, Exists, DoesNotExist, Gt, and Lt. |
values | array | | An array of string values. If the operator is In or NotIn, the values array must be non-empty. If the operator is Exists or DoesNotExist, the values array must be empty. If the operator is Gt or Lt, the values array must have a single element, which will be interpreted as an integer. This array is replaced during a strategic merge patch. |
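The operator rules for a matchExpressions item can be expressed as a small Go evaluator. Requirement and matches are hypothetical names that mirror the key, operator, and values fields; the label keys are placeholders. Two behaviors follow Kubernetes label-selector semantics rather than anything stated in this table: a node without the key satisfies NotIn, and a Gt or Lt requirement is false when the key is absent or either side is not an integer. Other value-count rules (for example, a non-empty list for In) are assumed to be enforced by API validation and are not re-checked here.

```go
package main

import (
	"fmt"
	"strconv"
)

// Requirement is a hypothetical mirror of one matchExpressions item.
type Requirement struct {
	Key      string
	Operator string // In, NotIn, Exists, DoesNotExist, Gt, Lt
	Values   []string
}

func contains(values []string, v string) bool {
	for _, x := range values {
		if x == v {
			return true
		}
	}
	return false
}

// matches reports whether a node with the given labels satisfies r.
// Unknown operators and malformed Gt/Lt requirements evaluate to false.
func (r Requirement) matches(labels map[string]string) bool {
	val, has := labels[r.Key]
	switch r.Operator {
	case "In":
		return has && contains(r.Values, val)
	case "NotIn":
		// A missing key satisfies NotIn, as in Kubernetes label selectors.
		return !has || !contains(r.Values, val)
	case "Exists":
		return has
	case "DoesNotExist":
		return !has
	case "Gt", "Lt":
		if !has || len(r.Values) != 1 {
			return false
		}
		// Both the single requirement value and the label value are parsed as integers.
		want, err := strconv.ParseInt(r.Values[0], 10, 64)
		if err != nil {
			return false
		}
		got, err := strconv.ParseInt(val, 10, 64)
		if err != nil {
			return false
		}
		if r.Operator == "Gt" {
			return got > want
		}
		return got < want
	default:
		return false
	}
}

func main() {
	labels := map[string]string{"gpu-model": "H100", "gpu-count": "8"}
	reqs := []Requirement{
		{Key: "gpu-model", Operator: "In", Values: []string{"A100", "H100"}},
		{Key: "gpu-model", Operator: "NotIn", Values: []string{"H100"}},
		{Key: "gpu-count", Operator: "Gt", Values: []string{"4"}},
		{Key: "spot", Operator: "DoesNotExist"},
	}
	for _, r := range reqs {
		fmt.Println(r.Key, r.Operator, r.matches(labels))
	}
}
```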
matchFields (items)
A list of node selector requirements by node's fields.
Properties
Property | Type | Constraints | Description |
---|---|---|---|
key | string | | The label key that the selector applies to. |
operator | string | | Represents a key's relationship to a set of values. Valid operators are In, NotIn, Exists, DoesNotExist, Gt, and Lt. |
values | array | | An array of string values. If the operator is In or NotIn, the values array must be non-empty. If the operator is Exists or DoesNotExist, the values array must be empty. If the operator is Gt or Lt, the values array must have a single element, which will be interpreted as an integer. This array is replaced during a strategic merge patch. |
requiredDuringSchedulingIgnoredDuringExecution
If the affinity requirements specified by this field are not met at
scheduling time, the pod will not be scheduled onto the node.
If the affinity requirements specified by this field cease to be met
at some point during pod execution (e.g. due to an update), the system
may or may not try to eventually evict the pod from its node.
Properties
Property | Type | Constraints | Description |
---|---|---|---|
nodeSelectorTerms | array | | Required. A list of node selector terms. The terms are ORed. |
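How nodeSelectorTerms combine can be sketched as OR across terms and AND within a term. In the Go sketch below, Predicate and Term are hypothetical stand-ins for the requirement lists, labelIn is an illustrative In-style predicate, and the label keys are placeholders. Following the upstream Kubernetes NodeSelectorTerm semantics, an empty term matches no node.

```go
package main

import "fmt"

// Predicate stands in for one matchExpressions/matchFields requirement.
type Predicate func(labels map[string]string) bool

// Term is one nodeSelectorTerms item: all of its requirements are ANDed.
type Term []Predicate

func (t Term) matches(labels map[string]string) bool {
	if len(t) == 0 {
		return false // an empty term matches no node
	}
	for _, p := range t {
		if !p(labels) {
			return false
		}
	}
	return true
}

// nodeEligible ORs the terms: the node qualifies if any single term matches.
func nodeEligible(labels map[string]string, terms []Term) bool {
	for _, t := range terms {
		if t.matches(labels) {
			return true
		}
	}
	return false
}

// labelIn is an illustrative predicate equivalent to the In operator.
func labelIn(key string, values ...string) Predicate {
	return func(labels map[string]string) bool {
		v, ok := labels[key]
		if !ok {
			return false
		}
		for _, want := range values {
			if v == want {
				return true
			}
		}
		return false
	}
}

func main() {
	terms := []Term{
		{labelIn("gpu-model", "H100")},                           // term 1
		{labelIn("gpu-model", "A100"), labelIn("zone", "a")},      // term 2: both must hold
	}
	fmt.Println(nodeEligible(map[string]string{"gpu-model": "A100", "zone": "b"}, terms)) // false
	fmt.Println(nodeEligible(map[string]string{"gpu-model": "A100", "zone": "a"}, terms)) // true
}
```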
nodeSelectorTerms (items)
Required. A list of node selector terms. The terms are ORed.
Properties
Property | Type | Constraints | Description |
---|---|---|---|
matchExpressions | array | | A list of node selector requirements by node's labels. |
matchFields | array | | A list of node selector requirements by node's fields. |
matchExpressions (items)
A list of node selector requirements by node's labels.
Properties
Property | Type | Constraints | Description |
---|---|---|---|
key | string | | The label key that the selector applies to. |
operator | string | | Represents a key's relationship to a set of values. Valid operators are In, NotIn, Exists, DoesNotExist, Gt, and Lt. |
values | array | | An array of string values. If the operator is In or NotIn, the values array must be non-empty. If the operator is Exists or DoesNotExist, the values array must be empty. If the operator is Gt or Lt, the values array must have a single element, which will be interpreted as an integer. This array is replaced during a strategic merge patch. |
matchFields (items)
A list of node selector requirements by node's fields.
Properties
Property | Type | Constraints | Description |
---|---|---|---|
key | string | | The label key that the selector applies to. |
operator | string | | Represents a key's relationship to a set of values. Valid operators are In, NotIn, Exists, DoesNotExist, Gt, and Lt. |
values | array | | An array of string values. If the operator is In or NotIn, the values array must be non-empty. If the operator is Exists or DoesNotExist, the values array must be empty. If the operator is Gt or Lt, the values array must have a single element, which will be interpreted as an integer. This array is replaced during a strategic merge patch. |
Status
WorkloadProfileStatus defines the observed state of WorkloadProfile.