
SchedulingConfigTemplate

SchedulingConfigTemplate is the Schema for the schedulingconfigtemplates API.

Kubernetes Resource Information

| Field | Value |
|---|---|
| API Version | tensor-fusion.ai/v1 |
| Kind | SchedulingConfigTemplate |
| Scope | Cluster |


Spec

Places workloads on the right nodes and scales them intelligently.

| Property | Type | Constraints | Description |
|---|---|---|---|
| autoScaling | object | | Scales the workload based on usage and traffic. |
| hypervisor | object | | Multi-process queuing and fair scheduling on a single GPU device, subject to QoS constraints. |
| placement | object | | Places the client or worker on the best-matched nodes. |
| reBalancer | object | | Avoids hot GPU devices and continuously balances the workload. Implemented by triggering a simulated scheduling pass that advises the scheduler of better GPU nodes. |
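As a starting point, the sketch below builds a minimal manifest as a Python dict. Only `apiVersion`, `kind`, the cluster scope, and the `placement` defaults come from this page; the template name `example-template` is a hypothetical value, and all other spec fields are simply omitted. Because `kubectl apply -f` accepts JSON as well as YAML, the printed output could be saved to a file and applied as-is.

```python
import json


def build_scheduling_config_template(name: str) -> dict:
    """Return a minimal SchedulingConfigTemplate manifest as a plain dict."""
    return {
        "apiVersion": "tensor-fusion.ai/v1",
        "kind": "SchedulingConfigTemplate",
        # The resource is cluster-scoped, so metadata carries no namespace.
        "metadata": {"name": name},
        "spec": {
            # Documented defaults for placement; all other spec fields are omitted.
            "placement": {
                "mode": "CompactFirst",
                "allowUsingLocalGPU": True,
            },
        },
    }


if __name__ == "__main__":
    # "example-template" is a hypothetical name.
    print(json.dumps(build_scheduling_config_template("example-template"), indent=2))
```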

autoScaling

Scales the workload based on usage and traffic.

Properties

| Property | Type | Constraints | Description |
|---|---|---|---|
| autoSetLimits | object | | Layer 1: vertical auto-scaling; quickly bursts ("turbo") on the existing GPU cards. |
| autoSetReplicas | object | | Layer 2: horizontal auto-scaling; scales out to more GPU cards when the maximum limit threshold is hit. |
| autoSetRequests | object | | Layer 3: adjusts requests to match actual usage over the long run. |
| scaleToZero | object | | Additional layer that saves VRAM by automatically freezing memory and offloading it to RAM and disk. |

autoSetLimits

Layer 1: vertical auto-scaling; quickly bursts ("turbo") on the existing GPU cards.

Properties

| Property | Type | Constraints | Description |
|---|---|---|---|
| evaluationPeriod | string | | |
| extraTFlopsBufferRatio | string | | |
| ignoredDeltaRange | string | | |
| maxRatioToRequests | string | | Multiplier of requests that caps the limit so it is not set too high, e.g. 5.0. |
| prediction | object | | |
| scaleUpStep | string | | |
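The `maxRatioToRequests` description implies a simple upper bound: limit ≤ requests × ratio. A minimal sketch of that clamp follows, assuming the string field holds a decimal number such as "5.0" and that the bound is applied after a new limit has been proposed (how limits are proposed is not described here). The helper name `cap_limit` is hypothetical.

```python
def cap_limit(proposed_limit: float, request: float, max_ratio_to_requests: str) -> float:
    """Clamp a proposed limit to at most request * maxRatioToRequests (hypothetical helper)."""
    # Assumption: the CRD string holds a plain decimal such as "5.0".
    ratio = float(max_ratio_to_requests)
    if ratio <= 0:
        raise ValueError("maxRatioToRequests must be positive")
    return min(proposed_limit, request * ratio)
```

For example, with a request of 10 and a ratio of "5.0", a proposed limit of 80 is clamped to 50, while a proposed limit of 30 passes through unchanged.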

prediction

Properties

| Property | Type | Constraints | Description |
|---|---|---|---|
| enable | boolean | | |
| historyDataPeriod | string | | |
| model | string | | |
| predictionPeriod | string | | |

autoSetReplicas

Layer 2: horizontal auto-scaling; scales out to more GPU cards when the maximum limit threshold is hit.

Properties

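| Property | Type | Constraints | Description |
|---|---|---|---|
| enable | boolean | | |
| evaluationPeriod | string | | |
| scaleDownCoolDownTime | string | | |
| scaleDownStep | string | | |
| scaleUpCoolDownTime | string | | |
| scaleUpStep | string | | |
| targetTFlopsOfLimits | string | | |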
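This page does not spell out how these fields combine, so the following illustrates just one plausible reading: `targetTFlopsOfLimits` as a target fraction of the summed TFLOPs limits (e.g. 0.8), the steps as replica counts, and the cool-down times in seconds. All of these formats are assumptions, as are the `ReplicaPolicy` and `desired_replicas` names.

```python
from dataclasses import dataclass


@dataclass
class ReplicaPolicy:
    # Names mirror autoSetReplicas fields; the value formats are assumptions.
    target_tflops_of_limits: float  # assumed: target fraction of TFLOPs limits, e.g. 0.8
    scale_up_step: int              # assumed: replicas added per scale-up
    scale_down_step: int            # assumed: replicas removed per scale-down
    scale_up_cooldown_s: float      # assumed: seconds between scale-ups
    scale_down_cooldown_s: float    # assumed: seconds between scale-downs


def desired_replicas(policy: ReplicaPolicy, replicas: int, used_tflops: float,
                     limit_tflops_per_replica: float, now_s: float,
                     last_scale_s: float) -> int:
    """Return the next replica count under a step-and-cool-down policy (hypothetical)."""
    if replicas < 1 or limit_tflops_per_replica <= 0:
        raise ValueError("need at least one replica and a positive limit")
    usage_ratio = used_tflops / (replicas * limit_tflops_per_replica)
    elapsed = now_s - last_scale_s

    if usage_ratio > policy.target_tflops_of_limits:
        # Over target: add replicas once the scale-up cool-down has passed.
        if elapsed >= policy.scale_up_cooldown_s:
            return replicas + policy.scale_up_step
        return replicas

    smaller = replicas - policy.scale_down_step
    if smaller >= 1 and elapsed >= policy.scale_down_cooldown_s:
        # Shrink only if the smaller set would still be at or under target.
        projected = used_tflops / (smaller * limit_tflops_per_replica)
        if projected <= policy.target_tflops_of_limits:
            return smaller
    return replicas
```

Taking a scale-down step only when the smaller replica set would still sit at or under the target keeps the policy from oscillating between two sizes; the cool-downs further damp rapid changes in either direction.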

autoSetRequests

Layer 3: adjusts requests to match actual usage over the long run.

Properties

| Property | Type | Constraints | Description |
|---|---|---|---|
| aggregationPeriod | string | | |
| evaluationPeriod | string | | |
| extraBufferRatio | string | | Request buffer ratio. For example, if actual usage is 1.0, a 10% buffer yields 1.1 as the final preferred request. |
| percentileForAutoRequests | string | | |
| prediction | object | | |
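Reading `extraBufferRatio` together with `percentileForAutoRequests`, a recommended request can be sketched as a high percentile of observed usage plus the buffer; with usage 1.0 and a 10% buffer the result is 1.1, matching the `extraBufferRatio` description. The sketch assumes the percentile is given on a 0 to 100 scale and uses the nearest-rank method; the operator's actual percentile definition, aggregation, and use of `prediction` are not specified here, and `recommend_request` is a hypothetical name.

```python
import math


def recommend_request(usage_samples, percentile_for_auto_requests: str,
                      extra_buffer_ratio: str) -> float:
    """Percentile of observed usage, inflated by the buffer ratio (hypothetical helper)."""
    samples = sorted(usage_samples)
    if not samples:
        raise ValueError("need at least one usage sample")
    # Assumption: the percentile string is on a 0-100 scale, e.g. "95".
    p = float(percentile_for_auto_requests)
    if not 0 < p <= 100:
        raise ValueError("percentile must be in (0, 100]")
    # Nearest-rank percentile: the smallest sample with at least p% of samples at or below it.
    rank = math.ceil(p * len(samples) / 100)
    base = samples[rank - 1]
    # extraBufferRatio "0.1" means a 10% buffer on top of the percentile value.
    return base * (1 + float(extra_buffer_ratio))
```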

prediction

Properties

| Property | Type | Constraints | Description |
|---|---|---|---|
| enable | boolean | | |
| historyDataPeriod | string | | |
| model | string | | |
| predictionPeriod | string | | |

scaleToZero

Additional layer that saves VRAM by automatically freezing memory and offloading it to RAM and disk.

Properties

| Property | Type | Constraints | Description |
|---|---|---|---|
| autoFreeze | array | | |
| intelligenceWarmup | object | | |
intelligenceWarmup object

autoFreeze (items)

Properties

| Property | Type | Constraints | Description |
|---|---|---|---|
| enable | boolean | | |
| freezeToDiskTTL | string | | |
| freezeToMemTTL | string | | |
| qos | string | Enum: low, medium, high, critical | |
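Going by the field names and the `scaleToZero` description (freeze memory and cool down to RAM and disk), each `autoFreeze` entry appears to define, per QoS level, how long a workload may stay idle before its memory is offloaded first to RAM and then to disk. The sketch encodes that reading; the tiered behavior, the `"gpu"`/`"ram"`/`"disk"` labels, the `freeze_target` name, and TTLs expressed in seconds are all assumptions rather than documented semantics.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

QOS_LEVELS = ("low", "medium", "high", "critical")  # enum values from the qos field


@dataclass
class AutoFreezeRule:
    qos: str
    enable: bool
    freeze_to_mem_ttl_s: float   # assumed: idle seconds before offloading VRAM to RAM
    freeze_to_disk_ttl_s: float  # assumed: idle seconds before offloading to disk


def freeze_target(rules: Sequence[AutoFreezeRule], qos: str, idle_s: float) -> str:
    """Return where a workload's memory should live under the assumed tiering."""
    if qos not in QOS_LEVELS:
        raise ValueError(f"unknown qos {qos!r}")
    rule: Optional[AutoFreezeRule] = next(
        (r for r in rules if r.qos == qos and r.enable), None)
    if rule is None:
        return "gpu"  # no enabled rule for this QoS level: stay resident in VRAM
    if idle_s >= rule.freeze_to_disk_ttl_s:
        return "disk"
    if idle_s >= rule.freeze_to_mem_ttl_s:
        return "ram"
    return "gpu"
```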

intelligenceWarmup

Properties

| Property | Type | Constraints | Description |
|---|---|---|---|
| enable | boolean | | |
| historyDataPeriod | string | | |
| model | string | | |
| predictionPeriod | string | | |

hypervisor

Multi-process queuing and fair scheduling on a single GPU device, subject to QoS constraints.

Properties

| Property | Type | Constraints | Description |
|---|---|---|---|
| multiProcessQueuing | object | | |

multiProcessQueuing

Properties

| Property | Type | Constraints | Description |
|---|---|---|---|
| enable | boolean | | |
| interval | string | | |
| queueLevelTimeSlices | array | | |

placement

Places the client or worker on the best-matched nodes.

Properties

| Property | Type | Constraints | Description |
|---|---|---|---|
| allowUsingLocalGPU | boolean | Default: true | |
| gpuFilters | array | | |
| mode | string | Enum: CompactFirst, LowLoadFirst; Default: CompactFirst | |
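The two `mode` values suggest two classic placement strategies, which the following sketch implements under that assumption: `CompactFirst` as best-fit bin packing (choose the GPU with the least free capacity that still fits, keeping other GPUs free) and `LowLoadFirst` as spreading (choose the GPU with the most free capacity). Measuring load as free TFLOPs and the `GPU`/`pick_gpu` names are illustrative choices, not the scheduler's actual scoring.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class GPU:
    name: str
    free_tflops: float  # assumed load metric for this sketch


def pick_gpu(gpus: List[GPU], needed_tflops: float,
             mode: str = "CompactFirst") -> Optional[GPU]:
    """Pick a GPU for a new workload under the assumed meaning of each mode."""
    fits = [g for g in gpus if g.free_tflops >= needed_tflops]
    if not fits:
        return None
    if mode == "CompactFirst":
        # Best fit: pack onto the GPU with the least free capacity that still fits.
        return min(fits, key=lambda g: g.free_tflops)
    if mode == "LowLoadFirst":
        # Spread: choose the GPU with the most free capacity.
        return max(fits, key=lambda g: g.free_tflops)
    raise ValueError(f"unknown placement mode {mode!r}")
```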

gpuFilters (items)

Properties

| Property | Type | Constraints | Description |
|---|---|---|---|
| params | object | | |
| type | string | | |
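Each `gpuFilters` item pairs a `type` with free-form `params`, which fits a filter-chain design: filters run in order and each one narrows the candidate GPU list. The sketch shows that pattern with two hypothetical filter types, `exampleMinFreeVRAM` and `exampleModelIn`; they are placeholders, not filter types the operator is known to support, and real `params` keys depend on the filter type.

```python
from typing import Any, Callable, Dict, List

GPUInfo = Dict[str, Any]
FilterFn = Callable[[List[GPUInfo], Dict[str, Any]], List[GPUInfo]]


def _min_free_vram(gpus: List[GPUInfo], params: Dict[str, Any]) -> List[GPUInfo]:
    # Hypothetical filter: keep GPUs with at least params["minGiB"] free VRAM.
    threshold = float(params["minGiB"])
    return [g for g in gpus if g["freeVramGiB"] >= threshold]


def _model_in(gpus: List[GPUInfo], params: Dict[str, Any]) -> List[GPUInfo]:
    # Hypothetical filter: keep GPUs whose model is in params["models"].
    allowed = set(params["models"])
    return [g for g in gpus if g["model"] in allowed]


# Hypothetical registry mapping each filter `type` to its implementation.
FILTERS: Dict[str, FilterFn] = {
    "exampleMinFreeVRAM": _min_free_vram,
    "exampleModelIn": _model_in,
}


def apply_gpu_filters(gpus: List[GPUInfo],
                      gpu_filters: List[Dict[str, Any]]) -> List[GPUInfo]:
    """Apply each {type, params} entry in order, narrowing the candidate list."""
    for f in gpu_filters:
        fn = FILTERS.get(f["type"])
        if fn is None:
            raise ValueError(f"unknown filter type {f['type']!r}")
        gpus = fn(gpus, f.get("params", {}))
    return gpus
```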

reBalancer

Avoids hot GPU devices and continuously balances the workload. Implemented by triggering a simulated scheduling pass that advises the scheduler of better GPU nodes.

Properties

| Property | Type | Constraints | Description |
|---|---|---|---|
| internal | string | | |
| reBalanceCoolDownTime | string | | |
| threshold | object | | |

threshold

Properties

| Property | Type | Constraints | Description |
|---|---|---|---|
| matchAny | object | | |
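One plausible reading of `threshold.matchAny` is that a rebalance is triggered when any one of several metric thresholds is exceeded, subject to `reBalanceCoolDownTime`. The sketch encodes that reading with hypothetical metric names (`gpuUtil`, `vramUtil`), a hypothetical `should_rebalance` helper, and a cool-down in seconds; the real structure of `matchAny` is not documented on this page.

```python
from typing import Dict


def should_rebalance(metrics: Dict[str, float], match_any: Dict[str, float],
                     now_s: float, last_rebalance_s: float, cooldown_s: float) -> bool:
    """True if the cool-down has passed and ANY metric exceeds its threshold."""
    # Assumption: reBalanceCoolDownTime is expressed here in seconds.
    if now_s - last_rebalance_s < cooldown_s:
        return False
    # Assumption: matchAny maps metric names to upper thresholds;
    # a metric that was not reported counts as 0 and never triggers.
    return any(metrics.get(name, 0.0) > limit for name, limit in match_any.items())
```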

Status

SchedulingConfigTemplateStatus defines the observed state of SchedulingConfigTemplate.