KEDA
I had the chance to look into KEDA's fallback feature at work, so these are my own notes on it.
Fallback
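- https://keda.sh/docs/2.9/concepts/scaling-deployments/#fallback
- The alternative behavior used when metrics cannot be fetched from the event source specified in a trigger
- Because the metric value and replicas available from the custom metrics that KEDA exposes (/metrics) can no longer be fetched or computed correctly, KEDA returns normalized values instead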
Warning
Caveats when using fallback
- Only available when spec.triggers.metricType is AverageValue
  - refs v2.9.1
  - CPU/Memory scalers and scalers whose spec.triggers.metricType is Value are not supported
- Only supported for ScaledObjects
  - ScaledJobs are not supported
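The caveats above can be read as a simple eligibility check. The following is a minimal sketch only, not KEDA's own validation code: the `Trigger` class and `fallback_supported` function are hypothetical names introduced for illustration, and the default of `AverageValue` for `metricType` is an assumption.

```python
from dataclasses import dataclass

# Scaler types that the caveats above list as unsupported for fallback.
UNSUPPORTED_SCALER_TYPES = {"cpu", "memory"}


@dataclass
class Trigger:
    """Hypothetical, simplified view of one entry in spec.triggers."""
    type: str                          # e.g. "prometheus", "cpu"
    metric_type: str = "AverageValue"  # spec.triggers.metricType (default is an assumption)


def fallback_supported(kind: str, trigger: Trigger) -> bool:
    """Return True when the caveats above allow fallback for this trigger."""
    if kind != "ScaledObject":  # ScaledJobs are not supported
        return False
    if trigger.type in UNSUPPORTED_SCALER_TYPES:
        return False
    return trigger.metric_type == "AverageValue"


print(fallback_supported("ScaledObject", Trigger("prometheus")))
print(fallback_supported("ScaledObject", Trigger("prometheus", "Value")))
print(fallback_supported("ScaledJob", Trigger("prometheus")))
```

Only the first call satisfies all of the listed conditions.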
Fallback behavior
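- When fetching metrics from the event source fails, NumberOfFailures is incremented by 1
- When NumberOfFailures > spec.fallback.failureThreshold, the fallback handling is invoked
- Normalized values are returned as the metric value and replicas
  - The formulas used for normalization are described below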
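The flow in this list can be sketched as a small state machine. This is an illustration of the rule as written above, not KEDA's implementation: `FallbackSpec`, `Health`, and `on_metric_result` are hypothetical names, and resetting the counter on a successful fetch is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FallbackSpec:
    failure_threshold: int  # spec.fallback.failureThreshold
    replicas: int           # spec.fallback.replicas


@dataclass
class Health:
    number_of_failures: int = 0
    status: str = "Happy"


def on_metric_result(health: Health, spec: FallbackSpec,
                     target_average_value: int,
                     fetched_value: Optional[int]) -> Optional[int]:
    """Return the metric value to report, or None if nothing can be reported.

    fetched_value is None when the event source could not be queried.
    """
    if fetched_value is not None:
        # Assumption: a successful fetch resets the failure counter.
        health.number_of_failures = 0
        health.status = "Happy"
        return fetched_value

    health.number_of_failures += 1
    health.status = "Failing"
    if health.number_of_failures > spec.failure_threshold:
        # Fallback: report the normalized value instead of a real metric.
        return target_average_value * spec.replicas
    return None


spec = FallbackSpec(failure_threshold=2, replicas=2)
health = Health()
for attempt in range(1, 4):
    value = on_metric_result(health, spec, target_average_value=100, fetched_value=None)
    print(attempt, health.number_of_failures, health.status, value)
```

With a threshold of 2, the first two failures report nothing and the third returns the normalized value.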
metric value
- Computed with the following formula
  - https://keda.sh/docs/2.9/concepts/scaling-deployments/#fallback
  - https://github.com/kedacore/keda/blob/v2.9.1/pkg/fallback/fallback.go#L105
  target metric value * fallback replicas
  - While fallback is active, the target metric value is an AverageValue
    e.g. (cloudwatch scaler)
    spec.triggers.targetMetricValue: 100
    target metric value: 50
    fallback replicas: 2
    (100 / 2) * 2 = 100
  - https://keda.sh/docs/2.9/concepts/scaling-deployments/#triggers
  - https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale-walkthrough/
    With AverageValue, the value returned from the custom metrics API is divided by the number of Pods before being compared to the target.
- The AverageValue of the target metric value is obtained first and then multiplied by fallback replicas
  - https://github.com/kedacore/keda/blob/v2.9.1/pkg/fallback/fallback.go#L102
  - https://github.com/kedacore/keda/blob/v2.9.1/pkg/scaling/cache/scalers_cache.go#L344-L361
    - metricSpecs holds ExternalMetricSource, so the number of elements is always 2? (a guess)
  metric value / number of pod
  e.g.
  target metric value: 100
  fallback replicas: 2
  (100 * 2) / 2 = 100
  In this example, the metric value while fallback is active is 100
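The arithmetic above can be checked with a few lines of code. This is a sketch under the assumption that the 100 in the example is the per-pod AverageValue target; the function names are hypothetical and are not taken from the KEDA source.

```python
def fallback_metric_value(target_average_value: int, fallback_replicas: int) -> int:
    """Total value reported during fallback: target metric value * fallback replicas."""
    return target_average_value * fallback_replicas


def average_per_pod(metric_value: float, number_of_pods: int) -> float:
    """What an AverageValue target is compared against: metric value / number of pods."""
    return metric_value / number_of_pods


metric = fallback_metric_value(target_average_value=100, fallback_replicas=2)
print(metric)                      # 200
print(average_per_pod(metric, 2))  # 100.0
```

With two pods running, the per-pod average equals the target, which corresponds to the (100 * 2) / 2 = 100 line in the example.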
replicas
- Computed with the following formula
  - https://keda.sh/docs/2.9/concepts/scaling-deployments/#triggers
  metric value / target metric value (equal to `spec.fallback.replicas`)
  e.g.
  target metric value: 100
  fallback replicas: 2
  (100 * 2) / 100 = 2
  In this example, replicas while fallback is active is 2
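As a quick check of the replicas formula, the sketch below divides the fallback metric value by the target and rounds up, then clamps the result to the min/max replica counts as the HPA does. The function name and the min/max values are illustrative assumptions, not taken from the example above.

```python
import math


def desired_replicas(metric_value: float, target_average_value: float,
                     min_replicas: int, max_replicas: int) -> int:
    """metric value / target metric value, rounded up and clamped to [min, max]."""
    raw = math.ceil(metric_value / target_average_value)
    return max(min_replicas, min(max_replicas, raw))


# target metric value: 100, fallback replicas: 2 -> (100 * 2) / 100
print(desired_replicas(100 * 2, 100, min_replicas=1, max_replicas=5))   # 2

# A fallback replica count above the maximum is clamped (illustrative values).
print(desired_replicas(100 * 10, 100, min_replicas=1, max_replicas=5))  # 5
```

Because the metric value is built as target * fallback replicas, the division returns `spec.fallback.replicas` as long as that value lies within the min/max range.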
Verification
- deploy
  - Use the manifests from "Install KEDA > Operation check"
  kubectl apply -f deployment.yaml
- Remove Replicas from the Deployment resource
  - Otherwise the Replicas of the HPA resource and the Replicas of the Deployment resource conflict
  1. Confirm that spec.replicas is present in the kubectl.kubernetes.io/last-applied-configuration annotation
     $ kubectl get deployment nginx-deployment -o yaml | yq .metadata.annotations
     deployment.kubernetes.io/revision: "1"
     kubectl.kubernetes.io/last-applied-configuration: |
       {"apiVersion":"apps/v1","kind":"Deployment","metadata":{"annotations":{},"name":"nginx-deployment","namespace":"default"},"spec":{"replicas":2,"selector":{"matchLabels":{"app":"nginx"}},"template":{"metadata":{"annotations":{"prometheus.io/port":"9113","prometheus.io/scrape":"true"},"labels":{"app":"nginx"}},"spec":{"containers":[{"image":"nginx:1.14.2","name":"nginx","ports":[{"containerPort":80}],"volumeMounts":[{"mountPath":"/etc/nginx/nginx.conf","name":"nginx-conf","readOnly":true,"subPath":"nginx.conf"}]},{"args":["-nginx.scrape-uri=http://localhost/nginx_status"],"image":"nginx/nginx-prometheus-exporter:0.11.0","name":"nginx-exporter","ports":[{"containerPort":9113}]}],"volumes":[{"configMap":{"items":[{"key":"nginx.conf","path":"nginx.conf"}],"name":"nginx-conf"},"name":"nginx-conf"}]}}}}
  2. Remove spec.replicas from the kubectl.kubernetes.io/last-applied-configuration annotation
     kubectl apply edit-last-applied deployment/nginx-deployment
  3. Confirm that spec.replicas has been removed from the kubectl.kubernetes.io/last-applied-configuration annotation
     $ kubectl get deployment nginx-deployment -o yaml | yq .metadata.annotations
     deployment.kubernetes.io/revision: "1"
     kubectl.kubernetes.io/last-applied-configuration: |
       {"apiVersion":"apps/v1","kind":"Deployment","metadata":{"annotations":{},"name":"nginx-deployment","namespace":"default"},"spec":{"selector":{"matchLabels":{"app":"nginx"}},"template":{"metadata":{"annotations":{"prometheus.io/port":"9113","prometheus.io/scrape":"true"},"labels":{"app":"nginx"}},"spec":{"containers":[{"image":"nginx:1.14.2","name":"nginx","ports":[{"containerPort":80}],"volumeMounts":[{"mountPath":"/etc/nginx/nginx.conf","name":"nginx-conf","readOnly":true,"subPath":"nginx.conf"}]},{"args":["-nginx.scrape-uri=http://localhost/nginx_status"],"image":"nginx/nginx-prometheus-exporter:0.11.0","name":"nginx-exporter","ports":[{"containerPort":9113}]}],"volumes":[{"configMap":{"items":[{"key":"nginx.conf","path":"nginx.conf"}],"name":"nginx-conf"},"name":"nginx-conf"}]}}}}
  4. Remove spec.replicas from the Deployment manifest in deployment.yaml
     e.g. vim deployment.yaml, or sed -i -e '/replicas:\s2/d' deployment.yaml
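Steps 1 and 3 can also be checked programmatically rather than by eye. The sketch below assumes the annotation's JSON has already been obtained as a string (for example with `kubectl apply view-last-applied deployment/nginx-deployment -o json`); the function name is hypothetical and the two sample strings are shortened stand-ins for the real annotation.

```python
import json


def last_applied_has_replicas(last_applied_json: str) -> bool:
    """Return True if spec.replicas is still present in the last-applied configuration."""
    config = json.loads(last_applied_json)
    return "replicas" in config.get("spec", {})


# Shortened stand-ins for the annotation before and after step 2.
before = '{"kind":"Deployment","spec":{"replicas":2,"selector":{"matchLabels":{"app":"nginx"}}}}'
after = '{"kind":"Deployment","spec":{"selector":{"matchLabels":{"app":"nginx"}}}}'

print(last_applied_has_replicas(before))  # True
print(last_applied_has_replicas(after))   # False
```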
- Check the scaledObject
  - Confirm that the Spec settings are as expected
  - Confirm under Status > Conditions that Type: Fallback shows no fallback in progress
  scaledObject
$ kubectl describe scaledObject nginx-scaledobject
Name:         nginx-scaledobject
Namespace:    default
Labels:       deploymentName=nginx-deployment
              scaledobject.keda.sh/name=nginx-scaledobject
Annotations:  <none>
API Version:  keda.sh/v1alpha1
Kind:         ScaledObject
Metadata:
  Creation Timestamp:  2023-02-03T14:44:15Z
  Finalizers:
    finalizer.keda.sh
  Generation:  1
  Managed Fields:
    API Version:  keda.sh/v1alpha1
    Fields Type:  FieldsV1
    fieldsV1:
      f:metadata:
        f:finalizers:
          .:
          v:"finalizer.keda.sh":
        f:labels:
          f:scaledobject.keda.sh/name:
    Manager:      keda
    Operation:    Update
    Time:         2023-02-03T14:44:15Z
    API Version:  keda.sh/v1alpha1
    Fields Type:  FieldsV1
    fieldsV1:
      f:status:
        .:
        f:externalMetricNames:
        f:hpaName:
        f:originalReplicaCount:
        f:scaleTargetGVKR:
          .:
          f:group:
          f:kind:
          f:resource:
          f:version:
        f:scaleTargetKind:
    Manager:      keda
    Operation:    Update
    Subresource:  status
    Time:         2023-02-03T14:44:15Z
    API Version:  keda.sh/v1alpha1
    Fields Type:  FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          .:
          f:kubectl.kubernetes.io/last-applied-configuration:
        f:labels:
          .:
          f:deploymentName:
      f:spec:
        .:
        f:fallback:
          .:
          f:failureThreshold:
          f:replicas:
        f:maxReplicaCount:
        f:minReplicaCount:
        f:scaleTargetRef:
          .:
          f:name:
        f:triggers:
    Manager:      kubectl-client-side-apply
    Operation:    Update
    Time:         2023-02-03T14:44:15Z
    API Version:  keda.sh/v1alpha1
    Fields Type:  FieldsV1
    fieldsV1:
      f:status:
        f:conditions:
        f:health:
          .:
          f:s0-prometheus-nginx_http_requests_total:
            .:
            f:numberOfFailures:
            f:status:
    Manager:         keda-adapter
    Operation:       Update
    Subresource:     status
    Time:            2023-02-03T14:44:31Z
  Resource Version:  19302409
  UID:               20ef1bf8-5476-4e17-ac31-2b33e73c758a
Spec:
  Fallback:
    Failure Threshold:  3
    Replicas:           5
  Max Replica Count:    5
  Min Replica Count:    1
  Scale Target Ref:
    Name:  nginx-deployment
  Triggers:
    Metadata:
      Metric Name:     nginx_http_requests_total
      Query:           sum(rate(nginx_http_requests_total{app="nginx"}[2m]))
      Server Address:  http://prometheus-server.monitoring.svc.cluster.local
      Threshold:       3
    Type:              prometheus
Status:
  Conditions:
    Message:  ScaledObject is defined correctly and is ready for scaling
    Reason:   ScaledObjectReady
    Status:   True
    Type:     Ready
    Message:  Scaling is not performed because triggers are not active
    Reason:   ScalerNotActive
    Status:   False
    Type:     Active
    Message:  No fallbacks are active on this scaled object
    Reason:   NoFallbackFound
    Status:   False
    Type:     Fallback
  External Metric Names:
    s0-prometheus-nginx_http_requests_total
  Health:
    s0-prometheus-nginx_http_requests_total:
      Number Of Failures:  0
      Status:              Happy
  Hpa Name:                keda-hpa-nginx-scaledobject
  Original Replica Count:  1
  Scale Target GVKR:
    Group:            apps
    Kind:             Deployment
    Resource:         deployments
    Version:          v1
  Scale Target Kind:  apps/v1.Deployment
Events:
  Type    Reason              Age   From           Message
  ----    ------              ----  ----           -------
  Normal  KEDAScalersStarted  36m   keda-operator  Started scalers watch
  Normal  ScaledObjectReady   36m   keda-operator  ScaledObject is ready for scaling
- Take down Prometheus, the MetricsProvider
  - Change the replicas of prometheus-server to 0
    1. Confirm that one prometheus-server pod is running
       $ kubectl get deployments -n monitoring prometheus-server
       NAME                READY   UP-TO-DATE   AVAILABLE   AGE
       prometheus-server   1/1     1            1           96d
    2. Change the replicas of the prometheus-server deployment to 0
       $ kubectl edit deployments -n monitoring prometheus-server
       deployment.apps/prometheus-server edited
    3. Confirm that the number of prometheus-server pods is now 0
       $ kubectl get deployments -n monitoring prometheus-server
       NAME                READY   UP-TO-DATE   AVAILABLE   AGE
       prometheus-server   0/0     0            0           96d
- Confirm that the fallback occurs
  - Watch the Status field of the scaledObject resource change
    1. Number Of Failures under Health is counted up to 1, and Status changes from Happy to Failing
       $ kubectl describe scaledObject nginx-scaledobject
       snip...
       Status:
         Conditions:
           Message:  ScaledObject is defined correctly and is ready for scaling
           Reason:   ScaledObjectReady
           Status:   True
           Type:     Ready
           Message:  Scaling is not performed because triggers are not active
           Reason:   ScalerNotActive
           Status:   False
           Type:     Active
           Message:  No fallbacks are active on this scaled object
           Reason:   NoFallbackFound
           Status:   False
           Type:     Fallback
         External Metric Names:
           s0-prometheus-nginx_http_requests_total
         Health:
           s0-prometheus-nginx_http_requests_total:
             Number Of Failures:  1
             Status:              Failing
    2. Number Of Failures counts up to 2, and an error saying that prometheus-server could not be reached is recorded in the Events field. In addition, Status > Conditions > Type: Fallback changes to Status: True
       $ kubectl describe scaledObject nginx-scaledobject
       snip...
       Status:
         Conditions:
           Message:  ScaledObject is defined correctly and is ready for scaling
           Reason:   ScaledObjectReady
           Status:   True
           Type:     Ready
           Message:  Scaling is not performed because triggers are not active
           Reason:   ScalerNotActive
           Status:   False
           Type:     Active
           Message:  At least one trigger is falling back on this scaled object
           Reason:   FallbackExists
           Status:   True
           Type:     Fallback
         External Metric Names:
           s0-prometheus-nginx_http_requests_total
         Health:
           s0-prometheus-nginx_http_requests_total:
             Number Of Failures:  2
             Status:              Failing
       snip...
       Events:
         Type     Reason              Age  From           Message
         ----     ------              ---- ----           -------
         Normal   KEDAScalersStarted  43m  keda-operator  Started scalers watch
         Normal   ScaledObjectReady   43m  keda-operator  ScaledObject is ready for scaling
         Warning  KEDAScalerFailed    1s   keda-operator  Get "http://prometheus-server.monitoring.svc.cluster.local/api/v1/query?query=sum%28rate%28nginx_http_requests_total%7Bapp%3D%22nginx%22%7D%5B2m%5D%29%29&time=2023-02-03T15:27:18Z": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
    3. Number Of Failures counts up to 3
       $ kubectl describe scaledObject nginx-scaledobject
       snip...
       Status:
         Conditions:
           Message:  ScaledObject is defined correctly and is ready for scaling
           Reason:   ScaledObjectReady
           Status:   True
           Type:     Ready
           Message:  Scaling is not performed because triggers are not active
           Reason:   ScalerNotActive
           Status:   False
           Type:     Active
           Message:  No fallbacks are active on this scaled object
           Reason:   NoFallbackFound
           Status:   False
           Type:     Fallback
         External Metric Names:
           s0-prometheus-nginx_http_requests_total
         Health:
           s0-prometheus-nginx_http_requests_total:
             Number Of Failures:  3
             Status:              Failing
- Confirm that the HPA's Replicas matches the fallback setting
  get
  $ kubectl get hpa
  NAME                          REFERENCE                     TARGETS     MINPODS   MAXPODS   REPLICAS   AGE
  keda-hpa-nginx-scaledobject   Deployment/nginx-deployment   0/3 (avg)   1         5         5          43m
  describe
  $ kubectl describe hpa keda-hpa-nginx-scaledobject
  Name:               keda-hpa-nginx-scaledobject
  Namespace:          default
  Labels:             app.kubernetes.io/managed-by=keda-operator
                      app.kubernetes.io/name=keda-hpa-nginx-scaledobject
                      app.kubernetes.io/part-of=nginx-scaledobject
                      app.kubernetes.io/version=2.8.1
                      deploymentName=nginx-deployment
                      scaledobject.keda.sh/name=nginx-scaledobject
  Annotations:        <none>
  CreationTimestamp:  Fri, 03 Feb 2023 14:44:15 +0000
  Reference:          Deployment/nginx-deployment
  Metrics:            ( current / target )
    "s0-prometheus-nginx_http_requests_total" (target average value):  0 / 3
  Min replicas:       1
  Max replicas:       5
  Deployment pods:    5 current / 5 desired
  Conditions:
    Type            Status  Reason            Message
    ----            ------  ------            -------
    AbleToScale     True    SucceededRescale  the HPA controller was able to update the target scale to 1
    ScalingActive   True    ValidMetricFound  the HPA was able to successfully calculate a replica count from external metric s0-prometheus-nginx_http_requests_total(&LabelSelector{MatchLabels:map[string]string{scaledobject.keda.sh/name: nginx-scaledobject,},MatchExpressions:[]LabelSelectorRequirement{},})
    ScalingLimited  True    TooFewReplicas    the desired replica count is less than the minimum replica count
  Events:
    Type     Reason                        Age                From                       Message
    ----     ------                        ----               ----                       -------
    Warning  FailedGetExternalMetric       29s (x3 over 68s)  horizontal-pod-autoscaler  unable to get external metric default/s0-prometheus-nginx_http_requests_total/&LabelSelector{MatchLabels:map[string]string{scaledobject.keda.sh/name: nginx-scaledobject,},MatchExpressions:[]LabelSelectorRequirement{},}: unable to fetch metrics from external metrics API: no matching metrics found for s0-prometheus-nginx_http_requests_total
    Warning  FailedComputeMetricsReplicas  29s (x3 over 68s)  horizontal-pod-autoscaler  invalid metrics (1 invalid out of 1), first error is: failed to get s0-prometheus-nginx_http_requests_total external metric: unable to get external metric default/s0-prometheus-nginx_http_requests_total/&LabelSelector{MatchLabels:map[string]string{scaledobject.keda.sh/name: nginx-scaledobject,},MatchExpressions:[]LabelSelectorRequirement{},}: unable to fetch metrics from external metrics API: no matching metrics found for s0-prometheus-nginx_http_requests_total
    Normal   SuccessfulRescale             14s                horizontal-pod-autoscaler  New size: 1; reason: All metrics below target
- Confirm from the keda-operator log that the fallback occurred
  - Look for a group of log entries like the following, ending with
    Successfully set ScaleTarget replicas count to ScaledObject fallback.replicas
  keda-operator log
2023-02-03T15:27:18Z ERROR prometheus_scaler error executing prometheus query {"type": "ScaledObject", "namespace": "default", "name": "nginx-scaledobject", "error": "Get \"http://prometheus-server.monitoring.svc.cluster.local/api/v1/query?query=sum%28rate%28nginx_http_requests_total%7Bapp%3D%22nginx%22%7D%5B2m%5D%29%29&time=2023-02-03T15:27:15Z\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)"}
github.com/kedacore/keda/v2/pkg/scaling/cache.(*ScalersCache).IsScaledObjectActive
    /workspace/pkg/scaling/cache/scalers_cache.go:89
github.com/kedacore/keda/v2/pkg/scaling.(*scaleHandler).checkScalers
    /workspace/pkg/scaling/scale_handler.go:278
github.com/kedacore/keda/v2/pkg/scaling.(*scaleHandler).startScaleLoop
    /workspace/pkg/scaling/scale_handler.go:149
2023-02-03T15:27:22Z ERROR prometheus_scaler error executing prometheus query {"type": "ScaledObject", "namespace": "default", "name": "nginx-scaledobject", "error": "Get \"http://prometheus-server.monitoring.svc.cluster.local/api/v1/query?query=sum%28rate%28nginx_http_requests_total%7Bapp%3D%22nginx%22%7D%5B2m%5D%29%29&time=2023-02-03T15:27:18Z\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)"}
github.com/kedacore/keda/v2/pkg/scaling/cache.(*ScalersCache).IsScaledObjectActive
    /workspace/pkg/scaling/cache/scalers_cache.go:94
github.com/kedacore/keda/v2/pkg/scaling.(*scaleHandler).checkScalers
    /workspace/pkg/scaling/scale_handler.go:278
github.com/kedacore/keda/v2/pkg/scaling.(*scaleHandler).startScaleLoop
    /workspace/pkg/scaling/scale_handler.go:149
2023-02-03T15:27:22Z ERROR scalehandler Error getting scale decision {"scaledobject.Name": "nginx-scaledobject", "scaledObject.Namespace": "default", "scaleTarget.Name": "nginx-deployment", "error": "Get \"http://prometheus-server.monitoring.svc.cluster.local/api/v1/query?query=sum%28rate%28nginx_http_requests_total%7Bapp%3D%22nginx%22%7D%5B2m%5D%29%29&time=2023-02-03T15:27:18Z\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)"}
github.com/kedacore/keda/v2/pkg/scaling.(*scaleHandler).checkScalers
    /workspace/pkg/scaling/scale_handler.go:278
github.com/kedacore/keda/v2/pkg/scaling.(*scaleHandler).startScaleLoop
    /workspace/pkg/scaling/scale_handler.go:149
2023-02-03T15:27:22Z DEBUG events Warning {"object": {"kind":"ScaledObject","namespace":"default","name":"nginx-scaledobject","uid":"20ef1bf8-5476-4e17-ac31-2b33e73c758a","apiVersion":"keda.sh/v1alpha1","resourceVersion":"19308916"}, "reason": "KEDAScalerFailed", "message": "Get \"http://prometheus-server.monitoring.svc.cluster.local/api/v1/query?query=sum%28rate%28nginx_http_requests_total%7Bapp%3D%22nginx%22%7D%5B2m%5D%29%29&time=2023-02-03T15:27:18Z\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)"}
2023-02-03T15:27:22Z INFO scaleexecutor Successfully set ScaleTarget replicas count to ScaledObject fallback.replicas {"scaledobject.Name": "nginx-scaledobject", "scaledObject.Namespace": "default", "scaleTarget.Name": "nginx-deployment", "Original Replicas Count": 1, "New Replicas Count": 5}
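When scanning a long operator log, the fallback entry can be picked out by its message and its trailing JSON fields. The helper below is a hypothetical sketch that assumes the log format shown above (free-text message followed by a JSON object); it is not part of KEDA.

```python
import json
from typing import Optional

FALLBACK_MESSAGE = (
    "Successfully set ScaleTarget replicas count to ScaledObject fallback.replicas"
)


def parse_fallback_entry(line: str) -> Optional[dict]:
    """Return the JSON fields of a fallback log entry, or None for any other line."""
    pos = line.find(FALLBACK_MESSAGE)
    if pos == -1:
        return None
    start = line.find("{", pos)
    if start == -1:
        return None
    # raw_decode tolerates trailing text after the JSON object.
    fields, _ = json.JSONDecoder().raw_decode(line[start:])
    return fields


sample = (
    "2023-02-03T15:27:22Z INFO scaleexecutor " + FALLBACK_MESSAGE + " "
    '{"scaledobject.Name": "nginx-scaledobject", "scaledObject.Namespace": "default", '
    '"scaleTarget.Name": "nginx-deployment", "Original Replicas Count": 1, '
    '"New Replicas Count": 5}'
)
entry = parse_fallback_entry(sample)
print(entry["Original Replicas Count"], "->", entry["New Replicas Count"])  # 1 -> 5
```

In the verification above, the entry reports a change from 1 replica to 5, which matches the Fallback Replicas value in the ScaledObject spec.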
- Once the MetricsProvider recovers, Replicas is determined from the metrics again