
Commit bf74432

feat: Config logging via helm (#6312)

Author: Sherif Akoush
Parent: 15aed87


18 files changed, +153 -44 lines


docs-gb/installation/helm/README.md

Lines changed: 21 additions & 0 deletions
@@ -7,6 +7,7 @@ This section details the key Helm configuration parameters for Envoy, Autoscalin
 * **Envoy**: Manage pre-stop behaviors and configure access logging to track request-level interactions.
 * **Autoscaling** (Experimental): Fine-tune dynamic scaling policies for efficient resource allocation based on real-time inference workloads.
 * **Servers**: Define grace periods for controlled shutdowns and optimize model control plane parameters for efficient model loading, unloading, and error handling.
+* **Logging**: Define log levels for the different components of the system.
 
 
 ## Envoy
@@ -60,3 +61,23 @@ This section details the key Helm configuration parameters for Envoy, Autoscalin
 | `agent.maxUnloadElapsedTimeMinutes` | components | Max time allowed for one model unload command for a model on a particular server replica to take. Lower values allow errors to be exposed faster. | 15 |
 | `agent.maxUnloadRetryCount` | components | Max number of retries for unsuccessful unload command for a model on a particular server replica. Lower values allow control plane commands to fail faster. | 5 |
 | `agent.unloadGracePeriodSeconds` | components | A period guarding against race conditions between Envoy actually applying the cluster change to remove a route and before proceeding with the model replica unloading command. | 2 |
+
+
+## Logging
+
+### Component Log Level
+
+| Key | Chart | Description | Default |
+| --- | --- | --- | --- |
+| `logging.logLevel` | components | Component-wide log level, used when an individual component's `logLevel` is not set. Options are: debug, info, error. | info |
+| `controller.logLevel` | components | Overrides `logging.logLevel` for the controller. See the zap log levels [here](https://pkg.go.dev/go.uber.org/zap#pkg-constants). | |
+| `dataflow.logLevel` | components | Overrides `logging.logLevel` for dataflow. See the klogging levels [here](https://dokka.klogging.io/-klogging/io.klogging/-level/index.html). | |
+| `scheduler.logLevel` | components | Overrides `logging.logLevel` for the scheduler. See the logrus log levels [here](https://pkg.go.dev/github.com/sirupsen/logrus#Level). | |
+| `modelgateway.logLevel` | components | Overrides `logging.logLevel` for the model gateway. See the logrus log levels [here](https://pkg.go.dev/github.com/sirupsen/logrus#Level). | |
+| `pipelinegateway.logLevel` | components | Overrides `logging.logLevel` for the pipeline gateway. See the logrus log levels [here](https://pkg.go.dev/github.com/sirupsen/logrus#Level). | |
+| `hodometer.logLevel` | components | Overrides `logging.logLevel` for hodometer. See the logrus log levels [here](https://pkg.go.dev/github.com/sirupsen/logrus#Level). | |
+| `serverConfig.rclone.logLevel` | components | Overrides `logging.logLevel` for rclone. See the rclone `log-level` option [here](https://rclone.org/docs/). | info |
+| `serverConfig.agent.logLevel` | components | Overrides `logging.logLevel` for the agent. See the logrus log levels [here](https://pkg.go.dev/github.com/sirupsen/logrus#Level). | |
+
+**Notes**:
+- The Kafka client library log level is derived from the log level passed to the component. That value may not match the levels `librdkafka` expects (syslog levels); in that case, the component attempts to map it to the closest match.

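The note above only says that a component's level is mapped to the closest `librdkafka` level. A best-match mapping of that kind might look like the minimal Go sketch below. It assumes logrus-style level names as input and targets the syslog severities (0 = emergency through 7 = debug) used by `librdkafka`'s `log_level` property; `toSyslogLevel` and its specific choices (for example, `trace` collapsing onto debug) are hypothetical and not taken from the Seldon Core source.

```go
package main

import (
	"fmt"
	"strings"
)

// Syslog severities as used by librdkafka's log_level property
// (0 = emergency ... 7 = debug). Only the ones used below are named.
const (
	syslogCrit    = 2
	syslogErr     = 3
	syslogWarning = 4
	syslogInfo    = 6
	syslogDebug   = 7
)

// toSyslogLevel is a hypothetical best-match mapping from a component log
// level (logrus-style names, case-insensitive) to a syslog severity.
func toSyslogLevel(level string) int {
	switch strings.ToLower(level) {
	case "panic", "fatal":
		return syslogCrit
	case "error":
		return syslogErr
	case "warn", "warning":
		return syslogWarning
	case "info":
		return syslogInfo
	case "debug", "trace":
		// Syslog has nothing finer than debug, so trace collapses onto it.
		return syslogDebug
	default:
		// Assumption: unknown values fall back to librdkafka's default (6, info).
		return syslogInfo
	}
}

func main() {
	for _, lvl := range []string{"fatal", "error", "warn", "info", "debug", "trace"} {
		fmt.Printf("%-5s -> %d\n", lvl, toSyslogLevel(lvl))
	}
}
```

Because syslog has fewer levels than some application loggers, any such mapping is lossy at the fine-grained end.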
k8s/helm-charts/seldon-core-v2-setup/templates/seldon-v2-components.yaml

Lines changed: 32 additions & 8 deletions
@@ -410,6 +410,7 @@ spec:
 - --leader-elect
 - --namespace=$(POD_NAMESPACE)
 - --clusterwide=$(CLUSTERWIDE)
+- --log-level=$(LOG_LEVEL)
 command:
 - /manager
 env:
@@ -430,6 +431,9 @@ spec:
 value: '{{ .Values.security.controlplane.ssl.client.caPath }}'
 - name: CONTROL_PLANE_SERVER_TLS_CA_LOCATION
 value: '{{ .Values.security.controlplane.ssl.client.serverCaPath }}'
+- name: LOG_LEVEL
+value: '{{ hasKey .Values.controller "logLevel" | ternary .Values.controller.logLevel
+.Values.logging.logLevel }}'
 - name: POD_NAMESPACE
 valueFrom:
 fieldRef:
@@ -558,7 +562,8 @@ spec:
 - name: ENABLE_SERVER_AUTOSCALING
 value: '{{ .Values.autoscaling.autoscalingServerEnabled }}'
 - name: LOG_LEVEL
-value: '{{ .Values.scheduler.logLevel }}'
+value: '{{ hasKey .Values.scheduler "logLevel" | ternary .Values.scheduler.logLevel
+.Values.logging.logLevel }}'
 - name: ALLOW_PLAINTXT
 value: "true"
 - name: POD_NAMESPACE
@@ -711,7 +716,8 @@ spec:
 - name: CONTROL_PLANE_SERVER_TLS_CA_LOCATION
 value: '{{ .Values.security.controlplane.ssl.client.serverCaPath }}'
 - name: LOG_LEVEL
-value: '{{ .Values.pipelinegateway.logLevel }}'
+value: '{{ hasKey .Values.pipelinegateway "logLevel" | ternary .Values.pipelinegateway.logLevel
+.Values.logging.logLevel }}'
 - name: SELDON_SCHEDULER_PLAINTXT_PORT
 value: "9004"
 - name: SELDON_SCHEDULER_TLS_PORT
@@ -843,7 +849,8 @@ spec:
 - name: ENVOY_DOWNSTREAM_SERVER_TLS_CA_LOCATION
 value: '{{ .Values.security.envoy.ssl.downstream.client.serverCaPath }}'
 - name: LOG_LEVEL
-value: '{{ .Values.modelgateway.logLevel }}'
+value: '{{ hasKey .Values.modelgateway "logLevel" | ternary .Values.modelgateway.logLevel
+.Values.logging.logLevel }}'
 - name: SELDON_SCHEDULER_PLAINTXT_PORT
 value: "9004"
 - name: SELDON_SCHEDULER_TLS_PORT
@@ -895,7 +902,8 @@ spec:
 - name: METRICS_LEVEL
 value: '{{ .Values.hodometer.metricsLevel }}'
 - name: LOG_LEVEL
-value: '{{ .Values.hodometer.logLevel }}'
+value: '{{ hasKey .Values.hodometer "logLevel" | ternary .Values.hodometer.logLevel
+.Values.logging.logLevel }}'
 - name: EXTRA_PUBLISH_URLS
 value: '{{ .Values.hodometer.extraPublishUrls }}'
 - name: CONTROL_PLANE_SECURITY_PROTOCOL
@@ -1058,6 +1066,12 @@ spec:
 }}'
 - name: SELDON_CORES_COUNT
 value: '{{ .Values.dataflow.cores }}'
+- name: SELDON_LOG_LEVEL_APP
+value: '{{ hasKey .Values.dataflow "logLevel" | ternary .Values.dataflow.logLevel
+.Values.logging.logLevel | upper }}'
+- name: SELDON_LOG_LEVEL_KAFKA
+value: '{{ hasKey .Values.dataflow "logLevel" | ternary .Values.dataflow.logLevel
+.Values.logging.logLevel | upper }}'
 - name: SELDON_UPSTREAM_HOST
 value: seldon-scheduler
 - name: SELDON_UPSTREAM_PORT
@@ -1136,7 +1150,11 @@ metadata:
 spec:
 podSpec:
 containers:
-- image: '{{ .Values.serverConfig.rclone.image.registry }}/{{ .Values.serverConfig.rclone.image.repository
+- env:
+- name: RCLONE_LOG_LEVEL
+value: '{{ hasKey .Values.serverConfig.rclone "logLevel" | ternary .Values.serverConfig.rclone.logLevel
+.Values.logging.logLevel | upper }}'
+image: '{{ .Values.serverConfig.rclone.image.registry }}/{{ .Values.serverConfig.rclone.image.repository
 }}:{{ .Values.serverConfig.rclone.image.tag }}'
 imagePullPolicy: '{{ .Values.serverConfig.rclone.image.pullPolicy }}'
 lifecycle:
@@ -1232,7 +1250,8 @@ spec:
 - name: MLSERVER_TRACING_SERVER
 value: '{{ .Values.opentelemetry.endpoint }}'
 - name: SELDON_LOG_LEVEL
-value: '{{ .Values.serverConfig.agent.logLevel }}'
+value: '{{ hasKey .Values.serverConfig.agent "logLevel" | ternary .Values.serverConfig.agent.logLevel
+.Values.logging.logLevel }}'
 - name: SELDON_SERVER_HTTP_PORT
 value: "9000"
 - name: SELDON_SERVER_GRPC_PORT
@@ -1404,7 +1423,11 @@ metadata:
 spec:
 podSpec:
 containers:
-- image: '{{ .Values.serverConfig.rclone.image.registry }}/{{ .Values.serverConfig.rclone.image.repository
+- env:
+- name: RCLONE_LOG_LEVEL
+value: '{{ hasKey .Values.serverConfig.rclone "logLevel" | ternary .Values.serverConfig.rclone.logLevel
+.Values.logging.logLevel | upper }}'
+image: '{{ .Values.serverConfig.rclone.image.registry }}/{{ .Values.serverConfig.rclone.image.repository
 }}:{{ .Values.serverConfig.rclone.image.tag }}'
 imagePullPolicy: '{{ .Values.serverConfig.rclone.image.pullPolicy }}'
 lifecycle:
@@ -1498,7 +1521,8 @@ spec:
 - name: ENVOY_UPSTREAM_CLIENT_TLS_CA_LOCATION
 value: '{{ .Values.security.envoy.ssl.upstream.server.clientCaPath }}'
 - name: SELDON_LOG_LEVEL
-value: '{{ .Values.serverConfig.agent.logLevel }}'
+value: '{{ hasKey .Values.serverConfig.agent "logLevel" | ternary .Values.serverConfig.agent.logLevel
+.Values.logging.logLevel }}'
 - name: SELDON_SERVER_HTTP_PORT
 value: "9000"
 - name: SELDON_SERVER_GRPC_PORT

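Every `LOG_LEVEL`-style value in the chart template above uses the same Go-template pipeline. In a pipeline, the result of each command is passed as the last argument of the next, so `hasKey` yields a boolean that becomes `ternary`'s condition, and in the dataflow and rclone variants `upper` is applied to whichever level was selected. (The expression is wrapped over two lines in the YAML, but inside a single-quoted scalar the line break folds to a space.) The sketch below renders the dataflow expression with Go's `text/template`. Helm's `hasKey`, `ternary` and `upper` come from the Sprig library, so the sketch registers small local stand-ins with the same argument order instead of importing Sprig; `render` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// Simplified stand-ins for the Sprig functions Helm provides (not Sprig's code).
var funcs = template.FuncMap{
	// hasKey reports whether key is present in the map, even if its value is empty.
	"hasKey": func(m map[string]any, key string) bool {
		_, ok := m[key]
		return ok
	},
	// ternary returns vt if cond is true, else vf. In a pipeline the piped
	// value arrives as the last argument, i.e. as cond.
	"ternary": func(vt, vf any, cond bool) any {
		if cond {
			return vt
		}
		return vf
	},
	"upper": strings.ToUpper,
}

// The dataflow expression from the chart, on a single line.
const expr = `{{ hasKey .Values.dataflow "logLevel" | ternary .Values.dataflow.logLevel .Values.logging.logLevel | upper }}`

// render executes expr against a Helm-like {"Values": values} context.
func render(values map[string]any) (string, error) {
	tmpl, err := template.New("level").Funcs(funcs).Parse(expr)
	if err != nil {
		return "", err
	}
	var sb strings.Builder
	err = tmpl.Execute(&sb, map[string]any{"Values": values})
	return sb.String(), err
}

func main() {
	withOverride := map[string]any{
		"logging":  map[string]any{"logLevel": "info"},
		"dataflow": map[string]any{"logLevel": "debug"},
	}
	withoutOverride := map[string]any{
		"logging":  map[string]any{"logLevel": "info"},
		"dataflow": map[string]any{},
	}
	for _, v := range []map[string]any{withOverride, withoutOverride} {
		out, err := render(v)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(out)
	}
}
```

With `dataflow.logLevel: debug` this prints `DEBUG`; without the key it falls back to `logging.logLevel` and prints `INFO`.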
k8s/helm-charts/seldon-core-v2-setup/values.yaml

Lines changed: 11 additions & 4 deletions
@@ -79,6 +79,15 @@ opentelemetry:
 disable: false
 ratio: 1
 
+# logging
+# global setting, used when an individual component's logLevel is not set
+# Users should set a value from:
+# fatal, error, warn, info, debug, trace
+# if it is also used for serverConfig.rclone.logLevel, the allowed set reduces to:
+# debug, info, error
+logging:
+logLevel: info
+
 hodometer:
 image:
 pullPolicy: IfNotPresent
@@ -131,7 +140,6 @@ modelgateway:
 runAsUser: 1000
 runAsGroup: 1000
 runAsNonRoot: true
-logLevel: warn
 
 pipelinegateway:
 image:
@@ -147,7 +155,6 @@ pipelinegateway:
 runAsUser: 1000
 runAsGroup: 1000
 runAsNonRoot: true
-logLevel: warn
 
 dataflow:
 image:
@@ -226,7 +233,6 @@ scheduler:
 runAsGroup: 1000
 runAsNonRoot: true
 schedulerReadyTimeoutSeconds: 600
-logLevel: warn
 
 autoscaling:
 autoscalingModelEnabled: false
@@ -252,6 +258,8 @@ serverConfig:
 resources:
 cpu: 50m
 memory: 128Mi
+# should follow `log-level` from https://rclone.org/docs/
+logLevel: info
 
 agent:
 image:
@@ -274,7 +282,6 @@ serverConfig:
 resources:
 cpu: 200m
 memory: 1Gi
-logLevel: warn
 
 mlserver:
 image:

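This commit removes the `logLevel: warn` defaults from `modelgateway`, `pipelinegateway`, `scheduler` and `serverConfig.agent` and adds `logging.logLevel: info`, so with default values those components now fall back to `info`, while `serverConfig.rclone.logLevel` keeps its own explicit `info`. The precedence rule the templates implement can be sketched in plain Go as below. `effectiveLevel` is a hypothetical helper, and the nested Helm values are simplified to a map keyed by component path.

```go
package main

import "fmt"

// effectiveLevel mirrors the template rule
//   hasKey <component> "logLevel" | ternary <component>.logLevel logging.logLevel
// A component's own logLevel wins whenever the key is present (even if empty);
// otherwise the global logging.logLevel applies.
func effectiveLevel(values map[string]map[string]string, component string) string {
	if level, ok := values[component]["logLevel"]; ok {
		return level
	}
	return values["logging"]["logLevel"]
}

func main() {
	// Defaults after this commit (only the keys relevant here).
	defaults := map[string]map[string]string{
		"logging":             {"logLevel": "info"},
		"serverConfig.rclone": {"logLevel": "info"},
		"modelgateway":        {},
		"pipelinegateway":     {},
		"scheduler":           {},
		"serverConfig.agent":  {},
	}
	for _, c := range []string{"modelgateway", "pipelinegateway", "scheduler", "serverConfig.agent", "serverConfig.rclone", "controller"} {
		fmt.Printf("%s: %s\n", c, effectiveLevel(defaults, c))
	}

	// A user override: raise only the scheduler to debug.
	defaults["scheduler"]["logLevel"] = "debug"
	fmt.Println("scheduler after override:", effectiveLevel(defaults, "scheduler"))
}
```

Because the check is `hasKey` rather than a truthiness test, an explicitly empty `logLevel: ""` counts as set and is passed through as an empty string instead of falling back.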
k8s/helm-charts/seldon-core-v2-setup/values.yaml.template

Lines changed: 11 additions & 4 deletions
@@ -79,6 +79,15 @@ opentelemetry:
 disable: false
 ratio: 1
 
+# logging
+# global setting, used when an individual component's logLevel is not set
+# Users should set a value from:
+# fatal, error, warn, info, debug, trace
+# if it is also used for serverConfig.rclone.logLevel, the allowed set reduces to:
+# debug, info, error
+logging:
+logLevel: info
+
 hodometer:
 image:
 pullPolicy: IfNotPresent
@@ -131,7 +140,6 @@ modelgateway:
 runAsUser: 1000
 runAsGroup: 1000
 runAsNonRoot: true
-logLevel: warn
 
 pipelinegateway:
 image:
@@ -147,7 +155,6 @@ pipelinegateway:
 runAsUser: 1000
 runAsGroup: 1000
 runAsNonRoot: true
-logLevel: warn
 
 dataflow:
 image:
@@ -226,7 +233,6 @@ scheduler:
 runAsGroup: 1000
 runAsNonRoot: true
 schedulerReadyTimeoutSeconds: 600
-logLevel: warn
 
 autoscaling:
 autoscalingModelEnabled: false
@@ -252,6 +258,8 @@ serverConfig:
 resources:
 cpu: 50m
 memory: 128Mi
+# should follow `log-level` from https://rclone.org/docs/
+logLevel: info
 
 agent:
 image:
@@ -274,7 +282,6 @@ serverConfig:
 resources:
 cpu: 200m
 memory: 1Gi
-logLevel: warn
 
 mlserver:
 image:

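The comment block added to both values files restricts `logging.logLevel` to `fatal`, `error`, `warn`, `info`, `debug` and `trace`, and to `debug`, `info` and `error` when the same value also drives rclone, whose `--log-level` option understands `DEBUG`, `INFO`, `NOTICE` and `ERROR`. The comment only documents these lists. A pre-install check could look like the Go sketch below; `checkGlobalLevel` is a hypothetical name, not part of the chart.

```go
package main

import "fmt"

// Levels the values.yaml comment allows for logging.logLevel.
var globalLevels = map[string]bool{
	"fatal": true, "error": true, "warn": true,
	"info": true, "debug": true, "trace": true,
}

// The reduced set the comment allows when rclone inherits the global value.
var rcloneSafeLevels = map[string]bool{"debug": true, "info": true, "error": true}

// checkGlobalLevel is a hypothetical validation helper. rcloneInherits should
// be true when serverConfig.rclone.logLevel is not set, so the global value is used.
func checkGlobalLevel(level string, rcloneInherits bool) error {
	if !globalLevels[level] {
		return fmt.Errorf("logging.logLevel %q: expected one of fatal, error, warn, info, debug, trace", level)
	}
	if rcloneInherits && !rcloneSafeLevels[level] {
		return fmt.Errorf("logging.logLevel %q is also used by rclone: expected debug, info or error", level)
	}
	return nil
}

func main() {
	fmt.Println(checkGlobalLevel("warn", false)) // rclone has its own level: accepted
	fmt.Println(checkGlobalLevel("warn", true))  // rclone would inherit "warn": rejected
	fmt.Println(checkGlobalLevel("info", true))  // accepted
}
```

With the defaults in this commit, `serverConfig.rclone.logLevel` is set to `info`, so rclone only inherits the global value if that key is removed.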
k8s/kustomize/helm-components-sc/patch_controller.yaml

Lines changed: 2 additions & 0 deletions
@@ -33,3 +33,5 @@ spec:
 value: '{{ .Values.security.controlplane.ssl.client.caPath }}'
 - name: CONTROL_PLANE_SERVER_TLS_CA_LOCATION
 value: '{{ .Values.security.controlplane.ssl.client.serverCaPath }}'
+- name: LOG_LEVEL
+value: '{{ hasKey .Values.controller "logLevel" | ternary .Values.controller.logLevel .Values.logging.logLevel }}'

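In the chart template, the controller's `LOG_LEVEL` env var reaches the process through the argument `--log-level=$(LOG_LEVEL)`. Kubernetes expands `$(VAR_NAME)` references in a container's `command` and `args` from that container's environment: a defined variable is substituted, an unresolvable reference is left unchanged, and `$$` escapes to a literal `$`. The Go sketch below is a simplified model of those rules for illustration; `expandRefs` and the example env values are hypothetical, not the kubelet's code.

```go
package main

import (
	"fmt"
	"strings"
)

// expandRefs is a simplified model of Kubernetes $(VAR) expansion in
// container command/args:
//   - $(NAME) is replaced when NAME is defined in env,
//   - an unresolvable $(NAME) is left unchanged,
//   - $$ is reduced to a single $, so $$(NAME) yields the literal $(NAME).
func expandRefs(s string, env map[string]string) string {
	var b strings.Builder
	for i := 0; i < len(s); i++ {
		if s[i] != '$' || i+1 >= len(s) {
			b.WriteByte(s[i])
			continue
		}
		switch s[i+1] {
		case '$':
			b.WriteByte('$')
			i++ // skip the second '$'
		case '(':
			end := strings.IndexByte(s[i+2:], ')')
			if end < 0 {
				// No closing parenthesis: copy the rest verbatim.
				b.WriteString(s[i:])
				return b.String()
			}
			name := s[i+2 : i+2+end]
			if val, ok := env[name]; ok {
				b.WriteString(val)
			} else {
				b.WriteString(s[i : i+3+end]) // keep "$(NAME)" as-is
			}
			i += 2 + end // land on ')'; the loop's i++ moves past it
		default:
			b.WriteByte('$')
		}
	}
	return b.String()
}

func main() {
	// Hypothetical env for illustration; CLUSTERWIDE is left unset on purpose.
	env := map[string]string{"POD_NAMESPACE": "example-ns", "LOG_LEVEL": "info"}
	args := []string{"--leader-elect", "--namespace=$(POD_NAMESPACE)", "--clusterwide=$(CLUSTERWIDE)", "--log-level=$(LOG_LEVEL)"}
	for _, a := range args {
		fmt.Println(expandRefs(a, env))
	}
}
```

This is why adding the `LOG_LEVEL` env entry is enough to change the flag the controller sees: the argument string itself never changes.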
k8s/kustomize/helm-components-sc/patch_dataflow.yaml

Lines changed: 4 additions & 0 deletions
@@ -53,6 +53,10 @@ spec:
 value: '{{ .Values.security.kafka.ssl.client.endpointIdentificationAlgorithm }}'
 - name: SELDON_CORES_COUNT
 value: '{{ .Values.dataflow.cores }}'
+- name: SELDON_LOG_LEVEL_APP
+value: '{{ hasKey .Values.dataflow "logLevel" | ternary .Values.dataflow.logLevel .Values.logging.logLevel | upper }}'
+- name: SELDON_LOG_LEVEL_KAFKA
+value: '{{ hasKey .Values.dataflow "logLevel" | ternary .Values.dataflow.logLevel .Values.logging.logLevel | upper }}'
 resources:
 requests:
 cpu: '{{ .Values.dataflow.resources.cpu }}'

k8s/kustomize/helm-components-sc/patch_hodometer.yaml

Lines changed: 1 addition & 1 deletion
@@ -22,7 +22,7 @@ spec:
 - name: METRICS_LEVEL
 value: '{{ .Values.hodometer.metricsLevel }}'
 - name: LOG_LEVEL
-value: '{{ .Values.hodometer.logLevel }}'
+value: '{{ hasKey .Values.hodometer "logLevel" | ternary .Values.hodometer.logLevel .Values.logging.logLevel }}'
 - name: EXTRA_PUBLISH_URLS
 value: '{{ .Values.hodometer.extraPublishUrls }}'
 - name: CONTROL_PLANE_SECURITY_PROTOCOL

k8s/kustomize/helm-components-sc/patch_mlserver.yaml

Lines changed: 5 additions & 2 deletions
@@ -6,7 +6,10 @@ spec:
 podSpec:
 imagePullSecrets: []
 containers:
-- image: '{{ .Values.serverConfig.rclone.image.registry }}/{{ .Values.serverConfig.rclone.image.repository }}:{{ .Values.serverConfig.rclone.image.tag }}'
+- env:
+- name: RCLONE_LOG_LEVEL
+value: '{{ hasKey .Values.serverConfig.rclone "logLevel" | ternary .Values.serverConfig.rclone.logLevel .Values.logging.logLevel | upper }}'
+image: '{{ .Values.serverConfig.rclone.image.registry }}/{{ .Values.serverConfig.rclone.image.repository }}:{{ .Values.serverConfig.rclone.image.tag }}'
 imagePullPolicy: '{{ .Values.serverConfig.rclone.image.pullPolicy }}'
 name: rclone
 resources:
@@ -73,7 +76,7 @@ spec:
 - name: MLSERVER_TRACING_SERVER
 value: '{{ .Values.opentelemetry.endpoint }}'
 - name: SELDON_LOG_LEVEL
-value: '{{ .Values.serverConfig.agent.logLevel }}'
+value: '{{ hasKey .Values.serverConfig.agent "logLevel" | ternary .Values.serverConfig.agent.logLevel .Values.logging.logLevel }}'
 image: '{{ .Values.serverConfig.agent.image.registry }}/{{ .Values.serverConfig.agent.image.repository }}:{{ .Values.serverConfig.agent.image.tag }}'
 imagePullPolicy: '{{ .Values.serverConfig.agent.image.pullPolicy }}'
 name: agent

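The rclone container is configured through the `RCLONE_LOG_LEVEL` env var rather than a `--log-level` flag. rclone documents that any option can be set from the environment by taking the long option name, stripping the leading `--`, replacing `-` with `_`, upper-casing it and prefixing `RCLONE_`. The small Go sketch below applies that rule and upper-cases the level the way the template's `| upper` does; `rcloneEnvName` and `rcloneLevelEnv` are hypothetical helpers.

```go
package main

import (
	"fmt"
	"strings"
)

// rcloneEnvName applies rclone's documented option-to-environment-variable rule.
func rcloneEnvName(flag string) string {
	name := strings.TrimPrefix(flag, "--")
	name = strings.ReplaceAll(name, "-", "_")
	return "RCLONE_" + strings.ToUpper(name)
}

// rcloneLevelEnv renders NAME=VALUE for the log level, upper-casing the value
// as the chart's `| upper` does.
func rcloneLevelEnv(level string) string {
	return rcloneEnvName("--log-level") + "=" + strings.ToUpper(level)
}

func main() {
	fmt.Println(rcloneLevelEnv("info")) // RCLONE_LOG_LEVEL=INFO
}
```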
k8s/kustomize/helm-components-sc/patch_modelgateway.yaml

Lines changed: 1 addition & 1 deletion
@@ -71,7 +71,7 @@ spec:
 - name: ENVOY_DOWNSTREAM_SERVER_TLS_CA_LOCATION
 value: '{{ .Values.security.envoy.ssl.downstream.client.serverCaPath }}'
 - name: LOG_LEVEL
-value: '{{ .Values.modelgateway.logLevel }}'
+value: '{{ hasKey .Values.modelgateway "logLevel" | ternary .Values.modelgateway.logLevel .Values.logging.logLevel }}'
 resources:
 requests:
 cpu: '{{ .Values.modelgateway.resources.cpu }}'

k8s/kustomize/helm-components-sc/patch_pipelinegateway.yaml

Lines changed: 1 addition & 1 deletion
@@ -86,4 +86,4 @@ spec:
 - name: CONTROL_PLANE_SERVER_TLS_CA_LOCATION
 value: '{{ .Values.security.controlplane.ssl.client.serverCaPath }}'
 - name: LOG_LEVEL
-value: '{{ .Values.pipelinegateway.logLevel }}'
+value: '{{ hasKey .Values.pipelinegateway "logLevel" | ternary .Values.pipelinegateway.logLevel .Values.logging.logLevel }}'
