
Commit be0bdf9

fix(logger): Improve error handling to return errors when creating Kafka topics (#640)
# Description

It appears that calling `CreateTopics` [here](https://github.com/caraml-dev/merlin/blob/eb52b7f240e3cc516026816e15b3b248a1aa3850/api/pkg/inference-logger/logger/kafka_sink.go#L59) in `NewKafkaSink` sometimes doesn't return errors via the `err` return value, but instead via the individual elements of the `topicResults` slice it returns. This happens in certain scenarios, e.g. when there are authorisation problems creating new topics: the errors end up inside each element of the `topicResults` slice while `err` stays `nil`:

![Screenshot 2025-04-22 at 15 19 52](https://github.com/user-attachments/assets/d9f828a6-2602-4a05-a5bc-491a43dd4825)

This is problematic because, even though the topic creation failed, a `nil` error gets returned together with a `nil` sink value. Tracing this back to where `NewKafkaSink` gets called in `getLogSink` (https://github.com/caraml-dev/merlin/blob/main/api/cmd/inference-logger/main.go#L362), and to where `getLogSink` gets called (https://github.com/caraml-dev/merlin/blob/main/api/cmd/inference-logger/main.go#L130), it's easy to see that the failure isn't caught at any point in the call stack, since the returned error is `nil`. To make things worse, because `logSink` is `nil`, it continues to be passed around the entire logger code as if it were non-nil, causing huge problems downstream, such as nil pointer dereferences (and crashes), because `logSink` is never expected to be `nil`.

This PR simply refactors the error handling of the Kafka sink initialisation step so that any errors are properly returned and subsequently caught by the upstream callers.

# Modifications

- `api/pkg/inference-logger/logger/kafka_sink.go` - Improve error handling

# Tests

<!-- Besides the existing / updated automated tests, what specific scenarios should be tested? Consider the backward compatibility of the changes, whether corner cases are covered, etc. Please describe the tests and check the ones that have been completed. Eg:
- [x] Deploying new and existing standard models
- [ ] Deploying PyFunc models -->

# Checklist

- [x] Added PR label
- [ ] Added unit test, integration, and/or e2e tests
- [x] Tested locally
- [ ] Updated documentation
- [ ] Update Swagger spec if the PR introduces API changes
- [ ] Regenerated Golang and Python client if the PR introduces API changes

# Release Notes

```release-note
None
```

File tree

1 file changed (+4, -1 lines)

api/pkg/inference-logger/logger/kafka_sink.go

Lines changed: 4 additions & 1 deletion
```diff
@@ -63,9 +63,12 @@ func NewKafkaSink(
 			ReplicationFactor: ReplicationFactor,
 		},
 	})
+	if err != nil {
+		return nil, err
+	}
 	for _, result := range topicResults {
 		if result.Error.Code() != kafka.ErrNoError && result.Error.Code() != kafka.ErrTopicAlreadyExists {
-			return nil, err
+			return nil, fmt.Errorf("failed to create topic %s: %w", result.Topic, result.Error)
 		}
 	}
```
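
For context, below is a minimal, self-contained sketch of the corrected error-handling pattern, assuming the confluent-kafka-go admin client that the sink builds on. The `ensureTopic` helper, the broker address, the topic name, and the partition/replication values are illustrative assumptions rather than merlin's actual configuration, and the import path assumes the v1 module path of confluent-kafka-go.

```go
package main

import (
	"context"
	"fmt"
	"log"

	"github.com/confluentinc/confluent-kafka-go/kafka"
)

// ensureTopic is a hypothetical helper showing the corrected pattern:
// CreateTopics can report failures in two places, and both must be checked.
func ensureTopic(admin *kafka.AdminClient, topic string) error {
	topicResults, err := admin.CreateTopics(context.Background(), []kafka.TopicSpecification{
		{
			Topic:             topic,
			NumPartitions:     1, // illustrative values, not merlin's defaults
			ReplicationFactor: 1,
		},
	})
	// Call-level failure, e.g. the admin request itself could not be issued.
	if err != nil {
		return err
	}
	// Per-topic failures (e.g. authorisation errors) are reported inside the
	// TopicResult slice while err stays nil, so they must be checked explicitly.
	for _, result := range topicResults {
		if result.Error.Code() != kafka.ErrNoError && result.Error.Code() != kafka.ErrTopicAlreadyExists {
			return fmt.Errorf("failed to create topic %s: %w", result.Topic, result.Error)
		}
	}
	return nil
}

func main() {
	// Hypothetical broker address, for illustration only.
	admin, err := kafka.NewAdminClient(&kafka.ConfigMap{"bootstrap.servers": "localhost:9092"})
	if err != nil {
		log.Fatalf("failed to create admin client: %v", err)
	}
	defer admin.Close()

	if err := ensureTopic(admin, "inference-logs"); err != nil {
		log.Fatalf("topic creation failed: %v", err)
	}
}
```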
