-
Notifications
You must be signed in to change notification settings - Fork 1k
Description
Environment
- ksqlDB: 0.29.0 (confluentinc/ksqldb-server:0.29.0, confluentinc/ksqldb-cli:0.29.0)
- Kafka: Confluent Platform 7.9.1 (cp-kafka:7.9.1, cp-zookeeper:7.9.1)
- Kafka Connect: debezium/connect:2.7.3.Final (MariaDB)
- Deployment: Docker, bridge network
- Serdes: KEY_FORMAT='KAFKA', VALUE_FORMAT='JSON'
- All topics in this repro: PARTITIONS=1
Context
We enrich an event stream with:
- a TABLE mapping event_type (INT) to a label (STRING),
- a TABLE mapping device_type (INT) to a label (STRING),
- two dimension TABLEs for users and organizations (join on uuid).
With the TABLE join on event_type, some events never appear in the output stream at all. There are no NULL labels; the rows are simply not emitted. If we replace the TABLE join with either a CASE mapping, all events are emitted as expected.
Minimal reproduction
Session settings:
SET 'auto.offset.reset'='earliest';
SET 'ksql.streams.cache.max.bytes.buffering'='0';
Mapping TABLE:
CREATE TABLE EVENT_TYPE_MAP (
event_type INT PRIMARY KEY,
event_type_label VARCHAR
) WITH (
KAFKA_TOPIC='vw_event_type_map',
KEY_FORMAT='KAFKA',
VALUE_FORMAT='JSON',
PARTITIONS=1
);
INSERT INTO EVENT_TYPE_MAP VALUES (1000, 'UserLoggedIn');
INSERT INTO EVENT_TYPE_MAP VALUES (1100, 'CipherCreated');
Source STREAM and two events
CREATE STREAM VW_EVENT (
uuid STRING KEY,
event_type INT
) WITH (
KAFKA_TOPIC='vw_event',
KEY_FORMAT='KAFKA',
VALUE_FORMAT='JSON',
PARTITIONS=1
);
Enriched output with LEFT JOIN (problematic path):
CREATE STREAM VW_EVENT_ENRICHED WITH (
KAFKA_TOPIC='vw_enriched',
VALUE_FORMAT='JSON'
) AS
SELECT
ev.uuid,
ev.event_type,
m.event_type_label
FROM VW_EVENT ev
LEFT JOIN EVENT_TYPE_MAP m
ON ev.event_type = m.event_type
EMIT CHANGES;
Observation:
SELECT uuid, event_type, event_type_label
FROM VW_EVENT_ENRICHED
EMIT CHANGES LIMIT 2;
Actual: only one of the two input events appears in vw_enriched (the 1000 case). The 1100 event is not emitted at all.
Expected: both rows should be emitted (1000 → 'UserLoggedIn', 1100 → 'CipherCreated').
- Troubleshooting already done
- Ensured only one query writes to the output (SHOW QUERIES).
- Dropped and recreated all objects and topics (TERMINATE, DROP ... DELETE TOPIC).
- Set auto.offset.reset='earliest' before creating objects.
- Verified the mapping TABLE is populated:
- PRINT 'vw_event_type_map' FROM BEGINNING shows the inserted rows.
- SELECT * FROM EVENT_TYPE_MAP EMIT CHANGES LIMIT 2; returns the rows.
- Tried explicit casts in the join (CAST(ev.event_type AS INT) = CAST(m.event_type AS INT)).
- No WHERE clause filtering the result.
- The same pattern with TABLE joins on uuid STRING for users/orgs works correctly; the problem seems specific to the INT mapping TABLE join.
Question
Could you help explain why a STREAM LEFT JOIN TABLE on an INT column would drop some rows (no emission) even though matching entries exist in the TABLE, and how to resolve this?