Skip to content

STREAM LEFT JOIN TABLE drops a subset of events (rows not emitted) #10790

@Sh4dow998

Description

@Sh4dow998

Environment

  • ksqlDB: 0.29.0 (confluentinc/ksqldb-server:0.29.0, confluentinc/ksqldb-cli:0.29.0)
  • Kafka: Confluent Platform 7.9.1 (cp-kafka:7.9.1, cp-zookeeper:7.9.1)
  • Kafka Connect: debezium/connect:2.7.3.Final (MariaDB)
  • Deployment: Docker, bridge network
  • Serdes: KEY_FORMAT='KAFKA', VALUE_FORMAT='JSON'
  • All topics in this repro: PARTITIONS=1

Context

We enrich an event stream with:

  • a TABLE mapping event_type (INT) to a label (STRING),
  • a TABLE mapping device_type (INT) to a label (STRING),
  • two dimension TABLEs for users and organizations (join on uuid).

With the TABLE join on event_type, some events never appear in the output stream at all. There are no NULL labels; the rows are simply not emitted. If we replace the TABLE join with either a CASE mapping, all events are emitted as expected.

Minimal reproduction

Session settings:

SET 'auto.offset.reset'='earliest';
SET 'ksql.streams.cache.max.bytes.buffering'='0';

Mapping TABLE:

CREATE TABLE EVENT_TYPE_MAP (
  event_type INT PRIMARY KEY,
  event_type_label VARCHAR
) WITH (
  KAFKA_TOPIC='vw_event_type_map',
  KEY_FORMAT='KAFKA',
  VALUE_FORMAT='JSON',
  PARTITIONS=1
);

INSERT INTO EVENT_TYPE_MAP VALUES (1000, 'UserLoggedIn');
INSERT INTO EVENT_TYPE_MAP VALUES (1100, 'CipherCreated');

Source STREAM and two events

CREATE STREAM VW_EVENT (
  uuid STRING KEY,
  event_type INT
) WITH (
  KAFKA_TOPIC='vw_event',
  KEY_FORMAT='KAFKA',
  VALUE_FORMAT='JSON',
  PARTITIONS=1
);

Enriched output with LEFT JOIN (problematic path):

CREATE STREAM VW_EVENT_ENRICHED WITH (
  KAFKA_TOPIC='vw_enriched',
  VALUE_FORMAT='JSON'
) AS
SELECT
  ev.uuid,
  ev.event_type,
  m.event_type_label
FROM VW_EVENT ev
LEFT JOIN EVENT_TYPE_MAP m
  ON ev.event_type = m.event_type
EMIT CHANGES;

Observation:

SELECT uuid, event_type, event_type_label
FROM VW_EVENT_ENRICHED
EMIT CHANGES LIMIT 2;

Actual: only one of the two input events appears in vw_enriched (the 1000 case). The 1100 event is not emitted at all.
Expected: both rows should be emitted (1000 → 'UserLoggedIn', 1100 → 'CipherCreated').

  • Troubleshooting already done
  • Ensured only one query writes to the output (SHOW QUERIES).
  • Dropped and recreated all objects and topics (TERMINATE, DROP ... DELETE TOPIC).
  • Set auto.offset.reset='earliest' before creating objects.
  • Verified the mapping TABLE is populated:
    • PRINT 'vw_event_type_map' FROM BEGINNING shows the inserted rows.
    • SELECT * FROM EVENT_TYPE_MAP EMIT CHANGES LIMIT 2; returns the rows.
  • Tried explicit casts in the join (CAST(ev.event_type AS INT) = CAST(m.event_type AS INT)).
  • No WHERE clause filtering the result.
  • The same pattern with TABLE joins on uuid STRING for users/orgs works correctly; the problem seems specific to the INT mapping TABLE join.

Question

Could you help explain why a STREAM LEFT JOIN TABLE on an INT column would drop some rows (no emission) even though matching entries exist in the TABLE, and how to resolve this?

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions