Skip to content

RDBMS ingestion process can miss updates #128

@banderous

Description

@banderous

This bug concerns the RDBMS ingestion process as described in the official docs

The suggested approach is based on tracking row modification time in the sql_last_value of the jdbc plugin and using it to find any rows modified since the previous logstash query.

For this to be correct would require every changed row's modification_time to be >= the current max modification_time but this is not the case - a row's modification_time is the time the row was written and not the time the database transaction committed - which could be significantly later.

It is therefore possible for a row to be committed with a modification time lower than the current maximum modification time and never get indexed since it's modification_time will be < sql_last_value.

Steps to reproduce:

I was able to reproduce this by following the linked tutorial and using two different database connections (C1 & C2):

C1 - begin; INSERT INTO es_table (id, client_name) VALUES (4,"Missing!"); -- Never gets indexed

C2 - INSERT INTO es_table (id, client_name) VALUES (5,"latest"); -- Gets a higher modification_time, indexed
  -- wait for logstash query to run
C1 - commit;

The row inserted on C1 never gets indexed since when committed it has a lower modification_time than the row committed on C2.

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugSomething isn't working

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions