set deduplicate default to false #16

@sboettcher

IMO the default for deduplication should be set to false:

  • The sorting done during deduplication is currently based on the string representation of the whole record.
    The first major field that changes from record to record is currently value.time, a Unix timestamp formatted like 1.521206142E9 (int 1521206142). With string ordering, a record with a timestamp of 1.52120614E9 (int 1521206140) is sorted after that one, although it should come before it.
  • Deleting records at this point in the pipeline, which is what deduplication does, is IMO not a good idea. This tool provides the raw data that is saved long-term in our storage, and if records are deleted here it might be very hard to get them back later.
  • Records from certain devices might seem "duplicate" but are actually separate records. This can happen if both timestamps (device and Android) are the same for two consecutive records. An example would be a device that only provides second-precision timestamps, where multiple samples arrive at a DeviceManager at the same time and are looped over such that the Android timestamp does not change from one iteration to the next. (I have confirmed this can happen for the Biovotion device at least.)
  • Edit: Another problem with sorting this way is that if both the device and Android timestamps are the same for multiple records, these records get sorted by the sensor value...
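
To make the sorting points above concrete, here is a minimal Python sketch. It is not the tool's actual code: the comma-separated record layout (time, timeReceived, value) and the sample values are assumptions made up for illustration. It only shows how ordering by the whole-record string differs from ordering by the parsed timestamp.

```python
# Hypothetical records rendered as strings: time, timeReceived, value.
# The layout and the numbers are assumptions for illustration only.
records = [
    "1.521206142E9,1.521206143E9,0.5",  # int 1521206142
    "1.52120614E9,1.521206143E9,0.7",   # int 1521206140 (earlier)
]

# String ordering compares character by character. After the common
# prefix "1.52120614" it compares '2' with 'E', and '2' sorts first,
# so the later record ends up before the earlier one.
by_string = sorted(records)

# Ordering by the parsed timestamp gives chronological order.
by_time = sorted(records, key=lambda r: float(r.split(",")[0]))

# Records with identical timestamps, listed in arrival order.
arrival = [
    "1.52120614E9,1.52120614E9,0.9",
    "1.52120614E9,1.52120614E9,0.1",
]

# Whole-record string ordering falls through to the sensor value.
tie_by_string = sorted(arrival)

# Python's sort is stable, so a timestamp-only key keeps arrival order.
tie_by_time = sorted(arrival, key=lambda r: float(r.split(",")[0]))

print(by_string[0].split(",")[0])   # timestamp that string ordering puts first
print(by_time[0].split(",")[0])     # timestamp that numeric ordering puts first
print(tie_by_string == arrival)     # whether arrival order survived
print(tie_by_time == arrival)
```

In this sketch the string ordering puts the 1521206142 record first, while the numeric key puts 1521206140 first, and only the numeric key preserves the arrival order of records with equal timestamps.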
