Closed
Description
IMO the default for deduplication should be set to false:
- The sorting that is done during deduplication is currently based on the string representation of a whole record.
The first major field in a record that changes per record is currently `value.time`, which is a unix timestamp in a format like `1.521206142E9` (int `1521206142`). In this example, a record with a timestamp of `1.52120614E9` (int `1521206140`) is sorted behind the previous one, but should be before it.
- Deleting records at this point in the pipeline is IMO not a good idea (which obviously happens when deduplicating). This tool provides the raw data that is saved long-term in our storages, and if records are deleted here it might be very hard to get them back later.
- Records from certain devices might seem "duplicate" but are actually separate records. If both stamps (device and android) for two consecutive records are the same this might happen. An example would be if the device only provides second precision timestamps, and multiple timestamps arrive at a DeviceManager at the same time and are looped over such that the android timestamp from one loop to the next does not change. (I have confirmed this can happen for the Biovotion device at least.)
- Edit: Another problem with sorting this way is that if both device and android timestamps are the same for multiple records, these records get sorted by the sensor value...
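
To make the points above concrete, here is a minimal sketch in Python. It is not the tool's actual code: the record strings, the `timeReceived` field name and the sensor values are made up for illustration, and only the two timestamps are taken from the example above. It assumes deduplication amounts to "collect unique record strings, then sort them".

```python
# Sketch only: record layout and values are hypothetical, not the real schema.

# 1) Timestamps as they appear in the string form of a record.
stamps = ["1.521206142E9", "1.52120614E9"]

string_order = sorted(stamps)              # lexicographic: '2' sorts before 'E'
numeric_order = sorted(stamps, key=float)  # chronological

print(string_order)   # ['1.521206142E9', '1.52120614E9']  (later time first)
print(numeric_order)  # ['1.52120614E9', '1.521206142E9']

# 2) Three samples, in arrival order, with identical device and android
#    stamps (e.g. a device with second precision). "timeReceived" is an
#    assumed name for the android stamp; the last field is the sensor value.
arrived = [
    "time=1.52120614E9,timeReceived=1.52120614E9,value=72",
    "time=1.52120614E9,timeReceived=1.52120614E9,value=72",
    "time=1.52120614E9,timeReceived=1.52120614E9,value=71",
]

# Deduplicate and sort on the whole record string.
deduplicated = sorted(set(arrived))

print(len(arrived), "->", len(deduplicated))  # 3 -> 2
print(deduplicated[0])  # the value=71 sample, although it arrived last
```

The second 72 sample is a separate measurement but is dropped as a "duplicate", and the remaining records end up ordered by sensor value rather than by arrival.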