-
Notifications
You must be signed in to change notification settings - Fork 3.7k
fix(tsm1): fix condition check for optimization of array cursor #26691
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
The array ascending cursors do not check all the required conditions before doing an optimization to copy a full 1000 points of data from the tsm read buffer to the cursors response buffer. This can result in points from tsm files appearing in cursor output multiple times. The conditions the bug appears are rare as it requires cache data that is older (or overlapping) with tsm data. The data must be in the same shard and must match for the same key. * fixes #26690
if c.tsm.pos < len(tvals.Timestamps) { | ||
if pos == 0 && len(c.res.Timestamps) >= len(tvals.Timestamps) { | ||
if pos == 0 && c.tsm.pos == 0 && len(c.res.Timestamps) >= len(tvals.Timestamps) { | ||
// optimization: all points can be served from TSM data because |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure how much of an optimization this is anymore. I would expect (maybe) that go doesn't reslice (create a new slice header) if copy(dst[0:], src[0:])
is used, but i haven't looked yet.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
copy calls into runtime.memmove. No slice headers are made but setup is done to get the arguments to memmove to point to the correct place in each backing array. memmove itself handles overlap per the spec.
Looks like this optimization avoids some setup to memmove at the code of the extra conditionals. Probably worthwhile to remove the "optimization" to have less code to debug and maintain. A microbench mark might show some speed differences.
This commit adds a test that verifies the duplicate data bug fix. It will fail without the previous commit. See the test for details; in short it sets up the right cache and tsm data to expose the bug which was occuring when the cache had older data than the tsm files and then the cache was exhausted while there were still more tsm values to include in more Next() calls.
kc := fs.KeyCursor(context.Background(), []byte("measurement,field=value#!~#value"), 0, true) | ||
defer kc.Close() | ||
|
||
cursor := newFloatArrayAscendingCursor() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the test only looks at float array cursors but all types are impacted. They are made from a template so i made the assumption testing one type also covers the other datatypes.
// and incorrectly copies TSM data including the points from the previous Next() call. Additionally, cache data | ||
// needs to be older than most of the TSM data. There needs to be enough tsm data younger than the cache data to completely | ||
// fill the remaining space in the response buffer. | ||
func TestAscendingCursorDuplicateDataBug(t *testing.T) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AI helped me make this test more robust; but it also made it rather long with the setup and precondition verifications.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This would be cleaner with test helpers from
"github.com/stretchr/testify/assert"
"github.com/stretchr/testify/require"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd recommend using modern test helpers, like require.NoError
instead of Fatalf
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done.
The assert package is used for post conditions and the require package is used for pre-conditions in the test. The library provides clearer intent of conditions and better formatting of information (message and data) in the situation an assertion fails. It also makes it easier to read by reducing ifs but some lines are longer. This commit addresses PR requests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
* fix(tsm1): fix condition check for optimization of array cursor The array ascending cursors do not check all the required conditions before doing an optimization to copy a full 1000 points of data from the tsm read buffer to the cursors response buffer. This can result in points from tsm files appearing in cursor output multiple times. The conditions the bug appears are rare as it requires cache data that is older (or overlapping) with tsm data. The data must be in the same shard and must match for the same key. * fixes #26690 * chore(tsm1): add test that covers array cursor bug This commit adds a test that verifies the duplicate data bug fix. It will fail without the previous commit. See the test for details; in short it sets up the right cache and tsm data to expose the bug which was occuring when the cache had older data than the tsm files and then the cache was exhausted while there were still more tsm values to include in more Next() calls. * chore(test): switch to using the testify library in the new test The assert package is used for post conditions and the require package is used for pre-conditions in the test. The library provides clearer intent of conditions and better formatting of information (message and data) in the situation an assertion fails. It also makes it easier to read by reducing ifs but some lines are longer. This commit addresses PR requests. (cherry picked from commit 9d1e79d)
…) (#26861) Co-authored-by: Phil Bracikowski <[email protected]> fixes #26690
The array ascending cursors do not check all the required conditions before doing an optimization to copy a full 1000 points of data from the tsm read buffer to the cursors response buffer. This can result in points from tsm files appearing in cursor output multiple times.
The conditions the bug appears are rare as it requires cache data that is older (or overlapping) with tsm data. The data must be in the same shard and must match for the same key.