-
Notifications
You must be signed in to change notification settings - Fork 9
WIP: Dynamodb Batch aide #21
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
|
I'm not as sure about this one. The functionality seems great. However, I'm not sure it shouldn't just be a Also, since the other merge there are conflicts that need to be resolved. |
7a35009 to
fa690fb
Compare
Valid point, fixed. Thanks! |
theherk
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few notes on testing. First, you are providing tests, and that makes it more awesome than anything that was here before, so don't take this as criticism, but more general feedback on testing I'd like to see in our repositories.
Tests should be in a package that ends in "_test". This is not required by the language, but is a best practice in groups where only public / exported methods are to be tested; and who would want to be part of a group that disagrees with that? :) Then you just import your package and test the same way. This enforces that only exported things are used. The testing documentation doesn't note that you can do this, but the go test command documentation does. See here
Also, it would be better to name the tests based on the guidelines in the testing package documentation
func Example() { ... }
func ExampleF() { ... }
func ExampleT() { ... }
func ExampleT_M() { ... }
So TestBatch, TestBatch_Add with cases for each expected input/output code path, etc. Don't hesitate to use test tables and run through cases rather than writing a test function for each case.
| request.PutRequest = &dynamodb.PutRequest{Item: item} | ||
| case DeleteRequest: | ||
| request.DeleteRequest = &dynamodb.DeleteRequest{Key: item} | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
requestType is baffling me. First, why not use an iota as an enum if going this route? You could also do it how the http package does methods, where they alias strings rather than numbers, so either work. But why not just have a method that is explicit, AddPut / AddDelete, then have private add take a WriteRequest? This works as is no doubt, but it seems awkward. This will not block, but I am curious.
| b.requests = append(b.requests, request) | ||
| if len(b.requests) >= b.requestLimit { | ||
| return b.Send() | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems counter intuitive. The method is called Add, but it sends the request. This behavior is odd. You should always require Send be called prior to anything being sent. It is odd behavior to do this automatically but then require an explicit call for the remainder. Of course, when Send is called it should send them in batch sizes as expected, including the remainder.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My intent here is that the user of this aide sits in a loop, Add()ing items as it goes, and after the loop is done, does a Send() to "finalize".
Meanwhile, the aide is building up dynamo batches and sending them to AWS when each batch has enough items.
Sort of like how one might print a bunch of lines... which get buffered to the screen and actually displayed whenever. But one might also use an explicit flush to display whatever is there.
I can see why that might be confusing. I'll take suggestions on better names.
| } | ||
| b.sleepSeconds = 0 | ||
| delete(b.bwInput.RequestItems, b.table) | ||
| return out.ConsumedCapacity, nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Won't this just return the latest consumed capacity rather than the sum consumed by all the recursive calls to BatchWriteItem?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, probably. This was a nice-to-have, though. I guess I'll accumulate it instead and return the total.
| if len(b.requests) > 0 { | ||
| b.sleepSeconds++ | ||
| time.Sleep(time.Duration(b.sleepSeconds)) | ||
| return b.Send() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is there a linear backoff here? Why not let the service do the throttling?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does the Go AWS SDK do that? If so, TIL.
| } | ||
|
|
||
| // Send the batch, if there's anything in it | ||
| func (b *Batch) Send() ([]*dynamodb.ConsumedCapacity, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could probably do this whole thing inside for requests > 0, and it would be more clear. In any event, what do you do when requests are appended while the batch is being processed. They would be overwritten by the unprocessed remainder. You probably need a mutex of some sort.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wasn't making it thread-safe... but you're right it's not threadsafe/re-entrant/whatever. Hmm. I'm afraid of that kind of programming :(
Also, it seems you just moved this package to a directory within the dynamodb directory. That isn't what I meant. It is still a package called batch. I meant |
|
I lost track of this, but I'm going to get back on it very soon. In the meantime, are you running this from a fork? I'd hate to think this hung you up indefinitely. |
I understand this better now.
Sort of, I just have the same code in my codebase. I'll be revisiting this sometime soon. |
Package batchwriter provides an aide for dynamodb's BatchWriteItem function.
This aide wraps the complexities of building the batch and retrying unprocessed items,
at the cost of being able to only do 1 table at a time.
Use NewBatchWriter() to get a BatchWriter object. SetTableName(), and then
use the object's Add() method to add as many dynamodb items as you want.
Tell Add() whether it's a PutRequest or a DeleteRequest, and pass either the item to be put
or the Key of the item to be deleted. Either way, pass a map[string]*dynamodb.AttributeValue{}
Includes a README update.