@satyap (Contributor) commented Jul 16, 2019

Package batchwriter provides an aide for dynamodb's BatchWriteItem function.
This aide wraps the complexities of building the batch and retrying unprocessed items,
at the cost of being able to only do 1 table at a time.

Use NewBatchWriter() to get a BatchWriter object, call SetTableName(), and then
use the object's Add() method to add as many DynamoDB items as you want.

Tell Add() whether it's a PutRequest or a DeleteRequest, and pass either the item to be put
or the Key of the item to be deleted. Either way, pass a map[string]*dynamodb.AttributeValue{}.

Includes a README update.

@theherk (Contributor) commented Jul 18, 2019

I'm not as sure about this one. The functionality seems great. However, I'm not sure it shouldn't just be a Batch within the dynamodb package. The "er" in Writer implies it is an interface, and in Go, Writer means a specific interface. What are your thoughts on these minor complaints?

Also, since the other merge there are conflicts that need to be resolved.

@satyap force-pushed the feature/dynamo-batchwriter branch from 7a35009 to fa690fb on July 19, 2019 23:47
@satyap changed the title from "Dynamodb BatchWriter aide" to "Dynamodb Batchr aide" on Jul 20, 2019
@satyap changed the title from "Dynamodb Batchr aide" to "Dynamodb Batch aide" on Jul 20, 2019
@satyap (Contributor, Author) commented Jul 20, 2019

> I'm not sure it shouldn't just be a Batch within the dynamodb package.

Valid point, fixed. Thanks!

@theherk (Contributor) left a comment

A few notes on testing. First, you are providing tests, and that makes this more awesome than anything that was here before, so don't take this as criticism, but rather as general feedback on the testing I'd like to see in our repositories.

Tests should be in a package whose name ends in "_test". This is not required by the language, but is a best practice in groups where only public / exported methods are to be tested; and who would want to be part of a group that disagrees with that? :) Then you just import your package and test it the same way. This enforces that only exported things are used. The testing documentation doesn't note that you can do this, but the go test command documentation does.

Also, it would be better to name the tests based on the guidelines in the testing package documentation:

func Example() { ... }
func ExampleF() { ... }
func ExampleT() { ... }
func ExampleT_M() { ... }

So, following the same pattern: TestBatch, TestBatch_Add, and so on, with cases for each expected input/output code path. Don't hesitate to use table-driven tests and run through cases rather than writing a test function for each case.
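A minimal sketch of the table-driven style suggested above. The `Batch` type here is a hypothetical stand-in, not the code from this PR, and the field and function names are assumptions chosen for illustration. In a real `_test.go` file the loop body would sit inside `func TestBatch_Add(t *testing.T)`, typically wrapped in `t.Run(tc.name, ...)`.

```go
package main

import "fmt"

// Batch is a hypothetical stand-in for the type under review: it buffers
// items and counts a "send" each time the buffer reaches limit.
type Batch struct {
	limit   int
	pending []string
	sends   int
}

// Add buffers one item and records a send when the buffer is full.
func (b *Batch) Add(item string) {
	b.pending = append(b.pending, item)
	if len(b.pending) >= b.limit {
		b.pending = b.pending[:0]
		b.sends++
	}
}

// addCases is the test table: one row per input/output code path.
var addCases = []struct {
	name        string
	limit, adds int
	wantPending int
	wantSends   int
}{
	{name: "below limit", limit: 3, adds: 2, wantPending: 2, wantSends: 0},
	{name: "exactly at limit", limit: 3, adds: 3, wantPending: 0, wantSends: 1},
	{name: "one past limit", limit: 3, adds: 4, wantPending: 1, wantSends: 1},
	{name: "two full batches", limit: 2, adds: 4, wantPending: 0, wantSends: 2},
}

// runAddCases walks the table and returns a message for every failing row.
func runAddCases() []string {
	var failures []string
	for _, tc := range addCases {
		b := &Batch{limit: tc.limit}
		for i := 0; i < tc.adds; i++ {
			b.Add("item")
		}
		if len(b.pending) != tc.wantPending || b.sends != tc.wantSends {
			failures = append(failures, fmt.Sprintf("%s: pending=%d sends=%d",
				tc.name, len(b.pending), b.sends))
		}
	}
	return failures
}

func main() {
	failures := runAddCases()
	for _, f := range failures {
		fmt.Println("FAIL", f)
	}
	fmt.Printf("%d cases, %d failures\n", len(addCases), len(failures))
}
```

Adding a new code path then means adding one row to the table rather than one more test function.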

request.PutRequest = &dynamodb.PutRequest{Item: item}
case DeleteRequest:
request.DeleteRequest = &dynamodb.DeleteRequest{Key: item}
}
@theherk (Contributor):

requestType is baffling me. First, why not use iota to define an enum if going this route? You could also do it the way the http package does methods, where the constants are strings rather than numbers; either works. But why not just have explicit methods, AddPut / AddDelete, and then have a private add take a WriteRequest? This works as is, no doubt, but it seems awkward. This will not block, but I am curious.
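A sketch of the explicit-method shape described above: exported `AddPut` and `AddDelete` that both funnel into a private `add`. The types below are simplified local stand-ins for the aws-sdk-go `dynamodb` types with the same names (the real `AttributeValue` has more fields), so this shows the structure only and is not the PR's code.

```go
package main

import "fmt"

// Simplified stand-ins for the aws-sdk-go dynamodb types of the same names.
type AttributeValue struct{ S string }

type PutRequest struct{ Item map[string]*AttributeValue }

type DeleteRequest struct{ Key map[string]*AttributeValue }

type WriteRequest struct {
	PutRequest    *PutRequest
	DeleteRequest *DeleteRequest
}

// Batch collects write requests for a single table.
type Batch struct {
	requests []*WriteRequest
}

// AddPut queues an item to be written.
func (b *Batch) AddPut(item map[string]*AttributeValue) {
	b.add(&WriteRequest{PutRequest: &PutRequest{Item: item}})
}

// AddDelete queues the key of an item to be deleted.
func (b *Batch) AddDelete(key map[string]*AttributeValue) {
	b.add(&WriteRequest{DeleteRequest: &DeleteRequest{Key: key}})
}

// add is the single private entry point; callers never pick a request type.
func (b *Batch) add(r *WriteRequest) {
	b.requests = append(b.requests, r)
}

func main() {
	b := &Batch{}
	b.AddPut(map[string]*AttributeValue{"id": {S: "1"}})
	b.AddDelete(map[string]*AttributeValue{"id": {S: "2"}})
	fmt.Println("queued requests:", len(b.requests))
}
```

With this shape the request type is encoded in the method name, so there is no enum for a caller to get wrong.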

b.requests = append(b.requests, request)
if len(b.requests) >= b.requestLimit {
return b.Send()
}
@theherk (Contributor):

This seems counterintuitive. The method is called Add, but it sends the request. You should always require that Send be called before anything is sent. It is odd to send automatically but then require an explicit call for the remainder. Of course, when Send is called it should send the requests in batch sizes as expected, including the remainder.

@satyap (Contributor, Author):

My intent here is that the user of this aide sits in a loop, Add()ing items as it goes, and after the loop is done, does a Send() to "finalize".

Meanwhile, the aide is building up dynamo batches and sending them to AWS when each batch has enough items.

Sort of like how one might print a bunch of lines... which get buffered to the screen and actually displayed whenever. But one might also use an explicit flush to display whatever is there.

I can see why that might be confusing. I'll take suggestions on better names.
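The buffered-output analogy can be seen in the standard library's `bufio.Writer`, where writes accumulate in memory, spill to the underlying writer once the buffer fills, and `Flush` pushes out whatever remains. The sketch below uses a deliberately tiny buffer to make that visible; the helper name `bufferedDemo` and the sizes are arbitrary choices for illustration.

```go
package main

import (
	"bufio"
	"bytes"
	"fmt"
)

// bufferedDemo writes three 4-byte lines through a bufio.Writer with an
// 8-byte buffer. It reports how many bytes had reached the sink before the
// explicit Flush and how many are there afterwards.
func bufferedDemo() (beforeFlush, afterFlush int, err error) {
	var sink bytes.Buffer
	w := bufio.NewWriterSize(&sink, 8)

	for _, line := range []string{"aaa\n", "bbb\n", "ccc\n"} {
		if _, err := w.WriteString(line); err != nil {
			return 0, 0, err
		}
	}
	beforeFlush = sink.Len() // some bytes were written automatically

	if err := w.Flush(); err != nil { // explicit "finalize" step
		return 0, 0, err
	}
	return beforeFlush, sink.Len(), nil
}

func main() {
	before, after, err := bufferedDemo()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("bytes in sink before Flush: %d, after Flush: %d\n", before, after)
}
```

The naming in `bufio` (Write plus Flush) is one precedent for an API that sends automatically when full and still needs a final explicit call.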

}
b.sleepSeconds = 0
delete(b.bwInput.RequestItems, b.table)
return out.ConsumedCapacity, nil
@theherk (Contributor):

Won't this just return the latest consumed capacity rather than the sum consumed by all the recursive calls to BatchWriteItem?

@satyap (Contributor, Author):

Yep, probably. This was a nice-to-have, though. I guess I'll accumulate it instead and return the total.
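One way to accumulate the total is to replace the recursion with a loop that resends the unprocessed remainder and adds up the capacity reported by each call. This is a sketch under simplifying assumptions: `writeResult`, `sender`, and `fakeSender` are hypothetical stand-ins for the SDK call and its output, with capacity reduced to a single float64.

```go
package main

import "fmt"

// writeResult is a hypothetical stand-in for the SDK's batch-write output.
type writeResult struct {
	unprocessed []string // items the service did not process
	capacity    float64  // capacity units consumed by this call
}

// sender abstracts one batch-write call so the loop can be exercised
// without a network connection.
type sender func(items []string) (writeResult, error)

// sendAll keeps resending the unprocessed remainder and returns the capacity
// summed over every call, not just the last one. A real implementation would
// also bound the number of attempts and wait between them.
func sendAll(send sender, items []string) (float64, int, error) {
	total, calls := 0.0, 0
	for len(items) > 0 {
		res, err := send(items)
		calls++
		if err != nil {
			return total, calls, err
		}
		total += res.capacity
		items = res.unprocessed
	}
	return total, calls, nil
}

// fakeSender processes at most max items per call and charges one capacity
// unit per processed item.
func fakeSender(max int) sender {
	return func(items []string) (writeResult, error) {
		n := len(items)
		if n > max {
			n = max
		}
		return writeResult{unprocessed: items[n:], capacity: float64(n)}, nil
	}
}

func main() {
	total, calls, err := sendAll(fakeSender(2), []string{"a", "b", "c", "d", "e"})
	fmt.Println("total capacity:", total, "calls:", calls, "err:", err)
}
```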

if len(b.requests) > 0 {
b.sleepSeconds++
time.Sleep(time.Duration(b.sleepSeconds))
return b.Send()
@theherk (Contributor):

Why is there a linear backoff here? Why not let the service do the throttling?

@satyap (Contributor, Author):

Does the Go AWS SDK do that? If so, TIL.
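Separately from who should do the throttling, note that `time.Duration` counts nanoseconds, so `time.Duration(b.sleepSeconds)` in the excerpt above sleeps for a few nanoseconds rather than seconds. If a client-side wait is kept, a capped exponential delay is a common alternative to a linear one. The sketch below shows both points; `baseDelay`, `maxDelay`, and the function names are illustrative assumptions, not values from this PR or from the SDK.

```go
package main

import (
	"fmt"
	"time"
)

// Illustrative values only; tune for the real workload.
const (
	baseDelay = 100 * time.Millisecond
	maxDelay  = 5 * time.Second
)

// seconds converts a count of seconds into a time.Duration.
// time.Duration(n) on its own would mean n nanoseconds.
func seconds(n int) time.Duration {
	return time.Duration(n) * time.Second
}

// backoffDelay returns base doubled once per attempt, capped at limit.
func backoffDelay(attempt int, base, limit time.Duration) time.Duration {
	d := base
	for i := 0; i < attempt; i++ {
		d *= 2
		if d >= limit {
			return limit
		}
	}
	if d > limit {
		return limit
	}
	return d
}

func main() {
	fmt.Println("time.Duration(3):", time.Duration(3))
	fmt.Println("seconds(3):", seconds(3))
	for attempt := 0; attempt < 8; attempt++ {
		fmt.Println("attempt", attempt, "delay", backoffDelay(attempt, baseDelay, maxDelay))
	}
}
```

Production code would usually add random jitter to each delay so that many clients do not retry in lockstep.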

}

// Send the batch, if there's anything in it
func (b *Batch) Send() ([]*dynamodb.ConsumedCapacity, error) {
@theherk (Contributor):

You could probably do this whole thing inside a loop, for len(b.requests) > 0, and it would be clearer. In any event, what do you do when requests are appended while the batch is being processed? They would be overwritten by the unprocessed remainder. You probably need a mutex of some sort.

@satyap (Contributor, Author):

I wasn't making it thread-safe... but you're right it's not threadsafe/re-entrant/whatever. Hmm. I'm afraid of that kind of programming :(
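Guarding the request slice with a mutex need not be elaborate. In the sketch below, `Add` appends under the lock, and a hypothetical `take` helper swaps the pending slice out under the same lock so that sending can work on its own copy while new items keep arriving. The `Batch` type is a simplified stand-in using strings instead of write requests.

```go
package main

import (
	"fmt"
	"sync"
)

// Batch is a simplified stand-in whose pending slice is guarded by mu.
type Batch struct {
	mu       sync.Mutex
	requests []string
}

// Add appends one request; safe to call from multiple goroutines.
func (b *Batch) Add(item string) {
	b.mu.Lock()
	defer b.mu.Unlock()
	b.requests = append(b.requests, item)
}

// take removes and returns everything queued so far. A Send method could
// call this first and then talk to the service without holding the lock,
// so concurrent Adds are neither blocked nor overwritten.
func (b *Batch) take() []string {
	b.mu.Lock()
	defer b.mu.Unlock()
	pending := b.requests
	b.requests = nil
	return pending
}

// addConcurrently starts workers goroutines that each add perWorker items,
// and waits for all of them to finish.
func addConcurrently(b *Batch, workers, perWorker int) {
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := 0; i < perWorker; i++ {
				b.Add("item")
			}
		}()
	}
	wg.Wait()
}

func main() {
	b := &Batch{}
	addConcurrently(b, 8, 100)
	fmt.Println("pending after concurrent adds:", len(b.take()))
	fmt.Println("pending after take:", len(b.take()))
}
```

Running a program like this with `go run -race` is a quick way to check that no unsynchronized access remains.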

@theherk (Contributor) commented Aug 20, 2019

> > I'm not sure it shouldn't just be a Batch within the dynamodb package.
>
> Valid point, fixed. Thanks!

Also, it seems you just moved this package to a directory within the dynamodb directory. That isn't what I meant. It is still a package called batch. I meant Batch should be a type defined in the package dynamodb, perhaps in a file "batch.go", but not as its own package.

@theherk (Contributor) commented Sep 24, 2019

I lost track of this, but I'm going to get back on it very soon. In the meantime, are you running this from a fork? I'd hate to think this hung you up indefinitely.

@satyap (Contributor, Author) commented Jul 21, 2020

> I meant Batch should be a type defined in the package dynamodb, perhaps in a file "batch.go", but not as its own package.

I understand this better now.

> In the meantime, are you running this from a fork? I'd hate to think this hung you up indefinitely.

Sort of, I just have the same code in my codebase.

I'll be revisiting this sometime soon.

@satyap changed the title from "Dynamodb Batch aide" to "WIP: Dynamodb Batch aide" on Jul 21, 2020