This repository was archived by the owner on Oct 22, 2021. It is now read-only.

Conversation

@chgeuer

@chgeuer chgeuer commented Mar 15, 2019

…g to the same blob, and updated Azure/Go SDK.

  • When uploading a blob, the upload now contains a Content-MD5.
  • Uploads are only done if necessary (see the sketch below), i.e.
    • if the blob doesn't yet exist, or
    • if it has a different size, or
    • if it has the same size, but the lack of a Content-MD5 makes it impossible to tell whether the existing blob has the same contents, or
    • if it has a Content-MD5, but the local / to-be-uploaded contents differ from the existing blob.
  • When uploading to a certain location, we first upload to a temporary name and then copy within the storage account to the proper location.
    • This copy operation is protected by an Azure blob lease, to prevent two concurrent copy operations.
  • Each block blob upload is also protected against accidental modifications with an MD5.
  • The SDK has been upgraded from a really old one to the current Azure Go SDK (which supports context.Context), to allow subsequent parallelization of uploads.
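
A minimal Go sketch of that skip logic, for illustration; the `remoteBlobInfo` type and its field names are hypothetical stand-ins, not the SDK's property types:

```go
package main

import (
	"bytes"
	"crypto/md5"
	"io/ioutil"
	"log"
)

// remoteBlobInfo is a hypothetical summary of the existing blob's
// properties; it is not an SDK type.
type remoteBlobInfo struct {
	Exists     bool
	SizeBytes  int64
	ContentMD5 []byte // empty if the blob was stored without a Content-MD5
}

// needsUpload mirrors the decision above: upload only if the blob is
// missing, differs in size, has no MD5 to compare against, or has
// different contents.
func needsUpload(localPath string, remote remoteBlobInfo) (bool, error) {
	data, err := ioutil.ReadFile(localPath)
	if err != nil {
		return false, err
	}
	switch {
	case !remote.Exists:
		return true, nil
	case remote.SizeBytes != int64(len(data)):
		return true, nil
	case len(remote.ContentMD5) == 0:
		// Same size, but the existing blob has no Content-MD5, so we
		// cannot prove the contents match; upload again.
		return true, nil
	}
	localMD5 := md5.Sum(data)
	return !bytes.Equal(localMD5[:], remote.ContentMD5), nil
}

func main() {
	// "payload.bin" is a placeholder path for illustration.
	upload, err := needsUpload("payload.bin", remoteBlobInfo{Exists: false})
	if err != nil {
		log.Fatal(err)
	}
	log.Printf("upload needed: %v", upload)
}
```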

@cf-gitbot

We have created an issue in Pivotal Tracker to manage this:

https://www.pivotaltracker.com/story/show/164656890

The labels on this github issue will be updated when the story is started.

@cfdreddbot

❌ Hey chgeuer!

All pull request submitters and commit authors must have a Contributor License Agreement (CLA). Click here for details on the CLA process.

The following github user @chgeuer is not covered by a CLA.

After the CLA process is complete, this pull request will need to be closed & reopened. DreddBot will then validate the CLA(s).

@chgeuer
Author

chgeuer commented Mar 15, 2019

I guess Microsoft should have a CLA in place; I need to check... I published my potential memberships...

@chgeuer chgeuer closed this Mar 15, 2019
@chgeuer chgeuer reopened this Mar 15, 2019
@cf-gitbot

We have created an issue in Pivotal Tracker to manage this:

https://www.pivotaltracker.com/story/show/164664843

The labels on this github issue will be updated when the story is started.

@cfdreddbot

✅ Hey chgeuer! The commit authors and yourself have already signed the CLA.

@chgeuer
Author

chgeuer commented Mar 18, 2019

Hi @smoser-ibm, @petergtz mentioned he'd be OOF for the next 2 months... Is there any chance you could have a look at this PR?

Hi @suhlig, is that something you could help with?

@petergtz
Member

Hi @chgeuer ,

Thanks for all your work. Your changes obviously do a lot more than just fix the issues we’ve seen with SAP’s Azure envs. This is very welcome, because we were aware of the room for optimization here, but simply weren’t able to address it due to bandwidth constraints.

Unfortunately, it also makes the PR more difficult to merge, because we cannot merge the portions that are uncontroversial while iterating on the portions that need more discussion. That was the reason I couldn't merge it before I went on leave. I think what would make sense here is to split those portions apart and submit them as separate PRs. We're happy to merge them then.

What we discussed via Slack was: avoid conflicting concurrent uploads via an “upload to temp path, then copy and delete”. That's what I did in my trivial commit 0422b42, ahead of your PR submission. My change is trivial, but it should address exactly the problems that we saw in SAP's Azure envs.
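
For illustration, a minimal sketch of that “upload to temp path, then copy and delete” flow; the `blobstore` interface below is an illustrative stand-in, not the actual bits-service or Azure SDK API:

```go
package upload

import (
	"fmt"
	"io"
	"time"
)

// blobstore is an illustrative stand-in for the real blobstore client.
type blobstore interface {
	Put(path string, contents io.Reader) error
	Copy(src, dest string) error
	Delete(path string) error
}

// putViaTempPath sketches "upload to temp path, then copy and delete":
// concurrent uploads never write to the same blob, and whichever copy
// runs last wins.
func putViaTempPath(store blobstore, destPath string, contents io.Reader) error {
	tempPath := fmt.Sprintf("%s.upload-%d", destPath, time.Now().UnixNano())

	// 1. Upload under a temporary name, unique per upload attempt.
	if err := store.Put(tempPath, contents); err != nil {
		return err
	}
	// 2. Server-side copy to the final name (last writer wins).
	if err := store.Copy(tempPath, destPath); err != nil {
		return err
	}
	// 3. Remove the temporary blob.
	return store.Delete(tempPath)
}
```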

Assuming the above fixes the uploads, we can work on optimizations and improvements that you started:

  • Size comparison, MD5 sum comparison, etc., to avoid unnecessary uploads
  • Concurrent block uploads to further optimize upload times
  • SDK update

Each should go into a separate PR as explained above.

Further comments in regards to your changes:

  • The leases introduced for the copy operation will break the logic: if multiple concurrent uploads to the same blob take place, one of them will now fail with an error, most likely the later one. As discussed with you, this will not work for our use case. We need a last-writer-wins strategy.
  • Your code fails when reads from local files don't return all bytes (see the documentation for https://golang.org/pkg/os/#File.Read, which explains that a single Read call doesn't necessarily return all bytes); see the sketch below.
  • The new version of the Put function is now long enough that it should probably be refactored. Ideally, the refactoring would make it possible to unit-test parts of the logic.
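
To illustrate the second point: reading a known number of bytes needs a read loop, and io.ReadFull already wraps that loop. A minimal sketch (the `readExactly` helper is illustrative, not code from this PR):

```go
package upload

import (
	"fmt"
	"io"
	"os"
)

// readExactly shows why a single f.Read(buf) is not enough: Read may
// return fewer bytes than len(buf), so the reads have to be accumulated.
// io.ReadFull does exactly that.
func readExactly(path string, n int) ([]byte, error) {
	f, err := os.Open(path)
	if err != nil {
		return nil, err
	}
	defer f.Close()

	buf := make([]byte, n)
	if _, err := io.ReadFull(f, buf); err != nil {
		return nil, fmt.Errorf("short read from %s: %v", path, err)
	}
	return buf, nil
}
```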

I hope my comments are helpful, and I'm happy to discuss this further with you. Please keep in mind, though, that we're still limited in the bandwidth we have to support and merge changes at this point in time. So please be patient with us, as we cannot make any promises as to when we can actually merge things.

Thanks,
Peter

@jochenehret

Hi everyone,

we have validated this PR on CF v8.1.0 with the corresponding Cloud Foundry Acceptance Tests (CATS). All tests that are applicable to our CF deployment are green. The Bits-Service logs are more informative now.

As we are still interested in getting the PR merged, could you please consider refactoring and merging the fix?

Thanks and Best Regards,

@FloThinksPi and Jochen.

@chgeuer
Author

chgeuer commented May 13, 2019

Hey @jochenehret, many thanks for running the CATS test suite; that's great. Thanks for the effort.

@petergtz
Member

Hi @jochenehret, hi @FloThinksPi,

Thanks for your help here. Note that CATS alone are not sufficient to prove that the code works: we run CATS multiple times a day with Azure storage, and still we never saw the errors that Michael from SAP saw in their large Azure landscape.

@petergtz
Member

Hi @chgeuer, I'm finally able to follow up on this. Before diving too deep into the details of this change, could you please restate the problem that you are trying to solve with this PR?
