Upgrade TF dependency to v2.5. #628
Conversation
load("@org_tensorflow//tensorflow:workspace2.bzl", "tf_workspace2") | ||
|
||
apple_support_dependencies() | ||
tf_workspace2() | ||
|
||
load("@upb//bazel:repository_defs.bzl", "bazel_version_repository") | ||
load("@org_tensorflow//tensorflow:workspace1.bzl", "tf_workspace1") | ||
|
||
bazel_version_repository(name = "bazel_version") | ||
tf_workspace1() | ||
|
||
load("@org_tensorflow//third_party/googleapis:repository_rules.bzl", "config_googleapis") | ||
load("@org_tensorflow//tensorflow:workspace0.bzl", "tf_workspace0") | ||
|
||
config_googleapis() | ||
tf_workspace0() |
The workspace file is much simplified, which is great. TF made some changes upstream to allow downstream projects to easily import the entire TF workspace config, which is what we're doing here. This means we no longer need to register our own versions of dependency repos, for example. See IREE for another example of this pattern.
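For anyone following along, the end-to-end pattern in a downstream WORKSPACE boils down to something like the sketch below. The http_archive pin here is a placeholder, not the exact archive and checksum this PR uses:

    load("@bazel_tools//tools/build_defs/repo:http.bzl", "http_archive")

    # Placeholder pin: the real WORKSPACE pins a specific TF archive and sha256.
    http_archive(
        name = "org_tensorflow",
        sha256 = "<sha256-of-the-tf-archive>",
        strip_prefix = "tensorflow-2.5.0-rc2",
        urls = ["https://github.com/tensorflow/tensorflow/archive/v2.5.0-rc2.tar.gz"],
    )

    # Each tf_workspaceN() defines repositories that the next load() depends
    # on, so the stages must run in this order, each after its own load().
    load("@org_tensorflow//tensorflow:workspace2.bzl", "tf_workspace2")
    tf_workspace2()

    load("@org_tensorflow//tensorflow:workspace1.bzl", "tf_workspace1")
    tf_workspace1()

    load("@org_tensorflow//tensorflow:workspace0.bzl", "tf_workspace0")
    tf_workspace0()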
This is very cool, thanks for updating. Not sure what the CI failures are about, but we can take a look at it together later.
.bazelrc (outdated)

    # On Windows, we still link everything into a single DLL.
    build:windows --config=monolithic

    # On Linux, we dynamically link a small number of kernels.
    build:linux --config=dynamic_kernels
Does this work for our wheel builds?
I haven't checked, but once the tests pass I'll do a test release.
Nice, looks like we are getting close!
Yeah, just building locally so that I can debug these properly, but fingers crossed there's not too far left to go :)
This looks great! I only have a few minor comments and questions.
And the cherry-pick is now merged, so I've updated the TF dependency and am running the test release again: https://github.com/larq/compute-engine/runs/2498846429?check_suite_focus=true. Fingers crossed the Windows builds now pass.
Hmm, sadly the Windows builds still don't pass; they get roughly 90% of the way there. I'll try and investigate other options for speeding things up.
What if we add another bazel cache for these builds? Then they might fail a first time but should succeed a second time, and it would also just be faster in general. It might have to be a separate cache from the normal CI tests, since the build configuration is completely different and we don't want to invalidate the normal cache (although maybe bazel already takes care of this properly by itself).
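Concretely, that could be a separate config in .bazelrc along these lines; the config name and cache endpoint are placeholders, not anything this repo actually defines:

    # Hypothetical cache config for the release/wheel builds, kept separate
    # from the cache used by the normal CI tests so the two build
    # configurations don't evict each other's results.
    build:release-cache --remote_cache=https://example-cache-endpoint
    build:release-cache --remote_upload_local_results=true

The release workflow would then pass --config=release-cache only on the wheel-build jobs.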
I spent a while this evening running a Windows build on my home machine, and didn't learn much, except that I saw lots of MKL targets. It turns out that MKL is built on Windows indiscriminately, even if you try to disable it (also on Linux, for that matter): https://github.com/tensorflow/tensorflow/blob/5dcfc51118817f27fad5246812d83e5dccdc5f72/third_party/mkl/build_defs.bzl#L37. This might explain why the macOS builds are comparatively much quicker.
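Paraphrasing the linked if_mkl from memory (not a verbatim copy, so double-check against the pinned commit), the select takes the MKL branch unconditionally on x86 Linux and on Windows:

    # Rough paraphrase of third_party/mkl/build_defs.bzl: the if_true branch
    # is selected on linux_x86_64 and windows regardless of any MKL opt-out.
    def if_mkl(if_true, if_false = []):
        return select({
            "@org_tensorflow//tensorflow:linux_x86_64": if_true,
            "@org_tensorflow//tensorflow:windows": if_true,
            "//conditions:default": if_false,
        })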
Thanks for investigating! These logs also show up on CI, and unfortunately 9922add makes the build crash :( Do you think we can use the Windows equivalent of …?
Ah yeah, I know; it turns out that TF is just completely incompatible with …. I think …
Great! I just have one minor comment, other than that this looks good.
    @@ -58,12 +58,23 @@ pybind11::bytes ConvertGraphDefToTFLiteFlatBuffer(
        throw std::runtime_error("Invalid target.");
      }

      // `ParseInputArrayInfo` requires a type that isn't pybind compatible, so
      // translate here.
      std::vector<llvm::Optional<std::vector<int>>> translated_input_shapes;
Won't pybind recognise this type if we change the signature of ConvertGraphDefToTFLiteFlatBuffer to accept std::vector<llvm::Optional<std::vector<int>>>?
I don't think so, no. I tried that previously and got an error. I think that makes sense though, because pybind has no way of knowing how to construct an element with type llvm::Optional.
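For context, the workaround is presumably a straightforward element-wise copy. A minimal sketch, with an illustrative function name rather than the PR's exact code:

    #include <optional>
    #include <vector>

    #include "llvm/ADT/Optional.h"

    // pybind11 (via pybind11/stl.h) can marshal std::optional from Python,
    // but it has no caster for llvm::Optional, so we convert element-wise
    // before handing the shapes to `ParseInputArrayInfo`.
    std::vector<llvm::Optional<std::vector<int>>> TranslateInputShapes(
        const std::vector<std::optional<std::vector<int>>>& input_shapes) {
      std::vector<llvm::Optional<std::vector<int>>> translated;
      translated.reserve(input_shapes.size());
      for (const auto& shape : input_shapes) {
        if (shape.has_value()) {
          translated.push_back(*shape);
        } else {
          translated.push_back(llvm::None);
        }
      }
      return translated;
    }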
The test release looks good: https://github.com/larq/compute-engine/actions/runs/857230232. Actually only one build timed out; 3/4 of the Windows ones succeeded. But as discussed above, let's resolve the Windows build time issues in a future PR.
throw std::runtime_error("Could not complete conversion passes."); | ||
} | ||
|
||
TruncateOpOrArgLocNameMapper op_or_arg_name_mapper; |
@AdamHillier Do we not need this truncation anymore, or is it now handled within the conversion function? We added it in df24c2b; I'm not sure if we still want to keep this. Sorry for not spotting this earlier.
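For reference, a hedged sketch of what such a truncating mapper might look like, assuming TF's OpOrArgLocNameMapper exposes GetName as its virtual hook; the 150-character cap is illustrative, not necessarily the value df24c2b used:

    #include <string>

    #include "tensorflow/compiler/mlir/op_or_arg_name_mapper.h"

    // Sketch only: caps generated op/arg location names so that very long
    // fused locations don't bloat the emitted flatbuffer.
    class TruncateOpOrArgLocNameMapper : public tensorflow::OpOrArgLocNameMapper {
     protected:
      std::string GetName(tensorflow::OpOrVal op_or_val) override {
        std::string name = OpOrArgLocNameMapper::GetName(op_or_val);
        constexpr size_t kMaxNameLength = 150;  // Illustrative limit.
        if (name.size() > kMaxNameLength) name.resize(kMaxNameLength);
        return name;
      }
    };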
Ah sorry, I missed that too :p Thanks for the PR :)
What do these changes do?
This PR updates the TensorFlow dependency to the recently released v2.5-rc2 (previously v2.5-rc0).

How Has This Been Tested?
CI.
Benchmark Results
N/A, though we should collect new benchmark numbers before making a new release.
Related issue number
Depends on #627; this will be a draft until then.