Contributor

@AdamHillier AdamHillier commented Apr 10, 2021

What do these changes do?

This PR updates the TensorFlow dependency to the recently released v2.5-rc2 (originally v2.5-rc0).

How Has This Been Tested?

CI.

Benchmark Results

N/A, though we should collect new benchmark numbers before making a new release.

Related issue number

Depends on #627, will be a draft until then.

Comment on lines +25 to +35
load("@org_tensorflow//tensorflow:workspace2.bzl", "tf_workspace2")

apple_support_dependencies()
tf_workspace2()

load("@upb//bazel:repository_defs.bzl", "bazel_version_repository")
load("@org_tensorflow//tensorflow:workspace1.bzl", "tf_workspace1")

bazel_version_repository(name = "bazel_version")
tf_workspace1()

load("@org_tensorflow//third_party/googleapis:repository_rules.bzl", "config_googleapis")
load("@org_tensorflow//tensorflow:workspace0.bzl", "tf_workspace0")

config_googleapis()
tf_workspace0()
Contributor Author

The workspace file is much simplified, which is great. TF made some changes upstream to allow downstream projects to easily import the entire TF workspace config, which is what we're doing here. This means we no longer need to register our own versions of dependency repos, for example. See e.g. IREE for another example of this.

Member

@lgeiger lgeiger left a comment

This is very cool, thanks for updating. Not sure what the CI failures are about, but we can take a look at it together later.

.bazelrc Outdated

# On windows, we still link everything into a single DLL.
build:windows --config=monolithic

# On linux, we dynamically link small amount of kernels
build:linux --config=dynamic_kernels
Member

Does this work for our wheel builds?

Contributor Author

I haven't checked, but once the tests pass I'll do a test release.

@lgeiger
Member

lgeiger commented Apr 13, 2021

Nice, looks like we are getting closer:

larq_compute_engine/mlir/python/graphdef_tfl_flatbuffer.cc:7:10: fatal error: mlir/IR/Module.h: No such file or directory
    7 | #include "mlir/IR/Module.h"
      |          ^~~~~~~~~~~~~~~~~~
compilation terminated.

@AdamHillier
Contributor Author

> Nice, looks like we are getting closer:

Yeah, just building locally so that I can debug these properly, but fingers crossed not too far left to go :)

@lgeiger lgeiger changed the title from "Upgrade TF dependency to v2.5-rc0." to "Upgrade TF dependency to v2.5-rc1." Apr 14, 2021
Member

@lgeiger lgeiger left a comment

This looks great! I only have a few minor comments and questions.

@lgeiger lgeiger changed the title from "Upgrade TF dependency to v2.5-rc1." to "Upgrade TF dependency to v2.5-rc2." Apr 28, 2021
@AdamHillier
Contributor Author

tensorflow/tensorflow#48525 just got merged, which is nice, so I'll open a PR to cherry-pick that onto the 2.5 branch, and if that goes smoothly it might help with the Windows builds.

And the cherry-pick is now merged, so I've updated the TF dependency and am running the test release again: https://github.com/larq/compute-engine/runs/2498846429?check_suite_focus=true. Fingers crossed the Windows builds now pass.

@AdamHillier AdamHillier changed the title from "Upgrade TF dependency to v2.5-rc2." to "Upgrade TF dependency to v2.5." May 4, 2021
@AdamHillier
Contributor Author

Hmm, sadly the Windows builds still don't pass; they get roughly 90% of the way there (9,017 / 9,710 targets) within the six-hour limit.

I'll try and investigate other options for speeding things up.
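
For a rough sense of how far over the limit the build is, the arithmetic behind the numbers above can be sketched as follows. The helper names are hypothetical, and the time estimate assumes that every target costs about the same to build, which is almost certainly not true for a real Bazel build, so treat it as a back-of-the-envelope figure only.

```cpp
#include <cassert>

// Percentage of Bazel targets that were completed.
double PercentComplete(int built, int total) {
  return 100.0 * built / total;
}

// ASSUMPTION (not from the thread): targets complete at a uniform rate, so
// the total build time scales linearly with the number of targets.
double EstimatedTotalHours(int built, int total, double elapsed_hours) {
  return elapsed_hours * total / built;
}
```

With 9,017 of 9,710 targets built in six hours, `PercentComplete` gives about 92.9%, and under the uniform-rate assumption `EstimatedTotalHours` gives roughly 6.5 hours for a full build.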

@Tombana
Collaborator

Tombana commented May 4, 2021

> Hmm, sadly the Windows builds still don't pass; they get roughly 90% of the way there (9,017 / 9,710 targets) within the six-hour limit.
>
> I'll try and investigate other options for speeding things up.

What if we add another Bazel cache for these builds? Then they might fail the first time but should succeed the second time, and builds would also be faster in general. It might have to be a separate cache from the normal CI tests, since the build configuration is completely different and we don't want to invalidate the normal cache (although maybe Bazel already takes care of this properly by itself).

@AdamHillier
Contributor Author

I spent a while this evening running a Windows build on my home machine and didn't learn much, except that I saw lots of MKL targets. It turns out that MKL is built on Windows indiscriminately, even if you try to disable it (also on Linux, for that matter): https://github.com/tensorflow/tensorflow/blob/5dcfc51118817f27fad5246812d83e5dccdc5f72/third_party/mkl/build_defs.bzl#L37. This might explain why macOS builds are relatively a lot quicker.

@lgeiger
Member

lgeiger commented May 4, 2021

Thanks for investigating! These logs also show up on CI and unfortunately 9922add makes the build crash :(

Do you think we can use the Windows equivalent of -O2 for the builds?

@AdamHillier
Contributor Author

> Thanks for investigating! These logs also show up on CI and unfortunately 9922add makes the build crash :(
>
> Do you think we can use the Windows equivalent of -O2 for the builds?

Ah yeah I know, it turns out that TF is just completely incompatible with --config=fastbuild. I tried adding -DNDEBUG but that didn't help, tested on my local machine.

I think /O2 is probably what's already being used by Bazel with --config=opt, and in MSVC there is no /O3: https://docs.microsoft.com/en-us/cpp/build/reference/compiler-options-listed-by-category?view=msvc-160#optimization

Member

@lgeiger lgeiger left a comment

Great! I just have one minor comment, other than that this looks good.

@@ -58,12 +58,23 @@ pybind11::bytes ConvertGraphDefToTFLiteFlatBuffer(
throw std::runtime_error("Invalid target.");
}

// `ParseInputArrayInfo` requires a type that isn't pybind compatible, so
// translate here.
std::vector<llvm::Optional<std::vector<int>>> translated_input_shapes;
Member

Won't pybind recognise this type if we change the signature of ConvertGraphDefToTFLiteFlatBuffer to accept std::vector<llvm::Optional<std::vector<int>>>?

Contributor Author

I don't think so, no. I tried that previously and got an error. I think that makes sense though because pybind has no way of knowing how to construct an element with type llvm::Optional.
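
To illustrate the translation step being discussed, here is a minimal sketch of converting a pybind-friendly nested vector into an optional-wrapped form. It is not the code from this PR: `std::optional` stands in for `llvm::Optional` so the snippet compiles without LLVM headers, the helper name `TranslateInputShapes` is hypothetical, and the rule that an empty inner vector means "no shape given" is an assumption made for the example.

```cpp
#include <cassert>
#include <optional>
#include <vector>

// Stand-in for llvm::Optional<std::vector<int>>; std::optional is used only
// so that this sketch builds without LLVM.
using OptionalShape = std::optional<std::vector<int>>;

// Hypothetical helper. ASSUMPTION: an empty inner vector means the shape was
// not specified, so it maps to an empty optional; anything else is wrapped.
std::vector<OptionalShape> TranslateInputShapes(
    const std::vector<std::vector<int>>& input_shapes) {
  std::vector<OptionalShape> translated;
  translated.reserve(input_shapes.size());
  for (const auto& shape : input_shapes) {
    if (shape.empty()) {
      translated.push_back(std::nullopt);
    } else {
      translated.push_back(shape);
    }
  }
  return translated;
}
```

Keeping the binding signature on plain `std::vector<std::vector<int>>` means pybind only ever sees types it already knows how to convert, and the optional-wrapped type stays an internal detail of the C++ side.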

@AdamHillier AdamHillier marked this pull request as ready for review May 19, 2021 16:21
@AdamHillier
Contributor Author

The test release looks good: https://github.com/larq/compute-engine/actions/runs/857230232

Actually, only one build timed out; 3/4 of the Windows ones succeeded. But as discussed above, let's resolve the Windows build time issues in a future PR.

@AdamHillier AdamHillier merged commit 3cb3e4f into master May 20, 2021
@AdamHillier AdamHillier deleted the tf-2.5 branch May 20, 2021 09:19
@lgeiger lgeiger added the dependencies label May 20, 2021
throw std::runtime_error("Could not complete conversion passes.");
}

TruncateOpOrArgLocNameMapper op_or_arg_name_mapper;
Member

@AdamHillier Do we not need this truncation anymore, or is it now handled within the conversion function?

We added it in df24c2b, not sure if we still want to keep this.

Sorry for not spotting this earlier.

Contributor Author

Ah sorry, I missed that too :p Thanks for the PR :)
