Upgrade TF dependency to v2.5. #628

AdamHillier · 2021-04-10T19:16:19Z

What do these changes do?

This PR updates the TensorFlow dependency to the recently released ~~v2.5-rc0~~ v2.5-rc2.

How Has This Been Tested?

CI.

Benchmark Results

N/A, though we should collect new benchmark numbers before making a new release.

Related issue number

~~Depends on #627, will be a draft until then.~~

.bazelrc

AdamHillier · 2021-04-10T19:20:40Z

WORKSPACE

+load("@org_tensorflow//tensorflow:workspace2.bzl", "tf_workspace2")

-apple_support_dependencies()
+tf_workspace2()

-load("@upb//bazel:repository_defs.bzl", "bazel_version_repository")
+load("@org_tensorflow//tensorflow:workspace1.bzl", "tf_workspace1")

-bazel_version_repository(name = "bazel_version")
+tf_workspace1()

-load("@org_tensorflow//third_party/googleapis:repository_rules.bzl", "config_googleapis")
+load("@org_tensorflow//tensorflow:workspace0.bzl", "tf_workspace0")

-config_googleapis()
+tf_workspace0()


The workspace file is much simplified, which is great. TF made some changes upstream to allow downstream projects to easily import the entire TF workspace config, which is what we're doing here. This means we no longer need to register our own versions of dependency repos, for example. See e.g. IREE for another example of this.

larq_compute_engine/mlir/BUILD

third_party/arm_compiler.BUILD

lgeiger

This is very cool, thanks for updating. Not sure what the CI failures are about, but we can take a look at it together later.

.bazelrc

lgeiger · 2021-04-12T09:01:37Z

.bazelrc


 # On windows, we still link everything into a single DLL.
 build:windows --config=monolithic

+# On linux, we dynamically link small amount of kernels
+build:linux --config=dynamic_kernels


Does this work with our wheel builds?

AdamHillier · 2021-04-12

I haven't checked, but once the tests pass I'll do a test release.

.bazelrc

Co-authored-by: Lukas Geiger <[email protected]>

lgeiger · 2021-04-13T15:53:28Z

Nice, looks like we are getting closer:

larq_compute_engine/mlir/python/graphdef_tfl_flatbuffer.cc:7:10: fatal error: mlir/IR/Module.h: No such file or directory
    7 | #include "mlir/IR/Module.h"
      |          ^~~~~~~~~~~~~~~~~~
compilation terminated.

AdamHillier · 2021-04-13T15:55:16Z

> Nice, looks like we are getting closer:

Yeah, just building locally so that I can debug these properly, but fingers crossed not too far left to go :)

lgeiger

This looks great! I only have a few minor comments and questions.

.bazelrc

larq_compute_engine/mlir/python/graphdef_tfl_flatbuffer.cc

larq_compute_engine/tflite/benchmark/BUILD

AdamHillier · 2021-05-04T08:51:02Z

tensorflow/tensorflow#48525 just got merged, which is nice, so I'll open a PR to cherry-pick that onto the 2.5 branch, and if that goes smoothly it might help with the Windows builds.

And the cherry-pick is now merged, so I've updated the TF dependency and am running the test release again: https://github.com/larq/compute-engine/runs/2498846429?check_suite_focus=true. Fingers crossed the Windows builds now pass.

AdamHillier · 2021-05-04T14:33:08Z

Hmm sadly the Windows builds still don't pass, they get roughly 90% of the way there: 9,017 / 9,710 targets within the six hour limit.

I'll try and investigate other options for speeding things up.

.github/workflows/release.yml

Tombana · 2021-05-04T17:27:29Z

> Hmm sadly the Windows builds still don't pass, they get roughly 90% of the way there: 9,017 / 9,710 targets within the six hour limit.
>
> I'll try and investigate other options for speeding things up.

What if we add another Bazel cache for these builds? Then they might fail the first time but should succeed the second time, and it would also just be faster in general. It might have to be a separate cache from the normal CI tests, since the build configuration is completely different and we don't want to invalidate the normal cache (although maybe Bazel already takes care of this properly by itself).

AdamHillier · 2021-05-04T18:37:21Z

I spent a while this evening running a Windows build on my home machine and didn't learn much, except that I saw lots of MKL targets. It turns out that MKL is built on Windows indiscriminately, even if you try to disable it (also on Linux, for that matter): https://github.com/tensorflow/tensorflow/blob/5dcfc51118817f27fad5246812d83e5dccdc5f72/third_party/mkl/build_defs.bzl#L37. This might explain why macOS builds are relatively a lot quicker.

lgeiger · 2021-05-04T21:08:07Z

Thanks for investigating! These logs also show up on CI and unfortunately 9922add makes the build crash :(

Do you think we can use the Windows equivalent of -O2 for the builds?

AdamHillier · 2021-05-04T21:10:54Z

> Thanks for investigating! These logs also show up on CI and unfortunately 9922add makes the build crash :(
>
> Do you think we can use the Windows equivalent of -O2 for the builds?

Ah yeah I know, it turns out that TF is just completely incompatible with --config=fastbuild. I tried adding -DNDEBUG but that didn't help, tested on my local machine.

I think /O2 is probably what's already being used by Bazel with --config=opt, and in MSVC there is no /O3: https://docs.microsoft.com/en-us/cpp/build/reference/compiler-options-listed-by-category?view=msvc-160#optimization

WORKSPACE

lgeiger

Great! I just have one minor comment, other than that this looks good.

lgeiger · 2021-05-19T14:26:09Z

larq_compute_engine/mlir/python/graphdef_tfl_flatbuffer.cc

@@ -58,12 +58,23 @@ pybind11::bytes ConvertGraphDefToTFLiteFlatBuffer(
    throw std::runtime_error("Invalid target.");
  }

+  // `ParseInputArrayInfo` requires a type that isn't pybind compatible, so
+  // translate here.
+  std::vector<llvm::Optional<std::vector<int>>> translated_input_shapes;


Won't pybind recognise this type if we change the signature of ConvertGraphDefToTFLiteFlatBuffer to accept std::vector<llvm::Optional<std::vector<int>>>?

AdamHillier · 2021-05-19

I don't think so, no. I tried that previously and got an error. That makes sense, though, because pybind has no way of knowing how to construct an element of type llvm::Optional.
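For readers following along, here is a minimal sketch of the kind of translation step discussed above. It is not the code from this PR. It uses `std::optional` (C++17) as a stand-in for `llvm::Optional` so that it compiles without LLVM headers. The helper name `TranslateInputShapes` is hypothetical, and so is the convention that an empty inner vector means "no shape provided". That convention cannot distinguish a scalar from an unknown shape, so treat it purely as an illustration.

```cpp
#include <cassert>
#include <optional>
#include <vector>

// Stand-in for llvm::Optional (assumption: used only so this sketch builds
// without LLVM; the real code would use the LLVM type directly).
template <typename T>
using Optional = std::optional<T>;

// Hypothetical helper: takes shapes in a binding-friendly form (plain nested
// vectors) and converts them into a vector of optional shapes.
// Assumed convention: an empty inner vector means "shape not provided".
std::vector<Optional<std::vector<int>>> TranslateInputShapes(
    const std::vector<std::vector<int>>& input_shapes) {
  std::vector<Optional<std::vector<int>>> translated;
  translated.reserve(input_shapes.size());
  for (const auto& shape : input_shapes) {
    if (shape.empty()) {
      translated.push_back(std::nullopt);  // No shape given for this input.
    } else {
      translated.push_back(shape);  // Copy the concrete shape across.
    }
  }
  // One output entry per input entry, in the same order.
  assert(translated.size() == input_shapes.size());
  return translated;
}
```

Keeping the Python-facing signature on plain `std::vector<std::vector<int>>` means the binding layer only sees container types it can already convert, and the optional-typed vector exists only inside the C++ function.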

AdamHillier · 2021-05-19T23:51:37Z

The test release looks good: https://github.com/larq/compute-engine/actions/runs/857230232

Actually only one build timed out, 3/4 of the Windows ones succeeded. But as discussed above let's resolve the Windows build time issues later in a future PR.

lgeiger · 2021-05-20T15:27:00Z

larq_compute_engine/mlir/python/graphdef_tfl_flatbuffer.cc

-    throw std::runtime_error("Could not complete conversion passes.");
-  }
-
-  TruncateOpOrArgLocNameMapper op_or_arg_name_mapper;


@AdamHillier Do we not need this truncation anymore, or is it now handled within the conversion function?

We added it in df24c2b, not sure if we still want to keep this.

Sorry for not spotting this earlier.

AdamHillier · 2021-05-21

Ah sorry, I missed that too :p Thanks for the PR :)

AdamHillier commented Apr 10, 2021

lgeiger approved these changes Apr 12, 2021

AdamHillier added 3 commits April 12, 2021 10:51

Upgrade TF dependency to v2.5-rc0.

282caa7

Allow up to 10 Mb of logging.

3db553d

Use TF definition of manylinux toolchain.

4b05ed4

AdamHillier force-pushed the tf-2.5 branch from 5500ad6 to 4b05ed4 Compare April 12, 2021 09:51

CNugteren reviewed Apr 12, 2021

.bazelrc (outdated, resolved)

Use embedded linux toolchains, plus some review suggestions.

83e6b75

lgeiger reviewed Apr 12, 2021

.bazelrc (outdated, resolved)

AdamHillier and others added 5 commits April 12, 2021 19:44

Fix typo in .bazelrc

86931d5

Co-authored-by: Lukas Geiger <[email protected]>

Partly fix MLIR issues.

3cc7ecd

Disable framework_shared_object for all LCE builds and tests.

4648903

Fix MLIR errors in prepare-tf.

7cdc087

Remove TF_SYSTEM_LIBS and fix typo.

ad9b9ed

Fix end2end test errors.

ebb2ba4

lgeiger mentioned this pull request Apr 13, 2021

⬆️ [email protected] #632

Merged

⬆️ [email protected] (#632)

6aac46f

lgeiger changed the title ~~Upgrade TF dependency to v2.5-rc0.~~ Upgrade TF dependency to v2.5-rc1. Apr 14, 2021

AdamHillier added 2 commits April 14, 2021 10:53

Add Python 3.9 to tests/releases.

cf72fdb

Merge remote-tracking branch 'upstream/master' into tf-2.5

b10e6d0

lgeiger reviewed Apr 14, 2021

.bazelrc (outdated, resolved)

.bazelrc (outdated, resolved)

larq_compute_engine/mlir/python/graphdef_tfl_flatbuffer.cc (resolved)

larq_compute_engine/mlir/python/graphdef_tfl_flatbuffer.cc (resolved)

AdamHillier added 6 commits April 14, 2021 16:52

Split TF options out of .bazelrc.

ef87a22

Apply full static linking only to the benchmarker target.

1ac22b8

Temporarily re-enable GCP/AWS on Windows.

8c146a6

Hopefully fix Windows config 🤞

7c6599a

Use nightly-custom-op-ubuntu16 Docker image for ManyLinux builds.

99a780c

Don't build 3.9 wheels on Linux, bump Windows pagefile size.

4cf2ad3

Tombana reviewed Apr 15, 2021

larq_compute_engine/tflite/benchmark/BUILD (resolved)

lgeiger force-pushed the tf-2.5 branch from c65a852 to 04dce42 Compare April 27, 2021 12:35

lgeiger added 2 commits April 27, 2021 17:52

Fix CI

9d56ea5

⬆️ [email protected]

842cd6f

lgeiger changed the title ~~Upgrade TF dependency to v2.5-rc1.~~ Upgrade TF dependency to v2.5-rc2. Apr 28, 2021

AdamHillier added 2 commits May 3, 2021 23:48

⬆️ TF version and disable GCP/AWS on Windows.

3222314

Remove conflicting Windows configs.

634d95c

AdamHillier changed the title ~~Upgrade TF dependency to v2.5-rc2.~~ Upgrade TF dependency to v2.5. May 4, 2021

Build with fastbuild on Windows.

9922add

lgeiger reviewed May 4, 2021

.github/workflows/release.yml (outdated, resolved)

⬆️ v2.5.0 and add patch for tensorflow/tensorflow#48546.

e5701ba

lgeiger reviewed May 19, 2021

WORKSPACE (resolved)

Disable fail-fast for release matrix jobs.

09fc53f

AdamHillier mentioned this pull request May 19, 2021

Enable TensorFlow mobile build config #638

Closed

Small changes.

a6a0f50

lgeiger approved these changes May 19, 2021

AdamHillier and others added 2 commits May 19, 2021 15:40

Don't statically link benchmark binary on Android.

a7aa08e

Update passes to match TF 2.5 (#643)

b5ec1b1

AdamHillier marked this pull request as ready for review May 19, 2021 16:21

AdamHillier merged commit 3cb3e4f into master May 20, 2021

AdamHillier deleted the tf-2.5 branch May 20, 2021 09:19

lgeiger added the dependencies label (Pull requests that update a dependency file) May 20, 2021

lgeiger reviewed May 20, 2021

lgeiger mentioned this pull request May 21, 2021

Fork ConvertTFExecutorToTFLOrFlatbuffer to re-enable argname truncation #649

Merged
