Skip to content

Conversation

@rnett
Copy link
Contributor

@rnett rnett commented Mar 19, 2021

Extracts the op export part of #244 so that it can make its way into the snapshots.

Adds ~1.5MB to the 70MB native jars, so I don't think the size is an issue.

@karllessard can you add the CI build tag?

@rnett rnett mentioned this pull request Mar 19, 2021
@rnett
Copy link
Contributor Author

rnett commented Mar 19, 2021

The CI's having issues w/ framework's javadocs again.

@karllessard
Copy link
Collaborator

Thanks @rnett , interesting approach, let's give it a try with a full build

@rnett
Copy link
Contributor Author

rnett commented Mar 20, 2021

The build's failing b/c linux-mkl-gpu is running out of disk space.

@rnett
Copy link
Contributor Author

rnett commented Mar 20, 2021

I'm going to move the proto sources in this PR instead of #244, since we already have native builds going.

--api_dirs=$BAZEL_SRCS/external/org_tensorflow/tensorflow/core/api_def/base_api,src/bazel/api_def \
$TENSORFLOW_LIB > $GEN_RESOURCE_DIR/ops.pb

PROTO_DIR=src/gen/protoc
Copy link
Collaborator

@karllessard karllessard Mar 25, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why changing the directory where the generated Java protos are written? It was fine having .java source files under src/gen/java, no? Or was it conflicting somehow with something you're planning to do next?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I need to add it as a source directory for generator, which I can't do in src/gen/java since there's other stuff in there. There's no way to filter subfolders/packages when attaching it as a source dir.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok I see. So are we still heading for a two-phase build then?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You only need the OpDef and ApiDef proto, right? At this point, I'm wondering if we shouldn't just simply make a one-shot copy of the java protos we need to the generator folder. The protos are under TF stability guarantee and should remain backward-compatible if they happen to change (but we saw what can happen to that guarantee with string tensors...). Anyhow, for the sake of simplifying the whole architecture, it might be worth that the generator has its own hard-copies of these files.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't be two phases, you can run native generation + protos, then generation, then compile core. The generation will happen in build.sh, right after the proto copy.

What I'm doing is essentially a copy, and we can do it that way, I'm just using maven to attach to the source dir from both projects rather than copying it.

Copy link
Collaborator

@karllessard karllessard Mar 27, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

building generator (which depends on everything in core)

I was about to tell you that you a wrong because the C API bindings are hermetic to the rest of the core's code but now thinking of it, some of the abstract classes extends from those found under src/main/java/org/tensorflow/internal/c_api... so yeah, that can be a problem (do these abstract classes depend on anything else in the core though?).

As for the ApiDef importer, this is a secondary utility that should not influence the decisions we are taking for designing the main architecture.

Finally about the "missing ops" in some native libraries, I'm also not sure what is best to do with them. Which platform do we elect to be responsible for generating the op.pbtxt, isn't the problem the same?

Yeah, for the complexity of the Maven build I understand your point but I'm just trying to see if we can avoid crossing module boundaries here, which sounds even worst.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, for the complexity of the Maven build I understand your point but I'm just trying to see if we can avoid crossing module boundaries here, which sounds even worst.

Agreed, but the boundary crossing is only necessary if we don't copy the protos. It's easy enough to just make two copies, one in core and one in generator, like you suggested earlier. They'll both be in git anyways and have tensorflow stability gaurentees. I think that's much nicer than the build stuff.

do these abstract classes depend on anything else in the core though?

Not that I know of, but I meant it more in the "the IDE won't be smart enough to stop you from using Graph in the generator" sense. Regardless of whether we allow using the bindings or not that will be an issue, I think.

Finally about the "missing ops" in some native libraries, I'm also not sure what is best to do with them. Which platform do we elect to be responsible for generating the op.pbtxt, isn't the problem the same?

I generated the one we have on Linux, but I don't think that problem exists for the native exporter (although I can't get Windows native builds working to check). If it is a problem it's one that exists now, and since we haven't ran into anything I assumed things were fine. Most if not all (I didn't check) of the additional ops that were present in the loaded library started with _ and I think they were added at runtime. Things like _Arg or _XlaCompile that I think are supposed to be "private" ops anyways (they aren't in tensorflow's ops.pbtxt).

It would be good to check out ops.pb against tensorflow's for the op defs at least. It should be possible to do that in the generator and fail if they don't match, it's just a pretty big download. Maybe better to make a separate main for it.

Also, the whole reason core-api runs it's tests in integration-test instead of test is that the native libraries aren't available for test (I think), so I'm not actually sure if we could get them before compile.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's easy enough to just make two copies, one in core and one in generator, like you suggested earlier.

Ok I agree, let's go back with this idea and play simple.

Just to recapitulate and make sure we've understood each other: while the protos intensorflow-core-api are generated at build, I was suggesting doing only a static copy of the protos we need in the generator, presuming that they won't change (much) anyway.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 works for me. The protos used for exporting the ops from C++ are from the native tensorflow version, so worst case scenario we get errors during native build if something changes.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW, anything that gets executed after the JavaCPP plugin executions has access to its classes and all its resources, that is to say the native libraries, in other words, we have access to the C API from Java at that point. That's just how Maven works. Anything that gets outputted by a plugin automatically gets added to the class path of the following plugins to be executed.

Copy link
Collaborator

@karllessard karllessard left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @rnett , let's see how it goes with a full CI build but without waiting, I'll merge it now and adjust if needed

@karllessard karllessard merged commit b92525e into tensorflow:master Mar 28, 2021
@rnett rnett deleted the rn_gen_op_specs branch October 17, 2021 00:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants