REv3: Reduce asymmetry between O(n) output files and O(1) output directories #281

I've noticed that people sometimes craft rules (e.g., packaging rules) that have a very large number of output files. This is problematic because it makes the ActionResult *very* large, sometimes so large that it can't be sent back to the client. ActionResults are not stored in the CAS, so clients can't download them in a streaming manner. Instead, they are returned as part of the gRPC response, which can generally only be up to 4 MB in size. To prevent Buildbarn from generating ActionResults that are this big, I generally tell my users to configure their clusters to only allow processing of Command messages up to 1-2 MB. As the size of an ActionResult is generally proportional to that of its Command (1-2x as big), that tends to work.
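To get a feel for where the limit bites, here is a back-of-the-envelope size model. It is only a sketch: the 4 MB figure is the limit mentioned above, while the hash length and the per-entry overhead are assumptions made for illustration, not measurements of real ActionResult messages.

```python
# Rough size model for the output_files portion of an ActionResult.
# All constants except the 4 MB limit are illustrative assumptions.
GRPC_MAX_MESSAGE_BYTES = 4 * 1024 * 1024


def estimate_output_files_bytes(num_outputs, avg_path_len,
                                hash_hex_len=64, per_entry_overhead=16):
    """Approximate bytes needed to list num_outputs output files.

    Each entry is modeled as the path, a hex-encoded hash, and a lump sum
    for protobuf framing and the size field (per_entry_overhead).
    """
    return num_outputs * (avg_path_len + hash_hex_len + per_entry_overhead)


def max_outputs_within_limit(avg_path_len, limit=GRPC_MAX_MESSAGE_BYTES,
                             hash_hex_len=64, per_entry_overhead=16):
    """Largest number of output files that still fits under the limit."""
    return limit // (avg_path_len + hash_hex_len + per_entry_overhead)


if __name__ == "__main__":
    for n in (10_000, 30_000):
        size = estimate_output_files_bytes(n, avg_path_len=100)
        print(n, size, size <= GRPC_MAX_MESSAGE_BYTES)
```

Under these assumptions, an action with paths averaging 100 bytes runs out of room somewhere in the low tens of thousands of output files.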

A solution I often give my users when they hit these limits is to use output directories instead of plain output files. In that case the ActionResult remains small: it only contains a small number of `output_directories` entries holding references to Tree objects, which can be streamed from the CAS. There are a couple of additional advantages:

- If all paths share long pathname prefixes, Tree objects can be more compact than an explicit listing of all pathnames, yielding a net reduction in network traffic.
- If a repeated invocation of an action yields the same output files, you end up with two large and mostly identical ActionResults. With output directories, both ActionResults share the same Tree object, so a client that is somewhat smart about caching results consumes less traffic during incremental builds.
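The first point can be illustrated by counting only the bytes spent on names. This is a simplified sketch: the two helper functions are hypothetical, and real Tree objects also carry digests and protobuf framing per entry, so the actual savings depend on the shape of the output tree.

```python
def flat_path_bytes(paths):
    """Bytes spent on names when every full path is listed explicitly."""
    return sum(len(p.encode()) for p in paths)


def tree_name_bytes(paths):
    """Bytes spent on names when each directory or file name is stored once,
    under its parent, as in a directory hierarchy."""
    seen = set()
    total = 0
    for path in paths:
        parts = path.split("/")
        for i, name in enumerate(parts):
            node = tuple(parts[: i + 1])  # identifies this node by its full path
            if node not in seen:
                seen.add(node)
                total += len(name.encode())
    return total


if __name__ == "__main__":
    # Hypothetical packaging rule with many outputs under one long prefix.
    paths = [f"some/long/shared/prefix/file{i}.o" for i in range(1000)]
    print(flat_path_bytes(paths), tree_name_bytes(paths))
```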

Though I fully understand where the asymmetry comes from, I do think it's hard to sell to our users. Why is there a difference? From their perspective it's 'tomato tomato'.

I think that as part of REv3 we should investigate whether we can reduce the noticeable differences between using O(n) output files and O(1) output directories. For example, what if every action returns exactly 1 directory hierarchy of outputs, and Command's `output_paths` merely acts as a filter for what needs to be captured as part of that directory hierarchy?
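To make the idea concrete, here is a minimal sketch of `output_paths` acting as a filter over a single output hierarchy. Everything in it is hypothetical: directories are modeled as nested dicts, files as placeholder digest strings, and `filter_outputs` is an illustrative name, not part of any existing API.

```python
import copy


def lookup(root, parts):
    """Return the node at the given path components, or None if absent."""
    node = root
    for part in parts:
        if not isinstance(node, dict) or part not in node:
            return None
        node = node[part]
    return node


def filter_outputs(root, output_paths):
    """Capture only the requested files or directories into one hierarchy."""
    captured = {}
    for path in output_paths:
        parts = path.split("/")
        node = lookup(root, parts)
        if node is None:
            continue  # the action did not produce this path
        dest = captured
        for part in parts[:-1]:
            dest = dest.setdefault(part, {})
        dest[parts[-1]] = copy.deepcopy(node)
    return captured


if __name__ == "__main__":
    # Whatever the action left behind in its working directory.
    root = {
        "out": {"pkg": {"a.o": "digest1", "b.o": "digest2"}, "log.txt": "digest3"},
        "tmp": {"scratch": "digest4"},
    }
    print(filter_outputs(root, ["out/pkg"]))
```

In this model, whether a requested path turns out to be a file or a directory makes no difference to the shape of the result: the client always gets back one hierarchy.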

(Relatedly, is there anything we can do to reduce the size of Command's `output_paths` by preventing repetition of leading pathnames?)
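One possible direction, shown purely as an illustration, is front coding: sort the paths and store each one as the number of leading characters it shares with its predecessor plus the remaining suffix. This is an assumption about what such an encoding could look like, not something the current protocol defines.

```python
import os


def front_encode(paths):
    """Encode sorted paths as (shared_prefix_length, suffix) pairs."""
    encoded = []
    prev = ""
    for path in sorted(paths):
        # commonprefix compares character by character, not per path component.
        shared = len(os.path.commonprefix([prev, path]))
        encoded.append((shared, path[shared:]))
        prev = path
    return encoded


def front_decode(encoded):
    """Reverse front_encode, returning the paths in sorted order."""
    paths = []
    prev = ""
    for shared, suffix in encoded:
        path = prev[:shared] + suffix
        paths.append(path)
        prev = path
    return paths


if __name__ == "__main__":
    paths = ["pkg/archive/a.o", "pkg/archive/b.o", "pkg/docs/readme"]
    print(front_encode(paths))
```

A nested, directory-like representation of the paths would achieve a similar effect and would line up more naturally with the single-hierarchy idea above.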
