Currently, clang-uml processes all translation units for a single diagram sequentially, so generating a single large diagram can take a very long time (although several diagrams can be generated in parallel). There should be an option to generate a single diagram in parallel.
Instead of making the intermediate diagram model re-entrant, we could generate a separate small model from each translation unit (TU) and then combine them at the end into a single diagram (e.g., by implementing operator+()).
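
To make the idea more concrete, here is a rough sketch of what the per-TU merge could look like. None of these names come from the clang-uml code base: `partial_model`, `generate_in_parallel` and the `visit` callback are hypothetical stand-ins, and the real diagram model is certainly richer than a map of elements plus a set of relationships. The sketch also assumes each TU can be visited independently and that elements can be deduplicated by a stable id such as the fully qualified name.

```cpp
#include <cassert>
#include <future>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a small per-TU diagram model
// (not clang-uml's actual model type).
struct partial_model {
    // Element id (e.g. fully qualified name) -> display name.
    std::map<std::string, std::string> elements;
    // Directed relationships between element ids (from, to).
    std::set<std::pair<std::string, std::string>> relationships;
};

// Combine two partial models. An element that appears in several TUs
// (e.g. declared in a shared header) is kept once; on an id collision
// the entry already present in lhs wins.
partial_model operator+(partial_model lhs, const partial_model &rhs)
{
    lhs.elements.insert(rhs.elements.begin(), rhs.elements.end());
    lhs.relationships.insert(
        rhs.relationships.begin(), rhs.relationships.end());
    return lhs;
}

// Build one partial model per TU concurrently, then merge them on the
// calling thread. `visit` is a hypothetical callable that takes a TU
// path and returns a partial_model; it must not touch shared state.
template <typename Visitor>
partial_model generate_in_parallel(
    const std::vector<std::string> &translation_units, Visitor visit)
{
    std::vector<std::future<partial_model>> futures;
    futures.reserve(translation_units.size());
    for (const auto &tu : translation_units)
        futures.push_back(std::async(std::launch::async, visit, tu));

    // Merging in TU order keeps the result independent of which
    // task happens to finish first.
    partial_model result;
    for (auto &f : futures)
        result = std::move(result) + f.get();
    return result;
}
```

Since only the final merge touches the combined model, the model itself would not need any locking. This sketch starts one task per TU for brevity; for large compilation databases a bounded thread pool would probably be the better choice.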