I've noticed that syncing parentchain blocks on Incognitee Paseo (10MB shard state size) takes ages (only about 2-3 blocks per second).
This may also explain why sidechain block production starts with so little time budget. However, the added weight should only hit when parentchain blocks are imported, and the figure doesn't have enough resolution to show that difference. It could also just be the time spent in "load_for_mutation".

It looks like half of the block time is consumed before block production even starts.
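To get finer resolution than the figure gives, each phase that runs before block production could be timed separately. The following is only a minimal sketch using the standard library: `timed` and `remaining_budget` are hypothetical helpers, the 300 ms slot is a made-up value, and the two closures are placeholders, not the worker's actual state loading or parentchain import code.

```rust
use std::time::{Duration, Instant};

/// Runs `f` and returns its result together with the elapsed wall-clock time.
fn timed<T>(f: impl FnOnce() -> T) -> (T, Duration) {
    let start = Instant::now();
    let out = f();
    (out, start.elapsed())
}

/// Time left for block production once the setup phases are done.
/// Saturates at zero if setup already used up the whole slot.
fn remaining_budget(slot: Duration, setup: Duration) -> Duration {
    slot.saturating_sub(setup)
}

fn main() {
    // Assumption: illustrative slot length, not the real configuration.
    let slot = Duration::from_millis(300);

    // Placeholder for loading the shard state (here: allocating 10 MB).
    let (_state, load_time) = timed(|| vec![0u8; 10 * 1024 * 1024]);

    // Placeholder for importing pending parentchain blocks.
    let (_, import_time) = timed(|| std::thread::sleep(Duration::from_millis(5)));

    let setup = load_time + import_time;
    println!("state load:        {:?}", load_time);
    println!("parentchain import: {:?}", import_time);
    println!("left for block production: {:?}", remaining_budget(slot, setup));
}
```

Logging the two durations per slot would show whether the budget is lost to state loading in every slot or only in slots where parentchain blocks are imported.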
The current design assumed that one validateer can operate more than one shard. Since this requirement has been abandoned, we can refactor this, but the change may go quite deep.