Penalize tokens ending with `@@` less for having attention aligned to multiple other tokens. ...or maybe merge the attention matrix to the word level instead, combining the `@@`-split pieces of each word?
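A minimal sketch of the word-level option, under several assumptions not stated in the note: tokens follow the subword-nmt BPE convention (a piece ending in `@@` continues into the next token), the attention matrix has shape (target tokens × source tokens) with each row summing to 1, and NumPy is available. Summing over source pieces and averaging over target pieces is only one reasonable way to collapse the matrix; `word_groups` and `attention_to_word_level` are hypothetical helper names.

```python
import numpy as np


def word_groups(tokens):
    """Group BPE token indices into words.

    Assumes the subword-nmt convention: a token ending in '@@'
    continues into the following token.
    """
    groups, current = [], []
    for i, tok in enumerate(tokens):
        current.append(i)
        if not tok.endswith("@@"):
            groups.append(current)
            current = []
    if current:  # dangling '@@' piece at the end of the sequence
        groups.append(current)
    return groups


def attention_to_word_level(attn, src_tokens, tgt_tokens):
    """Collapse a (len(tgt_tokens), len(src_tokens)) subword attention
    matrix to a (target words, source words) matrix."""
    attn = np.asarray(attn, dtype=float)
    src_groups = word_groups(src_tokens)
    tgt_groups = word_groups(tgt_tokens)
    # Sum the columns of each source word: total attention mass on that word.
    cols = np.stack([attn[:, g].sum(axis=1) for g in src_groups], axis=1)
    # Average the rows of each target word so every row stays a distribution.
    return np.stack([cols[g, :].mean(axis=0) for g in tgt_groups], axis=0)


# Hypothetical example: "un@@ believ@@ able" is one source word,
# "a@@ b" is one target word.
src = ["the", "un@@", "believ@@", "able", "cat"]
tgt = ["a@@", "b", "c"]
attn = [
    [0.1, 0.2, 0.3, 0.2, 0.2],
    [0.0, 0.5, 0.25, 0.25, 0.0],
    [0.5, 0.0, 0.0, 0.0, 0.5],
]
word_attn = attention_to_word_level(attn, src, tgt)
```

Here `word_attn` is a 2 × 3 matrix, roughly `[[0.05, 0.85, 0.1], [0.5, 0.0, 0.5]]`. Once pieces are merged, a split word attending to all of its own pieces no longer looks like attention spread over multiple tokens, so any "multiple alignment" penalty could be applied to this matrix without special-casing `@@`.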