Make the scores more tolerant of subword units

Penalize tokens ending with `@@` (i.e. non-final subword units) less when their attention is spread across multiple other tokens.
...or maybe collapse the attention matrix to word level before scoring?
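
A rough sketch of the word-level option, just to make the idea concrete. It assumes the attention matrix is a list of rows shaped `[target tokens][source tokens]`, that a token ending in `@@` continues into the next token, and that summing over source subwords and averaging over target subwords is an acceptable way to merge. The function names `word_groups` and `merge_to_word_level` are made up for this example and are not part of the existing code.

```python
def word_groups(tokens):
    """Group token indices into words.

    Assumption: a token ending in '@@' is continued by the next token.
    """
    groups, current = [], []
    for i, tok in enumerate(tokens):
        current.append(i)
        if not tok.endswith("@@"):
            groups.append(current)
            current = []
    if current:  # dangling '@@' token at the end of the sentence
        groups.append(current)
    return groups


def merge_to_word_level(attn, src_tokens, tgt_tokens):
    """Collapse a subword-level attention matrix to word level.

    attn[i][j] is assumed to be the weight from target token i to source token j.
    Source subwords of one word are summed (a row keeps its total mass);
    target subwords of one word are averaged (a row that summed to 1 still does).
    """
    src_groups = word_groups(src_tokens)
    tgt_groups = word_groups(tgt_tokens)

    # Step 1: sum the columns that belong to the same source word.
    summed = [[sum(row[j] for j in g) for g in src_groups] for row in attn]

    # Step 2: average the rows that belong to the same target word.
    merged = []
    for g in tgt_groups:
        merged.append(
            [sum(summed[i][k] for i in g) / len(g) for k in range(len(src_groups))]
        )
    return merged


# Example: 3 target tokens (2 words) attending to 4 source tokens (2 words).
src = ["un@@", "believ@@", "able", "story"]
tgt = ["Ge@@", "schichte", "."]
attn = [
    [0.1, 0.2, 0.3, 0.4],
    [0.3, 0.2, 0.1, 0.4],
    [0.25, 0.25, 0.25, 0.25],
]
word_attn = merge_to_word_level(attn, src, tgt)
```

With this, the existing scores could be computed on `word_attn` unchanged, so attention split between the parts of one word would no longer count as dispersion. Whether averaging is the right choice for the target side is open; taking the max or using only the last subword would be alternatives.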