This is stupid because a UTF8 is a tokenizer that covers all Unicode with a voca...

This is stupid because a UTF8 is a tokenizer that covers all Unicode with a vocab of only 256 (yes, without a K). This is the only way of scaling the bitter lesson with tokenizers. Also, with architectures that span +1M context windows, it’s no longer an argument/issue the reduced context windows.