Text Normalization Guide: Unicode Forms, Ligatures, and Composed Characters