Getting & setting the cursor is done with byte offsets
instead of data model offset (characters) so we need to
be able to convert between the two as well as splitting
characters.
TODO: The regex only works on surrogate pairs, not
yet combining accents.
fixupInsertion will combine a combining mark with the
character to its left it it can.
Bug: 48630
Change-Id: I8d936fb15d82f73cd45fac142c540a7950850d55