Commit graph

5 commits

Author SHA1 Message Date
David Chan 6f2090aac6 Revert model to use simple UTF-16 code units
This is a prerequisite to browser-based grapheme cluster handling, which
is needed so left/right cursoring and backspace behave as users expect.

modules/ve/ve.js
modules/ve/ce/ve.ce.Document.js
modules/ve/ce/ve.ce.js
* Revert cluster-aware splitting to trivial JavaScript code unit splitting
* Rewrite ve.splitClusters as a trivial compatibility method (remove soon)
* getClusterOffset/getByteOffset use unicodeJS.graphemebreak.splitClusters
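
A minimal sketch of the offset conversion this change implies, assuming the
text has already been split into grapheme clusters. The function names below
are hypothetical and are not the ve.ce implementation; they only illustrate
why a cluster offset and a UTF-16 code unit offset differ.

```javascript
// Hypothetical helpers for illustration only (not taken from ve.ce.js).

// Number of UTF-16 code units that precede the cluster at clusterOffset.
function codeUnitOffsetFromClusterOffset( clusters, clusterOffset ) {
	return clusters.slice( 0, clusterOffset ).join( '' ).length;
}

// Number of clusters needed to cover codeUnitOffset code units.
// An offset that falls inside a cluster is rounded up to the end of that cluster.
function clusterOffsetFromCodeUnitOffset( clusters, codeUnitOffset ) {
	var i, total = 0;
	for ( i = 0; i < clusters.length && total < codeUnitOffset; i++ ) {
		total += clusters[ i ].length;
	}
	return i;
}

// 'a', 'e' + combining acute accent, U+1D11E as a surrogate pair, 'b'
var sampleClusters = [ 'a', 'e\u0301', '\uD834\uDD1E', 'b' ];
var codeUnitOffset = codeUnitOffsetFromClusterOffset( sampleClusters, 2 ); // 3
var clusterOffset = clusterOffsetFromCodeUnitOffset( sampleClusters, 3 ); // 2
```

With a model based on plain code units, the second and third clusters each
occupy two positions, which is why cursoring needs cluster-aware handling
on top of it.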

modules/unicodejs/tools/unicodejs-properties.py
modules/unicodejs/unicodejs.graphemebreakproperties.js
modules/unicodejs/unicodejs.js
* Allow grapheme break tests to work with surrogate pairs

demos/ve/pages/minimal.html
demos/ve/pages/multibyte.html
demos/ve/pages/unicode.html
* replace file with more precise tests

modules/ve/test/ve.test.js
* Remove reference to grapheme-based splitting (which is no longer used)
* Correct typo

Bug: 53757
Bug: 51472
Bug: 51596
Bug: 51846
Change-Id: Ife34c87ebe40bc1689298b592eec5c0cdc2f7589
2013-11-26 19:38:14 +00:00
David Chan a1eb56c14f splitClusters uses Grapheme Cluster Boundary rules
unicodejs.graphemebreak.js
* New file: singleton class with splitClusters method
* On load, builds graphemeBreakRegexp from unicodejs.graphemebreakproperties.js
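
As a rough illustration of regexp-based cluster splitting, the sketch below
uses a heavily simplified rule set: keep surrogate pairs together and do not
break before combining diacritical marks in U+0300..U+036F. The real
graphemeBreakRegexp is built from generated Unicode property data and is not
reproduced here; the names below are hypothetical.

```javascript
// Simplified stand-in for a grapheme break regexp (assumption: only
// surrogate pairs and the U+0300..U+036F combining marks are handled).
var simpleClusterRegexp = new RegExp(
	'(?:[\\uD800-\\uDBFF][\\uDC00-\\uDFFF]|[\\s\\S])[\\u0300-\\u036F]*',
	'g'
);

// Hypothetical splitter: returns an array of (simplified) clusters.
function splitClustersSimple( text ) {
	return text.match( simpleClusterRegexp ) || [];
}

// 'e' + combining acute, then U+1D11E as a surrogate pair, then 'a'
var simpleClusters = splitClustersSimple( 'e\u0301\uD834\uDD1Ea' );
// simpleClusters has 3 elements: 'e\u0301', '\uD834\uDD1E', 'a'
```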

unicodejs.js
* Remove old splitClusters method (was just a placeholder)
* Change "conjunction" -> "disjunction", for consistency and correctness

unicodejs.textstring.js
* Use new splitClusters method

modules/ve/ve.js
* Use new splitClusters method

unicodejs.wordbreak.test.js
* Add new splitClusters test
* Refactor charRangeArrayRegexp test to use splitClusters

PHP files
* add unicodejs.graphemebreak.js, unicodejs.graphemebreakproperties.js

.docs/categories.json
* add unicodeJS.wordbreak class

Change-Id: I8f512e2fc2c46eb4b5f00994a8dac88f3c8f7dd2
2013-06-16 21:46:02 +01:00
David Chan 6dacf615c0 Match non-BMP characters in wordbreak regexes
unicodejs.js:
* charRangeArrayRegexp to write surrogate-aware regexps
* private helper functions
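
To illustrate what "surrogate-aware" means here, the sketch below builds
regexp source for a single code point, writing non-BMP code points as an
explicit surrogate pair. It is a simplified assumption-based example; the
actual charRangeArrayRegexp also handles ranges, and its helper functions
are not shown in this log. All names below are hypothetical.

```javascript
// Split a non-BMP code point into its UTF-16 high and low surrogates.
function toSurrogatePair( codePoint ) {
	var offset = codePoint - 0x10000;
	return [ 0xD800 + ( offset >> 10 ), 0xDC00 + ( offset & 0x3FF ) ];
}

// Format one code unit as a \uXXXX regexp escape.
function escapeCodeUnit( codeUnit ) {
	return '\\u' + ( '0000' + codeUnit.toString( 16 ).toUpperCase() ).slice( -4 );
}

// Regexp source matching exactly one code point, BMP or non-BMP.
function codePointRegexpSource( codePoint ) {
	var pair;
	if ( codePoint < 0x10000 ) {
		return escapeCodeUnit( codePoint );
	}
	pair = toSurrogatePair( codePoint );
	return escapeCodeUnit( pair[ 0 ] ) + escapeCodeUnit( pair[ 1 ] );
}

var clefSource = codePointRegexpSource( 0x1D11E ); // '\\uD834\\uDD1E'
var clefRegexp = new RegExp( '^' + clefSource + '$' );
var clefMatches = clefRegexp.test( '\uD834\uDD1E' ); // true
```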

unicodejs.wordbreak.test.js:
* test charRangeArrayRegexp
* corrected tests for non-BMP wordbreaks

unicodejs.wordbreak.js:
* use new surrogate-aware regexps

unicodejs.wordbreakproperties.js:
* generated from Unicode data

unicodejs.graphemebreakproperties.js:
* generated from Unicode data

unicodejs.wordbreak.groups.js:
* delete as no longer used

unicodejs-properties.py:
* generate unicodejs.wordbreakproperties.js from Unicode data
* generate unicodejs.graphemebreakproperties.js from Unicode data

index.php:
* update script tag links

/VisualEditor.php:
* update script tag links

/demos/ve/index.php:
* update script tag links

/maintenance/makeStaticLoader.php:
* update script tag links

Change-Id: I39c0386a85b0cf21d68d3385b84018a5d7648de5
2013-06-10 23:16:23 +01:00
David Chan 1c78d0a38c Use grapheme clusters in unicodeJS.TextString
unicodejs.js:
* add splitClusters(text) and splitCharacters(text) methods

unicodejs.textstring.js:
* change internal representation from a char string to a list of grapheme
  clusters
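
A minimal sketch of a text string backed by a list of grapheme clusters
rather than a char string. ClusterString and its read method are hypothetical
stand-ins for illustration and are not the unicodeJS.TextString source.

```javascript
// Hypothetical cluster-based string: each array element is one whole
// grapheme cluster, which may span several UTF-16 code units.
function ClusterString( clusters ) {
	this.clusters = clusters;
}

// Return the cluster at the given position, or null if out of range.
ClusterString.prototype.read = function ( position ) {
	var cluster = this.clusters[ position ];
	return cluster !== undefined ? cluster : null;
};

var clusterText = new ClusterString( [ 'a', 'e\u0301', 'b' ] );
var secondCluster = clusterText.read( 1 ); // 'e\u0301', read as one unit
var pastEnd = clusterText.read( 3 ); // null
```

Indexing by cluster means position 2 is 'b', whereas in the underlying
char string 'b' sits at code unit index 3.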

unicodejs.wordbreak.js:
* change getGroup to work on the first character of a grapheme cluster

ve.js:
* Use new unicodejs.splitClusters function

Bug: 48975
Change-Id: I202b98199d2780534d1e02519b72579ba796f08f
2013-05-30 17:34:10 +01:00
Ed Sanders 4988efd35e UnicodeJS library to implement Unicode standards
Initially just with a Wordbreak module to implement the Unicode standard
on 'Default Word Boundaries'. Because it can stand alone, this has
been written as a separate library. Non-BMP characters are currently
not supported.

Bug: 44085
Change-Id: Ieafa070076f4c36855684f6bc179667e28af2c25
2013-03-27 17:44:22 +00:00