Commit graph

6 commits

Author SHA1 Message Date
David Chan 6dacf615c0 Match non-BMP characters in wordbreak regexes
unicodejs.js:
* charRangeArrayRegexp to write surrogate-aware regexps
* private helper functions

unicodejs.wordbreak.test.js:
* test charRangeArrayRegexp
* corrected tests for non-BMP wordbreaks

unicodejs.wordbreak.js:
* use new surrogate-aware regexps

unicodejs.wordbreakproperties.js:
* generated from Unicode data

unicodejs.graphemebreakproperties.js:
* generated from Unicode data

unicodejs.wordbreak.groups.js:
* delete as no longer used

unicodejs-properties.py:
* generate unicodejs.wordbreakproperties.js from Unicode data
* generate unicodejs.graphemebreakproperties.js from Unicode data

index.php:
* update script tag links

/VisualEditor.php:
* update script tag links

/demos/ve/index.php:
* update script tag links

/maintenance/makeStaticLoader.php:
* update script tag links

Change-Id: I39c0386a85b0cf21d68d3385b84018a5d7648de5
2013-06-10 23:16:23 +01:00
David Chan 1c78d0a38c Use grapheme clusters in unicodeJS.TextString
unicodejs.js:
* add splitClusters(text) and splitCharacters(text) methods

unicodejs.textstring.js:
* change internal representation from a char string to a list of grapheme
  clusters

unicodejs.wordbreak.js:
* change getGroup to work on the first character of a grapheme cluster

ve.js:
* Use new unicodejs.splitClusters function

Bug: 48975
Change-Id: I202b98199d2780534d1e02519b72579ba796f08f
2013-05-30 17:34:10 +01:00
Ed Sanders acbf5f8210 UnicodeJS treating start/end tags as letters
Because /[a-z]/.test(null) treats null as the string 'null' and
so returns true.

Bug: 48487
Change-Id: I2408229cddd876ff8fe55a9eddf9bac87a61457e
2013-05-18 16:48:17 +00:00
Ed Sanders f36e68c333 Implement next/prevBreakOffset and word skipping
This provides the functionality for keyboard word skipping
(i.e. pressing ctrl/alt + arrow key).

Bug: 46794
Change-Id: Ib0861fa075df805410717a148b8a6e166d947849
2013-04-09 21:55:57 +00:00
Timo Tijhof 1ec8ba3e24 Remove superfluous spaces in function invocations.
Find-Exec: ack '\.[a-zA-Z]+\ \(' --js modules/ve modules/unicodejs
Change-Id: Ib7d0a6514f3321f1d09fbf7cf52c2a9c2cecde88
2013-04-03 17:48:34 +00:00
Ed Sanders 4988efd35e UnicodeJS library to implement Unicode standards
Initially just with a Wordbreak module to implement Unicode standard
on 'Default Word Boundaries'. Due to it's standaloneability this has
been written as a separate library. Non-BMP characters are currently
not supported.

Bug: 44085
Change-Id: Ieafa070076f4c36855684f6bc179667e28af2c25
2013-03-27 17:44:22 +00:00