Commit graph

6 commits

Author SHA1 Message Date
Brad Jorsch 5e28f67e88 Speed up PHP mw.ustring.gcodepoint
It seems to be over 200 times faster to iterate over the array instead
of shifting off the front.

Change-Id: Id29a4739ae2bd5dac4197e110ea73f74794e6d9f
2017-03-06 12:53:25 -05:00
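
Note on the commit above: the cost difference it describes can be illustrated outside PHP. The sketch below is a Python analogue, not the code from the commit. The function names are hypothetical, and `list.pop(0)` only stands in for PHP's shift-from-the-front, under the assumption that both move the remaining elements on every call. It makes no claim about the 200x figure, which belongs to the PHP implementation.

```python
import timeit


def codepoints_by_shifting(text):
    """Yield code points by repeatedly removing the first list element."""
    pending = [ord(ch) for ch in text]
    while pending:
        # list.pop(0) shifts every remaining element left, so each step
        # costs time proportional to the remaining length.
        yield pending.pop(0)


def codepoints_by_index(text):
    """Yield code points by walking the list in place."""
    pending = [ord(ch) for ch in text]
    for cp in pending:
        # No element is moved; each step is constant time.
        yield cp


if __name__ == "__main__":
    sample = "héllo wörld " * 5000
    for fn in (codepoints_by_shifting, codepoints_by_index):
        seconds = timeit.timeit(lambda: sum(1 for _ in fn(sample)), number=3)
        print(f"{fn.__name__}: {seconds:.3f}s")
```

Both generators yield the same sequence; only the per-step cost differs, and the gap widens as the input grows.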
Brad Jorsch d643f40de9 Ustring: Let gcodepoint work with moderately long strings
For the PHP implementation, return the codepoints as a table instead of
multiple return values that get table-ified in Lua, to avoid hitting
too-many-values stack limits.

For the pure-Lua version, inline most of ustring.codepoint instead of
calling it to avoid what's effectively "{ unpack( stuff ) }".

Bug: T118687
Change-Id: I105f388cc23ab55d4124739700ef89d5354b7dbc
2016-07-15 19:35:58 +00:00
Brad Jorsch 4dcac2fcd9 Fix mw.ustring.gmatch and patterns with '^'
The Lua manual says this:

 For this function, a '^' at the start of a pattern does not work as an
 anchor, as this would prevent the iteration.

I had interpreted that to mean that a pattern starting with '^' would
never match in gmatch. But further testing reveals that the '^' is just
treated as a literal character: string.gmatch( "foo ^bar baz", "^%a+" )
will match "^bar".

Change-Id: Id91d6ee2db753ce1d6a4f6ae27764691d9e9fdc4
2013-02-14 14:25:55 -05:00
Tim Starling ce062407ab Fix further non-local effects of library registration
Fixed several accidental leaks to the global namespace due to missing
"local" declaration. Removed extension of the string table by mw.uri,
same justification as I5d0ddb70.

Change-Id: Iba1bf8e651d4ce05812e4a9a7a074cb6679297a0
2013-02-13 15:40:18 +11:00
Tim Starling f2f866cbdd Remove global side-effects from mw.ustring library registration
The point of putting the unicode library in mw.ustring instead of
ustring was to avoid conflicts with future upstream work, and with other
libraries. It rather defeats the purpose if you then modify the global
string table during module startup.

Users can always set up local aliases if they feel "mw.ustring" is too
much to type.

Change-Id: I5d0ddb70d999aeb6e36e6ddbcdb19922d0274a39
2013-02-13 15:05:22 +11:00
Brad Jorsch 0a8757baba Lua ustring implementation
This is a reimplementation of Lua's string library with support for
UTF-8.

The entire ustring library is implemented in pure Lua. PHP callbacks are
also available for overrides: in LuaSandbox these are used for almost
all functions, while in LuaStandalone they are used only for the pattern
matching. Also, ustring.upper and ustring.lower are overridden using
mw.language's .uc and .lc if available.

It also includes a bunch of unit tests.

Note that if you download the normalization tests, they may fail under
LuaSandbox if you have PHP's intl extension installed and libicu on your
system is too old.

Change-Id: Ie76fdf8d3a85d0a3d2a41b0d3b7afe433f247af0
2013-02-12 14:26:29 -05:00