129 commits

Author SHA1 Message Date
Brad Jorsch 5e28f67e88 Speed up PHP mw.ustring.gcodepoint
It seems to be over 200 times faster to iterate over the array instead
of shifting off the front.

Change-Id: Id29a4739ae2bd5dac4197e110ea73f74794e6d9f
2017-03-06 12:53:25 -05:00
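
To illustrate why iterating can beat shifting, here is a minimal Python sketch (not the extension's PHP; both function names are hypothetical). Removing the front element of an array usually moves or re-indexes everything behind it, so draining a list that way does quadratic work overall, while walking it in place is linear.

```python
def drain_by_shifting(values):
    """Yield items by repeatedly removing the front element (quadratic overall)."""
    pending = list(values)      # copy so the caller's list is left untouched
    while pending:
        yield pending.pop(0)    # each pop(0) moves every remaining item down

def drain_by_iterating(values):
    """Yield items by walking the list in place (linear overall)."""
    for value in values:
        yield value

# Both strategies produce the same sequence; only the cost differs.
sample = [ord(ch) for ch in "héllo"]
same_result = list(drain_by_shifting(sample)) == list(drain_by_iterating(sample))
```

The 200x figure in the commit message is the author's measurement for the PHP code; this sketch only shows the shape of the change.
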
Brad Jorsch fe094e7bae Update ustring data tables
normalization-data.lua is updated to Unicode 6.3.0.

upper.lua and lower.lua are updated to match HHVM 3.12.1's mb_strtoupper
and mb_strtolower. I don't know what version of Unicode that might be,
but it seems old.

Bug: T86096
Change-Id: I1a0c8be2756f86db5f36dd67319a1f79aea98b3e
2017-01-21 03:26:27 +00:00
jenkins-bot ae677fbc0d Merge "Ustring: Let gcodepoint work with moderately long strings"
2016-12-16 00:42:02 +00:00
Brad Jorsch 629f11d0dd Fix pure-Lua ustring and empty patterns
An empty pattern isn't "safe" since it could match in between the
bytes of a UTF-8 character.

Also, it turns out there's a bug in PHP <5.6.9 preg_replace() that we
need to work around too.

Change-Id: I282e5909e4663461d60c5386693db182de2fd44c
2016-10-05 14:32:27 -04:00
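
The problem with empty patterns can be reproduced outside Lua. The following Python sketch (an analogy using Python's `re` module, not the ustring code; `is_valid_utf8` is a hypothetical helper) shows that an empty pattern applied byte-wise matches between the two bytes of "é", so a substitution there produces invalid UTF-8.

```python
import re

text = "é"                    # one code point, two bytes in UTF-8
raw = text.encode("utf-8")    # b'\xc3\xa9'

# Character-wise: the empty pattern matches before and after the character.
by_chars = re.sub("", "-", text)

# Byte-wise: the empty pattern also matches between the two bytes.
by_bytes = re.sub(b"", b"-", raw)

def is_valid_utf8(data):
    """Return True if the byte string decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False
```

Here `by_chars` keeps the character intact, while `by_bytes` has a separator wedged inside it and no longer decodes.
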
Marius Hoch 0f4db74148 Add mw.hash to Scribunto
Provides a simple wrapper for PHP's hash() and
hash_algos() functions.

I will add docs to the Lua reference manual once
this is merged.

Bug: T142585
Change-Id: I6697463974a175e99f9b77428a1085247165ebc9
2016-08-18 04:39:04 +02:00
Brad Jorsch d643f40de9 Ustring: Let gcodepoint work with moderately long strings
For the PHP implementation, return the codepoints as a table instead of
multiple return values that get table-ified in Lua, to avoid hitting
too-many-values stack limits.

For the pure-Lua version, inline most of ustring.codepoint instead of
calling it to avoid what's effectively "{ unpack( stuff ) }".

Bug: T118687
Change-Id: I105f388cc23ab55d4124739700ef89d5354b7dbc
2016-07-15 19:35:58 +00:00
Kunal Mehta 9275cc14fb Expose ParserOutput::addWarning() to modules
Bug: T137900
Change-Id: Ibdd2506f4ab27f531ae49187bc57ba0d5c56b7cc
2016-06-16 15:48:53 -07:00
Jackmcbarn f4501ccd22 Only use mw.ustring when necessary
mw.ustring is really really slow. I've discovered that in a lot of modules
on enwiki, upwards of 2/3 of the total runtime gets used when mw.html
calls mw.ustring.gsub. This change checks whether any Unicode characters
are present, and if not, calls string.gsub instead.

Change-Id: Ia50061584be3901ae7428354c449236225c318db
2016-05-30 18:38:32 +00:00
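
The dispatch described above can be sketched as follows. This is a Python analogy under stated assumptions, not the mw.html code: the function name `gsub` and the returned path label are hypothetical, and Python's `re` stands in for both string.gsub (the byte-wise path) and mw.ustring.gsub (the Unicode-aware path).

```python
import re

def gsub(subject, pattern, repl):
    """Substitute using the cheap byte-wise path when the input is pure ASCII.

    Returns (result, path) so callers can see which path was taken.
    """
    if subject.isascii() and pattern.isascii():
        # No multi-byte characters involved: byte-wise matching is safe.
        out = re.sub(pattern.encode("ascii"),
                     repl.encode("utf-8"),
                     subject.encode("ascii"))
        return out.decode("utf-8"), "bytes"
    # Non-ASCII input: fall back to the Unicode-aware implementation.
    return re.sub(pattern, repl, subject), "unicode"
```

For example, `gsub("a<b", "<", "&lt;")` takes the byte-wise path, while `gsub("é<b", "<", "&lt;")` falls back to the Unicode-aware one; both return the same text a Unicode-aware substitution would.
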
Brad Jorsch c9de00aeff SECURITY: Don't escape strip markers when escaping attributes in mw.html
Core strip markers were changed in T110143 to include characters that
are normally encoded in attributes, however we want to pass them through
here so they can be unstripped correctly in the output wikitext.

This fix makes "Strip markers in CSS" parser test pass again.

Bug: T110143
Bug: T135961
Change-Id: I1353931a53c668d8a453dfa2300a99f59fdb01c5
2016-05-22 21:40:32 -04:00
Brad Jorsch aa4d72e3ff Fix uncontroversial phpcs errors
The following continue to be ignored:
* Generic.Arrays.DisallowLongArraySyntax.Found, because I'm not sure
  Scribunto is ready to abandon old version support in master.
* MediaWiki.ControlStructures.AssignmentInControlStructures.AssignmentInControlStructures,
  because it's overly strict for its purpose.

Squiz.Classes.ValidClassName.NotCamelCaps isn't ignored globally, we
just ignore it explicitly every place it's needed.

Change-Id: I307668da6ef7b3e23da19b1fd1e08914239b99b3
2016-05-18 16:31:28 -04:00
jenkins-bot c753698eaa Merge "Provide a standard way to get the target of a redirect page"
2016-05-12 19:32:17 +00:00
Brad Jorsch b3da8a698d Add toNFKC and toNFKD to mw.ustring
This also makes some updates to make-normalization-table.php to handle
the move of UtfNormal to a separate library.

Bug: T126427
Change-Id: Id4985c3ca441cf92f08ba1f1af85c762ba43d7d2
2016-04-02 15:22:42 +00:00
Ricordisamoa 1573bee81a Provide a standard way to get the target of a redirect page
The new Scribunto_LuaTitleLibrary::redirectTarget() method is
used by mw.title objects as read-only attribute 'redirectTarget'.

If the page does not exist or it is not a redirect, the value
of the attribute is `false`; otherwise, it is the target of the
redirect page, as mw.title object.

This is a proper alternative to parsing wikitext as it is done in:
https://en.wikipedia.org/wiki/Module:Redirect

Bug: T68974
Change-Id: Id4d9b0f8c1cd09ebc42c031d4d3fc0c33eea44aa
2016-03-01 14:30:22 +01:00
Brad Jorsch 29266a9a0f Use correct variable in ustring.lua
Change-Id: Ic576b8c31c487c106593050538f9f2cc5b722b62
2016-01-02 10:49:48 -05:00
Brad Jorsch cd618c7a92 ustring: Handle "empty" charset like Lua does (part 2)
Lua actually treats a close-bracket at the start of a bracketed
character class as a literal, rather than using it to close the
character class. Probably unintended behavior, but it happens.

Also, have the pure-lua version throw our more informative errors on
error even when falling back to string.find and the like, and fix some
other weird edge cases that came up in testing.

Bug: T95958
Bug: T115686
Change-Id: Iab783d4a3e58b1514cc09729d4a71c2cb1242ee8
2015-10-16 09:26:55 -04:00
Jan Berkel fb20934b16 Fix a problem with simple pattern detection
A string with a dot pattern is only "simple" if
followed by +, - or *. The end of string condition was not checked
properly.

Change-Id: Ia10b9164caeabe464c76441cc82eef37a7013048
2015-10-07 10:27:45 -04:00
Jan Berkel 7c5454b36c Fix off-by-one error in gsub
Change-Id: I49c0386970e007271d23087fd112580af7b21c9c
2015-09-23 17:41:15 +01:00
Jackmcbarn 828c6cf513 Prevent leaking title fragments across invokes
Bug: T106951
Change-Id: Iace5d75deac3d8ffde6f3dec6a4f910dcb77d1e2
2015-07-27 10:46:23 -04:00
Mr. Stradivarius d59d852290 Fix accidental global in mw.uri.parseQueryString
The result of the type function should be compared against the
string "table", not the global variable. This bug probably went
undetected until now, as "table" is also the global variable for the Lua
table library.

Change-Id: Ia28fa10388bfc587d95b522bfa8f3524b4a3ee5f
2015-07-15 23:07:37 +09:00
Jackmcbarn a4cb7efd0d Mark metatables from mw.loadData
Add mw_loadData=true to metatables set by mw.loadData, so that modules can
distinguish them from other tables.

Change-Id: I0795d738891c85600af2621908376474ae21b3fe
2015-06-27 22:38:23 -04:00
Brad Jorsch 58d722bcdf Allow nil in mw.text.jsonEncode
If it somehow gets in there (e.g. via a crafty __pairs), let it through.

Change-Id: I9f79dbb1a09cd62b2a8f4b6beb84a3e2f1c85560
2015-06-16 16:36:30 +00:00
Brad Jorsch 4669e43135 ustring: Handle empty charset like Lua does
Both '[]' and '[^]' give a rather odd error, but it's probably best to
follow suit.

Bug: T95958
Change-Id: I3310da55f655537c9082fc9039003f6b2d31eff4
2015-04-13 18:20:33 -04:00
Kunal Mehta 3f5f3e247f Use full <?php instead of short <? in ustring generation scripts
Change-Id: Ida6bc4ee1803763b284fdaa7c63769a146fec6ad
2015-03-17 18:16:20 -07:00
jenkins-bot f62b6b4379 Merge "Adds support for JSON encoding and decoding"
2015-02-05 02:58:44 +00:00
Kunal Mehta f5a8a3b0ae Update make-normalization-table for core file moves
Depends upon Ib530ad9dbe1d3a33dc53ef8b9620f61d4e1a2d62 in core.

Change-Id: Ib530ad9dbe1d3a33dc53ef8b9620f61d4e1a2d62
2015-02-04 20:04:41 +00:00
Marius Hoch c0480eef77 Fix weird quotes in package.lua
Change-Id: I6d11813ed00489a69c88ab26aeeec4c4dd42d5dd
2015-02-03 00:40:21 +01:00
Jackmcbarn 35e3ea3ce2 Simplify code in mw.html
Replace numeric loops with iteration, don't unnecessarily check for nil
before table.insert (since it's a no-op in that case anyway), and similar
restructuring.

Change-Id: I155839a648f242a1b1de35f4081d8bcfa34f6933
2015-01-31 13:26:40 +00:00
Brad Jorsch 10bc0f7316 Adds support for JSON encoding and decoding
Provides methods to encode and decode JSON in the mw.text module.

Bug: T47470
Change-Id: I274f2ff13adb616e50600ee30e29b35327f3251e
2015-01-26 15:13:22 -08:00
Jackmcbarn f7fe4881a0 Pass the title's fragment to getExpensiveData
Without this, an error occurs when calling mw.title.new('#foo').exists
(or anything similar).

Change-Id: Id2b60fe3f121af95b4b54da3a7042b490ecbc3fe
2015-01-08 13:19:54 -05:00
Mr. Stradivarius c58c528d28 Add mw.site.interwikiMap
This makes the interwiki map available to Lua modules. The code is
based on the API interwiki map code in core (the appendInterwikiMap
method of includes/api/ApiQuerySiteInfo.php). Everything that the
API includes is added, apart from iw_api and iw_wikiid, which I
couldn't think of a use for from Lua modules.

Accessing the interwiki map would be useful for modules like
enwiki's Module:InterwikiTable,[1] as it would stop module writers
having to duplicate the data.

[1] https://en.wikipedia.org/wiki/Module:InterwikiTable

Change-Id: Ie8ad2582aaf5e422824f7da51714a347bb4041d1
2014-12-24 01:17:48 +09:00
Jackmcbarn 4002f43ef2 Use a metatable when os.date("*t") is called
When os.date("*t") or ("!*t") is called, instead of just setting the TTL
to 1 second, create a metatable that sets TTLs as the values are looked
at.

Change-Id: Id1e2df731f182f21cf19708738f9907fa927185c
2014-12-19 03:46:23 +00:00
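
The idea of setting the TTL only when a field is actually read can be sketched in Python. Everything here is hypothetical: the class names, the `reduce_ttl` callback, and the per-field TTL values are assumptions for illustration, not the values Scribunto uses. Python's `__getitem__` plays the role of the Lua metatable.

```python
# Assumed TTLs in seconds: a coarser field allows a longer cache lifetime.
FIELD_TTL = {"sec": 1, "min": 60, "hour": 3600}
DEFAULT_TTL = 86400   # assumed TTL for day-level fields such as year or month

class Page:
    """Tracks the smallest TTL requested so far (None means no limit yet)."""
    def __init__(self):
        self.ttl = None

    def reduce_ttl(self, seconds):
        if self.ttl is None or seconds < self.ttl:
            self.ttl = seconds

class LazyDate:
    """Date table that lowers the page TTL only for fields that are read."""
    def __init__(self, fields, on_read):
        self._fields = fields
        self._on_read = on_read

    def __getitem__(self, key):
        self._on_read(FIELD_TTL.get(key, DEFAULT_TTL))
        return self._fields[key]

page = Page()
date = LazyDate({"year": 2014, "hour": 3, "sec": 46}, page.reduce_ttl)
year = date["year"]   # reading only the year leaves the TTL at a full day
```

A module that reads only the year no longer forces a one-second TTL; the TTL drops to one second only if the seconds field is read.
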
Jackmcbarn ce5ac6611d Avoid unnecessary database queries
Currently, mw.title.new always results in a database query, which holds up
the parse until it finishes. This changes it to not require a database
query if it's not actually necessary.

Bug: T68328
Change-Id: I62f347d4cd9176bd0440215dcbe804c1dc3d4c99
2014-12-11 13:34:06 -05:00
Mr. Stradivarius 98f25aa9a1 Improve error messages in mw.html
Add more information to error messages in mw.html. This includes the
error level, the function name, and the position of the argument in the
argument list. Where possible, use the functions in libraryUtil.lua to
do this.

Some functions in mw.html accept multiple types, so add a checkTypeMulti
function to libraryUtil.lua to make these kinds of functions easy to check.
And while we're at it, add test cases for libraryUtil.lua as well.

Change-Id: If9cf9a52bd4b1bb42cc7f9f1f1096828710cbc52
2014-12-08 17:01:31 +00:00
Brad Jorsch d485b898c3 Improve argument validation in frame:expandTemplate()
Just like the other methods, we shouldn't be allowing passing of things
that aren't numbers or strings here.

For that matter, we should just abstract out the whole "arg key and
value validation" into a separate function instead of repeating it in
four places.

Bug: T76609
Change-Id: Id7e512a988ef9b7a5c5a110c8992dd5d649dcbf9
2014-12-05 10:00:16 -05:00
Jackmcbarn 33fb32f872 Expose file page count, width, and height to Lua
Add a file table to Title objects, containing the page count, width,
and height of the file.

Change-Id: I9c6b5024ae6b5af393ed7eb1448a297c5c4e5830
2014-12-03 10:44:55 -05:00
Brad Jorsch e5564cf942 Add mw.text.unstripNoWiki, mw.text.killMarkers, fix mw.text.unstrip
mw.text.unstrip is too broad: it allows unstripping things that
cause problems when unstripped (e.g. bug 61268). Since the original
request was only for unstripping <nowiki>, let's add a function that
does only that.

We should also add an interface to StripState::killMarkers(), instead of
requiring everyone to roll their own work-alike.

Then, to fix the bug, we can make mw.text.unstrip be the combination of
the two. This is the most like the original behavior of mw.text.unstrip
(removes all strip markers, replacing them with text where applicable)
without causing issues.

Bug: 61268
Change-Id: I3a151fd678b365d629b71b4f1cb0d5d284b98555
2014-11-05 12:32:35 -05:00
jenkins-bot 1fa52ef583 Merge "Allow for dynamically-loaded PHP libraries"
2014-10-03 14:01:46 +00:00
Brad Jorsch df38a296bf Allow for dynamically-loaded PHP libraries
Scribunto currently supports libraries with PHP callbacks that are
loaded on startup, and pure-Lua libraries that may be loaded from the
module with require().

This change allows for libraries with PHP callbacks to also be loaded
with require().

Change-Id: Ibdc1f4ef51b1c8644c3d4c98d57755b5c06447a5
2014-10-03 09:27:23 -04:00
Jackmcbarn ccba1c78f5 Allow numbers in tag names
HTML tags can contain numbers, like <h2>.

Bug: 71594
Change-Id: I3b7bbfa3aa8f41a28f8ce64086e4066ffda948b2
2014-10-03 07:52:17 -04:00
Jackmcbarn 634f75f53e Don't escape the delete character
Escaping the delete character breaks strip markers, so don't do it.

Bug: 68011
Change-Id: Ica97c898209c59c0084bf700d891b28603f79dd1
2014-09-21 22:59:52 -04:00
Jackmcbarn b970046f2e Don't output a semicolon at the end of CSS
It's not necessary, it makes the output bigger, and some pages have enough
elements with CSS that it does make an actual difference.

Change-Id: I80d471899c7e04a8a4876c205198a8c0d0b1f281
2014-09-10 19:08:34 -04:00
Jackmcbarn f5894a6a9f Output &nbsp; instead of &#nbsp;
Bug: 70475
Change-Id: I19aeceaa1eed17be4a128acd7fb50a9c8b40cf12
2014-09-08 16:06:34 -04:00
Jackmcbarn ef6e2fa410 Fix __pairs not working in LuaStandalone serialization
In Ia4d58f44, the code enabling __pairs to work no longer ran inside
MWServer.lua, so it hasn't worked right for serialization since then. This
restores the correct behavior.

Change-Id: Iea31ab363957f5f69838d6715527cf822c15fa94
2014-08-27 21:09:05 -04:00
Jackmcbarn fd9ecb9cbe Expose cascading protection directly to Lua
Add a way to fetch cascading protection information from Lua without
needing to call the CASCADINGSOURCES parser function.

Change-Id: I1b3ac18af11d3066f78d27b31da8d6709a6a2631
2014-08-13 12:34:47 -04:00
Brad Jorsch 0367e9bddd Fix deceptively-simple pattern in pure-Lua ustring
The pure-Lua ustring pattern matching functions short-circuit to the
much faster string library when the pattern would match the same against
the raw bytes.

A pattern like "[^a-z]" can match a partial UTF-8 character when applied
bytewise, and so must be detected as unsafe.

Let's also directly test the pure-Lua module, instead of me having to
comment out lines in Scribunto_LuaUstringLibrary::register() whenever I
want to test them.

Change-Id: I91ed3374aadfea379b9db2e13b4248ab20df509e
2014-08-10 01:18:18 +00:00
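
Why "[^a-z]" is unsafe to run byte-wise can be shown with a short Python sketch (an analogy using Python's `re` module, not the ustring code). Every byte of a multi-byte UTF-8 character lies outside a-z, so the negated class matches each byte separately instead of matching the character once, whereas a plain ASCII literal matches identically in both modes.

```python
import re

text = "é"                    # one code point
raw = text.encode("utf-8")    # two bytes: b'\xc3\xa9'

# Character-wise: one match, the whole character.
char_matches = re.findall(r"[^a-z]", text)

# Byte-wise: two matches, each a fragment of the character.
byte_matches = re.findall(rb"[^a-z]", raw)

# A literal ASCII pattern is "safe": it matches the same way in both modes.
safe_chars = re.findall("b", "aébc")
safe_bytes = [m.decode("utf-8") for m in re.findall(b"b", "aébc".encode("utf-8"))]
```
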
Mr. Stradivarius 1d13fd503a Simplify mw.text.listToText
Simplify the logic in mw.text.listToText so that we don't need to add or
remove anything from the original table we were passed.

Change-Id: I3efcbba1b9adc9a9e32e366e355cb742376cd91b
2014-07-14 19:32:33 +09:00
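
One way to build such a string without touching the caller's list is sketched below in Python. This is not the Lua implementation: the name `list_to_text` and the default separator and conjunction are assumptions for illustration.

```python
def list_to_text(items, separator=", ", conjunction=" and "):
    """Join items as prose, e.g. 'a, b and c', without modifying the input list."""
    if not items:
        return ""
    if len(items) == 1:
        return str(items[0])
    # Slicing builds a new list, so the original is never modified.
    head = separator.join(str(item) for item in items[:-1])
    return head + conjunction + str(items[-1])

fruits = ["apples", "pears", "plums"]
joined = list_to_text(fruits)
```

After the call, `fruits` still holds all three entries, which is the property the commit is after.
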
jenkins-bot ef40ccc8b1 Merge "Fix wrong variable in ustring.lua"
2014-07-11 17:50:03 +00:00
Jackmcbarn ee289c8045 Make the cssEncode pattern simpler
The pattern used by cssEncode is unnecessarily complicated. Simplify it by
using a negating pattern.

Change-Id: I5dc7169efea63473e9e23a1450d2941e434a00d8
2014-07-11 11:40:57 -04:00
Brad Jorsch cb2a331565 Fix wrong variable in ustring.lua
Change-Id: Ibc8056b36d615b57d357987c59219a22e63fdfe8
2014-07-11 11:25:35 -04:00
Jackmcbarn 7c51f69901 Create mw.dumpObject split from mw.logObject
Add an mw.dumpObject() method, which converts an object in the same manner
as mw.logObject(), but returns it instead of adding it to the log buffer.

Change-Id: Ie9fbd24d9d8d13ee2ddf8052679010892f61e1e0
2014-07-09 10:30:53 -04:00