Only use mw.ustring when necessary

mw.ustring is very slow. I've found that in many modules on enwiki,
upwards of 2/3 of the total runtime is spent in the mw.ustring.gsub
calls made by mw.html. This change checks whether the string contains
any non-ASCII characters and, if not, calls string.gsub instead.

Change-Id: Ia50061584be3901ae7428354c449236225c318db
Jackmcbarn 2014-12-16 20:34:04 -05:00 committed by Jdlrobson
parent 08a39470c0
commit f4501ccd22
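
The idea in the commit message can be sketched outside of MediaWiki as follows. This is a minimal illustration, not the code from the change itself: `slowlib`, `hasNonAscii` and `gsubFast` are hypothetical names, and `slowlib` is only a stand-in for a Unicode-aware library such as mw.ustring (it counts calls and delegates to `string.gsub`). The byte class `[\128-\255]` is assumed here as a simple way to detect non-ASCII bytes in plain Lua.

```lua
-- Hypothetical stand-in for a slow, Unicode-aware string library such as
-- mw.ustring. It only counts calls and delegates to string.gsub.
local slowCalls = 0
local slowlib = {
	gsub = function ( s, pattern, repl )
		slowCalls = slowCalls + 1
		return string.gsub( s, pattern, repl )
	end
}

-- Return true if s contains any byte outside the ASCII range 0-127.
local function hasNonAscii( s )
	return string.find( s, '[\128-\255]' ) ~= nil
end

-- Choose the library once per call, then run the same substitution through
-- it. Pure-ASCII input never touches the slow path.
local function gsubFast( s, pattern, repl )
	local lib = hasNonAscii( s ) and slowlib or string
	return lib.gsub( s, pattern, repl )
end

-- Usage: the first call stays on the fast path, the second does not.
local ascii = gsubFast( 'color: red', ' ', '' )
local mixed = gsubFast( 'caf\195\169 au lait', ' ', '_' )
```

The scan for non-ASCII bytes is a single pass over the string with the byte-oriented `string.find`, so it pays off whenever most inputs are plain ASCII.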


@@ -96,11 +96,14 @@ local function htmlEncode( s )
 end
 
 local function cssEncode( s )
-	-- XXX: I'm not sure this character set is complete.
-	-- bug #68011: allow delete character (\127)
-	return mw.ustring.gsub( s, '[^\32-\57\60-\127]', function ( m )
-		return string.format( '\\%X ', mw.ustring.codepoint( m ) )
-	end )
+	-- mw.ustring is so slow that it's worth searching the whole string
+	-- for non-ASCII characters to avoid it if possible
+	return ( string.find( s, '[^%z\1-\127]' ) and mw.ustring or string )
+		-- XXX: I'm not sure this character set is complete.
+		-- bug #68011: allow delete character (\127)
+		.gsub( s, '[^\32-\57\60-\127]', function ( m )
+			return string.format( '\\%X ', mw.ustring.codepoint( m ) )
+		end )
 end
 
 -- Create a builder object. This is a separate function so that we can show the