HTML Tidy doesn't care for tabs, and likes to always output spaces. This
can break the syntax-highlighted output, since Tidy converts tabs to
spaces on the source-code level instead of on the rendered HTML level,
even inside <pre> tags where it really shouldn't (this is probably a bug
in Tidy).
r97300 fixed the bad indenting by converting tabs to spaces before
highlighting, which works around Tidy's bug but breaks highlighting of
languages where tabs are significant (e.g. Whitespace).
It turns out that Tidy's tab mangling occurs while it's reading the
source file, before the conversion of entities such as &#9; to tab. So
GeSHi can armor its <pre>-wrapped output against Tidy's bug by encoding
all tabs as &#9;.
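The armoring step can be sketched in a few lines. This is a Python
illustration only, not the extension's PHP code; the helper name
armor_tabs is hypothetical, and &#9; is the decimal character reference
for a tab.

```python
import html


def armor_tabs(highlighted_html: str) -> str:
    """Replace literal tabs with the numeric character reference &#9;.

    Hypothetical helper: a tool that rewrites literal tabs while reading
    the source sees no tab characters to rewrite, while an HTML parser
    still decodes &#9; back to a tab afterwards.
    """
    return highlighted_html.replace("\t", "&#9;")


sample = "<pre>if x:\n\treturn 1</pre>"
armored = armor_tabs(sample)

# Decoding the character references restores the original text.
decoded = html.unescape(armored)
```

After armoring, no literal tab is left in the markup, and decoding the
reference yields the original tab again.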
Bug: 30930
Bug: 57826
Change-Id: Id541e2712bd3f94446442ccf2e1e2f214e2801ba
Single quotes in Haskell may be used to delimit character literals, but they
can also be used in identifiers. GeSHi's syntax highlighter only recognizes the
former usage. When Haskell source uses "'" as part of an identifier, GeSHi
treats it as the start of a literal string, which screws up syntax
highlighting.
Upstream bug report and proposed patch have been ignored for more than a year,
so I am applying the proposed fix here, and changing the version identifier to
reflect a WMF modification.
Upstream bug report: <http://sourceforge.net/p/geshi/bugs/217/>
Upstream proposed patch: <http://sourceforge.net/p/geshi/bugs/219/>
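The distinction between the two uses of "'" can be sketched with a
regular expression. This is a simplified Python illustration and an
assumption about how one might tell the cases apart; it is not the
upstream patch and not GeSHi's own rule. The name CHAR_LITERAL is
hypothetical.

```python
import re

# A quote starts a character literal only if it is NOT preceded by an
# identifier character or another quote (as in foldl' or x'').
# The body is either a backslash escape or one ordinary character.
CHAR_LITERAL = re.compile(r"(?<![\w'])'(?:\\.[^'\n]*|[^'\\\n])'")


def char_literals(haskell_source: str) -> list:
    """Return the character literals found in a piece of Haskell source."""
    return CHAR_LITERAL.findall(haskell_source)


primes_only = char_literals("x' = y'")
mixed = char_literals("foldl' f z xs = 'a'")
escaped = char_literals(r"c = '\n'")
```

Here the primes in x', y' and foldl' are left alone, while 'a' and
'\n' are recognized as literals.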
Bug: 52509
Change-Id: I210832c4b272b5c03bbc4623d73fb821092e5ef4
The regular expression used for number highlighting in GeSHi is highly
recursive, and easily overflows the low PCRE recursion limit on WMF
sites (and, on sites where the recursion limit isn't low, it can easily
crash PHP).
Fortunately, it's easy to fix for the common case.
This is also reported upstream at
https://sourceforge.net/p/geshi/bugs/223/
Bug: 45669
Bug: 36839
Change-Id: I27203c767d1d3f2f0999b1b1d8a06e8cf68c19ed
Currently, the SyntaxHighlight GeSHi extension includes geshi/geshi.php.
This creates a conflict when an up-to-date GeSHi library exists in PHP's include path.
This change makes the SyntaxHighlight GeSHi extension include our bundled version instead.
Change-Id: Ie8f9aa6182a38508201d639723e876c156d8b0d1
Using ContentGetParserOutput instead of ShowRawCssJs allows highlighting to be applied
to other kinds of scripts as well (e.g. Lua). It also allows more special-case code
for CSS and JS to be phased out.
NOTE: this requires Ibfb2cbef to be merged in core!
Change-Id: Ie260c22680ec9a31e505c685d70e17efe8a7bf44
(...modules of unusual size.)
Enabling full syntax highlighting for very long Lua modules can produce DOMs
that have hundreds of thousands of elements and cause browsers to lock up.
I took a count of spans by class (which amounts to a count of tokens by type)
of https://en.wiktionary.org/wiki/Module:languages and came up with:
  sy0:    62545 (symbols)
  br0:    61952 (brackets)
  st0:    39291 (strings)
  kw3:     7746 (keywords)
  kw1:        3
  kw2:        2
  co2:        2
  co1:        2
  nu0:        1
  -------------
  Total: 171544
GeSHi allows you to disable highlighting for a particular token type (see
<http://qbnz.com/highlighter/geshi-doc.html#disabling-lexics>), which seems like a
good way of handling this issue.
Disabling symbols (set_symbols_highlighting(false)) removes both sy0 and br0
elements from the DOM (about 124k elements in the case of Module:languages),
with about 47k elements remaining on the page. This is enough to make Chromium
responsive on my laptop (2.3 GHz i5, 8 GB RAM), but it's still noticeably
sluggish. Adding 'set_string_highlighting(false);' removes another 40k elements
from the rendered output, and the resulting DOM is quite zippy at 8k elements.
Proposed solution: disable symbols highlighting when >100 kB; disable strings
highlighting too when >200 kB.
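The proposed thresholds can be sketched as follows. This is a Python
illustration only, not the extension's PHP code; the names
SYMBOLS_LIMIT, STRINGS_LIMIT and highlighting_flags are hypothetical,
and reading "kB" as 1024 bytes is an assumption.

```python
SYMBOLS_LIMIT = 100 * 1024  # assumption: "100 kB" taken as 100 * 1024 bytes
STRINGS_LIMIT = 200 * 1024  # assumption: "200 kB" taken as 200 * 1024 bytes


def highlighting_flags(source_size: int) -> dict:
    """Decide which token types stay highlighted for a source of this size.

    Symbols are switched off above the first limit; strings are switched
    off as well above the second limit.
    """
    return {
        "symbols": source_size <= SYMBOLS_LIMIT,
        "strings": source_size <= STRINGS_LIMIT,
    }


small = highlighting_flags(50 * 1024)
medium = highlighting_flags(150 * 1024)
large = highlighting_flags(250 * 1024)
```

In GeSHi terms, a False flag would correspond to calling
set_symbols_highlighting(false) or set_string_highlighting(false)
before parsing.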
Change-Id: I90c645f9d03bbdc135058a3717a463dec40aa77d
The GeSHi-rendered output has a font-size of 10px where we would expect
13px, just like for <pre>.
Bug 33496 against MediaWiki core dealt with that issue already: in some
(all?) browsers, 'monospace' has a default size of 13px whereas the default
for other fonts is 16px. When a font-size of 0.8em is defined, the monospace
font is scaled down to 10px, which is too small. By appending another font
to the declaration, the browser treats monospace as a regular font and thus
scales it starting from 16px instead of 13px.
This patch appends a style to GeSHi which sets the font-family to
"monospace, monospace", thus tricking the browser into considering
monospace a regular font.
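The arithmetic behind the two sizes can be checked quickly. This is a
Python illustration; the helper name scaled_font_px is hypothetical, and
rounding to the nearest whole pixel is an assumption about how the
reported sizes come about.

```python
def scaled_font_px(base_px: float, em: float) -> int:
    """Scale a base font size by an em factor, rounded to whole pixels.

    Assumption: nearest-pixel rounding; real browsers may round differently.
    """
    return round(base_px * em)


# Generic 'monospace' starts from 13px: 13 * 0.8 = 10.4
generic_monospace = scaled_font_px(13, 0.8)

# A font treated as regular starts from 16px: 16 * 0.8 = 12.8
regular_font = scaled_font_px(16, 0.8)
```

Under that assumption the first case gives 10px and the second 13px,
matching the sizes described above.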
Change-Id: I7bbdcc0a21010513473a7ca9d784df77e9920b5b