Remove most named character references from output

Recommit of r66254 to trunk.  This was just

find extensions phase3 -iname '*.php' \! -iname '*.i18n.php' \! -iname 'Messages*.php' \! -iname '*_Messages.php' -exec sed -i 's/ /\ /g;s/—/―/g;s/•/•/g;s/á/á/g;s/´/´/g;s/à/à/g;s/α/α/g;s/ä/ä/g;s/ç/ç/g;s/©/©/g;s/↓/↓/g;s/°/°/g;s/é/é/g;s/ê/ê/g;s/ë/ë/g;s/è/è/g;s/€/€/g;s/↔//g;s/…/…/g;s/í/í/g;s/ì/ì/g;s/←/←/g;s/“/“/g;s/·/·/g;s/−/−/g;s/–/–/g;s/ó/ó/g;s/ô/ô/g;s/œ/œ/g;s/ò/ò/g;s/õ/õ/g;s/ö/ö/g;s/£/£/g;s/′/′/g;s/″/″/g;s/»/»/g;s/→/→/g;s/”/”/g;s/Σ/Σ/g;s/×/×/g;s/ú/ú/g;s/↑/↑/g;s/ü/ü/g;s/¥/¥/g' {} +

followed by reading over every single line of the resulting diff and
fixing a whole bunch of false positives.  The reason for this change is
given in <http://lists.wikimedia.org/pipermail/wikitech-l/2010-April/047617.html>.
I cleared it with Tim and Brion on IRC before committing.  It might
cause a few problems, but I tried to be careful; please report any
issues.

I skipped all messages files.  I plan to make a follow-up commit that
alters wfMsgExt() with 'escapenoentities' to sanitize all the entities.
That way, the only messages that will be problems will be ones that
output raw HTML, and we want to get rid of those anyway.

This should get rid of all named entities everywhere except messages.  I
skipped a few things like &nbsp that I noticed in manual inspection,
because they weren't well-formed XML anyway.

Also, to everyone who uses non-breaking spaces when they could use a
normal space, or nothing at all, or CSS padding: I still hate you.  Die.
This commit is contained in:
Aryeh Gregor 2010-05-30 17:33:59 +00:00
parent fbefc8b389
commit 90ab0608e0
Notes: Aryeh Gregor 2010-05-30 17:33:59 +00:00

View file

@ -462,9 +462,9 @@ class ReplaceText extends SpecialPage {
public static function convertWhiteSpaceToHTML( $msg ) {
$msg = htmlspecialchars( $msg );
$msg = preg_replace( '/^ /m', '&nbsp; ', $msg );
$msg = preg_replace( '/ $/m', ' &nbsp;', $msg );
$msg = preg_replace( '/ /', '&nbsp; ', $msg );
$msg = preg_replace( '/^ /m', '&#160; ', $msg );
$msg = preg_replace( '/ $/m', ' &#160;', $msg );
$msg = preg_replace( '/ /', '&#160; ', $msg );
# $msg = str_replace( "\n", '<br />', $msg );
return $msg;
}