MediaWiki extension: SpamBlacklist
----------------------------------

SpamBlacklist is a simple edit filter extension. When someone tries to save the
page, it checks the text against a potentially very large list of "bad"
hostnames. If there is a match, it displays an error message to the user and
refuses to save the page.

To enable it, first download a copy of the SpamBlacklist directory and put it
into your extensions directory. Then put the following at the end of your
LocalSettings.php:

    wfLoadExtension( 'SpamBlacklist' );

To users running MediaWiki 1.24 or earlier:

The instructions above describe the new way of installing this extension using
wfLoadExtension(). On MediaWiki 1.24 and earlier, use the following line
instead of wfLoadExtension( 'SpamBlacklist' ):

    require_once "$IP/extensions/SpamBlacklist/SpamBlacklist.php";

The list of bad URLs can be drawn from multiple sources. These sources are
configured with the $wgSpamBlacklistFiles global variable. This global variable
can be set in LocalSettings.php, AFTER the line that loads the extension
(wfLoadExtension() or require_once, as described above).

$wgSpamBlacklistFiles is an array, each value containing either a URL, a filename
or a database location. Specifying a database location allows you to draw the
blacklist from a page on your wiki. The format of the database location
specifier is "DB: <db name> <title>".

Example:

    wfLoadExtension( 'SpamBlacklist' );
    $wgSpamBlacklistFiles = [
        "$IP/extensions/SpamBlacklist/wikimedia_blacklist", // Wikimedia's list
        "DB: wikidb My_spam_blacklist", // database (wikidb), title (My_spam_blacklist)
    ];

The local pages [[MediaWiki:Spam-blacklist]] and [[MediaWiki:Spam-whitelist]]
will always be used, whatever additional files are listed.

Compatibility
-------------

This extension is primarily maintained to run on the latest release version
of MediaWiki (1.22.x as of this writing) and development versions; however,
the current version should also work on 1.21.

If you are using an older version of MediaWiki, you can check out an
older release branch; for example, MediaWiki 1.20 would use REL1_20.

For even older versions, you may be able to dig a working version out of the
Git repository. If you use Wikimedia's blacklist file with such a version,
however, it will likely fail, because old versions of the code cannot handle
the large size of that blacklist.

File format
-----------

In simple terms:

   * Everything from a "#" character to the end of the line is a comment
   * Every non-blank line is a regex fragment which will only match inside URLs

Internally, a regex is formed which looks like this:

    !http://[a-z0-9\-.]*(line 1|line 2|line 3|....)!Si

A few notes about this format: it's not necessary to add www to the start of
hostnames, because the regex is designed to match any subdomain. Don't add
patterns to your file which may run off the end of the URL, e.g. anything
containing ".*". Unlike in some similar systems, the line-end metacharacter "$"
will not assert the end of the hostname; it will assert the end of the page.

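As a rough illustration of the format described above, the following PHP sketch
strips comments from a hypothetical list, joins the remaining lines into a regex
of the shape shown above, and tests it against some text. The helper names
(parseBlacklistLines, buildBlacklistRegex) and the hostnames are made up for
this example; it is not the extension's actual code.

```php
<?php
// Hypothetical blacklist contents, in the format described above.
$blacklistText = <<<'EOT'
# Lines starting with "#" are comments
spam-pills\.example      # trailing comments are stripped too
casino[0-9]+\.example

EOT;

/**
 * Turn blacklist text into a list of regex fragments
 * (illustrative helper, not part of the extension).
 */
function parseBlacklistLines( string $text ): array {
	$fragments = [];
	foreach ( explode( "\n", $text ) as $line ) {
		// Drop everything from "#" to the end of the line, then trim whitespace.
		$line = trim( preg_replace( '/#.*$/', '', $line ) );
		if ( $line !== '' ) {
			$fragments[] = $line;
		}
	}
	return $fragments;
}

/**
 * Combine the fragments into one regex of the shape shown in this README.
 */
function buildBlacklistRegex( array $fragments ): string {
	return '!http://[a-z0-9\-.]*(' . implode( '|', $fragments ) . ')!Si';
}

$regex = buildBlacklistRegex( parseBlacklistLines( $blacklistText ) );

// The "www." subdomain is absorbed by the hostname prefix of the regex.
$pageText = 'See http://www.spam-pills.example/buy for details.';
$isSpam = preg_match( $regex, $pageText ) === 1;

// "$" anchors to the end of the whole text, not the end of the hostname,
// so this fragment misses a URL that is followed by anything else.
$anchored = buildBlacklistRegex( [ 'bad\.example$' ] );
$missed = preg_match( $anchored, 'http://bad.example/page and more text' ) === 0;
```

In this sketch $isSpam is true, and $missed is true as well, which shows why
"$" is rarely what you want in a blacklist line.
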
Performance
-----------

This extension uses a small "loader" file, to avoid loading all the code on
every page view. This means that page view performance will not be affected
even if you are not running a PHP bytecode cache such as Turck MMCache. Note
that a bytecode cache is strongly recommended for any MediaWiki installation.

The regex match itself generally adds an insignificant overhead to page saves,
on the order of 100ms in our experience. However, loading the spam file from
disk or the database, and constructing the regex, may take a significant amount
of time depending on your hardware. If you find that enabling this extension
slows down saves excessively, try installing memcached or another supported data
caching solution. The SpamBlacklist extension will cache the constructed regex
if such a system is present.

Caching behavior
----------------

Blacklist files loaded from remote web sites are cached locally, in the cache
subsystem used for MediaWiki's localization. (This usually means the objectcache
table on a default install.)

By default, the list is cached for 15 minutes (if successfully fetched) or
10 minutes (if the network fetch failed), after which point it will be fetched
again when next requested. This should be a decent balance between avoiding
too-frequent fetches if your site is frequently used and staying up to date.

Fully-processed blacklist data may be cached in memcached or another shared
memory cache if it's been configured in MediaWiki.

Stability
---------

This extension has not been widely tested outside Wikimedia. Although it has
been in production on Wikimedia websites since December 2004, it should be
considered experimental. Its design is simple, with little input validation, so
unexpected behavior due to incorrect regular expression input or non-standard
configuration is entirely possible.

Obtaining or making blacklists
------------------------------

The primary source for a MediaWiki-compatible blacklist file is the Wikimedia
spam blacklist on meta:

    https://meta.wikimedia.org/wiki/Spam_blacklist

In the default configuration, the extension loads this list from our site
once every 10-15 minutes.

The Wikimedia spam blacklist can only be edited by trusted administrators.
Wikimedia hosts large, diverse wikis with many thousands of external links,
hence the Wikimedia blacklist is comparatively conservative in the links it
blocks. You may want to add your own keyword blocks or even ccTLD blocks.
You may suggest modifications to the Wikimedia blacklist at:

    https://meta.wikimedia.org/wiki/Talk:Spam_blacklist

To make maintenance of local lists easier, you may wish to add a DB: source to
$wgSpamBlacklistFiles and hence create a blacklist on your wiki. If you do this,
it is strongly recommended that you protect the page from general editing.
Besides the obvious danger that someone may add a regex that matches everything,
please note that an attacker with the ability to input arbitrary regular
expressions may be able to generate segfaults in the PCRE library.

Whitelisting
------------

You may sometimes find that a site listed in a centrally-maintained blacklist
contains something you nonetheless want to link to.

A local whitelist can be maintained by creating a [[MediaWiki:Spam-whitelist]]
page and listing hostnames in it, using the same format as the blacklists.
URLs matching the whitelist will be ignored locally.

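The interaction between the two lists can be pictured with the following PHP
sketch. It assumes both lists have already been turned into regexes of the shape
described under "File format"; the function name findBlockedUrls and the
hostnames are hypothetical, and the extension's real implementation may differ.

```php
<?php
// Both regexes follow the shape described under "File format";
// the hostnames are hypothetical.
$blacklistRegex = '!http://[a-z0-9\-.]*(example\.net)!Si';
$whitelistRegex = '!http://[a-z0-9\-.]*(docs\.example\.net)!Si';

/**
 * Return the URLs that hit the blacklist and are not whitelisted
 * (illustrative helper, not the extension's actual code).
 */
function findBlockedUrls( array $urls, string $blacklist, string $whitelist ): array {
	$blocked = [];
	foreach ( $urls as $url ) {
		if ( preg_match( $whitelist, $url ) ) {
			// Whitelisted URLs are ignored, even if the blacklist matches.
			continue;
		}
		if ( preg_match( $blacklist, $url ) ) {
			$blocked[] = $url;
		}
	}
	return $blocked;
}

$blocked = findBlockedUrls(
	[ 'http://docs.example.net/manual', 'http://shop.example.net/pills' ],
	$blacklistRegex,
	$whitelistRegex
);
```

Here the blacklist covers every subdomain of example.net, but the whitelist
entry exempts docs.example.net, so $blocked contains only the shop URL.
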
Logging
-------

To aid with tracking which domains are being spammed, this extension has
multiple logging features. By default, hits are included in the standard
debug log (controlled by $wgDebugLogFile). You can grep for 'SpamBlacklistHit',
which includes the IP of the user and the URL they tried to submit. This
file is only available to people with server access and includes private info.

You can also enable logging to [[Special:Log]] by setting
$wgLogSpamBlacklistHits to true. This will include the account which tripped the
blacklist, the page title the edit was attempted on, and the specific URL. By
default this log is only viewable by wiki administrators, and you can grant
other groups access by giving them the "spamblacklistlog" permission.

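A minimal LocalSettings.php sketch of the logging settings described above.
$wgDebugLogFile, $wgLogSpamBlacklistHits and the "spamblacklistlog" right come
from this README; $wgGroupPermissions is MediaWiki's standard permissions array.
The log path and the choice of the 'bureaucrat' group are assumptions made only
for illustration.

```php
// In LocalSettings.php, after the line that loads the extension.

// Debug log that will contain the 'SpamBlacklistHit' entries
// (hypothetical path; keep it outside the web root, it holds private info).
$wgDebugLogFile = '/var/log/mediawiki/debug.log';

// Also record hits in [[Special:Log]].
$wgLogSpamBlacklistHits = true;

// Let an additional group view that log; 'bureaucrat' is only an example.
$wgGroupPermissions['bureaucrat']['spamblacklistlog'] = true;
```
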
Copyright
---------

This extension and this documentation were written by Tim Starling (with later
contributions by others) and are available under GPLv2 or any later version.