Commit graph

98 commits

Author SHA1 Message Date
jenkins-bot edf0a0e1ec Merge "Actually use STASH_TTL constant and bump it to 3 minutes" 2016-09-09 02:58:34 +00:00
Aaron Schulz 18d21ae86a Actually use STASH_TTL constant and bump it to 3 minutes
Change-Id: Ia50296919a5c8bdeb63496397c538f39d43c4d54
2016-09-09 01:22:06 +00:00
Aaron Schulz f051498dcc Fix links passed to filter() for stashing to match edit checks
Change-Id: I8f1b2c4d3033015de0e9a6d58776fe0ad32c4775
2016-09-08 17:47:09 -07:00
Aaron Schulz 2efbca9a6c Fix bogus stats where stashes counted as misses
Also renew stash values if they will expire soon.

Change-Id: I36771f5736f80546aac99409463293c7699fb5de
2016-08-30 06:54:03 +00:00
Aaron Schulz 1413058be9 Add spamblacklist.check-stash.store metric
Change-Id: I7990d7b0681b667015d4db68c0be8234dde4ce28
2016-07-15 06:39:53 -07:00
jenkins-bot e5227407a8 Merge "Improve use of edit stash hook to check links" 2016-07-10 23:33:02 +00:00
Aaron Schulz d29aca496a Improve use of edit stash hook to check links
In the common case where no banned links were found, cache this
information to skip the checks on save.

Change-Id: I5f936622bc62d9fc905edaa2a69f52388c047d10
2016-07-10 16:17:51 -07:00
jenkins-bot 2e9ae98b8a Merge "Fix bugs in Schema:ExternalLinksChange code" 2016-07-06 02:04:41 +00:00
Aaron Schulz 5729b7abe4 Make event logging respect $preventLog in filter()
Change-Id: Ia97cc1fbb8fdd4cc3fa64adc897fff8e559d0e85
2016-06-30 14:55:24 -07:00
Kunal Mehta a69fe26b94 Fix bugs in Schema:ExternalLinksChange code
* Have SpamBlacklist::doLogging() actually run
* Bump schema ID so userId property is an integer
* Don't try logging URLs that were unable to be parsed
* Make sure path/query/fragment are always strings

Bug: T115119
Change-Id: Ia81037e8939dd547f00e79c169fa84ca0a7b917e
2016-06-22 12:18:11 +02:00
Timo Tijhof 19afb10841 Clear the queue when logging is done
Follows-up 5910bfd7ba.

* Remove one-off $domain variable.
* Rename $urlChanges to highlight that urlChanges is a log of changes,
  not a list of changes to be applied.
* Rename doEventLogging() to isLoggingEnabled().

Change-Id: Idbd6551502362422beea4d86b912128a43e9c96b
2016-05-23 16:29:21 -07:00
Kaldari ecb2f3fbfa Making logging code less fragile and using better function name
These changes are in response to code comments at
https://gerrit.wikimedia.org/r/#/c/263145/4/SpamBlacklist_body.php

Change-Id: I328d7cd473b692c6cdeb170bcc579c9e3154617c
2016-05-23 10:51:45 -07:00
jenkins-bot 83b178eee6 Merge "Switching to properly spelled schema to avoid confusion" 2016-05-02 16:00:41 +00:00
Kaldari deebe10498 Switching to properly spelled schema to avoid confusion
Change-Id: I029173d8821394a411b78009c2f63210c0a95df0
2016-05-02 09:50:59 -06:00
jenkins-bot 211e88c042 Merge "Log URL changes to EventLogging if configured" 2016-05-02 15:49:29 +00:00
Kunal Mehta 5910bfd7ba Log URL changes to EventLogging if configured
If enabled, changes in URLs on a page will be logged to the
"ExternalLinkChange" schema. To avoid extra lookups, the diff of URLs is
calculated during the filter step of the SpamBlacklist, and stored in
the SpamBlacklist instance state until the post-save hook is called, and
then they are queued to go to EventLogging.

Bug: T115119
Change-Id: I9a5378dca5ab473961f9fe8f7a6d929dc6d32bba
2016-04-25 17:54:48 +02:00
Aaron Schulz 2acfb30bfc Pre-cache the link list for external link filters
* This works via plugging into ApiStashEdit.
* The query is relatively slow per performance.wikimedia.org/xenon/svgs/daily/2016-02-15.index.svgz.

Change-Id: I0ad5289324b5482db7e2276f58fc1ac140250d47
2016-02-18 14:36:42 +00:00
Chad Horohoe d46d9b0c51 Remove obvious function-level profiling
Change-Id: I2270b936a1502df07f0dae529b0180908c70c58a
2015-02-10 14:07:45 -08:00
Elliott Eggleston 686b777f50 Log full URLs on spam blacklist hit
When a regex detects a URL on the blacklist, use an expanded regex
that matches the full line to get the URL to log.

Bug: 55356
Change-Id: I6dfbc1b70f9a305e76664ac28ccb90fe1594f342
2015-01-23 00:10:56 +00:00
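
The "expanded regex" idea in the commit above can be sketched roughly as follows. This is a Python illustration only, not the extension's PHP code; the rule, the sample URLs, and the `^.*(?:rule).*$` wrapper are assumptions chosen to show how a full-line match recovers the whole URL once a rule has hit.

```python
import re

# Hypothetical candidate URLs, one per line (the newline join is an assumption here).
candidates = "\n".join([
    "http://good.example/a",
    "http://www.spam.example/buy?id=1",
    "http://other.example/",
])

# Hypothetical blacklist rule.
rule = r"spam\.example"

# The plain rule only yields the fragment that the rule itself matched.
hit = re.search(rule, candidates)

# Assumed expansion: wrap the rule so the match covers the entire line,
# which gives the full URL that could be written to a log.
full_line = re.search(r"^.*(?:" + rule + r").*$", candidates, re.MULTILINE)

fragment = hit.group(0)
full_url = full_line.group(0)
```

Here `fragment` holds only the part matched by the rule, while `full_url` holds the complete line containing it.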
Faidon Liambotis f9e2fed9bf Revert "Categorize pages containing blacklisted links"
This resulted in doubling the appserver-memcached traffic across the
Wikimedia cluster.

This reverts commit 32b546a223.

Change-Id: I03e96a1bb223360e62d47f98a505cc5b26e5aadf
2014-03-31 09:06:56 +03:00
Jackmcbarn 32b546a223 Categorize pages containing blacklisted links
Add pages containing links that match the spam blacklist to a tracking
category.

Change-Id: I694860bc77d05dccd81522efc23225481d51ee43
2014-03-11 11:23:38 -04:00
Jackmcbarn 57a417d181 Add an API action to test blacklisted URLs
Add API action spamblacklist, accepting parameter url, that returns the
parts of the URLs that match the spam blacklist, if any.

Bug: 54441
Change-Id: Ia6aea8b463fc63f951224520b8cc5abf185c5c74
2014-02-04 19:17:38 +00:00
jenkins-bot acaf4262d9 Merge "Log blacklist hits to Special:Log" 2013-09-08 00:58:41 +00:00
jenkins-bot c200e09a54 Merge "Better display of parts blocked by SpamBlacklist." 2013-09-04 15:29:57 +00:00
jenkins-bot 4646193da1 Merge "Remove duplicated blocked parts reported by SpamBlacklist." 2013-09-04 15:27:57 +00:00
daniel a3defb8b91 (bug 51621) Make SBL aware of ContentHandler.
This changes SpamBlacklist to make use of the new, ContentHandler
aware hooks.

This change also includes some refactoring and cleanup which made
the migration to the new hooks easier.

Change-Id: I21e9cc8479f2b95fb53c502f6e279c8a1ea378a5
2013-08-24 19:55:55 +02:00
Liangent 3840ef915c Remove duplicated blocked parts reported by SpamBlacklist.
Change-Id: I58f0faa9db7620b2413ffa4156f8262a8bb3e71a
2013-08-13 12:39:24 +00:00
daniel dfb25d3f73 Fix missing default blacklist.
The default blacklist was no longer being used, because the line
$wgSpamBlacklistFiles =& $wgBlacklistSettings['spam']['files']
initialized $wgBlacklistSettings['spam']['files'] to null,
and BaseBlacklist::__construct then overrode $this->files
with null.

Change-Id: I22448bfb87eef6dd86b61362f3eb6bb2198a10b6
2013-07-31 12:52:38 +02:00
Kunal Mehta 48076ea845 Log blacklist hits to Special:Log
Adds a logBlacklistHit function, which is called whenever a new URL
matches the blacklist. A new log type of "spamblacklist" is created, and is only
viewable by people with the "spamblacklistlog" user right. This is given to sysops
by default.

Logging is disabled by default, and is controlled by $wgLogSpamBlacklistHits.

Bug: 1542
Change-Id: I7b1ee2b3bb02b693d695bf66d157e2c33526c919
2013-07-22 13:50:52 -07:00
Liangent f29979286c Better display of parts blocked by SpamBlacklist.
Currently if a rule says "go.gle" and http://www.google.com/ is being
added, "http://www.google" is displayed as the blocked part. This
doesn't look quite nice. Now it just displays "google".

Change-Id: I0851c00d38129a8e9910c65100998eb3f1e5b2c2
2013-01-12 04:01:41 +08:00
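
The difference between the two displays described in the commit above can be illustrated with a minimal Python sketch. The URL-prefix wrapper around the rule is an assumption made for illustration, not the extension's actual pattern; only the rule "go.gle" and the URL come from the commit message.

```python
import re

# Rule from the commit message; the unescaped "." matches any character.
rule = r"go.gle"

# Assumed wrapper: the rule is anchored behind a scheme and host prefix,
# and the rule itself is placed in a capture group.
pattern = re.compile(r"https?://[a-z0-9.\-]*(" + rule + r")", re.IGNORECASE)

match = pattern.search("http://www.google.com/")

whole_match = match.group(0)  # scheme and host prefix plus the rule's match
rule_part = match.group(1)    # only the text matched by the rule itself
```

Under these assumptions `whole_match` corresponds to the older display ("http://www.google") and `rule_part` to the newer one ("google").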
Siebrand Mazeland e9874344aa Maintenance for SpamBlacklist extension.
* Replace deprecated methods. MediaWiki 1.19 required.
* Replace <tt> with <code>.
* Update documentation.
* Use WikiPage instead of Article for doEdit().
* Use __DIR__ instead of dirname( __FILE__ ).
* Remove superfluous newlines.

Change-Id: I3a0e42ca404638f7c7934c316735ad11cbc99d42
2012-09-03 16:50:18 +02:00
Platonides 85583cd4f4 (Bug 35023) The spam blacklist doesn't act on protocol-relative links.
Change-Id: Ibe15cdf62d0099f10fb73f56ce0dfee2abac7f35
2012-07-14 19:41:29 +02:00
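
A rough Python sketch of why protocol-relative links can slip past a pattern that requires an explicit scheme. Both patterns and the sample URLs are hypothetical and are not taken from the extension; they only show the effect of making the scheme optional.

```python
import re

# Hypothetical links, the last one protocol-relative.
urls = ["http://spam.example/", "https://spam.example/", "//spam.example/"]

# Assumed pattern that insists on an explicit http/https scheme.
scheme_required = re.compile(r"https?://[a-z0-9.\-]*spam\.example", re.IGNORECASE)

# Assumed fix: make the scheme optional so "//host" is also covered.
scheme_optional = re.compile(r"(?:https?:)?//[a-z0-9.\-]*spam\.example", re.IGNORECASE)

missed = [u for u in urls if not scheme_required.search(u)]
caught = [u for u in urls if scheme_optional.search(u)]
```

With these inputs, only the protocol-relative link ends up in `missed`, and all three links end up in `caught`.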
jarry1250 20058848ab Other half of fix for bug #30332 ("API spamblocklist error should
provide all blocked URLs").

SpamBlacklist extension to provide all matched URLs to
spamPageWithContent() rather than just one. Performance
hit negligible and zero for all edits that don't hit the
SpamBlacklist (99.999%+).

DEPENDENT ON OTHER HALF OF FIX (now in core):
https://gerrit.wikimedia.org/r/3740

Change-Id: Ia951d5795c5cedb6c3876be89f8a08f110004102
2012-03-27 21:42:49 +01:00
Sam Reed 856be3bc29 Bug 35156 - Harmonise spelling of getArticleID() and getArticleId()
Mass change ->getArticleId() to ->getArticleID()
2012-03-11 19:04:37 +00:00
Sam Reed 8468534f8e Manually apply r110682 to trunk 2012-02-03 20:15:02 +00:00
John Du Hart aaf4d74d18 Adding Email blacklisting to the SpamBlacklist extension
This relies on r109111
2012-01-18 23:29:37 +00:00
John Du Hart 62b2bde146 Refactored SpamBlacklist to be extendable for other blacklist types
This is the groundwork for Bug 33761
2012-01-17 06:13:46 +00:00
Tim Starling 220ac94681 Match protocol-relative URLs. Patch by Anaconda. 2012-01-02 23:58:18 +00:00
Roan Kattouw 769640ee5c Fix misspelled constant in r95663 2011-08-31 19:07:44 +00:00
Roan Kattouw b98c200706 Last commit to make WMF-deployed extensions HTTPS-ready (hopefully): use wfExpandUrl() in a bunch of places
* SpamBlacklist: code is weird but I'm pretty sure this needs HTTP
* ContributionTracking: expand return URL to current protocol. Use HTTP in the test suite (PROTO_CURRENT makes no sense in tests since they run from the command line)
* GlobalUsage: remove URL expansion, not needed after r95651
* CentralNotice: expand URL because it gets fed to window.location indirectly via JS
* OpenSearchXml: use canonical URLs in XML output
* MobileFrontend: expand a URL that's used in a Location: header
2011-08-29 14:37:47 +00:00
Alexandre Emsenhuber b16bb18e5a Dropped pre-1.12 compatibility code 2011-05-27 19:26:00 +00:00
Sam Reed d6131ea82d Kill/update callers for some deprecated code 2011-05-06 23:52:52 +00:00
Mark A. Hershberger 9cc1d19d23 PLEASE TEST: Bug #26332 — Patch that I think should fix the problem
according to the comments, but needs more testing

* Also, a one line w/s fix up
2011-05-03 20:23:35 +00:00
Sam Reed b3de09a381 More undefined variables 2011-01-23 10:34:56 +00:00
Sam Reed 7e97019b2e Conditionals in loops to foreachs 2010-10-29 21:30:20 +00:00
Chad Horohoe e86cfdacb4 More php4-style constructors. I think that's most of them 2010-08-30 17:11:45 +00:00
Sam Reed 5df9b1cc11 Remove some more unused globals
Kill a couple of other unused variables
2010-07-25 17:12:50 +00:00
Chad Horohoe e3978dc584 Get rid of the last (I think) php4-style calls to wfGetDB() 2010-02-13 23:03:40 +00:00
Siebrand Mazeland e26cb735b1 (bug 21387) Make $ regex work for the URLs. Patch contributed by Platonides.
Bug comment: Set PCRE_MULTILINE on spamblacklist regexes. $ on spam blacklist regex should match the end of the url (not of the text) so it can be used to match only the mainpage. Since the candidate urls are already joined with a new-line separator, it's just setting PCRE_MULTILINE on the regex.
2010-01-09 18:43:34 +00:00
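
The effect described in the bug comment above can be reproduced with a small Python sketch, using `re.MULTILINE` as an analogue of PCRE_MULTILINE. The rule and URLs are hypothetical; only the newline-joined candidate list and the multiline flag come from the commit message.

```python
import re

# Candidate URLs joined with a newline separator, as the bug comment describes.
candidates = "\n".join([
    "http://example.com/",
    "http://example.com/some/page",
])

# Hypothetical rule intended to match only the main page.
rule = r"example\.com/$"

# Without the flag, "$" refers to the end of the whole text.
without_flag = re.findall(rule, candidates)

# With the flag, "$" also matches at the end of each line, i.e. each URL.
with_flag = re.findall(rule, candidates, re.MULTILINE)
```

In this sketch the main-page URL is matched only when the multiline flag is set, because it is not the last line of the joined text.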
Chad Horohoe 2e1c0ed6d9 Remove getHttp() method and just call Http::get() directly. 2009-05-18 00:48:07 +00:00