formatSummary() was first parsing the summary using the
summary parser, then handing off the resulting HTML to
getTextSnippet() which parsed it again with the normal parser.
Bug: T131087
Change-Id: I2724ccb7c23579b3f02dea57d4fc833079169adf
The previous implementation did the following weird things:
* Stripped tags before parsing
* Stripped templates before parsing using a hacky while loop
that bails after ten attempts
* Decoded entities using htmlspecialchars_decode(), while
html_entity_decode() makes more sense
* ...which meant it had to manually convert &nbsp; back
to spaces, which is not necessary if you use html_entity_decode()
* Removed any single braces ('{' and '}') from the output
* Rejected the entire output if there were any entities left,
which is fairly likely since htmlspecialchars_decode()
only decodes a few of them
Instead of all this, just parse, strip tags, decode entities
(all of them, not just a few), trim and truncate. In particular,
don't strip templates, because we use getTextSnippet() in mention
notifications, which look weird when {{ping}} templates are stripped.
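In sketch form, assuming $html is the already-parsed comment
(mb_substr stands in for a language-aware truncate):

    $plain = strip_tags( $html );                                // strip tags
    $plain = html_entity_decode( $plain, ENT_QUOTES, 'UTF-8' );  // decode all entities
    $plain = trim( $plain );                                     // trim
    $snippet = mb_substr( $plain, 0, $maxLength );               // truncate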
Bug: T129531
Change-Id: I956b2f6badc40d2f5bf90a0458ccab8b8fc6fefb
getTextSnippet() has a `Language` type hint that will fatal if $wgLang
is a StubUserLang object, so make sure we unstub it if nothing else
already has.
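One way to do it, in sketch form (the surrounding hook code is omitted):

    global $wgLang;
    // Make sure $wgLang is a real Language object before it reaches
    // the Language type hint.
    if ( $wgLang instanceof StubUserLang ) {
        $wgLang = $wgLang->_unstub();
    }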
Bug: T118542
Change-Id: I847680074fbbf95bbe3b6002151d2a18c45ebe6e
To avoid using $wgLang directly. We still have to use it in
detectSectionTitleAndText for now though.
Change-Id: Ic901ed05d4e8f6291caa55d866ce58f7300880f5
This preference has been disabled since bug 47562, and doesn't make
sense to keep around given that the flyout is the main interaction most
users have with Echo.
Change-Id: I7e8ddf96dbde9a95ac01a0cc83bad396151d01bd
Pull out the logic that extracts usernames from links. This allows
it to be reused by the LQT->Flow import code.
Bug: T101979
Change-Id: Ib16a09cf1f388f56944cd1bb564384535728156e
* Do not default section to footer. If the section
is not found, it is left empty and the notification
message is simpler.
* Change notification-edit-talk-page-email-batch-body2:
replace the ':' at the end with '.' so it does not look
incomplete.
Bug: T99989
Change-Id: Ic982a81eada388d750760787245dea8f72368147
Link to the bottom of the talk page and use the edit summary as the
text if the parser failed to find something. This is what core's
enotif already does.
Change-Id: Iadc7011ea2627e00f0c51472da7aad1355afeddb
* Parser generates signature to compare against
* Signature can be overwritten per wiki, in NS_MEDIAWIKI
* Such an overwritten default can be different depending on
the page the signature is on [1]
* Our comparison signature generation was page-agnostic
(always from Title::newMainPage)
* Signatures didn't match up on own talk pages, where
default signature is different
Also added 2 new test cases and improved the tests by also
setting the page (see the sketch below).
1: https://en.wikipedia.org/w/index.php?title=MediaWiki%3ASignature&diff=176507985&oldid=176229132
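The fix, in sketch form (not the literal patch; variable names are
illustrative):

    global $wgParser;
    // Build the comparison signature against the page actually being
    // inspected, so a per-page MediaWiki:Signature override is picked
    // up, instead of always using Title::newMainPage().
    $comparisonSig = $wgParser->preSaveTransform(
        '~~~', $title, $user, new ParserOptions( $user )
    );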
Bug: T78424
Change-Id: Ice151d4d16236a5d1556ef62805b61310c7beb85
Previously, there were a couple of hacks in play.
It was also not picking up ~~~ (signature without timestamp).
And it relied on a nasty regular expression which, although
based on Parser, may some day get out of date.
And it relied heavily on a specific signature format, which
isn't guaranteed (it's an i18n msg).
This patch changes the approach: it will use a very simple
regex to match links, and will send those through Parser to
generate the signature anew. My reasoning is that this should
be exactly the same as what Echo just received (it should've
also gone through the parser).
The biggest discomfort of this approach is that it's much stricter.
It should still match whatever was generated from a ~~~ or ~~~~,
but no longer e.g. the fake signatures we were using in our tests.
I also had to update our tests, because signatures differ for
anonymous users, so I had to generate all the users and fix some
of the signature formats used in the tests.
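Roughly, the new detection looks like this (a sketch; both helpers
are hypothetical stand-ins, the second one for the ~~~ pre-save
transform):

    // A very simple regex yields candidate links; the signature is then
    // rebuilt through the parser and must appear verbatim in the line,
    // which is why matching became stricter.
    if ( preg_match_all( '/\[\[([^\[\]|#]+)/', $line, $m ) ) {
        foreach ( $m[1] as $target ) {
            $user = self::userFromLink( $target ); // hypothetical helper
            if ( $user && strpos( $line, self::signatureFor( $user, $title ) ) !== false ) {
                return $user;
            }
        }
    }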
Bug: T75426
Bug: T87852
Bug: T75366
Bug: T78424
Change-Id: Ibeff36397129fdd5d376f3668a23a45f9a014525
In some languages the \w+ does not match the characters used
when translating UTC, and the regular expression attempting to
match the timezone fails. Testing on production wikis where this
fails, such as ne.wikipedia.org, shows it still works; it just
generates a more generic regular expression.
Since the overall process still works acceptably on the wikis
emitting warnings, this patch just adds a guard to prevent the
warning and does not attempt to fix the underlying issue.
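The guard amounts to something like this (names illustrative, not
the actual code):

    // Only splice the captured timezone into the pattern when \w+
    // actually matched; otherwise fall back to the generic pattern
    // instead of raising a notice.
    if ( isset( $tzMatches[1] ) && $tzMatches[1] !== '' ) {
        $timestampRegex .= preg_quote( $tzMatches[1], '/' );
    } else {
        $timestampRegex .= '\w+';
    }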
Bug: T76558
Change-Id: If8e1ddd2d642b042cc24c51d5ba5aa8b34bc9552
There were two different circumstances that could trigger Echo's signature
detection to fail: multibyte characters in the signature, and signatures near
the $wgMaxSigChars limit that expanded past the limit due to wfEscapeWikiText().
This patch switches to mb_substr to appropriately handle the multibyte
characters, and adds a couple of extra characters to $wgMaxSigChars to allow
for wfEscapeWikiText(). This isn't perfect, but a stricter implementation
would require much more work than I think we should spend here.
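In sketch form (the exact amount of slack is an assumption):

    // substr() counts bytes and can cut a multibyte character in half;
    // mb_substr() counts characters. Allow a little slack for
    // characters that wfEscapeWikiText() may have added.
    $limit = $wgMaxSigChars + 2;
    $candidateSig = mb_substr( $text, $offset, $limit );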
Bug: 73426
Change-Id: Ic51c2bc2a08600f188db13a9a0537f1321c9a655
Currently Echo attempts to find a signature by looking for a series of
strings starting with what it thinks are the current aliases of NS_USER
and NS_USER_TALK. This has proven to be error prone; see the linked bug
for how a change to ru.wikipedia.org/wiki/Mediawiki:Signature broke
mention notifications.
This patch switches things around to pull wikilinks out of the text and
run them through the Title class. The results of this parsing are checked
for NS_USER and NS_USER_TALK, giving a much stronger guarantee of finding
translated namespaces.
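The idea, roughly (regex and variable names illustrative):

    preg_match_all( '/\[\[([^\[\]|]+)/', $snippet, $matches );
    foreach ( $matches[1] as $target ) {
        $title = Title::newFromText( $target );
        // Title resolves localized namespaces and their aliases for us
        if ( $title && in_array( $title->getNamespace(), array( NS_USER, NS_USER_TALK ) ) ) {
            $users[] = $title->getText();
        }
    }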
Bug: 71353
Change-Id: Ib0d0f4e068339d2fd28761087c05f5a1acb3c1fc
General code cleanup as reported by the PHPStorm static code
analysis. I hope it's not a problem that I made a lot of very
different (but all very tiny) changes in a single patch. If you
want to merge this but you think it's better to split it into
several patches first, please tell me.
Change-Id: I2e2c4bb47f8d20e038d28e236e2ff813b30504af
The code was looking at the [0] element for the matched position
of timestamps, while preg_match returns it in the [1] element.
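For reference, assuming the match is made with PREG_OFFSET_CAPTURE
(variable names illustrative):

    if ( preg_match( $timestampRegex, $line, $matches, PREG_OFFSET_CAPTURE ) ) {
        // each entry is array( matched text, offset ), so the position
        // is in [1]; [0] is the matched text itself
        $pos = $matches[0][1];
    }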
Bug: 53132
Change-Id: Ibfd3f2b86b007f28f73a137defb80276fb830d28
Follows-Up: I6c636b055bcd25760aee848aea71fe4044c7e1be
Echo's detection of section links was limited to main headings that have
==Foo==, with exactly two ==. This updates the regexp patterns involved to
correctly detect (and hence link to) subsections if that's where the edit
was made.
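Illustrative pattern only (not the literal one from the patch):

    // Accept two or more '=' and require the same number on each side,
    // so ===Sub section=== and deeper headings are detected as well.
    if ( preg_match( '/^(={2,})\h*(.+?)\h*\1\s*$/m', $text, $m ) ) {
        $sectionTitle = $m[2];
    }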
Bug: 48484
Change-Id: Iedbe3404ec265a7f2183629b463a3d672dc9098e
Detects changes to different parts of the document as independent of each
other. The refactored parser passes all tests the previous parser passed,
plus a number of new tests which fail with the original parser.
Change-Id: I65fdc6d9f922cbe9ff684332945def3776c70d30
Adjusted the edit-user-talk event creation to detect and record which section
of the talk page was edited. Flyout, special page, and email messages have
been adjusted to use this section title as a URL fragment when available.
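In sketch form (names illustrative): the detected section title
becomes the fragment of the talk page link.

    $url = $sectionTitle !== ''
        ? $talkPage->getFullURL() . '#' . Sanitizer::escapeId( $sectionTitle )
        : $talkPage->getFullURL();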
Bug: 46937
Change-Id: I161e2ffda2f2540f64de90cc621fb3b69479d0db
This patch makes Echo talk page notifications mimic the existing Orange Bar
and email talk page notifications for minor edits.
For the Orange Bar, a minor edit notification is sent if the editor does not
have the nominornewtalk permission.
There are additional rules for email: a minor edit notification is sent if the
global $wgEnotifMinorEdit is true and the notification recipient has the
enotifminoredits option enabled.
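The rules above, in sketch form (variable names illustrative):

    global $wgEnotifMinorEdit;
    $notify = !$isMinorEdit || !$editor->isAllowed( 'nominornewtalk' );
    $sendEmail = $notify && ( !$isMinorEdit ||
        ( $wgEnotifMinorEdit && $recipient->getOption( 'enotifminoredits' ) ) );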
Change-Id: Ib3835c4dd57a3686b227c44710a14ab06cded166
GNU diff and MediaWiki's internal UnifiedDiffFormatter do not have
the same default formats. Here we adjust the output of the internal
diff to match GNU diff, as is expected by DiscussionParser.
Bug: 41689
Change-Id: Ib83cacab41adfbdfa8e122c0494b266d4caefc83
If this pref is turned off, we revert to the old orange bar talk
page notifications. Depends on core change Ifc8fbaf8.
Bug: 46550
Change-Id: If21f3aac51e484c5e077c7f4b5a2218e8b71ed2a