Review regex usage, and use an alternative where possible, to improve
performance.
* Add PHPUtils::stripPrefix() and PHPUtils::stripSuffix(). Benchmark in
doc comment.
* /foo/ -> str_contains()
* /^foo/ -> str_starts_with()
* /^f/ -> ($s[0] ?? '') === 'f'
* /foo$/ -> str_ends_with()
* /^(foo|bar)$/ -> in_array(), benchmark suggests 10x improvement
* preg_replace(/foo/) -> str_replace()
* preg_replace(/^[abc]/) -> strspn(), benchmark suggests 3x improvement.
Curiously, it is faster without a limit for short input strings,
although a limit presumably adds robustness.
* preg_replace(/[abc]+$/) -> rtrim()
* preg_match_all() -> substr_count()
* In DOMUtils::hasTypeOf(), use explode() instead of a regex. Validated
by a benchmark.
* In DOMUtils::addTypeOf(), stop normalizing adjacent spaces. This
allows us to use implode(explode()) without a filtering loop. The
patch to Ext/Cite/References.php was to remove spaces added by this
change. The parserTests.txt changes were a consequence of the
References.php change.
* In LinkHandlerUtils::getHref() I allowed a single bare slash to be
counted as a path-absolute URL since I think that was the intention of
the original code.
* In LinkHandlerUtils::getLinkRoundTripData() I captured the portion of
interest from the previoulinkHandlers regex instead of running a
second regex.
* LinkHandlerUtils::linkHandler() had the regex
/^mw:WikiLink|mw:MediaLink$/ which I think was a bug, missing
parentheses. I fixed the bug.
The margins are pretty tight for a lot of these. Using polyfills for
str_contains() etc. might change the conclusion.
Also:
* In DOMUtils::matchTypeOf(), avoid calling hasAttribute().
getAttribute() is documented as returning an empty string if the
attribute does not exist.
Change-Id: I8d7bdf1bccc869b4dc17058a5822ef34968471e6
Follow up to 47dd898
Also renames a variable to be consistent in the two places we get
contents for the ref.
Change-Id: I13e61b8911ff16549fbb0888b9c3313ed5e7701e
Follow up to 47dd898
Fixes the test case found in rt,
php bin/parse.php --domain ceb.wikipedia.org --pageName "Martin Van Buren" --offsetType ucs2 < /dev/null
The offsetType is necessary so that the ConvertOffsets pass runs. The
crasher here is because the embedded html still contains the sealed ref
fragments because we've stored the unprocessed html.
Change-Id: Ic1e1c3e54433bf6d7574420c2eade1349261de0b
Restores linkbacks for ref-in-ref.
Follow up to 568034a where it's noted that it's fine to maintain
linkbacks for ref-in-ref, as long as the ref isn't a named ref that's
trying to redefine the contents for that name, in which case we embed
the contents.
A test case for this can be,
```
<ref name="hiho">off to work</ref>
{{#tag:ref|<i>we go <ref name="ohno">ohno</ref></i>|name="hiho"}}
{{#tag:ref|<i>we go <ref name="ohno2">ohno2</ref></i>|name="test"}}
```
The linkback to #cite_ref-ohno2_3-0 is present while continuing to
suppress the dangling linkback to #cite_ref-ohno_2-0, since that's in
embedded content. On master, both linkbacks are unnecessarily
suppressed.
Bug: T289331
Change-Id: Ifcf7464e86a4408f5dd9e2fd6d3aa47a0670ca49
This will be helpful in a subsequent patch where we make use of that
data while processing refs in refs. Content differing implies that
we'll be embedding it for roundtripping, rather than putting in the dom.
Change-Id: I7bd1d4c503fc58f862960bec82ca514fc29d7eff
This moves determining if we already have a reference created for a
named ref outside of that function, which is helpful for making use of
the cached html for that ref earlier.
Change-Id: Ie416bd95b980f9f95111d7e420945f40e2ada747
The message was only shown when a new reflist was inserted, or when
any references were changed.
Bug: T284472
Change-Id: I7c1e981c93bf7e163f9fb747aad30a24e9a497f1
There is the patch(I4297aea3489bb66c98c664da2332584c27793bfa) which will
add DeprecationHelper trait to Parser class in order to deprecate public
Parser::mUser. DeprecationHelper trait has appropriate magic methods
which help to use dynamic properties. In order not to mock them via
createMock(), so getMockBuilder() and onlyMethods() was used.
onlyMethods() method helps to specify methods which need to be mocked.
Now we can use dynamic properties in Parser related tests of Cite
extension.
Bug: T285713
Change-Id: Ie75c9cd66d296ce7cf15432e2093817e18004443
The standard type for these returns is NodeList and HTMLCollection, which
are almost *but not quite* the same as an array. In two places we got a
little complacent and assumed our non-standard DOMCompat workarounds would
always return arrays. Tweaked the types of DOMCompat to report that they
return an `iterable`, which is a PHP7.1 "pseudo-type" that unifies
arrays and \Traversable types like HTMLCollection/NodeList. This
allows phan to catch places where we slip up and assume an array type
return.
It does introduce a new wrinkle, though, since there is no simple way
to turn an iterable into an array. We're using a simple
`iterable_to_array` helper function for this.
Change-Id: I35bdeb3afa30ef5182e71733a0a606aadcafb435
In PHP's DOM extension, one of the legacy bugs is that
DOMNode::getAttribute() can never return `null` (to indicate that the
attribute is missing), instead it returns an empty string in that
case. This isn't (modern) spec-compliant behavior (it's a leftover
from ancient times) and we had to watch this carefully when porting
from JS.
In the time since the port, we've written new code and embedded this
assumption that DOMNode::getAttribute() will never return null into
the new code we've written. Fix this. Always use `getAttribute(...)
?? ''` (unless we're just doing an equality test against a non-empty
string, or the code is preceded by a `hasAttribute` test) so that our
code will work whether or not getAttribute returns null for a missing
attribute.
Change-Id: If33200e1053b2dd79abb5dfb3808c05ff3a0bbba
This now aligns with Parsoid commit 88d4620278988d121761fb440952d1d66a70ce99
Required some newline fixes to resync after "Refactor newline logic"
(change I6691c70f8e3fa3f21e2d11035bed9cdc2dc87093 /
commit 6389459b1e) was merged
this morning.
Change-Id: I64fba6cc9330a55d4e1eeb5371164b3eb4efa508