23 commits

Author SHA1 Message Date
Arlo Breault 680df4379c Use inReferencesContent flag to get rid of processRefsInReferences
It's sufficient to handle this case in processRefs.

Also moves $referencesGroup to the ReferencesData instance, rather than
passing it around as a variable (inconsistently).

Change-Id: I8637e3ce644642259e353d0df3d9c0dbc3102c7b
2020-11-24 17:22:01 +00:00
Arlo Breault e3ca32c9ff Add method to check if in references content
More specific than just embedded content, needed for adding errors in
follow up patches.
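A minimal sketch of the distinction between "in embedded content" and "in references content". It assumes the explicit stack from 8e237b4 records the name of each extension tag whose embedded content is being processed. The class and method names are hypothetical and are not Parsoid's actual ReferencesData API.

```php
<?php
// Hypothetical sketch: names are illustrative, not Parsoid's ReferencesData API.
final class EmbeddedContentState
{
    /** @var string[] Extension tag names, innermost last (assumed stack shape). */
    private array $inEmbeddedContent = [];

    public function pushEmbeddedContent(string $tagName): void
    {
        $this->inEmbeddedContent[] = $tagName;
    }

    public function popEmbeddedContent(): void
    {
        array_pop($this->inEmbeddedContent);
    }

    /** Broad check: inside any embedded content at all. */
    public function inEmbeddedContent(): bool
    {
        return $this->inEmbeddedContent !== [];
    }

    /** Narrower check: the innermost embedded content is a <references> body. */
    public function inReferencesContent(): bool
    {
        if ($this->inEmbeddedContent === []) {
            return false;
        }
        $innermost = $this->inEmbeddedContent[count($this->inEmbeddedContent) - 1];
        return $innermost === 'references';
    }
}
```

This sketch inspects only the innermost entry. Checking whether 'references' appears anywhere on the stack would be the other reasonable reading.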

Change-Id: I4bf659cd208c3322870e3ea0126bda4a2a7037d8
2020-11-23 18:53:03 +00:00
Arlo Breault 4310b6a243 Mark up cite errors in embedded content
It's a feature of named refs that we only know at the time of inserting
the references list whether they have content or not, and are therefore
in error.  The strategy of 4438a72 was to keep pointers to all named ref
nodes so that if an error does occur, we can mark them up.

The problem with embedded content is that, at the time when we find out
about the errors, it's been serialized and stored, and so any pointers
we might have kept around are no longer live or relevant.  We need to go
back and process all that embedded content again to find where the refs
with errors are hiding.

This patch slightly optimizes that by keeping a map of all the errors
for refs in embedded content, so that only one pass is necessary rather
than one per references list.  Also note that, in the common case, this
pass won't run since we won't have any errors in embedded content.

Bug: T266356
Change-Id: I32e7bfa796cd4382c43b3b1d17b925dc97ce9f7f
2020-11-06 18:31:26 -05:00
Arlo Breault c675396445 Fix adding 'cite_error_group_refs_without_references' to unnamed refs
Follow up to 02fb17d, which was only iterating over named refs.

Bug: T51538
Change-Id: I1a1ce39029c2e9e6e29e768675bcde266ccf3247
2020-11-06 13:14:03 -05:00
Arlo Breault 049735ba0e Clean up signatures of ref group accessors
No need to hedge on null.

Change-Id: I2afb7619a113d784741bd7d29eccf4d8368fe56f
2020-11-06 17:45:18 +00:00
Arlo Breault 0254f138ab Suppress linkbacks for all refs in embedded content
Not just for refs in references content, since they'll be equally
inaccessible everywhere.

Change-Id: Id0a2361b41d9b8103e011ff4f809fa0809169bb3
2020-11-06 17:45:16 +00:00
Arlo Breault 1dda4cdc8a Consolidate adding ref errors at references insertion
Change-Id: I01ce55989fb7b822320c63ddad19c2edf7e03bf9
2020-10-29 15:54:30 -04:00
Arlo Breault 8e237b4e34 Make $inEmbeddedContent an explicit stack
Change-Id: I48ff2f7be352fdec72b2c5e0eeee843330ec3872
2020-10-23 11:42:45 -04:00
Arlo Breault 6bd0594f28 Don't keep pointers to nodes from embedded content
Since the fragment they're subtrees of goes out of scope.

Follow up to 2f09cdb

Prior to that patch, this wasn't an issue because we were creating a
whole document which is retained by the environment.

Fixes the warnings from,
"PHP Warning: DOMElement::getAttribute(): Couldn't fetch DOMElement"
https://logstash.wikimedia.org/app/kibana#/doc/logstash-*/logstash-2020.10.02/parsoid-tests?id=AXTqaLL12lgCwKx7fVYz&_g=h@a06543d

Tested on scandium with,
node bin/roundtrip-test.js --proxyURL http://scandium.eqiad.wmnet:80 --parsoidURL http://DOMAIN/w/rest.php --domain vi.wikipedia.org "Vua_Việt_Nam"

Change-Id: I74bc7de79b18054e19b77af25e978d3ab3a505e4
2020-10-02 15:57:33 -04:00
Arlo Breault 2f09cdb732 One document to rule them all
The description in T179082 suggests that by using one document for the
entire parse, we'd probably see some performance gains from not having
to import nodes when we get to the top level pipeline and we'd avoid the
validation errors from 19a9c3c.

However, the spec seems to suggest creating a new document when parsing
an HTML fragment,
https://html.spec.whatwg.org/#html-fragment-parsing-algorithm

And, indeed, domino implements it that way,
12a5f67136/lib/htmlelts.js (L84-L96)

So, the request in T217705 may be a little misguided.

What then is this patch good for?  In T221790 the ask is that
sub-pipelines produce DocumentFragments, which make for cleaner
interfaces and less confusion when migrating children.

The general outline here is that a document is created when the
environment is constructed, which gives us the 1-1 correspondence
between environment and document.  Sub-pipelines do create their own
documents for the purpose of tree building, as in the fragment parsing
algorithm, but their results are then immediately imported into
DocumentFragments to be used for the rest of the post-processing passes.
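A sketch of that outline using PHP's DOM extension. Env, topLevelDoc and runSubPipeline() are hypothetical names, and DOMDocument::loadHTML() stands in for Parsoid's own tree builder.

```php
<?php
// Hypothetical sketch of the "one document" arrangement; not Parsoid's code.

final class Env
{
    /** The single document created alongside the environment. */
    public DOMDocument $topLevelDoc;

    public function __construct()
    {
        $this->topLevelDoc = new DOMDocument();
        $this->topLevelDoc->loadHTML('<!DOCTYPE html><html><body></body></html>');
    }
}

/**
 * A sub-pipeline builds its tree in a scratch document (as in the HTML
 * fragment parsing algorithm), then imports the result into a
 * DocumentFragment owned by the environment's document.
 */
function runSubPipeline(Env $env, string $html): DOMDocumentFragment
{
    $scratch = new DOMDocument();
    $scratch->loadHTML('<!DOCTYPE html><html><body>' . $html . '</body></html>');
    $body = $scratch->getElementsByTagName('body')->item(0);

    $frag = $env->topLevelDoc->createDocumentFragment();
    foreach ($body->childNodes as $child) {
        // importNode() deep-copies the subtree into the environment's document.
        $frag->appendChild($env->topLevelDoc->importNode($child, true));
    }
    // $scratch is dropped when this function returns; only the copies survive.
    return $frag;
}
```

Every fragment returned this way already belongs to the environment's document, so the top-level pipeline can insert it without a further import.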

Bug: T221790
Bug: T179082
Bug: T217705
Change-Id: Idf856d4e071d742ca38486c8ab402e39b3c8949f
2020-09-29 22:36:33 +00:00
Arlo Breault bb34d30839 Follow up to "follow" functionality for Cite
These refs get a `style="display: none;"` since they're
not intended to be user visible.

Follow refs with errors conform to the proposed spec in T251842

Bug: T51538
Change-Id: Ie4ea28e7f9afde24614874bb4b8e07c5cabafa12
2020-09-10 12:41:06 -04:00
sbailey 467b82701b Adding "follow" functionality to the Cite extension
* Interim state commit with experimental code.

* Updates to citeParserTests.txt to check the now-valid follow
  functionality and newly passing tests.

* Added <sup style="display: none;" about=... to follow refs,
  to suppress display of the hidden sups that VE needs when
  editing follow refs.

* Added code to implement follow functionality and catch
  invalid usage.

Bug: T51538
Change-Id: Ic3ac8237fd2c490cfaf2fe799759742f72f10686
2020-09-09 19:25:14 -04:00
Arlo Breault d6bcc0ef14 Prefer nullable types in comments
This was done with a custom sniff in,
MediaWiki/Sniffs/Commenting/FunctionCommentSniff.php

`$singleType === 'null' && count( $explodedType ) === 2`

since there's some ambiguity with,

`what|type|null`

but also a case like the following is left out,

`string[]|null`
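A sketch of the rewrite rule described above. preferNullable() is a hypothetical helper, not the sniff's actual code, and it reads "left out" as "not converted".

```php
<?php
// Hypothetical sketch of the rule; not the code of
// MediaWiki/Sniffs/Commenting/FunctionCommentSniff.php.
function preferNullable(string $docType): string
{
    $explodedType = explode('|', $docType);
    // what|type|null would be ambiguous as ?what|type, so only
    // two-member unions are candidates.
    if (count($explodedType) !== 2) {
        return $docType;
    }
    foreach ($explodedType as $i => $singleType) {
        if ($singleType === 'null') {
            $other = $explodedType[1 - $i];
            // Array types such as string[]|null are left unconverted.
            if (substr($other, -2) === '[]') {
                return $docType;
            }
            return '?' . $other;
        }
    }
    // No null member: nothing to rewrite.
    return $docType;
}
```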

Change-Id: I1bd50a4486d7ef4974280b476fd03d3ee53232b3
2020-07-29 14:24:32 -04:00
sbailey 4438a72297 Adding error handling for cite refs with name but no content
* Detects grouped and named refs that fail to define content.

* Uses group and name ref list tracking info to back-patch
  'mw:Error' and the i18n error key string into the data-mw
  section of every instance of a named ref when none of its
  instances define content.

* The failures for test "References: 7b" are because selser is
  arguably smarter than wt2wt. The newline before the references
  list has been randomly deleted, but selser manages to restore it
  from source. wt2wt doesn't put the references tag on a line by
  itself, even though it asks for block format, because it isn't
  a new list (these comments are from Arlo's review).

* Added test "References: 7b. Multiple references tags some with
  errors..." to ensure that grouped and named refs, with and
  without content errors, do not cross references section boundaries.
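A simplified sketch of the back-patching strategy, with a plain array standing in for the data-mw bag that Parsoid keeps off to the side. RefRecord, NamedRefTracker and the exact error payload are hypothetical. cite_error_references_no_text is used as an example error key.

```php
<?php
// Hypothetical sketch: RefRecord and NamedRefTracker are illustrative, and
// $dataMw is a plain array standing in for Parsoid's data-mw side bag.
final class RefRecord
{
    public string $group;
    public string $name;
    public bool $hasContent;
    public array $typeOf = ['mw:Extension/ref'];
    public array $dataMw = [];

    public function __construct(string $group, string $name, bool $hasContent)
    {
        $this->group = $group;
        $this->name = $name;
        $this->hasContent = $hasContent;
    }
}

final class NamedRefTracker
{
    /** @var array<string,array<string,RefRecord[]>> group => name => every instance */
    private array $refs = [];

    public function add(RefRecord $ref): void
    {
        // Keep a pointer to every instance, since we can't know yet
        // whether any of them will define content.
        $this->refs[$ref->group][$ref->name][] = $ref;
    }

    /** Called when a <references> list for $group is inserted. */
    public function insertReferences(string $group): void
    {
        foreach ($this->refs[$group] ?? [] as $instances) {
            $anyContent = false;
            foreach ($instances as $ref) {
                $anyContent = $anyContent || $ref->hasContent;
            }
            if (!$anyContent) {
                // Every instance of this name is in error, so patch them all.
                foreach ($instances as $ref) {
                    $ref->typeOf[] = 'mw:Error';
                    $ref->dataMw['errors'][] = ['key' => 'cite_error_references_no_text'];
                }
            }
        }
        // Refs must not cross a references section boundary.
        unset($this->refs[$group]);
    }
}
```

The final unset() is what keeps refs from one references list from being judged against a later list in the same group.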

Bug: T51538
Change-Id: I884fc337165506c5abbef18bcd5a5fca015786d2
2020-06-25 14:58:08 -04:00
Subramanya Sastry 0cc3ca1b98 Move DomSourceRange to Core; ParsoidExtensionApi to Ext
* At this point, DSR is a first-class Parsoid concept and
  extensions will need to use this as well. So, make it part
  of the Core/ namespace to capture high-level concepts that
  might be used outside Parsoid itself.

* Move ParsoidExtensionApi to the Ext directory since that is
  where it best belongs.

Change-Id: If824c4af9e2f8d658f1cb726cbd837222b60790d
2020-03-16 15:52:08 +00:00
Subramanya Sastry 14d9ed27f0 Remove direct access to Sanitizer from extension code
* Proxy all accesses to the sanitizer via appropriately named methods
  in the ParsoidExtensionApi interface

Bug: T242746
Change-Id: I9d3d98639bb98b4abe404139786517591323d61d
2020-02-20 23:23:22 -06:00
Subramanya Sastry d0a9c42c98 Cite: Remove more Parsoid internals knowledge
* Remove use of $env from ReferencesData and RefGroup by
  providing high-level helpers in ParsoidExtensionAPI.

  - Given a fragment id, provide helpers to fetch fragment DOM
    or fragment HTML
  - Fetch the URI for the current page (being parsed)

* There is still a lot of subtle knowledge Cite has about
  how data-parsoid and data-mw attributes are held off to the
  side in a bag and all the pp* and load/store manipulation
  of those attributes. It would be an interesting exercise
  to purge this implementation of those notions OR figure out
  high-level concepts that we document as being part of Parsoid
  reality that we'll forever support.

Bug: T242746
Change-Id: I29ff154f2f17123b9756dfd2f3b422f0b30222b1
2020-02-11 19:47:28 +00:00
C. Scott Ananian 5d200e0bf0 Move all code from Parsoid to Wikimedia\Parsoid namespace
This matches core conventions.

Bug: T240054
Change-Id: I5feb8a6b41503accd01a740195256e9092609272
2020-02-03 21:34:49 +00:00
Arlo Breault e6204a1561 Test against ref name length instead of coercing to bool
Since "0" is falsy in PHP.

A couple of tests now pass.
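A minimal illustration of the pitfall. Both helper names are hypothetical; the point is only that a cast to bool treats a ref named "0" as unnamed, while a length check does not.

```php
<?php
// Hypothetical helpers illustrating the fix; not the Cite code itself.

// Buggy: PHP coerces both "" and "0" to false.
function refHasNameBuggy(?string $name): bool
{
    return (bool)$name;
}

// Fixed: test the length, so "0" counts as a real name.
function refHasName(?string $name): bool
{
    return $name !== null && strlen($name) > 0;
}
```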

Change-Id: I9b62b9f78680de6e1d5c31723af7212a58a535f3
2019-08-14 18:59:28 -04:00
Pavel Astakhov 005176a355 Port Cite extension
* All wt2wt, html2wt, and all but one html2html tests pass in
  hybrid mode when the entire html2wt code is run in PHP

  Set "Serializer: true" in the html2wt section of phpconfig.yaml

* The single failing html2html test is a <gallery> test which is
  presumably related to the unported <gallery> extension code, but
  not sure. Not investigating it now.

* Update Parsoid Extension API to provide access to extension source
  without exposing internals.

Change-Id: I6d6e21ad2324acfc4306b32c9055d6c088708c48
2019-06-21 16:23:42 -05:00
C. Scott Ananian 320d045ee8 Update automatically-generated PHP files w/ latest js2php
Mostly comment formatting improvements, some significant code changes
to the JS side.

Change-Id: I7a8f2105173df74dc09f2024d68268f5dc6fa632
2019-06-05 17:13:34 -04:00
Arlo Breault 05cb13ddf9 Make extensions with post-processors return constructors
This allows us to finish the cleanup started in 0b3bb10 and inline
setupProcessors.

Change-Id: Ia7840091607e9a75153031b5db7600d5a0018da6
2019-04-03 18:44:21 +00:00
Arlo Breault 20c627e3f4 Convert cite extension to es6 class structure
Also, runs js2php on these files.

Change-Id: Id8ee13ad536d75f63e0045a21fdfdb667a0df65d
2019-04-03 12:20:41 -04:00