Introduce preprocessing in the HTML -> WT direction

* This patch introduces a preprocessing step on the edited DOM.

* Existing preprocessing code has been extracted into the
  preprocessDOM method.

  Any registered extension preprocessors are invoked on the DOM.
  So, this assumes that the htmlPreprocess extension listener is only
  applicable to the edited DOM. If we want to expose the concept of
  selective serialization through the API, we may want to add an
  additional interface method / listener to the DOMProcessor class.

  As of this patch, this is somewhat theoretical since there are no
  such extension handlers registered on either DOM. Future patches
  can clarify this better as specific needs arise.

* The handler also calls the serializer's custom preprocessing steps.
  These steps apply to both the original and the edited DOM
  (since DOM Diff is impacted by the results). If the need arises
  in the future, we may introduce a new extension DOM processor method
  that applies to both the original and edited DOMs.

* Right now, only selser needs to strip section tags; non-selser wts
  doesn't. So, preprocessDOM is empty for non-selser wts. Additional
  selser-only DOM preprocessing will show up in later patches.

* Moved a stub HTML->WT preprocessor in Cite extension to RefProcessor.
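
The flow described in the bullets above can be sketched roughly as follows. This is only an illustration under stated assumptions, not Parsoid's actual code: SketchExtProcessor, MarkingProcessor, stripSectionTags and sketchPreprocessDOM are hypothetical names, and unwrapping <section> elements stands in for the serializer's custom step. Only PHP's bundled DOM extension is used.

```php
<?php
declare( strict_types = 1 );

// Hypothetical stand-in for an extension's DOM processor; not Parsoid's real class.
interface SketchExtProcessor {
	public function htmlPreprocess( DOMElement $root ): void;
}

// Illustration only: marks the DOM so it is visible that the handler ran.
class MarkingProcessor implements SketchExtProcessor {
	public function htmlPreprocess( DOMElement $root ): void {
		$root->setAttribute( 'data-ext-preprocessed', '1' );
	}
}

// Assumed stand-in for the serializer's custom step: unwrap every <section>.
function stripSectionTags( DOMElement $root ): void {
	// Copy the live node list before mutating the tree.
	$sections = iterator_to_array( $root->getElementsByTagName( 'section' ) );
	foreach ( $sections as $section ) {
		while ( $section->firstChild !== null ) {
			$section->parentNode->insertBefore( $section->firstChild, $section );
		}
		$section->parentNode->removeChild( $section );
	}
}

// Extension handlers run on the edited DOM only; the serializer step runs on both.
function sketchPreprocessDOM( DOMElement $root, array $extProcessors, bool $isEdited ): void {
	if ( $isEdited ) {
		foreach ( $extProcessors as $processor ) {
			$processor->htmlPreprocess( $root );
		}
	}
	stripSectionTags( $root );
}

// XML input keeps the example free of HTML parser warnings about <section>.
$markup = '<body><section><p>text</p></section></body>';
$original = new DOMDocument();
$original->loadXML( $markup );
$edited = new DOMDocument();
$edited->loadXML( $markup );

sketchPreprocessDOM( $original->documentElement, [ new MarkingProcessor() ], false );
sketchPreprocessDOM( $edited->documentElement, [ new MarkingProcessor() ], true );

echo $edited->saveXML( $edited->documentElement ), "\n";
```

In this sketch both DOMs end up without section wrappers, so a later diff would compare like with like, while only the edited DOM carries the marker left by the extension handler.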

Bug: T254501
Change-Id: I0c12afb2ea82617406d72ad872ac4f33678fa5f2
Subramanya Sastry 2020-09-21 13:22:54 -05:00
parent 0667a01637
commit f00325d6cc
2 changed files with 15 additions and 23 deletions

@@ -3,9 +3,7 @@ declare( strict_types = 1 );
namespace Wikimedia\Parsoid\Ext\Cite;
use DOMNode;
use Wikimedia\Parsoid\Ext\ExtensionModule;
use Wikimedia\Parsoid\Ext\ParsoidExtensionAPI;
/**
* Native Parsoid implementation of the Cite extension
@@ -41,24 +39,4 @@ class Cite implements ExtensionModule {
]
];
}
/**
* html -> wt DOM PreProcessor
*
* This is to reconstitute page-level information from local annotations
* left behind by editing clients.
*
* Editing clients add inserted: true or deleted: true properties to a <ref>'s
* data-mw object. These are no-ops for non-named <ref>s. For named <ref>s,
* - for inserted refs, we might want to de-duplicate refs.
* - for deleted refs, if the primary ref was deleted, we have to transfer
* the primary ref designation to another instance of the named ref.
*
* @param ParsoidExtensionAPI $extApi
* @param DOMNode $body
* @suppress PhanEmptyPrivateMethod
*/
private static function html2wtPreProcessor( ParsoidExtensionAPI $extApi, DOMNode $body ) {
// TODO
}
}

@@ -3,6 +3,7 @@ declare( strict_types = 1 );
namespace Wikimedia\Parsoid\Ext\Cite;
use DOMElement;
use DOMNode;
use Wikimedia\Parsoid\Ext\DOMProcessor;
use Wikimedia\Parsoid\Ext\ParsoidExtensionAPI;
@@ -25,5 +26,18 @@ class RefProcessor extends DOMProcessor {
}
}
// FIXME: should implement an htmlPreprocess method as well.
/**
* html -> wt DOM PreProcessor
*
* Nothing to do right now.
*
* But, for example, as part of some future functionality, this could be used to
* reconstitute page-level information from local annotations left behind by editing clients.
*
* @param ParsoidExtensionAPI $extApi
* @param DOMElement $root
*/
public function htmlPreprocess( ParsoidExtensionAPI $extApi, DOMElement $root ): void {
// TODO
}
}
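
As a rough illustration of what "reconstituting page-level information from local annotations" might involve, the sketch below collects the names of named refs that an editing client has flagged as deleted. It is not part of this patch and makes several assumptions: findDeletedNamedRefs is a hypothetical helper, refs are taken to be <sup> elements whose data-mw attribute holds JSON, and the ref name is assumed to live under attrs.name.

```php
<?php
declare( strict_types = 1 );

/**
 * Hypothetical helper (not Parsoid API): list the names of named refs
 * whose data-mw object carries a truthy "deleted" property.
 *
 * @param DOMElement $root
 * @return string[]
 */
function findDeletedNamedRefs( DOMElement $root ): array {
	$names = [];
	foreach ( $root->getElementsByTagName( 'sup' ) as $sup ) {
		// Assumption: data-mw holds a JSON object; anything else is skipped.
		$dataMw = json_decode( $sup->getAttribute( 'data-mw' ), true );
		if ( !is_array( $dataMw ) || empty( $dataMw['deleted'] ) ) {
			continue;
		}
		// Non-named refs are no-ops, as the removed doc comment notes.
		$name = $dataMw['attrs']['name'] ?? null;
		if ( is_string( $name ) && $name !== '' ) {
			$names[] = $name;
		}
	}
	return $names;
}

// Made-up input: one deleted named ref, one deleted unnamed ref, one untouched ref.
$refXml = <<<'XML'
<body>
<sup data-mw='{"attrs":{"name":"a"},"deleted":true}'>[1]</sup>
<sup data-mw='{"attrs":{},"deleted":true}'>[2]</sup>
<sup data-mw='{"attrs":{"name":"b"}}'>[3]</sup>
</body>
XML;

$refDoc = new DOMDocument();
$refDoc->loadXML( $refXml );
$deletedRefNames = findDeletedNamedRefs( $refDoc->documentElement );
```

A real htmlPreprocess implementation would still have to decide what to do with such names, for example transferring the primary ref designation to another instance of the named ref; that part is left open here, as it is in the patch.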