Now that we have access to the contents we can more easily compare the content
with link targets. This is still to do; this commit only converts the link
handler to work on the collected tokens.
* Start to implement latest RDFa spec from
http://www.mediawiki.org/wiki/Parsoid/RDFa_vocabulary
* Capitalize types, add mw:Entity type for HTML entities
* Detect changes to entities using tokenCollector and srcContent
Change-Id: I45429f4b930858a16e166ef8377c8f6f5114c414
This is just to avoid re-licensing along with VE. We want to be compatible
with MediaWiki core to make sure a closely-integrated C port is still
GPL-compatible. We could consider adding MIT to the JS implementation after
porting to C.
Change-Id: Ia83e8620e26c95625793438c4c5e8ddcf2702368
This is work in progress, but committed for now so I can use it for links and
tweak it while doing so.
Change-Id: I757277f6efacda6d9432ca57542a957f597a98de
* This code change is an attempt to address the FIXME about constant
re-sorting of transformations in _getTransforms. It caches the sorted
transformations and selectively clears/updates the cache on add/remove.
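The caching idea above can be sketched roughly as follows. This is a minimal
illustration, not the actual code: the class name, the addTransform /
removeTransform / getTransforms signatures and the `rank` field are all
assumptions made for the example.

```javascript
// Hypothetical stand-in for a transform registry; all names are illustrative.
class TransformRegistry {
  constructor() {
    this.transformers = {}; // key -> unsorted array of { rank, fn }
    this.sortedCache = {};  // key -> sorted array, built lazily
  }

  addTransform(key, rank, fn) {
    (this.transformers[key] = this.transformers[key] || []).push({ rank, fn });
    delete this.sortedCache[key]; // only the affected key is invalidated
  }

  removeTransform(key, rank) {
    const list = this.transformers[key] || [];
    this.transformers[key] = list.filter((t) => t.rank !== rank);
    delete this.sortedCache[key];
  }

  getTransforms(key) {
    if (!this.sortedCache[key]) {
      const list = this.transformers[key] || [];
      // Sort a copy once; later lookups reuse the cached array.
      this.sortedCache[key] = list.slice().sort((a, b) => a.rank - b.rank);
    }
    return this.sortedCache[key];
  }
}
```

Repeated lookups for the same key then return the cached array without
sorting again, and an add or remove only drops the cache entry for that key.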
Change-Id: If24a807b84d494aa4e5597339039a5573a30905e
This hopefully makes it clearer that data-rt contains private round-trip info
instead of semantically interesting data.
Change-Id: I03b476ed112a4b627c9871ee3677c450f943429a
* Arbitrary predicate support for the termination of collection mode
* Keep tokens as a property of the collector instead of in state-global data
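A rough sketch of the two points above, assuming a much simplified token
shape ({ type, name }) and hypothetical names (Collector, isStart, isEnd,
onToken); the real collector interface may differ.

```javascript
// Hypothetical collector: the end of collection is decided by an arbitrary
// predicate, and the collected tokens live on the collector instance.
class Collector {
  constructor(isStart, isEnd) {
    this.isStart = isStart; // (token) => boolean
    this.isEnd = isEnd;     // (token, collectedTokens) => boolean
    this.tokens = [];
    this.collecting = false;
  }

  // Returns the array of tokens to pass downstream (possibly empty).
  onToken(token) {
    if (!this.collecting) {
      if (!this.isStart(token)) {
        return [token]; // not collecting: pass through untouched
      }
      this.collecting = true;
      this.tokens = [token];
      return [];
    }
    this.tokens.push(token);
    if (this.isEnd(token, this.tokens)) {
      const chunk = this.tokens;
      this.tokens = [];
      this.collecting = false;
      return [{ type: 'chunk', tokens: chunk }];
    }
    return []; // still collecting: hold tokens back
  }
}
```

Because each instance owns its token list, several collectors with different
predicates can run without sharing state.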
Change-Id: Ibcb342bc64a76fece9b04a760ea56c7878e67cad
* Fixed image serializer to deal with missing 'v' value in a k-v pair
representing an image attribute.
* Added fix to deal with bare <li>s (without surrounding <ul> tags)
NOTE: The second fix is currently required to work around bugs in the
parser's handling of complex cases. In the future, we could deal with
this in one of the following ways:
(a) The serializer expects a well-formed DOM and all cleanup will be
done as part of external tools/passes.
(b) The serializer supports a small set of exceptional cases and bare
list items could be one of them
(c) The serializer ought to handle any DOM that is thrown at it.
Yet to be resolved.
Change-Id: Ib585e5c9f2a8a80854740ce0211bde705f9fd6f4
This makes it clear that setInterwiki can modify existing mappings, and adds a
similar method for the removal of existing mappings.
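A minimal sketch of the intended semantics. Only setInterwiki is named above;
the class name, the removeInterwiki name and the Map-based storage are
assumptions for illustration.

```javascript
// Hypothetical config object; only setInterwiki is named in this commit.
class InterwikiConfig {
  constructor() {
    this.interwikiMap = new Map(); // prefix -> URL
  }

  // Adds a new mapping or overwrites an existing one.
  setInterwiki(prefix, url) {
    this.interwikiMap.set(prefix, url);
  }

  // Assumed name for the removal method; returns true if a mapping existed.
  removeInterwiki(prefix) {
    return this.interwikiMap.delete(prefix);
  }
}
```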
Change-Id: Ic603a4b2ccec35d086513fa7cf711702bfb2baa0
* Strips the first paragraph tag in a list item or table cell context
if there are no attributes on it and stx:html is not set
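The condition above could be expressed roughly like this. The token shape
(name, attribs, dataAttribs.stx), the function name and the exact set of
context elements are assumptions for the sketch.

```javascript
// Hypothetical check; the token shape and names are illustrative only.
const STRIP_CONTEXTS = new Set(['li', 'td']);

function canStripParagraph(token, parentName, isFirstChild) {
  if (token.name !== 'p' || !isFirstChild) {
    return false;
  }
  if (!STRIP_CONTEXTS.has(parentName)) {
    return false;
  }
  const hasAttribs = Boolean(token.attribs && token.attribs.length > 0);
  // Assumption: "stx:html" is recorded as dataAttribs.stx === 'html'.
  const isHtmlSyntax = Boolean(token.dataAttribs && token.dataAttribs.stx === 'html');
  return !hasAttribs && !isHtmlSyntax;
}
```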
Change-Id: I74988645fe505c662f86488e32d0f11d464ffe41
* Looks like I misled myself in commit 88fc91: that wikitext
round-tripped perfectly only because it went through the 'src'
route. It was a thumbnail with an explicit image, which doesn't
go through renderThumb, so the serializer simply spat out the
original 'src' string, hence the perfect rt :).
* More whitespace preserving fixes in LinkHandler.
* Also changed resource value in the img tag to use the original
filename rather than the normalized capitalized filename.
* 2 more parsertests round-trip -- now up to 400.
Change-Id: I144a6486dd9d07da8a74a68700fe96c78d192826
* Something to be said for code alignment - easier on the eye!
* Maybe a good case for breaking MediaWiki coding guidelines.
* But, happy to abandon commit if not useful. :)
Change-Id: I1133af488f572ac7f8727be9108e08e14c4e6420
* Changed PrefixImageOptions so that thumb and thumbnail are
distinct key-value pairs. Without this fix, we cannot distinguish
between thumb=foo.jpg and thumbnail=foo.jpg
* Fixed link handler so whitespace is preserved around prefixed image
options.
* Fixed figure handler to process the 3 different kinds of image options:
size, simple image options, and prefixed image options.
* There is a hack/fixme for "upright: aspect" prefixed image option
which needs to be looked into.
* Still need to fix uppercasing of the image resource name.
With these fixes, the following wikitext roundtrips perfectly
(after newline breaks are removed)
[[Image:Foo.jpg|thumbnail = 'baby.jpg'|100x100px|center| alt =bbbbb|
upright=true|bottom|link='http://foo.bar'|
This is a [[Linked Caption]] in the image]]
Change-Id: I6606df56874c2b97f00f08cb6bbeaec9878167d3
* For now, extracted image markup options out of the link handler.
* This info will also be used by the serializer.
* More properties/global constants can be moved into this structure
over time.
Change-Id: I4cfbfd703f42e93fbad52b38b435f68d8a5c22ee
* Minor refactoring
* Cleared src in dataAttribs in renderThumb since we can serialize
thumbs now (or at least we can once all bugs are fixed and missing
pieces are handled).
Change-Id: If18865801cdd3d89c1477e68bfa3e13107c45b40
Anything with data-gen="both" and dataAttribs.src defined serializes to
dataAttribs.src and drops its contents (if any). We can use this to round-trip
elements we don't properly parse or serialize yet. Without RDFa info, the
editor will not touch the contents after encountering data-gen="both".
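As a rough sketch of this rule, assuming a simplified node shape (attribs as
a plain object, dataAttribs alongside it) and a hypothetical serializeNode
helper that takes the normal child serializer as a callback:

```javascript
// Hypothetical serializer hook; the node shape and names are illustrative.
function serializeNode(node, serializeChildren) {
  const attribs = node.attribs || {};
  const dataAttribs = node.dataAttribs || {};
  if (attribs['data-gen'] === 'both' && dataAttribs.src !== undefined) {
    // Emit the original source verbatim and drop the node's contents.
    return dataAttribs.src;
  }
  return serializeChildren(node);
}
```

Nodes without the marker, or without a stored src, still go through the
normal serialization path.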
Change-Id: Ia39e5fdd765c2c9b36f26313455685d29f118839
* Don't consider them for auto-numbered links
* Don't insert a trailing space if the content is empty
These links are still wrapped in nowiki on round-tripping since the
valid/invalid url determination is done in the LinkHandler and not the
Tokenizer as it is configuration-dependent. Not incorrect for rendering (and
perhaps easier to understand for humans too), but might still introduce a
dirty diff. We'll still need reconciliation / damage tracking in the end ;)
Change-Id: I959ebc1b7f81d110a1141bb38ba5ee97f52ebf96
This only applies to newly created headings, so headings with a single newline
preceding them will be round-tripped that way.
Change-Id: Ic09972bbd25c3934b53f6fd3b5be5a0c3185c2af
* Collect all figure tokens and process them as a chunk
* This effectively mimics context-sensitive DOM walking,
but since we need serialization supported on a token stream,
we cannot use real DOM walking. The current technique should
also work on a token stream.
* There is a FIXME about the image filename being capitalized.
This needs fixing in the parser, or we need some other way of
recognizing the original unnormalized filename.
Amended by gwicke:
* Build option list and join it with pipe to avoid stray trailing pipe
* Satisfy JSHint's weird preference to have '&&' and '||' at the end of the line
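The option-list approach can be sketched as below. The function name and
parameters are hypothetical; the point is only that collecting the parts
first and joining them once cannot leave a stray trailing pipe.

```javascript
// Hypothetical helper: collect the parts first, then join them once.
function serializeImageLink(target, options, caption) {
  const parts = [target];
  for (const option of options) {
    parts.push(option);
  }
  if (caption) {
    parts.push(caption); // only added when there is a caption
  }
  return '[[' + parts.join('|') + ']]';
}

// serializeImageLink('Image:Foo.jpg', ['thumb', '100x100px'], '')
// returns '[[Image:Foo.jpg|thumb|100x100px]]'
```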
Change-Id: I1e5f6600f297fcdf81e3227a82ca3b71d4e97fc3
This is a zero-length tsr for now (and thus not 100% correct), but will do the
job for starttag / endtag range establishment
Change-Id: Iedd50ad319aa8d5916434fb6744deb04e031e456
* Removed dead commented-out code.
* Cleaned up newline handling in serializer some more.
* Now, onNewLine and onStartOfLine reflect serializer state
more accurately.
* No implicit newlines for explicit HTML tags.
* 9 more roundtrip tests now green.
Change-Id: I9f640de2ae769c7472538fa687400dc8a40c2b2d