489 commits

Author SHA1 Message Date
Gabriel Wicke c4e7544f60 Merge "Use various RDFa types for links" 2012-07-24 20:17:42 +00:00
GWicke dfe082a258 Merge "Added html2wt command-line option to parse.js" 2012-07-24 18:26:50 +00:00
Subramanya Sastry 01d6f17a3e Renamed ext.Util.js to mediawiki.Util.js
Change-Id: I909847686c8239a0b00cbaa9a0b1583826ee1487
2012-07-24 13:07:53 -05:00
Subramanya Sastry 85fb2ab9ed Addressed review comments from recently merged debug_output branch.
* Got rid of mergeProperties monkey-patch from core-upgrade.
* Reformatted class definitions in mediawiki.parser.defines.js.
* Protected unconditional tokenization of list handler output with an
  env.trace check.
* Other minor formatting fixes to respect 80-100 column code width
  guideline.

Change-Id: Ida769e0e239b01a813b2d30a65aba60216262a43
2012-07-24 13:05:04 -05:00
Gabriel Wicke fe97271394 Use various RDFa types for links
* The RDFa types used for links are now identical to those listed in
  http://www.mediawiki.org/wiki/Parsoid/RDFa_vocabulary, and are supported for
  serialization
* Editors are responsible for adjusting the type when converting between link
  types. Adding a caption to an mw:UrlLink for example should convert it into
  an mw:ExtLink.

Update: rebased on top of trace patches

Change-Id: Ie1b882e2b3fbad08be94769e1167dccd8dfea65d
2012-07-24 10:59:31 -07:00
Gabriel Wicke 9aa22ca0e2 Implement plain image mw:Image and eliminate data-gen
* Source-based round-tripping now uses typeof="mw:Placeholder" instead of
  data-gen.
* mw:Image is supported for round-tripping, but not yet for modifications as
  it is still source-based

Change-Id: Ie5cf4e54de0163168c25c2b5c09380657a15970f
2012-07-24 10:55:12 -07:00
Subramanya Sastry ddc6899b1b Added html2wt command-line option to parse.js
* A quick way to see how a html fragment serializes
* Minor JSHint tweaks.

Change-Id: I901398ec15700905c53de6f39cde93400b2f964a
2012-07-24 10:30:44 -05:00
Subramanya Sastry 3004f3d3db Additional work on readable tokenizer debug output
* Added custom toString functions for extlink and wikilink
* Other minor tweaks

Change-Id: Ife2f840bfa0e77a86acdfdbfb574a230c9bd29dd
2012-07-20 18:12:37 -05:00
Subramanya Sastry f558f3a33c Added 'href' key to anonymous KV wikilink and isbn attribute.
* This makes wikilink attrs more similar to ext links.
* Added 'content' key to ISBN links, but couldn't add it to regular
  wikilinks yet because of complexity of how they are handled in
  the rest of the pipeline.  Changing this requires fixing up other
  parts down the pipeline -- something for later.
* Fixed up wikilink handler to use named lookup for 'href' and
  'tail' rather than positional lookup.  Content lookup is still
  positional as before.

Change-Id: I657b1f338d38df3cfdfa99f27ac46e7fe1c9fd65
2012-07-20 18:12:37 -05:00
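
The difference between named and positional lookup described in the commit above can be sketched as follows. This is an illustration only: the `{ k, v }` pair shape, the `lookupValue` helper, and the sample values are assumptions, not code from the repository.

```javascript
// Attributes as an ordered list of { k, v } pairs (shape assumed for
// illustration). The anonymous entry stands in for the link content.
const linkAttribs = [
  { k: 'href', v: 'Main Page' },
  { k: '', v: 'caption text' },
  { k: 'tail', v: 's' },
];

// Named lookup (hypothetical helper): return the value of the first pair
// whose key equals `name`, or undefined if there is no such pair.
function lookupValue(kvs, name) {
  const kv = kvs.find((pair) => pair.k === name);
  return kv === undefined ? undefined : kv.v;
}

// 'href' and 'tail' are found by name, so their position no longer matters.
const linkHref = lookupValue(linkAttribs, 'href');
const linkTail = lookupValue(linkAttribs, 'tail');

// Content is still read by position, which breaks if the order changes.
const linkContent = linkAttribs[1].v;
```

Named lookup keeps working when attributes are reordered or new ones are inserted; the positional read of the content does not, which is why the commit flags it as remaining work.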
Subramanya Sastry 6b8a4b386e Removed utility functions from mediawiki.parser.environment
* these functions have already been added to ext.Util.js
* removed a couple of jshint warnings.
* minor code restructuring in tokensToString and comments
  to better indicate what is going on.

Change-Id: I9d6a03cc35075e1a64d8fac9e167a3ce4ccd9424
2012-07-20 18:12:37 -05:00
Subramanya Sastry f87ed719e6 Added utility methods to ext.Util.js
* Copied over utility methods from mediawiki.parser.environment.js
  to ext.Util.js.
* Moved over utility method from mediawiki.parser.defines.js to
  ext.Util.js.
* Converted Util to be a singleton object rather than an allocatable
  class.  There is no reason to allocate a new utility class everywhere
  since this utility object has no useful state.
* Fixed up use of utility methods to use Util rather than env.

Change-Id: Ib81f96b894f6528f2ccbe36e1fd4c3d50cd1f6b7
2012-07-20 18:12:37 -05:00
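
A minimal sketch of the singleton conversion described in the commit above, assuming a stateless helper. The `joinText` method is a hypothetical example, not one of the methods actually moved into ext.Util.js.

```javascript
// A stateless utility exposed as one shared object, so callers never
// need to allocate an instance with `new`.
const Util = {
  // Illustrative helper (name assumed): concatenate the string tokens in
  // a list, skipping anything that is not a string.
  joinText(tokens) {
    return tokens.filter((t) => typeof t === 'string').join('');
  },
};

// Every caller uses the same object directly.
const joinedText = Util.joinText(['foo', { type: 'TagTk' }, 'bar']);
```

Because the object holds no state, sharing a single instance is safe and avoids an allocation at every call site.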
Subramanya Sastry 82d9c31532 Added missing var keyword
Change-Id: Ia7469b4f43bc0b34f045792561d4ee24f9228773
2012-07-20 18:12:37 -05:00
Subramanya Sastry d0cdd2e5f8 In trace mode, wrap transform to output trace info
- Added extra debug_name parameter to addTransform which is
  used in addTransform to output useful trace info.

Change-Id: I160ba0c45f681149375e32ab19f97baa439b09a8
2012-07-20 18:12:37 -05:00
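
The wrapping idea in the commit above can be sketched like this. The registry shape, the `makeRegistry` factory, and the log format are assumptions made for illustration; the real `addTransform` takes further parameters that are not shown in this log.

```javascript
// Hypothetical registry. `trace` stands in for the trace-mode flag and
// `log` for whatever sink receives the trace output.
function makeRegistry(trace, log) {
  const transforms = [];
  return {
    transforms,
    addTransform(transform, debugName) {
      if (!trace) {
        // Normal mode: register the transform unchanged.
        transforms.push(transform);
        return;
      }
      // Trace mode: register a wrapper that logs input and output
      // around the call to the real transform.
      transforms.push(function tracedTransform(token) {
        log(debugName + ' <- ' + JSON.stringify(token));
        const result = transform(token);
        log(debugName + ' -> ' + JSON.stringify(result));
        return result;
      });
    },
  };
}

const traceLines = [];
const tracedRegistry = makeRegistry(true, (line) => traceLines.push(line));
tracedRegistry.addTransform((token) => token.toUpperCase(), 'upcase');
const tracedResult = tracedRegistry.transforms[0]('a');
```

Deciding at registration time means the untraced path pays no per-token cost for the check.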
Subramanya Sastry f0a465abcb Output chunk tokens to console only in trace mode
Change-Id: I03ae705bf2679233c4d4b07c25915edd8110a2e8
2012-07-20 18:12:37 -05:00
Subramanya Sastry c148470204 Further refinement of readable pretty-printing of tokens.
Change-Id: I8343e5c0e2a17116920b8585ce8e5d9bc8826286
2012-07-20 18:12:37 -05:00
Subramanya Sastry 6999246fec More fixes on the way to readable debug/trace output.
Change-Id: I3ab637b7790c359255b8bc0ad5716da12bb25884
2012-07-20 18:12:37 -05:00
Subramanya Sastry 2ee1514552 Added mergeProperties function to Object.prototype
Change-Id: I50346029a9bd8a7d2cf954b0cca011f73a6fae07
2012-07-20 18:12:37 -05:00
Subramanya Sastry d994d1292e First pass updating debug output
Change-Id: I1e0953d7b26b962e3758dd1091e87a5361257abc
2012-07-20 18:12:37 -05:00
Gabriel Wicke e1b2555db0 Merge "Start to use the tokenCollector for links" 2012-07-20 23:07:24 +00:00
Catrope 6ca1b06641 Merge "Rename addInterwiki to setInterwiki and add removeInterwiki complement" 2012-07-20 22:41:05 +00:00
Gabriel Wicke b344e140aa Start to use the tokenCollector for links
Now that we have access to the contents we can more easily compare the content
with link targets. This is still to do; this commit only converts the link
handler to work on the collected tokens.

* Start to implement latest RDFa spec from
  http://www.mediawiki.org/wiki/Parsoid/RDFa_vocabulary
* Capitalize types, add mw:Entity type for html entities
* Detect changes to entities using tokenCollector and srcContent

Change-Id: I45429f4b930858a16e166ef8377c8f6f5114c414
2012-07-19 14:56:32 -07:00
GWicke 977cbafbe2 Merge "Added support for hacky use of dl before tables." 2012-07-19 18:06:25 +00:00
Subramanya Sastry c5f9961423 Added support for hacky use of dl before tables.
* This is the equivalent of commit a0746946 to
  core/includes/Parser.php

Change-Id: Id093e39dabad29cb275bd21325d39bfeb7709d98
2012-07-19 10:56:16 -07:00
Gabriel Wicke 9e0ee30446 Merge "Document GPL license for Parsoid" 2012-07-19 03:19:12 +00:00
Gabriel Wicke b91467e144 Document GPL license for Parsoid
This is just to avoid re-licensing along with VE. We want to be compatible
with MediaWiki core to make sure a closely-integrated C port is still
GPL-compatible. We could consider adding MIT to the JS implementation after
porting to C.

Change-Id: Ia83e8620e26c95625793438c4c5e8ddcf2702368
2012-07-18 20:14:13 -07:00
Gabriel Wicke 68c5a6efc6 Collect tokens in a tokencollector and use cb for processing
This is work in progress, but committed for now so I can use it for links and
tweak it while doing so.

Change-Id: I757277f6efacda6d9432ca57542a957f597a98de
2012-07-18 16:18:38 -07:00
Gabriel Wicke 681b0d4d40 Merge "Rename data-mw into data-rt" 2012-07-17 17:20:37 +00:00
Subramanya Sastry 80d74e1c16 Changed add/remove/get transforms.
* This code change is an attempt to address the FIXME about constant
  resorting of transformations in _getTransforms.  This caches sorted
  transformations and selectively clears/updates the cache on add/remove.

Change-Id: If24a807b84d494aa4e5597339039a5573a30905e
2012-07-17 12:03:48 -05:00
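
One way to avoid re-sorting on every lookup, as the commit above describes, is to cache the sorted list and invalidate it when the set of transforms changes. The sketch below is a simplification under stated assumptions: the class and method names are hypothetical, and it drops the whole cache on any change rather than updating it selectively as the commit does.

```javascript
class TransformRegistry {
  constructor() {
    this.entries = [];       // registrations in insertion order
    this.sortedCache = null; // null means the sorted view must be rebuilt
  }

  add(rank, transform) {
    this.entries.push({ rank, transform });
    this.sortedCache = null; // invalidate
  }

  remove(transform) {
    this.entries = this.entries.filter((e) => e.transform !== transform);
    this.sortedCache = null; // invalidate
  }

  // Sort only when something changed since the last call.
  getTransforms() {
    if (this.sortedCache === null) {
      this.sortedCache = this.entries.slice().sort((a, b) => a.rank - b.rank);
    }
    return this.sortedCache;
  }
}
```

Repeated calls to `getTransforms` between modifications return the same cached array, so the sort cost is paid once per change instead of once per lookup.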
Gabriel Wicke 3172afb750 Rename data-mw into data-rt
This hopefully makes it clearer that data-rt contains private round-trip info
instead of semantically interesting data.

Change-Id: I03b476ed112a4b627c9871ee3677c450f943429a
2012-07-16 12:10:08 -07:00
GWicke 235739e253 Merge "Bug fix and minor code cleanup." 2012-07-13 22:40:20 +00:00
Subramanya Sastry 141ce901d2 Bug fix and minor code cleanup.
Change-Id: Ic446c8822bf1b8a859e045119782d7b8a40c5544
2012-07-13 17:39:30 -05:00
Gabriel Wicke 1e902fc050 Merge "Encapsulate token collection" 2012-07-13 21:08:52 +00:00
Gabriel Wicke e329455d55 Encapsulate token collection
* Arbitrary predicate support for the termination of collection mode
* tokens as property of the collector instead of a state-global thing

Change-Id: Ibcb342bc64a76fece9b04a760ea56c7878e67cad
2012-07-13 13:57:04 -07:00
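
The two points in the commit above, a caller-supplied end predicate and tokens held per collector, might look roughly like the following. All names here (`TokenCollector`, `isEnd`, `onDone`, `push`) are hypothetical, and tokens are plain strings for brevity.

```javascript
class TokenCollector {
  // isEnd: predicate deciding whether a token terminates collection.
  // onDone: callback that receives the collected tokens.
  constructor(isEnd, onDone) {
    this.isEnd = isEnd;
    this.onDone = onDone;
    this.tokens = []; // owned by this collector, not shared global state
  }

  push(token) {
    this.tokens.push(token);
    if (this.isEnd(token)) {
      const collected = this.tokens;
      this.tokens = []; // reset so the collector can be reused
      this.onDone(collected);
    }
  }
}

// Collect everything up to and including a closing link tag.
const collectedRuns = [];
const linkCollector = new TokenCollector(
  (token) => token === '</a>',
  (tokens) => collectedRuns.push(tokens)
);
['<a>', 'text', '</a>'].forEach((token) => linkCollector.push(token));
```

Keeping the token buffer on the collector means two collectors can be active at once without interfering with each other.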
GWicke 97bc6cd5d7 Merge "Serializer fixes" 2012-07-13 20:36:36 +00:00
Subramanya Sastry f4c6ba8545 Serializer fixes
* Fixed image serializer to deal with missing 'v' value in a k-v pair
  representing an image attribute.
* Added fix to deal with bare <li>'s (without surrounding <ul> tags)

NOTE: The second fix is required currently to deal with bugs in the parser
as it deals with complex cases.  But, in the future, we could deal with
this in one of the following ways:
(a) The serializer expects a well-formed DOM and all cleanup will be
    done as part of external tools/passes.
(b) The serializer supports a small set of exceptional cases and bare
    list items could be one of them
(c) The serializer ought to handle any DOM that is thrown at it.

Yet to be resolved.

Change-Id: Ib585e5c9f2a8a80854740ce0211bde705f9fd6f4
2012-07-13 15:33:09 -05:00
GWicke a742ec5ffc Merge "Fixed parser and serializer to deal with a 4+ length dash sequence." 2012-07-13 20:14:34 +00:00
Subramanya Sastry 49ed0d3adf Fixed parser and serializer to deal with a 4+ length dash sequence.
Change-Id: If7caaefec1ad55e7604712ef959ff0c843392adf
2012-07-13 15:12:09 -05:00
Gabriel Wicke ef92024b99 Rename addInterwiki to setInterwiki and add removeInterwiki complement
This makes it clear that setInterwiki can modify existing mappings, and adds a
similar method for the removal of existing mappings.

Change-Id: Ic603a4b2ccec35d086513fa7cf711702bfb2baa0
2012-07-13 12:40:30 -07:00
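
The set/remove pairing from the commit above, sketched with a plain `Map`. Only the method names `setInterwiki` and `removeInterwiki` come from the commit message; the class, the parameters, and the example URLs are assumptions.

```javascript
class InterwikiMap {
  constructor() {
    this.map = new Map();
  }

  // "set" rather than "add": an existing prefix is overwritten.
  setInterwiki(prefix, url) {
    this.map.set(prefix, url);
  }

  // Complement of setInterwiki; returns true if the prefix was mapped.
  removeInterwiki(prefix) {
    return this.map.delete(prefix);
  }

  get(prefix) {
    return this.map.get(prefix);
  }
}

const interwikis = new InterwikiMap();
interwikis.setInterwiki('en', 'http://example.org/old');
interwikis.setInterwiki('en', 'http://example.org/new'); // replaces the mapping
const currentEn = interwikis.get('en');
```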
Subramanya Sastry e529ae7e0e Serializer fix for empty headings (BUG-33089)
Change-Id: Ia7b018335ac9e31938052473fc47ce38443fdeb4
2012-07-05 16:50:48 -05:00
GWicke 46d6502ca5 Merge "Fix for Bug 37913" 2012-06-30 08:56:48 +00:00
Gabriel Wicke 1736e52bfb Abstract out chunk emission from tokenizer
Patch by Adam Wight, fixes bug #35377.

Change-Id: I183baeed8dd78e7d3c775f44d62bec8e6f9fc608
2012-06-30 10:39:12 +02:00
Subramanya Sastry 166e7a75c9 Fix for Bug 37913
* Strips the first paragraph tag in a list item or table cell context
  if there are no attributes on it and stx:html is not set

Change-Id: I74988645fe505c662f86488e32d0f11d464ffe41
2012-06-29 23:47:59 -05:00
Gabriel Wicke 9ddc863d89 Up entity name length limit even further
There are some really long names in
http://www.w3.org/2003/entities/2007xml/unicode.xml

Change-Id: I0138c9610bb288cd8f29e3600b8a21f932e7bcd9
2012-06-29 23:38:10 +02:00
Gabriel Wicke cf7f437966 Match named entities with up to eight chars
The longest entries in
http://en.wikipedia.org/wiki/List_of_XML_and_HTML_character_entity_references.

Change-Id: I2c9f102fe6a905e339e12520d08c1b1b0a4002d8
2012-06-29 23:15:30 +02:00
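
A length-bounded entity match like the one in the commit above can be approximated with a regular expression. This is a stand-in for illustration: the `makeEntityMatcher` helper and the exact character classes are assumptions, and the real tokenizer rule is not shown in this log.

```javascript
// Build a matcher for `&name;` at the start of `text`, where `name` is
// at most `maxLen` characters long. Returns the name, or null.
function makeEntityMatcher(maxLen) {
  const re = new RegExp('^&([A-Za-z][A-Za-z0-9]{0,' + (maxLen - 1) + '});');
  return (text) => {
    const m = re.exec(text);
    return m === null ? null : m[1];
  };
}

const matchEntity = makeEntityMatcher(8);
const shortName = matchEntity('&amp; rest');  // 3-character name
const eightName = matchEntity('&thetasym;');  // 8-character name
const tooLong = matchEntity('&abcdefghi;');   // 9-character name, rejected
```

Making the limit a parameter makes it easy to raise later if longer names need to be accepted.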
Gabriel Wicke 370fb607c8 Insert separation between adjacent pres
Change-Id: I55aa649b4e076cae32b3c970d6384ab2ed4cdd6c
2012-06-29 23:05:06 +02:00
Gabriel Wicke 6c8dfa26fa Escape ampersands in entities from plain text DOM content
Change-Id: I0826077cf48b67e38a525090be66411c38d7b65f
2012-06-29 23:02:21 +02:00
Subramanya Sastry 5874d9a5f1 More thumb roundtripping fixes.
* Looks like I misled myself in commit 88fc91: that wikitext
  roundtripped perfectly only because it took the 'src' route. It
  was a thumbnail with an explicit image, which doesn't go through
  renderThumb, so the serializer simply spit out the original 'src'
  string, hence the perfect rt :).
* More whitespace preserving fixes in LinkHandler.
* Also changed resource value in the img tag to use the original
  filename rather than the normalized capitalized filename.
* 2 more parsertests rt -- now up to 400.

Change-Id: I144a6486dd9d07da8a74a68700fe96c78d192826
2012-06-29 00:30:13 -05:00
Subramanya Sastry ba6a304102 Prettified Wikitext Constants hash
* Something to be said for code alignment - easier on the eye!
* Maybe a good case for breaking mediawiki coding guidelines.
* But, happy to abandon commit if not useful. :)

Change-Id: I1133af488f572ac7f8727be9108e08e14c4e6420
2012-06-28 19:08:48 -05:00
Subramanya Sastry 88fc91a292 Next round of image roundtripping fixes
* Changed PrefixImageOptions so that thumb and thumbnail are
  distinct key-value pairs.  Without this fix, cannot distinguish
  between thumb=foo.jpg and thumbnail=foo.jpg
* Fixed link handler so whitespace is preserved around prefixed image
  options.
* Fixed figure handler to process the 3 different kinds of image options:
  size, simple image options, and prefixed image options.
* There is a hack/fixme for "upright: aspect" prefixed image option
  which needs to be looked into.
* Still need to fix uppercasing of the image resource name.

With these fixes, the following wikitext roundtrips perfectly
(after newline breaks are removed)

[[Image:Foo.jpg|thumbnail = 'baby.jpg'|100x100px|center| alt =bbbbb|
upright=true|bottom|link='http://foo.bar'|
This is a [[Linked Caption]] in the image]]

Change-Id: I6606df56874c2b97f00f08cb6bbeaec9878167d3
2012-06-28 18:55:47 -05:00
Subramanya Sastry 11e7c1031a Created a constants object for extracting wikitext markup properties.
* For now, extracted image markup options out of the link handler.
* This info will also be used by the serializer.
* More properties/global constants can be moved into this structure
  over time.

Change-Id: I4cfbfd703f42e93fbad52b38b435f68d8a5c22ee
2012-06-28 17:45:17 -05:00