505 commits

Author SHA1 Message Date
Subramanya Sastry e2197d4d89 Refactored and cleaned up Sanitizer + bug fix.
* Moved all constants to a singleton constants object
* Moved a couple of methods to mediawiki.Util.js
* Fixed url regexp bug: relative urls weren't being matched. The bug
  went unnoticed because the previous url regexp had a typo (which
  means url sanitization wasn't being done by the previous code).
  This also means we have to either find sanitizer tests or add new
  ones so these bugs are caught.  Maybe parserTests.txt is not the
  right place for this?

Change-Id: Ia05e1d1596bb9bc4a9eb21d7c77248f5626a710e
2012-07-30 12:52:26 -05:00
GWicke 604a15df21 Merge "Sanitize html tags not on a whitelist." 2012-07-30 17:46:26 +00:00
GWicke 14d36a3d13 Merge "Continued port of the PHP sanitizer." 2012-07-30 17:39:11 +00:00
Gabriel Wicke 77e9b656c8 Link handling improvements
* Dynamically select piped / simple wikilinks in the serializer
* Use generic attribute shadowing, drop sHref
* Actually handle modifications to non-piped wikilinks
* Properly escape anything that looks like a percent-encoded char
* Add stripSuffix utility method
* Fix a bunch of JSHint warnings in the serializer in particular

Change-Id: I2d8954f6b665093676ccc5dd5437ea9b37c014ad
2012-07-30 10:30:10 -07:00
Gabriel Wicke b18193907f Some more attribute accessor/shadow methods
Change-Id: I2ea56054cfd534c7cca7567742438b86eef326ab
2012-07-30 10:27:57 -07:00
Subramanya Sastry 7f90b79d19 Sanitize html tags not on a whitelist.
* Need auditing to see if the whitelist is complete.
* Added <gallery> and <tag> to the whitelist since they seem to
  come from some extensions and parser tests depend on them.
  May need to use parsoid config or parser hooks/extension info
  to extend sanitizer whitelist.  For now, probably okay.

Change-Id: Id9ecdd96843e8f6ca65e8666807dec1015443d49
2012-07-28 11:33:51 -05:00
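
The whitelist check described in this commit could look roughly like the sketch below. It is an illustration only, not the Parsoid code: the token shape, the contents of `TAG_WHITELIST` (apart from `gallery` and `tag`, which the commit names), and the choice to turn rejected tags into plain text are all assumptions.

```javascript
// Hypothetical sketch: token shape and escape-to-text behavior are assumptions.
const TAG_WHITELIST = new Set(["b", "i", "span", "table", "gallery", "tag"]);

// Pass whitelisted tags through unchanged; turn any other tag into a text
// token that carries the tag's original source, so it renders literally.
function sanitizeTagToken(token) {
  if (token.type !== "tag" || TAG_WHITELIST.has(token.name.toLowerCase())) {
    return token;
  }
  return { type: "text", value: token.source };
}

const kept = sanitizeTagToken({ type: "tag", name: "gallery", source: "<gallery>" });
const escaped = sanitizeTagToken({ type: "tag", name: "script", source: "<script>" });
```

Keeping the whitelist in a single set would make it straightforward to extend from configuration or extension info later, which is the open question the commit message raises.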
Subramanya Sastry a4274685b2 Continued port of the PHP sanitizer.
* CSS sanitization code now ported.
* Could use non-port review another day to see if the ported code makes
  sense and has any gotchas.
* More sanitizer parser tests now green.
* Could use a lot more aggressive addition of parser tests.

Change-Id: I9df003540bd31f327f5307472c9f7dcbbe7b4342
2012-07-27 10:52:39 -05:00
Gabriel Wicke 065bb50369 Merge "Rename data-rt to data-parsoid" 2012-07-26 22:48:56 +00:00
GWicke 96c9d4cb67 Merge "Minor: trace output tweak + code refactoring" 2012-07-26 22:45:49 +00:00
Gabriel Wicke cd6f8ecbbe Fix parserTests include paths
The recent directory move broke parserTests, fix it for now. Will need
refixing once we migrate to our own repo.

Change-Id: I014001cd6904d1dea3f9417c9cde9c80ab079232
2012-07-26 15:42:22 -07:00
Gabriel Wicke 72c5efedb8 Rename data-rt to data-parsoid
* Quite unique according to google, and more obvious
* Also adjust parserTests to ignore mw:Nowiki and mw:Placeholder spans

Change-Id: I340e85092b60a65b4053a40bf8c238e26cb49c96
2012-07-26 15:27:32 -07:00
Subramanya Sastry 1d46cdae08 Minor: trace output tweak + code refactoring
Change-Id: Ic8f51749e84edb7741f5bcea467d647682ef1958
2012-07-26 15:43:02 -05:00
Subramanya Sastry 25419d028a First pass porting PHP's sanitizer to Parsoid
* Ported attribute sanitization code (and related functions) from
  core/includes/Sanitizer.php
* Added dummy flags and set to true (use of rdfa, microdata attrs,
  and html5 mode).
* Removed a couple of whitelisted sanitizer tests.
* A few sanitizer tests now pass.
* More work to be done.

Change-Id: I19c92bbfcb57f3e97a7af1b7c5f63772e427dae4
2012-07-26 11:35:55 -05:00
Trevor Parscal a5b7d6d7c7 Merge "Parsoid: move tests/parser to modules/parser/test" 2012-07-25 21:14:13 +00:00
Gabriel Wicke e3af50ea68 Add generic attribute accessors
* addAttribute and getAttribute do the obvious and simple thing
* addNormalizedAttribute remembers an unnormalized version of an attribute and
  supports change detection for the normalized attribute
* getAttributeSource retrieves the original attribute if there was a
  normalized version which was not changed, or the current value (potentially
  based on the normalized version) otherwise. For use by the
  WikitextSerializer.

Change-Id: I72533cf6cfff1ddb88be2501653c7c47d270898c
2012-07-25 11:58:24 -07:00
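
The accessor semantics listed in this commit can be sketched as follows. This is a minimal illustration under stated assumptions, not the actual implementation: the `Token` class, the `shadow` map, and the parameter order of `addNormalizedAttribute` are invented for the example; only the four method names come from the commit message.

```javascript
// Hypothetical sketch: the Token shape and `shadow` bookkeeping are assumptions.
class Token {
  constructor() {
    this.attribs = new Map(); // name -> current value
    this.shadow = new Map();  // name -> { normalized, source }
  }

  addAttribute(name, value) {
    this.attribs.set(name, value);
  }

  getAttribute(name) {
    return this.attribs.get(name);
  }

  // Store the normalized value and remember the unnormalized source text.
  addNormalizedAttribute(name, normalized, source) {
    this.attribs.set(name, normalized);
    this.shadow.set(name, { normalized, source });
  }

  // Return the original source if the normalized value was not changed,
  // otherwise the current value.
  getAttributeSource(name) {
    const current = this.attribs.get(name);
    const entry = this.shadow.get(name);
    if (entry !== undefined && entry.normalized === current) {
      return entry.source;
    }
    return current;
  }
}

const link = new Token();
link.addNormalizedAttribute("href", "Foo_bar", "foo bar");
const untouched = link.getAttributeSource("href"); // original source text
link.addAttribute("href", "Baz");
const edited = link.getAttributeSource("href");    // current value after the edit
```

Comparing the stored normalized value against the current one is what gives a serializer change detection: an unmodified attribute round-trips to its original text, while an edited one falls back to the new value.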
Carl Fürstenberg dca609e3f9 Parsoid: move tests/parser to modules/parser/test
In preparation for the big extraction of Parsoid out of VisualEditor,
we'll start by moving the tests into the parsoid location.

Change-Id: I4a926ee4aad1490d4f769d44e91af80842b881f0
2012-07-25 02:29:25 +02:00
Gabriel Wicke c4e7544f60 Merge "Use various RDFa types for links" 2012-07-24 20:17:42 +00:00
GWicke dfe082a258 Merge "Added html2wt command-line option to parse.js" 2012-07-24 18:26:50 +00:00
Subramanya Sastry 01d6f17a3e Renamed ext.Util.js to mediawiki.Util.js
Change-Id: I909847686c8239a0b00cbaa9a0b1583826ee1487
2012-07-24 13:07:53 -05:00
Subramanya Sastry 85fb2ab9ed Addressed review comments from recently merged debug_output branch.
* Got rid of mergeProperties monkey-patch from core-upgrade.
* Reformatted class defns in mediawiki.parser.defines.js.
* Protected unconditional tokenization of list handler output with an
  env.trace check.
* Other minor formatting fixes to respect 80-100 column code width
  guideline.

Change-Id: Ida769e0e239b01a813b2d30a65aba60216262a43
2012-07-24 13:05:04 -05:00
Gabriel Wicke fe97271394 Use various RDFa types for links
* the used RDFa types for links are now identical to those listed in
  http://www.mediawiki.org/wiki/Parsoid/RDFa_vocabulary, and are supported for
  serialization
* Editors are responsible for adjusting the type when converting between link
  types. Adding a caption to an mw:UrlLink for example should convert it into
  an mw:ExtLink.

Update: rebased on top of trace patches

Change-Id: Ie1b882e2b3fbad08be94769e1167dccd8dfea65d
2012-07-24 10:59:31 -07:00
Gabriel Wicke 9aa22ca0e2 Implement plain image mw:Image and eliminate data-gen
* Source-based round-tripping now uses typeof="mw:Placeholder" instead of
  data-gen.
* mw:Image is supported for round-tripping, but not yet for modifications as
  it is still source-based

Change-Id: Ie5cf4e54de0163168c25c2b5c09380657a15970f
2012-07-24 10:55:12 -07:00
Subramanya Sastry ddc6899b1b Added html2wt command-line option to parse.js
* A quick way to see how a html fragment serializes
* Minor JSHint tweaks.

Change-Id: I901398ec15700905c53de6f39cde93400b2f964a
2012-07-24 10:30:44 -05:00
Subramanya Sastry 3004f3d3db Additional work on readable tokenizer debug output
* Added custom toString functions for extlink and wikilink
* Other minor tweaks

Change-Id: Ife2f840bfa0e77a86acdfdbfb574a230c9bd29dd
2012-07-20 18:12:37 -05:00
Subramanya Sastry f558f3a33c Added 'href' key to anonymous KV wikilink and isbn attribute.
* This makes wikilink attrs more similar to ext links.
* Added 'content' key to ISBN links, but couldn't add it to regular
  wikilinks yet because of complexity of how they are handled in
  the rest of the pipeline.  Changing this requires fixing up other
  parts down the pipeline -- something for later.
* Fixed up wikilink handler to use named lookup for 'href' and
  'tail' rather than positional lookup.  Content lookup is still
  positional as before.

Change-Id: I657b1f338d38df3cfdfa99f27ac46e7fe1c9fd65
2012-07-20 18:12:37 -05:00
Subramanya Sastry 6b8a4b386e Removed utility functions from mediawiki.parser.environment
* these functions have already been added to ext.Util.js
* removed a couple jshint warnings.
* minor code restructuring in tokensToString and comments
  to better indicate what is going on.

Change-Id: I9d6a03cc35075e1a64d8fac9e167a3ce4ccd9424
2012-07-20 18:12:37 -05:00
Subramanya Sastry f87ed719e6 Added utility methods to ext.Util.js
* Copied over utility methods from mediawiki.parser.environment.js
  to ext.Util.js.
* Moved over utility method from mediawiki.parser.defines.js to
  ext.Util.js.
* Converted Util to be a singleton object rather than an allocatable
  class.  There is no reason to allocate a new utility class everywhere
  since this utility object has no useful state.
* Fixed up use of utility methods to use Util rather than env.

Change-Id: Ib81f96b894f6528f2ccbe36e1fd4c3d50cd1f6b7
2012-07-20 18:12:37 -05:00
Subramanya Sastry 82d9c31532 Added missing var keyword
Change-Id: Ia7469b4f43bc0b34f045792561d4ee24f9228773
2012-07-20 18:12:37 -05:00
Subramanya Sastry d0cdd2e5f8 In trace mode, wrap transform to output trace info
- Added extra debug_name parameter to addTransform which is
  used in addTransform to output useful trace info.

Change-Id: I160ba0c45f681149375e32ab19f97baa439b09a8
2012-07-20 18:12:37 -05:00
Subramanya Sastry f0a465abcb Output chunk tokens to console only in trace mode
Change-Id: I03ae705bf2679233c4d4b07c25915edd8110a2e8
2012-07-20 18:12:37 -05:00
Subramanya Sastry c148470204 Further refinement of readable pretty-printing of tokens.
Change-Id: I8343e5c0e2a17116920b8585ce8e5d9bc8826286
2012-07-20 18:12:37 -05:00
Subramanya Sastry 6999246fec More fixes on the way to readable debug/trace output.
Change-Id: I3ab637b7790c359255b8bc0ad5716da12bb25884
2012-07-20 18:12:37 -05:00
Subramanya Sastry 2ee1514552 Added mergeProperties function to Object.prototype
Change-Id: I50346029a9bd8a7d2cf954b0cca011f73a6fae07
2012-07-20 18:12:37 -05:00
Subramanya Sastry d994d1292e First pass updating debug output
Change-Id: I1e0953d7b26b962e3758dd1091e87a5361257abc
2012-07-20 18:12:37 -05:00
Gabriel Wicke e1b2555db0 Merge "Start to use the tokenCollector for links" 2012-07-20 23:07:24 +00:00
Catrope 6ca1b06641 Merge "Rename addInterwiki to setInterwiki and add removeInterwiki complement" 2012-07-20 22:41:05 +00:00
Gabriel Wicke b344e140aa Start to use the tokenCollector for links
Now that we have access to the contents we can more easily compare the content
with link targets. This is still to do; this commit only converts the link
handler to work on the collected tokens.

* Start to implement latest RDFa spec from
  http://www.mediawiki.org/wiki/Parsoid/RDFa_vocabulary
* Capitalize types, add mw:Entity type for html entities
* Detect changes to entities using tokenCollector and srcContent

Change-Id: I45429f4b930858a16e166ef8377c8f6f5114c414
2012-07-19 14:56:32 -07:00
GWicke 977cbafbe2 Merge "Added support for hacky use of dl before tables." 2012-07-19 18:06:25 +00:00
Subramanya Sastry c5f9961423 Added support for hacky use of dl before tables.
* This is the equivalent of commit a0746946 to
  core/includes/Parser.php

Change-Id: Id093e39dabad29cb275bd21325d39bfeb7709d98
2012-07-19 10:56:16 -07:00
Gabriel Wicke 9e0ee30446 Merge "Document GPL license for Parsoid" 2012-07-19 03:19:12 +00:00
Gabriel Wicke b91467e144 Document GPL license for Parsoid
This is just to avoid re-licensing along with VE. We want to be compatible
with MediaWiki core to make sure a closely-integrated C port is still
GPL-compatible. We could consider adding MIT to the JS implementation after
porting to C.

Change-Id: Ia83e8620e26c95625793438c4c5e8ddcf2702368
2012-07-18 20:14:13 -07:00
Gabriel Wicke 68c5a6efc6 Collect tokens in a tokencollector and use cb for processing
This is work in progress, but committed for now so I can use it for links and
tweak it while doing so.

Change-Id: I757277f6efacda6d9432ca57542a957f597a98de
2012-07-18 16:18:38 -07:00
Gabriel Wicke 681b0d4d40 Merge "Rename data-mw into data-rt" 2012-07-17 17:20:37 +00:00
Subramanya Sastry 80d74e1c16 Changed add/remove/get transforms.
* This code change is an attempt to address the FIXME about constant
  resorting of transformations in _getTransforms.  This caches sorted
  transformations and selectively clears/updates the cache on add/remove.

Change-Id: If24a807b84d494aa4e5597339039a5573a30905e
2012-07-17 12:03:48 -05:00
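
The caching approach described in this commit can be sketched as below. The `TransformRegistry` class, the per-key layout, and the numeric `rank` ordering are assumptions made for illustration; the commit only states that sorted transformations are cached and that the cache is selectively cleared on add/remove.

```javascript
// Hypothetical sketch: class name, keys, and rank-based ordering are assumptions.
class TransformRegistry {
  constructor() {
    this.transforms = new Map();  // key -> unsorted array of { rank, fn }
    this.sortedCache = new Map(); // key -> cached sorted array
  }

  addTransform(key, rank, fn) {
    if (!this.transforms.has(key)) {
      this.transforms.set(key, []);
    }
    this.transforms.get(key).push({ rank, fn });
    this.sortedCache.delete(key); // invalidate only the affected key
  }

  removeTransform(key, rank) {
    const list = this.transforms.get(key) || [];
    this.transforms.set(key, list.filter((t) => t.rank !== rank));
    this.sortedCache.delete(key);
  }

  // Sort once per key, then reuse the result until the key changes.
  getTransforms(key) {
    let sorted = this.sortedCache.get(key);
    if (sorted === undefined) {
      sorted = [...(this.transforms.get(key) || [])].sort((a, b) => a.rank - b.rank);
      this.sortedCache.set(key, sorted);
    }
    return sorted;
  }
}

const registry = new TransformRegistry();
registry.addTransform("tag:a", 2, () => "second");
registry.addTransform("tag:a", 1, () => "first");
const firstLookup = registry.getTransforms("tag:a");
const secondLookup = registry.getTransforms("tag:a"); // served from the cache
registry.addTransform("tag:a", 0, () => "zeroth");
const afterAdd = registry.getTransforms("tag:a");     // re-sorted after invalidation
```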
Gabriel Wicke 3172afb750 Rename data-mw into data-rt
This hopefully makes it clearer that data-rt contains private round-trip info
instead of semantically interesting data.

Change-Id: I03b476ed112a4b627c9871ee3677c450f943429a
2012-07-16 12:10:08 -07:00
GWicke 235739e253 Merge "Bug fix and minor code cleanup." 2012-07-13 22:40:20 +00:00
Subramanya Sastry 141ce901d2 Bug fix and minor code cleanup.
Change-Id: Ic446c8822bf1b8a859e045119782d7b8a40c5544
2012-07-13 17:39:30 -05:00
Gabriel Wicke 1e902fc050 Merge "Encapsulate token collection" 2012-07-13 21:08:52 +00:00
Gabriel Wicke e329455d55 Encapsulate token collection
* Arbitrary predicate support for the termination of collection mode
* tokens as property of the collector instead of a state-global thing

Change-Id: Ibcb342bc64a76fece9b04a760ea56c7878e67cad
2012-07-13 13:57:04 -07:00
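
A collector along the lines this commit describes might look like the sketch below. It is an illustration, not the Parsoid implementation: the `TokenCollector` constructor signature, the `push` method, and the string tokens are assumptions. The two points taken from the commit are that collection ends on an arbitrary predicate and that the collected tokens live on the collector itself rather than in global state.

```javascript
// Hypothetical sketch: constructor signature and push() protocol are assumptions.
class TokenCollector {
  constructor(isStart, isEnd, onComplete) {
    this.isStart = isStart;       // predicate: begin collecting at this token
    this.isEnd = isEnd;           // predicate: stop collecting at this token
    this.onComplete = onComplete; // callback: collected tokens -> output tokens
    this.tokens = null;           // per-collector buffer; non-null while collecting
  }

  // Returns the tokens to emit downstream for one input token.
  push(token) {
    if (this.tokens === null) {
      if (!this.isStart(token)) {
        return [token]; // not collecting: pass through
      }
      this.tokens = [token];
      return [];
    }
    this.tokens.push(token);
    if (this.isEnd(token)) {
      const collected = this.tokens;
      this.tokens = null;
      return this.onComplete(collected);
    }
    return []; // still collecting: emit nothing yet
  }
}

const collector = new TokenCollector(
  (t) => t === "<a>",
  (t) => t === "</a>",
  (toks) => [toks.join("")]
);
const output = ["x", "<a>", "Foo", "</a>", "y"].flatMap((t) => collector.push(t));
```

Because the callback receives the whole collected run at once, a handler such as the link handler can inspect the content between the start and end tokens before deciding what to emit.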
GWicke 97bc6cd5d7 Merge "Serializer fixes" 2012-07-13 20:36:36 +00:00