2163 commits

Author SHA1 Message Date
GWicke 46d6502ca5 Merge "Fix for Bug 37913" 2012-06-30 08:56:48 +00:00
Gabriel Wicke 1736e52bfb Abstract out chunk emission from tokenizer
Patch by Adam Wight, fixes bug #35377.

Change-Id: I183baeed8dd78e7d3c775f44d62bec8e6f9fc608
2012-06-30 10:39:12 +02:00
Subramanya Sastry 166e7a75c9 Fix for Bug 37913
* Strips the first paragraph tag in a list item or table cell context
  if there are no attributes on it and stx:html is not set

Change-Id: I74988645fe505c662f86488e32d0f11d464ffe41
2012-06-29 23:47:59 -05:00
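
The rule in the Bug 37913 commit message above can be sketched as a small predicate. This is only an illustration, not the code from that change: the function name, the token shape, the `isFirstChild` flag, and the exact set of parent element names are all assumptions, as is reading "stx:html is not set" as `dataAttribs.stx !== 'html'`.

```javascript
// Hypothetical token shape (assumed, not taken from the commit):
//   { name: 'p', attribs: [ { k: 'class', v: 'x' } ], dataAttribs: { stx: 'html' } }

// Assumed set of "list item or table cell" parent elements.
var STRIP_CONTEXTS = ['li', 'dd', 'dt', 'td', 'th'];

function shouldStripLeadingParagraph(token, parentName, isFirstChild) {
    var attribs = token.attribs || [];
    var dataAttribs = token.dataAttribs || {};
    return token.name === 'p' &&
        isFirstChild &&                              // only the first paragraph
        STRIP_CONTEXTS.indexOf(parentName) !== -1 && // list item / table cell
        attribs.length === 0 &&                      // no attributes on the tag
        dataAttribs.stx !== 'html';                  // not written as literal HTML
}
```

Under these assumptions, a bare first paragraph inside `li` would be stripped, while one that carries attributes or was written as explicit HTML would be kept.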
Gabriel Wicke 604aae2f3f Merge "Up entity name length limit even further" 2012-06-29 21:39:54 +00:00
Gabriel Wicke 9ddc863d89 Up entity name length limit even further
There are some really long names in
http://www.w3.org/2003/entities/2007xml/unicode.xml

Change-Id: I0138c9610bb288cd8f29e3600b8a21f932e7bcd9
2012-06-29 23:38:10 +02:00
Gabriel Wicke 5c12dae254 Merge changes I2c9f102f,I55aa649b
* changes:
  Match named entities with up to eight chars
  Insert separation between adjacent pres
2012-06-29 21:25:07 +00:00
Gabriel Wicke cf7f437966 Match named entities with up to eight chars
The longest entries in
http://en.wikipedia.org/wiki/List_of_XML_and_HTML_character_entity_references.

Change-Id: I2c9f102fe6a905e339e12520d08c1b1b0a4002d8
2012-06-29 23:15:30 +02:00
Gabriel Wicke da4a2e13d1 Merge "Escape ampersands in entities from plain text DOM content" 2012-06-29 21:11:23 +00:00
Gabriel Wicke 370fb607c8 Insert separation between adjacent pres
Change-Id: I55aa649b4e076cae32b3c970d6384ab2ed4cdd6c
2012-06-29 23:05:06 +02:00
Gabriel Wicke 6c8dfa26fa Escape ampersands in entities from plain text DOM content
Change-Id: I0826077cf48b67e38a525090be66411c38d7b65f
2012-06-29 23:02:21 +02:00
Translation updater bot 5fd82a18ab Localisation updates from http://translatewiki.net.
Change-Id: If6dab08c26f7c6b8d136c6c530171c2ecc989b77
2012-06-29 20:18:16 +00:00
Subramanya Sastry 5874d9a5f1 More thumb roundtripping fixes.
* Looks like I misled myself in commit 88fc91 -- that wikitext
  roundtripped perfectly because it went through the 'src' route
  because it was a thumbnail with an explicit image which doesn't
  go through renderThumb -- so, the serializer simply spit out the
  original 'src' string and hence perfect rt :).
* More whitespace preserving fixes in LinkHandler.
* Also changed resource value in the img tag to use the original
  filename rather than the normalized capitalized filename.
* 2 more parsertests rt -- now up to 400.

Change-Id: I144a6486dd9d07da8a74a68700fe96c78d192826
2012-06-29 00:30:13 -05:00
Subramanya Sastry ba6a304102 Prettified Wikitext Constants hash
* Something to be said for code alignment - easier on the eye!
* Maybe a good case for breaking mediawiki coding guidelines.
* But, happy to abandon commit if not useful. :)

Change-Id: I1133af488f572ac7f8727be9108e08e14c4e6420
2012-06-28 19:08:48 -05:00
Subramanya Sastry 88fc91a292 Next round of image roundtripping fixes
* Changed PrefixImageOptions so that thumb and thumbnail are
  distinct key-value pairs.  Without this fix, cannot distinguish
  between thumb=foo.jpg and thumbnail=foo.jpg
* Fixed link handler so whitespace is preserved around prefixed image
  options.
* Fixed figure handler to process the 3 different kinds of image options:
  size, simple image options, and prefixed image options.
* There is a hack/fixme for "upright: aspect" prefixed image option
  which needs to be looked into.
* Still need to fix uppercasing of the image resource name.

With these fixes, the following wikitext roundtrips perfectly
(after newline breaks are removed)

[[Image:Foo.jpg|thumbnail = 'baby.jpg'|100x100px|center| alt =bbbbb|
upright=true|bottom|link='http://foo.bar'|
This is a [[Linked Caption]] in the image]]

Change-Id: I6606df56874c2b97f00f08cb6bbeaec9878167d3
2012-06-28 18:55:47 -05:00
Subramanya Sastry 11e7c1031a Created a constants object for extracting wikitext markup properties.
* For now, extracted image markup options out of the link handler.
* This info will also be used by the serializer.
* More properties/global constants can be moved into this structure
  over time.

Change-Id: I4cfbfd703f42e93fbad52b38b435f68d8a5c22ee
2012-06-28 17:45:17 -05:00
Translation updater bot 4d543c8949 Localisation updates from http://translatewiki.net.
Change-Id: I94ed36e57eb19d184f5212eae9178aafbd695474
2012-06-28 19:50:18 +00:00
Subramanya Sastry d9d584e1b8 Minor tweaks/fixes to LinkHandler
* Minor refactoring
* Cleared src in dataAttribs in renderThumb since we can serialize
  thumbs now (or at least we can once all bugs are fixed and missing
  pieces are handled).

Change-Id: If18865801cdd3d89c1477e68bfa3e13107c45b40
2012-06-28 13:14:52 -05:00
Gabriel Wicke 9f753d8009 Source-based round-tripping for behavior switches
Change-Id: I46d12d338314a8dbfdc9a8448a74680e67c3a720
2012-06-28 18:20:13 +02:00
Gabriel Wicke 39b82fc3fa Simple source-based round-tripping for category links
Change-Id: I5a8a03e74a95c6dceda432f0356cce6a3af77c67
2012-06-28 18:12:19 +02:00
Gabriel Wicke ff414ad825 Add generic source round-trip mode, and use it for plain images (for now)
Anything with data-gen="both" and dataAttribs.src defined serializes to
dataAttribs.src and drops its contents (if any). We can use this to round-trip
elements we don't properly parse or serialize yet. Without RDFa info, the
editor will not touch the contents after encountering data-gen="both".

Change-Id: Ia39e5fdd765c2c9b36f26313455685d29f118839
2012-06-28 17:44:26 +02:00
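
The generic source round-trip mode described in the commit above might look roughly like the following. This is a minimal sketch under assumed data structures, not the actual serializer: the node shape (`attribs`, `dataAttribs`, `children`) and the function name `serializeNode` are hypothetical.

```javascript
// Hypothetical node shape (assumed for illustration):
//   { attribs: { 'data-gen': 'both' }, dataAttribs: { src: '...' }, children: [...] }
// Plain strings stand in for text nodes.
function serializeNode(node) {
    if (typeof node === 'string') {
        return node;
    }
    var attribs = node.attribs || {};
    var dataAttribs = node.dataAttribs || {};
    // Source-based round-trip: emit the original wikitext and drop the contents.
    if (attribs['data-gen'] === 'both' && dataAttribs.src !== undefined) {
        return dataAttribs.src;
    }
    // Otherwise fall back to serializing the children (greatly simplified here).
    return (node.children || []).map(serializeNode).join('');
}
```

For example, a node carrying `data-gen="both"` and `dataAttribs.src` set to `[[Image:Foo.jpg]]` would serialize to exactly that string, whatever its children contain.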
Gabriel Wicke 8976b66558 Merge "Fix round-tripping of invalid external links somewhat" 2012-06-28 15:30:31 +00:00
Gabriel Wicke e1a7d10063 Fix round-tripping of invalid external links somewhat
* Don't consider them for auto-numbered links
* Don't insert a trailing space if the content is empty

These links are still wrapped in nowiki on round-tripping since the
valid/invalid url determination is done in the LinkHandler and not the
Tokenizer as it is configuration-dependent. Not incorrect for rendering (and
perhaps easier to understand for humans too), but might still introduce a
dirty diff. We'll still need reconciliation / damage tracking in the end ;)

Change-Id: I959ebc1b7f81d110a1141bb38ba5ee97f52ebf96
2012-06-28 16:12:23 +02:00
Gabriel Wicke 4f94492f08 Merge changes I27bdc9c5,Ic09972bb
* changes:
  Update nowiki handling to latest spec; some fixes to it
  Default to two preceding newlines for headings for better readability
2012-06-28 14:02:00 +00:00
Gabriel Wicke 4dcd88fc5f Merge "Fix a crasher in unbalanced heading tokenization" 2012-06-28 13:57:29 +00:00
Gabriel Wicke 198e55a32b Update nowiki handling to latest spec; some fixes to it
346 round-trip tests are passing now (up from 343).

Change-Id: I27bdc9c5e010a13c2b4dddc6f263cbf9d3adac36
2012-06-28 14:57:05 +02:00
Gabriel Wicke 5b4cb03ee4 Default to two preceding newlines for headings for better readability
This only applies to newly created headings, so headings with a single newline
preceding them will be round-tripped that way.

Change-Id: Ic09972bbd25c3934b53f6fd3b5be5a0c3185c2af
2012-06-28 12:42:19 +02:00
Gabriel Wicke 17af335748 Fix a crasher in unbalanced heading tokenization
Example input:

=== foo ==

Old result:
http://www.mediawiki.org/w/index.php?title=VisualEditor:Test&diff=prev&oldid=554403

Change-Id: I0bc135884833607cedb62ec9c045310df3649dd8
2012-06-28 12:34:32 +02:00
Subramanya Sastry f995fc025a First pass serializing image thumbs.
* Collect all figure tokens and process them as a chunk
* This effectively mimics context-sensitive DOM walking,
  but since we need serialization supported on a token stream,
  we cannot use real DOM walking.  The current technique should
  also work on a token stream.
* There is a FIXME about the image filename being capitalized.
  This needs fixing in the parser or some other way of recognizing
  the original unnormalized filename.

Amended by gwicke:
* Build option list and join it with pipe to avoid stray trailing pipe
* Satisfy JSHint's weird preference to have '&&' and '||' at the end of the line

Change-Id: I1e5f6600f297fcdf81e3227a82ca3b71d4e97fc3
2012-06-28 11:29:10 +02:00
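
The "build option list and join it with pipe" amendment in the commit above can be illustrated as follows. This is a hedged sketch of the general technique only; `serializeImageLink` and its parameters are hypothetical names, not the serializer's real API.

```javascript
// Hypothetical helper: builds [[target|opt1|opt2|caption]] without a stray pipe.
function serializeImageLink(target, options, caption) {
    // Collect every non-empty part first ...
    var parts = [target];
    options.forEach(function (opt) {
        if (opt !== undefined && opt !== null && opt !== '') {
            parts.push(opt);
        }
    });
    if (caption) {
        parts.push(caption);
    }
    // ... then join once, so a pipe only ever appears between two parts.
    return '[[' + parts.join('|') + ']]';
}
```

Appending `opt + '|'` in a loop would leave a trailing pipe whenever the caption is empty; joining a list avoids that case by construction.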
Gabriel Wicke 4e541223b0 Merge "Rename data-mw-gc to data-gen. Credit to James!" 2012-06-27 17:15:25 +00:00
Gabriel Wicke 4e86337a43 Merge "Add basic tsr on indent-pre end tag" 2012-06-27 17:11:38 +00:00
Gabriel Wicke 424a246b00 Rename data-mw-gc to data-gen. Credit to James!
Change-Id: Iacbe20b355ddf5f12fffb71ff4dd978ac4364928
2012-06-27 19:08:14 +02:00
Trevor Parscal 1067f84764 Merge "Removing some logging" 2012-06-27 16:42:34 +00:00
Trevor Parscal efe26d9be2 Merge "(bug 36201) Control-K triggers link inspector" 2012-06-27 16:41:51 +00:00
Christian Williams b9f6baba63 (bug 36201) Control-K triggers link inspector
Change-Id: I0ffd237ce51d1899d2151fb76243e818c5f5cfb8
2012-06-27 09:41:06 -07:00
Trevor Parscal 02e24d6a39 Merge "Bugzilla:37804 - Double bound events were causing double backspace bugs" 2012-06-27 16:20:43 +00:00
Trevor Parscal e1905341c1 Merge "Bugzilla:33093 - Shift-Enter splits at paragraphs instead of list items" 2012-06-27 16:20:08 +00:00
Gabriel Wicke df26663a3f Add basic tsr on indent-pre end tag
This is a zero-length tsr for now (and thus not 100% correct), but will do the
job for starttag / endtag range establishment

Change-Id: Iedd50ad319aa8d5916434fb6744deb04e031e456
2012-06-27 18:08:49 +02:00
Gabriel Wicke c02218c736 Merge changes Idfa5d6a8,I700142a5
* changes:
  Represent nowiki as span instead of meta
  Round-trip html entities and introduce data-mw-gc attribute
2012-06-27 16:07:48 +00:00
GWicke d4eb4ce741 Merge "Code cleanup and more newline fixes." 2012-06-27 13:26:22 +00:00
Subramanya Sastry 4d2a46fb44 Code cleanup and more newline fixes.
* Removed dead commented out code.
* Cleaned up newline handling in serializer some more.
* Now, onNewLine and onStartOfLine reflect serializer state
  more accurately.
* No implicit new lines for explicit html tags.
* 9 more roundtrip tests now green.

Change-Id: I9f640de2ae769c7472538fa687400dc8a40c2b2d
2012-06-27 15:23:22 +02:00
Gabriel Wicke a1d05976ce Merge "Small (and incomplete) fix to table cell tsr" 2012-06-27 12:45:39 +00:00
Gabriel Wicke 53451bfc50 Small (and incomplete) fix to table cell tsr
Change-Id: I14347939de32af698d7ce0b649165982908c49aa
2012-06-27 14:45:12 +02:00
Gabriel Wicke 7108ee985a Represent nowiki as span instead of meta
Change-Id: Idfa5d6a8ee7b2d17205779361ca69d075a79964d
2012-06-27 13:59:14 +02:00
Gabriel Wicke 0b9a420129 Round-trip html entities and introduce data-mw-gc attribute
297 round-trip tests are passing with this patch.

TODO:
* generalize data-mw-gc handling in the serializer for any tag
* use data-mw-gc="both" and data-mw.src: 'the wikitext' for round-tripping of
  wikitext structures, optionally with some presentational (but read-only)
  content
* use span and data-mw-gc="both" for nowiki

Change-Id: I700142a56818977c20c8c06e6a5f2e77a708d25e
2012-06-27 12:52:52 +02:00
Subramanya Sastry 1a504a5f54 Added tokenizer support for ----
Change-Id: Idc5519350d11ae91b2ec64553f847d56e22d63bb
2012-06-25 16:40:34 -05:00
Subramanya Sastry d5e6ec34aa Deleted dead PEG productions
Change-Id: I9b859f79f9900b3d320aa1ad0283a4b5ae6c4331
2012-06-25 13:17:01 -05:00
Translation updater bot 7fa2c97fc0 Localisation updates from http://translatewiki.net.
Change-Id: Ibe3268c865a843b5756f00fb396cac62cc37a09d
2012-06-24 20:07:18 +00:00
Translation updater bot 44e36f199a $COMMITMSG
Change-Id: Ieed1cb99b47c0032295ec1a259bb7b5f3d926ac3
2012-06-23 20:16:57 +00:00
Gabriel Wicke 08b5ed1a43 Use _inNewlineContext method instead of bare onNewline
This makes sure that we escape start-of-line syntax when needed, since
onNewline is often not yet set.

Discussion / background:
[19:18] <subbu> this will fix it, but, i think this is asking for another
minor refactoring of these flags ... because this is a subtle fix which means
it might be possible to make it clearer.  onNewline is only true in one
direction, i.e. if true, we are in a new line state, but if we are in a
newline context, onNewline is not true, which is why this new method is
needed.
[19:19] <subbu> i dont know if it is possible, but it seems like it should be
possible.  but, something for later.
[19:20] <subbu> badly phrased.  "onNewline" ==> in new line context, but if in
new line context, onNewline may be false.
[19:20] <gwicke> we should perhaps update it as early as possible instead
[19:21] <subbu> i cannot today, but possible monday.  i am heading out in
about 15-30 mins.
[19:22] <gwicke> will need to check all conditions depending on it in
_serializeToken
[19:22] <subbu> oh, i misunderstood you :)
[19:22] <gwicke> and if there are cases where the onNewline / onStartOfLine
state could be reverted later
[19:23] <subbu> you were referring to the flag, i thought you meant we should
fix this sooner than later.
[19:23] <gwicke> yes, I wasn't terribly clear
[19:23] <gwicke> you wrote something about following productions swallowing
newlines, but I think we don't actually do that any more
[19:24] <gwicke> I'm quite optimistic that updating those flags much earlier
would work
[19:25] <subbu> yes, it could fix it.
[19:26] <subbu> you might be right reg. swallowing.  it was happening earlier.
but, not right now, after single-line mode and other fixes.

Change-Id: Ic1d8141c04eb54a59977d0ba87bcf06bafd421e0
2012-06-23 19:27:56 +02:00
Gabriel Wicke f731125804 Explain reasoning behind number of worker calculation
Change-Id: I92ae0bb6e02caef98b0d68de4424e775d8922651
2012-06-23 17:30:52 +02:00