2156 commits

Author SHA1 Message Date
Gabriel Wicke 9ddc863d89 Up entity name length limit even further
There are some really long names in
http://www.w3.org/2003/entities/2007xml/unicode.xml

Change-Id: I0138c9610bb288cd8f29e3600b8a21f932e7bcd9
2012-06-29 23:38:10 +02:00
Gabriel Wicke cf7f437966 Match named entities with up to eight chars
The longest entries in
http://en.wikipedia.org/wiki/List_of_XML_and_HTML_character_entity_references.

Change-Id: I2c9f102fe6a905e339e12520d08c1b1b0a4002d8
2012-06-29 23:15:30 +02:00
Gabriel Wicke 370fb607c8 Insert separation between adjacent pres
Change-Id: I55aa649b4e076cae32b3c970d6384ab2ed4cdd6c
2012-06-29 23:05:06 +02:00
Gabriel Wicke 6c8dfa26fa Escape ampersands in entities from plain text DOM content
Change-Id: I0826077cf48b67e38a525090be66411c38d7b65f
2012-06-29 23:02:21 +02:00
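
A minimal sketch of the idea behind the ampersand escaping in 6c8dfa26fa, not the actual serializer code. The function name and the 32-character bound on entity names are assumptions made for illustration; the commit does not state them. An ampersand is escaped only when the text after it would otherwise be read as an entity reference.

```javascript
// Hypothetical helper, not taken from the repository.
// Escapes '&' only where plain text would otherwise parse as an entity:
// named (&amp;), decimal (&#38;) or hexadecimal (&#x26;) references.
// The name length bound (32) is an assumed value.
function escapeEntityAmpersands(text) {
    var entityAhead = /&(?=(?:[a-zA-Z][a-zA-Z0-9]{0,31}|#[0-9]+|#[xX][0-9a-fA-F]+);)/g;
    return text.replace(entityAhead, '&amp;');
}
```

A bare ampersand followed by a space, as in "a & b", is left alone under this sketch, which keeps the serialized wikitext closer to what a human would type.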
Subramanya Sastry 5874d9a5f1 More thumb roundtripping fixes.
* Looks like I misled myself in commit 88fc91 -- that wikitext
  roundtripped perfectly because it went through the 'src' route
  because it was a thumbnail with an explicit image which doesn't
  go through renderThumb -- so, the serializer simply spit out the
  original 'src' string and hence perfect rt :).
* More whitespace preserving fixes in LinkHandler.
* Also changed resource value in the img tag to use the original
  filename rather than the normalized capitalized filename.
* 2 more parser tests round-trip -- now up to 400.

Change-Id: I144a6486dd9d07da8a74a68700fe96c78d192826
2012-06-29 00:30:13 -05:00
Subramanya Sastry ba6a304102 Prettified Wikitext Constants hash
* Something to be said for code alignment - easier on the eye!
* Maybe a good case for breaking mediawiki coding guidelines.
* But, happy to abandon commit if not useful. :)

Change-Id: I1133af488f572ac7f8727be9108e08e14c4e6420
2012-06-28 19:08:48 -05:00
Subramanya Sastry 88fc91a292 Next round of image roundtripping fixes
* Changed PrefixImageOptions so that thumb and thumbnail are
  distinct key-value pairs.  Without this fix, cannot distinguish
  between thumb=foo.jpg and thumbnail=foo.jpg
* Fixed link handler so whitespace is preserved around prefixed image
  options.
* Fixed figure handler to process the 3 different kinds of image options:
  size, simple image options, and prefixed image options.
* There is a hack/fixme for "upright: aspect" prefixed image option
  which needs to be looked into.
* Still need to fix uppercasing of the image resource name.

With these fixes, the following wikitext roundtrips perfectly
(after newline breaks are removed)

[[Image:Foo.jpg|thumbnail = 'baby.jpg'|100x100px|center| alt =bbbbb|
upright=true|bottom|link='http://foo.bar'|
This is a [[Linked Caption]] in the image]]

Change-Id: I6606df56874c2b97f00f08cb6bbeaec9878167d3
2012-06-28 18:55:47 -05:00
Subramanya Sastry 11e7c1031a Created a constants object for extracting wikitext markup properties.
* For now, extracted image markup options out of the link handler.
* This info will also be used by the serializer.
* More properties/global constants can be moved into this structure
  over time.

Change-Id: I4cfbfd703f42e93fbad52b38b435f68d8a5c22ee
2012-06-28 17:45:17 -05:00
Translation updater bot 4d543c8949 Localisation updates from http://translatewiki.net.
Change-Id: I94ed36e57eb19d184f5212eae9178aafbd695474
2012-06-28 19:50:18 +00:00
Subramanya Sastry d9d584e1b8 Minor tweaks/fixes to LinkHandler
* Minor refactoring
* Cleared src in dataAttribs in renderThumb since we can serialize
  thumbs now (or at least we can once all bugs are fixed and missing
  pieces are handled).

Change-Id: If18865801cdd3d89c1477e68bfa3e13107c45b40
2012-06-28 13:14:52 -05:00
Gabriel Wicke 9f753d8009 Source-based round-tripping for behavior switches
Change-Id: I46d12d338314a8dbfdc9a8448a74680e67c3a720
2012-06-28 18:20:13 +02:00
Gabriel Wicke 39b82fc3fa Simple source-based round-tripping for category links
Change-Id: I5a8a03e74a95c6dceda432f0356cce6a3af77c67
2012-06-28 18:12:19 +02:00
Gabriel Wicke ff414ad825 Add generic source round-trip mode, and use it for plain images (for now)
Anything with data-gen="both" and dataAttribs.src defined serializes to
dataAttribs.src and drops its contents (if any). We can use this to round-trip
elements we don't properly parse or serialize yet. Without RDFa info, the
editor will not touch the contents after encountering data-gen="both".

Change-Id: Ia39e5fdd765c2c9b36f26313455685d29f118839
2012-06-28 17:44:26 +02:00
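
The rule described in ff414ad825 can be sketched roughly as follows. The node shape (attribs, dataAttribs, children) and the serializeChildren callback are assumptions for illustration only; the real serializer works on a token stream and is structured differently.

```javascript
// Hypothetical node shape: { attribs: {...}, dataAttribs: {...}, children: [...] }.
// serializeChildren is an assumed callback that serializes a child list normally.
function serializeNode(node, serializeChildren) {
    var attribs = node.attribs || {};
    var dataAttribs = node.dataAttribs || {};
    if (attribs['data-gen'] === 'both' && dataAttribs.src !== undefined) {
        // Source round-trip mode: emit the saved wikitext and drop the contents.
        return dataAttribs.src;
    }
    return serializeChildren(node.children || []);
}
```

In this sketch the saved source always wins, so any edits made inside such an element would be lost, which matches the commit's note that the editor should not touch these contents.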
Gabriel Wicke 8976b66558 Merge "Fix round-tripping of invalid external links somewhat" 2012-06-28 15:30:31 +00:00
Gabriel Wicke e1a7d10063 Fix round-tripping of invalid external links somewhat
* Don't consider them for auto-numbered links
* Don't insert a trailing space if the content is empty

These links are still wrapped in nowiki on round-tripping since the
valid/invalid url determination is done in the LinkHandler and not the
Tokenizer as it is configuration-dependent. Not incorrect for rendering (and
perhaps easier to understand for humans too), but might still introduce a
dirty diff. We'll still need reconciliation / damage tracking in the end ;)

Change-Id: I959ebc1b7f81d110a1141bb38ba5ee97f52ebf96
2012-06-28 16:12:23 +02:00
Gabriel Wicke 4f94492f08 Merge changes I27bdc9c5,Ic09972bb
* changes:
  Update nowiki handling to latest spec; some fixes to it
  Default to two preceding newlines for headings for better readability
2012-06-28 14:02:00 +00:00
Gabriel Wicke 4dcd88fc5f Merge "Fix a crasher in unbalanced heading tokenization" 2012-06-28 13:57:29 +00:00
Gabriel Wicke 198e55a32b Update nowiki handling to latest spec; some fixes to it
346 round-trip tests are passing now (up from 343).

Change-Id: I27bdc9c5e010a13c2b4dddc6f263cbf9d3adac36
2012-06-28 14:57:05 +02:00
Gabriel Wicke 5b4cb03ee4 Default to two preceding newlines for headings for better readability
This only applies to newly created headings, so headings with a single newline
preceding them will be round-tripped that way.

Change-Id: Ic09972bbd25c3934b53f6fd3b5be5a0c3185c2af
2012-06-28 12:42:19 +02:00
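
One way to picture the behavior in 5b4cb03ee4, as a sketch only. The function name and the idea of passing in a recorded newline count are assumptions; the commit does not say how the original spacing is tracked.

```javascript
// Hypothetical helper: returns the newlines to emit before a heading.
// recordedNewlines is an assumed value remembered from the original wikitext;
// it is undefined for headings newly created in the editor.
function headingPrefix(recordedNewlines) {
    var count = (typeof recordedNewlines === 'number') ? recordedNewlines : 2;
    // new Array(n + 1).join(s) repeats s exactly n times.
    return new Array(count + 1).join('\n');
}
```

Only the default changes under this sketch, so existing headings keep whatever spacing they had and do not produce dirty diffs.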
Gabriel Wicke 17af335748 Fix a crasher in unbalanced heading tokenization
Example input:

=== foo ==

Old result:
http://www.mediawiki.org/w/index.php?title=VisualEditor:Test&diff=prev&oldid=554403

Change-Id: I0bc135884833607cedb62ec9c045310df3649dd8
2012-06-28 12:34:32 +02:00
Subramanya Sastry f995fc025a First pass serializing image thumbs.
* Collect all figure tokens and process them as a chunk
* This effectively mimics context-sensitive DOM walking,
  but since we need serialization supported on a token stream,
  we cannot use real DOM walking.  The current technique should
  also work on a token stream.
* There is a FIXME about the image filename being capitalized.
  This needs fixing in the parser, or some other way of recognizing
  the original unnormalized filename.

Amended by gwicke:
* Build option list and join it with pipe to avoid stray trailing pipe
* Satisfy JSHint's weird preference to have '&&' and '||' at the end of the line

Change-Id: I1e5f6600f297fcdf81e3227a82ca3b71d4e97fc3
2012-06-28 11:29:10 +02:00
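
The "build a list and join it" amendment in f995fc025a can be illustrated with a short sketch. The function name and parameters are assumptions for illustration, not the serializer's real interface.

```javascript
// Hypothetical helper: assembles image wikitext from its parts.
// Collecting the parts first and joining once means no pipe is emitted
// after the last part, even when options or the caption are empty.
function buildImageWikitext(target, options, caption) {
    var parts = [target];
    options.forEach(function (option) {
        if (option) {
            parts.push(option);
        }
    });
    if (caption) {
        parts.push(caption);
    }
    return '[[' + parts.join('|') + ']]';
}
```

Appending option + '|' in a loop instead would leave a stray trailing pipe whenever the caption is missing, which is the case the amendment avoids.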
Gabriel Wicke 4e541223b0 Merge "Rename data-mw-gc to data-gen. Credit to James!" 2012-06-27 17:15:25 +00:00
Gabriel Wicke 4e86337a43 Merge "Add basic tsr on indent-pre end tag" 2012-06-27 17:11:38 +00:00
Gabriel Wicke 424a246b00 Rename data-mw-gc to data-gen. Credit to James!
Change-Id: Iacbe20b355ddf5f12fffb71ff4dd978ac4364928
2012-06-27 19:08:14 +02:00
Trevor Parscal 1067f84764 Merge "Removing some logging" 2012-06-27 16:42:34 +00:00
Trevor Parscal efe26d9be2 Merge "(bug 36201) Control-K triggers link inspector" 2012-06-27 16:41:51 +00:00
Christian Williams b9f6baba63 (bug 36201) Control-K triggers link inspector
Change-Id: I0ffd237ce51d1899d2151fb76243e818c5f5cfb8
2012-06-27 09:41:06 -07:00
Trevor Parscal 02e24d6a39 Merge "Bugzilla:37804 - Double bound events were causing double backspace bugs" 2012-06-27 16:20:43 +00:00
Trevor Parscal e1905341c1 Merge "Bugzilla:33093 - Shift-Enter splits at paragraphs instead of list items" 2012-06-27 16:20:08 +00:00
Gabriel Wicke df26663a3f Add basic tsr on indent-pre end tag
This is a zero-length tsr for now (and thus not 100% correct), but will do the
job for starttag / endtag range establishment

Change-Id: Iedd50ad319aa8d5916434fb6744deb04e031e456
2012-06-27 18:08:49 +02:00
Gabriel Wicke c02218c736 Merge changes Idfa5d6a8,I700142a5
* changes:
  Represent nowiki as span instead of meta
  Round-trip html entities and introduce data-mw-gc attribute
2012-06-27 16:07:48 +00:00
GWicke d4eb4ce741 Merge "Code cleanup and more newline fixes." 2012-06-27 13:26:22 +00:00
Subramanya Sastry 4d2a46fb44 Code cleanup and more newline fixes.
* Removed dead commented out code.
* Cleaned up newline handling in serializer some more.
* Now, onNewLine and onStartOfLine reflect serializer state
  more accurately.
* No implicit new lines for explicit html tags.
* 9 more roundtrip tests now green.

Change-Id: I9f640de2ae769c7472538fa687400dc8a40c2b2d
2012-06-27 15:23:22 +02:00
Gabriel Wicke a1d05976ce Merge "Small (and incomplete) fix to table cell tsr" 2012-06-27 12:45:39 +00:00
Gabriel Wicke 53451bfc50 Small (and incomplete) fix to table cell tsr
Change-Id: I14347939de32af698d7ce0b649165982908c49aa
2012-06-27 14:45:12 +02:00
Gabriel Wicke 7108ee985a Represent nowiki as span instead of meta
Change-Id: Idfa5d6a8ee7b2d17205779361ca69d075a79964d
2012-06-27 13:59:14 +02:00
Gabriel Wicke 0b9a420129 Round-trip html entities and introduce data-mw-gc attribute
297 round-trip tests are passing with this patch.

TODO:
* generalize data-mw-gc handling in the serializer for any tag
* use data-mw-gc="both" and data-mw.src: 'the wikitext' for round-tripping of
  wikitext structures, optionally with some presentational (but read-only)
  content
* use span and data-mw-gc="both" for nowiki

Change-Id: I700142a56818977c20c8c06e6a5f2e77a708d25e
2012-06-27 12:52:52 +02:00
Subramanya Sastry 1a504a5f54 Added tokenizer support for ----
Change-Id: Idc5519350d11ae91b2ec64553f847d56e22d63bb
2012-06-25 16:40:34 -05:00
Subramanya Sastry d5e6ec34aa Deleted dead PEG productions
Change-Id: I9b859f79f9900b3d320aa1ad0283a4b5ae6c4331
2012-06-25 13:17:01 -05:00
Translation updater bot 7fa2c97fc0 Localisation updates from http://translatewiki.net.
Change-Id: Ibe3268c865a843b5756f00fb396cac62cc37a09d
2012-06-24 20:07:18 +00:00
Translation updater bot 44e36f199a $COMMITMSG
Change-Id: Ieed1cb99b47c0032295ec1a259bb7b5f3d926ac3
2012-06-23 20:16:57 +00:00
Gabriel Wicke 08b5ed1a43 Use _inNewlineContext method instead of bare onNewline
This makes sure that we escape start-of-line syntax when needed, since
onNewline is often not yet set.

Discussion / background:
[19:18] <subbu> this will fix it, but, i think this is asking for another
minor refactoring of these flags ... because this is a subtle fix which means
it might be possible to make it clearer.  onNewline is only true in one
direction, i.e. if true, we are in a new line state, but if we are in a
newline context, onNewline is not true, which is why this new method is
needed.
[19:19] <subbu> i dont know if it is possible, but it seems like it should be
possible.  but, something for later.
[19:20] <subbu> badly phrased.  "onNewline" ==> in new line context, but if in
new line context, onNewline may be false.
[19:20] <gwicke> we should perhaps update it as early as possible instead
[19:21] <subbu> i cannot today, but possibly monday.  i am heading out in
about 15-30 mins.
[19:22] <gwicke> will need to check all conditions depending on it in
_serializeToken
[19:22] <subbu> oh, i misunderstood you :)
[19:22] <gwicke> and if there are cases where the onNewline / onStartOfLine
state could be reverted later
[19:23] <subbu> you were referring to the flag, i thought you meant we should
fix this sooner than later.
[19:23] <gwicke> yes, I wasn't terribly clear
[19:23] <gwicke> you wrote something about following productions swallowing
newlines, but I think we don't actually do that any more
[19:24] <gwicke> I'm quite optimistic that updating those flags much earlier
would work
[19:25] <subbu> yes, it could fix it.
[19:26] <subbu> you might be right reg. swallowing.  it was happening earlier.
but, not right now, after single-line mode and other fixes.

Change-Id: Ic1d8141c04eb54a59977d0ba87bcf06bafd421e0
2012-06-23 19:27:56 +02:00
Gabriel Wicke f731125804 Explain reasoning behind number of worker calculation
Change-Id: I92ae0bb6e02caef98b0d68de4424e775d8922651
2012-06-23 17:30:52 +02:00
Gabriel Wicke d4dc8d86d9 Entity-escape [<>] in text content
This should not really be needed if the tokenizer did not decode html entities
on the fly. It is still a quick way to make sure no htmlish content can be
inserted even with the current decoding.

The next step and proper fix is to make entity decoding either optional in the
tokenizer (flag-controlled), or move it to a later stage in the token
processing pipeline.

Change-Id: Ife093dcfb95113763dab5635b098c795d3550586
2012-06-23 17:06:10 +02:00
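
A minimal sketch of the kind of escaping d4dc8d86d9 describes, assuming a standalone helper; the function name is hypothetical and the real change lives inside the serializer.

```javascript
// Hypothetical helper: entity-escapes angle brackets in text content so that
// decoded text can never be re-read as HTML-like markup.
function escapeAngleBrackets(text) {
    return text.replace(/[<>]/g, function (ch) {
        return ch === '<' ? '&lt;' : '&gt;';
    });
}
```

As the commit notes, this is a stopgap: it would be unnecessary if the tokenizer did not decode entities on the fly.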
Subramanya Sastry 5f584909e1 Added documentation + minor code refactoring
* Renamed defaultOptions to initialState
* Got rid of unused state property
* Added comments explaining how state attributes
  and tag handler flags are used
* Refactored listItemHandler check into functions and
  added a FIXME about possibly rewriting that check.
* Protected serializeDOM in a try-catch handler to
  catch exceptions and output the exception to the console.

Change-Id: I3d351c06e4b86baeb5a55243b11dbfa9baca5bb7
2012-06-22 18:29:46 -05:00
Christian Williams a26708dd6e Removing some logging
Change-Id: I2876e56d2e3680d21877103618e59afec1c81ef9
2012-06-22 15:49:34 -07:00
Christian Williams 89e0f3d6ad Bugzilla:37804 - Double bound events were causing double backspace bugs
Change-Id: I589185d077e1efe6fb2c0457a290a8ac9ce8bceb
2012-06-22 15:39:43 -07:00
Christian Williams 122a31a021 Bugzilla:33093 - Shift-Enter splits at paragraphs instead of list items
Change-Id: Ie32e878cf9c71f7179143c631a01c0e2e671ed18
2012-06-22 15:05:35 -07:00
Translation updater bot 7a3d8fabdb Localisation updates from http://translatewiki.net.
Change-Id: Ia8bb7afcc50438516fe32cf3ae1f2d2235fb22de
2012-06-22 18:10:55 +00:00
Trevor Parscal 1ea3999e04 Fixed drop-down menu to match items more carefully
Change-Id: Ibd46861be243d2872f4edaa9a182d3931d4f9fab
2012-06-21 17:50:46 -07:00