wikimedia/mediawiki-extensions-VisualEditor

mirror of https://gerrit.wikimedia.org/r/mediawiki/extensions/VisualEditor synced 2024-11-15 18:39:52 +00:00

Author	SHA1	Message	Date
Gabriel Wicke	d4dc8d86d9	Entity-escape [<>] in text content This should not really be needed if the tokenizer did not decode html entities on the fly. It is still a quick way to make sure no htmlish content can be inserted even with the current decoding. The next step and proper fix is to make entity decoding either optional in the tokenizer (flag-controlled), or move it to a later stage in the token processing pipeline. Change-Id: Ife093dcfb95113763dab5635b098c795d3550586	2012-06-23 17:06:10 +02:00
Subramanya Sastry	5f584909e1	Added documentation + minor code refactoring * Renamed defaultOptions to initialState * Got rid of unused state property * Added comments explaining how state attributes and tag handler flags are used * Refactored listItemHandler check into functions and added FIXME possible rewriting of that check. * Protected serializeDOM in a try-catch handler to catch exceptions and output the exception to the console. Change-Id: I3d351c06e4b86baeb5a55243b11dbfa9baca5bb7	2012-06-22 18:29:46 -05:00
Gabriel Wicke	cc1afb2ad3	Support dt/dd transitions in the middle of the stack Change-Id: I1d75caa7782d02a2c33413a078e99b17ccc4141c	2012-06-21 18:40:40 +02:00
Gabriel Wicke	b3bd2ffe8d	Fix definition list parsing and round-trip single vs. multi-line dt/dd * Removed murky ' :' -> ' :' replacement in tokenizer. This breaks four parser tests, and should be fixed in a token stream transformer or DOM postprocessor. This replacement clashes with round-tripping, and is not terribly important visually. * Added stx:row annotation to single-line dt/dd pairs and use it to preserve single-line syntax in the serializer. There is no attempt yet to support the addition of nested lists in an originally single-line dd. We'd need to look ahead in the serializer to support this. Perhaps the editor can simply drop data-mw in that case. * Switched default dt/dd serialization to multi-line. This supports all nested lists and multiple dds. * Don't close dls when switching from dt to dd or back in the token stream ListHandler. Overall 290 round-trip tests are passing now (up from 284, some due to  , some due to lists). The number of passing parser tests dropped slightly from 303 to 297 (or 301/295 on weekdays other than Thursday). Change-Id: I85ff40571833713388c6523e6a4ba2e94daa3807	2012-06-21 17:34:25 +02:00
Gabriel Wicke	e584e35ecb	Improve nested definition list serialization Basically only prefix all bullets if the serialization output is going to be in start-of-line context. The test for that is currently inline, but should perhaps be factored out to a method or state flag instead. We could alternatively consider to return the start-of-line prefix and let it be used in _serializeToken in case we end up in start-of-line context. This patch also fixes a newline issue on input like this: :d1 ::: d3 Both the list and list item handlers now set the startsNewline flag dynamically depending on the context, so that we don't depend on the suppression of newlines from list syntax by the singleLineMode any more. There is still an extra newline inserted between list items in the following example: ;t1 :d1 ;;t2 ::d2 This looks like a bug in the produced DOM and not in the serializer, since the outer definition list is closed and re-opened between d1 and t2. Change-Id: I78e3a1ef34cf9159d5a1e86fb64c774ff111e71d	2012-06-21 15:28:43 +02:00
Gabriel Wicke	ab286d6a59	Empty elements only use the start handler info Thus move the 'endsLine' attribute to the start section. Change-Id: I8490d866b84aa99205ca9e8e3ee137026fb18501	2012-06-21 10:30:11 +02:00
Gabriel Wicke	cf32b34b0a	First attempt at the definition list bug (work in progress) The main issue is that the bullets from dd/dt were not stored on the stack. I added a separate field for it in each stack entry, which now fixes the basic indent case without (afaik) breaking anything else. There are still some newline issues, and the need to handle the single-line dd/dt vs. the multi-line variant. Change-Id: I65939c05e2c5dde0789bf8aefd7651161a2f137c	2012-06-20 23:51:39 +02:00
Gabriel Wicke	344fac19b5	Improve preformatted text handling * Don't escape html-syntax pre content for now; Should parse this with a new pre content production later (which needs to be split out of the regular pre production in the tokenizer) * Protect indent-pre content from start-of-line syntax escaping * Preserve extra leading spaces in the tokenizer * Two more (now 284) round-trip tests are passing Change-Id: I199b89c0ee7fae12546df10c1b5117c97caccac5	2012-06-20 19:28:34 +02:00
Gabriel Wicke	6054a4aa14	Clean up serializer newline handling a bit further Queued newlines and new trailing newlines were not cleanly separated so far, which caused some trailing newlines to be consumed for needed leading newlines. This change fixes several newline bugs, taking the number of passing round-trip tests from 276 back up to 282. Change-Id: Idb4706e15ce71e63085033e3f3f29557915c11a8	2012-06-20 16:31:39 +02:00
Gabriel Wicke	2426901e5b	Fix definition lists with multiple dds Fixed a bug in the list handler for multiple dds in a definition list. Also fixed a few JSHint warnings. Change-Id: I3e883786698a9521347fc2a5e6420646318813a7	2012-06-20 15:34:20 +02:00
Gabriel Wicke	c9d3db8f34	Fix a few round-tripping and list issues At least partly fixes some bugs in http://www.mediawiki.org/wiki/Parsoid/Bug_test_cases. 276 round-trip tests are passing. * Fixes http://www.mediawiki.org/wiki/Parsoid/Bug_test_cases#extra_newline_after_empty_dd, except for lost newline in 'working' example before next heading * Fixes newlines in definition lists (http://www.mediawiki.org/wiki/Parsoid/Bug_test_cases#dd_indentation etc), but does not fix missing / incorrect bullets for those Change-Id: I21f66e265e43e1d1a4c7da70984a9984b8e6d0dd	2012-06-20 13:53:47 +02:00
Gabriel Wicke	b94cad47dc	Fix single-line mode for nested lists Known issue: breaks round-tripping of :;;;::. That test is normally disabled anyway, so we can fix it later. Change-Id: I7954271311bfb7e71caae59d8177e3f04a9ebbca	2012-06-20 01:48:52 +02:00
Gabriel Wicke	33dc9abb0d	Clean up sHref handling a bit * sHref is now always a string * fixes crasher when sHref is not set Change-Id: If5756948ac6bc26c2d7c04d970b5aba5331cb8bb	2012-06-20 00:34:57 +02:00
Gabriel Wicke	e117f09362	Wikitext escaping and quite complete source range tracking * Started to add more complete tag source range (tsr) annotations to most start / empty tags. These replace the old sourcePos and sourceTagPos annotations, and look more promising for general round-tripping than block source ranges (bsr). See http://www.mediawiki.org/wiki/User:GWicke/Parsoid_source_ranges for some notes on this. * Added an escapeWikitext method in the serializer that tokenizes supposedly text-only content from the DOM with the tokenizer and wraps runs of returned non-text tokens into nowiki tags. The source corresponding to non-text tokens is retrieved using the tsr annotations. * Removed old (unused) table productions to avoid confusion. * 276 round-trip tests are passing, vs. 283 without escaping. Known issues: * harmless for now, can be improved later: urllinks in external link captions are wrapped in nowiki. Example HTML: <a rel='mw:extLink' href="http://example.com">http://example2.com</a> * some start-of-line syntax in wiki-syntax preformatted blocks might be wrapped into nowiki when that would not really be needed. Example HTML DOM: <pre> * foo * bar </pre> Change-Id: I01c34aedd5c566614d36924add47a6a960e91987	2012-06-19 23:36:44 +02:00
Gabriel Wicke	5fbc80321b	Improve newline handling for comments and nowiki/noinclude tags * Added a newlineTransparent flag to handlers that prevents changes to the onNewline status, so that content following it is still considered to be in start-of-line context. This fixes a few rt tests where a comment or nowiki tag is at the start of the line, and following content should end up on the same line. * 283 rt parser tests are now passing. Change-Id: Ie58dcb9e5e9af9000fff61c2e1db5d8649ffc3f6	2012-06-18 22:56:41 +02:00
Gabriel Wicke	97fb2d3c0d	Serializer refactoring * tokens are not modified any more (they are supposed to be immutable) * handler info is now split in start / end objects and potentially a 'make' method; added more flags to govern the newline behavior of different tags * added a generic singleLine mode for single-line syntactical environments * switched the web service to line-based diffs to avoid issues when diffing the round-trip results of [[:en:Programming language]] * 280 round-trip tests are passing now Change-Id: I74b8ffbf69643c5d6e5ec852ec58e680c9018901	2012-06-18 21:52:15 +02:00
Subramanya Sastry	f1d03f325e	Couple minor bug fixes in serializer Change-Id: I961e2f4e7609cc6b264eaf494b39497401cdc55c	2012-06-17 22:41:14 -05:00
Gabriel Wicke	41d8212573	Emit SpaceCharacters token for HTML5 'space' chars HTML5 defines space characters as [ \r\n\t\f] in http://www.whatwg.org/specs/web-apps/current-work/multipage/common-microsyntaxes.html#space-character. It treats these specially in a few contexts. As an example, the foster parenting algorithm does not apply to space characters. As a result, this change fixes the round-tripping of spaces between table tags, which were previously moved before the table. Change-Id: I32ab29275a9f824fc66d8286638eb42748cfc9a5	2012-06-17 16:16:07 +02:00
Subramanya Sastry	a229f72833	First pass redoing serialization code to handle newline requirements from Parsoid HTML output as well as VE HTML output. There are still some newline-related failures from parser tests that need fixing, but this is getting close. So committing for now so other eyes can make the bugs shallow :). Change-Id: Ia6a218ee9fb3e18fe0573c89ff3a4236779e1e64	2012-06-16 10:09:06 -05:00
Subramanya Sastry	3f92f39397	Removed newline normalization between paragraphs. Change-Id: Ifd55db73c8fe2b3e952066a75cba2f8e13c58430	2012-06-14 18:51:56 -05:00
Subramanya Sastry	54f12d1807	Fix for href handling. - Check if href for links has the wgScriptPath prefix before attempting to strip it from the href. Change-Id: I844151ef7317476668d1306b96a2aec5a56fd0f1	2012-06-14 18:35:22 -05:00
Subramanya Sastry	c0fc9e9a97	Updated newline handling around lists and nested lists. - Something like this: <ul><li>1</li><li>2<ul><li>2.1</li><li>2.2<ul><li>2.2.1</li><li>2.2.2</li></ul></li><li>2.3</li></ul></li><li>3</li></ul> now serializes properly to: 1 2 2.1 2.2 *2.2.1 2.2.2 2.3 3 So does this form which is what the above wikitext parses to: <ul><li>1 </li><li>2 <ul><li>2.1 </li><li>2.2 <ul><li>2.2.1 </li><li>2.2.2 </li></ul></li><li>2.3 </li></ul></li><li>3 </li></ul> - Lists (and nested lists) are not entirely newline-insensitive. They still depend on newlines between lists. The opening <ul> tag for non-nested lists should always start on a new line. So, for example, <ul><li>foo</li></ul><ul><li>bar</li></ul> will serialize to: foo bar which is incorrect. But, <ul><li>foo</li></ul> <ul><li>bar</li></ul> will correctly serialize to: foo bar Change-Id: I13a0290368574865957bcf57aebab488fbbb7026	2012-06-14 17:09:59 -05:00
Subramanya Sastry	8978e406fc	Minor code refactoring Change-Id: Ib7f70a3ac42e3d5a5985e9a9bcffa313bdac289b	2012-06-14 15:18:53 -05:00
Subramanya Sastry	d7e83c4e2b	Fixed/updated newline handling for <p> tags - More pieces are now simplified and all(?) newline handling is now centralized in the serializeToken function. - This commit fixes bugs in rt-ing some code snippets ---------- Ex 1: foo<p>bar</p>baz ---------- - This commit fixes bugs serializing VE generated html ---------- Ex 2: <p>foo</p><pre>bar</pre> ==> foo\n bar ---------- - But, this round of fixes introduces RT failures for certain code examples in parserTests.txt. In all these failing cases, inline text/html is embedded within a generated <p> tag during parsing. If these generated <p> tags can have a "gc:1" attribute added to them, we can properly serialize them to the original form. ---------- Ex 3: foo<pre>bar</pre> Parsed HTML: <p>foo</p><pre>bar</pre> ---------- Note how this parsed HTML is identical to what the VE outputs in Example 2 above. So, without the gc:1 attribute, we now have conflicting requirements on the example same HTML. This increases confidence in the correctness of my commit here. Change-Id: I86beadec91c445a7f8a6d36a639b406697daa0a2	2012-06-14 14:59:18 -05:00
Subramanya Sastry	13e03ec1d7	Refix <pre> serialization. - Effectively reverted fix from `f882a65153` and added a new fix. Change-Id: I8b81e26525a5f1a22acaf2c7067f2dcd9b962818	2012-06-14 13:10:02 -05:00
Subramanya Sastry	51227f2a4a	Improved, simplified newline handling in wikitext serializer. - Eliminated newline handling from several places in code and mostly isolated it to serializeToken thus simplifying newline handling logic. - Fixing some bugs in the process: # of green roundtrip tests went up by 5 (294 --> 299) but actually introduced failures on a few originally succeeding tests (additional leading/trailing newlines on the entire test output). - Added bonus: made list serializing (mostly) insensitive to newlines between tags. So, all the following DOM serialize identically to the following wikitext: foo bar ---------- <ul><li>foo</li><li>bar</li></ul> ---------- <ul> <li>foo</li> <li>bar</li> </ul> ---------- <ul> <li> foo </li> <li> bar</li> </ul> ---------- Change-Id: I76be56c4b2789039dff5f47de4659746882e45d6	2012-06-14 00:10:51 -05:00
Subramanya Sastry	bf0f5d1b7e	Minor code cleanup Change-Id: Ic5d99b6c483841310b0c295c1c30246f907455b4	2012-06-13 13:47:26 -05:00
Subramanya Sastry	23ec054013	Fixed round-tripping of interwiki links. Change-Id: If0427b9865b3e9cf8c0ad0b4efaebc9f9f7fb865	2012-06-13 13:39:18 -05:00
Subramanya Sastry	445780b4d3	Revert default tokenization result from null to '' * As part of an earlier fix, I had changed default value of 'res' to null instead of ''. But, this was potentially buggy because the previous check was (res !== '') which could be triggered by return values of handlers. By changing the check to null, I was effectively changing the code paths for those handlers that returned ''. Change-Id: I2302023be7422ce4fb384ff5a50fe53fa7732855	2012-06-13 11:53:05 -05:00
Subramanya Sastry	cfe94eed1f	Minor code refactoring Change-Id: Iec3cb4d83d16174371f0b1f3f23b1056aeed458e	2012-06-13 09:46:34 -05:00
Subramanya Sastry	f882a65153	Fix serialization of <pre> tags Change-Id: I7ae95e7ec06167d0c1bfdaba3d0c67d941043299	2012-06-12 13:54:35 -05:00
Subramanya Sastry	727c2119bb	Refactored serializeToken method and added special-case handling of paragraphs in lists. * We need to look at other special-case handling requirements of html tags in lists (and other contexts like tables). Change-Id: I84b8402d90a186c9075c2d45263c94377312927a	2012-06-11 17:55:41 -05:00
Gabriel Wicke	1ca586e5f1	Improve interwiki config a bit * Moved wikipedia default prefixes to environment * Added 'addInterwiki' method * Adjusted link handling normalizeTitle to reflect this Change-Id: If5b2314cc36346b6da8649ed410457a612d80a22	2012-06-07 12:30:16 +02:00
Gabriel Wicke	2fa5baabbb	Make it easier to configure the default wiki, and add support for mediawiki.org * mw:Foo now loads pages from mediawiki.org * The default prefix still is 'en'. You can switch this to 'mw' in ParserService.js. Change-Id: I1208667e6114bd711b7988a8b3adb32ffab70969	2012-06-07 11:50:40 +02:00
Subramanya Sastry	b665a2558f	Fixed bugs handling/transforming quotes - Three bugs that were messing up quote transformations. - Now, the following cases are handled properly: * ''foo''' * '''foo'' * ''foo'''' * ''''foo'' These tests (and other quote tests) have to be added to core parser tests file. - One more parser test green. Change-Id: I4f93e8910639f546bfc9304becab17d26d5529de	2012-06-07 01:37:45 -05:00
Gabriel Wicke	350e700d8f	Add core-upgrade Change-Id: I5ad0955e8272d376f009f89461bed310978b25e4	2012-06-06 15:58:17 +02:00
Gabriel Wicke	a146fcb8ad	Improve the handling of newlines for round-tripping An improvement, but there still are some extra newlines inserted after paragraphs. Example input: ------- Foo: {| |foo |} ------- Extra newlines are inserted after the Foo: and the foo in the table. They are not fed as tokens or text to the tree builder, so there is likely a bug in the html5 library or JSDom. Change-Id: I83eb6180e3cd1c4e7f9b15b31d339e1d32bccd3f	2012-06-06 10:17:03 +02:00
Gabriel Wicke	59fc634cce	Update patched html5 library to version 0.3.8 Change-Id: I321d9a58ea1af33842a606fc8706938093a8330f	2012-06-06 10:17:03 +02:00
Subramanya Sastry	fe6f289486	Merge changes I5d98c704,Ib8d3de75 * changes: A few tweaks to link round-tripping Use word diff if --color is enabled	2012-06-05 16:04:23 +00:00
Subramanya Sastry	b095db4303	Simpler implementation of flatten. * Possibly more efficient under heavy GC load -- untested. * No change in time and memory use for single file parsing. Change-Id: Id2f3f65cc0e5f38ed968bbda60b97e46523e700e	2012-06-05 10:47:46 -05:00
Gabriel Wicke	dc3168cf6d	A few tweaks to link round-tripping * Moved the tail attribute to the second attribute (a bit cleaner) * Disallowed newlines in the tail production * Improved the selection of round-tripped href vs. generated content vs. href in the serializer * renamed state.linkTail to state.dropTail Change-Id: I5d98c704b6ea566011e22237786f8da17548570f	2012-06-05 17:26:27 +02:00
Gabriel Wicke	d16032ae9a	Track html syntax in block_tag production Change-Id: If560523644f007485809762f12216e08fb3c3ed3	2012-06-05 12:39:56 +02:00
Gabriel Wicke	cc96ff4f5e	Very basic interwiki support Page titles with a Wikipedia interwiki prefix now load the page from the corresponding Wikipedia. Links in a page then stay within the given language. Note that Parsoid currently makes no effort to recognize localized namespaces, so it won't render media files, categories etc correctly. Change-Id: I7bc4102e81a402772ea23231170734d580ea15b9	2012-06-05 11:19:58 +02:00
Gabriel Wicke	92f753a365	Pre and link target improvements * Don't explicitly add the newline in the pre, as we preserve newline tokens now. This avoids doubling of newlines when round-tripping. * Use the sHref attribute even if the href contains spaces. Change-Id: I8bec8fbfd6a7836bf2e5eec20869a0edd95c93b6	2012-06-04 14:03:05 +02:00
Gabriel Wicke	ee2ddbd3cb	Fix list handler issues Lists interrupted by non-empty lines would not close the list properly. Register for any token instead of just for newlines and close the list if no listItem follows the newline. Change-Id: I1743901e3db541bbeda78d17707db943e6ceb9b9	2012-06-04 13:38:43 +02:00
Gabriel Wicke	f821eac102	Optionally round-trip sHref in data-mw If the href would not denormalize, add a copy of the original href in data-mw and use it to preserve non-conventional capitalization etc. Change-Id: Ifef50eec7343b0e6b0ba66b6d19a8a3e8c9f8001	2012-06-04 12:28:05 +02:00
Gabriel Wicke	e0809209ec	Don't set the data-mw attribute if the object is actually empty. Change-Id: I984f1b44bba67d7a9f1a709738d14c0ee02f69a9	2012-06-04 12:26:03 +02:00
Gabriel Wicke	2774e5aa6c	Actually replace all underscores in wikilink target Change-Id: I633f8d6e4f639aff90fd456600376b7c6515fd50	2012-06-04 11:48:59 +02:00
Gabriel Wicke	3f2c72f920	Fix padleft / padright (mis)use as substr Change-Id: I0645e11c8ef8b550ad35300d1904788940fc748a	2012-06-04 11:30:45 +02:00
Gabriel Wicke	4533c274ca	Fix a crasher in the serializer A tail containing regexp syntax (a ? in [[:en:Main Page]]) would crash the serializer. Use substr instead. Change-Id: I8519aec9c07dfe31893d676b1c936a42d2af74a0	2012-06-04 00:00:54 +02:00
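
The last entry above (4533c274ca) describes a crash caused by treating a link tail as regexp syntax, fixed by switching to substr. The following is a minimal sketch of that general idea only. The helper names `stripTail` and `stripTailWithRegExp` are hypothetical and are not taken from the Parsoid serializer, whose actual code may look quite different.

```javascript
// Hypothetical helper (an assumption for illustration, not Parsoid code):
// remove a literal trailing string from text without ever interpreting
// it as a regular expression.
function stripTail(text, tail) {
    if (tail && text.length >= tail.length &&
            text.substr(text.length - tail.length) === tail) {
        // The text ends with the literal tail, so cut it off.
        return text.substr(0, text.length - tail.length);
    }
    // No tail, or the text does not end with it: return the text unchanged.
    return text;
}

// Shown only for contrast: building a RegExp from the raw tail.
// A tail such as '?' is not a valid pattern on its own, so the
// RegExp constructor throws a SyntaxError.
function stripTailWithRegExp(text, tail) {
    return text.replace(new RegExp(tail + '$'), '');
}
```

With the literal comparison, a tail such as `?` is handled like any other character, whereas the regexp-based variant would need the tail to be escaped first.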

415 commits