wikimedia/mediawiki-extensions-VisualEditor

mirror of https://gerrit.wikimedia.org/r/mediawiki/extensions/VisualEditor synced 2024-11-15 10:35:48 +00:00

Author	SHA1	Message	Date
Gabriel Wicke	344fac19b5	Improve preformatted text handling * Don't escape html-syntax pre content for now; Should parse this with a new pre content production later (which needs to be split out of the regular pre production in the tokenizer) * Protect indent-pre content from start-of-line syntax escaping * Preserve extra leading spaces in the tokenizer * Two more (now 284) round-trip tests are passing Change-Id: I199b89c0ee7fae12546df10c1b5117c97caccac5	2012-06-20 19:28:34 +02:00
Gabriel Wicke	c9d3db8f34	Fix a few round-tripping and list issues At least partly fixes some bugs in http://www.mediawiki.org/wiki/Parsoid/Bug_test_cases. 276 round-trip tests are passing. * Fixes http://www.mediawiki.org/wiki/Parsoid/Bug_test_cases#extra_newline_after_empty_dd, except for lost newline in 'working' example before next heading * Fixes newlines in definition lists (http://www.mediawiki.org/wiki/Parsoid/Bug_test_cases#dd_indentation etc), but does not fix missing / incorrect bullets for those Change-Id: I21f66e265e43e1d1a4c7da70984a9984b8e6d0dd	2012-06-20 13:53:47 +02:00
Gabriel Wicke	e117f09362	Wikitext escaping and quite complete source range tracking * Started to add more complete tag source range (tsr) annotations to most start / empty tags. These replace the old sourcePos and sourceTagPos annotations, and look more promising for general round-tripping than block source ranges (bsr). See http://www.mediawiki.org/wiki/User:GWicke/Parsoid_source_ranges for some notes on this. * Added an escapeWikitext method in the serializer that tokenizes supposedly text-only content from the DOM with the tokenizer and wraps runs of returned non-text tokens into nowiki tags. The source corresponding to non-text tokens is retrieved using the tsr annotations. * Removed old (unused) table productions to avoid confusion. * 276 round-trip tests are passing, vs. 283 without escaping. Known issues: * harmless for now, can be improved later: urllinks in external link captions are wrapped in nowiki. Example HTML: <a rel='mw:extLink' href="http://example.com">http://example2.com</a> * some start-of-line syntax in wiki-syntax preformatted blocks might be wrapped into nowiki when that would not really be needed. Example HTML DOM: <pre> * foo * bar </pre> Change-Id: I01c34aedd5c566614d36924add47a6a960e91987	2012-06-19 23:36:44 +02:00
Gabriel Wicke	a146fcb8ad	Improve the handling of newlines for round-tripping An improvement, but there still are some extra newlines inserted after paragraphs. Example input: ------- Foo: {\| \|foo \|} ------- Extra newlines are inserted after the Foo: and the foo in the table. They are not fed as tokens or text to the tree builder, so there is likely a bug in the html5 library or JSDom. Change-Id: I83eb6180e3cd1c4e7f9b15b31d339e1d32bccd3f	2012-06-06 10:17:03 +02:00
Subramanya Sastry	fe6f289486	Merge changes I5d98c704,Ib8d3de75 * changes: A few tweaks to link round-tripping Use word diff if --color is enabled	2012-06-05 16:04:23 +00:00
Subramanya Sastry	b095db4303	Simpler implementation of flatten. * Possibly more efficient under heavy GC load -- untested. * No change in time and memory use for single file parsing. Change-Id: Id2f3f65cc0e5f38ed968bbda60b97e46523e700e	2012-06-05 10:47:46 -05:00
Gabriel Wicke	dc3168cf6d	A few tweaks to link round-tripping * Moved the tail attribute to the second attribute (a bit cleaner) * Disallowed newlines in the tail production * Improved the selection of round-tripped href vs. generated content vs. href in the serializer * renamed state.linkTail to state.dropTail Change-Id: I5d98c704b6ea566011e22237786f8da17548570f	2012-06-05 17:26:27 +02:00
Gabriel Wicke	d16032ae9a	Track html syntax in block_tag production Change-Id: If560523644f007485809762f12216e08fb3c3ed3	2012-06-05 12:39:56 +02:00
Gabriel Wicke	92f753a365	Pre and link target improvements * Don't explicitly add the newline in the pre, as we preserve newline tokens now. This avoids doubling of newlines when round-tripping. * Use the sHref attribute even if the href contains spaces. Change-Id: I8bec8fbfd6a7836bf2e5eec20869a0edd95c93b6	2012-06-04 14:03:05 +02:00
Gabriel Wicke	ece2b0f810	Tokenizer backtracking cache bug fix and memory savings * The state of syntax stops is now properly included in the cache key for the tokenizer-internal backtracking cache. This fixes some mis-parses when re-parsing a bit of text with different flags. * Clear the backtracking cache after each toplevelblock. This drops the peak memory usage when expanding [[:en:Barack Obama]] from ~380M to ~110M. Change-Id: Icdb879cae5907e4595903dd6acba2e686e8c2e4b	2012-06-01 12:53:49 +02:00
Gabriel Wicke	c5d7e01944	Another tokenizer robustness improvement This patch fixes a tokenizer syntax error encountered on [[:en:Template:JacksonvilleWikiProject-Member]] and [[:en:Template:Infobox former country]] by allowing optional whitespace before start-of-line template syntax. Change-Id: Ic214a731de58bf766e51f23d5e24ea2ce6788f58	2012-05-30 18:38:23 +02:00
Gabriel Wicke	a133768781	Don't eat '}}' in generic attributes and similar productions This fixes some syntax errors, at least one in Template:Geobox. Change-Id: I32338febe25d0833c1d9bc4de293cd15b4cbb7be	2012-05-30 17:37:10 +02:00
Gabriel Wicke	36084c5d93	Preserve original newlines in HTML and serialization 254 round-trip tests (up from 184) are now passing. Also: * tweaked runtests.sh slightly (use less -R instead of -r). * made sure the EOFTk is preserved in phase 3 transforms Change-Id: I1de22186bdb78e52019370e43f096877005b8f5a	2012-05-29 23:29:03 +02:00
Gabriel Wicke	b2adee0ae7	Basic rt support for indent pre variant * Added a generic stx_v 'syntax variant' round-trip attribute * For pre, use stx:'html' vs. no syntax annotation. This might not be 100% safe for arbitrary html input, so we might want to flip this to stx:'wiki' later. * 181 round-trip tests passing Change-Id: If6080917a3a7c069066db3db60efe59b1f6c28d8	2012-05-25 18:55:38 +02:00
Gabriel Wicke	a31ccaabe4	Support definition lists with empty definition Change-Id: I81c39a7e49f2ea7ce32cdd3600caeb5eb9f50d84	2012-05-25 15:40:32 +02:00
Gabriel Wicke	39c6f42879	Link round-tripping and other improvements * Changed RDFa for links according to http://www.mediawiki.org/wiki/Parsoid/RDFa_vocabulary * Added basic support for internal/external link serialization * Moved numbering of external links from tokenizer to LinkHandler * Added round-tripping for generic HTML tags * Replaced nowiki tag with <meta typeOf="mw:tag" content="nowiki"> and <meta typeOf="mw:tag" content="/nowiki"> for now. * 154 round-trip tests passing (node parserTests.js --roundtrip). Change-Id: I16c4db21b1b543ee57c73e569c83025b64664542	2012-05-22 13:36:06 +02:00
Gabriel Wicke	7e21b7380a	Merge "Round-trip nowiki"	2012-05-21 17:16:56 +00:00
Gabriel Wicke	fb7d5418a5	Round-trip nowiki Change-Id: I5f7e6a43f5fdc1708ee710b2a601b20db733452c	2012-05-21 18:06:09 +02:00
Gabriel Wicke	a6610e52c2	Serializer and table round-tripping improvements * added stx: 'html' round-trip information for html tags * added t_stx: 'row' info for row-wise table wiki syntax, and support for it in the serializer * the first table row is implicit in wikitext * renamed lastToken to prevToken in serializer * strip first newline in an initial chunkCB Change-Id: I014b046539d1b674d830551c5fd1b74a67f81993	2012-05-21 14:59:53 +02:00
Gabriel Wicke	e2815b516c	Start to handle links Change-Id: I1fb975910651820fd889d77152562fd4fbcb5db8	2012-05-17 14:32:56 +02:00
Gabriel Wicke	d918fa18ac	Big token transform framework overhaul part 2 * Tokens are now immutable. The progress of transformations is tracked on chunks instead of tokens. Tokenizer output is cached and can be directly returned without a need for cloning. Transforms are required to clone or newly create tokens they are modifying. * Expansions per chunk are now shared between equivalent frames via a cache stored on the chunk itself. Equivalence of frames is not yet ideal though, as right now a hash tree of unexpanded arguments is used. This should be switched to a hash of the fully expanded local parameters instead. * There is now a vastly improved maybeSyncReturn wrapper for async transforms that either forwards processing to the iterative transformTokens if the current transform is still ongoing, or manages a recursive transformation if needed. * Parameters for parser functions are now wrapped in abstract Params and ParserValue objects, which support some handy on-demand value expansions. Keys are always expanded. Parser functions are converted to use these interfaces, and now properly expand their values in the correct frame. Making this expansion lazier is certainly possible, but would complicate transformTokens and other token-handling machinery. Need to investigate if it would really be worth it. Dead branch elimination is certainly a bigger win overall. * Complex recursive asynchronous expansions should now be closer to correct for both the iterative (transformTokens) and recursive (maybeSyncReturn after transformTokens has returned) code paths. * Performance degraded slightly. There are no micro-optimizations done yet and the shared expansion cache still has a low hit rate. The progress tracking on chunks is not yet perfect, so there are likely a lot of unneeded re-expansions that can be easily eliminated. There is also more debug tracing right now. Obama currently expands in 54 seconds on my laptop. Change-Id: I4a603f3d3c70ca657ebda9fbb8570269f943d6b6	2012-05-15 17:05:47 +02:00
Adam Wight	0a7f0b7630	List markup is created during the sync23 phase. This makes it possible to transclude list items from a template. Note: "5 quotes" test is broken by this patch, it appears that ListHandler newline processing is changing some state which mysteriously affects the QuoteTransformer. This is ominous, hopefully there's a simple explanation... gwicke: fix a bug in tokenizer triggered by definition lists like this: **; foo : bar Change-Id: I4e3a86596fe9bffcbfc4bf22895362c3bf742bad	2012-05-08 11:39:36 +02:00
Gabriel Wicke	909633ea08	Improve template / tplarg precedence in tokenizer Change-Id: If9b24b42ea223e0f30f906a83496d73ec60c4a0d	2012-05-04 13:17:06 +02:00
Gabriel Wicke	30a83d7fd7	Accept wikilink parameters with dangling equal ('\|arg=\|') Change-Id: Ib4f6d186da2a74522b17c377dac5c9a7de7e5861	2012-04-27 11:35:00 +02:00
Gabriel Wicke	1d70e7b81c	Disable preformatted text from indents in template args Change-Id: I84144d3fab6541ed264d9b092806c8bf9de6e8b2	2012-04-27 10:45:08 +02:00
Gabriel Wicke	3be4992782	'Obama finally expands' ;) Misc fixes and documentation updates * [[:en:Barack Obama]] can now be expanded in 77 seconds using 330MB RAM, while it would previously run out of RAM after ~30 minutes. Wohoooo! The token transform framework rework really paid off. * 303 parser tests are passing in the new record time of 5.5 seconds. Two more tests are passing since these tests expect the day of the week to be Thursday. Won't be the case tomorrow. Change-Id: I56e850838476b546df10c6a239c8c9e29a1a3136	2012-04-26 18:18:08 +02:00
Gabriel Wicke	8368e17d6a	Biggish token transform system refactoring * All parser pipelines including tokenizer and DOM stuff are now constructed from a 'recipe' data structure in a ParserPipelineFactory. * All sub-pipelines of these can now be cached * Event registrations to a pipeline are directly forwarded to the last pipeline member to save relatively expensive event forwarding. * Some APIs for on-demand expansion / format conversion of parameters from parser functions are added: param.to('tokens/expanded', cb) param.to('text/wiki', cb) (this does not work yet) All parameters are additionally wrapped into a Param object that provides methods for positional parameter naming (.named()) or conversion to a dict (.dict()). * The async token transform manager is now separated from a frame object, with the frame holding arguments, an on-demand expansion method and loop checks. * Only keys of template parameters are now expanded. Parser functions or template arguments trigger an expansion on-demand. This (unsurprisingly) makes a big performance difference with typical switch-heavy template systems. * Return values from async transforms are no longer used in favor of plain callbacks. This saves the complication of having to maintain two code paths. A trick in transformTokens still avoids the construction of unneeded TokenAccumulators. * The results of template expansions are no longer buffered. * 301 parser tests are passing Known issues: * Cosmetic cleanup remains to be done * Some parser functions do not support async expansions yet, and need to be modified. Change-Id: I1a7690baffbe8141cadf67270904a1b2e1df879a	2012-04-25 16:51:36 +02:00
Gabriel Wicke	c688b039de	Collected tweaks * less verbose logging in noinclude processing and template expansion * Give priority to the processing of templates transcluded from transclusions to get closer to depth-first processing. This serves to minimize memory usage from queued-up tokens. * Increase the maximum outstanding requests per template retrieval. 10000 amazingly proved too low a limit on some big pages. * Only process a single template request callback at a time for now * Add a debug print in the treebuilder wrapper * Don't treat multiple comments on a single line as a single comment to match the PHP parser's behavior Change-Id: I9a86b6d7bec3b9e1f17415daf1bf74170240721a	2012-04-16 15:47:03 +02:00
Gabriel Wicke	efd4c026ea	Disallow < and > in external link urls Change-Id: Id865c3d46b33b182bb5b244e77e815c0afd7fa49	2012-04-16 15:36:56 +02:00
Gabriel Wicke	df050e4481	Convert external link syntax stops to stack Eat unbalanced external link parts within template parameters. This does not produce the same output as the PHP parser (try echo '{{YouTube}}' \| node parse.js), but preserves a level of sanity. Need to check how common this is for external links. If it is rare enough, moving the ']' after the parser function manually would fix the rendering for the YouTube case. Change-Id: I597d808efff36baa22191e7946a0061cc31120e8	2012-04-13 11:08:42 +02:00
Gabriel Wicke	bff43938f6	Support noinclude/includeonly/onlyinclude in attributes Fun test case: {\| \|-<includeonly> foo </includeonly> \|Hello \|} Change-Id: I353bb287d3967ade549fbcb4ae64511a1f1f7e36	2012-04-11 17:37:25 +02:00
Gabriel Wicke	5a33099875	Improve template tokenization in template arguments Taxobox tables now render pretty much correctly. Change-Id: I5a0564138ff0c688d8a5a69b7867646fd3763946	2012-04-10 16:40:49 +02:00
Gabriel Wicke	dbdd320348	Improve parameter tokenization support especially for table rows Change-Id: I961d69e228b96adc69ea9acb3733d13f5898602d	2012-04-05 16:00:26 +02:00
Gabriel Wicke	7a35e5db16	Remove behaviors var in tokenizer, now handled in token handler Change-Id: I68eeff3f05ce29c13e347c2cd7ea6519e58b0e03	2012-04-04 21:17:29 +02:00
Adam Wight	a85ed36efa	"magic words" are tokenized and used to set parser.environment flags behavior switches are converted to tokens which set parser.environment flags during the async transformation stage. The next step would be for handlers in the sync23 stage to generate the TOC, section edit links, and so on according to these directives. No tests written, because the switches are consumed and don't appear in rendered html. We can test the magic word layout controls individually, once they're implemented. Another small change was to store option flags directly in the environment object, not that it makes much difference. Change-Id: I863fbf4be1a17d2f6c31158298dd301f19ae1137	2012-04-04 11:25:29 -07:00
Gabriel Wicke	e3a745a024	Improvements for template / -argument precedence; support for empty params Change-Id: Id0894ccbedfa47fa3658817ca65119a2af76be3e	2012-04-04 16:29:47 +02:00
Gabriel Wicke	2037215185	Disallow '[' in generic attribute names This avoids interpreting something like ! [[foo\|bar]] as <th [[foo=''>bar]]</th>. Change-Id: If59708fa90eb0117a15b2b6446890d1ae19a857c	2012-04-04 14:31:11 +02:00
Gabriel Wicke	f588d2a7aa	Fix table headings in template parameters Change-Id: Icdfc5655968fc845230ad7638124309d6b8c1ada	2012-04-04 12:54:34 +02:00
Gabriel Wicke	b8d980a229	Don't eat newline / space in template parameters ..so that block_lines can match. Change-Id: I4c464dc44249f40e4aa280df35fb726bfce3a745	2012-04-04 11:22:31 +02:00
Gabriel Wicke	47de122a95	Improve support for table / template interaction Match pairs of {{!}} or \| for template productions, but not a mix of the two. Example: {{#if:1\|{{!}}- {{!}} {{#if:1\|style="color: red"{{!}}\|}} }} Note that the style parameter ends up as the key of an empty-valued attribute on the table cell currently. Change-Id: I5f9357dd1645ef97b0af89f32e8d92ae49218c72	2012-04-03 18:48:35 +02:00
Gabriel Wicke	0fe062fbe1	JSHint cleanups and parser function argument handling improvements Parser functions which only accept positional arguments now return both the key and value of arguments. Complete attributes (key and value) for templates and the like from parser functions are not yet supported though. Change-Id: I3f81bb35acd27186222ce6d5217e820042527c01	2012-04-03 18:10:48 +02:00
Gabriel Wicke	5248fd31e8	Magic links and behavior switch tokenization by Ori Livneh Commit first patch by Ori, lets 288 parser tests pass. Yay! Change-Id: Iac8c3d1ad1984900350b20f7e725c40618a1e8ba	2012-04-02 17:31:34 +02:00
Gabriel Wicke	5ef2074251	Enable support for block-level wiki constructs in template arguments. This gets a bit closer to supporting table fragments passed through template arguments. Next, we'll need a way to indicate start-of-line position to enable sol block-levels in template parameters. Example: {\| {{#if: true\|{{!}}Table cell\|}} \|}	2012-03-15 11:43:49 +00:00
Gabriel Wicke	7e22020398	Convert syntactical break flags for templates from counters to the stack variant to fix the precedence for {{!}} (break on these inside table content, but not in template options within tables).	2012-03-14 16:30:59 +00:00
Gabriel Wicke	77a61dd687	Improve support for {{!}}, and don't produce a pre for indented tables.	2012-03-14 10:58:11 +00:00
Gabriel Wicke	835914b2de	Support {{=}}.	2012-03-14 09:07:01 +00:00
Gabriel Wicke	2195c31abf	Move link types to data-mw-rt, and support some more template tokenization edge cases. For example, the PHP parser treats \| foo \| = bar \| as \| foo = bar \|, believe it or not ;)	2012-03-13 12:32:31 +00:00
Gabriel Wicke	4cd8b302ac	Improved template tokenization. The parser can now template-expand [[:en:Barack Obama]] without exceeding 1.7GB of memory (which is the node limit).	2012-03-12 17:31:45 +00:00
Gabriel Wicke	3c5fe2523c	Tolerate more newlines and spaces in templates, and support templates and comments in urls.	2012-03-12 14:31:06 +00:00
Gabriel Wicke	ae4ab7a39c	Refactor syntactic stops into an object and add a stack variant for option values.	2012-03-12 13:08:43 +00:00
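
Commits ae4ab7a39c and 7e22020398 move syntactic stop flags from counters to a stack so that {{!}} breaks inside table content but not inside template options nested in a table. The sketch below illustrates only that distinction. It is not code from the repository: the class SyntaxStops, its methods, and the 'pipe' flag name are hypothetical and chosen for illustration.

```javascript
// Illustrative sketch only: SyntaxStops and its methods are hypothetical
// names, not the actual Parsoid tokenizer API.
class SyntaxStops {
  constructor() {
    this.counters = {}; // flag name -> nesting depth
    this.stacks = {};   // flag name -> stack of per-context values
  }

  // Counter variant: records only how many enclosing contexts set the flag,
  // so an inner context cannot switch the flag off again.
  inc(flag) {
    this.counters[flag] = (this.counters[flag] || 0) + 1;
  }
  dec(flag) {
    this.counters[flag] = Math.max(0, (this.counters[flag] || 0) - 1);
  }
  onCount(flag) {
    return (this.counters[flag] || 0) > 0;
  }

  // Stack variant: the innermost context decides whether the flag is active.
  push(flag, value) {
    if (!this.stacks[flag]) this.stacks[flag] = [];
    this.stacks[flag].push(value);
  }
  pop(flag) {
    const stack = this.stacks[flag];
    return stack && stack.length ? stack.pop() : undefined;
  }
  onStack(flag) {
    const stack = this.stacks[flag];
    return stack && stack.length ? stack[stack.length - 1] : false;
  }
}

const stops = new SyntaxStops();
stops.push('pipe', true);                        // entering table content
const inTable = stops.onStack('pipe');
stops.push('pipe', false);                       // entering a template option
const inTemplateOption = stops.onStack('pipe');
stops.pop('pipe');                               // leaving the template option
const backInTable = stops.onStack('pipe');
console.log(inTable, inTemplateOption, backInTable); // true false true
```

With a plain counter, the nested template option could only increment or decrement the depth, so the flag would stay active there. With a stack, the nested context pushes its own value and the outer value is restored on pop.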


130 commits