wikimedia/mediawiki-extensions-VisualEditor

mirror of https://gerrit.wikimedia.org/r/mediawiki/extensions/VisualEditor synced 2024-11-15 18:39:52 +00:00

Author	SHA1	Message	Date
Subramanya Sastry	13e03ec1d7	Refix <pre> serialization. - Effectively reverted fix from `f882a65153` and added a new fix. Change-Id: I8b81e26525a5f1a22acaf2c7067f2dcd9b962818	2012-06-14 13:10:02 -05:00
Subramanya Sastry	51227f2a4a	Improved, simplified newline handling in wikitext serializer. - Eliminated newline handling from several places in code and mostly isolated it to serializeToken thus simplifying newline handling logic. - Fixing some bugs in the process: # of green roundtrip tests went up by 5 (294 --> 299) but actually introduced failures on a few originally succeeding tests (additional leading/trailing newlines on the entire test output). - Added bonus: made list serializing (mostly) insensitive to newlines between tags. So, all the following DOM serialize identically to the following wikitext: foo bar ---------- <ul><li>foo</li><li>bar</li></ul> ---------- <ul> <li>foo</li> <li>bar</li> </ul> ---------- <ul> <li> foo </li> <li> bar</li> </ul> ---------- Change-Id: I76be56c4b2789039dff5f47de4659746882e45d6	2012-06-14 00:10:51 -05:00
Subramanya Sastry	bf0f5d1b7e	Minor code cleanup Change-Id: Ic5d99b6c483841310b0c295c1c30246f907455b4	2012-06-13 13:47:26 -05:00
Subramanya Sastry	23ec054013	Fixed round-tripping of interwiki links. Change-Id: If0427b9865b3e9cf8c0ad0b4efaebc9f9f7fb865	2012-06-13 13:39:18 -05:00
Subramanya Sastry	445780b4d3	Revert default tokenization result from null to '' * As part of an earlier fix, I had changed default value of 'res' to null instead of ''. But, this was potentially buggy because the previous check was (res !== '') which could be triggered by return values of handlers. By changing the check to null, I was effectively changing the code paths for those handlers that returned ''. Change-Id: I2302023be7422ce4fb384ff5a50fe53fa7732855	2012-06-13 11:53:05 -05:00
Subramanya Sastry	cfe94eed1f	Minor code refactoring Change-Id: Iec3cb4d83d16174371f0b1f3f23b1056aeed458e	2012-06-13 09:46:34 -05:00
Subramanya Sastry	f882a65153	Fix serialization of <pre> tags Change-Id: I7ae95e7ec06167d0c1bfdaba3d0c67d941043299	2012-06-12 13:54:35 -05:00
Subramanya Sastry	727c2119bb	Refactored serializeToken method and added special-case handling of paragraphs in lists. * We need to look at other special-case handling requirements of html tags in lists (and other contexts like tables). Change-Id: I84b8402d90a186c9075c2d45263c94377312927a	2012-06-11 17:55:41 -05:00
Gabriel Wicke	1ca586e5f1	Improve interwiki config a bit * Moved wikipedia default prefixes to environment * Added 'addInterwiki' method * Adjusted link handling normalizeTitle to reflect this Change-Id: If5b2314cc36346b6da8649ed410457a612d80a22	2012-06-07 12:30:16 +02:00
Gabriel Wicke	2fa5baabbb	Make it easier to configure the default wiki, and add support for mediawiki.org * mw:Foo now loads pages from mediawiki.org * The default prefix still is 'en'. You can switch this to 'mw' in ParserService.js. Change-Id: I1208667e6114bd711b7988a8b3adb32ffab70969	2012-06-07 11:50:40 +02:00
Subramanya Sastry	b665a2558f	Fixed bugs handling/transforming quotes - Three bugs that were messing up quote transformations. - Now, the following cases are handled properly: * ''foo''' * '''foo'' * ''foo'''' * ''''foo'' These tests (and other quote tests) have to be added to core parser tests file. - One more parser test green. Change-Id: I4f93e8910639f546bfc9304becab17d26d5529de	2012-06-07 01:37:45 -05:00
Gabriel Wicke	350e700d8f	Add core-upgrade Change-Id: I5ad0955e8272d376f009f89461bed310978b25e4	2012-06-06 15:58:17 +02:00
Gabriel Wicke	a146fcb8ad	Improve the handling of newlines for round-tripping An improvement, but there still are some extra newlines inserted after paragraphs. Example input: ------- Foo: {| |foo |} ------- Extra newlines are inserted after the Foo: and the foo in the table. They are not fed as tokens or text to the tree builder, so there is likely a bug in the html5 library or JSDom. Change-Id: I83eb6180e3cd1c4e7f9b15b31d339e1d32bccd3f	2012-06-06 10:17:03 +02:00
Gabriel Wicke	59fc634cce	Update patched html5 library to version 0.3.8 Change-Id: I321d9a58ea1af33842a606fc8706938093a8330f	2012-06-06 10:17:03 +02:00
Subramanya Sastry	fe6f289486	Merge changes I5d98c704,Ib8d3de75 * changes: A few tweaks to link round-tripping Use word diff if --color is enabled	2012-06-05 16:04:23 +00:00
Subramanya Sastry	b095db4303	Simpler implementation of flatten. * Possibly more efficient under heavy GC load -- untested. * No change in time and memory use for single file parsing. Change-Id: Id2f3f65cc0e5f38ed968bbda60b97e46523e700e	2012-06-05 10:47:46 -05:00
Gabriel Wicke	dc3168cf6d	A few tweaks to link round-tripping * Moved the tail attribute to the second attribute (a bit cleaner) * Disallowed newlines in the tail production * Improved the selection of round-tripped href vs. generated content vs. href in the serializer * renamed state.linkTail to state.dropTail Change-Id: I5d98c704b6ea566011e22237786f8da17548570f	2012-06-05 17:26:27 +02:00
Gabriel Wicke	d16032ae9a	Track html syntax in block_tag production Change-Id: If560523644f007485809762f12216e08fb3c3ed3	2012-06-05 12:39:56 +02:00
Gabriel Wicke	cc96ff4f5e	Very basic interwiki support Page titles with a wikipedia interwiki prefix now load the page from the corresponding Wikipedia. Links in a page then stay within the given language. Note that Parsoid currently makes no effort to recognize localized namespaces, so it won't render media files, categories etc correctly. Change-Id: I7bc4102e81a402772ea23231170734d580ea15b9	2012-06-05 11:19:58 +02:00
Gabriel Wicke	92f753a365	Pre and link target improvements * Don't explicitly add the newline in the pre, as we preserve newline tokens now. This avoids doubling of newlines when round-tripping. * Use the sHref attribute even if the href contains spaces. Change-Id: I8bec8fbfd6a7836bf2e5eec20869a0edd95c93b6	2012-06-04 14:03:05 +02:00
Gabriel Wicke	ee2ddbd3cb	Fix list handler issues Lists interrupted by non-empty lines would not close the list properly. Register for any token instead of just for newlines and close the list if no listItem follows the newline. Change-Id: I1743901e3db541bbeda78d17707db943e6ceb9b9	2012-06-04 13:38:43 +02:00
Gabriel Wicke	f821eac102	Optionally round-trip sHref in data-mw If the href would not denormalize, add a copy of the original href in data-mw and use it to preserve non-conventional capitalization etc. Change-Id: Ifef50eec7343b0e6b0ba66b6d19a8a3e8c9f8001	2012-06-04 12:28:05 +02:00
Gabriel Wicke	e0809209ec	Don't set the data-mw attribute if the object is actually empty. Change-Id: I984f1b44bba67d7a9f1a709738d14c0ee02f69a9	2012-06-04 12:26:03 +02:00
Gabriel Wicke	2774e5aa6c	Actually replace all underscores in wikilink target Change-Id: I633f8d6e4f639aff90fd456600376b7c6515fd50	2012-06-04 11:48:59 +02:00
Gabriel Wicke	3f2c72f920	Fix padleft / padright (mis)use as substr Change-Id: I0645e11c8ef8b550ad35300d1904788940fc748a	2012-06-04 11:30:45 +02:00
Gabriel Wicke	4533c274ca	Fix a crasher in the serializer A tail containing regexp syntax (a ? in [[:en:Main Page]]) would crash the serializer. Use substr instead. Change-Id: I8519aec9c07dfe31893d676b1c936a42d2af74a0	2012-06-04 00:00:54 +02:00
Gabriel Wicke	31522d3d49	Add ApiRequest Change-Id: I5f2a1cb65223a68f10bc63903000248efca05586	2012-06-02 16:52:51 +02:00
Gabriel Wicke	63abd57fc8	Improve newline-before-paragraph round-tripping support Change-Id: I9176a97f9695018650d9a63b89514c07e0d6be90	2012-06-02 16:39:33 +02:00
Gabriel Wicke	d3975a8d03	Very basic round-trip test mode for the API Returns both the resulting wikitext and the diff with the original input. Change-Id: Iad25039beb054a84e1ad51ffa9fee924db49c60b	2012-06-02 16:20:54 +02:00
Gabriel Wicke	74135b295f	Some more switch fixes Change-Id: If1a6086348c45a73a941bc8e6728ef75d002be50	2012-06-02 15:04:20 +02:00
Subramanya Sastry	8f216af2f5	Handle link tails properly. - Added a tail json attribute for wikiLinks - During serialization, this attribute is used to strip the tail from the link target and render it after the link [[hen]]s ==> <a ... data-mw="{gc:1, tail: 's'}" ...>hens</a> ==> [[hen]]s - 2 more roundtrip tests green Change-Id: I84f3dabaf0271f7a67641a00148467daa8310eb0	2012-06-01 23:41:10 -05:00
Subramanya Sastry	413fc5e043	Fixed bug serializing wikilinks with implicit link text. * Simple fix but greens 10 more roundtrip tests. Change-Id: I7f82d788a10bd83e0e3215568c2168081c332c50	2012-06-01 17:25:21 -05:00
Gabriel Wicke	16219ddc6d	Fix up #switch a bit * Re-establish the value-only default * Fix value expansion Change-Id: I32e62789b25bbe17a74c564e41e9101ad5528fb7	2012-06-01 22:15:43 +02:00
Gabriel Wicke	e2301813ed	Merge "Tokenizer backtracking cache bug fix and memory savings"	2012-06-01 12:06:00 +00:00
GWicke	befd223476	Merge "First pass implementing a general tag minimization routine"	2012-06-01 11:15:48 +00:00
Gabriel Wicke	ece2b0f810	Tokenizer backtracking cache bug fix and memory savings * The state of syntax stops is now properly included in the cache key for the tokenizer-internal backtracking cache. This fixes some mis-parses when re-parsing a bit of text with different flags. * Clear the backtracking cache after each toplevelblock. This drops the peak memory usage when expanding [[:en:Barack Obama]] from ~380M to ~110M. Change-Id: Icdb879cae5907e4595903dd6acba2e686e8c2e4b	2012-06-01 12:53:49 +02:00
Subramanya Sastry	1c80e2d7f0	First pass implementing a general tag minimization routine * This routine attempts to rewrite the DOM to maximize tag overlap and thus minimize tag uses. * This takes as input a set of tags which participate in the minimization. * Tested on the following example <b><i><u><s>BIUS</s></u></i></b><b><i><s>BIS</s></i></b><b><u><s>BUS</s></u></b><u><i>UI</i></u> with multiple combinations of the 2^4 possible variations of i,b,u,s tags: [], ['i','b','u','s'], ['i'], ['b','s'], ['i','b','u'] - But, I am not fully sure if this implements the right behavior when only a subset of inline tags are provided. Needs discussion and tweaking as necessary. * Also tested on few others: <b>B</b><b><i>BI</i></b><b><i><u>BIU</u></i></b><b><i><u><s>BIUS</s></u></i></b> <s><i><b>SIB</s></i></b><s><i><u>SIU</u></i></s><i><u>IU</u></i><i>I</i> * The previous pairwise tag rewriting version fails on several of these examples, so this new version is a definite improvement. * No change in parserTests run (203 passing before and after). * Possible improvements that could/should be undertaken: - get rid of useless/idempotent add/remove of nodes that don't change the DOM. - ensure that node attributes post-restructuring are correct. Change-Id: Ib4a8b39583fa96a2be880a77021ca81cefa06484	2012-05-31 12:10:28 -05:00
Gabriel Wicke	4ea6b8e2be	Revert part of last template syntax tweak Change-Id: I084e1210577f80c3b96020d57cfa5c68eb5d139b	2012-05-31 12:02:42 +02:00
Gabriel Wicke	c5d7e01944	Another tokenizer robustness improvement This patch fixes a tokenizer syntax error encountered on [[:en:Template:JacksonvilleWikiProject-Member]] and [[:en:Template:Infobox former country]] by allowing optional whitespace before start-of-line template syntax. Change-Id: Ic214a731de58bf766e51f23d5e24ea2ce6788f58	2012-05-30 18:38:23 +02:00
Gabriel Wicke	a133768781	Don't eat '}}' in generic attributes and similar productions This fixes some syntax errors, at least one in Template:Geobox. Change-Id: I32338febe25d0833c1d9bc4de293cd15b4cbb7be	2012-05-30 17:37:10 +02:00
Gabriel Wicke	36084c5d93	Preserve original newlines in HTML and serialization 254 round-trip tests (up from 184) are now passing. Also: * tweaked runtests.sh slightly (use less -R instead of -r). * made sure the EOFTk is preserved in phase 3 transforms Change-Id: I1de22186bdb78e52019370e43f096877005b8f5a	2012-05-29 23:29:03 +02:00
Subramanya Sastry	8174c9dafc	First attempt implementing rewriting rules on the DOM - This is implemented as a post-processing pass. - Might require additional checks to verify rewriteability. - Implemented as a pair-wise tag DOM minimization strategy, i.e. it takes tag pairs (B, I) for ex, and attempts to normalize the tree just for those tag pairs. Normalizing across multiple tags is implemented as pairwise rewriting across all pairs: Ex:(b,i), (b,u),(i,u) for (b,i,u) - Copied over attributes as part of rewriting, but some of the attributes lose their meaning on rewriting since tags are reordered (ex: sourcePosn, sourceTagPosn). How do we handle this? Output examples and possible issues to fix: <i><b><u>biu</u></b></i><b><u>bu</u></b><u>u</u> gets rewritten to: <u><b><i>biu</i>bu</b>u</u> But, the equivalent wikitext form: '''''<u>biu</u>''''''''<u>bu</u>'''<u>u</u> does not get rewritten because of parsing differences. This wikitext gets parsed into: <i><b><u>biu</u>'''</b></i><u>bu<b>u</b></u> The extra ''' token in the middle thwarts DOM rewriting. However, a slightly different version: "'''''<u>biu</u>''<u>bu</u>'''<u>u</u>" gets properly normalized to: <u>'''''biu''bu'''u</u> An alternative, but fun strategy to play with is to use the following two normalization primitives: S(wap) and M(erge). - S rewrites T1(T2(x)) into T2(T1(x)) (ex: <b><i>foo</i></b> ==> <i><b>foo</b></i>) - M rewrites (T(x),T(y)) into (T(x,y)). (ex: <b>foo</b><b>bar</b> ==> <b>foobar</b>) The current rewriting strategy could possibly be re-implemented as S-M rewriting. The problem to solve there would be to find an efficient rewriting strategy that is guaranteed to lead to a normal form. I may not play with it now, but just documenting it for later (to play with in my spare time). This commit is just as a record of fun/experimental code where I get to learn details of JS, wikitext, parsing, and DOM manipulation. 
Next version of this code will attempt to introduce minimal DOM restructuring across multiple tags at once which can be more efficient. gwicke: Removed now passing test from whitelist, and updated another whitelist entry which is now improved. Change-Id: Ie97bcb164eb62c34ba61aa76ba2f4c232aa713d8	2012-05-29 08:17:57 +02:00
Gabriel Wicke	b2adee0ae7	Basic rt support for indent pre variant * Added a generic stx_v 'syntax variant' round-trip attribute * For pre, use stx:'html' vs. no syntax annotation. This might not be 100% safe for arbitrary html input, so we might want to flip this to stx:'wiki' later. * 181 round-trip tests passing Change-Id: If6080917a3a7c069066db3db60efe59b1f6c28d8	2012-05-25 18:55:38 +02:00
Gabriel Wicke	a31ccaabe4	Support definition lists with empty definition Change-Id: I81c39a7e49f2ea7ce32cdd3600caeb5eb9f50d84	2012-05-25 15:40:32 +02:00
Gabriel Wicke	06b51b1f3f	Properly round-trip dd/dt; 178 round-trip tests passing. Need to track variable whitespace before elements to make some more tests pass. Change-Id: Ia86535d6f352e2ffe7965547cd506b0dbb6dfba2	2012-05-25 13:59:55 +02:00
Gabriel Wicke	6f62878c78	Resolve subpage links, and remove hack for H: titles Change-Id: I6c9c64179274e5c1641a3b127ac3b273a3c5254e	2012-05-24 17:57:41 +02:00
Gabriel Wicke	dc61f313a2	Notes on missing parser functions, more error reporting tweaks Change-Id: Ib6ce60cf1b55671a6ff57aa47edb5787ec3aefea	2012-05-24 17:31:26 +02:00
Gabriel Wicke	cc10aab54f	Add self alias Change-Id: I47682f407da6b554179611c7d0f63f882ab5a871	2012-05-24 17:16:35 +02:00
Gabriel Wicke	13ae7cda11	A few (partly hackish) improvements * Very basic support attribute key-value pairs emitted from templates * Add TALKPAGENAME stub implementation * Only show 'no revisions' message for top-level pages Change-Id: I4b4ac0c7b2c0531ac4b39f0f49f4217302576ab9	2012-05-24 16:30:26 +02:00
Gabriel Wicke	3e0e11b1d0	Sanity check for tokens being an array Change-Id: Ia4e4071e1469c31e3b320d854500938bb0245f82	2012-05-24 14:35:58 +02:00
Gabriel Wicke	93ce7453f0	Fake fullpagename et al a bit better Change-Id: I85ddf9e88e5f8ac274f371bea0879600997001e4	2012-05-24 11:05:31 +02:00
Gabriel Wicke	cdd1eca42d	Fix non-existing revision error reporting Change-Id: I6b8687bcde98b92d9d6217a738a177db279fd006	2012-05-24 10:50:47 +02:00
Gabriel Wicke	f03fc39d15	Report missing revisions when retrieving templates Change-Id: I9f33acafc4d3fbd062125d824e2614dafd4cd5a0	2012-05-24 10:45:01 +02:00
Gabriel Wicke	caf2fa663d	Keep going on tokenizer errors Change-Id: I76fab4528f89b425845aef1685b3a54ddfeceef4	2012-05-24 10:30:32 +02:00
Gabriel Wicke	e70448e53a	Use text/x-mediawiki content type, and handle tokenizer errors without --debug Change-Id: I154cd344306aa05ada7ff30f631d487f39fa9739	2012-05-24 10:19:25 +02:00
Gabriel Wicke	4cc2d25e70	Fix a debug print reference error Change-Id: Ic26d29aced4129c3dd718c4751dadb62a0be1a27	2012-05-23 20:52:45 +02:00
Gabriel Wicke	d6af3b3375	Improve the serializer and its output display in the web service Change-Id: Id3ca96846cad42517d7d4bada8f4bb250d54247b	2012-05-23 17:50:35 +02:00
Gabriel Wicke	95496c02db	Add an extra newline before headings, and ignore favicon.ico requests Change-Id: Ibacac3453afefa5dbe803c1e0260e8c943785f12	2012-05-23 17:17:54 +02:00
Gabriel Wicke	21286a50df	Make sure pageName is set in the web service, and handle empty page name in parser function Change-Id: I5d36eefecc2f35a860d00a8960004f8e651ed17c	2012-05-23 16:43:45 +02:00
Gabriel Wicke	a862718ad8	Add some checks against undefined tokens returned from async transforms Change-Id: Ie19537083b96b1b2e12e1c4b65a7a044753c18ac	2012-05-23 16:32:21 +02:00
Gabriel Wicke	a4c5d43ff7	Fix an external link regression, and add server shell wrapper and setup docs Change-Id: I9a4f7690e98313d003a2fec35324ed70556e6461	2012-05-23 16:25:42 +02:00
Gabriel Wicke	b89f5071e5	Basic parser / serializer web service * After installing Parsoid (sudo npm install -g in modules/parser), run 'node server.js' from the api directory and navigate to http://localhost:8000/ and follow the directions. You can start to navigate the English wikipedia at http://localhost:8000/Main_Page, or manually enter wikitext or HTML DOM to convert. * Uses the express framework, could also use just connect * Uses the cluster module to manage workers per-core and restart those on failure Change-Id: I443f2996ed3df00826b038b7476a2f966ab0c425	2012-05-23 12:35:00 +02:00
Gabriel Wicke	febb912ead	No end delimiter after template row attributes Change-Id: Iba304fb797d221e2d65ae055d266bff2f6301df8	2012-05-23 09:30:07 +02:00
Gabriel Wicke	39c6f42879	Link round-tripping and other improvements * Changed RDFa for links according to http://www.mediawiki.org/wiki/Parsoid/RDFa_vocabulary * Added basic support for internal/external link serialization * Moved numbering of external links from tokenizer to LinkHandler * Added round-tripping for generic HTML tags * Replaced nowiki tag with <meta typeOf="mw:tag" content="nowiki"> and <meta typeOf="mw:tag" content="/nowiki"> for now. * 154 round-trip tests passing (node parserTests.js --roundtrip). Change-Id: I16c4db21b1b543ee57c73e569c83025b64664542	2012-05-22 13:36:06 +02:00
Gabriel Wicke	7e21b7380a	Merge "Round-trip nowiki"	2012-05-21 17:16:56 +00:00
Gabriel Wicke	fb7d5418a5	Round-trip nowiki Change-Id: I5f7e6a43f5fdc1708ee710b2a601b20db733452c	2012-05-21 18:06:09 +02:00
Gabriel Wicke	a6610e52c2	Serializer and table round-tripping improvements * added stx: 'html' round-trip information for html tags * added t_stx: 'row' info for row-wise table wiki syntax, and support for it in the serializer * the first table row is implicit in wikitext * renamed lastToken to prevToken in serializer * strip first newline in an initial chunkCB Change-Id: I014b046539d1b674d830551c5fd1b74a67f81993	2012-05-21 14:59:53 +02:00
Gabriel Wicke	e069e7cb1c	Merge "Support table captions and properly delimit the end of table options"	2012-05-21 12:51:58 +00:00
Gabriel Wicke	54e75b93b7	Support table captions and properly delimit the end of table options Change-Id: I15eb8df19528cfceadfee368370501b30f0e36a0	2012-05-18 10:46:43 +02:00
Gabriel Wicke	c39eb36968	Use outerHTML to serialize unhandled DOM node in serializer Change-Id: I37350712c9450c34025740a8d6de51344739c2b7	2012-05-18 10:03:16 +02:00
Gabriel Wicke	3c6d829708	Fix first bug caught by new roundtrip mode for parserTests Change-Id: Id152fd29606d8ee34ac300945f41e2a5f48f087f	2012-05-18 09:55:22 +02:00
Subramanya Sastry	ae4810b201	Renamed items to itemCount for better code readability. Change-Id: I53851c07a4746928fddec4b3737136f081d49178	2012-05-17 12:32:46 -05:00
Subramanya Sastry	58da03bc85	Track list prefixes in the list start handler and use them to output serialized text in list item handlers. Change-Id: Ic7562d531d2313bedcf3b7450b4f28f02bc2b5a3	2012-05-17 12:12:46 -05:00
Gabriel Wicke	e2815b516c	Start to handle links Change-Id: I1fb975910651820fd889d77152562fd4fbcb5db8	2012-05-17 14:32:56 +02:00
Gabriel Wicke	b7fd4498a9	Use single _serializeToken handler for both DOM and tokens Change-Id: I45e1d90b53a5ddc678f7744f27274bebcfc375fe	2012-05-17 13:20:39 +02:00
Gabriel Wicke	8dbc2f573f	Simplistic wikitext round-tripping with parse.js --wikitext Lists are a bit tricky, as nested lists are not wrapped in a separate list item. Should work now though. Change-Id: I2e5f29f6afa6bdd2d5e5c0c5d019b70c611b73d1	2012-05-17 12:44:46 +02:00
Gabriel Wicke	3414418b1f	Don't eat newline tokens in the ListHandler This fix only affects following transforms, of which there are few right now. Also removed a stray token mutation in QuoteTransformer. Change-Id: Id6d4adce944b06fc1a3651cfbf63fc2670125225	2012-05-16 23:14:21 +02:00
Gabriel Wicke	542921b5a3	Removed html5 parser patch no longer needed with 0.3.8 Change-Id: Id8c23d34e8cca49a360f536e792144a85a8468a3	2012-05-16 12:06:42 +02:00
Mark Holmquist	96ee9ad45c	Add a new wikitext serializer, with limited functionality. This isn't finished at all, but Gabriel wants to take a crack at it, so here it is! Change-Id: I9732aa141f7c69a28c8f5978cb18180e93cb9eda	2012-05-15 10:41:28 -07:00
Gabriel Wicke	d918fa18ac	Big token transform framework overhaul part 2 * Tokens are now immutable. The progress of transformations is tracked on chunks instead of tokens. Tokenizer output is cached and can be directly returned without a need for cloning. Transforms are required to clone or newly create tokens they are modifying. * Expansions per chunk are now shared between equivalent frames via a cache stored on the chunk itself. Equivalence of frames is not yet ideal though, as right now a hash tree of unexpanded arguments is used. This should be switched to a hash of the fully expanded local parameters instead. * There is now a vastly improved maybeSyncReturn wrapper for async transforms that either forwards processing to the iterative transformTokens if the current transform is still ongoing, or manages a recursive transformation if needed. * Parameters for parser functions are now wrapped in abstract Params and ParserValue objects, which support some handy on-demand value expansions. Keys are always expanded. Parser functions are converted to use these interfaces, and now properly expand their values in the correct frame. Making this expansion lazier is certainly possible, but would complicate transformTokens and other token-handling machinery. Need to investigate if it would really be worth it. Dead branch elimination is certainly a bigger win overall. * Complex recursive asynchronous expansions should now be closer to correct for both the iterative (transformTokens) and recursive (maybeSyncReturn after transformTokens has returned) code paths. * Performance degraded slightly. There are no micro-optimizations done yet and the shared expansion cache still has a low hit rate. The progress tracking on chunks is not yet perfect, so there are likely a lot of unneeded re-expansions that can be easily eliminated. There is also more debug tracing right now. Obama currently expands in 54 seconds on my laptop. 
Change-Id: I4a603f3d3c70ca657ebda9fbb8570269f943d6b6	2012-05-15 17:05:47 +02:00
Catrope	c256ea7d71	Fix fatal error in parse.js Trying something trivial like echo 'Hello world' | node parse.js would throw TypeError: Function.prototype.apply: Arguments list has wrong type Change-Id: Ia0a1154b0f3edbfb1f228a1d2072fced1b147141	2012-05-10 12:04:57 -07:00
Gabriel Wicke	b1bd0d73ec	Don't eat end token in ListHandler, and lazier Quote handler registration * Setting the rank on tokens is still used currently, but will be phased out in favor of setting it on chunks. Tokens will be immutable to allow sharing and caching without a need for cloning. * Only register for newline and end tokens in QuoteTransformer when active. Change-Id: I2c45bc7e4a105219a1404ab221eed7f242128f1e	2012-05-10 09:47:53 +02:00
Adam Wight	0a7f0b7630	List markup is created during the sync23 phase. This makes it possible to transclude list items from a template. Note: "5 quotes" test is broken by this patch, it appears that ListHandler newline processing is changing some state which mysteriously affects the QuoteTransformer. This is ominous, hopefully there's a simple explanation... gwicke: fix a bug in tokenizer triggered by definition lists like this: **; foo : bar Change-Id: I4e3a86596fe9bffcbfc4bf22895362c3bf742bad	2012-05-08 11:39:36 +02:00
Gabriel Wicke	909633ea08	Improve template / tplarg precedence in tokenizer Change-Id: If9b24b42ea223e0f30f906a83496d73ec60c4a0d	2012-05-04 13:17:06 +02:00
Gabriel Wicke	8a30f76370	Use upright option, including the 0.75 default width Change-Id: Iacdf6173e0ee8f58ca4385fd9b2cde77b2fdf3c4	2012-05-04 11:15:35 +02:00
Gabriel Wicke	57dfd89383	Handle upright option properly Change-Id: I831fcccf874f9a0505e88eb76d269b1d2f68e3e0	2012-05-03 16:15:34 +02:00
Gabriel Wicke	c4fc7508a7	Add basic # REDIRECT handling Change-Id: I71f659201c1d5de4a528ddfac7f65bf20a89f97d	2012-05-03 15:54:36 +02:00
Gabriel Wicke	6ab017308b	Only specify the width for thumbnails to keep the aspect ratio Change-Id: I4e55ff719da6cb58f396ad6043e46acaed4a504d	2012-05-03 15:36:42 +02:00
Gabriel Wicke	6139398494	Reduce debugging overhead a bit, and provide default internal image size Change-Id: I345af8c5905a5fa747f9ed342ba2ba8c1026d044	2012-05-03 14:49:55 +02:00
Gabriel Wicke	6e21f6bb27	Forward-port Cite extension * Adapted Cite extension to use current interfaces and token formats * Improved TokenCollector Change-Id: I20419b19edd9bbad2c2abf17a2ff1411b99c0c04	2012-05-03 13:22:01 +02:00
Gabriel Wicke	2291fe8364	Reduce the need for token cloning slightly Change-Id: I31c71bddca4855afdffc3fe5c8d759cfa1994d86	2012-04-27 23:12:25 +02:00
Gabriel Wicke	5fb2c46073	Clone cached tokens, and fix switch for empty needle Change-Id: I63946e5a56f6fd7dd30d00b12d36032dd1dd0017	2012-04-27 15:59:01 +02:00
Gabriel Wicke	ed8cb54831	Simplify transformToken slightly, and fix JSHint warnings Change-Id: I95769ed063ea855a9109148f5db83ea43f423e56	2012-04-27 15:31:30 +02:00
Gabriel Wicke	2d7b4a2a59	Make .to more consistent and add optional parentCB arg * parentCB (if set) is called with { async: true } if expansion is going to be asynchronous. * Strings are handled efficiently * all value parameter chunks can now be converted using .to(). Change-Id: Ib013e1bc3d8e7f692009038209db6a056887326e	2012-04-27 13:57:23 +02:00
Gabriel Wicke	fd1a67aa16	Add .to('text/plain/expanded', cb) support and convert ifeq to use it Change-Id: I99c78de12fed41ba36811402f7ecacb420391d70	2012-04-27 12:18:30 +02:00
Gabriel Wicke	30a83d7fd7	Accept wikilink parameters with dangling equal ('|arg=|') Change-Id: Ib4f6d186da2a74522b17c377dac5c9a7de7e5861	2012-04-27 11:35:00 +02:00
Gabriel Wicke	1d70e7b81c	Disable preformatted text from indents in template args Change-Id: I84144d3fab6541ed264d9b092806c8bf9de6e8b2	2012-04-27 10:45:08 +02:00
Gabriel Wicke	56d6757f67	Fixes for the template fetch retry feature Change-Id: Id36cb02c535d07f4f2cdd54ae682b6a144a2faa9	2012-04-26 20:31:23 +02:00
Gabriel Wicke	027d77e0c9	Fix --wikidom and --linearmodel parse.js options; retry on template fetch failures Change-Id: I444397936fd87971fe085df4b467089367e9ffa6	2012-04-26 19:51:00 +02:00
Gabriel Wicke	3be4992782	'Obama finally expands' ;) Misc fixes and documentation updates * [[:en:Barack Obama]] can now be expanded in 77 seconds using 330MB RAM, while it would previously run out of RAM after ~30 minutes. Wohoooo! The token transform framework rework really paid off. * 303 parser tests are passing in the new record time of 5.5 seconds. Two more tests are passing since these tests expect the day of the week to be Thursday. Won't be the case tomorrow. Change-Id: I56e850838476b546df10c6a239c8c9e29a1a3136	2012-04-26 18:18:08 +02:00
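
Illustrative note on commits 8f216af2f5 and 4533c274ca above: the tail attribute stored in data-mw lets the serializer strip the tail from the link text and emit it after the closing brackets, and comparing plain substrings instead of building a regexp avoids the crash on tails that contain regexp syntax such as "?". The following is only a minimal sketch of that idea, assuming a simple link whose text is its target plus the tail. The names splitLinkTail and serializeWikiLink are hypothetical and are not taken from the Parsoid source.

```javascript
// Hypothetical helper: split rendered link text into content and tail.
// Uses plain string comparison, so a tail such as '?' is never
// interpreted as regular-expression syntax.
function splitLinkTail(text, tail) {
  if (tail && text.length >= tail.length &&
      text.slice(text.length - tail.length) === tail) {
    return { content: text.slice(0, text.length - tail.length), tail: tail };
  }
  // No tail recorded, or the text does not end with it: keep text as-is.
  return { content: text, tail: '' };
}

// Hypothetical serializer for a simple wikilink. Assumption: the link
// text equals the link target followed by the optional tail.
function serializeWikiLink(text, dataMw) {
  var parts = splitLinkTail(text, dataMw.tail || '');
  return '[[' + parts.content + ']]' + parts.tail;
}

console.log(serializeWikiLink('hens', { gc: 1, tail: 's' }));
console.log(serializeWikiLink('what?', { tail: '?' }));
```

With the input from the commit message, 'hens' with tail 's' serializes back to [[hen]]s. A regexp built directly from the second tail ('?') would be invalid, which is the class of failure the substring comparison sidesteps.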


441 commits
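
Illustrative note on commit 8174c9dafc above: its message describes two normalization primitives, S(wap), which rewrites T1(T2(x)) into T2(T1(x)), and M(erge), which rewrites (T(x),T(y)) into (T(x,y)). The sketch below shows only those two primitives on a toy tree of plain objects. It is not the rewriting pass from the commit, and the representation and the names el, swap, merge and toHtml are assumptions made for illustration.

```javascript
// Toy tree: an element is { tag, children }, a text node is a string.
function el(tag, children) {
  return { tag: tag, children: children };
}

// S(wap): T1(T2(x)) ==> T2(T1(x)). Applies only when the outer element
// has exactly one child and that child is an element.
function swap(node) {
  var inner = node.children[0];
  if (node.children.length !== 1 || typeof inner === 'string') {
    return node;
  }
  return el(inner.tag, [el(node.tag, inner.children)]);
}

// M(erge): (T(x), T(y)) ==> (T(x, y)) for adjacent siblings with the same tag.
function merge(siblings) {
  var out = [];
  siblings.forEach(function (n) {
    var prev = out[out.length - 1];
    if (prev && typeof prev !== 'string' && typeof n !== 'string' &&
        prev.tag === n.tag) {
      out[out.length - 1] = el(prev.tag, prev.children.concat(n.children));
    } else {
      out.push(n);
    }
  });
  return out;
}

// Serialize the toy tree back to HTML for inspection.
function toHtml(n) {
  if (typeof n === 'string') {
    return n;
  }
  return '<' + n.tag + '>' + n.children.map(toHtml).join('') + '</' + n.tag + '>';
}

console.log(toHtml(swap(el('b', [el('i', ['foo'])]))));
console.log(merge([el('b', ['foo']), el('b', ['bar'])]).map(toHtml).join(''));
```

The two calls reproduce the examples from the commit message: <b><i>foo</i></b> becomes <i><b>foo</b></i>, and <b>foo</b><b>bar</b> becomes <b>foobar</b>. Finding an order of S and M steps that is guaranteed to reach a normal form is the open problem the commit message mentions, and this sketch does not attempt it.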