wikimedia/mediawiki-extensions-VisualEditor

mirror of https://gerrit.wikimedia.org/r/mediawiki/extensions/VisualEditor synced 2024-11-15 18:39:52 +00:00

Author	SHA1	Message	Date
Gabriel Wicke	06ae53fdfe	Drastically reduce memory usage for template-heavy pages Only call back a few callbacks per reactor iteration from the template fetch request queue. This changes the expansion pattern from a (memory intensive) breadth-first expansion to something quite close to depth-first expansion. Additionally, retrieved pages are quickly added to the page cache so that a lot of request queuing is avoided in favor of synchronous expansion from the cache. On pages like Barack Obama that previously ran out of memory after consuming node's 1.6G heap limit, expansion now runs in relatively constant 100-300M resident (so far, still running). Change-Id: Ie34a1eeff00d868416de45ef8d289898258f560c	2012-04-13 14:31:03 +02:00
Gabriel Wicke	df050e4481	Convert external link syntax stops to stack Eat unbalanced external link parts within template parameters. This does not produce the same output as the PHP parser (try echo '{{YouTube}}' | node parse.js), but preserves a level of sanity. Need to check how common this is for external links. If it is rare enough, moving the ']' after the parser function manually would fix the rendering for the YouTube case. Change-Id: I597d808efff36baa22191e7946a0061cc31120e8	2012-04-13 11:08:42 +02:00
Gabriel Wicke	5bb2d96869	Token stream transform improvements * add fast paths for empty arguments etc * cache attribute token transform pipelines * fix bugs in TokenCollector and NoIncludeOnly handler, and improve its efficiency by only registering for 'end' tokens on demand * Remove empty reset methods from a few handlers * Add a simple 'ap' debug print function that makes it easy to only print some debug prints by temporarily changing 'dp' to 'ap' * Improvements and bug fixes in AttributeExpander Change-Id: Ie69729c8f62d48bba922712e44ebce484c621c50	2012-04-12 15:42:09 +02:00
Gabriel Wicke	3124deca2c	Track inclusion status on CachedTokenPipeline Non-include attribute pipelines are not cached for now. Adding separate caching for non-include attribute pipelines is very likely worth it, but deferred for now. Change-Id: I13f949d9f0a04536f9ccfcb73a2be69c5c08be01	2012-04-12 10:21:50 +02:00
Gabriel Wicke	efa41370d3	Set inclusion flag for attribute transform managers too Change-Id: Ice15d8fde6de4a3e850a028db9917e976218fc43	2012-04-11 21:55:52 +02:00
Gabriel Wicke	bff43938f6	Support noinclude/includeonly/onlyinclude in attributes Fun test case: {| |-<includeonly> foo </includeonly> |Hello |} Change-Id: I353bb287d3967ade549fbcb4ae64511a1f1f7e36	2012-04-11 17:37:25 +02:00
Gabriel Wicke	9ae572cca0	Fixes to template expansion / token transform managers, 296 tests passing. * Convert isNoInclude logic to positive isInclude throughout and set it properly on attribute pipelines. Also don't cache non-include pipelines. * Add a --pagename parameter to parse.js, which sets the page name in the environment. This is then returned by {{PAGENAME}}. Not the final solution, but useful for taxobox testing as taxons are selected based on PAGENAME. * Add rudimentary pagenamebase parser function Change-Id: If9c0be4c255200d0f2a30f02e5619437b4fd8f12	2012-04-11 16:34:27 +02:00
Gabriel Wicke	bbae66cd69	Nominate more HTML5 sectioning and heading elements for block-level treatment Block-level (in HTML4 lingo) elements are not wrapped into paragraphs. Change-Id: I4a01c9721be30b526172952915d528dea79e2f30	2012-04-11 12:53:49 +02:00
Gabriel Wicke	5a33099875	Improve template tokenization in template arguments Taxobox tables now render pretty much correctly. Change-Id: I5a0564138ff0c688d8a5a69b7867646fd3763946	2012-04-10 16:40:49 +02:00
Gabriel Wicke	577ef1f916	Add some support for alignment of thumbs Change-Id: I70570f48423628f7a87a35647698a66a5f413088	2012-04-10 12:11:59 +02:00
Gabriel Wicke	403be4af42	Add basic thumb rendering support * DOM based on Wikia's thumb output: HTML5, clean caption without magnify icon. * basic RDFa annotations, but most options additionally in data-mw object- might want to move more (or all?) of those into RDFa data using meta tags. * no support yet for framed or other formats, image scaling etc * also tweaked some config options in the environment Change-Id: Ie461fcdce060cfc2dec65cc057709ae650ef3368	2012-04-09 23:04:26 +02:00
Gabriel Wicke	dbdd320348	Improve parameter tokenization support especially for table rows Change-Id: I961d69e228b96adc69ea9acb3733d13f5898602d	2012-04-05 16:00:26 +02:00
Gabriel Wicke	7a35e5db16	Remove behaviors var in tokenizer, now handled in token handler Change-Id: I68eeff3f05ce29c13e347c2cd7ea6519e58b0e03	2012-04-04 21:17:29 +02:00
GWicke	da60861be8	Merge ""magic words" are tokenized and used to set parser.environment flags"	2012-04-04 19:11:03 +00:00
Adam Wight	a85ed36efa	"magic words" are tokenized and used to set parser.environment flags behavior switches are converted to tokens which set parser.environment flags during the async transformation stage. The next step would be for handlers in the sync23 stage to generate the TOC, section edit links, and so on according to these directives. No tests written, because the switches are consumed and don't appear in rendered html. We can test the magic word layout controls individually, once they're implemented. Another small change was to store option flags directly in the environment object, not that it makes much difference. Change-Id: I863fbf4be1a17d2f6c31158298dd301f19ae1137	2012-04-04 11:25:29 -07:00
Adam Wight	b234edba88	As much as I have loved writing Makefiles... I've replaced its functionality with package.json, mostly so we can avoid non-node dependencies. This is one of the recommended practices. We should consider moving tests/parser into modules/parser/tests, other node projects keep all module code in one directory. Explained in the README how to use npm to load the dependencies and run tests. Too bad about NODE_PATH... Don't try to find parserTests.txt in assorted places--if it isn't present, fetch from gerrit. You can symlink from core if you're developing on both parsers, and the fetch script will not overwrite. Use __dirname in parserTests.js to allow the script to run independent of current working directory. Change-Id: I4c8b884e91f4fdeae385c7697aff768bdd199dd5	2012-04-04 11:02:58 -07:00
Gabriel Wicke	e3a745a024	Improvements for template / -argument precedence; support for empty params Change-Id: Id0894ccbedfa47fa3658817ca65119a2af76be3e	2012-04-04 16:29:47 +02:00
Gabriel Wicke	2037215185	Disallow '[' in generic attribute names This avoids interpreting something like ! [[foo|bar]] as <th [[foo=''>bar]]</th>. Change-Id: If59708fa90eb0117a15b2b6446890d1ae19a857c	2012-04-04 14:31:11 +02:00
Gabriel Wicke	f588d2a7aa	Fix table headings in template parameters Change-Id: Icdfc5655968fc845230ad7638124309d6b8c1ada	2012-04-04 12:54:34 +02:00
Gabriel Wicke	b8d980a229	Don't eat newline / space in template parameters ..so that block_lines can match. Change-Id: I4c464dc44249f40e4aa280df35fb726bfce3a745	2012-04-04 11:22:31 +02:00
Trevor Parscal	606d97da99	Merge "Add HTML DOM -> linear model converter"	2012-04-03 17:52:55 +00:00
Gabriel Wicke	47de122a95	Improve support for table / template interaction Match pairs of {{!}} or | for template productions, but not a mix of the two. Example: {{#if:1|{{!}}- {{!}} {{#if:1|style="color: red"{{!}}|}} }} Note that the style parameter ends up as the key of an empty-valued attribute on the table cell currently. Change-Id: I5f9357dd1645ef97b0af89f32e8d92ae49218c72	2012-04-03 18:48:35 +02:00
Gabriel Wicke	0fe062fbe1	JSHint cleanups and parser function argument handling improvements Parser functions which only accept positional arguments now return both the key and value of arguments. Complete attributes (key and value) for templates and the like from parser functions are not yet supported though. Change-Id: I3f81bb35acd27186222ce6d5217e820042527c01	2012-04-03 18:10:48 +02:00
GWicke	b7db83e09a	Merge "Magic links and behavior switch tokenization by Ori Livneh"	2012-04-02 16:43:13 +00:00
Gabriel Wicke	f662690d02	Shorten data-mw-rt to data-mw and clean up whitelist Instead of a proliferation of data-mw-* attributes, it should be easier to stash all private / non-semantic round-trip information in a JSON object stored in data-mw. Change-Id: Id200a6a8789fa152f29ea530e5a24b6ee7b4b285	2012-04-02 18:12:49 +02:00
Gabriel Wicke	5248fd31e8	Magic links and behavior switch tokenization by Ori Livneh Commit first patch by Ori, lets 288 parser tests pass. Yay! Change-Id: Iac8c3d1ad1984900350b20f7e725c40618a1e8ba	2012-04-02 17:31:34 +02:00
Catrope	8dc994f037	Add HTML DOM -> linear model converter Also, in ParserPipeline: * Import the LM converter and expose it through getLinearModel() * Fix getWikiDom() to actually work (still unused) In parse.js: * Add --help option that prints usage information (was unreachable) * Add --linearmodel option to output linear model JSON instead of HTML Change-Id: Ic534e03ff40a7c9117bb63f0c635a4213d5e3406	2012-03-29 12:47:14 -07:00
Gabriel Wicke	5ef2074251	Enable support for block-level wiki constructs in template arguments. This gets a bit closer to supporting table fragments passed through template arguments. Next, we'll need a way to indicate start-of-line position to enable sol block-levels in template parameters. Example: {| {{#if: true|{{!}}Table cell|}} |}	2012-03-15 11:43:49 +00:00
Gabriel Wicke	7e22020398	Convert syntactical break flags for templates from counters to the stack variant to fix the precedence for {{!}} (break on these inside table content, but not in template options within tables).	2012-03-14 16:30:59 +00:00
Gabriel Wicke	77a61dd687	Improve support for {{!}}, and don't produce a pre for indented tables.	2012-03-14 10:58:11 +00:00
Gabriel Wicke	835914b2de	Support {{=}}.	2012-03-14 09:07:01 +00:00
Gabriel Wicke	2195c31abf	Move link types to data-mw-rt, and support some more template tokenization edge cases. For example, the PHP parser treats | foo | = bar | as | foo = bar |, believe it or not ;)	2012-03-13 12:32:31 +00:00
Gabriel Wicke	4cd8b302ac	Improved template tokenization. The parser can now template-expand [[:en:Barack Obama]] without exceeding 1.7GB of memory (which is the node limit).	2012-03-12 17:31:45 +00:00
Gabriel Wicke	3c5fe2523c	Tolerate more newlines and spaces in templates, and support templates and comments in urls.	2012-03-12 14:31:06 +00:00
Gabriel Wicke	ae4ab7a39c	Refactor syntactic stops into an object and add a stack variant for option values.	2012-03-12 13:08:43 +00:00
Roan Kattouw	29f416937e	Fix some usages of splice.apply in the data model to use ve.batchedSplice(). Added FIXME comments for occurrences outside of DM	2012-03-10 00:31:28 +00:00
Gabriel Wicke	ffc9383096	Temporary fix for template tokenization, especially needed for [[Template:Cite core]].	2012-03-08 14:24:04 +00:00
Gabriel Wicke	39017dd769	Percent-encode spaces in URLs, so that they are recognized as valid URLs later on.	2012-03-08 11:53:15 +00:00
Gabriel Wicke	7518db8197	A few fixes to parser functions and template expansion. Trim whitespace off template arguments, let the last duplicate key win and fake pagenamee slightly better.	2012-03-08 11:44:37 +00:00
Gabriel Wicke	51023feaa4	Improvements for image option handling.	2012-03-08 10:03:22 +00:00
Gabriel Wicke	b1e131d568	A bit more documentation and naming cleanup in the tokenizer wrapper.	2012-03-08 09:00:45 +00:00
Gabriel Wicke	f02ff95aa3	Token representation clean-up. Now all tokens are differentiated using constructors instead of type attributes.	2012-03-07 20:06:54 +00:00
Gabriel Wicke	f157093a41	Delegate responsibility for resetting the token rank to transforms, if full re-processing in a phase is wanted. By default, after a token type change or the return of multiple tokens only the remaining transforms with higher ranks are applied. Updated a few comments as well.	2012-03-07 19:29:53 +00:00
Gabriel Wicke	1f8c43b9e2	A few minor documentation updates.	2012-03-07 18:42:26 +00:00
Gabriel Wicke	5f618103d7	Set allTokensProcessed flag for async callbacks from the template expander.	2012-03-07 17:36:33 +00:00
Gabriel Wicke	e5a1116817	Start re-transformation as soon as possible in TokenAccumulator._returnTokens to maximize IO concurrency. Signal that all tokens are fully transformed to callbacks called from TokenAccumulator._returnTokens. The result should be a single re-transformation when entering the callback chain, and only if the transform does not signal that it took care of full transformation itself. Template expansion would set this flag, as the nested transform pipeline processes all tokens to the end of phase async12.	2012-03-07 16:29:06 +00:00
Gabriel Wicke	656524dbbc	Fixes for multi-transformer expansion in AsyncTransformManager. Added argument to callback which lets transforms indicate if their returned tokens are fully processed for their phase. If not, the callback re-processes them so that any remaining transforms are applied.	2012-03-07 15:39:18 +00:00
Gabriel Wicke	af03eb4f29	Improve generic attribute expansion before external link processing, and make wgUploadPath configurable. Also change the hard-coded fall-back image sizes to sensible defaults. This breaks three parser tests until image size retrieval from the wiki is implemented.	2012-03-06 18:02:35 +00:00
Gabriel Wicke	227103e12c	Accept empty table cell attribute sections, and consider percent-encoded %2525 valid. 270 tests passing.	2012-03-06 14:32:45 +00:00
Gabriel Wicke	2efcd3cd57	Reworked percent encoding handling for URIs to get closer to the 'url construction' part of the HTML5 spec: http://www.whatwg.org/specs/web-apps/current-work/multipage/urls.html#url-manipulation-and-creation Removed a few whitelisted test cases that are now passing directly. The encoding canonicalization could also be moved to the Sanitizer. Doing this early in token stream processing however has the advantage of providing further transformations uniform data to work with. We could even consider moving this further into the tokenizer.	2012-03-06 13:49:37 +00:00
Gabriel Wicke	19fe9726a2	Fix invalid external link representation. 268 tests passing.	2012-03-05 18:06:29 +00:00
Gabriel Wicke	a9ebc1d986	Support external images wrapped in a clickable link using bracketed external link syntax. 265 tests passing.	2012-03-05 16:23:00 +00:00
Gabriel Wicke	7f7202e89c	A few improvements to external link and image handling. 264 tests passing.	2012-03-05 15:34:27 +00:00
Gabriel Wicke	7b0c807710	Change wikilink tokenization strategy to split on pipes. This makes it possible to support template / template argument expansion in image options, and causes little trouble for wikilinks. Non-image wikilinks with multiple text pipes are quite rare in the dumps, and concatenating description tokens with a plain '|' is quite easy. 261 parser tests passing.	2012-03-05 12:00:38 +00:00
Gabriel Wicke	3e6f1b6bea	Use some options primitively.	2012-03-02 14:19:33 +00:00
Gabriel Wicke	167dbdb0fa	Parse image options.	2012-03-02 13:36:37 +00:00
Gabriel Wicke	8b7ba9051b	Add productions for image option tokenization, and prepare to call those from the LinkHandler token stream transformer.	2012-03-01 18:07:20 +00:00
Gabriel Wicke	b1a7119a46	Hack up some rudimentary image rendering. Using jshashes for the md5, and a few hard-coded image sizes ;) 262 tests passing.	2012-03-01 13:51:53 +00:00
Gabriel Wicke	d4faf9eaf4	More work on wiki link rendering and general wiki title / namespace functionality.	2012-03-01 12:47:05 +00:00
Gabriel Wicke	4b9bd45b82	Start to move wikilink expansion to a separate async token transformer.	2012-02-29 13:56:29 +00:00
Gabriel Wicke	b8bb503199	Actually commit onlyinclude, as already announced in r112592.	2012-02-28 13:24:35 +00:00
Gabriel Wicke	3227903d48	Follow-up to r112116, accidentally committed from subdirectory.	2012-02-22 16:41:01 +00:00
Gabriel Wicke	3568dfee14	Add some support for functionhooks in test parser and parserTests.js, and tweak a few parser functions.	2012-02-22 15:59:11 +00:00
Gabriel Wicke	d7da324272	Basic fall-through support for #switch parser function	2012-02-22 14:57:50 +00:00
Gabriel Wicke	491ad5ffef	Cleanup and commenting.	2012-02-22 13:13:18 +00:00
Gabriel Wicke	9b3313d923	Speed up flatten slightly by avoiding garbage for already flat arrays. Also, use simple string concatenation instead of arrays as the strings tend to be few and short.	2012-02-22 11:25:44 +00:00
Gabriel Wicke	8dde1f77b4	Reduce debug print overhead, roughly a 10% speed-up on parserTests.	2012-02-21 18:49:43 +00:00
Gabriel Wicke	058c4213a4	Remove some more unused code and tidy up some more.	2012-02-21 18:26:40 +00:00
Gabriel Wicke	416126c041	Fix the bug in the inline_breaks replacement, and write another switch-based version, which is slightly faster and shorter. Performance is improved by about 5% for parserTests.	2012-02-21 17:57:30 +00:00
Gabriel Wicke	18a04f7581	Tidy up and comment the tokenizer a bit more. Start to move code into mediawiki.tokenizer.js module, and pass a reference to parse(). Faster inline_breaks production using a JS function which seems to be generally correct, but still breaks five tests when enabled. Seems to be some weird interaction with peg.js, possibly something to do with caching.	2012-02-21 17:21:42 +00:00
Gabriel Wicke	8718bd65bc	Add list of HTML5 and deprecated HTML3/4 elements in preparation for end-of-potential-extension rules; Support indented tag-wrapped pre blocks.	2012-02-21 14:44:56 +00:00
Gabriel Wicke	ffec77273a	Comment and minor code tweaks.	2012-02-21 11:24:20 +00:00
au	ea15bffb27	Revert "* Always sort attributes (+1 test pass)." This reverts commit 45ca281da8eef8030bdd1986418cb914fc9a717c.	2012-02-20 22:26:12 +00:00
Gabriel Wicke	5806705733	Push transformer setup a bit further into the attribute pipeline.	2012-02-20 12:56:00 +00:00
Gabriel Wicke	8eddb4ec6b	Add some comments to the Sanitizer	2012-02-20 11:14:53 +00:00
Gabriel Wicke	71e95bd54b	Set up token stream transformers from a map of phases per input content type. Not yet applied to attribute pipeline creation. 249 tests passing.	2012-02-20 11:07:21 +00:00
au	9c55f5e8b7	* Always sort attributes (+1 test pass). The performance impact for .sort is quite small (12.079s => 12.158s) and Sanitizer is probably one of the more accessible places to do this.	2012-02-18 21:01:07 +00:00
au	aa589d989b	* Rudimentary CSS validation; +4 tests pass. (Bug 2304, 3244).	2012-02-18 20:16:23 +00:00
Gabriel Wicke	4d80b8daa8	Detail comments about next steps and divide parser functions in those that need more information from the wiki and readily implementable items.	2012-02-17 10:23:14 +00:00
Gabriel Wicke	059ff94bc4	Reject match for invalid urlencoded code points.	2012-02-16 13:57:56 +00:00
Gabriel Wicke	dc1d30fcb5	Tweaked template parameters a bit further, and made the self-closing tag protection a bit less trigger-happy.	2012-02-15 15:56:11 +00:00
Gabriel Wicke	089413298c	Protect self-closing tags in generic attribute production.	2012-02-15 13:23:50 +00:00
Gabriel Wicke	5e94a238fc	Prepare for the support of tables (and later generally block-level elements) in template parameters. 244 tests passing.	2012-02-15 11:51:29 +00:00
Gabriel Wicke	774a3189c8	Improve support for generic attribute names coming from templates/templateargs.	2012-02-15 10:19:39 +00:00
Gabriel Wicke	1ce6f5a3c4	Improve support for single-line attributes with preprocessor support. 243 tests passing.	2012-02-14 21:25:52 +00:00
Gabriel Wicke	f02b3d91c6	Port urlencoded char support to preprocessor-supporting link target production, and remove old link_target production.	2012-02-14 21:08:25 +00:00
Gabriel Wicke	001194b140	Replace console.log with console.warn in all debug statements	2012-02-14 20:56:14 +00:00
Gabriel Wicke	f42b379e52	Fix named wikilink options (image options really) in template arguments, and speed up template parameter parsing by eliminating some backtracking. 238 tests passing (unchanged).	2012-02-14 15:45:18 +00:00
Gabriel Wicke	64f63b3714	request is automatically installed by jsdom. Follow-up to r111459. Thanks Hashar!	2012-02-14 14:15:50 +00:00
Gabriel Wicke	466e8e54ad	Tweak comment about request module	2012-02-14 14:01:13 +00:00
Gabriel Wicke	0b8d1b0387	* Add custom toString methods for tokens to aid debugging * Convert all attributes into strings in Sanitizer * Use strict comparison against empty string in tokenizer * Add very simple sitename parserfunction * 138 tests passing	2012-02-13 17:02:23 +00:00
Gabriel Wicke	9945175416	Reformat Date.replaceChars	2012-02-13 14:23:48 +00:00
Gabriel Wicke	0b40741e1c	Strip trailing newlines from included templates	2012-02-13 14:17:03 +00:00
Gabriel Wicke	025f9cddb3	Prefix all internal data- attributes with data-mw- and adjust the whitelist and test output normalization accordingly. 235 tests passing.	2012-02-13 13:54:07 +00:00
Gabriel Wicke	b1617b1d71	Add some support for ideographic spaces in external links, support the int: namespace alias and perform some normalization on the MediaWiki namespace prefix.	2012-02-13 13:35:46 +00:00
Gabriel Wicke	55ddb4fd66	Remove WikiDom default serialization and --html argument from parse.js wrapper. HTML is now the only supported format. The DOMConverter is now no longer used. Roan, feel free to remove / butcher it for direct HTML to linear model conversion.	2012-02-11 17:59:17 +00:00
Gabriel Wicke	a122e51eec	Move data-* annotations into separate object on tokens, that is then serialized into a single data-mw-rt attribute if present. Update parserTests to ignore this attribute for comparisons with expected parser output. A few more tweaks and notes are thrown into this commit too. 233 tests are passing now.	2012-02-11 16:43:25 +00:00
Gabriel Wicke	aff30be131	Some comments and reshuffling in the grammar, and a typo in the AttributeExpander.	2012-02-09 22:27:45 +00:00
Gabriel Wicke	6e33255503	Improve support for preprocessor functionality in attributes; Support multi-line xmlish tags with preprocessor stuff in attributes.	2012-02-09 16:36:29 +00:00
Gabriel Wicke	16ded7d955	Fix a bug in wikilink with trail tokenization.	2012-02-09 14:06:35 +00:00
Gabriel Wicke	6983481561	Move attribute expansion back to separate handler, as this makes it easier to only expand used branches selected by parser functions. Template (and -argument) expansion is simply registered before general expansion. Additionally, a few more simple time-based magic words are added in ParserFunctions.	2012-02-09 13:44:20 +00:00
Gabriel Wicke	3f7c1499cd	Enable support for general preprocessor functionality in attribute keys and values. This includes comments, templates and template arguments. This also replaces the specialized expansion logic in the TemplateHandler. The removal of link validation lets one more parser test fail for now. External link target validation will need to be implemented in the token stream handler for links. This is noted as TODO in https://www.mediawiki.org/wiki/Future/Parser_development#Token_stream_transforms.	2012-02-08 15:10:30 +00:00
Gabriel Wicke	157c495a9e	Normalize the title in localurl. 232 tests passing.	2012-02-07 12:26:00 +00:00
Gabriel Wicke	b4892102a4	Clean up transform callback interface	2012-02-07 11:53:29 +00:00
Gabriel Wicke	1f6db903e9	Pluck a few low-hanging fruit in external link tokenization, and add a simple localurl parser function implementation. 230 parser tests now passing.	2012-02-07 10:28:23 +00:00
Gabriel Wicke	cf8b7bf45d	External links don't nest.	2012-02-07 09:38:28 +00:00
Gabriel Wicke	53bf4f2bd0	Temporarily disable the sanitizer and start to support preprocessor functionality (comments, templates, template arguments) in arbitrary attributes. The grammar for this is still quite rough, will need to consolidate that area.	2012-02-06 19:15:44 +00:00
Gabriel Wicke	c26243989e	Improve toJSON handlers to include all properties	2012-02-06 19:12:29 +00:00
Gabriel Wicke	0bea9fdfbb	Fix nowiki tokenization regression introduced r110495	2012-02-03 13:10:04 +00:00
Gabriel Wicke	26f2026cff	Add custom JSON serializers for tokens that include a type attribute	2012-02-03 13:09:01 +00:00
Gabriel Wicke	8c75aa1a7a	Remove type attribute for tag tokens.	2012-02-01 18:37:48 +00:00
Gabriel Wicke	689f697a93	Push token format conversion a bit further along, and add defines that were missing in last commit.	2012-02-01 17:03:08 +00:00
Gabriel Wicke	a5cc10a06b	Change token format to plain strings for text tokens, and specific objects for other tokens. This is only the first half of the conversion. The next step is to drop the type attribute on most tokens and match on the constructor in the token transform machinery.	2012-02-01 16:30:43 +00:00
Gabriel Wicke	dd3707ded5	Remove some modules normally bundled with node.js from dependencies, and remove some older ones that are only used in currently-dead code.	2012-02-01 10:32:33 +00:00
Gabriel Wicke	e65c6502c0	Add source for #time implementation in comment	2012-02-01 10:14:01 +00:00
Gabriel Wicke	14a8a13678	A few more debug helpers including a --trace mode for light debugging. Some improvements to parser functions on the way to support the cite extensions. Preparation for generic template and template arg in attribute support. 222 parser tests now passing.	2012-01-31 16:50:16 +00:00
Neil Kandalgaonkar	2688f823ef	added dependencies to README	2012-01-31 00:56:07 +00:00
Neil Kandalgaonkar	f0b934ef2e	first pass at an API method that returns wikidom. Shells out to node. Some issues with XML API result formatting but works fine in JSON	2012-01-31 00:02:48 +00:00
Gabriel Wicke	7cd94df47d	A few minor tweaks to reduce memory usage	2012-01-27 13:32:44 +00:00
Gabriel Wicke	4e6a54560a	* Emit token chunks for top-level block elements by patching the source of the tokenizer * Fix a bug uncovered by this * Increase the number of outstanding listeners on a single download to 10000	2012-01-22 23:21:53 +00:00
Gabriel Wicke	7ea4d7d3db	A few parser function fixes and maximum template expansion in environment config.	2012-01-22 19:32:28 +00:00
Gabriel Wicke	561cf3c237	Bug fixes and a first stab at a #time parser function. You can expand the main page like this: cd extensions/VisualEditor/modules/parser echo '{{:Main Page}}' | node parse.js echo '{{:Main Page}}' | node parse.js --html echo '{{:Main Page}}' | node parse.js --debug Even the date-based includes work somewhat, although they don't yet accept passed-in dates.	2012-01-22 07:07:16 +00:00
Gabriel Wicke	60e45bb739	A bit of template expansion bug fixing and parser function documentation	2012-01-22 01:27:22 +00:00
Gabriel Wicke	e8a7034acf	Add some commandline switches to parse.js. Supports switching on/off debug mode and a selection of html/WikiDom serialization.	2012-01-21 22:42:54 +00:00
Gabriel Wicke	785a4af76f	Implement a few parser functions. 220 parser tests now passing.	2012-01-21 20:38:13 +00:00
Gabriel Wicke	1a6546fbca	Support empty template arguments and default values in arg expansion	2012-01-21 03:03:33 +00:00
Gabriel Wicke	fdd048b3b2	Remove a few stray debug prints and disable debugging in parse.js	2012-01-20 22:21:33 +00:00
Gabriel Wicke	145df2655c	* NoInclude and IncludeOnly improvements * Tokenizer support for templates and template args in template arguments and titles * Async attribute expansion fixes	2012-01-20 22:02:23 +00:00
Gabriel Wicke	348cac6cf0	Fix a bug in TokenCollector, and misc tweaks for template expansions.	2012-01-20 18:47:17 +00:00
Gabriel Wicke	7cc8e69147	Collapse all requests per template into a single outstanding request using an event-emitting TemplateRequest object and a request queue.	2012-01-20 02:36:18 +00:00
Gabriel Wicke	fc2088bb21	Add some rudimentary noinclude / includeonly support and fix up TokenCollector.	2012-01-20 01:46:16 +00:00
Gabriel Wicke	c15e0d4167	Minor cleanup in TemplateHandler	2012-01-20 00:49:27 +00:00
Gabriel Wicke	d0ece16c86	Fix async template expansion, so we can now render simple pages with templates directly to WikiDom from enwiki using a commandline like this: echo '{{User:GWicke/Test}}' | node parse.js Wohoo! Complex pages with templates won't render properly yet, as noinclude / includeonly and parser functions are not yet implemented. As a result, the parser will run out of memory or hit the currently low expansion depth limit as it tries to expand documentation for all templates.	2012-01-19 23:43:39 +00:00
Gabriel Wicke	2233d0a488	Eventify parser tests and parse.js commandline wrapper to actually allow async template fetching. Async expansion is not yet fully debugged, but at least the preconditions for that are now there.	2012-01-18 23:46:01 +00:00
Gabriel Wicke	5b8054636e	Make template fetching somewhat functional on node with Inez' help, but disable it by default in parserTests as it tries to fetch all sorts of parser functions and is not yet fully supported in parserTests. The next step will be to build a list of parser functions (to avoid fetching them as templates) and pushing the event interface into parserTests.	2012-01-18 19:38:32 +00:00
Gabriel Wicke	4bd4307924	Fix comment to reflect the actual regexp/spec in the JS version as well.	2012-01-18 19:35:13 +00:00
Gabriel Wicke	14e6728cc4	Add the start of a minimal sanitizer stage, that only strips IDN ignored characters from host portions of link hrefs for now. This module needs to be filled up with pretty much everything Sanitizer.php does, including tag and attribute whitelists and attribute value sanitization (especially for style attributes). We'll also need to think about round-tripping of sanitized tokens.	2012-01-18 01:42:56 +00:00
Gabriel Wicke	336be4f617	Eat '[[[' as plain text token, makes it 212 passing.	2012-01-18 00:23:17 +00:00
Gabriel Wicke	178adbc342	Accept IPv6 (and IPv4) addresses in the tokenizer, so another test passes.	2012-01-18 00:00:47 +00:00
Gabriel Wicke	e7381da5b8	Trim whitespace off template titles and argument names. 209 parser tests now passing.	2012-01-17 23:18:33 +00:00
Gabriel Wicke	f50fecf1e3	Fix template argument expansion. 200 parser tests now passing.	2012-01-17 22:29:26 +00:00
Gabriel Wicke	34025251a3	Clean up 'END' token handling a bit.	2012-01-17 20:01:21 +00:00
Gabriel Wicke	7f579398c7	Use isBlockTag in DOMPostProcessor	2012-01-17 18:30:22 +00:00
Gabriel Wicke	6bd7ca1e75	Misc improvements, now 196 parser tests passing. * Add handler for post-expand paragraph wrapping on token stream, to handle things like comments on its own line post-expand * Add general Util module * Fix self-closing tag handling in HTML5 tree builder	2012-01-17 18:22:10 +00:00
Gabriel Wicke	f4081bef08	First template expansion tests start working, and a bug fix in DOMPostProcessor paragraph wrapper. 187 parser tests now passing.	2012-01-14 00:58:20 +00:00
Gabriel Wicke	196d704e8e	Template expansion now enabled and somewhat working, but template fetching still fails all the time.	2012-01-13 18:48:25 +00:00
Gabriel Wicke	32c9bccd7c	Results of early template expansion debugging. Still disabled by default, but getting closer.	2012-01-11 19:48:49 +00:00
Gabriel Wicke	6b6ec2933d	More work towards template expansion. * Created AttributeTokenTransformManager for generic attribute conversion, and removed { title, template argument {key, value} } expansion from TemplateHandler. * Added caching for attribute and input sub-pipelines. Especially attribute pipelines would otherwise be recreated for each attribute value and key.	2012-01-11 00:05:51 +00:00
Gabriel Wicke	5ec30252f1	More token transform and pipeline setup refactoring to support template expansion better.	2012-01-10 01:09:50 +00:00
Gabriel Wicke	287604c422	A bit of cleanup in ParserPipeline, with better and more consistent support for multiple input types.	2012-01-09 19:33:49 +00:00
Gabriel Wicke	becf3cb7ea	Add generic 'collect all tokens between delimiter tokens and call a transform function on it' util for synchronous transformation phases. This can be used to implement parser hooks (aka extension tags) besides other things.	2012-01-09 18:13:45 +00:00
Gabriel Wicke	e99d7a2a55	Two batteries worth of token transform manager refactoring. * TokenTransformDispatcher is now renamed to TokenTransformManager, and is also turned into a base class * SyncTokenTransformManager and AsyncTokenTransformManager subclass TokenTransformManager and implement synchronous (phase 1,3) and asynchronous (phase 2) transformation stages. * Communication between stages uses the same chunk / end events as all the other token stages. * The AsyncTokenTransformManager now supports the creation of nested AsyncTokenTransformManagers for template expansion. The AsyncTokenTransformManager object takes on the responsibilities of a preprocessor frame. Transforms are newly created (or potentially resurrected from a cache), so that transforms do not have to worry about concurrency. * The environment is pushed through to all transform managers and the individual transforms.	2012-01-09 17:49:16 +00:00
Gabriel Wicke	6601c544e6	Handle default for template arg expansion, add template fetch functionality and tweak a few minor things in the grammar and QuoteTransformer.	2012-01-06 17:19:14 +00:00
Gabriel Wicke	f0c844f28f	Add template expansion handler skeleton, not yet functional. Also note improvements needed in the tokenizer template handling.	2012-01-06 14:30:55 +00:00
Gabriel Wicke	2e35171fd1	Fix quote handling and tweak the whitelist a bit. 'any' token registrations are now merged with specific registrations by rank. Not yet clear if that is a good idea overall, need to check use cases when implementing template expansion and other functionality. 183 parser tests now passing.	2012-01-04 14:09:05 +00:00
Gabriel Wicke	6cd95fea37	Fix up constructors in EventEmitter inheritance and tweak a few more comments.	2012-01-04 12:28:41 +00:00
Gabriel Wicke	e3ae9a702b	Fix JSHint warnings (mostly about comment indentation) from r108012.	2012-01-04 11:06:24 +00:00
Gabriel Wicke	4c4a24f0a0	Hook up the DOMPostProcessor using events as well, and rename the subscription methods to tell a story. Also document idea on how to dynamically configure the pipeline depending on event registrations in comment.	2012-01-04 11:00:54 +00:00
Gabriel Wicke	f0399d2ec5	Clean up comments in TokenTransformDispatcher and mark private methods with underscore.	2012-01-04 09:48:24 +00:00
Gabriel Wicke	ee79158e53	Add trailing newline in commandline parser wrapper	2012-01-04 08:42:53 +00:00
Gabriel Wicke	29362cc53c	Rename ParseThingy to ParserPipeline and fix up broken WikiDom generation and commandline runner.	2012-01-04 08:39:45 +00:00
Gabriel Wicke	bd98eb4c5a	Land big TokenTransformDispatcher and eventization refactoring. The TokenTransformDispatcher now actually implements an asynchronous, phased token transformation framework as described in https://www.mediawiki.org/wiki/Future/Parser_development/Token_stream_transformations. Additionally, the parser pipeline is now mostly held together using events. The tokenizer still emits a lame single event with all tokens, as block-level emission failed with scoping issues specific to the PEGJS parser generator. All stages clean up when receiving the end tokens, so that the full pipeline can be used for repeated parsing. The QuoteTransformer is not yet 100% fixed to work with the new interface, and the Cite extension is disabled for now pending adaptation. Bold-italic related tests are failing currently.	2012-01-03 18:44:31 +00:00
Neil Kandalgaonkar	20374b5911	fix substr for IE, followup r107464	2011-12-30 21:51:03 +00:00
Gabriel Wicke	8e00a72d0a	Improvements to link trail handling, and two tweaks to the whitelist. 182 tests now passing. Link trails depend on language-dependent positive character classes in the PHP parser. These classes all seem to disallow punctuation implicitly and list differing plain text characters instead, so it might be possible to get away with identifying a common class of non-trail punctuation instead. This would help to keep the tokenizer independent of configurations, which is very desirable for caching and simplified external parsing.	2011-12-30 12:47:06 +00:00
Gabriel Wicke	11ece76b7b	Fix suffix handling for wiki links.	2011-12-30 09:35:57 +00:00
Gabriel Wicke	b3a0270d69	Remove env and load grammar in tokenizer constructor. Re-add property hack to keep parserTests running for now. Really need a different pipeline for html serialization or a reference to the HTML DOM.	2011-12-28 17:04:16 +00:00
Gabriel Wicke	3a63fb118e	Add a few comments inline, and remove unneeded html serialization as we are only interested in WikiDom output in this parser wrapper.	2011-12-28 13:46:52 +00:00
Neil Kandalgaonkar	8fbf36e63e	add the terminal token inside the tokenize method (will pull it out again for streaming interface)	2011-12-28 01:37:15 +00:00
Neil Kandalgaonkar	6103646ec8	remove need to add newline at end of input	2011-12-28 01:37:11 +00:00
Neil Kandalgaonkar	4158f82d7e	refactor parser to ParseThingy in different module, can be invoked with command line utility parse.js	2011-12-28 01:37:06 +00:00
Neil Kandalgaonkar	d91a67ba99	nodeName not defined	2011-12-28 01:36:54 +00:00
Neil Kandalgaonkar	962d1262fc	create tokenizer without need to modify namespace with PEG source	2011-12-28 01:36:36 +00:00
Gabriel Wicke	33e60dd4d9	Update comments a bit.	2011-12-22 12:37:24 +00:00
Gabriel Wicke	9ee0e660ec	Fix regression introduced by r107060 for regular table cells. Good to have a test suite ;)	2011-12-22 12:09:25 +00:00
Gabriel Wicke	a94d0ec10c	Re-add support for row-only tables.	2011-12-22 11:58:32 +00:00
Gabriel Wicke	1c7fe0eb34	Refactor table productions to support table fragments in templates (table start / row / table end). The old productions are not deleted yet to make it easy to compare the output on more complex articles. 181 tests passing after adding two table tests with whitespace-only differences to the whitelist.	2011-12-22 11:43:55 +00:00
Gabriel Wicke	2845ba9552	Handle noinclude and includeonly at start of line, so that syntax after it still matches as if it actually was preceded by a newline.	2011-12-21 11:38:50 +00:00
Gabriel Wicke	3a631db6d9	Fix ranges for annotations in implicit paragraphs within branch nodes.	2011-12-16 19:36:04 +00:00
Gabriel Wicke	cc06551f2e	Rename table_header production to table_heading. Those non-natives strike again.	2011-12-16 19:24:59 +00:00
Gabriel Wicke	605ed23fd2	Fix attributes in table headings.	2011-12-16 19:22:13 +00:00
Gabriel Wicke	08255ff3e6	Small bug fix to heading level, spotted by Mike from localwiki- thanks!	2011-12-15 23:59:35 +00:00
Gabriel Wicke	a04744b2ec	Add some more attribute remapping capabilities to the DOMConverter, and clean up some grammar formatting.	2011-12-15 17:33:07 +00:00
Gabriel Wicke	e98dd9e722	Implement 1-char-minimum width for annotations, and some additional minor cleanup.	2011-12-15 11:05:52 +00:00
Gabriel Wicke	22ba27295b	Clean up the DOMConverter a bit.	2011-12-15 10:55:30 +00:00
Gabriel Wicke	e72dee76e4	Follow-up to r106208 and r106207. Both good catches, thanks Yair! As this code is in its early stages and nowhere near deployment, please Be Bold and just commit things like this directly! IMHO it makes more sense to fully review this once it settles down a bit.	2011-12-15 10:13:50 +00:00
Gabriel Wicke	3585bd9c8e	Accept row-only tables. The parser now eats [[en:Barack Obama]] as-is. Hooray!	2011-12-15 00:39:28 +00:00
Gabriel Wicke	6df94a34a1	Less lust for urls	2011-12-15 00:26:22 +00:00
Gabriel Wicke	ce2ee067f7	Minor tweak to wiki link production	2011-12-15 00:12:58 +00:00
Gabriel Wicke	377226a120	Comment out a stray console.log	2011-12-14 23:44:58 +00:00
Gabriel Wicke	574abd9774	A collection of small bug fixes to the grammar, Cite, the Token format converter and the HTML DOM -> WikiDom converter. The tokenizer now digests all parserTests.	2011-12-14 23:38:46 +00:00
Gabriel Wicke	dc77d73ad5	Add ability to pass through JSON data to WikiDom in data-json-* attributes, and fix parser to actually parse the Barack Obama article except for one table with nested templates at the start-of-line.	2011-12-14 17:25:09 +00:00
Gabriel Wicke	f6e4267fca	Handle a few more element types, and reset offset for each leaf node. Not sure if the latter is correct, as the documentation at https://www.mediawiki.org/wiki/Visual_editor/Software_design#Data_Structures and the actual sample WikiDom in the editor sandbox seem to disagree on this point.	2011-12-14 16:22:27 +00:00
Gabriel Wicke	6676a47008	Add implicit level attribute to WikiDom headings.	2011-12-14 15:55:58 +00:00
Gabriel Wicke	3018ca690b	Improve WikiDom conversion: Handle text and annotations in branch nodes as paragraphs and treat list items as branches.	2011-12-14 15:40:40 +00:00
Gabriel Wicke	a09aa4d599	Add rough HTML DOM to WikiDom conversion. You can see serialized WikiDom of parser tests using 'node parserTests.js --wikidom'.	2011-12-14 15:15:41 +00:00
Gabriel Wicke	5f80d30428	Clean up access to document and body after building the tree.	2011-12-14 09:40:49 +00:00
Gabriel Wicke	30749b8d8d	Update comments a bit and add a note on things to improve in API.	2011-12-14 09:33:25 +00:00
Gabriel Wicke	55ff272847	Comment TokenTransformDispatcher.	2011-12-13 20:13:09 +00:00
Gabriel Wicke	44deefe303	Minor tweak to comment.	2011-12-13 18:55:44 +00:00
Gabriel Wicke	c61b32eaa7	Clean up and comment the Cite extension a bit.	2011-12-13 18:45:09 +00:00


425 commits