Gabriel Wicke
3568dfee14
Add some support for functionhooks in test parser and parserTests.js, and
...
tweak a few parser functions.
2012-02-22 15:59:11 +00:00
Gabriel Wicke
d7da324272
Basic fall-through support for #switch parser function
2012-02-22 14:57:50 +00:00
Gabriel Wicke
491ad5ffef
Cleanup and commenting.
2012-02-22 13:13:18 +00:00
Gabriel Wicke
9b3313d923
Speed up flatten slightly by avoiding garbage for already flat arrays. Also,
...
use simple string concatenation instead of arrays as the strings tend to be
few and short.
2012-02-22 11:25:44 +00:00
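The "avoid garbage for already flat arrays" idea in the commit above can be illustrated roughly as follows. This is only a sketch under assumptions: the `flatten` name and its one-level behaviour are hypothetical and not taken from the project's code.

```javascript
// Hypothetical one-level flatten. If no element is an array, the input is
// returned unchanged, so already-flat input allocates nothing.
function flatten(chunks) {
    for (let i = 0; i < chunks.length; i++) {
        if (Array.isArray(chunks[i])) {
            // First nested array found: copy the flat prefix once,
            // then append the remaining items one level deep.
            const out = chunks.slice(0, i);
            for (let j = i; j < chunks.length; j++) {
                const item = chunks[j];
                if (Array.isArray(item)) {
                    for (let k = 0; k < item.length; k++) {
                        out.push(item[k]);
                    }
                } else {
                    out.push(item);
                }
            }
            return out;
        }
    }
    return chunks; // already flat, no new array created
}
```

The fast path costs one scan of the input; a copy is made only when a nested array is actually present.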
Gabriel Wicke
8dde1f77b4
Reduce debug print overhead, roughly a 10% speed-up on parserTests.
2012-02-21 18:49:43 +00:00
Gabriel Wicke
058c4213a4
Remove some more unused code and tidy up some more.
2012-02-21 18:26:40 +00:00
Gabriel Wicke
416126c041
Fix the bug in the inline_breaks replacement, and write another switch-based
...
version, which is slightly faster and shorter. Performance is improved by
about 5% for parserTests.
2012-02-21 17:57:30 +00:00
Gabriel Wicke
18a04f7581
Tidy up and comment the tokenizer a bit more. Start to move code into
...
mediawiki.tokenizer.js module, and pass a reference to parse(). Faster
inline_breaks production using a JS function which seems to be generally
correct, but still breaks five tests when enabled. Seems to be some weird
interaction with peg.js, possibly something to do with caching.
2012-02-21 17:21:42 +00:00
Gabriel Wicke
8718bd65bc
Add list of HTML5 and deprecated HTML3/4 elements in preparation for
...
end-of-potential-extension rules; Support indented tag-wrapped pre blocks.
2012-02-21 14:44:56 +00:00
Gabriel Wicke
ffec77273a
Comment and minor code tweaks.
2012-02-21 11:24:20 +00:00
au
ea15bffb27
Revert "* Always sort attributes (+1 test pass)."
...
This reverts commit 45ca281da8eef8030bdd1986418cb914fc9a717c.
2012-02-20 22:26:12 +00:00
Gabriel Wicke
5806705733
Push transformer setup a bit further into the attribute pipeline.
2012-02-20 12:56:00 +00:00
Gabriel Wicke
8eddb4ec6b
Add some comments to the Sanitizer
2012-02-20 11:14:53 +00:00
Gabriel Wicke
71e95bd54b
Set up token stream transformers from a map of phases per input content type.
...
Not yet applied to attribute pipeline creation. 249 tests passing.
2012-02-20 11:07:21 +00:00
au
9c55f5e8b7
* Always sort attributes (+1 test pass).
...
The performance impact for .sort is quite small (12.079s => 12.158s)
and Sanitizer is probably one of the more accessible places to do this.
2012-02-18 21:01:07 +00:00
au
aa589d989b
* Rudimentary CSS validation; +4 tests pass. (Bug 2304, 3244).
2012-02-18 20:16:23 +00:00
Gabriel Wicke
4d80b8daa8
Detail comments about next steps and divide parser functions into those that
...
need more information from the wiki and readily implementable items.
2012-02-17 10:23:14 +00:00
Gabriel Wicke
059ff94bc4
Reject match for invalid urlencoded code points.
2012-02-16 13:57:56 +00:00
Gabriel Wicke
dc1d30fcb5
Tweaked template parameters a bit further, and made the self-closing tag
...
protection a bit less trigger-happy.
2012-02-15 15:56:11 +00:00
Gabriel Wicke
089413298c
Protect self-closing tags in generic attribute production.
2012-02-15 13:23:50 +00:00
Gabriel Wicke
5e94a238fc
Prepare for the support of tables (and later generally block-level elements)
...
in template parameters. 244 tests passing.
2012-02-15 11:51:29 +00:00
Gabriel Wicke
774a3189c8
Improve support for generic attribute names coming from
...
templates/templateargs.
2012-02-15 10:19:39 +00:00
Gabriel Wicke
1ce6f5a3c4
Improve support for single-line attributes with preprocessor support. 243
...
tests passing.
2012-02-14 21:25:52 +00:00
Gabriel Wicke
f02b3d91c6
Port urlencoded char support to preprocessor-supporting link target
...
production, and remove old link_target production.
2012-02-14 21:08:25 +00:00
Gabriel Wicke
001194b140
Replace console.log with console.warn in all debug statements
2012-02-14 20:56:14 +00:00
Gabriel Wicke
f42b379e52
Fix named wikilink options (image options really) in template arguments, and
...
speed up template parameter parsing by eliminating some backtracking. 238
tests passing (unchanged).
2012-02-14 15:45:18 +00:00
Gabriel Wicke
64f63b3714
request is automatically installed by jsdom. Follow-up to r111459. Thanks
...
Hashar!
2012-02-14 14:15:50 +00:00
Gabriel Wicke
466e8e54ad
Tweak comment about request module
2012-02-14 14:01:13 +00:00
Gabriel Wicke
0b8d1b0387
* Add custom toString methods for tokens to aid debugging
...
* Convert all attributes into strings in Sanitizer
* Use strict comparison against empty string in tokenizer
* Add very simple sitename parserfunction
* 138 tests passing
2012-02-13 17:02:23 +00:00
Gabriel Wicke
9945175416
Reformat Date.replaceChars
2012-02-13 14:23:48 +00:00
Gabriel Wicke
0b40741e1c
Strip trailing newlines from included templates
2012-02-13 14:17:03 +00:00
Gabriel Wicke
025f9cddb3
Prefix all internal data- attributes with data-mw- and adjust the whitelist
...
and test output normalization accordingly. 235 tests passing.
2012-02-13 13:54:07 +00:00
Gabriel Wicke
b1617b1d71
Add some support for ideographic spaces in external links, support the
...
int: namespace alias and perform some normalization on the MediaWiki namespace
prefix.
2012-02-13 13:35:46 +00:00
Gabriel Wicke
55ddb4fd66
Remove WikiDom default serialization and --html argument from parse.js
...
wrapper. HTML is now the only supported format. The DOMConverter is now no
longer used. Roan, feel free to remove / butcher it for direct HTML to linear
model conversion.
2012-02-11 17:59:17 +00:00
Gabriel Wicke
a122e51eec
Move data-* annotations into separate object on tokens, that is then
...
serialized into a single data-mw-rt attribute if present. Update parserTests
to ignore this attribute for comparisons with expected parser output.
A few more tweaks and notes are thrown into this commit too. 233 tests are
passing now.
2012-02-11 16:43:25 +00:00
Gabriel Wicke
aff30be131
Some comments and reshuffling in the grammar, and a typo in the
...
AttributeExpander.
2012-02-09 22:27:45 +00:00
Gabriel Wicke
6e33255503
Improve support for preprocessor functionality in attributes; Support
...
multi-line xmlish tags with preprocessor stuff in attributes.
2012-02-09 16:36:29 +00:00
Gabriel Wicke
16ded7d955
Fix a bug in wikilink with trail tokenization.
2012-02-09 14:06:35 +00:00
Gabriel Wicke
6983481561
Move attribute expansion back to separate handler, as this makes it easier to
...
only expand used branches selected by parser functions. Template (and
-argument) expansion is simply registered before general expansion.
Additionally, a few more simple time-based magic words are added in
ParserFunctions.
2012-02-09 13:44:20 +00:00
Gabriel Wicke
3f7c1499cd
Enable support for general preprocessor functionality in attribute keys and
...
values. This includes comments, templates and template arguments.
This also replaces the specialized expansion logic in the TemplateHandler. The
removal of link validation lets one more parser test fail for now. External
link target validation will need to be implemented in the token stream handler
for links. This is noted as TODO in
https://www.mediawiki.org/wiki/Future/Parser_development#Token_stream_transforms .
2012-02-08 15:10:30 +00:00
Gabriel Wicke
157c495a9e
Normalize the title in localurl. 232 tests passing.
2012-02-07 12:26:00 +00:00
Gabriel Wicke
b4892102a4
Clean up transform callback interface
2012-02-07 11:53:29 +00:00
Gabriel Wicke
1f6db903e9
Pluck a few low-hanging fruit in external link tokenization, and add a simple
...
localurl parser function implementation. 230 parser tests now passing.
2012-02-07 10:28:23 +00:00
Gabriel Wicke
cf8b7bf45d
External links don't nest.
2012-02-07 09:38:28 +00:00
Gabriel Wicke
53bf4f2bd0
Temporarily disable the sanitizer and start to support preprocessor
...
functionality (comments, templates, template arguments) in arbitrary
attributes. The grammar for this is still quite rough, will need to
consolidate that area.
2012-02-06 19:15:44 +00:00
Gabriel Wicke
c26243989e
Improve toJSON handlers to include all properties
2012-02-06 19:12:29 +00:00
Gabriel Wicke
0bea9fdfbb
Fix nowiki tokenization regression introduced in r110495
2012-02-03 13:10:04 +00:00
Gabriel Wicke
26f2026cff
Add custom JSON serializers for tokens that include a type attribute
2012-02-03 13:09:01 +00:00
Gabriel Wicke
8c75aa1a7a
Remove type attribute for tag tokens.
2012-02-01 18:37:48 +00:00
Gabriel Wicke
689f697a93
Push token format conversion a bit further along, and add defines that were
...
missing in last commit.
2012-02-01 17:03:08 +00:00
Gabriel Wicke
a5cc10a06b
Change token format to plain strings for text tokens, and specific objects for
...
other tokens. This is only the first half of the conversion. The next step is
to drop the type attribute on most tokens and match on the constructor in the
token transform machinery.
2012-02-01 16:30:43 +00:00
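The token representation described in the commit above might look roughly like this. It is a minimal sketch, not the project's code: the `TagTk` and `EndTagTk` constructors and the `describe` helper are hypothetical names chosen for illustration.

```javascript
// Hypothetical token constructors. Text tokens are plain strings and
// need no wrapper object.
function TagTk(name, attribs) {
    this.name = name;
    this.attribs = attribs || [];
}

function EndTagTk(name) {
    this.name = name;
}

// Match on the constructor instead of reading a `type` attribute.
function describe(token) {
    if (typeof token === 'string') {
        return 'text:' + token;
    }
    switch (token.constructor) {
        case TagTk:
            return 'start:' + token.name;
        case EndTagTk:
            return 'end:' + token.name;
        default:
            return 'unknown';
    }
}
```

With this shape, a stream such as `['Hello ', new TagTk('b'), 'world', new EndTagTk('b')]` carries no per-token type field for the text parts.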
Gabriel Wicke
dd3707ded5
Remove some modules normally bundled with node.js from dependencies, and
...
remove some older ones that are only used in currently-dead code.
2012-02-01 10:32:33 +00:00
Gabriel Wicke
e65c6502c0
Add source for #time implementation in comment
2012-02-01 10:14:01 +00:00
Gabriel Wicke
14a8a13678
A few more debug helpers including a --trace mode for light debugging. Some
...
improvements to parser functions on the way to support the cite extensions.
Preparation for generic template and template arg in attribute support. 222
parser tests now passing.
2012-01-31 16:50:16 +00:00
Neil Kandalgaonkar
2688f823ef
added dependencies to README
2012-01-31 00:56:07 +00:00
Neil Kandalgaonkar
f0b934ef2e
first pass at an API method that returns wikidom. Shells out to node. Some issues with XML API result formatting but works fine in JSON
2012-01-31 00:02:48 +00:00
Gabriel Wicke
7cd94df47d
A few minor tweaks to reduce memory usage
2012-01-27 13:32:44 +00:00
Gabriel Wicke
4e6a54560a
* Emit token chunks for top-level block elements by patching the source of the
...
tokenizer
* Fix a bug uncovered by this
* Increase the number of outstanding listeners on a single download to 10000
2012-01-22 23:21:53 +00:00
Gabriel Wicke
7ea4d7d3db
A few parser function fixes and maximum template expansion in environment
...
config.
2012-01-22 19:32:28 +00:00
Gabriel Wicke
561cf3c237
Bug fixes and a first stab at a #time parser function. You can expand the main
...
page like this:
cd extensions/VisualEditor/modules/parser
echo '{{:Main Page}}' | node parse.js
echo '{{:Main Page}}' | node parse.js --html
echo '{{:Main Page}}' | node parse.js --debug
Even the date-based includes work somewhat, although they don't yet accept
passed-in dates.
2012-01-22 07:07:16 +00:00
Gabriel Wicke
60e45bb739
A bit of template expansion bug fixing and parser function documentation
2012-01-22 01:27:22 +00:00
Gabriel Wicke
e8a7034acf
Add some commandline switches to parse.js. Supports switching on/off debug
...
mode and a selection of html/WikiDom serialization.
2012-01-21 22:42:54 +00:00
Gabriel Wicke
785a4af76f
Implement a few parser functions. 220 parser tests now passing.
2012-01-21 20:38:13 +00:00
Gabriel Wicke
1a6546fbca
Support empty template arguments and default values in arg expansion
2012-01-21 03:03:33 +00:00
Gabriel Wicke
fdd048b3b2
Remove a few stray debug prints and disable debugging in parse.js
2012-01-20 22:21:33 +00:00
Gabriel Wicke
145df2655c
* NoInclude and IncludeOnly improvements
...
* Tokenizer support for templates and template args in template arguments and titles
* Async attribute expansion fixes
2012-01-20 22:02:23 +00:00
Gabriel Wicke
348cac6cf0
Fix a bug in TokenCollector, and misc tweaks for template expansions.
2012-01-20 18:47:17 +00:00
Gabriel Wicke
7cc8e69147
Collapse all requests per template into a single outstanding request using an
...
event-emitting TemplateRequest object and a request queue.
2012-01-20 02:36:18 +00:00
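The request collapsing described in the commit above can be sketched as below. Only the `TemplateRequest` name comes from the commit message; the `RequestQueue` class, the `'src'` event name and the `fetchSource(title, cb)` signature are assumptions made for illustration, and `fetchSource` is assumed to call back asynchronously.

```javascript
const { EventEmitter } = require('events');

// Hypothetical sketch: one in-flight fetch per template title.
// Assumes fetchSource(title, cb) calls cb(err, src) at some later point.
class TemplateRequest extends EventEmitter {
    constructor(title, fetchSource) {
        super();
        // Many waiters on a single request are expected, so lift the
        // default listener limit (0 means unlimited).
        this.setMaxListeners(0);
        fetchSource(title, (err, src) => this.emit('src', err, src));
    }
}

class RequestQueue {
    constructor(fetchSource) {
        this.fetchSource = fetchSource;
        this.pending = new Map(); // title -> TemplateRequest
    }

    fetch(title, callback) {
        let request = this.pending.get(title);
        if (!request) {
            request = new TemplateRequest(title, this.fetchSource);
            this.pending.set(title, request);
            // Forget the request once it has completed.
            request.once('src', () => this.pending.delete(title));
        }
        // Every caller for the same title shares the one request.
        request.once('src', callback);
    }
}
```

A page that transcludes the same template many times then triggers a single fetch, and every waiting expansion is notified from the same event.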
Gabriel Wicke
fc2088bb21
Add some rudimentary noinclude / includeonly support and fix up
...
TokenCollector.
2012-01-20 01:46:16 +00:00
Gabriel Wicke
c15e0d4167
Minor cleanup in TemplateHandler
2012-01-20 00:49:27 +00:00
Gabriel Wicke
d0ece16c86
Fix async template expansion, so we can now render simple pages with templates
...
directly to WikiDom from enwiki using a commandline like this:
echo '{{User:GWicke/Test}}' | node parse.js
Wohoo!
Complex pages with templates won't render properly yet, as noinclude /
includeonly and parser functions are not yet implemented. As a result, the
parser will run out of memory or hit the currently low expansion depth limit
as it tries to expand documentation for all templates.
2012-01-19 23:43:39 +00:00
Gabriel Wicke
2233d0a488
Eventify parser tests and parse.js commandline wrapper to actually allow
...
async template fetching. Async expansion is not yet fully debugged, but at
least the preconditions for that are now there.
2012-01-18 23:46:01 +00:00
Gabriel Wicke
5b8054636e
Make template fetching somewhat functional on node with Inez' help, but
...
disable it by default in parserTests as it tries to fetch all sorts of parser
functions and is not yet fully supported in parserTests. The next step will be
to build a list of parser functions (to avoid fetching them as templates) and
pushing the event interface into parserTests.
2012-01-18 19:38:32 +00:00
Gabriel Wicke
4bd4307924
Fix comment to reflect the actual regexp/spec in the JS version as well.
2012-01-18 19:35:13 +00:00
Gabriel Wicke
14e6728cc4
Add the start of a minimal sanitizer stage, that only strips IDN ignored
...
characters from host portions of links hrefs for now. This module needs to be
filled up with pretty much everything Sanitizer.php does, including tag and
attribute whitelists and attribute value sanitization (especially for style
attributes).
We'll also need to think about round-tripping of sanitized tokens.
2012-01-18 01:42:56 +00:00
Gabriel Wicke
336be4f617
Eat '[[[' as plain text token, makes it 212 passing.
2012-01-18 00:23:17 +00:00
Gabriel Wicke
178adbc342
Accept IPv6 (and IPv4) addresses in the tokenizer, so another test passes.
2012-01-18 00:00:47 +00:00
Gabriel Wicke
e7381da5b8
Trim whitespace off template titles and argument names. 209 parser tests now
...
passing.
2012-01-17 23:18:33 +00:00
Gabriel Wicke
f50fecf1e3
Fix template argument expansion. 200 parser tests now passing.
2012-01-17 22:29:26 +00:00
Gabriel Wicke
34025251a3
Clean up 'END' token handling a bit.
2012-01-17 20:01:21 +00:00
Gabriel Wicke
7f579398c7
Use isBlockTag in DOMPostProcessor
2012-01-17 18:30:22 +00:00
Gabriel Wicke
6bd7ca1e75
Misc improvements, now 196 parser tests passing.
...
* Add handler for post-expand paragraph wrapping on token stream, to handle
things like comments on their own line post-expand
* Add general Util module
* Fix self-closing tag handling in HTML5 tree builder
2012-01-17 18:22:10 +00:00
Gabriel Wicke
f4081bef08
First template expansion tests start working, and a bug fix in
...
DOMPostProcessor paragraph wrapper. 187 parser tests now passing.
2012-01-14 00:58:20 +00:00
Gabriel Wicke
196d704e8e
Template expansion now enabled and somewhat working, but template fetching
...
still fails all the time.
2012-01-13 18:48:25 +00:00
Gabriel Wicke
32c9bccd7c
Results of early template expansion debugging. Still disabled by default, but
...
getting closer.
2012-01-11 19:48:49 +00:00
Gabriel Wicke
6b6ec2933d
More work towards template expansion.
...
* Created AttributeTokenTransformManager for generic attribute conversion, and
removed { title, template argument {key, value} } expansion from
TemplateHandler.
* Added caching for attribute and input sub-pipelines. Especially attribute
pipelines would otherwise be recreated for each attribute value and key.
2012-01-11 00:05:51 +00:00
Gabriel Wicke
5ec30252f1
More token transform and pipeline setup refactoring to support template
...
expansion better.
2012-01-10 01:09:50 +00:00
Gabriel Wicke
287604c422
A bit of cleanup in ParserPipeline, with better and more consistent support
...
for multiple input types.
2012-01-09 19:33:49 +00:00
Gabriel Wicke
becf3cb7ea
Add generic 'collect all tokens between delimiter tokens and call a transform
...
function on it' util for synchronous transformation phases. This can be used
to implement parser hooks (aka extension tags) besides other things.
2012-01-09 18:13:45 +00:00
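A "collect all tokens between delimiter tokens" helper like the one described in the commit above could be sketched as follows. This is an illustration under assumptions, not the project's TokenCollector: the `makeCollector` name, the predicate arguments and the return-an-array convention are all hypothetical.

```javascript
// Hypothetical synchronous collector. Tokens outside a delimited range
// pass through; tokens inside it are buffered and handed to `transform`
// as one array when the end delimiter arrives.
function makeCollector(isStart, isEnd, transform) {
    let buffer = null; // null while outside a delimited range

    return function onToken(token) {
        if (buffer === null) {
            if (isStart(token)) {
                buffer = [token];
                return []; // swallow the token for now
            }
            return [token]; // pass through unchanged
        }
        buffer.push(token);
        if (isEnd(token)) {
            const collected = buffer;
            buffer = null;
            return transform(collected); // replacement tokens
        }
        return [];
    };
}
```

An extension tag handler would then only supply the two delimiter tests and a function that maps the collected tokens to replacement tokens.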
Gabriel Wicke
e99d7a2a55
Two batteries worth of token transform manager refactoring.
...
* TokenTransformDispatcher is now renamed to TokenTransformManager, and is
also turned into a base class
* SyncTokenTransformManager and AsyncTokenTransformManager subclass
TokenTransformManager and implement synchronous (phase 1,3) and asynchronous
(phase 2) transformation stages.
* Communication between stages uses the same chunk / end events as all the
other token stages.
* The AsyncTokenTransformManager now supports the creation of nested
AsyncTokenTransformManagers for template expansion.
The AsyncTokenTransformManager object takes on the responsibilities of a
preprocessor frame. Transforms are newly created (or potentially resurrected
from a cache), so that transforms do not have to worry about concurrency.
* The environment is pushed through to all transform managers and the
individual transforms.
2012-01-09 17:49:16 +00:00
Gabriel Wicke
6601c544e6
Handle default for template arg expansion, add template fetch functionality
...
and tweak a few minor things in the grammar and QuoteTransformer.
2012-01-06 17:19:14 +00:00
Gabriel Wicke
f0c844f28f
Add template expansion handler skeleton, not yet functional. Also note
...
improvements needed in the tokenizer template handling.
2012-01-06 14:30:55 +00:00
Gabriel Wicke
2e35171fd1
Fix quote handling and tweak the whitelist a bit. 'any' token registrations
...
are now merged with specific registrations by rank. Not yet clear if that is a
good idea overall, need to check use cases when implementing template expansion
and other functionality.
183 parser tests now passing.
2012-01-04 14:09:05 +00:00
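Merging 'any' registrations with type-specific ones by rank, as mentioned in the commit above, might be pictured like this. The `TransformRegistry` class and its methods are hypothetical, and the convention that a lower rank runs earlier is an assumption.

```javascript
// Hypothetical registry: handlers subscribe to a token type or to 'any'.
class TransformRegistry {
    constructor() {
        this.handlers = new Map(); // type -> [{ rank, fn }]
    }

    add(type, rank, fn) {
        if (!this.handlers.has(type)) {
            this.handlers.set(type, []);
        }
        this.handlers.get(type).push({ rank, fn });
    }

    // Merge 'any' handlers with the type-specific ones, ordered by rank
    // (assumed here: lower rank runs first). `type` is a concrete token
    // type, not 'any' itself.
    handlersFor(type) {
        const specific = this.handlers.get(type) || [];
        const any = this.handlers.get('any') || [];
        return specific.concat(any)
            .sort((a, b) => a.rank - b.rank)
            .map((h) => h.fn);
    }
}
```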
Gabriel Wicke
6cd95fea37
Fix up constructors in EventEmitter inheritance and tweak a few more comments.
2012-01-04 12:28:41 +00:00
Gabriel Wicke
e3ae9a702b
Fix JSHint warnings (mostly about comment indentation) from r108012.
2012-01-04 11:06:24 +00:00
Gabriel Wicke
4c4a24f0a0
Hook up the DOMPostProcessor using events as well, and rename the subscription
...
methods to tell a story. Also document idea on how to dynamically configure
the pipeline depending on event registrations in comment.
2012-01-04 11:00:54 +00:00
Gabriel Wicke
f0399d2ec5
Clean up comments in TokenTransformDispatcher and mark private methods with
...
underscore.
2012-01-04 09:48:24 +00:00
Gabriel Wicke
ee79158e53
Add trailing newline in commandline parser wrapper
2012-01-04 08:42:53 +00:00
Gabriel Wicke
29362cc53c
Rename ParseThingy to ParserPipeline and fix up broken WikiDom generation and
...
commandline runner.
2012-01-04 08:39:45 +00:00
Gabriel Wicke
bd98eb4c5a
Land big TokenTransformDispatcher and eventization refactoring.
...
The TokenTransformDispatcher now actually implements an asynchronous, phased
token transformation framework as described in
https://www.mediawiki.org/wiki/Future/Parser_development/Token_stream_transformations .
Additionally, the parser pipeline is now mostly held together using events.
The tokenizer still emits a lame single event with all tokens, as block-level
emission failed with scoping issues specific to the PEGJS parser generator.
All stages clean up when receiving the end tokens, so that the full pipeline
can be used for repeated parsing.
The QuoteTransformer is not yet 100% fixed to work with the new interface, and
the Cite extension is disabled for now pending adaptation. Bold-italic related
tests are failing currently.
2012-01-03 18:44:31 +00:00
Neil Kandalgaonkar
20374b5911
fix substr for IE, followup r107464
2011-12-30 21:51:03 +00:00
Gabriel Wicke
8e00a72d0a
Improvements to link trail handling, and two tweaks to the whitelist. 182
...
tests now passing.
Link trails depend on language-dependent positive character classes in the PHP
parser. These classes all seem to disallow punctuation implicitly and list
differing plain text characters instead, so it might be possible to get away
with identifying a common class of non-trail punctuation instead. This would
help to keep the tokenizer independent of configurations, which is very
desirable for caching and simplified external parsing.
2011-12-30 12:47:06 +00:00
Gabriel Wicke
11ece76b7b
Fix suffix handling for wiki links.
2011-12-30 09:35:57 +00:00
Gabriel Wicke
b3a0270d69
Remove env and load grammar in tokenizer constructor. Re-add property hack to
...
keep parserTests running for now. Really need a different pipeline for html
serialization or a reference to the HTML DOM.
2011-12-28 17:04:16 +00:00
Gabriel Wicke
3a63fb118e
Add a few comments inline, and remove unneeded html serialization as we are
...
only interested in WikiDom output in this parser wrapper.
2011-12-28 13:46:52 +00:00
Neil Kandalgaonkar
8fbf36e63e
put adding the terminal token inside the tokenize method (will pull it out again for streaming interface)
2011-12-28 01:37:15 +00:00
Neil Kandalgaonkar
6103646ec8
remove need to add newline at end of input
2011-12-28 01:37:11 +00:00
Neil Kandalgaonkar
4158f82d7e
refactor parser to ParseThingy in different module, can be invoked with command line utility parse.js
2011-12-28 01:37:06 +00:00
Neil Kandalgaonkar
d91a67ba99
nodeName not defined
2011-12-28 01:36:54 +00:00
Neil Kandalgaonkar
962d1262fc
create tokenizer without need to modify namespace with PEG source
2011-12-28 01:36:36 +00:00
Gabriel Wicke
33e60dd4d9
Update comments a bit.
2011-12-22 12:37:24 +00:00
Gabriel Wicke
9ee0e660ec
Fix regression introduced by r107060 for regular table cells. Good to have a
...
test suite ;)
2011-12-22 12:09:25 +00:00
Gabriel Wicke
a94d0ec10c
Re-add support for row-only tables.
2011-12-22 11:58:32 +00:00
Gabriel Wicke
1c7fe0eb34
Refactor table productions to support table fragments in templates (table
...
start / row / table end). The old productions are not deleted yet to make it
easy to compare the output on more complex articles. 181 tests passing after
adding two table tests with whitespace-only differences to the whitelist.
2011-12-22 11:43:55 +00:00
Gabriel Wicke
2845ba9552
Handle noinclude and includeonly at start of line, so that syntax after it
...
still matches as if it actually was preceded by a newline.
2011-12-21 11:38:50 +00:00
Gabriel Wicke
3a631db6d9
Fix ranges for annotations in implicit paragraphs within branch nodes.
2011-12-16 19:36:04 +00:00
Gabriel Wicke
cc06551f2e
Rename table_header production to table_heading. Those non-natives strike again.
2011-12-16 19:24:59 +00:00
Gabriel Wicke
605ed23fd2
Fix attributes in table headings.
2011-12-16 19:22:13 +00:00
Gabriel Wicke
08255ff3e6
Small bug fix to heading level, spotted by Mike from localwiki. Thanks!
2011-12-15 23:59:35 +00:00
Gabriel Wicke
a04744b2ec
Add some more attribute remapping capabilities to the DOMConverter, and clean
...
up some grammar formatting.
2011-12-15 17:33:07 +00:00
Gabriel Wicke
e98dd9e722
Implement 1-char-minimum width for annotations, and some additional minor
...
cleanup.
2011-12-15 11:05:52 +00:00
Gabriel Wicke
22ba27295b
Clean up the DOMConverter a bit.
2011-12-15 10:55:30 +00:00
Gabriel Wicke
e72dee76e4
Follow-up to r106208 and r106207. Both good catches, thanks Yair! As this code
...
is in its early stages and nowhere near deployment, please Be Bold and just
commit things like this directly! IMHO it makes more sense to fully review this
once it settles down a bit.
2011-12-15 10:13:50 +00:00
Gabriel Wicke
3585bd9c8e
Accept row-only tables. The parser now eats [[en:Barack Obama]] as-is. Hooray!
2011-12-15 00:39:28 +00:00
Gabriel Wicke
6df94a34a1
Less lust for urls
2011-12-15 00:26:22 +00:00
Gabriel Wicke
ce2ee067f7
Minor tweak to wiki link production
2011-12-15 00:12:58 +00:00
Gabriel Wicke
377226a120
Comment out a stray console.log
2011-12-14 23:44:58 +00:00
Gabriel Wicke
574abd9774
A collection of small bug fixes to the grammar, Cite, the Token format
...
converter and the HTML DOM -> WikiDom converter. The tokenizer now digests all
parserTests.
2011-12-14 23:38:46 +00:00
Gabriel Wicke
dc77d73ad5
Add ability to pass through JSON data to WikiDom in data-json-* attributes,
...
and fix parser to actually parse the Barack Obama article except for one table
with nested templates at the start-of-line.
2011-12-14 17:25:09 +00:00
Gabriel Wicke
f6e4267fca
Handle a few more element types, and reset offset for each leaf node. Not sure
...
if the latter is correct, as the documentation at
https://www.mediawiki.org/wiki/Visual_editor/Software_design#Data_Structures
and the actual sample WikiDom in the editor sandbox seem to disagree on this
point.
2011-12-14 16:22:27 +00:00
Gabriel Wicke
6676a47008
Add implicit level attribute to WikiDom headings.
2011-12-14 15:55:58 +00:00
Gabriel Wicke
3018ca690b
Improve WikiDom conversion: Handle text and annotations in branch nodes as
...
paragraphs and treat list items as branches.
2011-12-14 15:40:40 +00:00
Gabriel Wicke
a09aa4d599
Add rough HTML DOM to WikiDom conversion. You can see serialized WikiDom of
...
parser tests using 'node parserTests.js --wikidom'.
2011-12-14 15:15:41 +00:00
Gabriel Wicke
5f80d30428
Clean up access to document and body after building the tree.
2011-12-14 09:40:49 +00:00
Gabriel Wicke
30749b8d8d
Update comments a bit and add a note on things to improve in API.
2011-12-14 09:33:25 +00:00
Gabriel Wicke
55ff272847
Comment TokenTransformDispatcher.
2011-12-13 20:13:09 +00:00
Gabriel Wicke
44deefe303
Minor tweak to comment.
2011-12-13 18:55:44 +00:00
Gabriel Wicke
c61b32eaa7
Clean up and comment the Cite extension a bit.
2011-12-13 18:45:09 +00:00
Gabriel Wicke
feee9ded9f
Convert the Cite extension to a token stream transformer.
...
This required a few further additions to the TokenTransformDispatcher. In
particular, there is now an 'any' token match whose callbacks are executed
before more specific callbacks. This is used by the Cite extension to eat all
tokens between ref and /ref tags. This need is very common, so should be
broken out to an intermediate layer in the future.
In general, the requirements for the TokenTransformDispatcher API are now
clearer, and the API should likely be cleaned up / simplified.
2011-12-13 14:48:47 +00:00
Gabriel Wicke
8e55e79b67
Rename TokenTransformer to TokenTransformDispatcher.
2011-12-13 11:45:12 +00:00
Gabriel Wicke
8231511217
Replace custom object copy with $.extend.
2011-12-13 11:18:15 +00:00
Gabriel Wicke
39aedd4378
Improve comments in QuoteTransformer.
2011-12-13 10:25:18 +00:00
Gabriel Wicke
0ad08b9ae3
Add a README file pointing to the wiki documentation.
2011-12-12 22:30:11 +00:00
Gabriel Wicke
a8fa9433c4
Convert quote handling (italic/bold) to a core extension operating on the
...
token stream. This is the first token transformation exercising the
TokenTransformer class as its dispatcher. Template expansions, wiki link
formatting, tag sanitization and extensions should be able to use the same
dispatcher by registering for specific token types.
The parser performance is very slightly improved as the token stream is only
traversed once.
2011-12-12 20:53:14 +00:00
Gabriel Wicke
752b0990b2
Refactor parserTests somewhat into a class-like structure, and wire up the
...
TokenTransformer.
2011-12-12 14:03:54 +00:00
Gabriel Wicke
d616f07a79
Don't re-build the wiki tokenizer for each test. This speeds up the full
...
parserTests.js run slightly from 7-8 minutes to about 14 seconds ;)
A few very minor tweaks to the grammar are also thrown into this commit.
2011-12-12 10:47:42 +00:00
Gabriel Wicke
89c5e0cafb
Follow-up to r105859: Add missing new.
2011-12-12 10:09:13 +00:00
Gabriel Wicke
9ebce5839a
Further development of the TokenTransformer framework.
2011-12-12 10:01:47 +00:00
Gabriel Wicke
80d5067813
Add a TokenTransformer dispatcher class. This class provides subscriptions by
...
token type, and supports asynchronous token expansion (for example for async
template expansion). This code is not yet tested or used. The interface for
token insertion from transformation functions will be expanded as needed.
2011-12-08 14:37:31 +00:00
Gabriel Wicke
c2b69e2486
Clean up newline handling. Emit a NEWLINE token for each
...
non-{comment,pre,nowiki} newline.
2011-12-08 14:34:18 +00:00
Gabriel Wicke
abc2254110
A bit of comment clean-up and wrapping of tree building into try/catch block
...
to actually count failures.
2011-12-08 11:40:59 +00:00
Gabriel Wicke
92fdf99384
Further renaming, this time from pegParser to pegTokenizer.
2011-12-08 10:59:44 +00:00
Gabriel Wicke
76bc477038
Rename html5TokenEmitter to HTML5TreeBuilder, and the contained Tokenizer to
...
TreeBuilder.
2011-12-08 10:37:18 +00:00
Gabriel Wicke
19a1f0850f
Tidy up the grammar a bit.
2011-12-08 10:33:23 +00:00
Gabriel Wicke
3742d70abd
Add some documentation to syntax flags
2011-12-07 15:54:55 +00:00
Gabriel Wicke
545ca1809f
Convert template argument production to generic inline with syntactic stop.
...
Fix a bug in generic inline production. Nested multi-line templates are now
parsed okayish.
2011-12-07 15:39:39 +00:00
Gabriel Wicke
902db40a1f
Process template arguments into an object.
2011-12-07 14:46:07 +00:00
Gabriel Wicke
51a40e4dbc
Follow-up to r105423: Fix off-by-one bug.
2011-12-07 11:56:12 +00:00
Gabriel Wicke
49c286a67b
Fix a bug in doQuotes (bitten by surprising JS sort() behavior), and improve
...
tag-only-line handling. 180 parser tests now passing.
2011-12-07 11:51:24 +00:00
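The commit above does not say which sort() behaviour caused the bug. One common surprise, shown here only as a possible illustration, is that Array.prototype.sort without a comparator compares elements as strings:

```javascript
// Without a comparator, elements are compared by their string form.
const byString = [100, 9, 10, 1].sort();

// A numeric comparator is needed for numeric order.
const byNumber = [100, 9, 10, 1].sort((a, b) => a - b);

console.log(byString); // [ 1, 10, 100, 9 ]
console.log(byNumber); // [ 1, 9, 10, 100 ]
```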
Gabriel Wicke
418a5067c6
Parse attributes in tables using generic attribute production. Some table
...
tests still do not pass as the MW table output reorders attributes ;)
2011-12-06 22:03:21 +00:00
Gabriel Wicke
3d06707152
Slightly speed up inline tag productions using guards and grouping; Fix list
...
processing function.
2011-12-06 18:35:05 +00:00
Gabriel Wicke
ea8f226fd5
Remove ext and references special cases, now subsumed by generic XML tag
...
productions. Document issue around special tokenizer mode for other extension
tags.
2011-12-06 16:44:27 +00:00
Gabriel Wicke
e7de089d5b
Decode urls and html entities, 163 tests now passing.
2011-12-06 13:17:14 +00:00
Gabriel Wicke
a72a9e55a3
Don't match internal links with url as target. 161 passing.
2011-12-06 12:26:57 +00:00
Gabriel Wicke
2b5cc67bf5
Further tweaks to headings. 157 tests now passing.
2011-12-06 11:59:41 +00:00
Gabriel Wicke
f4d123886e
Convert heading rules to single rule that figures out the level. This saves a
...
lot of backtracking and inline break complexity.
2011-12-06 11:06:05 +00:00
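A single rule that works out the heading level, as in the commit above, can be pictured with the sketch below. The project uses a PEG grammar, so this regular-expression version is only an analogy. The `parseHeading` name is hypothetical, and the sketch assumes the usual MediaWiki convention that the level is the smaller of the opening and closing `=` runs, capped at 6, with surplus `=` kept as content.

```javascript
// Hypothetical helper: parse one line as a wikitext heading.
// Returns null when the line is not a heading.
function parseHeading(line) {
    const m = /^(=+)(.+?)(=+)\s*$/.exec(line);
    if (!m) {
        return null;
    }
    // Level is the shorter of the two '=' runs, at most 6.
    const level = Math.min(m[1].length, m[3].length, 6);
    // Surplus '=' characters on either side stay part of the content.
    const content = m[1].slice(level) + m[2] + m[3].slice(level);
    return { level: level, content: content.trim() };
}
```

One match handles every level, so there is no need to try a separate rule per level and backtrack when it fails.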
Gabriel Wicke
33e19f7275
Recognize block-level elements independent of case; Ignore toc and section
...
edit links in tests. 148 parser tests passing.
2011-12-05 20:03:24 +00:00
Gabriel Wicke
9ed9cb31bd
Fix template argument handling somewhat.
2011-12-05 17:58:11 +00:00
Gabriel Wicke
1760210d13
Fixes to tables, headings and misc smaller stuff. Tracked down an issue caused
...
by improper caching of production results, which interfered with the
flag-dependent inline_break production.
2011-12-04 19:23:24 +00:00
Gabriel Wicke
63c728924b
Use pegjs from npm
2011-12-01 15:23:23 +00:00
Antoine Musso
5ab379f479
fix vim modeline
2011-12-01 15:19:37 +00:00
Gabriel Wicke
0ce1e9fcf3
Add a quick html entity decoding hack, and document need for general decoder.
2011-12-01 14:39:55 +00:00
Gabriel Wicke
d00743ad79
Improve external links and definition lists, now 133 tests passing ;)
...
Also add printwhitelist option to test runner, provides js code copy/pastable
to whitelist.
2011-12-01 14:25:59 +00:00
Gabriel Wicke
82e31ffd42
Do not allow newlines in various attributes
2011-11-30 15:12:53 +00:00
Gabriel Wicke
821162484e
Allow inlines in the term part of ; term : definition
2011-11-30 14:53:28 +00:00
Gabriel Wicke
f758894de7
Let another test pass by swapping the default order of italic/bold for '''''.
...
Minor test output cosmetics.
2011-11-30 13:54:57 +00:00
Gabriel Wicke
e0fca805a6
Expand tabs in grammar.
2011-11-30 13:42:26 +00:00
Gabriel Wicke
2bb512a4de
A bit of tokenizer grammar clean-up and additional expected-html
normalization. 99 parser tests now passing.
2011-11-30 13:40:17 +00:00
Gabriel Wicke
127d8c8621
Simplify DOM paragraph wrapping postprocessor
2011-11-30 12:28:45 +00:00
Gabriel Wicke
f0edc5cb9a
Fix a few more tests by allowing inline content inside links. 76 now passing.
2011-11-29 18:43:27 +00:00
Gabriel Wicke
ae0b5f9af4
* Split paragraph handling between tokenizer and DOM postprocessor for better
html markup handling.
* Remove global 'use strict' declarations from html5 parser.
* Add trailing whitespace handling in dt
Overall, 55 parser tests are now passing.
2011-11-29 15:11:51 +00:00
Gabriel Wicke
b16c295b98
Consider dl as a block-level element.
2011-11-28 16:54:58 +00:00
Gabriel Wicke
d3f0196df7
Add primitive HTML comparison to detect passing parser tests. The expected
HTML is parsed using an HTML parser and re-serialized, and the output compared
to the serialization of the new parser's dom. Newline normalization is a
cheap hack for now, need to improve that later.
2011-11-28 11:10:39 +00:00
Gabriel Wicke
6b8c109cf0
Separate block-level tags in tokenizer to delimit inlines and avoid wrapping
block-level in paragraphs.
2011-11-25 17:41:26 +00:00
Gabriel Wicke
859379a635
Improvements to nowiki/pre interaction. Will need to distinguish block-level
tags from inline HTML tags next.
2011-11-25 15:02:44 +00:00
Gabriel Wicke
dd5cd59ac6
Better HTML, pre and blocklevel handling. Hackish source formatting for easier
comparison with parserTest results.
2011-11-25 12:47:03 +00:00
Gabriel Wicke
5b3a4497aa
Add generic HTML tokenization and nowiki handling.
2011-11-25 10:59:43 +00:00
Gabriel Wicke
6c36ddcbce
Follow-up to r104164: Clean-up comments, remove old italic/bold productions.
2011-11-24 14:20:56 +00:00
Gabriel Wicke
dee262658f
Add MediaWiki-compatible quote handling including quirks and overlapped
structures like ''[[Link|Link text'']]. This is another transform on the token
stream.
2011-11-24 13:56:30 +00:00
Gabriel Wicke
baf55875b9
Re-add modified wiki list handling to tokenizer.
2011-11-23 14:27:51 +00:00
Gabriel Wicke
694b998f24
Minor improvement to italic/bold, documentation on failed modularization of
static parser functions.
2011-11-22 16:51:05 +00:00
Gabriel Wicke
d1b0293569
Fix comment token conversion and serialization
2011-11-21 09:22:30 +00:00
Gabriel Wicke
65afd9b610
Improve internal link handling
2011-11-18 14:48:32 +00:00
Gabriel Wicke
d744e65c48
Add missing token adapter.
2011-11-18 14:00:14 +00:00
Gabriel Wicke
b750ce38b8
Add node.js-compatible HTML5 parser and hook it up to the PEG tokenizer.
Builds a DOM tree (jsdom) from the tokens and then serializes that using
document.innerHTML. This is all very experimental, so don't be surprised by
rough edges.
2011-11-18 13:57:07 +00:00
Gabriel Wicke
11e487d8c0
Flatten inline token lists before merging text into text tokens.
2011-11-17 15:43:31 +00:00
Gabriel Wicke
ea87e7aaee
Convert PEG parser to tokenizer for back-end HTML parser. Now emits a list of
tokens, which for now is still completely built before parsing can proceed.
For each top-level block, the source start/end positions are added as
attributes to the top-most tokens. No tracking of wiki vs. html syntax yet.
2011-11-17 15:26:02 +00:00
Gabriel Wicke
ef3c84bd2e
Extract text from inline elements for better testing. Slightly improved
handling of comment-only lines. Change pre to leaf content model.
2011-11-08 16:08:05 +00:00
Gabriel Wicke
18ead89b37
Improved paragraph, br, comment parsing and switched headings to
generic inlineline with syntactic flags.
2011-11-07 23:09:30 +00:00
Gabriel Wicke
944d010eb2
Indentation cleanup in PEG parser and Html serializer
2011-11-07 21:05:37 +00:00
Gabriel Wicke
c3a0c56e56
rename definition{term,description} to just {term,description}
2011-11-07 20:36:34 +00:00
Gabriel Wicke
71891131c3
Grammar improvements
* replaced regexp stack with a set of break rules for inline content within
specialized parse contexts, switched more rules to generic
inlineline/inline/block rules.
* don't consume end-of-line for proper start-of-line matching
* added some pre support
* still no conversion of inline elements to annotations
2011-11-07 14:39:12 +00:00
Gabriel Wicke
06ca9f12fe
Rename definitiondata to definitiondescription, minor fixes
2011-11-04 12:25:01 +00:00
Gabriel Wicke
7e5c196732
Some more progress for tables and definition lists
2011-11-04 12:06:49 +00:00
Gabriel Wicke
83a80bad49
Fixes for definition lists
2011-11-04 11:08:11 +00:00
Gabriel Wicke
85def70a8a
Add basic list serialization to HtmlSerializer
* Added 'definitionterm' and 'definitiondata' styles to support definition
lists, and special-case handling in the serializer to wrap both in dls.
2011-11-04 10:02:59 +00:00
Gabriel Wicke
63398b5749
Update parserTests to latest serializers
2011-11-04 07:45:05 +00:00
Gabriel Wicke
a8838dab18
Start by handling paragraphs, at least a bit.
2011-11-03 15:16:05 +00:00
Gabriel Wicke
0d30a5528e
First combination of WikiDom serializers with existing parser in
tests/parser/parserTests.js.
* Removed var from es in es.js to allow node.js to access it as global. Only
alternative solution appears to be a node-specific 'exports' construct:
http://nodejs.org/docs/v0.3.1/api/modules.html
* Added es.Document.js and es.Document.Serializer.js in es/bases. Not sure if
this is the desired location.
* Changed es.extend to es.extendClass in the serializers
* Modified the first parser test to include the WikiDom modules and call the
new HTML serializer
2011-11-03 13:55:48 +00:00
Trevor Parscal
5bae153214
Moving parser stuff back into the modules folder (oops)
2011-11-02 21:45:57 +00:00
Trevor Parscal
2b499d5990
Reorganized modules by javascript namespace
2011-11-02 21:31:45 +00:00
Brion Vibber
213ee7d4a8
followup r101685: the peg definition
2011-11-02 21:09:19 +00:00
Brion Vibber
56a75ccca7
Copy several of the experimental JS parser bits from ParserPlayground to VisualEditor. They'll need retooling to hook up with the wikidom stuff.
2011-11-02 21:07:51 +00:00