Gabriel Wicke
8e00a72d0a
Improvements to link trail handling, and two tweaks to the whitelist. 182
tests now passing.
Link trails depend on language-dependent positive character classes in the PHP
parser. These classes all seem to disallow punctuation implicitly and list
differing plain text characters instead, so it might be possible to get away
with identifying a common class of non-trail punctuation instead. This would
help to keep the tokenizer independent of configurations, which is very
desirable for caching and simplified external parsing.
2011-12-30 12:47:06 +00:00
Gabriel Wicke
11ece76b7b
Fix suffix handling for wiki links.
2011-12-30 09:35:57 +00:00
Gabriel Wicke
b3a0270d69
Remove env and load grammar in tokenizer constructor. Re-add property hack to
keep parserTests running for now. Really need a different pipeline for html
serialization or a reference to the HTML DOM.
2011-12-28 17:04:16 +00:00
Gabriel Wicke
3a63fb118e
Add a few comments inline, and remove unneeded html serialization as we are
only interested in WikiDom output in this parser wrapper.
2011-12-28 13:46:52 +00:00
Neil Kandalgaonkar
8fbf36e63e
put add terminal token inside tokenize method (will pull it out again for streaming interface)
2011-12-28 01:37:15 +00:00
Neil Kandalgaonkar
6103646ec8
remove need to add newline at end of input
2011-12-28 01:37:11 +00:00
Neil Kandalgaonkar
4158f82d7e
refactor parser to ParseThingy in different module, can be invoked with command line utility parse.js
2011-12-28 01:37:06 +00:00
Neil Kandalgaonkar
d91a67ba99
nodeName not defined
2011-12-28 01:36:54 +00:00
Neil Kandalgaonkar
962d1262fc
create tokenizer without need to modify namespace with PEG source
2011-12-28 01:36:36 +00:00
Gabriel Wicke
33e60dd4d9
Update comments a bit.
2011-12-22 12:37:24 +00:00
Gabriel Wicke
9ee0e660ec
Fix regression introduced by r107060 for regular table cells. Good to have a
test suite ;)
2011-12-22 12:09:25 +00:00
Gabriel Wicke
a94d0ec10c
Re-add support for row-only tables.
2011-12-22 11:58:32 +00:00
Gabriel Wicke
1c7fe0eb34
Refactor table productions to support table fragments in templates (table
start / row / table end). The old productions are not deleted yet to make it
easy to compare the output on more complex articles. 181 tests passing after
adding two table tests with whitespace-only differences to the whitelist.
2011-12-22 11:43:55 +00:00
Gabriel Wicke
2845ba9552
Handle noinclude and includeonly at start of line, so that syntax after it
still matches as if it actually was preceded by a newline.
2011-12-21 11:38:50 +00:00
Gabriel Wicke
3a631db6d9
Fix ranges for annotations in implicit paragraphs within branch nodes.
2011-12-16 19:36:04 +00:00
Gabriel Wicke
cc06551f2e
Rename table_header production to table_heading. Those non-natives strike again.
2011-12-16 19:24:59 +00:00
Gabriel Wicke
605ed23fd2
Fix attributes in table headings.
2011-12-16 19:22:13 +00:00
Gabriel Wicke
08255ff3e6
Small bug fix to heading level, spotted by Mike from localwiki; thanks!
2011-12-15 23:59:35 +00:00
Gabriel Wicke
a04744b2ec
Add some more attribute remapping capabilities to the DOMConverter, and clean
up some grammar formatting.
2011-12-15 17:33:07 +00:00
Gabriel Wicke
e98dd9e722
Implement 1-char-minimum width for annotations, and some additional minor
cleanup.
2011-12-15 11:05:52 +00:00
Gabriel Wicke
22ba27295b
Clean up the DOMConverter a bit.
2011-12-15 10:55:30 +00:00
Gabriel Wicke
e72dee76e4
Follow-up to r106208 and r106207. Both good catches, thanks Yair! As this code
is in its early stages and nowhere near deployment, please Be Bold and just
commit things like this directly! IMHO it makes more sense to fully review this
once it settles down a bit.
2011-12-15 10:13:50 +00:00
Gabriel Wicke
3585bd9c8e
Accept row-only tables. The parser now eats [[en:Barack Obama]] as-is. Hooray!
2011-12-15 00:39:28 +00:00
Gabriel Wicke
6df94a34a1
Less lust for urls
2011-12-15 00:26:22 +00:00
Gabriel Wicke
ce2ee067f7
Minor tweak to wiki link production
2011-12-15 00:12:58 +00:00
Gabriel Wicke
377226a120
Comment out a stray console.log
2011-12-14 23:44:58 +00:00
Gabriel Wicke
574abd9774
A collection of small bug fixes to the grammar, Cite, the Token format
converter and the HTML DOM -> WikiDom converter. The tokenizer now digests all
parserTests.
2011-12-14 23:38:46 +00:00
Gabriel Wicke
dc77d73ad5
Add ability to pass through JSON data to WikiDom in data-json-* attributes,
and fix parser to actually parse the Barack Obama article except for one table
with nested templates at the start-of-line.
2011-12-14 17:25:09 +00:00
Gabriel Wicke
f6e4267fca
Handle a few more element types, and reset offset for each leaf node. Not sure
if the latter is correct, as the documentation at
https://www.mediawiki.org/wiki/Visual_editor/Software_design#Data_Structures
and the actual sample WikiDom in the editor sandbox seem to disagree on this
point.
2011-12-14 16:22:27 +00:00
Gabriel Wicke
6676a47008
Add implicit level attribute to WikiDom headings.
2011-12-14 15:55:58 +00:00
Gabriel Wicke
3018ca690b
Improve WikiDom conversion: Handle text and annotations in branch nodes as
paragraphs and treat list items as branches.
2011-12-14 15:40:40 +00:00
Gabriel Wicke
a09aa4d599
Add rough HTML DOM to WikiDom conversion. You can see serialized WikiDom of
parser tests using 'node parserTests.js --wikidom'.
2011-12-14 15:15:41 +00:00
Gabriel Wicke
5f80d30428
Clean up access to document and body after building the tree.
2011-12-14 09:40:49 +00:00
Gabriel Wicke
30749b8d8d
Update comments a bit and add a note on things to improve in API.
2011-12-14 09:33:25 +00:00
Gabriel Wicke
55ff272847
Comment TokenTransformDispatcher.
2011-12-13 20:13:09 +00:00
Gabriel Wicke
44deefe303
Minor tweak to comment.
2011-12-13 18:55:44 +00:00
Gabriel Wicke
c61b32eaa7
Clean up and comment the Cite extension a bit.
2011-12-13 18:45:09 +00:00
Gabriel Wicke
feee9ded9f
Convert the Cite extension to a token stream transformer.
This required a few further additions to the TokenTransformDispatcher. In
particular, there is now an 'any' token match whose callbacks are executed
before more specific callbacks. This is used by the Cite extension to eat all
tokens between ref and /ref tags. This need is very common, so should be
broken out to an intermediate layer in the future.
In general, the requirements for the TokenTransformDispatcher API are now
clearer, and the API should likely be cleaned up / simplified.
2011-12-13 14:48:47 +00:00
Gabriel Wicke
8e55e79b67
Rename TokenTransformer to TokenTransformDispatcher.
2011-12-13 11:45:12 +00:00
Gabriel Wicke
8231511217
Replace custom object copy with $.extend.
2011-12-13 11:18:15 +00:00
Gabriel Wicke
39aedd4378
Improve comments in QuoteTransformer.
2011-12-13 10:25:18 +00:00
Gabriel Wicke
0ad08b9ae3
Add a README file pointing to the wiki documentation.
2011-12-12 22:30:11 +00:00
Gabriel Wicke
a8fa9433c4
Convert quote handling (italic/bold) to a core extension operating on the
token stream. This is the first token transformation exercising the
TokenTransformer class as its dispatcher. Template expansions, wiki link
formatting, tag sanitation and extensions should be able to use the same
dispatcher by registering for specific token types.
The parser performance is very slightly improved as the token stream is only
traversed once.
2011-12-12 20:53:14 +00:00
Gabriel Wicke
752b0990b2
Refactor parserTests somewhat into a class-like structure, and wire up the
TokenTransformer.
2011-12-12 14:03:54 +00:00
Gabriel Wicke
d616f07a79
Don't re-build the wiki tokenizer for each test. This speeds up the full
parserTests.js run slightly from 7-8 minutes to about 14 seconds ;)
A few very minor tweaks to the grammar are also thrown into this commit.
2011-12-12 10:47:42 +00:00
Gabriel Wicke
89c5e0cafb
Follow-up to r105859: Add missing new.
2011-12-12 10:09:13 +00:00
Gabriel Wicke
9ebce5839a
Further development of the TokenTransformer framework.
2011-12-12 10:01:47 +00:00
Gabriel Wicke
80d5067813
Add a TokenTransformer dispatcher class. This class provides subscriptions by
token type, and supports asynchronous token expansion (for example for async
template expansion). This code is not yet tested or used. The interface for
token insertion from transformation functions will be expanded as needed.
2011-12-08 14:37:31 +00:00
Gabriel Wicke
c2b69e2486
Clean up newline handling. Emit a NEWLINE token for each
non-{comment,pre,nowiki} newline.
2011-12-08 14:34:18 +00:00
Gabriel Wicke
abc2254110
A bit of comment clean-up and wrapping of tree building into try/catch block
to actually count failures.
2011-12-08 11:40:59 +00:00