Commit graph

112 commits

Author SHA1 Message Date
Gabriel Wicke 8e00a72d0a Improvements to link trail handling, and two tweaks to the whitelist. 182
tests now passing. 

Link trails depend on language-dependent positive character classes in the PHP
parser. These classes all seem to disallow punctuation implicitly and list
differing plain text characters instead, so it might be possible to get away
with identifying a common class of non-trail punctuation instead. This would
help to keep the tokenizer independent of configurations, which is very
desirable for caching and simplified external parsing.
2011-12-30 12:47:06 +00:00
Gabriel Wicke 11ece76b7b Fix suffix handling for wiki links. 2011-12-30 09:35:57 +00:00
Gabriel Wicke b3a0270d69 Remove env and load grammar in tokenizer constructor. Re-add property hack to
keep parserTests running for now. Really need a different pipeline for html
serialization or a reference to the HTML DOM.
2011-12-28 17:04:16 +00:00
Gabriel Wicke 3a63fb118e Add a few comments inline, and remove unneeded html serialization as we are
only interested in WikiDom output in this parser wrapper.
2011-12-28 13:46:52 +00:00
Neil Kandalgaonkar 8fbf36e63e put add terminal token inside tokenize method (will pull it out again for streaming interface) 2011-12-28 01:37:15 +00:00
Neil Kandalgaonkar 6103646ec8 remove need to add newline at end of input 2011-12-28 01:37:11 +00:00
Neil Kandalgaonkar 4158f82d7e refactor parser to ParseThingy in different module, can be invoked with command line utility parse.js 2011-12-28 01:37:06 +00:00
Neil Kandalgaonkar d91a67ba99 nodeName not defined 2011-12-28 01:36:54 +00:00
Neil Kandalgaonkar 962d1262fc create tokenizer without need to modify namespace with PEG source 2011-12-28 01:36:36 +00:00
Gabriel Wicke 33e60dd4d9 Update comments a bit. 2011-12-22 12:37:24 +00:00
Gabriel Wicke 9ee0e660ec Fix regression introduced by r107060 for regular table cells. Good to have a
test suite ;)
2011-12-22 12:09:25 +00:00
Gabriel Wicke a94d0ec10c Re-add support for row-only tables. 2011-12-22 11:58:32 +00:00
Gabriel Wicke 1c7fe0eb34 Refactor table productions to support table fragments in templates (table
start / row / table end). The old productions are not deleted yet to make it
easy to compare the output on more complex articles. 181 tests passing after
adding two table tests with whitespace-only differences to the whitelist.
2011-12-22 11:43:55 +00:00
Gabriel Wicke 2845ba9552 Handle noinclude and includeonly at start of line, so that syntax after it
still matches as if it actually was preceded by a newline.
2011-12-21 11:38:50 +00:00
Gabriel Wicke 3a631db6d9 Fix ranges for annotations in implicit paragraphs within branch nodes. 2011-12-16 19:36:04 +00:00
Gabriel Wicke cc06551f2e Rename table_header production to table_heading. Those non-natives strike again. 2011-12-16 19:24:59 +00:00
Gabriel Wicke 605ed23fd2 Fix attributes in table headings. 2011-12-16 19:22:13 +00:00
Gabriel Wicke 08255ff3e6 Small bug fix to heading level, spotted by Mike from localwiki- thanks! 2011-12-15 23:59:35 +00:00
Gabriel Wicke a04744b2ec Add some more attribute remapping capabilities to the DOMConverter, and clean
up some grammar formatting.
2011-12-15 17:33:07 +00:00
Gabriel Wicke e98dd9e722 Implement 1-char-minimum width for annotations, and some additonal minor
cleanup.
2011-12-15 11:05:52 +00:00
Gabriel Wicke 22ba27295b Clean up the DOMConverter a bit. 2011-12-15 10:55:30 +00:00
Gabriel Wicke e72dee76e4 Follow-up to r106208 and r106207. Both good catches, thanks Yair! As this code
is in its early stages and nowhere near deployment, please Be Bold and just
commit things like this directly! IMHO it makes more sense to fully review this
once it settles down a bit.
2011-12-15 10:13:50 +00:00
Gabriel Wicke 3585bd9c8e Accept row-only tables. The parser now eats [[en:Barack Obama]] as-is. Hooray! 2011-12-15 00:39:28 +00:00
Gabriel Wicke 6df94a34a1 Less lust for urls 2011-12-15 00:26:22 +00:00
Gabriel Wicke ce2ee067f7 Minor tweak to wiki link production 2011-12-15 00:12:58 +00:00
Gabriel Wicke 377226a120 Comment out a stray console.log 2011-12-14 23:44:58 +00:00
Gabriel Wicke 574abd9774 A collection of small bug fixes to the grammar, Cite, the Token format
converter and the HTML DOM -> WikiDom converter. The tokenizer now digests all
parserTests.
2011-12-14 23:38:46 +00:00
Gabriel Wicke dc77d73ad5 Add ability to pass through JSON data to WikiDom in data-json-* attributes,
and fix parser to actually parse the Barack Obama article except for one table
with nested templates at the start-of-line.
2011-12-14 17:25:09 +00:00
Gabriel Wicke f6e4267fca Handle a few more element types, and reset offset for each leaf node. Not sure
if the latter is correct, as the documentation at
https://www.mediawiki.org/wiki/Visual_editor/Software_design#Data_Structures
and the actual sample WikiDom in the editor sandbox seem to disagree on this
point.
2011-12-14 16:22:27 +00:00
Gabriel Wicke 6676a47008 Add implicit level attribute to WikiDom headings. 2011-12-14 15:55:58 +00:00
Gabriel Wicke 3018ca690b Improve WikiDom conversion: Handle text and annotations in branch nodes as
paragraphs and treat list items as branches.
2011-12-14 15:40:40 +00:00
Gabriel Wicke a09aa4d599 Add rough HTML DOM to WikiDom conversion. You can see serialized WikiDom of
parser tests using 'node parserTests.js --wikidom'.
2011-12-14 15:15:41 +00:00
Gabriel Wicke 5f80d30428 Clean up access to document and body after building the tree. 2011-12-14 09:40:49 +00:00
Gabriel Wicke 30749b8d8d Update comments a bit and add a note on things to improve in API. 2011-12-14 09:33:25 +00:00
Gabriel Wicke 55ff272847 Comment TokenTransformDispatcher. 2011-12-13 20:13:09 +00:00
Gabriel Wicke 44deefe303 Minor tweak to comment. 2011-12-13 18:55:44 +00:00
Gabriel Wicke c61b32eaa7 Clean up and comment the Cite extension a bit. 2011-12-13 18:45:09 +00:00
Gabriel Wicke feee9ded9f Convert the Cite extension to a token stream transformer.
This required a few further additions to the TokenTransformDispatcher. In
particular, there is now an 'any' token match whose callbacks are executed
before more specific callbacks. This is used by the Cite extension to eat all
tokens between ref and /ref tags. This need is very common, so should be
broken out to an intermediate layer in the future.

In general, the requirements for the TokenTransformDispatcher API are now
clearer, and the API should likely be cleaned up / simplified.
2011-12-13 14:48:47 +00:00
Gabriel Wicke 8e55e79b67 Rename TokenTransformer to TokenTransformDispatcher. 2011-12-13 11:45:12 +00:00
Gabriel Wicke 8231511217 Replace custom object copy with $.extend. 2011-12-13 11:18:15 +00:00
Gabriel Wicke 39aedd4378 Improve comments in QuoteTransformer. 2011-12-13 10:25:18 +00:00
Gabriel Wicke 0ad08b9ae3 Add a README file pointing to the wiki documentation. 2011-12-12 22:30:11 +00:00
Gabriel Wicke a8fa9433c4 Convert quote handling (italic/bold) to a core extension operating on the
token stream. This is the first token transformation exercising the
TokenTransformer class as its dispatcher. Template expansions, wiki link
formatting, tag sanitation and extensions should be able to use the same
dispatcher by registering for specific token types.

The parser performance is very slightly improved as the token stream is only
traversed once.
2011-12-12 20:53:14 +00:00
Gabriel Wicke 752b0990b2 Refactor parserTests somewhat into a class-like structure, and wire up the
TokenTransformer.
2011-12-12 14:03:54 +00:00
Gabriel Wicke d616f07a79 Don't re-build the wiki tokenizer for each test. This speeds up the full
parserTests.js run slightly from 7-8 minutes to about 14 seconds ;)

A few very minor tweaks to the grammar are also thrown into this commit.
2011-12-12 10:47:42 +00:00
Gabriel Wicke 89c5e0cafb Follow-up to r105859: Add missing new. 2011-12-12 10:09:13 +00:00
Gabriel Wicke 9ebce5839a Further development of the TokenTransformer framework. 2011-12-12 10:01:47 +00:00
Gabriel Wicke 80d5067813 Add a TokenTransformer dispatcher class. This class provides subscriptions by
token type, and supports asynchronous token expansion (for example for async
template expansion). This code is not yet tested or used. The interface for
token insertion from transformation functions will be expanded as needed.
2011-12-08 14:37:31 +00:00
Gabriel Wicke c2b69e2486 Clean up newline handling. Emit a NEWLINE token for each
non-{comment,pre,nowiki} newline.
2011-12-08 14:34:18 +00:00
Gabriel Wicke abc2254110 A bit of comment clean-up and wrapping of tree building into try/catch block
to actually count failures.
2011-12-08 11:40:59 +00:00