Commit graph

50 commits

Author SHA1 Message Date
Gabriel Wicke 418a5067c6 Parse attributes in tables using generic attribute production. Some table
tests still do not pass as the MW table output reorders attributes ;)
2011-12-06 22:03:21 +00:00
Gabriel Wicke 3d06707152 Slightly speed up inline tag productions using guards and grouping; Fix list
processing function.
2011-12-06 18:35:05 +00:00
Gabriel Wicke ea8f226fd5 Remove ext and references special cases, now subsumed by generic XML tag
productions. Document issue around special tokenizer mode for other extension
tags.
2011-12-06 16:44:27 +00:00
Gabriel Wicke e7de089d5b Decode urls and html entities, 163 tests now passing. 2011-12-06 13:17:14 +00:00
Gabriel Wicke a72a9e55a3 Don't match internal links with url as target. 161 passing. 2011-12-06 12:26:57 +00:00
Gabriel Wicke 2b5cc67bf5 Further tweaks to headings. 157 tests now passing. 2011-12-06 11:59:41 +00:00
Gabriel Wicke f4d123886e Convert heading rules to single rule that figures out the level. This saves a
lot of backtracking and inline break complexity.
2011-12-06 11:06:05 +00:00
Gabriel Wicke 33e19f7275 Recognize block-level elements independent of case; Ignore toc and section
edit links in tests. 148 parser tests passing.
2011-12-05 20:03:24 +00:00
Gabriel Wicke 9ed9cb31bd Fix template argument handling somewhat. 2011-12-05 17:58:11 +00:00
Gabriel Wicke 1760210d13 Fixes to tables, headings and misc smaller stuff. Tracked down an issue caused
by improperly caching of production results, which interfered with the
flag-dependent inline_break production.
2011-12-04 19:23:24 +00:00
Antoine Musso 5ab379f479 fix vim modeline 2011-12-01 15:19:37 +00:00
Gabriel Wicke 0ce1e9fcf3 Add a quick html entity decoding hack, and document need for general decoder. 2011-12-01 14:39:55 +00:00
Gabriel Wicke d00743ad79 Improve external links and definition lists, now 133 tests passing ;)
Also add printwhitelist option to test runner, provides js code copy/pastable
to whitelist.
2011-12-01 14:25:59 +00:00
Gabriel Wicke 82e31ffd42 Do not allow newlines in various attributes 2011-11-30 15:12:53 +00:00
Gabriel Wicke 821162484e Allow inlines in the term part of ; term : definition 2011-11-30 14:53:28 +00:00
Gabriel Wicke f758894de7 Let another test pass by swapping the default order of italic/bold for '''''.
Minor test output cosmetics.
2011-11-30 13:54:57 +00:00
Gabriel Wicke e0fca805a6 Expand tabs in grammar. 2011-11-30 13:42:26 +00:00
Gabriel Wicke 2bb512a4de A bit of tokenizer grammar clean-up and additional expected-html
normalization. 99 parser tests now passing.
2011-11-30 13:40:17 +00:00
Gabriel Wicke f0edc5cb9a Fix a few more tests by allowing inline content inside links. 76 now passing. 2011-11-29 18:43:27 +00:00
Gabriel Wicke ae0b5f9af4 * Split paragraph handling between tokenizer and DOM postprocessor for better
html markup handling. 
* Remove global 'use strict' declarations from html5 parser. 
* Add trailing whitespace handling in dt

Overall, 55 parser tests are now passing.
2011-11-29 15:11:51 +00:00
Gabriel Wicke b16c295b98 Consider dl as a block-level element. 2011-11-28 16:54:58 +00:00
Gabriel Wicke d3f0196df7 Add primitive HTML comparison to detect passing parser tests. The expected
HTML is parsed using a HTML parser and re-serialized, and the output compared
to the serialization of the new parser's dom. Newline normalization is a
cheap hack for now, need to improve that later.
2011-11-28 11:10:39 +00:00
Gabriel Wicke 6b8c109cf0 Separate block-level tags in tokenizer to delimit inlines and avoid wrapping
block-level in paragraphs.
2011-11-25 17:41:26 +00:00
Gabriel Wicke 859379a635 Improvements to nowiki/pre interaction. Will need to distinguish block-level
tags from inline HTML tags next.
2011-11-25 15:02:44 +00:00
Gabriel Wicke dd5cd59ac6 Better HTML, pre and blocklevel handling. Hackish source formatting for easier
comparison with parserTest results.
2011-11-25 12:47:03 +00:00
Gabriel Wicke 5b3a4497aa Add generic HTML tokenization and nowiki handling. 2011-11-25 10:59:43 +00:00
Gabriel Wicke 6c36ddcbce Follow-up to r104164: Clean-up comments, remove old italic/bold productions. 2011-11-24 14:20:56 +00:00
Gabriel Wicke dee262658f Add MediaWiki-compatible quote handling including quirks and overlapped
structures like ''[[Link|Link text'']]. This is another transform on the token
stream.
2011-11-24 13:56:30 +00:00
Gabriel Wicke baf55875b9 Re-add modified wiki list handling to tokenizer. 2011-11-23 14:27:51 +00:00
Gabriel Wicke 694b998f24 Minor improvement to italic/bold, documentation on failed modularization of
static parser functions.
2011-11-22 16:51:05 +00:00
Gabriel Wicke d1b0293569 Fix comment token conversion and serialization 2011-11-21 09:22:30 +00:00
Gabriel Wicke 65afd9b610 Improve internal link handling 2011-11-18 14:48:32 +00:00
Gabriel Wicke b750ce38b8 Add node.js-compatible HTML5 parser and hook it up to the PEG tokenizer.
Builds a DOM tree (jsdom) from the tokens and then serializes that using
document.innerHTML. This is all very experimental, so don't be surprised by
rough edges.
2011-11-18 13:57:07 +00:00
Gabriel Wicke 11e487d8c0 Flatten inline token lists before merging text into text tokens. 2011-11-17 15:43:31 +00:00
Gabriel Wicke ea87e7aaee Convert PEG parser to tokenizer for back-end HTML parser. Now emits a list of
tokens, which for now is still completely built before parsing can proceed.
For each top-level block, the source start/end positions are added as
attributes to the top-most tokens. No tracking of wiki vs. html syntax yet.
2011-11-17 15:26:02 +00:00
Gabriel Wicke ef3c84bd2e Extract text from inline elements for better testing. Slightly improved
handling of comment-only lines. Change pre to leaf content model.
2011-11-08 16:08:05 +00:00
Gabriel Wicke 18ead89b37 Improved paragraph, br, comment parsing and switched headings to
generic inlineline with syntactic flags.
2011-11-07 23:09:30 +00:00
Gabriel Wicke 944d010eb2 Indentation cleanup in PEG parser and Html serializer 2011-11-07 21:05:37 +00:00
Gabriel Wicke c3a0c56e56 rename definition{term,description} to just {term,description} 2011-11-07 20:36:34 +00:00
Gabriel Wicke 71891131c3 Grammar improvements
* replaced regexp stack with a set of break rules for inline content within
  specialized parse contexts, switched more rules to generic
  inlineline/inline/block rules.
* don't consume end-of-line for proper start-of-line matching
* added some pre support
* still no conversion of inline elements to annotations
2011-11-07 14:39:12 +00:00
Gabriel Wicke 06ca9f12fe Rename definitiondata to definitiondescription, minor fixes 2011-11-04 12:25:01 +00:00
Gabriel Wicke 7e5c196732 Some more progress for tables and definition lists 2011-11-04 12:06:49 +00:00
Gabriel Wicke 83a80bad49 Fixes for definition lists 2011-11-04 11:08:11 +00:00
Gabriel Wicke 85def70a8a Add basic list serialization to HtmlSerializer
* Added 'definitionterm' and 'definitiondata' styles to support definition
  lists, and special-case handling in the serializer to wrap both in dls.
2011-11-04 10:02:59 +00:00
Gabriel Wicke 63398b5749 Update parserTests to latest serializers 2011-11-04 07:45:05 +00:00
Gabriel Wicke a8838dab18 Start by handling paragraphs, at least a bit. 2011-11-03 15:16:05 +00:00
Gabriel Wicke 0d30a5528e First combination of WikiDom serializers with existing parser in
tests/parser/parserTests.js.

* Removed var from es in es.js to allow node.js to access it as global. Only
  alternative solution appears to be a node-specific 'exports' construct:
  http://nodejs.org/docs/v0.3.1/api/modules.html
* Added es.Document.js and es.Document.Serializer.js in es/bases. Not sure if
  this is the desired location.
* Changed es.extend to es.extendClass in the serializers
* Modified the first parser test to include the WikiDom modules and call the
  new HTML serializer
2011-11-03 13:55:48 +00:00
Trevor Parscal 5bae153214 Moving parser stuff back into the modules folder (oops) 2011-11-02 21:45:57 +00:00
Trevor Parscal 2b499d5990 Reorganized modules by javascript namespace 2011-11-02 21:31:45 +00:00
Brion Vibber 213ee7d4a8 followup r101685: the peg definition 2011-11-02 21:09:19 +00:00