Commit graph

1256 commits

Author SHA1 Message Date
Gabriel Wicke b2adee0ae7 Basic rt support for indent pre variant
* Added a generic stx_v 'syntax variant' round-trip attribute
* For pre, use stx:'html' vs. no syntax annotation. This might not be 100%
  safe for arbitrary html input, so we might want to flip this to stx:'wiki'
  later.
* 181 round-trip tests passing

Change-Id: If6080917a3a7c069066db3db60efe59b1f6c28d8
2012-05-25 18:55:38 +02:00
Gabriel Wicke c52c24b0cb Slightly improve formatting of web service; test commit message tweak
Change-Id: Ibac3ce3dd9aa2c4faf11eed351fea941ebf1e4b3
2012-05-25 16:36:14 +02:00
Gabriel Wicke c692bc2307 Use '/bin/sh' instead of '/bin/bash'
Bash omits the time output for some reason. We would like to keep a record of
performance too.

Change-Id: I7c435b1cf2e2f237f78a45b2819a195a240e3aa4
2012-05-25 16:19:46 +02:00
Gabriel Wicke a31ccaabe4 Support definition lists with empty definition
Change-Id: I81c39a7e49f2ea7ce32cdd3600caeb5eb9f50d84
2012-05-25 15:40:32 +02:00
Gabriel Wicke 1ce2bc605d Add a small test runner with result archiving in git repo
Supports both -> HTML DOM and round-trip testing. Displays the diff to the
last results using less -r.

Change-Id: Ib3fbadeda3c8f4f7e3d2e6e5236a73ff7a773623
2012-05-25 14:28:53 +02:00
Gabriel Wicke 06b51b1f3f Properly round-trip dd/dt; 178 round-trip tests passing.
Need to track variable whitespace before elements to make some more tests
pass.

Change-Id: Ia86535d6f352e2ffe7965547cd506b0dbb6dfba2
2012-05-25 13:59:55 +02:00
Gabriel Wicke 6f62878c78 Resolve subpage links, and remove hack for H: titles
Change-Id: I6c9c64179274e5c1641a3b127ac3b273a3c5254e
2012-05-24 17:57:41 +02:00
Gabriel Wicke dc61f313a2 Notes on missing parser functions, more error reporting tweaks
Change-Id: Ib6ce60cf1b55671a6ff57aa47edb5787ec3aefea
2012-05-24 17:31:26 +02:00
Gabriel Wicke cc10aab54f Add self alias
Change-Id: I47682f407da6b554179611c7d0f63f882ab5a871
2012-05-24 17:16:35 +02:00
Gabriel Wicke 13ae7cda11 A few (partly hackish) improvements
* Very basic support attribute key-value pairs emitted from templates
* Add TALKPAGENAME stub implementation
* Only show 'no revisions' message for top-level pages

Change-Id: I4b4ac0c7b2c0531ac4b39f0f49f4217302576ab9
2012-05-24 16:30:26 +02:00
Gabriel Wicke 540d14d8fe More tweaks to the intro message
Change-Id: I059ba4fc584eae45092376b5b2258df7ed52b55c
2012-05-24 15:10:03 +02:00
Gabriel Wicke 987ff8aa84 Add a longer welcome/help message and a link to the Parsoid docs
Change-Id: I39e4873908bc0619738353a55577aa81abb76287
2012-05-24 14:58:35 +02:00
Gabriel Wicke 3e0e11b1d0 Sanity check for tokens being an array
Change-Id: Ia4e4071e1469c31e3b320d854500938bb0245f82
2012-05-24 14:35:58 +02:00
Gabriel Wicke 93ce7453f0 Fake fullpagename et al a bit better
Change-Id: I85ddf9e88e5f8ac274f371bea0879600997001e4
2012-05-24 11:05:31 +02:00
Gabriel Wicke cdd1eca42d Fix non-existing revision error reporting
Change-Id: I6b8687bcde98b92d9d6217a738a177db279fd006
2012-05-24 10:50:47 +02:00
Gabriel Wicke f03fc39d15 Report missing revisions when retrieving templates
Change-Id: I9f33acafc4d3fbd062125d824e2614dafd4cd5a0
2012-05-24 10:45:01 +02:00
Gabriel Wicke caf2fa663d Keep going on tokenizer errors
Change-Id: I76fab4528f89b425845aef1685b3a54ddfeceef4
2012-05-24 10:30:32 +02:00
Gabriel Wicke a5c96be8b7 Improve shell wrapper for parsoid service
Change-Id: I92c536af60613705229faea97a26d1486d4730a1
2012-05-24 10:24:14 +02:00
Gabriel Wicke e70448e53a Use text/x-mediawiki content type, and handle tokenizer errors without --debug
Change-Id: I154cd344306aa05ada7ff30f631d487f39fa9739
2012-05-24 10:19:25 +02:00
Translation updater bot c5cd131e1c Localisation updates from http://translatewiki.net.
Change-Id: Ida79fff5a8ef258ef2c8b8f666909618c3ab21d2
2012-05-23 19:21:14 +00:00
Gabriel Wicke 4cc2d25e70 Fix a debug print reference error
Change-Id: Ic26d29aced4129c3dd718c4751dadb62a0be1a27
2012-05-23 20:52:45 +02:00
Gabriel Wicke 6dac7de2f4 Capital T in second part of Content-Type
Change-Id: I20a9af9a8c0e05e96c77c9ac7566ea7bc1a6dabd
2012-05-23 18:12:09 +02:00
Gabriel Wicke d6af3b3375 Improve the serializer and its output display in the web service
Change-Id: Id3ca96846cad42517d7d4bada8f4bb250d54247b
2012-05-23 17:50:35 +02:00
Gabriel Wicke 95496c02db Add an extra newline before headings, and ignore favicon.ico requests
Change-Id: Ibacac3453afefa5dbe803c1e0260e8c943785f12
2012-05-23 17:17:54 +02:00
Gabriel Wicke e2ee66e532 Start slightly more workers than there are CPUs
Change-Id: I7381e09ead0420d5d0b8c7dd3045c88c3cbfaa87
2012-05-23 16:46:08 +02:00
Gabriel Wicke 21286a50df Make sure pageName is set in the web service, and handle empty page name in parser function
Change-Id: I5d36eefecc2f35a860d00a8960004f8e651ed17c
2012-05-23 16:43:45 +02:00
Gabriel Wicke a862718ad8 Add some checks against undefined tokens returned from async transforms
Change-Id: Ie19537083b96b1b2e12e1c4b65a7a044753c18ac
2012-05-23 16:32:21 +02:00
Gabriel Wicke a4c5d43ff7 Fix an external link regression, and add server shell wrapper and setup docs
Change-Id: I9a4f7690e98313d003a2fec35324ed70556e6461
2012-05-23 16:25:42 +02:00
Gabriel Wicke b89f5071e5 Basic parser / serializer web service
* After installing Parsoid (sudo npm install -g in modules/parser), run 'node
  server.js' from the api directory and navigate to http://localhost:8000/ and
  follow the directions. You can start to navigate the English wikipedia at
  http://localhost:8000/Main_Page, or manually enter wikitext or HTML DOM to
  convert.
* Uses the express framework, could also use just connect
* Uses the cluster module to manage workers per-core and restart those on
  failure

Change-Id: I443f2996ed3df00826b038b7476a2f966ab0c425
2012-05-23 12:35:00 +02:00
Gabriel Wicke febb912ead No end delimiter after template row attributes
Change-Id: Iba304fb797d221e2d65ae055d266bff2f6301df8
2012-05-23 09:30:07 +02:00
Gabriel Wicke 39c6f42879 Link round-tripping and other improvements
* Changed RDFa for links according to
  http://www.mediawiki.org/wiki/Parsoid/RDFa_vocabulary
* Added basic support for internal/external link serialization
* Moved numbering of external links from tokenizer to LinkHandler
* Added round-tripping for generic HTML tags
* Replaced nowiki tag with <meta typeOf="mw:tag" content="nowiki"> and <meta
  typeOf="mw:tag" content="/nowiki"> for now.
* 154 round-trip tests passing (node parserTests.js --roundtrip).

Change-Id: I16c4db21b1b543ee57c73e569c83025b64664542
2012-05-22 13:36:06 +02:00
Gabriel Wicke 7e21b7380a Merge "Round-trip nowiki" 2012-05-21 17:16:56 +00:00
Gabriel Wicke bd30e51b38 Merge "Serializer and table round-tripping improvements" 2012-05-21 16:38:20 +00:00
Siebrand cfac325df5 Merge "Add .gitignore" 2012-05-21 16:25:19 +00:00
Gabriel Wicke fb7d5418a5 Round-trip nowiki
Change-Id: I5f7e6a43f5fdc1708ee710b2a601b20db733452c
2012-05-21 18:06:09 +02:00
Gabriel Wicke a6610e52c2 Serializer and table round-tripping improvements
* added stx: 'html' round-trip information for html tags
* added t_stx: 'row' info for row-wise table wiki syntax, and support for it
  in the serializer
* the first table row is implicit in wikitext
* renamed lastToken to prevToken in serializer
* strip first newline in an initial chunkCB

Change-Id: I014b046539d1b674d830551c5fd1b74a67f81993
2012-05-21 14:59:53 +02:00
Gabriel Wicke e069e7cb1c Merge "Support table captions and properly delimit the end of table options" 2012-05-21 12:51:58 +00:00
Reedy 2b08cf237b Add .gitignore
Change-Id: I48c8b2939ef7c749a9b0a718f77e58238c01889e
2012-05-21 01:50:53 +01:00
Gabriel Wicke 54e75b93b7 Support table captions and properly delimit the end of table options
Change-Id: I15eb8df19528cfceadfee368370501b30f0e36a0
2012-05-18 10:46:43 +02:00
Gabriel Wicke c39eb36968 Use outerHTML to serialize unhandled DOM node in serializer
Change-Id: I37350712c9450c34025740a8d6de51344739c2b7
2012-05-18 10:03:16 +02:00
Gabriel Wicke 3c6d829708 Fix first bug caught by new roundtrip mode for parserTests
Change-Id: Id152fd29606d8ee34ac300945f41e2a5f48f087f
2012-05-18 09:55:22 +02:00
GWicke 5ae2653697 Merge "First pass updating parserTests to verify dom->wikitext serialization." 2012-05-18 07:53:40 +00:00
Subramanya Sastry 9b84e931db First pass updating parserTests to verify dom->wikitext serialization.
- Just a quick first pass updating the parserTests.js script so we can
  test DOM -> wikitext serialization (but which in effect really tests
  roundtripping).
- There is no output normalization yet which is needed for now since we
  are not yet preserving white-space.

Change-Id: Ie52058e0dc3330f852c24fa05641dced19f950e0
2012-05-18 09:51:22 +02:00
Subramanya Sastry ae4810b201 Renamed items to itemCount for better code readability.
Change-Id: I53851c07a4746928fddec4b3737136f081d49178
2012-05-17 12:32:46 -05:00
Subramanya Sastry 58da03bc85 Track list prefixes in the list start handler and use them to output
serialized text in list item handlers.

Change-Id: Ic7562d531d2313bedcf3b7450b4f28f02bc2b5a3
2012-05-17 12:12:46 -05:00
Gabriel Wicke 04fc74c76a Strip RDFa attributes in parserTests
We are adding some extra information in those, which should not make tests
fail.

Change-Id: I42cca596330252efeff5d51508f97ef1c566475b
2012-05-17 17:03:44 +02:00
Gabriel Wicke e2815b516c Start to handle links
Change-Id: I1fb975910651820fd889d77152562fd4fbcb5db8
2012-05-17 14:32:56 +02:00
Gabriel Wicke b7fd4498a9 Use single _serializeToken handler for both DOM and tokens
Change-Id: I45e1d90b53a5ddc678f7744f27274bebcfc375fe
2012-05-17 13:20:39 +02:00
Gabriel Wicke 8dbc2f573f Simplistic wikitext round-tripping with parse.js --wikitext
Lists are a bit tricky, as nested lists are not wrapped in a separate list
item. Should work now though.

Change-Id: I2e5f29f6afa6bdd2d5e5c0c5d019b70c611b73d1
2012-05-17 12:44:46 +02:00
Gabriel Wicke 3414418b1f Don't eat newline tokens in the ListHandler
This fix only affects following transforms, of which there are few right now.
Also removed a stray token mutation in QuoteTransformer.

Change-Id: Id6d4adce944b06fc1a3651cfbf63fc2670125225
2012-05-16 23:14:21 +02:00