Commit graph

1248 commits

Author SHA1 Message Date
Gabriel Wicke cc10aab54f Add self alias
Change-Id: I47682f407da6b554179611c7d0f63f882ab5a871
2012-05-24 17:16:35 +02:00
Gabriel Wicke 13ae7cda11 A few (partly hackish) improvements
* Very basic support attribute key-value pairs emitted from templates
* Add TALKPAGENAME stub implementation
* Only show 'no revisions' message for top-level pages

Change-Id: I4b4ac0c7b2c0531ac4b39f0f49f4217302576ab9
2012-05-24 16:30:26 +02:00
Gabriel Wicke 540d14d8fe More tweaks to the intro message
Change-Id: I059ba4fc584eae45092376b5b2258df7ed52b55c
2012-05-24 15:10:03 +02:00
Gabriel Wicke 987ff8aa84 Add a longer welcome/help message and a link to the Parsoid docs
Change-Id: I39e4873908bc0619738353a55577aa81abb76287
2012-05-24 14:58:35 +02:00
Gabriel Wicke 3e0e11b1d0 Sanity check for tokens being an array
Change-Id: Ia4e4071e1469c31e3b320d854500938bb0245f82
2012-05-24 14:35:58 +02:00
Gabriel Wicke 93ce7453f0 Fake fullpagename et al a bit better
Change-Id: I85ddf9e88e5f8ac274f371bea0879600997001e4
2012-05-24 11:05:31 +02:00
Gabriel Wicke cdd1eca42d Fix non-existing revision error reporting
Change-Id: I6b8687bcde98b92d9d6217a738a177db279fd006
2012-05-24 10:50:47 +02:00
Gabriel Wicke f03fc39d15 Report missing revisions when retrieving templates
Change-Id: I9f33acafc4d3fbd062125d824e2614dafd4cd5a0
2012-05-24 10:45:01 +02:00
Gabriel Wicke caf2fa663d Keep going on tokenizer errors
Change-Id: I76fab4528f89b425845aef1685b3a54ddfeceef4
2012-05-24 10:30:32 +02:00
Gabriel Wicke a5c96be8b7 Improve shell wrapper for parsoid service
Change-Id: I92c536af60613705229faea97a26d1486d4730a1
2012-05-24 10:24:14 +02:00
Gabriel Wicke e70448e53a Use text/x-mediawiki content type, and handle tokenizer errors without --debug
Change-Id: I154cd344306aa05ada7ff30f631d487f39fa9739
2012-05-24 10:19:25 +02:00
Translation updater bot c5cd131e1c Localisation updates from http://translatewiki.net.
Change-Id: Ida79fff5a8ef258ef2c8b8f666909618c3ab21d2
2012-05-23 19:21:14 +00:00
Gabriel Wicke 4cc2d25e70 Fix a debug print reference error
Change-Id: Ic26d29aced4129c3dd718c4751dadb62a0be1a27
2012-05-23 20:52:45 +02:00
Gabriel Wicke 6dac7de2f4 Capital T in second part of Content-Type
Change-Id: I20a9af9a8c0e05e96c77c9ac7566ea7bc1a6dabd
2012-05-23 18:12:09 +02:00
Gabriel Wicke d6af3b3375 Improve the serializer and its output display in the web service
Change-Id: Id3ca96846cad42517d7d4bada8f4bb250d54247b
2012-05-23 17:50:35 +02:00
Gabriel Wicke 95496c02db Add an extra newline before headings, and ignore favicon.ico requests
Change-Id: Ibacac3453afefa5dbe803c1e0260e8c943785f12
2012-05-23 17:17:54 +02:00
Gabriel Wicke e2ee66e532 Start slightly more workers than there are CPUs
Change-Id: I7381e09ead0420d5d0b8c7dd3045c88c3cbfaa87
2012-05-23 16:46:08 +02:00
Gabriel Wicke 21286a50df Make sure pageName is set in the web service, and handle empty page name in parser function
Change-Id: I5d36eefecc2f35a860d00a8960004f8e651ed17c
2012-05-23 16:43:45 +02:00
Gabriel Wicke a862718ad8 Add some checks against undefined tokens returned from async transforms
Change-Id: Ie19537083b96b1b2e12e1c4b65a7a044753c18ac
2012-05-23 16:32:21 +02:00
Gabriel Wicke a4c5d43ff7 Fix an external link regression, and add server shell wrapper and setup docs
Change-Id: I9a4f7690e98313d003a2fec35324ed70556e6461
2012-05-23 16:25:42 +02:00
Gabriel Wicke b89f5071e5 Basic parser / serializer web service
* After installing Parsoid (sudo npm install -g in modules/parser), run 'node
  server.js' from the api directory and navigate to http://localhost:8000/ and
  follow the directions. You can start to navigate the English wikipedia at
  http://localhost:8000/Main_Page, or manually enter wikitext or HTML DOM to
  convert.
* Uses the express framework, could also use just connect
* Uses the cluster module to manage workers per-core and restart those on
  failure

Change-Id: I443f2996ed3df00826b038b7476a2f966ab0c425
2012-05-23 12:35:00 +02:00
Gabriel Wicke febb912ead No end delimiter after template row attributes
Change-Id: Iba304fb797d221e2d65ae055d266bff2f6301df8
2012-05-23 09:30:07 +02:00
Gabriel Wicke 39c6f42879 Link round-tripping and other improvements
* Changed RDFa for links according to
  http://www.mediawiki.org/wiki/Parsoid/RDFa_vocabulary
* Added basic support for internal/external link serialization
* Moved numbering of external links from tokenizer to LinkHandler
* Added round-tripping for generic HTML tags
* Replaced nowiki tag with <meta typeOf="mw:tag" content="nowiki"> and <meta
  typeOf="mw:tag" content="/nowiki"> for now.
* 154 round-trip tests passing (node parserTests.js --roundtrip).

Change-Id: I16c4db21b1b543ee57c73e569c83025b64664542
2012-05-22 13:36:06 +02:00
Gabriel Wicke 7e21b7380a Merge "Round-trip nowiki" 2012-05-21 17:16:56 +00:00
Gabriel Wicke bd30e51b38 Merge "Serializer and table round-tripping improvements" 2012-05-21 16:38:20 +00:00
Siebrand cfac325df5 Merge "Add .gitignore" 2012-05-21 16:25:19 +00:00
Gabriel Wicke fb7d5418a5 Round-trip nowiki
Change-Id: I5f7e6a43f5fdc1708ee710b2a601b20db733452c
2012-05-21 18:06:09 +02:00
Gabriel Wicke a6610e52c2 Serializer and table round-tripping improvements
* added stx: 'html' round-trip information for html tags
* added t_stx: 'row' info for row-wise table wiki syntax, and support for it
  in the serializer
* the first table row is implicit in wikitext
* renamed lastToken to prevToken in serializer
* strip first newline in an initial chunkCB

Change-Id: I014b046539d1b674d830551c5fd1b74a67f81993
2012-05-21 14:59:53 +02:00
Gabriel Wicke e069e7cb1c Merge "Support table captions and properly delimit the end of table options" 2012-05-21 12:51:58 +00:00
Reedy 2b08cf237b Add .gitignore
Change-Id: I48c8b2939ef7c749a9b0a718f77e58238c01889e
2012-05-21 01:50:53 +01:00
Gabriel Wicke 54e75b93b7 Support table captions and properly delimit the end of table options
Change-Id: I15eb8df19528cfceadfee368370501b30f0e36a0
2012-05-18 10:46:43 +02:00
Gabriel Wicke c39eb36968 Use outerHTML to serialize unhandled DOM node in serializer
Change-Id: I37350712c9450c34025740a8d6de51344739c2b7
2012-05-18 10:03:16 +02:00
Gabriel Wicke 3c6d829708 Fix first bug caught by new roundtrip mode for parserTests
Change-Id: Id152fd29606d8ee34ac300945f41e2a5f48f087f
2012-05-18 09:55:22 +02:00
GWicke 5ae2653697 Merge "First pass updating parserTests to verify dom->wikitext serialization." 2012-05-18 07:53:40 +00:00
Subramanya Sastry 9b84e931db First pass updating parserTests to verify dom->wikitext serialization.
- Just a quick first pass updating the parserTests.js script so we can
  test DOM -> wikitext serialization (but which in effect really tests
  roundtripping).
- There is no output normalization yet which is needed for now since we
  are not yet preserving white-space.

Change-Id: Ie52058e0dc3330f852c24fa05641dced19f950e0
2012-05-18 09:51:22 +02:00
Subramanya Sastry ae4810b201 Renamed items to itemCount for better code readability.
Change-Id: I53851c07a4746928fddec4b3737136f081d49178
2012-05-17 12:32:46 -05:00
Subramanya Sastry 58da03bc85 Track list prefixes in the list start handler and use them to output
serialized text in list item handlers.

Change-Id: Ic7562d531d2313bedcf3b7450b4f28f02bc2b5a3
2012-05-17 12:12:46 -05:00
Gabriel Wicke 04fc74c76a Strip RDFa attributes in parserTests
We are adding some extra information in those, which should not make tests
fail.

Change-Id: I42cca596330252efeff5d51508f97ef1c566475b
2012-05-17 17:03:44 +02:00
Gabriel Wicke e2815b516c Start to handle links
Change-Id: I1fb975910651820fd889d77152562fd4fbcb5db8
2012-05-17 14:32:56 +02:00
Gabriel Wicke b7fd4498a9 Use single _serializeToken handler for both DOM and tokens
Change-Id: I45e1d90b53a5ddc678f7744f27274bebcfc375fe
2012-05-17 13:20:39 +02:00
Gabriel Wicke 8dbc2f573f Simplistic wikitext round-tripping with parse.js --wikitext
Lists are a bit tricky, as nested lists are not wrapped in a separate list
item. Should work now though.

Change-Id: I2e5f29f6afa6bdd2d5e5c0c5d019b70c611b73d1
2012-05-17 12:44:46 +02:00
Gabriel Wicke 3414418b1f Don't eat newline tokens in the ListHandler
This fix only affects following transforms, of which there are few right now.
Also removed a stray token mutation in QuoteTransformer.

Change-Id: Id6d4adce944b06fc1a3651cfbf63fc2670125225
2012-05-16 23:14:21 +02:00
Gabriel Wicke 542921b5a3 Removed html5 parser patch no longer needed with 0.3.8
Change-Id: Id8c23d34e8cca49a360f536e792144a85a8468a3
2012-05-16 12:06:42 +02:00
Mark Holmquist 96ee9ad45c Add a new wikitext serializer, with limited functionality.
This isn't finished at all, but Gabriel wants to take a crack at it,
so here it is!

Change-Id: I9732aa141f7c69a28c8f5978cb18180e93cb9eda
2012-05-15 10:41:28 -07:00
Gabriel Wicke d918fa18ac Big token transform framework overhaul part 2
* Tokens are now immutable. The progress of transformations is tracked on
  chunks instead of tokens. Tokenizer output is cached and can be directly
  returned without a need for cloning. Transforms are required to clone or
  newly create tokens they are modifying.

* Expansions per chunk are now shared between equivalent frames via a cache
  stored on the chunk itself. Equivalence of frames is not yet ideal though,
  as right now a hash tree of *unexpanded* arguments is used. This should be
  switched to a hash of the fully expanded local parameters instead.

* There is now a vastly improved maybeSyncReturn wrapper for async transforms
  that either forwards processing to the iterative transformTokens if the
  current transform is still ongoing, or manages a recursive transformation if
  needed.

* Parameters for parser functions are now wrapped in abstract Params and
  ParserValue objects, which support some handy on-demand *value* expansions.
  Keys are always expanded. Parser functions are converted to use these
  interfaces, and now properly expand their values in the correct frame.
  Making this expansion lazier is certainly possible, but would complicate
  transformTokens and other token-handling machinery. Need to investigate if
  it would really be worth it. Dead branch elimination is certainly a bigger
  win overall.

* Complex recursive asynchronous expansions should now be closer to correct
  for both the iterative (transformTokens) and recursive (maybeSyncReturn
  after transformTokens has returned) code paths.

* Performance degraded slightly. There are no micro-optimizations done yet
  and the shared expansion cache still has a low hit rate. The progress
  tracking on chunks is not yet perfect, so there are likely a lot of unneeded
  re-expansions that can be easily eliminated. There is also more debug
  tracing right now. Obama currently expands in 54 seconds on my laptop.

Change-Id: I4a603f3d3c70ca657ebda9fbb8570269f943d6b6
2012-05-15 17:05:47 +02:00
Translation updater bot 9314e76ce0 Localisation updates from http://translatewiki.net.
Change-Id: Ia63ca00e00816c1cd807532fc4589738f7a0be7b
2012-05-14 19:42:58 +00:00
Catrope c256ea7d71 Fix fatal error in parse.js
Trying something trivial like echo 'Hello world' | node parse.js
would throw TypeError: Function.prototype.apply: Arguments list has wrong type

Change-Id: Ia0a1154b0f3edbfb1f228a1d2072fced1b147141
2012-05-10 12:04:57 -07:00
Gabriel Wicke b1bd0d73ec Don't eat end token in ListHandler, and lazier Quote handler registration
* Setting the rank on tokens is still used currently, but will be phased out
  in favor of setting it on chunks. Tokens will be immutable to allow sharing
  and caching without a need for cloning.
* Only register for newline and end tokens in QuoteTransformer when active.

Change-Id: I2c45bc7e4a105219a1404ab221eed7f242128f1e
2012-05-10 09:47:53 +02:00
GWicke a909f1a10c Merge "List markup is created during the sync23 phase." 2012-05-08 09:45:59 +00:00
Adam Wight 0a7f0b7630 List markup is created during the sync23 phase.
This makes it possible to transclude list items from a template.

Note: "5 quotes" test is broken by this patch, it appears that ListHandler
newline processing is changing some state which mysteriously affects the
QuoteTransformer.  This is ominous, hopefully there's a simple explanation...

gwicke: fix a bug in tokenizer triggered by definition lists like this:
**; foo : bar

Change-Id: I4e3a86596fe9bffcbfc4bf22895362c3bf742bad
2012-05-08 11:39:36 +02:00