Commit graph

25 commits

Author SHA1 Message Date
Gabriel Wicke bc1a77a812 Make modified newlines visible by replacing empty lines with a space
Change-Id: If7b811245e0d01a7a147ab54c3801fc1754730a9
2012-06-06 17:11:29 +02:00
Gabriel Wicke 1876d785a7 Swap ins/del in the diff
Change-Id: Id336d713d1767a4b7859b158f2c2ddf9adc11cfb
2012-06-06 16:02:54 +02:00
Subramanya Sastry bff08b799e Improvement to the refineDiffs function to improve diff quality.
* Attempt to accumulate consecutive add-delete pairs
  with "short text" separating the pairs.  This is equivalent to
  the <b><i> ... </i></b> minimization to expand range of
  <b> and <i> tags, except there is no optimal solution except
  as determined by heuristics ("short text": <= 2 chars).

Change-Id: I408e318c315eba18aac4051ed84d77e3e092d497
2012-06-06 00:08:00 -05:00
Gabriel Wicke c1d8270bdb Fix wgScriptPath in round-trip mode without interwiki
Change-Id: I7cc80b7be1afffc586a2ea45d21303e9ba07c0d4
2012-06-05 12:11:45 +02:00
Gabriel Wicke 3346aed86e Support interwiki links, and some cleanup
Change-Id: I205c53a03f5230e3ef9100487f4934f97bdc179a
2012-06-05 12:05:33 +02:00
Gabriel Wicke cc96ff4f5e Very basic interwiki support
Pages titles with a wikipedia interwiki prefix now load the page from
corresponding Wikipedia. Links in a page then stay within the given language.

Note that Parsoid currently makes no effort to recognize localized namespaces,
so it won't render media files, categories etc correctly.

Change-Id: I7bc4102e81a402772ea23231170734d580ea15b9
2012-06-05 11:19:58 +02:00
Gabriel Wicke 0eabd2c67e Add round-trip form and split out rt diffing
Change-Id: I3bc8ad7f273937ce6c767b8d7bbccdc86cbd93b4
2012-06-04 10:49:59 +02:00
Gabriel Wicke 99c98d6c56 Diff refinement fixes
Change-Id: I11c69de0fdcd636ccd11cd0b6cb16c5acdb188b3
2012-06-04 10:16:05 +02:00
Gabriel Wicke d2602c47a6 Switch back to word-based diff
The char-based diff looked good in some pages, but yielded terrible results in
others. The word-based algo is more consistent overall.

Change-Id: I7f2d40315ad96df037c2d9a1d50739e3d21b6c81
2012-06-04 00:02:49 +02:00
Gabriel Wicke d01581c380 Create a 'refinement diff' algorithm
The word or char-based algorithm does not scale well beyond 5k chars or so. We
now perform a line-based diff and then continue to diff the line differences
using the char-based algorithm. This gives a char-based diff even for bigger
inputs.

Change-Id: Iec87ca56540060e4df2859ba54c992e7ff5cfe10
2012-06-03 23:46:57 +02:00
Gabriel Wicke b11b8d8a6b Revert to line diff, word diff explodes on some pages
Change-Id: Ic338498b47bb6b6c98fa6280f44464cd70a48b1b
2012-06-03 11:39:03 +02:00
Gabriel Wicke b5e067e086 Some more web service tweaks
* Stay in round-trip mode in HTML DOM output
* Return DOM, wikitext and diff as soon as they are available

Change-Id: I7f8f44cfe8eed63a521d1318d116c22232cb6b1b
2012-06-03 11:04:40 +02:00
Gabriel Wicke 7c18891504 Snazzy html word diff for roundtrip view
Also show the HTML DOM, Wikitext output and diff.

Change-Id: Ibe744fbc895239f4e48f6e0e2f2b2f345c0845bd
2012-06-03 01:36:56 +02:00
Gabriel Wicke 4cf74497b7 Update web service start page documentation
Change-Id: I38efc5a9d5b919c6168cf97d0efbae9db967e351
2012-06-02 17:17:37 +02:00
Gabriel Wicke 7c7ddd22a7 Retrieve content from the main namespace instead of templates
Change-Id: Id917fa617d6fba1e1b290b2ed20c24aed24d39d2
2012-06-02 16:48:00 +02:00
Gabriel Wicke d3975a8d03 Very basic round-trip test mode for the API
Returns both the resulting wikitext and the diff with the original input.

Change-Id: Iad25039beb054a84e1ad51ffa9fee924db49c60b
2012-06-02 16:20:54 +02:00
Gabriel Wicke c52c24b0cb Slightly improve formatting of web service; test commit message tweak
Change-Id: Ibac3ce3dd9aa2c4faf11eed351fea941ebf1e4b3
2012-05-25 16:36:14 +02:00
Gabriel Wicke 540d14d8fe More tweaks to the intro message
Change-Id: I059ba4fc584eae45092376b5b2258df7ed52b55c
2012-05-24 15:10:03 +02:00
Gabriel Wicke 987ff8aa84 Add a longer welcome/help message and a link to the Parsoid docs
Change-Id: I39e4873908bc0619738353a55577aa81abb76287
2012-05-24 14:58:35 +02:00
Gabriel Wicke e70448e53a Use text/x-mediawiki content type, and handle tokenizer errors without --debug
Change-Id: I154cd344306aa05ada7ff30f631d487f39fa9739
2012-05-24 10:19:25 +02:00
Gabriel Wicke 6dac7de2f4 Capital T in second part of Content-Type
Change-Id: I20a9af9a8c0e05e96c77c9ac7566ea7bc1a6dabd
2012-05-23 18:12:09 +02:00
Gabriel Wicke d6af3b3375 Improve the serializer and its output display in the web service
Change-Id: Id3ca96846cad42517d7d4bada8f4bb250d54247b
2012-05-23 17:50:35 +02:00
Gabriel Wicke 95496c02db Add an extra newline before headings, and ignore favicon.ico requests
Change-Id: Ibacac3453afefa5dbe803c1e0260e8c943785f12
2012-05-23 17:17:54 +02:00
Gabriel Wicke 21286a50df Make sure pageName is set in the web service, and handle empty page name in parser function
Change-Id: I5d36eefecc2f35a860d00a8960004f8e651ed17c
2012-05-23 16:43:45 +02:00
Gabriel Wicke b89f5071e5 Basic parser / serializer web service
* After installing Parsoid (sudo npm install -g in modules/parser), run 'node
  server.js' from the api directory and navigate to http://localhost:8000/ and
  follow the directions. You can start to navigate the English wikipedia at
  http://localhost:8000/Main_Page, or manually enter wikitext or HTML DOM to
  convert.
* Uses the express framework, could also use just connect
* Uses the cluster module to manage workers per-core and restart those on
  failure

Change-Id: I443f2996ed3df00826b038b7476a2f966ab0c425
2012-05-23 12:35:00 +02:00