This mode strips all newlines from the HTML source before serializing it
back to wikitext, thus simulating newline-less DOM output from the VE.
This simplistic method also strips newlines in preformatted text, which
shows up as noise in the diff. The mode is still useful for identifying
basic newline-less DOM serialization issues.
An improved version could try to approximate the VE's behavior more closely by
only stripping some newlines.
Due to its experimental nature, this mode is not linked from the index
page for now.
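A minimal sketch of the stripping step (function name and setup are
illustrative, not the actual code):

    // Strip every newline from the HTML source before it is parsed into a
    // DOM and serialized back to wikitext. This is also what mangles <pre>
    // content and produces the diff noise mentioned above.
    function stripNewlines(html) {
        return html.replace(/\n/g, '');
    }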
Change-Id: I1dfec7ec3e6c12b7de4bbb9ff6f2d8b7834e2857
We should probably add some sort of LocalSettings.php-style system for
environment-specific settings like this, and have git ignore it.
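A hedged sketch of what such a mechanism could look like (file names and
layout are hypothetical):

    // settings.js: load the defaults, then merge optional per-environment
    // overrides from a git-ignored localsettings.js, if present.
    var settings = require('./defaults.js');
    try {
        var local = require('./localsettings.js');
        for (var key in local) {
            settings[key] = local[key];
        }
    } catch (e) {
        // No localsettings.js in this checkout; run with the defaults.
    }
    module.exports = settings;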
Change-Id: I362296ea286d0348e023d1673bba55e098229280
* Moved the Wikipedia default prefixes into the environment
* Added an 'addInterwiki' method
* Adjusted normalizeTitle in the link handling to reflect this
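The shape of the new method, roughly (the environment class and its
interwiki map are assumptions here; only the method name comes from this
commit):

    // Register an interwiki prefix -> API endpoint mapping on the parser
    // environment so that normalizeTitle can resolve prefixed titles.
    MWParserEnvironment.prototype.addInterwiki = function (prefix, apiURI) {
        this.interwikiMap[prefix] = apiURI;
    };

    // Hypothetical usage:
    env.addInterwiki('mw', 'http://www.mediawiki.org/w/api.php');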
Change-Id: If5b2314cc36346b6da8649ed410457a612d80a22
* mw:Foo now loads pages from mediawiki.org
* The default prefix is still 'en'. You can switch this to 'mw' in
ParserService.js.
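The switch itself is a one-line change (the variable name is assumed):

    // In ParserService.js: make mediawiki.org the default wiki instead of
    // the English Wikipedia.
    var defaultInterwikiPrefix = 'mw'; // default: 'en'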
Change-Id: I1208667e6114bd711b7988a8b3adb32ffab70969
The improved merge algorithm now makes the diffChars output more
palatable. Things could still be improved by collecting single-character
'neutral' changes within a block of 'add' changes and converting them to
adds/removes.
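A sketch of that possible improvement (jsdiff-style {value, added, removed}
change objects assumed; names are illustrative):

    // Fold a short 'neutral' (unchanged) run that sits between two 'add'
    // changes into the preceding add, and emit a matching remove so the
    // diff stays balanced. A second pass could merge the remaining
    // adjacent adds.
    function foldNeutrals(changes, maxLen) {
        var out = [];
        for (var i = 0; i < changes.length; i++) {
            var c = changes[i];
            var prev = out[out.length - 1];
            if (prev && prev.added &&
                    !c.added && !c.removed && c.value.length <= maxLen &&
                    changes[i + 1] && changes[i + 1].added) {
                prev.value += c.value;
                out.push({ removed: true, value: c.value });
            } else {
                out.push(c);
            }
        }
        return out;
    }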
Change-Id: I8439e8acab4360c08b89d9ce8a6b8523e7a0a210
- Check if consecutive diffs are separated by one word, in addition to the
existing max-3-chars check. This takes care of diffs introduced by
templates that are separated only by the template name, and creates a
single clean diff.
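A sketch of the separation test (helper name is illustrative):

    // Merge two diff hunks if the unchanged text between them is at most
    // 3 characters, or consists of a single word (e.g. a template name
    // separating two template-generated diffs).
    function shouldMergeHunks(separator) {
        var trimmed = separator.trim();
        return separator.length <= 3 ||
            (trimmed.length > 0 && !/\s/.test(trimmed));
    }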
Change-Id: I9181d2ed9a07bee6ca5d5ebd6ddea84f7e2cecac
* Attempt to accumulate consecutive add-delete pairs
with "short text" separating the pairs. This is equivalent to
the <b><i> ... </i></b> minimization that expands the range of
<b> and <i> tags, except that there is no optimal solution here other
than what the heuristic determines ("short text": <= 2 chars).
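A sketch of the accumulation over a hypothetical list of add-delete pairs
(the {added, removed, sepText} shape is invented for illustration):

    // Expand an add-delete pair across the following pair whenever only
    // "short text" (<= 2 unchanged chars, pair.sepText) separates them.
    // The separator is duplicated into both sides of the merged pair.
    function accumulatePairs(pairs) {
        var out = [pairs[0]];
        for (var i = 1; i < pairs.length; i++) {
            var prev = out[out.length - 1];
            var p = pairs[i];
            if (p.sepText.length <= 2) {
                prev.added += p.sepText + p.added;
                prev.removed += p.sepText + p.removed;
            } else {
                out.push(p);
            }
        }
        return out;
    }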
Change-Id: I408e318c315eba18aac4051ed84d77e3e092d497
Page titles with a Wikipedia interwiki prefix now load the page from the
corresponding Wikipedia. Links in a page then stay within the given
language. Note that Parsoid currently makes no effort to recognize
localized namespaces, so it won't render media files, categories etc.
correctly.
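Roughly the resolution step (function name and URL pattern are assumptions;
only the prefix-to-wiki mapping itself comes from this commit):

    // Map a language-prefixed title such as 'fr:Paris' to that Wikipedia's
    // API and the bare page title.
    function resolveInterwiki(title, defaultPrefix) {
        var match = title.match(/^([a-z-]+):(.*)$/);
        var prefix = match ? match[1] : defaultPrefix;
        var page = match ? match[2] : title;
        return {
            apiURI: 'http://' + prefix + '.wikipedia.org/w/api.php',
            title: page
        };
    }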
Change-Id: I7bc4102e81a402772ea23231170734d580ea15b9
The char-based diff looked good on some pages, but yielded terrible
results on others. The word-based algorithm is more consistent overall.
Change-Id: I7f2d40315ad96df037c2d9a1d50739e3d21b6c81
The word- and char-based algorithms do not scale well beyond 5k characters
or so. We now perform a line-based diff first, and then continue to diff
the line differences using the char-based algorithm. This yields a
char-based diff even for bigger inputs.
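A sketch of the two-pass scheme using the jsdiff API (diffLines and
diffChars exist in that library; the glue code here is illustrative):

    var jsDiff = require('diff');

    // Diff by line first to keep the expensive char-based pass small, then
    // re-diff each replaced region by character. The result is still a
    // char-based diff, even for large inputs.
    function hybridDiff(oldText, newText) {
        var out = [];
        var lineDiff = jsDiff.diffLines(oldText, newText);
        for (var i = 0; i < lineDiff.length; i++) {
            var d = lineDiff[i], next = lineDiff[i + 1];
            if (d.removed && next && next.added) {
                out = out.concat(jsDiff.diffChars(d.value, next.value));
                i++; // the paired 'add' has been consumed
            } else {
                out.push(d);
            }
        }
        return out;
    }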
Change-Id: Iec87ca56540060e4df2859ba54c992e7ff5cfe10
* Stay in round-trip mode in HTML DOM output
* Return DOM, wikitext and diff as soon as they are available
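A sketch of the incremental response (Express-style handler; parse,
serialize and the render helpers stand in for the real pipeline):

    // Flush each artifact to the client as soon as it is ready instead of
    // buffering the whole round-trip page.
    app.post('/roundtrip', function (req, res) {
        parse(req.body.wikitext, function (dom) {
            res.write(renderDOM(dom));           // 1. the HTML DOM
            var wikitext = serialize(dom);
            res.write(renderWikitext(wikitext)); // 2. round-tripped wikitext
            res.end(renderDiff(req.body.wikitext, wikitext)); // 3. the diff
        });
    });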
Change-Id: I7f8f44cfe8eed63a521d1318d116c22232cb6b1b
* After installing Parsoid (sudo npm install -g in modules/parser), run
'node server.js' from the api directory, navigate to
http://localhost:8000/ and follow the directions. You can start browsing
the English Wikipedia at http://localhost:8000/Main_Page, or manually
enter wikitext or an HTML DOM to convert.
* Uses the express framework; plain connect would also work
* Uses the cluster module to manage one worker per core and to restart
workers on failure
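The worker management is essentially the stock cluster pattern (a sketch,
not the actual server.js):

    var cluster = require('cluster');
    var express = require('express');
    var os = require('os');

    if (cluster.isMaster) {
        // Fork one worker per core and respawn any worker that dies.
        for (var i = 0; i < os.cpus().length; i++) {
            cluster.fork();
        }
        cluster.on('exit', function (worker) {
            console.log('worker ' + worker.process.pid + ' died; restarting');
            cluster.fork();
        });
    } else {
        // Each worker runs its own express server on the shared port.
        var app = express();
        app.get('/', function (req, res) {
            res.send('Parsoid web service');
        });
        app.listen(8000);
    }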
Change-Id: I443f2996ed3df00826b038b7476a2f966ab0c425