Commit graph

72 commits

Author SHA1 Message Date
GWicke b07fd93c75 Merge "Output processing time to console (only for article parsing)" 2012-07-23 22:14:32 +00:00
Alex Monk 199d6f742e addInterwiki doesn't exist, use setInterwiki
Change-Id: I844842141033bb7b7d76487934d5c2cdf58a7270
2012-07-21 18:03:12 +01:00
Catrope 6ca1b06641 Merge "Rename addInterwiki to setInterwiki and add removeInterwiki complement" 2012-07-20 22:41:05 +00:00
Gabriel Wicke b877906bfe Merge "Add localhost prefix pointing to http://localhost/w by default" 2012-07-19 04:33:55 +00:00
Gabriel Wicke f43f2d61c8 Add localhost prefix pointing to http://localhost/w by default
Change-Id: Ie77eff2997ea1e00dde3f2868e3c585221de42f4
2012-07-18 21:33:14 -07:00
Gabriel Wicke 9e0ee30446 Merge "Document GPL license for Parsoid" 2012-07-19 03:19:12 +00:00
Gabriel Wicke b91467e144 Document GPL license for Parsoid
This is just to avoid re-licensing along with VE. We want to be compatible
with MediaWiki core to make sure a closely-integrated C port is still
GPL-compatible. We could consider adding MIT to the JS implementation after
porting to C.

Change-Id: Ia83e8620e26c95625793438c4c5e8ddcf2702368
2012-07-18 20:14:13 -07:00
Subramanya Sastry fdb149e581 Output processing time to console (only for article parsing)
Change-Id: I1bef9fb6f0007505914cb7fd48da68ef3b36481d
2012-07-16 16:08:34 -05:00
Catrope c79b62f869 When the main server process is killed, kill the children too
Change-Id: If07a8c290d5c1d4ac977222fe8cd25ef345c4132
2012-07-14 16:08:30 -07:00
Gabriel Wicke ef92024b99 Rename addInterwiki to setInterwiki and add removeInterwiki complement
This makes it clear that setInterwiki can modify existing mappings, and adds a
similar method for the removal of existing mappings.

Change-Id: Ic603a4b2ccec35d086513fa7cf711702bfb2baa0
2012-07-13 12:40:30 -07:00
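The set/remove pairing described in the commit above can be pictured as operations on a prefix map. This is a minimal sketch, not the Parsoid implementation: the `Environment` class shape, the `interwikiMap` field, and the second URL are assumptions made for illustration. Only the method names and the `localhost` prefix come from the log.

```javascript
// Hypothetical environment holding interwiki prefix -> base URL mappings.
class Environment {
  constructor() {
    this.interwikiMap = new Map();
  }

  // "set" rather than "add": an existing mapping is silently overwritten.
  setInterwiki(prefix, url) {
    this.interwikiMap.set(prefix, url);
  }

  // Complement of setInterwiki; returns true if a mapping was removed.
  removeInterwiki(prefix) {
    return this.interwikiMap.delete(prefix);
  }
}

const env = new Environment();
env.setInterwiki('localhost', 'http://localhost/w');
env.setInterwiki('localhost', 'http://localhost/mediawiki'); // replaces the first mapping
env.removeInterwiki('localhost');
```

A `Map` is used here so that overwriting and deleting a prefix are each a single call; a plain object would work equally well.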
Gabriel Wicke 64d2a089d8 Make the parser service slightly more robust against serializer failures
Change-Id: I50624a56fd0319f6acb6fd1c171c7c6f92a97d31
2012-07-11 12:01:05 +02:00
Gabriel Wicke f731125804 Explain reasoning behind number of worker calculation
Change-Id: I92ae0bb6e02caef98b0d68de4424e775d8922651
2012-06-23 17:30:52 +02:00
Trevor Parscal 9f54bbfbfe Added support for localsettings.js and improved logging of instance status
Change-Id: Ia46ebb89fd704afe185ee13b726fb4ec044dff81
2012-06-21 15:29:40 -07:00
Gabriel Wicke 68d2ab7022 Add localhost interwiki prefix by default, and fix links after default change
It would be nicer if the editor used an explicit prefix instead of relying on
the Parsoid default. I left the default pointing to mediawiki.org for now, but
will switch it back to en.wikipedia.org after the release to fix existing
links pointing to the service.

Change-Id: I2abae56f171f62e789d31bba78155afd0eccd142
2012-06-21 09:53:18 +02:00
Catrope 46c7910c0f Use mediawiki.org as the default source in production
Change-Id: I949eebf82ea152b545e6f10764ba4a9865b46144
2012-06-20 20:09:43 -07:00
Catrope b9acd5cbc3 Revert "Added localhost to available sources and made it default"
We need to keep master deployable, and obviously this won't deploy
right. For now you'll need to keep a local hack in your own Parsoid, and
later we should have a settings file.

This was also breaking people's dev installs because not everyone uses
http://localhost/mediawiki

This reverts commit 0e0d9fbb50.

Change-Id: Iba69ee62432d8e5f489ee31d9280961a90b79c20
2012-06-20 20:05:29 -07:00
Catrope c97d0a5d9e Merge branch 'dmrewrite'
This merges essentially all editor development in the past two months

Change-Id: I2c8653effc4dbb01a6b99a2ac2b87d83fbafa405
2012-06-19 18:28:49 -07:00
Gabriel Wicke 9d25b975ae Switch back to line-based diff, refineDiff hides some diffs
Change-Id: Ia422b67d2a9010d583780f77df60fb19eab18239
2012-06-20 01:31:46 +02:00
Catrope dc3be737b4 Merge remote-tracking branch 'origin/master' into dmrewrite
Conflicts:
	VisualEditor.i18n.php
	api/ParserService.js
	modules/parser/mediawiki.WikitextSerializer.js

Change-Id: I47b299ff3a6d948dcbeaf53cde5786362b23f66c
2012-06-19 16:16:53 -07:00
GWicke d01911e18f Revert "Fixed newline stripping in rtve mode."
This reverts commit bb7d7c09a5
2012-06-19 18:02:37 +00:00
Subramanya Sastry bb7d7c09a5 Fixed newline stripping in rtve mode.
- Only strip newlines after ">" chars (still not robust,
  but better than stripping everywhere). This prevents
  useless/incorrect diffs in rtve mode and lets us identify
  real bugs.

Change-Id: Iab7b41c4b3d6351c090f8d3a3070330325e876d4
2012-06-19 12:34:42 -05:00
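The narrower stripping rule in the commit above (only newlines that directly follow a ">") might look roughly like the following. This is a sketch under assumptions: the function name `stripNewlinesAfterTags` is hypothetical, and the actual change may have used a different pattern.

```javascript
// Remove runs of newlines only when they directly follow a '>' character.
// Newlines inside text content (including <pre> bodies) are left alone.
function stripNewlinesAfterTags(html) {
  return html.replace(/>\n+/g, '>');
}

const sample = '<p>a\nb</p>\n<pre>x\ny</pre>\n';
console.log(JSON.stringify(stripNewlinesAfterTags(sample)));
// prints "<p>a\nb</p><pre>x\ny</pre>"
```

As the commit message notes, this is still not robust: a ">" that appears in text content rather than at the end of a tag is treated the same way.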
Gabriel Wicke 9e2a47d540 Switch diff algo back to diffWords by default
Faster than diffChars, and still easier to read than diffLines.

Change-Id: Id450a2f8a098bb0a71ccf54616f82dad4f25441c
2012-06-19 00:21:34 +02:00
Gabriel Wicke 97fb2d3c0d Serializer refactoring
* tokens are not modified any more (they are supposed to be immutable)
* handler info is now split in start / end objects and potentially a 'make'
  method; added more flags to govern the newline behavior of different tags
* added a generic singleLine mode for single-line syntactical environments
* switched the web service to line-based diffs to avoid issues when diffing
  the round-trip results of [[:en:Programming language]]
* 280 round-trip tests are passing now

Change-Id: I74b8ffbf69643c5d6e5ec852ec58e680c9018901
2012-06-18 21:52:15 +02:00
Gabriel Wicke 1d2866f105 Experimental /_rtve/ round-trip test mode for web API
This mode strips all newlines from the html source before serializing it back
to wikitext, thus simulating newline-less DOM output from the VE. This
simplistic method also strips newlines in preformatted text, which will show
up as noise in the diff. This simple mode is still useful for the
identification of basic newline-less DOM serialization issues.

An improved version could try to approximate the VE's behavior more closely by
only stripping some newlines.

Due to the experimental nature this mode is not linked from the index page for
now.

Change-Id: I1dfec7ec3e6c12b7de4bbb9ff6f2d8b7834e2857
2012-06-18 11:25:21 -07:00
Gabriel Wicke 910f2ed87a Experimental /_rtve/ round-trip test mode for web API
This mode strips all newlines from the html source before serializing it back
to wikitext, thus simulating newline-less DOM output from the VE. This
simplistic method also strips newlines in preformatted text, which will show
up as noise in the diff. This simple mode is still useful for the
identification of basic newline-less DOM serialization issues.

An improved version could try to approximate the VE's behavior more closely by
only stripping some newlines.

Due to the experimental nature this mode is not linked from the index page for
now.

Change-Id: I1dfec7ec3e6c12b7de4bbb9ff6f2d8b7834e2857
2012-06-17 17:40:48 +02:00
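The simplistic stripping described in the commit above, and the preformatted-text noise it produces, can be illustrated with a short sketch. The function name `stripAllNewlines` is hypothetical and the real code path in the web service is not shown in this log.

```javascript
// Remove every newline from the HTML source before it is serialized
// back to wikitext, approximating newline-less DOM output.
function stripAllNewlines(html) {
  return html.replace(/\n/g, '');
}

const rtveInput = '<p>a</p>\n<pre>x\ny</pre>';
console.log(stripAllNewlines(rtveInput));
// prints <p>a</p><pre>xy</pre>
```

The newline between the two elements is insignificant, but the one inside `<pre>` is content, so losing it shows up as a difference after the round trip.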
Trevor Parscal 8cb5cbf75d Merge branch 'refs/heads/master' into dmrewrite 2012-06-13 15:00:44 -07:00
Subramanya Sastry 51958e4c6a Removed unused parser pipeline construction
Change-Id: Id2a7dde895b7c3fbf776a2035009686afd4301df
2012-06-13 15:08:13 -05:00
Catrope 6bf79475e4 Add suggested development configuration in comments
Change-Id: I3ec8c5326faced6d4b5c878f26f37a281b03bd95
2012-06-12 19:19:48 -07:00
Trevor Parscal 0e0d9fbb50 Added localhost to available sources and made it default
We should probably add some sort of LocalSettings.php-style system for environment-specific settings like this and git-ignore it

Change-Id: I362296ea286d0348e023d1673bba55e098229280
2012-06-08 15:21:20 -07:00
Trevor Parscal faa58d1e4a Merge branch 'refs/heads/master' into dmrewrite 2012-06-08 12:29:57 -07:00
Gabriel Wicke 3f61dc9821 Link talk page separately
Change-Id: Ib839f619e7e14ccf0ef698fc2e780ef4b0d65505
2012-06-07 13:42:05 +02:00
Gabriel Wicke 3549a16085 Add a 'report issue' link below round-trip results
Change-Id: I5e3a785a328af0debcf83dc2038b5e5417fa5158
2012-06-07 13:37:40 +02:00
Gabriel Wicke bec7fb2f8c Mention citations as not round-tripping
Change-Id: I57e25f6f4072bae2f5681b8611e98f899875d1e2
2012-06-07 13:18:44 +02:00
Gabriel Wicke 76cca063ba Add hint on where to report issues in web service entry page
* Explain what we are currently interested in and link to
  :mw:Talk:Parsoid/Todo.

Change-Id: I747c6ee8a021a7a73ec91b73281c1c679a00da8f
2012-06-07 13:16:05 +02:00
Gabriel Wicke 1ca586e5f1 Improve interwiki config a bit
* Moved wikipedia default prefixes to environment
* Added 'addInterwiki' method
* Adjusted link handling normalizeTitle to reflect this

Change-Id: If5b2314cc36346b6da8649ed410457a612d80a22
2012-06-07 12:30:16 +02:00
Gabriel Wicke 2fa5baabbb Make it easier to configure the default wiki, and add support for mediawiki.org
* mw:Foo now loads pages from mediawiki.org
* The default prefix still is 'en'. You can switch this to 'mw' in ParserService.js.

Change-Id: I1208667e6114bd711b7988a8b3adb32ffab70969
2012-06-07 11:50:40 +02:00
Catrope 3383dc9011 Delete old ApiQueryParseTree module, unused
Change-Id: Ib39ce97a44fe63a8f38efc4935a215b5d2b854fa
2012-06-06 16:11:29 -07:00
Gabriel Wicke 413df0c471 Strip \r from form input; we normalize everything to Unix line endings
Change-Id: I5cd255e1a7ab9958f120fad408362e6f709e4b91
2012-06-06 19:26:29 +02:00
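Browsers typically submit textarea content with CRLF line endings, so the normalization in the commit above amounts to dropping carriage returns. A minimal sketch, assuming a hypothetical helper name `normalizeNewlines`:

```javascript
// Drop every carriage return so that CRLF input becomes LF-only.
function normalizeNewlines(input) {
  return input.replace(/\r/g, '');
}

console.log(JSON.stringify(normalizeNewlines('== Title ==\r\nText\r\n')));
// prints "== Title ==\nText\n"
```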
Gabriel Wicke 47204c4ca0 Use diffChars instead of diffWords, as the former misses some changes
The improved merge algorithm now makes diffChars output more palatable. Things
could still be improved by collecting single-character 'neutral' changes in a
block of 'add' changes and converting them to adds / removes.

Change-Id: I8439e8acab4360c08b89d9ce8a6b8523e7a0a210
2012-06-06 18:36:28 +02:00
Subramanya Sastry f8221b128b Used a more robust heuristic for merging consecutive diffs
- Check if consecutive diffs are separated by 1 word in addition
  to max 3 chars.  This takes care of diffs introduced by template diffs
  separated by the template name and creates a clean single diff.

Change-Id: I9181d2ed9a07bee6ca5d5ebd6ddea84f7e2cecac
2012-06-06 11:01:47 -05:00
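One possible reading of the heuristic in the commit above is that the unchanged text between two diffs qualifies for merging if it is at most 3 characters long or consists of a single word. The sketch below encodes that reading only; the function name `isMergeableGap`, the `maxChars` parameter, and the exact definition of "word" (a run of non-whitespace) are assumptions.

```javascript
// Decide whether the unchanged text between two consecutive diffs is
// small enough for the diffs to be merged into one.
function isMergeableGap(gapText, maxChars = 3) {
  const isShort = gapText.length <= maxChars;
  // A single run of non-whitespace, optionally padded with whitespace.
  const isSingleWord = /^\s*\S+\s*$/.test(gapText);
  return isShort || isSingleWord;
}

console.log(isMergeableGap(' | '));         // true: 3 characters
console.log(isMergeableGap(' Infobox '));   // true: one word, e.g. a template name
console.log(isMergeableGap(' two words ')); // false
```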
Gabriel Wicke 2bc066b42d Up the diff merge size heuristic a bit and always use the same algorithm
Change-Id: I707c8a55ed1758cdd591d2fc95e03a360c8e76d1
2012-06-06 17:46:25 +02:00
Gabriel Wicke bc1a77a812 Make modified newlines visible by replacing empty lines with a space
Change-Id: If7b811245e0d01a7a147ab54c3801fc1754730a9
2012-06-06 17:11:29 +02:00
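An added or removed empty line has no visible content when rendered inside a highlighted diff element, which is presumably why the commit above substitutes a space. A minimal sketch of that substitution, assuming a hypothetical helper name `showEmptyLines`:

```javascript
// Replace each empty line with a single space so that a changed
// newline still occupies visible space in the rendered diff.
function showEmptyLines(text) {
  return text
    .split('\n')
    .map((line) => (line === '' ? ' ' : line))
    .join('\n');
}

console.log(JSON.stringify(showEmptyLines('a\n\nb')));
// prints "a\n \nb"
```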
Gabriel Wicke 1876d785a7 Swap ins/del in the diff
Change-Id: Id336d713d1767a4b7859b158f2c2ddf9adc11cfb
2012-06-06 16:02:54 +02:00
Subramanya Sastry bff08b799e Improvement to the refineDiffs function to improve diff quality.
* Attempt to accumulate consecutive add-delete pairs
  with "short text" separating the pairs.  This is equivalent to
  the <b><i> ... </i></b> minimization to expand the range of
  <b> and <i> tags, except that there is no optimal solution, only one
  chosen by heuristics ("short text": <= 2 chars).

Change-Id: I408e318c315eba18aac4051ed84d77e3e092d497
2012-06-06 00:08:00 -05:00
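The accumulation step described in the commit above can be sketched as follows. This is an illustration under assumptions, not the refineDiffs code: the change objects use a jsdiff-like shape (`{ value, added, removed }`), and the function name `mergeShortGaps` and its `maxGap` parameter are hypothetical.

```javascript
// Merge consecutive changes that are separated only by short unchanged
// text. The short text is counted on both the removed and the added
// side, so the result is one larger remove/add pair.
function mergeShortGaps(changes, maxGap = 2) {
  const out = [];
  let del = '';
  let ins = '';
  let open = false; // true while inside a run of changes

  const flush = () => {
    if (del) out.push({ value: del, removed: true });
    if (ins) out.push({ value: ins, added: true });
    del = '';
    ins = '';
    open = false;
  };

  changes.forEach((change, i) => {
    if (change.added) {
      ins += change.value;
      open = true;
    } else if (change.removed) {
      del += change.value;
      open = true;
    } else {
      const next = changes[i + 1];
      const bridges = open && next !== undefined &&
        (next.added || next.removed) && change.value.length <= maxGap;
      if (bridges) {
        del += change.value;
        ins += change.value;
      } else {
        flush();
        out.push({ value: change.value });
      }
    }
  });
  flush();
  return out;
}

const merged = mergeShortGaps([
  { value: 'a', removed: true }, { value: 'b', added: true },
  { value: ' ' },
  { value: 'c', removed: true }, { value: 'd', added: true },
  { value: ' long tail' },
]);
console.log(merged.map((c) => c.value)); // [ 'a c', 'b d', ' long tail' ]
```

Folding the gap into both sides keeps the merged pair a faithful description of the edit: removing "a c" and adding "b d" yields the same text as the two original single-character changes.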
Gabriel Wicke c1d8270bdb Fix wgScriptPath in round-trip mode without interwiki
Change-Id: I7cc80b7be1afffc586a2ea45d21303e9ba07c0d4
2012-06-05 12:11:45 +02:00
Gabriel Wicke 3346aed86e Support interwiki links, and some cleanup
Change-Id: I205c53a03f5230e3ef9100487f4934f97bdc179a
2012-06-05 12:05:33 +02:00
Gabriel Wicke cc96ff4f5e Very basic interwiki support
Page titles with a Wikipedia interwiki prefix now load the page from the
corresponding Wikipedia. Links in a page then stay within the given language.

Note that Parsoid currently makes no effort to recognize localized namespaces,
so it won't render media files, categories etc correctly.

Change-Id: I7bc4102e81a402772ea23231170734d580ea15b9
2012-06-05 11:19:58 +02:00
Gabriel Wicke 0eabd2c67e Add round-trip form and split out rt diffing
Change-Id: I3bc8ad7f273937ce6c767b8d7bbccdc86cbd93b4
2012-06-04 10:49:59 +02:00
Gabriel Wicke 99c98d6c56 Diff refinement fixes
Change-Id: I11c69de0fdcd636ccd11cd0b6cb16c5acdb188b3
2012-06-04 10:16:05 +02:00
Gabriel Wicke d2602c47a6 Switch back to word-based diff
The char-based diff looked good in some pages, but yielded terrible results in
others. The word-based algo is more consistent overall.

Change-Id: I7f2d40315ad96df037c2d9a1d50739e3d21b6c81
2012-06-04 00:02:49 +02:00