Commit graph

17126 commits

Author SHA1 Message Date
Gabriel Wicke 2bc066b42d Up the diff merge size heuristic a bit and always use the same algorithm
Change-Id: I707c8a55ed1758cdd591d2fc95e03a360c8e76d1
2012-06-06 17:46:25 +02:00
Gabriel Wicke bc1a77a812 Make modified newlines visible by replacing empty lines with a space
Change-Id: If7b811245e0d01a7a147ab54c3801fc1754730a9
2012-06-06 17:11:29 +02:00
Gabriel Wicke 1876d785a7 Swap ins/del in the diff
Change-Id: Id336d713d1767a4b7859b158f2c2ddf9adc11cfb
2012-06-06 16:02:54 +02:00
Gabriel Wicke 350e700d8f Add core-upgrade
Change-Id: I5ad0955e8272d376f009f89461bed310978b25e4
2012-06-06 15:58:17 +02:00
Gabriel Wicke d0a0454ada Merge "Improve the handling of newlines for round-tripping" 2012-06-06 13:54:04 +00:00
Gabriel Wicke aee35f627d Merge "Update patched html5 library to version 0.3.8" 2012-06-06 13:53:37 +00:00
Gabriel Wicke a146fcb8ad Improve the handling of newlines for round-tripping
An improvement, but there still are some extra newlines inserted after
paragraphs. Example input:

-------

Foo:
{|
|foo
|}
-------

Extra newlines are inserted after the Foo: and the foo in the table. They are
not fed as tokens or text to the tree builder, so there is likely a bug in the
html5 library or JSDom.

Change-Id: I83eb6180e3cd1c4e7f9b15b31d339e1d32bccd3f
2012-06-06 10:17:03 +02:00
Gabriel Wicke 59fc634cce Update patched html5 library to version 0.3.8
Change-Id: I321d9a58ea1af33842a606fc8706938093a8330f
2012-06-06 10:17:03 +02:00
Subramanya Sastry bff08b799e Improvement to the refineDiffs function to improve diff quality.
* Attempt to accumulate consecutive add-delete pairs
  with "short text" separating the pairs.  This is equivalent to
  the <b><i> ... </i></b> minimization to expand range of
  <b> and <i> tags, except there is no optimal solution except
  as determined by heuristics ("short text": <= 2 chars).

Change-Id: I408e318c315eba18aac4051ed84d77e3e092d497
2012-06-06 00:08:00 -05:00
Subramanya Sastry fe6f289486 Merge changes I5d98c704,Ib8d3de75
* changes:
  A few tweaks to link round-tripping
  Use word diff if --color is enabled
2012-06-05 16:04:23 +00:00
Subramanya Sastry b095db4303 Simpler implementation of flatten.
* Possibly more efficient under heavy GC load -- untested.
* No change in time and memory use for single file parsing.

Change-Id: Id2f3f65cc0e5f38ed968bbda60b97e46523e700e
2012-06-05 10:47:46 -05:00
Gabriel Wicke dc3168cf6d A few tweaks to link round-tripping
* Moved the tail attribute to the second attribute (a bit cleaner)
* Disallowed newlines in the tail production
* Improved the selection of round-tripped href vs. generated content vs. href
  in the serializer
* renamed state.linkTail to state.dropTail

Change-Id: I5d98c704b6ea566011e22237786f8da17548570f
2012-06-05 17:26:27 +02:00
Gabriel Wicke 0f9d939b00 Use word diff if --color is enabled
Change-Id: Ib8d3de75ac306974abfdaca22bfc7b69bc62891d
2012-06-05 16:10:13 +02:00
Catrope d378182bff Reapply typo fixes from c0e1991 , were undone in b0f6f64
Change-Id: If84e13b91781ed96d3bd9e94171e16020c65ea42
2012-06-05 06:54:35 -07:00
Catrope 528728558b Fix bug in selectNodes's logic for traversing back up the tree
Change-Id: I0fc5a2ad2c9a8d162e8ddbf3cc6d31684d364928
2012-06-05 06:50:09 -07:00
Gabriel Wicke d16032ae9a Track html syntax in block_tag production
Change-Id: If560523644f007485809762f12216e08fb3c3ed3
2012-06-05 12:39:56 +02:00
Gabriel Wicke c1d8270bdb Fix wgScriptPath in round-trip mode without interwiki
Change-Id: I7cc80b7be1afffc586a2ea45d21303e9ba07c0d4
2012-06-05 12:11:45 +02:00
Gabriel Wicke 3346aed86e Support interwiki links, and some cleanup
Change-Id: I205c53a03f5230e3ef9100487f4934f97bdc179a
2012-06-05 12:05:33 +02:00
Gabriel Wicke cc96ff4f5e Very basic interwiki support
Pages titles with a wikipedia interwiki prefix now load the page from
corresponding Wikipedia. Links in a page then stay within the given language.

Note that Parsoid currently makes no effort to recognize localized namespaces,
so it won't render media files, categories etc correctly.

Change-Id: I7bc4102e81a402772ea23231170734d580ea15b9
2012-06-05 11:19:58 +02:00
Trevor Parscal b0f6f64d90 Made pushRetain do nothing if you give it 0 and throw an exception if you give it a negative length
Change-Id: Ib9955660b05a04503325ddb20f9e9a525b4d6832
2012-06-04 16:27:33 -07:00
Trevor Parscal 9111e34a0b Added nodeOuterRange to selectNodes
Change-Id: I9ef0c383fbb2515c752d2d3c52e8632aac73d811
2012-06-04 16:21:29 -07:00
Trevor Parscal a0ea700090 Added tests for transactions that deal with removing alien nodes
Change-Id: Id36413c62386dbe5ebc8c8b3a1d3c5e301a8175a
2012-06-04 16:21:29 -07:00
Catrope c0e19915ef Typo fixes and missing 'var' in newFromRemoval()
Change-Id: Ibb984d862670b5386ff76fc55ef3322f695b6ae1
2012-06-04 16:07:16 -07:00
Catrope db793009be fixupInsertion() variable documentation and cleanup
Functional changes (fixes):
* Make writeElement() also update parentNode and parentType for openings
* Also add to fixupStack when opening a wrapper for a text node

Non-functional changes (cleanup&docs):
* Document all variables at the beginning of the function
* Group variables according to where/how they're used
* Move expectedType into writeElement()
* Kill node, duplicates parentNode unnecessarily
* Kill paragraphOpened, was misnamed and unnecessary
* Rename closedElements to reopenElements

Change-Id: Ie5b4e4f30b267943048fdc170accb29139039192
2012-06-04 16:07:16 -07:00
Catrope f7445b37b8 Retain attributes when reopening closed nodes
* Push entire elements onto openingStack rather than type strings
* When closing an element, build a clone of the opening and push it onto
  closedElements, then insert that clone when reopening the element

Change-Id: I8b0fb44394aed6c471dc6dacaab03e44c2333733
2012-06-04 16:07:16 -07:00
Catrope b167453f0d Add ve.dm.Node.getAttributes() to get a reference to the attributes object
Change-Id: Ic2463d4f7053a5f6defd212f04deb5ea71542843
2012-06-04 16:01:14 -07:00
Catrope e67b2fd793 Add some more tests for newFromInsertion
Change-Id: I3c403928a8176d8685e48a63759fefec4657ca96
2012-06-04 16:01:14 -07:00
Catrope 109624e8b3 fixupInsertion fixes, wrapping content works now
Change-Id: I1eee6afcffbf09955578b7f0534aa5b7234802df
2012-06-04 16:01:13 -07:00
Rob Moen fd5eb80dd7 Document annotate method in surface model.
Comment cleanup

Change-Id: Ifd3eeab9046f376529a827dfafdc28506845ac15
2012-06-04 15:06:37 -07:00
Rob Moen f958297102 Create tests for surface model annotate method.
Fix up surface model tests

Change-Id: I02b57edc5c3faeb39e0b3c1f473f03fbd49d85b4
2012-06-04 15:02:56 -07:00
Rob Moen c338304d33 Rewrite annotate as more low level method in Surface model.
TODO: follow up with annotate tests

Change-Id: If0e68bd3a09840b1e5f3e8d85fd22a8c10134b58
2012-06-04 14:29:27 -07:00
Gabriel Wicke 92f753a365 Pre and link target improvements
* Don't explicitly add the newline in the pre, as we preserve newline tokens
  now. This avoids doubling of newlines when round-tripping.
* Use the sHref attribute even if the href contains spaces.

Change-Id: I8bec8fbfd6a7836bf2e5eec20869a0edd95c93b6
2012-06-04 14:03:05 +02:00
Gabriel Wicke ee2ddbd3cb Fix list handler issues
Lists interrupted by non-empty lines would not close the list properly.
Register for any token instead of just for newlines and close the list if no
listItem follows the newline.

Change-Id: I1743901e3db541bbeda78d17707db943e6ceb9b9
2012-06-04 13:38:43 +02:00
Gabriel Wicke f821eac102 Optionally round-trip sHref in data-mw
If the href would not denormalize, add a copy of the original href in data-mw
and use it to preserve non-conventional capitalization etc.

Change-Id: Ifef50eec7343b0e6b0ba66b6d19a8a3e8c9f8001
2012-06-04 12:28:05 +02:00
Gabriel Wicke e0809209ec Don't set the data-mw attribute if the object is actually empty.
Change-Id: I984f1b44bba67d7a9f1a709738d14c0ee02f69a9
2012-06-04 12:26:03 +02:00
Gabriel Wicke 2774e5aa6c Actually replace all underscores in wikilink target
Change-Id: I633f8d6e4f639aff90fd456600376b7c6515fd50
2012-06-04 11:48:59 +02:00
Gabriel Wicke 3f2c72f920 Fix padleft / padright (mis)use as substr
Change-Id: I0645e11c8ef8b550ad35300d1904788940fc748a
2012-06-04 11:30:45 +02:00
Gabriel Wicke 0eabd2c67e Add round-trip form and split out rt diffing
Change-Id: I3bc8ad7f273937ce6c767b8d7bbccdc86cbd93b4
2012-06-04 10:49:59 +02:00
Gabriel Wicke 99c98d6c56 Diff refinement fixes
Change-Id: I11c69de0fdcd636ccd11cd0b6cb16c5acdb188b3
2012-06-04 10:16:05 +02:00
Gabriel Wicke d2602c47a6 Switch back to word-based diff
The char-based diff looked good in some pages, but yielded terrible results in
others. The word-based algo is more consistent overall.

Change-Id: I7f2d40315ad96df037c2d9a1d50739e3d21b6c81
2012-06-04 00:02:49 +02:00
Gabriel Wicke 4533c274ca Fix a crasher in the serializer
A tail containing regexp syntax (a ? in [[:en:Main Page]]) would crash the
serializer. Use substr instead.

Change-Id: I8519aec9c07dfe31893d676b1c936a42d2af74a0
2012-06-04 00:00:54 +02:00
Gabriel Wicke d01581c380 Create a 'refinement diff' algorithm
The word or char-based algorithm does not scale well beyond 5k chars or so. We
now perform a line-based diff and then continue to diff the line differences
using the char-based algorithm. This gives a char-based diff even for bigger
inputs.

Change-Id: Iec87ca56540060e4df2859ba54c992e7ff5cfe10
2012-06-03 23:46:57 +02:00
Gabriel Wicke b11b8d8a6b Revert to line diff, word diff explodes on some pages
Change-Id: Ic338498b47bb6b6c98fa6280f44464cd70a48b1b
2012-06-03 11:39:03 +02:00
Gabriel Wicke b5e067e086 Some more web service tweaks
* Stay in round-trip mode in HTML DOM output
* Return DOM, wikitext and diff as soon as they are available

Change-Id: I7f8f44cfe8eed63a521d1318d116c22232cb6b1b
2012-06-03 11:04:40 +02:00
Gabriel Wicke 7c18891504 Snazzy html word diff for roundtrip view
Also show the HTML DOM, Wikitext output and diff.

Change-Id: Ibe744fbc895239f4e48f6e0e2f2b2f345c0845bd
2012-06-03 01:36:56 +02:00
Gabriel Wicke 4cf74497b7 Update web service start page documentation
Change-Id: I38efc5a9d5b919c6168cf97d0efbae9db967e351
2012-06-02 17:17:37 +02:00
Gabriel Wicke 31522d3d49 Add ApiRequest
Change-Id: I5f2a1cb65223a68f10bc63903000248efca05586
2012-06-02 16:52:51 +02:00
Gabriel Wicke 7c7ddd22a7 Retrieve content from the main namespace instead of templates
Change-Id: Id917fa617d6fba1e1b290b2ed20c24aed24d39d2
2012-06-02 16:48:00 +02:00
Gabriel Wicke 63abd57fc8 Improve newline-before-paragraph round-tripping support
Change-Id: I9176a97f9695018650d9a63b89514c07e0d6be90
2012-06-02 16:39:33 +02:00
Gabriel Wicke d3975a8d03 Very basic round-trip test mode for the API
Returns both the resulting wikitext and the diff with the original input.

Change-Id: Iad25039beb054a84e1ad51ffa9fee924db49c60b
2012-06-02 16:20:54 +02:00