Very few of these jobs seem to be finishing: some replicas are far
enough behind that DBReplicationWaitError gets thrown, which causes
the job to be restarted. Catch and log the error, but don't stop
processing because of it.
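A rough sketch of the catch-and-log, assuming the wait happens via
LBFactory::waitForReplication() in the job's run path; the log channel
name is illustrative, and depending on the MediaWiki version the
exception class may live in the Wikimedia\Rdbms namespace:

    use MediaWiki\MediaWikiServices;
    use Wikimedia\Rdbms\DBReplicationWaitError;

    try {
        // Wait for replicas to catch up before doing more work.
        MediaWikiServices::getInstance()
            ->getDBLoadBalancerFactory()
            ->waitForReplication();
    } catch ( DBReplicationWaitError $e ) {
        // Replicas are too far behind; log and keep going rather
        // than letting the exception restart the job.
        wfDebugLog( 'someScript', __METHOD__ . ': ' . $e->getMessage() );
    }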
There is also a problem with jobs that blow out the memory limits,
but it's not yet clear what to do about that.
Change-Id: Idffcfad76936f5e62e9018c58f2cb57db35af4b8
Running this script in the cluster fatals out somewhat regularly due
to memory problems. The --start option helps restart it where it fell
down, but when running against hundreds of wikis that is a one-off
solution that makes it a pain to ensure every page is actually
visited.
To try to isolate errors, add an option to push the parsing into the
job queue. Pages could still be missed, but job queue retries should
take care of us for the most part. Keeps load down on the databases
by ensuring no more than a specified number of jobs are queued or
processing at any given time.
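A minimal sketch of the throttle, assuming a maintenance-script loop
that queues one job per page; SomeParseJob, the 'someParseJob' queue
type, and $maxJobs are illustrative names, not the actual code:

    $group = JobQueueGroup::singleton();
    $queue = $group->get( 'someParseJob' ); // hypothetical job type

    foreach ( $pageIds as $pageId ) {
        // Block until queued plus in-flight jobs drop below the cap,
        // so no more than $maxJobs are queued/processing at once.
        while ( $queue->getSize() + $queue->getAcquiredCount() >= $maxJobs ) {
            sleep( 1 );
        }
        // SomeParseJob stands in for a Job subclass doing the parse.
        $group->push( new SomeParseJob(
            Title::newFromID( $pageId ),
            [ 'pageId' => $pageId ]
        ) );
    }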
Bug: T152155
Change-Id: I3a4e3a415b2f03de0bb36ac0515241e950130fde