Why:
* Echo stores agents by their user ID, or by name if the user
is not registered. This works for IPs since the "event_agent_ip"
field has a limit of 39 bytes (32× [0-9A-F] + 7× colon for IPv6).
* However, a user identity may belong to neither a registered user
nor an IP address, because the user name has not been or cannot be
registered (e.g., external users). Echo did not validate this
and would attempt to insert the user name into "event_agent_ip",
possibly causing silent truncation and data corruption.
What:
* Do not let events with such agents be saved. For now, log an
error in production. Wikibase, the only known source of this
problem, has already been fixed.
* At runtime, replace every possibly corrupted user name with
a placeholder to avoid unexpected null values and exceptions
in production.
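A minimal sketch of the two checks above, written in Python rather
than Echo's PHP. Only the 39-byte limit comes from this message; the
names MAX_AGENT_IP_BYTES, PLACEHOLDER_AGENT, can_save_agent and
display_agent are hypothetical and are not Echo's API.

```python
import ipaddress
import logging

logger = logging.getLogger("echo-sketch")

# Limit of the "event_agent_ip" field, as stated in this message.
MAX_AGENT_IP_BYTES = 39
# Hypothetical placeholder; the value Echo actually uses is not given here.
PLACEHOLDER_AGENT = "(unknown agent)"


def is_ip(name: str) -> bool:
    """Return True if name parses as an IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(name)
        return True
    except ValueError:
        return False


def can_save_agent(user_id: int, name: str) -> bool:
    """Registered users are stored by ID; unregistered agents must be IPs."""
    if user_id > 0:
        return True
    if is_ip(name) and len(name.encode("utf-8")) <= MAX_AGENT_IP_BYTES:
        return True
    # Assumption: refusing the save and logging mirrors "log an error".
    logger.error("Not saving event: agent %r is neither registered nor an IP", name)
    return False


def display_agent(user_id: int, name: str) -> str:
    """For rows read back, swap a possibly corrupted name for a placeholder."""
    if user_id > 0 or is_ip(name):
        return name
    return PLACEHOLDER_AGENT
```

In this sketch an external user name such as "wikidata>ExternalUser"
with user ID 0 is rejected on save and shown as the placeholder on
read, while a fully expanded IPv6 address (39 characters) passes.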
Bug: T367638
Change-Id: Ic2bd218b10651d13da9e9aea54dd2d668a33d946
Depends-On: I03b4367355dc5a3fc0c14aad5fdf19fbcd0caa3d
Depends-On: I92eb93983e81708b289e9f7d837884d539dade0b
Why:
* On wikis with lots of bot activity, such as Wikidata, a large
volume of edits can potentially create an article-linked
notification. These notifications are now rarely sent because
they are disabled for bots (T318523). However, the event record
is still always inserted into the database, where nothing
references it, bloating the database.
What:
* Do not unconditionally insert an event into the database when
Event::create is called. Pass it to downstream calls and have
it inserted when it's clear it will actually be needed (i.e.,
a notification is definitely going to be created).
* Pass the event's payload to the job queue instead of requiring
its ID. Introduce Event::newFromArray, which, unlike
::loadFromRow, handles ::toDbArray values that haven't been
inserted into the database yet.
* Introduce Event::acquireId, which ensures the event has been
inserted before returning its ID and that it does not get
re-inserted.
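The deferred-insert flow above can be sketched roughly as follows,
in Python rather than Echo's PHP. The method names mirror
::toDbArray, ::newFromArray and ::acquireId, but the bodies, the
field names and the in-memory FakeEventTable are assumptions made
for illustration, not the actual implementation.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


class FakeEventTable:
    """Hypothetical in-memory stand-in for the event table."""

    def __init__(self) -> None:
        self.rows: List[Dict[str, Any]] = []

    def insert(self, row: Dict[str, Any]) -> int:
        self.rows.append(row)
        return len(self.rows)  # 1-based auto-increment ID


@dataclass
class Event:
    type: str
    agent_id: int
    extra: Dict[str, Any] = field(default_factory=dict)
    id: Optional[int] = None  # None until the event is actually inserted

    def to_db_array(self) -> Dict[str, Any]:
        """Serialize the payload; safe to call before insertion."""
        return {"id": self.id, "type": self.type,
                "agent_id": self.agent_id, "extra": dict(self.extra)}

    @classmethod
    def new_from_array(cls, data: Dict[str, Any]) -> "Event":
        """Rebuild an event from to_db_array(), inserted or not."""
        return cls(type=data["type"], agent_id=data["agent_id"],
                   extra=dict(data.get("extra", {})), id=data.get("id"))

    def acquire_id(self, table: FakeEventTable) -> int:
        """Insert on the first call only, then keep returning the same ID."""
        if self.id is None:
            row = self.to_db_array()
            row.pop("id")  # the table assigns the ID
            self.id = table.insert(row)
        return self.id
```

Here a job would receive the to_db_array() payload, rebuild the
event with new_from_array(), and call acquire_id() only once it
knows a notification will be created; calling it again returns
the same ID without a second insert.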
Bug: T221258
Change-Id: I8b9a99a197d6af2845d85d9e35c6703640f70b91