Commit graph

2 commits

Author SHA1 Message Date
Matěj Suchánek 11b9e66f9f Disallow anonymous non-IP agents, handle truncated names
Why:
* Echo stores agents by their user id or by the name if the user
  is not registered. This works for IPs since the "event_agent_ip"
  field has limit of 39 bytes (32× [0-9A-F] + 7× colon for IPv6).
* However, it's possible to hold a user identity that is not
  an IP address, but the user name has not been or cannot be
  registered (e.g., external users). Echo wouldn't validate this
  and would attempt to insert the user name into "event_agent_ip",
  possibly causing silent truncation and data corruption.

What:
* Do not let events with such agents be saved. For now, log an
  error in the production. Wikibase, the only known source of this
  problem, has already been fixed.
* In runtime, replace every possibly corrupted user name with
  a placeholder to avoid unexpected null values and exceptions
  in production.

Bug: T367638
Change-Id: Ic2bd218b10651d13da9e9aea54dd2d668a33d946
Depends-On: I03b4367355dc5a3fc0c14aad5fdf19fbcd0caa3d
Depends-On: I92eb93983e81708b289e9f7d837884d539dade0b
2024-11-14 11:44:19 +01:00
Matěj Suchánek 4ae63d1b4d Avoid event insertion if possible
Why:
* On wikis with lots of bot activity like Wikidata, there is a large
  volume of edits which can potentially create an article-linked
  notification. These notifications are now actually rarely sent
  because they are disabled for bots (T318523). However, the event
  record is always inserted into the database, with no reference to
  it, bloating the database.

What:
* Do not unconditionally insert an event into the database when
  Event::create is called. Pass it to downstream calls and have
  it inserted when it's clear it will actually be needed (i.e.,
  a notification is definitely going to be created).
* Pass the event's payload to the job queue instead of requiring
  its ID. Introduce Event::newFromArray, which unlike ::loadFromRow
  handles ::toDbArray values that haven't been inserted into
  the database yet.
* Introduce Event::acquireId which ensures the event has been
  inserted prior to returning its ID as well as it does not get
  re-inserted.

Bug: T221258
Change-Id: I8b9a99a197d6af2845d85d9e35c6703640f70b91
2024-10-11 20:12:11 +02:00