| `ClusterHasNoPrimary` | VTOrc detects when a shard doesn't have any primary tablet elected | VTOrc runs PlannedReparentShard to elect a new primary |
| `DeadPrimary` | VTOrc detects when the primary tablet is dead | VTOrc runs EmergencyReparentShard to elect a different primary |
| `IncapacitatedPrimary` | VTOrc detects when the primary tablet is consistently failing health checks but is still network-reachable | VTOrc runs PlannedReparentShard, falling back to EmergencyReparentShard if that fails |
| `InnoDBStalledPrimary` | VTOrc detects when the primary's MySQL is stalled on an InnoDB semaphore wait (mysqld is alive but writes are not committing). Requires MySQL 8.0+ and SELECT privilege on `performance_schema.error_log` for the `dba` user. | VTOrc runs EmergencyReparentShard to elect a different primary |
| `PrimaryIsReadOnly`, `PrimarySemiSyncMustBeSet`, `PrimarySemiSyncMustNotBeSet` | VTOrc detects when the primary tablet has a configuration issue, such as being read-only, or semi-sync not being set when it should be (or being set when it should not be) | VTOrc fixes the configuration on the primary. |
| `NotConnectedToPrimary`, `ConnectedToWrongPrimary`, `ReplicationStopped`, `ReplicaIsWritable`, `ReplicaSemiSyncMustBeSet`, `ReplicaSemiSyncMustNotBeSet` | VTOrc detects when a replica has a configuration issue, such as not being connected to the primary, being connected to the wrong primary, having replication stopped, being writable, or semi-sync not being set when it should be (or being set when it should not be) | VTOrc fixes the configuration on the replica. |
| `StaleTopoPrimary` | VTOrc detects when a tablet still has type PRIMARY in the topology but a newer primary has already been elected. This can happen if a topology update fails during an emergency reparent operation. | VTOrc demotes the stale primary to a read-only replica, updates its type to REPLICA in the topology, and configures it to replicate from the current primary. |

The `InnoDBStalledPrimary` recovery action is based on PR #20169, which implements the `InnoDBStalledPrimary` analysis code, the `HasRecentInnoDBLongSemaphoreWait` function in `go/vt/mysqlctl/replication.go`, and the recovery wiring in `go/vt/vtorc/logic/topology_recovery.go`.

Source: vitessio/vitess#20169