{question}
What messages/errors will show in the Tracelogs (memsql.log) when replication between two clusters fails?
{question}
{answer}
In general, you may replicate data between two SingleStore DB clusters: a Primary cluster (master) and a DR cluster (replica). When a replica is unable to replicate from the master, expect two classes of messages in the Tracelogs on the DR cluster:
1. Error in replication
"%s: Slave data write failed with error %d (%s) while in state %s."
"%s: Slave packet read (%d) failed with error %d (%s) while in state %s."
"%s: Slave data read failed with error %d (%s) while in state %s."
"Trying to establish replication connection for database '%s' from %s:x@'%s':%d/'%s'."
The last message in this list is logged when the replica tries to reconnect to the master. If the reconnect succeeds, you will see a success message. If it fails, you will see one of the reconnect errors listed below.
2. Reconnect errors
"Unexpected response while establishing replication connection."
"Replicating from a master older than 7.0 is not supported."
"Establishing replication connection: peer is not a valid MemSQL peer."
"Failed to setup the new master (term %lu) with the slave (term %lu)."
Further troubleshooting may be required to determine the underlying connection issue between replica and master.
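As a starting point for that troubleshooting, the two classes of messages can be separated with a small script. The following is a minimal sketch, not an official tool: the function name `classify`, the category labels, and the sample lines are hypothetical, and the matching relies only on fixed substrings taken from the messages listed above. Real Tracelog lines will carry additional prefixes (timestamps, log levels) that this sketch simply ignores.

```python
# Fixed substrings copied from the replication error messages listed above.
REPLICATION_ERRORS = (
    "Slave data write failed with error",
    "Slave packet read (",
    "Slave data read failed with error",
)

# Logged when the replica attempts to reconnect to the master.
RECONNECT_ATTEMPT = "Trying to establish replication connection for database"

# Fixed substrings copied from the reconnect error messages listed above.
RECONNECT_ERRORS = (
    "Unexpected response while establishing replication connection.",
    "Replicating from a master older than 7.0 is not supported.",
    "Establishing replication connection: peer is not a valid MemSQL peer.",
    "Failed to setup the new master",
)


def classify(line):
    """Return a hypothetical category label for a Tracelog line, or None."""
    if any(s in line for s in REPLICATION_ERRORS):
        return "replication_error"
    if RECONNECT_ATTEMPT in line:
        return "reconnect_attempt"
    if any(s in line for s in RECONNECT_ERRORS):
        return "reconnect_error"
    return None


if __name__ == "__main__":
    # Made-up sample lines; substitute lines read from memsql.log.
    samples = [
        "db1: Slave data read failed with error 104 (Connection reset by peer) while in state x.",
        "Unexpected response while establishing replication connection.",
        "some unrelated log line",
    ]
    for s in samples:
        print(classify(s), "|", s)
```

Running `classify` over each line of memsql.log on the DR cluster and counting the labels gives a quick view of whether the replica is failing mid-stream or failing to reconnect.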
Please note that most error messages relating to replication are logged on the DR replica cluster. The Primary cluster only notices when replicas are disconnected. For example, you may see one of the following disconnect messages in the Tracelogs on the Primary cluster:
"%s: Removed slave at node %ld because of disconnect."
"%s: Disconnecting slave at node %ld because its removal was requested."
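On the Primary cluster, the node ID in these two messages tells you which replica was dropped. The sketch below shows one way to pull it out with a regular expression. It assumes the `%ld` field is rendered as a plain decimal integer. The function name `parse_disconnect`, the reason labels, and the sample line are hypothetical.

```python
import re

# One alternative per disconnect message listed above; only the node ID is captured.
DISCONNECT_RE = re.compile(
    r"Removed slave at node (?P<removed>-?\d+) because of disconnect\."
    r"|Disconnecting slave at node (?P<requested>-?\d+) because its removal was requested\."
)


def parse_disconnect(line):
    """Return (node_id, reason) for a disconnect message, or None if no match."""
    m = DISCONNECT_RE.search(line)
    if m is None:
        return None
    if m.group("removed") is not None:
        return int(m.group("removed")), "disconnect"
    return int(m.group("requested")), "removal_requested"


if __name__ == "__main__":
    # Made-up sample line; substitute lines read from memsql.log.
    print(parse_disconnect("db1: Removed slave at node 3 because of disconnect."))
```

Using `search` rather than `match` lets the pattern skip whatever prefix the Tracelog places before the message text.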
More information on replication can be found here.
{answer}