Logical Backup and Federated Tables in: The Mystery of the Naughty Backup

Backup lógico e tabelas federadas em: o mistério do backup malcriado
Backup

It’s very frustrating when you need to restore a backup and inevitably discover the file is corrupted or even incomplete. Sometimes, beyond frustration, this mundane situation can bring losses, annoyances, dismissals, and other troubles.

Therefore, taking care of backups is like managing a long-term investment. It requires close monitoring… with both eyes wide open!

It’s never too much to repeat my perfect backup mantra:

– RAID is not a backup! Having a mega-super-advanced disk controller and its mirrors doesn’t help. In case of accidental DELETE or UPDATE, your controller will update all disks at full speed, and not even Mr. Sulu can help you reverse it.

– Multiple slaves are more of the same when it comes to accidental writes. Ask Mr. Spock.

– Backup must be treated as a security and business continuity policy composed of: strict processes, techniques, equipment, and people. Processes describe how (type of backup: logical/textual, binary, incremental, differential, full), when, and with what granularity the backup should be done and its quality control. Techniques define which technical resources and elements must be adopted. Equipment directs what type of local or remote device will be used, and People means who does it, who checks it, who acts in case of disaster.

That said, let’s get to a mystery that puzzled a friend: the mystery of the naughty backup. A truly intriguing mystery. Let me create the scenario I found when I was called to act as a medium and chase this ghost away.

RDBMS: MySQL 5.6.10
Storage Engine: InnoDB, Memory, and Federated
Database Size (data + indexes): 350GB
Continuity/Backup Policy: None
Security and Availability: Disks in RAID 1, Master – Slave and RH Cluster
Backup Type: Only Logical/Textual with MySQLDump
Backup Size: 370GB
Backup Time: 2h37min
Restore Time: unknown

There are many errors in this scenario that can and should be explored—and will be in future articles. In this article, we focus on my friend’s request: chasing the ghost that made the backup stop without explanation.

Backup

And that was the case. The backup would invariably stop inexplicably. Sometimes it completed successfully, many other times it did not.

Rule for any self-respecting troubleshooting: LOOK FOR PATTERNS. And that’s exactly what I did. I looked for patterns and discovered the backups always stopped with a very similar byte count (size). When analyzing the file generated by MySQLDump, I noticed it always stopped on a specific group of tables—and only those.

By exploring the structure using the command SHOW TABLE STATUS, two important findings came to light:

  • The Storage Engine of the tables that corrupted the backup were always FEDERATED, without exception.
  • When the failure occurred, the COMMENT field showed the message: “Got an error reading communication packets”.

Additional clarifications for first-timers and seasoned pros:

FEDERATED is an atypical storage engine. In plain terms, a federated table is a snapshot of a table that exists on another domain, another MySQL instance, on another server. Creating a federated table is a neat resource when you need to do a JOIN, for example, on tables located on different servers.

“GOT AN ERROR READING COMMUNICATION PACKETS” is a common MySQL error, usually caused by a communication failure between client and server. In this case, the server containing the federated tables (FEDERATED) acted as a CLIENT to another MySQL server that held the ROOT tables, which acted as SERVER.

What happened was that the MySQL server with the image tables (FEDERATED) was remote and connected via MPLS to the MySQL server with the ROOT tables (InnoDB in this case). For reasons later questioned with the access provider, the MPLS connection failed in the early morning hours, just before the backup routine was triggered.

Backup

With this intermittent connection failure, the MySQL client lost connection to the MySQL server. This generated the error and during the backup (dump) process, it aborted because it couldn’t connect to the MySQL server (ROOT) to back up the federated tables’ data.

Actually, if backup logs existed and were checked, the message “mysqldump: Couldn’t execute ‘show table status like ‘historico_financeiro’: Got an error writing communication” would be noticed. To spice up the problem, when mysqldump was run manually during the day, the problem never appeared.

Lessons Learned

  • Never, under any circumstance, back up FEDERATED tables. This only generates brutal and unnecessary network traffic. These tables should be backed up on their respective servers (ROOT).
  • Know, manage, and protect your database—sizes, times, and what happens inside it. Don’t let anyone create tables (or other objects) without your knowledge (these tables were created by developers without the DBA’s involvement).
  • Monitor your backups: keep logs (reports), track times, and do restores at least every 30 days.
  • Don’t rely on a single backup method. For example, do textual backups but also binary ones. Never give up the famous duo: full and incremental backups.
  • Remember always: “Backup and a dose of cowardice keep your checking account in good shape.”

Solution

  • Resolved the connection issue with the provider
  • Removed all FEDERATED tables from the logical backup by adding multiple lines in my.cnf with the parameter: ignore-table=database.federated_table_name
  • Created a decent security and continuity policy

Schedule a meeting here

Visit our Blog

Learn more about databases

Learn about monitoring with advanced tools

Backup

Have questions about our services? Visit our FAQ

Want to see how we’ve helped other companies? Check out what our clients say in these testimonials!

Discover the History of HTI Tecnologia

Compartilhar: