Description
This article describes how to fix an issue where a High Availability (HA) setup undergoes failover and database replication does not work because MySQL is not starting on the Secondary Server.
Scope
FortiNAC.
Solution
To verify mysql state in both Primary and Secondary Servers, type:
service mysqld status
Example of MySQL not starting:
service mysqld status
Redirecting to /bin/systemctl status mysqld.service
● mysqld.service - MySQL Community Server
Loaded: loaded (/usr/lib/systemd/system/mysqld.service; enabled; vendor preset: disabled)
Active: activating (start-post) since Tue 2019-10-15 11:06:22 CDT; 2min 57s ago
Process: 7388 ExecStart=/usr/bin/mysqld_safe (code=exited, status=0/SUCCESS)
Process: 7377 ExecStartPre=/usr/bin/mysql-systemd-start pre (code=exited, status=0/SUCCESS)
Main PID: 7388 (code=exited, status=0/SUCCESS); : 7389 (mysql-systemd-s)
CGroup: /system.slice/mysqld.service
└─control
├─7389 /bin/bash /usr/bin/mysql-systemd-start post
└─8297 sleep 1
This article provides steps to restart MySQL and database replication.
Login to the Secondary Server CLI as root and perform the following steps:
- Kill the Management Process PID.
shutdownCampusMgr –kill
- Stop MySQL.
service mysqld stop
- In /var/lib/mysql/ directory, delete only the bsc sub-directory, ib* and mysqld-relay-bin.* files.
rm -rf /var/lib/mysql/bsc
rm /var/lib/mysql/ib*
rm /var/lib/mysql/mysqld-relay-bin.*
rm /var/lib/mysql/mysql-bin.* (later versions)
- Verify the server-id value in /etc/my-runtime.cnf is different from that on the Primary Server. If the same, modify the value (refer to comments in the file for suggested values).
- On the Primary Server, stop control and management processes. Run the following:
shutdownCampusMgr
Wait ten to fifteen seconds, then run the following:
shutdownCampusMgr –kill
Wait a further ten to fifteen seconds, then run the following:
service mysqld restart
startupCampusMgr
- On the Secondary Server, restart MySQL and verify the state:
service mysqld restart
service mysqld status
MySQL is running when the Active status displays 'Active: active (running)'.
- On the Secondary Server, restart replication:
hsStartSQLReplication <Primary IP address> bsiadmin bra26153
- On the Secondary Server, start the Management Process:
startupCampusMgr
- Wait several minutes, then validate that database replication is working. Enter the following:
hsIsSlaveActive
If the response containing the line slave is active, database replication is working.
Alternatively, tail the processManager logs after starting the Management Process and look for 'slave is active':
tail -F /bsc/logs/output.processManager
Note: It can take several attempts before replication starts working, so be patient. Errors may also print, but that does not necessarily mean the replication will fail.
If several database replication attempts have been made in the logs and the status remains slave is inactive, further investigation is required. In both primary and secondary control servers, run the grab-log-snapshot script and provide resulting files to Support. For instructions, refer to the related article below.
Note:
From ticket 9223583, running FortiNAC in v9.4.3 - the command in step 7 should not include 'bsiadmin bra26153' anymore. It should be run only as:
hsStartSQLReplication <Primary IP address>
Related articles:
Technical Note: Full disk causes database replication to fail
Technical Tip: How to Use grab-log-snapshot
Technical Tip: Rebuild mysql directory on secondary server
Technical Tip: How to fix DB synchronization on a secondary FortiNAC server with error: 'Access deni...