Always do something...Never do nothing

NetApp - Configuration Backup and Restore

Backing up the cluster configuration enables you to restore the configuration of any node or of the whole cluster in the event of a disaster or emergency.

Configuration backup files are archive files (.7z) that contain information for all configurable options that are necessary for the cluster, and the nodes within it, to operate properly. There are two types of configuration backup files.

Node configuration backup file

Each healthy node in the cluster includes a node configuration backup file, which contains all of the configuration information and metadata necessary for the node to be healthy in the cluster.

Cluster configuration backup file

These files include an archive of all of the node configuration backup files in the cluster, plus the replicated cluster configuration information (the replicated database, or RDB file). Cluster configuration backup files enable you to restore the configuration of the entire cluster or of any node in the cluster. The cluster configuration backup schedules create these files automatically and store them on several nodes in the cluster.

Procedure to perform configuration backup

On node cluster1-02
cluster1::*> system configuration backup create -node cluster1-02 -backup-type cluster -backup-name test
[Job 1950] Job is queued: Cluster Backup OnDemand Job.

On node cluster1-04
cluster1::*> system configuration backup copy -from-node cluster1-02 -backup test.7z -to-node cluster1-04
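
When the same backup has to be created and then copied to more than one node, it can help to generate the command lines instead of typing them. The following is a minimal Python sketch, not a NetApp tool: `backup_commands` is a hypothetical helper that only builds the two commands shown above as strings. It assumes the backup file name is the backup name plus `.7z`, as the transcript suggests.

```python
def backup_commands(source_node, backup_name, copy_to=()):
    """Build the CLI lines for an on-demand cluster backup and its copies.

    Hypothetical helper: it only formats strings and does not talk to ONTAP.
    Assumption: the file created for backup name "test" is "test.7z".
    """
    commands = [
        f"system configuration backup create -node {source_node} "
        f"-backup-type cluster -backup-name {backup_name}"
    ]
    for target in copy_to:
        # One copy command per destination node.
        commands.append(
            f"system configuration backup copy -from-node {source_node} "
            f"-backup {backup_name}.7z -to-node {target}"
        )
    return commands


if __name__ == "__main__":
    for line in backup_commands("cluster1-02", "test", ["cluster1-04"]):
        print(line)
```

The printed lines can be pasted into the cluster shell at advanced privilege, in the order given.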

Procedure to restore a node from the backup

On node cluster1-02

cluster1::*> cluster modify -node cluster1-04 -eligibility false

On node cluster1-04

cluster1::*> system configuration recovery node restore -backup test.7z -nodename-in-backup cluster1-04

Warning: This command overwrites local configuration files with files contained
in the specified backup file. Use this command only to recover from a
disaster that resulted in the loss of the local configuration files.
The node will reboot after restoring the local configuration.
Do you want to continue? {y|n}: y

Verifying that the node is offline in the cluster.
Verifying that the backup tarball exists.
Extracting the backup tarball.
Verifying that software and hardware of the node match with the backup.
Stopping cluster applications.
...

cluster1::> cluster show
Node Health Eligibility
--------------------- ------- ------------
clus false false
cluster1-02 false true
cluster1-03 false true
cluster1-04 false false
4 entries were displayed.

On node cluster1-02

cluster1::*> cluster modify -node cluster1-04 -eligibility true

cluster1::> cluster show
Node Health Eligibility
--------------------- ------- ------------
clus false false
cluster1-02 true true
cluster1-03 true true
cluster1-04 true true
4 entries were displayed.
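
Checking the `cluster show` table by eye is easy to get wrong on larger clusters. As a rough sketch, assuming the output has been captured as plain text in the three-column form shown above, the Python below lists the nodes that are not both healthy and eligible. The function names `parse_cluster_show` and `nodes_needing_attention` are made up for this example.

```python
def parse_cluster_show(output):
    """Parse captured `cluster show` text into {node: (health, eligibility)}."""
    nodes = {}
    for line in output.splitlines():
        parts = line.split()
        # Data rows have exactly three fields: a node name and two booleans.
        # The header, the dashed rule, and the summary line fail this check.
        if len(parts) == 3 and all(p in ("true", "false") for p in parts[1:]):
            nodes[parts[0]] = (parts[1] == "true", parts[2] == "true")
    return nodes


def nodes_needing_attention(output):
    """Return the sorted names of nodes that are not both healthy and eligible."""
    return sorted(
        name
        for name, (healthy, eligible) in parse_cluster_show(output).items()
        if not (healthy and eligible)
    )


if __name__ == "__main__":
    sample = """\
Node Health Eligibility
--------------------- ------- ------------
cluster1-02 true true
cluster1-03 true true
cluster1-04 false false
3 entries were displayed.
"""
    print(nodes_needing_attention(sample))
```

An empty list means every parsed node reports true for both columns.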

Procedure to restore the cluster from the backup

To restore a cluster configuration from an existing configuration, you re-create the cluster using the cluster configuration backup that you chose and made available to the recovering node, and then rejoin each remaining node to the re-created cluster.

cluster1::*> storage failover show
Takeover
Node Partner Possible State Description
-------------- -------------- -------- -------------------------------------
clus - - Node unreachable
cluster1-02 cluster1-01 true Connected to cluster1-01
cluster1-03 cluster1-04 true Connected to cluster1-04
cluster1-04 cluster1-03 true Connected to cluster1-03
4 entries were displayed.

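Disable storage failover for each HA pair. Disabling it on one node also disables it on that node's partner, so the command is needed once per pair

cluster1::*> storage failover modify -node cluster1-02 -enabled false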

cluster1::*> storage failover show
Takeover
Node Partner Possible State Description
-------------- -------------- -------- -------------------------------------
clus - - Node unreachable
cluster1-02 cluster1-01 false Connected to cluster1-01, Takeover
is not possible: Storage failover is
disabled
cluster1-03 cluster1-04 false Connected to cluster1-04, Takeover
is not possible: Storage failover is
disabled
cluster1-04 cluster1-03 false Connected to cluster1-03, Takeover
is not possible: Storage failover is
disabled
4 entries were displayed.

Halt each node except for the recovering node

cluster1::*> system node halt -node cluster1-03

Warning: Are you sure you want to halt node "cluster1-03"? {y|n}: y

cluster1::*> system node halt -node cluster1-04

Warning: Are you sure you want to halt node "cluster1-04"? {y|n}: y

On the recovering node, re-create the cluster from the backup file

cluster1::*> system configuration recovery cluster recreate -from backup -backup test.7z

Warning: This command will destroy your existing cluster. It will rebuild a
new single-node cluster consisting of this node by using the contents
of the specified backup package. This command should only be used to
recover from a disaster. Do not perform any other recovery operations
while this operation is in progress. This command will cause all the
cluster applications on this node to restart, causing an interruption
in CLI and Web interface.
Do you want to continue? {y|n}: y
Executing cluster recreate script.
Checking to ensure that backup replicas exist.
Stopping cluster applications.
The management gateway server restarted. Waiting to see if the connection can be reestablished
Removing current replicas.
Restoring replicas from backup.
Restarting cluster applications; access to the CLI and Web interface will be available shortly.
The management gateway server restarted. Waiting to see if the connection can be reestablished..

The connection with the management gateway server has been reestablished.
If the root cause of the interruption was a process core, you can see the core file details by issuing the following command:
system node coredump show -node local -type application -corename mgwd.* -instance

cluster1::> set -priv advanced

Warning: These advanced commands are potentially dangerous; use them only when
directed to do so by NetApp personnel.
Do you want to continue? {y|n}: y

cluster1::*> system configuration recovery cluster show
Recovery Status: in-progress
Is Recovery Status Persisted: true

Boot each node that needs to be rejoined to the re-created cluster. Reboot the nodes one at a time, then rejoin each one from a healthy node in the re-created cluster

cluster1::*> system configuration recovery cluster rejoin -node cluster1-03

Warning: This command will rejoin node "node2" into the local
cluster, potentially overwriting critical cluster
configuration files. This command should only be used
to recover from a disaster. Do not perform any other
recovery operations while this operation is in progress.
This command will cause node "node2" to reboot.
Do you want to continue? {y|n}: y

The target node reboots and then joins the cluster. Verify that the node is part of the cluster

cluster1::> cluster show -eligibility true

Once all nodes are healthy, and if the restore was done from a backup file, use the following command to set the recovery status to complete

cluster1::> system configuration recovery cluster modify -recovery-status complete

If the RDB on another node is not in sync, use the following command from a healthy node to sync the RDB

cluster1::*> system configuration recovery cluster sync -node cluster1-03
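
The order of the cluster restore steps matters, so here is a small Python sketch that lays them out as a checklist. It is only an illustration: `cluster_recovery_plan` is a hypothetical helper that formats the commands used in this post and does not run anything. It leaves out disabling storage failover and the optional RDB sync, and booting each halted node before its rejoin is still a manual step.

```python
def cluster_recovery_plan(other_nodes, backup_file):
    """Return the restore commands from this post in the order they are run.

    other_nodes: every node except the recovering node.
    backup_file: cluster backup file available on the recovering node.
    Hypothetical helper: it only builds strings.
    """
    # 1. Halt every node except the recovering node.
    plan = [f"system node halt -node {node}" for node in other_nodes]
    # 2. Re-create a single-node cluster from the backup.
    plan.append(
        "system configuration recovery cluster recreate "
        f"-from backup -backup {backup_file}"
    )
    # 3. Rejoin the other nodes one at a time (boot each node first).
    for node in other_nodes:
        plan.append(f"system configuration recovery cluster rejoin -node {node}")
    # 4. Confirm membership, then mark the recovery as complete.
    plan.append("cluster show -eligibility true")
    plan.append(
        "system configuration recovery cluster modify -recovery-status complete"
    )
    return plan


if __name__ == "__main__":
    steps = cluster_recovery_plan(["cluster1-03", "cluster1-04"], "test.7z")
    for number, command in enumerate(steps, start=1):
        print(f"{number}. {command}")
```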

Bye...

Wednesday, 3 January 2018