Wednesday 3 January 2018

NetApp - Configuration Backup and Restore


Backing up the cluster configuration enables you to restore the configuration of any node or the cluster in the event of a disaster or emergency.

Configuration backup files are archive files (.7z) that contain information for all configurable options that are necessary for the cluster, and the nodes within it, to operate properly. There are two types of configuration backup files.

Node configuration backup file

Each healthy node in the cluster includes a node configuration backup file, which contains all of the configuration information and metadata necessary for the node to operate properly in the cluster.

Cluster configuration backup file

These files include an archive of all of the node configuration backup files in the cluster, plus the replicated cluster configuration information (the replicated database, or RDB file). Cluster configuration backup files enable you to restore the configuration of the entire cluster or of any node in the cluster. The cluster configuration backup schedules create these files automatically and store them on several nodes in the cluster.


Procedure to perform configuration backup

On node cluster1-02
cluster1::*> system configuration backup create -node cluster1-02 -backup-type cluster -backup-name test
[Job 1950] Job is queued: Cluster Backup OnDemand Job.  

On node cluster1-04
cluster1::*> system configuration backup copy -from-node cluster1-02 -backup test.7z -to-node cluster1-04
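
The two commands above follow a fixed pattern, so they can be generated for any pair of nodes. Here is a minimal Python sketch, assuming the same command syntax shown in this post. The function name backup_commands is made up for illustration, and the sketch only builds the command strings; it does not connect to the cluster.

```python
def backup_commands(source_node, target_node, backup_name):
    """Build the two CLI commands used above: create a cluster backup on
    source_node, then copy the resulting .7z file to target_node."""
    create = (
        "system configuration backup create "
        f"-node {source_node} -backup-type cluster -backup-name {backup_name}"
    )
    copy = (
        "system configuration backup copy "
        f"-from-node {source_node} -backup {backup_name}.7z -to-node {target_node}"
    )
    return [create, copy]


if __name__ == "__main__":
    # Same node names and backup name as in the example above.
    for command in backup_commands("cluster1-02", "cluster1-04", "test"):
        print(command)
```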
 
 
Procedure to restore a node from the backup

On node cluster1-02
cluster1::*> cluster modify -node cluster1-04 -eligibility false


On node cluster1-04
cluster1::*> system configuration recovery node restore -backup test.7z -nodename-in-backup cluster1-04

Warning: This command overwrites local configuration files with files contained
         in the specified backup file. Use this command only to recover from a
         disaster that resulted in the loss of the local configuration files.
         The node will reboot after restoring the local configuration.
Do you want to continue? {y|n}: y
Verifying that the node is offline in the cluster.
Verifying that the backup tarball exists.
Extracting the backup tarball.
Verifying that software and hardware of the node match with the backup.
Stopping cluster applications.
...
...

cluster1::> cluster show
Node                  Health  Eligibility
--------------------- ------- ------------
clus                  false   false
cluster1-02           false   true
cluster1-03           false   true
cluster1-04           false   false
4 entries were displayed.

On node cluster1-02
cluster1::*> cluster modify -node cluster1-04 -eligibility true

cluster1::> cluster show
Node                  Health  Eligibility
--------------------- ------- ------------
clus                  false   false
cluster1-02           true    true
cluster1-03           true    true
cluster1-04           true    true
4 entries were displayed.
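
If you save the "cluster show" output to a text file, the health check at the end of the node restore can be automated. The following is a rough Python sketch that assumes the three-column layout shown above; the function names parse_cluster_show and node_is_ready are hypothetical helpers, not part of any NetApp tool.

```python
def parse_cluster_show(output):
    """Return {node: (healthy, eligible)} parsed from 'cluster show' text."""
    nodes = {}
    flags = ("true", "false")
    for line in output.splitlines():
        fields = line.split()
        # Data rows have exactly three columns, the last two being true/false.
        if len(fields) == 3 and fields[1] in flags and fields[2] in flags:
            nodes[fields[0]] = (fields[1] == "true", fields[2] == "true")
    return nodes


def node_is_ready(output, node):
    """True only if the node is listed as both healthy and eligible."""
    return parse_cluster_show(output).get(node) == (True, True)


# Output copied from the example above (node names left as displayed).
SAMPLE = """\
Node                  Health  Eligibility
--------------------- ------- ------------
clus                  false   false
cluster1-02           true    true
cluster1-03           true    true
cluster1-04           true    true
4 entries were displayed.
"""

if __name__ == "__main__":
    print(node_is_ready(SAMPLE, "cluster1-04"))
```

The header, separator, and summary lines are skipped because they do not end in two true/false values.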


Procedure to restore a cluster from the backup

To restore a cluster configuration from an existing configuration backup, you re-create the cluster using a cluster configuration backup file that has been made available to the recovering node.

cluster1::*> storage failover show
                              Takeover
Node           Partner        Possible State Description
-------------- -------------- -------- -------------------------------------
clus           -              -        Node unreachable
cluster1-02    cluster1-01    true     Connected to cluster1-01
cluster1-03    cluster1-04    true     Connected to cluster1-04
cluster1-04    cluster1-03    true     Connected to cluster1-03
4 entries were displayed.

cluster1::*> storage failover modify -node cluster1-02 -enabled false

cluster1::*> storage failover show
                              Takeover
Node           Partner        Possible State Description
-------------- -------------- -------- -------------------------------------
clus           -              -        Node unreachable
cluster1-02    cluster1-01    false    Connected to cluster1-01, Takeover
                                       is not possible: Storage failover is
                                       disabled
cluster1-03    cluster1-04    false    Connected to cluster1-04, Takeover
                                       is not possible: Storage failover is
                                       disabled
cluster1-04    cluster1-03    false    Connected to cluster1-03, Takeover
                                       is not possible: Storage failover is
                                       disabled
4 entries were displayed.
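
Before halting nodes, it helps to confirm that no node still reports takeover as possible. Below is a small Python sketch that reads the "Takeover Possible" column from saved "storage failover show" output. It assumes the column layout shown above, with wrapped description lines indented; takeover_flags and failover_disabled_everywhere are made-up helper names.

```python
def takeover_flags(output):
    """Return {node: 'true' | 'false' | '-'} from 'storage failover show' text."""
    flags = {}
    for line in output.splitlines():
        # Wrapped description lines (and the first header line) are indented.
        if not line or line[0].isspace():
            continue
        fields = line.split()
        if len(fields) >= 3 and fields[2] in ("true", "false", "-"):
            flags[fields[0]] = fields[2]
    return flags


def failover_disabled_everywhere(output):
    """True if at least one node is listed and none reports takeover possible."""
    flags = takeover_flags(output)
    return bool(flags) and "true" not in flags.values()


# Shortened copy of the output shown above.
SAMPLE = """\
                              Takeover
Node           Partner        Possible State Description
-------------- -------------- -------- -------------------------------------
clus           -              -        Node unreachable
cluster1-02    cluster1-01    false    Connected to cluster1-01, Takeover
                                       is not possible: Storage failover is
                                       disabled
cluster1-03    cluster1-04    false    Connected to cluster1-04, Takeover
                                       is not possible: Storage failover is
                                       disabled
4 entries were displayed.
"""

if __name__ == "__main__":
    print(takeover_flags(SAMPLE))
    print(failover_disabled_everywhere(SAMPLE))
```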


Halt each node except for the recovering node

cluster1::*> system node halt -node cluster1-03

Warning: Are you sure you want to halt node "cluster1-03"? {y|n}: y

cluster1::*> system node halt -node cluster1-04

Warning: Are you sure you want to halt node "cluster1-04"? {y|n}: y

cluster1::*> system configuration recovery cluster recreate -from backup -backup test.7z

Warning: This command will destroy your existing cluster. It will rebuild a
         new single-node cluster consisting of this node by using the contents
         of the specified backup package. This command should only be used to
         recover from a disaster. Do not perform any other recovery operations
         while this operation is in progress. This command will cause all the
         cluster applications on this node to restart, causing an interruption
         in CLI and Web interface.
Do you want to continue? {y|n}: y
Executing cluster recreate script.
Checking to ensure that backup replicas exist.
Stopping cluster applications.
The management gateway server restarted. Waiting to see if the connection can be reestablished
Removing current replicas.
Restoring replicas from backup.
Restarting cluster applications; access to the CLI and Web interface will be available shortly.
The management gateway server restarted. Waiting to see if the connection can be reestablished..

The connection with the management gateway server has been reestablished.
If the root cause of the interruption was a process core, you can see the core file details by issuing the following command:
system node coredump show -node local -type application -corename mgwd.* -instance

cluster1::> set -priv advanced

Warning: These advanced commands are potentially dangerous; use them only when
        directed to do so by NetApp personnel.
Do you want to continue? {y|n}: y



cluster1::*> system configuration recovery cluster show
              Recovery Status: in-progress
 Is Recovery Status Persisted: true
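
The recovery status output is a simple list of "label: value" pairs, so it is easy to check from a script. This is a minimal Python sketch assuming the two fields shown above; the helper names parse_recovery_status and recovery_is_complete are invented for this example, and the value "complete" is taken from the -recovery-status option used later in this post.

```python
def parse_recovery_status(output):
    """Turn 'label: value' lines into a dict."""
    status = {}
    for line in output.splitlines():
        # Skip blank lines and the CLI prompt line (which contains "::").
        if ":" not in line or "::" in line:
            continue
        label, value = line.split(":", 1)
        status[label.strip()] = value.strip()
    return status


def recovery_is_complete(output):
    """True if the 'Recovery Status' field reads 'complete'."""
    return parse_recovery_status(output).get("Recovery Status") == "complete"


# Output copied from the example above.
SAMPLE = """\
cluster1::*> system configuration recovery cluster show
              Recovery Status: in-progress
 Is Recovery Status Persisted: true
"""

if __name__ == "__main__":
    print(parse_recovery_status(SAMPLE))
    print(recovery_is_complete(SAMPLE))
```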

 

Boot each node that needs to be rejoined to the re-created cluster. Boot and rejoin the nodes one at a time.

 

 

cluster1::*> system configuration recovery cluster rejoin -node cluster1-03

Warning: This command will rejoin node "cluster1-03" into the local
         cluster, potentially overwriting critical cluster
         configuration files. This command should only be used
         to recover from a disaster. Do not perform any other
         recovery operations while this operation is in progress.
         This command will cause node "cluster1-03" to reboot.
Do you want to continue? {y|n}: y

 

The target node reboots and then joins the cluster. Verify that the node is part of the cluster:

cluster1::> cluster show -eligibility true

Once all nodes are healthy, and if the restore was done from a backup file, use the following command to mark the recovery status as complete:

cluster1::> system configuration recovery cluster modify -recovery-status complete

 

If the RDB on another node is not in sync, run the following command from a healthy node to synchronize the RDB:

 

cluster1::*> system configuration recovery cluster sync -node cluster1-03
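
To recap the order of the cluster restore, here is a short Python sketch that lists the commands from this post in the sequence they were run. It only builds strings and does not talk to the cluster. The function name cluster_restore_commands is made up, and the sketch assumes that every halted node is rejoined the same way cluster1-03 was above.

```python
def cluster_restore_commands(recovering_node, other_nodes, backup_file):
    """List the CLI commands from this procedure in the order they are run."""
    commands = [f"storage failover modify -node {recovering_node} -enabled false"]
    # Halt every node except the recovering one.
    commands += [f"system node halt -node {node}" for node in other_nodes]
    # Re-create the cluster on the recovering node from the backup file.
    commands.append(
        f"system configuration recovery cluster recreate -from backup -backup {backup_file}"
    )
    # Rejoin the halted nodes one at a time once they have been booted.
    commands += [
        f"system configuration recovery cluster rejoin -node {node}"
        for node in other_nodes
    ]
    # Finally, mark the recovery as complete.
    commands.append(
        "system configuration recovery cluster modify -recovery-status complete"
    )
    return commands


if __name__ == "__main__":
    steps = cluster_restore_commands(
        "cluster1-02", ["cluster1-03", "cluster1-04"], "test.7z"
    )
    for number, command in enumerate(steps, start=1):
        print(number, command)
```

Check "cluster show" between the rejoin steps before moving on to the next node.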

 
Bye...
