Sunday 21 January 2018


NetApp HCI

NetApp HCI is architected in building blocks at either the chassis or the node level. Each chassis can hold four nodes: storage nodes running SolidFire Element OS and/or compute nodes running the VMware hypervisor (other hypervisors may be supported at a later stage). Nodes are inserted and removed from the back of the chassis, and the SSDs for storage nodes are populated in the front. The minimum configuration is two chassis with six nodes: four storage nodes and two compute nodes. The two remaining empty slots can be used for expansion. Compute and storage nodes can be mixed and matched.

Storage nodes and compute nodes come in three configurations: small, medium, and large.

Storage - Large 22TB/44TB, Medium 6TB/22TB, Small 3TB/11TB

Compute - Large 36 cores 768GB, Medium 24 cores 512GB, Small 16 cores 384GB
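As a quick worked example of the compute sizes above, the sketch below totals cores and memory for a chosen mix of compute nodes. It covers compute only, because the meaning of the paired storage figures (such as 3TB/11TB) is not spelled out here. The names `COMPUTE_NODES` and `compute_totals` are hypothetical, chosen for illustration.

```python
# Per-node compute specs from the list above: size -> (cores, memory in GB).
COMPUTE_NODES = {
    "small": (16, 384),
    "medium": (24, 512),
    "large": (36, 768),
}


def compute_totals(node_sizes):
    """Return (total cores, total memory in GB) for a list of compute node sizes."""
    cores = sum(COMPUTE_NODES[size][0] for size in node_sizes)
    memory_gb = sum(COMPUTE_NODES[size][1] for size in node_sizes)
    return cores, memory_gb


if __name__ == "__main__":
    # The minimum configuration has two compute nodes; here both are assumed small.
    print(compute_totals(["small", "small"]))  # (32, 768)
```

An unknown size name raises a KeyError, which is a reasonable way to catch typos in a sizing list.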

The specific value propositions of NetApp HCI are the following:

Guaranteed performance: delivers predictable performance, consolidates mixed workloads, and provides granular control at the virtual machine level.

Flexibility and scale: scales compute and storage independently, optimizes and protects existing investments, and eliminates the HCI "tax" by separating the scaling of compute and storage.

Automated infrastructure: deploys capabilities rapidly, automates and streamlines management, and simplifies processes through a comprehensive API library.

 

First-generation HCI scales compute and storage together in a fixed ratio. NetApp HCI scales them independently, so that if customers need only compute, they do not pay for and overprovision storage. Because NetApp storage and compute nodes scale independently, customers can mix and match to fit their needs. All nodes in the minimum configuration should be the same size, and the largest node should be no more than one-third larger than the combination of the rest of the nodes.
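The two sizing rules above can be expressed as a small check. This is a minimal sketch that takes the wording of the second rule literally (the largest node may be at most the rest combined plus one-third of that, i.e. 3 × largest ≤ 4 × rest); treat it as an illustration of the check, not as NetApp's authoritative sizing rule. The function name `check_storage_sizing` is hypothetical.

```python
def check_storage_sizing(capacities, minimum_config=False):
    """Check storage node sizes against the two rules in the text.

    capacities: capacity of each storage node, in any one consistent unit.
    Returns (ok, reason).
    """
    if minimum_config and len(set(capacities)) > 1:
        return False, "nodes in the minimum configuration are not all the same size"
    largest = max(capacities)
    rest = sum(capacities) - largest
    # Literal reading: largest <= rest + rest / 3, written as 3*largest <= 4*rest
    # so integer capacities never hit floating-point rounding.
    if 3 * largest > 4 * rest:
        return False, "largest node is more than one-third larger than the rest combined"
    return True, "ok"


if __name__ == "__main__":
    print(check_storage_sizing([11, 11, 11, 11], minimum_config=True))
    print(check_storage_sizing([44, 6, 6, 6]))
```

Using a single consistent unit matters more than which unit: the rule is a ratio, so TB and GB give the same answer.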

 

With the NetApp Deployment Engine (NDE), HCI can be deployed quickly (in around 30 minutes).

NetApp has automated and streamlined the deployment steps, reducing more than 400 entries to fewer than 30. This automation reduces the risk of error and enables customers to begin using HCI in about 30 minutes. Because the system is intuitive, it reuses data such as user names and passwords whenever possible, so customers need to enter the data only once. Customers are not required to reenter data or select several options at varying complexity levels. The system automatically checks for user errors, eliminating manual checks.

Originally, data centers were constructed with hardware, and software played only a supporting role. Hyperconverged infrastructure (HCI) is "software-defined" because it employs a high degree of virtualization for storage, servers, and support services. The virtualization layer, a common software layer, runs on and manages the hardware. A software-defined data center (SDDC) architecture also enables higher degrees of automation, because the software layer provides automation helpers such as APIs.

 

HCI addresses business requirements by improving data efficiency and simplifying management of all infrastructure resources and virtual machines. It does this by providing a single point of administration at a fraction of the cost of a three-tier architecture. Bringing all data center resources into one resource stack improves performance, and the data architecture improves data efficiency by providing one-time deduplication, compression, and optimization of data. A reduced need for hardware resources, streamlined operations, and automation greatly reduce the TCO.

 

NetApp HCI is good for workload consolidation in highly virtualized, mixed-workload environments, where customers want to run thousands of applications predictably, with guaranteed performance.

NetApp HCI is good for web infrastructures where customers want to deliver predictable performance to web applications and scale resources independently to meet or exceed SLAs.

NetApp HCI is good for database environments running SQL and NoSQL (for example, MongoDB) workloads that need resources to run properly without the capital expenditure (capex) and operational expenditure (opex) burdens of dedicated hardware.

NetApp HCI is good for end-user computing environments where customers want to cost-effectively deliver the flexibility and adaptability required to manage an evolving, large-scale end-user computing environment. With granular quality-of-service (QoS) controls and an independent scale-out architecture, NetApp HCI is uniquely suited to manage and adapt to mixed and unpredictable performance demands, delivering predictable performance for every application and true multitenancy. NetApp HCI is designed for the Data Fabric, so customers can access their data across any cloud: hybrid, public, or private.
For how it differs from Nutanix, look for my future blogs or check for updates to this post.
 
Bye...

Saturday 6 January 2018


NetBackup 7.x Technical Overview


NetBackup Components and Architecture

NetBackup's 3-tier architecture (Master Server, Media Server, Client servers) gives the power, scalability, and flexibility needed to match the demands of modern enterprise-class workloads.

Master Server Overview

The master server hosts the catalog database, backup policy creation and scheduling, the administration console, the Enterprise Media Manager (EMM), centralized monitoring, reporting, and restore execution. The EMM server manages and allocates the resources required for NetBackup operations. It is part of the master server and can be installed with the master or on a separate server.
NetBackup is not a single program but rather a collection of processes that work together.
Process name prefixes:
bp____ = legacy processes (bp comes from Backup Plus, the original product).
nb____ = newer processes. Multithreaded (6.x), always running.
nbrb = NetBackup Resource Broker. Allocates and tracks resources.
nbproxy = NetBackup Proxy, used to talk to legacy processes. It is the intermediary between the old bp____ and the new nb____ processes.
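The prefix convention above is regular enough to apply mechanically, for example when scanning a process list. This is a minimal sketch; `classify_process` is a hypothetical helper, and it only knows the two prefixes and the nbproxy special case described above (anything else, such as ltid, is reported as "other").

```python
def classify_process(name):
    """Classify a NetBackup process name using the prefix conventions listed above."""
    name = name.lower()
    if name == "nbproxy":
        # nbproxy is itself an nb process, but its job is to bridge to bp processes.
        return "proxy to legacy processes"
    if name.startswith("bp"):
        return "legacy"
    if name.startswith("nb"):
        return "newer"
    return "other"


if __name__ == "__main__":
    for proc in ["bprd", "bpdbm", "nbpem", "nbjm", "nbproxy", "ltid"]:
        print(proc, "->", classify_process(proc))
```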

Master Server Processes

bprd (request daemon) is always running on the master server and is responsible for taking backup and restore requests.

nbpem (schedule/policy execution) is the process that processes policies and runs them at their scheduled times. When a policy is updated, nbpem is informed, and all clients and objects in that policy are updated too.

bpjobd(job monitor)

nbjm (job manager) takes the job information from nbpem and updates nbpem once the job is completed.

bpdbm (database manager) is responsible for the database and catalog. It runs all the time on the NetBackup master server.
The EMM server can run on the master server, or it can run separately and provide resources to other master servers too. nbrb and nbemm run only on the EMM server. bpsched (pre-6.x) has been replaced by nbpem, nbjm, and nbrb.

nbrb (EMM) (resource broker) acquires resources from nbemm running on the EMM server.

nbemm (EMM) (Enterprise Media Manager)

nbproxy (EMM) is required for retrieving Storage Lifecycle Policy information from the client so that it can give input to OpsCenter within NetBackup.


Media Server

Media Server, FT (Fibre Transport) Media Server: transfers data over the SAN, controls storage interaction, reads/writes data to/from storage, and is controlled by the master server. Multiple media servers can be used for load balancing.

Media Server Processes

bpbrm(backup/restore manager)

bptm/bpdm(tape/disk manager)

bpcd(communication between master and clients)

nbftsrv/nbfdrv64(FT services)

Client Overview

Software agent installed on the client: standard client, SAN client, snapshot client, data movement engine. Controlled by the master server; supports encryption and deduplication.

Client Processes

bpcd (communication)

vnetd(firewall communications)

bpbkar(backup/archive client)

tar(restore service)

nbflclnt (SAN client)

Basic Disk Storage Unit

NetBackup can use simple disk storage as a backup and staging location, and it does not require a license. It has some limitations compared to Advanced Disk. Disk storage devices can be local or available over the network (NAS). Disk storage devices can be exposed to NetBackup as Basic Disk storage units. Once defined as a storage unit, a device can be used as a backup destination within a policy.

Advanced Disk requires the DPO (Data Protection Optimization) feature. With Advanced Disk, multiple disk volumes can be pooled to create logical units (pools). It supports SLPs (Storage Lifecycle Policies), it is easy to add capacity to an Advanced Disk pool, and it supports CIFS/NFS shares and encryption.

Basic MSDP (Media Server Deduplication Pool): the deduplication engine is embedded in the NBU 7.x code base. Deduplication can be done at the client level, at the media server level, or by third-party appliances. With media server deduplication, the media server host deduplicates data on the local host. In off-host deduplication, the media server runs deduplication inline. MSDP requires DPO.

OpenStorage (OST) requires DPO. It enables multiple NetBackup media servers to share intelligent disk appliance storage.
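Since three of the four storage unit types above need DPO and Basic Disk does not, a planning checklist can flag licensing needs up front. This is a minimal sketch; the table keys and the helper `units_needing_dpo` are hypothetical labels for the types described above, and an unrecognized type raises a KeyError on purpose so it is not silently treated as license-free.

```python
# Storage unit types described above and whether they need the DPO feature.
REQUIRES_DPO = {
    "Basic Disk": False,
    "Advanced Disk": True,
    "MSDP": True,
    "OpenStorage (OST)": True,
}


def units_needing_dpo(planned_units):
    """Return the planned storage unit types that need DPO, in input order."""
    return [unit for unit in planned_units if REQUIRES_DPO[unit]]


if __name__ == "__main__":
    print(units_needing_dpo(["Basic Disk", "MSDP", "OpenStorage (OST)"]))
```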

 


NetBackup Appliances are purpose-built backup appliances that give standard and predictable performance. The NetBackup 5230 and NetBackup 5330 storage shelves use RAID 6 and are monitored by Veritas support via call home. The operating system is on RAID 1.

 

Management Options

Web GUI, the NetBackup remote client installed on a 64-bit system, or SSH.

IPMI: manage the system remotely, change BIOS settings, power the appliance on/off or power-cycle it, and reimage the appliance.

NetBackup Features

NetBackup Instant Recovery for VMware lets you start a VM directly from its backup and then use vMotion to move the VM from backup storage to regular storage. This high-speed recovery safely boots backup VM images directly from backup storage; the backup VM image is kept in read-only mode during recovery.

Auto Image Replication (AIR) moves images from one domain to another. It requires DPO. AIR leverages SLPs to simplify multi-site disaster recovery.

Accelerator technology can transform the way you protect your critical IT infrastructure by providing full backups while transferring only the changes, as an incremental backup does. It uses synthetic backup.

FlashBackup is designed specifically to offer a performance solution for servers with highly utilized disk file systems containing large numbers of files. The NetBackup client creates a raw backup of the file system instead of a file-by-file backup, which can increase performance for highly utilized file systems with many files. It still supports restoring individual file objects. The file system backup is transferred to the NetBackup media server as a single raw image; the backup process changes from a file stream into a bit stream.

NetBackup helps customers leverage the flexibility of public cloud storage by supporting all major cloud storage providers, and it differentiates itself from other solutions through its proprietary OpenStorage (OST) technology.

NetBackup OpsCenter is a reporting and monitoring tool. It can manage multiple domains centrally. NetBackup OpsCenter is free, while NetBackup OpsCenter Analytics is licensed and can forecast and generate custom reports.

NetBackup should always be upgraded from the top down: OpsCenter, master server, media servers, clients. They do not all need to be upgraded at the same time. A master server can work with mixed media server versions and mixed client versions, with some limitations and exceptions. OpsCenter must always be at the highest level, or at least match the master server.
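The top-down rule can be turned into a pre-upgrade sanity check over an inventory of versions. This is a minimal sketch under stated assumptions: it assumes no media server or client should be newer than the master server, it does not compare clients against their media servers, and it does not model the "limitations and exceptions" mentioned above. The functions `parse_version` and `check_upgrade_order` are hypothetical.

```python
def parse_version(text):
    """Turn a dotted version string such as '7.7.3' into a comparable tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def check_upgrade_order(opscenter, master, media_servers, clients):
    """Return a list of problems with the top-down version order (empty list = OK).

    media_servers and clients map host names to version strings.
    """
    problems = []
    master_v = parse_version(master)
    # OpsCenter must be at the highest level, or at least match the master.
    if parse_version(opscenter) < master_v:
        problems.append("OpsCenter is older than the master server")
    for host, version in media_servers.items():
        if parse_version(version) > master_v:
            problems.append(f"media server {host} is newer than the master server")
    for host, version in clients.items():
        if parse_version(version) > master_v:
            problems.append(f"client {host} is newer than the master server")
    return problems


if __name__ == "__main__":
    print(check_upgrade_order("7.7.3", "7.7.3", {"media1": "7.6.1"}, {"client1": "7.6"}))
```

Comparing tuples of integers avoids the string-comparison trap where "7.10" would sort before "7.9".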

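To raise bprd logging verbosity, set it in bp.conf and then tell bprd to reread its configuration:
/usr/openv/netbackup/bp.conf
BPRD_VERBOSE = 5
# /usr/openv/netbackup/bin/admincmd/bprdreq -rereadconfig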
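If you script the bp.conf change, the edit itself is a simple key = value rewrite. This is a minimal sketch that works on the file's text only; reading and writing the real file, and running bprdreq afterwards, are left out. The function `set_bp_conf_option` is hypothetical, and because it collapses duplicate entries it is only meant for single-valued options such as BPRD_VERBOSE, not multi-valued ones such as SERVER.

```python
def set_bp_conf_option(text, key, value):
    """Return bp.conf text with a single `key = value` entry.

    Existing lines for `key` are replaced by one new line in place of the first;
    if the key is absent, the new line is appended at the end.
    """
    new_line = f"{key} = {value}"
    out = []
    replaced = False
    for line in text.splitlines():
        if line.split("=", 1)[0].strip() == key:
            if not replaced:
                out.append(new_line)
                replaced = True
            continue  # drop any duplicate entries for the same key
        out.append(line)
    if not replaced:
        out.append(new_line)
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    original = "SERVER = master1\nBPRD_VERBOSE = 1\n"
    print(set_bp_conf_option(original, "BPRD_VERBOSE", 5), end="")
```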

vxlogging can be configured from the command line or the GUI. Some processes, such as nbemm, nbproxy and PBX, have to be configured from the command line using vxlogcfg.
Use vxlogview to retrieve the logs.

Cleaning up
To purge all vxlogs:
/usr/openv/netbackup/bin/vxlogmgr -F

NetBackup Support Utility (NBSU) collects logs for support analysis:
/usr/openv/netbackup/bin/support

Troubleshooting -
Documentation and preparation are key. The catalog backup e-mail contains most of the information you need to perform a recovery. Annual DR tests should be performed to keep the documentation current. For DR tests, bring extra backup tapes for each application (including multiple catalog tapes). Do not use "overwrite files" on system restores unless your system admins tell you to.

Network communication between master/media or media/client
../admincmd/bptestbpcd -host hostname -debug -verbose - tests bpcd connectivity to the given host.
/usr/openv/netbackup/bin/bpclntcmd -pn - checks connectivity to the master server from a media server or client.

To print log output from the last 5 minutes, run:
../netbackup/bin/vxlogview -p 51216 -t 00:05:00

To restart the PBX process when NBU is stopped:
/opt/VRTSpbx/bin/vxpbx_exchanged (stop|start)

DataCollect is a utility included with NetBackup appliances to collect logs for support analysis.

Catalog Backup configuration
/usr/openv/netbackup/db
/usr/openv/var
/usr/openv/netbackup/vault/sessions
/usr/openv/db/staging

The following important files are not included in the catalog backup:
/usr/openv/netbackup/bp.conf
/usr/openv/volmgr/vm.conf
/usr/openv/netbackup (include/exclude lists)
HKLM\SOFTWARE\Veritas\NetBackup\CurrentVersion\Config (Windows)

A daily full catalog backup plus differential incremental catalog backups every 6 hours, with a retention of 1-2 weeks.
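To make the schedule above concrete, the sketch below lays out one day of catalog backups and when each expires. It assumes the full runs at 00:00, a 6-hour interval, and a 2-week retention (the upper end of the range above); the function `catalog_schedule` is hypothetical and only computes times, it does not configure NetBackup.

```python
from datetime import datetime, timedelta

# Assumption: keep catalog backups for the upper end of the 1-2 week range.
RETENTION = timedelta(weeks=2)


def catalog_schedule(day_start, full_hour=0, interval_hours=6):
    """Return (start, kind, expires) tuples for one day of catalog backups.

    There is one slot every interval_hours; the slot at full_hour is the daily
    full and every other slot is a differential incremental.
    """
    if 24 % interval_hours or full_hour % interval_hours or not 0 <= full_hour < 24:
        raise ValueError("interval_hours must divide 24 and full_hour must be a slot")
    runs = []
    for hour in range(0, 24, interval_hours):
        start = day_start + timedelta(hours=hour)
        kind = "full" if hour == full_hour else "differential incremental"
        runs.append((start, kind, start + RETENTION))
    return runs


if __name__ == "__main__":
    for start, kind, expires in catalog_schedule(datetime(2018, 1, 6)):
        print(start, kind, "expires", expires.date())
```

With a 2-week retention, roughly 14 fulls and 42 differentials are on hand at any time, which is worth keeping in mind when sizing catalog backup storage.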

Use Storage Lifecycle Policies (SLPs) to automate making multiple copies where possible.

NetBackup Auto Image Replication, introduced in NetBackup 7.1, allows a NetBackup domain to replicate its backup storage and catalog to one or more NetBackup domains.

OpenStorage allows storage vendors to join STEP (Symantec Technology Enabled Program) and get access to the OST API. Storage vendors can write plug-ins using the OST APIs, which are installed on the NetBackup media server. This enables tight integration between the storage and NetBackup. OpenStorage supports any connectivity, any protocol (FC, TCP/IP, or a combination), and any format. Without OST, if a storage device such as Data Domain performs deduplication or replication, creates copies, or writes directly to tape, NetBackup never learns about it. That is why OST is required for tight integration of storage with NetBackup.
 
Deploying the OST plug-in for AltaVault 4.2 and NetBackup media server 7.6/7.7 with an updated OS: download the OST plug-in for Windows/Red Hat from NetApp AltaVault.
  • In AltaVault, create an OST share and select the OST user.
  • In NetBackup, under Disk Storage Server, select OST (OpenStorage) and enter the OST share name followed by an underscore and the AltaVault name, plus the OST user. The same wizard creates the disk pool; then create the storage unit. Create a policy that uses the newly created storage unit and initiate a backup.
  • Restore using the NetBackup client. Images that are manually expired are removed from the OST share on AltaVault.
AltaVault CLI (login admin/admin):
config t
ost enable
no ost enable
show ost server
ost user ?
ost share ?
ost ssl enable - requires a NetBackup stop and NetBackup start to take effect
no ost ssl enable - requires a NetBackup stop and NetBackup start to take effect

On Linux, to check whether the plug-in is installed correctly, use:
/usr/openv/netbackup/bin/admincmd/bpstsinfo -pi | grep NetApp




NetBackup starts with the bprd process on the master server and ltid on the media and master servers. All other processes, including nbpem, nbjm, nbrb and nbemm, start as required, as do the processes installed on the media server.

Backup flow at the process level





NBPEM (policy execution manager) -> NBJM -> BPJOBD (makes an entry in the jobs DB) -> NBJM -> NBRB -> NBEMM -> NBJM -> BPJOBD -> NBJM -> BPDBM (catalog entry) -> NBJM -> BPBRM (media server) -> BPBKAR (client) -> LTID -> BPTM (BPTM spawns a child; BPBKAR sends data to the BPTM child, which puts it into buffers) -> BPTM (the BPTM parent writes the data to storage). Once complete, the processes above run in reverse to return the completion acknowledgement.
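The backup flow above is a fixed sequence, so it can be modeled as a plain list: the hops are consecutive pairs, and the acknowledgement path is the same list reversed. This is a minimal sketch for visualizing the flow; `BACKUP_FLOW`, `hops` and `acknowledgement_path` are hypothetical names, and the two BPTM entries stand for the child and parent processes described above.

```python
# Forward path of a backup job through the processes listed above.
BACKUP_FLOW = [
    "nbpem", "nbjm", "bpjobd", "nbjm", "nbrb", "nbemm", "nbjm", "bpjobd",
    "nbjm", "bpdbm", "nbjm", "bpbrm", "bpbkar", "ltid",
    "bptm (child)", "bptm (parent)",
]


def hops(flow):
    """Return each (sender, receiver) pair along the path."""
    return list(zip(flow, flow[1:]))


def acknowledgement_path(flow):
    """The completion acknowledgement runs back through the same processes in reverse."""
    return list(reversed(flow))


if __name__ == "__main__":
    print(" -> ".join(BACKUP_FLOW))
    print(" -> ".join(acknowledgement_path(BACKUP_FLOW)))
    # nbjm appears more often than any other process: the job manager sits between each step.
    print("nbjm appearances:", BACKUP_FLOW.count("nbjm"))
```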

Reference - Symantec Veritas website




https://www.youtube.com/watch?v=PBYg8naRf1M

For NetBackup versions before 6.x, refer to:

https://vox.veritas.com/t5/Backup-Recovery-Community-Blog/Netbackup-processes-and-commands/ba-p/778784

https://annurkarthik.wordpress.com/category/data-protection/symantec-netbackup/full-system-level-restore-symantec-netbackup/

Bye...

Wednesday 3 January 2018

NetApp - Configuration Backup and Restore


Backing up the cluster configuration enables you to restore the configuration of any node, or of the whole cluster, in the event of a disaster or emergency.

Configuration backup files are archive files (.7z) that contain information for all configurable options that are necessary for the cluster, and the nodes within it, to operate properly. There are two types of configuration backup files:

Node configuration backup file

Each healthy node in the cluster includes a node configuration backup file, which contains all of the configuration information and metadata necessary for the node to operate in a healthy state in the cluster.

Cluster configuration backup file

These files include an archive of all of the node configuration backup files in the cluster, plus the replicated cluster configuration information (the replicated database, or RDB file). Cluster configuration backup files enable you to restore the configuration of the entire cluster or of any node in the cluster. The cluster configuration backup schedules create these files automatically and store them on several nodes in the cluster.


Procedure to perform configuration backup

On node cluster1-02
cluster1::*> system configuration backup create -node cluster1-02 -backup-type cluster -backup-name test
[Job 1950] Job is queued: Cluster Backup OnDemand Job.  

On node cluster1-04
cluster1::*> system configuration backup copy -from-node cluster1-02 -backup test.7z -to-node cluster1-04
 
 
Procedure to restore a node from the backup

On node cluster1-02
cluster1::*> cluster modify -node cluster1-04 -eligibility false


On node cluster1-04
cluster1::*> system configuration recovery node restore -backup test.7z -nodename-in-backup cluster1-04

Warning: This command overwrites local configuration files with files contained
         in the specified backup file. Use this command only to recover from a
         disaster that resulted in the loss of the local configuration files.
         The node will reboot after restoring the local configuration.
Do you want to continue? {y|n}: y
Verifying that the node is offline in the cluster.
Verifying that the backup tarball exists.
Extracting the backup tarball.
Verifying that software and hardware of the node match with the backup.
Stopping cluster applications.
...
...

cluster1::> cluster show
Node                  Health  Eligibility
--------------------- ------- ------------
clus                  false   false
cluster1-02           false   true
cluster1-03           false   true
cluster1-04           false   false
4 entries were displayed.

On node cluster1-02
cluster1::*> cluster modify -node cluster1-04 -eligibility true

cluster1::> cluster show
Node                  Health  Eligibility
--------------------- ------- ------------
clus                  false   false
cluster1-02           true    true
cluster1-03           true    true
cluster1-04           true    true
4 entries were displayed.
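Checking the `cluster show` output by eye is fine for four nodes; for larger clusters a small parser helps confirm that every node is both healthy and eligible before moving on. This is a minimal sketch that assumes the three-column layout shown above; `parse_cluster_show` and `nodes_needing_attention` are hypothetical helpers, and the `ignore` parameter exists because the sample output includes a `clus` entry that is never healthy.

```python
def parse_cluster_show(output):
    """Parse `cluster show` text into {node: (healthy, eligible)}.

    Only rows whose second and third columns are literally true/false are kept,
    so the header, the dashed rule and the 'entries were displayed' line are skipped.
    """
    flags = ("true", "false")
    nodes = {}
    for line in output.splitlines():
        parts = line.split()
        if len(parts) == 3 and parts[1] in flags and parts[2] in flags:
            nodes[parts[0]] = (parts[1] == "true", parts[2] == "true")
    return nodes


def nodes_needing_attention(nodes, ignore=()):
    """Return sorted names of nodes that are not both healthy and eligible."""
    return sorted(
        name for name, (healthy, eligible) in nodes.items()
        if name not in ignore and not (healthy and eligible)
    )


SAMPLE = """\
Node                  Health  Eligibility
--------------------- ------- ------------
clus                  false   false
cluster1-02           true    true
cluster1-03           true    true
cluster1-04           true    true
4 entries were displayed.
"""

if __name__ == "__main__":
    print(nodes_needing_attention(parse_cluster_show(SAMPLE), ignore=("clus",)))
```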


Procedure to restore the cluster from the backup

To restore a cluster configuration from an existing configuration backup, you re-create the cluster using the cluster configuration backup file that you made available to the recovering node.

cluster1::*> storage failover show
                              Takeover
Node           Partner        Possible State Description
-------------- -------------- -------- -------------------------------------
clus           -              -        Node unreachable
cluster1-02    cluster1-01    true     Connected to cluster1-01
cluster1-03    cluster1-04    true     Connected to cluster1-04
cluster1-04    cluster1-03    true     Connected to cluster1-03
4 entries were displayed.

cluster1::*> storage failover modify -node  cluster1-02 -enabled false

cluster1::*> storage failover show
                              Takeover
Node           Partner        Possible State Description
-------------- -------------- -------- -------------------------------------
clus           -              -        Node unreachable
cluster1-02    cluster1-01    false    Connected to cluster1-01, Takeover
                                       is not possible: Storage failover is
                                       disabled
cluster1-03    cluster1-04    false    Connected to cluster1-04, Takeover
                                       is not possible: Storage failover is
                                       disabled
cluster1-04    cluster1-03    false    Connected to cluster1-03, Takeover
                                       is not possible: Storage failover is
                                       disabled
4 entries were displayed.


Halt each node except for the recovering node:

cluster1::*> system node halt -node cluster1-03

Warning: Are you sure you want to halt node "cluster1-03"? {y|n}: y

cluster1::*> system node halt -node cluster1-04

Warning: Are you sure you want to halt node "cluster1-04"? {y|n}: y
cluster1::*> system configuration recovery cluster recreate -from backup -backup test.7z

Warning: This command will destroy your existing cluster. It will rebuild a
         new single-node cluster consisting of this node by using the contents
         of the specified backup package. This command should only be used to
         recover from a disaster. Do not perform any other recovery operations
         while this operation is in progress. This command will cause all the
         cluster applications on this node to restart, causing an interruption
         in CLI and Web interface.
Do you want to continue? {y|n}: y
Executing cluster recreate script.
Checking to ensure that backup replicas exist.
Stopping cluster applications.
The management gateway server restarted. Waiting to see if the connection can be reestablished
Removing current replicas.
Restoring replicas from backup.
Restarting cluster applications; access to the CLI and Web interface will be available shortly.
The management gateway server restarted. Waiting to see if the connection can be reestablished..

The connection with the management gateway server has been reestablished.
If the root cause of the interruption was a process core, you can see the core file details by issuing the following command:
system node coredump show -node local -type application -corename mgwd.* -instance

cluster1::> set -priv advanced

Warning: These advanced commands are potentially dangerous; use them only when
        directed to do so by NetApp personnel.
Do you want to continue? {y|n}: y



cluster1::*> system configuration recovery cluster show


              Recovery Status: in-progress

 Is Recovery Status Persisted: true

 

Boot each node that needs to be rejoined to the re-created cluster. Reboot one node at a time.

 

 

cluster1::*> system configuration recovery cluster rejoin -node cluster1-03

Warning: This command will rejoin node "node2" into the local
         cluster, potentially overwriting critical cluster
         configuration files. This command should only be used
         to recover from a disaster. Do not perform any other
         recovery operations while this operation is in progress.
         This command will cause node "node2" to reboot.
Do you want to continue? {y|n}: y

 

The target node reboots and then joins the cluster. Make sure the node is part of the cluster:

cluster1::> cluster show -eligibility true

Once all nodes are healthy, and if the restore was done from a backup file, use the following command to set the recovery status to complete:

cluster1::> system configuration recovery cluster modify -recovery-status complete

 

If the RDB on another node is not in sync, use the following command from the healthy node to sync the RDB:



cluster1::*> system configuration recovery cluster sync -node cluster1-03

 
Bye...

Monday 1 January 2018


NetApp - SP and BMC (AFF A700s)

You can manage a node remotely using an onboard controller, called a Service Processor (SP) or Baseboard Management Controller (BMC). This remote management controller is included in all current platform models. The controller stays operational regardless of the operating state of the node.

Most AFF, FAS, and V-Series platforms have an SP, except for the AFF A700s, which has a BMC.

SP

The SP enables you to access a node remotely to diagnose, shut down, power-cycle, or reboot the node, regardless of the state of the node controller. The SP is powered by standby voltage, which is available as long as the node has input power to at least one of its power supplies.

The SP monitors environmental sensors.

The SP logs events such as boot progress, FRU changes, and events generated by Data ONTAP. The AutoSupport configuration settings and message content behavior are inherited from Data ONTAP.

The SP uses SMTP, and it has a nonvolatile memory buffer that stores up to 4,000 events in a system event log (SEL) to help you diagnose issues.

Press Ctrl-G at the system console to access the SP, and Ctrl-D followed by Enter to exit the SP.

BMC

The Baseboard Management Controller (BMC) for the AFF A700s works similarly to the Service Processor (SP) and uses many of the same commands.

You can configure the BMC network settings, access a node remotely, and perform node management tasks such as diagnosing, shutting down, power-cycling, or rebooting the node.

Differences between SP and BMC

Automatic firmware updates are not available. The BMC can be updated manually.
The BMC reports sensor information to ONTAP via IPMI.
AutoSupport messages are not sent by the BMC.

Access the SP and BMC using SSH. Network configuration can be automated, and access can be restricted to specific admin hosts.

SP> help
date - print date and time
exit - exit from the SP command line interface
events - print system events and event information
help - print command help
priv - show and set user mode
sp - commands to control the SP
system - commands to control the system
version - print SP version


SP> help events
events all - print all system events
events info - print system event log information
events newest - print newest system events
events oldest - print oldest system events
events search - search for and print system events


BMC> system
Usage: system cmd [option]
Support cmd list
help - display this help message
fw - platform firmware related feature
log - log related feature
reset - reset the system
console - connect to the system console
core - dump the system core and reset
power - system power related feature
fru - FRU related feature


BMC> system power help
Usage: system power [option]
Support option list
help: print this help message
cycle: power cycle the system, then on
off: power the system off
on: power the system on
status: system power status


 

With SP API services, you can run many SP commands from Data ONTAP (SP command - ONTAP equivalent):


sp status - system service-processor show
system power status - system node power show
version - system service-processor image show
system console - system node run-console
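The SP-to-ONTAP mapping above is a lookup table, which is handy in runbooks that translate SP habits into clustershell commands. This is a minimal sketch containing only the four pairs listed above; `SP_TO_ONTAP` and `ontap_equivalent` are hypothetical names, and the helper collapses repeated spaces so "sp  status" still matches.

```python
# SP command -> equivalent ONTAP clustershell command, from the list above.
SP_TO_ONTAP = {
    "sp status": "system service-processor show",
    "system power status": "system node power show",
    "version": "system service-processor image show",
    "system console": "system node run-console",
}


def ontap_equivalent(sp_command):
    """Return the ONTAP command for an SP command, or None if it is not in the table."""
    return SP_TO_ONTAP.get(" ".join(sp_command.split()))


if __name__ == "__main__":
    print(ontap_equivalent("system power status"))
```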

On the SP, you can see the status of various sensors, with values under Current, LCR (Lower Critical), LNC (Lower Non-Critical), UNC (Upper Non-Critical) and UCR (Upper Critical).
Under normal conditions, the Current value should be between LNC and UNC. If it is not, the sensor requires attention.
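The threshold rule above can be written as a three-way classification of a sensor reading. This is a minimal sketch that assumes the thresholds are ordered LCR ≤ LNC ≤ UNC ≤ UCR and treats a value sitting exactly on a threshold as not having crossed it; the function `classify_reading` and the example temperatures are hypothetical.

```python
def classify_reading(current, lcr, lnc, unc, ucr):
    """Classify an SP sensor reading against its four thresholds.

    Assumes lcr <= lnc <= unc <= ucr; a value exactly on a threshold
    is treated as not having crossed it.
    """
    if lnc <= current <= unc:
        return "normal"
    if current < lcr or current > ucr:
        return "critical"
    # Outside LNC..UNC but still inside LCR..UCR: needs attention.
    return "non-critical"


if __name__ == "__main__":
    # Hypothetical temperature sensor thresholds in degrees C.
    print(classify_reading(25, lcr=5, lnc=10, unc=40, ucr=45))  # normal
```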

Data disks: store data within the RAID groups of data aggregates.
Mailbox disks: store the HA pair state information that must persist across reboots. This data includes information about the cluster state, the state of the mirrors, and whether a shutdown was performed cleanly. Each node of the HA pair designates two disks in the first RAID group of the root aggregate as mailbox disks.

NSE and standard disks can be used in the same cluster, but they should not be mixed within the same HA pair.
Installation steps:

Initiate
Plan and Design: review the NetApp Sales Transfer Package (sKTP or dKTP), the BoM, and the configBuilder PDF and Word documents.
Install
Test and Validate
Close out

HA interconnect: the internal bridge or external cable that pairs two FAS systems into a high-availability (HA) pair.
Multipath: specifically, a single-controller configuration where the controller has two connections to each shelf stack. Generally, refers to both multipath and MPHA.
Quad-Path: specifically, a single-controller configuration where the controller has four connections to each shelf stack. Generally, refers to both Quad-Path and Quad-Path HA.
Quad-Path High Availability (HA): an HA pair configuration where each controller has four connections to each shelf stack.
Shelf stack: a group of 1-10 disk shelves daisy-chained together by one or more shelf-to-shelf connector cables.
Single Point of Failure (SPOF): any part of a system or network that, if it fails, will stop the entire system or network from working properly.
FAS6200 systems use QSFP SAS cables in the NVRAM card.
FAS8080 systems use InfiniBand cables in ports ib0a and ib0b.
 
Bye...
NetApp - FlexVol vs Infinite Volume vs FlexGroup
FlexVol - ONTAP 7 - 16 TB (32-bit aggregate) and 100 TB (64-bit aggregate), model dependent; files: 2 billion

Infinite Volume - ONTAP 8.1.x - 20 PB; files: 2 billion; spread across a maximum of 10 nodes; single namespace metadata volume

FlexGroup - ONTAP 9.1 - 20 PB (200 constituents of 100 TB); files: 400 billion; spread across multiple nodes

FlexVol: volume capacity can be increased and decreased on the fly, and a single aggregate can hold multiple volumes.

An Infinite Volume has a single namespace metadata volume, so you get a bigger volume of 20 PB but a file limit of 2 billion.

A FlexGroup volume contains multiple constituents that each manage their own metadata, so it has no single namespace metadata volume. It can hold 200 constituents × 2 billion files = 400 billion files, in a bigger volume of 20 PB.
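The limits above reduce to simple arithmetic (200 × 100 TB = 20 PB, 200 × 2 billion = 400 billion files), which makes it easy to see which volume type fits a given requirement. This is a minimal sketch that uses only the headline limits quoted above (64-bit aggregate for FlexVol, decimal TB/PB) and ignores ONTAP version, node count, and model-dependent limits; `LIMITS` and `candidate_volume_types` are hypothetical names.

```python
TB = 1
PB = 1000 * TB  # decimal units: 200 x 100 TB = 20 PB, matching the text
BILLION = 1_000_000_000

# Volume type -> (maximum capacity in TB, maximum file count), from the limits above.
LIMITS = {
    "FlexVol": (100 * TB, 2 * BILLION),                # 64-bit aggregate; model dependent
    "Infinite Volume": (20 * PB, 2 * BILLION),         # single namespace metadata volume
    "FlexGroup": (200 * 100 * TB, 200 * 2 * BILLION),  # 200 constituents
}


def candidate_volume_types(capacity_tb, file_count):
    """Return the volume types whose limits cover the requested capacity and file count."""
    return [
        name
        for name, (max_tb, max_files) in LIMITS.items()
        if capacity_tb <= max_tb and file_count <= max_files
    ]


if __name__ == "__main__":
    # 500 TB holding 10 billion files: only FlexGroup covers both limits.
    print(candidate_volume_types(500, 10 * BILLION))
```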

Reference: tr-4037.pdf from NetApp
Bye...