Wednesday 26 December 2018

Commvault error during backup | CreateIndex

If you are running a MediaAgent (MA) that was recently upgraded/updated to Commvault V11, you may experience the following error:

        Index processing on MediaAgent [MA name] failed due to a CreateIndex processing error.    

To fix the above error, run a full backup (not a synthetic full) of the subclient.

Thursday 20 December 2018

NetApp OnCommand Unified Manager import OVF template invalid value false specified for property org.linuxdistx.IPV6Auto

If you experience errors like the one below while deploying the OCUM OVA, it is recommended to download the OVA again and use a different browser to perform the VM deployment. You can also try deploying the OVA to a different ESX host.

import OVF template invalid value false specified for property org.linuxdistx.IPV6Auto


Thanks. If anyone else has this issue, try using the VMware vSphere Web Client (Flash).

Wednesday 19 December 2018

NetApp device sending traps to old SNMP trap host

Recently, there was a NetApp device that was sending SNMP traps to a trap host that was no longer configured.

The 'system snmp traphosts show' command showed only one host, while the device was sending traps to two hosts.


The following command was used to identify the trap hosts that were actually configured:

::*> debug smdb table snmpTargetAddr show

Sunday 16 December 2018

NetApp VSC: Default username and password

The default username for VSC is 'maint' and the default password for that user is 'admin123'.


Also, if you have trouble viewing the plugin in vCenter after VSC is installed and integrated, it can help to reboot the vCenter appliance.

Sunday 9 December 2018

Pure Storage : Turn the LED on/off

If a part of the Pure Storage array is being replaced, you can use the System Health page of the System tab in the GUI to turn on the LED.



Pure Storage : Array Capacity information

The Analysis tab of the Pure Storage GUI displays array capacity utilization.

Here is a breakdown of the color code:

1) All volumes (white): the total usable capacity of the array.
2) All volumes - System (grey): physical space occupied by internal array metadata.
3) All volumes - Shared space (blue): physical space occupied by deduplicated data (volumes and snapshots).
4) All volumes - Snapshots (purple): physical space occupied by unique snapshot data.
5) All volumes - Volumes (light blue): physical space used by a volume and not shared with other volumes.

Pure Storage : Default user name and password

Just in case.

The default username and password for Pure Storage are 'pureuser' and 'pureuser'.

What is your story? Please share via the comments.

Monday 3 December 2018

NetApp : Some of the aggregates are in the offline or restricted state and the volume in these aggregates are not displayed

If you see the following message in System Manager, it is best to raise a support case with NetApp support.
Some of the aggregates are in the offline or restricted state and the volume in these aggregates are not displayed

Provide them the link to the following KB article.

The 'debug vreport show' command provides more detail on what is happening.


Thursday 29 November 2018

Commvault : Backup capacity utilization based on backup clients

When the Commvault disk library fills up, one very useful piece of information to have is the distribution of consumed capacity by client. This allows identification of the clients that consume the most capacity on the library.

There are no reports available in the CommCell console that provide this information. However, the following Web Console report (Client Storage Utilization by Storage Policy Copy) is helpful.

https://cloud.commvault.com/webconsole/softwarestore/#!/135/664/6195
 

Thursday 15 November 2018

Migrate OnCommand Performance Manager data to Unified Manager (job remains in progress)

Recently, data for a NetApp cluster managed by OCPM (OnCommand Performance Manager) was migrated to OCUM (OnCommand Unified Manager) v7.2P1.

The steps to migrate the cluster data were straightforward and are mentioned here.

The trouble was that after the migration was initiated, the job remained IN-PROGRESS for almost two weeks and there was no way to ascertain whether it was actually making progress.

A support case was raised with NetApp and a support bundle was generated. Upon analysis, it was found that one of the events tables was too large. Here is a snippet from the logs:

c.n.d.u.m.i.m.o.OpmBaseMigrator (OpmBaseMigrator.java:76) - Migrating data from table - continuous_event_participant succeeded in 20,066 sec

2018-11-06 04:59:29,890 INFO  [main] c.n.d.u.m.i.m.o.OpmBaseMigrator (OpmBaseMigrator.java:68) - Migrating data from table - continuous_event_participant_stats

2018-11-06 11:47:12,191 INFO  [main] c.n.d.u.m.u.MigrationUtil (MigrationUtil.java:96) - Looking for process id - 8,149. Process list command output - Optional[[ 8149 java]]

2018-11-06 11:47:12,265 INFO  [main] c.n.d.u.m.i.MigrationStatusService (MigrationStatusService.java:310) - Validate Active Migrations - Found a datasource with id 2 and its java process id 8,149

2018-11-06 11:47:12,269 INFO  [main] c.n.d.u.m.i.MigrationStatusService (MigrationStatusService.java:253) - Unable to find data source to migrate or there is no datasource in scheduled status.
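The stalls show up as long gaps between consecutive log timestamps. A quick way to measure such a gap (illustrative only; the sample lines below are abbreviated from the snippet above):

```python
from datetime import datetime

def parse_ts(line: str) -> datetime:
    """Parse the leading 'YYYY-MM-DD HH:MM:SS,mmm' timestamp of a log line."""
    return datetime.strptime(line[:23], "%Y-%m-%d %H:%M:%S,%f")

lines = [
    "2018-11-06 04:59:29,890 INFO  [main] ... continuous_event_participant_stats",
    "2018-11-06 11:47:12,191 INFO  [main] ... Looking for process id - 8,149.",
]
gap = parse_ts(lines[1]) - parse_ts(lines[0])
print(gap)  # nearly seven hours between two consecutive entries
```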

If you are running into a similar issue, it is best to raise a support case and share a support bundle with NetApp. Feel free to leave a message/note here as well.

Monday 5 November 2018

Filter Hyper-V VMs using notes || Commvault content filters


Recently, there was a request to filter a bunch of VMs from backups within Commvault. 

The following process was followed: 
1) Under the subclient properties, a filter rule was created that looks for VMs whose notes contain 'nobackup'. See the screenshot below.



2) The VM engineer was asked to add the note to the Hyper-V VMs.
There are two ways to do it:

2.a) In Hyper-V Manager, highlight the VM and go to Settings > Management > Name; the 'nobackup' note can be edited in the Notes section there.


  

2.b) In Virtual Machine Manager, the note appears under Properties > General > Description and can be edited directly.
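The effect of the filter rule can be pictured as a simple 'notes contains' match (a hypothetical sketch; the VM names and notes below are made up, and the real matching is done by Commvault):

```python
# Hypothetical inventory as (vm_name, notes) pairs.
vms = [
    ("sql01", "production database"),
    ("test01", "nobackup"),
    ("web02", "nobackup - decommission pending"),
]

def filtered_out(inventory, marker="nobackup"):
    """Return the VMs a 'notes contains marker' filter rule would skip."""
    return [name for name, notes in inventory if marker in notes]

print(filtered_out(vms))  # ['test01', 'web02']
```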


Thursday 1 November 2018

Commvault V11 SP13 upgrade / update and backup failures

Recently, the CommServe and media agent of a Commvault environment were updated to SP13. After the update, backups failed to run, with clients reporting errors related to 'client services not running'.

It was a big mess with all clients being impacted.

The fix was to download and install the hotfixes for SP13 from cloud.commvault.com and then restart the services.

Hope this helps someone out there.

Leave a comment to let me know that you share my pain. :-) 

Commvault SQL restore fails

Recently, while performing a restore of a SQL DB in Commvault, the restore job failed. The Attempts tab of the restore job showed no details, and the Events tab had the following entry with a timestamp (note the timestamp, as it will be handy when reviewing logs).

Received failed message for job [job ID], phase [Database Restore].

The next step was to log in to the SQL client and review the SQLiDA logs. (If access to the client is not available, right-click the Commvault restore job, click 'View Logs', look for the following file, and search for the string 'VD_E_TIMEOUT'.)

File    : SQLiDA.log

Here is how the logs might look:






Solution:
  1. Restart the Commvault services on the client.
  2. Ensure the client is running the latest service pack (in my case, the client had to be updated from V11 SP9 to V11 SP13).
  3. Most importantly, follow the Commvault link and apply both VDI settings (VDI timeout and VDI retry): http://documentation.commvault.com/commvault/v11/article?p=18144.htm
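When several clients are affected, the 'VD_E_TIMEOUT' search mentioned earlier can be scripted. A minimal sketch (the log path in the example is an assumption; adjust it to your install):

```python
def find_marker(log_path, needle="VD_E_TIMEOUT"):
    """Return (line_number, line) pairs from a log file that contain the marker."""
    hits = []
    with open(log_path, encoding="utf-8", errors="replace") as fh:
        for lineno, line in enumerate(fh, start=1):
            if needle in line:
                hits.append((lineno, line.rstrip()))
    return hits

# Example (hypothetical path):
# for lineno, line in find_marker(r"C:\Commvault\Log Files\SQLiDA.log"):
#     print(lineno, line)
```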






Monday 22 October 2018

NetApp ONTAP Service Processor : Connection Refused

Recently, there was an issue with connectivity to the Service Processor of one of the cluster nodes. SSH connections to the IP address of the Service Processor would fail with 'connection refused'.


The issue was resolved by rebooting the service processor.
<Clustername>::> system service-processor reboot-sp -node <node name> 

Wednesday 17 October 2018

OnCommand Performance Manager migration to OnCommand Unified Manager fails


Background: 

Recently, OnCommand Unified Manager (OCUM) was upgraded to 7.2. After the upgrade, OnCommand Performance Manager (OCPM) data from the 7.1 instance was migrated to OCUM 7.2.
The following document was used to perform the migration.
https://www.netapp.com/us/media/tr-4589.pdf


Issue: 

The migration process was reported as FAILED. One easy way to find out what caused the failure is to re-run the migration; the workflow will show what caused the previous job to fail. Note the cause of the failure.

Alternatively, you can generate a support bundle (KB is here) and review the migration.log under JBOSS directory of the support bundle.

Resolution: 

Based on the cause of the failure, you can use one of the following KBs:


Migrating performance data to OnCommand Unified Manager 7.2RC1 shows status 'FAILED'

Performance data migration fails between OnCommand Performance Manager and Unified Manager 7.2x due to error java.io.EOFException: unexpected end of stream

Performance data migration to newly upgraded UM 7.2 fails due to missing table partitions

Performance data migration from Unified Manager 7.1 to 7.2 fails with error: "Error writing file '/tmp/MYn75pE5' (Errcode: 28 - No space left on device)"
https://kb.netapp.com/app/answers/answer_view/a_id/1072873/loc/en_US


Sunday 7 October 2018

CommVault V11 SP update : error Required package [56] are not present

Recently, a CommVault customer requested an update of their CommVault environment from v11 SP10 to SP12. While the upgrade of the CommServe was successful, the upgrade of the media agent failed with the following error: 
"Required package(s) [56] are not present in current DVD. Record a new custom package with all required packages." 

A lot of time was spent verifying that the procedure used to download the software package was correct. Later, CommVault support identified that the antivirus software was blocking an MSI file at the following location:
<Software Cache>\DownloadPackageLocation2\DownloadPackageLocation_WinX64\BinaryPayload\NetworkStorageServer.msi 

Fix:

The antivirus software on the server (media agent) was configured to exclude CommVault directories, which resolved the issue. Here is the document that I followed:
http://documentation.commvault.com/commvault/v11/article?p=8665.htm


CommVault VM restore : VM Blue Screens (BSOD)

Recently, a CommVault customer requested a VM restore. The routine browse and restore procedure was used to restore the VM successfully. However, the restored VM would blue screen when powered on. 

Hmmm... was the VM not backed up properly? Is the restore procedure right? Are we missing or ignoring a flag? These were some of the questions that needed answers.

But CommVault support looked at the issue and identified it as a bug in V11 SP10, suggesting an upgrade to SP12 (note: there is a workaround available in SP10).

Tuesday 18 September 2018

Netapp Storage : check if a network port is open via telnet

If you need to check whether your storage system can reach an IP:port pair, you can use the following command:
cluster*:> system node systemshell -node <node name> -command telnet <IP> <port>
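Outside the systemshell, the same reachability test can be done from any admin host; here is a small Python equivalent (an illustration, not NetApp tooling; the host and port in the example are placeholders):

```python
import socket

def port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Example: is the mail relay listening on port 25?
# print(port_open("mailhost.example.com", 25))
```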

Monday 27 August 2018

Monitoring NetApp systems

If you are a NetApp system admin and want to monitor the storage system using a tool outside of Unified Manager, you may want to consider monitoring the following areas:
          1) Availability (For ex: Is the aggregate offline? )
          2) Capacity (For ex: Is the aggregate full ?)
          3) Configuration ( For ex : Has the aggregate been reconfigured? )
          4) Performance (For ex: Has the IO latency been high on the aggregate?)

It is advisable to consider the above four factors for the following components:
          a) Aggregates
          b) Cluster events
          c) Disks events
          d) Enclosure events
          e) Fan events
          f) Flash card events
          g) Inodes
          h) LIFs
          i) LUN
          j) MetroCluster bridge
          k) MetroCluster connectivity
          l) MetroCluster switch
          m) cluster nodes
          n) NVRAM
          o) Ports
          p) Power supply
          q) SnapMirror
          r) Qtree
          s) SnapShot
          t) SnapVault
          u) SFO
          v) SP
          w) Storage services
          x) Storage shelf
          y) SVM events
          z) user and groups
          a.a) Volume
          a.b) Volume move
          a.c) Unified Manager monitoring
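If you are rolling your own poller, one way to organize the matrix of components and dimensions is sketched below (purely illustrative; the probe bodies would call ONTAP APIs or SNMP in practice):

```python
from dataclasses import dataclass
from typing import Callable, List

# The four monitoring dimensions from the list above.
DIMENSIONS = ("availability", "capacity", "configuration", "performance")

@dataclass
class Check:
    component: str              # e.g. "aggregate", "volume", "lif"
    dimension: str              # one of DIMENSIONS
    probe: Callable[[], bool]   # returns True when healthy

def failing(checks: List[Check]) -> List[str]:
    """Run every probe and name the checks that report unhealthy."""
    return [f"{c.component}/{c.dimension}" for c in checks if not c.probe()]

# Example with stub probes:
checks = [
    Check("aggregate", "capacity", lambda: False),   # pretend the aggregate is full
    Check("volume", "availability", lambda: True),
]
print(failing(checks))  # ['aggregate/capacity']
```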


Wednesday 25 July 2018

NetApp E-Series Autosupport not delivered

If you have configured AutoSupport on a NetApp E-Series system and find that the AutoSupport messages are not being delivered to NetApp, you should look at the 'Storage Manager Event Monitor' service.

Here is a snippet from the NetApp KB.

Perform the following steps in Windows:
    1. Run services.msc with administrator privileges
    2. In the Services window, navigate to Storage Manager Event Monitor
    3. Right-click the service and select Properties
    4. If the service is not running, click Start and verify whether the issue is resolved
Perform the following step for Linux/UNIX:
    • Start the SMmonitor service: service SMmonitor start

Tuesday 10 July 2018

Failed to enumerate LUN

Recently, a customer of ours reported the following error:

Failed to enumerate LUN 
Device path : xxxx
Storage path: xxxx 
SCSI address: 
Error code 0xc00402fa
Error description: A LUN with device path \\xxxx and SCSI address (a,b,c,d) is exposed through an unsupported initiator.


They also reported that they could not enumerate the disks in SnapDrive.

As it turns out, they had not enabled the vCenter integration settings.

So, if you are facing this issue as well, verify that the 'VirtualCenter or ESX Server Login Settings' within SnapDrive are accurate and enabled.



Thursday 28 June 2018

The recycle bin on C: is corrupted

On one of the Windows systems, the following error message kept popping up.




To fix the issue, a command prompt was launched as administrator and the following command was run:

rd /s /q C:\$Recycle.bin

Monday 25 June 2018

NetApp system manager fails to launch

hmm...

This morning I had an issue with System Manager 3.1.2. The application failed to launch: no error, no prompts, it just would not start.

I would recommend the following steps:

1) Check that your browser is compatible (read the release notes).

2) If you have System Manager installed on a DFM server, it is best to uninstall it there and use a different server for System Manager.

3) Verify you have the right Java version installed (uninstall older versions of Java).

4) Add http://127.0.0.1 to the local intranet zone sites.

5) Follow the two KBs below:


  • https://library.netapp.com/ecmdocs/ECMLP2602644/html/frameset.html  (refer Unable to launch System Manager) 
  • https://kb.netapp.com/app/answers/answer_view/a_id/1006756/loc/en_US


Sunday 24 June 2018

network access message the page cannot be displayed [ NetApp unified manager ]

Hmm...

If you are trying to access NetApp products like Unified Manager and you receive the following error, try accessing Unified Manager using the IP address instead of the hostname:

'network access message the page cannot be displayed'

I believe these errors are caused by slow DNS responses.
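To test the slow-DNS theory, you can time name resolution from the client; a rough sketch in Python (the hostname in the example is a placeholder):

```python
import socket
import time

def resolve_timed(hostname: str):
    """Resolve a hostname and return (address_or_None, seconds_taken)."""
    start = time.monotonic()
    try:
        addr = socket.gethostbyname(hostname)
    except socket.gaierror:
        addr = None
    return addr, time.monotonic() - start

# Example: anything beyond a second or two points at the DNS server.
# addr, took = resolve_timed("ocum.example.com")
# print(addr, f"{took:.2f}s")
```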


Wednesday 20 June 2018

NetApp system Manager 500 Connection has been shutdown

If you encounter the following error in System Manager, it is caused by the requirements of the latest Java JRE.

To fix the issue, you need to enable TLS on the NetApp controller; the command is shown below. You will also need to enable TLS on the client side.

500 Connection has been shutdown:
javax.net.ssl.SSLHandshakeException: Server chose SSLv3, but that protocol version is not enabled or not supported by the client.

filer_A> options tls
tls.enable on 

Sunday 17 June 2018

VSC errors: 'Current vCenter Server context is unknown' and 'The hostname:port# cannot be resolved'

Issue 1: 

Hmm... my VSC 6.2.1 was reporting the following error. The way to fix the issue was to reboot the server that runs VSC. Ha ha ha ha.


faultCode : Server.Processing

faultString: 'java.lang.RuntimeException: Current vCenter Server context is unknown'

faultDetail:'null'



If you know what causes the issue then please leave a comment. 



Issue 2:

"The hostname:port# cannot be resolved" 

To resolve this issue:
1) Log in to the VSC server and edit the java.security file at the following path:


2) Reduce the limit of accepted RSA keySize by modifying the option jdk.certpath.disabledAlgorithms:
--
jdk.certpath.disabledAlgorithms=MD2, RSA keySize < 512
--
3) Register the VSC again and restart the server. 
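Step 2 can be done by hand in a text editor; if you prefer to script it, here is a cautious sketch (it assumes the property sits on a single line, and you should back up java.security first):

```python
from pathlib import Path

def relax_rsa_keysize(security_file: str) -> None:
    """Rewrite jdk.certpath.disabledAlgorithms to only reject RSA keys under 512 bits."""
    path = Path(security_file)
    out = []
    for line in path.read_text().splitlines():
        if line.strip().startswith("jdk.certpath.disabledAlgorithms"):
            line = "jdk.certpath.disabledAlgorithms=MD2, RSA keySize < 512"
        out.append(line)
    path.write_text("\n".join(out) + "\n")
```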

Friday 20 April 2018

NetApp Cluster interconnect error: csm.sessionFailed: Cluster interconnect session

If you experience the following error messages on a NetApp controller, it may be caused by a known bug.

 csm.sessionFailed: Cluster interconnect session (req=<node name> :dblade, rsp=<node name>:dblade, uniquifier=<ID>) failed with record state ACTIVE and error CSM_CONNABORTED.

The bug has been fixed in 8.3.2 and 9.x. For details about the bug, refer to the BURT here

Sunday 15 April 2018

NetApp SSH session timeout

When working with NetApp storage systems, it is best to configure an SSH session timeout so that idle SSH sessions to the NetApp controllers are closed.

You can configure the session timeout using the commands 'system timeout show' and 'system timeout modify <mins>'.
And if you are feeling lazy, you can set the session timeout to zero, and it won't log out idle sessions.

Thursday 12 April 2018

NetApp : how to list environment variables from the cluster shell

Here are the commands that let you list the environment variables of a NetApp FAS controller (without having to reboot the node to the loader prompt and type printenv):

::> set diagnostic 
::*> debug kenv show -node <node name> 

Wednesday 11 April 2018

Accessing etc files of NetApp via HTTP (error 403 - auto indexing disabled)

If you are accessing files on the NetApp root volume via HTTP and you get 'error 403 - auto indexing disabled', you can enable auto indexing on the controller.



node> options httpd.autoindex.enable on

NetApp Cluster HA is not working correctly | ALERT rdb.ha.mboxError: Bidirectional failover under the 'cluster HA' configuration is not currently functional due to problem with the on-disk mailboxes.

On one of my NetApp 8.3 clusters, the following errors were seen. Here are the steps that were taken to resolve the issue.

From Event logs 
ALERT         rdb.ha.mboxError: Bidirectional failover under the 'cluster HA' configuration is not currently functional due to problem with the on-disk mailboxes.

From command output
   *> cluster ha show



   High Availability Configured: true
   High Availability Backend Configured (MBX): false

   Warning: Cluster HA is not working correctly. Make sure that both nodes are healthy by using the "cluster show" command; then reconfigure cluster HA to correct the configuration. Check the output of "cluster ha show" following the reconfiguration to verify node
   health. If reconfiguring cluster HA does not resolve the issue, contact technical support for assistance.


First, verify the VLDBs are in sync, especially MGWD. Once the VLDBs were verified, cluster HA was disabled and re-enabled.

::*> cluster ha  modify -configured false

Warning: The High Availability (HA) configuration SFO mailbox data appears to be damaged or absent, preventing a normal exit from HA configuration.  In order to forcibly exit safely, it is required that all
         management services be online on both nodes. Please verify this before continuing. The system will exit HA without zeroing the mailbox data.
Do you want to continue? {y|n}: y

Notice: HA is disabled.

::*> cluster ha show
      High Availability Configured: false
      High Availability Backend Configured (MBX): false

Warning: Cluster HA has not been configured.  Cluster HA must be configured
         on a two-node cluster to ensure data access availability in the
         event of storage failover. Use the "cluster ha modify -configured
         true" command to configure cluster HA.

::*> cluster ha  modify -configured true

Warning: High Availability (HA) configuration for cluster services requires that both SFO storage failover and SFO auto-giveback be enabled. These actions will be performed if necessary.
Do you want to continue? {y|n}: y

Notice: HA is configured in management.

::*> cluster ha show
      High Availability Configured: true
      High Availability Backend Configured (MBX): true --> YEAH!!!







Tuesday 10 April 2018

CommVault VM backup job fails with Failed to Open Disk

If you have a backup copy job in CommVault that fails with the error 'Failed to Open Disk', right-click the job and select View Logs.

Click 'View All' in the log file window and then search for 'Failed to Open Disk'.

In my case, the error was because there was an independent disk on the VM. Click here for more information about independent disks.

If you would like the backup copy job to complete with errors instead of failing, you can set an additional setting on the media agent:
http://documentation.commvault.com/additionalsetting/details?name="IgnoreUnsupportedDisks"&id=1318 


Updated: Dec-2020
Another reason a backup can run into 'Failed to open disk' is when there is an uninitialized disk within the Windows VM. Check the Disk Management UI to see whether any uninitialized disks are present.

Monday 26 March 2018

Enable SSH on all hosts of a vCenter cluster

Here is a one-liner that starts the SSH service on all ESX hosts within the cluster:
Get-Cluster | Get-VMHost | ForEach {Start-VMHostService -HostService ($_ | Get-VMHostService | Where {$_.Key -eq "TSM-SSH"})}

and here is the command to stop it:

Get-Cluster | Get-VMHost | ForEach {Stop-VMHostService -HostService ($_ | Get-VMHostService | Where {$_.Key -eq "TSM-SSH"})}

Thursday 8 March 2018

VSC migration before vCenter upgrade

Recently, a vCenter server was upgraded from 5.5 to 6.5. This meant that the Windows server used for vCenter was shut down and an appliance was deployed.

The VSC server was running on the vCenter server, and it had to be migrated off as VSC cannot run on the vCenter appliance. It wasn't possible to deploy the VSC appliance because the ONTAP version didn't support VSC as an appliance.

Therefore, a new windows server was spun up to function as VSC server. Here are the steps I followed:

1) Uninstalled VSC from the vCenter server.

2) Created a copy of the repository folder under <install path>/NetApp/Virtual Storage Console/smvi/server/repository

3) Removed the VSC plugin from vCenter:
          a) Log in to https://vcenter/mob
          b) Click Content
          c) Click Extension Manager
          d) You will see the NetApp VSC extension there (click on the NetApp extension and note the ID)
          e) Click unregister extension
          f) Enter the name of the plugin and click Invoke Method
4) Installed VSC on the new server.

5) Registered the plugin.

6) Copied the repository backup created in step 2 to the install path of the new VSC server.

7) Logged into vCenter and added the storage array in VSC.

8) Verified the backup jobs are visible under Home --> vCenter --> Backup jobs.


Reference link : https://kb.netapp.com/app/answers/answer_view/a_id/1029905
