It is rather painful to root cause a backup copy failure (especially if you are new to the environment).
Here is a scenario that I observed; I hope it comes in handy for someone else.
A backup copy job in CommVault had failed with the error "OS mount failed". The job was a VM backup. The Events tab on the job simply returned the error "vsbkp agent on the media agent has disconnected unexpectedly".
The next troubleshooting step is to look at the Attempts tab, which can be viewed by double-clicking the job. In this case, the discover phase had completed and the error was seen in the backup phase.
The next step in the process is to look at vsbkp.log on the proxy server.
But how does one identify the proxy server?
The easiest way to identify the proxy server is to double-click the job --> Virtual Machines --> look for the 'proxy' column.
Now that the proxy server has been identified, log in to it. Once logged in, go to Process Manager --> Processes tab --> View Logs --> open the vsbkp.log file under C:\Program Files\CommVault\Simpana\Log Files.
The fifth column in the log file is the job ID. Verify that the logs are for the job ID that you are troubleshooting. Resume the backup job and monitor the logs.
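On a busy proxy, vsbkp.log mixes lines from several jobs, so it can help to filter on the job ID before reading. The snippet below is only a minimal sketch: it assumes the columns are separated by whitespace (so the job ID is the fifth whitespace-separated field), and the function name lines_for_job is my own, not part of any CommVault tooling.

```python
from typing import Iterable, Iterator

# Zero-based index of the fifth column, which holds the job ID.
# Assumption: columns in vsbkp.log are separated by whitespace.
JOB_ID_COLUMN = 4


def lines_for_job(lines: Iterable[str], job_id: str) -> Iterator[str]:
    """Yield only the log lines whose fifth column equals job_id."""
    for line in lines:
        fields = line.split()
        # Skip continuation lines or blanks that have fewer than five columns.
        if len(fields) > JOB_ID_COLUMN and fields[JOB_ID_COLUMN] == job_id:
            yield line.rstrip("\n")
```

To use it, open the log file (for example with open(path, errors="replace")) and pass the file handle together with the job ID as a string; the function yields the matching lines in their original order.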
Once you hit the error, go back through the log to identify the components involved in the failure. You need to identify the NetApp storage name, the NetApp volume and the ESX host name.
1) The VM host was found using the parameter: GethostVMKernelIPList.
2) The volume and storage information is seen in the following section of the log:
In this case, the snap mount operation had failed: CommVault failed to mount the NetApp volume/datastore on the ESX host.
The next troubleshooting step was to log in to vCenter and look for mount errors. vCenter showed a mount NFS datastore task that had failed with the error: NFS has reached the maximum number of supported volumes.
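Finding these entries by eye is tedious in a long log. As a rough sketch, the lines of interest can be pulled out by searching for markers. GethostVMKernelIPList comes from the log described above; using the word "mount" as a second marker, the case-insensitive matching and the function name find_marker_lines are assumptions of mine, so adjust the markers to whatever your own log actually contains.

```python
from typing import Dict, Iterable, List, Sequence

# Markers to search for. "GethostVMKernelIPList" is the parameter mentioned
# in the post; "mount" is an assumed keyword for the snap mount messages.
MARKERS = ("GethostVMKernelIPList", "mount")


def find_marker_lines(
    lines: Iterable[str], markers: Sequence[str] = MARKERS
) -> Dict[str, List[str]]:
    """Group log lines by the marker they contain (case-insensitive)."""
    hits: Dict[str, List[str]] = {marker: [] for marker in markers}
    for line in lines:
        lowered = line.lower()
        for marker in markers:
            if marker.lower() in lowered:
                hits[marker].append(line.rstrip("\n"))
    return hits
```

Feeding it only the lines for the job you are troubleshooting keeps the output short enough to read the host, volume and storage names directly.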
The error was for the same volume that was identified earlier.
The way to work around this issue is to follow the steps in the link below:
https://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1020652
I will try to keep posting similar scenarios.