Showing posts with label Maintenance Mode. Show all posts

Monday, February 8, 2010

Maintenance Mode - Clusters, Nodes, SQL Instances

One frustrating thing about OpsMgr's Maintenance Mode system is that if you put both nodes of a cluster into maintenance mode, the "Virtual" system will continue to alert. This is a common oversight and the cause of many unnecessary cluster and/or SQL alerts.

The reason for this is that the virtual computer is the top-level parent which hosts the Cluster Service and SQL DB Engine. The nodes form part of the cluster, but putting them into maintenance mode does not cover the virtual computer. Standard logic dictates that if you're putting a node into maintenance mode you're probably working on the cluster, but OpsMgr does not make that connection, so an extra step is needed.

This PowerShell snippet demonstrates how to find the "Virtual Computer" related to a node so that it can be put into maintenance mode. Doing so also puts the SQL Instance into maintenance mode, as the Virtual Computer is the host of it.

Code:

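# Look up the physical node (replace Node1 with the node's display name)
$class = get-monitoringclass -name:"Microsoft.Windows.Server.Computer"
$node = get-monitoringobject -monitoringclass:$class -criteria:"DisplayName = 'Node1'"
# Find the agent running on that node
$agent = get-agent | ?{$_.DisplayName -eq $node.DisplayName}
# Computers this agent manages by proxy, i.e. the cluster's virtual computers
$clusterMachines = $agent.GetRemotelyManagedComputers()
foreach($cluster in $clusterMachines)
{
    # $( ) expands the property inside the string; the value needs its own single quotes
    get-monitoringobject -monitoringclass:$class -criteria:"DisplayName = '$($cluster.DisplayName)'" | Select DisplayName
    #Here we can carry on and put these associated objects into maintenance mode too
}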


As you can see, we make use of $agent.GetRemotelyManagedComputers() to get a list of computers which the specific agent is managing by proxy. As you are well aware, we need to set Proxying = On if we're using clusters, so this is a sure-fire way of determining if anything "virtual" is running off the node.
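
The placeholder comment in the loop can be filled in along the following lines. This is a minimal sketch rather than a tested implementation: it assumes the OpsMgr Command Shell cmdlets are loaded and connected, it reuses New-MaintenanceWindow with the same parameters as the bulk script later in this post, and the function name Start-VirtualComputerMaintenance and its parameters are made up for illustration.

```powershell
# Hypothetical helper: puts every "virtual" computer proxied by a node's agent
# into maintenance mode. Assumes the OpsMgr Command Shell cmdlets are available.
function Start-VirtualComputerMaintenance
{
    param(
        [string]$NodeName,           # display name of the physical node
        [double]$DurationHours = 1,  # length of the maintenance window
        [string]$Comment = "Cluster maintenance"
    )

    $class = Get-MonitoringClass -name:"Microsoft.Windows.Server.Computer"
    $agent = Get-Agent | Where-Object { $_.DisplayName -eq $NodeName }

    $startTime = [System.DateTime]::Now
    $endTime = $startTime.AddHours($DurationHours)

    foreach ($virtual in $agent.GetRemotelyManagedComputers())
    {
        # $( ) is needed so the property, not just the variable, is expanded.
        $criteria = "DisplayName = '$($virtual.DisplayName)'"
        $target = Get-MonitoringObject -monitoringclass:$class -criteria:$criteria
        New-MaintenanceWindow -startTime:$startTime -endTime:$endTime -monitoringObject:$target -comment:$Comment
    }
}
```

Calling it as Start-VirtualComputerMaintenance -NodeName 'Node1' -DurationHours 2 would then cover the virtual computers; the nodes themselves still need to be put into maintenance mode separately.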

PowerShell Script: Bulk Maintenance Mode

If you've ever needed to put a whole lot of machines into maintenance mode, you'll know how tedious and time-consuming this can be if you go through the GUI. There's no "Ctrl + click" functionality, meaning you'll need to go to each machine and put it into maintenance mode individually.

There are obviously instances where certain machines are related by business process rather than by any link OpsMgr can identify, so they have to be listed by hand.

My solution to this was to do the following: allow for an import of a CSV file with a list of hostnames (just the hostname, not the FQDN), then loop through these and put each one into maintenance mode.
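
As an illustration of the expected input, the sketch below writes a small sample CSV and checks that it carries the HostName column before anything is done with it. The file name and host names are invented, and the helper Test-HostNameColumn is hypothetical; it is not part of the script that follows.

```powershell
# Hypothetical helper: returns $true when every row exposes a non-empty HostName.
function Test-HostNameColumn
{
    param([string]$Path)

    $rows = @(Import-Csv $Path)
    if ($rows.Count -eq 0) { return $false }

    foreach ($row in $rows)
    {
        # Import-Csv turns the header row into property names.
        if (-not $row.HostName) { return $false }
    }
    return $true
}

# Sample input: one column, header HostName, short host names only (no FQDN).
$sampleCsv = Join-Path ([System.IO.Path]::GetTempPath()) "mm-sample.csv"
Set-Content -Path $sampleCsv -Value "HostName", "web01", "web02", "sql01"

Test-HostNameColumn -Path $sampleCsv
```

A check like this catches a mistyped header up front, instead of every row failing one by one inside the loop.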

So simple. Here's the PS script (check comments for usage):

#Right, this is pretty simple. Use as follows:
#To put a BULK list of computers into maintenance mode, you will need a CSV Formatted with ONE column - that being the hostname of the machine.
#!!!!! VERY NB !!!!!!!!!!! VERY NB !!!!!!
#1) Make sure the first column is titled HostName otherwise this won't work!
#!!!!! VERY NB !!!!!!!!!!! VERY NB !!!!!!
#2) Change the $rootMS to the correct RMS.
#3) Usage is as follows:
# Start Maintenance Mode: ./mm.ps1 START PathToCSVFile "Maintenance Mode Reason (be sure to encase in quotation marks like here)" DurationInHours
# Stop Maintenance Mode: ./mm.ps1 STOP PathToCSVFile
# Note: If you don't have START or STOP as your first parameter the script will not continue.
#4) This script implements strict error handling. A summary of each operation will be displayed after the operation has completed.
#5) If any errors are encountered, these are trapped and will be saved in CSV format in C:\errorMaintenanceMode.csv

param($action, $pathToCSV, $maintenanceModeReason, $durationHours)

$rootMS = "RMS"

# -ne is case-insensitive and copes with a missing $action, unlike CompareTo()
if(($action -ne "START") -and ($action -ne "STOP"))
{
Write-Host "Can't continue without a START or STOP action. Please read the comments in this .ps1 file for instructions."
Exit
}
$ErrorActionPreference = "Continue"
$error.Clear()

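# Load the OpsMgr snap-in in case the script is not run from the Operations Manager Command Shell
Add-PSSnapin "Microsoft.EnterpriseManagement.OperationsManager.Client" -ErrorAction SilentlyContinue
Set-Location "OperationsManagerMonitoring::" -ErrorVariable errSnapin;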
New-ManagementGroupConnection -ConnectionString:$rootMS -ErrorVariable errSnapin;
Set-Location $rootMS -ErrorVariable errSnapin;

$computers = Import-Csv $pathToCSV
$resultsetCollection = @();

$currObjMaintDesc = "Bulk Maintenance Mode Update. Reason Given: " + $maintenanceModeReason
$startTime = [System.DateTime]::Now
$endTime = $startTime.AddHours($durationHours)

foreach($currentObj in $computers)
{
$currentObjName = $currentObj.HostName
$computerClass = Get-MonitoringClass -name:Microsoft.Windows.Computer
$computerCriteria = "DisplayName matches '(?i:" + $currentObjName + ")\.'"
$computer = Get-Monitoringobject -monitoringclass:$computerClass -criteria:$computerCriteria

if($action.ToUpper() -eq 'START')
{
"Starting Maintenance Mode on: " + $currentObjName
New-MaintenanceWindow -startTime:$startTime -endTime:$endTime -monitoringObject:$computer -comment:$currObjMaintDesc
}
elseif($action.ToUpper() -eq 'STOP')
{
"Stopping Maintenance Mode on: " + $currentObjName
Set-MaintenanceWindow -monitoringObject:$computer -endTime:$startTime
}
if(-not $?)
{
$errorFound = $true
$resultObj = "" | Select HostName, Status, ErrorMessage
$resultObj.HostName = $currentObjName
$resultObj.Status = "Failed"
$resultObj.ErrorMessage = $error[0]
$resultsetCollection += $resultObj
$error.Clear()

}
else
{
$resultObj = "" | Select HostName, Status, ErrorMessage
$resultObj.HostName = $currentObjName
$resultObj.Status = "Successful"
$resultObj.ErrorMessage = "No errors reported during operation"
$resultsetCollection += $resultObj
}

}
$resultsetCollection
Write-Host
if($errorFound)
{
Write-Host "Errors were encountered whilst trying to update computers. Please see the file c:\errorMaintenanceMode.csv for Error Info."
$resultsetCollection | Export-Csv "c:\errorMaintenanceMode.csv"
}
else
{
Write-Host "Completed entire Bulk Maintenance Mode Operation without any errors."
}
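
The criteria string built inside the loop relies on a regular expression, and it can be worth checking what it does and does not match before feeding the script a long list. The sketch below applies the same pattern with PowerShell's own -match operator, on the assumption that the OpsMgr "matches" criteria operator follows the same .NET regular expression rules. The display names are invented and Get-HostNamePattern is a hypothetical helper.

```powershell
# Rebuilds the pattern the script places inside its criteria string.
function Get-HostNamePattern
{
    param([string]$HostName)
    return "(?i:" + $HostName + ")\."
}

$pattern = Get-HostNamePattern -HostName "node1"

# Invented display names, tested with -match rather than against OpsMgr.
$results = @{}
foreach ($name in "NODE1.contoso.com", "node10.contoso.com", "mynode1.contoso.com")
{
    $results[$name] = ($name -match $pattern)
}
$results
```

The trailing \. stops node1 from matching node10, but the pattern is not anchored at the start, so a name such as mynode1 is picked up as well. Prefixing the pattern with ^ would be one way to tighten it, provided the criteria engine accepts anchors.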