Custom SQL Job Failure Notifications

Job failure notifications is a common way to send notification when a SQL job fails. Setting up job notifications is fairly straight forward. However, sometimes there may be a need to do custom logic on when these notifications need to be sent. Since the job history is logged in the msdb database, we can query the tables and build out of the box solutions. One solution I had to do recently was to send notification email only if the job failed 5 successive times.

In order to do this, whenever the job fails, it should execute a step( part of the job) to check the last 4 run’s job final output (step id = 0) and if they all failed, send an email notification.

Below code shows how this can be done.

Declare @JobID uniqueidentifier
SET @jobID = CONVERT(uniqueidentifier,$(ESCAPE_SQUOTE(JOBID)))

IF ( SELECT MAX(run_status)
FROM (SELECT A.run_status
FROM Msdb.dbo.sysjobhistory A 
INNER JOIN Msdb.dbo.sysjobs B ON A. Job_id=b.job_id
WHERE step_id=0 AND B.Job_ID = @jobID
ORDER BY A.instance_id DESC OFFSET 0 ROWS  FETCH First 4 ROWS ONLY) A)= 0

BEGIN
EXEC msdb.dbo.sp_send_dbmail 
    @profile_name = '<<SQLMailProfile>>', 
    @recipients = 'youremail@email.com', 
    @body = '<<email body>>', 
    @subject = '<<email subject>>' ; 
END

 
Advertisement

Maintenance Plan History

The below query returns the maintenance plan history along with the details on the tasks present in the plan. To log the actual command and task details,”log extended information” under Reporting and Logging option should be checked in the maintenance plan.
Select @@SERVERNAME [servername],
case when D.Succeeded=1 then 'Success' when D.succeeded=0 then 'Failed' End as Result,
A.name,B.subplan_name,D.line1,D.line2,D.line3,D.line4,
D.line5,D.start_time,D.end_time,D.command
From msdb.dbo.sysmaintplan_plans a inner join msdb.dbo.sysmaintplan_subplans b on a.id=b.plan_id
inner join msdb.dbo.sysmaintplan_log c on c.plan_id=b.plan_id and c.Subplan_id=b.subplan_id
inner join msdb.dbo.sysmaintplan_logdetail d on d.task_detail_id=c.task_detail_id
Order By D.start_time DESC

How much time does Restore VerifyOnly take?

While,it is important to verify the database backups by doing a complete restore,it might not be possible to do that in every scenario. So, in such cases, the next best thing is to do “Restore VerifyOnly”. Restore VerifyOnly does not actually restore the database but will validate the check sum, if enabled on the backup and restore some of the data page headers and will try to make sure the backup is readable and restorable as much as possible but cannot be guaranteed.

For some of our backup jobs, we have “Restore VerifyOnly” set up in the same job step as the backup. This means that SQL Agent job step runs till the database backup is taken and restore verify process has completed. Now, if we want to have a break down on how much time was spent to backup the database and verify the backup individually, we need to subtract the database backup time from the job run time to get the time taken to perform restore verify. All the backup information is available in MSDB database in backupset table. We can use msdb.dbo.backupset table to calculate the total time the backup has taken.

The backup job in this case is Full Backup job that runs once a day. So, it is easier to join the database backup with sql agent job info on the date column.

declare @databasename varchar(200)
set @databasename =
mydatabase
declare @JobName varchar(200)
set @JobName=
mybackupjobname

Select A.database_name as [DatabaseName],B.Name as [JobName],B.run_date as [Job_RunDate],B.Run_time as [Job_RunTime],
B.[JobRunTimeINMins] as [TotalJobRunTime_In_Mins],BackupSize_In_GB,[BackupTime_In_Mins],[BackupType],backup_start_date,backup_finish_date,
(JobRunTimeINMins -[BackupTime_In_Mins]) as RestoreVerify_Time_In_Mins
from
(select top 100 percent database_name,cast(backup_size/1024/1024/1024 as decimal(8,3)) as BackupSize_In_GB,Case WHEN type='D' then 'FULL' WHEN type='L' then 'Log' When type='I' then 'Diferential' End as [BackupType],
backup_start_date,backup_finish_date, datediff(minute,backup_start_date,backup_finish_date) as [BackupTime_In_Mins]
from msdb.dbo.backupset where database_name= @databasename and type='D' and Backup_start_date >=dateadd(month,-6,getdate())
order by backup_set_id desc) A INNER JOIN
(select top 100 percent B.Name,A.run_date,A.Run_time,
(cast(SUBSTRING(right('000000'+cast(run_duration as varchar(6)),6),1,2)*3600 as int)+
Cast(SUBSTRING(right('000000'+cast(run_duration as varchar(6)),6),3,2)*60 as int)+
Cast(SUBSTRING(right('000000'+cast(run_duration as varchar(6)),6),5,2) as int))/60 as [JobRunTimeINMins]
from msdb.dbo.sysjobhistory A INNER JOIN msdb.dbo.sysjobs B on A.job_id=B.job_id
where step_id=0 and B.name=@JobName
order by instance_ID desc) B on B.run_date=convert(varchar(8),A.backup_start_date,112)
order by run_date desc

TSQL code to view Job Activity

SQL Server Agent is a very key component of SQL Server and typically used to schedule jobs,alerts etc. Whenever we need to restart SQL Server agent, it is good practice to make sure there are no running jobs. We can check this using job activity monitor or using the TSQL code below.

I prefer the TSQL code as it is easier and also can run it on central management server(CMS) to get the list of running jobs across multiple servers.

--currently running jobs in job activity monitor(SQL 2005 and above)
Select @@SERVERNAME as [DBServer],C.Name as [JobName], run_requested_date,start_execution_date,stop_execution_date
from (Select max(Session_id) as Session_Id from msdb.dbo.syssessions) A
INNER JOIN msdb.dbo.sysjobactivity B on A.Session_id=B.Session_ID
INNER JOIN msdb.dbo.sysjobs C on B.job_id=C.Job_ID
where B.stop_execution_date is null AND B.run_requested_date is not null

Deleting Old Backup Information from MSDB

MSDB is a system database which stores information on backup history,sql jobs,alerts, database email, job history …etc.With regards to backups,It stores all the backup information unless you have some delete routine to delete the information from the backup tables. Generally, there will be a retention policy which defines how long old backup files are to be retained. This can be anywhere between couple of weeks to years depending on the requirement. Once you determine your retention requirement,you might not need backup information in MSDB beyond your retention policy, as you do not have the files, this information does not really help you. Also, this will keep MSDB database size in check and will also result in smaller MSDB backup file.

So, to delete this information, there are bunch of backup tables where the information is stored. The following pic shows the corresponding tables that store backup information.
MSDB Backup Diagram

Now, Create the below procedure that deletes the data from the above tables.

If Exists(Select 1 from sys.procedures where name='usp_DeleteBackupInformation')
Begin
Drop Procedure usp_DeleteBackupInformation
End
GO
Create Procedure usp_DeleteBackupInformation(@Purgedate datetime)
as
Begin
Set nocount on
Begin Try
Begin Tran

--deletes backupfile info
Delete A from msdb.dbo.backupfile A INNER JOIN msdb.dbo.backupset B
on A.backup_set_id=B.backup_set_id Where backup_finish_date < @PurgeDate

--deletes backupfilegroup info
Delete A from msdb.dbo.backupfilegroup A INNER JOIN msdb.dbo.backupset B
on A.backup_set_id=B.backup_set_id Where backup_finish_date < @PurgeDate

--deletes restorefile info
Delete A from msdb.dbo.restorefile A INNER JOIN msdb.dbo.restorehistory B
on A.restore_history_id=b.restore_history_id INNER JOIN msdb.dbo.backupset C
on C.backup_set_id=B.backup_set_id Where backup_finish_date < @PurgeDate

--deletes restorefilegroup info
Delete A from msdb.dbo.restorefilegroup A INNER JOIN msdb.dbo.restorehistory B
on A.restore_history_id=b.restore_history_id INNER JOIN msdb.dbo.backupset C
on C.backup_set_id=B.backup_set_id Where backup_finish_date < @PurgeDate

--deletes restorehistory info
Delete A from msdb.dbo.restorehistory A INNER JOIN msdb.dbo.backupset B
on A.backup_set_id=B.backup_set_id Where backup_finish_date < @PurgeDate

--delete backupset info
Delete from msdb.dbo.backupset where backup_finish_date < @PurgeDate

--deletes backupmedia info
Delete A from msdb.dbo.backupmediafamily A where media_set_id not in
(Select distinct media_set_id from msdb.dbo.backupset)

--deletes backupmediaset info
Delete A from msdb.dbo.backupmediaset A where media_set_id not in
(Select distinct media_set_id from msdb.dbo.backupset)

Commit
End Try
Begin Catch
Select ERROR_NUMBER as ErrorNumber, ERROR_MESSAGE() as ErrorMessage,
ERROR_SEVERITY() as ErrorSeverity
Rollback
End Catch
End

Example : usp_DeleteBackupInformation '20070101'
In the above example, it will delete all the backup information prior to Jan 1st 2007.

PowerShell – Replacing backup path in SQL Jobs

Recently, I had to move backups to a different drive ,so, I copied all the existing backups to the new location but the problem is with our SQL backup jobs. Our backups are scheduled using SQL Server Agent and the backup path is hard coded in the job step. There are 100’s of jobs across multiple servers ,so, I had to edit each one of them and replace the path with the new one. This solution would take forever and so, i thought of doing it in a better way such as using a script.

Below is the powershell code which basically loops through the backup jobs in the text file(I saved the backup job names to a text file) and scripts the backup job step to a text file(backup job has multiple steps but step 1 has backup location hard coded in it) and drops that step from the job. Now rerun the script that is saved into the file after replacing the backup location with the new one.

$Module = Get-Module | where-object {$_.Name -like ‘SQLPS’} | Select-Object Name
IF ($Module -ne “SQLPS”)
{
Import-Module SQLPS
}
$BackupJobs=Get-Content "C:\Users\SQLWhisper\Documents\BackupJobs.Txt"
Foreach ($BackupJob in $BackupJobs)
{
$a =new-object microsoft.sqlserver.management.smo.server "ServerName\InstanceName"
$b=$a.JobServer.Jobs|where-Object {$_.Name -like $BackupJob}
$c=$b.JobSteps|where-object {$_.ID -like "1"}
$C.Script() |Out-File "C:\Users\SQLWhisper\Documents\BackupJobs-Step1.txt" -append
$C.Drop()
}
Remove-Module SQLPS
}

PowerShell – Script to Drop SQL Jobs

Powershell script to drop SQL Agent Jobs :

Param([String] $ServerName,
[String] $JobName)

$Module = Get-Module |where-object {$_.Name -like "SQLPS"} | Select-object Name
If ($Module -ne "SQLPS")
{
Import-Module SQLPS
}
$a=new-object Microsoft.SQLServer.Management.Smo.Server $ServerName
$b=$a.Jobserver.Jobs|where-object {$_.Name -like $JobName}
$b.drop()

Remove-Module SQLPS

Save the script as “PSDropSQLJob.ps1” and to run the script,
1.First the set the location to where the file exists:
set-location c:\users\sqlwhisper\desktop
2.To execute the script,
.\PSDropSQLJob.ps1 “ServerName\InstanceName” “NameOfJob”

PowerShell – script to monitor Job Activity on the Server.

Powershell script to monitor current activity on SQL Server Agent jobs. You can use the below script and find the activity of a specific job or all jobs. The output will be similar to job activity monitor in SQL Server Agent.
Script :
param(
[string]$SQLServer,
[string]$JobName)
$Module=get-module | where-object {$_.Name -like 'SQLPS'}
If ($Module -ne 'SQLPS')
{
import-module SQLPS
}
$a = new-object microsoft.sqlserver.management.smo.server $SqlServer
IF ($JobName -eq 'ALL')
{
$a.Jobserver.Jobs | select-object OriginatingServer,Name,
LastRunDate,LastRunOutCome,NextRunDate,CurrentRunStatus,
CurrentRunStep,EmailLevel,OperatorToEmail | Sort-Object LastRunOutCome,Name |FT -Auto
}
Else
{
$a.Jobserver.Jobs | where-object {$_.Name -like $JobName }| select-object OriginatingServer,Name,LastRunDate,LastRunOutCome,NextRunDate,
CurrentRunStatus,CurrentRunStep,EmailLevel,OperatorToEmail |FT -Auto
}

Remove-module sqlps

After creating the above script(save the script as “PSJobActivity.ps1”), you can run the script as
1. Before executing the script,set the path to where the script file exists.
Set-location C:users\sqlwhisperer\Desktop
2. To Execute the script
— retrieves job activity of a specific job on the server.
.\PSJobActivity.ps1 “ServerName\InstanceName” “NameOfYourJob”
–retrieves job activity of all the jobs on the server.
.\PSJobActivity.ps1 “ServerName\InstanceName” “ALL”