As a data platform expert who manages and maintains hundreds or maybe thousands of database instances, you may be thinking of a better way in managing these number of servers more efficiently. An environment this huge with a lot of vital applications relying on those databases can be quite difficult to keep track of. Especially when you’re too busy with a lot of administration work, with little resources and time you might have difficulties prioritizing your daily tasks.
As your day goes on you will encounter the usual errors which happens from time to time. It may be caused by different reasons for example; a user tried to input his/her credentials, but he/she failed logging in as the credential had already expired, maybe there was a job that failed due to a query timeout, or maybe there was a backup job which failed due to an unknown reason.
The errors mentioned are automatically logged on your error logs in SQL Server. You can view all these by expanding SQL Agent on your management studio, there you will see all the logs of your database instance.
When monitoring tens or maybe hundreds of database instances, you need to be able to track and efficiently monitor errors happening in each instance, within the database farm you are managing.
You may think that there is no easy way to manage and have full control of everything that is happening within each of your instances, luckily there is dbWatch which will assist you with your daily monitoring tasks.
dbWatch contains various tasks which helps you manage and monitor error logs of your database instances. dbWatch helps you keep track of all errors encountered on your database farm. You can also monitor error log file sizes and check if your file size is growing too much or too fast within a period. And lastly, you can configure when you wish to recycle error logs within your database instance, which helps free up disk space being consumed by the number of growing logs.
Here is an example on how to keep track of your error logs within dbWatch.
We will start by going to the monitoring module.
Here we will select a sql server instance. Right click and select configure monitoring.
In this window make sure that you have installed the tasks for monitoring error logs in SQL Server.
The following tasks are:
- Instance error log
- Instance error log file size check
- Cycle error log
After installing the tasks, go back to the monitoring module.
On the column Name, input error on the filter so we could find the tasks regarding the error logs.
If you right click on instance error log, you have an option to run the task to retrieve the updated information of the collected error logs from the instance.
Click on Details, to view the actual errors collected from this SQL Server instance.
A new window will appear, showing you details of the errors which occurred in your sql server instance.
For example, the last lines read from error log table displays all lines read, from the error log file by this task during its last execution.
The error history details table, shows the history of the last 20 collected records from your SQL Server error log file (error-text defined by the "error text" parameter).
Later, I will show you how to filter out errors which you want to be excluded by defining it on the error text parameter.
The error history graph shows the number of errors per day being registered in the log, helping you determine if the errors you are encountering are growing daily.
You don’t want the monitoring task to consume to much resources, and if it takes a long time to read the error log you should investigate it (even there are no errors in the log file). If the number of records read is very high, it could indicate potential problems in your instance.
And lastly, the alert execution aggregated statistics graph. The left vertical axis shows the total rows read per day, and the right vertical axis shows the total time spent reading the SQL Server error log per day. This helps you determine if you are experiencing an increase in error on your instance. With the information available, you can deep dive on the instance where the number of errors is growing.
Going back to the monitoring view, click on Configure.
A configuration window will appear for this Instance error log task.
You can modify the configuration based on your requirements.
The options are:
- Error text - defines which string (errors) the check must look for. Values must be separated with commas (,).
- Error text allowed - Specifies the text strings (specified by the "error text" parameter) to be excluded when found in the same record (error log line).
- History threshold - the maximum numbers of error messages the history table will retain (error_log_err_histr_tab table).
- Last run - the last time the error log has been checked.
- Max elapsed time - the maximum execution time (in seconds) for the task. If this threshold is reached a warning is raised.
- Max lines per execution - the maximum number of lines the task will read during execution before a warning/alarm is raised.
- Return status - return status value (ALARM - 2, WARNING - 1, or OK - 0) when "error text" is found (or "max lines per execution" is reached).
- Click apply to save the configuration.
You can set the schedule on when you wish this task to execute.
Right click on the Instance error log file size check and choose Details.
Note: In order to install and make this task (Instance error log file size check) work, you need to enable xp_cmdshell in SQL Server by executing the following command below, or you can set it on the configuration parameter which I will show later.
EXEC sp_configure 'xp_cmdshell', '1';
A new window will appear showing you the information in your error log directory. It shows the error log file name, create date and file size.
The error log file size history graph shows you information of your historical error log size growth rate.
The error log directory size history graph. Displays the number of files within your log directory, and the total file size for the accumulated logs. With the information available it will help you with your decision making and consider clearing some error log files, which are no longer needed as it consumes additional disk space within your environment.
Go back to the monitoring view, and right click again on the Instance error log file size check. Choose Configure.
Again, you can modify the following configuration based on your requirements
The options are:
- Enable xp_cmdshell - if set to "YES" the sql server instance configuration xp_cmdshell will be set to enabled. This parameter is required if you want to allow the task to execute operating system commands, to discover the size of files and directories.
- Error log directory size threshold - the maximum total size (in MB) of all files in the error log-file directory.
- Error log file path - path where the error log file is located.
- Error log size threshold - maximum size (in MB) of error log file before a warning or an alarm is returned by the alert.
- File count threshold - the maximum number of files in the error log catalog.
- History threshold - the maximum numbers of history statistics (in days) of error log file size.
- Return status - return status value (ALARM - 2, WARNING - 1, or OK - 0) when the "* threshold" parameters are reached.
- Click apply to save the configuration.
Choose Cycle error log. Right click then select Details
A new window will appear which displays the specific log file and the date when it was recycled. This information gives you an idea if the error log was cleared on a specific date.
Go back to the monitoring view. Right click and choose Configure.
A configuration window will appear which you can modify.
The options are:
- Cycle History - number of days to provide information when the error log and agent error log has been recycled.
- Number of SQL Server error logs - by default, there are 7 SQL Server error logs - Errorlog and Errorlog 1 to 6. This parameter can be used to modify the registry value (REG_DWORD type) to the number of logs which you wish to maintain.
Click apply to save the configuration.
Now you have an idea on how you can easily track the various errors happening within your database farm. You can set the return status so that you will be notified once a certain baseline, or error has been identified by dbWatch.
You can also configure your own parameters or baseline based on your business requirements. It is a great way of customizing different alerts based on the errors retrieved by the dbWatch task.
For more information visit: wiki.dbWatch.com