Every change to an IT system generates events. But in ITIL, you don’t just monitor the events that go wrong or create problems. All events need to be swiftly identified and monitored, and rectified where necessary. Thorough event management makes IT operations more proactive, because issues are detected early rather than discovered after they disrupt a service.
ITIL Event management definition
Event management is the process of monitoring the events that occur through changes and improvements in IT infrastructure. Doing this lets normal operations continue while also detecting ‘exception conditions’ or ‘exceptional events’.
What is an ‘exceptional event’ in ITIL?
An exceptional event is when something goes wrong, like a server outage. Thousands or even millions of ‘events’ happen across an IT infrastructure every day, and only a few are exceptional.
An event is any change of a configuration item (CI) from one state to another within an IT service. Exception events are considered significant because they require a response to rectify them.
For example, a server moving from online to idle could be an event. It’s worth knowing about because it means action can be taken if needed. But it’s only considered an exception if something has gone wrong and immediate action is required.
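The distinction between an ordinary event and an exception can be sketched in a few lines of Python. This is only an illustration: the `Event` class, the `detect_event` helper and the set of states treated as exceptions are all hypothetical names and choices, not part of ITIL or of any particular tool.

```python
from dataclasses import dataclass
from typing import Optional

# Assumption: which states count as exceptions is decided per organisation.
EXCEPTION_STATES = {"offline", "failed"}


@dataclass(frozen=True)
class Event:
    """A change of a configuration item (CI) from one state to another."""
    ci_name: str
    old_state: str
    new_state: str

    @property
    def is_exception(self) -> bool:
        # An event is exceptional only if the new state needs a response.
        return self.new_state in EXCEPTION_STATES


def detect_event(ci_name: str, old_state: str, new_state: str) -> Optional[Event]:
    """Return an Event if the CI changed state, otherwise None."""
    if old_state == new_state:
        return None
    return Event(ci_name, old_state, new_state)


idle = detect_event("server-01", "online", "idle")
down = detect_event("server-01", "idle", "offline")
print(idle.is_exception)  # False: worth knowing about, but no action needed
print(down.is_exception)  # True: requires immediate action
```

Both transitions are recorded as events, but only the second one is flagged for a response.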
Event management tools
Notifications about events are sent either by the CIs (configuration items) themselves or by monitoring tools. There are two types of these tools available:
- Active monitoring tools – poll key configuration items’ status and availability. An exception generates an alert.
- Passive monitoring tools – detect and correlate alerts from CIs.
For example, say a switch on a network needs to remain ‘on’. An active monitoring tool would confirm this by sending regular ‘pings’ to the switch. A failure to respond would be logged as a change in status and trigger an alert, prompting someone to fix it.
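The switch example can be sketched as a small active monitor. This is a minimal sketch under stated assumptions: `ActiveMonitor` and its `probe` callback are hypothetical names, and the ping results are simulated rather than sent over a real network so the example runs anywhere.

```python
from typing import Callable, Dict, List


class ActiveMonitor:
    """Polls CIs and raises an alert when one stops responding."""

    def __init__(self, probe: Callable[[str], bool]) -> None:
        self.probe = probe                      # returns True if the CI answers
        self.last_status: Dict[str, bool] = {}  # last known status per CI
        self.alerts: List[str] = []

    def poll(self, ci_name: str) -> None:
        is_up = self.probe(ci_name)
        # Assumption: a CI that has never been polled is expected to be 'on'.
        was_up = self.last_status.get(ci_name, True)
        if was_up and not is_up:
            # Status changed from responding to silent: log it and alert once.
            self.alerts.append(f"ALERT: {ci_name} stopped responding")
        self.last_status[ci_name] = is_up


# Simulated ping results: two replies, then two failures.
responses = iter([True, True, False, False])
monitor = ActiveMonitor(probe=lambda name: next(responses))
for _ in range(4):
    monitor.poll("switch-01")

print(monitor.alerts)  # ['ALERT: switch-01 stopped responding']
```

Alerting only on the change of status, rather than on every failed ping, keeps a long outage from flooding the alert list with duplicates.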
Event management examples
Events belong to one of three categories:
- Information – a successful task, like a user login or an email reaching its intended recipient.
- Warning – when a device or service is approaching a threshold limit or behaving unusually, like a scheduled backup not running or a server’s memory usage coming within 10% of its usable memory.
- Exception – an error raised when a component of the system acts abnormally, such as a server going down or a backup failing.
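As a rough illustration of the three categories, the memory example could be classified as follows. The `classify_memory` function and the `WARNING_MARGIN` constant are hypothetical, and treating fully exhausted memory as an exception is an assumption made for this sketch.

```python
from enum import Enum


class Category(Enum):
    INFORMATION = "information"
    WARNING = "warning"
    EXCEPTION = "exception"


# Warn when usage comes within 10% of the usable memory.
WARNING_MARGIN = 0.10


def classify_memory(used_mb: float, usable_mb: float) -> Category:
    """Map a memory reading to one of the three event categories."""
    if used_mb >= usable_mb:
        # Assumption: no usable memory left means the server is acting abnormally.
        return Category.EXCEPTION
    if used_mb >= usable_mb * (1 - WARNING_MARGIN):
        return Category.WARNING
    return Category.INFORMATION


print(classify_memory(8_000, 16_000).value)   # information
print(classify_memory(15_000, 16_000).value)  # warning
print(classify_memory(16_000, 16_000).value)  # exception
```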
Event management metrics
You can define event management metrics during the design phase of IT services. Decide what types of events need to be generated and how they’ll be generated for each type of CI. Typical event management metrics include:
- Number of events by category
- Number of events by significance
- Number and percentage of events that required human intervention
- Number and percentage of events that resulted in incidents or changes
- Number and percentage of events caused by existing problems or known errors
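Several of these metrics can be calculated directly from an event log. The sketch below assumes a hypothetical `EventRecord` structure and a handful of made-up sample records, purely to show the arithmetic.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class EventRecord:
    category: str          # 'information', 'warning' or 'exception'
    needed_human: bool     # did someone have to intervene?
    raised_incident: bool  # did the event result in an incident?


def percentage(part: int, total: int) -> float:
    """Percentage rounded to one decimal place; 0.0 for an empty log."""
    return round(100 * part / total, 1) if total else 0.0


def summarise(events: List[EventRecord]) -> Dict[str, object]:
    total = len(events)
    human = sum(e.needed_human for e in events)
    incidents = sum(e.raised_incident for e in events)
    return {
        "by_category": dict(Counter(e.category for e in events)),
        "human_intervention": (human, percentage(human, total)),
        "resulted_in_incident": (incidents, percentage(incidents, total)),
    }


# Made-up sample log for illustration only.
log = [
    EventRecord("information", False, False),
    EventRecord("information", False, False),
    EventRecord("warning", True, False),
    EventRecord("exception", True, True),
]
report = summarise(log)
print(report["by_category"])           # {'information': 2, 'warning': 1, 'exception': 1}
print(report["human_intervention"])    # (2, 50.0)
print(report["resulted_in_incident"])  # (1, 25.0)
```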
Event management process flow
Observing services and components is key to a smooth-running system. So you should regularly record and report selected changes in the system – that is, events. This helps you prioritise services and processes. It also builds a record of causes and fixes, so a recurring problem can be identified and stopped before it happens again.