NIST SP 800-61 defines the following:
- An event is any observable occurrence in a system or network.
- Adverse events are events with negative consequences.
- A computer security incident is a violation or imminent threat of violation of computer security policies, acceptable use policies, or standard security practices.
While the focus in this section is specifically on computer security incidents, keep in mind that these principles apply to power failures, natural disasters, and so on.
Though every incident starts as an event (or multiple events), not every event is an incident. To take the distinction even further, not every event is even considered adverse or negative. Some examples of events include:
- A website visitor downloads a file.
- A user enters an incorrect password.
- An administrator is granted root access to a router.
- A firewall rejects a connection attempt.
- A server crashes.
- A hacker encrypts all your sensitive data and demands a ransom for the keys.
- A user inside your organization steals your customers’ credit card data.
- An administrator at your company is tricked into clicking on a link inside a phishing email, resulting in a backdoor connection for an attacker.
- Your HR system is taken offline by a Distributed Denial of Service (DDoS) attack.
Make sure you don’t use the terms event and incident interchangeably; they’re not the same. I have heard IT professionals refer to simple events as incidents, and that’s a great way to sound alarms that don’t need to be sounded!
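The distinction between events, adverse events, and incidents can be sketched as a small classification routine. This is only an illustration of the NIST definitions quoted above: the `Observation` type and its two boolean flags are hypothetical, and in practice an analyst or a detection rule would have to decide how those flags get set.

```python
from dataclasses import dataclass
from enum import Enum


class Category(Enum):
    EVENT = "event"                  # any observable occurrence
    ADVERSE_EVENT = "adverse event"  # an event with negative consequences
    INCIDENT = "incident"            # violation (or imminent threat) of security policy


@dataclass
class Observation:
    description: str
    negative_consequence: bool  # hypothetical flag set by an analyst or rule
    policy_violation: bool      # hypothetical flag: violation or imminent threat of one


def classify(obs: Observation) -> Category:
    """Return the most serious category that applies to the observation."""
    if obs.policy_violation:
        return Category.INCIDENT
    if obs.negative_consequence:
        return Category.ADVERSE_EVENT
    return Category.EVENT


if __name__ == "__main__":
    samples = [
        Observation("A user enters an incorrect password.", False, False),
        Observation("A server crashes.", True, False),
        Observation("An attacker encrypts data and demands a ransom.", True, True),
    ]
    for obs in samples:
        print(f"{classify(obs).value}: {obs.description}")
```

Checking the policy-violation flag first reflects the point that every incident is also an event, but only some events rise to the level of an incident.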
Incident handling starts well before an incident occurs and continues even after things are back to normal. The Incident Response (IR) Lifecycle (shown) describes the steps you take before, during, and after an incident. The key components of the IR Lifecycle are:
- Preparation
- Detection
- Containment
- Eradication
- Recovery
- Post-Mortem
Preparing for security incidents
You can easily overlook preparation as part of the incident response process, but it’s a critical step for the rapid response to and recovery from incidents. This phase of incident handling includes things like:
- Developing an Incident Response Plan. Your Incident Response Plan identifies procedures to follow when an incident occurs, as well as roles and responsibilities of all stakeholders.
- Periodically testing your Incident Response Plan. Determine your plan’s effectiveness by conducting table-top exercises and incident simulations.
- Implementing preventative measures to keep the number of incidents as low as possible. This process includes finding vulnerabilities in your systems, conducting threat assessments, and applying a layered approach to security controls (things like network security, host-based security, and so on) to minimize risk.
- Setting up incident analysis equipment. This process can include forensic workstations, backup media, and evidence-gathering accessories (cameras, notebooks and pens, and storage bags/bins to preserve crucial evidence and maintain chain of custody).
Detecting security incidents
During this phase, you acknowledge an incident has indeed occurred, and you feverishly put your IR Plan into action. This phase is all about gathering as much information as possible and analyzing it to gain insights into the origin and impact of the breach. Some examples of activities during this phase are:
- Conducting log analysis and seeking unusual behavior. A good Security Information and Event Management (SIEM) tool can help aggregate different log sources and provide more intelligent data for your analysis. You’re looking for the smoking gun, or at least a trail of breadcrumbs, that can alert you to how the attack took place.
- Identifying the impact of the incident. What systems were impacted? What data was impacted? How many customers were impacted?
- Notifying appropriate individuals. Your IR Plan should detail who to contact for specific types of incidents. During this phase, you’ll need to enact your communications plan to alert proper teams and stakeholders.
- Documenting findings. The situation will likely be frantic, so organization is critical. You want to keep detailed notes of your findings, actions taken, chain of custody, and other relevant information. This activity helps with your post-mortem reporting later on and also helps you keep track of important details that can assist in tracing the attack back to its origin. In addition, many incidents require reporting to law enforcement or other external parties; thorough notes and a strong chain of custody help support any investigations that may arise.
Depending on various factors (the nature of the incident, your industry, or any contractual obligations), you may be responsible for notifying customers, law enforcement, or even US-CERT (part of the Department of Homeland Security). Make sure that you keep a comprehensive list of parties to notify in case of a breach.
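To make the log analysis activity in this phase more concrete, here is a minimal sketch that counts failed logins per source address and flags sources that cross a threshold. The log format, the `FAILED_LOGIN` marker, and the threshold of three are all assumptions made for illustration; a real SIEM tool would aggregate many log sources and apply far richer correlation rules.

```python
import re
from collections import Counter

# Hypothetical log format, for example:
# "2024-01-01T00:00:00 FAILED_LOGIN user=alice src=10.0.0.5"
FAILED_LOGIN = re.compile(r"FAILED_LOGIN user=(?P<user>\S+) src=(?P<src>\S+)")


def failed_logins_by_source(lines):
    """Count failed login attempts per source address."""
    counts = Counter()
    for line in lines:
        match = FAILED_LOGIN.search(line)
        if match:
            counts[match.group("src")] += 1
    return counts


def suspicious_sources(lines, threshold=3):
    """Return, in sorted order, sources with at least `threshold` failures."""
    counts = failed_logins_by_source(lines)
    return sorted(src for src, n in counts.items() if n >= threshold)


if __name__ == "__main__":
    sample = [
        "2024-01-01T00:00:01 FAILED_LOGIN user=alice src=10.0.0.5",
        "2024-01-01T00:00:02 FAILED_LOGIN user=bob src=10.0.0.5",
        "2024-01-01T00:00:03 LOGIN_OK user=carol src=10.0.0.7",
        "2024-01-01T00:00:04 FAILED_LOGIN user=root src=10.0.0.5",
        "2024-01-01T00:00:05 FAILED_LOGIN user=dave src=10.0.0.9",
    ]
    print(suspicious_sources(sample))
```

A single mistyped password is just an event; a burst of failures from one source is the kind of unusual behavior worth a closer look.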
Containing security incidents
The last thing you want to deal with during an incident is an even bigger incident. Containment is extremely important to stop the bleeding and prevent further damage. It also allows you to use your incident response resources more efficiently and avoid exhausting your analysis and remediation capacity. Some common containment activities include:
- Disabling Internet connectivity for affected systems
- Isolating/quarantining malware-infected systems from the rest of your network
- Reviewing and/or changing potentially compromised passwords
- Capturing forensic images and memory dumps from impacted systems
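When you capture forensic images and memory dumps, recording a cryptographic hash at the time of capture is one common way to support chain of custody, because anyone can later recompute the hash to check that the file is unchanged. The sketch below is an assumption about how you might log this, not a prescribed procedure: the `custody_entry` function and its field names are hypothetical, and your IR Plan or legal counsel may require a different record format.

```python
import hashlib
from datetime import datetime, timezone


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so large disk images need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def custody_entry(path, handler, action):
    """Build one hypothetical chain-of-custody record for an evidence file."""
    return {
        "file": str(path),
        "sha256": sha256_of_file(path),
        "handler": handler,
        "action": action,
        "timestamp_utc": datetime.now(timezone.utc).isoformat(),
    }
```

Each time the evidence changes hands, a new entry with the same hash value shows the file was not altered in between.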
Eradicating security incidents
By the time you reach this phase, your primary mission is to remove the threat from your system(s). Eradication involves eliminating any components of the incident that remain. Depending on the number of impacted hosts, this phase can be fairly short or last for quite some time. Here are some key activities during this phase:
- Securely removing all traces of malware
- Disabling or recreating impacted user and system accounts
- Identifying and patching all vulnerabilities (starting with the ones that led to the breach!)
- Restoring known-good backups
- Wiping or rebuilding critically damaged systems
Recovering from security incidents
The objective of the Recovery phase is to bring impacted systems back into your operational environment and fully resume business as usual. Depending on your organization’s IR Plan, this phase may be closely aligned or share steps with the Eradication phase. It can take several months to fully recover from a large-scale compromise, so you need to have both short-term and long-term recovery objectives that align with your organization’s needs. Some common recovery activities include:
- Confirming vulnerabilities have been patched and fully remediated
- Validating systems are functioning normally
- Restoring systems to normal operations (for example, reconnecting Internet access, restoring connection to your production network, and so on)
- Closely monitoring systems for any remaining signs of undesirable activity
Conducting a security incident post-mortem
After your systems are back up and running and the worst is over, you need to focus your attention on the lessons learned from the incident. The primary objective of the Post-Mortem phase is to document the lessons and implement the changes required to prevent a similar type of incident from happening in the future. All members of the Incident Response Team (and supporting personnel) should meet to discuss what worked, what didn’t work, and what needs to change within the organization moving forward. Here are some questions to consider during Post-Mortem discovery and documentation:
- What vulnerability (technical or otherwise) did this breach exploit?
- What could have been done differently to prevent this incident or decrease its impact on your organization?
- How can you respond more effectively during future incidents?
- What policies need to be updated, and with what content?
- How should you train your employees differently?
- What security controls need to be modified or implemented?
- Do you have proper funding to ensure you are prepared to handle future breaches?
For a more in-depth review of handling security incidents, refer to NIST’s Computer Security Incident Handling Guide (SP 800-61).