Episode 2: Risks
In Episode 1 of this series I talked about oversight of cybersecurity threats and how a Board can engage with senior management to determine the mission of the cybersecurity department and prioritize testing and analysis. Now it's time to move on to cyber risks and the level of detail that is appropriate in a Board room. To steer our conversation, let's use a Remediation Agility chart. Like the Threat Objective visual in Episode 1, this chart is meant to sit on the screen for a while and support many conversations on a single slide.
That is a busy visualization. The good news is that it's the only slide you need for risk. A good visual can support many stories with the same picture, avoiding the loss of the audience's concentration for five minutes every time you flash a new chart. The Remediation Agility chart is underpinned by a risk register. A risk register is a table or database of specific risks. It should not contain general concepts like "ransomware", "hacking", or "insider threat"; it should be specific. Examples might include "3 production servers have a vulnerable version of the log4j library" or "engineers are not trained on secure coding practices". A good litmus test for a potential risk register entry is whether it could be listed as a finding in an audit, test, or examination, or whether a specific remediation task will be assigned to fix it.
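To make that distinction concrete, here is a minimal Python sketch of what one register entry might look like. The field names, SLA windows, and dates are illustrative assumptions, not a prescribed schema:

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass
class RiskEntry:
    """One specific, remediable finding, not a broad category like 'ransomware'."""
    description: str                      # specific enough to appear as an audit finding
    urgency: str                          # "Critical", "High", "Medium", or "Low"
    detected_on: date                     # when the finding entered the register
    due_on: date                          # detected_on plus the SLA for this urgency
    remediated_on: Optional[date] = None  # None while the risk is still open

# Sample entries reusing the examples above; the dates are hypothetical.
register = [
    RiskEntry("3 production servers have a vulnerable version of the log4j library",
              "Critical", detected_on=date(2024, 2, 3), due_on=date(2024, 2, 17)),
    RiskEntry("Engineers are not trained on secure coding practices",
              "High", detected_on=date(2024, 2, 10), due_on=date(2024, 3, 11)),
]
```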
The Remediation Agility chart summarizes risks at a level that is useful for governance and oversight. First, it captures only risks rated Critical or High in urgency. Lower-rated risks must be remediated on their own timelines as well, but for oversight it is helpful to focus on the top issues. Second, the stacked red area communicates the number of risks that are overdue. This implies a due date based on criticality, which requires your team to have set criteria for scoring risks and service level agreements (SLAs) for when they are expected to be remediated. Finally, the columns in the foreground depict risks identified or remediated in a given period, while the stacked graph behind them shows the resulting count of open risks at each date, with an emphasis on those exceeding their due date.
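As a sketch of the arithmetic behind the stacked area, assuming the RiskEntry records from the earlier snippet, the blue and red segments at any date can be computed like this:

```python
from datetime import date

def open_and_overdue(register, as_of):
    """Split open Critical/High risks as of a date into on-time and overdue counts."""
    open_risks = [
        r for r in register
        if r.urgency in ("Critical", "High")
        and r.detected_on <= as_of
        and (r.remediated_on is None or r.remediated_on > as_of)
    ]
    overdue = sum(1 for r in open_risks if r.due_on < as_of)
    return len(open_risks) - overdue, overdue  # (blue segment, red segment)

on_time, late = open_and_overdue(register, date(2024, 3, 1))
```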
Using this example we can tell a few stories for our fictitious company. The company begins with 17 Critical or High risks that are known but "on time" for remediation. In February, a new risk identification control is onboarded and generates a healthy tranche of newly discovered important risks. These risks enter the register in blue (on time), but many of them degrade into red over the next two months as they go unremediated within their prescribed timelines. This sequence visualizes a bug bounty program identifying previously existing but undetected risks, and the organization playing catch-up to get them closed out. By April the organization has about the same number of High or Critical risks as it did in February, but the percentage of them that are overdue has grown dramatically. This story is valuable in governance because it shows a failure of remediation agility, which is essential in cybersecurity. In a world where every holiday Friday night seems to bring news of an urgent exploitable vulnerability, an organization that can measure and improve remediation agility in "peace time" will be best prepared to deal with the inevitable. Going back to our sample graph, a wise Director might question the nature of the risks that went unremediated during the red period.
Continuing through our hypothetical example, we can see a period of relative calm in June disrupted by an acquisition. As the acquisition closes, a number of risks are brought onto the risk register. Those risks stay blue until the SLA is exceeded, at which point they turn red and grab attention.
In this hypothetical organization we end the year on a positive note, with major risks remediated and only a normal cadence of 9 open risks, 1 of them overdue.
Speaking of normal cadence, it's worth discussing the concept of velocity. While an accumulation of overdue risk is alarming, equally concerning is an absence of substantial risk detection. The velocity of risk identification is more likely to generate concern by being too low, especially in a startup. A flat or empty risk register with no Critical or High findings is most likely evidence of insufficient risk assessment activity. For a startup, bug bounty programs, cloud security posture management, and attack surface management tools all generate findings rapidly with minimal operational risk. In today's environment of constant adaptation to evolving adversaries it is highly suspicious for an organization to not find issues. A healthy program should be finding items and fixing them.
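A rough way to watch for this, again assuming the register sketched above, is to chart how many Critical or High risks are identified per month; a line that sits at zero is the warning sign:

```python
from collections import Counter

def identification_velocity(register):
    """Count Critical/High risks identified per month; a flat zero is the red flag."""
    per_month = Counter(
        r.detected_on.strftime("%Y-%m")
        for r in register
        if r.urgency in ("Critical", "High")
    )
    return dict(sorted(per_month.items()))
```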
Finally, let's have a word on the risk register we keep mentioning. A risk register is simply a table, and like any good set of records its fields should be determined by the reporting expected to come out of it. To support generation of our Remediation Agility chart, a risk register must record accurate timestamps of when risks are detected and remediated, in addition to the assessed urgency. A sample view of useful risk register fields with some sample entries is shared below:
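(The entries below are illustrative, reusing the examples from earlier in this post; the IDs and dates are hypothetical.)

```
ID     Description                                             Urgency   Detected    SLA Due     Remediated
R-101  3 production servers have a vulnerable log4j version    Critical  2024-02-03  2024-02-17  2024-02-20
R-102  Engineers are not trained on secure coding practices    High      2024-02-10  2024-03-11  (open)
```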
Conclusion
An effective organization needs to operate a dependable, consistent risk management program, which requires effective risk detection as well as remediation. A mature program is not risk free, but detects and addresses risk quickly. Reporting no urgent findings is a red flag, and reports should focus on the health of risk discovery tools and programs as well as remediation agility.
Stay tuned for episode 3, where we will talk about incident reporting.