Episode 2: Risks
In Episode 1 of this series I talked about oversight of cybersecurity threats and how a Board can engage with senior management to determine the mission of the cybersecurity department and prioritize testing and analysis. Now it's time to move on to cyber risks and what level of detail is appropriate in a Board room. To steer our conversation let's use a Remediation Agility chart. Similar to the Threat Objective portrayal in Episode 1, this visual is meant to sit on the screen for a while and support many conversations on a single slide.
That is a busy visualization. The good news is that it's the only slide you need in risk. A good visual can support many stories with the same picture to avoid losing the audience's concentration for 5 minutes every time you flash a new chart. The remediation agility chart is underpinned by a risk register. A risk register is a table or database of specific risks. This should not contain general concepts like "ransomware", "hacking", or "insider threat". It should be specific. Examples might include "3 production servers have a vulnerable version of the log4j library" or "engineers are not trained on secure coding practices". A good litmus test for a potential risk register entry is whether or not it could be listed as a finding in an audit, test, or examination, or whether or not there will be a specific remediation task assigned to fix it.
Director Question: Do we keep a table of our cyber risks in one centralized location, regardless of the tool or process that discovered them?
Good Answer: Yes, and while many of our risk discovery tools provide their own dashboards, we recognize those tools will come and go. Our consolidated risk register communicates an apples-to-apples view regardless of risk source.
The Remediation Agility chart summarizes risks at a level that is useful for governance and oversight. First, this is capturing only risks rated as Critical or High in urgency. Lower risks must be remediated on their timeline as well, but for oversight it is helpful to focus on the top issues. Second, the stacked red area is communicating the number of risks that are overdue. This implies a due date based on the criticality, and requires your team to have set criteria for scoring risks and set service level agreements - SLAs - for expecting them to be remediated. Next, we see columns depicting immediate risk identification or remediation in a given period in the foreground, while the stacked graph in the back shows the resultant count of aggregate open risks at each date with an emphasis on those exceeding the due date.
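The due-date logic described above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed implementation: the SLA windows in `SLA_DAYS`, the `Risk` class, and the function names are all assumptions made for the example, and your own scoring criteria and SLAs would replace them.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional

# Hypothetical SLA windows in days; real values come from your own policy.
SLA_DAYS = {"critical": 15, "high": 30}


@dataclass
class Risk:
    title: str
    urgency: str                      # "critical" or "high" in this sketch
    detected: date
    remediated: Optional[date] = None


def due_date(risk: Risk) -> date:
    """Due date implied by the urgency rating and its SLA window."""
    return risk.detected + timedelta(days=SLA_DAYS[risk.urgency])


def status_on(risk: Risk, as_of: date) -> str:
    """Return 'not_open', 'on_time', or 'overdue' for the given date."""
    if risk.detected > as_of:
        return "not_open"             # not yet discovered on that date
    if risk.remediated is not None and risk.remediated <= as_of:
        return "not_open"             # already closed by that date
    return "overdue" if as_of > due_date(risk) else "on_time"
```

Evaluating `status_on` for every register entry at each reporting date gives the on-time and overdue counts that make up the stacked area of the chart.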
Using this example we can tell a few stories for our fictitious company. While the company begins with 17 critical or high risks that are known but "on time" for remediation, February sees a new risk identification control onboarded that generates a healthy tranche of newly discovered important risks. These risks enter the register in blue (on time) but many of them degrade into red over the next two months as they are not remediated within their prescribed timeline. This sequence shows that a bug bounty program identified previously existing but undetected risks, and that the organization is playing catch-up to get them closed out. As we get to April the organization has about the same number of high or critical risks as it did in February, but the percentage of them that are overdue has grown dramatically. You can see the value of this tale in governance since it shows a failure of remediation agility, which is essential in cybersecurity. In a world where every holiday Friday night seems to bring news of an urgent exploitable vulnerability, an organization that can measure and improve remediation agility in "peace time" will be best prepared to deal with the inevitable. Going back to our sample graph, a wise Director may question the nature of risks that went un-remediated during the red period.
Director Question: Give me an example of the difference between a critical, high, and medium risk.
Good Answer: See the appendix for our formal definition, but one of our critical risks was a bug bounty finding where someone on the open Internet could have bypassed authentication and accessed our sensitive data. One of our high findings was similar, but the system was only accessible to employees. A medium finding included an unpatched operating system vulnerability that could only be exploited by a user local to the system.
Continuing through our hypothetical example we can see a period of relative calm in June disrupted by an acquisition. As the acquisition closes, a number of risks are brought onto the risk register. Those risks stay in blue until the SLA is exceeded, at which point they turn red and grab attention.
Director Question: Why did we exceed our SLAs and have so much difficulty remediating high and critical risks during this period?
Good Answer: The majority of risks visualized in this period were associated with a recent acquisition onboarded into our bug bounty program. While we were concerned about this latent risk, as discussed previously we have kept this subsidiary at arm's-length until we can migrate them into our secure cloud environment, which was too large of a project to complete within 30 days. We accepted that increased risk until October, at which point the migration was complete.
In this hypothetical organization we end the year on a positive note, with major risks remediated and only a normal cadence of 9 risks and 1 overdue.
Speaking of normal cadence, it's worth discussing the concept of velocity. While an accumulation of overdue risk is alarming, equally concerning is not seeing substantial risk detection. The velocity of risk identification is more likely to generate concern by being too low - especially in a startup. A flat or empty risk register with no critical or high findings is most likely evidence of insufficient risk assessment activity. For a startup, bug bounty programs, cloud security posture management, and attack surface management tools all generate findings rapidly with minimal operational risk. In today's environment of constant adaptation to evolving adversaries it is highly suspicious for an organization to not find issues. A healthy program should be finding items and fixing them.
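One simple way to watch detection velocity is to count new findings per month and flag months that fall below a floor. The sketch below is only an illustration: the function names and the `minimum` threshold are assumptions for the example, and a sensible floor depends on the size of the organization and how many discovery tools it runs.

```python
from collections import Counter
from datetime import date


def detections_per_month(detected_dates):
    """Count newly detected risks per (year, month)."""
    return Counter((d.year, d.month) for d in detected_dates)


def quiet_months(detected_dates, months, minimum=1):
    """Return the months whose detection count is below the chosen floor."""
    counts = detections_per_month(detected_dates)
    return [m for m in months if counts.get(m, 0) < minimum]
```

A month that shows up in `quiet_months` is not proof of a problem, but it is a prompt to ask whether the discovery tools and programs were actually running.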
Director Question: Are we running a bug bounty program, or conducting red team assessments based on the threat objectives we identified?
Good Answer: We maintain a bug bounty program to continuously test our external facing assets and notify it when we make major platform changes. We recently conducted a red team engagement based on our top ranked inherent risk, financial fraud. We used intelligence from an attack at a peer organization and an "assume breach" methodology where we gave the testers some advantage to compress time, and the results fed into our risk register. Findings from all of the above contributed to the risk detection activity in our remediation agility chart, and we are pleased to report all critical and high findings were closed within our SLA.
Finally, let's have a word on this risk register we keep mentioning. A risk register is simply a table, and like any good set of records it will have fields determined based on the reporting expected to come out of it. To support generation of our remediation agility chart, a risk register must record accurate timestamps of when risks are detected and remediated in addition to the assessed urgency. A sample view of useful risk register fields with some sample entries is shared below:
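As a rough sketch of how those fields support the chart, the register could be modeled in Python as follows. The field names, source labels, urgency ratings, and dates here are illustrative assumptions rather than a recommended schema; the descriptions echo examples used earlier in this article.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class RegisterEntry:
    risk_id: str
    description: str
    source: str                       # tool or process that found the risk
    urgency: str                      # "critical", "high", "medium", ...
    detected: date
    due: date                         # detected date plus the SLA for this urgency
    remediated: Optional[date] = None


# Illustrative entries only; ratings and dates are made up for the example.
REGISTER = [
    RegisterEntry("R-001", "3 production servers run a vulnerable log4j version",
                  "vulnerability scan", "critical",
                  date(2023, 2, 6), date(2023, 2, 21), date(2023, 2, 17)),
    RegisterEntry("R-002", "Authentication bypass on an Internet-facing system",
                  "bug bounty", "critical",
                  date(2023, 2, 10), date(2023, 2, 25), date(2023, 4, 3)),
    RegisterEntry("R-003", "Engineers are not trained on secure coding practices",
                  "audit", "high",
                  date(2023, 3, 1), date(2023, 3, 31), None),
]


def chart_point(entries, as_of):
    """Count open Critical/High risks on as_of, split into on-time and overdue."""
    on_time = overdue = 0
    for e in entries:
        if e.urgency not in ("critical", "high"):
            continue
        is_open = e.detected <= as_of and (e.remediated is None or e.remediated > as_of)
        if not is_open:
            continue
        if as_of > e.due:
            overdue += 1
        else:
            on_time += 1
    return {"on_time": on_time, "overdue": overdue}
```

Calling `chart_point(REGISTER, some_date)` once per reporting date yields the two stacked series in the chart, which is why accurate detection and remediation timestamps matter more than any other field.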
An effective cybersecurity function needs to operate a dependable, consistent risk management program, which requires effective risk detection as well as remediation. A mature program is not risk free, but detects and addresses risk quickly. Reporting no urgent findings is a red flag, and reports should focus on the health of risk discovery tools and programs as well as remediation agility.
Stay tuned for episode 3, where we will talk about incident reporting.