Information Safety

Improving technology through lessons from safety.

Cybersecurity NTSB

While cataloging presentations I’ve given, I’ve come across a few this week that I haven’t yet posted here (or on https://transvasive.com). I’ll be posting them here over the next three days.

One of the “missing” talks was a short slide deck I put together as part of a “Papers We Love” discussion on Learning from Cyber Incidents: Adapting Aviation Safety Models to Cybersecurity, a paper published by a working group organized by Harvard’s Belfer Center to explore the concept of creating a “Cyber NTSB”.

I came across this paper after meeting one of the lead authors, Adam Shostack. Adam in particular has been interested in creating a “Cyber NTSB”, an idea we share, though I likely take a broader interest in adapting safety to cybersecurity.

The paper is well written, and the workshop seems to have been well thought out: it included presentations from people actually working at the NTSB, grounding the discussion in work-as-done rather than work-as-imagined. It also included a session led by the psychologist and safety scientist David Woods on cross-domain learning; as I discovered in my own studies, safety doesn’t translate directly between domains (for example, between aviation and marine safety). The findings are sound, follow current safety science thinking, and are included in the slides.

For me, the practical takeaways were and remain:

  • A recurring theme is the discussion of blame, and how the NTSB specifically avoids assigning liability in accident investigations, since avoiding blame improves learning
  • There are domain-specific challenges unique to security; don’t blindly copy what works in aviation safety
  • Near-miss reporting is an important complement to incident investigation; share stories of close calls

You can download a copy of the slides here.


Secure360 2024

Today I spoke at Secure360 2024! My talk, Security Differently, builds on my earlier post on the topic.

The session was very well attended - standing room only, with 150 people checked in - so if you were there, thank you! I got some great and challenging questions at the end, and I appreciated everyone’s engagement!

Here is the link I shared in the QR code at the end: https://bento.me/jbenninghoff.

Session Description

Cybersecurity, especially traditional security, has stagnated; adding security controls hasn’t appreciably improved outcomes and we continue to struggle with basic problems like vulnerabilities. Safety faced a similar problem 10-15 years ago; scientists and practitioners saw that safety outcomes were stagnant and concluded that the traditional method of avoiding accidents through centralized policies, procedures, and controls was no longer driving improvements.

I believe we’re seeing the same thing in security: historically, we’ve focused on constraining worker behavior to prevent cybersecurity breaches, and the limits of that approach are becoming increasingly clear. Adapting concepts from Safety Differently and Safety II offers a solution, by supporting success and focusing on positive capacities. In this talk, I will present practical advice on how to create a security program based on modern safety principles using evidence from both security and safety, and how it changes the role of the security professional.

Slides

My slides with notes, including references, are here.

Video

While the talk was not recorded, I did create a short video to promote it, which you can watch here.


SRE and Security Aren't Safety... Yet

I’m a long-time listener of the Safety of Work Podcast, hosted by David Provan and Drew Rae, both safety scientists who have worked in industry. Very early on, while listening to the first episode, I was struck by how much the podcast could be applied to cybersecurity and reliability. Taking the first part of the transcript from Episode 0 and replacing “safety” words with “security”, we get:

“There’s a lot of philosophical arguments about what [security] is and how we achieve [security]. Ultimately, no matter how we define it, [security] is something that comes from operational work. People are kept [secure] or get [breached] because of how work is done where it’s done, who it’s done by, what’s it done with, and what’s it done to.

That’s something that is easy to lose sight of when we are doing [security] work. Most [security] practice - the stuff that [security] people do - is at least one step removed from the operational work itself. Managers and [security] practitioners don’t do the operational work. They try to influence it using a wide variety of [security] tools and practices.

That’s really what we are here to talk about, is talk about the tools, talk about different practices, and talk about the evidence of what works and what doesn’t work. Where things sometimes get mixed up is people start thinking of the tools and practices themselves as [security]. They get confused between the goal - keeping people [secure] - and the means to that end, which is also called [security].”

Which still works for both security and Site Reliability Engineering (SRE)! I’ve found that most episodes have lessons that directly translate to information risk practices (confidentiality, integrity, and availability). However, every so often there is a podcast that doesn’t fit. Episode 111 is an example that shows how SRE and Security aren’t Safety, at least not yet.

Episode 111, “Are management walkarounds effective?”, examines a common safety management practice that has no analog in either security or SRE: the leadership safety visit. For those not familiar, the practice is fairly self-explanatory: an organizational executive (think CEO, COO, or other senior leader not in safety) visits a site with a focus on safety, typically including a site inspection and safety conversations with workers. As with most episodes, Drew and David review a paper, this time one titled “The Effectiveness of Management‐By‐Walking‐Around: A Randomized Field Study.” (Open Access PDF) As far as I know, there is no comparable practice in either security or reliability - I’ve never experienced or heard of a senior technology leader (outside security) taking time to review and discuss security with front-line staff (well, maybe, more on that later).

At the end of the episode, Drew summarizes the answer to the question, “Are management walkarounds effective?” as “sometimes yes, sometimes no.” While the study didn’t show that implementing a specific safety walkaround program had a significant impact on staff perception of safety performance, there were still some interesting findings from the paper, which I would summarize as:

  • Leaders who took action on a safety problem raised during the walkarounds had a positive effect, whereas leaders who spent time prioritizing issues, or who later decided they weren’t able to take action, had a negative effect
  • Other studies showed a strong positive effect on the staff directly involved in the walkaround (those who spent more time with the leader)

So, how could we adapt this to security or SRE? Building on the key takeaways from the podcast, I would suggest:

  • Having senior technology leaders perform a walkaround in support of security or SRE can have a positive effect, especially for those directly engaged
  • When creating a leadership visit program, be deliberate about what you’re trying to influence - is it for leaders to understand the work better, to support continuous improvement, something else, or some combination of goals?
  • It is important for leaders to listen to, own, and act on challenges and solutions raised by staff instead of delegating responsibility; doing so demonstrates leadership commitment
  • Prioritization is less important than action - picking an issue and fixing it is more helpful than spending time deciding what to work on

As I mentioned, while I haven’t seen a walkaround in technology, I have seen the impact of a senior leader taking an active role in security and availability. In my case, a software development executive decided to make both security and availability a priority, and actively supported both by listening and, most importantly, by taking action to improve organizational effectiveness. Hearing from him was more impactful than hearing from the CISO or head of infrastructure, especially for the developers on his team.

While SRE and security aren’t safety, we should strive to close the gap by adapting lessons from safety to technology.
