Prioritizing Your Work

Prioritizing your work is absolutely critical. You can do all kinds of things, but if you don’t regularly do the right things, you won’t move forward. A couple of weeks ago in a round-table discussion, someone mentioned prioritizing work based on the principle of maximum benefit for minimum effort. That reminded me that I needed to write up some thoughts, and a job aid I created a few years ago, related to that very valuable principle.

The theory underlying it is intuitive and straightforward: maximize the benefit you get from every unit of work. There are just two problems with that: 1) how do you determine how much work something is, and 2) how do you determine how much benefit you get from the work?

Work estimation is not exactly our strongest point in IT. There was a joke when I started out in software engineering back in... well, it doesn’t matter when. To estimate how long a software project will take, you ask the developer. Then you take their answer, double it, add three, and change to the next higher unit of time. There are days I don’t feel we have gotten any better at this.

Estimating Work

If your organization uses agile, you are probably familiar with story points. I find them a pretty reasonable way to estimate work. Typically people use either t-shirt sizes (small, medium, large, extra large) or Fibonacci numbers (1, 2, 3, 5, 8, and so on). Fibonacci has two distinct advantages: it allows for computation, which we shall see later, and it doesn’t have an upper bound. I typically find software organizations using Fibonacci and operations orgs using t-shirt sizing, but that’s a fairly large generalization. Which one you use doesn’t actually matter that much. They are largely interchangeable. For the rest of this article, let’s assume we use Fibonacci.

Once you get to resource planning, you also need to ascribe work hours to these numbers. The translation is rarely straightforward, but it is important. Each person works, roughly, 2,000 hours per year. Depending on the role, some portion less than 100% of that can be allocated to planned work. The remainder consists of meetings, coffee breaks, discussions with peers, interrupt requests, and other unplanned work. If you are looking at an organization that largely does planned work, you may be able to assign planned work to 80% or even more of their time. If you are working in product security engineering, your time may be primarily interrupt-driven and based on requests. The time available for planned work may be only 40% or so. Keep this in mind in your planning.
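The capacity arithmetic above can be sketched in a few lines of Python. This is only an illustration: the 2,000 hours per year and the 80% and 40% fractions come from the paragraph above, while the function name and the five-person team are made up for the example.

```python
# Rough yearly capacity for planned work.
# HOURS_PER_YEAR and the 0.8 / 0.4 fractions are taken from the text;
# the headcount of 5 is a made-up example value.
HOURS_PER_YEAR = 2000


def planned_hours(headcount: int, planned_fraction: float) -> float:
    """Hours per year a team can commit to planned work."""
    if headcount < 0:
        raise ValueError("headcount cannot be negative")
    if not 0.0 <= planned_fraction <= 1.0:
        raise ValueError("planned_fraction must be between 0 and 1")
    return headcount * HOURS_PER_YEAR * planned_fraction


if __name__ == "__main__":
    # A team that mostly does planned work.
    print("planned-work team:", planned_hours(5, 0.8))
    # An interrupt-driven team, such as product security engineering.
    print("interrupt-driven team:", planned_hours(5, 0.4))
```

Same headcount, half the plannable hours: that difference is what you need to carry into the schedule.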

How you convert story points to hours is entirely up to you and depends on the work you do, how complex your stories are, and so forth. A couple of extremes could be:


Story Points    Hours of Work
1               1 FTE, 1-4 hours
2               1 FTE, 1-3 days
3               1 FTE, 1 week
5               1 FTE, 2 weeks
8               2 FTEs, 2 weeks

or

Story Points    Hours of Work
1               1 FTE, < 1 week
2               1 FTE, 2 weeks
3               Multiple people, 2 weeks
5               Entire team, more than 1 sprint
8               Multiple teams, multiple sprints


The dirty little secret here is that it doesn’t actually matter what units you use, as long as you are consistent in how you use them, and as long as you can find a reasonably good fit for all of your work! It also is not critical to always be 100% accurate. Frankly, we tend to get WAY too wrapped around the “but is this really correct” axle. It’s more important, in my opinion, to move the work forward and learn to be smarter next time than it is to make sure we get everything exactly right the first time. 
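If it helps to see the conversion as code, here is a minimal sketch based on the first sample table. The hour values are my own upper-bound reading of that table, assuming an 8-hour day and a 40-hour week; the names `HOURS_PER_POINT` and `backlog_hours` are hypothetical. Swap in whatever mapping fits your work.

```python
# Upper-bound person-hours per story point value, read off the first
# sample table. Assumes an 8-hour day and a 40-hour week (assumptions,
# not something the table states).
HOURS_PER_POINT = {
    1: 4,    # 1 FTE, 1-4 hours
    2: 24,   # 1 FTE, 1-3 days
    3: 40,   # 1 FTE, 1 week
    5: 80,   # 1 FTE, 2 weeks
    8: 160,  # 2 FTEs, 2 weeks
}


def backlog_hours(story_points: list[int]) -> int:
    """Total person-hours for a list of story point estimates."""
    unknown = [p for p in story_points if p not in HOURS_PER_POINT]
    if unknown:
        raise ValueError(f"no hour mapping for story points: {unknown}")
    return sum(HOURS_PER_POINT[p] for p in story_points)


if __name__ == "__main__":
    # One story of each size.
    print(backlog_hours([1, 2, 3, 5, 8]))
```

Being consistent matters more than the exact numbers: as long as every story goes through the same mapping, the totals stay comparable from one planning cycle to the next.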

Impact Of The Work

The next step in prioritizing according to the principle of least effort for the maximum benefit is to create some scheme for determining the benefit, or impact, or severity of the work. You can get complicated and fancy here, but for most purposes it really isn’t worth doing. You’d be better off going with a simple four- or five-level scheme. Most ticketing systems use a severity or impact scale that has five levels. Let’s use an impact scale. For security, that tends to translate to the security impact if the issue is left alone. For pure security work, you often see a four-level scale: low, medium, high, and critical.

For our purposes, I’m actually going to consider a fifth level: ultra-critical. To understand why, let’s put some actual words together to describe what can go into each level. Let’s also include some sample Service Level Agreements (SLAs) for resolving these issues.

A number of years back we were inundated with high-severity tickets because the climate control system wasn’t up to people’s expectations, so I threw in the severity descriptions for air conditioning I created to explain why paging people because your office is cold is considered impolitic. They help explain the relative severity levels. 

  • Low - I’m a little chilly

    • Basic security hygiene practices.

    • Issues that enhance compliance posture but that do not impact security in a meaningful way.

    • Issues that simplify assessing the environment.

    • Issues that do not result in lasting security impact and do not disrupt customers.

    • Issues that provide security guidance to customers.

    • Issues that are unlikely to result in data compromise for customers.

    • CVSS issues that score 0.1 - 3.9.

    • SLAs

      • Acknowledgement: 14 days

      • Completion: 90 days

  • Medium - I need a sweater, stat!

    • Poor security practice that simplifies lateral movement.

    • Failure to properly protect need-to-know or business confidential data.

    • Issues that enable an attacker to discover opportunities for lateral movement.

    • Enables recoverable data compromise of one or a few customers.

    • CVSS issues that score 4.0 - 6.9.

    • SLAs

      • Acknowledgement: 14 days

      • Completion: 30 days

  • High - My digits just went numb and I can no longer type this ticke...

    • Together with other issues, could be used to effect complete compromise.

    • Enables privilege escalation and/or lateral movement once an attacker has achieved initial compromise.

    • Enables unrecoverable data compromise of one or a few customers.

    • Enables recoverable data compromise of all or most customers.

    • Would result in failure of a future audit.

    • CVSS issues that score 7.0 - 8.9.

    • SLAs

      • Acknowledgement: 48 hours

      • Completion: 14 days

  • Critical - There are icicles forming in the office

    • Enables, by itself or with little effort, initial compromise resulting in widespread or complete compromise of the organization.

    • Enables unrecoverable data compromise for all or most customers.

    • Unless addressed, this issue can result in failing an on-going audit.

    • CVSS issues that score 9.0 or higher.

    • SLAs

      • Acknowledgement: 4 hours

      • Completion: 48 hours

  • Ultra-critical - A new ice age has dawned and a glacier just razed HQ

    • The organization is under active attack or an attack is imminent.

    • Anyone can be drafted to help address the issue.

    • The service is currently down.

    • Incident response plan is activated and emergency funds are available for addressing the issue.

    • Zero-day issues that are widely exploited.

    • SLAs

      • Acknowledgement: 15 minutes

      • Completion: work-until-fixed
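As a rough sketch, the CVSS ranges and sample SLAs in the list above could be encoded like this. The score bands and SLA durations are copied from the list. The `active_attack` flag is a hypothetical stand-in for the ultra-critical criteria, which are not CVSS-based, and the function and variable names are my own. CVSS is only one of several criteria per level, so treat the result as a starting point rather than a verdict.

```python
from datetime import timedelta

# (acknowledgement, completion) per impact level, copied from the sample
# list. A completion of None stands for "work until fixed".
SLAS = {
    "low": (timedelta(days=14), timedelta(days=90)),
    "medium": (timedelta(days=14), timedelta(days=30)),
    "high": (timedelta(hours=48), timedelta(days=14)),
    "critical": (timedelta(hours=4), timedelta(hours=48)),
    "ultra-critical": (timedelta(minutes=15), None),
}


def impact_from_cvss(score: float, active_attack: bool = False) -> str:
    """Map a CVSS base score to one of the sample impact levels.

    active_attack is a hypothetical flag for the ultra-critical case
    (active or imminent attack, widely exploited zero-day).
    """
    if not 0.1 <= score <= 10.0:
        raise ValueError("expected a CVSS score between 0.1 and 10.0")
    if active_attack:
        return "ultra-critical"
    if score >= 9.0:
        return "critical"
    if score >= 7.0:
        return "high"
    if score >= 4.0:
        return "medium"
    return "low"


if __name__ == "__main__":
    level = impact_from_cvss(7.5)
    acknowledge, complete = SLAS[level]
    print(level, acknowledge, complete)
```

Note that a CVSS 9.8 on its own lands in critical, not ultra-critical. The top level is reserved for situations, not scores.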

There is a lot to unpack in this list. It is a sample, not something you can lift straight into every organization. Your organization’s mission will guide your narrative as well, but having concrete examples is extremely helpful. Notice that no matter how serious a security vulnerability is, you may want to reserve the highest impact - the drop-everything-and-do-this-now impact - for things that absolutely cannot wait. Every month vendors release patches for vulnerabilities that CVSS rates as critical. If you declare an emergency every month to patch those issues and start paging people, you will soon have a lot of very upset engineers calling you. You also risk not being listened to when you actually have a compromise.

We all know that the second Tuesday of the month is Patch Tuesday. There is no need to page everyone at 9:15 AM because patches were released. You can plan for those. Even for an out-of-band patch, we usually know that it is coming and can plan for it. This does not mean that there are no drop-everything patches. Every now and then there is one that requires us to move fast, but most can be planned for even though we do not know exactly which patches will be released.

Arriving At Your Prioritization

Once we have the impact definition and work effort estimation, we can finally prioritize issues. However, it takes a little bit of analysis to arrive at the right prioritization. For instance, do you prioritize a high-impact issue with a work effort of 2 over a medium-impact issue with a work effort of 3?

A few years ago I came up with a way to do this analysis that is relatively elegant and easy to use. It involves assigning numeric values to the impact assignments and multiplying through. You also need to invert the Fibonacci numbers associated with your work effort so that the high numbers are associated with less effort, not more. Here is a sample:

As you can see from Figure 1, while we simply inverted the work effort estimates, we used a different scale for the impact. The reason is that by arbitrarily picking a scale where the values go up exponentially, rather than an inverted Fibonacci scale, we create an interesting diagonal model. I have color coded it in the figure to make it clearer. The product of the two axes now becomes your prioritization, and as promised, you are now prioritizing low-effort, high-impact items. There is a diagonal of issues that you probably want to accommodate in the schedule, marked in yellow, and a set of issues that you probably don’t have the ability to focus on, marked in red.

The exact cut-off values depend on your specific situation, how many resources you have available, and your risk tolerance. As a starting point for discussion, I decided that issues scoring above 10 need to get scheduled as soon as they can be scheduled. Issues scoring below 6 probably can’t get fixed. If you are flush with staff and cash, you may move the cut-off down, and if you are strapped, you may need to move it up. You can also tweak the numbers to produce different results. For instance, you may decide to override ultra-critical issues and call them 100, to make sure they get prioritized over any critical issues. The model is flexible in this way.
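To make the multiply-through concrete, here is a minimal sketch of the model. The inverted effort weights follow from reversing the Fibonacci story points, and the cut-offs of 10 and 6 and the override of 100 are the ones mentioned above. The impact weights (1, 2, 4, 8, 16) are an assumption on my part: they are simply one exponential scale and may not match the values in the figure. All names and backlog items in the code are hypothetical.

```python
from typing import Optional

# Inverted Fibonacci: less effort gets the higher weight.
EFFORT_WEIGHT = {1: 8, 2: 5, 3: 3, 5: 2, 8: 1}

# ASSUMPTION: one possible exponential impact scale, chosen for
# illustration only.
IMPACT_WEIGHT = {
    "low": 1,
    "medium": 2,
    "high": 4,
    "critical": 8,
    "ultra-critical": 16,
}


def priority(effort: int, impact: str,
             ultra_override: Optional[int] = None) -> int:
    """Priority score: inverted effort weight times impact weight."""
    if effort not in EFFORT_WEIGHT:
        raise ValueError(f"unknown effort: {effort}")
    if impact not in IMPACT_WEIGHT:
        raise ValueError(f"unknown impact: {impact}")
    if impact == "ultra-critical" and ultra_override is not None:
        # Optional override so ultra-critical always sorts first.
        return ultra_override
    return EFFORT_WEIGHT[effort] * IMPACT_WEIGHT[impact]


def bucket(score: int, schedule_above: int = 10, defer_below: int = 6) -> str:
    """Sort a score into schedule / accommodate / defer."""
    if score > schedule_above:
        return "schedule"
    if score < defer_below:
        return "defer"
    return "accommodate"


if __name__ == "__main__":
    # Made-up backlog: (name, story points, impact).
    backlog = [
        ("issue A", 2, "high"),
        ("issue B", 3, "medium"),
        ("issue C", 8, "low"),
    ]
    ranked = sorted(backlog, key=lambda i: priority(i[1], i[2]), reverse=True)
    for name, effort, impact in ranked:
        score = priority(effort, impact)
        print(name, score, bucket(score))
```

Under these assumed weights, the earlier question answers itself: the high-impact issue with a work effort of 2 scores 5 x 4 = 20, while the medium-impact issue with a work effort of 3 scores 3 x 2 = 6. Moving the cut-offs is a matter of changing the two default arguments to `bucket`.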

Using a decision-making model like this, you can create a system that steers you toward work with higher impact and lower effort. You can tweak it to suit your specific scenario. The purpose here is to ensure that you focus on the right things.

