Continuous Discussions Podcast

DevOps and Continuous Delivery are essentially a journey – how do you continually optimize your software delivery process to improve IT and organizational performance?

On Tuesday I participated in an online panel on the subject of Continuous Improvement, as part of Continuous Discussions (#c9d9), a series of community panels about Agile, Continuous Delivery and DevOps. Watch a recording of the panel:

Continuous Discussions is a community initiative by Electric Cloud, which powers Continuous Delivery at businesses like SpaceX, Cisco, GE and E*TRADE by automating their build, test and deployment processes. 

The discussion was a lot of fun. There is a transcription below of some of what I was saying, and there were a lot of valuable points made by the other members of the panel, so I’d recommend going ahead and watching or listening to the episode and subscribing to the podcast.

Key prerequisites for continuous improvement

One of the best prerequisites to have, in addition to what was just said, is a goal that you can't achieve with the organizational design you currently have. Most of the work you do to save time through more efficient processes is measured as a cost saving of X number of days at Y rate, and so on. But if you can say "we can deliver this one project that's worth tens of millions, or even hundreds of millions, and we couldn't have done it before," then the organizational improvements are themselves worth tens of millions or hundreds of millions, and that is the thing that can give people really big motivation.

Organizational Design

One of the things that helps a lot is being able to see everything that is going on, so that you can have decentralized command of a project, where decisions are made at the appropriate levels without everything coming back to one person who tries to dictate it all. A theme of a lot of the work I've been doing with my scrum master hat on is getting everybody to publish to the team the list of things left for them to do in the current sprint. The scrum master can go and help people and say "don't forget that you have got this bug to fix here," but the more you can get individual people to publish out to the team - "I've got this list of tasks to do, I found this one problem, I need help with this one thing" - the more creative they get about how they are going to solve those problems, because they aren't being told what to do. It helps a lot to have something that isn't a centralized plan dictated by a leader, but instead reflects the democratic nature of what people think they need to be doing.

One thing I will also say is that once you have done that, you can see a lot of the hidden shadow processes that are going on but that people didn't talk about, maybe because they are a little bit embarrassed about them. We've got this system over here that we designed, but actually it needs one guy to spend two hours a day poking it with a stick, otherwise it won't work. Once you find out about that kind of thing, you can at least decide what to do about it. Publishing those processes as interfaces can help the organization admit what's going on. If I want a Redshift cluster, I don't have to persuade AWS to give me one - there are defined interfaces for it, and that makes it a straightforward process.

Communication

Ages ago I worked with a team that had a rule that to call a meeting you had to put a purpose, an agenda and any preparation in the notes area of the meeting invitation. You weren't allowed to schedule a meeting with your team unless you had that. I've been using that a lot myself; it's a quick trick, and it's even better when you get other people doing it as well.

Regarding people's expectations when it comes to communicating what your plan is going to be: I remember when I started out as a consultant, hearing about a billing system migration where the consultants were going to be on site for two weeks around cutover, ready to pounce on any problem that came up. With that much money riding on a smooth transition, the cost of the consultants sounds like a big number when you write it down, but compared with not having the right people available to solve a problem in time, it's a no-brainer - of course you have those people on standby. In actual fact, on that project they didn't have much to do because it went quite smoothly, and that's fine too.

I think what is fundamental to that is that if the work you are doing is valuable enough, you can afford to say, "you know, this is a bit more than a five-day thing, let's just put eight days in the plan," instead of insisting it's a six-day thing. When you are clear on the benefits and they are that big, you can call that stuff out in advance and set those expectations, and that means you can afford a smooth transition, because the unexpected will happen.

Regarding new communication mechanisms people are using today: we use Slack quite a lot, and we wanted to test whether an office could reach a server when nobody from the DevOps team was there. So I just went on the Slack channel and said, "can anybody see my server from this office?" Nobody was in that office, but then one guy said, "it's actually two minutes from my house, so I'll go over there." I would otherwise have needed to drive over that evening; instead he got on the Wi-Fi, told me it didn't work, we adjusted some settings, and then it did work. That low-cost way of asking a broad team made it easy for people to help me - it was easy for him to do, and we didn't have to hold a meeting to discuss whether he would help or anything like that. I'd never talked to that guy before, so it was definitely a big win there.

How does automation help with continuous improvement? 

When testing data warehouses, you often start with a big challenge: it's not easy to build up environments, create a whole new database and load the data into it. The approach that I would say works really well in the data warehousing space, and that I recommend very strongly, is what we call "passive testing". We ask what kinds of checks you can do on data when it's actually in production, or in a big integration environment, or in a development environment - things like: does this particular table have no duplicates in it? If there is a set of records over here, do I have a corresponding set of records over there that match them?
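
To make that concrete, here is a minimal sketch of what a couple of passive checks might look like. The orders/customers schema and the in-memory SQLite connection are illustrative assumptions, not the actual warehouse setup; the point is that both checks are plain read-only queries.

```python
import sqlite3

# Illustrative schema: "orders" should have no duplicate order_id values,
# and every orders.customer_id should match a row in "customers".
conn = sqlite3.connect(":memory:")  # stand-in for a real warehouse connection
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY);
    CREATE TABLE orders (order_id INTEGER, customer_id INTEGER);
    INSERT INTO customers VALUES (1), (2);
    INSERT INTO orders VALUES (10, 1), (11, 2), (11, 3);  -- dup id, orphan fk
""")

def check_no_duplicates(conn, table, key):
    """Passive check: the key column holds no duplicate values."""
    rows = conn.execute(
        f"SELECT {key} FROM {table} GROUP BY {key} HAVING COUNT(*) > 1"
    ).fetchall()
    return len(rows) == 0, rows

def check_matching_records(conn, child, fk, parent, pk):
    """Passive check: every child record has a corresponding parent record."""
    rows = conn.execute(
        f"SELECT c.{fk} FROM {child} c "
        f"LEFT JOIN {parent} p ON c.{fk} = p.{pk} WHERE p.{pk} IS NULL"
    ).fetchall()
    return len(rows) == 0, rows

results = {
    "orders has no duplicate order_id":
        check_no_duplicates(conn, "orders", "order_id"),
    "every order matches a customer":
        check_matching_records(conn, "orders", "customer_id",
                               "customers", "customer_id"),
}
for name, (ok, bad_rows) in results.items():
    status = "PASS" if ok else f"FAIL ({len(bad_rows)} offending rows)"
    print(f"{status}: {name}")
```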

Those kinds of checks are nondestructive - you are passively observing the environment without changing it. That's unlike the kind of testing where you provision a new environment, put the data in it, run some processes and look at the results, which is what you need for a full end-to-end process. The neat thing about these passive tests is that the effort you invest in creating the test definitions isn't only used during development and test; you can set the checks up to run all the time on your production environment. That makes it a lot easier to get buy-in from people who just want to know that their data is right: you can say "we've got this series of automated checks that let you see the production data is right, and we also run them on our integration and development environments to make sure we aren't breaking anything before we push things out there."

Of course it's a lot simpler, because you haven't had to solve all of those problems, like how to make a particular deployment tool play nicely with end-to-end deployment automation. So from a data warehousing point of view, my automation recommendation would be: start with these kinds of checks that you can use to make sure your data is right, write them in a way that lets you run them on production and on all of your development environments, and then have Jenkins control all of it - Jenkins is a good tool for that - together with a test framework called DbFit, which works really well with data warehouses.
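
As a sketch of how that might hang together, here is a hypothetical runner that Jenkins could invoke as a build step with the environment name as a parameter (for example `python passive_checks.py integration` after a deployment, or the production variant on a schedule). The environment names, file paths and the orders table are assumptions for illustration; a real warehouse would use its own database driver rather than SQLite, and the script assumes the schema already exists in each environment.

```python
import sqlite3
import sys

# Hypothetical environment targets; in practice these would be real
# warehouse connection settings rather than local SQLite files.
ENVIRONMENTS = {
    "development": "warehouse-dev.db",
    "integration": "warehouse-int.db",
    "production": "warehouse-prod.db",
}

def run_checks(conn):
    """Read-only checks, so the same list is safe to run on production."""
    failures = []
    duplicate_ids = conn.execute(
        "SELECT order_id FROM orders GROUP BY order_id HAVING COUNT(*) > 1"
    ).fetchall()
    if duplicate_ids:
        failures.append(f"{len(duplicate_ids)} duplicate order_id values")
    return failures

if __name__ == "__main__":
    env = sys.argv[1] if len(sys.argv) > 1 else "development"
    failures = run_checks(sqlite3.connect(ENVIRONMENTS[env]))
    for message in failures:
        print(f"FAIL [{env}]: {message}")
    sys.exit(1 if failures else 0)  # non-zero exit turns the Jenkins build red
```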

Blameless Culture

An interesting idea on that one is that when something goes wrong, if people are stepping forward saying, "I can see what went wrong, I would do that differently, can someone show me how to do that?", they are saying "I was involved in that, it didn't go quite right, and I want to change it," and the idea of blame becomes irrelevant. There is no point in saying "this is Bob's fault" when Bob is already there saying, "I'll find out what went wrong; I need someone to teach me how to do this kind of thing."

If people are stepping forward and saying "I want to change this," the idea of blaming people kind of goes away. Another thing I would mention is the idea of the net-negative programmer: the person on the team who, perhaps not deliberately, doesn't produce any useful output. That can happen simply because of the way things are set up. For example, if the way deployments work is so inefficient that the person in that position can get next to nothing done, then blaming that person for not getting anything done is useless; the problem is with the process and with the organization itself. When you have stronger processes, the people become a lot more valuable, and everyone has a much better credit balance with everyone else on the team. Even though they may do things wrong from time to time, overall even people who mess things up are still valuable, and that can be a function of how well the team is set up and structured.