In case you missed the big news in the industry this week, a GitLab employee accidentally deleted a ton of production data and took their platform down for hours. It was only when everything was on fire and they were in deep trouble that they turned to their backup systems… only to find that none of them actually worked.
Backup Prod Data Regularly
Not exactly a groundbreaking statement, right? Everybody knows this. If there were a “working in corporate IT 101” manual, it would have a chapter on this concept. It’s common sense.
Even still, a lot of people and companies – like GitLab – tend to “set and forget” their backups. They probably created their backup mechanism years ago, tested it at the time, confirmed that it worked, and then scheduled it to run every night at 1am EST or something. Then, since it was out of sight and out of mind, they promptly forgot about it and moved on to other things. After all, they never had a need to check on it, right? Nothing had broken down. Until yesterday.
A Guide To Good Backup Process
The secret to ensuring that your backup process is effective and functional is to integrate it into your daily work. One of the best ways to do this is to use it to set up a new dev’s local environment. Have them configure and install the IDE and related tools, and then have them pull down the most recent backup and restore from it to set up their local database. What’s that, you say? It has PII and sensitive data? You’re probably right, which is why your backup process should, as appropriate, create 2 copies: 1 that strips the data (for local dev env) and 1 that doesn’t (for prod restore).
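As a rough illustration of the two-copy idea, here is a minimal Python sketch. Everything in it is an assumption for the sake of example: the column names in `PII_COLUMNS`, the row-as-dictionary format, and the function names are all hypothetical, and a real backup job would work against your actual database dump rather than an in-memory list.

```python
# Hypothetical column names; swap in whatever your schema treats as sensitive.
PII_COLUMNS = {"email", "full_name", "phone"}


def scrub_rows(rows):
    """Return new rows with PII columns replaced by per-row placeholders."""
    scrubbed = []
    for index, row in enumerate(rows):
        clean = {}
        for column, value in row.items():
            if column in PII_COLUMNS and value is not None:
                # The placeholder is built from the column name and row position,
                # not from the original value, so nothing sensitive leaks through.
                clean[column] = f"redacted-{column}-{index}"
            else:
                clean[column] = value
        scrubbed.append(clean)
    return scrubbed


def make_backup_copies(rows):
    """Produce (prod_copy, dev_copy) from one set of exported rows."""
    prod_copy = [dict(row) for row in rows]  # untouched, for production restores
    dev_copy = scrub_rows(rows)              # stripped, for local dev environments
    return prod_copy, dev_copy
```

The point of the sketch is only that both copies come out of the same nightly run, so the dev copy gets exercised every time a new hire sets up their machine.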
Great, so you’ve confirmed that your backups work for a local environment, but what about production? The next step in a good process is simple too: artificially destroy your production environment regularly. Set up fail-over tests at off hours (and compensate your amazing site reliability / IT team appropriately for conducting these tests in off hours too). I recommend once per quarter as a starting point: at 2am on Sunday drop your production database (but don’t delete it, just take it offline so you can bring it back if you find out that your backup system isn’t working). Let your staff work to restore a recent backup and bring the site back online. Announce the outage in advance to your users, and update people on social media or via email when it begins and ends.
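If you want a cheap sanity check to run alongside the quarterly drill, something like the following Python sketch could work. It assumes a SQLite database purely to keep the example self-contained; the helper names `row_counts` and `restore_drill` are made up for illustration, and comparing row counts is only one possible definition of "the restore worked".

```python
import sqlite3


def row_counts(conn):
    """Map each user table name to its row count."""
    tables = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' AND name NOT LIKE 'sqlite_%'"
        )
    ]
    return {
        table: conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0]
        for table in tables
    }


def restore_drill(backup_conn, expected_counts):
    """Restore the backup into a scratch database and compare row counts."""
    scratch = sqlite3.connect(":memory:")
    try:
        # Connection.backup copies the whole database into the target connection.
        backup_conn.backup(scratch)
        return row_counts(scratch) == expected_counts
    finally:
        scratch.close()
```

A drill that returns `False` here tells you, at 2am on a quiet Sunday, exactly what GitLab found out in the middle of a real outage.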
There is much to be learned and gained from this intrusive and destructive process. For one, you will force your dev team to create a good “the site is down” experience since your customers will otherwise see infinitely spinning web pages or terrible error dumps. Another is that you can time the average outage and thus discern how long you’ll be down if your production database ever actually takes a spill. Finally, your disaster recovery staff will be fresh on their skills and able to fix your real outages quickly and predictably. There are many tangible and hidden benefits derived from just a few hours of planned outage per year.
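Timing the drills doesn't need anything fancy. As a small sketch, assuming you simply jot down when each planned outage started and ended, the average could be computed like this in Python (the function name and the sample timestamps in the usage note are invented for illustration):

```python
from datetime import datetime
from statistics import mean


def average_outage_minutes(drills):
    """Average duration, in minutes, over (start, end) datetime pairs."""
    durations = [(end - start).total_seconds() / 60 for start, end in drills]
    return mean(durations)
```

For example, two made-up drills that ran from 2:00am to 3:00am and from 2:00am to 3:30am would average out to 75 minutes, which is a number you can actually put in front of management.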
GitLab Did One Thing Right
The final step in your solid, functional backup process which you test quarterly and use to spin up new dev hires is to document the hell out of everything. When you do these planned outages, have the disaster recovery staff document, step by step, the actions taken to fix it. When you have real live outages, document those too and share the knowledge with the public.
GitLab got this part right, and are being heralded as a great example and learning experience in the industry instead of scorned for mysterious downtimes and no communication. I promise you that this week, many disaster recovery people are doing extra backup tests that they wouldn’t have thought to do otherwise – all as a direct result of the GitLab incident. Making your disasters and their recoveries public creates goodwill in the community, provides a learning experience, and shows people that you can be trusted.
GitLab took a bad situation and created the best possible outcome, both for themselves and the entire community. For that they should be thanked, not mocked. After all, we are all human and we all make mistakes. Knowing this, you’ll be really glad that you practice making mistakes every quarter when your production database actually goes down in flames.
In part 2 of my series on dev team interactions, I’d like to talk about conducting good code reviews. Most dev teams will find themselves in a situation where code reviews are necessary, and in my experience many do them very poorly. I’ve even worked in companies that had such a negative code review culture that people left the review sessions upset, even considering quitting. With a few easy adjustments, you can quickly learn to conduct excellent and positive code reviews with your team.
The Ground Rules
A code review is a process. Like any good process, clear rules need to be established and followed to ensure a consistent experience. Here are mine:
- Attack the code, never the person. Criticizing code is OK, but people are not code. It’s never OK to criticize the person and make them feel bad. Focus strictly on the code output and never make it personal.
- Don’t laugh or make negative jokes. A person who has their work on display – often on a projector in front of others – is feeling self-conscious as it is. Don’t snicker at their work. Avoid joking about their decisions. I assure you they are trying to do the right thing.
- Set a strict time limit as a means of focus. Make the code review 15 minutes, 30 minutes, or even 1 hour. Stick to this schedule. This forces you to prioritize the important stuff and ensures (intentionally) that you can’t review absolutely everything the person has done. Don’t take unlimited time to scrutinize every single line of code written. Ever had someone comb through your code line by line, making commentary as they go? “And how did that make you feel?”
- Thank the person for sharing their code with you. A code review is an intimidating, scary thing – especially for developers on the junior side of the spectrum. Set the tone correctly by being appreciative of their time and sharing. Make it known that they are valued and you appreciate their work up front. This will help them feel relaxed and learn to enjoy code reviews, which in turn will cause them to want to share more of their code willingly in the future as well.
The above rules serve only as an example from my personal experience. You should create rules that work for your company and work culture. Use real and practical standards that matter to your team, not just theoretical ideologies that someone on your team read in a book. Always remember: a process is only as good as the people that follow it, so try to be consistent with whatever rules you decide on. And if they aren’t working well in practice, change them up!
Seek Understanding, Not Power
The general goal of your code review attendees should be to seek understanding, not explanations. Avoid an us-them conflict or standoff. This can be easily accomplished with a subtle shift in communication approach. Rather than asking aggressive questions that demand the reviewee explain the code to your team, such as “why would you do it that way?” or “how could that possibly work?”, ask questions that promote understanding of the code instead. Once your team and the reviewee correctly understand the code being reviewed, you can proceed to discuss a different approach without conflict or hurt feelings.
When you see some code that makes you think “what the hell were you smoking?” (explain) you should instead ask “Can you tell me why you wrote the method this way?” (understand). These statements are similar but the former gives the power to you while the latter empowers the code reviewee to share their knowledge and thought process in a non-defensive way. This approach to questioning takes a bit of practice, but is very powerful. Some other examples of how you’d change common code review aggressive thoughts and statements from explaining to understanding:
- “This class name is wrong” becomes “This class name doesn’t conform to our standards, was that intentional?” You’ll probably find that they say it wasn’t, and agree to fix it of their own volition.
- “I see a bug in your code!” becomes “I think there’s a null reference exception on line 28 for variable X, do you see how it happens?”
- “This is terrible” becomes nothing – keep your mouth shut. There’s no value in such a statement other than to make the person feel bad. If the code truly is terrible, express these concerns to the person (and maybe their manager) privately for further investigation and resolution.
People are not robots, and will never conform perfectly to your shop’s coding standards. That’s OK. Pick your battles, and call out only the major violations. Let the little things (like naming and spacing) slide. If the dev is hitting 90% of the standards, the rest of the team can pick up on the 10% deviation without much worry or effort. That is to say, the code won’t be that different than what they expect to see.
If you seek understanding consistently, you’ll find that the person drops their defenses and ego, and instead feels encouraged by your positive approach and attitude. They’ll even start pointing out and suggesting fixes to their code themselves – right on the spot – which is the golden sign of trust and confidence. The best code reviews are the ones in which the person points out their own issues to fix, rather than the team having to do it for them.
Avoid Opinion Wars
When reviewing code, there will be two general categories of issue:
- Objective, fact-based issues
- Subjective, opinion-based issues
Focus explicitly on objective issues, and disregard all subjective issues. An objective issue is an exception or oversight in conforming to well-defined coding standards. A subjective issue is you not liking the way the person solved the problem, or feeling that what they did is not what you would have done. Of course it isn’t what you would have done: you’re different people. Striving to make others conform to your thought process as the one true standard is egotistical, destructive, and stressful. In the long run you will be unhappy because there will always be a gap between others and yourself. This is natural, so go with the flow. Allow members of your team to be individuals and write code with their own flair and flavor. It’s OK, it will be OK, and you will be OK. I promise.
Code is for computers, but programmers are humans. Be kind to each other and always remember that the thing on the other end of the code review is a real live person with feelings and emotions. Treat them with respect in what you say and do. Recognize that they are valuable and thank them for their hard work. Together you will create a great team that others will love being a part of!
I was working on my fireplace this past weekend. Specifically I had just finished ripping down the old surface to the red brick, and then preparing the brick surface with a layer of thinset for tiling. I spent all of Saturday cutting tiles and then placing them on the fireplace surround and hearth. Even with help it took 11 hours to do, and about 8 hours of it was measuring and cutting tiles.
While I was doing this work, which is just mindless enough that your mind wanders but requires just enough attention that it doesn’t wander freely, I began to recite a common trades mantra. Measure twice, cut once.
This quip – a practical saying – saturates the construction industry. Whether you’re a DIYer like me, or a professional tradesperson, it’s important to measure everything twice and do the work once. This saves you a lot of pain and time down the road, since you can double check your angles and distances and get everything right the first time.
The reason that this practice is important is as simple as considering a tile. Let’s say that I need a 3/4″ width tile, but I measure incorrectly and cut it to 1/2″. There’s no way for me to turn that 1/2″ piece back into a 3/4″ piece, so I just wasted that tile. I need to toss it out (if it can’t be used elsewhere) and cut a new tile to the correct measurement. In short, measuring twice saves you time and money.
As I stood above my trusty wet saw, cutting tile, after tile, after tile, my mind began to wander into the realm of programming. I began to realize something interesting. In my opinion, many IT departments have a policy of measuring twice and cutting once, with the supposed benefit of cost and time savings. One might even call this sort of approach waterfall or agile, where estimates are gathered in detail (measured) long before the work is done (cut).
I believe that this is a fallacy that ironically leads to even more work. No developer that I’ve ever met in my career, myself included, can accurately estimate anything. We sometimes get close, because we can relate the task at hand to a similar task we accomplished previously, but in general I find that a new task is very much an unknown and the time spent to gather an estimate is pointless since it’s wrong anyway. By measuring twice and cutting once, we waste a ton of time.
I believe that developers should measure once, quickly, for a rough estimate, and then cut. I believe this because of a fundamental difference between programming and other kinds of work that is managed with processes and estimates.
Code is not a tile or piece of wood. It is a highly flexible, malleable, mutable, digital thing. If a developer cuts a feature short, they can add on to it later, expanding it seamlessly to the required size. If they overestimate a feature’s length, they can easily chop off the excess and move on to the next feature. There is no significant cost in quick, roughly estimated measurements for programming work.
Immediately your team will regain a ton of time in which they can do their development work. They won’t have to attend hours of planning meetings or requirements gathering sessions. They will just work to get things done as fast and accurately as they can.
The only tradeoff is a lack of estimates that management-types can cite and depend on. I would counter that any estimates derived are very commonly wrong and useless regardless. Moreover, if you do not trust your developers to do the right thing and use their time effectively, why do you keep them employed?
To me, a lot of the process models around development that are popular (waterfall, agile) are derived from the measure twice, cut once methodology. This approach is super practical for physical goods since inaccurate measurements are expensive, but this does not apply to development work. These meetings to gather estimates in the hopes of controlling costs ironically bloat budgets, help to deliver less code, and extend goal dates and deadlines. You take people that are hired to code, and tie them up in meetings where they have to try and justify what they’re going to code by the hour. They don’t know how long it will take, but they will have a better idea after a few hours of coding – if you’d just give them a few hours of no meetings to code.
If you’re working on tiling your fireplace, measure twice and cut once. If you’re working on code, take a rough guess at the measurement and get to work!
I think that most devs would agree when I state that the definition of success in the corporate world of development places less emphasis on “good” code and more emphasis on “working” code. Working code is code that can be released to production on or before the deadline, regardless of performance or even bugs in most cases. As a developer, you ultimately feel as if you’ve failed when you toil for nights on end to meet steep deadlines and churn out crappy code. As a business, however, you’ve succeeded when you hit the deadline. My experience tells me that the typical metric upon which development teams are measured is often not quality of code or unit tests or even performance, but instead ability to meet deadlines and deliver solutions to clients. You’ve failed when you do not meet the deadlines and thus piss off the clients/customers. Your job has become a veritable boolean result with the outcomes of true and false. Deadline met? True. Deadline missed? False.
Doesn’t it feel awful to be measured in such a binary way? All or nothing, success or failure, deliver or delay. These are the only outcomes according to the people who write and sign your paychecks.
Why does this happen? A little introspection sheds light on the subject, at least for me. The reason for these types of metrics becomes obvious when you consider their source. You work for a company that pays you (often with I.T. seen as a cost-center or “money pit”) to accomplish things which the company can then sell to clients. You’re an expensive tool by which they accomplish their ends. Though these companies often see software as a profit source, they see the means by which they get the software as an expense and cost. Kind of strange, really.
The problem begins at the very core of the organization; the structure of the company is the starting point for guaranteed failure. In my experience, the dichotomy that forms in most companies is “I.T.” versus “The Business” in a bout-to-knock-the-other-guy-out. The minute you create this relationship of opposing fronts, you’ve already guaranteed development failure. With competing and contrasting goals (the business wants to sell stuff ASAP while I.T. wants to build stuff properly which takes longer) it is not possible for trust to exist within the organization. The Business will not believe a word that I.T. says when it comes to estimates, deadlines, or things that need to happen to ensure stability of the product in the future (technical debt). I.T. will not trust The Business to make rational decisions when it comes to features, development timelines and ensuring product quality. The result is a boxing match where each side is trying to force the other into compliance. Now you have conflict. Conflict dismantles good companies.
The Business is used to tracking their sales teams by metrics like “how many calls did you make/receive today?” and “how many sales did you make?” and “did you make X sales by Y arbitrary date?” where Y could be the end of each month. These are things that they understand, and thus like to control. Ask your favourite sales person for their opinion on the metrics by which their success is measured, and I am confident you’ll find that most will sum it up as “the faster I sell things, and the more things I sell, the more successful I am.” This makes sense from an empirical, see-the-figures-on-paper-on-my-desk-in-my-executive-office point of view, but I bet that the sales person in this case is not loving their life. A constant push to sell more, make more money, and do more. Any success in the future just raises the bar for the success which must follow. It’s a losing scenario. Eventually, they either get promoted out of the trenches of sales or they move to another company, resetting the bar which has been set too high. This buys them another year or two of raising that bar, until they ultimately repeat the process again.
Sales people who are put under the gun in such situations often resort to employing any tactic that they can to reach their goals… One of these strategies is saying anything at all to sign the client up. “Sure, the software can create rainbows and unicorns, just sign on the dotted line!” they say. It’s unfortunate, because customers who are hooked into these contracts tend to be very unhappy with the product when they find out that the software does not, in fact, create rainbows or unicorns. Or even a colour wheel and horses. It doesn’t even come close.
In the above case, The Business fails to measure the things that, long-term, make you the most money: client satisfaction and relationships. A good sales person (they definitely exist) is one that keeps the client happy with rational discussions and promises, and who is very transparent about what can and cannot be done and why. A great sales person is one the client loves so much that they’ll keep using your product, even when a better product exists, simply because they fear losing the relationship. This client is a client for life (or at least a long while) and makes you a lot of money. But how do you measure “happiness” and “relationships” long term? It’s a hard problem. Dating sites have been trying to solve it for over a decade. The Business will likely not dedicate the time and resources to do so themselves. So, they measure phone calls, sales, and other crappy metrics to ensure that the sales team are doing their job.
Here’s where we get back to the topic: developers and failure. The Business, who in most cases pays I.T. to create things to sell, employs these same arbitrary measurements when grading development teams. They often only know how to see success as a measured outcome of facts, and so they create the only measurements that they can empirically apply: features and deadlines. Does the dev team build all of the features and hit the deadline? Great. Do they not? Not great. These measurements themselves are acceptable (even good), but the combination of them (lots of features on short deadlines) is the problem.
The Money Talks
Where it gets tricky is in the realization that “show me the money” is how business ultimately tends to run. The sales people very overtly make the money, so they are seen as successful and important people in the company. The dev team also makes money, but is perceived to cost money, and they are seen as a cost-center that must be carefully weighed and measured to avoid excessive spending. What this leads to is an unhealthy practice of allowing sales people the freedom to employ any tactics necessary to land sales and make the money. In a business such as The Business as described, your life as a developer begins to suck.
To close the deal, the sales person will often promise the client almost anything about the software that you develop. They may promise new feature X by the end of the month, they may even promise 10 new features by the middle of the month. Whatever makes the client sign on. Then, the client says let’s rock and your quality of life drops sharply.
The very next thing that happens is The Business casually tries to confirm what seems obvious and even mandatory to them. “So your team will have these 10 things ready to go by the 15th, right?” they say. “This is a million dollar client, and it would be horrible if we lost them because you couldn’t deliver!” and now the pressure is on to do the nearly-impossible in virtually no time.
The dev team might try to politely push back and say that this is practically impossible, but The Business sees the dollars on the dotted line and will not listen. Flip the kill switch. Forgo the QA time; all developers must focus on all of these features, day and night, so that the deadline can be met. Why? Because that’s how the team is measured. If the team doesn’t hit that deadline, they’ve failed and the million dollar deal is lost with the dev team seemingly at fault. Developers don’t want to work extra? Order in pizzas and promise them time-in-lieu as soon as the deadline is over with. Note that they will likely never actually see this time-in-lieu because right after this deadline will be the next one, with similar outlandish expectations and even tighter deadlines. And after that, another one. And another one. And the cycle will probably never end.
The Mad Production Dash
So, as the developer, you develop it as fast as you can. The code starts to resemble Frankenstein as you tack on bits and pieces to make it work ASAP. You subdue your ego and uneasiness about the quality of code by commenting // HACK: undo this crap later everywhere. Somehow that makes you feel better as it creates the slight glimmer of hope that eventually you’ll have enough time to come back and undo this monstrous pile of garbage. But you never will get that time, because the next deal is coming down the pipe. And so the code becomes worse. Your development effort completes 1-2 days before the arbitrary sales deadline, and after your QA team flips their lids on having 48 hours to test 1000+ hours of work, they do “critical path testing” to make sure it at least does something correctly and certify it as “good enough.”
The team releases to production early in the morning of the deadline day, and though it takes 5 hours because there are 17 untested things to fix on-the-fly (and realistically they have no option to abort the release or roll back because the consequences will be dire), they eventually shove the hacked up code out the door and declare it done. The Business shows their appreciation in the form of a short, impersonal e-mail that doesn’t credit anyone by name. The development team is left feeling underappreciated and pissed off.
What does the future hold for such a company? The code will probably spiral into bug-filled oblivion until it can’t do anything correctly or in any reasonable amount of time. Despite the weeks and months during which the development team pleaded with the business for time to clean up the technical debt, they are brushed off because taking time off of features loses clients and thus money. Then, as it starts to come crashing down in production, they suddenly beg the developers for a quick fix. “Do whatever it is that needs to be done!” they plead as they see their sales going down the drain. And now, because it is on fire and burning to the ground, the dev team is finally given a moment to pay back some of the technical debt that has been accrued during this vicious cycle. Repeat.
When a dev team has no say in the deadlines of the work they must do, they will usually fail. And when they are set up for failure from the start, they will likely get tired of being blamed for the problems without ever being given the time to devise the solutions. This leads to bad work culture, high turnover, and low productivity.
The way to guarantee dev team success is obvious at this point. It’s really as simple as trust between I.T. and The Business. They must keep each other in the loop as stakeholders. The Business has no product without I.T. and I.T. has no job without The Business’s clients. It’s a mutually beneficial relationship and it should be treated as such, rather than mutually parasitic.
A good company’s sales team will often consult with I.T. prior to promising any dates and deadlines when unknowns are involved. It is practical to ask the people responsible for a task how long it will take them to complete it. This is much like how you might ask a waitress how long it will take for the food to arrive or a painter how many days they need to paint your home. This is a positive and productive discussion. Hallway conversations should become commonplace: “Hey dev team, I’ve got a client who wants to sign on but not until we build X, how long will that take?” The reply is as easy as “We’ll discuss it as a team and send you an estimate with some assumptions to confirm with the client” and just like that there’s a great working relationship that practically guarantees success. The team knows what work is coming, and also knows how long they have to complete it.
The Correct Measurements
If a dev team continues to fail in an environment where trust exists, then that team is likely not competent. They either cannot estimate correctly or cannot deliver within their own estimates. Sometimes devs suck at estimating because they’ve been making estimates under the oppressive sales gun for so long that they’ve effectively forgotten how to give themselves a fair amount of time. The blame for this remains entirely on the dev team, and they (or The Business) must repair the situation quickly and effectively to maintain the mutually beneficial relationship based on trust. As The Business owes I.T. input into the deadlines, I.T. carries the burden of being fair, accurate, and responsible with those deadlines.
Assuming that The Business now has a competent, skilled dev team, the question turns to the customers. If the customers do not like the estimates given to them, this may cost the company sales. Perhaps the customer wanted the impossible and The Business is giving them a dose of reality. Perhaps The Business does not want such a needy customer and they’re in a situation to be able to afford to tell them no thanks. Perhaps The Business realizes that the client’s request is reasonable but the timeframe of the estimate feels a bit long. In that case they can ask I.T. why. If the answer is not sufficient and justifiable, then perhaps the dev team is still not competent. No dev team should be let loose without checks and measures on productivity, but those metrics should be reasonable.
Ultimately, if you want to guarantee the failure of a development team, simply promise features to clients and customers without ever asking for (or trusting) the input of the team that is actually going to build those features. It’s just like telling the waitress that your food must be on the table in 10 minutes, without first asking the cooks how long it takes to safely and properly make it.
If this situation sounds familiar, try talking with The Business about it. Try to help them see it from your point of view. Ask them “how successful would you be if I demanded that you sell 20 new clients by Friday?” and perhaps some light bulbs will start to go on. Ultimately, we as developers often know nothing about sales and have no business dictating their measurable work expectations. They similarly have no business dictating ours, but a relationship of trust can be built to allow us all to work together and accomplish great things.