Operations Leadership Lessons from the Crowdstrike Incident

Much has been written about the whys and wherefores of the recent Crowdstrike incident. Without dwelling too much on the past, the question is: what can we do to plan for the future? We asked our expert analysts what concrete steps organizations can take.

Don’t Trust Your Vendors

Does that sound harsh? It should. We have zero trust in networks or infrastructure and access management, but then we allow ourselves to assume software and service providers are 100% watertight. Security is about the permeability of the overall attack surface—just as water will find a way through, so will risk.

Crowdstrike was previously the darling of the industry, and its brand carried considerable weight. Organizations tend to think, “It’s a security vendor, so we can trust it.” But you know what they say about assumptions…. No vendor, especially a security vendor, should be given special treatment.

Incidentally, for Crowdstrike to declare that this event wasn’t a security incident completely missed the point. Whatever the cause, the impact was denial of service and both business and reputational damage.

Treat Every Update as Suspicious

Security patches aren’t always treated the same as other patches. They may be triggered or requested by security teams rather than ops, and they may be (perceived as) more urgent. However, there’s no such thing as a minor update in security or operations, as anyone who has experienced a bad patch will know.

Every update should be vetted, tested, and rolled out in a way that manages the risk. Best practice may be to test on a smaller sample of machines first, for example in a sandbox or a limited install, and only then do the wider rollout. If you can’t do that for whatever reason (perhaps contractual), consider yourself working at risk until sufficient time has passed.
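The "smaller sample first, then wider rollout" idea can be sketched in a few lines of Python. This is a minimal illustration, not how any particular vendor or patch tool works: the function names, the ring sizes, and the `apply_update` and `is_healthy` callbacks are all hypothetical stand-ins for whatever deployment and monitoring tooling you actually use.

```python
from typing import Callable, List, Sequence, Tuple


def plan_rings(hosts: Sequence[str], cumulative_shares: Sequence[float]) -> List[List[str]]:
    """Split a fleet into successive rollout rings.

    `cumulative_shares` such as (0.05, 0.25, 1.0) means: update 5% of the
    fleet first, then continue up to 25%, then everyone else.
    """
    rings: List[List[str]] = []
    start = 0
    for share in cumulative_shares:
        # Each ring ends at the requested cumulative share of the fleet,
        # but always holds at least one host while any remain.
        end = min(len(hosts), max(start + 1, round(len(hosts) * share)))
        rings.append(list(hosts[start:end]))
        start = end
    return rings


def staged_rollout(
    rings: Sequence[Sequence[str]],
    apply_update: Callable[[str], None],  # hypothetical: pushes the update to one host
    is_healthy: Callable[[str], bool],    # hypothetical: post-update health check
) -> Tuple[List[str], bool]:
    """Update ring by ring and stop as soon as a ring contains an unhealthy host.

    Returns the hosts that were updated and whether the rollout completed.
    """
    updated: List[str] = []
    for ring in rings:
        for host in ring:
            apply_update(host)
            updated.append(host)
        if not all(is_healthy(host) for host in ring):
            return updated, False  # halt: later rings are never touched
    return updated, True
```

With a 20-host fleet and shares of (0.05, 0.25, 1.0), a bad update caught in the second ring reaches five machines rather than twenty. In practice you would also wait some time between rings, since the health check only sees failures that have already shown up.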

For example, the Crowdstrike patch was an obligatory install; however, some organizations we speak to managed to block the update using firewall settings. One organization used its SSE platform to block the update servers once it identified the bad patch. Because it had good alerting, its SecOps team took about 30 minutes to recognize the problem and deploy the block.

Another throttled the Crowdstrike updates to 100Mb per minute; only six hosts and 25 endpoints were hit before it set the limit to zero.

Minimize Single Points of Failure

Back in the day, resilience came through duplication of specific systems, the so-called “2N+1” where N is the number of components. With the advent of cloud, however, we’ve moved to the idea that all resources are ephemeral, so we don’t have to worry about that sort of thing. Not true.

Ask the question: “What happens if it fails?” where “it” can mean any element of the IT architecture. For example, if you choose to work with a single cloud provider, look at specific dependencies: is it about a single virtual machine or a region? In this case, the Microsoft Azure issue was confined to storage in the Central US region. For the record, “it” can and should also refer to the detection and response agent itself.
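One rough way to make the “What happens if it fails?” question systematic is to list, for each capability you rely on, everything that can provide it, and flag the capabilities with only one provider. The sketch below assumes you can write that inventory down by hand; the capability and provider names in the usage note are invented for illustration.

```python
from typing import Dict, List


def single_points_of_failure(providers: Dict[str, List[str]]) -> Dict[str, str]:
    """Return every capability that has exactly one distinct provider.

    `providers` maps a capability (for example "storage") to the list of
    things that can supply it. The result maps each single-provider
    capability to that sole provider.
    """
    return {
        capability: options[0]
        for capability, options in providers.items()
        if len(set(options)) == 1
    }
```

For an inventory such as `{"compute": ["cloud-a/region-1", "cloud-a/region-2"], "storage": ["cloud-a/region-1"], "endpoint-protection": ["edr-agent"]}`, the function flags storage and endpoint protection, which is the point made above: the security agent is itself an “it” that can fail.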

In all cases, do you have another place to fail over to should “it” no longer function? Comprehensive duplication is (largely) impossible for multi-cloud environments. A better approach is to define which systems and services are business critical based on the cost of an outage, then spend money on mitigating those risks. See it as insurance: a necessary spend.
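As a back-of-the-envelope illustration of ranking systems by the cost of an outage, the sketch below multiplies an hourly outage cost by the outage hours you expect per year and compares the result with the yearly cost of mitigation. The class, the field names, and the comparison rule are assumptions made for this example, and it simplifies by treating mitigation as removing the whole expected loss.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class System:
    name: str
    outage_cost_per_hour: float            # estimated business cost of one hour down
    expected_outage_hours_per_year: float  # your own estimate, not a measured value
    mitigation_cost_per_year: float        # e.g. failover capacity, extra tooling

    @property
    def expected_annual_loss(self) -> float:
        return self.outage_cost_per_hour * self.expected_outage_hours_per_year


def worth_mitigating(systems: Iterable[System]) -> List[System]:
    """Systems whose expected annual loss exceeds the mitigation cost,
    ordered from largest expected loss to smallest."""
    candidates = [s for s in systems if s.expected_annual_loss > s.mitigation_cost_per_year]
    return sorted(candidates, key=lambda s: s.expected_annual_loss, reverse=True)
```

With made-up figures, a payments system costing 50,000 per hour and expected to be down 4 hours a year carries an expected loss of 200,000, which justifies a 60,000 mitigation spend; an intranet wiki at 500 per hour for 10 hours (5,000) does not justify 20,000. The value of the exercise lies less in the exact numbers than in forcing the estimate to be written down in financial terms.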

Treat Backups as Critical Infrastructure

Each layer of backup and recovery infrastructure counts as a critical business function and should be hardened as much as possible. Unless data exists in three places, it’s unprotected: with only one backup, you won’t know which copy is correct when the two disagree. In addition, failure often occurs between the host and the online backup, so you also need an offline backup.
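The rule of thumb above (three places, at least one of them offline) is simple enough to check mechanically. The following is a minimal sketch under the assumption that you keep an inventory of where each data set lives; the `DataCopy` type and the location names used in the usage note are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class DataCopy:
    location: str  # where this copy lives, e.g. a site, bucket, or vault
    offline: bool  # True if the copy is not reachable from the live network


def backup_gaps(copies: Iterable[DataCopy]) -> List[str]:
    """Return the ways a data set falls short of 'three places, one offline'.

    An empty list means both conditions are met.
    """
    copies = list(copies)
    gaps: List[str] = []
    if len({c.location for c in copies}) < 3:
        gaps.append("fewer than three distinct locations")
    if not any(c.offline for c in copies):
        gaps.append("no offline copy")
    return gaps
```

A data set held only on its primary host reports both gaps; adding an online backup in a second location and an offline copy in a third clears them. A check like this only confirms that copies exist, so it complements restore testing rather than replacing it.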

The Crowdstrike incident cast a light on enterprises that lacked a baseline of failover and recovery capability for critical server-based systems. In addition, you need to have confidence that the environment you are spinning up is “clean” and resilient in its own right.

In this incident, a common issue was that Bitlocker encryption keys were stored in a database on a server that was “protected” by Crowdstrike. To mitigate this, consider using a completely different set of security tools for backup and recovery to avoid similar attack vectors.

Plan, Test, and Revise Failure Processes

Disaster recovery (and this was a disaster!) is not a one-shot operation. It may feel burdensome to constantly think about what could go wrong, so don’t, but perhaps worry quarterly. Conduct a thorough assessment of points of weakness in your digital infrastructure and operations, and look to mitigate any risks.

As per one discussion, all risk is business risk, and the board is in place as the ultimate arbiter of risk management. It is everyone’s job to communicate risks and their business ramifications, in financial terms, to the board. If the board chooses to ignore these, then they have made a business decision like any other.

The risk areas highlighted in this case are bad patches, the wrong kinds of automation, too much vendor trust, lack of resilience in secrets management (e.g., Bitlocker keys), and failure to test recovery plans for both servers and edge devices.

Look to Resilient Automation

The Crowdstrike situation illustrated a dilemma: we can’t 100% trust automated processes, yet the only way we can deal with technology complexity is through automation. The lack of an automated fix was a major element of the incident, as it required companies to “hand touch” each device, globally.

The answer is to insert humans and other technologies into processes at the right points. Crowdstrike has already acknowledged the inadequacy of its quality testing processes; this was not a complex patch, and it would likely have been found to be buggy had it been tested properly. Similarly, all organizations need to have testing processes up to scratch.

Emerging technologies like AI and machine learning could help predict and prevent similar issues by identifying potential vulnerabilities before they become problems. They can also be used to create test data, harnesses, scripts, and so on, to maximize test coverage. However, if left to run without scrutiny, they could also become part of the problem.

Revise Vendor Due Diligence

This incident has illustrated the need to review and “test” vendor relationships, not just in terms of services provided but also in terms of contractual arrangements for unexpected incidents (including redress clauses that enable you to seek damages) and, indeed, how vendors respond. Perhaps Crowdstrike will be remembered more for how the company, and CEO George Kurtz, responded than for the issues caused.

No doubt lessons will continue to be learned. Perhaps we should have independent bodies audit and certify the practices of technology companies. Perhaps it should be mandatory for service providers and software vendors to make it easier to switch or duplicate functionality, rather than the walled garden approaches that are prevalent today.

Overall, though, the old adage applies: “Fool me once, shame on you; fool me twice, shame on me.” We know for a fact that technology is fallible, yet we hope with every new wave that it has become in some way immune to its own risks and the entropy of the universe. With technological nirvana postponed indefinitely, we must take the consequences on ourselves.

Contributors: Chris Ray, Paul Stringfellow, Jon Collins, Andrew Green, Chet Conforte, Darrel Kent, Howard Holton

The post Operations Leadership Lessons from the Crowdstrike Incident appeared first on Gigaom.

Amazon’s Leadership and Corporate Culture: Lessons from Jeff Bezos

Amazon, one of the most influential and innovative companies in the world, has a corporate culture and leadership philosophy shaped largely by its founder, Jeff Bezos. Bezos’s approach to leadership and his vision for Amazon have set the company apart in the highly competitive tech and retail sectors. This article explores the key elements of Amazon’s leadership and corporate culture, and the lessons that can be drawn from Bezos’s unconventional methods.

The Bezos Leadership Philosophy

Jeff Bezos founded Amazon in 1994, and his leadership philosophy has been instrumental in driving the company’s success. Central to Bezos’s approach is a relentless focus on the customer. He famously emphasized, “We’re not competitor obsessed, we’re customer obsessed. We start with the customer and we work backwards.” This mantra has guided Amazon’s product development, customer service, and innovation strategies.

Bezos’s customer-centric philosophy manifests in various ways. For instance, Amazon’s customer service policies are designed to maximize customer satisfaction, often going above and beyond industry standards. This includes offering easy returns, fast shipping, and a broad range of products. The commitment to customer experience is not merely a slogan but a guiding principle that influences every aspect of Amazon’s operations.

Innovation and Risk-Taking

Another hallmark of Bezos’s leadership is a strong emphasis on innovation and risk-taking. Bezos has been known for his willingness to experiment and embrace failure as part of the innovation process. His approach is encapsulated in the principle that “failure and invention are inseparable twins.” This mindset has led to the creation of groundbreaking products and services such as Amazon Web Services (AWS), Kindle, and Alexa.

Bezos encourages employees to think big and explore unconventional ideas. Amazon’s “Day 1” mentality, as described by Bezos, means treating every day as if it were the first day of the company’s existence. This approach fosters a culture of continuous improvement and creativity, where employees are motivated to push boundaries and explore new possibilities.

The Two-Pizza Rule

Bezos’s leadership style also emphasizes the importance of small, agile teams. The “Two-Pizza Rule” is a notable example of this principle. According to this rule, teams should be small enough to be fed with two pizzas. The rationale behind this is that smaller teams are more effective at communicating and collaborating, which leads to faster decision-making and more innovative solutions.

The Two-Pizza Rule has influenced Amazon’s organizational structure, promoting a decentralized and flexible approach to management. It encourages teams to be autonomous and take ownership of their projects, leading to a more dynamic and responsive organization.

Long-Term Thinking

One of the defining features of Bezos’s leadership is his long-term thinking. Bezos has consistently prioritized long-term goals over short-term gains, a strategy that has often led to criticism from investors focused on quarterly results. However, Bezos believes that focusing on the long term is essential for building a sustainable and successful business.

This long-term perspective is reflected in Amazon’s investments in infrastructure, technology, and talent. For example, Amazon’s substantial investments in its fulfillment network and cloud computing services were made with the expectation of significant long-term returns. Bezos’s willingness to forgo immediate profits in favor of future growth has been a key driver of Amazon’s success.

Leadership Principles

Amazon’s leadership principles are a cornerstone of its corporate culture and reflect Bezos’s values and vision. These principles guide decision-making, employee behavior, and company policies. Some of the most notable principles include:

Customer Obsession: Always start with the customer and work backwards. Work to earn and keep customer trust.

Invent and Simplify: Seek out new solutions and simplify processes to make things easier for customers.

Hire and Develop the Best: Raise the performance bar with every hire and promotion. Recognize exceptional talent and help them grow.

Deliver Results: Focus on the key inputs for your business and deliver them with the right quality and in a timely fashion.

These principles help create a unified company culture where employees are aligned with Amazon’s mission and values. They also serve as a framework for evaluating performance and making strategic decisions.

A Culture of High Standards

Bezos is known for his insistence on high standards, which is reflected in Amazon’s corporate culture. He believes that maintaining high standards is crucial for driving excellence and innovation. This approach has led to a demanding work environment where employees are expected to deliver exceptional results and continuously improve.

While this culture of high standards has contributed to Amazon’s success, it has also been a point of contention. Critics have pointed to the intense pressure and high expectations placed on employees, leading to concerns about work-life balance and employee well-being. Bezos and Amazon have addressed these concerns in various ways, including investing in employee benefits and programs aimed at improving workplace conditions.

Lessons for Other Organizations

Amazon’s leadership and corporate culture offer several valuable lessons for other organizations:

Customer Focus: Prioritizing the customer can drive innovation and create a competitive edge. Companies should continually seek to understand and meet customer needs.

Embrace Failure: Viewing failure as a learning opportunity rather than a setback can foster innovation and resilience. Encouraging experimentation and risk-taking can lead to breakthroughs.

Small Teams, Big Impact: Smaller, autonomous teams can be more agile and effective. Empowering teams to make decisions and take ownership can drive productivity and creativity.

Long-Term Vision: Balancing short-term pressures with a focus on long-term goals can lead to sustainable growth and success. Investing in future-oriented projects can pay off over time.

High Standards: Setting high standards and striving for excellence can drive performance and improvement. However, it is important to balance this with consideration for employee well-being.

Conclusion

Jeff Bezos’s leadership and Amazon’s corporate culture have been instrumental in shaping the company’s success and influence. Bezos’s focus on the customer, innovation, long-term thinking, and high standards has created a unique and powerful organizational environment. While there are challenges associated with this approach, the lessons from Amazon’s leadership can provide valuable insights for other organizations striving for excellence and growth.
