DevOps

Culture, Agile Development, & Cloud Native Technologies

DevOps is a recognition that Development and Operations needs to stop working alone in their "siloed" towers and start working together.

To do this we need:

  • A culture of collaboration valuing openness, trust, and transparency

  • An application design that does not require entire systems to be redeployed just to add a single function

  • An application design that does not require entire systems to be redeployed just to add a single function

  • A dynamic software-defined, programmable platform to continuously deploy onto

Traditional Waterfall Development

Requirements -> Design -> Code -> Integration -> Test -> Deploy

In traditional waterfall development: 1. Each step ended when the next begins; 2. Mistakes found in the later stages are more expensive to fix; 3. No provisions for changing requirements

Problem with this approach:

  • No provisions for changing requirements

  • Because all of the teams worked separately, the development team was not always aware of operational roadblocks that might prevent the program from working as anticipated

  • The people the furthest from the code who knew the least about it were deploying it into production

Extreme Programming

In 1996, Kent Beck introduced Extreme Programming based on an interactive approach to software development. It was intended to improve software quality and responsiveness to changing customer requirements. It was one of the first agile methods.

Agile Manifesto

In 2001, seventeen software developers met at a resort in Snowbird, Utah to discuss these lightweight development methods. Together, they published the Manifesto for Agile Software Development.

  • Individuals and interactions over processes and tools

  • Working software over comprehensive documentation

  • Customer collaboration over contract negotiation

  • Responding to change over following a plan

That is, while there is value in the items on the right, we value the items on the left more.

Agile Development

Cycle: ... -> Requirements -> Plan -> Design -> Develop -> Release -> Track & Monitor -> ...

  • Requirements and solutions evolve through the collaborative effort of self-organizing and cross-functional teams and their customers

  • It advocates adaptive planning, evolutionary development, early delivery, and continual improvement

  • It encourages rapid and flexible response to change

Agile Dilemma / Why Agile alone not good enough?

While Agile improved the speed and accuracy of software for developers, it did nothing for operations. Many development team just got frustrated by ops not being able to deliver at the speed of development.

How Agile are your teams?

Working as an Agile Team

  • Iterative Sprints

  • Groomed Backlogs

  • Customer Stories

  • 2 Week Deliverables

Agile Antipatterns

You will fail if you...

  • Lack of real Product Owner

  • If your teams are too large

  • If your teams and not dedicated

  • If your teams are geographically distributed

  • If your teams siloed

  • If your teams are not self managing

The history reminds us that DevOps is:

  • From the practitioners, by practitioners

  • Not a product, specification, job title

  • An experience-based movement

  • Decentralized and open to all

Goal of Microservice is Agility

  • Smart experimentation

  • Moving in-market with maximum velocity and minimum risk

  • Gaining quick valuable insight to continuously change the value proposition and quality

DevOps Thinking / Tenets

  • Social Coding

  • Behavior and Test Driven Development

  • Working in small batches

  • Build Minimum Viable Products for gaining insights

  • Failure leads to understanding

Think Cloud Native

  • The Twelve-Factor App describes patterns for cloud-native architectures which leverage microservices

  • Applications are design as a collection of stateless microservices

  • State is maintained in separate databases and persistent object stores

  • Resilience and horizontal scaling is achieved through deploying multiple instances

  • Failing instances are killed and re-spawned, not debugged and patched (cattle not pets)

  • DevOps pipelines help manage continuous delivery of services

Think Microservices

The microservice architectural style is an approach to developing a single application as a suite of small services, each running in its own process and communicating with lightweight mechanisms, often an HTTP resource API. These services are built around business capabilities and independently deployable by fully automated deployment machinery.

Design for Failure

  • Embrace failures: they will happen! Move from "How to avoid" —> "How to identify & what to do about it". Move from "Pure operational concern" —> "developer concern".

  • External calls to other services that you don’t control are especially prone to problems:

    • Use separate thread pools

    • Time out quickly

  • Circuit breaker pattern: identify problem and do something about it to avoid cascading failures

  • Bulkhead pattern: Isolation from start to limit scope of failure (separate thread pools)

  • Monkey testing: test by breaking (yes, on purpose! see: Netflix Chaos Monkey and Simian Army)

Think Continuous Automation

Automation is speed + repeatability.

  • Continuous Integration (CI)

  • Continuous Delivery (CD)

  • Build Automation

  • Canary Rollouts / Blue-Green

  • Failing Forward

  • Application Release Automation

Working DevOps

  • Facilitate a culture of teaming and collaboration

  • Establish agile development as a shared discipline

  • Automate relentlessly to enable rapid DevOps response

  • Push smaller releases faster, measure and remediate impact

Taylorism

Named after the US industrial engineer Frederick Winslow Taylor (1856-1915) who in his 1911 book 'Principles Of Scientific Management' laid down the fundamental principles of large-scale manufacturing through assembly-line factories.

  • Adoption of Command and Control Management: the dominant method of management in the Western world

  • Organizations divided into (ostensibly) independent functional silos: organizations divided into (ostensibly) independent functional silos

  • Decision-making is separated from work: managers do the planning and decide what workers should do; workers mindlessly do the tasks they are ask to accomplish

Taylorism is not appropriate for Craft Work:

  • Taylorism may have been good during the industrial revolution, but not so much in the technology revolution

  • Taylorism may have been good during the industrial revolution, but not so much in the technology revolution

  • The people power requirements have been lowered

  • Taylorism is not appropriate for "knowledge" work like software development

Required DevOps behaviors

Traditional IT practicesDevOps

Organizational silos and hand-offs

Shared ownership and high collaboration

Fear of change

Risk management by embracing change

Build once, hand crafted "snow flakes"

Ephemeral infrastructure as code

Manual fulfillment

Automated Self Service

Alarms, call-backs, and escalations

Feedback loops and data driven responses

How DevOps Manages Risk

  • Deployment is king

    • Deployment must be painless

    • You have deployed the same thing several times before it gets to production

  • Deployment is decoupled from activation

    • Risk is managed via activation controls (blue-green deploys with canary testing)

    • 10% of the user base, waves of activation, etc.

  • Deployment is not “one size fits all”

    • It is a rail yard of interconnecting steps and microprocesses

Ephemeral Infrastructure

  • Cattle, not Pets

    • Servers are built on demand via automation

    • Logging into the server is seen as failure

  • Release through parallel infrastructure

    • Build the new version on new infrastructure; stage transition between environments (zero downtime)

  • Transient Infrastructure

    • Throw away when it is no longer needed

    • This eliminates entropy - a major source of failure

Immutable Delivery

  • Applications are packaged in containers

  • Same container that developer runs on their laptop runs in production

  • Rolling updates with immediate roll-back

  • No variance limits side-effects

  • Dependencies are contained

Zero-Downtime Deployments

  • Blue-green deployment is a zero-downtime deployment technique that consists of two nearly identical production environments, called Blue and Green.

  • They differ by the artifacts that the developer has intentionally changed, typically by the version of the application. At any given time, at least one of the environments is active.

  • Using the blue-green deployment technique, you can realize the following benefits:

    • Take software quickly from the final stage of testing to live production.

    • Deploy a new version of an application without disrupting traffic to the application.

    • Rollback rapidly. If there is something wrong with one of your environments, you can quickly switch to the other environment.

Spotify Case Study

Organizational Structure

  • Squads are grouped into Tribes (light-weight matrix)

  • Chapters of competency areas are formed across Squads

  • Guilds are informal light-weight community of interests across the company

Autonomous Squads

  • Each Squad has its own mission aligned with the business

    • Feels like a "mini-startup"

    • Self Organizing / Cross-functional

    • 5-7 engineers, less than 10

  • Squads have end-to-end responsibility for what they build

    • Build, commit, deploy, maintenance, operations, EVERYTHING!

    • With a long term mission usually around a single business domain

DevOps Organizational Objective

Shared Consciousness with Distributed (local) Control

Actions v.s. Consequences: Functional Silos Breed Bad Behavior

  • Bad behavior arises when you abstract people away from the consequences of their actions.

  • Functional silos abstract people away from the consequences of their actions.

  • For example: By adding a QA Team, developers are abstracted away from the consequences of writing buggy code.

Actions have Consequences

  • Make people aware of the consequences of their actions

    • Create cross-functional teams - or -

    • Have developers rotate through operations teams

    • Have operations people attend developer standups and showcases

  • Make people responsible for the consequences of their actions

    • Having developers on Pager Duty, or own the SLA for the products and services they build

DevOps Measurement / Metrics

DevOps changes the objective of the measurement from Mean Time To Failure (MTTF, make sure you never go down) -> Mean Time To Recovery (MTTR, you will go down, make sure you can recover quickly)

Metrics:

  • A BASELINE provides a concrete number for comparison as you implement your DevOps changes:

    • It currently requires six team members 10 hours to deploy a new release of our product.

    • This costs us $X for every release

  • Metric GOALS allow you to reason about these numbers and judge the success of your transition process:

    • Reduce deployment time from 10 hours to 2 hours.

    • Increase percentage of defects detected in testing from 25% to 50%

Actionable Metric Examples

  • Reduce time-to-market for new features.

  • Increase overall availability of the product.

  • Reduce the time it takes to deploy a software release.

  • Increase the percentage of defects detected in testing before production release.

  • Make more efficient use of hardware infrastructure.

  • Provide performance and user feedback to the product manager in a more timely manner

Top 4 Actionable Metric

  1. Mean Lead Time: How long does it take from idea to production?

  2. Release Frequency: How often can you deliver changes?

  3. Change Failure Rate: How often to changes fail?

  4. Mean Time to Recovery (MTTR): How quickly can you recover from failure?

Culture Measurements

  • On my team information is actively sought

  • On my team failures are learning opportunities and messengers of them are not punished

  • On my team responsibilities are shared

  • On my team cross functional collaboration is encouraged and rewarded

  • On my team failure causes inquiry

  • On my team new ideas are welcomed

Key metric: Cycle Time

  • Cycle time is a key metric for Agile kanban teams.

  • Cycle time is the amount of time it takes for a unit of work to travel through the team’s workflow–from the moment work starts to the moment it ships.

  • By optimizing cycle time, the team can confidently forecast the delivery of future work.

Keys to High Performance

You MUST do all three:

  1. Technical Practices

  2. Lean Processes (Agile)

  3. Culture

Busted DevOps Myths

  • You cannot buy DevOps In-A-Box

  • You cannot order 20 units of DevOps for this quarter

  • You cannot sprinkle DevOps on something to make it better

  • You cannot become DevOps without changing your culture

  • You can’t change your companies culture just by adopting new tools …but they can help reinforce it

  • Using Containers won't fix your broken culture

  • You cannot maintain your current organizational structure and become DevOps

DevOps Summary

  • A Cultural Movement

  • Emphasizing Collaboration, Sharing, and Transparency

  • Promoting Automation and Infrastructure as Code

  • Achieving Continuous Integration and Delivery of Changes

  • Immutable Delivery

  • With One set of Metrics to rule them all

DevOps Maturity Matrix

Maturity LevelPeopleProcessTechnology

Level 1:

Ad Hoc

  • Silo based

  • Blame and finger-pointing

  • Dependent on experts

  • Lack of accountability

  • Manual processes

  • Tribal knowledge the norm

  • Unpredictable and reactive

  • Manual builds and deployments

  • Manual testing

  • Environmental inconsistencies

Level 2:

Repeatable

  • Manual builds and deployments

  • Manual testing

  • Environmental inconsistencies

  • Processes established within silos

  • No standards

  • Can repeat what is known, but can’t react to unknowns

  • Automated builds

  • Automated tests written as part of story development

  • Painful but repeatable releases

Level 3:

Defined

  • Collaboration exists

  • Shared decision making

  • Shared accountability

  • Process automated across the software life cycle

  • Standards across organization

  • Automated build and test cycle for every commit

  • Push button deployments

  • Automated user and acceptance testing

Level 4:

Measured

Collaboration based on shared metrics with a focus on removing bottlenecks

  • Proactive monitoring

  • Metrics collected and analyzed against business goals

  • Visibility and predictability

  • Build metrics visible and acted on

  • Orchestrated deployments with automatic rollbacks

  • Nonfunctional requirements defined and measured

Level 5:

Optimized

A culture of continuous improvement permeates through the organization

  • Self-service automation

  • Risk and cost optimization

  • High degree of experimentation

  • Zero downtime deployments

  • Immutable infrastructure

  • Actively enforce resiliency by forcing failures

Key Takeaways

  • DevOps is about breaking down the silos and working as a Single Agile Team

  • Culture is the #1 success factor in DevOps. Building a culture of shared responsibility, transparency and faster feedback is the foundation of every high performing DevOps team

  • DevOps starts with learning how to work differently. It embraces cross-functional teams with openness, transparency, and respect as pillars

  • Being able to recover quickly from failure is more important than having failures less often

  • Measurements should encourage innovation and collaboration, and not punish failure (blameless culture)

Last updated