DevOps
Culture, Agile Development, & Cloud Native Technologies
DevOps is a recognition that Development and Operations need to stop working alone in their "siloed" towers and start working together.
To do this we need:
- A culture of collaboration valuing openness, trust, and transparency
- An application design that does not require entire systems to be redeployed just to add a single function
- A dynamic software-defined, programmable platform to continuously deploy onto
Requirements -> Design -> Code -> Integration -> Test -> Deploy
In traditional waterfall development:
- Each step ends before the next begins
- Mistakes found in the later stages are more expensive to fix
Problems with this approach:
- No provisions for changing requirements
- Because all of the teams worked separately, the development team was not always aware of operational roadblocks that might prevent the program from working as anticipated
- The people the furthest from the code who knew the least about it were deploying it into production
In 1996, Kent Beck introduced Extreme Programming, based on an iterative approach to software development. It was intended to improve software quality and responsiveness to changing customer requirements. It was one of the first agile methods.
In 2001, seventeen software developers met at a resort in Snowbird, Utah to discuss these lightweight development methods. Together, they published the Manifesto for Agile Software Development.
- Individuals and interactions over processes and tools
- Working software over comprehensive documentation
- Customer collaboration over contract negotiation
- Responding to change over following a plan
That is, while there is value in the items on the right, we value the items on the left more.
Cycle: ... -> Requirements -> Plan -> Design -> Develop -> Release -> Track & Monitor -> ...
- Requirements and solutions evolve through the collaborative effort of self-organizing and cross-functional teams and their customers
- It advocates adaptive planning, evolutionary development, early delivery, and continual improvement
- It encourages rapid and flexible response to change
While Agile improved the speed and accuracy of software for developers, it did nothing for operations. Many development teams just got frustrated by ops not being able to deliver at the speed of development.
- Small team: 5 ± 2
- Dedicated
- Co-located
- Cross-functional
- Self managing
- Iterative Sprints
- Groomed Backlogs
- Customer Stories
- 2 Week Deliverables
You will fail if:
- You lack a real Product Owner
- Your teams are too large
- Your teams are not dedicated
- Your teams are geographically distributed
- Your teams are siloed
- Your teams are not self-managing
- From the practitioners, by practitioners
- Not a product, specification, job title
- An experience-based movement
- Decentralized and open to all
- Smart experimentation
- Moving in-market with maximum velocity and minimum risk
- Gaining quick valuable insight to continuously change the value proposition and quality
- Social Coding
- Behavior and Test Driven Development
- Working in small batches
- Build Minimum Viable Products for gaining insights
- Failure leads to understanding
- The Twelve-Factor App describes patterns for cloud-native architectures which leverage microservices
- Applications are designed as a collection of stateless microservices
- State is maintained in separate databases and persistent object stores
- Resilience and horizontal scaling are achieved through deploying multiple instances
- Failing instances are killed and re-spawned, not debugged and patched (cattle not pets)
- DevOps pipelines help manage continuous delivery of services
The microservice architectural style is an approach to developing a single application as a suite of small services, each running in its own process and communicating with lightweight mechanisms, often an HTTP resource API. These services are built around business capabilities and independently deployable by fully automated deployment machinery.
- Embrace failures: they will happen! Move from "How to avoid" to "How to identify & what to do about it", and from "Pure operational concern" to "developer concern".
- External calls to other services that you don’t control are especially prone to problems:
- Use separate thread pools
- Time out quickly
- Circuit breaker pattern: identify problem and do something about it to avoid cascading failures
- Bulkhead pattern: Isolation from start to limit scope of failure (separate thread pools)
- Monkey testing: test by breaking (yes, on purpose! see: Netflix Chaos Monkey and Simian Army)
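The circuit breaker pattern listed above can be sketched in a few lines. This is a minimal illustration and not a production implementation: the names `CircuitBreaker`, `CircuitOpenError`, `max_failures`, `reset_after`, and `clock` are assumptions made for this example, and the timeouts and separate thread pools mentioned above are left out to keep it short.

```python
import time


class CircuitOpenError(Exception):
    """Raised when the breaker rejects a call without attempting it."""


class CircuitBreaker:
    # Hypothetical helper: the parameter names and defaults are illustrative only.
    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures  # consecutive failures before opening
        self.reset_after = reset_after    # seconds to wait before a trial call
        self.clock = clock                # injectable so tests can control time
        self.failures = 0
        self.opened_at = None             # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                # Fail fast instead of piling more load on a broken dependency.
                raise CircuitOpenError("circuit open; failing fast")
            # Cool-down elapsed: fall through and allow one trial call (half-open).
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            # A failed trial call, or too many consecutive failures, (re)opens the circuit.
            if self.opened_at is not None or self.failures >= self.max_failures:
                self.opened_at = self.clock()
            raise
        # Any success closes the circuit and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

A caller would wrap each external call, for example `breaker.call(fetch_profile, user_id)` where `fetch_profile` is a hypothetical function of your own, and treat `CircuitOpenError` as the signal to serve a fallback response.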
Automation is speed + repeatability.
- Continuous Integration (CI)
- Continuous Delivery (CD)
- Build Automation
- Canary Rollouts / Blue-Green
- Failing Forward
- Application Release Automation
- Facilitate a culture of teaming and collaboration
- Establish agile development as a shared discipline
- Automate relentlessly to enable rapid DevOps response
- Push smaller releases faster, measure and remediate impact
Taylorism is named after the US industrial engineer Frederick Winslow Taylor (1856-1915), who in his 1911 book 'The Principles of Scientific Management' laid down the fundamental principles of large-scale manufacturing through assembly-line factories.
- Adoption of Command and Control Management: the dominant method of management in the Western world
- Organizations divided into (ostensibly) independent functional silos
- Decision-making is separated from work: managers do the planning and decide what workers should do; workers mindlessly do the tasks they are asked to accomplish
- Taylorism may have been good during the industrial revolution, but not so much in the technology revolution
- The people power requirements have been lowered
- Taylorism is not appropriate for "knowledge" work like software development
Traditional IT practices | DevOps |
---|---|
Organizational silos and hand-offs | Shared ownership and high collaboration |
Fear of change | Risk management by embracing change |
Build once, hand-crafted "snowflakes" | Ephemeral infrastructure as code |
Manual fulfillment | Automated Self Service |
Alarms, call-backs, and escalations | Feedback loops and data driven responses |
- Deployment is king
- Deployment must be painless
- You have deployed the same thing several times before it gets to production
- Deployment is decoupled from activation
- Risk is managed via activation controls (blue-green deploys with canary testing)
- 10% of the user base, waves of activation, etc.
- Deployment is not “one size fits all”
- It is a rail yard of interconnecting steps and microprocesses
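One way to picture "deployment is decoupled from activation" with waves such as 10% of the user base is deterministic bucketing: the new version is already deployed, and a percentage setting decides who sees it. The sketch below is an assumption-laden illustration; `in_canary`, `pick_version`, and the SHA-256 bucketing scheme are choices made for this example, not something the notes above prescribe.

```python
import hashlib


def in_canary(user_id: str, percent: int) -> bool:
    """Return True if the user falls in the first `percent` of 100 buckets.

    Hypothetical helper: hashing the user id keeps each user in the same
    bucket on every request, so activation can grow in waves.
    """
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100
    return bucket < percent


def pick_version(user_id: str, percent: int) -> str:
    """Route a user to the newly deployed version or the current one."""
    return "new" if in_canary(user_id, percent) else "current"
```

Because a user's bucket never changes, raising `percent` from 10 to 50 only adds users to the new version, and setting it back to 0 deactivates the release without redeploying anything.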
- Cattle, not Pets
- Servers are built on demand via automation
- Logging into the server is seen as failure
- Release through parallel infrastructure
- Build the new version on new infrastructure; stage transition between environments (zero downtime)
- Transient Infrastructure
- Throw away when it is no longer needed
- This eliminates entropy - a major source of failure
- Applications are packaged in containers
- Same container that developer runs on their laptop runs in production
- Rolling updates with immediate roll-back
- No variance limits side-effects
- Dependencies are contained
- Blue-green deployment is a zero-downtime deployment technique that consists of two nearly identical production environments, called Blue and Green.
- They differ by the artifacts that the developer has intentionally changed, typically by the version of the application. At any given time, at least one of the environments is active.
- Using the blue-green deployment technique, you can realize the following benefits:
- Take software quickly from the final stage of testing to live production.
- Deploy a new version of an application without disrupting traffic to the application.
- Roll back rapidly. If there is something wrong with one of your environments, you can quickly switch to the other environment.
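The blue-green technique described above can be modeled as a router that points at one of two environments. This is a toy in-memory sketch under stated assumptions: `BlueGreenRouter` and its methods are hypothetical names, and a real setup would switch a load balancer or DNS entry instead of a Python attribute.

```python
class BlueGreenRouter:
    """Toy model of two environments with a single switch between them."""

    def __init__(self, blue_version: str, green_version: str, active: str = "blue"):
        self.environments = {"blue": blue_version, "green": green_version}
        self.active = active  # the environment currently receiving traffic

    @property
    def idle(self) -> str:
        """Name of the environment that is not receiving traffic."""
        return "green" if self.active == "blue" else "blue"

    def deploy(self, version: str) -> None:
        """Install a version on the idle environment; live traffic is untouched."""
        self.environments[self.idle] = version

    def switch(self) -> None:
        """Send traffic to the idle environment. Calling it again rolls back."""
        self.active = self.idle

    def handle_request(self) -> str:
        """Return the version that serves the request."""
        return self.environments[self.active]
```

Deploying to the idle side and then calling `switch()` is the release; calling `switch()` a second time is the rapid rollback mentioned above, because the previous version is still running on the other environment.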
- Squads are grouped into Tribes (light-weight matrix)
- Chapters of competency areas are formed across Squads
- Guilds are informal, light-weight communities of interest across the company
- Each Squad has its own mission aligned with the business
- Feels like a "mini-startup"
- Self Organizing / Cross-functional
- 5-7 engineers, fewer than 10
- Squads have end-to-end responsibility for what they build
- Build, commit, deploy, maintenance, operations, EVERYTHING!
- With a long term mission usually around a single business domain
Shared Consciousness with Distributed (local) Control
- Bad behavior arises when you abstract people away from the consequences of their actions.
- Functional silos abstract people away from the consequences of their actions.
- For example: By adding a QA Team, developers are abstracted away from the consequences of writing buggy code.
- Make people aware of the consequences of their actions
- Create cross-functional teams - or -
- Have developers rotate through operations teams
- Have operations people attend developer standups and showcases
- Make people responsible for the consequences of their actions
- Put developers on pager duty, or have them own the SLA for the products and services they build
DevOps changes the objective of measurement from Mean Time To Failure (MTTF: make sure you never go down) to Mean Time To Recovery (MTTR: you will go down, so make sure you can recover quickly).
Metrics:
- A BASELINE provides a concrete number for comparison as you implement your DevOps changes:
- It currently requires six team members 10 hours to deploy a new release of our product.
- This costs us $X for every release
- Metric GOALS allow you to reason about these numbers and judge the success of your transition process:
- Reduce deployment time from 10 hours to 2 hours.
- Increase percentage of defects detected in testing from 25% to 50%
- Reduce time-to-market for new features.
- Increase overall availability of the product.
- Reduce the time it takes to deploy a software release.
- Increase the percentage of defects detected in testing before production release.
- Make more efficient use of hardware infrastructure.
- Provide performance and user feedback to the product manager in a more timely manner
1. Mean Lead Time: How long does it take from idea to production?
2. Release Frequency: How often can you deliver changes?
3. Change Failure Rate: How often do changes fail?
4. Mean Time to Recovery (MTTR): How quickly can you recover from failure?
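The four metrics above are simple arithmetic over deployment records. The sketch below assumes a hypothetical `Deployment` record with `started_at`, `deployed_at`, `failed`, and `restored_at` fields. It also measures lead time from the moment work started, which is only an approximation of "from idea to production".

```python
from dataclasses import dataclass
from datetime import datetime
from statistics import mean
from typing import Optional


@dataclass
class Deployment:
    # Hypothetical record shape; adapt the fields to whatever your pipeline logs.
    started_at: datetime                    # when work on the change began
    deployed_at: datetime                   # when it reached production
    failed: bool = False                    # did the change fail in production?
    restored_at: Optional[datetime] = None  # when service was restored, if it failed


def summarize(deployments, period_days):
    """Compute the four metrics for deployments made during `period_days`."""
    if not deployments:
        raise ValueError("need at least one deployment")
    lead_hours = [
        (d.deployed_at - d.started_at).total_seconds() / 3600 for d in deployments
    ]
    failures = [d for d in deployments if d.failed]
    recovery_hours = [
        (d.restored_at - d.deployed_at).total_seconds() / 3600
        for d in failures
        if d.restored_at is not None
    ]
    return {
        "mean_lead_time_hours": mean(lead_hours),
        "releases_per_day": len(deployments) / period_days,
        "change_failure_rate": len(failures) / len(deployments),
        "mttr_hours": mean(recovery_hours) if recovery_hours else None,
    }
```

Running `summarize` over the same window before and after a process change gives a baseline and a comparison point for each of the four numbers.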
- On my team information is actively sought
- On my team failures are learning opportunities and messengers of them are not punished
- On my team responsibilities are shared
- On my team cross functional collaboration is encouraged and rewarded
- On my team failure causes inquiry
- On my team new ideas are welcomed
- Cycle time is a key metric for Agile kanban teams.
- Cycle time is the amount of time it takes for a unit of work to travel through the team’s workflow–from the moment work starts to the moment it ships.
- By optimizing cycle time, the team can confidently forecast the delivery of future work.
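As a rough illustration of how cycle time supports forecasting, the sketch below computes cycle time per work item and projects how long a backlog might take. The function names and the `wip` parameter are assumptions for this example, and the forecast is a simple average-based estimate (remaining items times mean cycle time, divided by the number of items worked in parallel), not a statistically rigorous one.

```python
from datetime import datetime
from statistics import mean


def cycle_time_days(started: datetime, shipped: datetime) -> float:
    """Days from the moment work starts to the moment it ships."""
    return (shipped - started).total_seconds() / 86400


def forecast_days(cycle_times, items_remaining: int, wip: int = 1) -> float:
    """Estimate the days needed to finish the remaining items.

    Assumes future items resemble past ones and that `wip` items are
    in progress at the same time (hypothetical simplification).
    """
    if not cycle_times or wip < 1:
        raise ValueError("need historical cycle times and wip >= 1")
    return items_remaining * mean(cycle_times) / wip
```

For example, with past cycle times of 2, 4, and 3 days and two items in progress at once, ten remaining items come out at roughly 15 days. The estimate tightens as cycle times become shorter and less variable.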
You MUST do all three:
1. Technical Practices
2. Lean Processes (Agile)
3. Culture
- You cannot buy DevOps In-A-Box
- You cannot order 20 units of DevOps for this quarter
- You cannot sprinkle DevOps on something to make it better
- You cannot become DevOps without changing your culture
- You can’t change your company's culture just by adopting new tools, but they can help reinforce it
- Using Containers won't fix your broken culture
- You cannot maintain your current organizational structure and become DevOps
- A Cultural Movement
- Emphasizing Collaboration, Sharing, and Transparency
- Promoting Automation and Infrastructure as Code
- Achieving Continuous Integration and Delivery of Changes
- Immutable Delivery
- With One set of Metrics to rule them all
Maturity Level | People | Process | Technology |
---|---|---|---|
Level 1: Ad Hoc | | | |
Level 2: Repeatable | | | |
Level 3: Defined | | | |
Level 4: Measured | Collaboration based on shared metrics with a focus on removing bottlenecks | | |
Level 5: Optimized | A culture of continuous improvement permeates through the organization | | |
- DevOps is about breaking down the silos and working as a Single Agile Team
- Culture is the #1 success factor in DevOps. Building a culture of shared responsibility, transparency and faster feedback is the foundation of every high performing DevOps team
- DevOps starts with learning how to work differently. It embraces cross-functional teams with openness, transparency, and respect as pillars
- Being able to recover quickly from failure is more important than having failures less often
- Measurements should encourage innovation and collaboration, and not punish failure (blameless culture)