What is DevOps?
DevOps is a cultural movement whose principles and practices aim to improve software development performance by increasing both the throughput of valuable software and the stability of the product. DevOps practices help balance the natural tension between delivering new features to remain competitive and keeping software stable, reliable, and secure for customers.
Stride’s approach to DevOps
Striders apply DevOps practices in an effort to achieve high-performance software delivery as measured by the four key software delivery metrics that are predictive of company performance according to DevOps Research and Assessment (DORA). However, the practices are not one-size-fits-all. As we chart a path toward greater throughput and stability within the constraints of your organization, we are guided by three overarching DevOps principles (credited to The DevOps Handbook by Gene Kim, Jez Humble, Patrick Debois, and John Willis):
- Flow: Enable the fast and efficient flow of work from idea to production.
- Feedback: Ensure there are mechanisms in place to generate feedback about the quality of software, the performance with users, the performance of the business, and the way in which the team works.
- Continual learning: Use mechanisms to propagate learning and turn it into an edge for your organization.
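The four DORA metrics are commonly listed as deployment frequency, lead time for changes, change failure rate, and time to restore service. As a rough illustration of how a team might track them, the sketch below summarizes a list of deployment records. The `Deployment` record, its field names, and the `delivery_metrics` function are hypothetical and are not part of any DORA tooling; real data would usually come from your pipeline and incident tracker.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median
from typing import List, Optional


@dataclass
class Deployment:
    """Hypothetical record of one production deployment."""
    committed_at: datetime                  # when the change was committed
    deployed_at: datetime                   # when it reached production
    failed: bool = False                    # did it cause a production failure?
    restored_at: Optional[datetime] = None  # when service was restored, if it failed


def delivery_metrics(deployments: List[Deployment], period_days: int) -> dict:
    """Summarize the four key delivery metrics over a reporting period."""
    if not deployments:
        raise ValueError("need at least one deployment")
    failures = [d for d in deployments if d.failed]
    # Only failures that have already been resolved contribute a restore time.
    restore_times = [d.restored_at - d.deployed_at for d in failures if d.restored_at]
    return {
        "deployments_per_day": len(deployments) / period_days,
        "median_lead_time": median(d.deployed_at - d.committed_at for d in deployments),
        "change_failure_rate": len(failures) / len(deployments),
        "median_time_to_restore": median(restore_times) if restore_times else None,
    }
```

Medians are used here instead of means so that one unusually slow change does not dominate the summary; that is a design choice of this sketch, not a DORA requirement.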
Flow
Flow focuses on practices that enable the fast flow of work from “concept to cash.”
Continuous delivery pipeline
We lower the friction of deploying even the smallest changes by building the capability to deliver software on demand. This lets you realize the value of new features sooner, accelerate learning, and minimize the risk of any individual release.
Components of a continuous delivery pipeline include:
- Artifact repositories like Sonatype Nexus or JFrog Artifactory that ensure that we use stable open-source or internal versions of software and improve the security posture of software dependencies.
- Infrastructure as Code (IaC) tools like Terraform or CloudFormation, which automate the creation of reproducible software environments, making changes visible and reducing manual errors.
- Deployment scripts that support safe, consistent deployments for your chosen compute model (e.g., Kubernetes, Serverless, or virtual machines) through techniques such as blue-green or canary deployments.
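To make the canary technique mentioned above more concrete, here is a minimal sketch of the decision a deployment script might make after routing a small share of traffic to a new version. The function name, the thresholds, and the three outcomes are illustrative assumptions; real pipelines typically read these counts from a monitoring system and compare more signals than error rate alone.

```python
def canary_decision(
    baseline_errors: int,
    baseline_requests: int,
    canary_errors: int,
    canary_requests: int,
    max_ratio: float = 1.5,      # assumed tolerance relative to the baseline
    error_floor: float = 0.001,  # assumed error rate that is always acceptable
    min_requests: int = 100,     # assumed minimum sample before deciding
) -> str:
    """Return "wait", "promote", or "rollback" for a canary release."""
    # Too little canary traffic to judge: keep observing.
    if canary_requests < min_requests:
        return "wait"
    baseline_rate = baseline_errors / baseline_requests if baseline_requests else 0.0
    canary_rate = canary_errors / canary_requests
    # The canary may be somewhat worse than the baseline, but never needs
    # to beat the floor, so a near-perfect baseline does not block releases.
    allowed = max(baseline_rate * max_ratio, error_floor)
    return "rollback" if canary_rate > allowed else "promote"
```

For example, with a baseline of 10 errors in 10,000 requests, a canary showing 5 errors in 1,000 requests exceeds the allowed rate and would be rolled back.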
Automated testing
Stride engineers strongly prefer a test-first approach to building software through test-driven development. Among other benefits, it improves the safety and confidence the team has in making changes. These automated tests run as part of the deployment process. The deployment proceeds only when the tests pass.
Automated testing makes use of the following components:
- Static analysis tools can catch syntax errors, type mismatches, and security flaws. Some examples include ESLint, javac compiler checks, and Snyk.
- Testing frameworks and test runners like Jest, JUnit, pytest, RSpec, Appium, and Cypress are often language-dependent and provide tooling for writing and running tests.
- Assertion and specification tools like Chai and Cucumber provide alternative ways to express tests, such as those underpinning behavior-driven development.
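As a small illustration of the test-first style, the sketch below shows tests written before the code they exercise, using Python's built-in `unittest` module. The `next_version` function and its behavior are hypothetical examples chosen for brevity, not something taken from a Stride project.

```python
import unittest


class NextVersionTest(unittest.TestCase):
    """Written first: these tests describe the behavior we want."""

    def test_patch_bump_increments_last_part(self):
        self.assertEqual(next_version("1.4.2", "patch"), "1.4.3")

    def test_minor_bump_resets_patch(self):
        self.assertEqual(next_version("1.4.2", "minor"), "1.5.0")

    def test_major_bump_resets_minor_and_patch(self):
        self.assertEqual(next_version("1.4.2", "major"), "2.0.0")

    def test_unknown_bump_is_rejected(self):
        with self.assertRaises(ValueError):
            next_version("1.4.2", "huge")


def next_version(current: str, bump: str) -> str:
    """Written second: just enough code to make the tests pass."""
    major, minor, patch = (int(part) for part in current.split("."))
    if bump == "major":
        return f"{major + 1}.0.0"
    if bump == "minor":
        return f"{major}.{minor + 1}.0"
    if bump == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown bump type: {bump!r}")
```

Saved to a file, the tests can be run with `python -m unittest`, and a pipeline step can use the command's exit code to decide whether the deployment proceeds.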
Continuous integration
Continuous integration is the practice of integrating new code into a main branch in source control several times a day. Continuous integration minimizes the divergence between production code and code that is under development.
Often, the longer the divergence lasts, the more rework is needed. Rework reduces throughput and velocity. It also increases the probability of defects.
Continuous integration relies on a few bedrock tools and practices:
- Version control systems like Git and branching strategies like trunk-based development enable many developers to frequently integrate small changes.
- Automated builds should run on every integration using tools like CircleCI, Jenkins, or GitHub Actions.
Feedback
Feedback focuses on practices that generate fast feedback from systems.
Understand systems with telemetry and observability
Telemetry is the discipline of tracing, logging, and monitoring infrastructure. Teams need to monitor a running system, especially after each new release, to rapidly detect defects or service degradation. Rich observability tools can help infer the causes of issues and even anticipate problems.
We have used many tools to get feedback from software systems:
- Log aggregation tools like Logstash, Fluentd, Sumo Logic, or Splunk provide insight into application events occurring during a time period of interest.
- Monitoring tools like Prometheus, Nagios, and AWS CloudWatch provide key infrastructure metrics.
- Observability tools like Datadog, New Relic, Grafana, and Honeycomb help teams infer the behavior of many interacting services.
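Tools like these are generally easier to work with when applications emit structured events instead of free-form text. The sketch below, built only on Python's standard `logging` module, writes one JSON object per log line. The `JsonFormatter` and `build_logger` names, the `context` convention, and the example fields are assumptions for illustration; each aggregation tool has its own preferred schema.

```python
import json
import logging
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "timestamp": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Fields passed as extra={"context": {...}} are attached to the record.
        payload.update(getattr(record, "context", {}))
        return json.dumps(payload)


def build_logger(name: str) -> logging.Logger:
    """Create a logger that writes JSON lines to standard error."""
    logger = logging.getLogger(name)
    handler = logging.StreamHandler()
    handler.setFormatter(JsonFormatter())
    logger.handlers = [handler]
    logger.setLevel(logging.INFO)
    logger.propagate = False  # avoid duplicate lines from the root logger
    return logger
```

A call such as `build_logger("checkout").info("order placed", extra={"context": {"release": "v42"}})` would then produce a line that can be filtered by release after a deployment.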
Hypothesis-driven development
Analytical techniques like A/B testing leverage data from existing users to guide future product development. Tools like Optimizely enable this kind of hypothesis-driven development that goes beyond user interviews.
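To show the kind of calculation that sits behind an A/B test, here is a minimal sketch of a two-proportion z-test using only the standard library. The function name and the example numbers are hypothetical, and this is a swapped-in textbook technique, not a description of how Optimizely or any other product computes its results.

```python
from math import erfc, sqrt


def ab_test_p_value(conv_a: int, n_a: int, conv_b: int, n_b: int) -> float:
    """Two-sided p-value for the difference between two conversion rates.

    conv_a / n_a: conversions and visitors for variant A (the control).
    conv_b / n_b: conversions and visitors for variant B (the experiment).
    """
    rate_a = conv_a / n_a
    rate_b = conv_b / n_b
    # Under the null hypothesis both variants share one conversion rate.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if std_err == 0:
        return 1.0  # no variation at all, so no evidence of a difference
    z = (rate_b - rate_a) / std_err
    # For a standard normal Z, P(|Z| > |z|) equals erfc(|z| / sqrt(2)).
    return erfc(abs(z) / sqrt(2))
```

A team might state its hypothesis and significance threshold (for example 0.05) before the experiment starts, then compare the returned p-value against that threshold once enough visitors have been observed.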
Increase quality with reviews
Stride leans on pair programming and frequent pair rotation as a means of continuously reviewing code and easing the biggest bottleneck in software development: shared context. We’ve found that this form of feedback dramatically improves the quality of code and technical decisions, and promotes shared code ownership between developers better than any other technique.
Continual learning and experimentation
Continual learning and experimentation incorporate practices that propagate individual learning and convert it into organizational opportunities.
Learning culture
The faster teams learn and propagate learning, the quicker they can improve the quality of the system and the decisions that go into building products. Stride teams propagate learnings through “Today I Learned” (TIL) messages, Lunch and Learn sessions, and lightning talk sessions. Finally, when appropriate, Striders follow a lightweight Pair Feedback routine at the end of each day where they reflect on what worked and what could be improved in the future.
Psychological safety
When working in systems with an established user base, production issues inevitably arise. We support teams in holding blameless postmortems, which nurture curiosity and the impulse to leave the system better than you found it. Teams also hold regular sprint retrospectives to incrementally improve their process.
Resilience
Software can be resilient to failures when resilient teams build and maintain the software. Rehearsing incidents, constantly reviewing alarm thresholds, and being selective about the metrics driving performance help foster resilient teams.