Throughout the last decade, cloud computing has disrupted the technology industry and beyond. It has opened up new ways for engineering teams to build solutions and work with end customers. Our engineering teams now have access to highly scalable infrastructure, the ability to rapidly deploy across the globe and, in most cases, the ability to pay for what we use rather than paying high upfront costs.
But this change goes beyond technology implementation and touches how we work as software engineering teams. In some cases, it has caused us to adopt new internal processes, work more iteratively with our customers and, ultimately, adopt a different mindset. As an industry, we’re transitioning from shipping products to shipping services.
In this blog, we explore how this impacts software engineering teams and end users and how you can set yourself up for success on your service-led journey.
The role of a developer has evolved
The role of a developer has changed drastically, whether you compare it over the last 10, 20, or 30 years or beyond. Successive waves have driven that change: mainframes, graphical user interfaces, the internet, cloud computing and, most recently, AI and Large Language Models (LLMs). Developers have refined their practices through each of these shifts to keep up to date with the latest technologies and paradigms.
Diving into some of these evolutions further:
- Working with multiple programming languages and frameworks. With cloud computing and microservices, individual engineering teams are empowered to choose the most appropriate programming language, framework and platform for their needs.
- Infrastructure-as-code (IaC). Cloud platforms provide API access to manage and configure the underlying infrastructure that our teams build upon. Scaling infrastructure to deal with unexpected spikes or spinning up a deployment stamp in another region has never been easier.
- Increased consumer expectations. Imagine if your favorite shopping app was unavailable. What would you do? You’d probably complete your purchase elsewhere, and may even switch your loyalty. Those expectations translate into enhanced engineering requirements, such as Service Level Agreements (SLAs), Recovery Time/Point Objectives (RTOs/RPOs), fast page loads and more, which add engineering complexity and pressure for our teams.
Throughout these evolutions, there is a subtle underlying expectation that I haven’t explicitly called out: the lines between development, testing, operations and similar disciplines begin to blur. This isn’t surprising, as removing silos and focusing on delivering value for your customers is one of the critical premises of a DevOps cultural transformation.
This means that many of our engineers are likely cross-skilling and learning skills outside their area of expertise. Developers are learning more about infrastructure as code and networking concepts. Operations teams are learning more about the applications and design patterns being adopted. This is before we even consider platforms like Kubernetes, where the responsibilities are further intertwined.
This is different from years ago, when software engineering teams would ship approved application versions to users on discs (take your choice of CD or floppy disk). Product teams would prioritize the most critical features, while customer feature requests might have to wait for a later product release, months or even years away. The cadence and pace of releases were also slower than they are now.
One example is Microsoft’s transition from Office as a boxed product to Office 365. This has also translated to subscription-based payments and customer-focused models, where updates are shipped frequently and quality is essential to continuously delighting customers and delivering value through the platform.
But this change doesn’t happen overnight; it is a journey. The DevOps transformation you’ve likely already embarked on is a vital part of that. Let’s identify some of the common challenges and strategies to overcome them.
Navigating the change: challenges and strategies
We’ve covered some of those strategies and challenges in our introduction, so let’s examine them more explicitly.
Shipping more frequently, safely
One of the promises of DevOps is to break down barriers across teams. This leads to a mindset where the engineering team thinks about the application holistically rather than application versus infrastructure. This subtle change already shifts the perspective away from product development. The team now takes end-to-end accountability for the build, testing and release to customers, shifting the focus to the customer’s experience rather than the internal friction and product development viewpoint.
Automation is commonly adopted to enable this change, particularly in Continuous Integration (CI) and Continuous Delivery (CD).
CI allows you to regularly check that the code compiles successfully and that the project passes some tests.
Have you started writing code, only to find that it does not compile? Or, perhaps you’ve written a new method, but found other parts of the codebase are not passing their tests?
Building a robust CI process helps you gain confidence in the quality of the software that you’re creating. CD then allows you to automate the release of that application to your target environments, and progress those towards production.
Have you ever accidentally executed a script intended for dev against a production environment? How about manually publishing a new release, only to find that you have incorrectly configured the environment? Or attempting to diagnose why the application will not deploy to your environment after following your internally documented steps?
CD allows you to remove the overhead that can slow down a release of the software to your users. Most importantly, the potential for error-prone human intervention is reduced, and a consistent release process enables more releases.
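As a rough illustration of the flow described above, the sketch below models a pipeline as an ordered list of stages, where the CI stages (build, test) must pass before any CD stage (deploy) runs. The `Stage` type, the `run_pipeline` function and the stage names are hypothetical and not tied to any particular CI/CD product; real stages would call your build, test and deployment tooling.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Stage:
    """One step in the pipeline (hypothetical type for illustration)."""
    name: str
    run: Callable[[], bool]  # returns True when the step succeeds


def run_pipeline(stages: List[Stage]) -> Tuple[bool, List[str]]:
    """Run stages in order and stop at the first failure.

    Returns (succeeded, names of the stages that completed successfully).
    """
    completed: List[str] = []
    for stage in stages:
        if not stage.run():
            return False, completed
        completed.append(stage.name)
    return True, completed


# Placeholder steps: real ones would call a compiler, test runner or deploy tool.
pipeline = [
    Stage("build", lambda: True),
    Stage("test", lambda: False),  # simulate a failing test suite
    Stage("deploy-to-dev", lambda: True),
    Stage("deploy-to-production", lambda: True),
]

succeeded, completed = run_pipeline(pipeline)
print(succeeded, completed)
```

Because the simulated test stage fails, neither deploy stage runs. That ordering is the point of the sketch: a release only progresses once the earlier checks have passed, without relying on someone remembering to run them.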
As organizations gain confidence in their processes, they may begin shipping directly to production. For example, an engineering team might ship versions of their product to the engineering organization and use that in their day-to-day work. Critically, if the application has flaws, it will impact the engineering team’s productivity. Therefore, standards need to be followed, and a level of quality needs to be baked into the process.
Bring quality into everything that you do
Quality can mean many different things, and entirely depends on where you are in your transformation:
- Does the code compile?
- Does the code successfully pass your tests?
- Does the code meet a certain level of test coverage?
But we can also consider a wider variety of checks:
- Is the user interface accessible for all users (for example, screen readers, high contrast, and similar)?
- Does the application scale to our expectations?
- Does the application respond in an expected timeframe (for example, less than a certain number of milliseconds)?
- Is the application as reliable as we expect it to be? What happens if a core service fails?
- Are there vulnerable code paths in the code that we have written?
- Are we relying upon vulnerable dependencies in our project?
- Have we accidentally leaked secrets, putting the continuity of our services at risk?
This is where several of the puzzle pieces start coming together. Cloud and IaC allow us to create new environments on demand as part of our automated processes. With that, we can begin bringing scalability tests, performance tests, chaos tests, accessibility tests and more to a running instance of our service, deployed to an environment that is like-for-like with production.
That way, we’re no longer hypothesizing whether the application can scale. Instead, we’re able to deploy a version of the application, and simulate the expected scenarios.
However, these checks should not first occur on builds/releases from our production codebase. Ideally, we should be bringing many of these checks earlier into our development flow, checking for quality in a pull request before we merge to the main branch.
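One way to picture these checks running on a pull request is as a quality gate: the results collected for a change are compared against agreed thresholds, and the merge is blocked if any of them fail. The sketch below is a minimal illustration under assumptions of my own. The metric names, the threshold values and the `evaluate_quality_gate` function are all hypothetical, and in practice the numbers would come from your test, performance and security tooling.

```python
from typing import Dict, List

# Hypothetical thresholds; choose values that match your own service goals.
THRESHOLDS = {
    "min_coverage_percent": 80.0,
    "max_p95_latency_ms": 300.0,
    "max_vulnerable_dependencies": 0,
    "max_leaked_secrets": 0,
}


def evaluate_quality_gate(metrics: Dict[str, float]) -> List[str]:
    """Return a list of failures; an empty list means the gate passes."""
    failures: List[str] = []
    if metrics["coverage_percent"] < THRESHOLDS["min_coverage_percent"]:
        failures.append("coverage below threshold")
    if metrics["p95_latency_ms"] > THRESHOLDS["max_p95_latency_ms"]:
        failures.append("p95 latency above threshold")
    if metrics["vulnerable_dependencies"] > THRESHOLDS["max_vulnerable_dependencies"]:
        failures.append("vulnerable dependencies found")
    if metrics["leaked_secrets"] > THRESHOLDS["max_leaked_secrets"]:
        failures.append("leaked secrets found")
    return failures


# Example metrics as they might be collected for one pull request.
pr_metrics = {
    "coverage_percent": 84.5,
    "p95_latency_ms": 420.0,
    "vulnerable_dependencies": 0,
    "leaked_secrets": 0,
}

failures = evaluate_quality_gate(pr_metrics)
print("PASS" if not failures else f"FAIL: {failures}")
```

Returning every failure, rather than stopping at the first, gives the author of the pull request the full picture in a single run.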
Customer-focused development, not product-focused
So far, we’ve been on a journey: building automation and quality into everything we do so that we can frequently deliver updated product versions to our customers.
But as we ship more frequently and move towards a cloud deployment model, our customers’ expectations shift. Depending on the service, downtime may be unacceptable. Similarly, customers may expect transactions to be completed within a given time or that the application meets specific accessibility standards.
The most important part here is the customer, and understanding what is most important to them. From a business perspective, this is important for several reasons:
- To attract new users, you must be perceived as bringing value to a prospective customer. If your application is hard to use, slow, or does not meet their needs, they will be unlikely to choose you over a competitor.
- In a service model, organizations typically pay for access through subscriptions. These subscriptions are recurring payments (potentially monthly or annual), meaning it’s vital to continually demonstrate value. If you don’t, you risk customer churn to competitors.
Rather than believing we know what customers want, we must know what they want by building feedback loops into our platforms, building communities with our top users, and, ultimately, using their feedback to prioritize our backlog.
Continuous learning and improvement
As I’ve outlined, running these services can come with high expectations. Depending on your size, scale, or even the types of customers, you may have a high set of availability targets to reach.
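To make an availability target tangible, it can help to convert it into the amount of downtime it allows. The sketch below shows that arithmetic; the function name and the 30-day period are assumptions chosen for illustration, and a real target would use whatever measurement window your agreements define.

```python
def downtime_budget_minutes(availability_percent: float, period_days: int = 30) -> float:
    """Minutes of downtime allowed in the period for a given availability target."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability_percent must be between 0 and 100")
    total_minutes = period_days * 24 * 60
    return total_minutes * (100 - availability_percent) / 100


for target in (99.0, 99.9, 99.99):
    print(f"{target}% -> {downtime_budget_minutes(target):.1f} minutes per 30 days")
```

Over 30 days, a 99.9% target leaves roughly 43 minutes of downtime, while 99.99% leaves a little over 4 minutes. Each extra nine shrinks the budget tenfold.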
Things will go wrong, and that’s okay, provided you demonstrate continuous learning, identify areas for improvement, and remediate them for the future. When something goes wrong, conduct a blameless retrospective. The idea is not to find who to blame, but to discover the root cause of the problem and determine how automation, refined processes, additional tests, and more can be adopted to prevent similar issues from happening in the future. Treat each failure as an opportunity to learn and improve.
Much like the functional requirements I’ve alluded to throughout the process (specific platform features), non-functional requirements should also be considered on our backlog. Communicating with transparency and with your customers in mind is vital. What have you identified? What are you going to do about it? And how are you going to improve it in the future?
From strategies to action
We’ve talked through how the industry has transformed, and the strategies that can overcome the potential challenges which may arise. But how do we put this into practice?
Implement CI/CD
Consider your current development practices:
- Have you removed manual, error-prone steps? Manual tasks, whether typing a configuration value or dragging and dropping deployment files, can quickly go wrong.
- Are there enough quality checks in place today? You are aiming to provide an exceptional experience to your customers. Are you including checks and balances within the development lifecycle to deliver on that? As challenges and live site issues arise, are you adding those to the backlog to prevent future repercussions?
- Is your production codebase always in a shippable state? Are you preventing your engineering teams from directly committing to the production codebase? In other words, have you implemented branch protection rules or repository rules to ensure changes are made through pull requests? This will allow you to run some of these checks earlier in your development process, preventing issues from ever reaching the production codebase.
If you’ve answered ‘yes’ to all three questions, then you’re progressing along the right lines! Investing in automation will help bring consistency and rigor to your deployment lifecycle.
‘Everything’ as code
‘Code’ has historically referred to application code. But now, we can create our infrastructure and CI/CD workflows as code. This evolution has a subtle set of benefits which are worth remembering.
Our code is stored in version control, which means we’re able to reuse the same practices that we’re used to with our application code:
- Changes can be viewed over time, allowing us to understand who made a change and why.
- You can use pull requests to ensure that each change is reviewed by at least one other human. After all, two heads are better than one, as they say!
- Changes to workflow code appear next to app-code changes in the same commit or pull request, allowing us to understand and verify their relationship.
- Automation can be triggered based on changes to a branch of code, and can surface results in a commonly-visible area (the pull request). Therefore, whether we come from an operations, traditional development, or data science background, we can benefit from automating quality checks by storing our code in version control.
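As a sketch of what infrastructure ‘as code’ can look like, the example below treats the desired infrastructure as data that could live in version control, and computes the changes needed to bring the current state in line with it. The resource names, the region value and the `plan` function are hypothetical; real IaC tools work on far richer models, but the declare-then-reconcile idea is similar.

```python
from typing import Dict, List, Tuple

Resources = Dict[str, Dict[str, object]]

# Desired state, as it might be declared in a file under version control.
desired: Resources = {
    "web": {"region": "eu-west", "instances": 4},
    "worker": {"region": "eu-west", "instances": 2},
}

# Current state, as it might be reported by a cloud provider's API.
actual: Resources = {
    "web": {"region": "eu-west", "instances": 2},
    "legacy-cron": {"region": "eu-west", "instances": 1},
}


def plan(desired: Resources, actual: Resources) -> List[Tuple[str, str]]:
    """Compute the (action, resource) steps that turn `actual` into `desired`."""
    steps: List[Tuple[str, str]] = []
    for name in sorted(desired):
        if name not in actual:
            steps.append(("create", name))
        elif desired[name] != actual[name]:
            steps.append(("update", name))
    for name in sorted(actual):
        if name not in desired:
            steps.append(("delete", name))
    return steps


print(plan(desired, actual))
```

Because the desired state is plain data, a change to it can be reviewed in a pull request like any other change, and the computed plan can be surfaced there before anything is applied.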
Check out this blog if you’re interested in learning more about GitOps, and how you can operate your infrastructure using similar techniques to application code.
Many enterprises operate in silos, but building a culture of innovation, collaboration, and learning is possible. The open source community builds openly, asynchronously, globally, and at scale, and offers many lessons that we can adopt for our internal development.
Check out how you can reuse these practices for your internal software development.
This is the first step of a more extensive cultural transformation. Once you’ve removed those silos, consider how you can begin collaborating across your organization. What are the most critical customer priorities, and how are you collectively working towards solving those? Are you capturing and sharing feedback so that it can be delivered to the right teams across the organization?
And finally, challenges happen. While no one enjoys live site incidents, they’re a natural part of our work. The vital part is being able to learn from those situations:
- What went wrong?
- How can you prevent it in the future?
- Have you communicated to your customers in a transparent and customer-focused way?
To put this into context, here are some of the ways that GitHub communicates openly:
- We publish a public roadmap, so you have an idea of our upcoming priorities.
- The community discussions area provides an opportunity for you to give us feedback, and for you to discuss your proven practices, challenges, and opportunities with other community members.
- The GitHub Blog provides updates on significant product announcements, thought leadership, and policy updates relating to the industry.
- The GitHub Changelog is a more granular set of updates on our recent releases, including incremental feature updates.
- Our GitHub Status page provides an up-to-date view of the platform’s operational health. If there are any ongoing issues, you’ll be able to find out about them here.
- GitHub publishes monthly availability reports, including a summary of each incident, the cause, and steps taken to improve for the future.
Conclusion
Through this blog, we’ve explored the journey from a product to a service mindset. Much of this has been driven by new industry opportunities and trends (such as DevOps transformation, cloud computing and IaC). Even so, there is much for us to consider along this path:
- Implement CI/CD to help you ship more frequently and safely by bringing quality into everything you do.
- Adopt an ‘everything as code’ mentality to benefit from the above automation practices.
- Promote a culture of innovation, collaboration and learning by pivoting your focus to the customer, accepting that things may go wrong, and adopting a culture of continuous improvement.
Written by
Chris is a passionate developer advocate and senior program manager in GitHub’s Developer Relations team. He works with execs, engineering leads, and teams from the smallest of startups, established enterprises, open source communities and individual developers, helping them love GitHub and unlock their software engineering potential.