Overview
This article explains the definition, necessity, implementation methods, and decision-making criteria for platform engineering.
Platform engineering is a specialized field that abstracts and provides complex infrastructure and operational processes as a common foundation, allowing developers to focus on application development. It fundamentally addresses issues such as the cognitive load on developers, bottlenecks in development speed, and the lack of standardization that become apparent as organizations scale.
First, this article will describe the challenges that platform engineering addresses and the components necessary for its realization (IDP, Golden Path). Next, it will clarify the specific benefits gained from implementation and the differences from related methodologies such as DevOps and SRE. Furthermore, it will present the four roles necessary for success (selection, abstraction, self-service, operational responsibility) and the required skill sets, concluding with criteria for determining when to implement.
Definition of Platform Engineering
Platform engineering is a specialized field that abstracts and provides complex infrastructure and operational processes as a "common foundation (platform)" so that developers can focus on application development.
By utilizing the platform, development teams can efficiently build, deploy, and manage applications.
Platform engineering streamlines the software development process within organizations and aims to improve developer productivity.
Why is Platform Engineering Necessary?
As organizations grow and the number of development teams and services increases, the following challenges may become apparent.
Increased Cognitive Load on Developers
With the spread of DevOps, developers have taken on the responsibility of "building and operating" their own applications. However, the areas they need to manage continue to expand, including cloud services, containers, orchestration, security, and compliance. Developers must acquire detailed knowledge of infrastructure in addition to their core application development tasks and manage complex configurations. This increased cognitive load leads to developer fatigue and decreased productivity.
Bottlenecks in Development Speed
When developers are forced to issue tickets to the operations team and wait for approvals every time they provision or deploy infrastructure, it lengthens the lead time for deployments. Constraints such as "it takes several days to prepare the environment" and "we must wait for the weekly release window for production deployment" delay time-to-market and increase the risk of missing business opportunities.
Lack of Standardization and Complex Maintenance
When multiple teams build their own unique tools, CI/CD pipelines, and workflows independently, it becomes difficult to ensure consistent quality and security across the organization. This leads to a situation where, if a responsible person leaves or transfers, "no one understands this configuration." Additionally, multiple teams redundantly solve similar technical challenges, resulting in significant waste across the organization.
Instability in Operations and Lack of Governance
When security and compliance requirements are added retroactively, significant friction arises in the development process. If different security settings and monitoring methods are adopted by each team, it becomes impossible to enforce governance across the organization, increasing the risk of vulnerabilities and incidents.
Platform engineering has emerged to fundamentally address these challenges. By abstracting complex infrastructure and operational processes and providing a common foundation that allows developers to develop safely and quickly in a self-service manner, it enhances productivity and reliability across the organization.
Components of Platform Engineering
To realize a platform, the following components are required.
Internal Developer Platform (IDP)
A toolset that abstracts technical complexity and enables self-service.
The main functions of an IDP are as follows:
- Service Catalog: A list of available services and templates
- Self-Service Provisioning: A mechanism that allows developers to create and manage resources on their own
- Integrated Documentation: Aggregated usage instructions and best practices for the platform
- Observability Dashboard: A centralized view of the application's status
Representative tools for building an IDP include:
- Backstage (by Spotify): A portal that integrates service catalogs and documentation
- Port: A platform that allows IDP construction without code
- Kratix: A framework for building platforms on Kubernetes
- Humanitec: An IDP specialized in application configuration management
Golden Path
Standardized templates and workflows that guarantee safe and rapid deployment.
Examples of Golden Paths include:
- Application Templates: Pre-configured project structures and CI/CD pipelines
- Infrastructure Templates: Standard configurations for Terraform modules and cloud resources
- Deployment Patterns: Implemented workflows for Blue/Green deployments, canary releases, etc.
- Security Settings: Configurations that include encryption, access control, and audit logs
Developers do not need to worry about "how to build"; by following the Golden Path, they naturally adhere to best practices. This allows even less experienced developers to build high-quality and secure systems.
Benefits of Platform Engineering
By implementing platform engineering, organizations can enjoy numerous benefits.
1. Increased Development Speed and Efficiency
With a self-service platform, developers can reduce the time spent waiting for requests or approvals from other departments. They can autonomously provision infrastructure, deploy, and set up environments, thereby shortening time-to-market.
2. Reduced Cognitive Load on Developers
By abstracting the complexities of infrastructure, cloud configurations, security, and scaling, the platform reduces the cognitive load on developers.
By working along the Golden Path, developers can naturally follow best practices, which enhances their productivity.
3. Strengthened Security and Compliance
In platform engineering, security is not an afterthought but is integrated into the platform from the design phase (secure by design).
The platform automatically applies consistent security policies, access controls, and compliance checks across all services.
4. Improved Consistency and Quality through Standardization
By providing standardized templates known as the Golden Path, consistent development practices are achieved across the organization.
This prevents the proliferation of disparate methods for each project, reducing maintenance complexity. Additionally, onboarding new members becomes smoother, allowing even less experienced developers to achieve high-quality and secure configurations.
5. Observability and Operational Stability
By ensuring integrated observability, insights into the performance of middleware and applications can be obtained. This allows issues to be detected and resolved before impacting end users.
6. Cost Optimization
Platform engineering achieves efficient resource utilization and reduces operational costs. By centralizing and automating common development processes, redundant tasks and waste of human resources can be eliminated. As a result, it becomes possible to reduce costs while improving profitability.
Differences Between Platform Engineering and DevOps
DevOps is about "culture and practices," while platform engineering is about "specific implementations and solutions."
While DevOps refers to the framework and practices aimed at "integrating development and operations," platform engineering implements and provides these practices as specific mechanisms (tools and foundations).
DevOps is a methodology and practice under the principle of "You build it, you run it," where development teams also take on operational responsibilities. However, as the areas of management such as infrastructure and security continue to expand, the burden on developers (cognitive load) becomes excessive.
On the other hand, platform engineering abstracts complex operational processes as a "self-service common foundation" while maintaining the ideals of DevOps. It provides an "environment" as a product that allows developers to release safely and quickly without deep expertise, thereby improving the development experience.
Differences Between Platform Engineering and SRE
SRE focuses on "production reliability," while platform engineering focuses on "developer productivity."
The customers of SRE (Site Reliability Engineering) are "end users," and it is responsible for ensuring the "reliability (stability)" of services.
In contrast, the customers of platform engineering are "internal developers," and it aims to enhance productivity in development by providing the platform.
Four Roles of Platform Engineering
The aforementioned components (IDP, Golden Path) indicate "what to create," but these four roles define "how to promote it with principles and mindsets." Platform engineering brings value to organizations by promoting it with the same mindset as "product development" based on the following four roles.
1. Selection as a Product
Define the platform not merely as "infrastructure" but as a "product" with internal developers as customers. The principle is to reduce the cognitive load on developers and allow them to focus on delivering value, strategically providing "the shortest path to success" rather than unlimited freedom.
In practice, both product and selection are crucial. While being customer (developer) centric, it is necessary to clearly delineate what will not be done if it does not align with the organization's governance. As a Golden Path, provide a single workflow that integrates multiple tools for most standard needs. Additionally, as a railway (Rails), build a new common foundation to fill gaps such as batch processing and notifications that are lacking internally.
2. Software-Based Abstraction
A strong engineering attitude is required: "If you are not writing software, it is not platform engineering." The principle is to provide an excellent developer experience (Developer Experience) through automation and abstraction to enhance developer satisfaction and productivity.
In practice, providing stable APIs is crucial. Abstract the complexity of underlying OSS or vendor products into "stable APIs" that are easy for applications to use. Additionally, provide components that enhance reliability, such as caching, sharding, and sidecars, to improve reliability and performance without changing the application code. Furthermore, establish a system for managing metadata and visualization that allows for centralized management of ownership, costs, and access rights. Note that building an IDP (portal) is not mandatory; priorities should be determined based on the "pain points" of the field.
3. Self-Service with Guardrails
Provide an environment where developers can work autonomously while incorporating "guardrails" that ensure safety and governance. The principle is to strike a balance between freedom and constraints, creating a system that allows developers to develop quickly and confidently, enhancing productivity across the organization from beginners to power users.
In practice, enable developers to provision environments on their own through UI, CLI, and CI integration. At the same time, implement "guardrails" in advance to prevent critical configuration errors and cost increases, achieving self-service with guardrails. Additionally, when issues arise, it is important to provide a debugging experience that allows developers to immediately discern whether the problem is due to "their code or a platform issue."
4. Operational Responsibility as a Foundation
It is not enough to provide tools; one must take responsibility for the entire lifecycle afterward. Recognizing that the reliability of the platform itself is the foundation of the business, the principle is to prioritize operational stability.
In practice, take end-to-end responsibility for all operations, including self-written code as well as the behavior of dependent OSS and cloud vendors. While promoting self-service, build a "support system" and "positive culture" that can respond to advanced production failures and diverse technical questions, achieving democratization of operations and support. Furthermore, by making anomaly detection and proactive responses habitual, and continuously managing the platform's SLO/SLA, maintain operational discipline that safeguards the reliability of the business foundation.
Skill Set for Platform Engineers
Platform engineers are involved in both development and operational responsibilities, requiring a diverse set of soft and technical skills.
Soft Skills
To succeed in platform engineering, the following soft skills are considered important:
- Collaboration and Communication: Work closely with software developers, DevOps, and Site Reliability Engineering (SRE) teams to understand their needs and reflect them in platform functionality. The platform is a "product," and dialogue with developer customers is essential.
- Problem-Solving Ability: Tackle complex infrastructure challenges and continuously improve platform functionality. Find optimal solutions within technical constraints.
- Empathy for Developers: Developers are the primary customers. Focus on Developer Experience, designing the platform to reduce friction and support productivity. Think from the developers' perspective and build a platform that they want to use.
Technical Skills
Platform engineers should be proficient in the following technical areas:
- Infrastructure as Code: Manage and automate infrastructure provisioning as code. Build reproducible infrastructure through declarative configuration management.
- Containers and Orchestration: Manage containers, automate deployment of microservices, and scale efficiently. Operate distributed systems effectively.
- CI/CD Pipelines: Build continuous integration and delivery systems. Design mechanisms that automatically test, build, and deploy when developers push code.
- Monitoring and Observability: Visualize system health and detect issues early. Define SLI/SLO and set alerts appropriately.
- Security Practices: Understand security principles such as encryption, access control, secret management, and vulnerability scanning, and incorporate them into the platform. Reflect the secure by design concept in implementations.
- Programming Skills: Develop automation tools and CLIs. Not only combine existing tools but also create unique tools to solve platform-specific challenges.
When to Implement Platform Engineering
Platform engineering is not suitable for every organization. Implementation incurs the cost of a dedicated team to nurture the platform as a product, so it is essential to assess the organization's growth stage and technical maturity before making a decision.
In small teams or simple configurations, the overhead of building a platform (management costs) may outweigh the benefits gained. It is crucial to carefully evaluate whether the platform is "worth investing in as a product" from perspectives such as "Is development speed clearly slowing down due to dependence on the operations team?" and "Are there multiple teams that would benefit from standardization?"
Conclusion
Platform engineering enhances organizational competitiveness by reducing cognitive load on developers and improving development speed and productivity. Its true value arises not from the technical implementation itself, but from a shift in organizational mindset that views the platform as a "product" and considers developers as customers.
The platform should be provided as a product, continuously reflecting feedback from developers and designed with the entire service lifecycle in mind. By embracing this change and practicing the principles of platform engineering, organizations can achieve improved developer productivity and strengthened competitiveness overall.
References
- www.oreilly.co.jp - Platform Engineering: Guidelines for Building Successful Platforms and Teams
- www.gartner.com - What is Platform Engineering?
- learn.microsoft.com - What is platform engineering?
- newrelic.com - What is Platform Engineering? A Clear Explanation of the Basics
- platformengineering.org - What is platform engineering?
- payara.fish - What is Platform Engineering – An Overview without Buzzwords
- www.gartner.com - Platform Engineering That Empowers Users and Reduces Risk
- cloud.google.com - Transitioning with Google Cloud's Platform Engineering
- cloud.google.com - Light the way ahead: Platform Engineering, Golden Paths, and the power of self-service
- cloud.google.com - Laying the foundation for a career in platform engineering
- github.com - What is Platform Engineering
- internaldeveloperplatform.org - Internal Developer Platform
- www.thoughtworks.com - The evolution of platform engineering
- www.youtube.com - Maturing your platform engineering initiative - Nicki Watt PlatformCon 2024