Platform Engineering has emerged as one of the most transformative approaches to software delivery in 2025. As organizations scale, the cognitive load on developers grows sharply—they're expected to manage infrastructure, navigate complex deployment pipelines, handle security requirements, and still ship features quickly. Platform Engineering addresses this by creating Internal Developer Platforms (IDPs) that abstract away complexity while providing developers with self-service capabilities. This guide covers the concepts, tools, and practices you need to build a successful platform engineering practice.
What is Platform Engineering?
Platform Engineering is the discipline of designing and building toolchains and workflows that enable self-service capabilities for software engineering organizations. The goal is to reduce cognitive load on developers by providing golden paths—pre-configured, well-supported ways to accomplish common tasks.
"Platform Engineering is about building a product for developers. The platform team's customers are the developers in your organization, and your success is measured by their productivity and satisfaction."
— Camille Fournier, Author of 'The Manager's Path'
Platform Engineering vs DevOps vs SRE
- DevOps: A culture and set of practices that emphasizes collaboration between development and operations teams
- SRE (Site Reliability Engineering): Applies software engineering principles to infrastructure and operations problems
- Platform Engineering: Builds and maintains the platform and tools that enable developers to be self-sufficient
These disciplines are complementary, not competing. Platform Engineering operationalizes DevOps and SRE principles into a product that development teams consume.
The rise of platform engineering reflects a maturation in how organizations think about developer productivity. Early DevOps efforts focused on breaking down silos between development and operations teams. SRE brought software engineering discipline to operations. Platform Engineering takes the next step by recognizing that the tooling, workflows, and infrastructure that enable developers should be treated as a product—with its own roadmap, user research, and success metrics.
The Internal Developer Platform (IDP)
An Internal Developer Platform is the foundation of platform engineering. It's a self-service layer that sits on top of your infrastructure and tooling, providing developers with everything they need to build, deploy, and operate applications without tickets or waiting for other teams.
The concept of an IDP emerged from a common pattern observed across high-performing engineering organizations. Companies like Spotify, Netflix, and Airbnb built internal platforms that dramatically improved their developers' productivity. These platforms abstracted away infrastructure complexity while providing powerful self-service capabilities. What used to require days of coordination and multiple handoffs could now be accomplished in minutes by developers themselves.
A well-designed IDP addresses several key challenges. It reduces cognitive load by providing sensible defaults and hiding unnecessary complexity. It accelerates onboarding by giving new developers everything they need to be productive quickly. It improves consistency by establishing standard patterns for common tasks. And it enables scale by allowing platform teams to support many more developers than would be possible with a ticket-based approach.
Core Capabilities of an IDP
An IDP typically consists of multiple layers, each serving a specific purpose. The developer portal provides discovery, documentation, and self-service interfaces. The orchestration layer coordinates between different tools and systems. Individual capability domains—CI/CD, infrastructure, observability, security—provide the actual functionality developers need.
┌──────────────────────────────────────────────────────────────────────────────┐
│ INTERNAL DEVELOPER PLATFORM (IDP) │
├──────────────────────────────────────────────────────────────────────────────┤
│ │
│ ┌─────────────────────────────────────────────────────────────────────────┐ │
│ │ DEVELOPER PORTAL (UI/CLI) │ │
│ │ • Service Catalog • Documentation • Self-Service Workflows │ │
│ └─────────────────────────────────────────────────────────────────────────┘ │
│ │ │
│ ┌─────────────────────────────────────────────────────────────────────────┐ │
│ │ PLATFORM ORCHESTRATION │ │
│ │ • Backstage/Port • Score/Humanitec • Custom Orchestrator │ │
│ └─────────────────────────────────────────────────────────────────────────┘ │
│ │ │
│ ┌─────────────┬──────────────┬────────────────┬────────────────────┐ │
│ │ CI/CD │Infrastructure│ Observability │ Security │ │
│ │ │ │ │ │ │
│ │• GitHub │• Terraform │• Prometheus │• Vault │ │
│ │ Actions │• Crossplane │• Grafana │• OPA │ │
│ │• ArgoCD │• Pulumi │• OpenTelemetry │• Trivy │ │
│ │• Tekton │• AWS CDK │• PagerDuty │• Falco │ │
│ └─────────────┴──────────────┴────────────────┴────────────────────┘ │
│ │ │
│ ┌─────────────────────────────────────────────────────────────────────────┐ │
│ │ INFRASTRUCTURE LAYER │ │
│ │ Kubernetes • Cloud Providers • Databases • Message Queues │ │
│ └─────────────────────────────────────────────────────────────────────────┘ │
│ │
└──────────────────────────────────────────────────────────────────────────────┘

This architecture diagram illustrates how different IDP components fit together. At the top, the developer portal provides the primary interface—whether through a web UI, CLI, or both. The orchestration layer coordinates between the portal and underlying capabilities, handling workflows that span multiple tools. Individual capability domains provide the actual functionality, each potentially backed by multiple tools that the platform abstracts away.
Building Your IDP: A Step-by-Step Approach
Building an effective IDP is an iterative process. Start with the highest-impact capabilities and expand based on developer feedback. Here's a proven approach.
A common mistake when building an IDP is trying to do everything at once. Organizations attempt to build a complete platform from day one, resulting in a multi-year project that delivers value too slowly. A more effective approach is to identify the highest-pain, highest-value capabilities and deliver them incrementally. Each phase should provide tangible improvements that build trust and momentum for the next phase.
The phased approach also allows for learning and adaptation. Early phases reveal what developers actually need versus what you assumed they needed. Platform teams can adjust their roadmap based on real usage data and feedback. This iterative approach is essential because every organization is different—the specific pain points and priorities vary based on technology stack, team structure, and organizational culture.
Phase 1: Foundation - Developer Portal and Service Catalog
The first phase focuses on visibility and discovery. Before developers can use self-service capabilities, they need to understand what exists, who owns it, and how it fits together. A service catalog provides this foundation by documenting all services, their owners, dependencies, and relevant metadata. The developer portal makes this information accessible and searchable.
Backstage, originally developed by Spotify and now a CNCF project, has emerged as the leading open-source platform for building developer portals. It provides a plugin architecture that allows organizations to integrate their existing tools while presenting a unified interface to developers. The following configuration shows how to set up a Backstage instance with GitHub integration, Kubernetes visibility, and technical documentation:
# Backstage app-config.yaml - Developer Portal Configuration
app:
  title: Jishu Labs Developer Portal
  baseUrl: https://developer.jishulabs.com
organization:
  name: Jishu Labs
backend:
  baseUrl: https://developer-api.jishulabs.com
  database:
    client: pg
    connection:
      host: ${POSTGRES_HOST}
      port: ${POSTGRES_PORT}
      user: ${POSTGRES_USER}
      password: ${POSTGRES_PASSWORD}
integrations:
  github:
    - host: github.com
      token: ${GITHUB_TOKEN}
kubernetes:
  serviceLocatorMethod:
    type: multiTenant
  clusterLocatorMethods:
    - type: config
      clusters:
        - url: https://prod-cluster.k8s.local
          name: production
          authProvider: serviceAccount
        - url: https://staging-cluster.k8s.local
          name: staging
          authProvider: serviceAccount
catalog:
  import:
    entityFilename: catalog-info.yaml
    pullRequestBranchName: backstage-integration
  rules:
    - allow: [Component, System, API, Resource, Location, Template]
  locations:
    - type: url
      target: https://github.com/jishulabs/software-catalog/blob/main/all-components.yaml
    - type: url
      target: https://github.com/jishulabs/software-templates/blob/main/all-templates.yaml
techdocs:
  builder: external
  generator:
    runIn: docker
  publisher:
    type: awsS3
    awsS3:
      bucketName: jishulabs-techdocs
      region: us-west-2

# catalog-info.yaml - Service Definition
apiVersion: backstage.io/v1alpha1
kind: Component
metadata:
  name: order-service
  description: Handles order processing and management
  annotations:
    github.com/project-slug: jishulabs/order-service
    backstage.io/techdocs-ref: dir:.
    argocd/app-name: order-service
    prometheus.io/alert: 'order-service-alerts'
    pagerduty.com/integration-key: ${PAGERDUTY_KEY}
  tags:
    - python
    - fastapi
    - orders
  links:
    - url: https://grafana.jishulabs.com/d/order-service
      title: Grafana Dashboard
    - url: https://order-service.docs.jishulabs.com
      title: API Documentation
spec:
  type: service
  lifecycle: production
  owner: team-orders
  system: e-commerce-platform
  dependsOn:
    - resource:orders-database
    - component:inventory-service
    - component:payment-service
  providesApis:
    - order-api
  consumesApis:
    - inventory-api
    - payment-api
---
apiVersion: backstage.io/v1alpha1
kind: API
metadata:
  name: order-api
  description: REST API for order management
spec:
  type: openapi
  lifecycle: production
  owner: team-orders
  definition:
    $text: https://github.com/jishulabs/order-service/blob/main/openapi.yaml

The service definition above shows how metadata flows through the platform. Annotations link the service to external systems—ArgoCD for deployments, Prometheus for alerts, PagerDuty for incident management. Dependencies are explicitly declared, enabling the platform to visualize service relationships and identify blast radius during incidents. Tags enable filtering and discovery across the catalog.
Phase 2: Self-Service Infrastructure Provisioning
With visibility established, the next phase enables developers to provision resources themselves. Software templates define standardized ways to create new services, complete with CI/CD pipelines, monitoring, and deployment configurations. Instead of following a wiki page with manual steps, developers fill out a form and the platform handles the rest.
The template approach provides several benefits beyond convenience. It ensures consistency—every new service follows organizational standards for structure, testing, security, and deployment. It captures institutional knowledge that would otherwise live in team members' heads or outdated documentation. It enables enforcement of requirements like security scanning or documentation without blocking developers.
The following Backstage software template demonstrates a complete service creation workflow. When a developer fills out the form, the template creates a repository from a skeleton, provisions any required infrastructure, configures GitOps deployment, and registers the service in the catalog—all automatically:
# Backstage Software Template for New Service
apiVersion: scaffolder.backstage.io/v1beta3
kind: Template
metadata:
  name: microservice-template
  title: Create Microservice
  description: Create a new microservice with CI/CD, monitoring, and Kubernetes deployment
  tags:
    - recommended
    - microservice
spec:
  owner: platform-team
  type: service
  parameters:
    - title: Service Information
      required:
        - name
        - description
        - owner
      properties:
        name:
          title: Service Name
          type: string
          description: Unique name for the service
          ui:autofocus: true
          pattern: '^[a-z0-9-]+$'
        description:
          title: Description
          type: string
          description: What does this service do?
        owner:
          title: Owner
          type: string
          description: Team that owns this service
          ui:field: OwnerPicker
          ui:options:
            allowedKinds:
              - Group
    - title: Technical Choices
      required:
        - language
        - database
      properties:
        language:
          title: Programming Language
          type: string
          enum:
            - python-fastapi
            - nodejs-express
            - go-gin
            - java-spring
          enumNames:
            - Python (FastAPI)
            - Node.js (Express)
            - Go (Gin)
            - Java (Spring Boot)
        database:
          title: Database
          type: string
          enum:
            - postgresql
            - mysql
            - mongodb
            - none
          enumNames:
            - PostgreSQL
            - MySQL
            - MongoDB
            - No Database
        enableCache:
          title: Enable Redis Cache
          type: boolean
          default: false
    - title: Deployment Configuration
      properties:
        cpuRequest:
          title: CPU Request
          type: string
          default: '100m'
        memoryRequest:
          title: Memory Request
          type: string
          default: '256Mi'
        replicas:
          title: Initial Replicas
          type: integer
          default: 2
          minimum: 1
          maximum: 10
  steps:
    - id: fetch-template
      name: Fetch Template
      action: fetch:template
      input:
        url: ./skeleton/${{ parameters.language }}
        values:
          name: ${{ parameters.name }}
          description: ${{ parameters.description }}
          owner: ${{ parameters.owner }}
          database: ${{ parameters.database }}
          enableCache: ${{ parameters.enableCache }}
    - id: create-database
      name: Provision Database
      if: ${{ parameters.database !== 'none' }}
      action: http:backstage:request
      input:
        method: POST
        path: /api/infrastructure/database
        body:
          name: ${{ parameters.name }}-db
          type: ${{ parameters.database }}
          environment: development
    - id: create-repo
      name: Create Repository
      action: publish:github
      input:
        repoUrl: github.com?owner=jishulabs&repo=${{ parameters.name }}
        description: ${{ parameters.description }}
        defaultBranch: main
        protectDefaultBranch: true
        requireCodeOwnerReviews: true
    - id: create-argocd-app
      name: Configure GitOps Deployment
      action: argocd:create-resources
      input:
        appName: ${{ parameters.name }}
        argoInstance: production
        namespace: ${{ parameters.name }}
        repoUrl: ${{ steps.create-repo.output.repoContentsUrl }}
        path: kubernetes/
    - id: register-catalog
      name: Register in Catalog
      action: catalog:register
      input:
        repoContentsUrl: ${{ steps.create-repo.output.repoContentsUrl }}
        catalogInfoPath: /catalog-info.yaml
  output:
    links:
      - title: Repository
        url: ${{ steps.create-repo.output.remoteUrl }}
      - title: Open in Catalog
        icon: catalog
        entityRef: ${{ steps.register-catalog.output.entityRef }}

Phase 3: GitOps and Continuous Deployment
With service creation automated, the next phase establishes reliable, auditable deployment pipelines. GitOps has emerged as the standard approach for Kubernetes deployments, using Git as the single source of truth for declarative infrastructure and applications. Changes are made through pull requests, providing code review, audit trails, and easy rollback capabilities.
ArgoCD is the most widely adopted GitOps tool, continuously reconciling the desired state (in Git) with the actual state (in Kubernetes clusters). ApplicationSets extend ArgoCD's capabilities, enabling dynamic generation of applications based on patterns. This is particularly valuable for multi-environment and multi-service deployments where maintaining individual ArgoCD Application resources would be impractical.
The following configuration demonstrates an ApplicationSet that automatically creates ArgoCD Applications for every service in every environment. When a new service is added to the services directory, it automatically gets deployed to development, staging, and production environments with appropriate configurations:
# ArgoCD ApplicationSet for Multi-Environment Deployment
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: microservices
  namespace: argocd
spec:
  generators:
    # Generate apps for each service in each environment
    - matrix:
        generators:
          - git:
              repoURL: https://github.com/jishulabs/platform-config
              revision: HEAD
              directories:
                - path: 'services/*'
          - list:
              elements:
                - environment: development
                  cluster: dev-cluster
                  namespace-suffix: -dev
                  values-file: values-dev.yaml
                - environment: staging
                  cluster: staging-cluster
                  namespace-suffix: -staging
                  values-file: values-staging.yaml
                - environment: production
                  cluster: prod-cluster
                  namespace-suffix: ''
                  values-file: values-prod.yaml
  template:
    metadata:
      name: '{{path.basename}}-{{environment}}'
      labels:
        app: '{{path.basename}}'
        environment: '{{environment}}'
    spec:
      project: default
      source:
        repoURL: https://github.com/jishulabs/platform-config
        targetRevision: HEAD
        path: '{{path}}/kubernetes'
        helm:
          valueFiles:
            - '{{values-file}}'
          parameters:
            - name: environment
              value: '{{environment}}'
      destination:
        server: '{{cluster}}'
        namespace: '{{path.basename}}{{namespace-suffix}}'
      syncPolicy:
        automated:
          prune: true
          selfHeal: true
        syncOptions:
          - CreateNamespace=true
          - PruneLast=true
        retry:
          limit: 5
          backoff:
            duration: 5s
            factor: 2
            maxDuration: 3m

The ApplicationSet configuration above uses a matrix generator to create applications from two sources: a list of services (discovered from Git directories) and a list of environments. The resulting Cartesian product ensures that every service is deployed to every environment. Automated sync policies enable continuous deployment while retry configurations handle transient failures gracefully.
Infrastructure as Code with Crossplane
Crossplane extends Kubernetes to manage cloud infrastructure using familiar Kubernetes APIs. This enables developers to provision databases, caches, and other resources using the same workflows they use for application deployment.
Traditional infrastructure provisioning requires developers to either use cloud-specific tools and portals or wait for infrastructure teams to provision resources. Crossplane changes this by bringing infrastructure management into Kubernetes. Developers request resources using Kubernetes manifests, and Crossplane controllers communicate with cloud providers to create and manage the actual resources.
The key innovation of Crossplane is composite resources. Platform teams define Composite Resource Definitions (XRDs) that specify what options developers can choose, and Compositions that define how those choices translate into cloud resources. This abstraction allows platform teams to enforce standards and hide complexity while giving developers self-service access to infrastructure.
The following example shows a complete Crossplane setup for database provisioning. The XRD defines what a 'Database' means to developers—the options they can choose like engine type and size. The Composition defines how those choices map to actual AWS RDS instances with proper security groups and networking. Finally, a simple claim is all a developer needs to provision a production-ready database:
# Crossplane Composite Resource Definition (XRD)
apiVersion: apiextensions.crossplane.io/v1
kind: CompositeResourceDefinition
metadata:
  name: xdatabases.platform.jishulabs.com
spec:
  group: platform.jishulabs.com
  names:
    kind: XDatabase
    plural: xdatabases
  claimNames:
    kind: Database
    plural: databases
  versions:
    - name: v1alpha1
      served: true
      referenceable: true
      schema:
        openAPIV3Schema:
          type: object
          properties:
            spec:
              type: object
              properties:
                parameters:
                  type: object
                  properties:
                    engine:
                      type: string
                      enum: [postgresql, mysql]
                      default: postgresql
                    version:
                      type: string
                      default: "15"
                    size:
                      type: string
                      enum: [small, medium, large, xlarge]
                      default: small
                    highAvailability:
                      type: boolean
                      default: false
                  required:
                    - engine
                    - size
              required:
                - parameters
---
# Crossplane Composition - Defines how to create the database
apiVersion: apiextensions.crossplane.io/v1
kind: Composition
metadata:
  name: database-aws
  labels:
    provider: aws
spec:
  compositeTypeRef:
    apiVersion: platform.jishulabs.com/v1alpha1
    kind: XDatabase
  resources:
    # RDS Instance
    - name: rds-instance
      base:
        apiVersion: rds.aws.crossplane.io/v1beta1
        kind: Instance
        spec:
          forProvider:
            region: us-west-2
            dbInstanceClass: db.t3.micro
            allocatedStorage: 20
            publiclyAccessible: false
            skipFinalSnapshot: true
            vpcSecurityGroupIds: []
            dbSubnetGroupName: platform-db-subnet-group
          writeConnectionSecretToRef:
            namespace: crossplane-system
      patches:
        - type: FromCompositeFieldPath
          fromFieldPath: spec.parameters.engine
          toFieldPath: spec.forProvider.engine
        - type: FromCompositeFieldPath
          fromFieldPath: spec.parameters.version
          toFieldPath: spec.forProvider.engineVersion
        - type: FromCompositeFieldPath
          fromFieldPath: spec.parameters.size
          toFieldPath: spec.forProvider.dbInstanceClass
          transforms:
            - type: map
              map:
                small: db.t3.micro
                medium: db.t3.small
                large: db.t3.medium
                xlarge: db.t3.large
        - type: FromCompositeFieldPath
          fromFieldPath: spec.parameters.highAvailability
          toFieldPath: spec.forProvider.multiAZ
    # Security Group
    - name: security-group
      base:
        apiVersion: ec2.aws.crossplane.io/v1beta1
        kind: SecurityGroup
        spec:
          forProvider:
            region: us-west-2
            vpcId: vpc-platform
            description: Database security group
            ingress:
              - fromPort: 5432
                toPort: 5432
                protocol: tcp
                cidrBlocks:
                  - 10.0.0.0/8
---
# Developer creates a database with a simple claim
apiVersion: platform.jishulabs.com/v1alpha1
kind: Database
metadata:
  name: orders-db
  namespace: order-service
spec:
  parameters:
    engine: postgresql
    version: "15"
    size: medium
    highAvailability: true
  writeConnectionSecretToRef:
    name: orders-db-credentials

The power of Crossplane becomes apparent when you consider the developer experience. From the developer's perspective, provisioning a production-ready, highly available PostgreSQL database with proper security groups is just a simple Kubernetes manifest. The platform team has encoded all the organizational standards, security requirements, and cloud-specific configuration into the Composition, ensuring consistency while providing flexibility through well-defined parameters.
Developer Experience: The Golden Path
A golden path is a pre-paved, well-supported way to accomplish common developer tasks. It's not a mandate—developers can deviate when necessary—but following the golden path should be the path of least resistance.
The golden path concept is crucial to platform engineering success. It represents the primary, well-lit road through your development workflow. Following the golden path, developers get full support, comprehensive documentation, and the easiest experience. They can step off the path when needed—advanced use cases often require it—but they do so understanding that support may be limited and more expertise may be required.
A well-designed golden path balances several tensions. It must be opinionated enough to provide value—if everything is optional, developers don't know what to choose. But it must also be flexible enough to accommodate legitimate variations. It should enable fast iteration without sacrificing quality or security. And it should be discoverable—developers should naturally find and follow it without extensive documentation reading.
Example Golden Path: From Idea to Production
The following diagram shows a complete golden path for taking a new service from idea to production. Each step is supported by the platform, with automation handling the tedious work and humans focusing on the parts that require judgment:
┌─────────────────────────────────────────────────────────────────────────────┐
│ GOLDEN PATH: NEW SERVICE TO PRODUCTION │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ 1. CREATE SERVICE (5 minutes) │
│ └─> Developer Portal → Create New Service → Fill form → Submit │
│ • Repository created with CI/CD configured │
│ • Database provisioned (if selected) │
│ • Monitoring dashboards created │
│ • Service registered in catalog │
│ │
│ 2. DEVELOP LOCALLY (continuous) │
│ └─> Clone repo → Dev containers auto-configure environment │
│ • All dependencies run in containers │
│ • Hot reload enabled │
│ • Local observability stack available │
│ │
│ 3. PUSH CHANGES (automatic) │
│ └─> git push → CI runs automatically │
│ • Tests executed │
│ • Security scans performed │
│ • Container image built and pushed │
│ • Preview environment created (for PRs) │
│ │
│ 4. REVIEW & MERGE (team process) │
│ └─> Pull request with automated checks │
│ • Test coverage report │
│ • Security vulnerability scan │
│ • Preview environment link │
│ • Required approvals │
│ │
│ 5. DEPLOY TO STAGING (automatic) │
│ └─> Merge to main → GitOps deploys to staging │
│ • ArgoCD syncs automatically │
│ • Integration tests run │
│ • Performance tests run │
│ │
│ 6. DEPLOY TO PRODUCTION (controlled) │
│ └─> Promote via Developer Portal or CLI │
│ • Canary deployment (10% → 50% → 100%) │
│ • Automatic rollback on errors │
│ • Deployment notification to Slack │
│ │
└─────────────────────────────────────────────────────────────────────────────┘

This golden path visualization shows how each step builds on the previous one, with the platform handling the integration between stages. Notice that developers interact with familiar tools—Git, their IDE, pull requests—while the platform provides the automation and integration that makes the workflow seamless. The key is reducing friction while maintaining quality gates.
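The progressive rollout in step 6 (10% → 50% → 100% with automatic rollback) is typically implemented with a progressive delivery controller. As one illustrative sketch using Argo Rollouts—the service name and image are hypothetical—a canary strategy might look like this:

```yaml
# Illustrative Argo Rollouts canary strategy (not from the platform above)
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: order-service
spec:
  replicas: 4
  selector:
    matchLabels:
      app: order-service
  template:
    metadata:
      labels:
        app: order-service
    spec:
      containers:
        - name: order-service
          image: ghcr.io/jishulabs/order-service:latest
  strategy:
    canary:
      steps:
        - setWeight: 10           # shift 10% of traffic to the new version
        - pause: {duration: 5m}   # observe metrics before proceeding
        - setWeight: 50
        - pause: {duration: 5m}
        - setWeight: 100
      # An AnalysisTemplate (not shown) can abort the rollout and
      # roll back automatically when error rates exceed a threshold.
```

The pause steps are where automated analysis hooks in: if metrics degrade during a pause, the controller aborts and reverts traffic, which is what makes the "automatic rollback on errors" bullet above practical.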
Platform CLI: Developer Interface
While a web portal is great for discovery and complex workflows, developers often prefer CLI tools for day-to-day tasks. A well-designed platform CLI complements the portal.
Developers spend most of their time in terminals and IDEs, not web browsers. A platform CLI that integrates with their existing workflow can significantly improve adoption and satisfaction. The CLI should provide quick access to common operations—deploying, viewing logs, checking status—without requiring context switches to a web interface.
A good platform CLI follows conventions from tools developers already know. Consistent command structure, helpful error messages, and tab completion make the CLI feel familiar. Integration with the platform API ensures that CLI operations are equivalent to portal operations, giving developers the choice of interface without sacrificing functionality.
The following Go implementation demonstrates a complete platform CLI with commands for service creation, deployment, log streaming, and status checking. The CLI uses the same APIs as the web portal, ensuring consistency across interfaces:
// Platform CLI Implementation (Go)
package main

import (
	"fmt"
	"os"

	"github.com/spf13/cobra"
)

// platformClient is assumed to be a pre-initialized client for the
// platform API; its type and the request/option types used below
// (CreateServiceRequest, DeployRequest, LogsOptions) are elided.

var rootCmd = &cobra.Command{
	Use:   "platform",
	Short: "Jishu Labs Internal Developer Platform CLI",
	Long:  `Interact with the Internal Developer Platform from your terminal.`,
}

// Create new service
var createServiceCmd = &cobra.Command{
	Use:   "create service [name]",
	Short: "Create a new service",
	Args:  cobra.ExactArgs(1),
	Run: func(cmd *cobra.Command, args []string) {
		name := args[0]
		language, _ := cmd.Flags().GetString("language")
		team, _ := cmd.Flags().GetString("team")
		fmt.Printf("Creating service: %s\n", name)
		fmt.Printf("  Language: %s\n", language)
		fmt.Printf("  Team: %s\n", team)
		// Call platform API to create service
		service, err := platformClient.CreateService(CreateServiceRequest{
			Name:     name,
			Language: language,
			Team:     team,
		})
		if err != nil {
			fmt.Printf("Error: %v\n", err)
			os.Exit(1)
		}
		fmt.Printf("\n✅ Service created successfully!\n")
		fmt.Printf("  Repository: %s\n", service.RepoURL)
		fmt.Printf("  Catalog: %s\n", service.CatalogURL)
	},
}

// Deploy command
var deployCmd = &cobra.Command{
	Use:   "deploy [service] [environment]",
	Short: "Deploy a service to an environment",
	Args:  cobra.ExactArgs(2),
	Run: func(cmd *cobra.Command, args []string) {
		service := args[0]
		environment := args[1]
		version, _ := cmd.Flags().GetString("version")
		fmt.Printf("Deploying %s to %s...\n", service, environment)
		// Trigger deployment
		deployment, err := platformClient.Deploy(DeployRequest{
			Service:     service,
			Environment: environment,
			Version:     version,
		})
		if err != nil {
			fmt.Printf("Error: %v\n", err)
			os.Exit(1)
		}
		// Stream deployment progress
		for status := range platformClient.WatchDeployment(deployment.ID) {
			fmt.Printf("[%s] %s\n", status.Phase, status.Message)
		}
	},
}

// Logs command
var logsCmd = &cobra.Command{
	Use:   "logs [service]",
	Short: "Stream logs from a service",
	Args:  cobra.ExactArgs(1),
	Run: func(cmd *cobra.Command, args []string) {
		service := args[0]
		environment, _ := cmd.Flags().GetString("environment")
		follow, _ := cmd.Flags().GetBool("follow")
		opts := LogsOptions{
			Service:     service,
			Environment: environment,
			Follow:      follow,
		}
		logs := platformClient.GetLogs(opts)
		for log := range logs {
			fmt.Printf("%s %s %s\n", log.Timestamp, log.Pod, log.Message)
		}
	},
}

// Status command
var statusCmd = &cobra.Command{
	Use:   "status [service]",
	Short: "Get service status across environments",
	Args:  cobra.ExactArgs(1),
	Run: func(cmd *cobra.Command, args []string) {
		service := args[0]
		status, err := platformClient.GetServiceStatus(service)
		if err != nil {
			fmt.Printf("Error: %v\n", err)
			os.Exit(1)
		}
		fmt.Printf("\nService: %s\n", service)
		fmt.Printf("Owner: %s\n\n", status.Owner)
		fmt.Printf("%-12s %-15s %-10s %-20s\n", "ENVIRONMENT", "VERSION", "REPLICAS", "STATUS")
		fmt.Printf("%-12s %-15s %-10s %-20s\n", "-----------", "-------", "--------", "------")
		for _, env := range status.Environments {
			fmt.Printf("%-12s %-15s %-10s %-20s\n",
				env.Name,
				env.Version,
				fmt.Sprintf("%d/%d", env.ReadyReplicas, env.DesiredReplicas),
				env.Status,
			)
		}
	},
}

func init() {
	createServiceCmd.Flags().StringP("language", "l", "python-fastapi", "Programming language")
	createServiceCmd.Flags().StringP("team", "t", "", "Owning team")
	deployCmd.Flags().StringP("version", "v", "latest", "Version to deploy")
	logsCmd.Flags().StringP("environment", "e", "production", "Environment")
	logsCmd.Flags().BoolP("follow", "f", false, "Follow log output")
	rootCmd.AddCommand(createServiceCmd)
	rootCmd.AddCommand(deployCmd)
	rootCmd.AddCommand(logsCmd)
	rootCmd.AddCommand(statusCmd)
}

func main() {
	if err := rootCmd.Execute(); err != nil {
		fmt.Println(err)
		os.Exit(1)
	}
}

The CLI implementation above uses the popular Cobra library for command structure. Commands like 'platform deploy order-service production' feel natural to developers familiar with kubectl, docker, or git CLIs. The streaming deployment progress provides immediate feedback, while the status command gives a quick overview of service health across environments.
Measuring Platform Success
A successful platform engineering practice requires clear metrics to demonstrate value and guide improvements. Here are the key metrics to track.
Platform engineering is an investment that needs to be justified with measurable outcomes. Unlike product features that directly generate revenue, platform improvements deliver value through developer productivity gains and reduced operational costs. Establishing clear metrics before building allows you to demonstrate ROI and make data-driven decisions about where to invest next.
Metrics should cover both technical outcomes (deployment frequency, lead time) and human outcomes (developer satisfaction, adoption rates). Technical metrics indicate that the platform is working correctly, while human metrics indicate that it's actually being used and valued. A platform with great technical metrics but low adoption is failing its mission.
Developer Productivity Metrics
DORA metrics—Deployment Frequency, Lead Time for Changes, Mean Time to Recovery, and Change Failure Rate—have become the standard for measuring software delivery performance. They're valuable because they're outcome-focused rather than activity-focused. High-performing organizations deploy frequently, recover quickly from failures, and have low change failure rates.
- Time to First Deployment: How long from joining to deploying code to production
- Deployment Frequency: How often teams deploy to production
- Lead Time for Changes: Time from commit to production deployment
- Mean Time to Recovery (MTTR): Average time to recover from failures
- Change Failure Rate: Percentage of deployments causing production issues
Platform Adoption Metrics
- Platform Adoption Rate: Percentage of services using the platform
- Self-Service Ratio: Percentage of tasks completed without platform team help
- Template Usage: How often teams use golden path templates
- Developer Satisfaction (NPS): Regular surveys measuring developer happiness
- Support Ticket Volume: Trend in tickets requiring platform team intervention
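The adoption and satisfaction numbers are simple ratios, but it's worth pinning down the definitions. As a sketch (the input counts are hypothetical, e.g. pulled from a service catalog and survey tool), adoption rate and NPS can be computed like this:

```go
package main

import (
	"fmt"
	"math"
)

// AdoptionRate returns the percentage of services onboarded to the platform.
func AdoptionRate(onPlatform, total int) float64 {
	if total == 0 {
		return 0
	}
	return 100 * float64(onPlatform) / float64(total)
}

// NPS computes a Net Promoter Score from 0-10 survey responses:
// the percentage of promoters (scores 9-10) minus the percentage of
// detractors (scores 0-6), rounded to the nearest integer.
func NPS(scores []int) int {
	if len(scores) == 0 {
		return 0
	}
	promoters, detractors := 0, 0
	for _, s := range scores {
		switch {
		case s >= 9:
			promoters++
		case s <= 6:
			detractors++
		}
	}
	n := float64(len(scores))
	return int(math.Round(100*float64(promoters)/n - 100*float64(detractors)/n))
}

func main() {
	fmt.Printf("adoption: %.1f%%\n", AdoptionRate(42, 60)) // adoption: 70.0%
	fmt.Println("NPS:", NPS([]int{10, 9, 7, 3}))           // NPS: 25
}
```

Note that passives (scores 7-8) count toward the denominator but neither bucket, which is why NPS can swing sharply on small survey samples.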
# Prometheus Rules for Platform Metrics
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: platform-metrics
spec:
  groups:
    - name: platform-adoption
      rules:
        - record: platform:services:total
          expr: count(kube_deployment_labels{label_platform_managed="true"})
        - record: platform:deployments:daily
          expr: sum(increase(argocd_app_sync_total[24h]))
        - record: platform:deployment_lead_time:p95
          expr: |
            histogram_quantile(0.95,
              sum(rate(deployment_lead_time_seconds_bucket[7d])) by (le)
            )
        - record: platform:change_failure_rate
          expr: |
            sum(rate(deployment_failures_total[7d]))
            /
            sum(rate(deployments_total[7d]))
    - name: platform-slo
      rules:
        - record: platform:deployment_success_rate
          expr: |
            1 - (
              sum(rate(deployment_failures_total[30d]))
              /
              sum(rate(deployments_total[30d]))
            )
        - alert: PlatformDeploymentSLOBreach
          expr: platform:deployment_success_rate < 0.995
          for: 1h
          labels:
            severity: critical
          annotations:
            summary: Platform deployment success rate below 99.5%
            description: Current rate is {{ $value | humanizePercentage }}
Building the Platform Team
A successful platform engineering practice requires the right team structure and mindset. Platform teams must think like product teams, treating developers as their customers.
Platform Team Composition
- Platform Engineers: Build and maintain platform infrastructure and tools
- Developer Experience Engineers: Focus on UX, documentation, and onboarding
- Product Manager: Gather developer feedback and prioritize roadmap
- Technical Writer: Create and maintain documentation
- Site Reliability Engineers: Ensure platform reliability and performance
Platform Engineering Success Factors
✓ Treat the platform as a product, not a project
✓ Start small and iterate based on developer feedback
✓ Measure adoption and satisfaction, not just technical metrics
✓ Document everything—the best platform is useless if no one knows how to use it
✓ Maintain golden paths, but allow escape hatches for advanced users
✓ Automate toil, but don't over-automate edge cases
✓ Build with extensibility in mind—you can't predict every need
✓ Celebrate wins and share success stories
Conclusion
Platform Engineering represents a maturation of how organizations approach developer productivity and infrastructure management. By building Internal Developer Platforms that provide self-service capabilities, clear golden paths, and excellent developer experience, organizations can dramatically accelerate software delivery while improving quality and reducing operational burden.
Success requires treating the platform as a product, measuring outcomes, and continuously improving based on developer feedback. Start with the highest-impact capabilities, demonstrate value quickly, and expand iteratively. The investment in platform engineering pays dividends in developer productivity, faster time to market, and reduced operational costs.
Next Steps
Building an Internal Developer Platform is a significant undertaking that requires expertise in infrastructure, developer experience, and product thinking. At Jishu Labs, our platform engineering team has extensive experience designing and implementing IDPs for organizations of all sizes.
Contact us to discuss your platform engineering needs, or explore our Cloud Services and Custom Software Development offerings.
About Emily Rodriguez
Emily Rodriguez is VP of Engineering at Jishu Labs with over 14 years of experience building developer platforms and infrastructure at scale. She has led platform teams at multiple Fortune 500 companies and is passionate about improving developer experience and organizational efficiency.