Cloud & DevOps · 26 min read · 4,379 words

Platform Engineering in 2025: Building Internal Developer Platforms That Scale

Learn how to build and operate Internal Developer Platforms (IDPs) that boost developer productivity, reduce cognitive load, and accelerate software delivery. A complete guide to platform engineering practices, tools, and organizational patterns.

Emily Rodriguez

Platform Engineering has emerged as one of the most transformative approaches to software delivery in 2025. As organizations scale, the cognitive load on developers increases exponentially—they're expected to manage infrastructure, navigate complex deployment pipelines, handle security requirements, and still ship features quickly. Platform Engineering addresses this by creating Internal Developer Platforms (IDPs) that abstract away complexity while providing developers with self-service capabilities. This guide covers everything you need to know to build a successful platform engineering practice.

What is Platform Engineering?

Platform Engineering is the discipline of designing and building toolchains and workflows that enable self-service capabilities for software engineering organizations. The goal is to reduce cognitive load on developers by providing golden paths—pre-configured, well-supported ways to accomplish common tasks.

"Platform Engineering is about building a product for developers. The platform team's customers are the developers in your organization, and your success is measured by their productivity and satisfaction."

Camille Fournier, Author of 'The Manager's Path'

Platform Engineering vs DevOps vs SRE

  • DevOps: A culture and set of practices that emphasizes collaboration between development and operations teams
  • SRE (Site Reliability Engineering): Applies software engineering principles to infrastructure and operations problems
  • Platform Engineering: Builds and maintains the platform and tools that enable developers to be self-sufficient

These disciplines are complementary, not competing. Platform Engineering operationalizes DevOps and SRE principles into a product that development teams consume.

The rise of platform engineering reflects a maturation in how organizations think about developer productivity. Early DevOps efforts focused on breaking down silos between development and operations teams. SRE brought software engineering discipline to operations. Platform Engineering takes the next step by recognizing that the tooling, workflows, and infrastructure that enable developers should be treated as a product—with its own roadmap, user research, and success metrics.

The Internal Developer Platform (IDP)

An Internal Developer Platform is the foundation of platform engineering. It's a self-service layer that sits on top of your infrastructure and tooling, providing developers with everything they need to build, deploy, and operate applications without tickets or waiting for other teams.

The concept of an IDP emerged from a common pattern observed across high-performing engineering organizations. Companies like Spotify, Netflix, and Airbnb built internal platforms that dramatically improved their developers' productivity. These platforms abstracted away infrastructure complexity while providing powerful self-service capabilities. What used to require days of coordination and multiple handoffs could now be accomplished in minutes by developers themselves.

A well-designed IDP addresses several key challenges. It reduces cognitive load by providing sensible defaults and hiding unnecessary complexity. It accelerates onboarding by giving new developers everything they need to be productive quickly. It improves consistency by establishing standard patterns for common tasks. And it enables scale by allowing platform teams to support many more developers than would be possible with a ticket-based approach.

Core Capabilities of an IDP

An IDP typically consists of multiple layers, each serving a specific purpose. The developer portal provides discovery, documentation, and self-service interfaces. The orchestration layer coordinates between different tools and systems. Individual capability domains—CI/CD, infrastructure, observability, security—provide the actual functionality developers need.

text
┌──────────────────────────────────────────────────────────────────────────────┐
│                    INTERNAL DEVELOPER PLATFORM (IDP)                         │
├──────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│  ┌─────────────────────────────────────────────────────────────────────────┐ │
│  │                      DEVELOPER PORTAL (UI/CLI)                          │ │
│  │  • Service Catalog    • Documentation    • Self-Service Workflows      │ │
│  └─────────────────────────────────────────────────────────────────────────┘ │
│                                    │                                         │
│  ┌─────────────────────────────────────────────────────────────────────────┐ │
│  │                      PLATFORM ORCHESTRATION                             │ │
│  │  • Backstage/Port    • Score/Humanitec    • Custom Orchestrator        │ │
│  └─────────────────────────────────────────────────────────────────────────┘ │
│                                    │                                         │
│  ┌─────────────┬──────────────┬────────────────┬────────────────────┐       │
│  │   CI/CD     │Infrastructure│  Observability │     Security       │       │
│  │             │              │                │                    │       │
│  │• GitHub     │• Terraform   │• Prometheus    │• Vault             │       │
│  │  Actions    │• Crossplane  │• Grafana       │• OPA               │       │
│  │• ArgoCD     │• Pulumi      │• OpenTelemetry │• Trivy             │       │
│  │• Tekton     │• AWS CDK     │• PagerDuty     │• Falco             │       │
│  └─────────────┴──────────────┴────────────────┴────────────────────┘       │
│                                    │                                         │
│  ┌─────────────────────────────────────────────────────────────────────────┐ │
│  │                    INFRASTRUCTURE LAYER                                 │ │
│  │  Kubernetes  •  Cloud Providers  •  Databases  •  Message Queues       │ │
│  └─────────────────────────────────────────────────────────────────────────┘ │
│                                                                              │
└──────────────────────────────────────────────────────────────────────────────┘

This architecture diagram illustrates how different IDP components fit together. At the top, the developer portal provides the primary interface—whether through a web UI, CLI, or both. The orchestration layer coordinates between the portal and underlying capabilities, handling workflows that span multiple tools. Individual capability domains provide the actual functionality, each potentially backed by multiple tools that the platform abstracts away.

Building Your IDP: A Step-by-Step Approach

Building an effective IDP is an iterative process. Start with the highest-impact capabilities and expand based on developer feedback. Here's a proven approach.

A common mistake when building an IDP is trying to do everything at once. Organizations attempt to build a complete platform from day one, resulting in a multi-year project that delivers value too slowly. A more effective approach is to identify the highest-pain, highest-value capabilities and deliver them incrementally. Each phase should provide tangible improvements that build trust and momentum for the next phase.

The phased approach also allows for learning and adaptation. Early phases reveal what developers actually need versus what you assumed they needed. Platform teams can adjust their roadmap based on real usage data and feedback. This iterative approach is essential because every organization is different—the specific pain points and priorities vary based on technology stack, team structure, and organizational culture.

Phase 1: Foundation - Developer Portal and Service Catalog

The first phase focuses on visibility and discovery. Before developers can use self-service capabilities, they need to understand what exists, who owns it, and how it fits together. A service catalog provides this foundation by documenting all services, their owners, dependencies, and relevant metadata. The developer portal makes this information accessible and searchable.

Backstage, originally developed by Spotify and now a CNCF project, has emerged as the leading open-source platform for building developer portals. It provides a plugin architecture that allows organizations to integrate their existing tools while presenting a unified interface to developers. The following configuration shows how to set up a Backstage instance with GitHub integration, Kubernetes visibility, and technical documentation:

yaml
# Backstage app-config.yaml - Developer Portal Configuration
app:
  title: Jishu Labs Developer Portal
  baseUrl: https://developer.jishulabs.com

organization:
  name: Jishu Labs

backend:
  baseUrl: https://developer-api.jishulabs.com
  database:
    client: pg
    connection:
      host: ${POSTGRES_HOST}
      port: ${POSTGRES_PORT}
      user: ${POSTGRES_USER}
      password: ${POSTGRES_PASSWORD}

integrations:
  github:
    - host: github.com
      token: ${GITHUB_TOKEN}
  
  kubernetes:
    serviceLocatorMethod:
      type: multiTenant
    clusterLocatorMethods:
      - type: config
        clusters:
          - url: https://prod-cluster.k8s.local
            name: production
            authProvider: serviceAccount
          - url: https://staging-cluster.k8s.local
            name: staging
            authProvider: serviceAccount

catalog:
  import:
    entityFilename: catalog-info.yaml
    pullRequestBranchName: backstage-integration
  rules:
    - allow: [Component, System, API, Resource, Location, Template]
  locations:
    - type: url
      target: https://github.com/jishulabs/software-catalog/blob/main/all-components.yaml
    - type: url  
      target: https://github.com/jishulabs/software-templates/blob/main/all-templates.yaml

techdocs:
  builder: external
  generator:
    runIn: docker
  publisher:
    type: awsS3
    awsS3:
      bucketName: jishulabs-techdocs
      region: us-west-2
yaml
# catalog-info.yaml - Service Definition
apiVersion: backstage.io/v1alpha1
kind: Component
metadata:
  name: order-service
  description: Handles order processing and management
  annotations:
    github.com/project-slug: jishulabs/order-service
    backstage.io/techdocs-ref: dir:.
    argocd/app-name: order-service
    prometheus.io/alert: 'order-service-alerts'
    pagerduty.com/integration-key: ${PAGERDUTY_KEY}
  tags:
    - python
    - fastapi
    - orders
  links:
    - url: https://grafana.jishulabs.com/d/order-service
      title: Grafana Dashboard
    - url: https://order-service.docs.jishulabs.com
      title: API Documentation
spec:
  type: service
  lifecycle: production
  owner: team-orders
  system: e-commerce-platform
  dependsOn:
    - resource:orders-database
    - component:inventory-service
    - component:payment-service
  providesApis:
    - order-api
  consumesApis:
    - inventory-api
    - payment-api
---
apiVersion: backstage.io/v1alpha1
kind: API
metadata:
  name: order-api
  description: REST API for order management
spec:
  type: openapi
  lifecycle: production
  owner: team-orders
  definition:
    $text: https://github.com/jishulabs/order-service/blob/main/openapi.yaml

The service definition above shows how metadata flows through the platform. Annotations link the service to external systems—ArgoCD for deployments, Prometheus for alerts, PagerDuty for incident management. Dependencies are explicitly declared, enabling the platform to visualize service relationships and identify blast radius during incidents. Tags enable filtering and discovery across the catalog.
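The `owner: team-orders` reference above points to a Group entity that must also exist in the catalog. A minimal definition might look like the following sketch (the team names and metadata are illustrative):

yaml
# Hypothetical Group definition for the owning team
apiVersion: backstage.io/v1alpha1
kind: Group
metadata:
  name: team-orders
  description: Team responsible for order processing services
spec:
  type: team
  profile:
    displayName: Orders Team
  parent: engineering
  children: []

Registering teams as Group entities lets the portal resolve ownership for every component, so dashboards, alerts, and questions can be routed to the right people automatically.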

Phase 2: Self-Service Infrastructure Provisioning

With visibility established, the next phase enables developers to provision resources themselves. Software templates define standardized ways to create new services, complete with CI/CD pipelines, monitoring, and deployment configurations. Instead of following a wiki page with manual steps, developers fill out a form and the platform handles the rest.

The template approach provides several benefits beyond convenience. It ensures consistency—every new service follows organizational standards for structure, testing, security, and deployment. It captures institutional knowledge that would otherwise live in team members' heads or outdated documentation. It enables enforcement of requirements like security scanning or documentation without blocking developers.
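As a concrete illustration, a skeleton repository can ship with a CI workflow that bakes those requirements in. The following GitHub Actions sketch (workflow structure and registry path are illustrative) runs tests, builds a container image, and gates the pipeline on a Trivy vulnerability scan:

yaml
# .github/workflows/ci.yaml - illustrative skeleton CI pipeline
name: ci
on:
  push:
    branches: [main]
  pull_request:

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Run tests
        run: make test
      - name: Build container image
        run: docker build -t ghcr.io/jishulabs/${{ github.event.repository.name }}:${{ github.sha }} .
      - name: Scan image for vulnerabilities
        uses: aquasecurity/trivy-action@master
        with:
          image-ref: ghcr.io/jishulabs/${{ github.event.repository.name }}:${{ github.sha }}
          exit-code: '1'
          severity: 'CRITICAL,HIGH'

Because this workflow lives in the template skeleton, every new service inherits the security gate from day one without any per-team setup.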

The following Backstage software template demonstrates a complete service creation workflow. When a developer fills out the form, the template creates a repository from a skeleton, provisions any required infrastructure, configures GitOps deployment, and registers the service in the catalog—all automatically:

yaml
# Backstage Software Template for New Service
apiVersion: scaffolder.backstage.io/v1beta3
kind: Template
metadata:
  name: microservice-template
  title: Create Microservice
  description: Create a new microservice with CI/CD, monitoring, and Kubernetes deployment
  tags:
    - recommended
    - microservice
spec:
  owner: platform-team
  type: service
  
  parameters:
    - title: Service Information
      required:
        - name
        - description
        - owner
      properties:
        name:
          title: Service Name
          type: string
          description: Unique name for the service
          ui:autofocus: true
          pattern: '^[a-z0-9-]+$'
        description:
          title: Description
          type: string
          description: What does this service do?
        owner:
          title: Owner
          type: string
          description: Team that owns this service
          ui:field: OwnerPicker
          ui:options:
            allowedKinds:
              - Group
    
    - title: Technical Choices
      required:
        - language
        - database
      properties:
        language:
          title: Programming Language
          type: string
          enum:
            - python-fastapi
            - nodejs-express
            - go-gin
            - java-spring
          enumNames:
            - Python (FastAPI)
            - Node.js (Express)
            - Go (Gin)
            - Java (Spring Boot)
        database:
          title: Database
          type: string
          enum:
            - postgresql
            - mysql
            - mongodb
            - none
          enumNames:
            - PostgreSQL
            - MySQL
            - MongoDB
            - No Database
        enableCache:
          title: Enable Redis Cache
          type: boolean
          default: false
    
    - title: Deployment Configuration
      properties:
        cpuRequest:
          title: CPU Request
          type: string
          default: '100m'
        memoryRequest:
          title: Memory Request
          type: string
          default: '256Mi'
        replicas:
          title: Initial Replicas
          type: integer
          default: 2
          minimum: 1
          maximum: 10
  
  steps:
    - id: fetch-template
      name: Fetch Template
      action: fetch:template
      input:
        url: ./skeleton/${{ parameters.language }}
        values:
          name: ${{ parameters.name }}
          description: ${{ parameters.description }}
          owner: ${{ parameters.owner }}
          database: ${{ parameters.database }}
          enableCache: ${{ parameters.enableCache }}
    
    - id: create-database
      name: Provision Database
      if: ${{ parameters.database !== 'none' }}
      action: http:backstage:request
      input:
        method: POST
        path: /api/infrastructure/database
        body:
          name: ${{ parameters.name }}-db
          type: ${{ parameters.database }}
          environment: development
    
    - id: create-repo
      name: Create Repository
      action: publish:github
      input:
        repoUrl: github.com?owner=jishulabs&repo=${{ parameters.name }}
        description: ${{ parameters.description }}
        defaultBranch: main
        protectDefaultBranch: true
        requireCodeOwnerReviews: true
    
    - id: create-argocd-app
      name: Configure GitOps Deployment
      action: argocd:create-resources
      input:
        appName: ${{ parameters.name }}
        argoInstance: production
        namespace: ${{ parameters.name }}
        repoUrl: ${{ steps.create-repo.output.repoContentsUrl }}
        path: kubernetes/
    
    - id: register-catalog
      name: Register in Catalog
      action: catalog:register
      input:
        repoContentsUrl: ${{ steps.create-repo.output.repoContentsUrl }}
        catalogInfoPath: /catalog-info.yaml
  
  output:
    links:
      - title: Repository
        url: ${{ steps.create-repo.output.remoteUrl }}
      - title: Open in Catalog
        icon: catalog
        entityRef: ${{ steps.register-catalog.output.entityRef }}

Phase 3: GitOps and Continuous Deployment

With service creation automated, the next phase establishes reliable, auditable deployment pipelines. GitOps has emerged as the standard approach for Kubernetes deployments, using Git as the single source of truth for declarative infrastructure and applications. Changes are made through pull requests, providing code review, audit trails, and easy rollback capabilities.
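A typical layout for such a configuration repository keeps each service's manifests and per-environment values side by side (directory names here are illustrative):

text
platform-config/
├── services/
│   ├── order-service/
│   │   └── kubernetes/              # Helm chart for the service
│   │       ├── Chart.yaml
│   │       ├── templates/
│   │       ├── values-dev.yaml
│   │       ├── values-staging.yaml
│   │       └── values-prod.yaml
│   └── inventory-service/
│       └── kubernetes/
└── clusters/                        # cluster-level configuration

With this structure, adding a directory under services/ is all it takes to bring a new service under GitOps management.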

ArgoCD is the most widely adopted GitOps tool, continuously reconciling the desired state (in Git) with the actual state (in Kubernetes clusters). ApplicationSets extend ArgoCD's capabilities, enabling dynamic generation of applications based on patterns. This is particularly valuable for multi-environment and multi-service deployments where maintaining individual ArgoCD Application resources would be impractical.

The following configuration demonstrates an ApplicationSet that automatically creates ArgoCD Applications for every service in every environment. When a new service is added to the services directory, it automatically gets deployed to development, staging, and production environments with appropriate configurations:

yaml
# ArgoCD Application with ApplicationSet for Multi-Environment
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: microservices
  namespace: argocd
spec:
  generators:
    # Generate apps for each service in each environment
    - matrix:
        generators:
          - git:
              repoURL: https://github.com/jishulabs/platform-config
              revision: HEAD
              directories:
                - path: 'services/*'
          - list:
              elements:
                - environment: development
                  cluster: dev-cluster
                  namespace-suffix: -dev
                  values-file: values-dev.yaml
                - environment: staging
                  cluster: staging-cluster
                  namespace-suffix: -staging
                  values-file: values-staging.yaml
                - environment: production
                  cluster: prod-cluster
                  namespace-suffix: ''
                  values-file: values-prod.yaml
  
  template:
    metadata:
      name: '{{path.basename}}-{{environment}}'
      labels:
        app: '{{path.basename}}'
        environment: '{{environment}}'
    spec:
      project: default
      source:
        repoURL: https://github.com/jishulabs/platform-config
        targetRevision: HEAD
        path: '{{path}}/kubernetes'
        helm:
          valueFiles:
            - '{{values-file}}'
          parameters:
            - name: environment
              value: '{{environment}}'
      destination:
        server: '{{cluster}}'
        namespace: '{{path.basename}}{{namespace-suffix}}'
      syncPolicy:
        automated:
          prune: true
          selfHeal: true
        syncOptions:
          - CreateNamespace=true
          - PruneLast=true
        retry:
          limit: 5
          backoff:
            duration: 5s
            factor: 2
            maxDuration: 3m

The ApplicationSet configuration above uses a matrix generator to create applications from two sources: a list of services (discovered from Git directories) and a list of environments. The resulting Cartesian product ensures that every service is deployed to every environment. Automated sync policies enable continuous deployment while retry configurations handle transient failures gracefully.
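The per-environment values files referenced by the generator carry only what differs between environments. A sketch of a values-dev.yaml (field names follow common Helm chart conventions and are illustrative):

yaml
# values-dev.yaml - environment-specific overrides (illustrative)
replicaCount: 1

image:
  tag: latest

resources:
  requests:
    cpu: 100m
    memory: 256Mi

autoscaling:
  enabled: false

ingress:
  host: order-service.dev.jishulabs.com

Confining environment differences to small values files makes promotion between environments a diff of a few lines rather than a whole manifest.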

Infrastructure as Code with Crossplane

Crossplane extends Kubernetes to manage cloud infrastructure using familiar Kubernetes APIs. This enables developers to provision databases, caches, and other resources using the same workflows they use for application deployment.

Traditional infrastructure provisioning requires developers to either use cloud-specific tools and portals or wait for infrastructure teams to provision resources. Crossplane changes this by bringing infrastructure management into Kubernetes. Developers request resources using Kubernetes manifests, and Crossplane controllers communicate with cloud providers to create and manage the actual resources.

The key innovation of Crossplane is composite resources. Platform teams define Composite Resource Definitions (XRDs) that specify what options developers can choose, and Compositions that define how those choices translate into cloud resources. This abstraction allows platform teams to enforce standards and hide complexity while giving developers self-service access to infrastructure.

The following example shows a complete Crossplane setup for database provisioning. The XRD defines what a 'Database' means to developers—the options they can choose like engine type and size. The Composition defines how those choices map to actual AWS RDS instances with proper security groups and networking. Finally, a simple claim is all a developer needs to provision a production-ready database:

yaml
# Crossplane Composite Resource Definition (XRD)
apiVersion: apiextensions.crossplane.io/v1
kind: CompositeResourceDefinition
metadata:
  name: xdatabases.platform.jishulabs.com
spec:
  group: platform.jishulabs.com
  names:
    kind: XDatabase
    plural: xdatabases
  claimNames:
    kind: Database
    plural: databases
  
  versions:
    - name: v1alpha1
      served: true
      referenceable: true
      schema:
        openAPIV3Schema:
          type: object
          properties:
            spec:
              type: object
              properties:
                parameters:
                  type: object
                  properties:
                    engine:
                      type: string
                      enum: [postgresql, mysql]
                      default: postgresql
                    version:
                      type: string
                      default: "15"
                    size:
                      type: string
                      enum: [small, medium, large, xlarge]
                      default: small
                    highAvailability:
                      type: boolean
                      default: false
                  required:
                    - engine
                    - size
              required:
                - parameters
---
# Crossplane Composition - Defines how to create the database
apiVersion: apiextensions.crossplane.io/v1
kind: Composition
metadata:
  name: database-aws
  labels:
    provider: aws
spec:
  compositeTypeRef:
    apiVersion: platform.jishulabs.com/v1alpha1
    kind: XDatabase
  
  resources:
    # RDS Instance
    - name: rds-instance
      base:
        apiVersion: rds.aws.crossplane.io/v1beta1
        kind: Instance
        spec:
          forProvider:
            region: us-west-2
            dbInstanceClass: db.t3.micro
            allocatedStorage: 20
            publiclyAccessible: false
            skipFinalSnapshot: true
            vpcSecurityGroupIds: []
            dbSubnetGroupName: platform-db-subnet-group
          writeConnectionSecretToRef:
            namespace: crossplane-system
      patches:
        - type: FromCompositeFieldPath
          fromFieldPath: spec.parameters.engine
          toFieldPath: spec.forProvider.engine
        - type: FromCompositeFieldPath
          fromFieldPath: spec.parameters.version
          toFieldPath: spec.forProvider.engineVersion
        - type: FromCompositeFieldPath
          fromFieldPath: spec.parameters.size
          toFieldPath: spec.forProvider.dbInstanceClass
          transforms:
            - type: map
              map:
                small: db.t3.micro
                medium: db.t3.small
                large: db.t3.medium
                xlarge: db.t3.large
        - type: FromCompositeFieldPath
          fromFieldPath: spec.parameters.highAvailability
          toFieldPath: spec.forProvider.multiAZ
    
    # Security Group
    - name: security-group
      base:
        apiVersion: ec2.aws.crossplane.io/v1beta1
        kind: SecurityGroup
        spec:
          forProvider:
            region: us-west-2
            vpcId: vpc-platform
            description: Database security group
            ingress:
              - fromPort: 5432
                toPort: 5432
                protocol: tcp
                cidrBlocks:
                  - 10.0.0.0/8
---
# Developer creates a database with simple claim
apiVersion: platform.jishulabs.com/v1alpha1
kind: Database
metadata:
  name: orders-db
  namespace: order-service
spec:
  parameters:
    engine: postgresql
    version: "15"
    size: medium
    highAvailability: true
  writeConnectionSecretToRef:
    name: orders-db-credentials

The power of Crossplane becomes apparent when you consider the developer experience. From the developer's perspective, provisioning a production-ready, highly available PostgreSQL database with proper security groups is just a simple Kubernetes manifest. The platform team has encoded all the organizational standards, security requirements, and cloud-specific configuration into the Composition, ensuring consistency while providing flexibility through well-defined parameters.
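Crossplane writes the generated connection details into the orders-db-credentials secret named in the claim, and the application consumes it like any other Kubernetes secret. The fragment below assumes Crossplane's standard connection-detail keys (endpoint, username, password); the container spec is illustrative:

yaml
# Deployment fragment consuming the Crossplane connection secret
apiVersion: apps/v1
kind: Deployment
metadata:
  name: order-service
  namespace: order-service
spec:
  template:
    spec:
      containers:
        - name: order-service
          image: ghcr.io/jishulabs/order-service:latest
          env:
            - name: DB_HOST
              valueFrom:
                secretKeyRef:
                  name: orders-db-credentials
                  key: endpoint
            - name: DB_USER
              valueFrom:
                secretKeyRef:
                  name: orders-db-credentials
                  key: username
            - name: DB_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: orders-db-credentials
                  key: password

The application never sees cloud credentials or connection strings in source control; rotation happens at the secret, not in code.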

Developer Experience: The Golden Path

A golden path is a pre-paved, well-supported way to accomplish common developer tasks. It's not a mandate—developers can deviate when necessary—but following the golden path should be the path of least resistance.

The golden path concept is crucial to platform engineering success. It represents the primary, well-lit road through your development workflow. Following the golden path, developers get full support, comprehensive documentation, and the easiest experience. They can step off the path when needed—advanced use cases often require it—but they do so understanding that support may be limited and more expertise may be required.

A well-designed golden path balances several tensions. It must be opinionated enough to provide value—if everything is optional, developers don't know what to choose. But it must also be flexible enough to accommodate legitimate variations. It should enable fast iteration without sacrificing quality or security. And it should be discoverable—developers should naturally find and follow it without extensive documentation reading.

Example Golden Path: From Idea to Production

The following diagram shows a complete golden path for taking a new service from idea to production. Each step is supported by the platform, with automation handling the tedious work and humans focusing on the parts that require judgment:

text
┌─────────────────────────────────────────────────────────────────────────────┐
│                    GOLDEN PATH: NEW SERVICE TO PRODUCTION                   │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│  1. CREATE SERVICE (5 minutes)                                              │
│     └─> Developer Portal → Create New Service → Fill form → Submit         │
│         • Repository created with CI/CD configured                          │
│         • Database provisioned (if selected)                                │
│         • Monitoring dashboards created                                     │
│         • Service registered in catalog                                     │
│                                                                             │
│  2. DEVELOP LOCALLY (continuous)                                            │
│     └─> Clone repo → Dev containers auto-configure environment              │
│         • All dependencies run in containers                                │
│         • Hot reload enabled                                                │
│         • Local observability stack available                               │
│                                                                             │
│  3. PUSH CHANGES (automatic)                                                │
│     └─> git push → CI runs automatically                                    │
│         • Tests executed                                                    │
│         • Security scans performed                                          │
│         • Container image built and pushed                                  │
│         • Preview environment created (for PRs)                             │
│                                                                             │
│  4. REVIEW & MERGE (team process)                                           │
│     └─> Pull request with automated checks                                  │
│         • Test coverage report                                              │
│         • Security vulnerability scan                                       │
│         • Preview environment link                                          │
│         • Required approvals                                                │
│                                                                             │
│  5. DEPLOY TO STAGING (automatic)                                           │
│     └─> Merge to main → GitOps deploys to staging                          │
│         • ArgoCD syncs automatically                                        │
│         • Integration tests run                                             │
│         • Performance tests run                                             │
│                                                                             │
│  6. DEPLOY TO PRODUCTION (controlled)                                       │
│     └─> Promote via Developer Portal or CLI                                 │
│         • Canary deployment (10% → 50% → 100%)                              │
│         • Automatic rollback on errors                                      │
│         • Deployment notification to Slack                                  │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘

This golden path visualization shows how each step builds on the previous one, with the platform handling the integration between stages. Notice that developers interact with familiar tools—Git, their IDE, pull requests—while the platform provides the automation and integration that makes the workflow seamless. The key is reducing friction while maintaining quality gates.
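The canary progression in step 6 can be implemented with a progressive delivery controller such as Argo Rollouts. The following sketch (analysis template name and pause durations are illustrative assumptions) shifts traffic through the 10% → 50% → 100% stages described above and rolls back automatically if analysis fails:

yaml
# Argo Rollouts canary strategy (illustrative)
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: order-service
spec:
  replicas: 4
  selector:
    matchLabels:
      app: order-service
  template:
    metadata:
      labels:
        app: order-service
    spec:
      containers:
        - name: order-service
          image: ghcr.io/jishulabs/order-service:latest
  strategy:
    canary:
      steps:
        - setWeight: 10
        - pause: {duration: 5m}
        - analysis:
            templates:
              - templateName: error-rate-check   # hypothetical AnalysisTemplate
        - setWeight: 50
        - pause: {duration: 10m}
        - setWeight: 100

If the error-rate analysis fails at any step, the rollout aborts and traffic returns to the stable version, matching the "automatic rollback on errors" behavior in the diagram.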

Platform CLI: Developer Interface

While a web portal is great for discovery and complex workflows, developers often prefer CLI tools for day-to-day tasks. A well-designed platform CLI complements the portal.

Developers spend most of their time in terminals and IDEs, not web browsers. A platform CLI that integrates with their existing workflow can significantly improve adoption and satisfaction. The CLI should provide quick access to common operations—deploying, viewing logs, checking status—without requiring context switches to a web interface.

A good platform CLI follows conventions from tools developers already know. Consistent command structure, helpful error messages, and tab completion make the CLI feel familiar. Integration with the platform API ensures that CLI operations are equivalent to portal operations, giving developers the choice of interface without sacrificing functionality.

The following Go implementation sketches a platform CLI with commands for service creation, deployment, log streaming, and status checking. It calls the same platform API as the web portal, so the two interfaces stay consistent; the concrete API client is assumed to come from a platform SDK:

go
// Platform CLI Implementation (Go)
package main

import (
	"fmt"
	"os"

	"github.com/spf13/cobra"
)

// --- Platform API types (illustrative; a real CLI would import these
// from the platform SDK rather than defining them inline) ---

type CreateServiceRequest struct {
	Name, Language, Team string
}

type Service struct {
	RepoURL, CatalogURL string
}

type DeployRequest struct {
	Service, Environment, Version string
}

type Deployment struct {
	ID string
}

type DeployStatus struct {
	Phase, Message string
}

type LogsOptions struct {
	Service, Environment string
	Follow               bool
}

type LogEntry struct {
	Timestamp, Pod, Message string
}

type EnvironmentStatus struct {
	Name, Version, Status          string
	ReadyReplicas, DesiredReplicas int
}

type ServiceStatus struct {
	Owner        string
	Environments []EnvironmentStatus
}

// PlatformClient is the API surface the CLI depends on. The concrete
// HTTP implementation (auth, base URL, retries) lives in the platform
// SDK and is assigned to platformClient at startup.
type PlatformClient interface {
	CreateService(req CreateServiceRequest) (*Service, error)
	Deploy(req DeployRequest) (*Deployment, error)
	WatchDeployment(id string) <-chan DeployStatus
	GetLogs(opts LogsOptions) <-chan LogEntry
	GetServiceStatus(service string) (*ServiceStatus, error)
}

var platformClient PlatformClient // initialized from config at startup

var rootCmd = &cobra.Command{
	Use:   "platform",
	Short: "Jishu Labs Internal Developer Platform CLI",
	Long:  `Interact with the Internal Developer Platform from your terminal.`,
}

// Parent "create" command; resource kinds (service, database, ...) are
// subcommands, so "platform create service my-api" parses correctly.
var createCmd = &cobra.Command{
	Use:   "create",
	Short: "Create a platform resource",
}

// Create new service: platform create service [name]
var createServiceCmd = &cobra.Command{
	Use:   "service [name]",
	Short: "Create a new service",
	Args:  cobra.ExactArgs(1),
	Run: func(cmd *cobra.Command, args []string) {
		name := args[0]
		language, _ := cmd.Flags().GetString("language")
		team, _ := cmd.Flags().GetString("team")

		fmt.Printf("Creating service: %s\n", name)
		fmt.Printf("  Language: %s\n", language)
		fmt.Printf("  Team: %s\n", team)

		// Call platform API to create service
		service, err := platformClient.CreateService(CreateServiceRequest{
			Name:     name,
			Language: language,
			Team:     team,
		})

		if err != nil {
			fmt.Printf("Error: %v\n", err)
			os.Exit(1)
		}

		fmt.Printf("\n✅ Service created successfully!\n")
		fmt.Printf("   Repository: %s\n", service.RepoURL)
		fmt.Printf("   Catalog: %s\n", service.CatalogURL)
	},
}

// Deploy command
var deployCmd = &cobra.Command{
	Use:   "deploy [service] [environment]",
	Short: "Deploy a service to an environment",
	Args:  cobra.ExactArgs(2),
	Run: func(cmd *cobra.Command, args []string) {
		service := args[0]
		environment := args[1]
		version, _ := cmd.Flags().GetString("version")

		fmt.Printf("Deploying %s to %s...\n", service, environment)

		// Trigger deployment
		deployment, err := platformClient.Deploy(DeployRequest{
			Service:     service,
			Environment: environment,
			Version:     version,
		})

		if err != nil {
			fmt.Printf("Error: %v\n", err)
			os.Exit(1)
		}

		// Stream deployment progress until the channel closes
		for status := range platformClient.WatchDeployment(deployment.ID) {
			fmt.Printf("[%s] %s\n", status.Phase, status.Message)
		}
	},
}

// Logs command
var logsCmd = &cobra.Command{
	Use:   "logs [service]",
	Short: "Stream logs from a service",
	Args:  cobra.ExactArgs(1),
	Run: func(cmd *cobra.Command, args []string) {
		service := args[0]
		environment, _ := cmd.Flags().GetString("environment")
		follow, _ := cmd.Flags().GetBool("follow")

		opts := LogsOptions{
			Service:     service,
			Environment: environment,
			Follow:      follow,
		}

		for log := range platformClient.GetLogs(opts) {
			fmt.Printf("%s %s %s\n", log.Timestamp, log.Pod, log.Message)
		}
	},
}

// Status command
var statusCmd = &cobra.Command{
	Use:   "status [service]",
	Short: "Get service status across environments",
	Args:  cobra.ExactArgs(1),
	Run: func(cmd *cobra.Command, args []string) {
		service := args[0]

		status, err := platformClient.GetServiceStatus(service)
		if err != nil {
			fmt.Printf("Error: %v\n", err)
			os.Exit(1)
		}

		fmt.Printf("\nService: %s\n", service)
		fmt.Printf("Owner: %s\n\n", status.Owner)

		fmt.Printf("%-12s %-15s %-10s %-20s\n", "ENVIRONMENT", "VERSION", "REPLICAS", "STATUS")
		fmt.Printf("%-12s %-15s %-10s %-20s\n", "-----------", "-------", "--------", "------")

		for _, env := range status.Environments {
			fmt.Printf("%-12s %-15s %-10s %-20s\n",
				env.Name,
				env.Version,
				fmt.Sprintf("%d/%d", env.ReadyReplicas, env.DesiredReplicas),
				env.Status,
			)
		}
	},
}

func init() {
	createServiceCmd.Flags().StringP("language", "l", "python-fastapi", "Programming language")
	createServiceCmd.Flags().StringP("team", "t", "", "Owning team")

	deployCmd.Flags().StringP("version", "v", "latest", "Version to deploy")

	logsCmd.Flags().StringP("environment", "e", "production", "Environment")
	logsCmd.Flags().BoolP("follow", "f", false, "Follow log output")

	createCmd.AddCommand(createServiceCmd)
	rootCmd.AddCommand(createCmd)
	rootCmd.AddCommand(deployCmd)
	rootCmd.AddCommand(logsCmd)
	rootCmd.AddCommand(statusCmd)
}

func main() {
	if err := rootCmd.Execute(); err != nil {
		fmt.Println(err)
		os.Exit(1)
	}
}

The CLI implementation above uses the popular Cobra library for command structure. Commands like 'platform deploy order-service production' feel natural to developers familiar with kubectl, docker, or git CLIs. The streaming deployment progress provides immediate feedback, while the status command gives a quick overview of service health across environments.

Measuring Platform Success

A successful platform engineering practice requires clear metrics to demonstrate value and guide improvements. Here are the key metrics to track.

Platform engineering is an investment that needs to be justified with measurable outcomes. Unlike product features that directly generate revenue, platform improvements deliver value through developer productivity gains and reduced operational costs. Establishing clear metrics before building allows you to demonstrate ROI and make data-driven decisions about where to invest next.

Metrics should cover both technical outcomes (deployment frequency, lead time) and human outcomes (developer satisfaction, adoption rates). Technical metrics indicate that the platform is working correctly, while human metrics indicate that it's actually being used and valued. A platform with great technical metrics but low adoption is failing its mission.

Developer Productivity Metrics

DORA metrics—Deployment Frequency, Lead Time for Changes, Mean Time to Recovery, and Change Failure Rate—have become the standard for measuring software delivery performance. They're valuable because they're outcome-focused rather than activity-focused. High-performing organizations deploy frequently, recover quickly from failures, and have low change failure rates.

  • Time to First Deployment: Time from a new engineer joining the team to their first production deployment
  • Deployment Frequency: How often teams deploy to production
  • Lead Time for Changes: Time from commit to production deployment
  • Mean Time to Recovery (MTTR): Average time to recover from failures
  • Change Failure Rate: Percentage of deployments causing production issues

Platform Adoption Metrics

  • Platform Adoption Rate: Percentage of services using the platform
  • Self-Service Ratio: Percentage of tasks completed without platform team help
  • Template Usage: How often teams use golden path templates
  • Developer Satisfaction (NPS): Regular surveys measuring developer happiness
  • Support Ticket Volume: Trend in tickets requiring platform team intervention
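Of the adoption metrics above, NPS is the one with a precise formula: the percentage of promoters (scores 9-10) minus the percentage of detractors (scores 0-6). A quick Go sketch with made-up survey responses:

```go
package main

import "fmt"

// nps computes Net Promoter Score from 0-10 survey responses:
// %promoters (scores 9-10) minus %detractors (scores 0-6),
// on a -100..100 scale. Scores of 7-8 are passives and count
// only toward the total.
func nps(scores []int) float64 {
	promoters, detractors := 0, 0
	for _, s := range scores {
		switch {
		case s >= 9:
			promoters++
		case s <= 6:
			detractors++
		}
	}
	return 100 * float64(promoters-detractors) / float64(len(scores))
}

func main() {
	// Hypothetical quarterly developer-survey responses.
	scores := []int{9, 10, 8, 6, 9, 3, 7, 10}
	fmt.Printf("Developer NPS: %.0f\n", nps(scores)) // Developer NPS: 25
}
```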
Most of these metrics can be collected automatically. The Prometheus recording rules below compute adoption, lead time, and change failure rate from deployment telemetry, plus an SLO alert on deployment success rate:

yaml
# Prometheus Rules for Platform Metrics
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: platform-metrics
spec:
  groups:
    - name: platform-adoption
      rules:
        - record: platform:services:total
          expr: count(kube_deployment_labels{label_platform_managed="true"})
        
        - record: platform:deployments:daily
          expr: sum(increase(argocd_app_sync_total[24h]))
        
        - record: platform:deployment_lead_time:p95
          expr: histogram_quantile(0.95,
            sum(rate(deployment_lead_time_seconds_bucket[7d])) by (le)
          )
        
        - record: platform:change_failure_rate
          expr: |
            sum(rate(deployment_failures_total[7d]))
            /
            sum(rate(deployments_total[7d]))
    
    - name: platform-slo
      rules:
        - record: platform:deployment_success_rate
          expr: |
            1 - (
              sum(rate(deployment_failures_total[30d]))
              /
              sum(rate(deployments_total[30d]))
            )
        
        - alert: PlatformDeploymentSLOBreach
          expr: platform:deployment_success_rate < 0.995
          for: 1h
          labels:
            severity: critical
          annotations:
            summary: Platform deployment success rate below 99.5%
            description: Current rate is {{ $value | humanizePercentage }}

Building the Platform Team

A successful platform engineering practice requires the right team structure and mindset. Platform teams must think like product teams, treating developers as their customers.

Platform Team Composition

  • Platform Engineers: Build and maintain platform infrastructure and tools
  • Developer Experience Engineers: Focus on UX, documentation, and onboarding
  • Product Manager: Gather developer feedback and prioritize roadmap
  • Technical Writer: Create and maintain documentation
  • Site Reliability Engineers: Ensure platform reliability and performance

Platform Engineering Success Factors

✓ Treat the platform as a product, not a project

✓ Start small and iterate based on developer feedback

✓ Measure adoption and satisfaction, not just technical metrics

✓ Document everything—the best platform is useless if no one knows how to use it

✓ Maintain golden paths, but allow escape hatches for advanced users

✓ Automate toil, but don't over-automate edge cases

✓ Build with extensibility in mind—you can't predict every need

✓ Celebrate wins and share success stories

Conclusion

Platform Engineering represents a maturation of how organizations approach developer productivity and infrastructure management. By building Internal Developer Platforms that provide self-service capabilities, clear golden paths, and excellent developer experience, organizations can dramatically accelerate software delivery while improving quality and reducing operational burden.

Success requires treating the platform as a product, measuring outcomes, and continuously improving based on developer feedback. Start with the highest-impact capabilities, demonstrate value quickly, and expand iteratively. The investment in platform engineering pays dividends in developer productivity, faster time to market, and reduced operational costs.

Next Steps

Building an Internal Developer Platform is a significant undertaking that requires expertise in infrastructure, developer experience, and product thinking. At Jishu Labs, our platform engineering team has extensive experience designing and implementing IDPs for organizations of all sizes.

Contact us to discuss your platform engineering needs, or explore our Cloud Services and Custom Software Development offerings.


About Emily Rodriguez

Emily Rodriguez is VP of Engineering at Jishu Labs with over 14 years of experience building developer platforms and infrastructure at scale. She has led platform teams at multiple Fortune 500 companies and is passionate about improving developer experience and organizational efficiency.

