Discover insights, tutorials, and thoughts from our community. Stay updated with the latest trends in platform engineering, DevOps, and software development.
Learn about Crossplane's deletion policies and how improper handling can lead to orphaned cloud assets. This guide covers Kubernetes Admission Protection, Crossplane Delete Policies, Usages for Dependency Ordering, and more to help you manage your infrastructure safely.
The journey from Backstage to Next.js in building CECG's Core Platform dashboard, exploring technology choices, performance optimization, and the challenges of creating a complex platform interface.
A deep dive into mise and nix-shell, two modern tools for managing development environments. This post compares their philosophies, features, and practical applications to help you choose the right solution for ensuring consistency and reproducibility in your projects.
Learn how to leverage ArgoCD in multi-tenanted platforms while maintaining tenant autonomy and isolation. This post covers best practices for configuring AppProjects, managing continuous delivery flows, and automating tenant namespace updates to avoid common pitfalls.
Introducing semver-utils, an open-source tool for streamlined semantic versioning from automated pipelines. This post covers how to fetch and create semantic version tags in Git repositories, manage multiple version sets with prefixes, and integrate with CI/CD workflows.
Learn how to serve large language models on multiple Kubernetes nodes using LeaderWorkerSet (LWS), a Kubernetes SIGs project, and vLLM. This guide covers the challenges of multi-node inference, the architecture of LeaderWorkerSet, and practical tips for deployment, observability, and efficient model loading.
Explore how AIOps and Grafana Cloud are transforming IT operations from reactive to proactive. This post details our journey through forecasting CPU usage and enhancing incident investigation, sharing key lessons and future directions in intelligent IT systems.
Explore the challenges and trade-offs of deploying local LLMs for sentiment analysis in a platform engineering context. This post covers resource constraints, model accuracy, observability, and the build vs. buy decision to help platform teams integrate AI-powered observability into their workflows.
A detailed comparison of KubeAdmiral and Karmada for multi-cluster Kubernetes management. This post explores their architectures, dynamic placement capabilities, and operational complexities to help you choose the right federation solution.
A comprehensive evaluation of metrics solutions for multi-tenant Kubernetes platforms, comparing Prometheus + Thanos, VictoriaMetrics, and Grafana Mimir to address scalability and resource efficiency challenges.
Explore how to support private service access in GCP from a multi-tenanted Kubernetes platform, comparing IAM Auth & Connectivity with Private Service Access (PSA) to help you choose the right solution for your infrastructure.
Explore the journey of migrating a high-traffic ad decision server from Cloud Run to GKE Autopilot. This post details the performance challenges with serverless, the benefits of a VM-based solution, and why GKE Autopilot became the ideal middle ground for scalability, cost-efficiency, and manageability.
Learn how to automate Landing Zones in GCP Organizations.
Learn our four-stage model for executing greenfield projects: discovery, planning, execution, and feedback. This post unveils our strategy for achieving high client satisfaction and making critical decisions efficiently.
A comprehensive comparison of Kubernetes policy engines including OPA Gatekeeper, Kyverno, Kubewarden, and jsPolicy, exploring their architectures, strengths, and use cases for enforcing organizational standards.
Discover how we designed a robust authentication approach that flexibly handles a diverse range of communication protocols and scales efficiently.
A personal account of CECG's unique onboarding experience, featuring a comprehensive 1-3 month bootcamp that transforms software developers into platform engineers through hands-on IDP projects and mentorship.
Learn how to scale an HTTP stub for high-performance load testing using WireMock in Kubernetes. This post covers strategies for horizontal scaling, handling dynamic mappings with StatefulSets, and configuring a load generator for effective non-functional testing.
Learn how we implemented identity-based authentication for a developer platform using Google Identity-Aware Proxy (IAP) on GKE. This post covers our technical approach, from ingress architecture to overcoming IAP limitations, to provide secure, seamless access to internal services.
Imagine acquiring, in a matter of weeks, sought-after engineering skills that could significantly boost your expertise and confidence.
Explore the challenges of seeking support in big tech companies and the strategies to enhance the support experience. This post delves into the core issues faced by support teams and users of Internal Developer Platforms (IDPs), highlighting solutions like comprehensive training, proactive support, and community-driven innovations.
Explore how Internal Developer Platforms (IDPs) streamline common development processes through interfaces like CLI tools, developer portals, and platform orchestrators. This post examines the pros and cons of each approach to help you optimize developer workflows.
A comprehensive review of Crossplane after one year of intensive professional use, exploring its strengths in infrastructure automation and Kubernetes integration, alongside its challenges and limitations.
Learn how CECG helped a client achieve seamless cross-cloud deployments between AWS and GCP, enabling teams to deploy workloads with just one line of YAML while building automated infrastructure pipelines.
Learn how we successfully introduced 1000+ platform users to Horizontal Pod Autoscaling through an interactive knowledge platform with hands-on learning modules.
Discover how to streamline landing zone creation from the ground up using a low-code approach that improves efficiency and reduces development complexity.
Explore emerging trends in CI/CD pipelines that challenge conventional processes, advocating for script-based approaches, local execution capabilities, and tools like Dagger for more dynamic and adaptable workflows.
The client is a large multinational that operates in different parts of the world with different products, requiring a flexible solution with configurable rules and integrations per region.
Explore how to implement robust Platform Lifecycle Management using FluxCD for GitOps and Concourse for delivery management. This post details an automated, reliable, and continuous approach to deploying platform services across multiple environments.
A comprehensive guide comparing different Container Network Interface (CNI) plugins in Kubernetes, including Cilium, Calico, Weave, and Flannel, with practical insights on CNI chaining and real-world applications.
Explore four common mechanisms for integrating Kubernetes with HashiCorp Vault for secret management. This post compares the External Secrets Operator, Kubernetes Secrets Store CSI Driver, Vault Secrets Operator, and Vault Agent, weighing their pros and cons to help you choose the right solution for your platform.
Explore the purpose and value of Kubernetes Operators and CRDs through a seasonal reflection. This post explains how Operators extend the Kubernetes control plane to manage both internal and external resources, simplifying complex application deployments and integrations.
CECG was founded by, and is made up of, the most senior software engineers who want to get things done quickly.
Learn how to implement a multi-tenant ingress for a GKE-based developer platform, enabling developers to expose services to the internet seamlessly. This post details a tried-and-tested architecture using Gateway API, cert-manager, and Traefik to automate DNS, TLS, and load balancing.
A comprehensive journey through the Google Cloud Professional DevOps Exam, exploring how CECG's platform engineering training and real-world project experience provided the practical foundation needed for certification success.
A guide to writing your first Kubernetes Operator, covering 11 essential things to know before you start. This post offers practical advice on using the Operator SDK, handling reconciliation loops, managing state, and testing in isolation to save you time and effort.
Learn from 8 years of experience in upgrading multi-tenanted production Kubernetes clusters. This post details the challenges of keeping clusters up to date, from managing API deprecations to aligning with vendor support schedules, and provides a recommended upgrade timeline to ensure a smooth, business-as-usual process.
Learn how Continuous Load helps monitor network health proactively by running 24/7 network load across infrastructure, enabling teams to find and fix problems before users notice them.
Explore the benefits and challenges of multi-tenancy in Kubernetes, with a detailed comparison of different models like multiple clusters, multiple control planes, and shared control planes. This post dives into frameworks such as vCluster, Kamaji, HNC, and Capsule to help you choose the right approach for your organization.
Learn how to implement security for an MVP Internal Developer Platform in a retail bank, covering secrets management, access control, vulnerability scanning, and network isolation. This post details a pragmatic approach to building a secure, scalable, and compliant platform from the ground up.
Learn how to monitor an MVP Kubernetes-based developer platform using SLOs and SLIs. This post outlines a structured approach to defining measurable reliability targets for the control plane, data plane, networking, and load balancing to ensure platform stability and tenant satisfaction.
Learn why we use ADRs and how they can help your team.
CECG has been appointed as a Google Cloud Partner, reaffirming our team's expertise in enterprise cloud transformations and GCP products after achieving required certifications and completing Google's onboarding process.
How would you test a Kubernetes operator? We figured we would never be truly confident unless we ran the tests against a Kubernetes cluster using kind.