December | 2025 | The Admin Admin Podcast

In this episode:

Cloud Outages and Incident Reviews

We mention recent service outages involving AWS DNS and Azure Front Door, discussing how both were triggered by minor misconfigurations, such as empty arrays or empty DNS records.

We highlight Azure’s practice of sharing detailed post-incident reviews on YouTube to boost transparency, similar to what GitLab once did. In light of these outages, we emphasise the need for cloud providers to improve their input validation. We also give a brief explanation of HugOps.
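To make the input-validation point concrete, here is a minimal Go sketch of a pre-deployment check that refuses to publish an empty DNS record instead of pushing it out. The `DNSRecord` type and `validateRecord` function are hypothetical and deliberately simplified. This is not how AWS or Azure actually validate their configuration, just an illustration of failing loudly on an "empty array".

```go
package main

import (
	"errors"
	"fmt"
)

// DNSRecord is a hypothetical, simplified record set produced by some
// automation; it is not a real AWS or Azure type.
type DNSRecord struct {
	Name   string
	Values []string // e.g. the IP addresses the name should resolve to
}

// validateRecord rejects changes that would leave a name resolving to
// nothing, so an empty plan fails before deployment rather than after.
func validateRecord(r DNSRecord) error {
	if r.Name == "" {
		return errors.New("record has no name")
	}
	if len(r.Values) == 0 {
		return fmt.Errorf("record %q has no values; refusing to publish an empty record", r.Name)
	}
	for i, v := range r.Values {
		if v == "" {
			return fmt.Errorf("record %q: value %d is empty", r.Name, i)
		}
	}
	return nil
}

func main() {
	good := DNSRecord{Name: "db.example.internal", Values: []string{"10.0.0.5"}}
	bad := DNSRecord{Name: "db.example.internal"} // Values is nil: the "empty array" case

	fmt.Println(validateRecord(good)) // prints <nil>
	fmt.Println(validateRecord(bad))  // prints the "refusing to publish" error
}
```

In a real pipeline a check like this would sit in front of whatever applies the change, so a bad plan stops the rollout rather than reaching production.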

Migration and Modernization Projects

Jerry describes his current gig involving the migration of legacy on-premises infrastructure to modern cloud solutions, using AWS Transfer Family for SFTP services and migrating SQL Server databases to Azure SQL Managed Instance. SQL Server Management Studio (SSMS) and AWS Database Migration Service are mentioned as typical tools for these migrations, though both are noted for occasional reliability issues.

Linux Laptop Setup and Configuration Management

The discussion shifts to strategies for configuring Linux systems, especially as Windows 10 becomes unsupported.

We discuss different configuration management tools: Al recently started again with Ansible (after using Puppet) and notes how, using Ansible’s local connection, a playbook can efficiently provision a system from scratch with APT and Flatpaks.

Playbooks, dotfile management (using tools like chezmoi) and over-engineered Vim configurations are recurring themes, along with Ansible configs that support distributions like Debian, RHEL and Arch (but not NixOS yet – someone would have said something).

Jerry belatedly realises he should sort something out in this respect, though all he really needs to get going is his SSH/GPG keys (for pass) and ssh-keychain for WSL. Jerry and Stu discuss Vim and the VS Code Vim plugin.

Shells, Package Managers, and Dotfiles

We discuss oh-my-zsh and its productivity-boosting plugins, such as git aliases and improved history searching with fzf. We compare bash, zsh and fish, with zsh preferred for its better completion and command-history features and its ability to run Bash one-liners. We also look at the role of package managers such as Homebrew (also available on Linux, which of course already has a package manager), pip, npm and Cargo in managing dev environments.

Coding and Tools

We discuss our recent experiences (vibe-)coding in Go (Golang) to replace some dodgy PowerShell scripts, and touch on Go’s learning curve and the fact that it’s a compiled language.

We touch on SST (Serverless Stack Toolkit), which is based on TypeScript and offers opinionated AWS resource deployment.

We return to AI/LLMs: OpenCode and Claude Code come up for their ability to support coding workflows, either by making changes directly or by providing guidance, and we discuss the trade-offs of using them to get stuff done.

Sysadmin and SRE Roles

We discuss the differences and overlaps between the various roles associated with our work: System Administrator (sysadmin), DevOps, Platform Engineering and Site Reliability Engineering (SRE).

Jerry defines a sysadmin as a Windows or Linux engineer, perhaps someone with less of a programming background.

We dive a bit deeper into “SRE”, which we define as focusing on reliability at the level that business and customer needs require, using automation to reduce toil (work that could be automated), and the concept of user-experience monitoring.

We highlight SLOs (Service Level Objectives), SLIs (Service Level Indicators) and the importance of observability, referencing logs, metrics, traces and (sometimes) profiling.
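As a rough illustration of how an SLI relates to an SLO, the Go sketch below measures the share of “good” requests and compares it with a target. The `Request` type, the 300 ms latency threshold and the rule that 5xx responses count as bad are all assumptions made for the example, not definitions from the episode.

```go
package main

import "fmt"

// Request is a hypothetical, simplified record of one served request.
type Request struct {
	Status    int     // HTTP status code
	LatencyMs float64 // time taken to respond, in milliseconds
}

// sli returns the fraction of requests counted as "good": not a 5xx and
// answered within thresholdMs. A window with no traffic is treated as 1.
func sli(reqs []Request, thresholdMs float64) float64 {
	if len(reqs) == 0 {
		return 1
	}
	good := 0
	for _, r := range reqs {
		if r.Status < 500 && r.LatencyMs <= thresholdMs {
			good++
		}
	}
	return float64(good) / float64(len(reqs))
}

func main() {
	reqs := []Request{
		{200, 120}, {200, 80}, {503, 30}, {200, 950},
	}
	const slo = 0.995 // target: 99.5% of requests are good
	got := sli(reqs, 300)
	fmt.Printf("SLI = %.3f, meets SLO: %v\n", got, got >= slo)
	// prints: SLI = 0.500, meets SLO: false
}
```

Treating an empty window as fully “good” is a design choice here; some teams would rather report no value at all for periods without traffic.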

Observability, Monitoring, and OpenTelemetry

We discuss logs, metrics and distributed tracing (especially via OpenTelemetry and hosted services such as Datadog and Honeycomb). Jerry mentions an excerpt from Observability Engineering, written by engineers from Honeycomb. We also touch on the practical need for monitoring both at the system level and deeper down in the data being collected, with analogies like a pain in the foot turning out, on further investigation, to be a broken toe.

The pillars of observability (metrics, logs and traces) come up again, and Stu breaks down their roles in investigating incidents and maintaining SLOs. We walk through a real-world example of a 99.5% SLO.

We go on about SRE so much that we run out of time and only briefly touch on how these roles have been named over time (plus new roles that are popping up, e.g. “FinOps”). Stay tuned for further discussions…

Get in touch with us at mail@adminadminpodcast.co.uk or via our Telegram channel.


A podcast for people who work in the real world of IT. If you are a sysadmin or want to learn more about servers, this podcast is for you.


Admin Admin Podcast #103 – Show Notes: That’s how I role