Author Archives: Dave Lee

Admin Admin Podcast #105 – The one without the musician

In this episode, Al, Jon, and Jerry catch up on some major life changes, dive deep into the world of the Model Context Protocol (MCP), and discuss the practicalities of moving into IT consultancy. Plus, we explore why Docker might be your best friend for CLI tools and how to keep your AWS environment secure with temporary tokens.

Show Notes: https://www.adminadminpodcast.co.uk/ep105sn/

Admin Admin Podcast #105 – Show Notes: The one without the musician

In this episode, Al, Jon, and Jerry catch up on some major life changes, dive deep into the world of the Model Context Protocol (MCP), and discuss the practicalities of moving into IT consultancy. Plus, we explore why Docker might be your best friend for CLI tools and how to keep your AWS environment secure with temporary tokens.

Community Update

A Bit of News: The team shares an update regarding Stu, who is taking a step back from the show to focus on outside commitments. We wish him the absolute best! The podcast continues with Al, Jon, and Jerry at the helm.

The Racing Clock: This episode was recorded in a “sprint” style, thanks to Jon’s laptop battery giving a strictly enforced 46-minute deadline.

Development & AI

MCP Server Progress: Jon discusses building Model Context Protocol (MCP) servers, specifically focusing on a Grafana server for incident management. By using an adapter to bridge Slack and Grafana via Claude, the team can correlate network events with real-time conversations.

The MCP Proxy Idea: A look, from the “internal lab”, at a potential proxy for MCPs that would provide better visibility into LLM requests and responses.

Spec-Driven Development: Al breaks down the workflow of using LLMs for spec-driven development. By using tools like GitHub’s Spec Kit, the process moves from markdown standards to LLM-generated tests, ensuring software is built in small, verifiable increments.

The Business of IT

Contracting vs. Consulting: Jerry shares his transition from the contract market toward a full-scale IT Consultancy model.

Market Gap: Discussion on the demand for high-level IT support for small companies that don’t need a full five-day-a-week commitment.

Defining the Roles: A breakdown of the nuances between permanent employment (benefits/stability), contracting (freedom/higher risk), and consulting (multi-client management/marketing).

Tools & Security

Dockerized CLI Tools: Why Jon is moving toward wrapping command-line scripts in Docker containers. This approach eliminates the “it works on my machine” headache by bundling dependencies like Python versions and LibSSL directly into the image.
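As a sketch of the approach (the tool name and pinned dependency here are hypothetical, not from the show), a Dockerfile for a wrapped CLI tool might look like:

```dockerfile
# Hypothetical example: wrapping a Python CLI tool so that its
# interpreter version and system libraries (e.g. libssl) come from
# the image rather than the host machine.
FROM python:3.12-slim

# Pin the tool's dependencies inside the image, keeping the host clean.
RUN pip install --no-cache-dir requests==2.32.3

COPY mytool.py /usr/local/bin/mytool.py

ENTRYPOINT ["python", "/usr/local/bin/mytool.py"]
```

The tool can then be run as `docker run --rm -v "$PWD:/work" -w /work mytool …`, often hidden behind a shell alias so it feels like a local command.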

AWS Security: A shallow-ish dive into the AWS Security Token Service (STS). We discuss why temporary tokens are a superior security choice over long-lived IAM access keys and how they compare to Azure’s SAS keys.
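For context, an STS `AssumeRole` call (via `aws sts assume-role` or boto3) returns a `Credentials` block with a built-in expiry. A small sketch of turning that response into the environment variables the AWS CLI and SDKs read (the sample values below are made up):

```python
# Sketch: mapping a temporary STS credential set onto the standard
# AWS environment variables. The response shape matches what
# AssumeRole returns; the values here are fabricated for illustration.

def sts_to_env(response: dict) -> list[str]:
    """Return shell export lines for a set of temporary STS credentials."""
    creds = response["Credentials"]
    return [
        f"export AWS_ACCESS_KEY_ID={creds['AccessKeyId']}",
        f"export AWS_SECRET_ACCESS_KEY={creds['SecretAccessKey']}",
        # The session token is what marks these as temporary credentials;
        # long-lived IAM access keys have no equivalent.
        f"export AWS_SESSION_TOKEN={creds['SessionToken']}",
    ]

sample = {
    "Credentials": {
        "AccessKeyId": "ASIAEXAMPLE",
        "SecretAccessKey": "examplesecret",
        "SessionToken": "exampletoken",
        "Expiration": "2025-01-01T12:00:00Z",
    }
}
print("\n".join(sts_to_env(sample)))
```

When the `Expiration` passes, the credentials simply stop working, which is the core of the security argument over long-lived keys.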

Events & Links

OggCamp 2026: Mark your calendars for April 25th! The crew plans to be there, and Jon is already prepping at least one talk.

Get in Touch

  • Email: mail@adminadminpodcast.co.uk
  • Community: Join our Telegram group to keep the conversation going.

Credits: Huge thanks to Dave Lee for the audio production. We are a proud member of the Other Side Podcast Network.

Admin Admin Podcast #104 – Show Notes: He’s been talking to himself again

In this episode, Jon flies solo to tackle the “scheduling gremlins” affecting the team before diving deep into two powerful tools for the modern SysAdmin: Server Spec for infrastructure unit testing and MCP Servers for integrating LLMs into your monitoring workflow.

Bringing Unit Testing to the Server

Jon breaks down why unit testing isn’t just for software developers anymore. By using Server Spec (an extension of RSpec), admins can verify their infrastructure just like code. Jon explains – with examples – how to define spec files to check whether files exist, are executable, or contain specific strings. He also talks about using Vagrant to verify virtual machine upgrades and service states before they hit production.
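Server Spec itself is written in Ruby, but the kinds of assertions it makes (file exists, is executable, contains a string) can be sketched in plain Python to show the idea. The path and expected content below are hypothetical:

```python
# A plain-Python analogue of Serverspec-style file checks: exists,
# executable, contains a given string. Not Serverspec itself, just
# an illustration of the same kind of infrastructure assertion.
import os
import tempfile

def check_file(path: str, expect_executable: bool, expect_contains: str) -> list[str]:
    """Return a list of failed expectations for `path` (empty list = pass)."""
    failures = []
    if not os.path.isfile(path):
        return [f"{path}: does not exist"]
    if expect_executable and not os.access(path, os.X_OK):
        failures.append(f"{path}: not executable")
    with open(path) as f:
        if expect_contains not in f.read():
            failures.append(f"{path}: missing {expect_contains!r}")
    return failures

# Demo against a temporary script rather than a real server path.
with tempfile.NamedTemporaryFile("w", suffix=".sh", delete=False) as f:
    f.write("#!/bin/sh\necho hello\n")
    script = f.name
os.chmod(script, 0o755)
print(check_file(script, expect_executable=True, expect_contains="echo hello"))
```

In Serverspec the same checks are declared as RSpec expectations and run against a host over SSH, which is what makes them usable as infrastructure unit tests.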

The Power of MCP (Model Context Protocol)

The episode shifts to the cutting edge of AI in the data center. Jon discusses how MCP servers allow Large Language Models (LLMs) like Claude and Gemini to interact directly with your infrastructure.  Instead of writing complex SQL or PromQL queries, Jon explains how – using natural language – his team uses MCP to ask Grafana: “What was the impact of the incident between 2 PM and 4 PM?”  He also explains how MCP can be used for complex connections, like linking CloudWatch metrics to customer support claims in real-time.
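Under the hood, MCP messages are JSON-RPC 2.0, so a natural-language question like the one above ultimately becomes a structured tool call from the LLM to the MCP server. A minimal sketch of the wire format (the tool name and arguments are hypothetical):

```python
# Sketch: what an MCP tool invocation looks like on the wire.
# MCP uses JSON-RPC 2.0 with a "tools/call" method; the tool name
# and arguments here are invented for illustration.
import json

def tool_call(request_id: int, tool: str, arguments: dict) -> str:
    """Serialise a JSON-RPC 2.0 request for an MCP tool call."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    })

msg = tool_call(1, "query_incident_impact", {"from": "14:00", "to": "16:00"})
print(msg)
```

The MCP server translates a call like this into the real backend query (PromQL, SQL, an API request) and returns the result for the LLM to summarise.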

Community & Events

Jon will be attending OggCamp in Manchester on Saturday 25 and Sunday 26 April this year, and hopes to see Al, Jerry, and Stu there!

Connect with Us

We want to hear from you! How are you using AI in your daily admin tasks?
Contact us via email or Telegram!

Admin Admin Podcast #104 – He’s been talking to himself again

In this episode, Jon Spriggs flies solo to tackle the “scheduling gremlins” affecting the team before diving deep into two powerful tools for the modern SysAdmin: Server Spec for infrastructure unit testing and MCP Servers for integrating LLMs into your monitoring workflow.

Show Notes: https://www.adminadminpodcast.co.uk/ep104sn/

Admin Admin Podcast #103 – Show Notes: That’s how I role

In this episode:

Cloud Outages and Incident Reviews

We mention recent service outages involving AWS DNS and Azure Front Door, discussing how both were triggered by minor misconfigurations, such as an empty configuration array or a missing DNS record.

We highlight Azure’s practice of sharing detailed post-incident reviews on YouTube to boost transparency, similar to what GitLab once did. The need for improved input validation by cloud providers is emphasized following these outages. Also, a brief explanation of HugOps.

Migration and Modernization Projects

Jerry describes his current gig involving the migration of legacy on-premises infrastructure to modern cloud solutions, using AWS Transfer Family for SFTP services and migrating SQL Server databases to Azure SQL Managed Instance. SQL Server Management Studio (SSMS) and AWS Database Migration Service are mentioned as typical tools for these migrations, though both are noted for occasional reliability issues.

Linux Laptop Setup and Configuration Management

The discussion shifts to strategies for configuring Linux systems, especially as Windows 10 becomes unsupported.

Different configuration management tools are discussed:  Al recently restarted with Ansible (after using Puppet), noting how Ansible scripts can provision a system from scratch efficiently using APT and Flatpaks and the local connection in Ansible.
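A minimal sketch of the kind of local-connection playbook described (the package and Flatpak names are illustrative; the `flatpak` module lives in the `community.general` collection):

```yaml
# Hypothetical example of provisioning the local machine with Ansible,
# mixing APT packages and Flatpaks, as discussed in the episode.
- hosts: localhost
  connection: local
  become: true
  tasks:
    - name: Install base tooling via APT
      ansible.builtin.apt:
        name: [git, vim, tmux]
        state: present

    - name: Install desktop apps via Flatpak
      community.general.flatpak:
        name: org.mozilla.firefox
        state: present
```

Because the connection is local, the same playbook can rebuild a fresh laptop from scratch without any SSH setup.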

Playbooks, dotfile management (using solutions like chezmoi), and over-engineered Vim configurations are recurring themes, with mentions of Ansible configs supporting distributions like Debian, RHEL, and Arch (but not NixOS yet – someone would have said something).

Jerry belatedly realises he should sort something out in this respect, though all he really needs to get going is SSH/GPG keys (for pass) and ssh-keychain for WSL. Jerry and Stu discuss Vim and the VS Code Vim plugin.

Shells, Package Managers, and Dotfiles

We discuss oh-my-zsh and its productivity-boosting plugins, offering git aliases and improved history searching using fzf. We compare bash, zsh, and fish, with zsh preferred for its better completion, command history features, and ability to run Bash one-liners. We also look into the role of package managers for managing dev environments: Homebrew (which also runs on Linux, a platform that already has its own package manager), pip, npm, Cargo, and so on.
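For context, enabling that setup is typically a couple of lines in `~/.zshrc`; the `git` and `fzf` plugin names below come from oh-my-zsh’s bundled plugin set:

```shell
# Sketch of the relevant ~/.zshrc lines: oh-my-zsh with the git plugin
# (aliases like `gst` and `gco`) and fzf-backed history search (Ctrl-R).
export ZSH="$HOME/.oh-my-zsh"
plugins=(git fzf)
source "$ZSH/oh-my-zsh.sh"
```

The `fzf` plugin wires fuzzy searching into the standard history and file-completion keybindings, which is where most of the productivity boost comes from.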

Coding and Tools

We discuss recent experiences (vibe-)coding in Go (Golang) to replace some dodgy PowerShell scripts, and touch on Go’s learning curve and the fact that it’s a compiled language.

We touch on SST (Serverless Stack Toolkit), which is based on TypeScript and offers opinionated AWS resource deployment.

We touch on AI/LLMs again – OpenCode and Claude Code are referenced with their ability to support coding workflows either by making direct changes or providing guidance, we discuss the tradeoffs involved with using them to get stuff done.

Sysadmin and SRE Roles

We discuss the differences and overlaps between the various roles associated with our work: System Administrator (sysadmin), DevOps, Platform Engineering, and Site Reliability Engineering (SRE).

  • Jerry defines sysadmin as a Windows or Linux engineer, perhaps someone from less of a programming background
  • We dive a bit deeper into SRE, defined as focusing on reliability to a level that meets business and customer needs, balancing automation against toil (work that could be automated), along with the concept of user experience monitoring

SLOs (Service Level Objectives), SLIs (Service Level Indicators), and the importance of observability are highlighted – referencing logs, metrics, traces, and (sometimes) profiling.

Observability, Monitoring, and OpenTelemetry

We discuss logs, metrics, and distributed tracing (especially via OpenTelemetry and hosted services such as Datadog and Honeycomb). Jerry mentions an excerpt from Observability Engineering by the Honeycomb engineers. We also touch on the practical need for monitoring at both the system level and deeper within the data being collected, with analogies like a pain in the foot turning out to be a broken toe upon further investigation.

The pillars of observability (metrics, logs, and traces) come up again and Stu breaks down their roles in incident investigation and maintaining SLOs. We define a real-world example of a 99.5% SLO.
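As a worked example of what a 99.5% SLO means in practice (assuming a 30-day measurement window, which is a common but not universal choice), the arithmetic is:

```python
# Worked example: the error budget implied by a 99.5% availability SLO
# over a 30-day window.
slo = 0.995
window_minutes = 30 * 24 * 60            # 43,200 minutes in 30 days
budget_minutes = (1 - slo) * window_minutes

print(round(budget_minutes))             # 216 minutes, i.e. 3.6 hours
```

That 3.6 hours is the budget for all failed or too-slow requests in the window; once it is spent, the usual SRE response is to pause feature work in favour of reliability work.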

We go on about SRE so much that we run out of time and touch on the naming of these roles over time (plus new roles that are popping up e.g. “finops”), stay tuned for further discussions…

Get in touch with us at mail@adminadminpodcast.co.uk or via our Telegram channel.

 

Admin Admin Podcast #102 – Getting the band back together

In this episode we return after a couple of years in hiatus to talk about what we’ve been up to since we last recorded, including: LLMs; the differences between platform, DevOps, and sysadmin; and red tape.

Show Notes: https://www.adminadminpodcast.co.uk/ep102sn/

 

Admin Admin Podcast #102 – Show Notes: Getting the band back together

In this episode:

The team shared career updates, including Jon’s new SRE role, Jerry’s transition to freelance work, Stu’s move to a principal software engineer position, and Al’s lead role in a DevOps team.

Key discussions revolved around AI, with Jerry sharing his positive experience using Light LM and AI for design documents, while Stu expressed ethical concerns about AI’s energy consumption. Al raised concerns about AI hindering learning for new developers, and Jon highlighted the issue of “AI slop” affecting projects like curl.

Jon mentioned:
  • Defensive Security Podcast
  • TinyOIDC: https://tinyoidc.authenti-kate.org/ and https://github.com/authenti-kate/tiny-oidc
  • Open Source Security Podcast: LLM Finding bugs in Curl
  • Human Resources book: https://www.amazon.co.uk/dp/B0DZWKGZGN and https://torpublishinggroup.com/human-resources/?isbn=9781250375933&format=ebook

Jerry mentioned:
  • A YouTube video about AI slop

Admin Admin Podcast #098 Show Notes – Contain Your Enthusiasm

Jon couldn’t make it for this episode; he’ll be back next time!

Al mentions our last episode with Ewan, and how the focus on Observability fits with his current focus at work.

Al references the Golden Signals of monitoring, as well as Azure’s App Insights.

Stuart mentions a few books to read, including the Google SRE book, the Google SRE Workbook, and Alex Hidalgo’s Implementing Service Level Objectives. One not mentioned in the show but also of interest is Observability Engineering.

Jerry talks about his new job, that uses Azure and .NET. He mentions using Terraform and Azure DevOps. He also does some freelance work, and is trying to build “platforms” rather than just managing servers manually.

Stuart mentions a push in the industry to build easily consumable platforms for developers, allowing them to consume it themselves (Platform Engineering).

Al talks about using multiple regions within Cloud providers. Stuart mentions that sometimes using multiple regions can add redundancy but significantly increase complexity, at which point there is a trade off to consider.

Stuart talks about database technologies that allow multiple “writers” (e.g. Apache’s Cassandra, AWS’s DynamoDB, Azure’s CosmosDB), compared to those with a single writer and multiple readers (e.g. default MySQL and PostgreSQL).

Jerry talks about CPU Credits in Cloud providers, Stuart references AWS’s T-series of instances which make use of CPU Credits.

Al starts a discussion around Containers.

Stuart mentions the primitives that Containers are based around, like cgroups. They also use network namespaces (not mentioned in the show).

Al mentions a container image he is looking at currently which includes a huge amount of dependencies (including Xorg and LibreOffice!) that are probably not required.

Al talks about Azure Serverless (“function-as-a-service”, like AWS’s Lambda and OpenFaaS), and Jerry mentions that these are often running as containers in the background. He also mentions AWS’s Fargate as a “serverless” container platform.

The conversation then moves onto Kubernetes.

Stuart mentions that when using a Cloud’s managed Kubernetes service, you often still manage the worker nodes, with the Cloud provider managing the control plane. It is possible to use technologies like AWS’s Fargate as Kubernetes nodes.

Al asks how you would go about splitting up Kubernetes clusters (i.e. one big cluster? multiple app-specific clusters? environment-specific clusters?). Jerry and Stuart talk about this, as well as how to use multi-tenancy/access control and more. Stuart mentions concerns with quite large clusters around rolling upgrades of nodes.

Stuart mentions OpenShift, a Kubernetes distribution (similar to how Ubuntu, Debian, and Red Hat are distributions of Linux), and talks more about how it differs from “vanilla” Kubernetes. Stuart also mentions Rancher as another Kubernetes distribution.

Stuart also mentions the Kubernetes reconciliation loop, which is a really powerful concept within Kubernetes.

Stuart briefly mentions Chaos Engineering, inducing “chaos” to prove that your infrastructure and applications can handle failure gracefully.

Stuart talks about the Kubernetes Cluster Autoscaler.

Stuart and Jerry talk about how Kubernetes is not far off being a unified platform to aim for, although not entirely. Differences in how Clouds implement access control/service accounts is a good example of this.

Al mentions using a Container Registry, which Jerry and Stuart go into more detail about. Jerry talks about Container Images and only including what is required in it.

Jerry mentions Alpine Linux as a good base for Container images, to reduce the size of containers and not including unneeded dependencies.

Al mentions slim.ai, and Stuart mentions how it is aiming to be like minify but for Containers.

Jerry talks about Multi-Stage container images, as a way of removing build dependencies from a Production container. Stuart also mentions “Scratch” containers, which are effectively an image with nothing in it.
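A minimal multi-stage sketch, assuming a hypothetical Go tool: build dependencies live only in the first stage, and the final stage copies just the compiled binary into an empty scratch image.

```dockerfile
# Hypothetical multi-stage build: the toolchain stays in the build
# stage; the production image contains only the static binary.
FROM golang:1.22 AS build
WORKDIR /src
COPY . .
RUN CGO_ENABLED=0 go build -o /app ./...

FROM scratch
COPY --from=build /app /app
ENTRYPOINT ["/app"]
```

The scratch-based result is typically a few megabytes, with no shell or package manager to patch or exploit.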

Stuart mentions running the built container within a Continuous Integration Pipeline with some tests, to make sure that your container doesn’t even get published until it meets the requirements of running the application inside of it.

Al and Stuart talk about running init systems (e.g. systemd) in Containers, and how this usually isn’t the way you run applications within Containers.

Jerry mentions viewing containers as immutable (e.g. don’t install packages that are required in an already running container, add them to the base image before starting it).

Stuart talks about viewing Containers as stateless, avoiding the need to persist data when a new container is deployed.

Admin Admin Podcast #097 Show Notes – Through the Logging Glass

In this episode, Jon’s colleague Ewan joins us, to talk about Observability.

Stu explains that Observability is how you monitor requests across microservices.

Microservices (which we foolishly don’t describe during the recording) is the term given to an application architectural pattern where rather than having all your application logic in a single “monolith” application, instead it is a collection of small applications, executed, as required, when triggered by a request to a single application entry point (like a web page). These small applications are built to scale horizontally (across many machines or environments), rather than vertically (by providing them with more RAM or CPU on a single host), which means that if you have a function that takes a long time to execute, this doesn’t slow down the whole application loading. It also means that you can theoretically develop your application with less risk, as you don’t need to remove your version 1 microservice when you develop your version 2 microservice, so if your version 2 microservice doesn’t operate the way you’re expecting, you can easily roll back to version 1. This, however, introduces more complexity in the code you’ve written, as there’s no single point for logs, and it can be much harder to identify where slowdowns have occurred.

Stu then explains that observability often refers to the “three pillars”, which are: Metrics, Logs and Tracing. He also mentions that there’s a fourth pillar now being discussed: “Continuous Profiling”. Jerry talks about some of the products he’s used before, including Datadog and Netdata, and compares them to Nagios.

Ewan talks about his history with Observability, and some of the pitfalls he’s had with them.

Stu talks about being a “SRE” – Site Reliability Engineer, and how that influences his view on Observability. Stu and Ewan talk about KPIs (Key Performance Indicators), SLI (Service Level Indicators) and SLO (Service Level Objectives), and how to determine what to monitor, and where history might make you monitor the wrong things. Jerry asks about Error Budgets. Stu talks about using SLI, SLO and error budgets to determine how quickly you can build new features.

Jerry asks about tooling. Stu and Ewan talk about products they’ve used. Jon asks about injecting tracing IDs. Ewan and Stu talk about how a tracing ID can be generated and how having that tracing ID can help you perform debugging, not just of general errors, but even on specific issues in specific contexts.
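One common way tracing-ID injection is implemented is the W3C Trace Context `traceparent` header, generated at the edge and passed along on every downstream request. A minimal sketch (the field sizes follow that format; everything else is simplified):

```python
# Sketch: generating a trace ID and propagating it via an outbound
# header, in the style of W3C Trace Context. A real tracer would also
# manage parent/child span relationships and sampling decisions.
import secrets

def make_traceparent() -> str:
    trace_id = secrets.token_hex(16)   # 16 random bytes -> 32 hex chars
    span_id = secrets.token_hex(8)     # 8 random bytes -> 16 hex chars
    # version 00, sampled flag 01
    return f"00-{trace_id}-{span_id}-01"

# Attach to an outgoing request so downstream services can correlate logs.
headers = {"traceparent": make_traceparent()}
print(headers["traceparent"])
```

Because every service logs the same trace ID, a single grep (or trace-viewer query) reconstructs the whole request path, including the specific failing hop.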

Jon asks about identifying outliers with tooling, but the consensus is that this is down to specific tools. Ewan mentions that observability is just tracing events that occur across your systems, and that metrics, logs and tracing can all be considered events.

Jon asks what a “Log”, a “Metric” and a “Trace” are, and Ewan describes these. Stu talks about profiling and how this might also weigh into the conversation, and mentions Parca, a continuous-profiling project.

Ewan talks about the impact of Observability on the “industry as a whole” and references “The Phoenix Project”. Jerry talks about understanding systems by using observability.

We talk about being on-call and alert fatigue, and how you can be incentivised to be called out, or to proactively monitor systems. The DevOps movement’s impact on on-call is also discussed.

Ewan talks about structured logging and what it means and how it might be implemented. Stu talks about not logging everything!
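As a sketch of what structured logging can look like with nothing but Python’s standard library (the field names are illustrative; real setups usually follow an agreed schema):

```python
# Sketch: emitting log lines as JSON objects rather than free text,
# so downstream tooling can filter and aggregate on fields.
import json
import logging

class JsonFormatter(logging.Formatter):
    def format(self, record: logging.LogRecord) -> str:
        return json.dumps({
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        })

handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
log = logging.getLogger("demo")
log.addHandler(handler)
log.setLevel(logging.INFO)

log.info("user logged in")   # emits a JSON object, not a plain string
```

The payoff is on the query side: a log pipeline can filter on `level` or any other field directly, instead of regexing free-form text.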

We’re a member of the Other Side Podcast Network. The lovely Dave Lee does our Audio Production.

We want to remind our listeners that we have a Telegram channel and email address if you want to contact the hosts. We also have Patreon, if you’re interested in supporting the show. Details can all be found on our Contact Us page.