We introduce our guest – Lucy McGrother.
Lucy is a colleague of Jon’s who has worked in Windows Support and Enterprise Management, and now works in SOAR (Security Orchestration, Automation and Response).
Jon explains what SOAR is, and Lucy improves his answer.
We introduce the question of Monitoring, as raised by our Telegram group.
Lucy explains that you need to start by asking “What do you want to monitor?”, and that the answer shouldn’t be “everything”. We also talk about how you can respond to monitoring events. Lucy makes a sensible point: “When you get an alarm from a monitor, it’s just telling you there’s something wrong to be looking at, and it’s up to you to add the intelligence to it”.
We discuss what enterprise monitoring tools we’ve used, including SCOM (System Center Operations Manager, a Microsoft product that is part of the System Center suite) and CA OIM (previously known as “NSM”, “TNG”, “NISM”). We also mention some open source tools, like Zabbix, Nagios, Monit and Grafana, as well as PRTG, a product with free and paid versions.
There’s also a conversation about how you can monitor processes running on a machine to reduce the amount of “noise”. Jon mentions writing content to a log file and capturing the output, but notes that this won’t capture all the updates. Lucy points out that you can simply monitor whether a log file has been touched in the last X hours!
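Lucy's "has the log been touched?" idea can be sketched in a few lines of Python. This is only an illustration, not something shown in the episode: the function name `log_is_stale`, the example path and the choice to treat a missing file as stale are all assumptions.

```python
import os
import time


def log_is_stale(path, max_age_hours, now=None):
    """Return True if `path` was not modified within the last `max_age_hours`.

    A missing or unreadable file is treated as stale (an assumption made
    for this sketch, since a vanished log is usually worth an alert too).
    """
    now = time.time() if now is None else now
    try:
        modified_at = os.path.getmtime(path)
    except OSError:
        return True
    age_seconds = now - modified_at
    return age_seconds > max_age_hours * 3600
```

For example, `log_is_stale("/var/log/myapp/app.log", 4)` (a hypothetical path) would flag a process that has written nothing for more than four hours, without having to parse every line it logs.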
Jerry talks about Nagios monitoring plugins, and how they report issues using exit codes.
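As a rough illustration of what Jerry describes, Nagios-style plugins signal their state through the process exit code: 0 for OK, 1 for WARNING, 2 for CRITICAL and 3 for UNKNOWN. The disk check below is a minimal sketch; the function names and the 80%/90% thresholds are assumptions chosen for the example, not anything from the episode.

```python
import shutil

# Conventional Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def classify(used_pct, warn_pct=80.0, crit_pct=90.0):
    """Map a disk usage percentage to an (exit_code, message) pair."""
    if used_pct >= crit_pct:
        return CRITICAL, f"DISK CRITICAL - {used_pct:.1f}% used"
    if used_pct >= warn_pct:
        return WARNING, f"DISK WARNING - {used_pct:.1f}% used"
    return OK, f"DISK OK - {used_pct:.1f}% used"


def check_disk(path, warn_pct=80.0, crit_pct=90.0):
    """Check disk usage for `path`; return UNKNOWN if it cannot be measured."""
    try:
        usage = shutil.disk_usage(path)
    except OSError as exc:
        return UNKNOWN, f"DISK UNKNOWN - {exc}"
    if usage.total == 0:
        return UNKNOWN, "DISK UNKNOWN - filesystem reports zero size"
    return classify(usage.used / usage.total * 100, warn_pct, crit_pct)

# Run as a plugin, the script would end with something like:
#   code, message = check_disk("/")
#   print(message)
#   raise SystemExit(code)
```

The monitoring server only needs the exit code to decide the state; the printed line is there for the human reading the alert.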
Al mentions the podcast “Self Hosted Show”.
Jerry talks about the difference between metrics and polling. Lucy mentions that she did a Microsoft Statistics and Analytics course, and that your polling tool should be feeding metrics data for later use.
Jon and Lucy draw on their past experience of dealing with incidents, and on how difficult it is to pull logs from boxes, especially when there’s a need to resume service as soon as possible. We also discuss the difficulty of constantly transferring logs to other devices. Examples include carrier-grade equipment that might be processing many gigabytes per second, a proxy for a large company that might be producing tens of thousands of log files every 24 hours, and cloud providers that charge for egress traffic. There’s also the case of someone malicious inside your network who is trying to hide their actions, and who might spam the monitoring solution with valid or invalid log entries to frustrate investigators.
Jerry talks about how application developers he’s worked with frequently embed log collection features into their applications, so that there is a known API endpoint you can query from your polling system for the status of that application.
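The pattern Jerry describes might look something like the sketch below, using only the Python standard library. The `/status` path, the JSON fields and the helper names are all hypothetical; a real application would report whatever its developers consider meaningful.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import urlopen


class StatusHandler(BaseHTTPRequestHandler):
    """Serves a small JSON status document at the (assumed) /status path."""

    def do_GET(self):
        if self.path != "/status":
            self.send_error(404)
            return
        # Hypothetical fields; the application decides what to expose.
        body = json.dumps({"status": "ok", "queue_depth": 0}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # Keep the example quiet instead of logging every poll.


def start_status_server(port=0):
    """Start the status endpoint in a background thread (port 0 = any free port)."""
    server = HTTPServer(("127.0.0.1", port), StatusHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def poll_status(url, timeout=5):
    """What the polling system does: fetch the endpoint and decode the JSON."""
    with urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode())
```

The polling side then only needs the URL, for example `poll_status("http://127.0.0.1:8080/status")` with an assumed port, and can alert on anything other than `"status": "ok"`.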
Jon brings up a point made in the Telegram group from Stuart, who mentions that his workloads are frequently ephemeral, and that he really needs something that handles service discovery, like Prometheus and Consul.
Jon attended a Wireshark webinar which he’d strongly encourage people to watch (he’s waiting on approval to post the link), and ideally to get training from the creator of the course!
Jerry mentions a weekly podcast “The Pod Delusion” which has restarted. Jon mentions “The Coolest Nerds In The Room” podcast. Al talks about the “Lost Connections” audiobook and connected podcast – “Uncovering the Real Causes of Depression with Johann Hari”. Lucy mentions the school in Salford who are teaching all their pupils BSL (British Sign Language) to ensure that deaf students at the school are included.
We thank Dave Lee for his continuing work in fixing up our audio. Jerry non-ironically mentions that he hopes our audio will be better this episode. Dave has advised us that he laughed extensively when he heard this.
Dave is also one of our patrons on Patreon. If you also want to become a patron, please follow this link: https://www.patreon.com/adminadminpodcast.