A new newsletter issue is here!
https://newsletter.catops.dev/p/catops-digest-2024-11-03
#newsletter #digest
A story of debugging OOMs of a Go application in Kubernetes.
Now, I do not agree with the author of this article that the fact that Go is not aware of memory limits is a problem. In my opinion, it works as expected: you don’t want to have environment-dependent runtimes.
However, this article provides some examples of how one can manage Go’s memory utilization and tune garbage collection a little bit. Plus, it has links to articles that describe Go’s garbage collector in more detail, which is also cool.
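For illustration, here is a minimal sketch of one way to make a Go service respect its container limit. This is my own assumption of a reasonable setup, not necessarily what the article does: it reads the cgroup v2 file /sys/fs/cgroup/memory.max and passes 90% of it to debug.SetMemoryLimit (available since Go 1.19). The helper names parseCgroupLimit and softLimit, as well as the 90% headroom, are made up for this example.

```go
package main

import (
	"fmt"
	"os"
	"runtime/debug"
	"strconv"
	"strings"
)

// parseCgroupLimit parses the contents of a cgroup v2 memory.max file.
// It returns ok=false when the file says "max" (no limit) or is malformed.
func parseCgroupLimit(raw string) (limit int64, ok bool) {
	s := strings.TrimSpace(raw)
	if s == "" || s == "max" {
		return 0, false
	}
	n, err := strconv.ParseInt(s, 10, 64)
	if err != nil || n <= 0 {
		return 0, false
	}
	return n, true
}

// softLimit returns the given percentage of the hard limit, so the GC
// starts working harder before the kernel OOM killer steps in.
func softLimit(hard, percent int64) int64 {
	return hard / 100 * percent
}

func main() {
	// Assumption: cgroup v2 is mounted at the usual path inside the container.
	raw, err := os.ReadFile("/sys/fs/cgroup/memory.max")
	if err != nil {
		fmt.Println("cannot read cgroup memory limit, keeping runtime defaults")
		return
	}
	hard, ok := parseCgroupLimit(string(raw))
	if !ok {
		fmt.Println("no memory limit set, keeping runtime defaults")
		return
	}
	limit := softLimit(hard, 90) // 90% is an arbitrary headroom for this sketch
	debug.SetMemoryLimit(limit)
	fmt.Printf("container limit: %d bytes, Go soft memory limit: %d bytes\n", hard, limit)
}
```

The same effect can be achieved without code by setting the GOMEMLIMIT environment variable, and GOGC still controls how eagerly the collector runs below that limit.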
#go #programming #kubernetes
Who said that there’s no development in the configuration management field :D
Mantis is a new tool to manage your infrastructure (and Kubernetes resources are on the roadmap). It uses the CUE language for configuration.
It’s not production ready! Even according to the author. I just want to share it as an example of:
- Cfg Mgmt development
- The fact that people are still trying to find a middle ground between DSL and Turing-complete languages
#cfg_mgmt
Many people know about resources in Kubernetes, because every second article talks about the importance of setting them correctly. Many people know that resources in Kubernetes are later translated into Linux cgroups, because this is a common interview question.
Yet how many people know exactly how resource requests and limits are translated into cgroups?
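As a rough sketch of the CPU part, here is the arithmetic as I understand it from the kubelet: a CPU request becomes cpu.shares, a CPU limit becomes a CFS quota over a 100ms period, and on cgroup v2 shares are converted to cpu.weight. The function names below are my own, and the shares-to-weight formula is the linear one that was used for a long time; newer container runtimes may convert differently.

```go
package main

import "fmt"

const (
	minShares     = 2      // kernel minimum for cpu.shares
	maxShares     = 262144 // kernel maximum for cpu.shares
	sharesPerCPU  = 1024   // one full CPU equals 1024 shares
	milliPerCPU   = 1000   // 1000m equals one CPU
	defaultPeriod = 100000 // CFS period in microseconds (100ms)
	minQuota      = 1000   // smallest quota that is applied, in microseconds
)

// requestToShares converts a CPU request in millicores to cgroup v1 cpu.shares.
func requestToShares(milliCPU int64) int64 {
	shares := milliCPU * sharesPerCPU / milliPerCPU
	if shares < minShares {
		return minShares
	}
	if shares > maxShares {
		return maxShares
	}
	return shares
}

// limitToQuota converts a CPU limit in millicores to a CFS quota in
// microseconds per period. Zero means "no limit", so no quota is set.
func limitToQuota(milliCPU, period int64) int64 {
	if milliCPU == 0 {
		return 0
	}
	quota := milliCPU * period / milliPerCPU
	if quota < minQuota {
		return minQuota
	}
	return quota
}

// sharesToWeight maps cgroup v1 cpu.shares [2..262144] onto the
// cgroup v2 cpu.weight range [1..10000] using the linear formula.
func sharesToWeight(shares int64) int64 {
	return 1 + ((shares-2)*9999)/262142
}

func main() {
	request, limit := int64(250), int64(500) // example values: 250m request, 500m limit
	shares := requestToShares(request)
	fmt.Printf("request %dm: cpu.shares=%d, cpu.weight=%d\n", request, shares, sharesToWeight(shares))
	fmt.Printf("limit %dm: quota=%dus per %dus period\n", limit, limitToQuota(limit, defaultPeriod), defaultPeriod)
}
```

Memory is simpler: the memory limit is written to the cgroup as a hard cap, while the memory request is mainly used for scheduling and eviction decisions rather than being enforced by the cgroup by default.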
#kubernetes #linux
A new issue of the CatOps Digest is here!
https://newsletter.catops.dev/p/catops-digest-2024-10-20
I missed a week because I was traveling, so this one came a bit later than usual.
#digest #newsletter
Kubernetes on a High Traffic Environment: 3 Key Takeaways is a nice brief article on concrete things one would benefit from in a high load environment. These things are:
- NodeLocal DNSCache
- Peak EWMA algorithm for load balancing
- Multiple Ingress Controllers for different incoming traffic streams (if this is your case).
This article also contains links to other articles, where you can learn more about each thing separately.
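To give a feeling for the Peak EWMA idea, here is a minimal sketch under my own assumptions (the backend type, the 10-second decay window, and the two-backend pick are made up for illustration and are not taken from the article or from any particular ingress controller). Each backend keeps a moving average of its latency that jumps to a new peak immediately but decays slowly, and the balancer prefers the backend with the lowest latency multiplied by the number of in-flight requests.

```go
package main

import (
	"fmt"
	"math"
	"time"
)

// decay controls how quickly old latency samples are forgotten.
// The value is an assumption for this sketch.
const decay = 10 * time.Second

// backend holds the load balancer's view of one upstream.
type backend struct {
	name    string
	cost    float64 // moving average of latency in milliseconds
	pending int     // requests currently in flight
}

// observe records a latency sample taken sinceLast after the previous one.
// A sample above the current average replaces it at once (the "peak" part);
// a lower sample only pulls the average down gradually (the "EWMA" part).
func (b *backend) observe(rttMs float64, sinceLast time.Duration) {
	if rttMs > b.cost {
		b.cost = rttMs
		return
	}
	w := math.Exp(-float64(sinceLast) / float64(decay))
	b.cost = b.cost*w + rttMs*(1-w)
}

// score is the expected cost of sending one more request to this backend.
func (b *backend) score() float64 {
	return b.cost * float64(b.pending+1)
}

// pick returns the cheaper of two candidates. Real balancers usually
// sample the two candidates at random (power of two choices).
func pick(x, y *backend) *backend {
	if y.score() < x.score() {
		return y
	}
	return x
}

func main() {
	a := &backend{name: "pod-a"}
	b := &backend{name: "pod-b"}
	a.observe(20, 0)
	b.observe(20, 0)
	b.observe(500, time.Second) // pod-b has a latency spike
	a.observe(25, time.Second)
	chosen := pick(a, b)
	fmt.Printf("pod-a cost=%.1f, pod-b cost=%.1f, chosen=%s\n", a.cost, b.cost, chosen.name)
}
```

The point of the asymmetry is that a struggling backend is punished right away, while it has to prove itself for a while before it gets its full share of traffic back.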
#kubernetes
For today’s Donations Monday, I would like to remind you of my German tutor’s standing jar.
https://send.monobank.ua/jar/8jQSHW57kP
He’s also a volunteer and he uses this jar to address ongoing requests from the combatants.
These things may not be as appealing to the media as some other equipment, but that doesn’t make them any less important.
#donations #Ukraine
Kubernetes Pi Cluster release v1.9.
This might be interesting for you if you run Kubernetes clusters on Raspberry Pis. But what's also interesting about this particular release is that they have stated the reasons for a migration from ArgoCD to FluxCD for cluster bootstrap.
(saving you a click)
Main reasons for this migration:
- FluxCD native support of Helm.
  - ArgoCD does not use Helm to deploy applications, instead helm template command is used to generate the manifest files to be applied to the cluster. The engine used by Argo CD for applying manifests to the cluster, is not always fully compatible with all Helm possible configurations (hooks, lookups, random password) causing out-of-sync situations.
  - FluxCD uses helm command to deploy Helm Charts, so Helm charts installed in this way support all the Helm-functions. Also, it eases the debugging process, because helm CLI tool can be used to see installed packages and configuration applied.
- Dependencies Definition support and improve performance in Bootstrap process.
  - ArgoCD does not support application dependencies definition, only synchronization waves can be defined, so applications can be allocated to one of the synchronization waves, so some kind of bootstrapping order can be specified. The problem with this approach is that one synchronization wave cannot start till the previous one has ended successfully, making the full process take longer times.
  - FluxCD support the definition of dependencies between applications so the cluster can be bootstrapped in order. Each application start its deployment as soon as all its dependencies are already synchronized, improving the time required to make a full cluster deployment.
- Avoid definition of extra-configuration in the manifest files to fix never-ending out-of-sync ArgoCD issues. Due to how Argo CD drift assessment logic certain not mandatory fields or server assigned fields are marked as out-of-synch, and they have to be configured to be ignored during the sync process.
Cluster bootstrap process using Ansible playbook has been updated to use FluxCD instead of ArgoCD
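To illustrate the point about waves versus dependencies, here is a toy calculation. Everything in it is made up for the example (the app names, the durations, and the assumption that the dependency graph has no cycles); it only shows why waiting for a whole wave can take longer than starting each app as soon as its own dependencies are ready.

```go
package main

import "fmt"

// app is a hypothetical application in a cluster bootstrap.
type app struct {
	name     string
	duration int      // seconds until the app is healthy (invented numbers)
	wave     int      // sync wave used in the wave-based model
	deps     []string // names of apps that must be ready first
}

// waveTime models wave-based ordering: a wave starts only after the
// previous one has fully finished, so each wave costs its slowest app.
func waveTime(apps []app) int {
	slowest := map[int]int{}
	for _, a := range apps {
		if a.duration > slowest[a.wave] {
			slowest[a.wave] = a.duration
		}
	}
	total := 0
	for _, d := range slowest {
		total += d
	}
	return total
}

// dependencyTime models dependency-based ordering: an app starts as soon
// as all of its own dependencies are ready. Assumes there are no cycles.
func dependencyTime(apps []app) int {
	byName := map[string]app{}
	for _, a := range apps {
		byName[a.name] = a
	}
	finish := map[string]int{}
	var finishAt func(name string) int
	finishAt = func(name string) int {
		if t, ok := finish[name]; ok {
			return t
		}
		start := 0
		for _, d := range byName[name].deps {
			if t := finishAt(d); t > start {
				start = t
			}
		}
		finish[name] = start + byName[name].duration
		return finish[name]
	}
	total := 0
	for _, a := range apps {
		if t := finishAt(a.name); t > total {
			total = t
		}
	}
	return total
}

// demoApps returns an invented bootstrap plan used in main.
func demoApps() []app {
	return []app{
		{name: "cni", duration: 30, wave: 0},
		{name: "cert-manager", duration: 90, wave: 0},
		{name: "ingress", duration: 40, wave: 1, deps: []string{"cni"}},
		{name: "monitoring", duration: 120, wave: 1, deps: []string{"cni"}},
		{name: "app", duration: 20, wave: 2, deps: []string{"ingress"}},
	}
}

func main() {
	apps := demoApps()
	fmt.Printf("waves: %ds, dependencies: %ds\n", waveTime(apps), dependencyTime(apps))
}
```

In this made-up plan, ingress and monitoring only need cni, so they do not have to wait for the slow cert-manager that happens to share a wave with cni.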
#kubernetes #argo #flux
When using Elasticsearch for logs, you most likely create indices periodically and have a job to rotate those logs.
However, ES can also be used as a database, and in that case one should be more careful with the data.
Here is a neat how-to article on changing the field type in the index mapping without downtime.
#elasticsearch
As a follow-up to yesterday’s post, here’s a bundle of cybersecurity books by O’Reilly on Humble Bundle.
#books #security
For today’s Donations Monday, I would like to remind you about a fundraiser for the International Legion of Ukraine.
This is a fundraiser for drones and other equipment, and it’s almost closed already. So, with your help we could close it today!
This is a great opportunity to accomplish something on Monday already 😅
Monobank Jar:
https://send.monobank.ua/jar/7282sCqqgy
#donations #Ukraine
Today you may encounter mentions of a 9.9 CVE for Linux. Most likely, it's all about this one.
This CVE is related to CUPS - a printing service for Linux. So, if you don't print things, you can just uninstall or disable it on your Linux machine and move on with your day.
Anyway, this is an interesting read on its own. It's interesting how they found this vulnerability.
P.S. This is news from the chat, btw.
#security
YouTube algorithms got me an interesting video titled: Microservices are Technical Debt. This is an interview with a principal engineer at DoorDash (US food delivery service) and a reference to an old article in DoorDash’s blog: Future-proofing: How DoorDash Transitioned from a Code Monolith to a Microservice Architecture.
It is quite ironic that the title of the original article contains the words “future-proofing”. Although the argument about monoliths vs microservices is old and boring, there are a few interesting insights in the interview. For example, microservices did help DoorDash to move faster when Corona hit. This is something that they most likely wouldn’t have been able to do without them. So, even with all the clumsiness that they bring today, that decision was worth it at the time.
#architecture
If you're stuck with Python or you want to learn it, check out this collection of training courses and software on Humble Bundle.
Also, these are the last days to grab a bundle of cloud solutions architect training courses.
#humblebundle
Pinterest explains how their engineers deal with web performance. It’s a three-parter:
- Part I: about the team structure
- Part II: about systems and practices for performance degradation detection
- There is no Part III, because multipart articles are hard.
I found it particularly interesting how they organize teams and share responsibility for web performance. It would be very interesting to hear how they got there: was it a top-down instruction or a natural evolution? Maybe that should have been Part III, but who knows.
P.S. Pinterest is the best social media out there!
8 ways to speed up your Ansible playbooks is a neat article with some simple tips and tricks for your Ansible operations.
Sure, configuration management is not as hot a topic as it used to be, but it’s still out there and it’s still relevant.
#ansible #cfgmgmt
If you want to improve your CLI & scripting game, make sure to check out this book bundle by O’Reilly!
#books
A great article about Kubernetes routing.
Yes, the things described at the beginning are basic, but then the article explains how things work under the hood using iptables as an example. So, this article is great both for those who are just learning K8s and for those who work with it but want to dig deeper.
BTW, do you remember all the chains that iptables has? :D
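One under-the-hood detail I like in this area is how kube-proxy in iptables mode spreads traffic across endpoints: it writes one rule per endpoint with a random-match probability, and the rules are evaluated in order. The sketch below is my own illustration of that arithmetic (the function names are made up, and I'm not claiming the article walks through it this way): with n endpoints, rule i matches with probability 1/(n-i), which gives every endpoint an equal share overall.

```go
package main

import "fmt"

// ruleProbabilities returns the per-rule match probability for n endpoints.
// Rule i is only reached if all earlier rules did not match, so it has to
// pick from the n-i endpoints that are left. The last rule always matches.
func ruleProbabilities(n int) []float64 {
	probs := make([]float64, n)
	for i := 0; i < n; i++ {
		probs[i] = 1.0 / float64(n-i)
	}
	return probs
}

// effectiveShares turns the ordered per-rule probabilities into the
// overall fraction of traffic that each endpoint receives.
func effectiveShares(probs []float64) []float64 {
	shares := make([]float64, len(probs))
	remaining := 1.0 // fraction of traffic that reaches the current rule
	for i, p := range probs {
		shares[i] = remaining * p
		remaining -= shares[i]
	}
	return shares
}

func main() {
	probs := ruleProbabilities(3)
	shares := effectiveShares(probs)
	for i := range probs {
		fmt.Printf("rule %d: probability %.3f, overall share %.3f\n", i, probs[i], shares[i])
	}
}
```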
#kubernetes #networking
A friend of mine is raising funds for a van for her relative who serves in the AFU right now.
The fundraiser is in Privat Bank, which doesn’t accept non-Ukrainian cards for whatever reason. However, there’s also PayPal.
Privat for Ukrainian cards: https://next.privat24.ua/send/dntp4
PayPal (worldwide): basta.tragedy@gmail.com
If you’re gonna use PayPal, please, put a comment that this is for a van, so it’s easier for her to distinguish between donations.
#donations #Ukraine
There is a slight disagreement between those who believe that AI is here to save the world from software developers with a job, and those who believe that this is just an advanced autocomplete.
This article provides some arguments for the latter point.
For me, first and foremost, it is an interesting insight into how people test new AI models.
P.S. If you are from the optimistic tribe, make sure to check out Den's video (in Ukrainian) about Cursor - an AI-powered editor.
#ai #programming
Today I stumbled upon an interesting project: Withmarble helps you to learn computer science topics using interactive flash cards.
It also looks like it uses some LLM under the hood to generate certain answers, but this is just a guess.
In any case, the project is very raw: it has only a couple of cases, it has bugs on both mobile and desktop, etc. For example, if you opened a flash card, there is no way to close it and go back to the list.
Still, I think it's a nice idea to teach folks computer science. Maybe, some of you could take this idea and execute it better :D
#programming
Another amazing article from the chat - 6 Reasons You Don't Need an SRE Team.
This quote from there should be carved in stone and put upon every technical manager's desk:
reliability is everyone's job.
If you, or engineering leads who work in your org don't think so, then hiring a separate team to care about it isn't going to help.
An interesting article was shared in our chat yesterday.
This is a summary of the analysis of AI's influence on code quality. Some excerpts:
> The data strongly correlates “using Copilot” with “mistake code” being pushed to the repository more frequently.
> The 17% decrease in “move” operations when compared to 2021 hints at the built-in trait of AI assistants to discourage code reuse. Instead of refactoring and working to DRY (“Don’t Repeat Yourself”) code, they offer a one-keystroke temptation to repeat existing code.
> Especially next to the decrease in “moved code,” the 11% increase in the proportion of duplicated code confirms the drop in overall code quality in 2023 when compared to 2021.
And my favorite one:
> In the absence of a CTO or VP of Engineering who actively schedules time to reduce “tech debt,” “copy/pasted code” often never gets consolidated into the appropriate component libraries.
Although, I saw this even before AI.
#ai #programming
Today I want to share 2 articles from a guy who knows a thing or two about SLOs - Alex Ewerlöf.
- Heterogeneous SLI vs Homogeneous SLI, in which he discusses the types of events used for SLIs and how to reason about them
- SLO: Elastic vs Datadog vs Grafana in which he compares SLO offerings from 3 major observability providers
#slo #observability
A new issue of the CatOps Digest!
https://newsletter.catops.dev/p/catops-digest-2024-09-29
#newsletter #digest
Recently I was at a meetup where a guy from Grafana Labs presented Beyla - their new eBPF application instrumentation solution. It's an interesting concept that allows one to "instrument" an app without actually changing the code, by intercepting system calls using eBPF.
What can I say: it's a cool concept. Here, Netflix describes how they use eBPF to monitor noisy neighbors. Yet, in the case of Netflix, it involves a lot of custom code, ofc.
#observability #ebpf
My German teacher is also a volunteer. Together with his buddies, he constantly raises money for ongoing requests from the Ukrainian defenders.
You can support them via this Monobank Jar:
https://send.monobank.ua/jar/8jQSHW57kP
#donations #Ukraine
AI agents invade observability: snake oil or the future of SRE?
We got from "measure everything" by Twitter to "monitor only what matters" by Honeycomb. Yet, alert fatigue, convoluted dashboards, and garbage metrics are still an issue today.
Could AI solve this? We simply don't know yet. The linked article speculates on this topic: is AI in observability just another marketing trick, or something that could help engineers solve issues faster or, more importantly, prevent those issues altogether?
#ai #observability
A friend of mine is raising funds for drones and other equipment for the International Legion of Ukraine.
https://send.monobank.ua/jar/7282sCqqgy
Currently, 60k out of 75k is there, so let's help him to reach the goal!
#donations #Ukraine