Telegram channel r_devops - Reddit DevOps

Reddit DevOps. #devops Thanks @reddit2telegram and @r_channels

Reddit DevOps

Is there any way to get $10 in AWS credits as a student?

Hey everyone!

I'm a student currently learning AWS and working on DevOps projects like Jenkins pipelines, Elastic Load Balancers, and EKS. I've already used up my AWS Free Tier, and I just need around $10 in credits to test my deployments for an hour or two and take screenshots for my resume/blog.

I’ve tried AWS Educate, but unfortunately it didn’t work out in my case. I also applied twice for the AWS Community Builders program, but got rejected both times.

Is there any other way (like student programs, sponsorships, or community grants) to receive a small amount of credits to continue building and learning?

I'd be really grateful for any suggestions — even a little support would go a long way in helping me continue this journey.

Thanks so much in advance! 🙏

https://redd.it/1ltuqjm
@r_devops

Struggling to put two instances in target_id for the ALB module?

Do I need to create a separate ALB target group attachment resource block and associate it with the ALB module?

https://redd.it/1ltssey
@r_devops

Canary Deployment Strategy with Third-Party Webhooks

We're setting up canary deployments in our multi-tenant architecture and looking for advice.

Our current understanding is that we deploy a v2 of our code and route some portion of traffic to it. Since we're multi-tenant, our initial plan was to route entire tenants' traffic to the v2 deployment.

However, we have a challenge: third-party tools send webhooks to our Azure function apps, which then create jobs in Redis that are processed by our workers. Since we can't keep changing the webhook endpoints at the third-party services, this creates a problem for our canary strategy.

Our architecture looks like:

* Third-party services → Webhooks → Azure Function Apps → Redis jobs → Worker processing

How do you handle canary deployments when you have external webhook dependencies? Any strategies for ensuring both v1 and v2 can properly process these incoming webhook events?

Thanks for any insights or experiences you can share!
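One pattern that avoids touching the third-party endpoints, sketched here under assumptions: keep a single webhook URL, have the Function App read the tenant from each payload, and push the job onto a version-specific Redis queue so v1 and v2 workers each consume only their own queue. The `tenant_id` field, the `jobs:v1`/`jobs:v2` queue names and the canary settings are all hypothetical, and the Redis client is left out so the routing logic stands alone.

```python
import hashlib
import json

# Hypothetical canary config: tenants pinned to v2, plus a percentage rollout.
CANARY_TENANTS = {"tenant-42"}
CANARY_PERCENT = 10  # roughly 10% of the remaining tenants go to v2


def bucket_for(tenant_id: str) -> int:
    """Map a tenant deterministically to a bucket in [0, 100)."""
    digest = hashlib.sha256(tenant_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % 100


def target_queue(tenant_id: str) -> str:
    """Pick the Redis queue for a tenant's jobs (hypothetical queue names)."""
    if tenant_id in CANARY_TENANTS or bucket_for(tenant_id) < CANARY_PERCENT:
        return "jobs:v2"
    return "jobs:v1"


def route_webhook(raw_body: bytes) -> tuple[str, str]:
    """Parse a webhook payload and return (queue_name, job_json).

    Every tenant hits the same endpoint; only the queue differs, so the
    webhook URLs registered with third parties never change.
    """
    event = json.loads(raw_body)
    tenant_id = event["tenant_id"]  # assumption: the payload identifies the tenant
    job = json.dumps({"tenant_id": tenant_id, "event": event})
    return target_queue(tenant_id), job
```

With redis-py, the Function App would then do something like `r.rpush(queue, job)`. Hashing the tenant ID keeps each tenant pinned to one version for the whole rollout, and keeping the job payload format readable by both versions makes rolling back just a config change.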

https://redd.it/1ltmjre
@r_devops

Advice for CI/CD with Relational DBs

Hey there folks!

Most of the DBs I've worked with in the past have been either non-relational or laughably small PG DBs. I'm starting on a project that's going to rely on a much heavier PG DB in AWS. I don't think my current approaches are really viable for a big-boy relational setup.

So if any of you could shed some light on how you approach handling your DBs, I'd very much appreciate it.

Currently I use Prisma, which works, but I don't think it's optimal. I'd like to move away from ORMs. I've been eyeing Liquibase.
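Migration tools like Liquibase come down to the same core idea: an ordered list of versioned changes plus a table in the database recording which ones have already run, so each pipeline run applies only what is pending. The sketch below illustrates that idea with Python's built-in `sqlite3` purely so it runs anywhere; the `schema_migrations` table and the migrations themselves are hypothetical, and a real PostgreSQL pipeline would use the tool's own changelog format and driver.

```python
import sqlite3

# Hypothetical ordered migrations: (version, SQL). Never edit one after it has
# shipped; add a new version instead.
MIGRATIONS = [
    (1, "CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT NOT NULL)"),
    (2, "ALTER TABLE users ADD COLUMN created_at TEXT"),
    (3, "CREATE INDEX idx_users_email ON users (email)"),
]


def migrate(conn: sqlite3.Connection) -> list[int]:
    """Apply pending migrations in order; return the versions applied this run.

    Expects a connection opened with isolation_level=None, so the explicit
    BEGIN/COMMIT below control each migration's transaction.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS schema_migrations (version INTEGER PRIMARY KEY)"
    )
    applied = {row[0] for row in conn.execute("SELECT version FROM schema_migrations")}
    newly_applied = []
    for version, sql in MIGRATIONS:
        if version in applied:
            continue
        conn.execute("BEGIN")
        try:
            conn.execute(sql)
            conn.execute(
                "INSERT INTO schema_migrations (version) VALUES (?)", (version,)
            )
            conn.execute("COMMIT")
        except Exception:
            # The schema change and its bookkeeping row are undone together.
            conn.execute("ROLLBACK")
            raise
        newly_applied.append(version)
    return newly_applied


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:", isolation_level=None)
    print(migrate(conn))  # first run applies every migration
    print(migrate(conn))  # second run finds nothing pending
```

PostgreSQL also runs most DDL inside transactions, so the same one-transaction-per-migration approach means a failed step leaves the schema as it was.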

https://redd.it/1ltcylo
@r_devops

What issues do you usually have with splunk or other alerting platforms?

Yo, software developer here. I wanted to know what kinds of issues people have with Splunk: are there any pain points you're facing? One issue my team has is not getting alerts on time, because our internal Splunk team limits alerts to a 15-minute delay. It doesn't seem like much, but our production support team flips out every time it happens.

https://redd.it/1lteuuf
@r_devops

Resume Review - Recent Grad with an MSCS

As the title says, I'm a recent graduate with an MS in CS. I haven't had any luck getting interviews; the last one was 3 months ago, thanks to a recruiter I had established a connection with. I would love some extremely honest, brutal feedback. Also, I have applied to at least 500-600 jobs since then and have not had any interviews.

Here's my resume - https://at-d.tiiny.site

https://redd.it/1ltcaen
@r_devops

Unlock the Truth Behind Kubernetes Production Topologies

When it comes to production-ready Kubernetes, most blogs offer superficial guidance. This 40+ page guide dives into what actually matters: cloud provider behavior under failure, real-world availability tradeoffs, and the architectural consequences of choosing zonal vs. regional vs. multi-cluster setups.

Whether you're using EKS, GKE, AKS, or self-hosted Kubernetes, you’ll walk away with clarity on:

Which control plane models are truly fault-tolerant
Why your node pool topology is silently sabotaging uptime
How pricing tiers map (or don’t) to SLA guarantees
What “high availability” really means across AWS, GCP, and Azure
How to scale safely — without overengineering or overspending

This is not a beginner’s overview. It’s a decision framework for platform engineers, SREs, and cloud architects who want to build resilient, production-grade infrastructure and stop relying on vendor defaults.

👉 If your team is running Kubernetes in production or planning to, this is essential reading.

# Table of Contents

* Introduction: Choosing the Right Topology for Production
* Control Plane Architectures
  * Amazon EKS
  * Google GKE
  * Azure AKS
* Worker Node Deployment Models
  * AWS EKS: Node Groups and Multi-AZ Strategy
  * Google GKE: Zonal, Multi-Zonal and Regional Node Pools
  * Azure AKS: Node Pool Zoning and Placement Flexibility
  * Summary: Comparing Node Deployment Models Across Providers
* Designing for High Availability Within a Region
  * AWS EKS
  * Google GKE
  * Azure AKS
  * Summary: Regional HA Comparison
* Upgrade and Maintenance Strategy
  * AWS EKS: Upgrade Mechanics and Control
  * Google GKE: Automated Channels and Controlled Upgrades
  * Azure AKS: Scheduled Windows and Tier-Aware Resilience
  * Summary: Upgrade Strategy Comparison
* Multi-Region Topologies (and Limitations)
  * AWS EKS: Multi-Cluster Resilience via Global Services
  * Google GKE: Regional Isolation and Federation via Anthos
  * Azure AKS: Cross-Region Resilience Through Paired Clusters
  * Summary: Multi-Region Kubernetes Strategy Comparison
* Availability, Fault Tolerance, and SLA Considerations
  * AWS EKS: SLA Commitments and Fault Domain Strategies
  * Google GKE: Tiered SLAs and Built-In Regional Redundancy
  * Azure AKS: Availability by Tier and Zone Awareness
  * Summary: Platform SLAs and Real-World Resilience
* Managed vs User-Configured Topology Options
  * AWS EKS: Operations Freedom with Opt-In Management
  * Google GKE: Operational Modes from Manual to Fully Managed
  * Azure AKS: Gradual Abstraction and Tiered Node Management
  * Summary: Choosing the Right Topology Ownership Model
* For Self-Hosted Kubernetes – Provisioning Tools and Topology Models
  * kubeadm: The Foundation for Custom Clusters
  * kOps: Opinionated HA Clusters for AWS and Beyond
  * Kubespray: Flexible, Ansible-Based Multi-Environment Provisioning
  * Cluster API: Declarative Lifecycle Management Across Environments
  * Summary: Choosing a Self-Hosted Tool Based on Environment and Control

Free Copy: https://www.patreon.com/posts/chapter-1-guide-131966208

Paid Guide: https://www.patreon.com/posts/unlock-truth-133516014

https://redd.it/1lt61ec
@r_devops

Self-Hosted Artifactory Alternative for Large Repositories?

Hi,

We recently upgraded our self-hosted Artifactory instance, and it has become woefully unstable. Support has been a massive miss for us. Of the 12 people assigned to our case over the course of the month, only one has been helpful. Likewise, during outages JFrog support was not able to fulfill our live support requests (we pay for the highest tier of support). We got strung along with "a support engineer will be with you in about 30 minutes" until we figured the problem out ourselves. Additionally, once we were on a support call, the support rep would try everything they could to "move the conversation offline" and have us send logs, enable secret logging, increase resources, send more logs, and continue in this cycle. Our instance is so over-provisioned at this point that it is taking up egregious amounts of compute and memory that go unused, and this seemingly has no effect on our stability.

Our artifact registry is large: around 40 TB+ of data. Also, due to regulatory constraints, some of the data must be kept on-prem. Are there any alternatives that are not JFrog or Sonatype? We need a registry that is type-agnostic (e.g., putting a .zip file in a Maven repo) and that works efficiently at this size. It also must support remote registries.

https://redd.it/1lt295z
@r_devops

Is learning DevOps a good idea for data science and LLM engineering?

I was first thinking of learning MLOps, but if I'm going to learn ops, why not learn it all? I think a lot of LLM and data science projects need some kind of deployment and maintenance, which is why I'm considering it.

https://redd.it/1lt0jjq
@r_devops

What are your go-to tools/methods for reproducible, shareable, disposable dev/ops environments? (Nix, Docker, Devcontainer, etc.)

Hey all,

I’m curious—what tools or approaches do you use to create, share, and easily switch between different development or DevOps environments?
I’m looking for solutions that allow for reusable, disposable, and easily shareable environments (for onboarding, reproducibility, or just avoiding the dreaded “works on my machine” issues).

Some examples I’m considering:
• Nix / Nix Shell / Nix Flakes
• Dockerfiles for fully isolated, portable environments
• Devcontainers (VSCode, Codespaces)
• asdf, pyenv, venv, pipx
• Vagrant, Homebrew Bundle, NixOS
• Custom bootstrap scripts, dotfiles, etc.

What actually works for you?
• For what use cases? (dev, ops, CI/CD, data, etc.)
• Onboarding and ease of use (solo vs team)
• Limitations, gotchas, or workflow-specific experiences?
• Favorite combos, clever tricks, “must-have” automation?

I’d love to hear your real-world experiences, best practices, and recommended tools or setups for reproducible, isolated, and shareable environments.

Thanks in advance for any advice, horror stories, or setup ideas 🚀

https://redd.it/1lswzls
@r_devops

here are some handy and free ebooks for troubleshooting Kubernetes and prepping for the CKA exam

If you’re gearing up for the CKA or just want some solid hands-on experience with real cluster issues, I stumbled upon a couple of ebooks that are filled with practical scenarios; think OOMKilled errors, readiness failures, DNS misconfigurations, and more.

If you’ve come across any other resources like this, I’d love to hear about them.

(Links in comments)

https://redd.it/1lsj3mh
@r_devops

Is Terraformer used out there?

So I was thinking back to a project in my consulting career where we had the task of bringing the existing system under IaC with Terraform (among other tasks). Here is what we did:

For each service type, we listed the existing services (via aws cli or sometimes web console), and for each result we created an empty resource, like so:

resource "aws_s3_bucket" "mybucket" { }

Then we ran `terraform import aws_s3_bucket.mybucket real-bucket-name`, looked at the imported config via `terraform show`, and pasted the corresponding attributes into the empty resource block.

We did this for every resource in every listing, for every service. It took a long time, and we still had to do a clean-up afterwards. So I just wondered:
1. How do you guys approach such a task?
2. Do you use tools such as Terraformer that supposedly make this much quicker? I've heard mixed things about them.
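For comparison, Terraform 1.5 and later can take over part of this loop: you declare `import` blocks and run `terraform plan -generate-config-out=generated.tf`, and Terraform drafts the matching resource configuration. The hypothetical helper below only generates those `import` blocks from a list of existing bucket names (for example, taken from `aws s3api list-buckets`); the resource-naming rule is an assumption, and the generated config would still need the same clean-up pass.

```python
import re


def tf_name(bucket: str) -> str:
    """Turn a bucket name into a Terraform resource name (hypothetical naming rule)."""
    name = re.sub(r"[^A-Za-z0-9_]", "_", bucket)
    # Terraform identifiers must not start with a digit.
    return f"b_{name}" if name[0].isdigit() else name


def import_blocks(buckets: list[str]) -> str:
    """Render one Terraform `import` block per existing S3 bucket, sorted by name."""
    blocks = []
    for bucket in sorted(buckets):
        blocks.append(
            "import {\n"
            f"  to = aws_s3_bucket.{tf_name(bucket)}\n"
            f'  id = "{bucket}"\n'
            "}\n"
        )
    return "\n".join(blocks)
```

Writing `import_blocks(names)` to something like `imports.tf` and then running the plan command above replaces the create-empty-resource / import / copy-from-`terraform show` cycle for each bucket.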

https://redd.it/1lsgr36
@r_devops

I'm Trying to Learn AWS Cloud but Feel Lost — How Do I Learn It Practically, Not Just Theoretically?

Hi everyone,

I’ve started learning AWS cloud computing recently, and while I’m going through a lot of resources and reading about different services like EC2, S3, IAM, and so on — I still feel like I’m learning it only theoretically. I don’t feel confident or job-ready, and honestly, I’m not sure where to go from here.

I understand the concepts, but when it comes to doing something practical (like provisioning infrastructure, launching services, or setting up a simple project), I freeze. I’ve watched tutorials and gone through courses, but I still feel like I'm just memorizing terms.

I really want to gain hands-on experience, but I’m not sure how to do that the right way:

Should I follow specific labs?
Should I just start a small project and learn as I go?
What’s the best way to move from “understanding” to “doing”?
Are there platforms that give you guided exercises using the AWS Console or CLI?

Any advice, personal experience, or practical tips you have would really help me out. I’m committed to learning, I just don’t want to waste more time feeling lost.

Thanks in advance!

https://redd.it/1lsdd2k
@r_devops

Suggestions required: how are you handling alerting for high-volume Lambda APIs without expensive tools like Datadog?

I run 8 AWS Lambda functions that collectively serve around 180 REST API endpoints. These Lambdas also make calls to various third-party services as part of their logic. Logs currently go to AWS CloudWatch, and on an average day, the system handles roughly 15 million API calls from frontends and makes about 10 million outbound calls to third-party services.

I want to set up alerting so that I’m notified when something meaningful goes wrong — for example:

Error rates spike on a specific endpoint
Latency increases beyond normal for certain APIs
A third-party service becomes unavailable
Traffic suddenly spikes or drops abnormally

I’m curious to know what you all are using for alerting in similar setups, or any suggestions/recommendations — especially those running on Lambdas and a tight budget (i.e., avoiding expensive tools like Datadog, New Relic, CW Metrics, etc.).

Here’s what I’m planning to implement:

Lambdas emit structured metric data to SQS
A small EC2 instance acts as a consumer, processes the metrics
That EC2 exposes metrics via `/metrics`, and Prometheus scrapes it
Prometheus will evaluate the alert rules, and Alertmanager will handle routing and notifications

Has anyone done something similar? Any tools, patterns, or gotchas you’d recommend for high-throughput Lambda monitoring on a budget?
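In that plan, the EC2 consumer mostly has to aggregate the structured events into counters and serve them in the Prometheus text exposition format. A minimal sketch of the aggregation step follows, assuming a hypothetical event shape (`endpoint`, `status`, `duration_ms`) coming off SQS; the SQS polling loop and HTTP server are omitted, and in practice the official `prometheus_client` library would handle exposition (including label escaping, which this sketch skips).

```python
from collections import defaultdict


class LambdaMetrics:
    """Aggregate per-endpoint request metrics from hypothetical SQS events."""

    def __init__(self) -> None:
        self.requests = defaultdict(int)        # (endpoint, status class) -> count
        self.duration_sum = defaultdict(float)  # endpoint -> total seconds
        self.duration_count = defaultdict(int)  # endpoint -> observations

    def observe(self, event: dict) -> None:
        endpoint = event["endpoint"]
        status_class = f"{event['status'] // 100}xx"  # e.g. 503 -> "5xx"
        self.requests[(endpoint, status_class)] += 1
        self.duration_sum[endpoint] += event["duration_ms"] / 1000.0
        self.duration_count[endpoint] += 1

    def render(self) -> str:
        """Render the metrics in the Prometheus text exposition format."""
        lines = ["# TYPE api_requests_total counter"]
        for (endpoint, status), count in sorted(self.requests.items()):
            lines.append(
                f'api_requests_total{{endpoint="{endpoint}",status="{status}"}} {count}'
            )
        lines.append("# TYPE api_request_duration_seconds summary")
        for endpoint in sorted(self.duration_count):
            labels = f'{{endpoint="{endpoint}"}}'
            lines.append(f"api_request_duration_seconds_sum{labels} {self.duration_sum[endpoint]}")
            lines.append(f"api_request_duration_seconds_count{labels} {self.duration_count[endpoint]}")
        return "\n".join(lines) + "\n"
```

A per-endpoint error-rate alert could then be a PromQL rule such as `sum by (endpoint) (rate(api_requests_total{status="5xx"}[5m])) / sum by (endpoint) (rate(api_requests_total[5m])) > 0.05`. `rate()` tolerates the counter resets that happen whenever the consumer restarts. Bucketing statuses into classes also keeps label cardinality bounded across 180 endpoints.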

https://redd.it/1ls42jv
@r_devops

Ship tools as standalone static binaries

After OpenAI decided to rewrite their CLI tool from TypeScript to Rust, I decided to write about why static binaries offer a superior end-user experience.

I presumed it was obvious, but it seems it isn't, so I wrote in detail about why tools should be shipped as static binaries.

https://redd.it/1ls28s6
@r_devops

Made a huge mistake that cost my company a LOT – What’s your biggest DevOps fuckup?

Hey all,

Recently, we did a huge load test at my company. We wrote a script to clean up all the resources we tagged at the end of the test. We ran the test on a Thursday and went home, thinking we had nailed it.

Come Sunday, we realized the script failed almost immediately, and none of the resources were deleted. We ended up burning $20,000 in just three days.

Honestly, my first instinct was to see if I could shift the blame somehow or make it ambiguous, but it was quite obviously my fuckup, so I had to own up to it. I thought it'd be cleansing to hear about other DevOps folks' biggest fuckups that cost their companies money. How much did it cost? Did you get away with it?
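The failure mode in this story, a cleanup script that died early without anyone noticing, is usually addressed by making cleanup continue past individual errors and then fail loudly with a non-zero exit code that CI or a scheduler can alert on. A minimal sketch follows, with a hypothetical `delete` callable standing in for the real AWS SDK calls.

```python
import sys
from typing import Callable, Iterable


def cleanup(resource_ids: Iterable[str], delete: Callable[[str], object]) -> list[tuple[str, str]]:
    """Try to delete every resource; collect failures instead of stopping at the first."""
    failures = []
    for rid in resource_ids:
        try:
            delete(rid)
        except Exception as exc:  # keep going: one bad resource must not strand the rest
            failures.append((rid, repr(exc)))
    return failures


def main(resource_ids: list[str], delete: Callable[[str], object]) -> int:
    """Run the cleanup and return a process exit code (non-zero on any failure)."""
    failures = cleanup(resource_ids, delete)
    for rid, err in failures:
        print(f"FAILED to delete {rid}: {err}", file=sys.stderr)
    # A non-zero exit code lets the pipeline or cron wrapper page someone.
    return 1 if failures else 0
```

Wired up as `sys.exit(main(ids, delete_fn))`, this only helps if the script actually runs. A follow-up job that re-lists the tagged resources, plus an AWS Budgets alert, covers the case where it never starts at all.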


https://redd.it/1ltuz99
@r_devops

Can a Lambda inside a VPC get internet access without a NAT gateway?

Guys, I have a question.
Can a Lambda inside a VPC get internet access without a NAT gateway?
Note: I need to connect to my private RDS instance, I can't make it public, and I can't use a NAT instance either.

https://redd.it/1ltpqvu
@r_devops

Separate pipeline for application configuration? Or all in IaC?

I'm working in the AWS world using CloudFormation + SAM templates, and I have API endpoints, Lambda functions, S3 buckets and configuration all in one big template.

Initially I was working with a configuration file in DEV, and now I want to move these parameters over to Parameter Store in AWS. But the thought of adding these, plus tagging (required in our company), for about 30 parameters makes me feel like I'm catastrophically flooding the template with configuration.

The configuration may change semi-regularly, independently of the code or any other infra, and would be pushed through the pipeline to release.

Is anyone out there running a separate configuration pipeline to release config changes? On one hand it feels like overkill; on the other, it makes sense to me.

What are your opinions, brains trust?
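A separate config pipeline can stay fairly small: keep the parameters in a version-controlled file, diff them against what is currently in Parameter Store, and apply only the changes. The sketch below shows just the diff step as a pure function; the `/myapp/dev/` prefix is hypothetical, and the apply step would feed the result to boto3's SSM calls (`put_parameter`, `delete_parameters`, plus your required tags), with current values read via `get_parameters_by_path`.

```python
from dataclasses import dataclass, field


@dataclass
class ConfigPlan:
    create: dict[str, str] = field(default_factory=dict)
    update: dict[str, str] = field(default_factory=dict)
    delete: list[str] = field(default_factory=list)

    @property
    def empty(self) -> bool:
        return not (self.create or self.update or self.delete)


def plan_changes(desired: dict[str, str], current: dict[str, str], prefix: str) -> ConfigPlan:
    """Compare desired config (from git) with current parameters under `prefix`."""
    wanted = {prefix + key: value for key, value in desired.items()}
    plan = ConfigPlan()
    for name, value in sorted(wanted.items()):
        if name not in current:
            plan.create[name] = value
        elif current[name] != value:
            plan.update[name] = value
    # Parameters under our prefix that are no longer in the file get removed;
    # anything outside the prefix is left alone.
    plan.delete = sorted(
        name for name in current if name.startswith(prefix) and name not in wanted
    )
    return plan
```

One way to use it: print the plan in the pull request and apply it on merge. That keeps config changes reviewable and releasable on their own cadence without growing the CloudFormation template.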

https://redd.it/1ltjqmz
@r_devops

DevOps Azure Checkbox Custom Field

I feel I am losing my nut...

I want to add Custom Fields to my Bug Tickets & User Story tickets, but I want them to be checkboxes. The only option I have found is this one:
https://stackoverflow.com/questions/74994552/azure-devops-work-item-custom-field-as-checkbox

But it has some really odd behaviour beyond that of a simple checkbox.

The reason I do not want toggles is because I do not want an "Off" or "False" state as a visible option, I want users to update the checkbox to be checked if the option is applicable.

Surely there is a way to have a simple checkbox custom field on a work item type?

I am sure this has likely been asked a billion times, but my googling skills are letting me down, as I either get the same responses, or irrelevant responses.

Cheers

https://redd.it/1ltdg2p
@r_devops

I got slammed with a $3,200 AWS bill because of a misconfigured Lambda, how are you all catching these before they hit?

I was building a simple ingestion pipeline with Lambda + S3.

Somewhere along the way, I accidentally created an event loop: each Lambda invocation wrote to S3, which triggered the Lambda again. It ran for 3 days.

No alerts. No thresholds. Just a $3,200 surprise when I opened the billing dashboard.

AWS support forgave some of it, but I realized we had **zero guardrails** to catch this kind of thing early.
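A common guard against this kind of loop, besides writing output to a different bucket entirely or restricting the trigger with an S3 event notification prefix filter, is to have the function ignore objects under its own output prefix. A minimal sketch follows, assuming a hypothetical `processed/` output prefix and a `write` callable standing in for the real S3 `put_object` call. The event shape follows the standard S3 notification format (`Records[].s3.object.key`), whose object keys arrive URL-encoded.

```python
from urllib.parse import unquote_plus

OUTPUT_PREFIX = "processed/"  # hypothetical: where this function writes its results


def handle(event: dict, write) -> list[str]:
    """Process S3 notification records, skipping objects this function wrote itself."""
    written = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys in S3 event notifications are URL-encoded.
        key = unquote_plus(record["s3"]["object"]["key"])
        if key.startswith(OUTPUT_PREFIX):
            continue  # our own output: processing it would re-trigger us forever
        out_key = OUTPUT_PREFIX + key
        write(bucket, out_key)
        written.append(out_key)
    return written
```

A real handler would take `(event, context)` and pass a closure over a boto3 S3 client as `write`. The code check is a second line of defence in case the trigger's prefix filter is ever misconfigured.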

My question to the community:

* How do *you* monitor for unexpected infra costs?
* Do you treat cost anomalies like real incidents?
* Is this an SRE/DevOps responsibility or something you push to engineers or managers?
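AWS Budgets alerts and AWS Cost Anomaly Detection are the built-in options for the first question. To illustrate the underlying idea only, a hypothetical daily check could compare today's spend against a trailing baseline and page when it jumps. The thresholds and the data source (for example, daily totals from Cost Explorer) are assumptions.

```python
from statistics import mean, pstdev


def is_cost_anomaly(history: list[float], today: float,
                    min_ratio: float = 1.5, z: float = 3.0) -> bool:
    """Flag today's spend if it is far above the trailing daily baseline.

    Both a relative jump and a statistical outlier are required, so small
    wobbles on a steady bill don't page anyone.
    """
    if len(history) < 7:
        return False  # not enough data for a baseline yet
    baseline = mean(history)
    spread = pstdev(history)
    ratio_hit = today > baseline * min_ratio
    z_hit = spread == 0 or (today - baseline) / spread > z
    return ratio_hit and z_hit
```

Run daily and routed to the same on-call channel as other alerts, a check like this treats a cost spike as an incident. A runaway loop costing roughly $1,000 a day would trip it on day one rather than day three.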

https://redd.it/1ltdt4q
@r_devops

Do you guys use pure C anywhere?

Wondering if you use C anywhere, or just Bash, Python, and Go. Or is C only for Systems Performance and Linux books?

https://redd.it/1lt9w5g
@r_devops

GitOps with ArgoCD Introduction

Hey, I wrote an introduction to GitOps with ArgoCD. Take a look if you are interested. What is your deployment process? Are you writing CI/CD pipelines with GitHub Actions or something similar?

If you have a Medium account:

https://medium.com/@erwinschleier/gitops-introduction-with-argo-cd-51f81302e013


Personal blog:

https://erwin-schleier.com/2025/07/04/gitops-introduction-with-argo-cd/

https://redd.it/1lt053g
@r_devops

Maybe humans don't need to write documentation for humans anymore?

With tools like Devin wiki starting to generate human-readable documentation from code, shouldn't we shift our focus? Instead of humans writing docs for other humans, we could have AI generate those on-demand when needed.

What humans should focus on is creating documentation for AI - the stuff that can't be extracted from GitHub repos alone. Things like design rationale, decision-making processes, considerations that were explored, task contexts, etc. We should be building environments where humans can effectively pass this kind of contextual knowledge to AI systems.

Thoughts?

https://redd.it/1lt2g73
@r_devops

Is Judge0 the right way to run user code for a hobby site?

I’m making a website where I need to let untrusted user code hit public APIs during execution while blocking everything else (internal IPs, metadata endpoints, crypto mining pools, blah blah blah). I'm looking for proven patterns and tools.

The best open-source option I've found is Judge0, so I was wondering: have any of you used it, or anything similar?

I’d really appreciate pointers to blog posts, GitHub examples, or your own configs. I'm trying to ship publicly soonish without waking up to a surprise AWS bill or a CVE headline because someone tried to mine crypto on my servers.
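One piece of the egress problem can be sketched independently of Judge0: resolve a destination and refuse anything that is not a globally routable address. That rules out private ranges, loopback, and link-local, which covers the `169.254.169.254` metadata endpoint. This is an illustrative check using Python's standard `ipaddress` module, not a Judge0 feature. Code inside the sandbox could bypass an application-level check, and DNS answers can change between check and connect, so the same deny-by-default rule should really be enforced at the network layer (firewall rules or an egress proxy).

```python
import ipaddress
import socket


def is_public_ip(addr: str) -> bool:
    """True only for globally routable, non-multicast addresses."""
    try:
        ip = ipaddress.ip_address(addr)
    except ValueError:
        return False  # unparseable (e.g. scoped IPv6 on older Pythons): deny
    return ip.is_global and not ip.is_multicast


def allowed_destination(host: str, port: int = 443) -> bool:
    """Resolve host and allow it only if every resolved address is public."""
    try:
        infos = socket.getaddrinfo(host, port, proto=socket.IPPROTO_TCP)
    except socket.gaierror:
        return False  # unresolvable: deny
    addrs = {info[4][0] for info in infos}
    return bool(addrs) and all(is_public_ip(a) for a in addrs)
```

Requiring every resolved address to be public, rather than any one of them, blocks hostnames that mix a public record with an internal one.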

https://redd.it/1lsxdkf
@r_devops

Devops consulting

Hey buddies
I have been in the field for roughly 3+ years, I hold 3 AWS certifications and the CKA, and I have solid experience with most of the main DevOps tools.
I plan to start a consulting business providing DevOps consulting, and maybe some kind of retainer support later.
Does anyone have ideas that could help me kick off this journey?

PS: We are two people; my friend has more or less similar experience.

https://redd.it/1lsmz0b
@r_devops

How often do you actually write scripts?

Context on me: I work in tech consulting/professional services. My employer places me with clients on short- to long-range contracts/projects.

Primarily as a Senior Platform Engineer and DevOps Engineer.

95% of the time over the past 4 years, I’ve only written Terraform or YAML.

I think I've written maybe 4 Python scripts and 3 Bash scripts.

Every job ad requires Python/Bash, and more and more often Golang nowadays.

I try to do things outside of work, in personal projects, to keep up to date. But it’s difficult now as a parent. Every time I need to write a script, I have to refresh myself on Python.

Am I the only one? My peers feel the same, and at the clients I’m at, some of the staff don’t even know how to code.

https://redd.it/1lsi7zi
@r_devops

Istio and a small architecture

I’m trying to build a small microservices setup to practice with the Istio Bookinfo sample app, and I’d appreciate some advice. My current plan is to have one master node (the first VM) and two worker nodes (two additional VMs). The last VM might be used for Jenkins, but I’m not sure if that’s the best approach.

What would be a recommended architecture for this setup? I definitely want to use NGINX for load balancing and as an ingress controller, Prometheus for monitoring, and Jenkins for automation. Should I also include Helm and ArgoCD?

I don’t have much experience with architecture planning, so I’d like to know what other technologies or tools I should consider for a microservices environment besides the ones mentioned above.

https://redd.it/1lse090
@r_devops

What are the type of things you do as a DevOps manager?

I'm assuming some of the people that work here are in Management Roles. And I get the general gist of it, but what have you been up to the past year, maybe something concrete, any stumbling blocks. Just looking to hear some stories.

https://redd.it/1ls69qd
@r_devops

Looking for a small team to build and learn together this summer

Hey r/devops,

I’m hoping to find a few people interested in teaming up to work on a practical project this summer. Something hands-on around infrastructure, automation, or tooling, where we can learn from each other and get real experience.

I’ve been mostly working with cloud tools and some scripting lately, but want to try collaborating with others instead of working solo. No pressure or fancy plans, just a group of folks who want to build and improve together.

If this sounds like your vibe, please reply or DM. I’d love to hear what you’re working on or want to try.

https://redd.it/1ls2k8h
@r_devops

4-month global builder challenge for DevOps engineers — teams, mentorship, grants, and prizes

Hey r/devops,

Wanted to share an opportunity that might resonate with those who enjoy building scalable, reliable infrastructure and automated pipelines.

The **World Computer Hacker League (WCHL)** is a 4-month global builder challenge focused on **open internet infrastructure, AI, and blockchain**. Many teams are working on projects involving deployment automation, infrastructure as code, CI/CD pipelines, monitoring, and decentralized ops tooling.

Here’s what’s on offer:

* 👥 Team-based projects only — no solo entries, but you can find teammates on Discord
* 🧠 Weekly workshops and mentorship from experienced engineers
* 💰 Grants, bounties, and milestone-based rewards
* 🌍 Open to students and independent engineers worldwide
* ⚙️ Tech and stack-agnostic — build with the tools and frameworks that fit your vision

If you’re interested in applying DevOps best practices to decentralized systems, automating cloud deployments, or managing secure infrastructure at scale, this could be a great place to experiment and build.

📌 If you’re in **Canada or the US**, register through **ICP HUB Canada & US** so we can support you directly during the challenge:
https://wchl25.worldcomputer.com?utm_source=ca_ambassadors

Feel free to reach out if you want to discuss project ideas or find collaborators. Would love to see some strong DevOps projects in the lineup!

https://redd.it/1lryojc
@r_devops
