r_devops | Unsorted

Telegram channel r_devops - Reddit DevOps


Reddit DevOps. #devops Thanks @reddit2telegram and @r_channels


Reddit DevOps

AI won't make coding obsolete. Coding was never the hard part.



Most takes about AI replacing programmers miss where the real cost sits.

Typing code is just transcription. The hard work is upstream: figuring out what's actually needed, resolving ambiguity, handling edge cases, and designing systems that survive real usage. By the time you're coding, most of the thinking should already be done.

Tools like GPT, Claude, Cosine, etc. are great at removing accidental complexity: boilerplate, glue code, ceremony. That's real progress. But it doesn't touch essential complexity.

If your system has hundreds of rules, constraints, and tradeoffs, someone still has to specify them. You can't compress semantics without losing meaning. Any missing detail just comes back later as bugs or "unexpected behavior."

Strip away the tooling differences and coding, no-code, and vibe coding all collapse into the same job: clearly communicating required behavior to an execution engine.

https://redd.it/1q1wpr4
@r_devops


Radio station with a host that judges your workflows, explained in detail

This is a post I made purely to provide value and explain to everyone in detail how I did it. Hope it clears things up!

What it is

Nikolytics Radio is a late-night jazz station for founders who work too late. 3-hour YouTube videos. AI-generated jazz. A tired DJ named Sonny Nix who checks in between tracks with deadpan observations about your inbox, your pipeline, and why that proposal is still sitting in drafts.

Five volumes in five days. 70+ subscribers. Over 200k views on the first Reddit post.

It's a passion project that doubles as marketing for my automation consultancy.

The concept

The pitch: You're at your desk at 3 AM. Everyone's asleep. You put on Nikolytics Radio. A weathered voice observes your situation with dark humor. He's been where you are. He doesn't fix it. He just... sees it. Then plays a record.

The DJ (Sonny Nix) is a former founder who burned out and now plays jazz for strangers. He has recurring "listeners" who write in: Todd from Accounting whose job got automated, Margaret from Operations who finished her task list and doesn't know what to do with herself.

It's 95% vibe, 5% branding. If you removed every mention of my business, the station would still work. That's the point.

The tech stack

Music generation: Suno

I wrote 49 artist-specific prompts optimized for deep work. Each prompt targets a specific jazz style: piano trio, cool trumpet, tenor ballad, etc. Settings: instrumental only, ~3-4 min tracks, specific mood tags.

Example prompt structure:

jazz, 1950s late-night jazz combo: brushed kit, upright bass walking gently,
warm felted piano carrying the main theme, soft brass pads...
mood tags: soft, warm, slow, lounge, nostalgic

Generate 3-4 per prompt, pick the best, discard anything too busy or with abrupt endings.

Voice generation: ElevenLabs

Custom voice clone for Sonny Nix. I use their V3 model with specific audio tags:

`[mischievously]` - dry humor, irony
`[whispers]` - punchlines, gut punches
`[sighs]` - weariness
`[excited]` - mock ads only (ironic use)
`...` - pauses

V3 doesn't support some tags like `[warm]` or `[tired]`, so the words have to carry the emotion. Write tired sentences. Sorrowful observations.

Script writing: txt

I write most of the scripts myself; Claude double-checks them for optimizations.

Assembly: Logic Pro

120 BPM grid. Drop the tracks, drop the voice clips. Crossfade. Each episode is ~30 drops across 3 hours. Export as MP3.

Video: FFmpeg

Static image + audio. One command:

    ffmpeg -loop 1 -i image.png -i audio.mp3 -c:v libx264 -tune stillimage \
        -c:a aac -b:a 320k -shortest output.mp4

The writing system

Each episode has 30 "drops" - short DJ segments between songs:

Station IDs - Quick brand hits ("Nikolytics Radio... still here.")
Bumpers - One-liners ("The coffee's cold. You noticed an hour ago. Still drinking it.")
Pain points - Observations that hit too close ("Revision eight. The scope tripled. The budget didn't.")
Testimonials - Fictional listeners writing in
Mock ads - Parody sponsor segments ("Introducing Scope Creep Insurance...")
Dedications - "This one goes out to everyone who almost quit today..."
Recurring segments - Pipeline Weather, Outreach Report, Inbox Conditions

The key insight: Sonny has emotional range. He's not monotone. He moves between tired, mischievous, sorrowful. He worries about Todd. He offers brief sympathy to Sarah. Then plays a record.

What worked

1. The vibe is the moat. Most automation consultants are boring. This is different enough that people share it.
2. Worldbuilding compounds. Todd's promotion arc. Margaret's puzzle. Callbacks like "Here it's always 3 AM." Returning listeners feel like regulars.
3. Reddit got it started. First post on r/productivity got 14k views. Someone called it "Slop Radio FM." Now that's a badge of honor we reference in the show.
4. Daily uploads built


Who here works as a Sales Engineer / Solutions Engineer? Looking for real-world advice

I currently work as a contractor and often collaborate with distributed teams. In most projects, especially when there is an on-call rotation or production responsibility, I’ve noticed that almost every major technical or architectural decision has to go through the Sales Engineer / Solutions Engineering team.
As someone coming from a more hands-on engineering background, I’m trying to understand this role better.
I would really appreciate advice on:
- What the day-to-day responsibilities of a Sales Engineer / Solutions Engineer actually look like
- How leads are sourced, and what the role looks like during periods when no deals are being closed
- What skills, background, or experience are critical to transition into this role from an engineering position
- Any harsh or less-talked-about realities of working in Sales / Solutions Engineering
If you’re working in Sales Engineering or Solutions Engineering, I’d love to hear your perspective.
I started looking into this role after coming across the compensation numbers on the careers page of one of my dream companies, and honestly, it made me curious, especially compared to traditional engineering roles.

https://redd.it/1q1rijx
@r_devops


Need help deciding on what path to take in 2026

I'm having trouble figuring out what I should focus on this upcoming year. I have some experience, which I list below from my resume. I really like programming. I like building things, and I liked the work in my internship and apprenticeship. DevOps has been fun, but I'm also interested in the back end generally, especially given my Java experience.

My experience is a bit general which is why I have concerns. And ultimately I'm not sure if I should be focusing on one thing or another. And not having a job is kind of starting to wear me down.

For context, I don't have a degree in computer science. I come from a non-tech background, but I've been working hard at it for the past five years. Through Year Up, I had an internship as an IT support specialist at a fairly large company in the San Francisco Bay Area, which I completed in 2024. In that job I also worked very closely with the client platform engineering team and did a lot of DevOps, though I am pretty rusty: it was 6 months of Year Up training and only 6 months of the internship itself. Then in 2025 I joined an apprenticeship at that same company on a different team, the back-end team, doing Java and data pipelines. Unfortunately there were some issues with the team, things didn't work out for me, and I've been unemployed since the beginning of November.

My issues are that jumping from IT to devops to Java has left me a bit under-experienced practically. Additionally the apprenticeship this past year was not ideal for learning the skills I needed to be self sufficient as I realistically spent 3 months on the backend team/learning Java for the first time. So I would not be able to pass coding challenges for interviews. Additionally stepping away from IT/Devops has left my IT knowledge a bit lacking too.

I have a couple options for this upcoming year so I will try to lay them out.

I can try and get the Network+ certificate while looking for an IT job right away. To me that feels like the most attainable job to get quickly. Something like help desk or something like support analyst. But I genuinely don’t know how to get a job, it’s been 2 years since I did a job search. I don’t know if I can just start applying on Linkedin, or talking to staffing agencies or what…

Another path is really honing my Java skills, getting good at coding, and hoping my experience at the large Silicon Valley company will carry me to a job via applications. I have some friends who work at Mag 7 companies (Meta, Google, Apple, etc.) and have given me referrals, though I am struggling to find junior or 0-2 years' experience roles with them, or anywhere in general.

The next path is focusing on Java, honing my skills as I mentioned, and going back to school for a Computer Science degree. I found WGU, which is an accredited online school. Thanks to my history at another college, I have enough transfer credits that I will only need ~52 credits from WGU to get my bachelor's. I believe I can likely get this done in about a year.



So yeah, to reiterate, I need a job sooner rather than later. But at the same time I’m not sure which area to focus on for studying while I conduct my job search. I want to spend my time wisely. I’m leaning towards IT and certs just to get some kind of income from tech, but I don't know how relevant a Network+ cert would be in the short term, or whether the knowledge would actually get me a job…

A part of me wants to go all in on Java/backend/maybe DevOps, and college. I think having "close to graduating in Comp Sci" on my resume would be enough to get some interviews this year? Plus the true college experience (I assume) would push me to be a much better programmer.


My Experience (I can add more detail if it would help):

**Software Engineer**

*San Francisco, CA | January 2025 – November 2025*


**IT Support Analyst**

*San Francisco, CA | May 2024 – January


Patching: The Boring Security Practice That Could Save You $700 Million

https://lukasniessen.medium.com/patching-the-boring-security-practice-that-could-save-you-700-million-4d8f8b4b56a1?source=user_profile_page---------2-------------e997ef2a34b8----------------------

https://redd.it/1q1r2tx
@r_devops


Does anyone here use rapidapi? Having issues making a payment

I'm trying to add my card to purchase a subscription, but my card keeps getting declined. So I tried Klarna as a pay-later option, and that was declined too. Then I used Affirm: the loan was charged, but the payment was blocked by RapidAPI. The only explanation I can think of is that I was making API calls from my laptop over a hotspot, so maybe RapidAPI treated this as a proxy and decided to block me from making payments?

https://redd.it/1q00zkz
@r_devops


I got tired of the GitHub runner scare, so I moved my CI/CD to a self-hosted Gitea runner.

With the recent uncertainty around GitHub runner pricing and data privacy, I finally moved my personal projects to a self-hosted Gitea instance running on Docker.

The biggest finding: Gitea Actions is compatible with existing GitHub Actions .yaml files. I didn't have to rewrite my pipelines; I just spun up a local runner container, pointed it to my Gitea instance, and the existing scripts worked immediately.

It’s now running on my home server (Portainer) with $0 cost, zero cold-starts, and total data privacy.

Full walkthrough of the docker-compose setup and runner registration: https://youtu.be/-tCRlfaOMjM

Is anyone else running Gitea Actions for actual production workloads yet? Curious how it scales.

https://redd.it/1pzvjv0
@r_devops


Docker's hardened images, just Bitnami panic marketing or useful?

Our team's been burned by vendor rug pulls before. Docker drops these hardened images right after Bitnami licensing drama. Feels suspicious.

Limited to Alpine/Debian only, CVE scanning still inconsistent between tools, and suppressed vulns worry me.

Anyone moving prod workloads to these? What's your take?

https://redd.it/1pzrz1p
@r_devops


Holiday hack: EKS with your own machines

Hey folks, I’m hacking on a side project over the holidays and would love a sanity check from folks running EKS at scale.

Problem: EKS/EC2 is still a big chunk of my AWS bill even after the “usual” optimizations. I’m exploring a way to reduce EKS costs even further without rebuilding everything from scratch outside EKS.

Most advice (and what I’ve done before) clusters around:

- Spot + smart autoscaling (Karpenter, consolidation, mixed instance types)
- Rightsizing requests/limits, bin packing, node shapes, and deleting idle workloads
- Graviton/ARM where possible
- Reduce cross-AZ spend (or even go single AZ if you can)
- FinOps visibility (Kubecost, etc.) to find the real culprits (eg, unallocated requests)
- “Kubernetes tax” avoidance: move some workloads to ECS/Fargate when you can

But even after doing all this, EC2 is just… Expensive.

So I'm playing around with a hybrid EKS cluster:

- Keep the managed EKS control plane in AWS
- Run worker nodes on much cheaper compute outside AWS (e.g. bare metal servers on Hetzner)
- Burst to EC2 for spikes using labels/taints + Karpenter on the AWS node pools

AWS now offers “EKS Hybrid Nodes” for this, but the pricing is even more expensive than EC2 itself (why?), so I’m experimenting with a hybrid setup without that managed layer.

Questions for the crowd:

- Would you ever run production workloads on off-AWS worker nodes while keeping EKS control plane in AWS? Why/why not?
- What’s the biggest deal-breaker: networking latency, security boundaries, ops overhead, supportability, something else?

If this resonates, I’m happy to share more details (or a small writeup) once I’ve cleaned it up a bit.

https://redd.it/1pzom7p
@r_devops


ai generated k8s configs saved me time then broke prod in the weirdest way

context: migrating from docker swarm to k8s. small team, needed to move fast. i had some k8s experience but never owned a prod cluster

used cursor to generate configs for our 12 services. honestly saved my ass, would have taken days otherwise. got deployments, services, ingress done in maybe an hour. ran in staging for a few days, did some basic load testing on the api endpoints, looked solid

deployed tuesday afternoon during low traffic window. everything fine for about 6 hours. then around 9pm our monitoring started showing weird patterns - some requests fast, some timing out, no clear pattern

spent the next few hours debugging the most confusing issue. turns out multiple things were breaking simultaneously:

our main api was crashlooping but only 3 out of 8 pods. took forever to realize the ai set liveness probe initialDelaySeconds to 5s. works fine in staging where we have tiny test data. prod loads way more reference data on startup, usually takes 8-10 seconds but varies by node. so some pods would start fast enough, others kept getting killed mid-initialization. probably network latency or node performance differences, never figured out exactly why

while fixing that, noticed our batch processor was getting cpu throttled hard. ai had set pretty conservative limits - 500m cpu for most services. batch job spikes to like 2 cores during processing. didnt catch it in staging because we never run the full batch there, just tested the api layer

then our cache service started oom killing. 256Mi limit looked reasonable in the configs but under real load it needs closer to 1Gi. staging cache is basically empty so never saw this coming

the configs themselves were fine, just completely generic. real problem was my staging environment told me nothing useful:

test dataset is 1% of prod size
never run batch jobs in staging
no real traffic patterns
didnt know startup probes were even a thing
zero baseline metrics for what "normal" looks like

basically ai let me move fast but i had no idea what i didnt know. thought i was ready because the yaml looked correct and staging tests passed

took about 2 weeks to get everything stable:

added startup probes (game changer for slow-starting services)
actually load tested batch scenarios
set up prometheus properly, now i have real data
resource limits based on actual usage not guesses

tried a few different tools for generating configs after this mess. cursor is fast but pretty generic. copilot similar. someone mentioned verdent which seems to pick up more context from existing services, but honestly at this point i just validate everything manually regardless of what generates it

costs are down about 25% vs swarm which is nice. still probably over-provisioned in places but at least its stable

lesson learned: ai tools are incredible for velocity but they dont teach you what questions to ask. its like having an intern who codes really fast but never tells you when something might be a bad idea
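
A minimal sketch of sizing a startup probe from measured startup time rather than a generic default. Kubernetes allows a container up to `failureThreshold × periodSeconds` to pass its startup probe, and liveness checks only begin after it succeeds. The `/healthz` path, port 8080, the safety factor, and the function name are assumptions for illustration, not values from the post.

```python
import math


def startup_probe(worst_case_startup_s, period_s=5, safety_factor=2.0,
                  path="/healthz", port=8080):
    """Build a startupProbe spec whose total budget covers slow starts.

    path and port are hypothetical; use whatever health endpoint the
    service really exposes.
    """
    # Total time the container is allowed to take before it is restarted.
    budget_s = worst_case_startup_s * safety_factor
    failure_threshold = max(1, math.ceil(budget_s / period_s))
    return {
        "startupProbe": {
            "httpGet": {"path": path, "port": port},
            "periodSeconds": period_s,
            "failureThreshold": failure_threshold,
        }
    }


# Worst observed prod startup of 10 s gets a 20 s budget.
probe = startup_probe(10)
```

The point of deriving the threshold from a measurement is that the number changes when prod data grows, instead of staying at whatever the generator guessed.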

https://redd.it/1pzn5f9
@r_devops


I'm rejecting the next architecture PR that uses a Service Mesh for a team of 4 developers. We are gaslighting ourselves.

I’ve been lurking here for years, and after reading some recent posts, I need to say something that might make me unpopular with the "CV-Driven Development" crowd.

We are engineering our own burnout.

I've sat on hiring panels for the last 6 months, and the state of "Senior" DevOps is terrifying. I’m seeing a generation of engineers who can write complex Helm charts but can’t explain how DNS propagation works or debug a TCP handshake.

Here is my analysis of why our industry is currently broken:

1. The Abstraction Addiction: We are solving problems we don't have. I saw a candidate last week propose a multi-cluster Kubernetes setup with Istio for a simple internal CRUD app. When I asked why not just use a boring EC2 instance or ECS task, they looked at me like I suggested using FTP. We are choosing tools not because they solve a business problem, but because we want to put them on our LinkedIn. We are voluntarily taking on the operational overhead of Netflix without having their scale or their headcount.

2. The Death of Debugging: To the user who posted "New DevOps please learn networking": Thank you. We are abstracting away the underlying systems so heavily that we are creating engineers who can "configure" but cannot "fix." When the abstraction leaks (and it always does, usually at 3 AM), these "YAML Engineers" are helpless because they don't understand the Linux primitives underneath.

3. Hiring is a Carnival Game: We ask for 8 rounds of interviews to test for trivia on 15 different tools, but we don't test for systems thinking. Real seniority isn't knowing the flags for every CLI tool; it's knowing when not to use a tool. It's about telling management, "No, we don't need to migrate to that shiny new thing."

4. Complexity = Job Security (False): We tell ourselves that building complex systems makes us valuable. It doesn't. It makes us pagers. The best infrastructure engineers I know build systems so boring that they sleep through the night. If you are currently building a resume-padder architecture: Stop.



If you are a Junior: Stop trying to learn the entire CNCF landscape. Learn Linux. Learn Networking. Learn a scripting language deeply. If you are a Senior: Stop checking boxes. Start deleting code.


The most senior thing you can do is build something so simple it looks like a junior did it, but it never goes down.

/endrant

https://redd.it/1pzkibf
@r_devops


qa tests blocking deploys 6 times today, averaging 40min per run

our pipeline is killing productivity. we've got this selenium test suite with about 650 tests that runs on every pr and it's become everyone's least favorite part of the day.

takes 40 minutes on average, sometimes up to an hour. but the real problem is the flakiness. probably 8 to 12 tests fail on every single run, always different ones. devs have learned to just click rerun and grab coffee.

we're trying to ship multiple times per day but qa stage is the bottleneck. and nobody trusts the tests anymore because they've cried wolf so many times. when something actually fails everyone assumes it's just another selector issue.

tried parallelizing more but hit our ci runner limits. tried being smarter about what runs when but then we miss integration issues. feels like we're stuck between slow and unreliable.

anyone actually solved this problem? need tests that are fast, stable, and catch real bugs. starting to think the whole selector based approach is fundamentally flawed for complex modern webapps.
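
One common first step is separating flaky tests from real failures using run history, so that flaky ones can be quarantined and a red build means something again. A minimal sketch, assuming results can be exported from CI as `(test_name, commit_sha, passed)` tuples (that data shape and the function name are hypothetical): a test that both passed and failed on the same commit is flaky by definition, since the code did not change between runs.

```python
from collections import defaultdict


def find_flaky_tests(results):
    """Return names of tests with mixed outcomes on an identical commit.

    results: iterable of (test_name, commit_sha, passed) tuples, an
    assumed export format rather than any specific CI tool's schema.
    """
    outcomes = defaultdict(set)
    for test_name, commit_sha, passed in results:
        outcomes[(test_name, commit_sha)].add(bool(passed))
    # Both True and False seen for the same code means nondeterminism.
    flaky = {test for (test, _sha), seen in outcomes.items() if len(seen) == 2}
    return sorted(flaky)


history = [
    ("test_login", "abc123", False),
    ("test_login", "abc123", True),    # passed on rerun, same commit
    ("test_checkout", "abc123", False),
    ("test_checkout", "abc123", False),  # fails consistently: likely real
    ("test_search", "abc123", True),
]
flaky = find_flaky_tests(history)
```

Tests that fail consistently on a commit stay out of the flaky list, which is exactly the signal that currently gets lost behind the rerun button.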

https://redd.it/1pzgupz
@r_devops


Terraform's dependency on github.com - what are your thoughts?

Hi all,

About two weeks ago (December 18th), github.com's reachability was affected by an issue on their side.

See -> https://www.githubstatus.com/incidents/xntfc1fz5rfb

We needed to do maintenance that very day. All of our Terraform providers used the default installation method ("go get it from GitHub"), and we didn't have any Terraform provider caching active.

We had to run some Terraform scripts multiple times before we got lucky and didn't hit a 500/503 from GitHub while downloading the providers. In the end we succeeded, but it took a lot more time than anticipated.

We have since moved all of our Terraform providers to a locally hosted location.
That took some tuning of .terraformrc and some extras in our CI/CD pipeline for running Terraform.
All together it was a nice project: it makes you think about which providers you are using and exactly which versions you need.

But it also creates another technical nook in our infrastructure. For example, when we want to bump one of the provider versions, we need to perform additional tasks.
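
For readers who have not set this up, a rough sketch of the kind of CLI configuration involved. Terraform's CLI config supports a `provider_installation` block with a `filesystem_mirror` method, and `terraform providers mirror <dir>` can populate such a directory. The mirror path, the provider pattern, and the helper name below are assumptions for illustration; the post does not show the author's actual `.terraformrc`.

```python
import json


def render_cli_config(mirror_dir, mirrored_patterns):
    """Render a Terraform CLI config that serves matching providers
    from a local directory and everything else from the registry.

    mirror_dir and mirrored_patterns are hypothetical example inputs.
    """
    # json.dumps yields double-quoted strings and lists, which are also
    # valid in Terraform's configuration syntax.
    patterns = json.dumps(list(mirrored_patterns))
    return (
        "provider_installation {\n"
        "  filesystem_mirror {\n"
        f"    path    = {json.dumps(mirror_dir)}\n"
        f"    include = {patterns}\n"
        "  }\n"
        "  direct {\n"
        f"    exclude = {patterns}\n"
        "  }\n"
        "}\n"
    )


config_text = render_cli_config(
    "/opt/tf-mirror", ["registry.terraform.io/hashicorp/*"]
)
```

Excluding the mirrored patterns from `direct` means a missing version fails loudly instead of silently falling back to a network download, which is the trade-off described above: every version bump needs a mirror update first.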

What are your thoughts about this? Some services are treated like they are the light and water of the internet. They are always there (GitHub / Docker Hub / Cloudflare), until they are not, and recently we have noticed a lot of the latter.

One thought is that this doesn't happen that often, and they have top-of-the-line infra and expertise.
It isn't worth doing this kind of workaround if you are not running infra for a hospital or a bank.

The other, more personal thought is that I like the disruptive nature of these incidents: it encourages you to think past the assumption that some tech building blocks are too big to fail.
And it raises the doubt whether it is so wise for everybody to stick to the same gold standards from the big 7 in Silicon Valley.

Tell me!?

https://redd.it/1pzfe7e
@r_devops


The hardest incidents to explain are the quiet ones

Some of the hardest security incidents I’ve been part of weren’t dramatic. No outages, no obvious alerts, nothing screaming for attention.
Just small things that didn’t line up in hindsight.
How do you all validate concerns when there’s no clear signal yet?

https://redd.it/1pzcww5
@r_devops


Supply chain feels “unfinished” once things are live

We do all the right things at build time, but I’ve still seen dependencies behave oddly once they’re under real traffic.
It made me realize how much we assume build-time checks are enough.
How are others thinking about this after deployment?

https://redd.it/1pz9xn9
@r_devops


momentum. Five volumes in five days. The algorithm likes consistency.

What I learned about AI voice

ElevenLabs V3 is good but literal. It interprets quotes as character voices (breaks everything). Always paraphrase.
Tags only work if the model supports them. No [warm], no [tired]. The text has to do the work.
Regenerate 2-3x per drop, pick the best take. Same script, different reads.
Punchlines land in [whispers]. Setup is [mischievously]. Then stop - no extra lines after the joke lands.

Time investment

Initial setup (prompts, character docs, templates): ~15 hours
Per episode now: ~2 hours
Generate music: 30 min
Generate voice drops: 30 min
Assembly in Logic: 30 min
YouTube upload + description: 30 min

What could be automated further

Voice generation - Currently pasting drops one by one into ElevenLabs. Could batch via API.
Timestamps - Calculating from bar positions manually. Already wrote a Python script, could integrate it.
YouTube description - Template exists, still copy-pasting. Easy n8n automation.
Episode assembly - The real bottleneck. Logic Pro is manual drag-and-drop. Exploring scripted alternatives.
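
The timestamp calculation is simple arithmetic on a fixed-tempo grid. A minimal sketch, assuming the 120 BPM grid from the assembly step and a 4/4 time signature (the time signature is not stated in the post, and these function names are hypothetical, not the author's script): at 120 BPM a beat lasts 0.5 s, so a four-beat bar lasts 2 s.

```python
def bar_to_seconds(bar, bpm=120.0, beats_per_bar=4):
    """Start time in seconds of a 1-indexed bar on a fixed-tempo grid."""
    if bar < 1:
        raise ValueError("bars are 1-indexed")
    seconds_per_beat = 60.0 / bpm
    return (bar - 1) * beats_per_bar * seconds_per_beat


def format_timestamp(seconds):
    """Format seconds as H:MM:SS for a video description."""
    total = int(seconds)
    hours, rem = divmod(total, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours}:{minutes:02d}:{secs:02d}"


# A drop placed at bar 91 starts 90 bars (180 s) into the episode.
stamp = format_timestamp(bar_to_seconds(91))
```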

Writing stays mine.

The dream: one-click episode generation. Not there yet, but the pieces exist.

Once I get the desired results and have trained the AI enough to understand how everything is supposed to work, it will be automated. I need it to be perfectly in sync with my concept.

Link

https://www.youtube.com/@NikolyticsRadio

Happy to answer questions about the workflow, the writing system, or the Suno/ElevenLabs settings.

TL;DR: Built a fake radio station with AI music (Suno), AI voice (ElevenLabs), and my scripts. The DJ has a character bible. There's lore. It's marketing for my automation business but also just... a thing that exists now. 70 subscribers in 5 days.

https://redd.it/1q1w2l1
@r_devops


Unexpected ₹9 lakh Azure bill after startup credits expired, seeking advice on waiver/refund

I had $1000 Azure startup credits and was using OpenAI APIs + Data Lake for personal/learning work.
After the credits expired, some services kept running without my knowledge, and I now have a ~₹9 lakh bill.

I deleted everything immediately and raised a billing support ticket for waiver.
Has anyone successfully gotten such charges waived or reduced?
Any tips or do’s/don’ts would help a lot.

https://redd.it/1q1prko
@r_devops


2025*



https://redd.it/1q1qq5o
@r_devops


Looking to form a small DevOps group for learning, motivation & side gigs

Hey everyone 👋

I’m a DevOps engineer and I’m trying to be more intentional about growth outside my day job.

Instead of doing it alone, I’m thinking of creating a small, focused group of DevOps folks who want to:

- Upskill together (real-world DevOps skills)
- Share learning resources and experiences
- Keep each other motivated
- Explore legit side gigs / freelancing opportunities when they come up.

This is not a course, not a paid group, and not spam.

Just a few like-minded people who want to grow steadily and support each other.

If this resonates with you, comment or DM me:

Your experience level (beginner / intermediate / experienced)
What you’re currently learning or aiming for

If enough people are interested, we can decide the best platform (Discord / Slack / WhatsApp).

Cheers!

https://redd.it/1q1qjx0
@r_devops


How do you enforce data contracts end-to-end across microservices → warehouse?

Hey folks,
We ingest events from microservices into a warehouse. A producer shipped a “small” schema change, and our ingestion kept running but started failing decoding/validation downstream. Nobody noticed for a while → we effectively lost data until someone spotted a gap.

We’re a pretty large org, which makes me feel we’re missing something basic or doing something wrong. This isn’t strictly in my responsibility, but I’m wondering: is this also common on your side? If you’ve solved it, what guardrails actually work to catch this fast?
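
One guardrail that targets exactly this failure mode is validating events against an explicit contract at ingestion and alerting on the rejection rate, so a schema change shows up as a spike within minutes rather than as a gap found later. A minimal sketch, assuming a contract expressed as field name to Python type (the contract format, field names, and function names are hypothetical):

```python
def validate_event(event, contract):
    """Return a list of violations; an empty list means the event conforms."""
    errors = []
    for field, expected_type in contract.items():
        if field not in event:
            errors.append(f"missing field: {field}")
        elif not isinstance(event[field], expected_type):
            actual = type(event[field]).__name__
            errors.append(f"wrong type for {field}: {actual}")
    return errors


def rejection_rate(events, contract):
    """Fraction of events that violate the contract (0.0 for no events)."""
    events = list(events)
    if not events:
        return 0.0
    rejected = sum(1 for event in events if validate_event(event, contract))
    return rejected / len(events)


ORDER_CONTRACT = {"order_id": str, "amount": float}  # example contract
batch = [
    {"order_id": "a-1", "amount": 19.99},
    {"order_id": "a-2", "amount": "19.99"},  # producer changed the type
]
rate = rejection_rate(batch, ORDER_CONTRACT)
```

Running the same check in the producer's CI against the published contract moves the failure to before the change ships; the ingestion-side rate is the backstop.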

https://redd.it/1q1bk6l
@r_devops


How do u know a CloudFormation CHANGE won’t break something subtle?

You change one resource.
The stack deploys successfully.
Nothing errors.

But something downstream breaks.

How do you catch that before deploy?
Or do you just accept the risk?

Curious how people think about this in practice.
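
A partial answer is reviewing a change set before executing it. The sketch below, under stated assumptions, scans a dict shaped like the CloudFormation `DescribeChangeSet` response (`Changes[].ResourceChange` with `Action`, `LogicalResourceId`, `Replacement`) and flags removals and replacements. The function name and sample data are hypothetical, and this only narrows the risk: it cannot see behavioral breakage in things that depend on a resource modified in place.

```python
def risky_changes(change_set):
    """Return (logical_id, action, replacement) for changes worth a
    manual look: removals, and modifications that may replace the
    physical resource."""
    risky = []
    for change in change_set.get("Changes", []):
        rc = change.get("ResourceChange", {})
        action = rc.get("Action")
        replacement = rc.get("Replacement", "False")
        replaces = action == "Modify" and replacement in ("True", "Conditional")
        if action == "Remove" or replaces:
            risky.append((rc.get("LogicalResourceId"), action, replacement))
    return risky


# Hand-written sample in the response shape, not real output.
sample = {
    "Changes": [
        {"ResourceChange": {"Action": "Modify", "LogicalResourceId": "Db",
                            "Replacement": "True"}},
        {"ResourceChange": {"Action": "Modify", "LogicalResourceId": "Fn",
                            "Replacement": "False"}},
        {"ResourceChange": {"Action": "Remove", "LogicalResourceId": "Queue"}},
    ]
}
flagged = risky_changes(sample)
```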


https://redd.it/1pzu7dl
@r_devops


How do you integrate identity verification into CI/CD without slowing pipelines?

Hey folks, DevOps teams need identity verification that plugs straight into pipelines without blocking deployments or creating security gaps. Most solutions either slow everything down or leave staging environments exposed. We're looking for clean API handoffs that deliver reliable signals at real scale.

Does anyone know what works seamlessly for CI/CD flows?

https://redd.it/1pzuoy1
@r_devops


I made a CLI game to learn Kubernetes by fixing broken clusters (50 levels, runs locally on kind)

Hey,


I built this thing called K8sQuest because I was tired of paying for cloud sandboxes and wanted to practice debugging broken clusters.


## What it is


It's basically a game that intentionally breaks things in your local kind cluster and makes you fix them. 50 levels total, going from "why is this pod crashing" to "here's 9 broken things in a production scenario, good luck."


Runs entirely on Docker Desktop with kind. No cloud costs.


## How it works


1. Run ./play.sh - game starts, breaks something in k8s
2. Open another terminal and debug with kubectl
3. Fix it however you want
4. Run validate in the game to check
5. Get a debrief explaining what was wrong and why


The game has hints, progress tracking, and step-by-step guides if you get stuck.


## What you'll debug


- World 1: CrashLoopBackOff, ImagePullBackOff, pending pods, labels, ports
- World 2: Deployments, HPA, liveness/readiness probes, rollbacks
- World 3: Services, DNS, Ingress, NetworkPolicies
- World 4: PVs, PVCs, StatefulSets, ConfigMaps, Secrets
- World 5: RBAC, SecurityContext, node scheduling, resource quotas


Level 50 is intentionally chaotic - multiple failures at once.


## Install


    git clone https://github.com/Aryan4266/k8squest.git
    cd k8squest
    ./install.sh
    ./play.sh



Needs: Docker Desktop, kubectl, kind, python3


## Why I made this


Reading docs didn't really stick for me. I learn better when things are broken and I have to figure out why. This simulates the actual debugging you do in prod, but locally and with hints.


Also has safety guards so you can't accidentally nuke your whole cluster (learned that the hard way).


Feedback welcome. If it helps you learn, cool. If you find bugs or have ideas for more levels, let me know.


GitHub: https://github.com/Aryan4266/k8squest

https://redd.it/1pzr4jh
@r_devops


How would you define proactive AWS Hygiene and Ownership process

We currently lack a standardized way to track ownership, lifespan, and relevance of AWS resources, especially in non-prod accounts. This leads to unused resources, unnecessary cost, and ambiguity during alerts or incidents. We need a proactive process to keep AWS environments clean and accountable.

I have some thoughts of my own on this, but I want to ask others here: how would you define such a process? Which steps belong in it? What requirements do we as DevOps need here?
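
As a starting point for discussion, here is a minimal sketch of the check such a process could run on a schedule: every resource must carry an owner and an expiry date, and anything missing a tag or past its expiry gets reported. The tag keys `owner` and `expires-on` and the function name are a hypothetical policy, and the tag dict is assumed to have been collected already (for example from a tagging inventory export).

```python
from datetime import date

REQUIRED_TAGS = ("owner", "expires-on")  # hypothetical tag policy


def audit_resource(tags, today):
    """Return hygiene findings for one resource's tag dict."""
    findings = []
    for key in REQUIRED_TAGS:
        if not tags.get(key):
            findings.append(f"missing tag: {key}")
    expires = tags.get("expires-on")
    if expires:
        try:
            if date.fromisoformat(expires) < today:
                findings.append("expired")
        except ValueError:
            # Tag is present but not a YYYY-MM-DD date.
            findings.append("invalid expires-on date")
    return findings


findings = audit_resource(
    {"owner": "team-a", "expires-on": "2025-01-01"}, date(2026, 1, 1)
)
```

The expiry tag is what makes the process proactive: resources announce their own intended lifespan, so cleanup becomes a query instead of an investigation.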

https://redd.it/1pzlj8c
@r_devops


Release management nightmare - how do you track what's actually going out?

Just had our third surprise production issue this month bc nobody knew which features were bundled in our release. Engineering says feature X is ready, QA cleared it last week, but somehow it wasn't in the build that went out Friday.

We have relied on Slack threads and manual Git tag checking. They served us fine for a while, but I think we've reached a breaking point. How does this roll up to leadership when they ask what shipped this sprint? What are you using for release management to ensure everything falls into place?
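To make the question concrete, here is a rough sketch of the kind of check I have in mind. It assumes commit subjects mention ticket keys such as `SHOP-123` (a hypothetical convention) and that the subjects come from something like `git log --format=%s <previous-tag>..<new-tag>`. The function names are made up for illustration.

```python
import re

# Assumed convention: ticket keys such as "SHOP-123" appear in commit subjects.
TICKET_RE = re.compile(r"\b[A-Z][A-Z0-9]+-\d+\b")


def release_manifest(commit_subjects: list[str]) -> dict[str, list[str]]:
    """Group commit subjects by the ticket keys they mention."""
    manifest: dict[str, list[str]] = {}
    for subject in commit_subjects:
        # Commits without a ticket key are collected under "UNTRACKED".
        keys = TICKET_RE.findall(subject) or ["UNTRACKED"]
        for key in keys:
            manifest.setdefault(key, []).append(subject)
    return manifest


def missing_from_release(expected: set[str], manifest: dict[str, list[str]]) -> set[str]:
    """Return tickets that were expected to ship but have no commit in the range."""
    return expected - set(manifest)
```

Feeding the QA-cleared ticket list in as `expected` would have flagged feature X before Friday's build went out, and the manifest itself doubles as a "what shipped" summary.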

https://redd.it/1pzi7l9
@r_devops


Reddit DevOps

Looking for help for my startup

Hey all!

I'm here to seek some guidance on how to tackle the next challenge at the startup I am building.

We currently have various services that some clients are using, and our next step is white-labeling a certain type of website.

Right now, we operate this website, which runs in a monorepo with React and Next.js and is tightly coupled to an admin panel in a different repository.

The website usually requests data from the admin panel, including secrets at server boot (I did this so my future self can deploy multiple websites from the same codebase without a mess of secrets on GitHub). These secrets are pulled from the admin panel using a slug I assigned to my website. Ideally, other websites will use this same system in the future.


The problem (or challenge): what's the way to go to have multiple deployments happen every time we merge into the main branch? Currently I am using GitHub Actions, but it doesn't look sustainable to me once we have many white-labeled websites running out there.

It's also important to mention that each website will have its own external Supabase and an internal (self-hosted) Redis instance, and all of them will use our centralized Soketi (self-hosted Pusher alternative) service. So, ideally, the solution would include deploying that external Supabase (this is easy, APIs exist for that), a dedicated Redis, and a server to host the backend and that dedicated Redis.
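To illustrate what I mean by one codebase fanning out to many sites, here is a rough sketch that turns a tenant list into a JSON deployment matrix, which a CI job could then iterate over. The tenant slugs, domains, field names, and the `redis-<slug>` naming are all hypothetical; in practice the list would presumably come from the admin panel.

```python
import json

# Hypothetical tenant registry; slugs and field names are illustrative only.
TENANTS = [
    {"slug": "acme", "domain": "acme.example.com"},
    {"slug": "globex", "domain": "globex.example.com"},
]


def deployment_matrix(tenants: list[dict[str, str]]) -> str:
    """Render a JSON matrix with one deployment entry per tenant slug."""
    include = [
        {
            "slug": t["slug"],
            "domain": t["domain"],
            # Assumed naming scheme for each tenant's dedicated Redis instance.
            "redis_name": f"redis-{t['slug']}",
        }
        for t in tenants
    ]
    return json.dumps({"include": include})
```

The idea is that adding a white-labeled site becomes a data change (one more tenant entry) rather than a new pipeline.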


I've been a software engineer for the last 7-8 years but never really had to take care of DevOps / infra / whatever you call it. I'm really open to learning all of this. I've had multiple conversations with Claude, but I always prefer human-to-human information transfer.


Thank you!




https://redd.it/1pzjdwk
@r_devops


Reddit DevOps

Kubernetes concepts in 60 seconds

Trying an experiment: explaining Kubernetes concepts in under 60 seconds.

Would love feedback.

Check out the videos on YouTube

https://youtube.com/@soulmaniqbal?si=pZCVwXQizNQXFzv1

https://redd.it/1pzfsir
@r_devops


Reddit DevOps

zsh-doppler - ZSH plugin to show Doppler project/config in your prompt

I work with a lot of Doppler projects and got tired of running doppler setup / configure to remember which env I was in. So I made a simple plugin that shows [project/config] in your prompt.

Colors change based on environment - green for dev, yellow for staging, red for prod. Helps avoid that "oh shit" moment when you realize you were in prod.
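The color rule is roughly the following, shown here as a Python sketch rather than the plugin's actual zsh code. Matching on name prefixes such as `dev`, `stg`, and `prd` is an assumption for illustration, and `prompt_color` is a made-up name.

```python
def prompt_color(config: str) -> str:
    """Pick a prompt color from a Doppler config name (illustrative rule only)."""
    name = config.lower()
    # Assumed prefixes; real config names may differ per project.
    if name.startswith(("prd", "prod")):
        return "red"
    if name.startswith(("stg", "stag")):
        return "yellow"
    if name.startswith("dev"):
        return "green"
    # Unknown environments fall back to the terminal's default color.
    return "default"
```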

Works with Oh My Zsh, Powerlevel10k, zinit, etc.

https://github.com/lsdcapital/zsh-doppler

Contributions welcome. Happy to help debug and to improve it based on feedback.



https://redd.it/1pzdt6a
@r_devops


Reddit DevOps

CKAD exam pricing confusion: KodeKloud vs Linux Foundation

I recently purchased CKAD via KodeKloud.
For my other four Kubernetes certifications, I bought the exams directly from the Linux Foundation, but this time KodeKloud was offering 55% off for annual subscribers.

https://preview.redd.it/88jby84yo9ag1.png?width=1386&format=png&auto=webp&s=7d94cdcacfd9db0e6f1fced2aca6ddbd500a36b3

The main reason I purchased the annual subscription was to use this discount when needed. After applying it, I paid ₹20.5k INR (including taxes).

Once I redeemed the voucher, it showed:

>

That was fine with me, as I was confident I wouldn’t need a retake.

However, today I accidentally landed on this Linux Foundation page:
https://trainingportal.linuxfoundation.org/learn/course/certified-kubernetes-application-developer-single-attempt-ckad-single/exam/exam

It lists the same CKAD single-attempt exam for $140 (~₹12–12.5k INR).

https://preview.redd.it/zx7tz4u0p9ag1.png?width=1391&format=png&auto=webp&s=04a80c160758b3dd3eafdcd2ac002de7600b51fe

Same exam.
Same attempt type.
Different platforms. Very different prices.

Am I missing something here or is this just confusing / misleading discount framing?

Posting this to understand better and to help others make an informed choice.

https://redd.it/1pz89eo
@r_devops


Reddit DevOps

like:

Reddit (r/devops, r/kubernetes, r/aws, r/sre)

CNCF Slack channels

DevOps Discord servers

Local meetups or conferences

Online tech communities oriented towards cloud and DevOps (hexplain.space)



# 10. Be Consistent, Not Overwhelmed

DevOps is a long-term journey. Tools will change, fundamentals will not.

If you dedicate a few focused hours each week and build your skills layer by layer, becoming job-ready within several months is realistic. The key is patience, consistency, and learning with purpose.

Join the conversation, stay curious, and keep building.

https://redd.it/1pzalqb
@r_devops
