r_devops | Unsorted

Telegram channel r_devops - Reddit DevOps

86

Reddit DevOps. #devops Thanks @reddit2telegram and @r_channels


Reddit DevOps

Ask HN / FinOps: How do you actually attribute AI / GPU costs to specific customers or products in multi-tenant SaaS?

Hi there,

I'm digging into billing transparency for AI workloads in multi-tenant systems.

Cloud billing usually shows allocated resources, but mapping real utilization (tokens, GPU time, CPU/RAM usage) to a specific customer or product feature seems surprisingly hard.

Curious how teams handle this in practice:

* How do you attribute infrastructure / AI costs to specific customers?
* Do you track allocation vs real utilization?
* What tools do you use (Kubecost, CloudZero, custom pipelines, etc.)?
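One common starting point for the allocation-versus-utilization question is to split a shared bill in proportion to measured usage per tenant. The sketch below is only an illustration under assumptions: the function name `attribute_cost`, the tenant IDs, and the GPU-second figures are all made up, and idle capacity is simply spread pro rata across tenants that had usage.

```python
from collections import defaultdict


def attribute_cost(total_cost, usage_records):
    """Split total_cost across tenants in proportion to measured usage.

    usage_records is an iterable of (tenant_id, gpu_seconds) tuples.
    Idle capacity is not billed separately; it is spread pro rata.
    """
    usage = defaultdict(float)
    for tenant, gpu_seconds in usage_records:
        usage[tenant] += gpu_seconds

    total_usage = sum(usage.values())
    if total_usage == 0:
        # Nothing was measured, so nothing can be attributed.
        return {tenant: 0.0 for tenant in usage}

    return {tenant: total_cost * used / total_usage for tenant, used in usage.items()}


# Hypothetical numbers: a 1000 USD GPU bill and three usage samples.
records = [("acme", 1200.0), ("globex", 600.0), ("acme", 200.0)]
shares = attribute_cost(1000.0, records)
print(shares)
```

The same shape works for tokens or CPU-seconds; the hard part in practice is collecting trustworthy per-tenant usage records in the first place.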

Thanks!

https://redd.it/1rqqd78
@r_devops


Reddit DevOps

Is it worth taking on a part time Lvl 4 DevOps apprenticeship (UK) as a network design analyst?

After 3 years at university I recently landed a graduate role, and I’m currently about 6 months into my job as a Network Design Analyst. My role mainly involves supporting commissions and migrations of Fortinet-based networks, working alongside engineers and project teams.

I’m about a month away from sitting my CCNA, and after that my plan was to start working towards Fortinet certifications to deepen my networking knowledge.

My company has offered me the opportunity to do a part-time DevOps Upskiller apprenticeship through Multiverse, which they would fully fund.

My main question is: what are the pros and cons of taking this apprenticeship given the path I’m currently on?

Would it complement a networking career (e.g. automation, infrastructure, cloud), or would it be better to stay focused purely on networking certifications and experience?

I’d be interested to hear from people who have taken a similar path or work in networking / DevOps.

https://redd.it/1rqxc5d
@r_devops


Reddit DevOps

CVE-2026-28353: the Trivy security incident nobody is talking about, idk why, but now I'm rethinking whether the scanner is even the right fix for container image security

Saw this earlier: https://github.com/aquasecurity/trivy/discussions/10265

pull_request_target misconfiguration, PAT stolen Feb 27, 178 releases deleted March 1, malicious VSCode extension pushed, repo renamed. CVE-2026-28353 filed.

That workflow was in the repo since October 2025. Four months before anyone noticed. Release assets from that whole window are permanently deleted. GPG signing key for Debian/Ubuntu/RHEL may be gone too.

Someone checked the cosign signature on v0.69.2 independently and got private-trivy in the identity field instead of the main repo. Quietly fixed in v0.69.3.

Maintainers confirmed: if you pulled via the install script or get.trivy.dev during that window, those assets cannot be checked. Not "we think they're fine." Cannot be checked.

Scanning for CVEs assumes the pipeline that built the image was clean. If it wasn't, the scan result means nothing.

Am I missing something or is this just not a big deal to people? Because it made me completely rethink how much I trust open source container image pipelines.

Looking at SLSA Level 3 for base images now. Hermetic builds, signed provenance. What are people actually using for distroless container images that ship with that level of build integrity baked in? Not scanners. The images themselves.

And before anyone says just switch to Grype or something similar, please don't. Same problem. You're still scanning images after the fact with no visibility into how they were built or whether the pipeline that produced them was clean. Another scanner doesn't fix a provenance problem.



https://redd.it/1rqmrhi
@r_devops


Reddit DevOps

Launch Darkly rug pull coming

Hey everyone!

If you're using Launch Darkly on their existing user-based pricing scheme, they're moving to a new usage-based pricing.

Upside? Unlimited users.

Downside? They charge per service connection. What's a service connection? Any independent instance of an app connecting to Launch Darkly. For example, a VM, a Kubernetes pod, or a Heroku worker.

They're charging $12/month per service connection ($10 on an annual commitment).

We were paying $10k annually for user-based pricing. We would pay $45k on the new per-service connection pricing.
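For a rough sense of scale, the quoted rates can be turned into an implied connection count. This is only back-of-the-envelope arithmetic on the numbers in the post; the function name is made up, and the post does not say which of the two rates the $45k quote is based on.

```python
MONTHLY_RATE = 12  # USD per service connection per month (month-to-month)
ANNUAL_RATE = 10   # USD per service connection per month (annual commitment)


def implied_connections(annual_bill, rate_per_month):
    """Number of service connections that would produce the given yearly bill."""
    return annual_bill / (rate_per_month * 12)


print(implied_connections(45_000, ANNUAL_RATE))   # 375.0
print(implied_connections(45_000, MONTHLY_RATE))  # 312.5
```

So a $45k quote corresponds to somewhere between roughly 310 and 375 pods, VMs, or workers connecting at once, which is easy to reach with autoscaling.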

For anyone going through the same thing, there are plenty of open source feature flag tools you can use, like Flagsmith. Just deploy them in your infrastructure and call it a day.

https://redd.it/1rr4fen
@r_devops


Reddit DevOps

How to make Documentation Discoverable?

Hey, DevOps Engineer here!

How do you handle the problem of “there is documentation” but no one knows where it is (except the 2 seniors who were there when it was written)? I'm using Confluence for this example.

The goal is to make the documentation explicitly available where it is most needed, instead of having to ask someone else “Where are the docs on X?” The reason this matters is that if someone is sick or unavailable, we avoid a single point of failure :D

Ideas I’ve come up with:

* Add relevant documents to the Jira ticket (for example, a deployment guide attached to deployment tickets).
* Create “Hook Pages” that are framed around the problem and point to or include the guide, for example:
  * “How do I do X?” → links to the guide on X
  * “What is Service?” → links to “Service Architecture Explanation Guide”
  * **One guide can have multiple problem/question hooks**

How do you go about making your documentation easily findable when you need it?

https://redd.it/1rp2noq
@r_devops


Reddit DevOps

finally stopped manually SSH-ing to deploy my code. I built a simple CI/CD pipeline and it saved my sanity.

Last month, I spent 3 hours debugging a broken deployment on a Friday at midnight.

For context, I’m building a full-stack ERP (TypeScript, Node.js, React). Every time I wanted to ship a new feature, my routine was: open terminal -> SSH into my DigitalOcean Droplet -> git pull -> npm install -> npm run build -> restart PM2 -> pray Nginx doesn’t throw a 500 error. It took way too long and was super prone to typos.

I finally decided to automate it. I drew up this architecture [ATTACH YOUR EXCALIDRAW IMAGE HERE] and wrote a GitHub Actions .yml file.

Now, my workflow is just:

1. git push origin main
2. GH Actions sets up a Node environment, installs dependencies, and runs the TS build (to catch errors early).
3. If it passes, an SSH action connects to my Droplet, pulls the code, and restarts PM2.

Total time: ~30 seconds. Zero manual work. I deployed 3 times today in my pajamas.

I was debating between Jenkins and GitHub Actions, but GH Actions felt like the frictionless choice since my code is already there. For the senior DevOps folks here: at what scale do you usually outgrow GitHub Actions and move to something like Jenkins? Any security flaws in my current setup I should be aware of?

https://redd.it/1rp1n6f
@r_devops


Reddit DevOps

Hands-on with OVHcloud Managed Kubernetes

Been testing EU managed k8s providers one by one for eucloudcost.com, OVH was next.

Short version: it just works.

Free control plane, free egress in EU regions. You only pay for nodes. Coming from AWS this feels wrong somehow.

I also managed to set both vRack subnets to no_gateway = true and then spent an hour wondering why Traefik was stuck in Pending. Turns out Octavia needs a gateway on the load balancer subnet. Anyway.

Main issue is no RWX volumes out of the box. File Storage for RWX exists but starts at 150 GiB, which is overkill for most things, so out of the box only RWO exists.

Also they burned down a datacenter in 2021 so now every resource in the console shows you the AZ deployment mode.

Put together a reference repo with the full OpenTofu setup if you want a starting point: https://github.com/mixxor/opentofu-kubernetes-ovhcloud

Full writeup in comments.

Anyone else running OVHcloud in prod/dev?
Curious if you hit anything weird I missed...

https://redd.it/1rmp4f9
@r_devops


Reddit DevOps

AI’s Impact on DevOps: Opportunities and Challenges

Read this article: https://medium.com/@averageguymedianow/ais-impact-on-devops-opportunities-and-challenges-6cdba7a5a45e

What really caught my eye is this statement:

"Integrating AI into DevOps workflows introduces significant complexity. Teams must now understand not only traditional infrastructure and application concerns but also machine learning models, training data requirements, model versioning, and AI-specific monitoring needs. This complexity can create new forms of technical debt when AI systems are implemented without proper governance or understanding."

From what I'm seeing, technical debt keeps piling up.


https://redd.it/1rocti9
@r_devops


Reddit DevOps

I'm looking to move to a proper devops/platform engineer role

I don't know if this is the right place for this post, but I have been looking for a job change. My roles have been mixed: I initially worked as a DevOps engineer for two years, then was moved to cloud migration, then to cloud operations, mainly in Azure. I have knowledge of Terraform for infrastructure provisioning (mainly virtual machines), Jenkins from previous experience, Python scripting, Kubernetes (AKS), Docker, and Azure DevOps pipelines. I know a little bit of everything but not enough of anything. Does anyone know how to permanently switch to DevOps/platform engineering?

I'm stuck. I blew an interview at round 2 because I didn't know much about system design, so I would appreciate any sort of help.

I don't know where to start. What tools should I stick to and learn properly?

https://redd.it/1roj11d
@r_devops


Reddit DevOps

I made an interactive progressive roadmap for new DevOps Engineers

TL;DR

The Roadmap: https://roadmap.esc.sh/
Source: https://github.com/MansoorMajeed/infra-roadmap
Blog Post (the philosophy for learning SRE/DevOps): https://blog.esc.sh/sre-devops-roadmap/


I have been an SRE for over a decade, and I’ve mentored a lot of junior engineers. The single biggest hurdle they all face is that the DevOps/SRE field is just incredibly overwhelming to beginners.

Many juniors make the mistake of jumping straight into learning tools (Docker, K8s, Terraform) without actually understanding what problems those tools were built to solve, how they fit together, or the foundations underneath it all. If we look at traditional DevOps roadmaps or the CNCF landscape, it often makes the problem worse. It’s just a massive bingo card of logos that doesn't explain the "why" behind anything.

So, I decided to build a better way to visualize this: an interactive, progressive roadmap.

How it’s different:

* Question-Driven: Each node follows a general thought or question a new engineer may have and lets them choose the next path that they find interesting.
* Progressive Disclosure: It doesn't show you 200 tools at once. The map expands as you explore, keeping cognitive load low.
* Open Source & Static: It’s a fully offline, static site.



Note about how it was made: I am an SRE, not a frontend dev (I still struggle with frontend and I decided that it is not my cup of tea), so I used Claude to help write the React Flow/Next.js engine and some boilerplate text. However, the architecture, the paths, the connections, and the core learning flow are 100% my own design based on my experience. Because of that, it might be biased or missing things, so PRs are more than welcome!

I also wrote a short blog post expanding on why I think we need to teach "concepts over tools" if anyone is interested in the philosophy behind it. https://blog.esc.sh/sre-devops-roadmap/

I hope this helps some of the juniors build a mental model. Would love to hear your feedback!

I am also happy to answer any questions any new folks may have!

https://redd.it/1rojfho
@r_devops


Reddit DevOps

hackerbot-claw: An AI-Powered Bot Actively Exploiting GitHub Actions - Microsoft, DataDog, and CNCF Projects Hit So Far

https://www.stepsecurity.io/blog/hackerbot-claw-github-actions-exploitation#attack-6-aquasecuritytrivy---evidence-cleared

Now the Trivy repo is empty... https://github.com/aquasecurity/trivy

Some advice:

1. Verify the integrity of your Trivy binaries if installed at the end of February
2. Switch to the Docker image (if still available on GHCR/Docker Hub), verify Cosign signatures
3. Keep Checkov or Grype as a fallback
4. Audit your GitHub Actions workflows: no pull_request_target + checkout of the fork, no unescaped ${{ }} in run blocks:
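
A crude text-level check can serve as a first pass for item 4. The sketch below is a heuristic, not a real workflow parser: the function name `audit_workflow`, the regex, and both sample workflows are assumptions made up for illustration, and it will miss cases (for example, other attacker-controlled contexts or composite actions). It flags `pull_request_target` used together with a checkout of the PR head, and `${{ github.event.* }}` or `${{ github.head_ref }}` expressions interpolated directly into `run:` blocks.

```python
import re

# Expressions whose value can be influenced by whoever opens the pull request.
UNTRUSTED_EXPR = re.compile(r"\$\{\{\s*github\.(event\.[\w.]+|head_ref)\s*\}\}")


def audit_workflow(text):
    """Return a list of human-readable findings for one workflow file."""
    findings = []
    if "pull_request_target" in text and "github.event.pull_request.head" in text:
        findings.append("pull_request_target combined with a checkout of the PR head")

    run_indent = None  # indentation level of sibling keys while inside a run block
    for number, line in enumerate(text.splitlines(), start=1):
        stripped = line.lstrip()
        indent = len(line) - len(stripped)
        if run_indent is not None and stripped and indent <= run_indent:
            run_indent = None  # a sibling key or dedent ends the run block
        starts_run = stripped.startswith(("run:", "- run:"))
        if starts_run:
            # For "- run:" the sibling keys sit two columns right of the dash.
            run_indent = indent + 2 if stripped.startswith("- ") else indent
        if (starts_run or run_indent is not None) and UNTRUSTED_EXPR.search(line):
            findings.append(f"line {number}: untrusted expression interpolated into run block")
    return findings


RISKY = """on: pull_request_target
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          ref: ${{ github.event.pull_request.head.sha }}
      - run: echo "${{ github.event.pull_request.title }}"
"""

SAFER = """on: pull_request
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - run: echo "$TITLE"
        env:
          TITLE: ${{ github.event.pull_request.title }}
"""

for finding in audit_workflow(RISKY):
    print(finding)
```

The second sample shows the usual mitigation: pass the value through `env:` so the shell receives it as data rather than as script text.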

https://redd.it/1ri4nwu
@r_devops


Reddit DevOps

Why does docker output everything to standard error?

Every time I look inside my GitHub workflows, I see everything output to stderr. Why does this happen?


Thank you!

https://redd.it/1rhts32
@r_devops


Reddit DevOps

CleanCloud v1.6.3 - 20 rules to find what's costing you money in AWS/Azure

A while ago I posted about CleanCloud, a shift-left cloud waste reporting tool that enforces hygiene as a CI/CD gate. It now has cost estimates and a --fail-on-cost CLI option.

AWS Rules (10):

1. Unattached EBS volumes (HIGH)
2. Old EBS snapshots
3. Infinite retention logs
4. Unattached Elastic IPs (HIGH)
5. Detached ENIs
6. Untagged resources
7. Old AMIs
8. Idle NAT Gateways
9. Idle RDS instances (HIGH)
10. Idle load balancers (HIGH)

Azure Rules (10):

1. Unattached Managed Disks
2. Old Snapshots
3. Unused Public IPs
4. Empty Load Balancers
5. Empty Application Gateways
6. Empty App Service Plans
7. Idle VNet Gateways
8. Stopped (Not Deallocated) VMs — still incurring full compute charges
9. Idle SQL Databases (zero connections 14+ days)
10. Untagged Resources

Every finding includes:
- Confidence level (HIGH / MEDIUM)
- Evidence and signals used
- Resource details and age
- Cost waste estimates

Enforce in CI/CD:

cleancloud scan --provider aws --all-regions --fail-on-confidence HIGH --fail-on-cost 2000

Exit 0 = pass.

Exit 2 = policy violation.

pipx install cleancloud and run your first scan in 5 minutes.

If you’re one of the 200+ users who have downloaded CleanCloud, we’d love to hear what you found.

Please open an issue here or leave a comment below.

https://redd.it/1rf84m8
@r_devops


Reddit DevOps

What is platform engineering exactly?

Every time I tell someone what I like and how I think, they end up in some way or another recommending platform engineering.

For example, I’ve always wanted to contribute to open source projects I liked, but always thought I wasn’t technically strong enough to help outside infra and cloud, which prompted another “PE is perfect”. Every explanation I get is different, and not slightly different: each one could be categorized as a different role.

I won’t make the post long by explaining exactly what I like and what I don’t, but I want to know what it is so I can understand why it’s been recommended to me so much. I’d also appreciate some examples of the output of such a role compared to normal DevOps.

https://redd.it/1rhefsl
@r_devops


Reddit DevOps

Helm in production: lessons and gotchas

Hi everyone! I've been using Helm in production at scale for the past few years and collected lessons and gotchas that surprised me:

- Helm doesn't manage CRDs.
- --wait doesn't wait for readiness of all resources.
- Dry run is dependent on the state of an existing release.
- Values can be validated with JSON schema.
- OCI registries can be used for charts alongside container images.

I think the tip about values validation is the coolest, because loading the schema into yaml-language-server is a great development experience boost and helps LLMs do better work writing values.

Hope you find this post useful, I think even experienced Helm users can learn something from it.

https://redd.it/1rgdp5x
@r_devops


Reddit DevOps

Designing enterprise-level CI/CD access between GitHub <--> AWS

I have an interesting challenge for you today.

Context

I have a GitHub organization with over 80 repositories, and all of these repositories need to access different AWS accounts, more or less 8 to 10 accounts.

Each account has a different purpose (e.g. security, logging, etc).

We have a deployment account that should be the only entry point for the pipelines.

Constraints

Not all repos should have access to all accounts.

Repos should only have access to the account where they should deploy things.

All of the actual provisioning roles (assumed by the pipeline role) should have least-privilege permissions.

The system should scale easily without requiring any manual operations.

How would you guys work around this?


EDIT:

I'm adding additional information to the post not to mislead on what the actual challenge is.

The architecture I already have in mind is:

GitHub Actions -> deployment account OIDC role -> workload account provisioning role

The actual challenge is the control plane behind it:

- where the repo/env/account mapping lives

- who creates and owns those roles

- how onboarding scales for 80+ repos without manual per-account IAM work

- how to keep workload roles least-privilege without generating an unmaintainable snowflake per repo

I’m leaning toward a central platform repo that owns all IAM/trust relationships from a declarative mapping, and app repos only consume pre-created roles.

So the real question is less “how do I assume a role from GitHub?” and more “how would you design that central access-management layer?”
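
To make the "declarative mapping" idea concrete, here is a minimal sketch of how a central platform repo might render trust policies from one mapping. Everything named here is an assumption for illustration: the `MAPPING` contents, the account IDs, and the `gha-`/`provision-` role naming are made up. The trust policy shape follows the usual GitHub OIDC setup in AWS (`sts:AssumeRoleWithWebIdentity` with `aud` and `sub` conditions), but treat it as a starting point to review, not a vetted policy.

```python
import json

OIDC_HOST = "token.actions.githubusercontent.com"
DEPLOY_ACCOUNT = "000000000000"  # hypothetical deployment account ID

# Hypothetical mapping: repository -> workload accounts it may deploy to.
MAPPING = {
    "example-org/payments-api": ["111111111111"],
    "example-org/log-shipper": ["222222222222", "333333333333"],
}


def short_name(repo):
    return repo.split("/", 1)[1]


def pipeline_trust_policy(repo):
    """Trust policy for the per-repo OIDC role in the deployment account."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {
                "Federated": f"arn:aws:iam::{DEPLOY_ACCOUNT}:oidc-provider/{OIDC_HOST}"
            },
            "Action": "sts:AssumeRoleWithWebIdentity",
            "Condition": {
                "StringEquals": {f"{OIDC_HOST}:aud": "sts.amazonaws.com"},
                "StringLike": {f"{OIDC_HOST}:sub": f"repo:{repo}:*"},
            },
        }],
    }


def workload_trust_policy(repo):
    """Trust policy for a provisioning role, assumable only by that repo's pipeline role."""
    pipeline_role = f"arn:aws:iam::{DEPLOY_ACCOUNT}:role/gha-{short_name(repo)}"
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"AWS": pipeline_role},
            "Action": "sts:AssumeRole",
        }],
    }


def build_plan(mapping):
    """Return (account_id, role_name, trust_policy) tuples for every role to create."""
    plan = []
    for repo, accounts in sorted(mapping.items()):
        plan.append((DEPLOY_ACCOUNT, f"gha-{short_name(repo)}", pipeline_trust_policy(repo)))
        for account in accounts:
            plan.append((account, f"provision-{short_name(repo)}", workload_trust_policy(repo)))
    return plan


plan = build_plan(MAPPING)
print(json.dumps(plan[0][2], indent=2))
```

Onboarding a repo then becomes a one-line change to the mapping, and the permissions policies (not shown) can be attached per role class rather than hand-written per repo.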

https://redd.it/1rqwjxt
@r_devops


Reddit DevOps

Advice Wanted: Transitioning an internal production tool to Open Source (First-timer)

Hey everyone,

I’m looking for some "war stories" or guidance from people who have successfully moved a project from an internal private repo to a public Open Source project.

The Context:

I started this project as "vibe code", heavy AI-assisted prototyping just to see if a specific automation idea for our clusters would work.

Surprisingly, it scaled well. I’ve spent the last 3 months refactoring it into proper production-grade code, and it’s currently handling our internal workloads without issues.

I want to "donate" this to the community, but since this is my first time acting as a maintainer, I want to do it right the first time. I’ve seen projects fail because of poor Day 1 execution, and I’d like to avoid that.

Specific hurdles I’m looking for help with:

1. Sanitization: Besides .gitignore, what are the best tools for scrub-testing a repo for accidental internal URLs or legacy secrets in the git history before the first public push?

2. Documentation for Strangers: My internal docs assume you know our infrastructure. What’s the "Gold Standard" for a README that makes a cluster tool accessible to someone with zero context?

3. Licensing: For infrastructure/orchestration tools, is Apache 2.0 still the "safe" default, or should I be looking at something else to encourage contribution while protecting the project?

4. Community Building: How do you handle that first "Initial Commit" vs. a "Version 0.1.0" release to get people to actually trust the code?
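
For the sanitization hurdle, the basic idea behind history scrubbing can be sketched as a pattern scan over text such as the output of `git log -p --all`. This is only a toy illustration: the patterns, the `.internal`/`.corp`/`.local` suffixes, and the sample text are assumptions, and dedicated scanners such as gitleaks or trufflehog cover far more cases.

```python
import re

# Illustrative patterns only; real scanners ship far larger rule sets.
PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private_key_header": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
    "internal_url": re.compile(r"https?://[\w.-]+\.(?:internal|corp|local)\b[^\s\"']*"),
}


def scan_text(text):
    """Return (line_number, pattern_name) for every suspicious line."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                hits.append((number, name))
    return hits


# Made-up sample; the key is the placeholder value used in AWS documentation.
sample = (
    "wiki = https://wiki.corp.example.com/runbooks\n"
    "aws_key = AKIAIOSFODNN7EXAMPLE\n"
    "print('hello')\n"
)
print(scan_text(sample))
```

Anything a scan like this finds in history has to be rewritten out (or the repo published from a fresh initial commit), and any real secret should be rotated regardless.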

Please don't downvote, I'm genuinely here to learn the "right" way to contribute back to the ecosystem. If you have a blog post, a checklist, or just a "I wish I knew this before I went public" tip, I’d really appreciate it.

TL;DR: My "vibe code" turned into a production tool. Now I want to open-source it properly. How do I not mess this up?



https://redd.it/1rqoqnz
@r_devops


Reddit DevOps

Empowering DevOps Teams

I came across an article sharing how to empower DevOps teams. If you are given the following choices and can pick only one to make your life better, which one would you pick?

1. A good team leader who understands what's going on and cares about his/her team. Pay and workloads remain the same.
2. A better paying job with less stress but you are required to relocate
3. A big promotion with far better pay and perks but with more stress and responsibilities.



https://redd.it/1rr74xm
@r_devops


Reddit DevOps

Advice For Surviving Current Job Market 6 Months After Layoff 3+ YOE

I was laid off about 6 months ago, back in September. After being made redundant, I took some time off from anything work related, then got back to applying for DevOps/Platform engineering roles. Despite having a dozen or so recruiters contact me, and getting through to a few final interviews, I feel as though my confidence is waning at this point.

My emergency funds are fairly solid and should last a fairly long time (roughly 12 more months). I'm interested in getting feedback, mainly on my CV, as I fear I may be missing something here. I'm applying mainly for mid-level DevOps/Platform engineer roles.

My CV is here

https://redd.it/1rp95f3
@r_devops


Reddit DevOps

Python modules for creating and modifying Helm & k8s manifests

I'm now working on a DBaaS service for the developers in my department, and since it's my first time doing a project like this, I'd be happy if anyone could recommend modules they like to use for these types of automations that are used mainly to create or modify existing helm charts and k8s manifests.

https://redd.it/1roxmm9
@r_devops


Reddit DevOps

Would you be interested in an official r/DevOps Discord server?

Hi r/devops,

Would you be interested in having a community Discord server related to the subreddit?

This is simply an open discussion to gauge interest. Please comment with your opinion.

https://redd.it/1rnnxq8
@r_devops


Reddit DevOps

Complete Guide to Building a CLI

In this article, I’ll cover a complete guide on how to build a professional CLI (Command Line Interface) that is easy to use and, most importantly, easy to integrate with other applications. If you’ve never built a CLI before, don’t worry — we’ll start from scratch.

https://vibelog.mateusmoutinho.com.br/en/article?date=2026/03/07&id=cli-guide/

https://redd.it/1ro0rqk
@r_devops


Reddit DevOps

Choosing DNS to host

I am designing an environment for a malware simulation that uses DNS tunneling to export data, bypassing the firewall. For this I need to host an internal authoritative DNS server for a dummy domain that would cache requests with encoded information.

Do you have any recommendations on which software to use for it? I’m leaning towards bind9 on a Debian host, but I’m not sure whether it’s overkill, since it’s an enterprise-grade solution and all I’m doing is a simple demo.

The infra runs on multi-node Proxmox and I use OPNsense for the firewall, if it matters.

https://redd.it/1rnghlb
@r_devops


Reddit DevOps

DevOps to Build/Release Eng

So I needed to find a full remote role because my current hybrid arrangement isn’t gonna work out moving forward. I ended up receiving an offer for a build and release engineer position.

My background is in traditional DevOps, supporting developers and their CI pipelines, which I do enjoy. The toolset is: GitHub Actions, AWS, EKS runner infra.

This new position is more like technical program/project management. I’ll be responsible for what releases go out the door, managing the GitHub branching strategy, and also owning the CI/CD pipelines + release automation.

The new role is a +20% TC, full remote position. Has anyone else made this transition? Loved it? Hated it? Interested to hear your experiences.

https://redd.it/1roke6e
@r_devops


Reddit DevOps

I parsed cloud Interview questions

Hey Folks,

Last time I published my 100 interview questions. I've added 10 new questions from Glassdoor reviews covering Cloud.

Companies are Amazon, Accenture, Kayak, Adobe, Autodesk, EPAM, Lyft, Twitch, Coinbase.
These are AWS questions, and I've added videos for them as well.

https://github.com/devops-interviews/devops-interview-questions

Nothing on GitHub is paywalled. If you ever feel like thanking me, just star the repo. Thanks

https://redd.it/1ro861x
@r_devops


Reddit DevOps

Built a website for DevOps Learning

Hey folks,
After a long time, I finally rebuilt (vibe-coded) and revamped one of my old projects, DevOps Atlas.
It’s basically a one-stop search engine for DevOps learning resources.
The goal is simple:
Help DevOps engineers discover high-quality learning resources without endless searching.
Any suggestions and feedback are most welcome. Check it out at https://devopsatlas.com/ and let me know what you think!

https://redd.it/1rhwo1p
@r_devops


Reddit DevOps

27001 didn’t change our stack but it sure as hell changed our discipline

We missed two deals so it finally made sense to leadership to pursue ISO 27001.


We did end up tightening parts of our stack. A few workflows became more structured, and some things moved out of people’s heads and into systems. Those changes had their own benefits, but they weren’t the real shift.


The uncomfortable part was answering some questions we’d never formally defined. A lot of our processes were muscle memory and ISO forced us to define them, assign ownership and create review cadence.


The discipline we gained changed everything.

https://redd.it/1reqg60
@r_devops


Reddit DevOps

Cloud Engineer roadmap check: Networking + Linux completed, next steps?

I’m transitioning to Cloud Engineering from scratch. I’ve completed basic networking (TCP/IP, DNS, subnetting) and Linux fundamentals (CLI, file permissions, processes). I’m currently learning Git and GitHub. My goal is to get a junior cloud role in 6–9 months. What should I focus on next?

https://redd.it/1rezupb
@r_devops


Reddit DevOps

ECS CICD Rollback?

Hi guys! What is the best way to roll back in an ECS CI/CD pipeline? Do I describe the last active task definition and redeploy it (which will then differ from the task definition in GitHub), or just revert to the last successful GitHub Actions run? I think the latter would be better. Or is there another solution?

Any blogs or suggestions would be great.
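
For the first option, the core step is picking the revision to go back to. The sketch below shows only that selection logic on plain ARN strings; the function names and the sample ARNs are made up. In a real pipeline the list would presumably come from something like boto3's ECS `list_task_definitions`, and the chosen ARN would be passed to `update_service`, neither of which is called here.

```python
def parse_task_definition_arn(arn):
    """Split '.../task-definition/family:revision' into (family, revision)."""
    name = arn.rsplit("/", 1)[1]
    family, revision = name.rsplit(":", 1)
    return family, int(revision)


def previous_revision(current_arn, known_arns):
    """Return the newest ARN in the same family that is older than current_arn."""
    family, current = parse_task_definition_arn(current_arn)
    candidates = []
    for arn in known_arns:
        other_family, revision = parse_task_definition_arn(arn)
        if other_family == family and revision < current:
            candidates.append((revision, arn))
    return max(candidates)[1] if candidates else None


# Hypothetical ARNs for illustration.
PREFIX = "arn:aws:ecs:eu-west-1:123456789012:task-definition/"
arns = [PREFIX + "web:40", PREFIX + "web:41", PREFIX + "web:42", PREFIX + "worker:7"]
print(previous_revision(PREFIX + "web:42", arns))
```

Note that "previous revision" is not necessarily "last known good"; tracking which revision was last deployed successfully is a separate concern, which is one argument for the revert-in-Git approach.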

https://redd.it/1rfx80d
@r_devops


Reddit DevOps

Lucrative DevOps Fields/Jobs?

Based on your experience, what DevOps positions tend to pay high salaries (250k+)?

I come from a networking background, but since then I've made the switch to DevOps. Back then in the networking space, if you wanted to make a lot of money you would get a CCIE certification and try to work at a networking vendor such as Cisco, Arista, or Juniper. There's also the option of working at high-frequency trading companies, where stress levels are high but so is the pay.

What's the equivalent for DevOps?

Do companies like AWS pay their in-house DevOps engineers a lot? What skills does the industry value to command that type of pay? Are there high-paying DevOps vendors out there? I know certifications aren't really valued anymore like they used to be.

https://redd.it/1rfvwf4
@r_devops
