Reddit DevOps. #devops Thanks @reddit2telegram and @r_channels
Spark stage cost breakdown on AWS (why distributed tracing isn't helping & how to fix it)
Tempo has been a total headache lately. I’ve been staring at Spark traces in there for weeks now, and I’m honestly coming up empty.
What I really want is simple: a clear picture of which Spark stages are actually driving up our costs.
Here’s the thing… poorly optimized Spark jobs can quietly rack up massive bills on AWS. I’ve seen real-world cases where teams cut infrastructure costs by over 100x on critical pipelines just by pinpointing inefficiencies, and others achieve 10x faster runtimes with dramatically lower spend.
We’re aiming to tie stage-level resource usage directly to real AWS dollar figures, so we can rank priorities and tackle the biggest optimizations first. Right now, though, it just feels like we’re gathering traces with no real insight.
I still can’t answer basic questions like:
Which stages are consuming the most CPU, memory, or disk I/O?
How do we accurately map that to actual spend on AWS?
Here’s what I’ve tried:
Running the OTel Java agent and exporting to Tempo -> massive trace volume, but the spans don’t align meaningfully with Spark stages or resource usage. Feels like we’re tracing the wrong things entirely.
Spark UI -> perfect for one-off debugging, but not practical for ongoing cost analysis across production jobs.
At this point, I’m seriously questioning whether distributed tracing is even the right approach for cost attribution.
Would we get further with metrics and Mimir instead? Or is there a smarter way to structure Spark traces in Tempo that actually enables proper cost breakdown?
I’ve read all the docs, watched the talks, and even asked GPT, Claude, and Mistral for ideas… I’m still stuck.
Any advice or experience here would be hugely appreciated.
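A first-order way to put dollar figures on stages, independent of tracing, is to split the cluster's cost for a job across its stages in proportion to each stage's share of executor run time. The sketch below is only an illustration of that arithmetic: the class name, the stage names, the node count, and the hourly price are all made up, and it assumes you can already pull a per-stage run-time figure from Spark's own stage metrics.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: split a job's cluster cost across stages by run-time share.
class StageCostAllocator {

    /**
     * @param stageRunTimeMs executor run time per stage, in milliseconds (assumed input)
     * @param clusterCostUsd total cluster cost for the job's duration (assumed input)
     * @return estimated USD per stage, proportional to each stage's share of run time
     */
    static Map<String, Double> allocate(Map<String, Long> stageRunTimeMs, double clusterCostUsd) {
        long totalMs = 0;
        for (long ms : stageRunTimeMs.values()) {
            totalMs += ms;
        }
        Map<String, Double> costByStage = new LinkedHashMap<>();
        for (Map.Entry<String, Long> e : stageRunTimeMs.entrySet()) {
            // Guard against an empty or all-zero input.
            double share = totalMs == 0 ? 0.0 : (double) e.getValue() / totalMs;
            costByStage.put(e.getKey(), share * clusterCostUsd);
        }
        return costByStage;
    }

    public static void main(String[] args) {
        // Made-up numbers: 10 nodes at $0.50/hour for a 2-hour job = $10.00.
        double clusterCostUsd = 10 * 0.50 * 2.0;
        Map<String, Long> runTimes = new LinkedHashMap<>();
        runTimes.put("stage-0 (read)", 600_000L);
        runTimes.put("stage-1 (shuffle join)", 3_000_000L);
        runTimes.put("stage-2 (write)", 400_000L);
        allocate(runTimes, clusterCostUsd)
            .forEach((stage, usd) -> System.out.printf("%-24s $%.2f%n", stage, usd));
    }
}
```

This only weights by run time, so idle executors and memory- or I/O-bound stages are not modelled separately; it is a starting point for ranking stages, not an exact bill.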
https://redd.it/1qbnszj
@r_devops
Deterministic analysis of Java + Spring Boot + Kafka production logs
I’m working on a **Java tool that analyzes real production logs** from **Spring Boot + Apache Kafka** services.
This is **not an auto-fixing tool** and not a tutorial.
The goal is **fast incident classification + safe recommendations**, the way an experienced on-call / production engineer would reason.
**Example: Kafka consumer JSON deserialization failure**
**Input (real Kafka production log):**
`Caused by: org.apache.kafka.common.errors.SerializationException:`
`Error deserializing JSON message`
`Caused by: com.fasterxml.jackson.databind.exc.InvalidDefinitionException:`
`Cannot construct instance of \`com.mycompany.orders.event.OrderEvent\``
`(no Creators, like default constructor, exist)`
`at [Source: (byte[])"{"orderId":123,"status":"CREATED"}"; line: 1, column: 2]`
**Output (tool result)**
`Category: DESERIALIZATION`
`Severity: MEDIUM`
`Confidence: HIGH`
`Root cause:`
`Jackson cannot construct target event class due to missing creator`
`or default constructor.`
`Recommendation:`
`Add a default constructor or annotate a constructor`
**Example fix:**
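    import com.fasterxml.jackson.annotation.JsonCreator;
    import com.fasterxml.jackson.annotation.JsonProperty;

    public class OrderEvent {
        private Long orderId;
        private String status;

        // No-arg constructor so Jackson can always instantiate the class
        public OrderEvent() {}

        // Annotated creator: binds the JSON properties without needing setters
        @JsonCreator
        public OrderEvent(@JsonProperty("orderId") Long orderId,
                          @JsonProperty("status") String status) {
            this.orderId = orderId;
            this.status = status;
        }
    }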
# Design goals
* Known **Kafka / Spring / JVM failures** detected via **deterministic rules**
* Kafka rebalance loops
* schema incompatibility
* topic not found
* JSON deserialization errors
* timeouts
* missing Spring beans
* **LLM assistance is strictly constrained**
* forbidden for infrastructure issues
* forbidden for concurrency / threading
* forbidden for binary compatibility (e.g. `NoSuchMethodError`)
* Some failures must **always** result in:
* **No safe automatic fix, human investigation required.**
This project is **not about auto-remediation**
and explicitly avoids “AI guessing fixes”.
It’s about **reducing cognitive load during incidents** by:
* classifying failures fast
* explaining *why* they happened
* only suggesting fixes when they are provably safe
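To illustrate what "deterministic rules" can look like in practice, here is a minimal sketch of marker-based classification. It is not taken from the log-doctor repository: the class name, the rule list, the `TOPIC_NOT_FOUND` label, and the `UNKNOWN` fallback are assumptions made for the example. A rule fires only when all of its marker strings appear in the log, and the first matching rule wins, so the same input always gives the same category.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of deterministic, rule-based log classification.
class LogClassifier {

    // A rule matches only if every marker string is present in the log text.
    static final class Rule {
        final String category;
        final List<String> markers;

        Rule(String category, String... markers) {
            this.category = category;
            this.markers = Arrays.asList(markers);
        }

        boolean matches(String log) {
            for (String marker : markers) {
                if (!log.contains(marker)) {
                    return false;
                }
            }
            return true;
        }
    }

    // Ordered rule list: the first match wins, which keeps results repeatable.
    static final List<Rule> RULES = Arrays.asList(
        new Rule("DESERIALIZATION", "SerializationException", "InvalidDefinitionException"),
        new Rule("TOPIC_NOT_FOUND", "UnknownTopicOrPartitionException"));

    // Returns the matching category, or "UNKNOWN" when no rule applies.
    static String classify(String log) {
        for (Rule rule : RULES) {
            if (rule.matches(log)) {
                return rule.category;
            }
        }
        return "UNKNOWN";
    }

    public static void main(String[] args) {
        String log = "Caused by: org.apache.kafka.common.errors.SerializationException: "
            + "Error deserializing JSON message\n"
            + "Caused by: com.fasterxml.jackson.databind.exc.InvalidDefinitionException: "
            + "Cannot construct instance of `com.mycompany.orders.event.OrderEvent`";
        System.out.println("Category: " + classify(log));
    }
}
```

Anything that falls through to `UNKNOWN` would be the natural place for the "human investigation required" outcome rather than a guessed fix.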
**GitHub (WIP):**
[https://github.com/mathias82/log-doctor](https://github.com/mathias82/log-doctor)
# Looking for feedback from DevOps / SRE folks on:
* Java + Spring Boot + Kafka failure coverage
* missing rule categories you see often on-call
* where LLMs should be **completely disallowed**
Production war stories very welcome 🙂
https://redd.it/1qbllc2
@r_devops
What DevOps and cloud practices are still worth adding to a live production app?
Hello everyone, I'm totally new to DevOps.
I have a question about applying DevOps and cloud practices to an application that is already in production and actively used.
Let’s assume the application is finished, stable, and running in production. I understand that not all DevOps or cloud practices are equally easy, safe, or worth implementing late, especially things like deep re-architecture, Kubernetes, or full containerization.
My question is: what DevOps and cloud concepts, practices, and tools are still considered late-friendly, low-risk, and truly worth implementing on a live production application? (This is for learning and hands-on practice, not a formal or professional engagement.)
Also, any advice on learning DevOps would be appreciated :))
https://redd.it/1qb9mcf
@r_devops
Our CI strategy is basically "rerun until green" and I hate it
The current state of our pipeline is gambling.
Tests pass locally. Push to main. Pipeline fails. Rerun. Fails again. Rerun. Oh look it passed. Ship it.
We've reached the point where nobody even checks what failed anymore. Just click retry and move on. If it passes the third time, clearly there's no real bug, right?
I know this is insane. Everyone knows this is insane. But fixing flaky tests takes time and there's always something more urgent.
Tried adding more wait times. Tried running in Docker locally to match the CI environment. Nothing really helped. The tests are technically correct, they're just unreliable in ways I can't pin down.
One of the frontend devs keeps pushing to switch tools entirely. Been looking at options like Testim, Momentic, maybe even just rewriting everything in Playwright. At this point I'd try anything if it means people stop treating retry as a debugging strategy.
Anyone actually solved this or is flaky CI just something we all live with?
https://redd.it/1qas4ft
@r_devops
researching the best subscription management software 2026, outgrowing our billing spreadsheets.
our saas company is moving from a handful of enterprise clients to a true product-led growth model with hundreds of self-serve subscribers. our manual billing and account management processes are breaking. we're planning our 2026 tech stack and know we need a dedicated subscription management platform to handle billing, dunning, prorations, and plan changes.
when i search for the best subscription management software, the big names (chargebee, recurly, zuora, stripe billing) all seem strong, but it's hard to understand the nuances for a b2b saas company at our stage. we need solid revenue recognition, tax handling, and flexible pricing models (seats, usage, flat fee).
if any finance, operations, or product folks at a scaling saas company have recently gone through this evaluation, i'd appreciate your perspective. we need a platform that can scale with us for the next 5 years. any real-world insights would be invaluable.
https://redd.it/1q4thrg
@r_devops
looking for good agile tools - how do you keep github issues and planning in sync?
we rely heavily on github, but things get messy when issues turn into real work items. how are teams syncing commits, PRs and sprint work without constant manual updates? i am looking for good agile tools that don't slow devs down
https://redd.it/1q4i05z
@r_devops
A small browser-only page I built for quick config diffs
Been working on a side project over the holidays. Built a small browser-only page that lets me paste two configs and diff them locally. It flags changed values and a few things that tend to bite.
No accounts, no uploads, no backend. It just runs in the browser.
Hope it helps!
https://configsift.com
https://redd.it/1q4ctjt
@r_devops
I got tired of "shallow" GCP labs, so I built a soulful, production-ready scenario. Looking for technical feedback.
TL;DR: I created a GCP tutorial scenario as a pilot for a bigger series. It’s designed to read like an engaging article rather than dry documentation. I’m looking for feedback on the architecture and flow.
Hello,
After spending quite a bit of time on GCP's own labs (on CloudSkillsBoost) and courses, I came to the conclusion that they either go in depth on very shallow scenarios or skim over a lot of important details in more complex topics. The end result, I feel, is scattered knowledge about the platform that you then struggle to put together into a secure, prod-ready setup.
I decided to build a set of tutorials that don't just give you commands to copy, but explain the why. I’ve poured my personality into this - I wanted to make it an engaging "story" that you actually enjoy reading, rather than just checking boxes and copy pasting the commands.
Here is the TLDR about the scenario from the repository:
## TL;DR - what you'll learn and what we'll use
### GCP Services Used:
- Cloud Build (with Buildpacks)
- Cloud Run (backend)
- Cloud Functions (async processing)
- Pub/Sub
- Cloud SQL (Postgres)
### What you will learn
- How to deploy serverless applications to Cloud Run & Cloud Functions
- How to connect GCP-managed services to resources inside your own VPC (spoiler: it’s not as magical as marketing suggests)
- How to build a secure, end-to-end serverless microservice architecture
- How to apply Principle of Least Privilege (PoLP) to serverless components
- How to avoid Dockerfiles using Buildpacks, reducing ops overhead
- And finally how to tie this all together
I come to you, fellow engineers, to ask for feedback on the technical accuracy, the flow, and the "engagement" factor. Does this feel like something a mid/senior dev would actually find valuable? My friends haven't been much help in the review department, so I'm reaching out to the community for some honest peer review.
Here's the link to the scenario:
https://github.com/brzezinskilukasz/gcp-tutorials/tree/main/scenarios/1
https://redd.it/1q42g5w
@r_devops
Do you also struggle with non-prod environments being left running “just in case”?
Hi everyone,
I’m curious if this is a common issue or just something I’ve seen in a few teams.
In many companies I’ve observed, non-production environments (dev / test / staging) are often left running 24/7, even though they’re only actively used during working hours.
When I ask why they’re not shut down after hours, the most common answer is:
“Just in case we need it.”
Not because they’re actually needed at night, but because people are worried that:
- someone might suddenly need access
- shutting it down could cause problems
- no one wants to be responsible if something breaks
Does this sound familiar to you?
If yes:
- how do you currently deal with this?
- is it mostly a cost issue, a risk issue, or an ownership issue in your team?
just trying to understand how widespread this problem really is.
https://redd.it/1q413fo
@r_devops
Eager to learn, would love some structure
For the experienced DevOps engineers, if you were to go back to the beginning, what would you do to make sure you have the right skills for DevOps in today’s market?
I want to learn DevOps this year. I tried at the end of last year and I’d feel so discouraged looking at all the tools I am required to learn. I have seen some people say that “DevOps is a senior position job.”
I have an AWS CCP certificate and I have soo much time on my hands.
What advice would you guys give me?
https://redd.it/1q40oud
@r_devops
Those using GitLab + MS Teams - how do you handle MR notifications?
The native GitLab integration for Teams is pretty basic and Microsoft is retiring Office 365 connectors soon.
I've seen tools like PullNotifier for GitHub + Slack, but nothing similar for GitLab + Teams.
Anyone found a good solution for:
- Getting notified when assigned to review
- Avoiding channel spam from every commit/comment
- Tracking which MRs are still waiting for review
What's your workflow?
https://redd.it/1q3wxtu
@r_devops
Company I work for realized AI can’t replace DevOps and now Hiring again
Hi folks, I work as a freelance DevOps engineer, and in 2020–2022 I used to get 2-3 recruiter calls a day.. those were crazy times. It started to slowly fade off, and by mid-2023, although I still managed to get offers, it was noticeably harder.
Currently, the company I’m working at has a large proportion of developers compared to the DevOps team (I’d say ~15% DevOps, 85% devs). Our management tried multiple shiny tools to improve our processes, but we ended up using AI only for PR reviews, and even that is mostly for pre-screening. We still have to manually review things since AI makes mistakes and hallucinates.
For the past few years, the usual response around here was "Hey, these guys don’t know how to use AI and .. it’s a skill issue." But IMO, anyone who thinks AI can automate DevOps hasn’t dealt with complex infrastructure beyond boilerplate.
During the past three years, I've heard all sorts of things: "Everything will be automated," "It’s just the first year of AI wait and see in a couple of years there won’t be dev jobs," "Devin will eliminate engineers.. (LOL to this one)", and so on. All this hype and bubble kept growing, yet where I worked there were no meaningful headcount reductions beyond cutting back on intern and junior roles doing mostly grunt work and boilerplate and even that ended up hurting us.
Anyway, all of this could have remained speculation, if not for the fact that DevOps positions previously considered redundant due to "more efficient processes" are now being filled again, and the 5-6 DevOps engineers on our team are so overworked that we urgently need to hire more people.
In short (TL;DR), I haven’t seen any meaningful AI automation beyond what we already had, nor did it add much real value to our team. At best, it made us slightly more efficient, but at the cost of reduced maintainability and more complexity in the codebase. If you enjoy working in DevOps, there are still plenty of opportunities out there and likely more going forward.
https://redd.it/1q3ugf8
@r_devops
Sci-Fi Author needs your help - "End of Integers"
Hey folks! I'm a career IT Ops Engineer, and Author, with just enough programmatic knowledge to be dangerous. I'm writing a Sci-Fi novel, and need your advice.
It's the year 2711, and I have an android-like bot that works in a research lab. She has a malfunction when her human boss asks her a question that she isn't supposed to answer.
That causes an error that makes her verbalize the terms and conditions of the leasing contract that she's governed by. Not in an informational way, but in one that shows she's had a failure and isn't acting right.
When she's done, there's a one-second pause, followed by the statement End of Integers, which she says like it's a punctuation mark.
EDIT - I want the answer to sound programmatic, but also vague and not possible.
My Dev wife thinks it's a brilliant idea, since there is no such thing as an "end of integers."
My thought is there's a safeguard to keep her from telling anyone what she knows, but the code for the safeguard has a flaw that makes her say End of Integers.
1. Keep this, or use another type of error?
2. If another, which one would make more sense, for what I need to accomplish?
Thank you, and may your Secrets Management never fail, and blow up your Sprint schedule :)
https://redd.it/1q3rjh6
@r_devops
CI/IaC is basically a control plane now… what guardrail helped the most?
It feels like everything is a control plane now. GitHub Actions, IaC pipelines, internal platforms, agents, all of it.
And the failure mode I keep seeing is “one small change lands everywhere” because the blast radius is huge and rollout/rollback isn’t really a thing.
Curious... What’s one guardrail you added that actually helped?
Canaries, progressive delivery, env isolation, policy checks, drift detection, JIT admin, whatever… doesn’t have to be fancy.
https://redd.it/1q3oifo
@r_devops
One Windows package manager to rule them all?
Just came across a nice article about a tool that brings all the various package managers together.
I personally mainly use Chocolatey, as it's what is integrated into the tool my company uses; however, this one, "UniGetUI", brings them all together into a GUI.
I haven't tried it myself yet, but the article seems too good not to share.
https://www.makeuseof.com/replace-microsoft-store-with-unigetui-package-manager/
https://redd.it/1q3kln2
@r_devops
Azure VM auto-start app
Azure has auto‑shutdown for VMs, but no built‑in “auto‑start at 7am” feature. So I built an app for that - VMStarter.
It’s a small Go worker that:
• discovers all VMs across any Azure subscriptions it has access to
• sends a start request to each one — **no need to specify VM names**
• runs cleanly as a scheduled Azure Container Apps Job (cron)
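The discover-then-start loop described above can be sketched as follows. This is not VMStarter's code (which is written in Go) and it does not use the Azure SDK: the `VmApi` interface and its three methods are a hypothetical abstraction introduced purely for illustration, backed here by an in-memory fake.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of "discover every VM, then request a start for each".
class VmStartSketch {

    // Made-up abstraction; these are NOT Azure SDK method names.
    interface VmApi {
        List<String> listSubscriptions();
        List<String> listVms(String subscriptionId);
        void start(String vmId);
    }

    // Walk every visible subscription and request a start for each VM found.
    // No VM names are configured anywhere; everything is discovered at run time.
    static List<String> startAll(VmApi api) {
        List<String> requested = new ArrayList<>();
        for (String subscription : api.listSubscriptions()) {
            for (String vm : api.listVms(subscription)) {
                api.start(vm);
                requested.add(vm);
            }
        }
        return requested;
    }

    public static void main(String[] args) {
        // In-memory fake standing in for the cloud API.
        VmApi fake = new VmApi() {
            public List<String> listSubscriptions() {
                return Arrays.asList("sub-a", "sub-b");
            }
            public List<String> listVms(String subscriptionId) {
                return Arrays.asList(subscriptionId + "/vm-1", subscriptionId + "/vm-2");
            }
            public void start(String vmId) {
                System.out.println("start requested: " + vmId);
            }
        };
        System.out.println(startAll(fake).size() + " VMs requested");
    }
}
```

Run on a schedule, a loop like this is all the worker needs; a real implementation would also have to handle authentication, paging, and start requests that fail.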
Deployment instructions: https://github.com/groovy-sky/vm-starter#deployment-script
Docker image: https://hub.docker.com/repository/docker/gr00vysky/vm-starter
Any feedback/PRs welcome.
https://redd.it/1qbmr2s
@r_devops
Observability for AI Models and GPU Inferencing
Hello Folks,
I need some help with observability for AI workloads. For those of you who run your own ML models and AI workloads on your own infrastructure, how are you handling observability? I'm specifically interested in the inferencing side: GPU load, VRAM usage, processing, throughput, and so on. How are you achieving this?
What tools or stacks are you using? I'm currently working in an AI startup where we process a very high number of images daily. We have observability for CPU and memory, and APM for code, but nothing for the GPU and inferencing part.
What kind of tools can I use here to build a full GPU observability solution, or should I go with a SaaS product?
Please suggest.
Thanks
https://redd.it/1qb51ph
@r_devops
One end-to-end DevOps project to learn almost all tools together?
Hey everyone,
I’m a DevOps beginner. I’ve covered the theory, but now I want hands-on experience.
Instead of learning tools separately, I’m looking for ONE consolidated, end-to-end DevOps project where I can see how tools work together, like:
Git → CI/CD (Jenkins/GitLab) → Docker → Kubernetes → Terraform → Monitoring (Prometheus/Grafana) on AWS.
A YouTube series, GitHub repo, or blog + repo is totally fine.
Goal is to understand the real DevOps flow, not just run isolated commands.
If you know any solid project or learning resource like this, please share 🙏
Thanks!
https://redd.it/1qarrve
@r_devops
Struggles at a new org
I've been a DevOps tech lead at an AWS shop for the past ~5 months, with some senior engineers and a few juniors, and oh boy - the tech debt and org culture have me seriously reconsidering employment. I'm running into problems like:
- Company has a DevOps team that is treated exclusively like an Ops team. DevOps culture was never adopted and isn't practiced
- Lack of development ownership on product issues. Engineering management fails to hold their teams accountable and isn't responsive to issues in their domain
- Engineering team is comprised of a 50/50 split of contractors and full time engineers, with contractors taking a "that's not my job" approach to problems that bubble up outside of their usual work
- Some of the most spaghetti terraform I've ever had the displeasure of reading - in 0.11.15 no less
- No CI/CD - terraform applies are done locally and software deployments are done by SSH'ing into a Jenkins host to run some wild chain of zsh scripts
- Chef 0.14.5 is being used to provision new EC2 instances
- Static SSH keys installed on hosts (no SSM)
- IAM users with a partial, but incomplete AWS SSO roll out
- A contracting DevOps company was hired to start an EKS migration, but they're at the point of throwing in the towel because of the complexity
- To top it all off, a manager with no technical experience and no spine. I'm not sure how he's still here given his passive nature and lack of ability to lead a team towards change
It would be easier if I was only solving technical issues, but this is both technical and cultural. This feels like a huge step back in my career of having to go back to managing EC2 instances like pets instead of like cattle. As a lead, I'm trying my best to get my manager to understand what a DevOps team is and how it should operate, but I am having a hard time reaching him. He's one that literally manages his team communication through AI as English isn't his first language; it's quite frustrating to say the least.
When I have time, I've been trying to get them off of terraform 0.11.15 and fix their drift so that there's a standard way for everyone to run things on their local machines, as well as a folder structure that makes sense - with CI following once things are more consistent. Outside of that, I've been "voluntold" to be on a few tiger teams to help a few product features get off the ground, as I have the keys to the kingdom and can keep developers unblocked.
There's no platform and no structure.
With this situation, do others have experiences on how I could go about tackling the challenges at this org? I'm quite stressed at the moment. Thanks!
https://redd.it/1qb6w2v
@r_devops
Wrote a deep dive on sandboxing for AI agents: containers vs gVisor vs microVMs vs Wasm, and when each makes sense
https://www.luiscardoso.dev/blog/sandboxes-for-ai
Wrote this after spending too long untangling the "just use Docker" vs "you need VMs" debate for AI agent sandboxing. I think the problem is that the word "sandbox" gets applied to four different isolation boundaries with very different security properties.
So, I decided to write this blog post to help people out there.
Interested in what isolation strategies folks here are running in production, especially for multi-tenant or RL workloads.
https://redd.it/1q4pvy6
@r_devops
ai made shipping faster but understanding slower
lately i’ve been thinking about how different building feels now compared to a few years ago. getting something off the ground is insanely fast. scaffolds, endpoints, ui, all done in a weekend. but when something breaks, i’m spending way more time reading than actually writing code.
i’ve ended up using different tools depending on what i’m working on. GitHub Copilot for in-editor autocomplete and quick suggestions, Replit Agent when i want help across bigger chunks of work, Claude Code when i need to talk through a codebase at a higher level. and on larger or messier repos, i’ve found cosine surprisingly useful to trace how logic flows across files when my mental map falls apart. it’s not doing magic, it just helps me see what already exists without burning energy.
it feels like the bottleneck shifted from “can i build this?” to “do i actually understand what’s already here?” curious how others are dealing with this. do you stick to one ai tool, or do you end up with a stack where each thing does one job well?
https://redd.it/1q4ermd
@r_devops
We built a GitHub Action that could have prevented the CrowdStrike outage. It's free.
On July 19, 2024, CrowdStrike pushed a config update that crashed 8.5 million Windows machines. The root cause: 21 fields validated against a 20-field schema. The unvalidated field caused a null pointer exception.
We ran that deployment profile through ARBITER:
Bad deployment:
0.335 null pointer exception ✓ ← RANKED FIRST
0.235 memory access violation ✓
0.149 safe execution ✓
0.120 system crash ✓
Good deployment:
0.257 safe execution ✓ ← RANKED FIRST
-0.068 null pointer exception ✗ ← REJECTED
-0.094 memory access violation ✗ ← REJECTED
-0.176 system crash ✗ ← REJECTED
ARBITER is a semantic coherence gate. It checks if your deployment profile coheres with "safe execution" or "failure modes" before you push.
Add it to your pipeline:
uses: arbiter-engine/arbiter-action@v1
Marketplace: https://github.com/marketplace/actions/arbiter-deployment-coherence-check
It's free. MIT licensed. 26MB deterministic engine.
Your move.
https://redd.it/1q4bzef
@r_devops
Asked to spread into ML-Ops, but it's new territory. Being required to find related certs but unsure where to start.
I'm a DevOps engineer for a fortune 500 tech company. On my team, I'm the sole person in my role. Been here for 6 years. In fact, for my entire org, I'm only 1 of a handful of us. Our CICD pipeline is very solid and simple to maintain. Most of my work centers around DevSecOps instead of just DevOps. I KNOW that my company is paying me less than what I'm worth, but when the market is "iffy", I don't want to rock the boat. I do well in my role, but even 6 years later I still feel like there's a bit of imposter syndrome going on, despite consistently good recognition and reviews.
So I helped out on an AI-centric hackathon with work and provided all kinds of tech-related assistance to the different teams, such as provisioning new cloud products, creating DNS records for them, debugging various issues, things like that.
Afterwards, I'm now being told that for FY26 I have a personal goal of attaining a related certification, but it's on me to find the relevant certs to pursue. I know what AI is. I can bust out a set of prompts that are rather decent. That's about the extent of it.
So as a DevOps Engineer, who acts as a consultant for his team on the more technical side of things, I feel it's my responsibility to not only be able to deploy various models, but also interact with various closed models, as well. And this includes Generative AI for text-based resources and image-based resources as the company I work for is one of the largest graphics-related companies in the world, apparently that's important.
So where do I start? I feel I need to know what's involved at a low level, hence the thought about deploying models. Beyond that, it's pretty new territory to me.
https://redd.it/1q44glk
@r_devops
Experienced sysadmin cannot pass a coding interview. RIP
I'm an experienced sysadmin (15 years) looking for a job, and it looks like most companies are asking for coding skills now. The Leetcode challenges I've attempted do not mirror my experiences with Python at work, and I am banging my head against the "easy" ones.
I am 60% through "Python Data Structures & Algorithms + LEETCODE Exercises" on Udemy, and I still do not recognize the patterns that are presented in Leetcode problems.
Am I digging in the wrong direction here? How should I be studying? Should I switch careers at the age of 40 and become a toilet farmer?
https://redd.it/1q42yvj
@r_devops
What OS do you daily drive, and why?
I'm curious about people working in the field and why you use one OS over another?
Are there tools you've found that are only available on your distro of choice? Is it because of stability, or less bloat? Maybe it was the only option, or you just like it?
https://redd.it/1q3zk3k
@r_devops
Another Helm Chart for Garage (MinIO Alternative for Homelabs & Small Deployments)
After MinIO abandoned the open-source project, I needed a new S3-compatible object store for my homelab. I tried the usual suspects (SeaweedFS, Ceph, etc.), but Garage stood out for its simplicity and focus on small, geo-distributed clusters.
I have published a Helm chart that goes way beyond the official one, making Garage a drop-in replacement for MinIO with a much smoother experience for Kubernetes users.
Repo: https://github.com/datahub-local/garage-helm1
What makes this Helm chart better than the official one?
1. Automated cluster configuration: No more manual CLI or YAML hacks. Just set your layout, buckets, and keys in values.yaml or secrets and a job will set them up for you.
2. Built-in WebUI: Deploy the Garage WebUI with a single flag for easy management.
3. Gateway API support: Native support for Kubernetes Gateway API (plus Ingress), so you’re ready for modern K8s networking.
4. Grafana dashboard & ServiceMonitor: Get instant metrics and dashboards out of the box.
5. Extra resources: Inject any custom K8s manifest (Secrets, ConfigMaps, etc.) directly via values.yaml.
Big thanks to \#wittdennis — this chart is based on his original Helm chart for Garage!
If you’re looking for a MinIO alternative that’s actually open source and easy to run at home, give Garage (and this chart) a try. Feedback and PRs welcome!
https://redd.it/1q3utve
@r_devops
UAT for 40 +
We are rolling out a chatbot for our organization. Leadership wants all of corp tech to be able to soft-test the feature and provide feedback: Jira ID, acceptance criteria, pass/fail, strengths, weaknesses.
Normally I would have test steps, but here it's really just: launch the bot and ask it questions related to the description/acceptance criteria.
My question: how do you distribute and track something like this? I normally do feature releases, which are done via email. This seems like it might work better as a Microsoft Form with a Power Automate flow into a SharePoint list for metrics. It's 40+ scenarios, though, which adds to the problem of how to distribute and track it.
https://redd.it/1q3tzeb
@r_devops
Is Kubernetes here to stay for a long time?
Is it worth investing time in learning K8s, or will it be hidden under PaaS? Is it a must-have skill for every DevOps engineer in the future, or is it expected to be buried under other technologies?
https://redd.it/1q3qgdx
@r_devops
Many companies are moving towards Dev-owned DevOps.
I’m seeing a trend where companies want developers to handle DevOps work directly.
For someone working as a DevOps engineer, what’s the best way to adapt?
What new skills are worth learning, and what roles make sense in the future?
Curious to hear how others are handling this shift
https://redd.it/1q3h19o
@r_devops
When is old?
At what age should someone hang up their hat on trying to get in the door? What door should older folks try for?
https://redd.it/1q3jlw7
@r_devops