Reddit DevOps. #devops Thanks @reddit2telegram and @r_channels
How do you decide which microservices need a message broker like Kafka/RabbitMQ?
Say you have many microservices. How do you personally decide that "microservices A and B need a message broker, while C and D do not", even though C talks to D?
https://redd.it/1lwyq16
@r_devops
What does this mean in terms of DevSecOps
A job description mentions "Implement secure infrastructure with IaC tools". What does this ACTUALLY mean, and how can I understand it better? Is it just writing Terraform in a CI/CD pipeline and using security scanning tools such as trivy, SCA, SAST, etc.?
Apologies if this is an ignorant question.
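In practice the phrase usually covers more than scanners (least-privilege IAM, encrypted state, reviewed modules), but the scanning slice really can be as small as a pipeline gate. A hypothetical sketch in Python form, assuming trivy is on PATH and a ./terraform directory exists:

```python
# Hypothetical CI gate: fail the build when trivy's IaC scanner finds
# HIGH/CRITICAL misconfigurations in a Terraform directory.
# The "config" scanner and these flags are part of trivy's documented CLI.
import subprocess
import sys

result = subprocess.run(
    ["trivy", "config", "--severity", "HIGH,CRITICAL", "--exit-code", "1", "./terraform"],
    check=False,  # we propagate the exit code ourselves
)
sys.exit(result.returncode)  # non-zero fails the pipeline job
```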
https://redd.it/1lwsetw
@r_devops
Looking for advice: how do you typically gather input when writing performance reviews for your team/direct reports? Do you rely on tools, notes, past projects, or something else?
Looking for advice here, especially on the process of gathering input across tools and channels. Curious how you do it and what works well (or doesn’t). How much time do you spend on it?
Happy to share back what I learn.
https://redd.it/1lwlwzx
@r_devops
Hemmelig TUI
Hi,
For a couple of years I have been thinking about implementing Diffie-Hellman key exchange for Hemmelig.app, and that finally led me to build a TUI that solves it for me.
Hemmelig was originally built to securely share PII, GDPR-regulated data, and other sensitive information like passwords and API keys.
Built with Curve25519, AES-256-GCM, and TOFU fingerprinting to keep your comms secure. Bypasses firewalls with NAT traversal.
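For anyone curious how those primitives fit together, here is a rough illustrative sketch (not Hemmelig's actual code) using Python's cryptography package:

```python
# Illustrative only, NOT Hemmelig's actual code: X25519 key agreement
# feeding AES-256-GCM, via the "cryptography" package.
import os

from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric.x25519 import X25519PrivateKey
from cryptography.hazmat.primitives.ciphers.aead import AESGCM
from cryptography.hazmat.primitives.kdf.hkdf import HKDF

alice = X25519PrivateKey.generate()
bob = X25519PrivateKey.generate()

# Each side combines its private key with the peer's public key, derives
# the same shared secret, then stretches it into a 256-bit AES key.
shared = alice.exchange(bob.public_key())
key = HKDF(algorithm=hashes.SHA256(), length=32, salt=None, info=b"demo").derive(shared)

nonce = os.urandom(12)  # AES-GCM needs a unique nonce per message
sealed = AESGCM(key).encrypt(nonce, b"api-key: hunter2", None)
assert AESGCM(key).decrypt(nonce, sealed, None) == b"api-key: hunter2"
```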
https://github.com/bjarneo/hemmelig
Let me know what you think. If usable, I'll move it to the Hemmelig organization.
https://redd.it/1lwmay6
@r_devops
Starting curve
How can I start learning DevOps? I mean the resources and all that, and are there enough jobs for freshers in this field? Please help.
https://redd.it/1lwjf4b
@r_devops
Best practice for handling user claims from ALB/Cognito in Fargate-deployed apps?
Hi all,
I'm working on a platform where multiple apps are deployed on AWS Fargate behind an Application Load Balancer (ALB). The ALB handles authentication using Cognito and forwards OIDC headers (such as x-amzn-oidc-data) to the app, which contain user and group information.
Access to each app is determined by the user's group membership.
I'm unsure of the best practice for handling these claims once they reach the app. I see two main options:
Option 1: Use a reverse proxy in front of each app to validate the claims and either allow or block access based on group membership. I’m not keen on this approach at the moment, as it adds complexity and requires managing additional infrastructure.
Option 2: Have each app validate the JWT and enforce access control based on the user's groups. This keeps things self-contained but raises questions for me around where and how best to handle this logic inside the app (e.g. middleware? decorators? external auth module?).
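For a sense of what Option 2 can look like, here is a minimal framework-agnostic sketch, assuming PyJWT and requests. The header name and key endpoint follow AWS's documented ALB behavior, but the cognito:groups claim is an assumption about the user pool setup, so verify against your own tokens:

```python
# Framework-agnostic sketch of Option 2 (app-side claim validation).
import jwt       # PyJWT
import requests

REGION = "eu-west-1"  # hypothetical region

def user_groups(headers) -> list:
    token = headers.get("x-amzn-oidc-data", "")
    if not token:
        return []
    # The ALB signs this JWT with ES256 and publishes the public key per kid.
    kid = jwt.get_unverified_header(token)["kid"]
    pem = requests.get(
        f"https://public-keys.auth.elb.{REGION}.amazonaws.com/{kid}", timeout=5
    ).text
    claims = jwt.decode(token, pem, algorithms=["ES256"])
    # Assumes the user pool exposes group membership via this claim.
    return claims.get("cognito:groups", [])

def allowed(headers, required_group: str) -> bool:
    # Call this from middleware or a decorator before handling the request.
    return required_group in user_groups(headers)
```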
I’d really appreciate any advice on which approach is more common or secure, and how others have integrated this pattern into their apps.
Thanks in advance!
https://redd.it/1lwfrkn
@r_devops
Project ideas that recruiters like.
I am still a fresher targeting the DevOps field. I am making projects, but they are simple af.
I want to know, from a recruiter's POV, what they want to see in projects. What kinds of projects do they want to see? (I also heard that a homelab project is a plus.) Please help me out with some ideas; I am tired of asking ChatGPT for them.
https://redd.it/1lw9kus
@r_devops
I am 24 years old, lost and need advice
Hi everyone,
I'm 24 and just finished my Bachelor's in Digital Business Administration. Back when I chose this major, I didn’t really know what I wanted—I just picked something broad that seemed useful. During the degree, I learned a bit of everything: some basic Python, logistics, data management, and finance stuff (like bookkeeping).
But now that I’ve graduated, I realize I don’t have any strong, job-ready skills in any specific area. To make things worse, my family business went under last year, so I don’t have any backup plan—and I’m honestly scared I might end up homeless if I don’t figure things out soon.
For the past 2 years, I’ve been working as an intern in an IT department doing 1st-level tech support—mostly device setup, software installations, and basic troubleshooting. That’s the only real experience I have on my CV. One of my colleagues suggested I look into DevOps, and after some research, I feel like it’s something I’d enjoy doing.
But I have no idea how to get started. My degree isn’t very technical, and I feel behind compared to people with computer science backgrounds.
Can anyone give me advice on:
What courses (free or paid) I should start with?
Which tools or concepts are most important to learn first (e.g., Linux, Docker, cloud, etc.)?
How to build experience or a portfolio that helps me stand out?
What entry-level jobs should I be aiming for to break into DevOps?
Thanks so much in advance—any guidance would mean a lot!
https://redd.it/1lw90o4
@r_devops
I Found a Roadmap for DevOps—Can You Confirm if it's Right?
Hello People,
I have been glancing over DevOps for a bit now, and I just found a roadmap for it. Would you guys be kind enough to let me know if it's a well-written roadmap worth following?
The roadmap: https://roadmap.sh/devops
Thank you in advance.
https://redd.it/1lw4h1p
@r_devops
I’m stumped: how do Mac application developers test and deploy their code?
I’ve mainly worked with devs who write code for websites, so suggesting how to build their pipelines is pretty easy for me. However, I’m going to be working with a developer who wants to deploy code to a separate Mac using GitLab CI, and my brain is just not processing it. Won’t they ideally be writing their code on a Mac itself? How does one even deploy code to another Mac, other than shipping a tar/pkg file with an installer? How does local testing not fit the use case? I’m feeling super new to this, and I definitely don’t want to guide them in the wrong direction, but the best ideas I came up with were 1) local testing or 2) a macOS-like Docker image, which it appears Apple does not really support, for obvious reasons.
https://redd.it/1lw0pay
@r_devops
AWS Freelanced Project Pricing Help
I recently got my first gig to set up some cloud infra on AWS. The problem is I don't know what's usually charged for this kind of project-based work. The infra I set up took about two days: I came up with the cloud architecture for the web app, set up CloudFront hosting and S3 buckets for storage, and wrote some Lambda functions for basic PIN-based security. This is all just proof of concept.
The final project will have:
- proper password access (doesn't have to be super secure; it's just so a large group of select people can view some images)
- a database, added for scalability
- changed CloudFront behaviors
(It's pretty much an image gallery website with flair.)
How should I price this?
https://redd.it/1lvsmnw
@r_devops
Real Consulting Example: Refactoring FinTech Project to use Terraform and ArgoCD
https://lukasniessen.medium.com/real-consulting-example-refactoring-fintech-project-to-use-terraform-and-argocd-1180594b071a
https://redd.it/1lvribg
@r_devops
Why is drift detection/correction so important?
Coming from a programming background, I'm struggling to understand why Terraform, Pulumi and friends are explicitly designed to detect and correct so-called cloud drift.
Please help me understand: why is cloud drift such a big deal for companies these days?
Back in the day (still today) database migrations were the hottest thing since sliced bread, and they assumed that all schema changes would happen through the tool (no manual changes through the GUI). Why is the expectation any different for cloud infrastructure deployment?
Thank you for your time.
https://redd.it/1lvn6pj
@r_devops
Terraform at Scale: Smart Practices That Save You Headaches Later
https://medium.com/@DynamoDevOps/terraform-at-scale-smart-practices-that-save-you-headaches-later-part-1-7054a11e99db
https://redd.it/1lvkwa0
@r_devops
Has anyone taken this AI-readiness infra quiz?
Found this 10-question quiz that gives you a report on how AI-ready your infrastructure is.
Questionnaire link: https://lnk.ink/bKmPl
It touches on things like developer self-service and platform engineering — felt like it's leaning a bit in that direction. Curious if anyone else took it and what you thought of your results. Are these kinds of frameworks useful or just more trend-chasing?
https://redd.it/1lvhaea
@r_devops
I automated the compliance work I do for infrastructure teams. Then turned it into a startup.
I was the DevOps engineer who inevitably got assigned compliance tasks. You know the drill - sales promises SOC2 to close a deal, then suddenly it's "can you handle the technical implementation?" and you're reading control frameworks at midnight trying to understand what "logical access controls" actually means in practice.
Over several years, I probably spent 400+ hours manually documenting infrastructure configurations, taking screenshots of AWS console settings, and writing policies that felt disconnected from actual operational work. The entire process felt antithetical to everything we try to achieve in DevOps - it was manual, error-prone, and didn't scale.
The breaking point came when I had to implement both SOC2 and ISO 27001 simultaneously. That's roughly 160 controls across both frameworks with significant overlap, but still requiring individual verification and documentation. Three months of engineering time that could have been spent on infrastructure improvements or reliability work.
Instead of continuing to suffer through manual compliance, I started building automation scripts - first for evidence collection, then for configuration validation, then for continuous monitoring. Eventually I realized I was building a comprehensive platform just to avoid doing compliance work manually.
The core insight was that most compliance requirements are really just infrastructure configuration checks that can be queried programmatically. Instead of manually screenshotting AWS settings, you can query the API. Instead of manually tracking policy reviews, you can automate the workflow.
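As a toy version of that idea (assuming boto3 with read-only credentials), one common SOC2-style check reduces to a few API calls:

```python
# Flag S3 buckets without default encryption instead of screenshotting
# each one in the console. Assumes boto3 and read-only AWS credentials.
import boto3
from botocore.exceptions import ClientError

s3 = boto3.client("s3")

def unencrypted_buckets() -> list:
    failing = []
    for bucket in s3.list_buckets()["Buckets"]:
        try:
            s3.get_bucket_encryption(Bucket=bucket["Name"])
        except ClientError as err:
            code = err.response["Error"]["Code"]
            if code == "ServerSideEncryptionConfigurationNotFoundError":
                failing.append(bucket["Name"])
    return failing

if __name__ == "__main__":
    print("Buckets missing default encryption:", unencrypted_buckets())
```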
What's interesting is that automating compliance actually improved our infrastructure practices. To automate compliance checking, you need to deeply understand your infrastructure configuration, which forces better documentation and more consistent implementation patterns. The infrastructure-as-code practices that make compliance easier also make systems more reliable and maintainable.
The time savings were substantial. Manual compliance work for a typical startup takes 40-60 hours of engineering time per framework. With proper automation, I managed to drop that to 10-15 hours, mostly spent on initial setup and reviewing automated findings rather than manual evidence collection.
I had a customer recently whose engineer said "this is the first time compliance didn't make me want to find a different job." Honestly, that felt so real to me. Compliance work used to be the worst part of being a DevOps engineer.
The broader principle here, in my opinion, is that compliance requirements are increasingly becoming code problems rather than process problems. Most of what auditors want to verify can be checked automatically if you structure your infrastructure and tooling appropriately.
For those still stuck doing manual compliance work, I'd encourage thinking about it as an automation challenge rather than an administrative burden. The skills you develop automating compliance will probably make you better at infrastructure work anyway.
https://redd.it/1lwww4k
@r_devops
🚨 Hiring for a Web3 NFT Marketplace – Remote (Europe timezones preferred)
Helping a team launch a decentralized NFT marketplace with features like wallet integration, staking, AI-driven personalization, and multi-chain support (Ethereum, Phantom).
We’re looking for experienced developers + leads across the stack for a quick MVP build.
📌 Open Roles:
– Technical Manager / PM (Web3/Blockchain experience)
– Senior Blockchain Lead (Solidity + Rust)
– Smart Contract Developer (NFT minting, royalties, staking)
– Blockchain Security Engineer (auditing, fraud detection)
– Senior Frontend Lead (React.js, TypeScript, Web3.js)
– Frontend Developer (Figma to code, scalable UI)
– Senior Backend Lead (Node.js, GraphQL, REST)
– Backend Developer (API integrations, microservices)
– AI/ML Engineer (recommendations, fraud detection, personalization)
– DevOps Engineer (CI/CD, Docker, cloud deploys)
– QA Engineer (manual + automated testing)
💼 All roles are remote, project-based or contract, and require strong ownership and fast turnaround.
DM me if you’re interested or know someone perfect for one of these roles — I’ll connect you directly with the founder.
https://redd.it/1lwqtlm
@r_devops
Monitoring and Observability Intern
Hey everyone,
I’ve been lurking here for a while, and honestly this community helped me land a monitoring and observability internship. I’m a college student working with the monitoring team, and I’ve learned a lot, but I’m also feeling a little stuck right now. For context, I’m based in the US.
Here’s what I’ve done so far during the internship:
• Set up Grafana dashboards with memory, CPU, and custom Prometheus metrics
• Used PromQL with variables, filters, thresholds, and made panels that actually make sense
• Wrote alert rules in Prometheus with labels, severity levels, and messages
• Used Blackbox Exporter to monitor HTTP endpoints and vanity URLs for status codes, SSL certs, redirect chains, latency, etc
• Learned how Prometheus file-based service discovery works and tied it into redirect configs so things stay in sync
• Helped automate some of this using YAML playbooks and made sure alerts weren’t manually duplicated
• Got exposure to Docker (Blackbox Exporter and NGINX are running in containers), xMatters for alerting, and GitHub for versioning monitoring configs
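For example, a tiny sanity check of those Blackbox probes outside Grafana, via Prometheus's HTTP API (server URL hypothetical, assuming the requests library):

```python
# Ask Prometheus which Blackbox probes are currently failing.
# probe_success is the standard blackbox_exporter metric.
import requests

PROM = "http://prometheus.internal:9090"  # hypothetical address

resp = requests.get(
    f"{PROM}/api/v1/query", params={"query": "probe_success == 0"}, timeout=10
)
resp.raise_for_status()
for series in resp.json()["data"]["result"]:
    print("failing probe:", series["metric"].get("instance", "<unknown>"))
```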
It’s been really cool work, but I’ve also heard some people say observability and monitoring tend to be more senior work because they touch a lot of systems. So I’m wondering where to go from here, and whether this experience can let me apply for junior roles.
My questions:
Are tools like the Blackbox Exporter (and whitebox-style exporters) used everywhere, or just by specific teams?
Any advice, next steps, or real-world experiences would mean a lot. Appreciate any thoughts.
Thanks
https://redd.it/1lwoew7
@r_devops
How do you all manage records in your DNS providers for Kubernetes deployments?
I've been using external-dns for years. But recently I've been hitting a bug where it sometimes deletes all the records it manages for a cluster's Ingresses and then recreates them on the next pass, causing 2-3 minutes of service disruption. I think I'm personally ready for a change in how I manage records in my DNS provider, so I'm curious what tools people are using, if any, or if you're just managing your records manually (sounds horrible, but I'd rather do that than look like an idiot for causing an incident).
I'll also mention I'm in the process of switching from Ingresses to Gateway API's HTTPRoutes. So if it's a tool that supports both, and doesn't accidentally delete all my records out from under me, bonus points.
https://redd.it/1lwl6wg
@r_devops
How do you all deal with pipeline schedules in Gitlab?
Pipeline schedules are very convenient and I use them for a few things, but each schedule runs under the user that created it, meaning that if that user leaves the company, those pipeline schedules all break. Last I knew, you couldn't run them under a bot user. Short of creating a dedicated service-account user to own the schedules, is there a good way to handle this?
https://redd.it/1lwi50p
@r_devops
Why do I see AWS mentioned more than others when it comes to DevOps?
Everywhere I look, when DevOps is mentioned it seems to be tied to AWS rather than Azure or hybrid infrastructures, even though it can be practiced on any of them. What is it about AWS that makes it the most-mentioned platform when people bring up DevOps? My company is pushing for DevOps methodology and we use Azure/Windows, and we technically do not sell a product; we are more or less a huge global consulting enterprise.
https://redd.it/1lwc2f5
@r_devops
Which job is the best opportunity straight out of university
I have 3 job offers on the table and I am a bit torn right now. Pay is comparable for all of them. I hope this sub is the right one, as all of them are more platform than devops, but I guess there is a lot of overlap.
Job 1: Platform engineer developing tooling/SDKs for devs to provision their own infra. They also manage all cloud infra (which devs can spin up themselves if needed). Logging and monitoring are apparently included in these reusable modules, so that is not part of the job. Everything seems to be built on managed services, or at least the hyperscalers' versions of them (e.g. AKS instead of self-managed Kubernetes). Definitely cool challenges (e.g. building one-click deployments), but I don't know if I vibe with the team, and no one was really able to tell me what my tasks would or could be.
Job 2: Platform engineer at a technical consulting company. They build multi-cloud Kubernetes platforms for customers, everything using open-source tools, and they assured me the work is purely technical, 0% PowerPoint. Monitoring and alerting solutions are also included. Compared to Job 1, it is more focused on Terraform, YAML, and Helm, and no software is written.
Job 3: Building an IDP. This company has roughly 2,000 devs and wants an IDP for all of them built on Backstage. The project starts from scratch, which is a huge appeal, but I am not sure whether that would move me too far away from infrastructure and related tooling.
Long term I want to move in a direction like Job 1, but the fact that no one was really able to communicate what I would do (e.g. "we build Go SDKs"), or whether it is mostly maintenance versus developing new things, concerns me a lot. Or do you think that with Job 2 I could still move toward writing "infrastructure software" and tooling later?
https://redd.it/1lw9050
@r_devops
Kubernetes production ready?
I am a backend dev turned DevOps engineer overseeing 10+ sites. I am trying to level up to hands-on Kubernetes experience. I have created my own cluster configuration and deployed it, but I have not run it for a long stretch of time (i.e. I have not done Kubernetes in production), as I don't have the resources or a website used by many users. I have done many interviews, and every time my shortcoming is that I haven't done any production-level Kubernetes.
It's the same old game: I don't have experience because I don't have a job, and I don't have a job because I don't have experience. I have done whatever a learner can do on their own with limited resources; I have also configured kubeadm for use with on-prem infra.
What should I do?
https://redd.it/1lw70gm
@r_devops
ELK Alternative: With Distributed tracing using OpenSearch, OpenTelemetry & Jaeger
I have been a huge fan of OpenTelemetry; I love how easy it is to use and configure. I wrote this article about an ELK-alternative stack we built with OpenSearch and OpenTelemetry at its core. I operate similar stacks with Jaeger added for tracing.
I would like to say that OpenSearch isn't as inefficient as Elastic likes to claim: we ingest close to a billion spans and logs daily at a small overall cost.
PS: I am not affiliated with AWS in any way. I just think OpenSearch is awesome for this use case. But AWS's OpenSearch offering is egregiously priced; don't use that.
https://osuite.io/articles/alternative-to-elk-with-tracing
Let me know if you have any feedback to improve the article.
https://redd.it/1lw3ovq
@r_devops
Announcing Factor House Local v2.0: A Unified & Persistent Data Platform!
We're excited to launch a major update to our local development suite. While retaining our powerful Apache Kafka and Apache Pinot environments for real-time processing and analytics, this release introduces our biggest enhancement yet: a new Unified Analytics Platform.
Key Highlights:
🚀 Unified Analytics Platform: We've merged our Flink (streaming) and Spark (batch) environments. Develop end-to-end pipelines on a single Apache Iceberg lakehouse, simplifying management and eliminating data silos.
🧠 Centralized Catalog with Hive Metastore: The new system of record for the platform. It saves not just your tables, but your analytical logic—permanent SQL views and custom functions (UDFs)—making them instantly reusable across all Flink and Spark jobs.
💾 Enhanced Flink Reliability: Flink checkpoints and savepoints are now persisted directly to MinIO (S3-compatible storage), ensuring robust state management and reliable recovery for your streaming applications.
🌊 CDC-Ready Database: The included PostgreSQL instance is pre-configured for Change Data Capture (CDC), allowing you to easily prototype real-time data synchronization from an operational database to your lakehouse.
This update provides a more powerful, streamlined, and stateful local development experience across the entire data lifecycle.
Ready to dive in?
⭐️ Explore the project on GitHub: https://github.com/factorhouse/factorhouse-local
🧪 Try our new hands-on labs: https://github.com/factorhouse/examples/tree/main/fh-local-labs
https://redd.it/1lvubnm
@r_devops
Need advice: Centralized logging in GCP with low cost?
Hi everyone,
I’m working on a task to centralize logging for our infrastructure. We’re using GCP, and we already have Cloud Logging enabled. Currently, logs are stored in GCP Logging with a storage cost of around $0.50/GB.
I had an idea to reduce long-term costs:
• Create a sink to export logs to Google Cloud Storage (GCS)
• Enable Autoclass on the bucket to optimize storage cost over time
• Then, periodically import logs to BigQuery for querying/visualization in Grafana
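For reference, the sink in the first bullet would look roughly like this with the google-cloud-logging client library (sink and bucket names hypothetical):

```python
# Rough sketch of creating a GCS export sink. The sink's writer identity
# still needs roles/storage.objectCreator on the destination bucket.
from google.cloud import logging

client = logging.Client()
sink = client.sink(
    "archive-to-gcs",                       # hypothetical sink name
    filter_="severity>=INFO",               # narrow this to cut ingest volume
    destination="storage.googleapis.com/my-log-archive",  # hypothetical bucket
)
if not sink.exists():
    sink.create()
```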
I’m still a junior and trying to find the best solution that balances functionality and cost in the long term.
Is this a good idea? Or are there better practices you would recommend?
https://redd.it/1lvsura
@r_devops
Tiny statically-linked nginx Docker image (~432KB, multi-arch, FROM scratch)
Hey all,
I wanted to share a project I’ve been working on: [nginx-micro](https://github.com/johnnyjoy/nginx-micro). It’s an ultra-minimal, statically-linked nginx build, packaged in a Docker image FROM scratch. On amd64, it’s just **~432KB**, compared to nearly 70MB for the official image. Multi-arch builds (arm64, arm/v7, 386, ppc64le, s390x, riscv64) are supported.
**Key points:**
* Built for container-native environments (Kubernetes, Compose, CI/CD, etc.)
* No shell, package manager, or writable FS—just the nginx binary and config
* *Only* HTTP and FastCGI (for PHP-FPM) are included—no SSL, gzip, or proxy modules
* Runs as root (for port 80), but worker processes drop to `nginx` user
* Default config and usage examples provided; custom configs are supported via mount
* Container-native logging (stdout/stderr)
**Intended use:**
For internal use behind a real SSL reverse proxy (Caddy, Traefik, HAProxy, or another nginx). Not intended for public-facing or SSL-terminating deployments.
**Use-cases:**
* Static file/asset serving in microservices
* FastCGI for PHP (WordPress, Drupal, etc.)
* Health checks and smoke tests
* CI/CD or demo environments where you want minimal surface area
**Security notes:**
* No shell/interpreter = much lower risk of “container escape”
* Runs as root by default for port 80, but easily switched to unprivileged user and/or high ports
I’d love feedback from the nginx/devops crowd:
* Any features you wish were included?
* Use-cases where a tiny nginx would be *too* limited?
* Is there interest in an image like this for other internal protocols?
Full README and build details here: [https://github.com/johnnyjoy/nginx-micro](https://github.com/johnnyjoy/nginx-micro)
Happy to answer questions, take suggestions, or discuss internals!
https://redd.it/1lvptij
@r_devops
What are your tips for long running migrations and how to handle zero downtime deployments with migrations that transform data in the database or data warehouse?
Suppose you're running CD to deploy with zero downtime, and you're deploying a Laravel app proxied by NGINX.
Usually this is done by writing the new files to a new directory under ./releases, like ./releases/1001, and then symlinking the new directory so that NGINX serves requests from its PHP code.
This works well, but if you need to transform millions of rows with some complex, long-running queries, what approach would you use to keep the app online yet avoid conflicts?
Do large-scale apps have some toggle for a read-only mode? If so, is each account locked, transformed, then unlocked? Any best practices or stories from real-world experience are appreciated.
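One widely used pattern is backfilling in small keyed batches so every transaction stays short and the app stays online. A sketch with Python's sqlite3 for brevity (the idea ports to MySQL/Postgres); table and column names are hypothetical, and old and new code must both tolerate half-migrated rows while it runs:

```python
# Batched backfill: short transactions, no long table locks.
import sqlite3
import time

BATCH = 5_000

def backfill(conn: sqlite3.Connection) -> None:
    (max_id,) = conn.execute("SELECT coalesce(max(id), 0) FROM accounts").fetchone()
    last_id = 0
    while last_id < max_id:
        with conn:  # one short transaction per batch
            conn.execute(
                "UPDATE accounts SET email_norm = lower(email) "
                "WHERE id > ? AND id <= ? AND email_norm IS NULL",
                (last_id, last_id + BATCH),
            )
        last_id += BATCH
        time.sleep(0.05)  # leave headroom for live traffic between batches
```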
Thanks
https://redd.it/1lvix7m
@r_devops
Any tools to automatically diagram cloud infra?
Are there any tools that will automatically scan AWS, GCP, Azure and diagram what is deployed?
So far, I have found Cloudcraft from Datadog, but it only supports AWS, and its automatic diagramming is still in beta (AFAIK).
I am considering building something custom for this, but judging from the lack of tools that support multi-cloud (or that go beyond manual diagramming), I wonder if I am missing some technical limitation that prevents such tools from being possible.
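A possible starting point for the custom route: inventory one provider via its API first and worry about drawing later. A sketch assuming boto3, with pagination omitted for brevity:

```python
# List EC2 instances plus the VPC/subnet IDs you'd draw edges between.
import boto3

ec2 = boto3.client("ec2")
for reservation in ec2.describe_instances()["Reservations"]:
    for inst in reservation["Instances"]:
        print(inst["InstanceId"], inst.get("VpcId"), inst.get("SubnetId"))
```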
https://redd.it/1lvjpwo
@r_devops
What does the cloud infrastructure costs at every stage of startup look like?
So, I am writing a blog about what happens to infrastructure costs as startups scale up. This is not the exact topic, as I'm still researching and exploring, but I need your help understanding what infrastructure costs look like for a startup at every stage: early, growth, and mature. It would be great to get a detailed account of how those costs evolved.
Also, if you know of any research on this topic, please share it with me.
And if someone is willing, help me structure this blog properly and suggest other sections that should definitely be there.
https://redd.it/1lvf23u
@r_devops