Reddit DevOps. #devops Thanks @reddit2telegram and @r_channels
How do you structure incident response in your team? Looking for real-world models
I recently wrote a blog post based on conversations with engineering leaders from Elastic, Amazon, Snyk, and others on how teams structure incident response as they scale.
We often hear about centralized vs. distributed models (i.e., a dedicated incident command team vs. letting service teams handle their own outages). But in practice, most orgs blend the two, adopting hybrid models that vary based on:
* Severity of the incident
* Who owns coordination vs. fixing
* How mature or experienced teams are
* Who handles communication (devs vs. support/comms)
I'd love to hear from you:
**How is incident response handled on your team?**
* Do you have rotating incident commanders or just whoever’s on call?
* How do you avoid knowledge silos when distributed teams run their own incidents?
* Have you built internal tooling to handle escalation or severity transitions?
Would love to hear how other teams think about this.
---
ps: here's the full post if you're curious about hybrid models: [https://rootly.com/blog/owning-reliability-at-scale-inside-the-hybrid-incident-models](https://rootly.com/blog/owning-reliability-at-scale-inside-the-hybrid-incident-models)
https://redd.it/1m2yqbu
@r_devops
I self-created a LinkedIn job and applied with 18 different resumes to see which resume format passes ATS; here it is.
During the past few weeks I was experimenting with LinkedIn. I created a few accounts with different setups to see what gives a candidate a higher chance of getting a job, or of being rejected by LinkedIn's filters.
Out of 56 candidates, only 18 appeared in my inbox. To see the others I had to manually open the "Not a Fit" section (the spam folder), since they are hidden there. They get a rejection letter 3 days after applying. LinkedIn adds this 3-day delay so as not to frustrate people, which is a shitty thing if you ask me, because you stay hopeful for that whole time while in fact you've already been rejected.
Before I go on, full disclosure: I'm sharing a LaTeX-formatted resume for the TL;DR crowd (LaTeX is an open-source format for creating documents). I'm also adding a UI I built for those who just want to drag and drop a PDF. Before you accuse me of anything, be aware that the app is free (with limitations) and doesn't require signup; it simply takes your current resume and converts it to the very same LaTeX resume so you don't have to do it manually. You can use either, both are equally fine. The UI works only with PDFs (no Word files) and it fails occasionally (1-2% of the time); I have no plans to improve it, but you can.
OK, let's continue with the LinkedIn filters:
The first and most brutal filter is your country not matching the country where the job was advertised.
If the job is advertised as Hybrid or On-Site and your location is too far away, even within the same country, you have a 50-50 chance of ending up in spam (auto-reject).
Another one is your phone number's country code: don't use foreign numbers.
Another big one is resume format. Some PDF resume formats, especially fancy ones, are not parsed well by LinkedIn, and if they can't parse yours they will rank you significantly lower. Keep the styling very simple.
Don't spam a bunch of keywords, e.g. a comma-separated or bulleted list of technologies at the bottom of the page. These tricks don't work anymore and will do more harm by triggering the spam filter. Keywords should be integrated naturally into the descriptions of what you did at your past jobs. If you need to highlight them for recruiters, use bold text.
Here is the link to the site: interview10x.com
https://redd.it/1m310bz
@r_devops
Finished my first full CI/CD pipeline project (GitHub/ArgoCD/K8s), would love feedback
Hey folks,
I recently wrapped up my first end-to-end DevOps lab project and I’d love some feedback on it, both technically and from a "would this help me get hired" perspective.
The project is a basic phonebook app (frontend + backend + PostgreSQL), deployed with:
* GitHub repo for source and manifests
* Argo CD for GitOps-style deployment
* Kubernetes cluster (self-hosted on my lab setup)
* Separate dev/prod environments
* CI pipeline auto-builds container images on push
* CD auto-syncs to the cluster via ArgoCD
* Secrets are managed cleanly, and services are split logically
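For context on the GitOps piece, a minimal Argo CD Application manifest of the kind such a repo would contain might look like this (the names, paths, and sync options below are illustrative assumptions, not taken from the actual repo):

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: phonebook-dev
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/Alexbeav/devops-phonebook-demo
    targetRevision: main
    path: manifests/dev          # dev overlay; a prod app would point at manifests/prod
  destination:
    server: https://kubernetes.default.svc
    namespace: phonebook-dev
  syncPolicy:
    automated:
      prune: true                # delete cluster resources removed from Git
      selfHeal: true             # revert manual drift back to the Git state
```

With `automated` sync enabled, the CI job only has to bump the image tag in Git; Argo CD notices the commit and reconciles the cluster.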
My background is in Network Security & Infrastructure but I’m aiming to get freelance or full-time work in DevSecOps / Platform / SRE roles, and trying to build projects that reflect what I'd do in a real job (infra as code, clean environments, etc.)
What I’d really appreciate:
* Feedback on how solid this project is as a portfolio piece
* Would you hire someone with this on their GitHub?
* What’s missing? Observability? Helm charts? RBAC? More services?
* What would you build next after this to stand out?
[Here is the repo](https://github.com/Alexbeav/devops-phonebook-demo)
Appreciate any guidance or roast!
https://redd.it/1m2w4be
@r_devops
What distro do you use?
Hello fellas.
I'm generally interested: what distro have you found most suitable for your work?
Personally, I use Pop!_OS because of the window manager. Once you learn all the shortcuts you don't even need to touch the mouse. I know I can install the same window manager on other distros, but here it works out of the box.
I tried NixOS recently but, to be honest, I didn't like it.
https://redd.it/1m2ush2
@r_devops
Free Advanced DevOps Video Series – For Developers Transitioning to DevOps
Hey folks,
If you're a developer, sysadmin, or cloud enthusiast looking to shift into DevOps, here’s something useful.
I’ve compiled a few free, advanced-level DevOps playlists that are now available on YouTube. These cover real-world tools and go beyond beginner tutorials — useful for anyone wanting to build depth or prep for a role in CI/CD, automation, or cloud infrastructure.
# 🎓 What’s Covered?
Ansible Full Course (15+ Hours / 9 Sessions) Covers YAML, playbooks, roles, inventory, real-world automation. ▶️ [https://www.youtube.com/watch?v=MJ9yJoPmypc&list=PLO9ci1OliMiPHEbSjkSphCHAdqco5rt8K](https://www.youtube.com/watch?v=MJ9yJoPmypc&list=PLO9ci1OliMiPHEbSjkSphCHAdqco5rt8K)
Jenkins CI/CD Complete Course (17+ Hours / 10 Videos) Deep dive into Jenkins pipelines, jobs, integration with Git, Docker, etc. ▶️ https://www.youtube.com/watch?v=ZQBb751E2sg&list=PLO9ci1OliMiNorLuMdVfthBO4hII9uTjz
Python for DevOps Engineers (6+ Hours) Scripting, subprocesses, automation, API calls, and real use cases. ▶️ [https://www.youtube.com/watch?v=ZQBb751E2sg&list=PLO9ci1OliMiNorLuMdVfthBO4hII9uTjz](https://www.youtube.com/watch?v=ZQBb751E2sg&list=PLO9ci1OliMiNorLuMdVfthBO4hII9uTjz)
Shell Scripting for DevOps (6+ Hours) Covers bash, conditions, loops, automation tasks — perfect for Linux-based workflows. ▶️ https://www.youtube.com/watch?v=COR8xdfX3tU&list=PLO9ci1OliMiNUicQCjqA77bEDHDGF_usI
🛠 All videos are detailed, hands-on, and go beyond theory. You’ll find production-style implementations and use cases — not just hello-world scripts.
🔖 Bookmark this if you're planning to move from development to DevOps in the coming months. No fluff. Just structured content for real growth.
More topics like Docker, Kubernetes, Terraform, and AWS pipelines will be added soon.
Hope it helps someone on their DevOps journey. 🙌
https://redd.it/1m2rn3q
@r_devops
Licensing requirements for enterprise deployment
Hello everyone!
BACKGROUND: My organization is a government-owned power utility with a sizeable number of Electrical Engineers (around 5,000). We have a small IT team of about 50 engineers. Most of our IT work and application development (Finance/ERP) has so far been managed by contractors.
But of late in house application development has been gaining traction. I have been recently transferred to the IT department to develop an application for the Electrical Power System domain.
My company has a strict budget requirement: applications must be built with open-source technologies only, with no spending on software licenses.
I need to deploy a self-hosted centralized version control system with a CI/CD solution, along with a self-hosted container registry. I have chosen GitLab Community Edition, Docker Community Edition (not Docker Desktop, just the Engine and CLI), Docker Compose, and Harbor as the required technologies.
My Question:
I know all these technologies are open source under MIT and Apache 2.0 licenses. But are there any hidden costs I may have overlooked, particularly for an enterprise deployment at this scale?
https://redd.it/1m2qaap
@r_devops
Looking for DevOps Intern/Volunteer gigs for real world experience
I'm looking to break into DevOps and am actively seeking part-time roles, internships, or volunteer opportunities to gain practical, hands-on experience.
I have built numerous CI/CD pipelines on Jenkins and GitHub Actions for my side projects, provisioned EKS clusters using Terraform, deployed applications with ArgoCD, and monitored systems with Grafana and Prometheus. I have experience with Docker and Kubernetes and hold the AWS Solutions Architect Associate certification. I recently graduated with my Bachelor of Science in Software Engineering. I also have two years of frontend web development experience as part-time work for startups while I was attending school.
If you have work that needs a hand, I would love to join and learn.
https://redd.it/1m2l6u5
@r_devops
BrowserStation: an open source alternative to Browserbase
We just released BrowserStation, an open source alternative to Browserbase that lets you deploy and manage headless Chrome browsers on your own infra.
It’s built with Kubernetes and Ray, using a sidecar pattern for isolated browser instances and exposes a secure WebSocket proxy for full CDP control.
It integrates with agent frameworks like LangChain and Browser-Use, supports metrics and API key auth, and runs on any cloud or local cluster. Feedback and contributors welcome: https://github.com/operolabs/browserstation
https://redd.it/1m2h3in
@r_devops
AWSCDK appreciation post
Exactly seven years ago today (July 17, 2018), the AWS CDK was publicly announced. I honestly still think it's one of the most elegant pieces of infrastructure tooling out there. The high-level interface, the design decisions, the focus on developer experience: to me, not many tools today top it (except for the CloudFormation part of it).
Over the past year, I've been working on bringing that same interface to Terraform, mainly to make the same experience available in environments where the original AWS CDK might not have been an option because Terraform is the standard there.
My hope is that people who have avoided the AWS CDK because of CFN will give this a try and see if they like it.
Here is the whole cdkworkshop completely ported to Terraform: https://aws-workshop.terraconstructs.dev/15-prerequisites.html - let me know what you think!
https://redd.it/1m2cuoe
@r_devops
7. Bell, J.S. "On the Einstein Podolsky Rosen paradox"
8. Penrose, Roger. "The Road to Reality"
9. Baez, John C., and Mike Stay. "Physics, Topology, Logic and Computation"
10. Abramsky, Samson, and Bob Coecke. "A categorical semantics of quantum protocols"
https://redd.it/1m29bja
@r_devops
```
Availability: □(∀x. request(x) → ◇responsive(x))
Partition Tolerance: □◇(∃P ⊆ Nodes. partitioned(P))
```
Where □ denotes "always" and ◇ denotes "eventually".
The CAP theorem can then be proven using the semantic incompatibility of these temporal formulas under the assumption of finite message propagation delays.
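The incompatibility argument can be sketched informally in the same notation (a compressed outline of the idea, not the full Gilbert-Lynch proof):

```
1. ◇(∃P. partitioned(P))                    (partitions do occur)
2. a write w arrives at n₁; availability forces ◇responsive(w)
3. while partitioned, no message from n₁ reaches n₂
4. a later read r at n₂ must also be answered: ◇responsive(r)
5. consistency demands read(n₂) reflect w, but by (3) n₂ never saw w,
   a contradiction; hence the three formulas are jointly unsatisfiable
```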
### 3.2 Branching Time and Concurrent Histories
In Computation Tree Logic (CTL), the impossibility becomes even more apparent. The branching structure of possible futures creates a tree of computation paths, and the CAP properties constrain which paths are realizable:
```
AG(Consistent ∧ Available ∧ PartitionTolerant) ≡ ⊥
```
This formula is unsatisfiable in any model where network partitions are possible, revealing the fundamental incompatibility at the level of temporal logic semantics.
## 4. Quantum Mechanical Analogies and Bell's Theorem
### 4.1 CAP as Distributed Bell Inequality
The CAP theorem exhibits striking parallels to Bell's theorem in quantum mechanics. Both theorems demonstrate the impossibility of maintaining local realism (consistency) while preserving certain global properties (availability and partition tolerance).
Consider the CAP inequality:
```
⟨C⟩ + ⟨A⟩ + ⟨P⟩ ≤ 2
```
This mirrors the CHSH Bell inequality:
```
|⟨AB⟩ + ⟨AB'⟩ + ⟨A'B⟩ - ⟨A'B'⟩| ≤ 2
```
Both inequalities arise from the fundamental impossibility of reconciling local hidden variables (local state) with global correlations (distributed consensus).
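The classical bound of 2 can be checked by brute force over deterministic local models, since each side's outcomes are fixed values in {-1, +1} (a small illustration added here, not part of the original analogy):

```python
import itertools

# Enumerate every deterministic local "hidden variable" model: each
# measurement outcome A, A', B, B' is a fixed value in {-1, +1},
# chosen independently of the other side's setting.
best = 0
for a, a2, b, b2 in itertools.product([-1, 1], repeat=4):
    chsh = abs(a * b + a * b2 + a2 * b - a2 * b2)
    best = max(best, chsh)

print(best)  # 2: no local deterministic strategy exceeds the bound
```

Algebraically, the sum factors as a(b + b') + a'(b - b'): one bracket is always 0 and the other ±2, which is exactly why the bound is 2.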
### 4.2 Entanglement and Distributed State Correlation
The desire for strong consistency in distributed systems is analogous to quantum entanglement. Just as entangled particles maintain correlated states regardless of spatial separation, strongly consistent distributed systems maintain correlated state regardless of network topology.
However, the no-communication theorem in quantum mechanics shows that entanglement cannot be used for faster-than-light communication, paralleling how strong consistency cannot be maintained during network partitions without violating availability.
## 5. Differential Geometry and Consensus Manifolds
### 5.1 Configuration Space as Riemannian Manifold
The space of all possible distributed system configurations can be modeled as a Riemannian manifold M with metric tensor g that encodes the "distance" between states in terms of consensus difficulty.
The CAP properties define submanifolds:
- C ⊂ M (consistent configurations)
- A ⊂ M (available configurations)
- P ⊂ M (partition-tolerant configurations)
The CAP theorem states that C ∩ A ∩ P = ∅, meaning these submanifolds do not intersect.
### 5.2 Geodesics and Optimal Consensus Paths
The shortest path between distributed states (geodesics) in this manifold represents optimal consensus protocols. The curvature of the manifold, determined by the Riemann tensor, encodes the fundamental difficulty of achieving consensus.
The CAP theorem emerges from the fact that the manifold has regions of infinite curvature where geodesics cannot exist, corresponding to the impossibility of simultaneous CAP properties.
## 6. Homotopy Theory and Distributed Algorithms
### 6.1 The Fundamental Group of Distributed Computation
The space of distributed computations forms a topological space whose fundamental group π₁(X) captures the essential structure of distributed algorithms. Different consensus protocols correspond to different homotopy classes of paths in this space.
The CAP theorem can be understood as a statement about the non-contractibility of certain loops in this space, meaning that some distributed computations cannot be continuously deformed into trivial (non-distributed) computations.
### 6.2 Obstruction Theory and Impossibility Results
Using obstruction theory, we can show that the CAP constraints create topological obstructions to the existence of certain distributed algorithms. The obstruction classes lie in higher cohomology groups H^n(X; π_{n-1}(Y)), where X is the space of network configurations and Y is the space of desired behaviors.
## 7. Game-Theoretic and Mechanism Design Perspectives
### 7.1 CAP as Multi-Agent Mechanism Design Problem
Upcoming changes to the Bitnami catalog
Broadcom introduces Bitnami Secure Images for production-ready containerized applications:
[https://news.broadcom.com/app-dev/broadcom-introduces-bitnami-secure-images-for-production-ready-containerized-applications](https://news.broadcom.com/app-dev/broadcom-introduces-bitnami-secure-images-for-production-ready-containerized-applications)
https://github.com/bitnami/charts/issues/35164
https://redd.it/1m28mag
@r_devops
AI has made no noticeable difference in monitoring / troubleshooting
I obviously use ChatGPT to ask how to debug or for ideas about why a specific issue might be happening. I also use Cursor to create runbooks, alerts, and dashboards, but that's about it. I have tried a bunch of tools that try to talk to the k8s cluster etc., but haven't seen a noticeable difference in debugging generally. Most of my life is in the terminal, logs, or dashboards.
One place I have seen a difference, though, is Supabase. They have a cool AI assistant that can query the db, check the schema and errors within its data, and do the analysis.
What's the best use-case that you've seen so far that you're repeatedly using? Curious to hear if any of you have been able to validate the AI productivity gain as a DevOps/SRE!
https://redd.it/1m24tjs
@r_devops
Infrastructure automation and self service portal for helpdesk
I hope my question is in the right subreddit. If not, I will remove my post.
I would like to get the community's opinion on a work-related topic. I am a network engineer and I want to implement network and system automation in my company. I have already developed several Ansible roles to automate the management of Cisco switches. I will also develop playbooks/roles for automating the deployment and configuration of Windows and Linux servers.
This automation has two objectives: to harmonize the configuration of new machines being deployed, and to allow the helpdesk to carry out some basic actions without having to connect directly to the machines. For these actions, I have set up Semaphore UI (https://semaphoreui.com/) and created several tasks. Everything works as expected. However, I find that using Semaphore is not suited for the helpdesk. Reading and understanding Ansible logs requires knowledge of Ansible, which the helpdesk does not have.
So my question is: Should the helpdesk be trained to understand and read Ansible logs, or would it be better to develop an independent web application, with a simpler GUI tailored for the helpdesk, containing easy-to-fill fields and a results table, for example? This web application would use the Semaphore UI API to launch jobs and display the results.
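For what it's worth, the wrapper approach can stay very thin: a small service that calls the Semaphore API and reduces the raw Ansible output to a green/red result. A sketch of the request-building part (the endpoint path and payload fields are assumptions based on Semaphore's REST API; check them against the docs for your version):

```python
import json

def build_task_launch(base_url, project_id, template_id, token, extra_vars=None):
    """Build the HTTP request that asks Semaphore UI to run a task template.

    Assumed endpoint: POST /api/project/{id}/tasks, with a bearer token
    created in the Semaphore UI. Verify against your version's API docs.
    """
    url = f"{base_url}/api/project/{project_id}/tasks"
    headers = {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
    }
    payload = {"template_id": template_id}
    if extra_vars:
        # Semaphore expects extra variables as a JSON-encoded string.
        payload["environment"] = json.dumps(extra_vars)
    return url, headers, payload

# The helpdesk-facing web app would POST this (e.g. with requests), then
# poll the task status and show only a pass/fail row in a results table.
url, headers, payload = build_task_launch(
    "https://semaphore.example.internal", 7, 42, "s3cr3t",
    extra_vars={"hostname": "sw-floor2-01"},
)
```

(`semaphore.example.internal`, the IDs, and the token are all placeholder values.)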
https://redd.it/1m22uwq
@r_devops
Devops, CI/CD, Docker, etc. course
Hello,
I'm looking for a course that covers all DevOps concepts — both from a project-level perspective and, of course, the technical side like Docker, CI/CD, etc.
I found this course, which doesn’t seem bad:
https://www.coursera.org/professional-certificates/devops-and-software-engineering#courses
Plus, I could list an “IBM Certification” on LinkedIn.
What do you think?
Do you have any other course suggestions?
I’m also willing to pay, as long as it’s something well-structured and high quality.
Keep in mind that I work full time, so I don’t have time for 400,000-hour courses that explain things I’ll never use.
Thanks!
https://redd.it/1m333nw
@r_devops
Managing authorization for every identity with full visibility, consistent policy enforcement, and alignment with a Zero Trust strategy - solution my team and I have been working on for the past 4 years. What do you think about it?
Hey everyone! I thought it would make sense to share about a solution my team and I have been working on for the past 4 years, in this community. Would love to get your thoughts on it.
I think it's especially relevant, since broken access control has been the top issue in the OWASP Top 10 for several years now.
The back story is that permission management across applications is difficult, especially as the code base grows. You have 100+ users, multiple services, and several environments. And hardcoded access control rules tangled with business logic make every new role and permission change a hassle to write, test, and maintain.
So, to keep access rules consistent across the entire code base and avoid security vulnerabilities, we built Cerbos. It's an authorization layer that can evolve as your product grows, enabling users to define context-aware access control in simple, intuitive, and testable policies.
The part I'm most excited to share with you, is that over the last year we’ve spoken with hundreds of customers, which has helped shape four new use cases of Cerbos Hub :)
Fine-grained, tenant specific authorization. If you’re thinking “We need to let our customers define their own roles and rules without hardcoding every customization” - that can now be done with Cerbos Hub.
Dynamic policy management at scale. Users can automate the full lifecycle of their authz policies (Policy Stores enable programmatic creation, updates, and deployment of policies via API, triggered by any event or system in their stack)
Scalable NHI permission management. We’ve all heard about the incidents related to overprivileged NHIs…Cerbos’s NHI support gives teams centralized, policy-based authorization for every non-human identity.
Secure authorization for MCP servers. MCP-related breaches are popping up as well - Asana, Atlassian, and most recently - Supabase. Clearly, misconfigured agents can easily access more than they should. Cerbos Hub can control which agents can access which MCP tools, using policies evaluated per agent, per tool, and per session, outside your server logic.
Here are more details, if you’re interested: https://www.cerbos.dev/blog/updated-cerbos-hub-complete-authorization-solution-for-your-identity-fabric
And if you'd prefer to watch a video on how it works, rather than read: https://youtu.be/JNiNV15WIr4
What do you think of the solution? ( Constructive criticism more than welcome as well :) )
Do you think it could be useful to you?
https://redd.it/1m2yzmm
@r_devops
What Are the DevOps Tools You Rely on Most This Year?
Hey Redditors, I’ve been reflecting on the ever-growing toolbox we use in DevOps. Are there any tools you swear by in 2025, ones that consistently help you out, no matter how tough the situation? Whether it’s for troubleshooting, automation, monitoring, or deployment.
For me, one tool that has consistently proven its value is Tailwind CSS. While it’s often mentioned for UI work, I’ve found its utility-first approach to bring design consistency and speed, helping me ship front-ends more efficiently, especially when paired with rapid automation and deployment cycles.
https://redd.it/1m2yhkg
@r_devops
Cloudflare's Transparency Deserves More Credit
The recent Cloudflare outage got me looking at, and thinking more about, how this seems to be becoming more common. You can find metrics online showing that data centers are more reliable than ever, but sources like ThousandEyes show regular major incidents. That led me to write this blog.
Curious what others think. Is this just a biased perspective because I'm spending more time looking at these things, or is infrastructure consolidation creating problems (at least in the short term)? And is anyone else matching Cloudflare's public post-mortems?
https://redd.it/1m2vtfv
@r_devops
Looking for some input/ guidance on CI/CD pipelines
To preface, I am in a junior role with the ability to potentially influence change.
Right now our team uses AWS CodeBuild/CodePipeline to push the images for containerized deployments. As it stands there is only one pipeline that everything flows through, let's call it DEV, and then we promote the containers manually to the other environments, TEST and PROD.
Talking with the devs, this setup seems OK for the time being, because they were pretty frank that if they had a TEST pipeline they would bypass any deployments to DEV and go straight to the TEST env. I want to avoid that, because then the environments would not be in sync.
What would be considered best practice in this case, I would like to see what I could maybe do better without having drift in our environments.
edit: or some book recs on reading about this
https://redd.it/1m2rw12
@r_devops
How do you manage downstream deployments?
I have several go packages and applications I’m working with. For example one contains business logic and data store operations, others are standalone apps, lambda functions, etc.
Deployments for core packages consist of having to manually update each project that needs to support the new version of the package. I.e. the feature may be complete in the business logic, but apps that depend on that code must get recompiled with the new version. For the actual deployment of apps, I use Bitbucket pipelines to perform tasks like uploading a new image to ECS or updating a lambda function.
I have a feeling we’re outgrowing this because it’s getting tough to remember what to update downstream. In the perfect world everything would be running the current version of the base package, however that isn’t always necessary. And I’m working on getting a dependency graph/chart setup, but if there’s a smarter way to handle something like this, I’d love to hear what you all do in these situations.
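The dependency graph idea at the end of the post is worth automating: given a module graph, a topological sort yields the exact downstream rebuild order after a core change. A minimal sketch using Python's standard `graphlib` (the module names are made up for illustration):

```python
from graphlib import TopologicalSorter

# Hypothetical module graph: each key maps to the packages it depends on.
deps = {
    "core": set(),
    "billing-svc": {"core"},
    "lambda-reports": {"core", "billing-svc"},
}

def rebuild_order(deps, changed):
    """Return `changed` plus every transitive consumer, in safe rebuild order."""
    # Invert the graph: who consumes each module?
    consumers = {m: {k for k, v in deps.items() if m in v} for m in deps}
    dirty, stack = set(), [changed]
    while stack:
        m = stack.pop()
        if m not in dirty:
            dirty.add(m)
            stack.extend(consumers[m])
    # Keep only dirty modules, ordered so dependencies build first.
    return [m for m in TopologicalSorter(deps).static_order() if m in dirty]

print(rebuild_order(deps, "core"))  # ['core', 'billing-svc', 'lambda-reports']
```

A CI step could run this against a checked-in graph file and fail the build (or open PRs) for any downstream repo still pinned to an old version.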
https://redd.it/1m2pgo1
@r_devops
Next phase SRE interview, what to expect?
Hey folks,
I recently had a technical interview for an SRE role (focused mainly on networking), and just got invited to the next phase: a 30-minute virtual interview with the Director of SRE!
The email didn’t include any specific details about what we’ll be discussing
Any idea what to expect from a director level interview? Is it more behavioral, system design, culture fit, or high-level technical discussion?
Would love to hear your experiences or tips on how i should prepare!
https://redd.it/1m2lzxc
@r_devops
Alternatives for a code quality checks and security checks
Hi all, I'm new here and I'm also a junior DevSecOps engineer. I was wondering what you could recommend for code quality and security checks. I'm working for a small company at the moment and they can't afford much, so I'm looking for a free (or cheap) but effective solution. Finding one would also be a good addition to my dissertation.
https://redd.it/1m2hbss
@r_devops
Seeking Arch Advice: Tines story, ECS-hosted webapp session token mismatches
First off, if this is not an appropriate place to ask then my apologies.
I'm not a devops guy, nor a dev at all so I'm outside my comfort zone but really have nowhere else to ask.
I have a Tines story that fronts a webapp that I've deployed to ECS. Works fine on one Task but when it scales up it breaks, because it's not designed to be scaled.
When an HTTP Request is called in Tines, the Credential is authorized and an access_token and a session_token are successfully created. However the HTTP Request itself (after the credential) ends up being load balanced to the second Task.. and that fails on 'invalid session token'.
I haven't been able to figure this out on the Tines side, so I am experimenting on the AWS side. I tried stickiness using both load balancer-generated cookies and application-based cookies; neither works.
I'm asking for ideas on how to resolve this, short of recompiling the Golang webapp; that's a non-starter.
Can a Redis container be added to the same ECS service? Maybe an ElastiCache or API Gateway stack?
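For context, the shared-session-store pattern being asked about looks roughly like this (illustrative only; it assumes the app can be pointed at an external store, which may not hold if the app can't be changed, in which case per-session routing at the LB is the remaining lever):

```python
import secrets

class SessionStore:
    """Session tokens kept in a backend shared by every ECS Task.

    `backend` only needs dict-like get/set semantics: a plain dict works
    for this demo; in production it would be a Redis/ElastiCache client
    (using backend.set(token, user, ex=3600) for expiry).
    """
    def __init__(self, backend):
        self.backend = backend

    def create(self, user: str) -> str:
        token = secrets.token_hex(16)
        self.backend[token] = user
        return token

    def validate(self, token: str) -> bool:
        return self.backend.get(token) is not None

# Two "Tasks" sharing one backend: a token minted on task A validates on
# task B, so load balancing between Tasks no longer breaks the session.
shared = {}
task_a, task_b = SessionStore(shared), SessionStore(shared)
t = task_a.create("alice")
print(task_b.validate(t))  # True
```

With per-process token state (the situation described), each Task would have its own `shared` dict and the cross-Task validate would fail, which is exactly the observed 'invalid session token' error.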
https://redd.it/1m2fsru
@r_devops
Octopus Deploy reviews for a large enterprise? Worth it long-term?
Curious if folks in big orgs are still happy with Octopus after 2+ years. Does it hold up with hundreds of apps and multi-region infra? Or does it hit a wall eventually?
https://redd.it/1m2d0bm
@r_devops
### 7.1 CAP as Multi-Agent Mechanism Design Problem
The CAP theorem can be reinterpreted through the lens of mechanism design theory. Each node in the distributed system is a rational agent with preferences over consistency, availability, and partition tolerance.
The impossibility result emerges from the fact that no mechanism exists that can implement all three properties simultaneously while maintaining incentive compatibility and individual rationality.
### 7.2 Algorithmic Game Theory and Nash Equilibria
In the game-theoretic formulation, the CAP theorem shows that no Nash equilibrium exists where all players (nodes) can simultaneously achieve their desired outcomes for all three properties. The theorem thus represents a fundamental limitation of distributed coordination mechanisms.
## 8. Logical Foundations and Proof Theory
### 8.1 Intuitionistic Logic and Constructive Proofs
The CAP theorem's proof has deep connections to intuitionistic logic and constructive mathematics. The impossibility is not merely classical logical negation but represents a constructive impossibility - there exists no algorithm that can construct a system satisfying all three properties.
In the language of type theory:
```
¬∃(s: System). Consistent(s) ∧ Available(s) ∧ PartitionTolerant(s)
```
This is provable constructively, meaning we can exhibit a specific contradiction for any proposed system.
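That constructive reading can be made concrete with a toy two-replica model: during a partition, the system must either answer with stale state or refuse to answer (an illustration of the contradiction, not a formal proof):

```python
class Replica:
    """A node holding one replicated integer value."""
    def __init__(self):
        self.value = 0

def run(partitioned: bool, stay_available: bool):
    n1, n2 = Replica(), Replica()
    n1.value = 1                 # a client writes 1 to n1
    if not partitioned:
        n2.value = n1.value      # replication message is delivered
    # a client now reads from n2
    if partitioned and not stay_available:
        return None              # refuse to answer: consistent but unavailable
    return n2.value              # answer: available but possibly stale

print(run(False, True))          # no partition: fresh value 1
print(run(True, True))           # partition + availability: stale 0
print(run(True, False))          # partition + consistency: no response
```

Every branch of `run` gives up one of the three properties, which is the constructive content of the impossibility: no branch exists that keeps all three.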
### 8.2 Modal Logic and Possible Worlds Semantics
Using modal logic, the CAP theorem can be expressed as:
```
□(C ∧ A ∧ P) ≡ ⊥
```
In possible worlds semantics, this means there exists no possible world (network configuration) where all three properties hold simultaneously.
## 9. Information Geometry and Statistical Manifolds
### 9.1 Fisher Information and Consensus Efficiency
The efficiency of distributed consensus protocols can be analyzed using information geometry. The Fisher information matrix encodes the sensitivity of the consensus process to parameter changes, and the CAP constraints impose bounds on the achievable Fisher information.
The Cramér-Rao bound provides a lower bound on the variance of distributed estimators, showing that the CAP trade-offs are reflected in the fundamental limits of statistical inference in distributed systems.
### 9.2 Exponential Families and Maximum Entropy
The space of distributed system configurations can be parameterized as an exponential family with sufficient statistics corresponding to the CAP properties. The maximum entropy principle then provides a natural way to understand the trade-offs between these properties.
## 10. Conclusion: Toward a Unified Field Theory of Distributed Systems
The CAP theorem represents more than a practical constraint on distributed systems; it reveals deep mathematical structures that connect distributed computing to fundamental areas of mathematics and physics. The impossibility is not merely technical but reflects profound limitations rooted in the nature of information, space, and time.
By examining the theorem through multiple mathematical lenses - topology, category theory, quantum mechanics, differential geometry, and logic - we gain insight into the fundamental nature of distributed computation and its relationship to the physical universe. The CAP theorem thus serves as a bridge between computer science and the deeper mathematical structures that govern reality itself.
The implications extend beyond distributed systems to questions of consciousness, knowledge, and the nature of information itself. In a universe where information cannot travel faster than light, the CAP theorem may represent a constraint built into the very fabric of physics.
## References
1. Brewer, Eric. "Towards Robust Distributed Systems." Keynote, ACM PODC, 2000.
2. Gilbert, Seth, and Nancy Lynch. "Brewer's Conjecture and the Feasibility of Consistent, Available, Partition-Tolerant Web Services." ACM SIGACT News 33, no. 2 (2002).
3. Hatcher, Allen. *Algebraic Topology*. Cambridge University Press, 2002.
4. Nielsen, Michael A., and Isaac L. Chuang. *Quantum Computation and Quantum Information*. Cambridge University Press, 2000.
5. Mac Lane, Saunders. *Categories for the Working Mathematician*. 2nd ed. Springer, 1998.
6. Awodey, Steve. *Category Theory*. Oxford University Press, 2006.
CAP Theorem: A more nuanced look from an experienced developer and mathematician
The CAP theorem, originally formulated by Eric Brewer and subsequently proven by Gilbert and Lynch, represents a fundamental impossibility result in distributed systems theory. This treatise examines the theorem through the lens of algebraic topology, quantum information theory, and category-theoretic foundations, revealing deep connections to the fundamental limits of information propagation in spacetime and the topological constraints of distributed consensus protocols.
## 1. Theoretical Foundations and Metamathematical Structure
### 1.1 The Brewer Conjecture as Topological Invariant
The CAP theorem can be reinterpreted as a statement about the topological invariants of distributed system configurations. Let us define a distributed system as a simplicial complex K where:
- Vertices represent computational nodes
- Edges represent communication channels
- Higher-dimensional simplices represent coordinated operations
The CAP properties can then be formalized as cohomological invariants:
```
Let H^n(K; R) be the n-th cohomology group of K with coefficients in ring R
Let C: H^0(K; Z₂) → {0,1} be the consistency functional
Let A: H^1(K; Z₂) → {0,1} be the availability functional
Let P: H^2(K; Z₂) → {0,1} be the partition tolerance functional
Then the CAP theorem states: C(K) + A(K) + P(K) ≤ 2 for all connected K
```
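The inequality itself can be checked mechanically. The sketch below is purely illustrative - it encodes each property as a boolean rather than a cohomological functional - and simply enumerates which of the eight assignments survive:

```python
from itertools import product

# Enumerate all boolean assignments to (C, A, P); the constraint
# C + A + P <= 2 excludes exactly one: all three properties at once.
allowed = [t for t in product((0, 1), repeat=3) if sum(t) <= 2]
# 7 of the 8 combinations remain; (1, 1, 1) is the impossible corner.
```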
### 1.2 Information-Theoretic Formalization via Channel Capacity
From an information-theoretic perspective, the CAP theorem emerges from the fundamental limits of information transmission in noisy channels. Consider a distributed system as a quantum channel Φ: B(H₁) → B(H₂) where B(H) represents the bounded operators on Hilbert space H.
The consistency requirement demands that all observers share identical quantum states |ψ⟩, implying perfect quantum error correction. The availability requirement demands that the channel capacity C(Φ) remains positive under all conditions. The partition tolerance requirement allows for arbitrary noise in the quantum channel.
By the quantum no-cloning theorem and the Holevo bound, we can show that:
```
C(Φ) ≤ max_{pᵢ, ρᵢ} [ S(Φ(Σᵢ pᵢρᵢ)) − Σᵢ pᵢ S(Φ(ρᵢ)) ]
```
Where S denotes the von Neumann entropy. The impossibility of satisfying all three CAP properties simultaneously follows from the incompatibility of quantum error correction with maintaining channel capacity under arbitrary noise.
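For concreteness, when a state is given by its eigenvalues the von Neumann entropy S reduces to the Shannon entropy of the spectrum; a minimal sketch:

```python
import math

def von_neumann_entropy(eigenvalues) -> float:
    """S(rho) = -sum_i l_i * log2(l_i), over the nonzero eigenvalues of rho."""
    return -sum(l * math.log2(l) for l in eigenvalues if l > 0)

# A maximally mixed qubit carries one full bit of entropy; a pure state none.
mixed = von_neumann_entropy([0.5, 0.5])
pure = von_neumann_entropy([1.0, 0.0])
```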
## 2. Categorical Semantics of Distributed Consensus
### 2.1 The Category of Distributed States
Define the category **DistSys** where:
- Objects are distributed system states
- Morphisms are state transitions preserving invariants
- Composition represents sequential operations
The CAP properties can be understood as functors from **DistSys** to the category of Boolean algebras:
```
C: DistSys → Bool (Consistency functor)
A: DistSys → Bool (Availability functor)
P: DistSys → Bool (Partition tolerance functor)
```
The CAP theorem then states that there exists no natural transformation that makes the diagram commute for all three functors simultaneously.
### 2.2 Sheaf-Theoretic Interpretation of Global State
The notion of global consistency in distributed systems can be formalized using sheaf theory. Let X be the topological space of system nodes with the network topology, and let F be the sheaf of local states.
Global consistency requires that the global sections Γ(X, F) form a coherent view. However, when network partitions occur, the topology of X becomes disconnected, and the sheaf condition fails:
```
For an open cover {Uᵢ} of U ⊆ X:
F(U) ≇ eq( ∏ᵢ F(Uᵢ) ⇉ ∏ᵢ,ⱼ F(Uᵢ ∩ Uⱼ) )
```
This sheaf-theoretic perspective reveals that consistency is fundamentally about the ability to lift local observations to global knowledge, which becomes impossible under partition conditions.
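A toy illustration of this gluing failure (an analogy with made-up names, not actual sheaf machinery): local sections glue into a global section only when they agree on every nonempty overlap, and a partition is precisely what empties the overlaps.

```python
def global_section(sections, overlaps):
    """sections: {region: value}; overlaps: iterable of (r1, r2, nonempty).

    Returns a coherent global assignment, or None when gluing fails.
    """
    for r1, r2, nonempty in overlaps:
        if nonempty and sections[r1] != sections[r2]:
            return None  # disagreement on an overlap: no global section
    return dict(sections)

# Connected network: the overlap exposes the divergence, gluing fails.
connected = global_section({"A": 1, "B": 2}, [("A", "B", True)])
# Partitioned network: the overlap is empty, divergence goes undetected.
partitioned = global_section({"A": 1, "B": 2}, [("A", "B", False)])
```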
## 3. Temporal Logic and Distributed Consensus
### 3.1 Linear Temporal Logic Formalization
The CAP properties can be expressed in Linear Temporal Logic (LTL) as:
```
Consistency: □(∀x,y ∈ Nodes. read(x) = read(y))
Availability: □◇(∀x ∈ Nodes. responds(x))
```
Automating scaffold changes across multiple python repos
I'm a software engineer responsible for maintaining 40-50 GitHub repositories for my data science team. We are a startup, and there are still a lot of things that we want to change over time. A lot of our repositories are built with Python code. Most of the repos are quite similar; it's usually just the underlying Python code that changes. Our team works on individual laptops.
I have a scaffold repository, built with cookiecutter, that I want to update over time. The scaffold includes pre-commit/linting configs, a Dockerfile, Python versions, and CI/CD definitions. I want to push scaffold changes out to all our GitHub repositories to keep things as consistent across repos as possible. I've asked my team to pull changes as they land, but most repos never get updated. At the same time, people need to be able to make custom modifications in any repository.
I've looked at using git submodules/subtrees for each individual file of the scaffold, and used git-xargs to open PRs across multiple repos. There were enough differences between even 5 repos that auto-merging the PRs wasn't working, so I used https://github.com/dlvhdr/gh-dash to check which PRs were still open due to conflicts (this could be done with a script instead).
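For what it's worth, the drift-detection half of that loop can be done without git at all. Here's a hypothetical sketch (function and file names invented for illustration) that compares scaffold files against a repo by content hash, so PRs only get opened where something actually drifted:

```python
import hashlib

def stale_files(scaffold: dict, repo: dict) -> list:
    """Return scaffold paths that are missing or drifted in the repo.

    Both arguments map relative path -> file contents (bytes).
    """
    def digest(data: bytes) -> str:
        return hashlib.sha256(data).hexdigest()

    return sorted(
        path
        for path, data in scaffold.items()
        if path not in repo or digest(repo[path]) != digest(data)
    )

scaffold = {".pre-commit-config.yaml": b"rev: v2", "Dockerfile": b"FROM python:3.12"}
repo = {".pre-commit-config.yaml": b"rev: v1", "Dockerfile": b"FROM python:3.12"}
drifted = stale_files(scaffold, repo)  # only the changed file is flagged
```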
Has anyone managed a similar setup? Any alternatives you found for this?
https://redd.it/1m24wny
@r_devops
Trying to break into CS - worth doing a conversion master’s? + CV feedback please!
Hey all, I’m hoping to get some advice. I’ve been working in Tech for about 6 years, mostly on the business/marketing side. More recently, I took on a junior data analyst role, but it’s still quite marketing/business focused rather than purely technical.
This September, I’m planning to start a part-time conversion master’s in Computer Science (my company is sponsoring me) to properly pivot into the CS field.
I’m wondering:
1. Is it actually worth doing a conversion master’s in CS, given my background?
2. Does my CV (https://imgur.com/a/K3YTaKc) look okay for someone trying to break into CS? Anything you'd suggest changing or adding?
Any feedback or thoughts would be massively appreciated! Thanks 😊
https://redd.it/1m24gt0
@r_devops
I built a tiny Windows service wrapper for production use - looking for feedback
Hi all,
Over the past couple of months, I've been having to wrap apps, scripts & utilities as Windows services for a few projects at work. Tools like WinSW & NSSM do exist, but I seem to keep running into bugs or missing features - especially around log rotation, management & restart behaviour.
This led me to build WinLet - a tiny, production-focused Windows service wrapper we now use internally at work. It's built to be simple to use and to offer proper support for log management, env vars, restart policies & so on.
Key features:

* Run any script or executable as a Windows service
* Rich log management options - rotation, compression, etc.
* Configurable auto-restart on failure
* Tiny footprint
* Easy-to-read TOML configuration
Example config (with full logging and health check):
```
[service]
name = "my-web-api"
display_name = "My Web API"
description = "Production web API with monitoring"

[process]
executable = "node"
arguments = "server.js"
working_directory = "C:\\Apps\\MyWebAPI"
shutdown_timeout_seconds = 45

[process.environment]
NODE_ENV = "production"
PORT = "3000"
DATABASE_URL = "postgresql://db-server/myapi"

[logging]
level = "Information"
log_path = "C:\\Logs\\MyWebAPI"
mode = "RollBySizeTime"
size_threshold_kb = 25600
time_pattern = "yyyyMMdd"
auto_roll_at_time = "02:00:00"
keep_files = 14
zip_older_than_days = 3
separate_error_log = true

[restart]
policy = "OnFailure"
delay_seconds = 10
max_attempts = 5
window_seconds = 600

[service_account]
username = "DOMAIN\\WebAPIService"
allow_service_logon = true
prompt = "Console"
```
Install and start it like this:

```
WinLet.exe install --config my-web-api.toml
WinLet.exe start --name my-web-api
```
Here's what's coming next - especially as our internal requirements evolve at work:
Prometheus metrics & Windows performance counters
PowerShell module
Hot-reload of config changes
Service dependency graph and bulk operations
Web dashboard for management
I'd love to hear from anyone managing or using Windows services - suggestions, feedback & other use cases you may have are all welcome. Posting in here as well in the hope someone else finds it useful.
Github: ptfpinho23/WinLet: A modern Windows service runner that doesn’t suck.
https://redd.it/1m21ug6
@r_devops