Reddit DevOps. #devops Thanks @reddit2telegram and @r_channels
How should I split infrastructure in state files?
We are planning an automation to deploy and configure a product through IaC and some scripting. I'm thinking about how to break up the infra into different state files, the simplest option being Infrastructure and Kubernetes cluster level declaration, and the more complex being Base Infra, Networking, App Infra, Database Infra and app configuration. I don't want to overengineer this thing, but I also want to decouple its components to avoid a gigantic state file working as a bottleneck and a single point of failure.
Are there any patterns for this? How do I know when enough separation of concerns is enough? If I break it up, should I connect everything in a single pipeline with different stages? Or different pipelines triggered independently?
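One way to reason about the split before committing to it is to write down which layer reads outputs from which, and let that graph decide the pipeline stages. The sketch below is only an illustration: the layer names come from the post, but the dependency edges are assumptions, and nothing here calls Terraform itself.

```python
from graphlib import TopologicalSorter

# Hypothetical layers from the post; each maps to the layers it depends on.
# The edges are assumptions made for illustration only.
LAYERS = {
    "base-infra": set(),
    "networking": {"base-infra"},
    "database-infra": {"networking"},
    "app-infra": {"networking"},
    "app-config": {"app-infra", "database-infra"},
}


def apply_order(layers):
    """Return the layers in an order where every dependency comes first."""
    return list(TopologicalSorter(layers).static_order())


def affected_by(layers, changed):
    """Return every layer that directly or indirectly depends on `changed`."""
    hit = set()
    frontier = [changed]
    while frontier:
        current = frontier.pop()
        for name, deps in layers.items():
            if current in deps and name not in hit:
                hit.add(name)
                frontier.append(name)
    return hit
```

If `affected_by` returns almost every layer for almost every change, the split is probably buying little isolation; if most changes touch one layer, independent pipelines per state file may be enough.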
https://redd.it/1cniy84
@r_devops
Resources other than Oracle for hosting self hosted runners
What resources would you suggest for hosting self-hosted runners for CI/CD workflows, other than the free instance of Oracle?
I do want to use an instance but I don't have the budget to spring up lots of VMs (I definitely will in the future as I scale up). So in the meantime, if you have any resources in mind, do let me know!
The best I found was from Hetzner.
https://redd.it/1cnh0rz
@r_devops
Junior SRE/Devops losing his mind over database replication.
Hello, I'm a junior devops from Argentina. I've been working as an SRE/DevOps for about a year as my first IT job, which has been a challenge. I work for a state company, so it's a shitshow as you can imagine. I have to set up database replication using Docker and MySQL. The idea is to have two DBs, each running on a different server, to load balance a WordPress site, with the master handling reads and writes and the slave being read-only. But for the love of god, I can't do it. The containers don't communicate with each other; the master works fine, but the slave is useless. Any ideas of what I can do? Thanks in advance and sorry for the bad English, it's not my first language.
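Since the two containers run on different servers, a reasonable first check is whether the replica's host can open a TCP connection to the master's MySQL port at all. This is a minimal sketch, assuming the master publishes the default port 3306 on its host; the host name in the usage note is hypothetical.

```python
import socket


def can_connect(host, port=3306, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False
```

Running `can_connect("db-master.example.internal")` from the replica's server separates network problems (unpublished port, firewall) from replication configuration problems. If it returns False, the replication settings are not the first thing to debug.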
https://redd.it/1cne215
@r_devops
PC setup questions
Hi All!
Some background:
I'm a newbie with respect to devops but am a .net software engineer with lots of coding experience - so apologies if I get terms wrong or am even asking in the wrong place!
I've recently been given a task to create a "setup" for an entire new PC that controls a scientific instrument. The Windows 11 PC setup will require installing several 3rd-party MSIs, Nvidia CUDA, numerous pre-compiled Python applications and dozens of configuration files, plus creating a user and setting permissions and firewall rules. The current culture at the company I was asked to help is to do everything on the command line, and they would like to use mostly scripting and a package manager. They have a JFrog Artifactory that they will (probably, not sure yet) post their production packages to in some form. I'm a tiny bit familiar with winget, but it looks like JFrog doesn't support winget. So I'm kind of at a loss on how to proceed with JFrog and a package manager. The final solution will need to support new installs and upgrades of any given item in the Artifactory.
So here are my (naive) questions to try to get me pointed in the right direction:
1) First, is this the right place to ask these questions?
2) Would using a generic JFrog repository for everything be best?
3) Are artifact versions controlled by how you define the folder/directory structure and their names?
4) Since JFrog doesn't support winget, what package manager can do the downloads and installs (on Windows)?
I've watched a bunch of JFrog videos, but they don't seem to have any content for this scenario (that I could find). Am I even in the ballpark?
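On question 3: if everything goes into a generic repository, the version is whatever the path convention says it is, so the install script has to parse it. The sketch below assumes a hypothetical layout of `<package>/<version>/<file>` with dotted numeric versions; it works on a plain list of paths and does not call any Artifactory API.

```python
def parse_version(text):
    """Turn '1.10.0' into (1, 10, 0) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def latest_version(paths, package):
    """Return the highest version string found for `package`, or None."""
    versions = []
    for path in paths:
        parts = path.strip("/").split("/")
        if len(parts) >= 2 and parts[0] == package:
            try:
                versions.append(parse_version(parts[1]))
            except ValueError:
                continue  # skip folders that are not dotted numeric versions
    if not versions:
        return None
    return ".".join(str(number) for number in max(versions))
```

Comparing tuples rather than strings matters here: as strings, "1.9.0" sorts after "1.10.0".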
Thanks for any advice! 🙂
https://redd.it/1cnajyr
@r_devops
Need Help Understanding "Failed to Query DNS" Error in Docker Setup on RHEL-8 Cloud Server
Hey everyone,
I'm encountering an issue with my Docker setup on a RHEL-8 cloud server, and I could really use some insights and advice. Every time I try to set up my application via Docker, I keep getting the error message "Failed to query DNS <ip-addr>." Interestingly, this error seems to disappear temporarily when I restart Docker, but it resurfaces after about half an hour.
I'm a bit puzzled about what might be causing this and how to resolve it permanently. Has anyone else experienced a similar issue before? If so, how did you go about troubleshooting it?
I'd appreciate any suggestions or recommendations on how to diagnose and fix this DNS-related error. Also, if you have any insights into Docker setups on RHEL-8 servers, I'd love to hear your thoughts.
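Because the error comes back roughly half an hour after a restart, it may help to record when resolution starts failing and compare that with other events on the host. A minimal sketch, assuming it is run inside the affected container or on the host; the host name passed in is whatever the application tries to resolve.

```python
import socket
import time


def resolve(hostname):
    """Return the sorted list of addresses for hostname, or [] on failure."""
    try:
        infos = socket.getaddrinfo(hostname, None)
    except socket.gaierror:
        return []
    return sorted({info[4][0] for info in infos})


def watch(hostname, attempts=3, interval=0.0):
    """Resolve hostname repeatedly; return (timestamp, succeeded) pairs."""
    results = []
    for _ in range(attempts):
        results.append((time.time(), bool(resolve(hostname))))
        time.sleep(interval)
    return results
```

Called with a large `attempts` and an `interval` of a minute or so, the first failing timestamp shows whether the breakage follows a fixed schedule.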
Looking forward to your responses! Thanks in advance for your help.
https://redd.it/1cn7ske
@r_devops
TeamCity
Hi,
Is anyone using TeamCity who can give some advice regarding CI/CD pipelines? Currently we are using agents built with Packer on AWS. The install script for the TC agent AMI has a lot of steps to install all the dependencies such as kubectl, docker, dotnet etc., and it needs to be updated now and then. Recently I needed to update the Node.js version on the TC agent, and as a result the agent broke in another build job that deploys the AWS ELB controller, because the rebuild pulled in a new version of kubectl (the installation isn't locked to a version). Then something broke when pulling an image with docker commands in another job. So eventually I switched back to the old image, except for the Node.js build... So I just wanted to know if anybody uses a better approach. Most of our build and deploy jobs are bash scripts.
https://redd.it/1cn4plo
@r_devops
Out of touch management?
Possibly out of touch manager refuses to let us use Ansible, Terraform or any other CI/CD tool and seems to think we can do everything with Power Automate. Are they right or out of touch with PA capabilities?
https://redd.it/1cn2oim
@r_devops
Self-taught DevOps Resources
Hey guys, I'm an IT technician specialized in WiFi and troubleshooting. I have some network and Linux experience too, and I like to program for myself, mainly automation stuff with Python. I've seen some really well-paid DevOps jobs near me and I would like to learn DevOps-relevant topics so that I can try to get a job there.
But my understanding of DevOps is quite low. Is it possible to teach yourself DevOps without a software team behind you that really needs a DevOps engineer? Can someone link me some resources on the relevant things like CI/CD etc.?
https://redd.it/1cn12s0
@r_devops
What if elastic IP is associated with a stopped ec2 instance
What happens to elastic IP when an instance is stopped? In this post, I have explained AWS elastic IP and how it charges while instances are attached or shut down.
https://vishalvyas.com/understanding-elastic-ip-and-its-association/
https://redd.it/1cmwzke
@r_devops
How to automate propagating updates to all internal terraform modules?
Company has a shit ton of internal terraform root modules adhering to our security standards. Problem is, every time we make changes we have to go through every repo using that module and update it manually (or not at all). I’m sure people have figured out a way to automate this. I’ve found Dependabot and Renovate, but I'm wondering what people use and how they use it, to find a good solution.
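For a sense of what such automation does under the hood, here is a minimal sketch of the core rewrite step: bumping the pinned `?ref=` on a module source. It assumes the consuming repos reference modules through Git sources with a `?ref=` tag; the module name and URL in the usage note are hypothetical, and opening the pull request is left out.

```python
import re


def bump_module_ref(hcl_text, module_name, new_ref):
    """Replace the ?ref= value on source lines that mention module_name.

    Returns (updated_text, number_of_replacements).
    """
    pattern = re.compile(
        r'(source\s*=\s*"[^"]*' + re.escape(module_name) + r'[^"]*\?ref=)[^"&]+'
    )
    # A function replacement avoids surprises if new_ref contains backslashes.
    return pattern.subn(lambda match: match.group(1) + new_ref, hcl_text)
```

For example, `bump_module_ref(text, "terraform-vpc", "v1.3.0")` would update a line such as `source = "git::https://git.example.com/infra/terraform-vpc.git?ref=v1.2.0"` and leave other modules untouched.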
https://redd.it/1cmlshh
@r_devops
Control-M Thoughts? Worth it in 2024?
I looked at the prospect of jobs as code with control-m back in 2021, and was overwhelmed by the Automation API and determined that for us, octopus run books can cover our use cases for now, since most could be done with a combination of SQL and PowerShell.
Granted control m is kinda silo’d at our organization so we don’t have great automation support.
Octopus isn’t really compared to competitors of control-m, so I feel like I’m missing something. Can anyone straighten me out?
Thanks
https://redd.it/1cmkkax
@r_devops
Kubernetes + Terraform Youtube Channel Recommendation
Hey All,
Does anyone have a YouTube channel they'd recommend for learning Kubernetes + Terraform? Specifically building out either bare metal K8s or EKS and learning how to build modules within Terraform?
https://redd.it/1cmbndt
@r_devops
Kubiya AI
Hello guys,
Anyone here used Kubiya AI??
Please share your experience with this tool.
Thanks!
https://redd.it/1cmc3j3
@r_devops
Our Best Practices for Source Code Management
I wrote up some of our best practices for managing source code at Doppler that have significantly benefited our team. The post explores various strategies we use. What methodologies and technologies do you find indispensable in managing your source code? How does your team address common challenges?
https://www.doppler.com/blog/our-source-control-best-practices
https://redd.it/1cmacrt
@r_devops
NEW UPDATE: OneUptime - Open Source Datadog Alternative.
ABOUT ONEUPTIME: OneUptime (https://github.com/oneuptime/oneuptime) is the open-source alternative to DataDog + StatusPage.io + UptimeRobot + Loggly + PagerDuty. It's 100% free and you can self-host it on your VM / server.
OneUptime has Uptime Monitoring, Logs Management, Status Pages, Tracing, On Call Software, Incident Management and more all under one platform.
Updates:
- Several new monitor options launched - You can now monitor your SSL Certificates and Servers (Processes running, Mem, CPU, Disk, etc)
- Evaluate monitor metrics over time. You can set up alerts for things like - "Create an incident when my website response time is >5 seconds for 5 minutes". This wasn't possible before.
- Added Logs ingestion with fluentd and OpenTelemetry. Traces and Metrics ingestion with OpenTelemetry.
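To make the "over time" rule in the updates concrete, the sketch below shows one way a sustained-threshold check such as "response time is >5 seconds for 5 minutes" can be evaluated. It is an illustration only and not OneUptime's implementation; the function name and the sample format are assumptions.

```python
def sustained_breach(samples, threshold=5.0, window=300.0):
    """Return True when every sample in the trailing window exceeds threshold.

    samples: list of (timestamp_seconds, value) pairs sorted by time.
    The history must reach back at least `window` seconds before the
    latest sample, otherwise there is not enough data to decide.
    """
    if not samples:
        return False
    end = samples[-1][0]
    start = end - window
    if samples[0][0] > start:
        return False  # not enough history yet
    recent = [value for timestamp, value in samples if timestamp >= start]
    return all(value > threshold for value in recent)
```

A single fast response inside the window resets the condition, which is what distinguishes this from alerting on every slow request.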
Roadmap to end of Q2:
- New Monitors: We will be working on new monitor options, specifically "Log Monitor", "Traces Monitor", "Metrics Monitor", where you can set up alerts for things like - if there are lots of error logs, create an incident and alert the team.
- Datadog like Dashboards coming soon.
Roadmap to end of Q3:
- We're working on a reliability co-pilot. All you need to do is run a GitHub Actions job / CI job where it scans your codebase and queries the OneUptime API to get all the errors your software has seen in production. We then try to fix those errors and create PRs automatically, making your software more reliable and better every single day. None of your code will be sent to us. It'll stay on the GitHub Actions runner. We will do this via a local LLM on the runner. Needless to say, this will be beta and will get better over time.
REQUEST FOR FEEDBACK & FEATURES: This community has been kind to us. Thank you so much for all the feedback you've given us. This has helped make the software better. We're looking for more feedback as always. If you do have something in mind, please feel free to comment, talk to us, contribute. All of this goes a long way to make this software better for all of us to use.
OPEN SOURCE COMMITMENT: OneUptime is open source and free under Apache 2 license and always will be.
https://redd.it/1cm8gml
@r_devops
How to improve communication skills?
Quick background: I have 15+ years of experience, but for most of those years I worked on my own. I didn't have to answer to anyone, I just did the things I felt made sense. So even if I have a lot of experience technically, I don't have much experience with dealing with people.
For the last 3 years I have been in a huge company as a consultant and have been fairly successful in a platform team. Recently I got asked a question from a dev-team if we had the capability to do "x" in our solution. And I answered no we don't, and that would be too much work to do right now. I honestly expected a follow up question from them but they just answered "Ok, I can understand that". A few weeks later it has come back to our team via back channels (various managers) and they have taken that reply out of context to make me look really bad. What I really meant is that we can't prioritize that since you can do this with y and z instead and having to implement that is not something we want to prioritize since their pattern is questionable at best.
Are any of you aware of any literature on how to deal with this in a proactive manner? I'm learning as I go to be more political, but I wish to upskill without having to learn the hard way. I'm getting really stressed out by this political stuff. I've noticed other respected people in the organization are experts at putting up smoke-screens without ever having to actually deliver much. It's so far away from what I'm used to.
https://redd.it/1cnggy8
@r_devops
2 projects/products from a single source repo, thoughts!
So I am running into a particular scenario where I need some suggestions on how to put a strategy on managing multiple repositories. Imagine a SaaS product (Alpha) developed and managed by a vendor X, which is purchased by a client Y. Y now needs to have a different version of product Alpha with their customizations and product roadmap, naming SaaS as Gamma. Y has received the source code from X and hosted it in their git infra, managed by Y's internal IT.
Now comes the tricky part, at least for me.
Y is in a contract with X for getting support on bug fixes on existing features and also inheriting the product features which X develops for 6 months. Meanwhile, Y has their developers onboarded and is developing other features on Y's roadmap, which are quite different from X's, but Y needs to receive the periodic updates from X as well. The developers working on the two products are different.
The question is, what is the recommended way to structure these repositories to avoid conflicts (I agree there would be code overlap which needs to be considered case-by-case) and have it in a neat way?
https://redd.it/1cnd80f
@r_devops
How to not get swallowed up by the job
Anyone have any recommendations on how to make DevOps/development work less consuming? I currently work as a staff level engineer and do some dev work for some interesting OSS projects in my spare time.
Recently after taking a 2w vacation without much work going on (bless Asia’s cable cuts for creating a 300ms+ latency back to Europe) I realize how much of my life has really been consumed by just dev work instead of doing other things that I want like hiking etc. Sometimes it’s just a bug or something I’m too focused on and end up spending a lot of time on, or some new feature that I’m building…
Anyone else have this issue or have any advice about it?
https://redd.it/1cncx7v
@r_devops
Is it possible to deploy Argo Workflows as a job scheduler to AWS and have the control plane in GCP?
I've been reading around this all day and have found opinions both that it's not possible and that it is. I'm not sure what to believe at the moment. I was wondering if anyone has done this before, because there doesn’t seem to be much out there on this topic.
https://redd.it/1cn9oxx
@r_devops
Monitoring Tool with data/credentials segregation
I just want to implement a monitoring tool that stores data in a segregated way, so that every team has its data stored in a different place and uses a different credential to access it.
For example, the North Team will only have access to the North environment, data and logs.
The South Team, while on the same board, will only have access to the South environment, data and everything.
I'm thinking about these tools like:
OpenTelemetry, DebugBear, Signoz, Thanos/Prometheus.
Am I missing something?
I'm all open to recommendations.
https://redd.it/1cn4mz0
@r_devops
would anyone like to bodydouble with me during EST?
Body doubling or parallel working is a strategy used to initiate and complete tasks, such as household chores or writing and other computer tasks.
https://redd.it/1cn4fql
@r_devops
CICD in bitbucket
Hi all, I want to ask: is it possible to build an arm64 image from Bitbucket without using a self-hosted runner? After browsing around the internet I haven't really found a solution for it.
https://redd.it/1cn24mf
@r_devops
Ownership Boundaries between FullStack and SRE
Hi Yall,
I'm curious on your opinions on the right relationship between fullstack developers and sre when it comes to ownership of load balancing and security services (authorization, EDR, etc.)?
In this case our fullstack team owns development of a customer facing product which includes our SAAS API and various front ends. The API is composed of micro-services. It does biz logic things like user management, fetching results, ingesting media inputs and passing them into our core engine which is owned by MLEs...
We are a small startup, and I am the first and only devops engineer at the moment. I initially wrote authorization and load balancing into the system and owned them, but out of feeling that I was spread too thin I passed this ownership off to the fullstack team. My sense over the past few months is that the fullstack team ideally shouldn't own load balancing and authorization; they tend to do things really quickly and dirtily, and I suppose I'm not fully comfortable with their approach. My sense is that this should be owned by security-minded SREs who are already responsible for production.
Can you guys assist in providing insight into my potential blindspots? What are your thoughts?
Thanks!
https://redd.it/1cmy6sd
@r_devops
Log archive search
Hey, we are using datadog today but the log rehydration for anything beyond 3 days is a pain due to the cost implications.
We are planning on going with a hybrid approach for historical logs using something like ChaosSearch. Is anyone doing something similar and if so what are you using?
Thanks.
https://redd.it/1cmrjjx
@r_devops
Any senior devs here that won't mind taking a look at my resume?
I am in urgent need of a senior/intermediate dev who can just skim over my devops resume and give some feedback. Your help is much appreciated! - junior dev
https://redd.it/1cmmnfn
@r_devops
Huge Build stage duration in GitLab CI - Frontend Project
* The build stage is taking very long (with a timeout set to 1 hour, often reached…)
* package.json contains a lot of caret and tilde ranges, which leads to inconsistent build stage durations, since ‘npm install’ will always try to fetch the latest matching version of a package and download it if it isn't already present…
* Using an npm registry did not fix the issue, since the process is still highly dependent on the network bandwidth of the GitLab runner.
What I have been thinking of so far:
* Use ‘npm ci’ instead of ‘npm install’, based on a package-lock.json. This method requires updating the npm lock file manually, or with the help of third-party tools/extensions such as Renovate by scheduling an update process
* Configure cache on a separate disk? (Not sure about this approach and how it could help me in this case)
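As background for the first idea: a caret range such as ^1.2.3 accepts newer minor and patch releases, a tilde range such as ~1.2.3 accepts newer patch releases, and ‘npm ci’ installs exactly what package-lock.json records (and fails if the lock file disagrees with package.json). A quick way to see how much of the manifest floats is sketched below; it only reads the manifest text and is an illustration, not part of npm.

```python
import json


def floating_ranges(package_json_text):
    """Return {name: spec} for dependencies declared with ^ or ~ ranges."""
    data = json.loads(package_json_text)
    found = {}
    for section in ("dependencies", "devDependencies"):
        for name, spec in data.get(section, {}).items():
            if spec.startswith(("^", "~")):
                found[name] = spec
    return found
```

Running it over the project's package.json gives a rough count of how many packages can resolve differently between two builds when no lock file is enforced.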
I posted here in order to get more insights about the matter.
https://redd.it/1cmg7jw
@r_devops
Is it just a job for you or your passion?
Is it just a job for you? Meaning, you do what you have to do during work hours, and after leaving the office/closing your work laptop at home, you do not touch anything related to the job? In that case, how/when do you learn new stuff? How do you stay motivated?
Or is it your passion: you subscribe to devops channels, you try to be "up to date" with any new things, maybe you have your own "testing" setup/gear where you learn and try those things. In that case, do you have time for other things, hobbies, friends, family? How do you relax (give your brain a rest)?
It's not any "official" survey for statistical or academic purposes. It's pure curiosity. I want to learn what your approach to this job is, and maybe learn a few things for myself to stay motivated or get a fresh look at things.
https://redd.it/1cmb68w
@r_devops
Sharing a method of deploying LangServe applications to AWS
In this article, I will share a method for deploying LangServe applications to AWS without the need for Infrastructure as Code (IaC). All that's required are your AWS access credentials; there's no necessity to master AWS operations or to log into the AWS console.
https://pluto-lang.vercel.app/cookbook/deploy-langserve-to-aws
https://redd.it/1cm8bfx
@r_devops
Stackoverflow and OpenAI https://x.com/StackOverflow/status/1787467736097939562
What do you think guys? Should we delete our profiles or not?
https://redd.it/1cm92qh
@r_devops
How do I log http cookies in HAProxy? Preferably the whole cookie header.
I did ask Gemini and used what it gave me, but it didn't work. This is what it gave me:
option httplog
log-format custom "%clf %Hr %{+b}C\ " # Captures the entire "Cookie" header with a trailing space
And this is the error I got:
[ALERT] (18679) : config : parsing [/tmp/haproxy.cfg:98] : log-format expects only one argument, don't forget to escape spaces!
[ALERT] (18679) : config : Error(s) found in configuration file : /tmp/haproxy.cfg
[ALERT] (18679) : config : Fatal errors found in configuration.
So I removed the trailing space after C\, and also removed the "\" since otherwise it was escaping the last closing quotes too. Anyway, I'm still getting the same error. Any ideas? I've been stuck with this for the whole day. Thanks.
https://redd.it/1cm7h4n
@r_devops