English-speaking PostgreSQL public chat. This group is for discussions on PostgreSQL-related topics and strives to provide best-effort support as well.
I will follow your advice, thank you both very much.
Okay, I will take a look at them.
But I also want to ask you about ScyllaDB — if you know it… is it strong in this area?
You’re right — handling a single node, especially in SQL, is much better than dealing with multiple nodes, because the latter involves certain trade-offs that we need to be very careful about.
Generally — no.
Well, at least it's simpler to handle on one node than on several.
I provide analytical services for some large companies, and we’re still in the early stages. The current data is just for testing, but once we start running analytics for real companies, the data volume will be huge.
In fact, I’m using Elasticsearch, but I want to test whether PostgreSQL would be faster. The data for just one month in Elasticsearch reached 60 TB.
10TB of storage is 200€ (HDD) or 900€ (3x 4T SSD)
10TB of memory is ~37k€ for just the sticks, and good luck finding a system that can fit all of that in a single node
Crap, I didn't read the whole conversation. As usual.
But trust me, that was just a guess!
Just don't.
(Every newbie wants to do that. Always unsuccessfully.)
Postgres is a relational database — which means it's all about consistency, locking and predictable synchronisation.
That means the speed penalty for attempting to do this over a network ranges from very large to enormous.
A highly skilled architect may get something useful out of it — when the cost of a hardware upgrade becomes more than "very large", i.e. when hardware cost starts growing nonlinearly with speed. That means very powerful server systems, hundreds of thousands of dollars.
You have neither a server that can't be upgraded any further nor the knowledge of how to attempt to distribute something — so just don't.
100GB is not that big at all (can fully fit in memory).
10TB is not an issue either — it fits on a single HDD or a couple of fast NVMe drives, and assuming you have [covering] indexes and you optimize your queries, you will probably not feel the difference
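A rough sketch of the [covering] index idea mentioned above — the table and column names are invented for illustration, not from this chat. An index that carries the queried column via INCLUDE lets Postgres answer from an index-only scan without touching the heap:

```sql
-- Hypothetical analytics table; names are illustrative.
CREATE TABLE events (
    tenant_id  bigint NOT NULL,
    created_at timestamptz NOT NULL,
    amount     numeric NOT NULL
);

-- Covering index: the key columns drive the lookup, INCLUDE carries
-- the payload so the aggregate below can be an index-only scan.
CREATE INDEX events_tenant_created_idx
    ON events (tenant_id, created_at) INCLUDE (amount);

-- Can be served from the index alone (once VACUUM has kept the
-- visibility map up to date).
SELECT sum(amount)
FROM events
WHERE tenant_id = 42
  AND created_at >= now() - interval '30 days';
```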
I am building an ERP (with a full accounting module), and I remember in the beginning, I was slapping SERIALIZABLE everywhere because I was scared of dirty/phantom reads, especially for things like stock operations 😀
Hello,
I have a PostgreSQL database, and I want to make it a distributed database. What is the best way to achieve this?
Btw, analytical data can often be cut into unconnected pieces. Multiple tenants, archives, etc...
Then it doesn't need anything distributed.
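One way to sketch that "unconnected pieces" idea with plain Postgres, no distribution at all, is declarative partitioning — here by tenant, with made-up names for illustration:

```sql
-- Hypothetical multi-tenant analytics table, one partition per tenant.
CREATE TABLE metrics (
    tenant_id int NOT NULL,
    ts        timestamptz NOT NULL,
    value     double precision
) PARTITION BY LIST (tenant_id);

CREATE TABLE metrics_t1 PARTITION OF metrics FOR VALUES IN (1);
CREATE TABLE metrics_t2 PARTITION OF metrics FOR VALUES IN (2);

-- Queries filtering on tenant_id touch only one partition, and an
-- archived tenant can be detached cheaply:
-- ALTER TABLE metrics DETACH PARTITION metrics_t1;
```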
Also, Postgres maybe isn't the first choice for OLAP. It's rather powerful algorithmically, but not that focused on grinding through repeated aggregates.
Maybe try looking at Kafka or ClickHouse, to start with?
I know it’s difficult to handle this amount of data on a single node, that’s why I asked for a reliable and powerful way to distribute it.
Btw, just don't. Not for relational databases. It's 2025 — even for a 50TB database, the cost of SSD with the x8 penalty (x2 space reserve, x2 RAID, x2 standby/spare) is a few man-months. Distribution just doesn't pay for itself.
At PGDay Naples I had the chance to attend this very interesting talk about huge databases: https://pgday.at/wp-content/uploads/2025/09/manydata.pdf
Fact is, splitting across hardware and eating the extra parallelisation penalty is much more expensive than buying it in one piece.
Let me guess — it barely fits on a powerful smartwatch...
Fact is, hardware has become so cheap that it's better to just add more of it than to make your software complex
Currently, the database is about 100 GB, but it’s expected to grow to 10 TB.
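To see where that 100 GB actually sits and what will drive the growth toward 10 TB, Postgres can report its own sizes with standard catalog functions (nothing assumed here beyond a default installation):

```sql
-- Current database size, human-readable.
SELECT pg_size_pretty(pg_database_size(current_database()));

-- Ten largest tables, including their indexes and TOAST data.
SELECT relname,
       pg_size_pretty(pg_total_relation_size(oid)) AS total
FROM pg_class
WHERE relkind = 'r'
ORDER BY pg_total_relation_size(oid) DESC
LIMIT 10;
```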
I know there are some complexities and trade-offs, but I want to distribute the data because it’s large. When it’s not distributed, queries become slow — I have complex queries.
Switched to SELECT … FOR UPDATE at the default isolation level, with "everything happens in one transaction — we commit once or roll back everything" — and I haven't looked back
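A minimal sketch of that pattern for a stock operation at the default Read Committed level — table and column names are invented for illustration:

```sql
BEGIN;

-- Lock just the rows we are about to change; concurrent transactions
-- touching the same item wait here instead of racing.
SELECT qty FROM stock WHERE item_id = 123 FOR UPDATE;

-- Business checks happen in application code; then:
UPDATE stock SET qty = qty - 5 WHERE item_id = 123;

COMMIT;  -- or ROLLBACK; and none of the above becomes visible
```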
The "best way" is to not make it distributed. There are extensions such as Citus to shard your database, but you pay with eventual consistency and/or with your distributed system being as slow as its slowest node
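For completeness, the Citus route looks roughly like this — `create_distributed_table` is the real extension API, while the table and shard key are illustrative:

```sql
CREATE EXTENSION citus;

CREATE TABLE events (
    tenant_id bigint NOT NULL,
    payload   jsonb
);

-- Shard rows by tenant_id across worker nodes. Cross-shard joins and
-- transactions then pay the coordination costs discussed above.
SELECT create_distributed_table('events', 'tenant_id');
```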
In fact, a lot of financial projects don't need even a single SELECT FOR UPDATE, even at Read Committed.
Typical CRUD and account transfers just don't need it.
(Massive updates or batch workers sometimes do need it, though.)
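A sketch of why a plain transfer gets by without an explicit SELECT FOR UPDATE at Read Committed — the UPDATEs themselves take row locks, and a CHECK constraint guards the invariant (schema is illustrative):

```sql
CREATE TABLE accounts (
    id      int PRIMARY KEY,
    balance numeric NOT NULL CHECK (balance >= 0)
);

BEGIN;
-- Each UPDATE locks its row; a concurrent transfer touching the same
-- rows simply waits. Always updating in id order avoids deadlocks.
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
UPDATE accounts SET balance = balance + 100 WHERE id = 2;
COMMIT;
```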