The Wonders of Postgres Logical Decoding Messages
Comments
xyzzy_plugh
That doesn't seem particularly novel. A replication stream by definition announces changes to data; simply listening to it gives the listener the full story.
Through a certain lens, a read replica is just a cache.
oulu2006
It's quite a common pattern, we did the same thing years ago using RMQ to achieve eventually consistent cache invalidation.
aynyc
Does it support some form of checkpointing without Flink? I would like to run debezium without Flink.
jakehilborn
Yes. Postgres stores the checkpoint on the source database's replication slot. You can configure Debezium Engine to send checkpoints to /dev/null to disable the default external checkpoint store.
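As a rough sketch, the Debezium Engine configuration for that might look like the following (connection details, slot, and topic prefix are placeholder values for illustration):

```properties
# Hypothetical Debezium Engine config; all connection values are placeholders
name=engine
connector.class=io.debezium.connector.postgresql.PostgresConnector
database.hostname=localhost
database.port=5432
database.user=postgres
database.dbname=inventory
topic.prefix=app
plugin.name=pgoutput
# Route offset persistence to /dev/null; the replication slot on the
# source database then remains the only checkpoint of record
offset.storage=org.apache.kafka.connect.storage.FileOffsetBackingStore
offset.storage.file.filename=/dev/null
```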
aynyc
Any documentation on this? We run an assortment of outbox patterns for data distribution, and I don't want to manage that in data producers anymore.
gunnarmorling
Debezium relies on Kafka Connect's `OffsetBackingStore` contract [1] for persisting and restoring offset positions (that's subject to abstraction away from Connect's SPIs at some point). While acknowledged LSNs are indeed committed to the source DB in the case of Postgres, the offset store is still needed in most cases, e.g. for persisting other information, such as whether or not an initial snapshot has happened yet.
> I don’t want to manage that in data producers anymore.
Could you elaborate a bit? The outbox pattern typically _is_ implemented in source applications, so I'm curious what alternative you have in mind? Stream processing e.g. with Flink actually can be an alternative, creating external data representations after the fact, though challenges arise e.g. in terms of transactional consistency.
[1] https://debezium.io/documentation/reference/stable/developme...
aynyc
Our many current outbox patterns are implemented by producers using two-phase commit. Rather than writing code for that, I want to move the outbox to another component, but we don't want to use a stream, we want to use a queue. Essentially: listen to the WAL, and any relevant data gets written to a queue for downstream consumers. Kafka/Kinesis is not in our stack right now, so I want to use a standalone Debezium Server on an AWS Fargate/ECS service and write to FIFO SNS.
gunnarmorling
Author here: yes, you can use Debezium without Flink, via Kafka Connect. Alternatively, if you're not using Kafka but Kinesis, Apache Pulsar, etc., via Debezium Server (Discussing the different options on this slide: https://speakerdeck.com/gunnarmorling/change-data-streaming-...).
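For a sense of what the Debezium Server route looks like, it is driven by an `application.properties` file roughly along these lines (a sketch only; all values are placeholders, with Kinesis shown as one example sink):

```properties
# Hypothetical Debezium Server config; all values are placeholders
debezium.sink.type=kinesis
debezium.sink.kinesis.region=us-east-1
debezium.source.connector.class=io.debezium.connector.postgresql.PostgresConnector
debezium.source.database.hostname=localhost
debezium.source.database.port=5432
debezium.source.database.user=postgres
debezium.source.database.dbname=inventory
debezium.source.topic.prefix=app
debezium.source.plugin.name=pgoutput
```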
cryptonector
NOTIFY also can be used this way.
semiquaver
Correct me if I'm wrong, but even though NOTIFY is transactional, it isn't persisted to the WAL or anywhere else, so any clients that were not LISTENing at the time NOTIFY was called cannot receive it retroactively. The WAL is, well, a log, which has nicer properties for many applications, allowing clients to "fall behind" or recover from transient network disconnects without losing data.
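A quick sketch of the contrast (channel and payload names are made up): a NOTIFY payload reaches only sessions that are listening at commit time, while `pg_logical_emit_message()` writes into the WAL, so a consumer of a logical replication slot can pick the message up later:

```sql
-- Delivered only to sessions that have already run LISTEN cache_invalidation;
-- nothing is retained for clients that connect afterwards
NOTIFY cache_invalidation, 'user:42';

-- Written to the WAL (first argument: transactional = true), so any consumer
-- of a logical replication slot can read it later, e.g. after a reconnect
SELECT pg_logical_emit_message(true, 'cache_invalidation', 'user:42');
```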
klysm
NOTIFY is really hard to use correctly, and I think logical replication works better in most cases.
paulddraper
That's so cool.
I remember hearing years ago that Facebook used a trick like this for cache invalidation: they would write custom "invalidate cache X" messages (maybe even as SQL comments?) into their MySQL replication streams that fanned out to multiple data centers, such that a data center would know to clear a cached record out of its local memcached cluster when the relevant MySQL rows were updated.