The Wonders of Postgres Logical Decoding Messages

88 points
a year ago
by rbanffy

Comments


simonw

I remember hearing years ago that Facebook used a trick like this for cache invalidation: they would write custom "invalidate cache X" messages (maybe even as SQL comments?) into their MySQL replication streams that fanned out to multiple data centers, such that a data center would know to clear a cached record out of its local memcached cluster when the relevant MySQL rows were updated.

a year ago

xyzzy_plugh

That doesn't seem particularly novel. A replication stream by definition announces changes to data; simply listening to it gives the listener the full story.

Through a certain lens, a read replica is just a cache.

a year ago

oulu2006

It's quite a common pattern, we did the same thing years ago using RMQ to achieve eventually consistent cache invalidation.

a year ago

aynyc

Does it support some form of checkpointing without Flink? I would like to run debezium without Flink.

a year ago

jakehilborn

Yes. Postgres stores the checkpoint in the source database's replication slot. You can configure Debezium Engine to write its checkpoints to /dev/null, which disables the default behavior of persisting them to an external system.
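
For illustration, a minimal sketch of the relevant Debezium Engine properties, assuming the stock Kafka Connect `FileOffsetBackingStore` (check the property names against your Debezium version):

```properties
# Use the file-based offset store that ships with Kafka Connect.
offset.storage=org.apache.kafka.connect.storage.FileOffsetBackingStore
# Point it at /dev/null: offsets are discarded on write and absent on
# restart, so the Postgres connector falls back to the confirmed LSN
# stored in the source database's replication slot.
offset.storage.file.filename=/dev/null
```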

a year ago

aynyc

Any documentation on this? We run an assortment of outbox patterns for data distribution, and I don't want to manage that in the data producers anymore.

a year ago

gunnarmorling

Debezium relies on Kafka Connect's `OffsetBackingStore` contract [1] for persisting and restoring offset positions (that is subject to being abstracted away from Connect's SPIs at some point). While acknowledged LSNs are indeed committed to the source database in the case of Postgres, the offset store is still needed in most cases, e.g. to record whether an initial snapshot has been completed yet.

> I don’t want to manage that in data producers anymore.

Could you elaborate a bit? The outbox pattern typically _is_ implemented in the source applications, so I'm curious what alternative you have in mind. Stream processing, e.g. with Flink, can actually be an alternative, creating external data representations after the fact, though challenges arise, e.g. around transactional consistency.

[1] https://debezium.io/documentation/reference/stable/developme...

a year ago

aynyc

Our current outbox patterns are implemented by the producers using two-phase commit. Rather than writing code for that, I want to move the outbox into another component. But we don't want to use a stream; we want a queue. Essentially: listen to the WAL, and write any relevant data to a queue for downstream consumers. Kafka/Kinesis is not in our stack right now, so I want to run a standalone Debezium Server on an AWS Fargate/ECS service and write to FIFO-SNS.
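
A hedged sketch of what a standalone Debezium Server configuration for this could look like. Note the assumptions: as far as I know Debezium Server has no built-in SNS sink, so the `http` sink here would need a small bridge to FIFO-SNS on your side, and all hostnames, table names, and URLs are placeholders:

```properties
# Source: Postgres logical decoding via the built-in pgoutput plugin.
debezium.source.connector.class=io.debezium.connector.postgresql.PostgresConnector
debezium.source.database.hostname=db.internal
debezium.source.database.port=5432
debezium.source.database.user=debezium
debezium.source.database.dbname=app
debezium.source.topic.prefix=app
debezium.source.plugin.name=pgoutput
# Only the tables whose rows should reach downstream consumers:
debezium.source.table.include.list=public.orders

# Sink: Debezium Server ships sinks for Kinesis, Pub/Sub, HTTP, etc.;
# an HTTP endpoint (or a custom sink) would forward events to FIFO-SNS.
debezium.sink.type=http
debezium.sink.http.url=https://internal-bridge.example/events
```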

a year ago

gunnarmorling

Author here: yes, you can use Debezium without Flink, via Kafka Connect. Alternatively, if you're not using Kafka but Kinesis, Apache Pulsar, etc., via Debezium Server (I discuss the different options in this slide deck: https://speakerdeck.com/gunnarmorling/change-data-streaming-...).

a year ago

cryptonector

NOTIFY also can be used this way.

a year ago

semiquaver

Correct me if I'm wrong, but even though NOTIFY is transactional, it isn't persisted to the WAL or anywhere else, so any clients that weren't LISTENing at the time NOTIFY was called cannot receive it retroactively. The WAL is, well, a log, which has nicer properties for many applications: clients can "fall behind" or recover from transient network disconnects without losing data.
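
To make the contrast concrete, a small SQL sketch (channel name and payload are made up):

```sql
-- NOTIFY is delivered only to sessions that are LISTENing right now;
-- it is not written to the WAL and cannot be replayed later.
NOTIFY cache_invalidation, 'customers:42';

-- A logical decoding message IS written to the WAL (with the first
-- argument true it is emitted on commit), so a consumer reading from
-- a replication slot can pick it up even after a disconnect.
SELECT pg_logical_emit_message(true, 'cache_invalidation', 'customers:42');
```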

a year ago

klysm

NOTIFY is really hard to use correctly, and I think logical replication works better in most cases.

a year ago

paulddraper

That's so cool.

a year ago