Replicate cloud AWS RDS MySQL to on-premise PostgreSQL in Docker – future is today! Debezium and Kafka on AWS EKS

Introduction

Nowadays applications have bigger requirements than older ones – many SaaS systems need to operate globally on all continents or in hybrid setups, sharing some data between private and public clouds. That requirement is really complicated, as many systems and users expect real-time responses – modern IT systems are supposed to be fast, efficient and provide zero waiting time. Looking at the pure speed of light (in an ideal environment, that's the speed of the signal running through cables), when some information needs to travel 10 000 km it will take around 30 ms, and that's a limit coming from physics (best regards, Einstein!). We all know it will take much longer because of delays in electronic components, routers, protocol overhead, traffic congestion, etc. How to deal with that?

As current IT systems are more about reads than writes (probably not in IoT, but that's a different story), there is a concept of replicating data from the "source of truth" and storing it near the client, which provides the following advantages:

  • data is near the client, so reads are fast
  • when the "source of truth" database is down, reads from the replica still work
  • when the connection with the "source of truth" is broken, reads from the replica still work
  • load on the "source of truth" can be reduced by directing all reads to the replica

In this article I will show you how to replicate changes happening in a database located in the AWS cloud to a database located on-prem. We can achieve that using Debezium, a tool which uses the CDC (change data capture) concept to stream changes onto Kafka, acting as a data producer. On the other side we can attach Kafka Connect, which can consume that stream and write it to the target database using JDBC. To make it more interesting and easier, I deployed Kafka and Debezium on Amazon EKS – the new managed Kubernetes service in AWS. See the architecture diagram:

(Architecture diagram: AWS RDS MySQL → Debezium and Kafka on AWS EKS → on-prem Kafka Connect JDBC sink → PostgreSQL in Docker)

As always, the full source code can be found on my GitHub:

https://github.com/jakubbujny/article-replicate-cloud-AWS-RDS-MySQL-to-on-premise-PostgreSQL-in-Docker-future-is-today

Disclaimer – security

The configuration shown in this article is definitely far away from the security standards required by production systems – e.g. no SSL/encryption, public access to services, unrestricted networking and weak passwords – so please keep in mind that it's more of a POC than an implementation you can copy/paste into a real/production system.

Configure AWS EKS using Terraform

There is a great tutorial describing EKS deployment using Terraform: https://www.terraform.io/docs/providers/aws/guides/eks-getting-started.html As EKS availability is limited for now, I chose the Ireland region and slightly modified the configuration from the tutorial.

The EKS master configuration is very easy and requires just a few blocks:


resource "aws_iam_role" "eks-cluster" {
  name = "eks-cluster"

  assume_role_policy = <<POLICY
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "eks.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
POLICY
}

resource "aws_iam_role_policy_attachment" "AmazonEKSClusterPolicy" {
  policy_arn = "arn:aws:iam::aws:policy/AmazonEKSClusterPolicy"
  role       = "${aws_iam_role.eks-cluster.name}"
}

resource "aws_iam_role_policy_attachment" "AmazonEKSServicePolicy" {
  policy_arn = "arn:aws:iam::aws:policy/AmazonEKSServicePolicy"
  role       = "${aws_iam_role.eks-cluster.name}"
}

resource "aws_eks_cluster" "eks" {
  name            = "eks"
  role_arn        = "${aws_iam_role.eks-cluster.arn}"

  vpc_config {
    security_group_ids = ["${aws_security_group.eks-cluster.id}"]
    subnet_ids         = ["${aws_subnet.eks_a.id}","${aws_subnet.eks_b.id}"]
  }

  depends_on = [
    "aws_iam_role_policy_attachment.AmazonEKSClusterPolicy",
    "aws_iam_role_policy_attachment.AmazonEKSServicePolicy",
  ]
}

Important notes:

aws_iam_role_policy_attachment – AWS delivers the proper policies for us, we only need to attach them to the role.

aws_eks_cluster – creates the EKS cluster (masters) and places it in the defined network. That operation may take 10-20 minutes, so be patient.

And that's almost all that is required to create the EKS masters – the last thing we need is a kubectl config to administer Kubernetes. To obtain the config we will use the following Terraform output:

locals {
  kubeconfig = <<KUBECONFIG
apiVersion: v1
clusters:
- cluster:
    server: ${aws_eks_cluster.eks.endpoint}
    certificate-authority-data: ${aws_eks_cluster.eks.certificate_authority.0.data}
  name: kubernetes
contexts:
- context:
    cluster: kubernetes
    user: aws
  name: aws
current-context: aws
kind: Config
preferences: {}
users:
- name: aws
  user:
    exec:
      apiVersion: client.authentication.k8s.io/v1alpha1
      command: aws-iam-authenticator
      args:
        - "token"
        - "-i"
        - "eks"
KUBECONFIG
}

output "kubeconfig" {
  value = "${local.kubeconfig}"
}

There is one important thing here – the aws-iam-authenticator command will be executed on your PC. That means you need to go to https://github.com/kubernetes-sigs/aws-iam-authenticator/releases, download the CLI and place it somewhere in your PATH, e.g. in /usr/bin under the name aws-iam-authenticator. After that, execute the following command to get your kubectl config:

 terraform output kubeconfig > kubeconfig.yaml
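
A minimal sketch of how this can look on a Linux PC (the install path and KUBECONFIG export are just one option):

# assumes the aws-iam-authenticator binary was already downloaded from the releases page above
chmod +x ./aws-iam-authenticator
sudo mv ./aws-iam-authenticator /usr/bin/aws-iam-authenticator

# point kubectl at the generated config and run a smoke test
export KUBECONFIG=$PWD/kubeconfig.yaml
kubectl get svc
kubectl get nodes   # nodes will show up only after the next section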

Adding nodes to the EKS cluster is a little more complex.

Again we attach the policies delivered by AWS to a role.

resource "aws_iam_role" "eks-node" {
  name = "eks-node"

  assume_role_policy = <<POLICY
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "ec2.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
POLICY
}

resource "aws_iam_role_policy_attachment" "eks-node-AmazonEKSWorkerNodePolicy" {
  policy_arn = "arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy"
  role       = "${aws_iam_role.eks-node.name}"
}

resource "aws_iam_role_policy_attachment" "eks-node-AmazonEKS_CNI_Policy" {
  policy_arn = "arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy"
  role       = "${aws_iam_role.eks-node.name}"
}

resource "aws_iam_role_policy_attachment" "eks-node-AmazonEC2ContainerRegistryReadOnly" {
  policy_arn = "arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly"
  role       = "${aws_iam_role.eks-node.name}"
}
resource "aws_iam_instance_profile" "eks-node" {
name = "eks-node"
role = "${aws_iam_role.eks-node.name}"
}

Next, create an Auto Scaling Group to bring up a Kubernetes node:

data "aws_ami" "eks-worker" {
  filter {
    name   = "name"
    values = ["amazon-eks-node-v*"]
  }

  most_recent = true
  owners      = ["602401143452"] # Amazon Account ID
}

locals {
  eks-node-userdata = <<USERDATA
#!/bin/bash
set -o xtrace
/etc/eks/bootstrap.sh --apiserver-endpoint '${aws_eks_cluster.eks.endpoint}' --b64-cluster-ca '${aws_eks_cluster.eks.certificate_authority.0.data}' 'eks'
USERDATA
}

resource "aws_launch_configuration" "node" {
  associate_public_ip_address = true
  iam_instance_profile        = "${aws_iam_instance_profile.eks-node.name}"
  image_id                    = "${data.aws_ami.eks-worker.id}"
  instance_type               = "t2.medium"
  name_prefix                 = "terraform-eks"
  security_groups             = ["${aws_security_group.eks-node.id}"]
  user_data_base64            = "${base64encode(local.eks-node-userdata)}"

  root_block_device {
    delete_on_termination = true
    volume_size = 20
    volume_type = "gp2"
  }

  lifecycle {
    create_before_destroy = true
  }
}

resource "aws_autoscaling_group" "node" {
  desired_capacity     = 1
  launch_configuration = "${aws_launch_configuration.node.id}"
  max_size             = 1
  min_size             = 1
  name                 = "eks"
  vpc_zone_identifier  = ["${aws_subnet.eks_a.id}"]

  tag {
    key                 = "Name"
    value               = "eks"
    propagate_at_launch = true
  }

  tag {
    key                 = "kubernetes.io/cluster/eks"
    value               = "owned"
    propagate_at_launch = true
  }
}

Important notes:

data.aws_ami.eks-worker – fetches the AMI created by AWS which can be used as an EKS node; AWS has already installed the proper scripts/configuration on it.

eks-node-userdata – the command in the instance user data, required by EKS to configure the node: kubelet, systemd service, etc.

instance_type – t2.medium should be enough to run Kafka (Zookeeper + broker) and Debezium.

root_block_device – we need at least a 20 GB disk, as the AMI itself is 20 GB.

aws_autoscaling_group – creates an ASG which will spawn one node for us.

As the last step we need to apply the following ConfigMap, generated by Terraform, to allow the nodes to join the cluster:

locals {
  config-map-aws-auth = <<CONFIGMAPAWSAUTH
apiVersion: v1
kind: ConfigMap
metadata:
  name: aws-auth
  namespace: kube-system
data:
  mapRoles: |
    - rolearn: ${aws_iam_role.eks-node.arn}
      username: system:node:{{EC2PrivateDNSName}}
      groups:
        - system:bootstrappers
        - system:nodes
CONFIGMAPAWSAUTH
}

output "config-map-aws-auth" {
  value = "${local.config-map-aws-auth}"
}

And create a service account with admin rights to use with the Kubernetes dashboard:

apiVersion: v1
kind: ServiceAccount
metadata:
  name: eks-admin
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRoleBinding
metadata:
  name: eks-admin
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin
subjects:
- kind: ServiceAccount
  name: eks-admin
  namespace: kube-system
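
Assuming the manifest above is saved as eks-admin.yaml, it can be applied with:

kubectl apply -f eks-admin.yaml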

To get the admin token which can be used to log in to the Kubernetes Dashboard, we can use the following command:

kubectl -n kube-system describe secret $(kubectl -n kube-system get secret | grep eks-admin | awk '{print $1}') | grep token

For the full source code of those configs please take a look at: https://github.com/jakubbujny/article-replicate-cloud-AWS-RDS-MySQL-to-on-premise-PostgreSQL-in-Docker-future-is-today/tree/master/aws

Configure AWS RDS using Terraform

We need to configure Amazon RDS as our "source of truth" – the database which we will replicate to the on-prem deployment. Debezium will read the binlog from MySQL to stream changes onto Kafka.

resource "aws_db_instance" "rds" {
  apply_immediately = true
  publicly_accessible = true
  skip_final_snapshot = true
  identifier = "mymysql"
  availability_zone = "${local.region}a"
  allocated_storage    = 5
  storage_type         = "gp2"
  engine               = "mysql"
  engine_version       = "5.7"
  instance_class       = "db.t2.micro"
  name                 = "mydb"
  username             = "mymysql"
  password             = "mysqlmysql"
  db_subnet_group_name = "${aws_db_subnet_group.rds.name}"
  vpc_security_group_ids = ["${aws_security_group.rds.id}"]
  backup_retention_period = 1
  parameter_group_name = "${aws_db_parameter_group.rds.name}"
}

resource "aws_db_parameter_group" "rds" {
  family = "mysql5.7"
  name = "rds"

  parameter {
    name = "binlog_format"
    value = "ROW"
  }
}

Important notes:

publicly_accessible – make RDS public to easily access the DB from our PC – we need to insert some data.

db_subnet_group_name – we need to create a DB subnet group which contains at least 2 subnets, even when we create a single-AZ database – that's an AWS requirement coming from the possibility of promoting a single-AZ deployment to a multi-AZ one.

backup_retention_period – that's important – automatic backups enable binlog streaming, which is required by Debezium.

aws_db_parameter_group – we need to attach a custom parameter group to stream the binlog in ROW format, as required by Debezium (you can verify the setting as shown below).
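
Once the instance is up, you can double-check that the parameter group took effect, for example with the mysql client in Docker (the identifier mymysql matches the Terraform config above):

rds_endpoint_without_port=$(aws --region eu-west-1 rds describe-db-instances --db-instance-identifier mymysql --query "DBInstances[0].Endpoint.Address" --output text)
docker run --rm mysql mysql -h"$rds_endpoint_without_port" -umymysql -pmysqlmysql -e "SHOW VARIABLES LIKE 'binlog_format';"
# expected value: ROW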

Deploy Kafka and Zookeeper as StatefulSet on Kubernetes

It is a well-known fact that Kafka requires Zookeeper to work. We can deploy both as 2 different StatefulSets with AWS EBS volumes attached for data. As we want to keep this POC simple, we will just use one Kubernetes node and expose Kafka on a NodePort to allow the on-prem client to connect. For the Kafka deployment we will use the popular Docker images from Wurstmeister.

To allow dynamic provisioning of EBS volumes by Kubernetes we need to define the following StorageClass:

kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: ebs
provisioner: kubernetes.io/aws-ebs
parameters:
  type: gp2
  zones: eu-west-1a
  fsType: ext4

In the Zookeeper Dockerfile we can find the following 2 volumes:

VOLUME ["/opt/zookeeper-${ZOOKEEPER_VERSION}/conf", "/opt/zookeeper-${ZOOKEEPER_VERSION}/data"]

So we need to attach EBS volumes to those 2 paths to keep our state safe.

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: zookeeper
  labels:
    app: zookeeper
spec:
  selector:
    matchLabels:
      app: zookeeper
  serviceName: "zookeeper"
  replicas: 1
  template:
    metadata:
      labels:
        app: zookeeper
    spec:
      terminationGracePeriodSeconds: 30
      containers:
      - name: zookeeper
        image: wurstmeister/zookeeper:3.4.6
        ports:
        - containerPort: 2181
          name: zookeeper
        volumeMounts:
        - name: zookeeper-data-pv-claim
          mountPath: /opt/zookeeper-3.4.6/data
        - name: zookeeper-conf-pv-claim
          mountPath: /opt/zookeeper-3.4.6/conf
  volumeClaimTemplates:
    - metadata:
        name: zookeeper-data-pv-claim
      spec:
        accessModes: [ "ReadWriteOnce" ]
        storageClassName: ebs
        resources:
          requests:
            storage: 1Gi
    - metadata:
        name: zookeeper-conf-pv-claim
      spec:
        accessModes: [ "ReadWriteOnce" ]
        storageClassName: ebs
        resources:
          requests:
            storage: 1Gi

So by defining volumeClaimTemplates we tell Kubernetes to dynamically create EBS volumes using the AWS API, create an ext4 file system on them and mount them at the proper paths. All we need beyond that is a service which will allow access to Zookeeper inside the cluster.

apiVersion: v1
kind: Service
metadata:
  name: zookeeper
  labels:
    app: zookeeper
spec:
  ports:
  - port: 2181
    name: zookeeper
  clusterIP: None
  selector:
    app: zookeeper

clusterIP: None means that the service is a headless service, so Kubernetes will expose the pod's IP directly under the service's DNS record.
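
A quick sanity check that the dynamic EBS provisioning and the headless service behave as expected (busybox:1.28 is used here only because its nslookup output is readable):

# the PVCs from volumeClaimTemplates should be Bound to dynamically created EBS-backed PVs
kubectl get pvc,pv

# the headless service should resolve directly to the Zookeeper pod IP
kubectl run -it --rm dns-test --image=busybox:1.28 --restart=Never -- nslookup zookeeper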

The Kafka deployment is a little tricky because of the "advertised listener" – we need to put the node's public IP there, as it will be presented to clients. We cannot easily use e.g. the Kubernetes Downward API, because status.hostIP is the node's private IP, not accessible from the on-prem infrastructure. So to keep the POC simple we will just fetch the node's public IP by querying on the tag Name=eks via the AWS CLI:

aws --region eu-west-1 ec2 describe-instances --filters "Name=tag:Name,Values=eks" "Name=instance-state-name,Values=running" --query "Reservations[0].Instances[0].PublicIpAddress" --output text

and then inject it into the proper environment variable during deployment by sed-ing NODE_PUBLIC_IP in Kafka's yaml file using the following script:

kafka_file=$(cat kafka.yaml | sed --expression="s/NODE_PUBLIC_IP/$public_node_ip/g")
echo "$kafka_file" | kubectl apply -f -

That simple trick reads the Kafka deployment file from disk, applies sed on the fly and then applies the result to the Kubernetes cluster by sending the modified file to kubectl on STDIN.

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: kafka
spec:
  selector:
    matchLabels:
      app: kafka
  serviceName: "kafka"
  replicas: 1
  template:
    metadata:
      labels:
        app: kafka
    spec:
      terminationGracePeriodSeconds: 30
      containers:
      - name: kafka
        image: wurstmeister/kafka:2.11-2.0.0
        env:
        - name: KAFKA_ZOOKEEPER_CONNECT
          value: zookeeper:2181
        - name: KAFKA_ADVERTISED_LISTENERS
          value: PLAINTEXT://NODE_PUBLIC_IP:30002
        - name: KAFKA_LISTENERS
          value: PLAINTEXT://:9092
        ports:
        - containerPort: 9092
          name: kafka
        volumeMounts:
        - name: kafka-data-pv-claim
          mountPath: /kafka
  volumeClaimTemplates:
    - metadata:
        name: kafka-data-pv-claim
      spec:
        accessModes: [ "ReadWriteOnce" ]
        storageClassName: ebs
        resources:
          requests:
            storage: 1Gi

So again we attach an EBS disk for Kafka's data and we connect Kafka to Zookeeper using the service (zookeeper:2181). What we need next is a Kafka service so Debezium can access it locally for bootstrap, and we need to expose Kafka on a NodePort.

apiVersion: v1
kind: Service
metadata:
  name: kafka
  labels:
    app: kafka
spec:
  ports:
  - port: 9092
    name: kafka
  clusterIP: None
  selector:
    app: kafka
---

apiVersion: v1
kind: Service
metadata:
  name: kafka-node
  labels:
    app: kafka-node
spec:
  ports:
  - port: 9092
    targetPort: 9092
    name: kafka
    nodePort: 30002
  type: NodePort
  selector:
    app: kafka

Please note that we expose Kafka on the non-standard port 30002 due to the default NodePort range of 30000-32767.

After those operations there should be a working deployment of Kafka on Kubernetes, which can be tested by connecting from your local PC to node_public_ip:30002.
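
For example, the kafkacat Docker image can be used to list the broker metadata from outside the cluster (the image and tag are just one option):

# $public_node_ip is the EKS node's public IP fetched earlier via the AWS CLI
docker run --rm edenhill/kafkacat:1.5.0 -b $public_node_ip:30002 -L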

Deploy and configure Debezium on Kubernetes

Debezium is a type of Kafka connector, which means that it stores its whole state in Kafka, so we can deploy it as a Kubernetes Deployment – from the Kubernetes point of view it's a stateless application.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: debezium-connect-source
spec:
  selector:
    matchLabels:
      app: debezium-connect-source
  replicas: 1
  template:
    metadata:
      labels:
        app: debezium-connect-source
    spec:
      terminationGracePeriodSeconds: 30
      containers:
      - name: debezium-connect-source
        image: debezium/connect:0.8
        env:
        - name: BOOTSTRAP_SERVERS
          value: kafka:9092
        - name: GROUP_ID
          value: debezium-connect-source
        - name: CONFIG_STORAGE_TOPIC
          value: debezium-connect-source_config
        - name: OFFSET_STORAGE_TOPIC
          value: debezium-connect-source_offset
        ports:
        - containerPort: 8083
          name: dm-c-source

There are 3 important env variables which are perfectly described on Debezium's Docker Hub page:

GROUP_ID
This environment variable is required when running the Kafka Connect service. Set this to an ID that uniquely identifies the Kafka Connect cluster the service and its workers belong to.

CONFIG_STORAGE_TOPIC
This environment variable is required when running the Kafka Connect service. Set this to the name of the Kafka topic where the Kafka Connect services in the group store connector configurations. The topic must have a single partition and be highly replicated (e.g., 3x or more).

OFFSET_STORAGE_TOPIC
This environment variable is required when running the Kafka Connect service. Set this to the name of the Kafka topic where the Kafka Connect services in the group store connector offsets. The topic must have a large number of partitions (e.g., 25 or 50), be highly replicated (e.g., 3x or more) and should be configured for compaction.

The last step on Kubernetes is to expose Debezium on a NodePort so we can easily configure it from the PC.

apiVersion: v1
kind: Service
metadata:
  name: debezium-connect-source
  labels:
    app: debezium-connect-source
spec:
  ports:
  - port: 8083
    targetPort: 8083
    name: debezium-connect-source
    nodePort: 30001
  type: NodePort
  selector:
    app: debezium-connect-source

Now it's time to configure Debezium – this can be done using the web API exposed by Debezium, where we need to define a connector with the proper options. So first the configuration JSON:

{
  "name": "my-connector",
  "config": {
    "connector.class": "io.debezium.connector.mysql.MySqlConnector",
    "tasks.max": "1",
    "database.hostname": "mymysql.clmru7r0arad.eu-west-1.rds.amazonaws.com",
    "database.port": "3306",
    "database.user": "mymysql",
    "database.password": "mysqlmysql",
    "database.server.id": "184054",
    "database.server.name": "mydebezium",
    "database.whitelist": "mydb",
    "database.history.kafka.bootstrap.servers": "kafka:9092",
    "database.history.kafka.topic": "schema-changes.mydb",
    "transforms": "route",
    "transforms.route.type": "org.apache.kafka.connect.transforms.RegexRouter",
    "transforms.route.regex": "([^.]+)\\.([^.]+)\\.([^.]+)",
    "transforms.route.replacement": "$3"
  }
}

Important notes:

database.hostname – the RDS host created by Terraform.

database.whitelist – here we define which database we want to observe.

transforms.* – a deeper explanation of those lines can be found here: https://debezium.io/blog/2017/09/25/streaming-to-another-database/

A nice explanation of the parameters: https://debezium.io/docs/connectors/mysql/#example-configuration

Save that JSON to a source.json file and then execute the following command:

curl -i -X POST -H "Accept:application/json" -H "Content-Type:application/json"  -d @source.json http://$public_node_ip:30001/connectors/

where public_node_ip is the public IP of the EKS node. That curl configures the connector, so the stream of database changes should start.
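
You can verify that the connector was registered and is running via the standard Kafka Connect REST endpoints:

# list registered connectors and check the status of ours
curl -s http://$public_node_ip:30001/connectors
curl -s http://$public_node_ip:30001/connectors/my-connector/status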

Load data

We need to put some data into the AWS MySQL instance so it can be streamed to Kafka. To do that we can run the following SQL script:


 CREATE TABLE IF NOT EXISTS my_debezium_test (
  ID int NOT NULL,
  test varchar(255),
  PRIMARY KEY (ID)
);

INSERT INTO my_debezium_test VALUES(1, 'hello!');
INSERT INTO my_debezium_test VALUES(2, 'world!');
INSERT INTO my_debezium_test VALUES(3, 'it rox!!');

UPDATE my_debezium_test SET test = "my friend!" WHERE ID = 2;

DELETE FROM my_debezium_test WHERE ID = 1;

So we create the my_debezium_test table with the ID field as PK and then make some inserts, an update and a delete to observe what happens. Debezium should capture those changes and stream them to Kafka in real time.

Tip: you can use Docker to run the mysql client for that SQL:

sql=$(cat rds.sql)
docker run -i --rm mysql mysql -h${rds_endpoint_without_port} -umymysql -pmysqlmysql mydb <<< "$sql"
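
Before touching the on-prem side you can confirm that the change events really landed in Kafka by consuming the topic directly, e.g. again with kafkacat (the topic is named after the table thanks to the RegexRouter transform):

# each INSERT/UPDATE/DELETE from the script should show up as a JSON change event
docker run --rm edenhill/kafkacat:1.5.0 -b $public_node_ip:30002 -t my_debezium_test -C -e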

Start on-prem deployment

For a quick on-prem deployment we can use docker-compose, where we need to start a Kafka connector which will connect to Kafka in AWS, consume the change stream and push it to PostgreSQL.

version: '3'

volumes:
  postgres-data:

services:
  debezium:
    image: debezium/connect-jdbc:0.8
    build:
      context: .
    ports:
      - 8083:8083
    environment:
      - BOOTSTRAP_SERVERS=${PUBLIC_NODE_IP}:30002
      - GROUP_ID=debezium-connect-sink
      - CONFIG_STORAGE_TOPIC=debezium-connect-sink_config
      - OFFSET_STORAGE_TOPIC=debezium-connect-sink_offset

  postgres:
    image: postgres
    volumes:
      - postgres-data:/var/lib/postgresql/data

Important notes:

volumes/postgres-data – we define a Docker named volume to store the PostgreSQL data in.

BOOTSTRAP_SERVERS – again we need to point at Kafka using the EKS node's public IP, and also provide the configuration required by the Kafka Connect interface.

build – that line says that we need to build a custom Connect image, so let's take a look at it:

FROM debezium/connect:0.8
ENV KAFKA_CONNECT_JDBC_DIR=$KAFKA_CONNECT_PLUGINS_DIR/kafka-connect-jdbc

# Deploy PostgreSQL JDBC Driver
RUN cd /kafka/libs && curl -sO https://jdbc.postgresql.org/download/postgresql-42.1.4.jar

# Deploy Kafka Connect JDBC
RUN mkdir $KAFKA_CONNECT_JDBC_DIR && cd $KAFKA_CONNECT_JDBC_DIR &&\
	curl -sO http://packages.confluent.io/maven/io/confluent/kafka-connect-jdbc/3.3.0/kafka-connect-jdbc-3.3.0.jar

So we extend the debezium/connect image to add the PostgreSQL JDBC driver, so our connector can consume changes from Kafka and push them to PostgreSQL.

The next step is to start those containers and configure the JDBC sink using the following JSON config:

 {
  "name": "jdbc-sink",
  "config": {
    "connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
    "tasks.max": "1",
    "topics": "my_debezium_test",
    "connection.url": "jdbc:postgresql://postgres:5432/postgres?user=postgres&password=postgres",
    "transforms": "unwrap",
    "transforms.unwrap.type": "io.debezium.transforms.UnwrapFromEnvelope",
    "auto.create": "true",
    "insert.mode": "upsert",
    "pk.fields": "ID",
    "pk.mode": "record_value"
  }
}

Important notes:

topics – the name of the table/topic which will be consumed, created in AWS MySQL in the previous step.

transforms – explanation here: https://debezium.io/blog/2017/09/25/streaming-to-another-database/

pk.fields – the primary key field name, created in AWS MySQL in the previous step.
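
Assuming the JSON is saved as sink.json, the sink connector is registered the same way as the source one, this time against the locally exposed Kafka Connect on port 8083:

curl -i -X POST -H "Accept:application/json" -H "Content-Type:application/json" -d @sink.json http://localhost:8083/connectors/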

After that configuration, the captured changes should be pushed to the on-prem PostgreSQL database.
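
A simple way to check the result is to query PostgreSQL directly inside the compose stack, for example:

# the table and rows are created automatically by the JDBC sink (auto.create=true)
docker-compose exec postgres psql -U postgres -d postgres -c 'SELECT * FROM my_debezium_test;'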

Results

So our PostgreSQL should now contain the automatically created my_debezium_test table – but what about the data? There is a small surprise here. INSERT and UPDATE are perfectly replicated, but there is a problem with the DELETE operation, as it's currently dropped by the JDBC sink connector. There is an open pull request for that on GitHub: https://github.com/confluentinc/kafka-connect-jdbc/pull/282

I asked about this on my Twitter and received a response from Gunnar Morling that a workaround for that problem is coming in Debezium 0.9.x.

Edit:

Debezium guys decided to deliver that feature faster than expected! Nice!


As a conclusion I can say that the solution is not perfect and, in the case of pure database replication, there are some missing pieces, but in my opinion it's a very big step forward – replicating a database from MySQL to PostgreSQL in a producer/consumer pattern via a queue is something very innovative and amazing, and it defines a new way to create reliable and HA global applications distributed across the whole world.
