How to set up a highly available PostgreSQL cluster using Patroni and HAProxy?


This post was originally published on my blog, askvikram.com.

Introduction

PostgreSQL is an open-source relational database that runs on all major operating systems. It is highly robust and versatile, but it doesn't ship with built-in high-availability features. However, other open-source tools can be used to manage high availability for a PostgreSQL database.

In this tutorial, you'll set up PostgreSQL with high availability using Patroni, ETCD and HAProxy.

Patroni is a template, developed in Python, for creating your own customized high-availability solution. It is also capable of handling database replication, backup and restoration configurations.

ETCD is a fault-tolerant, distributed key-value store that is used to store the state of the PostgreSQL cluster. It gracefully handles leader elections during network partitions and can tolerate machine failure, even in the leader node.

Whenever there is a change in the state of any PostgreSQL node in the cluster, Patroni updates the state in the ETCD key-value store. Patroni uses this information to run leader elections and keep the cluster up and running.

With a high-availability setup that has more than one database instance in a single cluster, the cluster's master node changes whenever there is a failure, which makes it difficult to maintain stable database endpoints. To solve this, HAProxy can be used. It keeps track of changes in the master/slave roles and connects clients to the current master node when they request a connection.

When you’re finished, you will have a robust and highly available PostgreSQL cluster ready for production use.

Prerequisites

Before you begin this guide, you’ll need the following:

  • Create 5 EC2 instances in AWS by following the guide How to launch an EC2 instance in AWS.
  • On all five nodes, update the list of upgradeable packages using sudo apt update
  • On all five nodes, upgrade the packages to the latest versions using sudo apt upgrade
  • Install PostgreSQL on three of your servers by following the Install PostgreSQL on Ubuntu 20.04 guide. Follow only Step 1 of the PostgreSQL installation tutorial; the other steps should be ignored. These three servers are referred to as node-1, node-2 and node-3 in this tutorial.
  • Ensure Python 3 is available on the three servers where PostgreSQL is installed by following the Install Python guide.
  • Reserve two servers for installing etcd and HAProxy. We will refer to these servers as node-4 and node-5 respectively.

Step 1 — Configuring Your Node Firewalls

You need to configure the security groups of your servers to restrict the incoming and outgoing traffic to the instances.

You can learn how to create and edit security groups in this working with security groups guide.

Now, let’s see how to configure the security groups for your nodes.

Configuring Ports for PostgreSQL

Create a rule to allow inbound traffic from the nodes (node-1, node-2, node-3, node-5) via port 8008.

  • Patroni’s REST API uses port 8008 to provide the health checks and cluster messaging between the participating nodes.

Create a rule to allow inbound traffic from the nodes (node-1, node-2, node-3, node-5) via port 5432.

  • PostgreSQL listens on port 5432 for database transactions and data replication.

Apply the security group to the PostgreSQL nodes (node-1, node-2, node-3).

  • This allows the PostgreSQL nodes to communicate with each other for creating replicas.

Apply the security group to the HAProxy node (node-5).

  • This allows the HAProxy node to communicate with the active master PostgreSQL node in the cluster.

Configuring Ports for ETCD

Create a rule to allow inbound traffic from the nodes (node-1, node-2, node-3) via port 2379.

  • ETCD uses port 2379 for client requests. Clients communicate with ETCD over this port to read and write information, e.g. database connection details, current leader node information, node status, etc.

Apply the security group to the ETCD node (node-4).

  • This allows the ETCD node to communicate with the Postgres client nodes in the cluster.

Configuring Ports for HAProxy

Create a rule to allow inbound traffic from all IPv4 and IPv6 addresses via port 7000.

  • HAProxy uses port 7000 to serve client requests for database transactions. If you already know your client IP address, you can use only that IP address in the source. This ensures that only traffic from your client is served by HAProxy and other requests are dropped.

Apply the security group to the HAProxy node (node-5).

  • This allows the HAProxy node to serve requests for database transactions.
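
If you manage your security groups with the AWS CLI rather than the console, the rules above can be expressed roughly as follows. This is a minimal sketch; the security group IDs (sg-...) are placeholders, and you must replace them with the groups you created for the Postgres nodes, the ETCD node and the HAProxy node.

# Patroni REST API (port 8008) between the Postgres nodes and HAProxy
aws ec2 authorize-security-group-ingress \
    --group-id sg-postgres-placeholder \
    --protocol tcp --port 8008 \
    --source-group sg-postgres-placeholder

# PostgreSQL transactions and replication (port 5432)
aws ec2 authorize-security-group-ingress \
    --group-id sg-postgres-placeholder \
    --protocol tcp --port 5432 \
    --source-group sg-postgres-placeholder

# ETCD client requests (port 2379) from the Postgres nodes
aws ec2 authorize-security-group-ingress \
    --group-id sg-etcd-placeholder \
    --protocol tcp --port 2379 \
    --source-group sg-postgres-placeholder

# HAProxy dashboard and client traffic (port 7000) from anywhere
aws ec2 authorize-security-group-ingress \
    --group-id sg-haproxy-placeholder \
    --protocol tcp --port 7000 \
    --cidr 0.0.0.0/0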

The firewall configuration for the highly available PostgreSQL cluster is complete. You can now move on and start setting up the cluster.

Step 2 — Stopping Postgres Service

When PostgreSQL is installed, it automatically starts as a system service. You should stop this PostgreSQL service so that Patroni can take care of running the PostgreSQL service.

The systemctl command is used to manage the systemd services. Use stop with systemctl to stop the system service.

Execute the following command to stop the PostgreSQL service.

sudo systemctl stop postgresql

Note: You should execute this command on all three nodes (node-1, node-2, node-3) where PostgreSQL is installed.

Now the PostgreSQL service is stopped on all nodes (node-1, node-2, node-3).
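
You can verify that the service has stopped by checking its status with systemctl; the output should report the service as inactive.

sudo systemctl status postgresql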

You can now install Patroni so that it takes charge of running the PostgreSQL service as required.

Step 3 — Installing Patroni

In this step, you'll install Patroni with pip3, the Python package manager.

During installation, Patroni uses utilities that come installed with PostgreSQL, located in the /usr/lib/postgresql/12/bin directory by default.

Hence you need to create symbolic links in the PATH to ensure that Patroni can find the utilities during the installation. The ln command with the -s option is used to create symbolic links.

Execute the below command to create the symbolic link.

sudo ln -s /usr/lib/postgresql/12/bin/* /usr/sbin/

Make sure you replace the PostgreSQL version in the path according to the version installed.
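
You can verify that the links were created by inspecting one of them. For example, psql under /usr/sbin should now point back into the PostgreSQL bin directory:

ls -l /usr/sbin/psql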

The pip3 install command is used to install additional Python packages.

Execute the following command to install Patroni along with its dependencies psycopg2-binary and python-etcd.

sudo pip3 install patroni psycopg2-binary python-etcd

Note: You should execute this command on all three nodes (node-1, node-2, node-3) where PostgreSQL is installed, so that the PostgreSQL configuration can be handled by Patroni.
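
You can verify the installation by printing the installed Patroni version:

patroni --version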

Now, you can install etcd to handle the distributed cluster.

Step 4 — Installing Etcd

In this step, you will install etcd using the apt install command.

The apt install command is used to install packages with all the necessary dependencies.

Execute the following command to install etcd on node-4, the server reserved for etcd.

sudo apt install etcd
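
Once the installation finishes, you can verify it by printing the etcd version:

etcd --version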

etcd is installed successfully. Now, you can install HAProxy that provides high availability.

Step 5 — Installing HAProxy

HAProxy is free, open source software that provides a high availability load balancer and proxy server for TCP and HTTP-based applications that spreads requests across multiple servers.

The apt install command is used to install packages with all the necessary dependencies.

Execute the following command to install HAProxy on node-5, the server reserved for HAProxy:

sudo apt install haproxy
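
Once the installation finishes, you can verify it by printing the HAProxy version:

haproxy -v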

HAProxy is installed successfully.

Now all the installations are complete on the relevant servers, and you'll start configuring etcd, Patroni and HAProxy.

Step 6 — Configuring Etcd

etcd is a fault-tolerant, distributed key-value store that is used to store the state of the postgres cluster. It gracefully handles leader elections in the cluster and can tolerate machine failure, even in the leader node.

You've installed etcd in Step 4.

Now, you will configure etcd to handle leader elections in the highly available cluster and to store the state of the Postgres cluster.

Through Patroni, all of the Postgres nodes make use of etcd to keep the Postgres cluster up and running.

During the installation of etcd, a default configuration file is created at /etc/default/etcd.

The nano editor is used to edit the file. Use sudo nano to open the file in edit mode. If you do not use sudo, nano will open the file in read-only mode.

Execute the below command to open and update the configuration file.

sudo nano /etc/default/etcd

nano opens the file.

Now the etcd default configuration file is open, with all parameters commented out. Look for each of the below parameters, uncomment them, and update the settings with the relevant etcd server IP address as given below.

/etc/default/etcd

ETCD_LISTEN_PEER_URLS="http://node-4-server-ip:2380"
ETCD_LISTEN_CLIENT_URLS="http://localhost:2379,http://node-4-server-ip:2379"
ETCD_INITIAL_ADVERTISE_PEER_URLS="http://node-4-server-ip:2380"
ETCD_INITIAL_CLUSTER="default=http://node-4-server-ip:2380,"
ETCD_ADVERTISE_CLIENT_URLS="http://node-4-server-ip:2379"
ETCD_INITIAL_CLUSTER_TOKEN="etcd-cluster"
ETCD_INITIAL_CLUSTER_STATE="new"  

Save the changes to the file and exit the nano editor.

You can learn more about etcd configuration parameters in the official etcd page. Here’s what each line in this file is for:

  • ETCD_LISTEN_PEER_URLS - This flag tells etcd to accept incoming requests from its peers on the specified scheme://IP:port combinations.
  • ETCD_LISTEN_CLIENT_URLS - This flag tells etcd to accept incoming requests from clients on the specified scheme://IP:port combinations.
  • ETCD_INITIAL_ADVERTISE_PEER_URLS - This flag specifies the list of this member's peer URLs to advertise to the rest of the cluster. These addresses are used for communicating etcd data around the cluster.
    At least one must be routable to all cluster members. These URLs can contain domain names as well.
  • ETCD_INITIAL_CLUSTER - This is the initial cluster configuration for bootstrapping. The key is the value of the --name flag for each node provided.
  • ETCD_ADVERTISE_CLIENT_URLS - This flag specifies the list of this member's client URLs to advertise to the rest of the cluster. These URLs can contain domain names as well.
  • ETCD_INITIAL_CLUSTER_TOKEN - This flag specifies the initial cluster token for the etcd cluster during bootstrap. The default token name is "etcd-cluster".
  • ETCD_INITIAL_CLUSTER_STATE - This flag denotes the initial cluster state ("new" or "existing"). Set it to new for all members present during initial static or DNS bootstrapping.

Now, you need to restart etcd for the configuration changes to be effective.

The systemctl command is used to manage the systemd services. Use restart with systemctl to restart the system service.

Execute the following command to restart the etcd service.

sudo systemctl restart etcd

Now etcd is running with the updated configurations.

Use enable with systemctl to enable automatic start of etcd service after every system reboot.

Execute the following command to enable automatic start of etcd service after system reboot.

sudo systemctl enable etcd

You can use status with systemctl to check the status of the system service.

Execute the following command to check the status of the etcd service.

sudo systemctl status etcd

You will see the below messages if the etcd configuration is successful.

etcd.service - etcd - highly-available key value store
     Loaded: loaded (/lib/systemd/system/etcd.service; enabled; vendor preset: enabled)
     Active: active (running) since Thu 2020-08-27 14:03:57 UTC; 11h ago
       Docs: https://github.com/coreos/etcd
             man:etcd
   Main PID: 14786 (etcd)
      Tasks: 9 (limit: 1137)
     Memory: 61.2M
     CGroup: /system.slice/etcd.service
             └─14786 /usr/bin/etcd
Aug 27 14:03:57 do-04 etcd[14786]: 8e9e05c52164694d received MsgVoteResp from 8e9e05c52164694d at term 3
Aug 27 14:03:57 do-04 etcd[14786]: 8e9e05c52164694d became leader at term 3
Aug 27 14:03:57 do-04 etcd[14786]: raft.node: 8e9e05c52164694d elected leader 8e9e05c52164694d at term 3
Aug 27 14:03:57 do-04 etcd[14786]: published {Name:do-04 ClientURLs:[http://157.245.111.222:2379]} to cluster cdf818194e3a8c32
Aug 27 14:03:57 do-04 systemd[1]: Started etcd - highly-available key value store.
Aug 27 14:03:57 do-04 etcd[14786]: ready to serve client requests
Aug 27 16:59:14 do-04 etcd[14786]: sync duration of 1.165829808s, expected less than 1s
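
You can also query etcd directly to confirm it is serving client requests. A minimal check with etcdctl, assuming the client URL configured above:

etcdctl --endpoints http://node-4-server-ip:2379 member list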

You’ve configured ETCD to maintain the state of the PostgreSQL nodes. Next, you’ll configure Patroni which can update the state of the PostgreSQL nodes to the ETCD key-value store.

Step 7 — Configuring Patroni

Patroni is a Python package used to handle PostgreSQL configuration in high availability clusters. You've already installed Patroni in Step 3 on node-1, node-2 and node-3.

Now, you will configure Patroni using a YAML file in the /etc/patroni/ directory to handle the PostgreSQL service on node-1. A default YAML file is available at the official Patroni GitHub URL.

Create a directory patroni inside the /etc/ folder using the mkdir command as below.

sudo mkdir /etc/patroni

You should navigate to the /etc/patroni/ directory to place the YAML file in that location. The cd command can be used to navigate to the specified directory.

Execute the following command to navigate to the /etc/patroni/ directory.

cd /etc/patroni/

Now, your current working directory is /etc/patroni/.

Next, you need to copy the raw default YAML file from GitHub to the /etc/patroni/ directory.

The curl tool is used to download data from a server.

Execute the below command to copy the YAML file from GitHub to your server.

sudo curl -o config.yml raw.githubusercontent.com/zalando/patroni/m..

The -o option in the curl command copies the file with the filename specified in the command. Here it creates a file named config.yml.

Now, you need to update the config.yml file with the right configuration.

Execute the below command to open and update the configuration file.

sudo nano config.yml

Update the highlighted parameters in the config.yml.

/etc/patroni/config.yml

scope: postgres
namespace: /db/
name: node1

restapi:
  listen: node-1-server-ip:8008
  connect_address: node-1-server-ip:8008
#  certfile: /etc/ssl/certs/ssl-cert-snakeoil.pem
#  keyfile: /etc/ssl/private/ssl-cert-snakeoil.key
#  authentication:
#    username: username
#    password: password

# ctl:
#   insecure: false # Allow connections to SSL sites without certs
#   certfile: /etc/ssl/certs/ssl-cert-snakeoil.pem
#   cacert: /etc/ssl/certs/ssl-cacert-snakeoil.pem

etcd:
  # Provide host to do the initial discovery of the cluster topology:
  host: node-4-server-ip:2379
  # Or use "hosts" to provide multiple endpoints
  # Could be a comma separated string:
  # hosts: host1:port1,host2:port2
  # or an actual yaml list:
  # hosts:
  # - host1:port1
  # - host2:port2
  # Once discovery is complete Patroni will use the list of advertised clientURLs
  # It is possible to change this behavior by setting:
  # use_proxies: true

# raft:
#   data_dir: .
#   self_addr: 127.0.0.1:2222
#   partner_addrs:
#   - 127.0.0.1:2223
#   - 127.0.0.1:2224

bootstrap:
  # This section will be written into etcd:/<namespace>/<scope>/config after initializing
  # a new cluster, and all other cluster members will use it as a global configuration
  dcs:
    ttl: 30
    loop_wait: 10
    retry_timeout: 10
    maximum_lag_on_failover: 1048576
    master_start_timeout: 300
    synchronous_mode: false
    # standby_cluster:
    #   host: 127.0.0.1
    #   port: 1111
    #   primary_slot_name: patroni
    postgresql:
      use_pg_rewind: true
      use_slots: true
      parameters:
        wal_level: hot_standby
        hot_standby: "on"
        wal_keep_segments: 8
        max_wal_senders: 10
        max_replication_slots: 10
        wal_log_hints: "on"
        archive_mode: "on"
        archive_timeout: 1800s
        archive_command: mkdir -p ../wal_archive && test ! -f ../wal_archive/%f && cp %p ../wal_archive/%f
      recovery_conf:
        restore_command: cp ../wal_archive/%f %p

  # Some desired options for 'initdb'
  initdb:  # Note: It needs to be a list (some options need values, others are switches)
  - encoding: UTF8
  - data-checksums

  pg_hba:  # Add following lines to pg_hba.conf after running 'initdb'
  # For kerberos gss based connectivity (discard @.*$)
  # - host replication replicator 127.0.0.1/32 gss include_realm=0
  # - host all all 0.0.0.0/0 gss include_realm=0
  - host replication replicator 127.0.0.1/32 md5
  - host replication replicator node-1-server-ip/0 md5
  - host replication replicator node-2-server-ip/0 md5
  - host replication replicator node-3-server-ip/0 md5
  - host all all 0.0.0.0/0 md5
  # - hostssl all all 0.0.0.0/0 md5

  # Additional script to be launched after initial cluster creation (will be passed the connection URL as parameter)
  # post_init: /usr/local/bin/setup_cluster.sh

  # Some additional users which need to be created after initializing new cluster
  users:
    admin:
      password: admin
      options:
        - createrole
        - createdb

postgresql:
  listen: node-1-server-ip:5432
  connect_address: node-1-server-ip:5432
  data_dir: /data/patroni
#  bin_dir:
#  config_dir:
  pgpass: /tmp/pgpass0
  authentication:
    replication:
      username: replicator
      password: rep-pass
    superuser:
      username: postgres
      password: zalando
    rewind:  # Has no effect on postgres 10 and lower
      username: rewind_user
      password: rewind_password
  # Server side kerberos spn
  # krbsrvname: postgres
  parameters:
    # Fully qualified kerberos ticket file for the running user
    # same as KRB5CCNAME used by the GSS
    # krb_server_keyfile: /var/spool/keytabs/postgres
    unix_socket_directories: '.'

# watchdog:
#   mode: automatic # Allowed values: off, automatic, required
#   device: /dev/watchdog
#   safety_margin: 5

tags:
  nofailover: false
  noloadbalance: false
  clonefrom: false
  nosync: false

You can learn more about Patroni configuration parameters on the official Patroni docs page. Here's what the settings in the different sections of this file are for.

Global section:

  • scope - name of the highly available Postgres cluster.
  • namespace - path within the configuration store where Patroni will keep information about the cluster. Default value: "/service".
  • name - the name of the host. Must be unique within the cluster, e.g. node1 on the first server, node2 on the second server and so on.

RestAPI section:

  • listen - IP address (or hostname) and port that Patroni will listen on for the REST API, which provides health checks and cluster messaging between the participating nodes.
  • connect_address - IP address (or hostname) and port through which Patroni's REST API can be accessed.

etcd section:

  • host - the etcd endpoint for the cluster.

Bootstrap configuration section:

  • pg_hba - list of lines that you should add to pg_hba.conf. Client authentication is controlled by this pg_hba.conf file. You can read more about this file on the official PostgreSQL page.
    You will add lines for the three nodes so that they can authenticate to each other during replication between the servers.

postgresql section:

  • listen - IP address + port that Postgres listens on. It must be accessible from the other nodes in the cluster.
  • connect_address - IP address + port through which Postgres is accessible from other nodes and applications.
  • data_dir - the location of the Postgres data directory, either existing or to be initialized by Patroni.
  • authentication - this section specifies the usernames and passwords for the superuser and the replication user.
    The superuser is a user that has no access restrictions in the PostgreSQL database; Patroni uses it to connect to the database.
    The replication user has permission to perform replication between the master and slave nodes; replicas use it to access the master via streaming replication.
    You can specify the passwords for the superuser and the replication user in their respective password properties. Use strong passwords for both users, replacing the sample values in the configuration file.

Now, save the changes to the file and exit the nano editor.

Next, you need to configure the data directory.

In the Patroni configuration file, the data directory is specified as /data/patroni. The superuser (postgres) mentioned in the configuration file should be able to write to this directory.

Create a directory patroni inside the /data/ folder using the mkdir command as below.

sudo mkdir -p /data/patroni

Now, make the superuser postgres the owner of /data/patroni.

sudo chown postgres:postgres /data/patroni

To prevent anybody else from changing this directory, change its permissions so that it is accessible only to the postgres user (the owner of the directory).

sudo chmod 700 /data/patroni

Next, you need to create a systemd unit file that will allow you to start, stop and monitor Patroni. You can learn more about systemd units and unit files in this tutorial.

You need to create a file /etc/systemd/system/patroni.service using the nano command.

sudo nano /etc/systemd/system/patroni.service

Update the file with the below contents.

/etc/systemd/system/patroni.service

[Unit]
Description=High availability PostgreSQL Cluster
After=syslog.target network.target

[Service]
Type=simple
User=postgres
Group=postgres
ExecStart=/usr/local/bin/patroni /etc/patroni/config.yml
KillMode=process
TimeoutSec=30
Restart=no

[Install]
WantedBy=multi-user.target

Save the changes to the file and exit the nano editor.

Now, you need to start Patroni so that it can handle the Postgres database.

The systemctl command is used to manage the systemd services.

Use start with systemctl to start the system service.

Execute the following command to start Patroni.

sudo systemctl start patroni

Now Patroni is running, and it's handling the PostgreSQL database.

Use enable with systemctl to enable automatic start of Patroni service after every system reboot.

Execute the following command to enable automatic start of Patroni service after system reboot.

sudo systemctl enable patroni

You can use status with systemctl to check the status of the system service.

Execute the following command to check the status of the Patroni service.

sudo systemctl status patroni

You will see the below messages if the Patroni configuration is successful.

● patroni.service - High availability PostgreSQL Cluster
     Loaded: loaded (/etc/systemd/system/patroni.service; disabled; vendor preset: enabled)
     Active: active (running) since Fri 2020-08-28 00:41:53 UTC; 1 day 3h ago
   Main PID: 3045 (patroni)
      Tasks: 14 (limit: 1137)
     Memory: 76.4M
     CGroup: /system.slice/patroni.service
             ├─3045 /usr/bin/python3 /usr/local/bin/patroni /etc/patroni/config.yml
             ├─3071 postgres -D /data/patroni --config-file=/data/patroni/postgresql.conf --listen_addresses=128.199.30.42 --port=5432 --cluster_name=postgres --wal_level=replica --hot_standby=on --max_connections=100 --max_wal_senders=>
             ├─3076 postgres: postgres: checkpointer
             ├─3077 postgres: postgres: background writer
             ├─3078 postgres: postgres: stats collector
             ├─3083 postgres: postgres: postgres postgres 128.199.30.42(51028) idle
             ├─3183 postgres: postgres: walwriter
             ├─3184 postgres: postgres: autovacuum launcher
             ├─3185 postgres: postgres: logical replication launcher
             └─3191 postgres: postgres: walsender replicator 128.199.30.45(58144) streaming 0/5000550

Aug 29 04:19:41 do-01 patroni[3045]: 2020-08-29 04:19:41,453 INFO: Lock owner: node1; I am node1
Aug 29 04:19:41 do-01 patroni[3045]: 2020-08-29 04:19:41,458 INFO: no action. i am the leader with the lock
Aug 29 04:19:51 do-01 patroni[3045]: 2020-08-29 04:19:51,453 INFO: Lock owner: node1; I am node1
Aug 29 04:19:51 do-01 patroni[3045]: 2020-08-29 04:19:51,458 INFO: no action. i am the leader with the lock
Aug 29 04:20:01 do-01 patroni[3045]: 2020-08-29 04:20:01,453 INFO: Lock owner: node1; I am node1
Aug 29 04:20:01 do-01 patroni[3045]: 2020-08-29 04:20:01,459 INFO: no action. i am the leader with the lock
Aug 29 04:20:11 do-01 patroni[3045]: 2020-08-29 04:20:11,453 INFO: Lock owner: node1; I am node1
Aug 29 04:20:11 do-01 patroni[3045]: 2020-08-29 04:20:11,458 INFO: no action. i am the leader with the lock
Aug 29 04:20:21 do-01 patroni[3045]: 2020-08-29 04:20:21,453 INFO: Lock owner: node1; I am node1
Aug 29 04:20:21 do-01 patroni[3045]: 2020-08-29 04:20:21,460 INFO: no action. i am the leader with the lock

Configuring Patroni on the first server is complete.

You need to follow the same steps on the other two servers (node-2 and node-3) where PostgreSQL is installed.

The specific sections to be updated are:

  • Node name
  • IP address of the respective node in the restapi section and the postgresql section.

Once you complete the Patroni configuration on node-2 and node-3, these nodes will join the cluster and follow the leader, node-1.

You can use status with systemctl to check the status of Patroni on node-2.

Execute the following command to check the status of the Patroni service.

sudo systemctl status patroni

You will see the below messages if node-2 has joined the cluster successfully.

● patroni.service - High availability PostgreSQL Cluster
     Loaded: loaded (/etc/systemd/system/patroni.service; disabled; vendor preset: enabled)
     Active: active (running) since Thu 2020-08-27 14:05:32 UTC; 1 day 14h ago
   Main PID: 983 (patroni)
      Tasks: 12 (limit: 1137)
     Memory: 101.8M
     CGroup: /system.slice/patroni.service
             ├─ 983 /usr/bin/python3 /usr/local/bin/patroni /etc/patroni/config.yml
             ├─3617 postgres -D /data/patroni --config-file=/data/patroni/postgresql.conf --listen_addresses=128.199.30.45 --port=5432 --cluster_name=postgres --wal_level=replica --hot_standby=on --max_connections=100 --max_wal_senders=>
             ├─3619 postgres: postgres: startup   recovering 000000030000000000000005
             ├─3620 postgres: postgres: checkpointer
             ├─3621 postgres: postgres: background writer
             ├─3622 postgres: postgres: stats collector
             ├─3623 postgres: postgres: walreceiver   streaming 0/5000550
             └─3627 postgres: postgres: postgres postgres 128.199.30.45(57556) idle

Aug 29 04:23:11 do-02 patroni[983]: 2020-08-29 04:23:11,437 INFO: no action. i am a secondary and i am following a leader
Aug 29 04:23:21 do-02 patroni[983]: 2020-08-29 04:23:21,432 INFO: Lock owner: node1; I am node2
Aug 29 04:23:21 do-02 patroni[983]: 2020-08-29 04:23:21,433 INFO: does not have lock
Aug 29 04:23:21 do-02 patroni[983]: 2020-08-29 04:23:21,437 INFO: no action. i am a secondary and i am following a leader
Aug 29 04:23:31 do-02 patroni[983]: 2020-08-29 04:23:31,433 INFO: Lock owner: node1; I am node2
Aug 29 04:23:31 do-02 patroni[983]: 2020-08-29 04:23:31,433 INFO: does not have lock
Aug 29 04:23:31 do-02 patroni[983]: 2020-08-29 04:23:31,439 INFO: no action. i am a secondary and i am following a leader
Aug 29 04:23:41 do-02 patroni[983]: 2020-08-29 04:23:41,434 INFO: Lock owner: node1; I am node2
Aug 29 04:23:41 do-02 patroni[983]: 2020-08-29 04:23:41,434 INFO: does not have lock
Aug 29 04:23:41 do-02 patroni[983]: 2020-08-29 04:23:41,439 INFO: no action. i am a secondary and i am following a leader

The PostgreSQL cluster is up and running.
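
You can also inspect the cluster state with patronictl, the command-line tool that ships with Patroni. A minimal check from any of the Postgres nodes, using the configuration file path from this tutorial, lists each member with its host, role (Leader/Replica) and state:

patronictl -c /etc/patroni/config.yml list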

Now, you'll configure HAProxy, which will be used to connect to the master PostgreSQL node.

Step 8 — Configuring HAProxy


You can use HAProxy to connect to the master node of the configured PostgreSQL cluster. All Postgres clients (your applications, psql, etc.) will connect to HAProxy, which will make sure you connect to the master of the cluster. You installed HAProxy on node-5 in Step 5 of this tutorial.

During the installation of HAProxy, a default haproxy.cfg file is created at /etc/haproxy/haproxy.cfg.

Execute the below command to open and update the configuration file.

sudo nano /etc/haproxy/haproxy.cfg

Update the file with the following content.

/etc/haproxy/haproxy.cfg

global
    maxconn 100

defaults
    log global
    mode tcp
    retries 2
    timeout client 30m
    timeout connect 4s
    timeout server 30m
    timeout check 5s

listen stats
    mode http
    bind *:7000
    stats enable
    stats uri /

listen postgres
    bind *:5000
    option httpchk
    http-check expect status 200
    default-server inter 3s fall 3 rise 2 on-marked-down shutdown-sessions
    server postgresql_node-1-server-ip_5432 node-1-server-ip:5432 maxconn 100 check port 8008
    server postgresql_node-2-server-ip_5432 node-2-server-ip:5432 maxconn 100 check port 8008
    server postgresql_node-3-server-ip_5432 node-3-server-ip:5432 maxconn 100 check port 8008

In the listen postgres section, you'll update the IPs of the Postgres servers, which HAProxy uses to connect to the Postgres master server. You can learn more about HAProxy configuration parameters on the official HAProxy docs page.

Save the changes to the file and exit the nano editor.
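
Before restarting HAProxy, you can optionally validate the syntax of the updated file using HAProxy's built-in check mode:

sudo haproxy -c -f /etc/haproxy/haproxy.cfg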

Now, you need to restart HAProxy for handling the high availability with the updated settings.

The systemctl command is used to manage the systemd services. Use restart with systemctl to restart the system service.

Execute the following command to restart HAProxy.

sudo systemctl restart haproxy

Use enable with systemctl to enable automatic start of HAProxy service after every system reboot.

Execute the following command to enable automatic start of HAProxy service after system reboot.

sudo systemctl enable haproxy

Now HAProxy is running, and it's serving the PostgreSQL cluster with high availability.

Next, you can test the highly available PostgreSQL cluster.

Step 9 — Testing the Setup

You can test the highly available cluster using the HAProxy dashboard.

In your preferred web browser, enter node-5-server-ip:7000 to open the HAProxy dashboard, which looks like the image below.

HAproxy dashboard page

In the postgres section, the green highlighted line denotes the Postgres node which is currently acting as the master. Now, shut down node-1 or stop the Patroni service on node-1, and you'll see another node from the cluster become the master. Refresh the HAProxy dashboard page and you'll see that node-2 has become the master, as shown below.

HAproxy dashboard page2

When you restart node-1 or start its Patroni service, node-1 will rejoin the cluster, sync and follow the leader, node-2.

You now have a highly available PostgreSQL cluster up and running.

Your applications and Postgres clients can connect to this highly available cluster using the public IP address of node-5 (where HAProxy is installed) on port 5000.
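
For example, a connection with psql might look like the following, assuming the superuser credentials from the Patroni configuration file:

psql -h node-5-server-ip -p 5000 -U postgres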

Conclusion

In this article, you have set up a robust and highly available PostgreSQL cluster.

Now you can improve it further by:

  • adding a larger etcd cluster to improve availability
  • adding another HAProxy server and configuring IP failover to create a highly available HAProxy setup

How to Install PostgreSQL on Ubuntu?

You can install PostgreSQL on Ubuntu by using the command: sudo apt install postgresql postgresql-contrib

What Is High Availability in PostgreSQL?

High availability in PostgreSQL is a setup where PostgreSQL runs on more than one machine, and each machine acts as a node in the cluster. If the leader fails, one of the available slave nodes is automatically elected as the leader. Hence, you can build systems without any database interruption.

Does PostgreSQL Support Clustering?

No, PostgreSQL doesn't support clustering natively. However, it can be clustered using etcd and Patroni.

Error: Patroni Failed to Get List of Machines From

This error occurs when Patroni is not able to communicate with your etcd machine. Ensure the ports are correctly configured (explained in Step 1 of this tutorial).

System ID Mismatch, Node Belongs to a Different Cluster - After Reboot

Ensure name, namespace and scope are correctly configured in the Patroni configuration file (explained in Step 7 of this tutorial).

