January 16, 2016

AWS – Amazon Relational Database Services (RDS)

Amazon RDS is a service that makes it easy to set up, operate, and scale a relational database. Amazon RDS supports the following database engines:


  • Amazon Aurora – this is a MySQL-compatible engine, which means that if you are familiar with MySQL, this works much the same way. AWS claims it is up to 5 times faster than MySQL. It isn't free, but it costs roughly a tenth of the price of commercial alternatives such as Oracle.
  • Oracle
  • Microsoft SQL Server
  • PostgreSQL
  • MySQL
  • MariaDB


Fully Managed service

With Amazon RDS, you get a “fully managed” service. A fully managed service is a service where:

  • Your DB software of choice is preinstalled and configured for you.
  • You cannot access the operating system (e.g. via SSH) that the database is running on. Instead, AWS maintains the OS for you, e.g. installing OS security patches.
  • RDS automatically takes regular DB backups for you – these are point-in-time snapshots. You can also take manual backups via your DB's CLI. The automated backups are incremental.
  • RDS automatically upgrades your DB software for you; you can choose to enable automatic minor upgrades, major upgrades, or both.
  • The underlying instance that provides the service automatically scales up/down, so that you don't have to worry about it.
  • It has built-in fault tolerance.
  • It has built-in high availability.
  • High scalability – e.g. you can add extra CPU cores, RAM, or disk space while the DB is running.
  • Automatic recovery if there is a failure.
  • You can enable “multi-availability-zone deployments” with a single click.
  • You can make use of read replicas. A read replica is a copy of the main DB that is kept in sync with the main DB (but not in real time), and its main purpose is to respond to read-related queries. This eases the burden on the main DB. Read replicas are available for:
    • MySQL
    • PostgreSQL
    • Aurora



Multi-Availability-Zone deployments

Multi-AZ is a disaster-recovery feature, i.e. if a DB fails, there is another DB in standby mode that AWS will automatically fail over to. There are side benefits of Multi-AZ too, e.g. performing DB upgrades with zero downtime.

In an RDS Multi-AZ deployment you have 2 databases. The first is the active DB that acts as your application's main DB. The second is in a different availability zone and acts as a passive DB, aka standby DB. The passive DB is kept in sync with the active DB in real time, which is made possible because each DB (SQL) transaction is applied to both the active and the passive DB simultaneously. This setup provides the following benefits:

  • You can change your disk types with no downtime.
  • You can change your RDS instance type with no downtime.
  • This feature is really important for building highly available, fault-tolerant systems – because the passive DB acts as a failover DB, which becomes active if the main DB fails for any reason.
  • You can initiate a manual failover if needed for any reason.
  • Backups are taken against the passive DB, so that the active DB's performance is not affected and there is no downtime.
  • You can do OS updates or DB software updates with no downtime – this happens via the following steps:
    • 1. Fail over to the passive DB.
    • 2. Take the primary DB offline, and perform OS/DB-software updates on the primary instance.
    • 3. Replicate the passive DB's data back to the primary DB.
    • 4. Switch back to the primary DB.
    • 5. Take the passive DB offline, and perform OS/DB-software updates on the passive instance.
    • 6. Get the passive DB to sync up with the primary DB again.
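As a sketch, assuming the AWS CLI is configured and a DB instance named `mydb` already exists (both hypothetical), you can enable Multi-AZ on a live instance and trigger a manual failover like this:

```shell
# Convert an existing single-AZ instance to Multi-AZ
# (--apply-immediately avoids waiting for the next maintenance window)
aws rds modify-db-instance \
    --db-instance-identifier mydb \
    --multi-az \
    --apply-immediately

# Initiate a manual failover by rebooting with --force-failover;
# RDS promotes the standby in the other AZ to become the active DB
aws rds reboot-db-instance \
    --db-instance-identifier mydb \
    --force-failover
```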

Here are a few things to remember about this setup:

  • The primary DB and its passive DB(s) are in the same region (to minimise latency), but in different availability zones (to maximise fault tolerance/high availability).
  • You don't have to worry about managing your passive DBs; AWS automatically manages all this for you behind the scenes.
  • If the main DB fails, AWS straight away updates the DB's “CNAME” to point to the passive DB's endpoint. Hence the failover appears seamless, with minimal downtime, to the end user.
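You can see this CNAME mechanism for yourself by resolving the endpoint from a shell (the hostname below is a made-up example of an RDS endpoint):

```shell
# The RDS endpoint is a CNAME; dig shows what it currently resolves to.
# After a failover, the same name resolves to the standby's address.
dig +short mydb.abc123xyz.us-east-1.rds.amazonaws.com
```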


RDS Underlying instance

When creating an RDS instance:

  • You have a large choice of instance types to choose from.
  • You can choose a disk size ranging from 5GB to 3TB.
  • You can choose the disk type: General Purpose (burstable) SSD or Provisioned IOPS (both backed by EBS).
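A minimal sketch of these choices using the AWS CLI (the identifier, credentials, and sizes are all hypothetical):

```shell
# Create a PostgreSQL instance: pick an instance type, a disk size (GB),
# and a storage type (gp2 = general purpose SSD, io1 = Provisioned IOPS)
aws rds create-db-instance \
    --db-instance-identifier mydb \
    --db-instance-class db.m3.medium \
    --engine postgres \
    --allocated-storage 100 \
    --storage-type gp2 \
    --master-username masteruser \
    --master-user-password 'S3cretPassw0rd'
```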


RDS backups

There are 2 types of RDS backups:

  • Automated backups
  • Database Snapshots

Whether or not you are using an RDS Multi-AZ deployment, backups are still made.

In both cases, when you restore (aka create) a DB from an existing backup, the new DB ends up with a new endpoint.


Automated Backups

These are performed daily by AWS automatically. You can specify a retention period for this type of backup, which can range from 1 day to 35 days. If you set the retention period to 35 days, it lets you ‘rewind’ your DB to any point in time within the last 35 days. The default retention period is 7 days. These backups are full daily backups, and transaction logs are also captured throughout the day.
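The retention period is just an instance setting, so it can be changed on a live DB (instance name hypothetical):

```shell
# Raise the automated-backup retention period to the 35-day maximum;
# setting it to 0 would disable automated backups entirely
aws rds modify-db-instance \
    --db-instance-identifier mydb \
    --backup-retention-period 35 \
    --apply-immediately
```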

Some key facts:

  • All the automated backups get deleted when you delete the database itself.
  • All backups are stored in S3 for free.
  • When doing a point-in-time restore, you don't select a snapshot to restore with; instead you choose the DB itself and select point-in-time restore under actions. RDS will then choose the appropriate snapshot, along with transaction logs, to restore from.
  • For backups to occur, it's recommended that your DB uses a transactional DB engine, e.g. for MySQL, use InnoDB. This type of engine essentially logs every SQL CRUD query into log files. This means that once you have restored a snapshot, you can then rewind to a particular transaction (and consequently point in time) within the snapshot itself.
  • Backups are taken during a particular window – e.g. 3am. You can choose when this window is.
  • Backups are performed without stopping the DB; however, SQL queries are paused to allow the backup to take place. This can cause slightly higher latency while the backup is occurring.
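A point-in-time restore can also be done from the CLI; note that it always creates a brand-new instance (with a new endpoint). The names and timestamp below are hypothetical:

```shell
# Restore the state of mydb as it was at the given time
# into a new instance called mydb-restored
aws rds restore-db-instance-to-point-in-time \
    --source-db-instance-identifier mydb \
    --target-db-instance-identifier mydb-restored \
    --restore-time 2016-01-15T03:30:00Z
```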


Database Snapshots

These are backups created manually. They persist even after the DB is deleted. Hence you have to delete them manually if you no longer need them.
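A sketch of the manual snapshot lifecycle (all identifiers hypothetical):

```shell
# Take a manual snapshot of the instance
aws rds create-db-snapshot \
    --db-instance-identifier mydb \
    --db-snapshot-identifier mydb-before-upgrade

# Restore it later into a new instance (which gets a new endpoint)
aws rds restore-db-instance-from-db-snapshot \
    --db-instance-identifier mydb-restored \
    --db-snapshot-identifier mydb-before-upgrade

# Manual snapshots outlive the DB, so delete them yourself when done
aws rds delete-db-snapshot \
    --db-snapshot-identifier mydb-before-upgrade
```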





Read Replicas

Unlike Multi-AZ, read replicas are used for scaling.

Read replicas are secondary DBs that support the main DB, in the sense that they can be used to respond to read-only SQL queries. Each write query is applied to the main DB first, and the resulting changes are then applied to the replica DBs. This means replication is asynchronous, rather than synchronous (i.e. simultaneous). A read replica's main purpose is to reduce the load on the primary DB instance.

Here are the key points about read replicas:

  • Only MySQL, PostgreSQL, Aurora, and MariaDB offer a read replica setup.
  • Read replicas can be created from other read replicas, although this causes even more lag. Hence it's best to create read replicas directly from the primary DB.
  • Each main DB can have multiple read replicas, which can be in different AZs. This makes read replicas ideally suited to applications that do DB “reading” most of the time and comparatively little writing, e.g. the Wikipedia website. I.e. we can use read replicas to effectively scale out read activity. Each DB can have a maximum of 5 read replicas associated with it.
  • You can monitor the replication lag between the main DB and its read replicas using CloudWatch.
  • Read replicas are only compatible with DBs that use a transactional DB engine, e.g. you have to use InnoDB with MySQL.
  • You can use read replicas to feed data to data warehouses, rather than the main DB feeding this data, hence minimising the main DB's burden.
  • You can promote a read replica to become the primary DB.
  • You must have automated backups enabled to use read replicas.
  • Additional benefits that apply to MySQL only:
    • You can create read replicas in different regions, which is useful for high availability and reduced lag. You can also do this with MariaDB.
    • You can create a read replica of a DB that is actually outside of AWS, e.g. an on-premise DB server.
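Creating a read replica is a one-liner with the AWS CLI (identifiers and AZ hypothetical):

```shell
# Create a read replica of mydb in a different AZ of the same region
aws rds create-db-instance-read-replica \
    --db-instance-identifier mydb-replica-1 \
    --source-db-instance-identifier mydb \
    --availability-zone us-east-1b
```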

When to use read replicas:

  • Use them when the application has a lot of read requests compared to write requests.
  • Give various systems, e.g. a data warehouse, their own dedicated read replica to pull data from.
  • When importing/exporting data between a database outside AWS and a read replica; you can then promote this read replica to become the primary DB.
  • If you want to do DB re-indexing: do this on a read replica, then promote it to become the main DB, and demote the main DB to a read replica.
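The promotion step used in the last two scenarios looks like this (replica name hypothetical); note that promotion stops replication and makes the replica a standalone, writable DB:

```shell
# Promote the replica; it stops replicating and starts accepting writes
aws rds promote-read-replica \
    --db-instance-identifier mydb-replica-1
```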


Monitoring RDS instances

There are 2 ways to monitor your RDS setup:

  • SNS notifications
  • Cloudwatch

SNS Notifications

You can easily configure RDS to link up with the SNS service, so that you get notifications when any of the following occurs:

  • snapshots
  • parameter group changes
  • option changes
  • Security Group changes
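A hedged sketch of wiring this up with the CLI, assuming an SNS topic already exists (the topic ARN and names are hypothetical):

```shell
# Subscribe to failover and backup events for all DB instances;
# notifications are delivered via the given SNS topic
aws rds create-event-subscription \
    --subscription-name mydb-events \
    --sns-topic-arn arn:aws:sns:us-east-1:123456789012:rds-alerts \
    --source-type db-instance \
    --event-categories failover backup
```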


Cloudwatch

Hardware and performance are monitored via CloudWatch. For a normal EC2 instance, we need to install a few scripts on the instance to get access to the fuller range of CloudWatch monitors. However, with RDS all these scripts are already set up for you. This means we can use CloudWatch to monitor/track:

  • CPU usage
  • RAM usage
  • Freeable memory (possible with the preinstalled scripts)
  • Swap usage
  • Disk usage
  • Read/write IOPS
  • DB connections
  • Read replica latency
  • Read/write throughput

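For example, the CPU usage metric can be pulled from CloudWatch like this (instance name and time range hypothetical):

```shell
# Average CPUUtilization for mydb over one hour, in 5-minute buckets
aws cloudwatch get-metric-statistics \
    --namespace AWS/RDS \
    --metric-name CPUUtilization \
    --dimensions Name=DBInstanceIdentifier,Value=mydb \
    --start-time 2016-01-16T00:00:00Z \
    --end-time 2016-01-16T01:00:00Z \
    --period 300 \
    --statistics Average
```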


Subnet Groups

As part of creating an RDS instance, you have to specify which VPC it should be created in. If you enable Multi-AZ deployment (or create read replicas in other AZs), then before creating the RDS instance you first need to ensure your VPC has subnets attached to it which in turn belong to different AZs.

However, you might have several subnets across different AZs and not want RDS to use all of them. In this situation you need to apply a restriction, which is done by creating a “subnet group”.

Now you can create an RDS DB and specify which VPC, followed by which group of subnets (i.e. subnet group), you want RDS to have access to.
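A subnet group is just a named list of subnets (the subnet IDs below are hypothetical):

```shell
# Group two subnets from different AZs into a subnet group
aws rds create-db-subnet-group \
    --db-subnet-group-name mydb-subnets \
    --db-subnet-group-description "Subnets for mydb" \
    --subnet-ids subnet-aaaa1111 subnet-bbbb2222
```

You would then pass `--db-subnet-group-name mydb-subnets` when running `aws rds create-db-instance`, restricting RDS to those subnets.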



The drawbacks of RDS compared to EC2 based databases

  • You have less control and freedom, e.g. you can't SSH into the instance.
  • You can't use some of your DB's features, e.g. setting up MySQL clusters.


The RDS creation process


Here I selected PostgreSQL.

Notice that we can enable the Multi-AZ Deployment feature simply by ticking the relevant checkbox (yes/no).

You also have a long list of instance types to choose from.

The storage type is either general SSD, Provisioned IOPS, or magnetic. If we end up choosing the wrong storage type, it can easily be switched to another (if we have Multi-AZ enabled) with only a couple of minutes of downtime at most.

After filling in the remaining settings, you click on the launch instance button.

Notice that an IP address isn't displayed. Instead we have an endpoint. This is the URL we use to connect to the DB from a remote EC2 instance. It is unique to our DB, and is referred to as a CNAME. You can give this name a nicer alias via Route 53.

Now we are ready to connect to this DB.



Connecting to RDS db from an EC2 instance

This is quite straightforward; there are a few things you need to ensure:

  • The security group attached to the RDS instance is listening on the DB's port, and that port accepts source connections from the EC2 instance's private IP address, security group, or subnet range. You just have to specify one of these 3 as the source.
  • The EC2 instance is in the same VPC as the RDS instance.
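The first point can be done with one CLI call; here allowing the EC2 instance's security group as the source (the group IDs and port assume a hypothetical PostgreSQL setup):

```shell
# Open the Postgres port (5432) on the RDS security group,
# accepting connections only from members of the EC2 security group
aws ec2 authorize-security-group-ingress \
    --group-id sg-11111111 \
    --protocol tcp \
    --port 5432 \
    --source-group sg-22222222
```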


Then you install the relevant DB client software and test your connection. Here is an example of what to do inside an EC2 instance:


# The following link was found on:
$ wget
$ yum localinstall pgdg-centos95-9.5-2.noarch.rpm
# the following is the client software only
$ yum install postgresql95-9.5.1   # Note: the server software is something like: postgresql95-server
$  psql --version
psql (PostgreSQL) 9.5.1
$ psql -h -U postgrescb1 -d cb1mydb
Password for user postgrescb1:
psql (9.5.1, server 9.4.5)
SSL connection (protocol: TLSv1.2, cipher: ECDHE-RSA-AES256-GCM-SHA384, bits: 256, compression: off)
Type "help" for help.

cb1mydb=> \l
                                      List of databases
   Name    |    Owner    | Encoding |   Collate   |    Ctype    |      Access privileges
 cb1mydb   | postgrescb1 | UTF8     | en_US.UTF-8 | en_US.UTF-8 |
 postgres  | postgrescb1 | UTF8     | en_US.UTF-8 | en_US.UTF-8 |
 rdsadmin  | rdsadmin    | UTF8     | en_US.UTF-8 | en_US.UTF-8 | rdsadmin=CTc/rdsadmin
 template0 | rdsadmin    | UTF8     | en_US.UTF-8 | en_US.UTF-8 | =c/rdsadmin                +
           |             |          |             |             | rdsadmin=CTc/rdsadmin
 template1 | postgrescb1 | UTF8     | en_US.UTF-8 | en_US.UTF-8 | =c/postgrescb1             +
           |             |          |             |             | postgrescb1=CTc/postgrescb1
(5 rows)

cb1mydb=> \du
                                  List of roles
   Role name   |                   Attributes                   |    Member of
 postgrescb1   | Create role, Create DB                        +| {rds_superuser}
               | Password valid until infinity                  |
 rds_superuser | Cannot login                                   | {}
 rdsadmin      | Superuser, Create role, Create DB, Replication+| {}
               | Password valid until infinity                  |
 rdsrepladmin  | No inheritance, Cannot login, Replication      | {}
cb1mydb=> \q