How to Set Up a PostgreSQL Database on Amazon RDS

PostgreSQL is an open source object-relational database system that uses the SQL language for interactions and maintenance. It has been proven to be a highly scalable database solution because it allows you to manage terabytes of data and can handle many concurrent users. PostgreSQL is also ACID-compliant to ensure validity in the event of errors.

You should opt for a PostgreSQL database when you need to create relational databases, where data is organized into one or more tables of columns and rows using a unique key identifying each row. The rows in one table are usually linked to rows in other tables with that unique key.

While there are many places to host your PostgreSQL database, this article focuses specifically on hosting your database on Amazon Relational Database Service (Amazon RDS). Amazon RDS allows for setup of relational databases in the AWS cloud and it’s very easy to use.

If you’re just getting started with all the AWS services, hopefully you’ll find this helpful! For the majority of this article, I will go through the PostgreSQL setup on AWS and then I go over how to do some other useful database operations. If you already have your database set up, feel free to skip around! ⏭⏭

What is covered in this article:

  • How to create a PostgreSQL database in Amazon RDS
  • How to connect to a PostgreSQL database externally
  • How to create a table in a database using SQL
  • How to insert data into a table in a database using SQL
  • How to delete a table in a database using SQL

Creating a PostgreSQL database in Amazon RDS

To create your Postgres database in AWS, first sign into your account and search RDS in the AWS Management Console. Click Databases on the left sidebar and you will be taken to a page that looks like this.

RDS Databases View

Click the orange Create database button on the top right and now you can get started on selecting options for your new database.

Choose a database creation method

Select Standard Create so you can customize the database configuration.

Select engine options

We’re creating a Postgres database so select PostgreSQL and leave the default version 11.5-R1.

Select a template for your use case

I went with the free tier since we’re testing!

Customize Settings

You can either change the DB instance identifier and Master username or leave them as defaults. I changed mine to test-db and and left the default username postgres. You can also either have Amazon RDS generate a password for you, or you can specify your own. In this case, I specified my own.

DB instance size

The DB instance class determines the computation and memory capacity of an Amazon RDS DB instance. The DB instance class you need depends on your processing power and memory requirements (Amazon documentation)

This is where you want to select an instance that meets your memory requirements. For the free tier, we only have Burstable classes as an option and I left this part on the default db.t2.micro.

Storage size

This is where you’ll specify the storage requirements for your database. You also have the option to enable storage autoscaling so your database is dynamic and scales automatically 🙂

Again, I just left the defaults, for storage typeallocated storagemaximum storage threshold, and left enable storage autoscaling checked ✅

Availability & Durability

This option is to create a standby database in case of an outage so that your database is always up and functioning. The option is unavailable on the free tier.

Connectivity

This is where you’ll specify the following connectivity configurations:

  • VPC — the VPC that defines the virtual networking environment for this DB instance
  • Subnet group — the subnet group that defines which subnets and IP ranges the DB instance can use
  • Public accessibility — whether the database is publicly accessible
  • VPC security group — the security groups that will allow incoming traffic outside your VPC
  • Availability zone — improves high availability by isolating failures from other Availability Zones
  • Database port — the TCP/IP port that the DB instance will use for application connections

I left all defaults except for the Publicly accessible section, which I changed to Yes so that my database will be accessible outside of the VPC, like from a Jupyter notebook.

Database authentication

In this section you can choose how you’d like to authenticate your database credentials from the following options:

  • Password authentication — Manage your database user credentials through your DB engine’s native password authentication features.
  • Password and IAM database authentication — Manage your database user credentials through your DB engine’s native password authentication features and IAM users and roles.
  • Password and Kerberos authentication  Manage your database user credentials through your DB engine’s native password authentication features and an AWS Managed Microsoft AD created with AWS Directory Service. This way, you can centrally store and manage user credentials, instead of individually for each DB instance.

Let’s go with Password authentication.

Create Database

Finally, click create database on the bottom to start creating your database and you should see the following messages on your page. It will take a couple minutes!

Connecting to a PostgreSQL database in AWS RDS

Download the Jupyter notebook to follow allong

We’ll be using the psycopg2 library to connect to the postgres database locally in a Jupyter notebook (you can also connect from your IDE or terminal). Go ahead and install psycopg2 using !pip install psycopg2-binary and then you can run the following code to connect.

NOTE: For POSTGRES_DBNAME, it never actually works when I put the real database name, in this case I named it test-db, and I have to put postgresinstead. If it ends up not working with your given database name, try the default postgres and it will probably work.

import psycopg2 as ps
# define credentials 
credentials = {'POSTGRES_ADDRESS' : '', # change to your endpoint
               'POSTGRES_PORT' : '', # change to your port
               'POSTGRES_USERNAME' : '', # change to your username
               'POSTGRES_PASSWORD' : '', # change to your password
               'POSTGRES_DBNAME' : ''} # change to your db name
# create connection and cursor    
conn = ps.connect(host=credentials['POSTGRES_ADDRESS'],
                  database=credentials['POSTGRES_DBNAME'],
                  user=credentials['POSTGRES_USERNAME'],
                  password=credentials['POSTGRES_PASSWORD'],
                  port=credentials['POSTGRES_PORT'])cur = conn.cursor()

If you don’t get an error, you’ve successfully connected to your postgres database! 🎊

Creating a table in a Postgres database

Now that you’ve successfully created a postgres database, let’s get a table in there.

First, verify what is in your database — there should be nothing. You can check using the following code which returns an empty list.

query = """SELECT * FROM pg_catalog.pg_tables
            WHERE schemaname != 'pg_catalog'
            AND schemaname != 'information_schema';"""
cur.execute(query)
cur.fetchall()

Note: if you skipped parts of this article, make sure you’ve connected to the database first (see above) 😀

You can create your first table by running the following code:

cur.execute("""CREATE TABLE table_1
                (column_1 integer, 
                column_2 float,
                column_3 varchar(50),
                column_4 boolean);""")
# Commit table creation
conn.commit()

This creates a table named table_1 with 4columns. I’ve included some of the common SQL datatypes so you can change those up to the datatypes for your table.

If we run the code above that checks for tables in our database, we get the following response where we can see that our table_1 is now in the database:

[(‘public’, ‘table_1’, ‘postgres’, None, False, False, False, False)]

Bonus: if you want to systematically create a bunch of tables given a list of table names, you can use the following code:

table_list = [] # a list of string table names

for table_name in table_list:
    cur.execute("""CREATE TABLE {table_name}
                (column_1 float,
                column_2 float,
                column_3 float,
                column_4 float);""".format(table_name=table_name))

# Commit table creation
conn.commit()

Insert Data Into a Table in a PostgreSQL Database

Now that you have a table in your postgres database, you can get started on inputting your data.

To insert one line of data, run the following code:

data = [1, 2.2, 'three', True]
insert_query = """INSERT INTO table_8
                   (column_1, column_2, column_3, column_4)
                   VALUES (%s, %s, %s, %s);"""
# execute insert
cur.execute(insert_query, data)
            
# commit data insert
conn.commit()

I put my data into a list, making sure that the datatypes match whatever was selected for the columns in my table. Make sure to customize the table you’re creating to match the datatypes that you’ll be inserting. Then I wrote my query to insert that list into the table, executed it, and commited that to the table. If you don’t commit the execution, your data will not save to the table.

Verify that the data was saved by fetching the data from the table:

The select statement returns the data that we inserted so we know it worked!

To insert many lines of data into the table, run the following code:

data = [[5, 5.5, 'five', True], [5, 5.5, 'five', True], [5, 5.5, 'five', True]]

insert_query = """INSERT INTO table_1
                   (column_1, column_2, column_3, column_4)
                   VALUES (%s, %s, %s, %s);"""

# execute multiple inserts
cur.executemany(insert_query, data)
            
# commit data insert
conn.commit()

The difference here is that the variable data is a list of lists containing the data to be inserted. We also use cur.executemany instead of cur.execute to iterate through our data list and insert all of the inputs.

Again you can verify that three new rows of data were inserted using the simple select statement:

Now you can see that the select statement returns the 3 new rows that were added using the previous insert statement.

Delete/Drop a Table in SQL

You’ll eventually want to delete this testing table and start your real work — or you probably messed something up like I always do 🙃

Delete or drop a table in a postgres database using the SQL command DROP TABLE:

cur.execute("""DROP TABLE table_1""")
conn.commit()

Make sure you actually commit otherwise your table won’t get deleted! Verify the tables that your database contains using the code we previously used to check.


That’s it! Now you can set up your own PostgreSQL database on AWS, create tables, add data, and drop any tables you don’t need. If you follow the tutorials and catch any errors, please let me know so I can fix it!

You can also manage your PostgreSQL using a better user interface. I recommend pgAdmin, which can be downloaded for free here.

This article will eventually become part of a series of articles from lessons I learned with my team while working on a project called Cryptolytic (article will be written about this soon and linked here). It’s the guide we wish we had when working on the project.

The notebook containing all the code used in this article can be found here which is inside the repo of our Cryptolytic project — so if you’re curious, check it out!

Find me on twitter @elizabethets or connect with me on LinkedIn!


Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s