Importing data from a CVS file to a Amazon RDS PostgreSQL 12.2 using a SSH tunnel created in Python

In this post I will show you a simple python program that import a CVS file from a remote PC into a table of Amazon RDS PostgreSQL using a a SSH tunnel created by the same Python program.

I going to use an Amazon RDS PostgreSQL without a Public IP, for this reason I will use an Amazon EC2 instance as bastion host (Jump server).

Both Amazon RDS PostgreSQL and EC2 Instance use the same VPC (172.31.0.0/16) and the EC2 Instance can connect to Amazon RDS PostgreSQL using the Port 5432 and the python program will connect remotely to EC2 instance using a SSH tunnel.

Current Environment

Instance ID/ Db Instance ID Local IP Public IP Secure Group Enabled Port
EC2 Instance i-032f4a47077b4a2ad 172.31.82.38 3.85.36.103 SECURITY_TO_EC2_POSTGRESQL 22
Amazon Postg2ress RDS postmig 172.31.17.227 Not enabled ACCESS_TO_RDS_POSTGRESQL 5432
Local Workstation 127.0.0.1 x1.x2.x3.x4 5439

The local PC run a python program (version 3.75) that executes three things:

  • Create a SSH Tunnel from the laptop with final destination the Amazon RDS PostgreSQL
  • Create a connection to Amazon RDS PostgreSQL using the SSH tunnel.
  • Import a CSV file into Amazon RDS PostgreSQL.

1. I going to validate the rules used in Security Groups for the EC2 instance and Amazon RDS using AWS CLI 2.0.10.

a. Validate what is the active Security Group used by the EC2 Instance (i-032f4a47077b4a2ad)

aws ec2 describe-instances --instance-ids i-032f4a47077b4a2ad --query "Reservations[*].Instances[*].[InstanceId,SecurityGroups[*].GroupName |[*]]"
[
    [
        [
            "i-032f4a47077b4a2ad",
            [
                "SECURITY_TO_EC2_POSTGRESQL"
            ]
        ]
    ]
]

b. Validate the Security Group “SECURITY_TO_EC2_POSTGRESQL” grant access only to the PC (Public IP: x1.x2.x3.x4) using the port 22 (SSH)

aws ec2 describe-security-groups --group-names SECURITY_TO_EC2_POSTGRESQL --query "SecurityGroups[].[GroupName,IpPermissions[].[FromPort,IpRanges[]]]"
[
    [
        "SECURITY_TO_EC2_POSTGRESQL",
        [
            [
                22,
                [
                    {
                        "CidrIp": "x1.x2.x3.x4/32"
                    }
                ]
            ]
        ]
    ]
]

c. Validate what is the active Security Group used by the Amazon RDS DB (i-032f4a47077b4a2ad)

aws rds describe-db-instances --db-instance-identifier postmig  --query "DBInstances[*].VpcSecurityGroups"
[
    [
        {
            "VpcSecurityGroupId": "sg-0d30f4aeed419af97",
            "Status": "active"
        }
    ]
]

d. Validate the Security Group (Group ID sg-0d30f4aeed419af97) used by Amazon RDS DB only grant access trough the port 5432 to the EC2 Instance (located in the same VPC)

aws ec2 describe-security-groups --group-ids sg-0d30f4aeed419af97 --query "SecurityGroups[].[GroupName,GroupId,IpPermissions[].[FromPort,IpRanges[]]]"
 [
     [
         "ACCESS_TO_RDS_POSTGRESQL",
         "sg-0d30f4aeed419af97",
         [
             [
                 5432,
                 [
                     {
                         "CidrIp": "172.31.82.38/32"
                     }
                 ]
             ]
         ]
     ]
 ]

2. Check if the packages psycopg2 and sshtunnel were installed in python, if they were not , install them using pip (Python Package Installer)

C:\Users\user>pip freeze|findstr sshtunnel

C:\Users\user>pip freeze|findstr psycopg2

C:\Users\user>pip install psycopg2
Collecting psycopg2
  Using cached https://files.pythonhosted.org/packages/83/8d/bbb2ca983f3939066e8d104fe7a7b0fce1fd3d0f706ddb6d8a86bb33f5d
a/psycopg2-2.8.5-cp37-cp37m-win_amd64.whl
Installing collected packages: psycopg2
Successfully installed psycopg2-2.8.5
WARNING: You are using pip version 19.2.3, however version 20.1 is available.
You should consider upgrading via the 'python -m pip install --upgrade pip' command.

C:\Users\user>pip install sshtunnel
Collecting sshtunnel
  Using cached https://files.pythonhosted.org/packages/c5/5c/4b320d7ec4b0d5d4d6df1fdf66a5799625b3623d0ce4efe81719c6f8dfb
3/sshtunnel-0.1.5.tar.gz
Requirement already satisfied: paramiko>=1.15.2 in c:\users\user\appdata\local\programs\python\python37\lib\site-package
s (from sshtunnel) (2.7.1)
Requirement already satisfied: pynacl>=1.0.1 in c:\users\user\appdata\local\programs\python\python37\lib\site-packages (
from paramiko>=1.15.2->sshtunnel) (1.3.0)
Requirement already satisfied: bcrypt>=3.1.3 in c:\users\user\appdata\local\programs\python\python37\lib\site-packages (
from paramiko>=1.15.2->sshtunnel) (3.1.7)
Requirement already satisfied: cryptography>=2.5 in c:\users\user\appdata\local\programs\python\python37\lib\site-packag
es (from paramiko>=1.15.2->sshtunnel) (2.9.2)
Requirement already satisfied: six in c:\users\user\appdata\local\programs\python\python37\lib\site-packages (from pynac
l>=1.0.1->paramiko>=1.15.2->sshtunnel) (1.14.0)
Requirement already satisfied: cffi>=1.4.1 in c:\users\user\appdata\local\programs\python\python37\lib\site-packages (fr
om pynacl>=1.0.1->paramiko>=1.15.2->sshtunnel) (1.14.0)
Requirement already satisfied: pycparser in c:\users\user\appdata\local\programs\python\python37\lib\site-packages (from
 cffi>=1.4.1->pynacl>=1.0.1->paramiko>=1.15.2->sshtunnel) (2.20)
Installing collected packages: sshtunnel
  Running setup.py install for sshtunnel ... done
Successfully installed sshtunnel-0.1.5
WARNING: You are using pip version 19.2.3, however version 20.1 is available.
You should consider upgrading via the 'python -m pip install --upgrade pip' command.

3. Create a file dept.csv in the folder c:\python that will be imported in Amazon RSD PostgreSQL

DEPT_ID,DEPT_NAME
1,HR
2,IT
3,MARKETING


4. Explanation of the Python program

a. Import the package sshtunnel to create the sshtunnel and the package psycopg2 to connect to Postgress and execute different SQL statements.

from sshtunnel import SSHTunnelForwarder
import psycopg2


b. Initialize the parameters to create a ssh tunnel

remote_user = ‘dbadmin’ #==> user of EC2 instance
remote_host = ‘3.85.36.103’ #==> Public IP of EC2 instance
remote_port = 22 #==> Port used to connect to SSH
private_host = ‘postmig.c3qqnqe4hzdh.us-east-1.rds.amazonaws.com’ #==> is the Endpoint of Amazon RDS (Private IP)
private_port = 5432 #==> Enabled Port to connect to PostgreSQL
local_host = ‘127.0.0.1’ # ==> Local port of the PC run the python program
local_port = 5439 #==> The local port of PC


c. Create a SSH tunnel between that allow the connection among Local PC, the EC2 instance and Amazon RDS then start the SSH tunnel.

server = SSHTunnelForwarder(
(remote_host, remote_port),
ssh_username=remote_user,
ssh_password=’********’,
remote_bind_address=(private_host, private_port),
local_bind_address=(local_host, local_port),
) #Configure Tunnel

server.start() #Open Tunnel


d. Create a connection to Amazon RDS PostgreSQL using the ssh tunnel and create a cursor connection to execute SQL statements

conn = psycopg2.connect(host=”127.0.0.1″, port = 5439, database=”db_post_mig”, user=”postgres”, password=”oracle”) #Open Connection to PostgreSQL DB
cur = conn.cursor() #Open Cursor to execute commands on Database


e. Create a table dept in the schema progreso if it does not exist

cur.execute(“””create table IF NOT EXISTS progreso.dept (DEPT_ID bigint, DEPT_NAME varchar(128));”””) #create a table if does not exist


f. Import the data from the file dep.csv to the table progreso.dept

with open(‘C:\python\dep.csv’) as csv_file:
next(csv_file) # Skip the header
cur.copy_from(csv_file, ‘progreso.dept’, sep=’,’)
conn.commit()


g. Close the cursor, connection and the SSH tunnel.

cur.close() #Close Cursor
conn.close() #Close Connection to DB
server.stop() #Close Tunnel


5. List of the python program imp_csv_to_RDS_tunnel.py

C:\python>type imp_csv_to_RDS_tunnel.py

from sshtunnel import SSHTunnelForwarder
import psycopg2

remote_user  = 'dbadmin'
remote_host  = '3.85.36.103'
remote_port  = 22
private_host = 'postmig.c3qqnqe4hzdh.us-east-1.rds.amazonaws.com'
private_port = 5432
local_host   = '127.0.0.1'
local_port   = 5439

server = SSHTunnelForwarder(
   (remote_host, remote_port),
   ssh_username=remote_user,
   ssh_password='********',
   remote_bind_address=(private_host, private_port), 
   local_bind_address=(local_host, local_port),
   )  #Configure Tunnel

server.start() #Open Tunnel

#Open Connection to PostgreSQL DB
conn = psycopg2.connect(host="127.0.0.1", port = 5439, database="db_post_mig", user="postgres", password="oracle")  

cur = conn.cursor()  #Open Cursor to execute commands on Database

#create a table if does not exist
cur.execute("""create table IF NOT EXISTS progreso.dept (DEPT_ID bigint, DEPT_NAME varchar(128));""")
      
#Dpen and upload the file.csv in the table progreso.dept
with open('C:\python\dep.csv') as csv_file:
    next(csv_file) # Skip the header
    cur.copy_from(csv_file, 'progreso.dept', sep=',')
    conn.commit()

cur.close()    #Close Cursor
conn.close()   #Close Connection to DB
server.stop()  #Close Tunnel
    

6. Verify the program phtyon program imp_csv_to_RDS_tunnel.py

a. Check how many row has the table progreso.dept before run the python program (connected from the EC2 instance)

[dbadmin@dbserver ~]$ psql -h 172.31.17.227 -U postgres -p 5432
Password for user postgres:
psql (12.2)
SSL connection (protocol: TLSv1.2, cipher: ECDHE-RSA-AES256-GCM-SHA384, bits: 256, compression: off)
Type "help" for help.

postgres=> \c db_post_mig
SSL connection (protocol: TLSv1.2, cipher: ECDHE-RSA-AES256-GCM-SHA384, bits: 256, compression: off)
You are now connected to database "db_post_mig" as user "postgres".

db_post_mig=> select * from progreso.dept;

 dept_id | dept_name
---------+-----------
(0 rows)

b. Run the the program phtyon program imp_csv_to_RDS_tunnel.py

C:\python>python imp_csv_to_RDS_tunnel.py

c. Check if were inserted the 3 rows in the table.

db_post_mig=> select * from progreso.dept;

 dept_id | dept_name
---------+-----------
       1 | HR
       2 | IT
       3 | MARKETING
(3 rows)