Sparkify Redshift ETL (Infrastructure as Code)
Summary
Create an ETL pipeline on AWS Redshift following the ‘Infrastructure as Code’ approach.
Infrastructure as Code
- Process of managing and provisioning computer data centers through machine-readable definition files
- Easy for other users to reproduce the setup
- Can be harder to edit on a collaborative project
 
Details
- Most of the necessary constants are saved in a cfg file for reference
- The Python boto3 library is used to establish connections between the different AWS services
- Create an IAM role that controls access to the database
- Create the Redshift cluster under that new role
- Obtain the endpoint of the cluster, then run the Python loading script over that connection
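The provisioning steps above can be sketched with boto3 roughly as follows. All names here (the cfg keys, role name, cluster identifier, node type) are hypothetical placeholders, not values taken from the actual repo, and the sample cfg is inlined only to keep the sketch self-contained:

```python
import configparser
import io
import json

# Hypothetical dwh.cfg contents -- the real project file would hold the
# actual constants (inlined here so the sketch is self-contained).
SAMPLE_CFG = """
[CLUSTER]
identifier = sparkify-cluster
node_type = dc2.large
num_nodes = 4
db_name = sparkifydb
db_user = sparkify
db_password = Passw0rd
db_port = 5439

[IAM]
role_name = sparkifyRedshiftRole
"""

def load_config(text=SAMPLE_CFG):
    """Read constants from the cfg instead of hard-coding them."""
    cfg = configparser.ConfigParser()
    cfg.read_file(io.StringIO(text))
    return cfg

def provision(cfg):
    """Create the IAM role and Redshift cluster, then return the endpoint.

    Requires AWS credentials; boto3 is imported inside the function so the
    sketch can be read and imported without it installed.
    """
    import boto3

    iam = boto3.client("iam")
    redshift = boto3.client("redshift")

    # IAM role that Redshift assumes in order to read source data from S3
    role = iam.create_role(
        RoleName=cfg["IAM"]["role_name"],
        AssumeRolePolicyDocument=json.dumps({
            "Version": "2012-10-17",
            "Statement": [{
                "Effect": "Allow",
                "Principal": {"Service": "redshift.amazonaws.com"},
                "Action": "sts:AssumeRole",
            }],
        }),
    )
    iam.attach_role_policy(
        RoleName=cfg["IAM"]["role_name"],
        PolicyArn="arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess",
    )

    # Create the cluster under the new role
    redshift.create_cluster(
        ClusterType="multi-node",
        NodeType=cfg["CLUSTER"]["node_type"],
        NumberOfNodes=int(cfg["CLUSTER"]["num_nodes"]),
        ClusterIdentifier=cfg["CLUSTER"]["identifier"],
        DBName=cfg["CLUSTER"]["db_name"],
        MasterUsername=cfg["CLUSTER"]["db_user"],
        MasterUserPassword=cfg["CLUSTER"]["db_password"],
        IamRoles=[role["Role"]["Arn"]],
    )

    # Once the cluster is available, describe_clusters exposes the endpoint
    desc = redshift.describe_clusters(
        ClusterIdentifier=cfg["CLUSTER"]["identifier"]
    )["Clusters"][0]
    return desc["Endpoint"]["Address"]

cfg = load_config()
```

Keeping every constant in the cfg file is what makes the whole setup repeatable: another user can point the same script at their own account by editing one file.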
 
- Most of the SQL data-loading queries are similar to those from past projects
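One difference worth noting in Redshift is that staging tables are typically bulk-loaded from S3 with COPY rather than row-by-row INSERTs. A minimal sketch, assuming a hypothetical staging table, bucket path, and role ARN (none taken from the actual repo):

```python
def build_copy_sql(table, s3_path, role_arn, json_option="auto"):
    """Render a Redshift COPY statement that bulk-loads a staging table
    from S3 in parallel, instead of inserting rows one at a time."""
    return (
        f"COPY {table}\n"
        f"FROM '{s3_path}'\n"
        f"IAM_ROLE '{role_arn}'\n"
        f"FORMAT AS JSON '{json_option}'\n"
        f"REGION 'us-west-2';"
    )

# Placeholder bucket, ARN, and JSONPaths file -- illustrative only.
sql = build_copy_sql(
    "staging_events",
    "s3://example-bucket/log_data",
    "arn:aws:iam::123456789012:role/sparkifyRedshiftRole",
    "s3://example-bucket/log_json_path.json",
)

def run_query(endpoint, db_name, user, password, port, query):
    """Execute a statement against the cluster endpoint (psycopg2 is
    imported inside so the sketch stays readable without it installed)."""
    import psycopg2
    conn = psycopg2.connect(
        host=endpoint, dbname=db_name, user=user,
        password=password, port=port,
    )
    with conn, conn.cursor() as cur:
        cur.execute(query)
    conn.close()
```

The endpoint obtained during provisioning is what `run_query` connects to; after the staging COPY, the remaining INSERT…SELECT queries look much like the earlier projects.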
 
Tools
- Python
- SQL
 
Link: GitHub Repo