Sparkify Redshift ETL (Infra. as Code)
Summary
Create an ETL pipeline on AWS Redshift following the ‘Infrastructure as Code’ approach.
Infrastructure as Code
- Process of managing and provisioning data centers through machine-readable definition files rather than manual configuration
- Eases repeatability: other users can recreate the same resources from the definition files
- Can be harder to coordinate edits on when several people work on the same project
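As a concrete illustration of a “machine-readable definition file”, the constants that drive provisioning can live in an INI-style config file. The section and key names below are illustrative, not necessarily the ones this repo uses:

```ini
[AWS]
KEY    = <your-access-key>
SECRET = <your-secret-key>
REGION = us-west-2

[IAM_ROLE]
NAME = dwhRole

[CLUSTER]
IDENTIFIER  = dwhCluster
NODE_TYPE   = dc2.large
NUM_NODES   = 4
DB_NAME     = dwh
DB_USER     = dwhuser
DB_PASSWORD = <your-password>
DB_PORT     = 5439
```

Anyone with valid AWS credentials can then spin up an identical cluster from this one file, which is the repeatability benefit noted above.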
Details
- Most of the necessary constants (credentials, cluster and database parameters) are saved in a .cfg file for reference
- The Python boto3 library is used to establish connections to the different AWS services
- Create an IAM role that the cluster can assume to access the data
- Create the Redshift cluster with that role attached
- Obtain the endpoint of the cluster, then run the Python loading script over a connection to it
- Most of the SQL loading queries are similar to those from past projects
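The provisioning steps above can be sketched with boto3 as follows. This is a minimal sketch, not the repo’s actual code: it assumes a `dwh.cfg` with sections `[AWS]`, `[IAM_ROLE]`, and `[CLUSTER]`, and the function names are hypothetical.

```python
"""Sketch of the IaC provisioning flow: read constants from dwh.cfg,
create an IAM role Redshift can assume, launch the cluster, and
return its endpoint. Section/key names are illustrative."""
import configparser
import json


def load_config(path="dwh.cfg"):
    """Read the constants that drive provisioning from the .cfg file."""
    cfg = configparser.ConfigParser()
    cfg.read(path)
    return cfg


def create_iam_role(cfg):
    """Create an IAM role that Redshift can assume to read from S3,
    and return its ARN."""
    import boto3  # third-party: pip install boto3
    iam = boto3.client(
        "iam",
        region_name=cfg["AWS"]["REGION"],
        aws_access_key_id=cfg["AWS"]["KEY"],
        aws_secret_access_key=cfg["AWS"]["SECRET"],
    )
    iam.create_role(
        RoleName=cfg["IAM_ROLE"]["NAME"],
        # Trust policy: let the Redshift service assume this role.
        AssumeRolePolicyDocument=json.dumps({
            "Version": "2012-10-17",
            "Statement": [{
                "Effect": "Allow",
                "Principal": {"Service": "redshift.amazonaws.com"},
                "Action": "sts:AssumeRole",
            }],
        }),
    )
    iam.attach_role_policy(
        RoleName=cfg["IAM_ROLE"]["NAME"],
        PolicyArn="arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess",
    )
    return iam.get_role(RoleName=cfg["IAM_ROLE"]["NAME"])["Role"]["Arn"]


def create_cluster(cfg, role_arn):
    """Launch the Redshift cluster with the role attached and
    return the cluster endpoint address."""
    import boto3
    redshift = boto3.client(
        "redshift",
        region_name=cfg["AWS"]["REGION"],
        aws_access_key_id=cfg["AWS"]["KEY"],
        aws_secret_access_key=cfg["AWS"]["SECRET"],
    )
    redshift.create_cluster(
        ClusterType="multi-node",
        NodeType=cfg["CLUSTER"]["NODE_TYPE"],
        NumberOfNodes=int(cfg["CLUSTER"]["NUM_NODES"]),
        ClusterIdentifier=cfg["CLUSTER"]["IDENTIFIER"],
        DBName=cfg["CLUSTER"]["DB_NAME"],
        MasterUsername=cfg["CLUSTER"]["DB_USER"],
        MasterUserPassword=cfg["CLUSTER"]["DB_PASSWORD"],
        IamRoles=[role_arn],
    )
    # In practice you would poll describe_clusters until the status
    # is "available" before reading the endpoint.
    desc = redshift.describe_clusters(
        ClusterIdentifier=cfg["CLUSTER"]["IDENTIFIER"])["Clusters"][0]
    return desc["Endpoint"]["Address"]
```

Once the endpoint is returned, the loading script would open a database connection to it (e.g. with psycopg2) and run the COPY/INSERT queries.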
Tools
- Python
- SQL
Link: GitHub Repo