Twilio API: Scrape SMS Data and Insert into Redshift
Organizations use Twilio's platform to interact with their customers through a variety of APIs. Twilio tracks client activity across channels including chat, Facebook Messenger, voice, email, and more, and keeps all of that data in one place. Migrating this data into a data warehouse such as Amazon Redshift, however, makes it available for future analysis. Redshift is a natural central store for Twilio's data because it can consolidate data from several sources. You can link Twilio to Amazon Redshift using industry-standard APIs or third-party ETL (Extract, Transform, and Load) tools.
In this blog, we will walk through a code snippet that shows how to extract SMS data from the Twilio API, store it in S3, and then insert the data into Redshift.
Scraping SMS data using the Twilio API and inserting it into Redshift
Airflow cannot transfer data directly from Twilio to Redshift, so we use S3 as an intermediary: we fetch the data from the Twilio API, write it out as a CSV file, and upload that file to S3. To scrape SMS data using the Twilio API we need to:
- Establish the connection between the Twilio API and Airflow.
- Get the data from the Twilio API as JSON.
- Convert the JSON into a CSV file.
- Upload the CSV file to AWS S3.
- Insert the data into Redshift from AWS S3 using the COPY command (a sketch of this step follows the list).
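The COPY step is not elaborated later in the post, so here is a minimal sketch of what it might look like, assuming a psycopg2 connection to the cluster and an IAM role with read access to the bucket; the table, cluster, credentials, and role ARN below are placeholder assumptions.

```python
# A minimal sketch of the S3-to-Redshift load, assuming psycopg2 and an
# IAM role that can read the bucket. All names below are placeholders.
import psycopg2

COPY_SQL = """
    COPY twilio_sms
    FROM 's3://your-bucket/twilio/sms_messages.csv'
    IAM_ROLE 'arn:aws:iam::123456789012:role/redshift-s3-read'
    CSV
    IGNOREHEADER 1
    TIMEFORMAT 'auto';
"""

conn = psycopg2.connect(
    host="your-cluster.abc123.us-east-1.redshift.amazonaws.com",
    port=5439,
    dbname="analytics",
    user="etl_user",
    password="your_password",
)
with conn, conn.cursor() as cur:
    cur.execute(COPY_SQL)  # Redshift reads the CSV straight from S3
conn.close()
```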
Challenges
- Implementing a connection for Twilio in Airflow is difficult.
- The Twilio API lacks the necessary JDBC driver.
- We can't filter the data by max_createddate.
- The converted CSV file had issues when inserted into Redshift.
Designing a workflow
- Design the flow from the Twilio API to Redshift. Airflow cannot establish this connection on its own; it can be done only with the Python requests module.
- To start, we import the necessary libraries: requests, pandas, and logging. We then set the parameters, such as the number of results per page (1000), the number of pages to extract (5), and the start date of the data we want to retrieve (2010-04-01).
- Next, we set the base URL for the Twilio API and build the initial URL with the account SID and the parameters set earlier. We then use the requests library to make a GET request to the API and store the response in a variable.
- The response we receive from the API is in JSON format and includes various details about each SMS message, such as the SID, date created, date updated, account SID, and others. We extract the relevant fields and store them in a list.
- If the response from the API includes a next-page URI, we continue making API calls to retrieve the next set of data, until either the page limit is reached or there are no more pages to extract.
- Once all the data has been retrieved, we create a pandas DataFrame from the extracted records and reindex the columns to match the desired order. We then drop the unwanted columns using the del statement.
- Finally, we write the data to a CSV file and use the s3_load_obj function to upload it to an S3 bucket, logging the upload with the logging library. A condensed sketch of this flow, and of the upload helper, follows the list.
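The workflow above maps onto a short script. What follows is a condensed sketch rather than the blog's exact code: the account SID, auth token, and column list are placeholder assumptions, and the exact format of the date filter should be checked against Twilio's Messages API documentation.

```python
# A condensed sketch of the extraction flow described above. The SID, token,
# and column choices are illustrative assumptions, not the blog's exact code.
import logging

import pandas as pd
import requests

logging.basicConfig(level=logging.INFO)

ACCOUNT_SID = "ACXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"  # placeholder
AUTH_TOKEN = "your_auth_token"                       # placeholder
PAGE_SIZE = 1000     # results per page
MAX_PAGES = 5        # pages to extract
START_DATE = "2010-04-01"
BASE_URL = "https://api.twilio.com"

url = f"{BASE_URL}/2010-04-01/Accounts/{ACCOUNT_SID}/Messages.json"
# "DateSent>" expresses a >= filter; verify the format against Twilio's docs.
params = {"PageSize": PAGE_SIZE, "DateSent>": START_DATE}

records = []
for _ in range(MAX_PAGES):
    response = requests.get(url, params=params, auth=(ACCOUNT_SID, AUTH_TOKEN))
    response.raise_for_status()
    payload = response.json()

    # Keep only the fields we want to land in Redshift.
    for msg in payload.get("messages", []):
        records.append({
            "sid": msg["sid"],
            "date_created": msg["date_created"],
            "date_updated": msg["date_updated"],
            "date_sent": msg["date_sent"],
            "account_sid": msg["account_sid"],
            "from_number": msg["from"],
            "to_number": msg["to"],
            "status": msg["status"],
            "body": msg["body"],
        })

    # Follow the next-page URI until the page limit or the last page.
    next_page = payload.get("next_page_uri")
    if not next_page:
        break
    url = BASE_URL + next_page
    params = None  # next_page_uri already carries the query string

# Reindexing both orders the columns and drops anything not listed.
columns = ["sid", "date_created", "date_updated", "date_sent",
           "account_sid", "from_number", "to_number", "status", "body"]
df = pd.DataFrame(records).reindex(columns=columns)

df.to_csv("sms_messages.csv", index=False)
logging.info("Wrote %d rows to sms_messages.csv", len(df))
```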
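The s3_load_obj helper is referenced but not shown; one plausible implementation, assuming boto3 and placeholder bucket and key names, might look like this:

```python
# Hypothetical implementation of the s3_load_obj helper mentioned above,
# using boto3. The bucket and key names are placeholders.
import logging

import boto3


def s3_load_obj(local_path: str, bucket: str, key: str) -> None:
    """Upload a local file to S3 and log the destination."""
    s3 = boto3.client("s3")
    s3.upload_file(local_path, bucket, key)
    logging.info("Uploaded %s to s3://%s/%s", local_path, bucket, key)


s3_load_obj("sms_messages.csv", "your-bucket", "twilio/sms_messages.csv")
```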
Conclusion:
We hope this blog gives a good overview of Twilio and a working example of how to extract data from an API and store it in S3. By following the steps outlined in this blog, you can retrieve data from any API and store it for further analysis and processing.