The customer faced a data replication challenge built around a complex API: nearly every object in the source database had to be extracted and loaded into the data warehouse through complex API methods. The customer also wanted a dynamic way to pick up any new data created at the source. The API's low call limits made bulk data extraction difficult, and its multi-step extraction process consumed the call quota quickly.
SOLUTION
All the required APIs were tested and classified by the way their data is extracted, and separate code/mappings were created for the objects under each classification. The code/mapping is dynamic and holds its parameters in a stage such as Snowflake; the parameter values are iterated through the API calls one by one. A Python script creates the target table in Snowflake based on the incoming data, then extracts the data, compiles it, and stores it in an S3 bucket in JSON form. A stored procedure copies the files from S3 into Snowflake by executing the COPY INTO command, flattens the JSON data, and loads it into the respective table. Any scheduler can be used to execute the Python scripts sequentially.
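The parameter-iterated extraction step above could be sketched as follows. This is a minimal illustration, not the actual implementation: `fetch_page` is a hypothetical stand-in for an authenticated Marketo API call, and all names and the paging scheme are assumptions.

```python
import json
import time

def fetch_page(object_name, param, page):
    """Hypothetical stand-in for one API call; a real version would issue
    an authenticated HTTP request against the source API."""
    if page >= 2:  # pretend the data runs out after two pages
        return []
    return [{"object": object_name, "param": param, "page": page}]

def extract_object(object_name, params, pause=0.0):
    """Iterate the staged parameter values one by one, paging through the
    API and pacing calls so the daily call limit is not exhausted."""
    records = []
    for param in params:
        page = 0
        while True:
            batch = fetch_page(object_name, param, page)
            time.sleep(pause)  # production code would use a positive pause
            if not batch:
                break
            records.extend(batch)
            page += 1
    # Compile the result as JSON Lines, ready to upload to an S3 bucket.
    return "\n".join(json.dumps(r) for r in records)
```

The key design point is that the parameter list lives outside the code (in the stage), so new objects or values are picked up on the next run without code changes.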
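The load step (COPY INTO from S3, then flattening the JSON) might generate SQL along these lines. The table, stage, and field names below are illustrative assumptions, not the customer's actual objects; the generated statements are what a stored procedure of this kind would execute.

```python
def build_load_sql(table, stage_path, json_col="raw"):
    """Build the two statements such a stored procedure would run:
    COPY INTO from the S3 stage, then flatten-and-insert into the
    target table. All identifiers here are placeholders."""
    copy_sql = (
        f"COPY INTO {table}_RAW ({json_col}) "
        f"FROM @{stage_path} FILE_FORMAT = (TYPE = 'JSON')"
    )
    flatten_sql = (
        f"INSERT INTO {table} "
        f"SELECT f.value:id::NUMBER, f.value:email::STRING "
        f"FROM {table}_RAW, LATERAL FLATTEN(input => {json_col}) f"
    )
    return copy_sql, flatten_sql

# Example: statements for a hypothetical MARKETO_LEADS table
copy_sql, flatten_sql = build_load_sql("MARKETO_LEADS", "marketo_stage/leads")
```

Generating the SQL from the table name is what keeps the load step dynamic: one procedure serves every object category rather than one hand-written script per table.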
SUCCESS CRITERIA & BUSINESS VALUE
Our Infofiscus Marketo Data Replication solution reduced the complexity to a handful of API categories, seven in total for this customer. These categories are scheduled to run at different time intervals to avoid hitting the API call limits. The solution is dynamic: any new data added at the source is extracted automatically. The pipeline is automated end to end, making the data flow simpler.