Composite Rest Webservice for IDQ and MDM.
March 11, 2019Salesforce Data Replication using IDMC
March 11, 2019Problem Description
Informatica PowerCenter: Nowadays ETL developers facing the issue of data redundancy while dealing with large data sets. As part of the storage concern, we need to reduce the duplicate data from daily incoming source data and load the latest data to the target table.
Challenges to Developer/user
Implement incremental loading in Informatica PowerCenter:-
- The source data come along with timestamp filed by using that timestamp field we need to extract the data which is newly coming newly and load it to the target.
- Make use of parameters or variables in the Informatica power.
- Understanding the value precedence of parameters and variables.
- Make use of :setmaxvariable() function in expression transformation.
Technical Solution
Steps:
1. log in to Informatica PowerCenter designer with valid credentials. open the folder where you want to implement your incremental loading.
2. Go to tools> source analyzer then imports your source definition.
(NOTE: source should have a timestamp filed.)
3. Go to tools> target designer then imports your target definition.
4. Go to tools> mapping designer, then create a mapping with meaningful naming convention (ex: m_MyFirstMapping). Drag your source and target to the designer workspace.
5. Go to the mappings tab and select parameters and variables property then create a variable with DateTime datatype.
6. Create an expression transformation and drag all ports from source to expression.
7. create a variable port(v_max_date) and assign the following expression for that port
:Setmaxvariable($$Daily_Date,Created_date)
the above expression will assign the maximum values(recent date) to the v_max_date variable from the last modified date filed or every successful run of the session.
In the above I have provided the initial value for the first time run it will use this value if it’s not provided it will use a datatype default value like ‘01/01/1753’.
8. Go to source qualifier >edit >properties, then make use of SQL override and generate default SQL query and append where clause to that default SQL query like,
Where Created_date>$$Daily_Date.
NOTE: The order of execution of parameters/variables values in Informatica PowerCenter:
- Parameter file.
- The value assigned in the pre-session variable assignment.
- Value saved in the repository.
- Initial value defined for the variable at declaration level.
- The default value is defined by Informatica for the datatype of the defined variable.
The Informatica checks the value for the variable/parameter in the above order and it stops assigning a value to the parameter/variable if its finds a value.
10. Go to workflow designer and create a workflow and session for that mapping.
11. Now run the workflow and monitor the status using workflow monitor.
12. Now create a new record in the source and run the session again now this time it will load only newly created records.
Explanation:
For the first time, the session uses initial values for the variable $$Daily_Date is ‘01/01/1980’
In my SQL override, the query looks like
Informatica PowerCenter So the query pulls all the records from the source table and loads them to target. During the runtime in expression, the variable port will assign a maximum value from the “created_date” port to the “$$Daily_Date” variable using the “Setmaxvariable()” function. Next time the session runs it will use that value and that value will be used in SQL override query and that query will pull the data from the source table based on the condition given.
We can see the latest value assigned to viable by using the “view persistent value” property.
Right-click on session>view persistent value.
Advance use case:
In Informatica cloud also we can do the same for the Salesforce object. while replicating data from salesforce to local flat file directory or from local flat-file to salesforce we can use incremental loading.
Under the data filter tab, we can define a condition like
LastmodifiedDate >$LastRunDate
Salesforce account object has by default have a LastModifiedDate filed and also for every successful task the Informatica server will default parameter values like
- $LastRunDate.
- $LastRunTime.
- $ErrorFileName.
- $SuccesFileName.
We can see the parameter file in the following directory:
<InfaAgentInstalledDir>\apps\Data_Integration_server\data\parameters.
Advantages:
1. Decreases the time consumption and the load on the database server and network.
2. we can take a daily, monthly, quarterly backup.
Conclusion:
Initially, We are reloading the entire transactions table daily even though I already had the data till yesterday with me. This is not only a tool for significant time but also increased the load on the database server and the network. This is where incremental loading made a huge difference by loading newly coming data from the source table.