Saturday, 13 August 2016

How to find duplicate records based on keys using Transformer


You can capture the duplicate records based on keys using Transformer stage variables.

1. Sort and partition the input data of the transformer on the key(s) which defines the duplicate.

2. Define two stage variables, let's say StgVarPrevKeyID (data type same as keyid field) and StgVarCnt as Integer with default value 0

where keyid is your input column which defines the duplicate.

Expression for StgVarCnt(1st stg var - maintain order):

If DSLink.keyid = StgVarPrevKeyID Then StgVarCnt + 1 Else 1

Expression for StgVarPrevKeyID (2nd stg var):

DSLink.keyid

3. Now in constrain, if you filter rows where StgVarCntr = 1 will give you the unique records and if you filter StgVarCntr > 1 will give you duplicate records.