Tuesday, 26 December 2017

Datastage Job Extraction - Performance Tuning

Extraction from Database - 

Filtration
1) Try to have the constraints in the 'Selection' criteria of the jobs itself. This will eliminate the unnecessary records even getting in before joins are made.
2) Using a constraint to filter a record set is much slower than performing a SELECT … WHERE….er joins in SQL itself if possible.

Sorting - Try not to use a sort stage when you can use an ORDER BY clause in the database. 
Join - Always prefer joins in SQL itself if possible.

Parallel Read and Partitioning
For Netezza -
Set Enable partitioned reads to Yes.
WHERE mod(datasliceid,[[node-count]])=[[node-number]]
For Oracle -
Partitioned reads method property is set to Modulus.
WHERE MOD(TABLE.COL_NM, 4) = 0
** COL_NM must be a number datatype
For ODBC Connector -
Set Enable partitioning to Yes.
Select partitioning method to divide your data into subsets.
Set Column name - key field column in which the data is partitioned.

Extraction from Flat Files- 

Filtration 
1) Remove the data not used from the source as early as possible in the job.
2) Use Filter option to specify command to filter records from a text file.
3) Also read only columns which are required in the job instead of reading all.

Parallel Read 
Sequential Files can be read in parallel by using -
1) Read From Multiple Nodes
2) Number Of Readers Per Node