The ETL Help Guide

Wednesday, 7 February 2018

Export /Import Datastage jobs - istool.sh

istoolDir="/opt/IBM/InformationServer/Clients/istools/cli/"

#export isx

${istoolDir}/istool.sh export -v -ds '"'$server'/'$project'/*/'$f'.*" ' -domain "$domain" -username "$username" -password "$pwd" -archive $destD/$f".isx" >> logfile

#import isx

${istoolDir}/istool.sh import -dom <domain:port> -u <username> -p <password> -ar <full_path/file.isx> -replace -ds "<dsserver>/<dsproject>" >> logfile

# log validation - wc -l logfile = (job counts * 4) + 1

.profile file with shortcuts

dshome=`cat /.dshome`
. ${dshome}/dsenv
export APT_CONFIG_FILE=${dshome}/../Configurations/default.apt

dsstr()
{
orchadmin dump -name -delim "~" -n 1 $1 2>/dev/null | head -1 | tr '~' '\n' | cat -n
}

dshead()
{
orchadmin dump -name -delim "~" -n 1 $1 2>/dev/null | head -1 | tr '~' '\n' | sed 's/:.*$//g' | tr '\n' '|' | sed 's/.$//';
echo;
orchadmin dump -delim "|" -n 5 $1 2>/dev/null
}

dsdump()
{
orchadmin dump -delim "|" $1 2>/dev/null
}

set -o vi

TMOUT=9999

Frequently Asked Scenarios - Datastage

1) How to calculate Sum(sal), Avg(sal), Min(sal), Max(sal) with out using Aggregator stage.

2) How to find out First sal, Last sal in each dept with out using aggregator stage

3) How to Find Out Duplicate Values Using Transformer?

4) Database comparison – compare two databases using generic datastage job

5) Remove the header and footer while reading this file using sequential file stage of Datastage?
sed '1d;$d' file_name>new_file_name (type this in job before job subroutine then use new file in seq stage)
sed '1,$d' file_name>new_file_name ( or use filter option in sequential file stage )

6) How to generate sequence without using surrogate key generator
@PARTITIONNUM + (@NUMPARTITIONS * (@INROWNUM-1))

ETL - Designing

ETL design is one of key processes of a DWH project. Improperly designed ETL systems lead to considerable complexity and significant operational problems in later phase of the project which costs extra time and resources due to issues and required rework.

To design a good ETL system, ETL designers should follow basic steps in respective phases of the project as below -

Requirement Gathering

Data Analysis and Profiling(5D)

Data Integrity

Data Volume

Data Distribution

UNIX Fundamentals

What is Unix

Multitasking
Multiuser
Computer operating system
Modular design
Unified file system
C programming language as a systems programming

History and Evolution of Unix

Frequently Asked Questions - Datastage

Questions on Configuration File and Partitioning

1) What is Configuration file, its contents, usage, and node pools?

2) What is difference between resource desk and resource scratch desk space in configuration file.

3) DataStage Collector methods, which collector method to use if records should be collected in a sequential way by 3 target files?

4) What all stages require Hash partitioning and why ?

5) What is the difference between Hash and Modulus partitioning ?

6) What will happen if the data is not sorted on keys before join stage ?

7) In a parallel job with 4 nodes configuration file, containing 1 sequential file stage, 2 transformers and 1 odbc stage. how many processes will be generated ?

Questions on Datastage Components

Datastage Job Extraction - Performance Tuning

Extraction from Database -

Filtration
1) Try to have the constraints in the 'Selection' criteria of the jobs itself. This will eliminate the unnecessary records even getting in before joins are made.
2) Using a constraint to filter a record set is much slower than performing a SELECT … WHERE….er joins in SQL itself if possible.

Sorting - Try not to use a sort stage when you can use an ORDER BY clause in the database.

Join - Always prefer joins in SQL itself if possible.

The ETL Help Guide

Wednesday, 7 February 2018

Export /Import Datastage jobs - istool.sh

.profile file with shortcuts

Frequently Asked Scenarios - Datastage

Wednesday, 17 January 2018

ETL - Designing

Thursday, 11 January 2018

UNIX Fundamentals

Saturday, 6 January 2018

Frequently Asked Questions - Datastage

Tuesday, 26 December 2017

Datastage Job Extraction - Performance Tuning

About Me

Total Pageviews

Wednesday, 7 February 2018

Export /Import Datastage jobs - istool.sh

.profile file with shortcuts

Frequently Asked Scenarios - Datastage

Wednesday, 17 January 2018

ETL - Designing

Thursday, 11 January 2018

UNIX Fundamentals

Saturday, 6 January 2018

Frequently Asked Questions - Datastage

Tuesday, 26 December 2017

Datastage Job Extraction - Performance Tuning

About Me

Total Pageviews

Subscribe To