Sunday, 24 December 2017

Datastage for Beginners

This page lists the Datastage topics to cover to gain a basic understanding of ETL with Datastage (parallel jobs). The topics are listed in a logical order, along with the questions to ask and the practicals to perform for each.

1) What is Datastage?
  • Pipelining Concept
  • Partitioning Concept
  • Partitioning in Datastage
  • Collecting in Datastage
Questions –
  • Why is ETL needed?
  • What are the benefits of partitioning? 
  • Why is collecting required?
  • Why are there different types of partitioning and collecting methods? Which is the best, the fastest and the slowest, and why?
2)  Configuration File in Datastage
  • What is a node?
  • What happens in the background when a Datastage job runs?
3) Datasets
  • Descriptor file
  • Data (binary) file
  • Data Set Management utility
  • orchadmin utility
4) Sequential File
  • Read a Delimited File
  • Read a Fixed Width File
  • Read by File Pattern
  • Read by Filter option
  • Read by Schema file
  • Read from multiple nodes
  • Number of readers per node
5) Funnel Stage
  • Continuous Funnel
  • Sort Funnel
  • Sequence Funnel
6) Copy
7) Filter
8) Transformer
9) Modify
10) Sort
11) Remove Duplicates
12) Aggregator
13) Change Capture
14) Join
15) Lookup
16) Merge
17) Pivot Enterprise
18) Sequence Job
19) Director
20) Administrator
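The partitioning and collecting ideas in topic 1 can be tried out before opening Datastage. The following is a minimal Python sketch, not Datastage code. Plain lists stand in for partitions (nodes), and the function names and the crc32-based hash are hypothetical choices made for illustration. Datastage's own hash function and stage internals differ. The sketch shows that hash partitioning keeps all rows with the same key together, while round robin only balances row counts. It also shows how ordered, round robin and sort merge collectors recombine partitions into one stream.

```python
import heapq
import zlib
from itertools import zip_longest

# Conceptual sketch only: Python lists stand in for Datastage partitions.
# The crc32-based hash is an illustrative assumption, not Datastage's hash.


def round_robin_partition(rows, n):
    """Deal rows to n partitions in turn: even sizes, no key awareness."""
    parts = [[] for _ in range(n)]
    for i, row in enumerate(rows):
        parts[i % n].append(row)
    return parts


def hash_partition(rows, key, n):
    """Send every row with the same key value to the same partition."""
    parts = [[] for _ in range(n)]
    for row in rows:
        h = zlib.crc32(str(row[key]).encode("utf-8"))
        parts[h % n].append(row)
    return parts


def ordered_collect(parts):
    """Read partition 0 completely, then partition 1, and so on."""
    return [row for part in parts for row in part]


def round_robin_collect(parts):
    """Take one row from each partition in turn until all are exhausted."""
    missing = object()  # sentinel, so legitimate None rows are kept
    out = []
    for group in zip_longest(*parts, fillvalue=missing):
        out.extend(row for row in group if row is not missing)
    return out


def sort_merge_collect(parts, key):
    """Merge partitions that are each already sorted on key into one sorted stream."""
    return list(heapq.merge(*parts, key=lambda row: row[key]))


if __name__ == "__main__":
    rows = [{"id": i, "dept": d} for i, d in enumerate("ABACBA")]
    print("round robin:", round_robin_partition(rows, 2))
    by_dept = hash_partition(rows, "dept", 2)
    print("hash on dept:", by_dept)
    # Sort each partition on the key first; sort merge relies on that.
    sorted_parts = [sorted(p, key=lambda r: r["dept"]) for p in by_dept]
    print("sort merge:", sort_merge_collect(sorted_parts, "dept"))
```

Key-based stages such as Join, Aggregator and Remove Duplicates need rows with equal keys in the same partition, which hash partitioning guarantees and round robin does not. A sort merge collector only yields a globally sorted result if every partition is already sorted on the collecting key.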

Reference - ibm-datastage-reference-links