- Connected and Active
Transformation
- Used to join source data from two
related heterogeneous sources residing in Different locations or file
systems. Or, we can join data from the same source.
- If we need to join 3 tables, then
we need 2 Joiner Transformations.
- The Joiner transformation joins
two sources with at least one matching port. The Joiner transformation
uses a condition that matches one or more pairs of Ports between the two
sources.
Example:
To join EMP and DEPT tables.
- EMP and DEPT will be source table.
- Create a target table
JOINER_EXAMPLE in target designer. Table should Contain all ports of EMP
table plus DNAME and LOC as shown below.
- Create the shortcuts in your folder.
Creating
Mapping:
1. Open
folder where we want to create the mapping.
2. Click
Tools -> Mapping Designer.
3. Click
Mapping-> Create-> Give mapping name. Ex: m_joiner_example
4. Drag EMP,
DEPT, and Target. Create Joiner Transformation. Link as shown below.
5. Specify
the join condition in Condition tab. See steps on next page.
6. Set
Master in Ports tab. See steps on next page.
7.
Mapping -> Validate
8.
Repository -> Save.
- Create Session and Workflow as
described earlier. Run the Work flow and see the data in target table.
- Make sure to give connection
information for all tables.
JOIN
CONDITION:
The join
condition contains ports from both input sources that must match for the Power
Center Server to join two rows.
Example: DEPTNO=DEPTNO1
in above.
1. Edit
Joiner Transformation -> Condition Tab
2. Add
condition
- We can add as many conditions as
needed.
- Only = operator is allowed.
If we
join Char and Varchar data types, the Power Center Server counts any spaces
that pad Char values as part of the string. So if you try to join the
following:
Char (40)
= “abcd” and Varchar (40) = “abcd”
Then the
Char value is “abcd” padded with 36 blank spaces, and the Power Center Server
does not join the two fields because the Char field contains trailing spaces.
Note: The
Joiner transformation does not match null values.
MASTER
and DETAIL TABLES
In
Joiner, one table is called as MASTER and other as DETAIL.
- MASTER table is always cached. We
can make any table as MASTER.
- Edit Joiner Transformation ->
Ports Tab -> Select M for Master table.
Table
with less number of rows should be made MASTER to improve Performance.
Reason:
- When the Power Center Server
processes a Joiner transformation, it reads rows from both sources
concurrently and builds the index and data cache based on the master rows.
So table with fewer rows will be read fast and cache can be made as table
with more rows is still being read.
- The fewer unique rows in the
master, the fewer iterations of the join comparison occur, which speeds
the join process.
JOINER
TRANSFORMATION PROPERTIES TAB
- Case-Sensitive String
Comparison: If
selected, the Power Center Server uses case-sensitive string comparisons
when performing joins on string columns.
- Cache Directory: Specifies
the directory used to cache master or detail rows and the index to these
rows.
- Join Type: Specifies
the type of join: Normal, Master Outer, Detail Outer, or Full Outer.
Tracing
Level
Joiner
Data Cache Size
Joiner
Index Cache Size
Sorted
Input
JOIN
TYPES
In SQL, a
join is a relational operator that combines data from multiple tables into a
single result set. The Joiner transformation acts in much the same manner,
except that tables can originate from different databases or flat files.
Types of
Joins:
- Normal
- Master Outer
- Detail Outer
- Full Outer
Note: A
normal or master outer join performs faster than a full outer or detail outer
join.
Example: In
EMP, we have employees with DEPTNO 10, 20, 30 and 50. In DEPT, we have DEPTNO
10, 20, 30 and 40. DEPT will be MASTER table as it has less rows.
Normal
Join:
With a
normal join, the Power Center Server discards all rows of data from the master
and detail source that do not match, based on the condition.
- All employees of 10, 20 and 30
will be there as only they are matching.
Master
Outer Join:
This join
keeps all rows of data from the detail source and the matching rows from the
master source. It discards the unmatched rows from the master source.
- All data of employees of 10, 20
and 30 will be there.
- There will be employees of DEPTNO
50 and corresponding DNAME and LOC Columns will be NULL.
Detail
Outer Join:
This join
keeps all rows of data from the master source and the matching rows from the
detail source. It discards the unmatched rows from the detail source.
- All employees of 10, 20 and 30
will be there.
- There will be one record for
DEPTNO 40 and corresponding data of EMP columns will be NULL.
Full
Outer Join:
A full
outer join keeps all rows of data from both the master and detail sources.
- All data of employees of 10, 20
and 30 will be there.
- There will be employees of DEPTNO
50 and corresponding DNAME and LOC Columns will be NULL.
- There will be one record for
DEPTNO 40 and corresponding data of EMP Columns will be NULL.
USING
SORTED INPUT
- Use to improve session
performance.
- to use sorted input, we must pass
data to the Joiner transformation sorted by the ports that are used in
Join Condition.
- We check the Sorted Input Option
in Properties Tab of the transformation.
- If the option is checked but we
are not passing sorted data to the Transformation, then the session fails.
- We can use SORTER to sort data or
Source Qualifier in case of Relational tables.
JOINER
CACHES
Joiner
always caches the MASTER table. We cannot disable caching. It builds Index
cache and Data Cache based on MASTER table.
1) Joiner
Index Cache:
- All Columns of MASTER table used
in Join condition are in JOINER INDEX CACHE.
·
Example: DEPTNO in our mapping.
2) Joiner
Data Cache:
- Master column not in join
condition and used for output to other transformation or target table are
in Data Cache.
·
Example: DNAME and LOC in our mapping example.
Performance
Tuning:
- Perform joins in a database when
possible.
- Join sorted data when possible.
- For a sorted Joiner
transformation, designate as the master source the source with fewer
duplicate key values.
- Joiner can't be used in following
conditions:
1. Either
input pipeline contains an Update Strategy transformation.
2. We
connect a Sequence Generator transformation directly before the Joiner
transformation.
No comments:
Post a Comment