Secondary
indexes in teradata is
used to provide an
alternate path to the data and should be used on queries that run many times.
These are two types:-
·
Unique secondary
index (USI)
·
Non Unique secondary index (NUSI)
Unique Secondary Index
Syntax of the Unique Secondary Index:-
SI can be created even after table is
populated with the data , unlike Primary Index which is created only at the
time of creation of table .we can create and drop secondary index at any time.
CREATE UNIQUE
INDEX(COLUMN/COLUMNS)ON<dbname>.<table name>
Whenever we create a SI on the
table,Teradata will create a subtable on all AMPS . This subtable contains three columns given
below
·
1.Secondary index value
·
2.Secondary index row ID
(This is the hash value of SI value)
·
3.Base table row
ID (This is
the actual base row id)
USI sub table creation:-
·
When we defined a Unique SI on the table, then the teradata
will immediately create a USI subtable in each AMP for that particular table.
·
Creation of Subtable requires PERM space.so always be wise to choose your
SI,Normally the best SI is that columns which is mostly used in the WHERE
clause.
Suppose
we have employee table(base table)having attributes EMP.DEPT FNAME,LNAME and
Soc_security.We defined USI on the column soc_security.The SI subtable created
on each AMP which holds information about the SI column and corresponding base
row id(base table row_id)which is the row id of the actual employee table. The
steps involved to load this subtable is as follows:-
·
Teradata will first create the subtable on all amp.
·
After that it hashes the value of this USI column (soc_security) and
based on that hashed value it checks the hash map for the amp number
which will hold this USI value in its subtable.
PLEASE LOOK INTO THE IMAGE
BELOW:-
·
After getting the respective amp number, the SI value along with
the two or more attributes (secondary index row id and base table row id) will
be stored in the sub table of that amp.
As it is clear
now that defining SI will require the creation of subtable,so we should
be aware that requires space cost factor on our Teradata system.
Teradata retrieval of USI query:-
Suppose on the above example we make a
query.
Select*from employee_table shere
soc_security=”123-99-8888”;
When a TD optimizer find a USI in
where clause it knows that it’s a 2 amp operation and also only one row will be
returned. so the step its performs for retrieval is as follows:-
·
It will hash the value of SI(‘123-99-8888’)by hashing algorithm
and found the hash value for it.
·
Now it checks this hash value in the hash map and gets the amp
number from it.we know that this amp stores this SI value.
·
Now it will go to the employee subtable of the amp and retrieve
the base row id which is stored for that hash value.
·
This base row id will be sent back to optimizer by BYNET
·
Now optimizer sent back this row id again and fetch the resultant
row from the base table for which soc_security=’123-99-8888’.
As
we have seen that Teradata system requires 2amp to each the answer row that’s
why we called USI operation as the 2 amp
operations.Even if SI row resides in the same amp in which base row
reside,still after getting base row id from the subtable it will sent back to
optimizer so that it start search again based on that base row id.so it’s
always called as the 2 amp operation.
Non Unique Secondary Index (NUSI) :
Non Unique Secondary Index (NUSI) :
NUSI SUBTABLE EXAMPLE:-
When we defined a NUSI on the table
then Teradata will build the subtable on each amp in the same fashion as that
in USI.The only difference in the subtable creation is that,instead of building
subtable on each amp it will be build the subtable in it to points it own base
rows,In other word each NUSI subtable will reflect and points to the those base
rows only which it owns.
Suppose we have an employee table(base table)on which we defined NUSI on the
column frame.
·
Teradata will first create the subtable on all amp.
·
Each AMP will hold the secondary index values for their rows
in the base table only.in our examples,each amp holds the FNAME column for all
employee rows in the base table on their amp(amp local).
·
Each AMP local fname will have the base table row id so the amp
can retrieve it quickly if needed.if an amp contains duplicate first names,only
one subtable row for that name is built with multiple base row _id’s.the above
example for f_name =’john’the subtable holds multiple base row id for this
value.
Teradata retrieval of NUSI query:-
Suppose on the above example we make
query-
Select*from employee table where
Fname=’john’ ;
When an NUSI
(Fname ) is used in the where clause of an sql statement,the PE optimizer
recognizes the NUSI.it will perform an all amp operation to look in to
the subtable for the requested value.so the steps its performs for retrieval is
as follows
·
It will hash the value of NUSI(‘john’),by hashing algorithm and
found the hash value for it.
·
Now it will instruct all amp to look for this hash value in its
employee subtable. Unlike NUSI there is no looking in to hash map because each
subtable in the amp contains rows from its own base rows only.so this look up
on hash value will performed on all AMP subtable.
· By amp which does not have this hash value will
not participate any more in the operation.
·
when the hash value found the corresponding base row id will be
fetched from the subtable and send to optimizer for actual retrieval of rows.
SECONDARY INDEX SUMMARY:-
·
We can have up to 32 SI for
a table
·
USI subtables are hash distributed
·
NUSI subtables are AMP local
·
USI queries are two amp operations
·
NUSI queries are all amp operations, but not full table scans.
·
Always collect statistics on all NUSI indexes.
0 Comments