Purpose of PPI :
- PPI is
used to improve performance for large tables when you submit queries that
specify a range constraint.
- PPI
allows you to reduce the number of rows to be processed by using partition
elimination.
- PPI will
increase performance for incremental data loads, deletes, and
data access when working with large tables with range constraints
Specifying Pratition
primary index :
· Partitions are usually
defined based on Range or Case as follows.
· Two functions, RANGE_N
and CASE_N, can be used to simplify the specification of a partitioning
expression.
Example of Primary Index :
We have on order
table having two columns order_date and order _number,in which PI is defined on
order_date.the primary index was hashed and rows were distributed to the power
amp based on row hash value then sorted by the row_id.the distribution of rows
will take place as explained in the below
Now when we execute query
Select*from order_table
where order_date between 1-1-2003 and 1-31-2003
This query will result in
a all table scan despite of order_date being PI.
Now when we execute query
:
After Creating Partition
Primary Index :
Select*from order table
where order date between 1-1-2003 and 1-31-2003:
This query will not result
in a full table scan because all the January orders are kept together in their
position.
Syntax of Partition Primary Index:
CREATE TABLE ORDER_Table
(
ORD_number integer NOT NULL,
customer_number integer NOT NULL,
order_date date ,
order_total integer
)
PRIMARY INDEX (customer_number)
PARTITION BY case_n (
order_total < 10000 ,
order_total < 20000 ,
order_total < 30000,
NO CASE OR UNKNOWN )
;
2. Partition by Range -
example using date range
CREATE TABLE ORDER_Table
(
ORD_number integer NOT NULL,
customer_number integer NOT NULL,
order_date date ,
order_total integer
)
PRIMARY INDEX (customer_number)
PARTITION BY range_n (
Order_date BETWEEN date '2010-01-01' AND date '2010-12-01'
EACH interval '1' month,
NO RANGE
OR UNKNOWN);
P.S: If we use NO RANGE
or NO CASE - then all values not in this range will be in a singlepartition. If
we specify UNKNOWN, then all null values will be placed in this partition
Disadvantages:
- The PI access disadvantage occurs only when the partitioning column is not part of the PI. In this situation, a query specifying a PI value, but no value for the partitioning column, must look in each partition for that value, instead of positioning directly to the first row for the PI value.
- Another disadvantage is that when another table (without PPI) is joined with PPI table on PI=PI condition. If one of the tables is partitioned, the rows won't be ordered the same, and the task, in effect, becomes a set of sub-joins, one for each partition of the PPI table. This type of join is sliding window join
Limitations:-
- Primary index of PPI table has to be
- Non unique PI, if PPI column is not part of Index, since enforcing a Unique PI would require checking for a duplicate key value in each partition, which would be very expensive.
- Primary Index of PPI table can be Unique, if PPI is part of UPI. This will result in checking for unique constraint in same partition.
- PPI cannot be defined on Global temporary tables and Volatile tables and also on compressed join indices
- PPI Table rows occupy two extra bytes compared to NPPI table row, as these extra bytes store the partition number for each row .
- PPI table rows are four bytes wider if value compression is specified for the table.
Tips of partitioning:
1.
|
Use
the DATE data type,
if possible, for a date-based partitioning expression. A date-based
partitioning expression is often a good choice for at least one level of
partitioning. This will allow the Teradata Database to better recognize
partition elimination opportunities.
|
2.
|
Keep
the partitioning expression simple.
A RANGE_N partitioning expression usually works the best for partition
elimination. With multilevel partitioning, while one level usually does
RANGE_N date-based partitioning, other levels may use CASE_N partitioning.
|
3.
|
Add
query conditions on the partitioning columns, where possible, to improve
partition elimination opportunities.
|
4.
|
If
queries join on the PI but the PI doesn't include the partitioning
column, consider propagating the partitioning column value to
the other table and modifying the query to also join on the partitioning
column and the column propagated to the other table.
|
5.
|
Make
sure the selected partitioning clusters the data so that a combined
partition contains either a large number of rows (resulting in multiple data
blocks per AMP) or contains no rows. This is to ensure that when a combined
partition is read, a majority of rows read from the data blocks qualify. (The
first and last data block may contain rows from other non-qualifying
partitions.) Note that an empty partition does not take any space. As a rule
of thumb, include at least 10 data blocks per combined partition per AMP.
Since, on average, half of the first data block and half of the last data
block will be rows for other partitions, then 90% of the rows read will be
from the qualifying partition. Simple queries can be run on the data to
determine how well the data clusters for a candidate set of partitioning
expressions.
|
6.
|
If
you follow the previous tip, the order of partitioning expression shouldn't
matter too much. If all else is constant, place the partitioning
expressions in ascending order based on the number of partitions
they each define. However, you may want to put your date-based partitioning
expression first.
|
7.
|
Use
tools such
as Teradata Database Query Log and Index Wizard to better understand your
workload, identify partitioning opportunities and verify the success of your
partitioning.
|
8.
|
Collect
statistics on
the system-derived column PARTITION and the usual recommended indexes and
columns. Collecting on the partitioning columns themselves is usually also a
good idea, but statistics on PARTITION may be enough for good plans. Check
EXPLAINs and measure performance to make sure.
|
PPI
improves performance as follows:
- Automatic optimization occurs for queries that specify a restrictive condition on the partitioning column.
- Uses partition elimination to improve the efficiency of range searches when, for example, the searches are range partitioned.
- Only the rows of the qualified partitions in a query need to be accessed avoid full table scans.
- Provides an access path to the rows in the base table while still providing efficient join Strategies
- If the same partition is consistently targeted, the part of the table updated may be able to fit largely in cache, significantly boosting performance
- PPI based tables have advantage during Purging stages. Since purging based on partition is very fast and deletion happens quickly.