Partition Index

Purpose of PPI :

PPI is used to improve performance for large tables when you submit queries that specify a range constraint.

PPI allows you to reduce the number of rows to be processed by using partition elimination.

PPI will increase performance for incremental data loads, deletes, and data access when working with large tables with range constraints

Specifying Pratition primary index :

· Partitions are usually defined based on Range or Case as follows.

· Two functions, RANGE_N and CASE_N, can be used to simplify the specification of a partitioning expression.

Example of Primary Index :

We have on order table having two columns order_date and order _number,in which PI is defined on order_date.the primary index was hashed and rows were distributed to the power amp based on row hash value then sorted by the row_id.the distribution of rows will take place as explained in the below

Now when we execute query

Select*from order_table where order_date between 1-1-2003 and 1-31-2003

This query will result in a all table scan despite of order_date being PI.

The primary index was hashed and rows were distributed to the proper amp based on row hash value then sorted by order_date and not by row id. This distribution of rows will be take place as explained in the below image.

Now when we execute query :

After Creating Partition Primary Index :

Select*from order table where order date between 1-1-2003 and 1-31-2003:

This query will not result in a full table scan because all the January orders are kept together in their position.

Syntax of Partition Primary Index:

1. Partition by CASE

CREATE TABLE ORDER_Table

(

ORD_number integer NOT NULL,

customer_number integer NOT NULL,

order_date date ,

order_total integer

)

PRIMARY INDEX (customer_number)

PARTITION BY case_n (

order_total < 10000 ,

order_total < 20000 ,

order_total < 30000,

NO CASE OR UNKNOWN ) ;

2. Partition by Range - example using date range

CREATE TABLE ORDER_Table

(

ORD_number integer NOT NULL,

customer_number integer NOT NULL,

order_date date ,

order_total integer

)

PRIMARY INDEX (customer_number)

PARTITION BY range_n (

Order_date BETWEEN date '2010-01-01' AND date '2010-12-01'

EACH interval '1' month,

NO RANGE

OR UNKNOWN);

P.S: If we use NO RANGE or NO CASE - then all values not in this range will be in a singlepartition.If we specify UNKNOWN, then all null values will be placed in this partition

Disadvantages:

The PI access disadvantage occurs only when the partitioning column is not part of the PI. In this situation, a query specifying a PI value, but no value for the partitioning column, must look in each partition for that value, instead of positioning directly to the first row for the PI value.
Another disadvantage is that when another table (without PPI) is joined with PPI table on PI=PI condition. If one of the tables is partitioned, the rows won't be ordered the same, and the task, in effect, becomes a set of sub-joins, one for each partition of the PPI table. This type of join is sliding window join

Limitations:-

Primary index of PPI table has to be
Non unique PI, if PPI column is not part of Index, since enforcing a Unique PI would require checking for a duplicate key value in each partition, which would be very expensive.
Primary Index of PPI table can be Unique, if PPI is part of UPI. This will result in checking for unique constraint in same partition.
PPI cannot be defined on Global temporary tables and Volatile tables and also on compressed join indices
PPI Table rows occupy two extra bytes compared to NPPI table row, as these extra bytes store the partition number for each row .
PPI table rows are four bytes wider if value compression is specified for the table.

Tips of partitioning:

1.	Use the DATE data type, if possible, for a date-based partitioning expression. A date-based partitioning expression is often a good choice for at least one level of partitioning. This will allow the Teradata Database to better recognize partition elimination opportunities.
2.	Keep the partitioning expression simple. A RANGE_N partitioning expression usually works the best for partition elimination. With multilevel partitioning, while one level usually does RANGE_N date-based partitioning, other levels may use CASE_N partitioning.
3.	Add query conditions on the partitioning columns, where possible, to improve partition elimination opportunities.
4.	If queries join on the PI but the PI doesn't include the partitioning column, consider propagating the partitioning column value to the other table and modifying the query to also join on the partitioning column and the column propagated to the other table.
5.	Make sure the selected partitioning clusters the data so that a combined partition contains either a large number of rows (resulting in multiple data blocks per AMP) or contains no rows. This is to ensure that when a combined partition is read, a majority of rows read from the data blocks qualify. (The first and last data block may contain rows from other non-qualifying partitions.) Note that an empty partition does not take any space. As a rule of thumb, include at least 10 data blocks per combined partition per AMP. Since, on average, half of the first data block and half of the last data block will be rows for other partitions, then 90% of the rows read will be from the qualifying partition. Simple queries can be run on the data to determine how well the data clusters for a candidate set of partitioning expressions.
6.	If you follow the previous tip, the order of partitioning expression shouldn't matter too much. If all else is constant, place the partitioning expressions in ascending order based on the number of partitions they each define. However, you may want to put your date-based partitioning expression first.
7.	Use tools such as Teradata Database Query Log and Index Wizard to better understand your workload, identify partitioning opportunities and verify the success of your partitioning.
8.	Collect statistics on the system-derived column PARTITION and the usual recommended indexes and columns. Collecting on the partitioning columns themselves is usually also a good idea, but statistics on PARTITION may be enough for good plans. Check EXPLAINs and measure performance to make sure.

PPI improves performance as follows:

Automatic optimization occurs for queries that specify a restrictive condition on the partitioning column.
Uses partition elimination to improve the efficiency of range searches when, for example, the searches are range partitioned.
Only the rows of the qualified partitions in a query need to be accessed avoid full table scans.
Provides an access path to the rows in the base table while still providing efficient join Strategies
If the same partition is consistently targeted, the part of the table updated may be able to fit largely in cache, significantly boosting performance
PPI based tables have advantage during Purging stages. Since purging based on partition is very fast and deletion happens quickly.

Hadoop Concepts

Hadoop Tools

Interview

GCP Tools And POCs

Partition Index

contact us

Contact Form

Dataengines

Teradata Basics

Utilities

Dataengines

Footer Menu Widget

Ad Code

Hadoop Concepts

Hadoop Tools

Interview

GCP Tools And POCs

Partition Index

contact us

Contact Form

Dataengines

Teradata Basics

Utilities

Dataengines

Footer Menu Widget