Quick Tips for Performance Tuning and best practices in Bigquery
Simple in Nature:
(Reduction in data being processed)
·
Projections
: Select only required columns.Use Except Keyword in Select *
·
Selections:
Select Only required Rows with Where Clause . Use Limit clause for data analysis
·
Filter
using Pseudo Partition Column (_PARTITIONDATE)
Medium
in nature :
·
Partitioned
Tables
·
Reducing
data before join
·
Where
clause : Operations on BOOL
, INT
, FLOAT
, and DATE
columns are typically faster than operations on STRING
or BYTE
Complex in Nature :
·
Reuse
the repeatedly used transformations
·
Avoid
multiple usage of CTE ( Common table expression )
·
Avoid
Repeated Joins and Subqueries
Heavy
In Nature :
·
Split
complex queries into smaller ones
·
Materializing
Large Datasets and use in necessary places
·
Optimize
your join patterns (place the
table with the largest number of rows first, followed by the table with the
fewest rows, and then place the remaining tables by decreasing size.)
Minimize Data Skew in below two instances by using
Concepts of bigquery
·
Skew at Partition level: Partition skew refers to when data is
not evenly distributed across partitions, resulting in some partitions being
larger than others. This imbalance causes more data to be processed in certain
slots, leading to inefficiencies. Unequally sized partitions exacerbate this
issue, creating a disparity in data distribution.
·
Skew at Join level: Data skew can happen when you
use JOIN clauses. This means that when Big Query organizes data for joining, it
may put too much data with the same joining key into one group. This can
overwhelm the system's capacity.
0 Comments