HBASE
HBase is a NoSQL distributed database built on top of the Hadoop file system (HDFS) and designed to provide random, real-time read/write access to Big Data. It is open source, modeled after Google's Bigtable, and written in Java. It is a column-oriented database.
What is the Need for HBase?
In Hadoop (HDFS), data can be accessed only in a sequential manner: a read or write starts at the beginning of a file and proceeds step by step to the end. Even to query a small piece of data, the entire dataset must be scanned, and Hadoop cannot modify part of a file without rewriting it completely. This created the need for a solution that provides random read/write access to huge volumes of data.
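To make the contrast concrete, the sketch below uses the HBase Java client to write and read a single row directly by its row key, without scanning the rest of the data. It assumes an HBase 2.x client on the classpath and a table named users with a column family profile; those names are purely illustrative.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class RandomAccessExample {
    public static void main(String[] args) throws IOException {
        Configuration conf = HBaseConfiguration.create();
        try (Connection connection = ConnectionFactory.createConnection(conf);
             Table table = connection.getTable(TableName.valueOf("users"))) {   // illustrative table name

            // Write a single cell addressed by row key, column family and qualifier
            Put put = new Put(Bytes.toBytes("user-1001"));
            put.addColumn(Bytes.toBytes("profile"), Bytes.toBytes("email"),
                          Bytes.toBytes("alice@example.com"));
            table.put(put);

            // Read the same row back directly by its row key -- no full scan needed
            Get get = new Get(Bytes.toBytes("user-1001"));
            Result result = table.get(get);
            byte[] email = result.getValue(Bytes.toBytes("profile"), Bytes.toBytes("email"));
            System.out.println("email = " + Bytes.toString(email));
        }
    }
}
```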
Features of HBase:
· Column-oriented NoSQL database
· Provides fault tolerance
· Supports semi-structured as well as structured data
· Uses hash-table-style lookups to give random access and stores the data in indexed form in HDFS for fast lookups
Architecture of HBase:
HBase has three main components:
· H-Master
· Region Servers
· ZooKeeper
1) H-Master:
· It is the master server in HBase.
· It assigns regions to the region servers and monitors all region servers.
· Performs load balancing by distributing the load evenly across region servers.
· Handles all operations related to metadata changes, i.e. DDL operations such as creating, deleting and altering tables (see the sketch below).
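As a rough illustration of such a DDL request, the following sketch creates a table through the client's Admin API; the request is handled by the H-Master, which records the new table's metadata. The table and column family names (users, profile) are assumptions carried over from the earlier example.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.ColumnFamilyDescriptorBuilder;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.TableDescriptorBuilder;

public class CreateTableExample {
    public static void main(String[] args) throws IOException {
        Configuration conf = HBaseConfiguration.create();
        try (Connection connection = ConnectionFactory.createConnection(conf);
             Admin admin = connection.getAdmin()) {

            // DDL: create a table with one column family if it does not exist yet
            TableName tableName = TableName.valueOf("users");   // illustrative name
            if (!admin.tableExists(tableName)) {
                admin.createTable(
                    TableDescriptorBuilder.newBuilder(tableName)
                        .setColumnFamily(ColumnFamilyDescriptorBuilder.of("profile"))
                        .build());
            }
        }
    }
}
```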
2) Region Servers:
· These are the worker nodes in HBase.
· Contain regions, which are horizontal partitions of a table based on the row key. Regions are the basic building blocks of an HBase cluster (see the sketch after this list).
· Communicate with clients and handle read/write/update/delete operations for all the regions they host.
· A region server process runs on every data node of the Hadoop cluster.
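To see how a table is split into row-key ranges, the sketch below lists each region of a table together with its start key, end key, and the region server hosting it, using the client's RegionLocator. The table name users is again only an assumption for the example.

```java
import java.io.IOException;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HRegionLocation;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.RegionLocator;
import org.apache.hadoop.hbase.util.Bytes;

public class ListRegionsExample {
    public static void main(String[] args) throws IOException {
        Configuration conf = HBaseConfiguration.create();
        try (Connection connection = ConnectionFactory.createConnection(conf);
             RegionLocator locator = connection.getRegionLocator(TableName.valueOf("users"))) {

            // Each region covers a contiguous row-key range and is hosted by one region server
            List<HRegionLocation> regions = locator.getAllRegionLocations();
            for (HRegionLocation location : regions) {
                System.out.println("start=" + Bytes.toStringBinary(location.getRegion().getStartKey())
                        + " end=" + Bytes.toStringBinary(location.getRegion().getEndKey())
                        + " server=" + location.getHostname());
            }
        }
    }
}
```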
A region server has the following components:
1. Write Ahead Log (WAL): A log file that records new data before it is written to permanent storage; it is used for recovery after node failures.
2. Block Cache: An in-memory read cache that holds frequently accessed data.
3. MemStore: A write cache that holds data not yet written to disk. Each column family has its own dedicated MemStore.
4. HFile: The file on HDFS that stores the actual data as sorted KeyValues.
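A hedged sketch of that write path: the put below is first appended to the WAL, then buffered in the MemStore of the profile column family, and the explicit flush call forces the MemStore contents out to a new HFile (normally HBase flushes automatically once the MemStore fills up). Table, column, and row-key names are illustrative.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Durability;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class WritePathExample {
    public static void main(String[] args) throws IOException {
        Configuration conf = HBaseConfiguration.create();
        TableName tableName = TableName.valueOf("users");   // illustrative name
        try (Connection connection = ConnectionFactory.createConnection(conf);
             Table table = connection.getTable(tableName);
             Admin admin = connection.getAdmin()) {

            // 1. The put is appended to the WAL and then written to the MemStore
            //    of the 'profile' column family on the hosting region server.
            Put put = new Put(Bytes.toBytes("user-1002"));
            put.addColumn(Bytes.toBytes("profile"), Bytes.toBytes("city"), Bytes.toBytes("Hyderabad"));
            put.setDurability(Durability.SYNC_WAL); // sync the WAL before acknowledging the write
            table.put(put);

            // 2. Force the MemStore to be flushed to a new HFile on HDFS.
            admin.flush(tableName);
        }
    }
}
```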
3) ZooKeeper:
· Maintains server configuration information.
· Keeps track of server failures.
· Monitors the master servers and ensures that only one H-Master is active at any time.
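Clients locate the cluster through ZooKeeper rather than through the H-Master directly. A minimal connection sketch, with purely illustrative ZooKeeper host names:

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;

public class ZookeeperConnectExample {
    public static void main(String[] args) throws IOException {
        Configuration conf = HBaseConfiguration.create();
        // The ZooKeeper quorum is the client's entry point to the cluster (host names are illustrative)
        conf.set("hbase.zookeeper.quorum", "zk1.example.com,zk2.example.com,zk3.example.com");
        conf.set("hbase.zookeeper.property.clientPort", "2181");

        try (Connection connection = ConnectionFactory.createConnection(conf)) {
            System.out.println("Connected to the HBase cluster via the ZooKeeper quorum");
        }
    }
}
```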