Impala Introduction & Essentials - 2-Day Bootcamp
Impala: An Open Source SQL Engine for Hadoop is a course for individuals who want to understand the basic concepts of a Massively Parallel Processing (MPP) SQL query engine that runs on Apache Hadoop. On completing this course, learners will be able to explain the role of Impala in the Big Data ecosystem.
The course focuses on the basics of Impala. It also provides an overview of Impala's performance advantages over other popular SQL-on-Hadoop systems.
Through instructor-led discussion and interactive, hands-on exercises, participants will navigate the Impala ecosystem and learn to:
- Describe Impala and its role in the Hadoop ecosystem
- Explain how to query data using Impala SQL
- Discuss partitioning of Impala tables and explain its benefits
- List the factors affecting the performance of Impala
- Describe the complete flow of SQL query execution in Impala
Outline:
1. An Introduction to Impala
An Overview of Impala
What is Impala?
The benefits of Impala
Exploratory Business Intelligence
Impala Installation
Starting and Stopping Impala
Data Storage
Managing Metadata
Controlling Access to Data
Impala Shell Commands and Interface
2. Querying with Hive and Impala
Querying with Hive and Impala
SQL Language Statements
DDL Statements
CREATE DATABASE
CREATE TABLE
Internal and External Tables
Loading Data into an Impala Table
ALTER TABLE
DROP TABLE
DROP DATABASE
The DESCRIBE Statement
The EXPLAIN Statement
The SHOW TABLES Statement
INSERT Statement
SELECT Statement
Data Types
Operators
Functions
CREATE VIEW in Impala
Hive and Impala Query Syntax
3. Data Storage and File Format
Data Storage and File Formats
Partitioning Tables
SQL Statements for Partitioned Tables
File Format and Performance Considerations
Choosing the File Type and Compression Technique
4. Working with Impala
Working with Impala
Impala Architecture
What is the Impala Daemon?
The Impala Statestore
Impala Catalog Service
Query Execution Flow in Impala
User-Defined Functions
Hive UDFs with Impala
Improving Impala Performance