Working with Apache Hive
Hive is the de facto standard for data warehousing on Hadoop. This course starts with Hive setup and operations, continues into advanced Hive usage, covers performance and execution engines, and ends with a practical workshop.
WHAT YOU'LL LEARN
Join an engaging hands-on learning environment, where you’ll learn:
- Hive basics and features
- How to process, transform, and manage data
- Processing and performance management
- How to set up a data warehouse with Hive
- Data query and analysis
WHO SHOULD ATTEND?
Data Scientists, Software Engineers, Developers, and Administrators
PREREQUISITES
Before attending this course, you should:
- Be familiar with SQL
- Be able to navigate the Linux command line
- Have basic knowledge of command-line Linux editors (vi/nano)
COURSE OUTLINE
Hive Basics
- Defining Hive Tables
- SQL Queries over Structured Data
- Filtering / Search
- Aggregations / Ordering
- Partitions
- Joins
- Text Analytics (Semi-Structured Data)
Hive Advanced
- Transformation, Aggregation
- Working with Dates, Timestamps, and Arrays
- Converting Strings to Date, Time, and Numbers
- Creating New Attributes, Mathematical Calculations, Windowing Functions
- Using Character and String Functions
- Binning and Smoothing
- Processing JSON Data
- Execution Engines (Tez, MapReduce, and Spark)
Impala (for Cloudera track)
- Architecture
- Impala joins and other SQL specifics
Bonus Project
- Students work in teams to complete this end-to-end workshop
- Set up a data warehouse with Hive
- Query and analyze data with Hive and Spark