Working with Apache Hive

Hive is the de facto standard for data warehousing on Hadoop. This course starts with Hive setup and operations and continues into advanced Hive usage. It also covers performance and execution engines, and ends with a practical workshop.

Retail Price: $1,795.00

Next Date: Request Date

Course Days: 2




WHAT YOU'LL LEARN

Join an engaging hands-on learning environment, where you’ll learn:

  • Hive basics and features
  • How to process, transform, and manage data
  • Processing and performance management
  • How to set up a data warehouse with Hive
  • Data query and analysis

WHO SHOULD ATTEND?

Data Scientists, Software Engineers, Developers, and Administrators

PREREQUISITES

Before attending this course, you should:

  • Be familiar with SQL
  • Be able to navigate the Linux command line
  • Have basic knowledge of command-line Linux editors (vi/nano)

COURSE OUTLINE

Hive Basics

  • Defining Hive Tables
  • SQL Queries over Structured Data
  • Filtering / Search
  • Aggregations / Ordering
  • Partitions
  • Joins
  • Text Analytics (Semi-Structured Data)

Hive Advanced

  • Transformation, Aggregation
  • Working with Dates, Timestamps, and Arrays
  • Converting Strings to Date, Time, and Numbers
  • Creating New Attributes, Mathematical Calculations, Windowing Functions
  • Using Character and String Functions
  • Binning and Smoothing
  • Processing JSON Data
  • Execution Engines (Tez, MapReduce, and Spark)

Impala (for Cloudera track)

  • Architecture
  • Impala joins and other SQL specifics

Bonus Project

  • Students work in teams to complete this end-to-end workshop
  • Set up a data warehouse with Hive
  • Query and analyze data with Hive and Spark


Sorry! It looks like we haven’t updated our dates for the class you selected yet. There’s a quick way to find out. Contact us at 502.265.3057 or email info@training4it.com

