Introduction to Hadoop Administration

Name: training4it.com
Address: 9913 Shelbyville Rd #200, Louisville, KY, 40223
Telephone: 502.265.3057

Apache Hadoop is an open-source framework for building reliable, distributed compute clusters. Together with related frameworks, Hadoop provides an excellent platform for processing large unstructured or semi-structured data sets from multiple sources: dissecting and classifying them, learning from them, and making suggestions for business analytics, decision support, and other advanced forms of machine intelligence.

Retail Price: $2,395.00

Next Date: Request Date

Course Days: 3

Course Objectives

Working in an engaging, hands-on learning environment, guided by our expert team, attendees will learn to:

Understand the benefits of distributed computing
Understand the Hadoop architecture (including HDFS and MapReduce)
Define administrator participation in Big Data projects
Plan, implement, and maintain Hadoop clusters
Deploy and maintain additional Big Data tools (Pig, Hive, Flume, etc.)
Plan, deploy and maintain HBase on a Hadoop cluster
Monitor and maintain hundreds of servers
Pinpoint performance bottlenecks and fix them

Course Prerequisites

This is an introductory-level course designed to teach experienced systems administrators how to install, maintain, monitor, troubleshoot, optimize, and secure Hadoop. Previous Hadoop experience is not required.

Course Outline

Introduction

Hadoop history and concepts
Ecosystem
Distributions
High-level architecture
Hadoop myths
Hadoop challenges (hardware / software)

Planning and installation

Selecting software and Hadoop distributions
Sizing the cluster and planning for growth
Selecting hardware and network
Rack topology
Installation
Multi-tenancy
Directory structure and logs
Benchmarking

HDFS operations

Concepts (horizontal scaling, replication, data locality, rack awareness)
Nodes and daemons (NameNode, Secondary NameNode, HA Standby NameNode, DataNode)
Health monitoring
Command-line and browser-based administration
Adding storage and replacing defective drives

MapReduce operations

Parallel computing before MapReduce: compare HPC versus Hadoop administration
MapReduce cluster loads
Nodes and daemons (JobTracker, TaskTracker)
MapReduce UI walkthrough
MapReduce configuration
Job config
Job schedulers
Administrator view of MapReduce best practices
Optimizing MapReduce
Foolproofing MapReduce: what to tell your programmers
YARN: architecture and use

Advanced topics

Hardware monitoring
System software monitoring
Hadoop cluster monitoring
Adding and removing servers and upgrading Hadoop
Backup, recovery, and business continuity planning
Cluster configuration tweaks
Hardware maintenance schedule
Oozie scheduling for administrators
Securing your cluster with Kerberos
The future of Hadoop

Sorry! It looks like we haven’t updated our dates for the class you selected yet. There’s a quick way to find out. Contact us at 502.265.3057 or email info@training4it.com
