Big Data Analytics with Hadoop

Apache Hadoop is the most popular platform for big data processing, and it can be combined with a host of other big data tools to build powerful analytics solutions. Big Data Analytics with Hadoop 3 gives participants insight into these tools and their benefits and uses through practical examples and case studies. The course reviews Hadoop 3's latest features, taking a detailed look at HDFS, MapReduce, and YARN, and how they enable faster, more efficient big data processing. It then explores how to integrate Hadoop with open source tools such as Python and R to analyze and visualize data and perform statistical computing on big data. Participants will also learn how to use Hadoop 3 with Apache Spark and Apache Flink for real-time data analytics and stream processing.

Retail Price: $2,347.80


Course Days: 3




Course Overview

Attendees will exit this course with the skills required to use Hadoop to build analytics solutions in the cloud, working in an end-to-end pipeline to perform big data analysis using practical use cases. Students will be well versed in the analytical capabilities of the Hadoop ecosystem and able to build powerful solutions that perform big data analytics and surface insights with little effort.

Working in a hands-on learning environment, students will:

  • Learn Hadoop 3 to build effective big data analytics solutions on-premises and in the cloud
  • Integrate Hadoop with other big data tools such as R, Python, Apache Spark, and Apache Flink
  • Exploit big data using Hadoop 3 with real-world examples

 

Course Objectives

This course is approximately 50% hands-on, combining expert lecture, real-world demonstrations, and group discussions with machine-based practical labs and exercises. Our engaging instructors and mentors are highly experienced practitioners who bring years of current on-the-job experience into every classroom.

Working in a hands-on learning environment, led by our expert instructor, students will learn to:

  • Define Big Data and identify Big Data use cases
  • Review Big Data management architecture and engineered systems
  • Describe an integrated Big Data solution and its components
  • Examine MapReduce programs and balance MapReduce jobs
  • Use NoSQL Database
  • Use XQuery for Hadoop
  • Install, use, and administer the Big Data Appliance
  • Provide data security and enable resource management
  • Use the BigDataLite Virtual Machine
  • Use the Hadoop Distributed File System (HDFS) to store, distribute, and replicate data across the nodes in the Hadoop cluster (see the sketch after this list)
  • Acquire big data using the HDFS Command Line Interface, Flume, and NoSQL Database
  • Use MapReduce and YARN for distributed processing of the data stored in the Hadoop cluster
  • Process big data using MapReduce, YARN, Hive, Pig, XQuery for Hadoop, Solr, and Spark
  • Integrate big data and warehouse data using Sqoop, Big Data Connectors, Copy to BDA, Big Data SQL, Data Integrator, and GoldenGate
  • Analyze big data using Big Data SQL, Advanced Analytics technologies, and Big Data Discovery
  • Use and manage the Big Data Appliance
  • Secure your data
  • Understand Big Data Cloud Service key features and benefits
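
As a taste of the HDFS-related objectives above, here is a minimal Python sketch that stores a file in HDFS and reads it back. It assumes the third-party hdfs package (a WebHDFS client installed with pip install hdfs) and a NameNode exposing WebHDFS at http://namenode:9870, the Hadoop 3 default port; the host, user, paths, and file names are illustrative only.

    from hdfs import InsecureClient  # assumed third-party WebHDFS client: pip install hdfs

    # Hypothetical NameNode address; Hadoop 3 serves WebHDFS on port 9870 by default.
    client = InsecureClient('http://namenode:9870', user='hadoop')

    # Upload a local CSV into HDFS; HDFS splits it into blocks and replicates them across nodes.
    client.upload('/data/sales.csv', 'sales.csv', overwrite=True)

    # List the directory and read the file back.
    print(client.list('/data'))
    with client.read('/data/sales.csv', encoding='utf-8') as reader:
        print(reader.read()[:200])

The same operations can also be performed from the HDFS command-line interface (hdfs dfs -put, -ls, -cat), which the objectives above cover as well.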

 

Course Prerequisites

This course is geared toward attendees who wish to use an integrated big data solution to acquire, process, integrate, and analyze big data. Attendees should possess the following incoming skills:

  • Basic to Intermediate IT Skills, and Big Data knowledge
  • Good foundational mathematics or logic skills
  • Basic Linux skills, including familiarity with basic commands such as ls, cd, cp, and su

Outline

  1. Introduction to Hadoop
  • Hadoop Distributed File System
  • MapReduce framework
  • YARN
  • Other changes
  • Installing Hadoop 3
  2. Overview of Big Data Analytics
  • Introduction to data analytics
  • Introduction to big data
  • Distributed computing using Apache Hadoop
  • The MapReduce framework
  • Hive
  • Apache Spark
  • Visualization using Tableau
  3. Big Data Processing with MapReduce
  • The MapReduce framework
  • MapReduce job types
  • MapReduce patterns
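
One common way to run the MapReduce concepts in this module from Python is Hadoop Streaming, where the mapper and reducer are ordinary scripts reading stdin and writing stdout. Below is a minimal word-count sketch under that assumption; the script name and the streaming jar location are illustrative and vary by installation.

    #!/usr/bin/env python3
    # wordcount.py -- run as "python3 wordcount.py map" or "python3 wordcount.py reduce"
    import sys

    def mapper():
        # Emit one "word<TAB>1" pair per token; Hadoop sorts and groups these by key between phases.
        for line in sys.stdin:
            for word in line.strip().split():
                print(f"{word}\t1")

    def reducer():
        # Input arrives sorted by key, so all counts for a word are contiguous.
        current, count = None, 0
        for line in sys.stdin:
            word, value = line.rstrip("\n").split("\t")
            if word != current:
                if current is not None:
                    print(f"{current}\t{count}")
                current, count = word, 0
            count += int(value)
        if current is not None:
            print(f"{current}\t{count}")

    if __name__ == "__main__":
        mapper() if sys.argv[1] == "map" else reducer()

A typical (hypothetical) invocation would resemble: hadoop jar hadoop-streaming-*.jar -input /data/in -output /data/out -mapper "python3 wordcount.py map" -reducer "python3 wordcount.py reduce".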
  4. Scientific Computing and Big Data Analysis with Python and Hadoop
  • Installation
  • Data analysis
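
As a small illustration of the data-analysis portion of this module, the sketch below uses pandas on a CSV that could have been pulled out of HDFS beforehand (for example with hdfs dfs -get); the file name and column names are hypothetical.

    import pandas as pd

    # Hypothetical extract, e.g. fetched earlier with: hdfs dfs -get /data/sales.csv sales.csv
    df = pd.read_csv("sales.csv", parse_dates=["order_date"])

    # Quick profiling, then a simple aggregation: revenue per region per month.
    print(df.describe(include="all"))
    monthly = (df.assign(month=df["order_date"].dt.to_period("M"))
                 .groupby(["region", "month"])["revenue"]
                 .sum()
                 .reset_index())
    print(monthly.head())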
  5. Statistical Big Data Computing with R and Hadoop
  • Introduction
  • Methods of integrating R and Hadoop
  • Data analytics
  6. Batch Analytics with Apache Spark
  • SparkSQL and DataFrames
  • DataFrame APIs and the SQL API
  • Schema – structure of data
  • Loading datasets
  • Saving datasets
  • Aggregations
  • Joins
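
A compact PySpark sketch touching the topics listed in this module (loading datasets, inspecting the schema, aggregations, joins, and saving); the HDFS paths and column names are hypothetical.

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("batch-analytics").getOrCreate()

    # Load datasets; inferSchema derives column types from the data.
    orders = spark.read.csv("hdfs:///data/orders.csv", header=True, inferSchema=True)
    customers = spark.read.csv("hdfs:///data/customers.csv", header=True, inferSchema=True)
    orders.printSchema()  # inspect the inferred structure of the data

    # Aggregation: total and average order value per customer.
    totals = orders.groupBy("customer_id").agg(
        F.sum("amount").alias("total_spent"),
        F.avg("amount").alias("avg_order"),
    )

    # Join with the customer dimension and save the result as Parquet.
    report = totals.join(customers, on="customer_id", how="left")
    report.write.mode("overwrite").parquet("hdfs:///output/customer_report")

    # The same aggregation through the SQL API:
    orders.createOrReplaceTempView("orders")
    spark.sql("SELECT customer_id, SUM(amount) AS total_spent "
              "FROM orders GROUP BY customer_id").show(5)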
  7. Real-Time Analytics with Apache Spark
  • Streaming
  • Spark Streaming
  • fileStream
  • Transformations
  • Checkpointing
  • Driver failure recovery
  • Interoperability with streaming platforms (Apache Kafka)
  • Handling event time and late data
  • Fault-tolerance semantics
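
A minimal Spark Structured Streaming sketch for the topics above, assuming a Kafka broker at localhost:9092 and a topic named events (both hypothetical, with the spark-sql-kafka connector package on the classpath); the watermark handles late data, and the checkpoint location is what enables driver failure recovery.

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("stream-analytics").getOrCreate()

    # Read a stream from Kafka (broker address and topic name are assumptions).
    events = (spark.readStream
              .format("kafka")
              .option("kafka.bootstrap.servers", "localhost:9092")
              .option("subscribe", "events")
              .load())

    # Kafka values arrive as bytes; treat each message as a single word for this sketch.
    words = events.select(F.col("value").cast("string").alias("word"), "timestamp")

    # Windowed count by event time, tolerating data that arrives up to 10 minutes late.
    counts = (words
              .withWatermark("timestamp", "10 minutes")
              .groupBy(F.window("timestamp", "5 minutes"), "word")
              .count())

    # Checkpointing lets a restarted driver pick up where the query left off.
    query = (counts.writeStream
             .outputMode("update")
             .format("console")
             .option("checkpointLocation", "hdfs:///checkpoints/word-counts")
             .start())
    query.awaitTermination()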
  8. Batch Analytics with Apache Flink
  • Introduction to Apache Flink
  • Installing Flink
  • Using the Flink cluster UI
  • Batch analytics
  9. Stream Processing with Apache Flink
  • Introduction to streaming execution model
  • Data processing using the DataStream API
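
For the two Flink modules, a small DataStream sketch using PyFlink (an assumption: the course may equally use Flink's Java or Scala APIs; PyFlink is installed with pip install apache-flink). The element values are made up, and a real job would read from a socket, file, or Kafka source rather than a fixed collection.

    from pyflink.datastream import StreamExecutionEnvironment

    env = StreamExecutionEnvironment.get_execution_environment()

    # A fixed collection stands in for a real streaming source in this sketch.
    words = env.from_collection(["hadoop", "flink", "spark", "flink", "hadoop", "flink"])

    # Classic streaming word count: map to (word, 1), key by word, keep a running sum.
    counts = (words
              .map(lambda w: (w, 1))
              .key_by(lambda pair: pair[0])
              .reduce(lambda a, b: (a[0], a[1] + b[1])))

    counts.print()                    # emits running counts as elements flow through
    env.execute("word-count-sketch")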
  10. Visualizing Big Data
  • Tableau
  • Chart types
  • Using Python to visualize data
  • Using R to visualize data
  • Big data visualization tools
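
For the Python visualization topic in this module, a tiny sketch with pandas and Matplotlib; the aggregated input file is hypothetical (for example, an export from one of the Spark jobs sketched above).

    import pandas as pd
    import matplotlib.pyplot as plt

    # Hypothetical aggregate with columns: month, revenue.
    monthly = pd.read_csv("monthly_revenue.csv")

    plt.figure(figsize=(8, 4))
    plt.bar(monthly["month"], monthly["revenue"])
    plt.title("Revenue per month")
    plt.xlabel("Month")
    plt.ylabel("Revenue")
    plt.xticks(rotation=45)
    plt.tight_layout()
    plt.savefig("monthly_revenue.png")  # or plt.show() in an interactive session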
  11. Introduction to Cloud Computing
  • Concepts and terminology
  • Goals and benefits
  • Risks and challenges
  • Roles and boundaries
  • Cloud characteristics
  • Cloud delivery models
  • Cloud deployment models
  12. Using Amazon Web Services
  • Amazon Elastic Compute Cloud
  • Launching multiple instances of an AMI
  • What is AWS Lambda?
  • Introduction to Amazon S3
  • Amazon DynamoDB
  • Amazon Kinesis Data Streams
  • AWS Glue
  • Amazon EMR
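
As a pointer toward the AWS topics above, a small boto3 sketch that uploads a results file to Amazon S3 and lists the objects under a prefix; the bucket and key names are hypothetical, and credentials are assumed to come from the standard AWS configuration.

    import boto3

    # Region and credentials are assumed to come from the usual AWS config or environment.
    s3 = boto3.client("s3")

    # Hypothetical bucket and object key.
    s3.upload_file("customer_report.parquet",
                   "my-analytics-bucket",
                   "reports/customer_report.parquet")

    response = s3.list_objects_v2(Bucket="my-analytics-bucket", Prefix="reports/")
    for obj in response.get("Contents", []):
        print(obj["Key"], obj["Size"])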

