Apache Spark Big Data Boot Camp


Retail Price: $1,595.00

Next Date: Request Date

Course Days: 3


About this Course

Apache Spark’s speed, versatility, and access to powerful APIs and libraries make it the leading toolset for powering big data solutions with distributed cluster computing. Spark also gives applications serious data science capabilities, with R-style DataFrames and big data streaming that overcome the time constraints of earlier big data solutions.

Audience Profile

The primary audience for this course includes:

  • Developers and Team Leads
  • Software Engineers
  • Business Analysts
  • System Analysts
  • Data Analysts
  • Data Scientists
  • Operations and DevOps Engineers
  • Java Developers
  • Big Data Engineers

At Course Completion

  • We will explore Apache Spark: how it came into existence, how it compares with Apache Hadoop (long the de facto big data standard), the new use cases it makes possible, and how your current use cases can be made faster and more powerful.
  • We will also look at Apache Spark’s streaming architecture, which can meet most of the real-time needs of your business, and at its SQL architecture, which offers a fast migration path from slower analytical tools such as Hive to Spark SQL.
  • We will spend some time on Spark MLlib and ML, which integrate both real-time and batch analytics into a single architecture.
  • Finally, we will look at Apache Spark GraphX, which provides graph processing and graph algorithms.

Prerequisites

All attendees can access the labs through the cloud environment set up by the instructor. Participation is not mandatory; attendees who prefer to can simply observe the instructor work through the lab examples. Scala or Python skills are nice to have for a better understanding of what is being done in the labs.

 


Course Outline

  1. Introduction to Big Data & Apache Spark

  • Introduce Data Analysis
  • Introduce Big Data
  • Big Data Definition
  • Introduce the techniques and challenges in Big Data
  • Introduce the techniques and challenges in Distributed Computing
  • Show how the functional programming approach is particularly useful in tackling these challenges
  • Short overview of previous solutions: Google’s MapReduce and Apache Hadoop
  • Introduce Apache Spark

Hands-on practice: Get exposure to Spark administration and environment setup
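
As a first taste of the lab environment, here is a minimal sketch assuming Spark 2.x and the Scala API; the application name and master URL are placeholders, not part of the course materials. It creates a SparkSession and prints basic information about the cluster it is connected to.

    import org.apache.spark.sql.SparkSession

    object HelloSpark {
      def main(args: Array[String]): Unit = {
        // Build a local SparkSession; in the instructor's cloud environment the
        // master URL would point at the shared cluster rather than local[*].
        val spark = SparkSession.builder()
          .appName("HelloSpark")
          .master("local[*]")
          .getOrCreate()

        println(s"Spark version: ${spark.version}")
        println(s"Master URL:    ${spark.sparkContext.master}")

        spark.stop()
      }
    }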

  2. Deploying & Understanding Apache Spark Architecture

  • Spark Architecture in a Cluster
  • Spark Ecosystem and Cluster Management
  • Deploying Spark on a Cluster
  • Deploying Spark on a Standalone Cluster
  • Deploying Spark on a Mesos Cluster
  • Deploying Spark on YARN cluster
  • Cloud-based Deployment

Hands-on practice: Learn to deploy and begin using Spark
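
The deployment options above differ mainly in the master URL the application is given. A minimal Scala sketch follows; the host names, ports, and the SPARK_MASTER_URL environment variable are illustrative placeholders, not part of the course materials.

    import org.apache.spark.sql.SparkSession

    // Illustrative master URLs; in practice the master is usually supplied via
    // spark-submit --master rather than hard-coded in the application.
    //   Standalone: spark://master-host:7077
    //   Mesos:      mesos://mesos-master:5050
    //   YARN:       yarn   (cluster details come from HADOOP_CONF_DIR)
    val spark = SparkSession.builder()
      .appName("DeploymentDemo")
      .master(sys.env.getOrElse("SPARK_MASTER_URL", "local[*]"))
      .getOrCreate()

    println(s"Running against: ${spark.sparkContext.master}")
    spark.stop()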

  3. Spark Core, RDDs and Spark Shell

  • Dig deeper into Apache Spark
  • Introduce Resilient Distributed Datasets (RDDs)
  • Apache Spark installation (basic, local)
  • Introduce the Spark Shell
  • Actions and Transformations (Laziness)
  • Caching
  • Loading and Saving data files from the file system

Hands-on practice: Get hands-on with Spark Core and RDDs
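
A minimal sketch of the Spark Core concepts listed above, assuming the Scala shell or a small application; the input and output paths are hypothetical.

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("RddBasics").master("local[*]").getOrCreate()
    val sc = spark.sparkContext

    // Transformations are lazy; nothing executes until an action such as count() runs.
    val lines  = sc.textFile("data/sample.txt")              // hypothetical input path
    val errors = lines.filter(_.contains("ERROR")).cache()   // cache for repeated reuse

    println(s"Error lines: ${errors.count()}")               // first action triggers the job
    errors.saveAsTextFile("output/errors")                   // hypothetical output path

    spark.stop()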

  4. Deep Dive into RDDs

  • Tailored RDD
  • Pair RDD
  • NewHadoopRDD
  • Aggregations
  • Partitioning
  • Broadcast Variables
  • Accumulators

Hands-on practice: You’ll work with advanced RDD capabilities such as pair RDDs, partitioning, broadcast variables, and accumulators
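
A minimal Scala sketch of pair RDDs, partitioning, broadcast variables, and accumulators working together; the sample sales data and exchange-rate lookup are invented for illustration.

    import org.apache.spark.HashPartitioner
    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("RddDeepDive").master("local[*]").getOrCreate()
    val sc = spark.sparkContext

    // Pair RDD with an explicit partitioner.
    val sales = sc.parallelize(Seq(("US", 100.0), ("DE", 80.0), ("US", 40.0)))
                  .partitionBy(new HashPartitioner(4))

    val fxRates  = sc.broadcast(Map("US" -> 1.0, "DE" -> 1.1))  // shared read-only lookup
    val rowCount = sc.longAccumulator("rows seen")              // write-only counter

    // Aggregation: convert each amount, count rows, then sum per country.
    val totals = sales.map { case (country, amount) =>
      rowCount.add(1)
      (country, amount * fxRates.value(country))
    }.reduceByKey(_ + _)

    totals.collect().foreach(println)
    println(s"Rows processed: ${rowCount.value}")
    spark.stop()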

  5. Spark SQL and DataFrames

  • SparkSQL & DataFrames
  • DataFrame & SQL API
  • DataFrame Schema
  • Datasets and Encoders
  • Loading and Saving data
  • Aggregations
  • Joins

Hands-on practice: You’ll learn to use one of Spark’s most powerful features: DataFrames, which bring R-style data modeling to distributed computing clusters
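
A minimal sketch of the DataFrame and SQL APIs side by side, assuming the Scala API; the CSV/JSON file paths and column names are hypothetical.

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions._

    val spark = SparkSession.builder().appName("SqlDemo").master("local[*]").getOrCreate()

    // Hypothetical input files; any CSV/JSON with matching columns would do.
    val orders    = spark.read.option("header", "true").csv("data/orders.csv")
    val customers = spark.read.json("data/customers.json")

    // DataFrame API: aggregation followed by a join.
    val revenue = orders
      .groupBy("customer_id")
      .agg(sum("amount").as("total"))
      .join(customers, "customer_id")

    // Equivalent SQL over a temporary view.
    orders.createOrReplaceTempView("orders")
    spark.sql("SELECT customer_id, SUM(amount) AS total FROM orders GROUP BY customer_id").show()

    revenue.write.mode("overwrite").parquet("output/revenue")
    spark.stop()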

  6. Spark Streaming

  • Brief introduction to streaming
  • Spark Streaming
  • Discretized Streams
  • Structured Streaming
  • Stateful / Stateless Transformations
  • Checkpointing
  • Interoperability with Streaming Platforms (Apache Kafka)

Hands-on practice: Work with another of Spark 2.1’s most exciting features: streaming, which processes big data as it arrives and beats the time constraints of earlier big data solutions
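
A minimal Structured Streaming sketch in Scala. It uses a local socket source for simplicity (a Kafka source would swap in format("kafka") plus broker options), and the checkpoint path is hypothetical.

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("StreamingDemo").master("local[*]").getOrCreate()
    import spark.implicits._

    // Structured Streaming word count over a socket source (feed it with: nc -lk 9999).
    val lines = spark.readStream
      .format("socket")
      .option("host", "localhost")
      .option("port", 9999)
      .load()

    val counts = lines.as[String]
      .flatMap(_.split("\\s+"))
      .groupBy("value")
      .count()

    val query = counts.writeStream
      .outputMode("complete")
      .format("console")
      .option("checkpointLocation", "checkpoints/wordcount")  // hypothetical path
      .start()

    query.awaitTermination()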

  7. Spark MLlib and ML

  • Introduction to Machine Learning
  • Spark Machine Learning APIs
  • Feature Extraction and Transformation
  • Classification using Logistic Regression
  • ML Best Practices for Practitioners

Hands-on practice: Use Spark ML to build production-friendly machine learning and predictive analytics workflows
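
A minimal Spark ML pipeline sketch in Scala, chaining feature extraction with logistic regression; the tiny in-memory training set and its column names are invented for illustration.

    import org.apache.spark.ml.Pipeline
    import org.apache.spark.ml.classification.LogisticRegression
    import org.apache.spark.ml.feature.{HashingTF, Tokenizer}
    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("MlDemo").master("local[*]").getOrCreate()
    import spark.implicits._

    // Tiny labelled training set; real labs would load a larger dataset.
    val training = Seq(
      (0L, "spark makes big data simple",  1.0),
      (1L, "i dislike slow batch jobs",    0.0),
      (2L, "streaming with spark is fast", 1.0),
      (3L, "legacy tools feel clunky",     0.0)
    ).toDF("id", "text", "label")

    // Feature extraction and transformation chained with a classifier in one Pipeline.
    val tokenizer = new Tokenizer().setInputCol("text").setOutputCol("words")
    val hashingTF = new HashingTF().setInputCol("words").setOutputCol("features")
    val lr        = new LogisticRegression().setMaxIter(10).setRegParam(0.01)

    val model = new Pipeline().setStages(Array(tokenizer, hashingTF, lr)).fit(training)
    model.transform(training).select("text", "probability", "prediction").show(false)
    spark.stop()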

  8. GraphX

  • Brief Introduction to Graph Theory
  • GraphX
  • Vertex and Edge RDDs
  • Graph operators
  • Pregel API
  • PageRank / Travelling Salesman Problem *

Hands-on practice: Get practice building and analyzing graphs with GraphX
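
A minimal GraphX sketch in Scala: vertex and edge RDDs for a small follower graph, then PageRank over it; the graph data is invented for illustration.

    import org.apache.spark.graphx.{Edge, Graph}
    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("GraphxDemo").master("local[*]").getOrCreate()
    val sc = spark.sparkContext

    // Vertex and edge RDDs for a small follower graph.
    val vertices = sc.parallelize(Seq((1L, "alice"), (2L, "bob"), (3L, "carol")))
    val edges    = sc.parallelize(Seq(
      Edge(1L, 2L, "follows"),
      Edge(2L, 3L, "follows"),
      Edge(3L, 1L, "follows")
    ))

    val graph = Graph(vertices, edges)

    // PageRank until convergence, then attach the vertex names to the scores.
    val ranks = graph.pageRank(tol = 0.001).vertices
    ranks.join(vertices).collect().foreach { case (_, (rank, name)) =>
      println(f"$name%-6s $rank%.3f")
    }
    spark.stop()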

  9. Testing and Debugging Spark

  • Testing in a Distributed Environment
  • Testing Spark Applications
  • Debugging Spark Applications

Hands-on practice: You’ll get lab practice with best practices for testing, debugging, and handling day-to-day production issues in Spark applications
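
A minimal testing sketch, assuming ScalaTest (a common choice, not mandated by the course): the Spark logic runs against a local[2] master so the test needs no external cluster.

    import org.apache.spark.sql.SparkSession
    import org.scalatest.funsuite.AnyFunSuite

    class WordCountSpec extends AnyFunSuite {

      test("counts words in a small dataset") {
        // local[2] keeps the test self-contained while still exercising parallelism.
        val spark = SparkSession.builder()
          .appName("WordCountSpec")
          .master("local[2]")
          .getOrCreate()

        val words  = spark.sparkContext.parallelize(Seq("spark", "spark", "graphx"))
        val counts = words.map(w => (w, 1)).reduceByKey(_ + _).collectAsMap()

        assert(counts("spark") == 2)
        assert(counts("graphx") == 1)

        spark.stop()
      }
    }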


