Apache Flink is an open-source platform for distributed stream and batch data processing.
Flink’s core is a streaming dataflow engine that provides data distribution, communication, and fault tolerance for distributed computations over data streams. Flink builds batch processing on top of this streaming engine, adding native iteration support, managed memory, and program optimization. This five-day course covers every technical aspect a developer, architect, or DevOps engineer will need to install, administer, develop against, manage, and monitor every capability of this fourth-generation distributed dataflow project of the Apache Software Foundation (ASF).
By "fourth generation" we mean that Flink is not merely the culmination of the dataflow ideas and functions that developers previously had to assemble from predecessor Apache projects such as Spark and Kafka; it is a more powerful and performant complement to, or successor of, both projects. Flink, in fact, changes the very meaning of "data flow" and of "infinite" versus "finite" data sources.
In addition, Flink makes batch and micro-batch processing simple subclasses of true streaming. For software engineers who use imperative or functional languages, Flink offers Python, Java, and Scala APIs. For developers who work with tabular data and SQL, Flink provides a full SQL interface. As with the programming APIs, Flink SQL can be used for batch, micro-batch, and pure streaming processing. Flink thus allows the same programming paradigm to be used for data flow and data analysis over finite data sets, infinite data sets, heterogeneous data sets, and batch, micro-batch, and streaming data.
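The idea that batch is simply a bounded special case of streaming can be illustrated with a toy sketch. This is not Flink code (Flink's actual APIs are covered in the course); it is plain Python, with hypothetical names, showing one processing function applied unchanged to both a finite collection and an unbounded source:

```python
from itertools import islice


def running_sum(stream):
    """Yield a running sum over any stream, bounded or unbounded.

    The logic never asks whether the source is finite; that is the
    sense in which batch is a subclass of streaming.
    """
    total = 0
    for x in stream:
        total += x
        yield total


# Batch: a finite list is just a bounded stream.
batch = [1, 2, 3, 4]
print(list(running_sum(batch)))  # [1, 3, 6, 10]


def sensor_readings():
    """Hypothetical unbounded source: an infinite generator standing in for a live feed."""
    n = 0
    while True:
        n += 1
        yield n


# Streaming: the identical logic applied to an infinite source;
# we simply take the first four results rather than waiting for an "end".
print(list(islice(running_sum(sensor_readings()), 4)))  # [1, 3, 6, 10]
```

In Flink the same unification appears at the API level: bounded and unbounded sources feed the same operators, and the runtime decides how to execute them.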
This course will present all essential concepts, libraries and techniques, in a complete hands-on environment, for understanding, creating and supporting Flink and Flink-ecosystem-based applications.
50% Lecture, 50% Lab
- Day 1: Introduction to Flink concepts, ecosystem, use cases
- Day 2: Application development with Flink
- Day 3: Extending Flink into the Flink ecosystem
- Day 4: DevOps, installation options, deployment and monitoring
- Day 5: Performance enhancement practices with Flink and Flink ecosystem
We believe the audience for this class falls into two groups of software engineers. First, Java or Scala engineers with minimal knowledge of Spark and Kafka who must quickly produce rigorous, extensible, enterprise-level applications built on a distributed dataflow topology.
Second, engineers who have worked with Java, the Spark API, and the Kafka API and want to understand how Flink's functionality and performance complement or supersede those of Spark and Kafka. Companies such as Alibaba, Capital One, Ericsson, Netflix, and Uber consider Spark and Kafka third-generation and Flink fourth-generation in their capabilities.