Cloudera Training Partner Logo

Preparing with Cloudera Data Engineering

powered by Apache Spark, Hive, and Airflow

This hands-on course teaches the key concepts and skills developers need to develop high-performance, parallel applications on the Cloudera Data Platform (CDP) with Apache Spark.

Practical exercises allow you to practice writing Spark applications that integrate with CDP core components. You will learn how to use Spark SQL to query structured data, how to use Hive functions to ingest and denormalize data, and how to work with "big data" stored in a distributed file system.

After this course, you will be able to tackle real-world challenges. You will be able to build applications that support faster, better-informed decisions, and to perform interactive analysis across a variety of use cases, architectures and industries.

Course Contents

  • HDFS Introduction
  • YARN Introduction
  • Working with RDDs
  • Working with DataFrames
  • Introduction to Apache Hive
  • Working with Apache Hive
  • Hive and Spark Integration
  • Distributed Processing Challenges
  • Spark Distributed Processing
  • Spark Distributed Persistence
  • Data Engineering Service
  • Workload XM
  • Appendix: Working with Datasets in Scala

You will receive the original course documentation by Cloudera in English as an e-book (PDF).

Target Group

This course is intended for developers and data engineers.

Knowledge Prerequisites

You are expected to have basic knowledge of Linux and of Python or Scala. Basic knowledge of SQL is helpful. Prior knowledge of Spark and Hadoop is not required.

We also recommend our training courses in the areas of Programming languages and software development, and Linux.

Course Objective

  • Distribute, store and process data in a CDP cluster
  • Write, configure and deploy Apache Spark applications
  • Use Spark interpreters and Spark applications to explore, process and analyze distributed data
  • Query data with Spark SQL, DataFrames and Hive tables
  • Deploy a Spark application on the Data Engineering Service

Classroom training

Do you prefer the classic training method: a course in one of our Training Centers, with a competent trainer and direct exchange among all participants? Then book one of our classroom training dates!

Online training

Would you like to attend this course online? We offer online dates for this course topic. To attend, you need a PC with Internet access (minimum data rate 1 Mbps), a headset if you are working via VoIP, and optionally a camera. For further information and technical recommendations, please refer to.

Tailor-made courses

Do you need a special course for your team? In addition to our standard offer, we can help you create customized courses that precisely meet your individual requirements. We will be glad to advise you and prepare an individual offer.
You can find the complete description of this course, with dates and prices, ready for download at as PDF.