ExperTeach Networking Logo

Big Data

Insight into Hadoop and Other Frameworks

The topic of Big Data has long since come of age. Experience and information have long been capital assets of many companies, and the analysis and structuring of huge data volumes has become a business-critical factor. Anyone who understands trends and contexts faster than the competition gains a competitive edge. For this reason, Big Data solutions have emerged everywhere in large numbers. This course shows what's behind the hype, which technologies are used, and how they work.

Course Contents

  • What's behind Big Data?
  • Application Scenarios for Big Data
  • Storage of Large Data Volumes in Distributed File Systems
  • MapReduce Procedure and CAP Theorem
  • NoSQL Databases
  • Software Solutions for Big Data: Hadoop, Spark, and Flink
  • Data Analytics: IT Architectures for Big Data
  • Hands-On: Big Data and Data Analytics Preview

You will receive the comprehensive documentation package of the ExperTeach Networking series: printed documentation, e-book, and personalized PDF! As an online participant, you will receive the e-book and the personalized PDF.

Target Group

This course is designed for anyone who wants to design, evaluate, and implement Big Data solutions.

Knowledge Prerequisites

Specific technical know-how is not required. Anyone who is interested in Big Data solutions in terms of application scenarios and technical implementation will benefit from this course.

1 What is Big Data?
1.1 The big mountain of data
1.2 Application areas of Big Data
1.3 The definition of Big Data: 3-5 "V"s
1.3.1 Volume
1.3.2 Velocity
1.3.3 Variety
1.3.4 The fourth V - Veracity
1.3.5 This is Big Data
1.4 The origins of Big Data
2 Big Data Basics
2.1 The Big Data value chain
2.2 Sources for Big Data analysis
2.3 The architecture
2.4 SQL: fixed, predefined table schemas
2.5 Normalization of tables
2.6 NoSQL
2.6.1 Key-Value Stores
2.6.2 In-Memory Key-Value Stores
2.6.3 Document Stores
2.6.4 Graph Databases
2.6.5 Column Stores
2.7 CAP Theorem
2.7.1 Combination CA of the CAP Theorem
2.7.2 Combination CP of the CAP Theorem
2.7.3 Combination AP of the CAP Theorem
3 Hadoop and Spark
3.1 Hadoop
3.2 MapReduce
3.2.1 Main Concepts - MapReduce
3.2.2 MapReduce - Data Flow
3.2.3 Example: Counting words
3.2.4 MapReduce - Hints
3.3 HDFS
3.3.1 HDFS - Main components
3.3.2 HDFS - Architecture
3.4 YARN
3.5 Apache Spark
3.5.1 Resilient Distributed Dataset
3.5.2 Spark SQL
3.5.3 Spark Streaming
3.5.4 MLlib
3.5.5 Machine Learning
3.5.6 GraphX
4 Big Data Technologies
4.1 The Hadoop Ecosystem
4.2 Pig
4.3 Hive
4.4 Mahout
4.5 HBase
4.6 Sqoop
4.7 Flume
4.8 Chukwa
4.9 Nimble
4.10 Oozie
4.11 ZooKeeper
4.12 Ambari
4.13 R Connector
4.14 Cassandra
4.15 SAP HANA
5 Application examples for Big Data
5.1 Limitations of classic analytical applications
5.2 Application scenarios for Big Data
5.2.1 Clickstream analysis
5.2.2 Sentiment analysis from social media
5.2.3 Analysis of log data
5.2.4 Analysis of sensor data
5.2.5 Analysis of texts
5.2.6 Analysis of video and voice data
5.2.7 Business Intelligence (BI) and Big Data
5.2.8 Hybrid solution with a data warehouse
5.3 Conclusion
6 Data Governance + Risks
6.1 The 3 pillars of data governance
6.2 What can I do to protect my data?
6.3 Risks
6.4 Data compliance risk
6.4.1 Nationally and in Europe
6.4.2 International
6.4.3 Social risk
6.5 Data risks
6.5.1 Security of data
6.5.2 Quality of data
6.6 Definition and leakage risk
6.7 Risk avoidance
6.7.1 Data factor
6.7.2 Data management factor
6.7.3 Organization factor
6.7.4 Process factor
6.7.5 Customer factor (as affected party)
6.8 Challenges
7 Challenges in the operation of Big Data solutions
7.1 Where to start?
7.2 Operating Hadoop across the enterprise
7.2.1 Physical infrastructure
7.2.2 Data storage
7.2.3 Data access
7.2.4 Data integration
7.2.5 IT security
7.2.6 Other operational criteria
7.2.7 Economic criteria
7.3 Real-time analyses for streaming data
8 Outlook
8.1 Current status
8.2 Technical developments
8.3 Market developments
8.4 Business developments
8.5 Discussion of results
9 Hadoop Installation & Configuration & Go!
9.1 Installation Scheme for Apache Hadoop 3.1.3
9.2 Hadoop 3.1.3 on GitHub
9.3 The ExperTeach Lab Environment
9.4 Customizing the configuration files
9.5 Overview of the file structures in the lab
9.6 First start of the HDFS
9.7 Syntax and flow of counting tasks
9.8 Output during the MapReduce process
9.9 Hadoop Cockpit
9.10 Wordcount query via Pig
9.11 RATING - Filtering Records (25M)

Classroom training

Do you prefer the classic training method: a course in one of our Training Centers, with a competent trainer and direct exchange between all course participants? Then you should book one of our classroom training dates!

Hybrid training

Hybrid training means that online participants can join a classroom course remotely. The dynamics of a real seminar are maintained, and the online participants benefit from them. Online participants of a hybrid course use a collaboration platform, such as WebEx Training Center or Saba Meeting. This requires a PC with a browser and Internet access, as well as a headset and ideally a webcam. In the seminar room, we use specially developed and customized audio and video technology. This ensures that communication between everyone involved is convenient and trouble-free.

Online training

Would you like to attend a course online? We offer online dates for this course topic. To attend these seminars, you need a PC with Internet access (minimum data rate 1 Mbps), a headset if you are working via VoIP, and optionally a camera. For further information and technical recommendations, please refer to.

Tailor-made courses

Do you need a special course for your team? In addition to our standard offer, we will support you in creating customized courses that precisely meet your individual requirements. We will be glad to advise you and prepare an individual offer.
You can find the complete description of this course with dates and prices ready for download as a PDF.
