Hadoop
- Introduction to Big Data
- Apache Hadoop Architecture
- HDFS
- YARN
- MapReduce
- Configuration of Apache Hadoop Clusters
- Loading Data into HDFS
- Hadoop Web User Interface
Spark
- Big Data Definition
- Introduction to Apache Spark
- Installation and Setup
- Spark Cluster Mode Overview
- Spark Localhost View
Spark RDD
- Spark RDD Basics
- RDD Operations
- Spark Lazy Transformations
- Spark Fault Tolerance
Spark DataFrame
- Spark DataFrame Basics
- Spark DataFrame Operations
- DataFrame Group By
- DataFrame Aggregate Functions
- Missing Data
- Dates and Timestamps
Machine Learning in Spark
- Introduction to Machine Learning
- Regression Algorithms
- Classification Algorithms
- Clustering Algorithms
- Natural Language Processing