Course Information
Big Data & Hadoop Course Duration: 30 Hours
Big Data & Hadoop Training Method: Classroom Training
Big Data & Hadoop Study Material: Soft Copy
Course Content
Module 1 – Introduction to Data Warehousing & Business Intelligence
- What is a Data Warehouse?
- Data Warehouse Architecture
- Data Warehouse Vs Data Mart
- OLTP Vs OLAP
- Data Modeling
- Relational
- Dimensional
- Star Schema / Snowflake Schema
- Normalization
- Data Normalization
- Data De-Normalization
- Dimension Table
- Categories – Normal & Conformed Dimension
- Slowly Changing Dimension - Type 1, Type 2 & Type 3
- Level & Hierarchy
- Fact Table
- Categories - Summary / Aggregations table
- Type
- Additive
- Semi-Additive
- Non-Additive
- Real-Time Data Warehousing - Change Data Capture
- What is Business Intelligence?
Module 2 – Introduction to Big Data & Hadoop
- What is Big Data?
- Limitations and Solutions of existing Data Analytics Architecture
- Hadoop & Hadoop Features
- Hadoop Ecosystem
- Hadoop 2.x core components
- Hadoop Storage: HDFS, Hadoop Processing: MapReduce Framework
- Anatomy of File Write and Read, Rack Awareness
Module 3 – Hadoop Architecture, Installation, Setup & Configuration
- Hadoop 2.x Cluster Architecture - Federation and High Availability
- Hadoop Cluster Modes
- Common Hadoop Shell Commands
- Hadoop 2.x Configuration Files
- Hadoop Job Processes
- MapReduce Job Execution
Module 4 – Hadoop MapReduce & YARN Architecture & Framework
- MapReduce Framework
- Traditional way Vs MapReduce way
- Hadoop 2.x MapReduce Architecture & Components
- YARN Architecture, Components & Workflow
- Anatomy of MapReduce Program
- MapReduce Program
Module 5 – Sqoop & Flume
- What is Sqoop?
- Sqoop Installations and Basics
- Importing Data from RDBMS / MySQL to HDFS
- Exporting Data from HDFS to RDBMS / MySQL
- Parallelism
- Importing data from RDBMS / MySQL to Hive
- What is Flume?
- Flume Model and Goals
- Features of Flume
Module 6 – Pig
- What is Pig?
- MapReduce Vs Pig
- Pig Use Cases
- Programming Structure in Pig
- Pig Running Modes
- Pig Components
- Pig Execution
- Pig Data Types
- Relational & Group Operators, File Loaders, Union & Joins, Diagnostic Operators & UDF
Module 7 – Hive
- What is Hive?
- Hive Vs Pig
- Hive Architecture, Components, and Limitations
- Metastore in Hive
- Comparison with Traditional Database
- Hive Data Types, Data Models, Partitions and Buckets
- Hive Tables (Managed Tables and External Tables)
- Importing, Querying Data & Managing Outputs
- Hive Script & UDF
Module 8 – HBase
- HBase Data Model
- HBase Shell
- HBase Client API
- Data Loading Techniques
- ZooKeeper Data Model
- ZooKeeper Service
- ZooKeeper
- Data Handling
- HBase Filters
Module 9 – Spark
- What is Spark?
- Spark Architecture & Components
- Spark Algorithms - Iterative Algorithms, Graph Analysis, Machine Learning
- Spark Core
- Spark Libraries
- Spark Demo
Module 10 – Big Data & Hadoop Project – Sales Analytics
- Towards the end of the course, you will work on a live project in which you use Sqoop, Flume, Pig, Hive, HBase, MapReduce & Spark to perform Big Data analytics
- You will use the industry-specific Big Data case studies that are included in our Big Data and Hadoop course
- You will gain in-depth experience in working with Hadoop & Big Data
- Understand your sales pipeline, uncover what can lead to successful sales opportunities, and better anticipate performance gaps
- Review product-related information such as Cost, Revenue, and Price across Years and Ordering Method. This dataset could also be used in the Explore feature to better understand hidden trends & patterns