Hadoop Training

This Hadoop training starts with an introduction to Hadoop basics such as the Hadoop Distributed File System and how MapReduce works. It then covers joining data sets in MapReduce jobs, programming practices, and performance tuning. You will also learn Hadoop with analytics using R. Further on, the course delves deeper into the Hadoop API, introduces Hive and Pig, and covers advanced MapReduce programming.

A few of the clients we have served across industries:

DHL | PWC | ATOS | TCS | KPMG | Momentive | Tech Mahindra | Kellogg's | Bestseller | ESSAR | Ashok Leyland | NTT Data | HP | SABIC | Lamprell | TSPL | Neovia | NISUM and many more.

MaxMunus has successfully conducted 1000+ corporate training sessions in India, Qatar, Saudi Arabia, Oman, Bangladesh, Bahrain, UAE, Egypt, Jordan, Kuwait, Sri Lanka, Turkey, Thailand, Hong Kong, Germany, France, Australia and the USA.

Course Information

Hadoop Course Duration: 30 Hours

Hadoop Training Timings: Weekdays: 1-2 Hours per day (or) Weekends: 2-3 Hours per day

Hadoop Training Method: Online/Classroom Training

Hadoop Study Material: Soft Copy

Course Content

The Motivation For Hadoop

  • Problems with traditional large-scale systems
  • Requirements for a new approach

Hadoop: Basic Concepts

  • What is Hadoop?
  • The Hadoop Distributed File System
  • How MapReduce Works
  • Anatomy of a Hadoop Cluster
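
As a taste of the "How MapReduce Works" topic, the map, shuffle, and reduce steps can be sketched in plain Python. This is only an illustrative, single-process sketch and not Hadoop code; the function names (map_phase, shuffle, reduce_phase, word_count) are made up for this example.

```python
from collections import defaultdict

def map_phase(line):
    """Emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    """Group values by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    """Sum the counts collected for one word."""
    return key, sum(values)

def word_count(lines):
    """Run the three steps in sequence on an in-memory list of lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
```

On a real cluster the map and reduce steps run in parallel on many machines, and the shuffle moves data across the network.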

Joining Data Sets in MapReduce Jobs

  • Map-Side Joins
  • Reduce-Side Joins
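
The idea behind a reduce-side join can be sketched in plain Python: records from both data sets are tagged with their source, grouped by the join key, and paired up in the reduce step. This is a minimal in-memory sketch, assuming hypothetical (user_id, name) and (user_id, item) records and an inner join; it is not Hadoop API code.

```python
from collections import defaultdict

def reduce_side_join(users, orders):
    """Join (user_id, name) records with (user_id, item) records on user_id."""
    # Map step: tag each record with its source so the reducer can tell them apart.
    tagged = [(uid, ("U", name)) for uid, name in users]
    tagged += [(uid, ("O", item)) for uid, item in orders]

    # Shuffle step: bring all records with the same key together.
    grouped = defaultdict(list)
    for key, value in tagged:
        grouped[key].append(value)

    # Reduce step: pair every user record with every order record for that key.
    joined = []
    for uid in sorted(grouped):
        names = [v for tag, v in grouped[uid] if tag == "U"]
        items = [v for tag, v in grouped[uid] if tag == "O"]
        joined.extend((uid, n, i) for n in names for i in items)
    return joined
```

A map-side join avoids the shuffle by loading the smaller data set into memory on every mapper, which only works when that data set is small enough to fit.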

Programming Practices & Performance Tuning

  • Developing MapReduce Programs
    • Local Mode
    • Pseudo-distributed Mode
  • Monitoring and debugging on a Production Cluster
    • Counters
    • Skipping Bad Records
    • Rerunning failed tasks with IsolationRunner
  • Tuning for Performance
    • Reducing network traffic with a combiner
    • Reducing the amount of input data
    • Using Compression
    • Reusing the JVM
    • Running with speculative execution
  • Refactoring code and rewriting algorithms
  • Parameters affecting Performance
  • Other Performance Aspects

Hadoop with Analytics using R

  • Introduction to Big Data analytics
  • Using statistics on big data with R
  • Introduction to R
  • Using R to create an API that interacts with Hadoop ecosystem components
  • Integration of Java, R, Hadoop, Hive, etc.

Graph Manipulation in Hadoop

  • Introduction to graph techniques
  • Representing Graphs in Hadoop
  • Implementing a sample algorithm: Single Source Shortest Path
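
Single Source Shortest Path is usually run on MapReduce as a series of rounds, with one job per round. The sketch below imitates that in plain Python under a few assumptions: the graph is a dict mapping every node to a list of (neighbour, weight) pairs, all weights are non-negative, and the names sssp_round and shortest_paths are invented for this illustration.

```python
import math

def sssp_round(graph, dist):
    """One map/reduce round: relax every edge, keep the minimum per node."""
    # Map step: each node emits its own distance and a candidate for each neighbour.
    candidates = []
    for node, edges in graph.items():
        candidates.append((node, dist[node]))
        if dist[node] < math.inf:
            for neighbour, weight in edges:
                candidates.append((neighbour, dist[node] + weight))
    # Reduce step: the new distance of a node is the minimum candidate it received.
    new_dist = {}
    for node, d in candidates:
        new_dist[node] = min(d, new_dist.get(node, math.inf))
    return new_dist

def shortest_paths(graph, source):
    """Repeat rounds until no distance changes (one MapReduce job per round)."""
    dist = {node: math.inf for node in graph}
    dist[source] = 0
    while True:
        new_dist = sssp_round(graph, dist)
        if new_dist == dist:
            return dist
        dist = new_dist
```

For example, with edges a→b (1), a→c (4) and b→c (2), the distance to c settles at 3 after the second round.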

Writing a MapReduce Program

  • Examining a Sample MapReduce Program
  • Basic API Concepts
  • The Driver Code
  • Anatomy of File Read and Write
  • Basic Record Reader Anatomy
  • Input and Output Format classes
  • The Mapper
  • The Reducer
  • Hadoop's Streaming API

Integrating Hadoop Into The Workflow

  • Relational Database Management Systems
  • Storage Systems
  • Importing Data from RDBMSs With Sqoop
  • Importing Real-Time Data with Flume

Delving Deeper Into The Hadoop API

  • Using Combiners
  • The configure and close Methods
  • SequenceFiles
  • Partitioners
  • Custom RecordReader
  • Custom Input and Output Class
  • Counters
  • Directly Accessing HDFS
  • ToolRunner
  • Using The Distributed Cache

Common MapReduce Algorithms

  • Sorting and Searching
  • Indexing
  • Classification/Machine Learning
  • Term Frequency - Inverse Document Frequency
  • Word Co-Occurrence
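
To illustrate the Term Frequency - Inverse Document Frequency topic, here is a small plain-Python sketch. It assumes whitespace tokenisation and the common tf * log(N / df) weighting; other variants exist, and on Hadoop each commented step would typically be its own MapReduce job. The function name tf_idf is hypothetical.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return {doc_id: {term: score}} for a dict of doc_id -> text."""
    # Step 1: term counts per document (the term-frequency part).
    term_counts = {doc_id: Counter(text.lower().split())
                   for doc_id, text in docs.items()}
    # Step 2: number of documents containing each term (document frequency).
    doc_freq = Counter(term for counts in term_counts.values() for term in counts)
    # Step 3: combine both into tf * log(N / df).
    n_docs = len(docs)
    scores = {}
    for doc_id, counts in term_counts.items():
        total = sum(counts.values())
        scores[doc_id] = {
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        }
    return scores
```

A term that appears in every document gets a score of zero, which is why common words contribute little to the ranking.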

Using Hive and Pig

  • Hive Basics
  • Pig Basics

Debugging MapReduce Programs

  • Testing with MRUnit
  • Logging
  • Other Debugging Strategies

Advanced MapReduce Programming

  • A Recap of the MapReduce Flow
  • Custom Writables and WritableComparables
  • The Secondary Sort
  • Creating InputFormats and OutputFormats
  • Pipelining Jobs With Oozie
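
The Secondary Sort topic is about delivering the values for each key to the reducer in sorted order. A rough plain-Python analogy, assuming simple (key, value) tuples and a hypothetical secondary_sort helper, looks like this:

```python
from itertools import groupby
from operator import itemgetter

def secondary_sort(records):
    """Group (key, value) records by key with values in ascending order."""
    # Sorting on the composite (key, value) mimics a composite key with a sort comparator.
    ordered = sorted(records, key=itemgetter(0, 1))
    # Grouping on the natural key only mimics a grouping comparator.
    return [(key, [value for _, value in group])
            for key, group in groupby(ordered, key=itemgetter(0))]
```

In Hadoop the same effect is achieved by letting the framework sort on a composite key during the shuffle, so the reducer never has to buffer and sort values itself.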
