Course Information

Hadoop Course Duration: 30 Hours

Hadoop Training Timings: Weekdays: 1-2 hours per day (or) Weekends: 2-3 hours per day

Hadoop Training Method: Online/Classroom Training

Hadoop Study Material: Soft Copy

Course Content

The Motivation For Hadoop

  • Problems with traditional large-scale systems
  • Requirements for a new approach

Hadoop: Basic Concepts

  • What is Hadoop?
  • The Hadoop Distributed File System
  • How MapReduce Works
  • Anatomy of a Hadoop Cluster

Joining Data Sets in MapReduce Jobs

  • Map-Side Joins
  • Reduce-Side Joins

Programming Practices & Performance Tuning

  • Developing MapReduce Programs
    • Local Mode
    • Pseudo-distributed Mode
  • Monitoring and debugging on a Production Cluster
    • Counters
    • Skipping Bad Records
    • Rerunning failed tasks with IsolationRunner
  • Tuning for Performance
    • Reducing network traffic with a combiner
    • Reducing the amount of input data
    • Using Compression
    • Reusing the JVM
    • Running with speculative execution
  • Refactoring code and rewriting algorithms
  • Parameters affecting Performance
  • Other Performance Aspects

Hadoop with Analytics using R

  • Introduction to Big Data analytics
  • Using statistics on big data with R
  • Introduction to R
  • Creating an API in R that interacts with Hadoop ecosystem components
  • Integration of Java, R, Hadoop, Hive, etc.

Graph Manipulation in Hadoop

  • Introduction to graph techniques
  • Representing Graphs in Hadoop
  • Implementing a sample algorithm: Single Source Shortest Path

Writing a MapReduce Program

  • Examining a Sample MapReduce Program
  • Basic API Concepts
  • The Driver Code
  • Anatomy of File Read and Write
  • Basic Record Reader Anatomy
  • Input and Output Format classes
  • The Mapper
  • The Reducer
  • Hadoop's Streaming API

Integrating Hadoop Into The Workflow

  • Relational Database Management Systems
  • Storage Systems
  • Importing Data from RDBMSs With Sqoop
  • Importing Real-Time Data with Flume

Delving Deeper Into The Hadoop API

  • Using Combiners
  • The configure and close Methods
  • SequenceFiles
  • Partitioners
  • Custom RecordReader
  • Custom Input and Output Format Classes
  • Counters
  • Directly Accessing HDFS
  • ToolRunner
  • Using The Distributed Cache

Common MapReduce Algorithms

  • Sorting and Searching
  • Indexing
  • Classification/Machine Learning
  • Term Frequency - Inverse Document Frequency
  • Word Co-Occurrence

Using Hive and Pig

  • Hive Basics
  • Pig Basics

Debugging MapReduce Programs

  • Testing with MRUnit
  • Logging
  • Other Debugging Strategies

Advanced MapReduce Programming

  • A Recap of the MapReduce Flow
  • Custom Writables and WritableComparables
  • The Secondary Sort
  • Creating InputFormats and OutputFormats
  • Pipelining Jobs With Oozie
