Hadoop Training

Hadoop was initially inspired by papers published by Google outlining its approach to handling an avalanche of data. It has since become the de facto standard for storing, processing and analyzing hundreds of terabytes, and even petabytes, of data.

Course Information

Hadoop Course Duration: 30 Hours

Hadoop Training Timings: Weekdays: 1-2 Hours per day (or) Weekends: 2-3 Hours per day

Hadoop Training Method: Online/Classroom Training

Hadoop Study Material: Soft Copy

Course Content

The Motivation For Hadoop 

o Problems with traditional large-scale systems

o Requirements for a new approach

Hadoop: Basic Concepts 

o What is Hadoop?

o The Hadoop Distributed File System

o How MapReduce Works

o Anatomy of a Hadoop Cluster

Joining Data Sets in MapReduce Jobs 

o Map-Side Joins

o Reduce-Side Joins

Programming Practices & Performance Tuning

o Developing MapReduce Programs

    Local Mode

    Pseudo-distributed Mode

o Monitoring and debugging on a Production Cluster

    Skipping Bad Records

    Rerunning failed tasks with Isolation Runner

o Tuning for Performance

    Reducing network traffic with a combiner

    Reducing the amount of input data

    Using Compression

    Reusing the JVM

    Running with speculative execution

o Refactoring code and rewriting algorithms

o Parameters affecting Performance

o Other Performance Aspects

Hadoop with Analytics using R

o Introduction to Big Data analytics

o Applying statistics to big data using R

o Introduction to R

o Using R to create an API that interacts with Hadoop ecosystem components

o Integration of Java, R, Hadoop, Hive, etc.

Graph Manipulation in Hadoop 

o Introduction to graph techniques

o Representing Graphs in Hadoop

o Implementing a sample algorithm: Single Source Shortest Path

Writing a MapReduce Program 

o Examining a Sample MapReduce Program

o Basic API Concepts

o The Driver Code

o Anatomy of File Read and Write

o Basic Record Reader Anatomy

o Input and Output Format classes

o The Mapper

o The Reducer

o Hadoop's Streaming API

Integrating Hadoop Into The Workflow 

o Relational Database Management Systems

o Storage Systems

o Importing Data from RDBMSs With Sqoop

o Importing Real-Time Data with Flume

Delving Deeper Into The Hadoop API 

o Using Combiners

o The configure and close Methods

o Custom RecordReader

o Custom Input and Output Class

o Directly Accessing HDFS

o Using The Distributed Cache

Common MapReduce Algorithms 

o Sorting and Searching

o Classification/Machine Learning

o Term Frequency - Inverse Document Frequency

o Word Co-Occurrence

Using Hive and Pig 

o Hive Basics

o Pig Basics

Debugging MapReduce Programs 

o Testing with MRUnit

o Other Debugging Strategies

Advanced MapReduce Programming 

o A Recap of the MapReduce Flow

o Custom Writables and WritableComparables

o The Secondary Sort

o Creating InputFormats and OutputFormats

o Pipelining Jobs With Oozie

Key Features

Instructor-Led Hadoop Online Training

Flexible Time At Your Convenience

Over 100,000 Professionals Trained Across 100 Countries

24x7 Live Support via Chat, Mail and Phone

Corporate Training and On-Job Support
