Hadoop programming

Become an expert in Hadoop programming and master the elements of cluster monitoring, governance, security, and troubleshooting.

Learn Apache Hadoop from a leading Hadoop expert. Sixty-five percent of the current Fortune 100 use big data to drive their business.

You can too, by getting expert training through the Caliber integrated course: the industry's only truly dynamic Hadoop training curriculum, updated regularly to reflect the state of the art in the big data domain.

Course Contents

  • Introduction to Big Data
    • Rise of Big Data
    • Compare Hadoop vs traditional systems
    • Hadoop Master-Slave Architecture
    • Understanding HDFS Architecture
    • NameNode, DataNode, Secondary NameNode
    • Learn about JobTracker, TaskTracker
  • HDFS and MapReduce Architecture
    • Core components of Hadoop
    • Understanding Hadoop Master-Slave Architecture
    • Learn about NameNode, DataNode, Secondary NameNode
    • Understanding HDFS Architecture
    • Anatomy of Read and Write data on HDFS
    • MapReduce Architecture Flow
    • JobTracker and TaskTracker
  • Hadoop Configuration
    • Hadoop Modes
    • Hadoop Terminal Commands
    • Cluster Configuration
    • Web Ports
    • Hadoop Configuration Files
    • Reporting, Recovery
    • MapReduce in Action
  • Understanding Hadoop MapReduce Framework
    • Overview of the MapReduce Framework
    • Use cases of MapReduce
    • MapReduce Architecture
    • Anatomy of MapReduce Program
    • Mapper/Reducer Class, Driver code
    • Understand Combiner and Partitioner
  • Advanced MapReduce - Part 1
    • Write your own Partitioner
    • Writing Map and Reduce in Python
    • Map side/Reduce side Join
    • Distributed Join
    • Distributed Cache
    • Counters
    • Joining Multiple datasets in MapReduce
  • Advanced MapReduce - Part 2
    • MapReduce internals
    • Understanding Input Format
    • Custom Input Format
    • Using Writable and Comparable
    • Understanding Output Format
    • Sequence Files
    • JUnit and MRUnit Testing Frameworks
  • Apache Pig
    • Pig vs MapReduce
    • Pig Architecture & Data types
    • Pig Latin Relational Operators
    • Pig Latin Join and COGROUP
    • Pig Latin Group and Union
    • Describe, Explain, Illustrate
    • Pig Latin: File Loaders & UDF
  • Apache Hive and HiveQL
    • What is Hive
    • Hive DDL - Create/Show Database
    • Hive DDL - Create/Show/Drop Tables
    • Hive DML - Load Files & Insert Data
    • Hive SQL - Select, Filter, Join, Group By
    • Hive Architecture & Components
    • Difference between Hive and RDBMS
  • Advanced HiveQL
    • Multi-Table Inserts
    • Joins
    • Grouping Sets, Cubes, Rollups
    • Custom Map and Reduce scripts
    • Hive SerDe
    • Hive UDF
    • Hive UDAF
  • Apache Flume, Sqoop, Oozie
    • Sqoop - How Sqoop works
    • Sqoop Architecture
    • Flume - How it works
    • Flume Complex Flow - Multiplexing
    • Oozie - Simple/Complex Flow
    • Oozie Service/Scheduler
    • Use Cases - Time and Data triggers
  • NoSQL Databases
    • CAP theorem
    • RDBMS vs NoSQL
    • Key-Value stores: Memcached, Riak
    • Key-Value stores: Redis, DynamoDB
    • Column Family: Cassandra, HBase
    • Graph Store: Neo4j
    • Document Store: MongoDB, CouchDB
  • Apache HBase
    • When/Why to use HBase
    • HBase Architecture/Storage
    • HBase Data Model
    • HBase Families/Column Families
    • HBase Master
    • HBase vs RDBMS
    • Access HBase Data
  • Apache ZooKeeper
    • What is ZooKeeper
    • ZooKeeper Data Model
    • ZNode Types
    • Sequential ZNodes
    • Installing and Configuring
    • Running ZooKeeper
    • ZooKeeper use cases
  • Hadoop 2.0, YARN, MRv2
    • Hadoop 1.0 Limitations
    • MapReduce Limitations
    • HDFS 2: Architecture
    • HDFS 2: High availability
    • HDFS 2: Federation
    • YARN Architecture
    • Classic vs YARN
    • YARN multitenancy
    • YARN Capacity Scheduler
  • Project
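
As a taste of the "Writing Map and Reduce in Python" topic listed above, here is a minimal word-count sketch. It is an illustration only, not course material: the names `mapper`, `reducer`, and `run_local` are hypothetical, and the shuffle step is simulated with an in-memory sort in a single process rather than on a Hadoop cluster.

```python
from itertools import groupby
from operator import itemgetter


def mapper(lines):
    """Map step: emit a (word, 1) pair for every word in the input lines."""
    for line in lines:
        for word in line.strip().lower().split():
            yield word, 1


def reducer(pairs):
    """Reduce step: sum the counts for each word.

    Assumes the pairs arrive sorted by key, which is what the
    shuffle/sort phase provides between map and reduce.
    """
    for word, group in groupby(pairs, key=itemgetter(0)):
        yield word, sum(count for _, count in group)


def run_local(lines):
    """Simulate map -> shuffle/sort -> reduce in one process (no Hadoop needed)."""
    shuffled = sorted(mapper(lines), key=itemgetter(0))
    return dict(reducer(shuffled))


if __name__ == "__main__":
    sample = ["big data big cluster", "data node"]
    print(run_local(sample))
```

On a real cluster the map and reduce steps would typically run as separate scripts reading standard input and writing standard output, with the framework handling the sort between them; the single-process version here only shows the data flow.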