Hadoop
Introduction to Big Data and Hadoop
- What is Big Data?
- What is Hadoop?
- Relationship between Big Data and Hadoop
- Why Adopt Hadoop?
- Scenarios for Adopting Hadoop in Real-World Projects
- Challenges with Big Data
- Storage
- Processing
- How Hadoop Addresses Big Data Challenges
- Comparison with Other Technologies
- Components of the Hadoop Ecosystem
- Storage Components
- Processing Components
- Importance of Hadoop Ecosystem Components
HDFS (Hadoop Distributed File System)
- What is a Cluster Environment?
- Cluster Vs Hadoop Cluster
- Significance of HDFS in Hadoop
- Features of HDFS
- Storage aspects of HDFS
- Block
- How to Configure the Block Size?
- Default Vs Configurable Block Size
- Why Is the HDFS Block Size So Large?
- Design Principles of Block Size
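A small Python sketch of how a file maps onto HDFS blocks helps explain why the block size is large: the NameNode keeps an in-memory entry for every block, and by default each block becomes one input split (one map task), so fewer, larger blocks mean less metadata and less scheduling overhead. The 128 MB figure matches the Hadoop 2.x default `dfs.blocksize`; the 4 KB comparison is purely illustrative.

```python
MB = 1024 * 1024

def hdfs_blocks(file_size, block_size):
    """Return the sizes (in bytes) of the blocks a file is stored in.

    Every block is `block_size` bytes except possibly the last one, which
    holds only the remaining bytes (HDFS does not pad the final block).
    """
    if file_size == 0:
        return []  # an empty file occupies no blocks
    full, rest = divmod(file_size, block_size)
    return [block_size] * full + ([rest] if rest else [])

one_gb = 1024 * MB
print(len(hdfs_blocks(one_gb, 128 * MB)))  # 8 block entries
print(len(hdfs_blocks(one_gb, 4 * 1024)))  # 262144 block entries
print([b // MB for b in hdfs_blocks(300 * MB, 128 * MB)])  # [128, 128, 44]
```

The cluster-wide default is set with `dfs.blocksize` in hdfs-site.xml; a short final block only consumes as much disk as the data it holds.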
Hadoop 1.x Architecture - 5 Daemons of Hadoop
- NameNode and its functionality
- DataNode and its functionality
- JobTracker and its functionality
- TaskTracker and its functionality
- Secondary Name Node and its functionality
Replication in Hadoop – Failover Mechanism
- Data Storage in Data Nodes
- Failover Mechanism in Hadoop – Replication
- Replication Configuration
- Custom Replication
- Design Constraints with Replication Factor
- Can we change the replication factor in Hadoop?
- Can we change the block size for a file or directory in Hadoop?
- Accessing HDFS
- CLI (Command Line Interface) and HDFS Commands
- Configuration Files in a Hadoop Installation and Their Purpose
- How and Where to Configure Hadoop Daemons in a Hadoop Cluster?
- Name Node HA (High Availability in Hadoop 2.x)
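The replication factor defaults to 3 (`dfs.replication` in hdfs-site.xml) and can be changed for existing files with `hdfs dfs -setrep`. The Python sketch below is a simplified model of where the three replicas of one block land under the default rack-aware rule of thumb; it is not the actual `BlockPlacementPolicyDefault` code, and the node and rack names are hypothetical.

```python
import random
from typing import Dict, List, Optional

def place_replicas(writer: str, topology: Dict[str, str], replication: int = 3,
                   rng: Optional[random.Random] = None) -> List[str]:
    """Choose DataNodes for one block (simplified default rule, up to 3 replicas):
      1st replica -> the writer's own node,
      2nd replica -> a node on a different rack,
      3rd replica -> a different node on the same rack as the 2nd.
    `topology` maps hypothetical node names to rack names. Raises IndexError on a
    single-rack cluster, where real HDFS would fall back to same-rack nodes.
    """
    if not 1 <= replication <= 3:
        raise ValueError("this sketch only models replication factors 1-3")
    rng = rng or random.Random()
    chosen = [writer]
    if replication >= 2:
        other_racks = [n for n, rack in topology.items() if rack != topology[writer]]
        chosen.append(rng.choice(other_racks))
    if replication == 3:
        second_rack = topology[chosen[1]]
        same_rack = [n for n, rack in topology.items()
                     if rack == second_rack and n not in chosen]
        chosen.append(rng.choice(same_rack))
    return chosen

def raw_storage(file_size: int, replication: int = 3) -> int:
    """Physical bytes used across the cluster: each block is stored `replication` times."""
    return file_size * replication

cluster = {"dn1": "rack1", "dn2": "rack1", "dn3": "rack2", "dn4": "rack2"}
print(place_replicas("dn1", cluster))  # ['dn1', 'dn3', 'dn4'] or ['dn1', 'dn4', 'dn3']
print(raw_storage(1_000_000_000))      # 3000000000
```

Spreading replicas over two racks survives the loss of a whole rack, while keeping two of them on one rack limits cross-rack write traffic; the price is that raw storage grows linearly with the replication factor.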
MapReduce
- Why Is Map Reduce Essential in Hadoop?
- Processing Daemons of Hadoop
- Job Tracker
- Roles of Job Tracker
- Impact of Job Tracker Failure on a Hadoop Cluster
- How to configure Job Tracker in Hadoop Cluster?
- Task Tracker
- Roles of Task Tracker
- Impact of Task Tracker Failure on a Hadoop Cluster
Input Split
- Input Split
- Need for Input Splits in Map Reduce
- Input Split Size
- Input Split Size Vs Block Size
- Input Splits Vs Number of Mappers
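The relationship between block size, split size and mapper count can be sketched in Python by mirroring the split-size formula of Hadoop's `FileInputFormat`, max(minSize, min(maxSize, blockSize)), with min/max taken from `mapreduce.input.fileinputformat.split.minsize` and `.split.maxsize`. This is a simplified model: it ignores non-splittable (for example gzip-compressed) files and the fact that the RecordReader, not the split, decides where records begin and end.

```python
MB = 1024 * 1024
SPLIT_SLOP = 1.1  # the last split may be up to 10% larger than the split size

def compute_split_size(block_size, min_size=1, max_size=2**63 - 1):
    # max(minSize, min(maxSize, blockSize)): by default, split size == block size
    return max(min_size, min(max_size, block_size))

def compute_splits(file_length, block_size, min_size=1, max_size=2**63 - 1):
    """Return (offset, length) pairs; normally one map task runs per split."""
    split_size = compute_split_size(block_size, min_size, max_size)
    splits, remaining = [], file_length
    while remaining / split_size > SPLIT_SLOP:
        splits.append((file_length - remaining, split_size))
        remaining -= split_size
    if remaining:
        splits.append((file_length - remaining, remaining))
    return splits

print(len(compute_splits(300 * MB, 128 * MB)))                    # 3 splits -> 3 mappers
print(compute_splits(135 * MB, 128 * MB) == [(0, 135 * MB)])      # True: within the 10% slop
print(len(compute_splits(300 * MB, 128 * MB, max_size=64 * MB)))  # 5 splits
```

Lowering the maximum split size below the block size increases the number of mappers; raising the minimum above it reduces them at the cost of reading some data from non-local blocks.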
Map Reduce Life Cycle
- Communication Mechanism of Job Tracker & Task Tracker
- Input Format Class
- Record Reader Class
- Success Case Scenarios
- Failure Case Scenarios
- Retry Mechanism in Map Reduce
- Map Reduce Programming Model
- Different phases of Map Reduce Algorithm
- Different Data types in Map Reduce
- Primitive Data Types Vs Map Reduce Data Types (Writables)
- How to write a basic Map Reduce Program?
- Driver Code
- Mapper Code
- Reducer Code
- Driver Code
- Importance of Driver Code in a Map Reduce program
- How to Identify the Driver Code in Map Reduce program?
- Different sections of Driver code
- Mapper Code
- Importance of Mapper Phase in Map Reduce
- How to Write a Mapper Class?
- Methods in Mapper Class
- Reducer Code
- Importance of Reduce phase in Map Reduce
- How to Write Reducer Class?
- Methods in Reducer Class
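The division of work between Driver, Mapper and Reducer can be illustrated with the classic word count. The Python sketch below simulates the phases in a single process (map, then shuffle and sort, then reduce); it is not the Hadoop Java API, but a Java job built from `Mapper.map()` and `Reducer.reduce()` follows the same shape, with the Driver configuring and submitting the job.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

def mapper(offset: int, line: str) -> Iterator[Tuple[str, int]]:
    # Like Mapper.map(): one input record in, zero or more (key, value) pairs out.
    # The byte-offset key (as produced by TextInputFormat) is not needed here.
    for word in line.split():
        yield word.lower(), 1

def reducer(word: str, counts: Iterable[int]) -> Iterator[Tuple[str, int]]:
    # Like Reducer.reduce(): one key together with all of its values.
    yield word, sum(counts)

def run_job(lines: List[str]) -> Dict[str, int]:
    """Plays the Driver's role: wires map -> shuffle/sort -> reduce together."""
    intermediate, offset = [], 0
    for line in lines:  # map phase over (byte offset, line) records
        intermediate.extend(mapper(offset, line))
        offset += len(line.encode("utf-8")) + 1  # +1 for the newline
    groups = defaultdict(list)  # shuffle: group all values by key
    for key, value in intermediate:
        groups[key].append(value)
    result = {}
    for key in sorted(groups):  # sort: reducers see keys in sorted order
        for out_key, out_value in reducer(key, groups[key]):
            result[out_key] = out_value
    return result

print(run_job(["the quick brown fox", "the lazy dog"]))
# {'brown': 1, 'dog': 1, 'fox': 1, 'lazy': 1, 'quick': 1, 'the': 2}
```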
IDENTITY MAPPER & IDENTITY REDUCER
Input Formats in Map Reduce
- TextInputFormat
- KeyValueTextInputFormat
- NLineInputFormat
- DBInputFormat
- SequenceFileInputFormat
- How to use the specific input format in Map Reduce?
- How to write Custom Input Format Class and Custom Record Reader
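In a Java driver the format is chosen with `job.setInputFormatClass(...)`; what changes is how the RecordReader turns bytes into (key, value) records. The Python sketch below models the records produced by three of the formats above: TextInputFormat (key = byte offset, value = line), KeyValueTextInputFormat (split at the first separator, a tab by default; a line without one becomes the key with an empty value) and NLineInputFormat (N lines per split, hence per mapper). The function names are hypothetical.

```python
from typing import Iterator, List, Tuple

def text_input_records(data: bytes) -> Iterator[Tuple[int, str]]:
    """TextInputFormat-style records: (byte offset of line, line without newline)."""
    offset = 0
    for raw in data.splitlines(keepends=True):
        yield offset, raw.rstrip(b"\r\n").decode("utf-8")
        offset += len(raw)

def key_value_records(data: bytes, sep: str = "\t") -> Iterator[Tuple[str, str]]:
    """KeyValueTextInputFormat-style records: split each line at the first separator."""
    for _, line in text_input_records(data):
        key, _, value = line.partition(sep)  # no separator -> (line, "")
        yield key, value

def nline_splits(data: bytes, n: int = 1) -> List[List[str]]:
    """NLineInputFormat-style splits: every n lines form one split (one mapper)."""
    lines = [line for _, line in text_input_records(data)]
    return [lines[i:i + n] for i in range(0, len(lines), n)]

sample = b"apple\t3\nbanana\t5\nno-tab-here\n"
print(list(text_input_records(sample)))  # [(0, 'apple\t3'), (8, 'banana\t5'), (17, 'no-tab-here')]
print(list(key_value_records(sample)))   # [('apple', '3'), ('banana', '5'), ('no-tab-here', '')]
print(nline_splits(sample, 2))           # [['apple\t3', 'banana\t5'], ['no-tab-here']]
```

A custom InputFormat follows the same idea: it decides how input is divided into splits and supplies a RecordReader that produces the (key, value) pairs for each split.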
Output Formats in Map Reduce
- TextOutputFormat
- MapFileOutputFormat
- NullOutputFormat
- DBOutputFormat
- SequenceFileOutputFormat
- How to use the specific Output format in Map Reduce?
- How to write Custom Output Format Class and Custom Record Writer
- Map Reduce API (Application Programming Interface)
- New API
- Deprecated API
- Combiner in Map Reduce
- Is a Combiner Mandatory in Map Reduce?
- How to use the combiner class in Map Reduce?
- Performance Trade-offs of Using a Combiner
- Real-World Use Cases
- Where to Use and Where Not to Use a Combiner
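A combiner (set in Java with `job.setCombinerClass(...)`) pre-aggregates map output on the map side to shrink the data shuffled to reducers. Because Hadoop may run it zero, one or several times, the operation must give the same result regardless; sums qualify, averages do not. The Python sketch below, using hypothetical sample data, shows both the saving and the classic pitfall together with the usual (sum, count) fix.

```python
from collections import Counter
from statistics import mean

def wordcount_combiner(pairs):
    """Map-side pre-aggregation for word count: sums (word, count) pairs locally."""
    totals = Counter()
    for word, count in pairs:
        totals[word] += count
    return list(totals.items())

# Output of two hypothetical map tasks.
map1 = [("a", 1), ("b", 1), ("a", 1), ("a", 1)]
map2 = [("a", 1), ("b", 1)]
shuffled_without = len(map1) + len(map2)
shuffled_with = len(wordcount_combiner(map1)) + len(wordcount_combiner(map2))
print(shuffled_without, shuffled_with)  # 6 4

# Where NOT to reuse the reducer as a combiner: averaging partial averages.
temps_map1, temps_map2 = [10, 20, 30], [40]
print(mean([mean(temps_map1), mean(temps_map2)]))  # 30 (wrong: the true mean is 25)

def sum_count_combiner(values):
    # Emit a partial (sum, count) instead of a partial mean.
    return (sum(values), len(values))

def average_reducer(partials):
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count

print(average_reducer([sum_count_combiner(temps_map1), sum_count_combiner(temps_map2)]))  # 25.0
```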
Apache PIG
- Introduction to Apache Pig
- Map Reduce Vs Apache Pig
- SQL Vs Apache Pig
- Different datatypes in Pig
- When to Use Map Reduce and When to Use PIG in Real-World Hadoop Projects
- Modes of Execution in Pig
- Local Mode
- Map Reduce (Distributed) Mode
- Execution Mechanism
- Grunt Shell
- Script
- Transformations in Pig
- How to Write a Simple Pig Script?
- How to Develop a Complex Pig Script?
- Bags, Tuples and fields in PIG
- UDFs in Pig
- Why Use UDFs in PIG?
- How to use UDFs
- REGISTER Keyword in PIG
HIVE
- Hive Introduction
- Need for Apache HIVE in Hadoop
- When to Choose PIG or HIVE in a Real-World Project
- Hive Architecture
- Driver
- Compiler (Semantic Analyzer)
- Executor
- Meta Store in Hive
- Importance of Hive Meta Store
- Embedded metastore configuration
- External metastore configuration
- Communication mechanism with Metastore
- Hive Integration with Hadoop
- Hive Query Language (Hive QL)
- SQL Vs Hive QL
- Data Slicing Mechanisms
- Partitions in Hive
- Buckets in Hive
- Partitioning Vs Bucketing
- Real-World Use Cases
- User Defined Functions (UDFs) in HIVE
- UDFs
- UDAFs
- UDTFs
- Need for UDFs in HIVE
- HIVE – HBASE Integration
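Partitioning and bucketing slice data in different ways: a partition is an HDFS subdirectory per column value, while bucketing hashes a column into a fixed number of files. The Python sketch below assumes a table named `sales` (hypothetical) in the `default` database under the default warehouse directory (`hive.metastore.warehouse.dir` = `/user/hive/warehouse`), and the original (version 1) bucketing hash for an INT column, where the value hashes to itself. String columns and the Murmur3-based bucketing version 2 used by Hive 3 hash differently and are not modelled, nor is Hive's escaping of special characters in partition values.

```python
from typing import Dict

WAREHOUSE = "/user/hive/warehouse"  # default hive.metastore.warehouse.dir

def partition_path(table: str, partition: Dict[str, str]) -> str:
    """HDFS directory holding one partition, e.g. .../sales/country=IN/year=2023."""
    parts = "/".join(f"{col}={val}" for col, val in partition.items())
    return f"{WAREHOUSE}/{table}/{parts}"

def int_bucket(value: int, num_buckets: int) -> int:
    """Bucket for an INT clustering column (classic bucketing):
    (hash & Integer.MAX_VALUE) % num_buckets, where an INT hashes to itself."""
    return (value & 0x7FFFFFFF) % num_buckets

print(partition_path("sales", {"country": "IN", "year": "2023"}))
# /user/hive/warehouse/sales/country=IN/year=2023
print([int_bucket(v, 4) for v in (1, 2, 10, 13)])  # [1, 2, 2, 1]
```

A query filtering on `country` and `year` only has to read the matching directories (partition pruning), while bucketing keeps the file count fixed regardless of how many distinct values the column has, which suits high-cardinality columns such as IDs.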
SQOOP
- Introduction to Sqoop
- MySQL client and Server Installation
- How to connect to Relational Database using Sqoop
- Different Sqoop Commands
- Different flavors of Imports
- Export
- Hive Imports
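Sqoop parallelises an import by querying `MIN` and `MAX` of the `--split-by` column (the primary key by default) and giving each of the `--num-mappers` tasks (4 by default) its own slice of that range. The Python sketch below divides the range into even inclusive slices; it is a simplified illustration of the idea, and Sqoop's own splitter may pick slightly different boundaries. The column name `id` is hypothetical.

```python
from typing import List, Tuple

def split_ranges(min_val: int, max_val: int, num_mappers: int = 4) -> List[Tuple[int, int]]:
    """Divide [min_val, max_val] of an integer split-by column into roughly equal,
    non-overlapping inclusive ranges, one per mapper (simplified model)."""
    if max_val < min_val:
        return []
    total = max_val - min_val + 1
    mappers = min(num_mappers, total)  # never more ranges than distinct values
    size, extra = divmod(total, mappers)
    ranges, lo = [], min_val
    for i in range(mappers):
        hi = lo + size - 1 + (1 if i < extra else 0)  # spread the remainder
        ranges.append((lo, hi))
        lo = hi + 1
    return ranges

for lo, hi in split_ranges(1, 1000):
    print(f"mapper reads rows WHERE id >= {lo} AND id <= {hi}")
```

The slices are equal in value range, not in row count, so a skewed split-by column (for example IDs clustered at one end) leaves some mappers with far more rows than others; choosing an evenly distributed column matters.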
HBase
- HBase Introduction
- HDFS Vs HBase
- HBase Vs RDBMS
- HBase Vs Other NoSQL Databases
- HBase Use Cases
- HBase Data Modeling Elements
- Column Families
- Column Qualifiers
- Row Key
- HBase Architecture
- Clients
- REST
- Thrift
- Java Based
- Avro
- Map Reduce Integration
- Map Reduce over HBase
- HBase Admin
- Schema Definition
- Basic CRUD Operations
- Client-Side Buffering in HBase
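HBase's logical data model is a sorted, nested map: row key, then column family, then column qualifier, then timestamp, then value. The toy Python class below (hypothetical, in-memory, not the HBase client API) illustrates the modeling elements and CRUD semantics: column families are fixed when the table is defined, qualifiers are free-form, reads return the newest version by default, and scans return rows in row-key byte order.

```python
import time
from typing import Dict, Iterable, Iterator, Optional, Tuple

class ToyHTable:
    """In-memory model: row key -> family -> qualifier -> timestamp -> value."""

    def __init__(self, families: Iterable[str]):
        self.families = set(families)  # declared up front, like the table schema
        self.rows: Dict[bytes, Dict[str, Dict[str, Dict[int, str]]]] = {}

    def put(self, row: bytes, family: str, qualifier: str, value: str,
            ts: Optional[int] = None) -> None:
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        ts = ts if ts is not None else time.time_ns()
        cell = self.rows.setdefault(row, {}).setdefault(family, {}).setdefault(qualifier, {})
        cell[ts] = value  # a new version; older versions are kept

    def get(self, row: bytes, family: str, qualifier: str) -> Optional[str]:
        versions = self.rows.get(row, {}).get(family, {}).get(qualifier, {})
        return versions[max(versions)] if versions else None  # newest version wins

    def delete(self, row: bytes) -> None:
        self.rows.pop(row, None)

    def scan(self, start: bytes = b"", stop: Optional[bytes] = None) -> Iterator[Tuple[bytes, dict]]:
        for row in sorted(self.rows):  # rows are ordered by row key bytes
            if row >= start and (stop is None or row < stop):
                yield row, self.rows[row]

users = ToyHTable(["info"])
users.put(b"user#001", "info", "name", "Ann", ts=1)
users.put(b"user#001", "info", "name", "Anna", ts=2)
users.put(b"user#002", "info", "email", "bob@example.com", ts=1)
print(users.get(b"user#001", "info", "name"))  # Anna
print([row for row, _ in users.scan()])        # [b'user#001', b'user#002']
```

Because scans follow row-key order, row-key design (for example a `user#` prefix, as assumed here) determines which access patterns are cheap.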
Hadoop Administration
Hadoop Single Node Cluster Setup (Hands-on Installation on Laptops)
- Operating System Installation
- JDK Installation
- SSH Configuration
- Dedicated Group & User Creation
- Hadoop Installation
- Configuration File Settings
- Name Node Format
- Starting the Hadoop Daemons
Multi-Node Hadoop Cluster Setup (Hands-on Installation on Laptops)
- Network-Related Settings
- Hosts Configuration
- Passwordless SSH Communication
- Hadoop Installation
- Configuration File Settings
- Name Node Format
- Starting the Hadoop Daemons
PIG Installation (Hands-on Installation on Laptops)
- Local Mode
- Clustered Mode
- .bashrc File Configuration
SQOOP Installation (Hands-on Installation on Laptops)
- Sqoop Installation with MySQL Client
HIVE Installation (Hands-on Installation on Laptops)
- Local Mode
- Clustered Mode