BDA - Big Data And Analytics Capsule/Notes


Big data analytics notes/capsule.

Topics covered:

  1. Top Amazing Facts.
  2. Three Characteristics of Big Data.
  3. Big Data Sources.
  4. What is Hadoop?
  5. What is Hadoop used for?
  6. Which companies are using Hadoop?
  7. What is HDFS?
    1. HDFS Architecture.
  8. What is Pig?
  9. What is Hive?
  10. What is MapReduce?
  11. Why Sqoop?
  12. What is Sqoop?

1. Top Amazing Facts

  • Over 90% of all the data in the world was created within the past 2 years.
  • Every minute we send more than 204 million emails, generate more than 1.8 million Facebook likes, send more than 278 thousand tweets, and upload more than 200,000 photos to Facebook.
  • Google alone processes, on average, more than 40 thousand search queries per second.
  • Big data could be the next big thing in the IT world.
  • The first organisations to embrace it were online startups. Firms like Google, eBay, LinkedIn, and Facebook were built around big data from the start.
  • Walmart handles more than 1 million customer transactions every hour.

2. Three Characteristics of Big Data.

  • Volume
    • Data quantity
  • Velocity
    • Data speed
  • Variety
    • Data types

3. Big Data Sources.

  • Users
  • Applications
  • Systems
  • Sensors
  • More specifically: mobile devices, microphones, readers/scanners, science facilities, programs/software, social media, and cameras.

4. What is Hadoop?

  • Hadoop is a flexible, highly available architecture for large-scale computation and data processing on a network of commodity hardware.

5. What is Hadoop used for?

  • Searching,
  • Log processing,
  • Recommendation systems,
  • Analytics,
  • Video and image analysis,
  • Data retention.

6. Which companies are using Hadoop?

  • Amazon/A9, Facebook, Google, IBM, Blue Cloud, Joost, New York Times, PowerSet, Veoh, Yahoo.

7. What is HDFS?

  • A distributed file system that runs on large clusters of commodity machines.

7.1. HDFS Architecture (official image released by Apache Hadoop)

[Figure: HDFS architecture. Source: Apache Hadoop.]

8. What is Pig?

  • Pig is a data flow language (DFL) and execution environment for exploring very large datasets.
  • Pig runs on HDFS and MapReduce clusters.

9. What is Hive?

  • A distributed data warehouse. Hive manages the data stored in HDFS (Hadoop Distributed File System) and provides a SQL-based query language for querying that data. The runtime engine translates each query into MapReduce jobs.

10. What is MapReduce?

  • MapReduce is a programming model for data processing.
  • It was first introduced at Google.
  • MapReduce works by breaking the processing into two phases: the map phase and the reduce phase. Each phase takes key-value pairs as input and produces key-value pairs as output, and the programmer chooses their types. The programmer also specifies two functions: the map function and the reduce function.
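
The two phases described above can be sketched in plain Python. This is only an illustration of the programming model, not Hadoop's API: the names `map_fn`, `reduce_fn`, and `run_job` are made up for this example, and the grouping step in between stands in for the work a real cluster does when it routes all values for one key to the same reducer.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_fn(_offset: int, line: str) -> Iterator[Tuple[str, int]]:
    """Map phase: input pair (line offset, line text) -> (word, 1) pairs."""
    for word in line.lower().split():
        yield word, 1


def reduce_fn(word: str, counts: Iterable[int]) -> Tuple[str, int]:
    """Reduce phase: input pair (word, all its counts) -> (word, total)."""
    return word, sum(counts)


def run_job(lines: List[str]) -> Dict[str, int]:
    """Run map, group the intermediate pairs by key, then run reduce."""
    grouped = defaultdict(list)
    for offset, line in enumerate(lines):
        for key, value in map_fn(offset, line):
            grouped[key].append(value)
    # Keys are sorted here only to make the output order predictable.
    return dict(reduce_fn(key, values) for key, values in sorted(grouped.items()))


result = run_job(["big data is big", "data is everywhere"])
print(result)  # {'big': 2, 'data': 2, 'everywhere': 1, 'is': 2}
```

Here the programmer's choices are the key and value types (a word and an integer count) and the bodies of the two functions. Everything in `run_job` is what the framework would normally handle.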

11. Why Sqoop?

  • SQL servers are already widely deployed worldwide.
  • Nightly processing has been done on SQL servers for years.
  • As Hadoop made its way into the enterprise, there was a need to move certain parts of the data from traditional SQL databases (RDBs) to Hadoop.
  • Transferring data using scripts is inefficient and time-consuming.
  • Traditional databases already have reporting, data visualization, and other applications built around them in the enterprise.
  • Processed data from Hadoop needs to be brought back to those applications.

12. What is Sqoop?

  • Sqoop is a tool designed to transfer data between Hadoop and relational databases.
  • You can use it to import data from a relational database (RDB) such as MySQL or Oracle into the Hadoop Distributed File System (HDFS).
  • You can then transform the data in Hadoop with MapReduce or Hive.
  • Finally, you can export the data back into an RDB.
  • Sqoop (SQL-to-Hadoop) is a big data tool that can extract data from non-Hadoop data stores, transform the data into a form usable by Hadoop, and then load the data into HDFS. This process is called ETL, for Extract, Transform, and Load.
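
The Extract, Transform, and Load steps can be sketched in a few lines of Python. This is not Sqoop and does not use Sqoop's interface. As assumptions for the example, an in-memory SQLite database stands in for the relational source, a CSV string stands in for a file written to HDFS, and the `orders` table and the function names are hypothetical.

```python
import csv
import io
import sqlite3


def extract(conn: sqlite3.Connection) -> list:
    """Extract: read rows out of the relational source."""
    return conn.execute("SELECT id, name, amount FROM orders ORDER BY id").fetchall()


def transform(rows: list) -> list:
    """Transform: normalise names and round amounts before loading."""
    return [(row_id, name.strip().upper(), round(amount, 2))
            for row_id, name, amount in rows]


def load(rows: list) -> str:
    """Load: write delimited text, the stand-in for a file in HDFS."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


# Hypothetical source table, created only so the sketch is self-contained.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, name TEXT, amount REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                 [(1, " alice ", 10.5), (2, "bob", 20.0)])

hdfs_file = load(transform(extract(conn)))
print(hdfs_file)
```

Writing and maintaining scripts like this one for every table is the inefficiency that motivates a dedicated transfer tool.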
***Thanks for reading***

Source: O'Reilly Hadoop