What is Big Data?

Big data describes data whose volume, velocity, and/or variety has grown to the point where traditional systems find it too difficult or expensive to handle.

Volume: Data arriving from new sources, together with increased regulation in multiple areas, means storing larger data sets for longer periods of time.

Variety: Unstructured and semi-structured data is becoming as strategic as traditional structured data and is growing at a faster rate.

Velocity: Social media, RFID, machine data, and similar sources must be ingested at speeds that were not even imagined a few years ago.

Market for Big Data

The big data and analytics market will reach $125 billion worldwide in 2015, according to IDC.

The Big Data technology and services market represents a fast-growing multibillion-dollar worldwide opportunity. In fact, a recent IDC forecast shows that the Big Data technology and services market will grow at a 26.4% compound annual growth rate to $41.5 billion through 2018, or about six times the growth rate of the overall information technology market.
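To make the growth figures concrete, here is a small sketch of the compound annual growth rate (CAGR) arithmetic. The 26.4% rate, the $41.5 billion figure, and the "six times" multiple come from the forecast quoted above. The five-year horizon is only an assumption for illustration, because the forecast's base year is not stated here, and the function names are hypothetical.

```python
def value_before_growth(final_value, cagr, years):
    """Back out the starting value implied by a final value and a CAGR."""
    return final_value / (1 + cagr) ** years


def implied_baseline_rate(cagr, multiple):
    """If one growth rate is `multiple` times another, return the smaller rate."""
    return cagr / multiple


FINAL_MARKET_BN = 41.5  # from the IDC forecast quoted above
CAGR = 0.264            # from the IDC forecast quoted above
YEARS = 5               # ASSUMPTION: the horizon length is not given in the text

start = value_before_growth(FINAL_MARKET_BN, CAGR, YEARS)
it_rate = implied_baseline_rate(CAGR, 6)

print(f"Implied starting market size: ${start:.1f} billion")
print(f"Implied overall IT growth rate: {it_rate:.1%}")
```

Changing YEARS shows how sensitive the implied starting value is to the assumed horizon, which is why the base year matters when comparing forecasts.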

Big data revenue will reach $135 billion by the end of 2019.

Big Data, New Data Types 

Sentiment
Understand how your customers feel about your brand and products 

Clickstream
Capture and analyze website visitors’ data trails and optimize your website

Sensor/Machine
Discover patterns in data streaming automatically from remote sensors and machines

Geographic
Analyze location-based data to manage operations where they occur

Server Logs
Research logs to diagnose process failures and prevent security breaches

Unstructured (text, video, pictures, etc.)
Understand patterns in files across millions of web pages, emails, and documents

What is Hadoop?

Hadoop is a high-performance computing environment that scales horizontally on commodity hardware. It processes data in parallel across data nodes on a highly available distributed file system.
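The idea of parallel processing across data nodes can be sketched in plain Python. This is only an illustration of the split, map, and reduce pattern, not Hadoop's API: the sample blocks, the function names, and the thread pool standing in for data nodes are all assumptions made for the example.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical data: each string stands in for a block stored on a different data node.
blocks = [
    "big data big volume",
    "data velocity data variety",
    "big variety",
]


def map_block(block):
    """Map step: count the words in one block, independently of the others."""
    return Counter(block.split())


def reduce_counts(partials):
    """Reduce step: merge the per-block counts into a single result."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


# The thread pool stands in for worker nodes processing their blocks in parallel.
with ThreadPoolExecutor(max_workers=3) as pool:
    partial_counts = list(pool.map(map_block, blocks))

word_counts = reduce_counts(partial_counts)
print(word_counts.most_common(3))
```

Because each block is counted independently, adding more blocks (and more workers) does not change the logic, which is the property that lets this style of processing scale horizontally.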

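Compare Traditional Systems vs. Hadoop

Data Types: Traditional systems handle structured data; Hadoop handles multi-structured and unstructured data.

Speed: Traditional systems read fast; Hadoop writes fast.

Schema: Traditional systems require a schema on write; Hadoop requires it on read.

Best Used For: Traditional systems suit interactive OLAP analytics, complex ACID transactions, and operational data stores; Hadoop suits data discovery, processing unstructured data, and massive storage/processing.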
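The difference between schema on write and schema on read can be illustrated with a short sketch. It does not use any Hadoop or database API; the schema, the record fields, and all function names are hypothetical and exist only to show where validation happens in each approach.

```python
import json

SCHEMA = {"user_id": int, "page": str}  # hypothetical schema for illustration


def validate(record):
    """Raise ValueError unless the record matches SCHEMA exactly."""
    if set(record) != set(SCHEMA):
        raise ValueError(f"unexpected fields: {sorted(record)}")
    for field, expected_type in SCHEMA.items():
        if not isinstance(record[field], expected_type):
            raise ValueError(f"{field} must be {expected_type.__name__}")


# Schema on write: records are checked before they are stored.
table = []


def write_structured(record):
    validate(record)
    table.append(record)


# Schema on read: raw lines are stored as-is and interpreted only when queried.
raw_store = []


def write_raw(line):
    raw_store.append(line)


def read_with_schema(lines):
    """Apply the schema at query time, skipping lines that do not fit it."""
    rows = []
    for line in lines:
        try:
            record = json.loads(line)
            validate(record)
        except (ValueError, TypeError):
            continue
        rows.append(record)
    return rows


write_structured({"user_id": 1, "page": "/home"})
write_raw('{"user_id": 2, "page": "/cart"}')
write_raw("not json at all")                    # accepted at write time
write_raw('{"user_id": "x", "page": "/home"}')  # wrong type, also accepted

rows = read_with_schema(raw_store)
print(len(raw_store), "raw lines stored;", len(rows), "match the schema on read")
```

In this sketch the write path for raw data does no work at all, which mirrors the fast-write side of the comparison, while the cost of interpreting the data moves to read time.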

Hortonworks

Hadoop 2.6.0 is the fourth major release of 2014 in the hadoop-2.x line and brings a large number of enhancements to the core platform, in both HDFS and YARN. This release has nearly 900 resolved issues:

• Hadoop Common: 231 JIRAs resolved

• Hadoop HDFS: 305 JIRAs resolved

• Hadoop YARN: 290 JIRAs resolved

• Hadoop MapReduce: 70 JIRAs resolved

Some highlights:

Hadoop Common

• HADOOP-10433 – Key management server (beta)

• HADOOP-10607 – Credential provider (beta)

Hadoop HDFS

• Heterogeneous Storage Tiers – Phase 2

• HDFS-5682 – Application APIs for heterogeneous storage

• HDFS-7228 – SSD storage tier

• HDFS-5851 – Memory as a storage tier (beta)

• HDFS-6584 – Support for Archival Storage

• HDFS-6134 – Transparent data at rest encryption (beta)

• HDFS-2856 – Operating secure DataNode without requiring root access

• HDFS-6740 – Hot swap drive: support adding/removing DataNode volumes without restarting the DataNode (beta)

• HDFS-6606 – AES support for faster wire encryption

Hadoop YARN

• YARN-896 – Support for long running services in YARN

• YARN-913 – Service Registry for applications

• YARN-666 – Support for rolling upgrades

• YARN-556 – Work-preserving restarts of ResourceManager

• YARN-1336 – Container-preserving restart of NodeManager

• YARN-796 – Support node labels during scheduling

• YARN-1051 – Support for time-based resource reservations in Capacity Scheduler (beta)

• YARN-1492 – Global, shared cache for application artifacts (beta)

• YARN-1964 – Support running of applications natively in Docker containers (alpha)

Source: Hortonworks, IDC, Apache