What is Big Data?
Big data refers to data whose volume, velocity, and/or variety has grown to the point where it is too difficult or expensive for traditional systems to handle.
Volume: Data arriving from new sources, together with increased regulation in many areas, means storing larger data sets for longer periods of time.
Variety: Unstructured and semi-structured data is becoming as strategic as traditional structured data and is growing at a faster rate.
Velocity: Data from social media, RFID, machines, and similar sources must be ingested at speeds not even imagined a few years ago.
Market for Big Data
The big data and analytics market will reach $125 billion worldwide in 2015, according to IDC.
The Big Data technology and services market represents a fast-growing multibillion-dollar worldwide opportunity. In fact, a recent IDC forecast shows that the Big Data technology and services market will grow at a 26.4% compound annual growth rate to $41.5 billion through 2018, or about six times the growth rate of the overall information technology market.
Big data revenue will reach $135 billion by the end of 2019.
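The arithmetic behind the IDC growth forecast above can be sketched in a few lines of Python. This is only an illustration of how a compound annual growth rate works: the 26.4% rate and the $41.5 billion target come from the forecast quoted here, while the five-year window (ASSUMED_YEARS) is an assumption, since the text does not state the base year, and the function names are made up for this example.

```python
def project_value(start_value, annual_rate, years):
    """Value after compounding `annual_rate` once per year for `years` years."""
    return start_value * (1 + annual_rate) ** years


def implied_start_value(end_value, annual_rate, years):
    """Invert the compounding formula to recover the starting value."""
    return end_value / (1 + annual_rate) ** years


BIG_DATA_CAGR = 0.264   # from the IDC forecast quoted in the text
TARGET_2018 = 41.5      # billions of USD, from the same forecast
ASSUMED_YEARS = 5       # assumption: the forecast window is not given in the text

# "About six times" the overall IT growth rate implies roughly this rate for IT overall.
implied_it_growth = BIG_DATA_CAGR / 6

# Market size at the start of the assumed window that would compound to the target.
start = implied_start_value(TARGET_2018, BIG_DATA_CAGR, ASSUMED_YEARS)

print(f"Implied overall IT growth rate: {implied_it_growth:.1%}")
print(f"Implied starting market size over {ASSUMED_YEARS} years: ${start:.1f}B")
```

Changing ASSUMED_YEARS shows how sensitive the implied starting size is to the length of the forecast window, which is why the base year matters when comparing forecasts.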
Big Data, New Data Types
Sentiment
Understand how your customers feel about your brand and products
Clickstream
Capture and analyze website visitors’ data trails and optimize your website
Sensor/Machine
Discover patterns in data streaming automatically from remote sensors and machines
Geographic
Analyze location-based data to manage operations where they occur
Server Logs
Research logs to diagnose process failures and prevent security breaches
Unstructured (text, video, pictures, etc.)
Understand patterns in files across millions of web pages, emails, and documents
What is Hadoop?
Hadoop is an open-source, high-performance computing environment that scales horizontally on commodity hardware. It processes data in parallel across data nodes, on top of a highly available distributed file system.
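The idea of parallel processing across data nodes can be sketched with a small word count written in the map, shuffle, and reduce style. This is a single-machine simulation in plain Python, not the Hadoop API: each string stands in for a block of a file stored on a different data node, and a thread pool stands in for the nodes working on their own blocks at the same time. All function names here are hypothetical.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def map_block(block):
    """Map step: emit a (word, 1) pair for every word in one block of text."""
    return [(word.lower(), 1) for word in block.split()]


def shuffle(mapped_blocks):
    """Shuffle step: group all emitted values by key across every block."""
    grouped = defaultdict(list)
    for pairs in mapped_blocks:
        for word, count in pairs:
            grouped[word].append(count)
    return grouped


def reduce_counts(grouped):
    """Reduce step: sum the values collected for each key."""
    return {word: sum(counts) for word, counts in grouped.items()}


def word_count(blocks, workers=3):
    # Each block stands in for a file block held on a separate data node;
    # the thread pool stands in for those nodes mapping their blocks in parallel.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = list(pool.map(map_block, blocks))
    return reduce_counts(shuffle(mapped))


blocks = [
    "big data needs storage",
    "hadoop stores big data",
    "hadoop processes data in parallel",
]
counts = word_count(blocks)
print(counts["data"], counts["hadoop"], counts["big"])
```

The point of the pattern is that the map step needs no coordination between blocks, so adding more nodes (here, more workers) lets more blocks be processed at once.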
Traditional Systems vs. Hadoop
Data Types: Structured (traditional) vs. multi-structured and unstructured (Hadoop)
Speed: Fast reads (traditional) vs. fast writes (Hadoop)
Schema: Required on write (traditional) vs. required on read (Hadoop)
Best Use: Interactive OLAP analytics, complex ACID transactions, and operational data stores (traditional) vs. data discovery, processing unstructured data, and massive storage/processing (Hadoop)
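The schema row of the comparison above is easier to see with a small example. The sketch below is plain Python with made-up class names and a hypothetical two-field schema; it does not use any database or Hadoop API. The first store validates each record when it is written, the way a traditional system does, and the second keeps raw lines untouched and applies a schema only when the data is read.

```python
import json

SCHEMA = {"user_id": int, "page": str}  # hypothetical schema for illustration


def conforms(record, schema):
    """True if the record has exactly the schema's fields with the right types."""
    return (
        set(record) == set(schema)
        and all(isinstance(record[k], t) for k, t in schema.items())
    )


class SchemaOnWriteStore:
    """Validates at load time, so bad records are rejected before storage."""

    def __init__(self, schema):
        self.schema = schema
        self.rows = []

    def write(self, record):
        if not conforms(record, self.schema):
            raise ValueError(f"record does not match schema: {record}")
        self.rows.append(record)

    def read(self):
        return list(self.rows)


class SchemaOnReadStore:
    """Stores raw lines as-is and applies a schema only when queried."""

    def __init__(self):
        self.raw_lines = []

    def write(self, raw_line):
        self.raw_lines.append(raw_line)  # no validation, so writes stay cheap

    def read(self, schema):
        rows = []
        for line in self.raw_lines:
            try:
                record = json.loads(line)
            except json.JSONDecodeError:
                continue  # skip lines that are not valid JSON
            if isinstance(record, dict):
                # Keep only the fields the caller's schema asks for.
                projected = {k: record.get(k) for k in schema}
                if conforms(projected, schema):
                    rows.append(projected)
        return rows


lake = SchemaOnReadStore()
lake.write('{"user_id": 1, "page": "/home", "referrer": "ad"}')
lake.write("not json at all")
lake.write('{"user_id": "x", "page": "/cart"}')
rows = lake.read(SCHEMA)

warehouse = SchemaOnWriteStore(SCHEMA)
warehouse.write({"user_id": 2, "page": "/home"})
print(rows, warehouse.read())
```

The trade-off matches the speed row as well: the schema-on-write store pays the validation cost up front so that reads are simple, while the schema-on-read store accepts anything quickly and does the interpretation work at query time.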
Hortonworks
Hadoop 2.6.0 is the fourth major release of 2014 in the hadoop-2.x line and brings a large number of enhancements to the core platform, in both HDFS and YARN. This release has nearly 900 resolved issues:
• Hadoop Common: 231 JIRAs resolved
• Hadoop HDFS: 305 JIRAs resolved
• Hadoop YARN: 290 JIRAs resolved
• Hadoop MapReduce: 70 JIRAs resolved
Some highlights:
• Hadoop Common
• HADOOP-10433 – Key management server (beta)
• HADOOP-10607 – Credential provider (beta)
• Hadoop HDFS
• Heterogeneous Storage Tiers – Phase 2
• HDFS-5682 – Application APIs for heterogeneous storage
• HDFS-7228 – SSD storage tier
• HDFS-5851 – Memory as a storage tier (beta)
• HDFS-6584 – Support for Archival Storage
• HDFS-6134 – Transparent data at rest encryption (beta)
• HDFS-2856 – Operating secure DataNode without requiring root access
• HDFS-6740 – Hot swap drive: support add/remove data node volumes without restarting data node (beta)
• HDFS-6606 – AES support for faster wire encryption
• Hadoop YARN
• YARN-896 – Support for long running services in YARN
• YARN-913 – Service Registry for applications
• YARN-666 – Support for rolling upgrades
• YARN-556 – Work-preserving restarts of ResourceManager
• YARN-1336 – Container-preserving restart of NodeManager
• YARN-796 – Support node labels during scheduling
• YARN-1051 – Support for time-based resource reservations in Capacity Scheduler (beta)
• YARN-1492 – Global, shared cache for application artifacts (beta)
• YARN-1964 – Support running of applications natively in Docker containers (alpha)
Source: Hortonworks, IDC, Apache