Apache Spark, Resilient Distributed Dataset (RDD)

Apache Spark is a fast, general engine for large-scale data processing on a cluster.

Advantages of Spark:
- High-level programming framework
- Write applications quickly in Scala, Python, or Java
- Cluster computing
- Combine SQL, streaming, and complex analytics
- Distributed storage
- Data in memory
- Easier development
- Near-real-time processing
- In-memory data storage

We can use …
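The RDD idea in the title can be illustrated with a toy model. What follows is a hypothetical, single-process sketch in plain Python, not Spark's API. `ToyRDD`, `parallelize`, `lose_partition` and the other names are invented for illustration. It models three properties commonly attributed to RDDs: the data is split into partitions, transformations such as `map` are recorded lazily and only run when an action such as `collect` is called, and each dataset keeps its lineage, so a lost in-memory partition is recomputed from its parent instead of being replicated.

```python
from typing import Callable, Dict, Iterable, List, Optional


class ToyRDD:
    """Hypothetical single-process model of an RDD (not Spark's API).

    Data lives in partitions, transformations are recorded lazily, and each
    ToyRDD keeps a function that can rebuild any partition from its parent
    (its lineage), so a lost cached partition is recomputed, not replicated.
    """

    def __init__(self, compute: Callable[[int], List], num_partitions: int,
                 lineage: str) -> None:
        self._compute = compute                  # rebuilds partition i from the parent
        self.num_partitions = num_partitions
        self.lineage = lineage                   # readable record of the transformations
        self._cache: Optional[Dict[int, List]] = None  # set by cache()

    @staticmethod
    def parallelize(data: Iterable, num_partitions: int = 2) -> "ToyRDD":
        items = list(data)
        size = -(-len(items) // num_partitions)  # ceiling division: items per partition

        def compute(i: int) -> List:
            return items[i * size:(i + 1) * size]  # contiguous slice for partition i

        return ToyRDD(compute, num_partitions, "parallelize")

    # --- transformations: lazy, they only build a new ToyRDD ---
    def map(self, f: Callable) -> "ToyRDD":
        return ToyRDD(lambda i: [f(x) for x in self._partition(i)],
                      self.num_partitions, self.lineage + " -> map")

    def filter(self, pred: Callable) -> "ToyRDD":
        return ToyRDD(lambda i: [x for x in self._partition(i) if pred(x)],
                      self.num_partitions, self.lineage + " -> filter")

    def cache(self) -> "ToyRDD":
        """Keep computed partitions in memory the first time they are built."""
        if self._cache is None:
            self._cache = {}
        return self

    def lose_partition(self, i: int) -> None:
        """Simulate losing one cached partition (e.g. a failed worker)."""
        if self._cache is not None:
            self._cache.pop(i, None)

    def _partition(self, i: int) -> List:
        if self._cache is None:
            return self._compute(i)            # not cached: recompute every time
        if i not in self._cache:
            self._cache[i] = self._compute(i)  # first use or lost: rebuild via lineage
        return self._cache[i]

    # --- actions: these actually trigger the computation ---
    def collect(self) -> List:
        result: List = []
        for i in range(self.num_partitions):
            result.extend(self._partition(i))
        return result

    def count(self) -> int:
        return len(self.collect())


if __name__ == "__main__":
    seen: List[int] = []

    def square(x: int) -> int:
        seen.append(x)  # record every time the function actually runs
        return x * x

    squares = ToyRDD.parallelize(range(6), num_partitions=3).map(square).cache()
    print(len(seen))          # 0: nothing has run yet
    print(squares.collect())  # [0, 1, 4, 9, 16, 25]
    squares.collect()         # served from the in-memory cache
    squares.lose_partition(1)
    print(squares.collect())  # partition 1 rebuilt from lineage
    print(len(seen))          # 8: 6 initial calls + 2 to rebuild partition 1
    print(squares.lineage)    # parallelize -> map
```

The sketch trades distribution for readability: every "partition" lives in one process, and losing a partition is simulated by hand. In PySpark, assuming an existing `SparkContext` named `sc`, the analogous chain would be `sc.parallelize(range(6), 3).map(lambda x: x * x).cache().collect()`, with Spark itself handling scheduling across the cluster and recomputation after failures.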