Building Sample RDD

Let us build a sample RDD in Scala that we can print.

Scala
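import org.apache.spark.sql.SparkSession

// Start (or reuse) a local Spark session that runs on a single thread
val spark: SparkSession = SparkSession.builder().master("local[1]").getOrCreate()

// Distribute a small sequence of (key, value) string pairs as an RDD
val rdd = spark.sparkContext.parallelize(Seq(("Tutorials", "Print Rdd"),
  ("Language", "Scala"), ("Platform", "Gfg")))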

Here we have a simple RDD of key-value pairs, filled with a few sample values.

How to Print RDD in Scala?

The name Scala is short for "scalable language". It was created by Martin Odersky, with the first version appearing in 2003. It is an object-oriented language that also supports the functional programming approach. Every value in Scala is an object; for example, values like 1 and 2 can invoke methods such as toString(). Scala is statically typed, but unlike traditional code in other statically typed languages such as C, C++, or Java, it often does not require explicit type annotations, because the compiler infers most types. Type checking is done at compile time. Static typing makes it possible to build safe systems by default. Built-in checks and actionable error messages, combined with thread-safe data structures and collections, prevent many tricky bugs before the program first runs.

Table of Contents

  • Understanding RDD and Spark
  • Building Sample RDD
  • How to Print RDD in Scala?
  • Conclusion

Understanding RDD and Spark

Before building an RDD, let us briefly introduce it. The RDD is the core data abstraction of Apache Spark. Spark is used to build distributed applications, i.e. code that can run on many machines at the same time. The main purpose of such applications is to process large volumes of data, for example for business analysis. RDD stands for Resilient Distributed Dataset: a collection of elements, split into partitions, that can be operated on in parallel. Resilient means that Spark can recompute lost partitions from the recorded sequence of transformations (the lineage) after a failure such as a node crash. Distributed means that a large dataset is broken into smaller chunks that are processed on different nodes. The RDD is now the older, lower-level API of Spark; its successors, DataFrame and Dataset, are better optimized, and Dataset adds compile-time type safety....
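To make the idea of partitioned elements concrete, here is a minimal sketch. The master setting "local[2]", the choice of three partitions, and the names numbers, partitionCount, layout, and doubled are assumptions made for illustration only.

```scala
import org.apache.spark.sql.SparkSession

// Assumption: a local session with two worker threads is enough for this sketch
val spark: SparkSession = SparkSession.builder().master("local[2]").getOrCreate()

// Distribute six numbers over three partitions (numSlices sets the partition count)
val numbers = spark.sparkContext.parallelize(1 to 6, numSlices = 3)

// How many partitions the RDD was split into
val partitionCount = numbers.getNumPartitions

// glom() turns each partition into an array, so the split becomes visible
val layout = numbers.glom().collect().map(_.toList).toList
layout.foreach(println)

// map is applied to each partition independently; collect gathers the results on the driver
val doubled = numbers.map(_ * 2).collect().toList
println(doubled)
```

Each list printed from layout corresponds to one partition, which is the unit Spark schedules in parallel.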

How to Print RDD in Scala?

Method 1: Using collect...
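As a minimal sketch of the collect approach, the snippet below rebuilds the sample RDD from the earlier section so that it is self-contained. The name rows and the use of take(2) as a lighter alternative are illustrative additions, not part of the original text.

```scala
import org.apache.spark.sql.SparkSession

val spark: SparkSession = SparkSession.builder().master("local[1]").getOrCreate()

// Same sample data as in the "Building Sample RDD" section
val rdd = spark.sparkContext.parallelize(Seq(("Tutorials", "Print Rdd"),
  ("Language", "Scala"), ("Platform", "Gfg")))

// collect() brings every element to the driver as an Array[(String, String)]
val rows = rdd.collect()

// Printing now happens on the driver, one tuple per line
rows.foreach(println)

// For large RDDs, take(n) fetches only the first n elements instead of everything
rdd.take(2).foreach(println)
```

Because collect() loads the whole RDD into driver memory, it is best suited to small datasets like this one.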

Conclusion

We can print an RDD in several ways. The first is the collect method, which gathers all the data into a single array on the driver and returns it. The second is the foreach method, which loops over the entire RDD and prints each row one by one. The last is the toDF function, which converts the RDD to a DataFrame that we can then display with the show function. The show function accepts arguments that control, for example, how many rows are displayed and whether long values are truncated....
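The foreach and toDF approaches summarized above could be sketched as follows. The snippet rebuilds the sample RDD so that it runs on its own; the column names "key" and "value" are hypothetical and chosen only for illustration.

```scala
import org.apache.spark.sql.SparkSession

val spark: SparkSession = SparkSession.builder().master("local[1]").getOrCreate()
// Brings the toDF conversion for RDDs of tuples into scope
import spark.implicits._

val rdd = spark.sparkContext.parallelize(Seq(("Tutorials", "Print Rdd"),
  ("Language", "Scala"), ("Platform", "Gfg")))

// foreach runs on the executors; in local mode the output appears in the same console,
// but on a cluster it would end up in the executor logs instead
rdd.foreach(println)

// Convert the RDD to a DataFrame (column names are assumptions for this sketch)
val df = rdd.toDF("key", "value")

// Display the DataFrame as a table with the default settings
df.show()

// Display at most 2 rows and do not truncate long cell values
df.show(2, truncate = false)
```

On a real cluster, collect or show is usually the safer choice when the output must appear on the driver.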