SparkSession Example, using Java

Hi Java and Big Data gurus,

It's been some time since I last posted here. Thanks for liking and commenting on my post about Spark cluster setup.

Today, we will look at running a Spark WordCount example in Java, built with Maven.

To run the code, you will need Eclipse and the code itself, which is available on my GitHub site: My Github Repo. The project contains some more examples too; feel free to explore and comment there. Just import it into Eclipse as a Maven project.

While searching through the examples, I came across Java programs using the SparkContext object, but SparkSession is something newer. One of the striking methods I found was SparkSession.builder().getOrCreate(): it creates a session if one does not exist, or reuses the existing one. That is great, because Spark sessions are heavy objects. Another advantage is that SparkSession provides a uniform entry point for all data access in Spark, be it Spark SQL, text files, HDFS data, etc. It works across all of them, which is why I preferred to use it.
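
To illustrate, here is a minimal, hypothetical sketch (the class name, file path and SELECT statement are mine, not from the project) showing that a second call to builder().getOrCreate() hands back the very same session, and that the one session is the single entry point for text files as well as Spark SQL:

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class SessionReuseDemo {
    public static void main(String[] args) {
        // First call builds a brand new session
        SparkSession first = SparkSession.builder()
                .master("local[2]")
                .appName("SessionReuseDemo")
                .getOrCreate();

        // Second call does not build another heavy object; it returns the existing one
        SparkSession second = SparkSession.builder().getOrCreate();
        System.out.println("Same session reused: " + (first == second)); // true

        // One uniform entry point for different kinds of data access
        Dataset<Row> text = first.read().text("/tmp/README.md");    // plain text file
        Dataset<Row> sql = first.sql("SELECT 'hello' AS greeting"); // Spark SQL
        System.out.println(text.count());
        sql.show();

        first.stop();
    }
}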

To build the code, you will need the following libraries in your pom.xml:

spark-core_2.10, Ver: 2.1.1
spark-sql_2.10, Ver: 2.1.1
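
In pom.xml terms, that roughly translates to the following dependency entries (using the standard Apache Spark group ID; adjust the versions if yours differ):

<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.10</artifactId>
    <version>2.1.1</version>
</dependency>
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-sql_2.10</artifactId>
    <version>2.1.1</version>
</dependency>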

Below is the code for quick reference:


package my.sample.java.spark;

import java.util.Arrays;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class SparkWC {

    public static void main(String[] args) {

        // Get an instance of SparkConf, required to build the SparkSession
        SparkConf conf = new SparkConf();
        // Spark requires an application name; without it the session fails to start
        conf.setAppName("SparkWC");
        // Set the master to local[2]; you can also set it to local
        // Note: for setting the master, you can also use SparkSession.builder().master()
        // I preferred using a SparkConf, which provides us more flexibility
        conf.setMaster("local[2]");

        // Create the session using the config above.
        // getOrCreate: gets an existing SparkSession or, if there is no existing one,
        // creates a new one based on the options set in this builder.
        SparkSession session = SparkSession.builder().config(conf).getOrCreate();
        System.out.println("Session created");

        // Read the input file and create an RDD from it
        JavaRDD<Row> lines = session.read().text("/tmp/README.md").javaRDD();
        // Split the lines into words
        JavaRDD<String> words = lines.flatMap(s -> Arrays.asList(s.getString(0).split(" ")).iterator());

        // Print the number of lines
        System.out.println(lines.count());
        // Print the number of words
        System.out.println(words.count());

        // Stop the session when done
        session.stop();
    }

}
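
The example above only prints the total line and word counts. If you want the classic per-word counts, one possible extension (my own sketch, not part of the repository code) is to add the following at the end of main(), along with imports for scala.Tuple2 and org.apache.spark.api.java.JavaPairRDD:

// Turn each word into a (word, 1) pair, then sum the 1s per word
JavaPairRDD<String, Integer> counts = words
        .mapToPair(w -> new Tuple2<>(w, 1))
        .reduceByKey((a, b) -> a + b);

// Print each word with its count
counts.collect().forEach(t -> System.out.println(t._1() + " : " + t._2()));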

That's how you can use a SparkSession object in Java code.

This was a small post; I will keep updating it as I gather more uses for this interesting object.

Till then…

Adios,

Abhay Dandekar
