How to Read and Write CSV Files in Scala?

Data processing and analysis in Scala often involve working with CSV (Comma-Separated Values) files. CSV is a simple file format used to store tabular data, such as a spreadsheet or database. Each line of a CSV file is plain text representing a data row, with values separated by commas (,). Reading from and writing to CSV files are common tasks in many programming situations. This article discusses the steps to read and write CSV files in Scala.

Table of Contents

  • Setting Up the Environment
  • Using the java.io.PrintWriter Class
  • Using the scala-csv Library
  • Conclusion
  • FAQs

Setting Up the Environment

To work with CSV files in Scala, you need to set up your development environment. Make sure Scala and a build tool such as sbt are installed.

1. Create a new project using SBT:

sbt new scala/scala-seed.g8

2. Add necessary dependencies in the build.sbt:

Add the scala-csv library dependency to your build.sbt file:

libraryDependencies += "com.github.tototoshi" %% "scala-csv" % "1.3.10"

The complete build.sbt will then look like this:

import Dependencies._

ThisBuild / scalaVersion     := "2.13.12"
ThisBuild / version          := "0.1.0-SNAPSHOT"
ThisBuild / organization     := "com.example"
ThisBuild / organizationName := "example"

lazy val root = (project in file("."))
  .settings(
    name := "CsvWork",
    libraryDependencies += munit % Test,
    libraryDependencies += "com.github.tototoshi" %% "scala-csv" % "1.3.10"
  )

// See https://www.scala-sbt.org/1.x/docs/Using-Sonatype.html for instructions on how to publish to Sonatype.

Using the java.io.PrintWriter Class

Writing a CSV File

Data will be written to a CSV file using the java.io.PrintWriter class. The data will be written row by row, with fields separated by commas. The values will be plain strings, and the first row will be the header.

Functions:

  1. new PrintWriter(filename): Opens or creates the file for writing.
  2. writer.println(data): Writes a line of data to the file.

Below is the Scala program to write a CSV file:

Scala
import java.io.PrintWriter

object WriterExample1 {
  def main(args: Array[String]): Unit = {
    val filename = "output.csv"
    val writer = new PrintWriter(filename)
    writer.println("Name, Age, City")
    writer.println("John, 30, New York")
    writer.println("Alice, 25, London")
    writer.close()
  }
}

Explanation:

  1. Import the required class: The java.io.PrintWriter class from the Java standard library can be used directly in Scala to write to files.
  2. Create the CSV file: Use new PrintWriter("output.csv") to open or create the CSV file.
  3. Write data to the file: Use the println or write methods to write data to the file.
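Note that this simple approach does not quote fields that themselves contain commas, quotes, or newlines. Below is a minimal sketch that adds a small escape helper for such values (the helper and the quoted.csv filename are illustrative, not part of any library):

Scala
import java.io.PrintWriter

object QuotedWriterExample {
  // Hypothetical helper: wrap a field in double quotes if it contains
  // a comma, a quote, or a newline, doubling any embedded quotes.
  def escape(field: String): String =
    if (field.exists(c => c == ',' || c == '"' || c == '\n'))
      "\"" + field.replace("\"", "\"\"") + "\""
    else field

  def main(args: Array[String]): Unit = {
    val writer = new PrintWriter("quoted.csv")
    val rows = List(
      List("Name", "Comment"),
      List("John", "Likes apples, oranges"),
      List("Alice", "Said \"hello\"")
    )
    rows.foreach(row => writer.println(row.map(escape).mkString(",")))
    writer.close()
  }
}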

Output:

Write CSV File in Scala

Reading a CSV File

We will read a CSV file using scala.io.Source. The file will be read line by line, and each line will be split into fields at the commas.

Functions:

  1. Source.fromFile(filename): Opens the file for reading.
  2. file.getLines(): Reads the file line by line.
  3. line.split(delimiter): Splits each line into fields based on the delimiter.

Below is the Scala program to read a CSV file:

Scala
import scala.io.Source

object ReaderExample1 {
  def main(args: Array[String]): Unit = {
    val filename = "output.csv"
    val delimiter = ","
    val file = Source.fromFile(filename)
    for (line <- file.getLines()) {
      val fields = line.split(delimiter).map(_.trim)
      println(fields.mkString(", "))
    }
    file.close()
  }
}

Explanation:

  1. Import the required libraries: In Scala, you can use scala.io.Source to read files.
  2. Open the CSV file: Use Source.fromFile("output.csv") to open the CSV file.
  3. Read the data: Split each line on the delimiter and process the fields as needed.
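If the rows are needed as structured data rather than just printed, each record can be mapped into a case class. Below is a minimal sketch, assuming output.csv was produced by WriterExample1 above; it skips the header row and uses scala.util.Using (Scala 2.13+) so the file is closed automatically (the Person class is illustrative):

Scala
import scala.io.Source
import scala.util.Using

object CaseClassReaderExample {
  case class Person(name: String, age: Int, city: String)

  def main(args: Array[String]): Unit = {
    // Using.resource closes the source even if parsing fails.
    val people = Using.resource(Source.fromFile("output.csv")) { source =>
      source.getLines()
        .drop(1)                               // skip the header row
        .map(_.split(",").map(_.trim))
        .collect { case Array(name, age, city) =>
          Person(name, age.toInt, city)
        }
        .toList
    }
    people.foreach(println)
  }
}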

Output:

Read CSV File in Scala

Using the scala-csv Library

Writing CSV in Scala

Data will be written to a CSV file using the scala-csv library. The data will be in the form of a list of maps, with each map representing an individual row. The headers, taken from the keys of the first map, will be written as the first row of the file.

Approach:

  1. Import the necessary libraries.
  2. Open the CSV file using CSVWriter.
  3. Define the data as a list of maps.
  4. Extract headers from the data.
  5. Convert the data to a sequence of sequences.
  6. Write the headers and data to the CSV file.
  7. Close the writer.

Below is the Scala program to write a CSV file:

Scala
import java.io.File
import com.github.tototoshi.csv._

object WriterExample2 {
  def main(args: Array[String]): Unit = {
    val writer = CSVWriter.open(new File("output.csv"))
    val data = List(
      Map("Name" -> "John", "Age" -> "30", "Country" -> "USA"),
      Map("Name" -> "Anna", "Age" -> "28", "Country" -> "UK")
    )
    val headers = data.head.keys.toSeq
    val rows = data.map(_.values.toSeq)
    writer.writeRow(headers)
    writer.writeAll(rows)
    writer.close()
  }
}
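If preferred, the header and the data rows can also be written in a single call by prepending the header to the rows:

writer.writeAll(headers +: rows)

Note that taking _.values assumes every map lists its keys in the same order; mapping each row through the header keys, for example data.map(row => headers.map(row)), is a safer way to keep the columns aligned.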

Output:

Write CSV File in Scala

Reading CSV in Scala

We will use the scala-csv library to read data from a CSV file and print it to the console. The data will be read as a list of maps where each map represents a row with column headers as keys.

Approach:

  1. Import the necessary libraries.
  2. Open the CSV file using CSVReader.
  3. Read all rows with headers using allWithHeaders().
  4. Print each row.
  5. Close the reader.

Below is the Scala program to read a CSV file:

Scala
import java.io.File
import com.github.tototoshi.csv._

object ReaderExample2 {
  def main(args: Array[String]): Unit = {
    val reader = CSVReader.open(new File("output.csv"))
    val allRows = reader.allWithHeaders()
    allRows.foreach(println)
    reader.close()
  }
}
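Because allWithHeaders() returns each row as a Map[String, String], the values can also be converted into a case class. Below is a minimal sketch, assuming the output.csv produced by WriterExample2 above (the Person class is illustrative):

Scala
import java.io.File
import com.github.tototoshi.csv._

object TypedReaderExample {
  case class Person(name: String, age: Int, country: String)

  def main(args: Array[String]): Unit = {
    val reader = CSVReader.open(new File("output.csv"))
    // Each row is a Map keyed by the header names.
    val people = reader.allWithHeaders().map { row =>
      Person(row("Name"), row("Age").toInt, row("Country"))
    }
    reader.close()
    people.foreach(println)
  }
}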

Output:

Read CSV File in Scala

Conclusion

The java.io and scala.io classes cover simple cases, while the scala-csv library offers an efficient, convenient way of reading and writing CSV files in Scala. By following the steps above, it is easy to add CSV operations to your Scala applications. This is important in ETL processes, data science, and other domains that involve data manipulation.

FAQs

1. How can I handle large CSV files in Scala?

For large CSV files, consider using Apache Spark with Scala. Spark’s DataFrame API optimizes performance for handling big datasets.
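Below is a minimal sketch of reading a CSV file with Spark, assuming the spark-sql dependency is on the classpath (the session settings are illustrative):

Scala
import org.apache.spark.sql.SparkSession

object SparkCsvExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("CsvExample")
      .master("local[*]") // local mode for a quick test
      .getOrCreate()

    // Read the CSV with a header row and let Spark infer column types.
    val df = spark.read
      .option("header", "true")
      .option("inferSchema", "true")
      .csv("output.csv")

    df.show()
    spark.stop()
  }
}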

2. Can I read/write CSV files with different delimiters?

Yes. By overriding the delimiter property of DefaultCSVFormat, you can specify a custom delimiter.
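For example, with scala-csv an implicit format can override the delimiter (a semicolon here, chosen for illustration):

Scala
import java.io.File
import com.github.tototoshi.csv._

object SemicolonExample {
  // Any CSVReader/CSVWriter opened in this scope picks up this format.
  implicit object SemicolonFormat extends DefaultCSVFormat {
    override val delimiter = ';'
  }

  def main(args: Array[String]): Unit = {
    val writer = CSVWriter.open(new File("semicolon.csv"))
    writer.writeRow(List("Name", "Age"))
    writer.writeRow(List("John", "30"))
    writer.close()

    val reader = CSVReader.open(new File("semicolon.csv"))
    reader.all().foreach(println)
    reader.close()
  }
}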

3. How do I handle missing values in a CSV file?

Scala's collection methods let you either filter out rows that have missing values or substitute a default value before writing or processing the data.
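A small sketch of both options, pasteable into the Scala REPL (the sample rows and the "N/A" default are illustrative):

Scala
val rows = List(
  Map("Name" -> "John", "Age" -> "30"),
  Map("Name" -> "Alice", "Age" -> "")
)

// Option 1: keep only rows where every value is non-empty.
val complete = rows.filter(_.values.forall(_.nonEmpty))

// Option 2: replace empty values with a default such as "N/A".
val filled = rows.map(_.map { case (k, v) => k -> (if (v.isEmpty) "N/A" else v) })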

4. Is it possible to append data to an existing CSV file?

Yes, you can open a CSVWriter in append mode by passing append = true:

val writer = CSVWriter.open(new File("path/to/file.csv"), append = true)

5. What if my CSV file has a header row?

scala-csv's allWithHeaders() reads the header row automatically and maps each column name to its corresponding value in every row.