How to Load a CSV File in Jupyter Notebook

Loading a CSV (comma-separated values) file is usually the first step when working with tabular data in Jupyter Notebook, since analysis and manipulation both start from a loaded dataset.

To load a CSV file in Jupyter Notebook, we can use the pandas library, which provides functions for reading and manipulating tabular data. The steps below walk through loading a file and fixing a common encoding error.

Load the CSV file – Standard Pandas Operation (pd.read_csv)

  • Use the pd.read_csv() function to load your CSV file.
  • You’ll need to provide the path to your CSV file as an argument. If the CSV file is in the same directory as your notebook, you can just provide the filename.

The following snippet uses pandas to read the CSV file zomato.csv into a DataFrame and display its first five rows.

Python
import pandas as pd
df = pd.read_csv('zomato.csv')
df.head()

Output:

[Figure: first rows of the DataFrame. Caption: Traditional Method (pd.read_csv)]

Handling Unicode Error

Sometimes pd.read_csv() raises a UnicodeDecodeError. This happens when the file contains characters outside the ASCII range and was not saved in the encoding pandas expects (UTF-8 by default). To handle this error, we can specify the encoding explicitly, trying different options until we find one that works.

The traceback below shows the UnicodeDecodeError raised while loading a CSV file, along with the line of code that triggered it.

Output:

---------------------------------------------------------------------------
UnicodeDecodeError                        Traceback (most recent call last)
<ipython-input-6-4108123d2b33> in <cell line: 2>()
      1 import pandas as pd
----> 2 df=pd.read_csv('/content/zomato.csv')

10 frames
/usr/local/lib/python3.10/dist-packages/pandas/_libs/parsers.pyx in pandas._libs.parsers.raise_parser_error()

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xed in position 7044: invalid continuation byte

A UnicodeDecodeError (a subclass of UnicodeError) is raised when bytes cannot be decoded with the chosen encoding. Here, the default encoding used by pandas' read_csv function (UTF-8) does not match the encoding of the CSV file, so a byte representing a non-ASCII character (0xed in the traceback above) is not valid UTF-8 at that position.
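The mismatch can be reproduced without any file. The sketch below is only an illustration: the sample text "Brasília" is a made-up value, not taken from the dataset, chosen because its 'í' is stored as the single byte 0xED in Latin-1.

```python
# "Brasília" is a hypothetical sample value; in Latin-1, 'í' is the single byte 0xED.
raw = "Brasília".encode("latin-1")

try:
    raw.decode("utf-8")
    utf8_error = None
except UnicodeDecodeError as exc:
    # In UTF-8, 0xED announces a multi-byte sequence, but the next byte ('l')
    # is not a continuation byte, so decoding fails.
    utf8_error = exc

# Latin-1 maps every possible byte to a character, so this decode succeeds.
decoded = raw.decode("latin-1")

print(utf8_error)
print(decoded)
```

The same bytes are readable under one encoding and invalid under another, which is why telling pandas the file's encoding resolves the error.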

How to Handle Unicode Error?

To handle this error, specify the file's encoding with the encoding parameter of the read_csv function. In this article, we use encoding='latin-1'.

Python
import pandas as pd
df = pd.read_csv('/content/zomato.csv', encoding='latin-1')
df.head()

Output:

[Figure: first rows of the DataFrame. Caption: Handling Unicode Error]

If 'latin-1' does not produce the expected text, modify your code to try other encodings when reading the CSV file. Common options include 'utf-8', 'utf-16', 'latin-1', and 'cp1252'.
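One possible way to automate this trial and error is to test-decode the file's bytes before handing the file to pandas. The helper name first_working_encoding, the candidate order, and the sample file below are assumptions made for this sketch. 'latin-1' is placed last because it accepts any byte sequence and would otherwise hide the other candidates.

```python
from pathlib import Path
import tempfile


def first_working_encoding(path, candidates=("utf-8", "cp1252", "latin-1")):
    """Return the first candidate encoding that decodes the whole file without error."""
    data = Path(path).read_bytes()
    for encoding in candidates:
        try:
            data.decode(encoding)
        except UnicodeDecodeError:
            continue  # this encoding does not fit the bytes; try the next one
        return encoding
    raise ValueError(f"none of {candidates} could decode {path}")


# Demo with a small made-up file saved as Latin-1 (not the article's dataset).
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.csv"
    sample.write_bytes("name,city\nCafé Brasília,Brasília\n".encode("latin-1"))
    detected = first_working_encoding(sample)
    # With pandas installed, the result can be passed on:
    # df = pd.read_csv(sample, encoding=detected)

print(detected)
```

A successful decode only shows that the bytes are valid in that encoding, not that the encoding is the one the file was written with, so it is still worth checking the loaded text with df.head().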

If the CSV file is in a different directory, you’ll need to provide the full path to the file:

df = pd.read_csv('/path/to/your/file/your_file.csv')
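When a path does not seem to work, it can help to check what the notebook sees before calling read_csv. This is a minimal sketch using the standard pathlib module. The folder and filename are the same placeholders as above and need to be replaced with real values.

```python
from pathlib import Path

# Placeholder location; replace with the real folder and filename.
csv_path = Path("/path/to/your/file") / "your_file.csv"

# Relative filenames are resolved against this directory.
print("Notebook working directory:", Path.cwd())
print("Looking for:", csv_path)

file_found = csv_path.exists()
print("File found:", file_found)

# Once file_found is True, the Path object can be handed to pandas directly:
# df = pd.read_csv(csv_path)
```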

Conclusion

In short, pd.read_csv() loads a CSV file into a DataFrame in Jupyter Notebook, and its encoding parameter resolves Unicode errors when the file is not saved as UTF-8.