Merge Data Frames

The merge() function in pandas performs all standard database-style join operations. Merging joins two data frames on the values of the columns they have in common. Let’s create a data frame.

Python3

import pandas as pd

# Creating dataframe1
df1 = pd.DataFrame({
    'Name': ['Jeevan', 'Raavan', 'Geeta', 'Bheem'],
    'Age': [25, 24, 52, 40],
    'Qualification': ['Msc', 'MA', 'MCA', 'Phd']})
df1


Output:

     Name  Age Qualification
0  Jeevan   25           Msc
1  Raavan   24            MA
2   Geeta   52           MCA
3   Bheem   40           Phd

Now we will create another data frame.

Python3

# Creating dataframe2
df2 = pd.DataFrame({'Name': ['Jeevan', 'Raavan', 'Geeta', 'Bheem'],
                    'Salary': [100000, 50000, 20000, 40000]})
df2


Output:

     Name  Salary
0  Jeevan  100000
1  Raavan   50000
2   Geeta   20000
3   Bheem   40000

Now, let’s merge the two data frames created earlier. Since no key is specified, merge() joins on the column that both frames share, Name.

Python3

# Merging the two dataframes on their common column, 'Name'
df = pd.merge(df1, df2)
df


Output:

     Name  Age Qualification  Salary
0  Jeevan   25           Msc  100000
1  Raavan   24            MA   50000
2   Geeta   52           MCA   20000
3   Bheem   40           Phd   40000
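
The example above joins two frames whose Name values match exactly, so every row survives. As a minimal sketch of how the join type changes the result, the snippet below uses two small frames that are hypothetical (not part of the data above) and only partly overlap, and passes the `on` and `how` parameters of pd.merge() explicitly.

```python
import pandas as pd

# Hypothetical frames for illustration: 'Bheem' has no salary row and
# 'Arjun' has no age row, so each join type gives a different result.
left = pd.DataFrame({'Name': ['Jeevan', 'Raavan', 'Bheem'],
                     'Age': [25, 24, 40]})
right = pd.DataFrame({'Name': ['Jeevan', 'Raavan', 'Arjun'],
                      'Salary': [100000, 50000, 30000]})

# Inner join (the default): only names present in both frames.
inner = pd.merge(left, right, on='Name', how='inner')

# Left join: every row of `left`; a missing salary becomes NaN.
left_join = pd.merge(left, right, on='Name', how='left')

# Outer join: every name that appears in either frame.
outer = pd.merge(left, right, on='Name', how='outer')

print(inner)
print(left_join)
print(outer)
```

Naming the key with `on` is optional when the frames share exactly one column, but it makes the intent explicit and guards against accidentally joining on an extra shared column.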

Data Processing with Pandas

Data processing is an important part of any data-driven task. It helps us extract meaningful insights from the data. Python is a widely used programming language, and it offers various libraries and tools for data processing.

In this article, we cover data processing in Python: loading data, printing rows and columns, summarizing a data frame, handling missing data values, sorting and merging data frames, applying functions, and visualizing data frames.

Table of Contents

  • What is Data Processing in Python?
  • What is Pandas?
  • Loading Data in Pandas DataFrame
  • Printing rows of the Data
  • Printing the column names of the DataFrame
  • Summary of Data Frame
  • Descriptive Statistical Measures of a DataFrame
  • Missing Data Handling
  • Sorting DataFrame values
  • Merge Data Frames
  • Apply Function
  • By using the lambda operator
  • Visualizing DataFrame
  • Conclusion

What is Data Processing in Python?

Data processing in Python refers to manipulating, transforming, and analyzing data using Python. It consists of a series of operations that aim to turn raw data into structured data or meaningful insights, making the data suitable for analysis, visualization, or other applications. Python provides several libraries and tools that facilitate efficient data processing, making it a popular choice for working with diverse datasets....

What is Pandas?

Pandas is a powerful, fast, open-source library built on NumPy. It is used for data manipulation and real-world data analysis in Python. Easy handling of missing data, flexible reshaping and pivoting of data sets, and size mutability make pandas a great tool for manipulating and handling data efficiently....
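
As a rough illustration of the three features named above (missing-data handling, reshaping and pivoting, and size mutability), here is a short sketch. The `sales` frame and its column names are made up for this example.

```python
import numpy as np
import pandas as pd

# Hypothetical sales records; one value is missing on purpose.
sales = pd.DataFrame({
    'City': ['Pune', 'Pune', 'Delhi', 'Delhi'],
    'Year': [2022, 2023, 2022, 2023],
    'Sales': [10.0, np.nan, 7.0, 9.0]})

# Missing data: replace the NaN with 0.
sales['Sales'] = sales['Sales'].fillna(0)

# Size mutability: add a new column to the existing frame.
sales['Sales_k'] = sales['Sales'] * 1000

# Reshaping: one row per city, one column per year.
wide = sales.pivot(index='City', columns='Year', values='Sales')
print(wide)
```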

Loading Data in Pandas DataFrame

Read a CSV file with pd.read_csv to load the data into a data frame. Import pandas using pd as the shorthand. You can download the data from here....
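
The dataset linked above is not reproduced here, so the sketch below assumes a small in-memory CSV as a stand-in; its columns and values are hypothetical. With a real file you would pass its path to pd.read_csv instead (the file name in the comment is only a placeholder).

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for the downloaded file.
# With a real file, pass its path instead, e.g. pd.read_csv("data.csv").
csv_text = """Name,Age,Salary
Jeevan,25,100000
Raavan,24,50000
Geeta,52,20000
"""

# read_csv accepts a file path or any file-like object.
data_frame = pd.read_csv(io.StringIO(csv_text))
print(data_frame)
```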

Printing rows of the Data

By default, data_frame.head() displays the first five rows and data_frame.tail() displays the last five rows. To get the first n rows, use data_frame.head(n); the syntax for printing the last n rows is similar: data_frame.tail(n)....

Printing the column names of the DataFrame

Python3
# Program to print all the column names of the dataframe
print(list(data_frame.columns))
...

Summary of Data Frame

The info() function prints a summary of a DataFrame that includes the data type of each column, the RangeIndex (number of rows), the columns, the non-null counts, and the memory usage....

Descriptive Statistical Measures of a DataFrame

The describe() function outputs descriptive statistics that summarize the central tendency, dispersion, and shape of a dataset’s distribution, excluding NaN values. For numeric data, the result’s index will include count, mean, std, min, and max as well as the lower, 50th, and upper percentiles. For object data (e.g. strings), the result’s index will include count, unique, top, and freq....

Missing Data Handling

Find missing values in the dataset...

Sorting DataFrame values

...

Merge Data Frames

...

Apply Function

...

By using the lambda operator

...

Visualizing DataFrame

...

Conclusion

...