Popular Data Extraction Techniques
Now it is time for some hands-on coding that demonstrates extraction techniques in a simple way. Data extraction is an important process, but when third-party data is involved it should only be done with proper permission and authorization. Below we show three of the most common extraction techniques in simplified form.
Data extraction from CSV file
This is the most commonly used form of extraction, and people in almost every role rely on it. A CSV file can hold many kinds of data, including customer data, financial data, user satisfaction measurements and more. In this approach we use the Python pandas module to load a CSV file and then extract data based on a predefined column. In other words, we extract only the rows that satisfy a predefined condition.
Python3
import pandas as pd
from io import StringIO

# Sample in-line CSV data for example purposes
csv_data = """Name,Age,Occupation
John,25,Engineer
Jane,30,Teacher
Bob,22,Student
Alice,35,Doctor"""

# Read the CSV data into a DataFrame
df_csv = pd.read_csv(StringIO(csv_data))

# Extract data based on the 'Occupation' column
engineers_data_csv = df_csv[df_csv['Occupation'] == 'Engineer'][['Name', 'Age']]

# Display the extracted data
print("Data from CSV:")
print(engineers_data_csv)
Output:
Data from CSV:
   Name  Age
0  John   25
The output shows that only the row matching the condition was extracted from the CSV data.
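The same column-based filter can also be sketched without pandas, using only the standard library's csv module. This is a minimal illustration, not the article's method: the helper name extract_rows and its parameters are hypothetical, and values come back as strings because csv.DictReader does no type conversion.

```python
import csv
from io import StringIO

# Same sample data as in the pandas example
csv_data = """Name,Age,Occupation
John,25,Engineer
Jane,30,Teacher
Bob,22,Student
Alice,35,Doctor"""


def extract_rows(csv_text, column, value, keep):
    """Hypothetical helper: return the `keep` fields of rows where `column` equals `value`."""
    reader = csv.DictReader(StringIO(csv_text))
    return [{field: row[field] for field in keep}
            for row in reader if row[column] == value]


engineers = extract_rows(csv_data, "Occupation", "Engineer", ["Name", "Age"])
print(engineers)  # [{'Name': 'John', 'Age': '25'}]
```

For small files this avoids an extra dependency; pandas remains the more convenient choice once numeric types or larger datasets are involved.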
Data extraction from Databases
Extracting data from a database requires full authorization and permission from the organization that owns it. Attackers often target databases with unauthorized extraction to retrieve sensitive information. We are not going to perform any attack; we will simply illustrate the basic process. Here we use the sqlite3 module to create an in-memory database and then extract data with an SQL query.
Python3
import sqlite3
import pandas as pd

# Create a SQLite in-memory database and insert sample data
conn = sqlite3.connect(':memory:')
cursor = conn.cursor()
cursor.execute('''CREATE TABLE people (Name TEXT, Age INTEGER, Occupation TEXT)''')
cursor.executemany('''INSERT INTO people VALUES (?, ?, ?)''',
                   [('John', 25, 'Engineer'), ('Jane', 30, 'Teacher'),
                    ('Bob', 22, 'Student'), ('Alice', 35, 'Doctor')])
conn.commit()

# Extract data based on the 'Occupation' column using an SQL query
engineers_data_db = pd.read_sql_query(
    "SELECT Name, Age FROM people WHERE Occupation='Engineer'", conn)

# Display the extracted data
print("\nData from SQLite Database:")
print(engineers_data_db)

# Close the connection once the data has been read
conn.close()
Output:
Data from SQLite Database:
   Name  Age
0  John   25
We have successfully extracted the required data from the database. In real databases, however, the process usually involves several steps and more complex SQL queries.
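As a small step toward those more involved queries, the sketch below passes the filter values as parameters instead of writing them into the SQL string, which is the usual way to keep user-supplied values from altering the query. It assumes the same sample table as above; the helper name people_by_occupation and its min_age parameter are hypothetical, added only for illustration.

```python
import sqlite3

# Rebuild the sample table so the example is self-contained
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (Name TEXT, Age INTEGER, Occupation TEXT)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?, ?)",
    [("John", 25, "Engineer"), ("Jane", 30, "Teacher"),
     ("Bob", 22, "Student"), ("Alice", 35, "Doctor")],
)
conn.commit()


def people_by_occupation(connection, occupation, min_age=0):
    """Hypothetical helper: filter by occupation and minimum age."""
    # The ? placeholders are filled in by sqlite3, not by string formatting
    query = ("SELECT Name, Age FROM people "
             "WHERE Occupation = ? AND Age >= ? ORDER BY Age")
    return connection.execute(query, (occupation, min_age)).fetchall()


engineers = people_by_occupation(conn, "Engineer")
older_doctors = people_by_occupation(conn, "Doctor", min_age=40)
print(engineers)      # [('John', 25)]
print(older_doctors)  # []
conn.close()
```

The rows come back as plain tuples here; pd.read_sql_query also accepts a params argument if a DataFrame is preferred.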
Data Extraction using Web Scraping
Web scraping is another widely used data extraction technique. Here we fetch article titles from the w3wiki website by URL. To do this we use the requests module to download the page and the BeautifulSoup module to parse its HTML structure.
Python3
import requests
from bs4 import BeautifulSoup

# URL for web scraping
url = "..."  # URL not given in the source; replace with the page to scrape

# Send a GET request to the URL
response = requests.get(url)

# Check if the request was successful (status code 200)
if response.status_code == 200:
    # Parse the HTML content
    soup = BeautifulSoup(response.text, 'html.parser')

    # Extract article titles
    # Adjust the class based on the website structure
    article_titles = [title.text.strip() for title in soup.find_all('div', class_='head')]

    # Display the extracted data
    print("Article Titles from w3wiki:")
    for title in article_titles:
        print("- " + title)
else:
    print("Failed to retrieve the webpage. Status code:", response.status_code)
Output:
Article Titles from w3wiki:
- Array Data Structure
- Tribal Leader Vishnu Deo Sai (Biography): New Chhattisgarh Chief Minister
- Protection Against False Allegations and Its Types
- Double Angle Formulas
- MariaDB CHECK Constraint
- Cristiano Ronaldo Net Worth 2024: Football Success and Endorsements
- PM Modi Proposes to host COP 33 Summit in 2028 in India
- Why Ghol fish Declared as State Fish of Gujarat?
- Neurosurgeon Salary in India
- How To Create A Newspaper In Google Docs
- How to Make a Calendar in Google Docs in 2024
- Contract Acceptance Testing (CAT) – Software Testing
- Control Variables in Statistics
- Software Quality Assurance Plan in Software Development
- UI/UX in Mobile Games
- Difference Between Spring Tides and Neap Tides
- Setting Up C Development Environment
- Sutherland Global Services for Customer Support Interview Experience
- Product-Market Fit : Definition, Importance and Example
- Hybridization of SF4
All article titles were printed as output. The output may change over time as articles are added or the web page is modified.
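Because the live output changes, it can help to try the title-extraction step on a fixed piece of HTML first. The sketch below does that offline and, as a deliberate swap, uses the standard library's html.parser instead of BeautifulSoup so it runs without extra packages. The markup in sample_html is hypothetical (only the div class 'head' and two titles are taken from the example above), and the class name HeadTitleParser is an assumption for illustration.

```python
from html.parser import HTMLParser

# Hypothetical markup standing in for a downloaded page
sample_html = """
<html><body>
  <div class="head">Array Data Structure</div>
  <div class="body">Some teaser text</div>
  <div class="head"> Double Angle Formulas </div>
</body></html>
"""


class HeadTitleParser(HTMLParser):
    """Collect the text of every <div class="head"> element."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self._depth = 0    # greater than 0 while inside a matching div
        self._buffer = []  # text pieces of the current title

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        classes = (dict(attrs).get("class") or "").split()
        if self._depth > 0:
            self._depth += 1  # nested div inside a title div
        elif "head" in classes:
            self._depth = 1
            self._buffer = []

    def handle_endtag(self, tag):
        if tag == "div" and self._depth > 0:
            self._depth -= 1
            if self._depth == 0:
                self.titles.append("".join(self._buffer).strip())

    def handle_data(self, data):
        if self._depth > 0:
            self._buffer.append(data)


parser = HeadTitleParser()
parser.feed(sample_html)
for title in parser.titles:
    print("- " + title)
```

Once the selector logic works on saved markup, the same class name can be used with soup.find_all on the live response.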
What is Data Extraction?
Data extraction is a key step in managing and analyzing information. As organizations collect large volumes of data from many sources, finding the important information becomes crucial. Data extraction means gathering specific information from sources such as databases, files, websites, or APIs so that it can be analyzed and processed more effectively. Doing this helps us make informed decisions and understand the data better.
In this article, we will discuss various aspects of data extraction: its process, benefits, types, and future.
Table of Content
- What is Data Extraction
- Data Extraction process
- Types of Data Extraction
- Benefits of Data Extraction
- Popular Data Extraction Techniques
- Benefits of Data Extraction Tools
- Relation between Data Extraction and ETL