BeautifulSoup – Modifying the tree
Prerequisites: BeautifulSoup
BeautifulSoup is a Python library used for web scraping. Besides searching the parse tree of an HTML document, it can also modify that tree. This article shows how BeautifulSoup can be used to rename a tag, change the values of its attributes, add and delete attributes, and insert, wrap, replace, or remove elements.
Modifying the name of the tag and its attributes
You can rename a tag, change the value of an attribute, add a new attribute, or delete an existing one.
- To change the tag name:
Syntax: tag.name = "new_tag"
- To modify an attribute or to add a new attribute:
Syntax: tag["attribute"] = "value"
- To delete an attribute:
Syntax: del tag["attribute"]
A tree can also be modified by inserting new elements at the required places.
- insert() inserts a new element at the given position among the tag's children, much like list.insert() in Python.
Syntax: tag.insert(position, new_element)
- insert_after() inserts an element immediately after the tag or string it is called on.
Syntax: element.insert_after(new_element)
- insert_before() inserts an element immediately before the tag or string it is called on.
Syntax: element.insert_before(new_element)
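As a small sketch of how the position passed to insert() behaves, the snippet below inserts strings at the start and at the end of a tag. The markup and the inserted strings are made up for illustration; positions are indexes into the tag's list of direct children (tag.contents).

```python
from bs4 import BeautifulSoup

# Illustrative markup, not taken from a real page.
soup = BeautifulSoup("<p>world</p>", "html.parser")
tag = soup.p

# Position 0 places the new string before the existing child.
tag.insert(0, "hello ")

# A position equal to the current number of children appends at the end.
tag.insert(len(tag.contents), "!")

print(tag.contents)  # the tag's direct children, in order
print(tag)           # <p>hello world!</p>
```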
Approach:
- Import the module
- Scrape data from the web page
- Parse the scraped string as HTML
- Select the tag within which the modification has to be performed
- Make the required changes
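The steps above can be sketched as one small function. This is only an illustration: rename_tags is a hypothetical helper, and a hard-coded string stands in for markup scraped from a web page so that the snippet runs without network access.

```python
from bs4 import BeautifulSoup


def rename_tags(html, old_name, new_name):
    """Parse html, rename every <old_name> tag to <new_name>, return the markup."""
    # Parse the string as HTML.
    soup = BeautifulSoup(html, "html.parser")
    # Select the tags to modify and make the change.
    for tag in soup.find_all(old_name):
        tag.name = new_name
    return str(soup)


# Assumption: this string stands in for data scraped from a web page.
html = "<div><b>bold</b> and <b>bolder</b></div>"
result = rename_tags(html, "b", "strong")
print(result)  # <div><strong>bold</strong> and <strong>bolder</strong></div>
```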
Example 1:
Python3
# importing module
from bs4 import BeautifulSoup

markup = """<p class="para">gfg</p>"""

# parsing string to HTML
soup = BeautifulSoup(markup, 'html.parser')

# extracting a tag
tag = soup.p
print("Before modifying the tag name: ")
print(tag)
print()

# modifying tag name
tag.name = "div"
print("After modifying the tag name: ")
print(tag)
print()

# modifying its class attribute
tag['class'] = "div_class"

# adding new attribute
tag['id'] = "div_id"
print("After modifying and adding attributes: ")
print(tag)
print()

# deleting an attribute
del tag["class"]
print("After deleting class attribute: ")
print(tag)
print()

# modifying the tag's content
tag.string = "Beginner"
print("After modifying tag string: ")
print(tag)
print()

# using insert function
tag = soup.div
print("Before inserting: ")
print(tag)
print()

# inserting content after the existing string
tag.insert(1, " for Beginner")
print("After inserting: ")
print(tag)
print()
Output:
Before modifying the tag name:
<p class="para">gfg</p>

After modifying the tag name:
<div class="para">gfg</div>

After modifying and adding attributes:
<div class="div_class" id="div_id">gfg</div>

After deleting class attribute:
<div id="div_id">gfg</div>

After modifying tag string:
<div id="div_id">Beginner</div>

Before inserting:
<div id="div_id">Beginner</div>

After inserting:
<div id="div_id">Beginner for Beginner</div>
Example 2:
Python3
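# importing module
from bs4 import BeautifulSoup

soup = BeautifulSoup("<b>| A Computer Science portal</b>", 'html.parser')

# creating a new <p> tag
tag = soup.new_tag("p")
tag.string = "Beginner"

# insert the new tag before the string inside <b>
soup.b.string.insert_before(tag)
print(soup.b)
print()

# insert a new string after the <p> tag
soup.b.p.insert_after(soup.new_string(" for Beginner"))
print(soup.b)
Output:
<b><p>Beginner</p>| A Computer Science portal</b>

<b><p>Beginner</p> for Beginner| A Computer Science portal</b>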
Adding new tag and wrapping element
The tree can be modified by creating a new tag and adding it at any required location. An existing element can also be wrapped in another tag.
- new_tag() creates a new tag with the given name. The tag is not part of the tree until it is inserted or appended.
Syntax: soup.new_tag("tag_name")
- wrap() encloses an element in the tag you specify and returns the new wrapper.
Syntax: element.wrap(wrapper_tag)
- unwrap() is the opposite of wrap(): it removes a tag but keeps its contents in place.
Syntax: tag.unwrap()
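As a minimal sketch of these three functions working together, the snippet below creates a link tag with an attribute, wraps a string in it, and then unwraps it again. The markup and the example.com URL are placeholders chosen for illustration.

```python
from bs4 import BeautifulSoup

# Illustrative markup, not taken from a real page.
soup = BeautifulSoup("<p>Read the docs</p>", "html.parser")

# Keyword arguments to new_tag() become attributes of the new tag.
# The URL is a placeholder.
link = soup.new_tag("a", href="https://example.com/")

# wrap() returns the wrapper, which is now part of the tree.
wrapper = soup.p.string.wrap(link)
wrapped_markup = str(soup)
print(wrapped_markup)  # <p><a href="https://example.com/">Read the docs</a></p>

# unwrap() removes the <a> tag but leaves its text inside <p>.
soup.a.unwrap()
print(soup)  # <p>Read the docs</p>
```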
Example:
Python3
# importing module
from bs4 import BeautifulSoup

markup = '<p>Beginner for Beginner</p>'

# parsing string to HTML
soup = BeautifulSoup(markup, 'html.parser')
print(soup)

# wrapping around the string
soup.p.string.wrap(soup.new_tag("i"))
print(soup)

# wrapping around the tag
soup.p.wrap(soup.new_tag("div"))
print(soup)

# unwrapping the i tag
soup.p.i.unwrap()
print(soup)

old_tag = soup.div

# new tag
new_tag = soup.new_tag('div')
new_tag.string = "| A Computer Science portal for Beginner"

# adding new tag
old_tag.append(new_tag)
print(soup)
Output:
<p>Beginner for Beginner</p>
<p><i>Beginner for Beginner</i></p>
<div><p><i>Beginner for Beginner</i></p></div>
<div><p>Beginner for Beginner</p></div>
<div><p>Beginner for Beginner</p><div>| A Computer Science portal for Beginner</div></div>
Replacing element
replace_with() removes a tag or string from the parse tree and puts the new tag or string you pass in its place.
Syntax: element.replace_with(new_element)
Example:
Python3
# importing BeautifulSoup module
from bs4 import BeautifulSoup

markup = '<a href="http://gfg.com/">Beginner for Beginner <i>gfg.com</i></a>'

# parsing string to HTML
soup = BeautifulSoup(markup, 'html.parser')

# tag that contains the element to be replaced
old_tag = soup.a

# new tag
new_tag = soup.new_tag("p")

# input string
new_tag.string = "gfg.in"

# replacing tag:
# page_element.replace_with() removes a tag or string from the tree
# and replaces it with the tag or string of your choice
old_tag.i.replace_with(new_tag)

print(old_tag)
Output:
<a href="http://gfg.com/">Beginner for Beginner <p>gfg.in</p></a>
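replace_with() also works on strings, and it returns the element that was removed, so the old content can be kept if needed. The following is a minimal sketch with made-up markup.

```python
from bs4 import BeautifulSoup

# Illustrative markup, not taken from a real page.
soup = BeautifulSoup("<p>Hello <b>world</b></p>", "html.parser")

# Replace the text inside <b>; a plain Python string is accepted.
soup.b.string.replace_with("there")
after_string_swap = str(soup)
print(after_string_swap)  # <p>Hello <b>there</b></p>

# Replace the whole <b> tag and keep a reference to what was removed.
new_tag = soup.new_tag("i")
new_tag.string = "everyone"
removed = soup.b.replace_with(new_tag)

print(removed)  # <b>there</b>
print(soup)     # <p>Hello <i>everyone</i></p>
```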
Adding new content to an existing tag
New content can be added to an existing tag with the append() function, passing either a plain string or an object created with the NavigableString() constructor.
Syntax: tag.append("content")
Example:
Python3
# importing module
from bs4 import BeautifulSoup, NavigableString

markup = '<a href="https://www.w3wiki.net/">Beginner for Beginner</a>'

# parsing string to HTML
soup = BeautifulSoup(markup, 'html.parser')

# extracting a tag
tag = soup.a

# appending content
tag.append("| A Computer Science portal")
print(tag)

# appending content using NavigableString constructor
new_str = NavigableString(" for Beginner")
tag.append(new_str)
print(tag)
Output:
<a href="https://www.w3wiki.net/">Beginner for Beginner| A Computer Science portal</a>
<a href="https://www.w3wiki.net/">Beginner for Beginner| A Computer Science portal for Beginner</a>
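append() accepts tags as well as strings. As a rough sketch with made-up markup, the snippet below appends a new list item to a list and then appends a string created with soup.new_string(), which can be used in place of calling the NavigableString constructor directly.

```python
from bs4 import BeautifulSoup

# Illustrative markup, not taken from a real page.
soup = BeautifulSoup("<ul><li>first</li></ul>", "html.parser")

# Build a new <li> and append it as the last child of <ul>.
item = soup.new_tag("li")
item.string = "second"
soup.ul.append(item)

# Append a string created through the soup object to the new item.
item.append(soup.new_string(" (added later)"))

print(soup)  # <ul><li>first</li><li>second (added later)</li></ul>
```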
Removing content and element
A tree can also be modified by removing content from a tag or by removing an element entirely.
- clear() removes the contents of a tag but keeps the tag itself.
Syntax: tag.clear()
- extract() removes a tag or string from the tree and returns it.
Syntax: element.extract()
- decompose() removes a tag from the tree and destroys it together with all its contents.
Syntax: tag.decompose()
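The practical difference between extract() and decompose() is the return value: extract() hands back the removed element so it can be reused, while decompose() returns None. The following is a minimal sketch with made-up markup.

```python
from bs4 import BeautifulSoup

# Illustrative markup, not taken from a real page.
markup = "<div><p>keep</p><span>move me</span><em>drop me</em></div>"
soup = BeautifulSoup(markup, "html.parser")

# extract() returns the removed element, which can be re-inserted elsewhere.
span = soup.span.extract()
soup.p.append(span)

# decompose() returns None; the tag and its contents are destroyed.
result = soup.em.decompose()

print(result)  # None
print(soup)    # <div><p>keep<span>move me</span></p></div>
```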
Example:
Python3
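# importing module
from bs4 import BeautifulSoup

markup = '<a href="https://www.w3wiki.net/">Beginner for Beginner <i>| A Computer Science portal</i></a>'

# parsing string to HTML
soup = BeautifulSoup(markup, 'html.parser')
tag = soup.a
print(tag)
print()

# clearing all of its content
tag.clear()
print(tag)
print()

# extracting i tag
# parsing string to HTML again to start from the original markup
soup2 = BeautifulSoup(markup, 'html.parser')
a_tag = soup2.a
print(a_tag)
print()

# extract() returns the removed tag
i_tag = soup2.i.extract()
print(a_tag)
print()

# decomposing i tag
# parsing string to HTML again to start from the original markup
soup3 = BeautifulSoup(markup, 'html.parser')
a_tag = soup3.a
print(a_tag)
print()

# decompose() returns None, so its result is not stored
soup3.i.decompose()
print(a_tag)
Output:
<a href="https://www.w3wiki.net/">Beginner for Beginner <i>| A Computer Science portal</i></a>

<a href="https://www.w3wiki.net/"></a>

<a href="https://www.w3wiki.net/">Beginner for Beginner <i>| A Computer Science portal</i></a>

<a href="https://www.w3wiki.net/">Beginner for Beginner </a>

<a href="https://www.w3wiki.net/">Beginner for Beginner <i>| A Computer Science portal</i></a>

<a href="https://www.w3wiki.net/">Beginner for Beginner </a>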