What is NLP | Location Tags Extraction?

This article shows how to extract LOCATION chunks from a tagged sentence in Python with NLTK, using a custom chunker backed by the gazetteers corpus.

A different kind of ChunkParserI subclass can be used to identify LOCATION chunks: one that looks words up in the gazetteers corpus. The gazetteers corpus is a WordListCorpusReader that contains the following kinds of location words:

Country names
U.S. states and abbreviations
Mexican states
Major U.S. cities
Canadian provinces

The LocationChunker class iterates over a tagged sentence, looking for words that are found in the gazetteers corpus. When it finds one or more location words, it creates a LOCATION chunk using IOB tags. The iob_locations() method produces the IOB LOCATION tags, and the parse() method converts those tags to a Tree.
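
To make the IOB scheme concrete, here is a minimal pure-Python sketch of how (word, tag, IOB tag) triples group into chunks. The helper name group_iob_chunks is hypothetical and is only a simplified stand-in for the grouping that conlltags2tree performs; it returns plain lists rather than an NLTK Tree.

```python
def group_iob_chunks(iob_triples, label="LOCATION"):
    """Collect runs of B-/I- tagged tokens for `label` into lists of (word, tag)."""
    chunks, current = [], []
    for word, tag, iob in iob_triples:
        if iob == "B-" + label:
            # A B- tag always starts a new chunk.
            if current:
                chunks.append(current)
            current = [(word, tag)]
        elif iob == "I-" + label and current:
            # An I- tag continues the chunk that is already open.
            current.append((word, tag))
        else:
            # 'O' closes any open chunk; a stray I- tag is ignored in this sketch.
            if current:
                chunks.append(current)
            current = []
    if current:
        chunks.append(current)
    return chunks


triples = [
    ("San", "NNP", "B-LOCATION"),
    ("Francisco", "NNP", "I-LOCATION"),
    ("CA", "NNP", "I-LOCATION"),
    ("is", "BE", "O"),
    ("cold", "JJ", "O"),
]
chunks = group_iob_chunks(triples)
print(chunks)
```

The B- tag marks the first word of a chunk, I- marks each following word of the same chunk, and O marks words outside any chunk.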

Code #1 : LocationChunker class

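from nltk.chunk import ChunkParserI
from nltk.chunk.util import conlltags2tree
from nltk.corpus import gazetteers  # requires nltk.download('gazetteers')

class LocationChunker(ChunkParserI):
    def __init__(self):
        # All known location names, some of which span several words.
        self.locations = set(gazetteers.words())
        # lookahead = number of extra words in the longest location name.
        self.lookahead = 0
        for loc in self.locations:
            nwords = loc.count(' ')
            if nwords > self.lookahead:
                self.lookahead = nwords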
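
The lookahead value is the largest number of spaces in any gazetteer entry, that is, how many words beyond the current one may be needed to match the longest location name. A small sketch of that computation, using a hypothetical hand-written set in place of gazetteers.words():

```python
# Hypothetical stand-in for set(gazetteers.words()); the real corpus is much larger.
toy_locations = {"CA", "San Jose", "San Francisco", "New York City"}

# Number of spaces in an entry = number of additional words after the first one.
lookahead = max((loc.count(" ") for loc in toy_locations), default=0)
print(lookahead)  # 2, because "New York City" contains two spaces
```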

Code #2 : iob_locations() and parse() methods

    # These methods belong inside the LocationChunker class from Code #1.
    def iob_locations(self, tagged_sent):
        i = 0
        n = len(tagged_sent)
        inside = False

        while i < n:
            word, tag = tagged_sent[i]
            j = i + 1
            k = j + self.lookahead
            nextwords, nexttags = [], []
            loc = False

            # Try progressively longer phrases starting at the current word.
            while j < k:
                if ' '.join([word] + nextwords) in self.locations:
                    if inside:
                        yield word, tag, 'I-LOCATION'
                    else:
                        yield word, tag, 'B-LOCATION'

                    for nword, ntag in zip(nextwords, nexttags):
                        yield nword, ntag, 'I-LOCATION'

                    loc, inside = True, True
                    i = j  # skip past the words consumed by this location
                    break

                if j < n:
                    nextword, nexttag = tagged_sent[j]
                    nextwords.append(nextword)
                    nexttags.append(nexttag)
                    j += 1
                else:
                    break

            # No location starts at this word: tag it as outside any chunk.
            if not loc:
                inside = False
                i += 1
                yield word, tag, 'O'

    def parse(self, tagged_sent):
        iobs = self.iob_locations(tagged_sent)
        return conlltags2tree(iobs)

Code #3 : use the LocationChunker class to parse the sentence

# chunkers is assumed to be the local module that defines
# LocationChunker and the sub_leaves helper.
from chunkers import sub_leaves
from chunkers import LocationChunker

loc = LocationChunker()
t = loc.parse([('San', 'NNP'), ('Francisco', 'NNP'),
               ('CA', 'NNP'), ('is', 'BE'), ('cold', 'JJ'),
               ('compared', 'VBD'), ('to', 'TO'), ('San', 'NNP'),
               ('Jose', 'NNP'), ('CA', 'NNP')])

print("Location : \n", sub_leaves(t, 'LOCATION'))

Output :

Location : 
[[('San', 'NNP'), ('Francisco', 'NNP'), ('CA', 'NNP')], 
[('San', 'NNP'), ('Jose', 'NNP'), ('CA', 'NNP')]]
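
The sub_leaves helper imported in Code #3 is not defined in this article. The sketch below is one plausible definition, assuming it returns the leaves of every subtree that carries the given label; the actual helper in the chunkers module may differ. The tree is built by hand, so no corpus download is needed to run it.

```python
from nltk.tree import Tree


def sub_leaves(tree, label):
    """Return the leaves of every subtree of `tree` whose label equals `label`."""
    return [t.leaves() for t in tree.subtrees(lambda s: s.label() == label)]


# Hand-built example tree with one LOCATION chunk (illustrative data only).
tree = Tree("S", [
    Tree("LOCATION", [("San", "NNP"), ("Jose", "NNP"), ("CA", "NNP")]),
    ("is", "BE"),
    ("warm", "JJ"),
])
locations = sub_leaves(tree, "LOCATION")
print(locations)
```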