How to count the number of sentences in a text in R

Counting the number of sentences in a text is a basic task in text analysis and natural language processing. Sentence counts are used in many applications, including language modelling, sentiment analysis, and text summarization. In this article, we’ll look at several techniques and R packages for counting the number of sentences in a given text.

Related Concepts:

  • Regular Expressions: A regular expression specifies the pattern used to identify sentence boundaries.
  • Functions in R: Various string-related functions are used to count sentences.

Steps Required For Counting Sentences in R:

  • First, write an R script (for example, in RStudio) that will perform the sentence count.
  • Store the text in a variable as a string.
  • Use a regular expression or a tokenizer to find the sentence boundaries in the text.
  • Apply one of the approaches shown in the examples below to get the sentence count.
  • Finally, display the sentence count on the console.
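
The steps above can be sketched as a small base R helper. This is a minimal illustration, not a definitive implementation: the function name count_sentences_regex and the default pattern "[.!?]+" (one or more terminators in a row) are assumptions made for this example and do not come from any package.

```r
# Hypothetical helper: counts runs of sentence terminators in each string.
count_sentences_regex <- function(text, pattern = "[.!?]+") {
  # gregexpr() locates every match; regmatches() extracts the matched text
  matches <- regmatches(text, gregexpr(pattern, text))
  # One count per input string
  lengths(matches)
}

# Store the text, count the sentences, and display the result
text <- "R is fun. Is it fast? Yes!"
count <- count_sentences_regex(text)
cat("Number of sentences:", count, "\n")
```

Because the helper relies on lengths(), it also accepts a character vector and returns one count per element.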

Code for Counting Sentences in Text using strsplit() in Base R

R

# Store the text to analyse
text <- "This is R program for counting number of sentences in text.
This program is for GFG article . And it is using stringr package for counting."

# Split at every sentence terminator (., ! or ?); strsplit() returns a list
sentences <- unlist(strsplit(text, "[.!?]"))

# Each piece is one sentence
num_sentences <- length(sentences)

cat("Number of sentences using unlist and strsplit :", num_sentences)


Output:

Number of sentences using unlist and strsplit : 3

  • First we store the text in the text variable.
  • Then we use strsplit() to split the text at every character matched by the regular expression [.!?].
  • strsplit() returns a list, so unlist() converts the result to a character vector, which we store in the sentences variable.
  • length() gives the number of elements in sentences, which is the sentence count.

Finally, we use cat() to display the sentence count. The text contains 3 sentences, each ending with a full stop (.), so the output is 3.
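
Splitting at every single terminator can overcount when the text contains runs such as "..." or "?!", because strsplit() then returns empty pieces between the terminators. A hedged sketch of one way to handle this, using an example text chosen for illustration: split on runs of terminators, trim whitespace, and drop empty pieces.

```r
text <- "Wait... What happened?! Nothing."

# Splitting at every single terminator leaves empty pieces inside "..." and "?!"
naive <- unlist(strsplit(text, "[.!?]"))

# Split on runs of terminators, trim whitespace, and drop empty pieces
pieces <- trimws(unlist(strsplit(text, "[.!?]+")))
sentences <- pieces[nzchar(pieces)]

cat("Naive count:", length(naive), "\n")
cat("Cleaned count:", length(sentences), "\n")
```

Abbreviations such as "Dr." and decimal numbers are still split by this pattern, so it remains an approximation.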

Counting Sentences in Text using stringr and str_count()

R

# Install stringr if it is missing, then load it
if (!require(stringr)) {
  install.packages("stringr")
  library(stringr)
}

text <- "This is R program for counting number of sentences in text.
This program is for GFG article .
And it is using stringr package for counting. And is it working ?"

# A sentence terminator is a full stop, exclamation mark or question mark
sentence_pattern <- "[.!?]"

# Count how many times the pattern matches in the text
num_sentences <- str_count(text, sentence_pattern)

cat("Number of sentences using stringr :", num_sentences, "\n")


Output:

Number of sentences using stringr : 4 

  • First we install the stringr package if it is not already installed, and store the text in the text variable as before.
  • Then we store our regular expression in the sentence_pattern variable.
  • str_count() counts how many times the regular expression matches in the text; each match is a sentence terminator.

Finally, we display the sentence count using cat(). The text contains four sentences in total: three ending with a full stop (.) and one ending with a question mark (?). Hence the output is 4.
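
Since str_count() counts terminator characters rather than sentences, consecutive terminators inflate the result. The sketch below, with an example text chosen for illustration, compares the single-character pattern with a pattern that treats a run of terminators as one match. It assumes stringr is already installed.

```r
library(stringr)

text <- "Is it working?! Yes. Great..."

# Every terminator character is counted separately
per_char <- str_count(text, "[.!?]")

# A run such as "?!" or "..." is counted once
per_run <- str_count(text, "[.!?]+")

cat("Per character:", per_char, "\n")
cat("Per run:", per_run, "\n")
```

The run-based pattern gives one match per sentence here, but full stops inside abbreviations or decimal numbers would still be counted.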

Code for Counting Sentences in Text using openNLP Package

R

if (!require(openNLP)) {
  install.packages("openNLP") # install the package if it is not present
  library(openNLP)
}

text <- "This is gfg sentence. Another sentence from gfg ! And a third one?"

# Create the sentence annotator; the default English model is loaded
sent_token_annotator <- Maxent_Sent_Token_Annotator()

# Annotate the text; the result holds one annotation per detected sentence
sentences <- NLP::annotate(NLP::as.String(text), sent_token_annotator)

num_sentences <- length(sentences)

cat("Number of sentences using openNLP:", num_sentences, "\n")


Output:

Number of sentences using openNLP: 3

  • We store the text in the text variable.
  • Then we create a sentence annotator with Maxent_Sent_Token_Annotator(), which loads the default English sentence model.
  • We apply the maxent sentence annotator to the text to obtain one annotation per sentence.
  • Finally, we use length() to count the annotations and display the result using cat().
  • Make sure you have Java installed and its path set, since openNLP depends on it.

Here there are 3 sentences, ending with a full stop (.), an exclamation mark (!) and a question mark (?) respectively. Hence the output is 3.

Code for Counting Sentences in Text using tokenizers Package

R

# Install tokenizers if it is missing, then load it
if (!require(tokenizers)) {
  install.packages("tokenizers")
  library(tokenizers)
}

text <- "This is an example gfg sentence. Another gfg sentence! this is last example."

# tokenize_sentences() returns a list; unlist() flattens it to a character vector
sentences <- unlist(tokenize_sentences(text))

# Count the sentences
num_sentences <- length(sentences)

cat("Number of sentences using tokenizers:", num_sentences, "\n")


Output:

Number of sentences using tokenizers: 3 

  • We store the text in the text variable.
  • tokenize_sentences() splits the text into sentences and returns a list.
  • unlist() flattens that list into a character vector, which we store in sentences.
  • length() counts the sentences, and cat() displays the result.

The text variable contains three sentences: two ending with a full stop (.) and one ending with an exclamation mark (!). So the count is 3.
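
Because tokenize_sentences() returns a list with one element per input string, the same approach extends to several documents at once. The following is a minimal sketch, assuming tokenizers is already installed; the docs vector is made up for this example.

```r
library(tokenizers)

# Hypothetical input: two short documents
docs <- c("First document. It has two sentences.", "Only one sentence here!")

# tokenize_sentences() returns a list with one character vector per document
sentence_list <- tokenize_sentences(docs)

# lengths() gives the number of sentences in each document
counts <- lengths(sentence_list)
print(counts)
```

Using lengths() instead of unlist() followed by length() keeps the counts separate for each document.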