What is Sequence in BioPython module?

In this article, we will learn about sequences in the Biopython module.

Prerequisite: BioPython module

A sequence is a series of letters used to represent an organism's protein, DNA or RNA. Sequences in Biopython are usually handled by the Seq object defined in the Bio.Seq module. The Seq object has built-in methods such as complement(), reverse_complement(), transcribe(), back_transcribe() and translate(). It also supports many string methods, such as count(), find(), split() and strip().
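
The methods listed above can be tried on a short sequence. This is a minimal sketch, assuming Biopython is installed; the sequence ATGGCCTAA is an arbitrary example chosen so that it contains a start codon, one more codon and a stop codon.

```python
from Bio.Seq import Seq

# Arbitrary example sequence: ATG (start), GCC, TAA (stop)
dna = Seq("ATGGCCTAA")

# Biological methods
comp = dna.complement()              # base-by-base complement
rev_comp = dna.reverse_complement()  # complement, read in reverse
rna = dna.transcribe()               # T is replaced by U
back = rna.back_transcribe()         # U is replaced by T again
protein = dna.translate()            # codons to amino acids, * marks a stop

# String-like methods
g_count = dna.count("G")             # number of G letters
gcc_pos = dna.find("GCC")            # index of the first occurrence of GCC

print(comp)      # TACCGGATT
print(rev_comp)  # TTAGGCCAT
print(rna)       # AUGGCCUAA
print(back)      # ATGGCCTAA
print(protein)   # MA*
print(g_count, gcc_pos)  # 2 3
```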

Below are some examples of sequence in Biopython:

Example 1:

Python3

# Import libraries 
from Bio.Seq import Seq 
  
# Creating a sequence 
seq = Seq("GACT") 
  
# Printing Sequence 
print(seq) 

Output:

GACT

In the above example, the letters of the sequence GACT can be read either as nucleotides (guanine, adenine, cytosine and thymine) or as amino acids (glycine, alanine, cysteine and threonine); the string alone does not say which. In Biopython releases before 1.78, each Seq object had two important attributes:

Data, which is the actual sequence string (GACT in this case).
Alphabet, which represented the type of the sequence, i.e. DNA sequence, RNA sequence, etc. By default it was generic and did not indicate any particular sequence type.

Example 2:

Python3

# Import libraries 
from Bio.Seq import Seq 
  
# Creating a sequence 
seq = Seq("ACGT=TT") 
  
# Removing the gap character "=" from the sequence
# (ungap() is deprecated in recent Biopython releases)
updatedSeq = seq.ungap("=") 
  
# Printing Sequence 
print(updatedSeq) 

Output:

ACGTTT

Here, in the sequence ACGT, each letter represents adenine, cytosine, guanine and thymine. The = character acts as a gap symbol, and ungap("=") removes it, joining the two parts into a single ungapped sequence.
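
Because ungap() may not be available in every Biopython release, the same result can be reached through an ordinary string. This is a minimal sketch rather than the library's recommended approach; converting to str and back is an assumption made here so that the code does not depend on version-specific Seq methods.

```python
from Bio.Seq import Seq

# Same gapped sequence as in Example 2
seq = Seq("ACGT=TT")

# Convert to a plain string, drop every "=" and wrap the result in a new Seq
ungapped = Seq(str(seq).replace("=", ""))

print(ungapped)  # ACGTTT
```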

Alphabet Class:

In addition to its string properties, the Seq object also possessed an alphabet property in Biopython releases before 1.78. These properties were instances of the Alphabet class from the Bio.Alphabet module. For example, IUPAC DNA or generic DNA described the type of molecule (DNA, RNA or protein) and could also indicate the expected symbols.

The Alphabet module provided the following classes to represent various sequences:

Class	Property
SingleLetterAlphabet	Generic alphabet with letters of size one. Derives from Alphabet, and all other alphabet types are derived from this.
ProteinAlphabet	Generic single letter protein alphabet.
NucleotideAlphabet	Generic single letter nucleotide alphabet.
DNAAlphabet	Generic single letter DNA alphabet.
RNAAlphabet	Generic single letter RNA alphabet.
SecondaryStructure	Alphabet used to describe secondary structure.
ThreeLetterProtein	Three letter protein alphabet.
AlphabetEncoder	Class used to construct a new and extended alphabet from an existing one.
Gapped	Alphabets which contain a gap character.
HasStopCodon	Alphabets which contain a stop symbol.

Bio.Alphabet also provided an IUPAC module with sequence types as defined by the IUPAC community. Some classes in the IUPAC module, together with their predefined instances, are listed below:

Class	Instance	Property
IUPACProtein	protein	IUPAC protein alphabet of 20 standard amino acids.
ExtendedIUPACProtein	extended_protein	Extended uppercase IUPAC protein single letter alphabet.
IUPACAmbiguousDNA	ambiguous_dna	Uppercase IUPAC ambiguous DNA.
IUPACUnambiguousDNA	unambiguous_dna	Uppercase IUPAC unambiguous DNA (GATC).
ExtendedIUPACDNA	extended_dna	Extended IUPAC DNA alphabet.
IUPACAmbiguousRNA	ambiguous_rna	Uppercase IUPAC ambiguous RNA.
IUPACUnambiguousRNA	unambiguous_rna	Uppercase IUPAC unambiguous RNA (GAUC).
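
The letter sets named in the table (GATC, GAUC and the 20 standard amino acids) can also be checked in plain Python, without relying on Bio.Alphabet. This is only a sketch and not part of Biopython; the names IUPAC_LETTERS and matching_types are hypothetical and introduced here for illustration.

```python
# Hypothetical lookup table built from the unambiguous IUPAC letter sets
IUPAC_LETTERS = {
    "DNA": set("GATC"),
    "RNA": set("GAUC"),
    "protein": set("ACDEFGHIKLMNPQRSTVWY"),
}


def matching_types(sequence):
    """Return the molecule types whose letter set covers every letter used."""
    letters = set(sequence.upper())
    return [name for name, allowed in IUPAC_LETTERS.items() if letters <= allowed]


print(matching_types("GACT"))  # ['DNA', 'protein']
print(matching_types("GAUC"))  # ['RNA']
print(matching_types("MKV"))   # ['protein']
```

The first result shows why letters alone are ambiguous: GACT is valid both as DNA and as a protein sequence.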

Bio.Alphabet was removed in Biopython 1.78. The intended function of the alphabet objects was never well established, and the 20-year-old design had several disadvantages. In particular, the AlphabetEncoder class was excessively complex, making it difficult to determine the type of molecule. Working out a consensus of several alphabet objects (e.g. during string addition) was often difficult as well.

Without a concrete plan for how to improve or replace the existing system, it was decided to remove the Bio.Alphabet module entirely.
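
In releases without Bio.Alphabet, the molecule type can be kept, where needed, as an annotation on a SeqRecord rather than on the Seq itself. The following is a minimal sketch, assuming Biopython is installed; the id example_1 is a made-up identifier used only for illustration.

```python
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord

# Wrap the sequence in a record and note the molecule type as an annotation
record = SeqRecord(
    Seq("GACT"),
    id="example_1",  # hypothetical identifier
    annotations={"molecule_type": "DNA"},
)

print(record.seq)                           # GACT
print(record.annotations["molecule_type"])  # DNA
```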