Data Loading and Preprocessing
The dataset is provided in two CSV files, one for training and one for testing. We will load each file, reshape every row into a 28 x 28 image, and one-hot encode the labels, keeping in mind that we are not building the classifier for the letters ‘J’ and ‘Z’.
Python3
import numpy as np
import pandas as pd


def load_data(path):
    df = pd.read_csv(path)
    # Label 9 (J) never occurs, so shift every label above it down by one
    # to get contiguous class indices 0..23.
    y = np.array([label if label < 9 else label - 1 for label in df['label']])
    df = df.drop('label', axis=1)
    # Each row holds 784 pixel values: reshape to 28 x 28 and add a channel axis.
    x = df.to_numpy().reshape((-1, 28, 28)).astype(float)
    x = np.expand_dims(x, axis=3)
    # One-hot encode the labels.
    y = pd.get_dummies(y).values
    return x, y


X_train, Y_train = load_data('/content/sign_mnist_train.csv')
X_test, Y_test = load_data('/content/sign_mnist_test.csv')
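To make the label handling concrete, here is a minimal sketch of the same shift applied to a handful of labels. The helper name `remap_labels` and the constant `NUM_CLASSES` are hypothetical and introduced only for illustration. The sketch also shows a caveat: `pd.get_dummies` creates one column per label that actually occurs in its input, so on a small sample it yields fewer than 24 columns. Indexing into `np.eye` is one possible alternative that always gives a fixed width; it is not what the article's code uses.

```python
import numpy as np
import pandas as pd

NUM_CLASSES = 24  # hypothetical constant: 26 letters minus J and Z


def remap_labels(labels):
    """Shift labels above 9 down by one so class indices run 0..23."""
    labels = np.asarray(labels)
    return np.where(labels < 9, labels, labels - 1)


# Original labels for A, I, K and Y (label 9, J, never occurs).
raw = np.array([0, 8, 10, 24])
remapped = remap_labels(raw)

# get_dummies only sees four distinct labels here, so it builds four columns.
dummies_width = pd.get_dummies(remapped).shape[1]

# Fixed-width alternative: pick rows of a 24 x 24 identity matrix.
one_hot = np.eye(NUM_CLASSES)[remapped]

print(remapped.tolist())
print(dummies_width, one_hot.shape)
```

On the full training and test files every one of the 24 classes is present, which is why `pd.get_dummies` produces 24 columns there.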
Now let’s check the shape of the training and the testing data.
Python3
print(X_train.shape, Y_train.shape)
print(X_test.shape, Y_test.shape)
Output:
(27455, 28, 28, 1) (27455, 24)
(7172, 28, 28, 1) (7172, 24)
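The 24 label columns correspond to the 26 letters minus J and Z. As a rough sketch, a one-hot row can be turned back into its letter as shown below. The names `LETTERS` and `index_to_letter` are hypothetical, and the mapping assumes the dataset's original labels number the letters A to Z as 0 to 25 before the shift applied in `load_data`.

```python
import string

import numpy as np

# The 24 letters that have a class, in class-index order (J and Z left out).
LETTERS = [c for c in string.ascii_uppercase if c not in "JZ"]


def index_to_letter(one_hot_row):
    """Return the letter for a one-hot (or probability) vector of length 24."""
    return LETTERS[int(np.argmax(one_hot_row))]


# Class index 9 is K, because J was skipped when the labels were shifted.
example_row = np.eye(24)[9]
print(index_to_letter(example_row))
```

The same helper could be applied to a row of model output probabilities, since `np.argmax` simply picks the highest-scoring class.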
Sign Language Recognition System using TensorFlow in Python
The first step of any machine learning problem is finding an appropriate dataset. For sign language recognition, let’s use the Sign Language MNIST dataset. It contains images of the hand signs for the letters of the English alphabet. Since the signs for J and Z require motion, those two classes are not available in the dataset.