Handwritten digit recognition typically involves using machine learning or deep learning models to classify handwritten digits, such as those found in the popular MNIST dataset. Here’s an overview of how it’s usually done:
1. Data Collection¶
- MNIST Dataset: The most common dataset for digit recognition, containing 60,000 training images and 10,000 test images, all of handwritten digits (0-9) in 28×28 grayscale images.
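For reference, this is roughly how the dataset can be fetched with scikit-learn's fetch_openml (the same loader used in the walkthrough further down); the variable name and printed shapes below are illustrative, describing the full 70,000-image set:

from sklearn.datasets import fetch_openml

# Download the MNIST set from OpenML as NumPy arrays
mnist = fetch_openml("mnist_784", as_frame=False)
print(mnist.data.shape)    # (70000, 784) – each row is a flattened 28x28 image
print(mnist.target.shape)  # (70000,)     – string labels '0' through '9'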
2. Preprocessing¶
- Normalization: The pixel values are normalized (often scaled between 0 and 1) to improve model performance.
- Reshaping: Input images are reshaped to be fed into the model. For example, for neural networks, each image might be flattened into a vector of size 784 (28×28), while convolutional neural networks (CNNs) keep the original shape.
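A minimal sketch of both preprocessing steps, using a hypothetical batch of raw 0–255 images (the array names here are illustrative only):

import numpy as np

# Hypothetical batch of raw images with pixel values 0-255
images = np.random.randint(0, 256, size=(60000, 28, 28)).astype("float32")

# Normalization: scale pixel values into [0, 1]
images /= 255.0

# Flatten each image into a 784-dimensional vector for a plain classifier
flat = images.reshape(len(images), 28 * 28)         # shape (60000, 784)

# For a CNN, keep the 2-D shape and add a single channel dimension instead
cnn_input = images.reshape(len(images), 28, 28, 1)  # shape (60000, 28, 28, 1)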
3. Model Selection¶
- Machine Learning Algorithms: Basic methods like Support Vector Machines (SVM), k-Nearest Neighbors (k-NN), and Decision Trees can be used.
- Deep Learning: CNNs are highly effective for image recognition tasks. A common architecture for MNIST is a few convolutional layers followed by fully connected layers.
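If a classical baseline is preferred, scikit-learn classifiers can be dropped in with almost no extra code. A minimal sketch, assuming X_train and y_train hold the flattened, normalized training images and labels (names chosen here for illustration):

from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier

# Classical baselines for digit classification
svm_clf = SVC(kernel="rbf")                     # Support Vector Machine
knn_clf = KNeighborsClassifier(n_neighbors=3)   # k-Nearest Neighbors

# svm_clf.fit(X_train, y_train)  # fitting on all 60,000 images can be slow;
# knn_clf.fit(X_train, y_train)  # a subsample is often used for quick experiments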
4. Training¶
- Loss Function: Cross-entropy loss is typically used for classification tasks.
- Optimizer: Stochastic Gradient Descent (SGD), Adam, or RMSProp are popular choices.
- Backpropagation: The model updates its weights based on the loss and optimizer during training.
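Putting steps 3 and 4 together, here is a minimal TensorFlow/Keras sketch (assuming TensorFlow is installed; the layer sizes, optimizer, and epoch count are arbitrary choices, not prescribed by this article). Cross-entropy loss and Adam are specified in compile(), and backpropagation runs inside fit():

import tensorflow as tf

# Load MNIST directly from Keras and scale pixels to [0, 1]
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train = x_train[..., None] / 255.0   # shape (60000, 28, 28, 1)
x_test = x_test[..., None] / 255.0

# A small CNN: a few convolutional layers followed by fully connected layers
model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, 3, activation="relu", input_shape=(28, 28, 1)),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(64, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax"),
])

# Cross-entropy loss, Adam optimizer; weights are updated by backpropagation in fit()
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(x_train, y_train, epochs=3, batch_size=128,
          validation_data=(x_test, y_test))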
5. Evaluation¶
- The trained model is tested on unseen data (the test set) to evaluate its accuracy, precision, recall, and other metrics.
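With scikit-learn these metrics are one-liners; the sketch below assumes y_test and predictions already exist, as they do in the walkthrough further down:

from sklearn import metrics

# y_test: true labels of the held-out test set; predictions: model outputs
print("Accuracy:", metrics.accuracy_score(y_test, predictions))
print(metrics.classification_report(y_test, predictions))  # per-class precision/recall/F1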
6. Deployment¶
- Once trained, the model can be deployed in real-time systems to recognize handwritten digits from user inputs, like in digitized forms or digit-recognition apps.
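One common (though by no means the only) deployment pattern for a scikit-learn model is to persist it with joblib and reload it inside the serving application. A rough sketch, with a hypothetical file name and a placeholder input vector:

import joblib
import numpy as np

# Persist the trained classifier (file name is arbitrary)
joblib.dump(mdl, "digit_classifier.joblib")

# Later, inside the deployed application:
clf = joblib.load("digit_classifier.joblib")

# new_image would come from user input, preprocessed to a flat 784-vector
new_image = np.zeros((1, 784))    # placeholder input for illustration
print(clf.predict(new_image)[0])  # predicted digit as a string label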
Would you like to implement this using Python and machine learning libraries like TensorFlow or PyTorch? I can guide you through the process.
Let’s review this in practice¶
In [1]:
from sklearn.datasets import fetch_openml
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn import metrics
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
In [2]:
mnist = fetch_openml("mnist_784", as_frame=False)  # as_frame=False returns NumPy arrays so rows can be reshaped below
In [3]:
plt.figure(figsize=(20,4))
for index, (image, label) in enumerate(zip(mnist.data[:5], mnist.target[:5])):
    plt.subplot(1, 5, index + 1)
    plt.imshow(np.reshape(image, (28, 28)), cmap="gray")
    plt.title("Number: %s" % label)
In [4]:
X_train, X_test, y_train, y_test = train_test_split(mnist.data, mnist.target, test_size=0.2)
In [5]:
mdl = LogisticRegression(solver='lbfgs')
mdl.fit(X_train, y_train)
predictions = mdl.predict(X_test)
score = mdl.score(X_test, y_test)
print(score)
0.9207142857142857
ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
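The warning suggests two standard remedies: scale the inputs and raise max_iter. A quick side experiment along those lines (not executed as part of this notebook run; mdl_scaled is a name introduced here, and its score may differ from the 0.92 above):

# Side experiment: scale pixels to [0, 1] and allow more iterations,
# which typically removes the ConvergenceWarning
mdl_scaled = LogisticRegression(solver='lbfgs', max_iter=1000)
mdl_scaled.fit(X_train / 255.0, y_train)
print(mdl_scaled.score(X_test / 255.0, y_test))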
In [12]:
mdl.predict([X_test[2]])[0]
Out[12]:
'0'
In [11]:
index = 2
plt.imshow(np.reshape(X_test[index], (28,28)))
plt.show()
In [8]:
print("Prediction" + mdl.predict([X_test[index]])[0])
Prediction4
In [9]:
cm = metrics.confusion_matrix(y_test, predictions)
cm
Out[9]:
array([[1331,    0,    6,    8,    2,    8,    5,    2,   13,    1],
       [   0, 1503,   13,    5,    2,    3,    2,    3,   17,    2],
       [   9,   13, 1264,   19,   17,    7,   16,   19,   40,    8],
       [   8,    3,   33, 1279,    3,   38,    5,    7,   30,   15],
       [   5,    4,    4,    4, 1252,    2,   11,    5,   13,   53],
       [  16,    6,    9,   40,   21, 1080,   24,    3,   41,   13],
       [  10,    1,   12,    0,   12,   15, 1292,    1,    5,    0],
       [   4,    3,   11,   10,   14,    0,    2, 1382,    5,   42],
       [  10,   34,   17,   27,    6,   45,   13,    2, 1209,   15],
       [   6,    9,    6,   20,   38,    7,    1,   41,   10, 1298]],
      dtype=int64)
In [10]:
plt.figure(figsize=(9,9))
plt.imshow(cm, cmap='Pastel1')
plt.title('Confusion Matrix for MNIST Data')
plt.xticks(np.arange(10))
plt.yticks(np.arange(10))
plt.ylabel('Actual Label')
plt.xlabel('Predicted Label')
plt.colorbar()
width, height = cm.shape
for x in range(width):
    for y in range(height):
        plt.annotate(str(cm[x][y]), xy=(y, x),
                     horizontalalignment='center',
                     verticalalignment='center')