NumPy Data Types

import numpy as np
import pandas as pd

Data Types in NumPy

NumPy provides the following families of data types:

int
float
complex
bool
string
unicode
object

The numeric data types come in several precisions, for example 32-bit or 64-bit.
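The precisions mentioned above can be inspected directly. The following is a minimal sketch using np.iinfo and np.finfo, which report the bit width and limits of integer and floating-point types. The variable name int8_range is illustrative only.

```python
import numpy as np

# Integer types: bit width and representable range
for t in (np.int8, np.int16, np.int32, np.int64):
    info = np.iinfo(t)
    print(t.__name__, info.bits, info.min, info.max)

# Floating-point types: bit width and approximate number of decimal digits
for t in (np.float16, np.float32, np.float64):
    info = np.finfo(t)
    print(t.__name__, info.bits, info.precision)

# The int8 range used later in the overflow example
int8_range = (np.iinfo(np.int8).min, np.iinfo(np.int8).max)
print(int8_range)  # (-128, 127)
```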

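NumPy data types can be specified by either a type name (Type) or a short type code (Type Code). The plain name int maps to the platform's default integer, which is 32-bit on some platforms and 64-bit on others.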

dtypes = pd.DataFrame(
    {
        'Type': [
            'int8', 
            'uint8', 
            'int16', 
            'uint16', 
            'int or int32', 
            'uint32', 
            'int64', 
            'uint64', 
            'float16', 
            'float32', 
            'float or float64',
            'float128', 
            'complex64', 
            'complex or complex128', 
            'bool', 
            'object', 
            'string_',
            'unicode_',
        ],
        
        'Type Code': [
            'i1', 
            'u1', 
            'i2', 
            'u2', 
            'i4 or i', 
            'u4', 
            'i8', 
            'u8', 
            'f2', 
            'f4 or f', 
            'f8 or d', 
            'f16 or g', 
            'c8', 
            'c16', 
            '?',
            'O', 
            'S', 
            'U',
        ]
    }
)
dtypes
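One way to check that a type name and a type code refer to the same data type is to pass both to np.dtype and compare the results. This is a small sketch rather than an exhaustive check, and the list pairs is made up for illustration. Note that the number in a type name counts bits, while the number in a type code counts bytes, which is why int32 pairs with i4 and complex64 with c8.

```python
import numpy as np

# (type name, type code) pairs taken from the table
pairs = [
    ('int8', 'i1'),
    ('uint16', 'u2'),
    ('float32', 'f4'),
    ('float64', 'f8'),
    ('complex64', 'c8'),
    ('bool', '?'),
]

for name, code in pairs:
    # np.dtype() accepts either spelling and returns equal dtype objects
    same = np.dtype(name) == np.dtype(code)
    # itemsize is the element size in bytes
    print(name, code, same, np.dtype(code).itemsize)
```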

A data type can be set when the array is created and converted to another type later.

You can specify the data type with a type name string ('float32'), a type code string ('f4'), or a NumPy type object (np.float32). Type objects exist only under the type names: np.complex64 is valid, but np.c8 is not.

arr = np.array([1,2,3], dtype='f4')
arr.dtype

dtype('float32')

# Identical to the above
arr = np.array([1,2,3], dtype='float32')
arr.dtype

dtype('float32')

arr = np.array([1+2j, 3-4j], dtype=np.complex64)
arr.dtype

dtype('complex64')

# Identical to the above
arr = np.array([1+2j, 3-4j], dtype='c8')
arr.dtype

dtype('complex64')

# ERROR
arr = np.array([1+2j, 3-4j], dtype=np.c8)
arr.dtype

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-8-a5f30f9069f2> in <module>
      1 # ERROR
----> 2 arr = np.array([1+2j, 3-4j], dtype=np.c8)
      3 arr.dtype
AttributeError: module 'numpy' has no attribute 'c8'
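If you only have a type code and need the corresponding NumPy type object, np.dtype can resolve it. The sketch below assumes nothing beyond NumPy itself; the variable name resolved is illustrative.

```python
import numpy as np

# Type names exist as attributes of the numpy module, type codes do not
print(hasattr(np, 'complex64'))  # True
print(hasattr(np, 'c8'))         # False

# np.dtype() resolves a type code, and .type gives the matching type object
resolved = np.dtype('c8').type
print(resolved is np.complex64)  # True

arr = np.array([1+2j, 3-4j], dtype=resolved)
print(arr.dtype)  # complex64
```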

Type Conversion

astype method: converts an array to another data type.

Note that astype returns a copy of the array rather than converting it in place, so assign the result back to the original name or to a new one.

arr = np.array([1,2,3], dtype='int16')
print('Original Data Type: ' + str(arr.dtype))
arr = arr.astype(np.float32)
print('Data Type After Conversion: ' + str(arr.dtype))

Original Data Type: int16
Data Type After Conversion: float32
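The copy behaviour described above can be verified with a short sketch. The names original, converted and same are illustrative. The last lines use the copy=False argument of astype, which skips the copy only when no conversion is actually required.

```python
import numpy as np

original = np.array([1, 2, 3], dtype='int16')
converted = original.astype(np.float32)

# The original array keeps its data type; the result is a separate array
print(original.dtype)   # int16
print(converted.dtype)  # float32
print(np.shares_memory(original, converted))  # False

# With copy=False and an unchanged dtype, the input array itself is returned
same = original.astype('int16', copy=False)
print(same is original)  # True
```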

WARNING: watch for overflow when you downcast (convert from a wider type to a narrower one). astype does not raise an error when a value does not fit; integers silently wrap around, and the resulting values are difficult to trace back to the conversion.

# An example of integer overflow when downcasting
arr = np.array([126,127,256], dtype='int16')
print('np array before type conversion: ' + str(arr))
# Range of int8 [-128, 127], 256 overflows after conversion
arr = arr.astype('int8')
print('np array after type conversion: ' + str(arr))

np array before type conversion: [126 127 256]
np array after type conversion: [126 127   0]
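One way to guard against this is to compare the array's values with the target type's range before converting. The function checked_astype below is a hypothetical helper written for this illustration, not part of NumPy, and it only handles integer-to-integer conversion.

```python
import numpy as np

def checked_astype(arr, dtype):
    """Cast an integer array to an integer dtype, raising if a value does not fit.

    Hypothetical helper for illustration; not a NumPy function.
    """
    info = np.iinfo(dtype)
    # Skip the range check for empty arrays, where min() and max() are undefined
    if arr.size and (arr.min() < info.min or arr.max() > info.max):
        raise OverflowError(
            f'values outside [{info.min}, {info.max}] for {np.dtype(dtype).name}'
        )
    return arr.astype(dtype)

arr = np.array([126, 127, 256], dtype='int16')

try:
    checked_astype(arr, 'int8')
except OverflowError as exc:
    print('refused:', exc)

# The first two values fit in int8, so this conversion goes through
small = checked_astype(arr[:2], 'int8')
print(small)  # [126 127]
```

NumPy's own casting='safe' argument to astype is stricter: it rejects the int16 to int8 conversion based on the types alone, whatever the values are.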

String and Unicode Data Type

The string_ and unicode_ data types are both fixed-length. (NumPy 2.0 removed these two aliases; np.bytes_ and np.str_ are the equivalent names and also work in earlier versions.)

The length is given by a number appended to the type code. For example, S3 is a byte string of length 3 and U10 is a Unicode string of length 10. If no number is given, the length defaults to that of the longest string in the array.

If a string in the array is longer than the length of the data type it is defined with or converted to, the string is truncated.

# An example of truncated string
s = np.array(['abc', 'defg'], dtype='S3')
print(s)
# An example of truncated unicode
s = np.array(['abcd', 'efghi'], dtype='U3')
print(s)

[b'abc' b'def']
['abc' 'efg']
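Because truncation happens silently, it can help to compare the source strings with the capacity of the data type. The following is a rough sketch; the names words, capacity and truncated are illustrative. It also shows that assigning a longer string into an existing fixed-length array truncates in the same way.

```python
import numpy as np

words = ['abc', 'defg']
s = np.array(words, dtype='U3')

# Characters per element: total bytes divided by the bytes of one Unicode character
capacity = s.dtype.itemsize // np.dtype('U1').itemsize
truncated = [w for w in words if len(w) > capacity]
print(capacity)   # 3
print(truncated)  # ['defg']

# Assignment into the array is limited to the same fixed length
s[0] = 'uvwxyz'
print(s)  # ['uvw' 'def']
```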

# np.bytes_ and np.str_ replace np.string_ and np.unicode_, which NumPy 2.0 removed
arr = np.array(['a', 'ab', 'abc'], dtype=np.bytes_)
print('The array is ' + str(arr))
print('The data type is ' + str(arr.dtype) + ' because the longest string in the array is "abc" and its length is 3.')
arr = np.array(['a', 'abc', 'abcd'], dtype=np.str_)
print('The array is ' + str(arr))
print('The data type is ' + str(arr.dtype) + ' because the longest unicode in the array is "abcd" and its length is 4.')

The array is [b'a' b'ab' b'abc']
The data type is |S3 because the longest string in the array is "abc" and its length is 3.
The array is ['a' 'abc' 'abcd']
The data type is <U4 because the longest unicode in the array is "abcd" and its length is 4.

What do “|” and “<” in the data types above mean?

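They are byte-order indicators: "<" means little-endian, and "|" means byte order does not apply because each element is a sequence of single bytes. Further detail is beyond the scope of this tutorial.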

Output of the dtypes table defined earlier:

	Type	Type Code
0	int8	i1
1	uint8	u1
2	int16	i2
3	uint16	u2
4	int or int32	i4 or i
5	uint32	u4
6	int64	i8
7	uint64	u8
8	float16	f2
9	float32	f4 or f
10	float or float64	f8 or d
11	float128	f16 or g
12	complex64	c8
13	complex or complex128	c16
14	bool	?
15	object	O
16	string_	S
17	unicode_	U