NumPy Data Types¶
import numpy as np
import pandas as pd
Data Types in NumPy¶
NumPy has the following data types:
int
float
complex
bool
string
unicode
object
The numeric data types have various precisions like 32-bit or 64-bit.
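To make the idea of precision concrete, the sketch below uses np.iinfo and np.finfo to print the bit width and limits of a few numeric types. The choice of types is only illustrative.

```python
import numpy as np

# Integer types: number of bits and representable range
for int_type in (np.int8, np.int32, np.int64):
    info = np.iinfo(int_type)
    print(info.dtype, info.bits, info.min, info.max)

# Floating-point types: number of bits and approximate decimal precision
for float_type in (np.float32, np.float64):
    info = np.finfo(float_type)
    print(info.dtype, info.bits, info.precision)
```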
NumPy data types can be referred to by either a type name or a type code:
dtypes = pd.DataFrame(
{
'Type': [
'int8',
'uint8',
'int16',
'uint16',
'int32',  # Python int maps to int32 or int64, depending on platform
'uint32',
'int64',
'uint64',
'float16',
'float32',
'float or float64',
'float128',
'complex64',
'complex or complex128',
'bool',
'object',
'bytes_ (alias string_ removed in NumPy 2.0)',
'str_ (alias unicode_ removed in NumPy 2.0)',
],
'Type Code': [
'i1',
'u1',
'i2',
'u2',
'i4 or i',
'u4',
'i8',
'u8',
'f2',
'f4 or f',
'f8 or d',
'f16 or g',
'c8',
'c16',
'? or b1',
'O',
'S',
'U',
]
}
)
dtypes
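As a quick check of the table, np.dtype accepts both a type name and a type code, and the two spellings compare equal. The sketch below tests a few pairs; the variable names `pairs` and `results` are illustrative only.

```python
import numpy as np

# (type name, type code) pairs taken from the table above
pairs = [
    ('int8', 'i1'),
    ('uint16', 'u2'),
    ('float32', 'f4'),
    ('float64', 'f8'),
    ('complex64', 'c8'),
]

# np.dtype() resolves both spellings to the same data type
results = {name: np.dtype(name) == np.dtype(code) for name, code in pairs}
for name, same in results.items():
    print(name, same)
```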
A data type can be set when the array is created and converted to another type later.
You can specify the data type with a type name string (such as 'float32'), a type code string (such as 'f4'), or a NumPy attribute (such as np.float32). The attribute form exists only for type names, not for type codes.
arr = np.array([1,2,3], dtype='f4')
arr.dtype
# Identical to the above
arr = np.array([1,2,3], dtype='float32')
arr.dtype
arr = np.array([1+2j, 3-4j], dtype=np.complex64)
arr.dtype
# Identical to the above
arr = np.array([1+2j, 3-4j], dtype='c8')
arr.dtype
# Raises AttributeError: type codes are not available as np attributes
try:
    arr = np.array([1+2j, 3-4j], dtype=np.c8)
except AttributeError as err:
    print(err)
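If you have an array and want to know which type code corresponds to its data type, the dtype object carries enough information to rebuild it. This is a minimal sketch that combines the kind and itemsize attributes. The variable `code` is an illustrative name.

```python
import numpy as np

arr = np.array([1 + 2j, 3 - 4j], dtype=np.complex64)

# kind is a one-letter category ('c' = complex), itemsize is bytes per element
code = arr.dtype.kind + str(arr.dtype.itemsize)
print(arr.dtype.name)  # complex64
print(code)            # c8

# The type name exists as an np attribute, the type code does not
print(hasattr(np, 'complex64'))  # True
print(hasattr(np, 'c8'))         # False
```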
Type Conversion¶
The astype method converts an array to another data type.
Note that astype returns a copy of the array instead of converting the data type in place. Assign the copy back to the original name or to a new variable.
arr = np.array([1,2,3], dtype='int16')
print('Original Data Type: ' + str(arr.dtype))
arr = arr.astype(np.float32)
print('Data Type After Conversion: ' + str(arr.dtype))
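The copy behaviour can be verified directly. In the sketch below, the names `original` and `converted` are illustrative. The original array keeps its data type and its values after the conversion.

```python
import numpy as np

original = np.array([1, 2, 3], dtype='int16')
converted = original.astype(np.float32)

# The original array keeps its data type
print(original.dtype)   # int16
print(converted.dtype)  # float32

# Modifying the copy leaves the original untouched
converted[0] = 99
print(original[0])      # 1
```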
WARNING: be cautious about overflow when you downcast (convert to a type with a smaller range or lower precision). By default, astype raises no error: out-of-range integers silently wrap around, and the resulting values are usually difficult to debug.
# An example of integer overflow at downcasting
arr = np.array([126,127,256], dtype='int16')
print('np array before type conversion: ' + str(arr))
# Range of int8 is [-128, 127], so 256 is out of range and wraps around to 0
arr = arr.astype('int8')
print('np array after type conversion: ' + str(arr))
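One way to guard against this kind of overflow is to check the value range before converting, or to ask astype to refuse risky conversions. The sketch below shows both approaches. The names `fits` and `raised` are illustrative, and casting='safe' checks the types involved, not the actual values.

```python
import numpy as np

arr = np.array([126, 127, 256], dtype='int16')

# np.iinfo reports the representable range of an integer type
info = np.iinfo(np.int8)
fits = bool(arr.min() >= info.min and arr.max() <= info.max)
print(fits)  # False, because 256 is larger than 127

# casting='safe' makes astype refuse any int16 -> int8 conversion,
# regardless of the values stored in the array
try:
    arr.astype('int8', casting='safe')
    raised = False
except TypeError as err:
    raised = True
    print(err)
```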
String and Unicode Data Type¶
The string_ and unicode_ data types (np.bytes_ and np.str_ in NumPy 2.0 and later) are fixed-length.
The length is given by a number appended to the type code. For example, S3 represents a byte string of length 3, and U10 represents a Unicode string of length 10. If no length is given, the default is the length of the longest string in the array.
If a string in the array is longer than the length of the data type it is defined with or converted to, the string is truncated.
# An example of truncated string
s = np.array(['abc', 'defg'], dtype='S3')
print(s)
# An example of truncated unicode
s = np.array(['abcd', 'efghi'], dtype='U3')
print(s)
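Truncation also happens when you assign a longer string into an existing fixed-length array. The sketch below illustrates this with an array whose length is inferred from its contents, then shows one possible workaround, which is converting to a wider type first. The name `wide` is illustrative.

```python
import numpy as np

arr = np.array(['a', 'ab', 'abc'])  # inferred length is 3 (dtype U3)
arr[0] = 'abcdef'                   # longer than 3 characters
print(arr)                          # ['abc' 'ab' 'abc']

# Convert to a wider type first to keep the full string
wide = arr.astype('U6')
wide[0] = 'abcdef'
print(wide)                         # ['abcdef' 'ab' 'abc']
```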
arr = np.array(['a', 'ab', 'abc'], dtype=np.bytes_)  # np.string_ was removed in NumPy 2.0
print('The array is ' + str(arr))
print('The data type is ' + str(arr.dtype) + ' because the longest string in the array is "abc" and its length is 3.')
arr = np.array(['a', 'abc', 'abcd'], dtype=np.str_)  # np.unicode_ was removed in NumPy 2.0
print('The array is ' + str(arr))
print('The data type is ' + str(arr.dtype) + ' because the longest string in the array is "abcd" and its length is 4.')
What do “|” and “<” in the data types above mean?
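They are byte-order indicators: "<" marks little-endian data, and "|" marks types for which byte order does not apply, such as single-byte strings. The details are beyond the scope of this tutorial.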
Further readings if you are interested: