Monday, April 22, 2024
Numpy Datatypes

3 – Numpy Data Types

In [2]:
import numpy as np
import pandas as pd

Data Types in NumPy

NumPy has the following data types:

  • int
  • float
  • complex
  • bool
  • string
  • unicode
  • object

The numeric data types have various precisions like 32-bit or 64-bit.
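To make the idea of precision concrete, the short sketch below prints the size in bytes of a few numeric types and the value range or decimal precision they offer. It only uses np.dtype, np.iinfo and np.finfo; the selection of types is an arbitrary illustration.

```python
import numpy as np

# Bytes used by a single element of each type.
for name in ['int8', 'int32', 'int64', 'float32', 'float64', 'complex128']:
    print(name, np.dtype(name).itemsize, 'bytes')

# Representable range of two integer types.
int8_info = np.iinfo(np.int8)
int32_info = np.iinfo(np.int32)
print(int8_info.min, int8_info.max)    # -128 127
print(int32_info.min, int32_info.max)  # -2147483648 2147483647

# Approximate number of reliable decimal digits of two float types.
print(np.finfo(np.float32).precision)  # 6
print(np.finfo(np.float64).precision)  # 15
```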

NumPy data types can be referred to by either a type name (Type) or a short type code (Type Code).

In [3]:
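# Build a lookup table of type names and their type codes
dtypes = pd.DataFrame(
    {
        'Type': [
            'int8', 'uint8', 'int16', 'uint16', 'int or int32', 'uint32',
            'int64', 'uint64', 'float16', 'float32', 'float or float64',
            'float128', 'complex64', 'complex or complex128', 'bool',
            'object', 'string_', 'unicode_',
        ],
        'Type Code': [
            'i1', 'u1', 'i2', 'u2', 'i4 or i', 'u4', 'i8', 'u8', 'f2',
            'f4 or f', 'f8 or d', 'f16 or g', 'c8', 'c16', None, 'O', 'S', 'U',
        ],
    }
)
dtypes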
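    Type                    Type Code
0   int8                    i1
1   uint8                   u1
2   int16                   i2
3   uint16                  u2
4   int or int32            i4 or i
5   uint32                  u4
6   int64                   i8
7   uint64                  u8
8   float16                 f2
9   float32                 f4 or f
10  float or float64        f8 or d
11  float128                f16 or g
12  complex64               c8
13  complex or complex128   c16
14  bool                    None
15  object                  O
16  string_                 S
17  unicode_                U

Note: some rows depend on the platform and NumPy version. The plain int alias maps to the platform's default integer, which is int64 rather than int32 on most 64-bit systems. float128 is not available on every platform. Although the table lists None for bool, NumPy does accept the type code '?' for it.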
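A quick way to check that a type name and a type code refer to the same data type is to build a dtype object from each and compare them. The sketch below does this for a handful of rows from the table; the pairs list is just an illustrative selection, and it includes '?' for bool.

```python
import numpy as np

# Each pair names the same data type: (type name, type code).
pairs = [('int8', 'i1'), ('uint16', 'u2'), ('int32', 'i4'), ('int64', 'i8'),
         ('float32', 'f4'), ('float64', 'f8'), ('complex64', 'c8'),
         ('complex128', 'c16'), ('bool', '?'), ('object', 'O')]

# Collect any pair whose name and code do not resolve to the same dtype.
mismatches = [(name, code) for name, code in pairs
              if np.dtype(name) != np.dtype(code)]
print(mismatches)  # []

# The plain Python int maps to the platform default integer type.
print(np.dtype(int))
```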

A data type can be specified when the NumPy array is created and converted to another type later.

You can specify the data type with a type name (such as 'float32'), a type code (such as 'f4'), or a NumPy attribute (such as np.float32). NumPy attributes exist only for type names, not for type codes: np.complex64 works, but np.c8 raises an AttributeError.

In [4]:
arr = np.array([1,2,3], dtype='f4')
In [5]:
# Identical to the above
arr = np.array([1,2,3], dtype='float32')
In [6]:
arr = np.array([1+2j, 3-4j], dtype=np.complex64)
In [7]:
# Identical to the above
arr = np.array([1+2j, 3-4j], dtype='c8')
In [8]:
# ERROR: type codes such as c8 are not attributes of the numpy module
arr = np.array([1+2j, 3-4j], dtype=np.c8)
AttributeError                            Traceback (most recent call last)
<ipython-input-8-a5f30f9069f2> in <module>
      1 # ERROR
----> 2 arr = np.array([1+2j, 3-4j], dtype=np.c8)
      3 arr.dtype
AttributeError: module 'numpy' has no attribute 'c8'
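If you have a type code and want the corresponding NumPy type object, one option is to go through np.dtype, which accepts the code as a string. A minimal sketch (the variable name code_dtype is only for illustration):

```python
import numpy as np

# Build a dtype object from a type code string.
code_dtype = np.dtype('c8')
print(code_dtype)                       # complex64

# Its .type attribute is the NumPy scalar type behind the code.
print(code_dtype.type is np.complex64)  # True

# The dtype object can be passed directly when creating an array.
arr = np.array([1+2j, 3-4j], dtype=code_dtype)
print(arr.dtype == np.complex64)        # True
```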

Type Conversion

The astype method converts an array to another data type.

Note that astype returns a converted copy of the array rather than changing the data type in place. Assign the result back to the original variable or to a new one.

In [9]:
arr = np.array([1,2,3], dtype='int16')
print('Original Data Type: ' + str(arr.dtype))
arr = arr.astype(np.float32)
print('Data Type After Conversion: ' + str(arr.dtype))
Original Data Type: int16
Data Type After Conversion: float32
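The copy behaviour can be verified directly. The sketch below keeps the original array under a separate name and shows that it is untouched and shares no memory with the converted result. It also shows the copy=False argument of astype, which lets NumPy return the input array itself when no conversion is needed.

```python
import numpy as np

original = np.array([1, 2, 3], dtype='int16')
converted = original.astype(np.float32)

# The original array keeps its data type; the result is a separate array.
print(original.dtype)                         # int16
print(converted.dtype)                        # float32
print(np.shares_memory(original, converted))  # False

# With copy=False and an unchanged dtype, the input array itself is returned.
same = original.astype('int16', copy=False)
print(same is original)                       # True
```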

WARNING: be cautious about overflow when you downcast (convert from a wider type to a narrower one). astype does not check whether the values fit: out-of-range integers wrap around without raising an error, and out-of-range floats converted to integers give undefined results. Such issues are usually difficult to debug.

In [10]:
# An example of integer overflow at downcasting
arr = np.array([126,127,256], dtype='int16')
print('np array before type conversion: ' + str(arr))
# Range of int8 [-128, 127], 256 overflows after conversion
arr = arr.astype('int8')
print('np array after type conversion: ' + str(arr))
np array before type conversion: [126 127 256]
np array after type conversion: [126 127   0]
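One way to guard against this is to check the values against the target type's range before converting. The helper below, checked_downcast, is a hypothetical function written for this example and not part of NumPy; it assumes a non-empty integer array. The last part shows NumPy's own casting='safe' option, which is stricter: it rejects the conversion based on the types alone, whatever the values are.

```python
import numpy as np

def checked_downcast(arr, target):
    """Convert an integer array to `target`, raising if any value does not fit."""
    info = np.iinfo(np.dtype(target))
    if arr.min() < info.min or arr.max() > info.max:
        raise OverflowError(
            f'values outside [{info.min}, {info.max}] for {np.dtype(target)}')
    return arr.astype(target)

arr = np.array([126, 127, 256], dtype='int16')

# 256 does not fit in int8, so the helper refuses to convert.
try:
    checked_downcast(arr, 'int8')
except OverflowError as exc:
    print('Refused:', exc)

# The first two values fit, so this conversion goes through.
print(checked_downcast(arr[:2], 'int8'))  # [126 127]

# casting='safe' rejects int16 -> int8 regardless of the actual values.
try:
    arr.astype('int8', casting='safe')
except TypeError as exc:
    print('Refused by NumPy:', exc)
```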

String and Unicode Data Type

The string_ and unicode_ data types are both fixed-length. (NumPy 2.0 removed these two names; np.bytes_ and np.str_ refer to the same types and also work in earlier releases.)

The length is given by a number appended to the type code. For example, S3 is a byte string of length 3 and U10 is a unicode string of length 10. If no length is given, it defaults to the length of the longest string in the array.

If a string in the array is longer than the length of the data type it is defined with or converted to, the string is truncated.

In [11]:
# An example of a truncated byte string
s = np.array(['abc', 'defg'], dtype='S3')
print(s)
# An example of a truncated unicode string
s = np.array(['abcd', 'efghi'], dtype='U3')
print(s)
[b'abc' b'def']
['abc' 'efg']
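Because truncation happens without any error, it can help to check the string lengths before choosing a fixed length. The following is a small sketch using plain Python; words and target_len are illustrative names.

```python
import numpy as np

words = ['abc', 'defg']
target_len = 3

# Strings that would lose characters if stored with the target length.
too_long = [w for w in words if len(w) > target_len]
print(too_long)  # ['defg']

# Sizing the dtype from the data avoids truncation.
max_len = max(len(w) for w in words)
s = np.array(words, dtype=f'U{max_len}')
print(s)                          # ['abc' 'defg']
print(s.dtype == np.dtype('U4'))  # True
```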
In [12]:
# np.bytes_ and np.str_ are the same types as np.string_ and np.unicode_ (names removed in NumPy 2.0)
arr = np.array(['a', 'ab', 'abc'], dtype=np.bytes_)
print('The array is ' + str(arr))
print('The data type is ' + str(arr.dtype) + ' because the longest string in the array is "abc" and its length is 3.')
arr = np.array(['a', 'abc', 'abcd'], dtype=np.str_)
print('The array is ' + str(arr))
print('The data type is ' + str(arr.dtype) + ' because the longest unicode in the array is "abcd" and its length is 4.')
The array is [b'a' b'ab' b'abc']
The data type is |S3 because the longest string in the array is "abc" and its length is 3.
The array is ['a' 'abc' 'abcd']
The data type is <U4 because the longest unicode in the array is "abcd" and its length is 4.
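The inferred length is fixed once the array exists, so assigning a longer string later is truncated as well. The sketch below shows this, then widens the array with astype so the longer string fits. It also prints the per-element storage cost: one byte per character for S and four bytes per character for U.

```python
import numpy as np

arr = np.array(['a', 'ab', 'abc'])  # length inferred as 3
arr[0] = 'abcdef'                   # truncated to fit 3 characters
print(arr)                          # ['abc' 'ab' 'abc']

# Converting to a wider type makes room for longer strings.
wider = arr.astype('U10')
wider[0] = 'abcdef'
print(wider)                        # ['abcdef' 'ab' 'abc']

# Storage per element: 1 byte per character for S, 4 bytes for U.
print(np.dtype('S3').itemsize)      # 3
print(np.dtype('U3').itemsize)      # 12
```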

What do “|” and “<” in the data types above mean?

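They are byte order indicators: "<" means little-endian, and "|" means byte order does not apply, as is the case for byte strings, whose characters each take a single byte. Byte order itself is beyond the scope of this tutorial.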
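For a brief illustration only, the sketch below stores the same 16-bit integer with an explicit little-endian ("<") and big-endian (">") data type and prints the raw bytes. The values are equal; only the order of the bytes in memory differs.

```python
import numpy as np

little = np.array([1], dtype='<i2')  # least significant byte first
big = np.array([1], dtype='>i2')     # most significant byte first

print(little.tobytes())      # b'\x01\x00'
print(big.tobytes())         # b'\x00\x01'
print(little[0] == big[0])   # True

# Single-byte types have no byte order, which is shown as "|".
print(np.dtype('S3').byteorder)  # |
print(np.dtype('i1').byteorder)  # |
```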
