Missing Data
Here are a few convenient DataFrame methods for handling missing data in pandas:
df.dropna(axis, thresh)
df.fillna(value)
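```python
import numpy as np
import pandas as pd
df = pd.DataFrame({'A': [1, 2, np.nan], 'B': [5, np.nan, np.nan], 'C': [1, 2, 3]})
df
```

 | A | B | C
---|---|---|---
0 | 1.0 | 5.0 | 1
1 | 2.0 | NaN | 2
2 | NaN | NaN | 3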
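Counting the NaNs first makes it easier to predict what each `dropna` call removes. The sketch below uses `DataFrame.isna()` on the same three-row DataFrame shown in the table above, rebuilt here so the snippet runs on its own:

```python
import numpy as np
import pandas as pd

# Same data as the table above: A and B contain NaN, C does not.
df = pd.DataFrame({'A': [1, 2, np.nan],
                   'B': [5, np.nan, np.nan],
                   'C': [1, 2, 3]})

# isna() returns a boolean mask; summing it counts the NaNs.
nan_per_column = df.isna().sum()      # down each column
nan_per_row = df.isna().sum(axis=1)   # across each row

print(nan_per_column)
print(nan_per_row)
```

Column B has two missing values, and row 2 has two as well. That is why these rows and columns are the first ones the calls below drop.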
Drop Rows with NaN Data
```python
df.dropna()
```

 | A | B | C
---|---|---|---
0 | 1.0 | 5.0 | 1
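Called with no arguments, `dropna()` uses `how='any'`, so a single NaN is enough to drop a row. The sketch below contrasts this with `how='all'` on the same data:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 2, np.nan],
                   'B': [5, np.nan, np.nan],
                   'C': [1, 2, 3]})

# Default how='any': drop a row if it has at least one NaN.
any_nan_dropped = df.dropna(how='any')

# how='all': drop a row only if every value in it is NaN.
all_nan_dropped = df.dropna(how='all')

print(any_nan_dropped)   # only row 0 survives
print(all_nan_dropped)   # all three rows survive; none is entirely NaN
```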
Keep Rows with at Least 2 Non-NaN Values
```python
df.dropna(thresh=2)
```

 | A | B | C
---|---|---|---
0 | 1.0 | 5.0 | 1
1 | 2.0 | NaN | 2
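`thresh` counts valid values, not NaNs. A row is kept when it has at least `thresh` non-NaN entries. You can check the rule by hand with `notna()`:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 2, np.nan],
                   'B': [5, np.nan, np.nan],
                   'C': [1, 2, 3]})

# Non-NaN values per row: row 0 -> 3, row 1 -> 2, row 2 -> 1.
non_nan_per_row = df.notna().sum(axis=1)

# thresh=2 keeps rows with at least 2 non-NaN values.
manual = df[non_nan_per_row >= 2]
builtin = df.dropna(thresh=2)

print(manual.equals(builtin))  # True
```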
Drop Columns with NaN data
```python
df.dropna(axis=1)
```

 | C
---|---
0 | 1
1 | 2
2 | 3
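The two parameters in `dropna(axis, thresh)` can be combined. With `axis=1`, `thresh` counts non-NaN values per column instead of per row. A sketch on the same data:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 2, np.nan],
                   'B': [5, np.nan, np.nan],
                   'C': [1, 2, 3]})

# Keep columns with at least 2 non-NaN values.
# A has 2, B has 1, C has 3, so only B is dropped.
kept = df.dropna(axis=1, thresh=2)

# axis='columns' is an equivalent spelling of axis=1.
same = df.dropna(axis='columns', thresh=2)

print(kept.columns.tolist())  # ['A', 'C']
```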
Fill Data with Specified Value
```python
df.fillna(value='FILL VALUE')
```

 | A | B | C
---|---|---|---
0 | 1 | 5 | 1
1 | 2 | FILL VALUE | 2
2 | FILL VALUE | FILL VALUE | 3
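Filling numeric columns with a string such as `'FILL VALUE'` turns them into mixed-type `object` columns. `fillna` also accepts a dict that maps column names to fill values, which keeps each column numeric. The values 0 and -1 below are arbitrary placeholders chosen only for illustration:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 2, np.nan],
                   'B': [5, np.nan, np.nan],
                   'C': [1, 2, 3]})

# A string fill value mixes text into the float columns A and B.
string_filled = df.fillna(value='FILL VALUE')
print(string_filled.dtypes)

# A dict fills each column with its own value (0 and -1 are arbitrary placeholders).
per_column = df.fillna(value={'A': 0, 'B': -1})
print(per_column)
print(per_column.dtypes)  # A and B stay float64
```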
Fill Data with Column’s Mean Value
```python
df['A'].fillna(value=df['A'].mean())
```

```
0    1.0
1    2.0
2    1.5
Name: A, dtype: float64
```
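The call above returns a new Series and leaves `df` unchanged. The minimal sketch below does two things. First, it assigns the filled column back. Second, it fills every column with its own mean by passing `df.mean()`, a Series indexed by column name, to `fillna`:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 2, np.nan],
                   'B': [5, np.nan, np.nan],
                   'C': [1, 2, 3]})

# Work on a copy so the original df keeps its NaN for comparison.
filled_a = df.copy()
# fillna returns a new Series, so assign it back to keep the change.
filled_a['A'] = filled_a['A'].fillna(filled_a['A'].mean())

# mean() skips NaN by default: A -> 1.5, B -> 5.0, C -> 2.0.
# Passing that Series fills each column with its own mean.
all_filled = df.fillna(df.mean())
print(all_filled)
```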