DATA DOUBLE CONFIRM

Handling None and NaN in Pandas - Python

The other day, while reading data from BigQuery into a pandas DataFrame, I realised that the data type of a column containing only nulls had changed from the original schema. This is because if all the values in a column of a query result are null, the resulting column has the 'object' data type, with the nulls converted to None. This isn't an issue when some records contain numbers and the rest are null; the column should then be interpreted as a numeric type, with NaN replacing the nulls.

To change back to a numeric data type, I had to do this step:

import numpy as np
df = df.fillna(value=np.nan)  # replace None with NaN
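To make the dtype difference concrete, here is a minimal sketch using a hand-built DataFrame rather than an actual BigQuery result. The column names all_null and some_null are made up for illustration. It also shows pd.to_numeric as an alternative conversion; this is not the step I used above, but it states the numeric target explicitly instead of relying on dtype inference.

```python
import pandas as pd

# Hypothetical data standing in for a query result:
# one column with only nulls, one with a mix of numbers and nulls.
df = pd.DataFrame({
    "all_null": pd.Series([None, None, None], dtype="object"),
    "some_null": [1.5, None, 3.0],
})

# all_null is 'object' (holding None); some_null is float64 (holding NaN).
print(df.dtypes)

# Alternative to fillna: convert the column to a numeric dtype explicitly.
# None values become NaN in the process.
df["all_null"] = pd.to_numeric(df["all_null"])

# Both columns are now float64.
print(df.dtypes)
```

If several columns are affected, the same call can be applied to each of them in a loop.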

On a related note, I learnt that when we do a groupby sum and a group contains only NaN values, the sum for that group is 0 by default. If we want it to return NaN instead, we can pass min_count=1:

df = df.groupby(['col1','col2']).sum(min_count=1).reset_index()
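The effect of min_count is easiest to see side by side. Below is a small sketch on made-up data; the value column and its contents are assumptions for illustration, while col1 and col2 follow the line above. The group ('a', 'x') contains only NaN, and the group ('b', 'y') contains one real number.

```python
import numpy as np
import pandas as pd

# Hypothetical data: group (a, x) is entirely NaN, group (b, y) is not.
df = pd.DataFrame({
    "col1": ["a", "a", "b", "b"],
    "col2": ["x", "x", "y", "y"],
    "value": [np.nan, np.nan, 1.0, np.nan],
})

# Default behaviour: NaN values are skipped, so an all-NaN group sums to 0.
default_sum = df.groupby(["col1", "col2"]).sum().reset_index()

# With min_count=1, a group needs at least one non-NaN value to get a sum;
# otherwise the result for that group is NaN.
nan_sum = df.groupby(["col1", "col2"]).sum(min_count=1).reset_index()

print(default_sum)
print(nan_sum)
```

In default_sum the (a, x) row has a value of 0.0, while in nan_sum the same row is NaN. The (b, y) row is 1.0 in both.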

References:

https://stackoverflow.com/questions/23743460/replace-none-with-nan-in-pandas-dataframe

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.sum.html