I have a dataframe whose rows contain NaN values. The df has the original columns Heading 1, Heading 2 and Heading 3, plus extra columns called Unnamed: 1, Unnamed: 2 and Unnamed: 3, as shown:
| Heading 1 | Heading 2 | Heading 3 | Unnamed: 1 | Unnamed: 2 | Unnamed: 3 |
| ------------ | ------------ | ------------ | ------------ | ------------ | ------------ |
| NaN | 34 | 24 | 45 | NaN | NaN |
| NaN | NaN | 24 | 45 | 11 | NaN |
| NaN | NaN | NaN | 45 | 45 | 33 |
| 4 | NaN | 24 | NaN | NaN | NaN |
| NaN | NaN | 4 | NaN | NaN | NaN |
| NaN | 34 | 24 | NaN | NaN | NaN |
| 22 | 34 | 24 | NaN | NaN | NaN |
| NaN | 34 | NaN | 45 | NaN | NaN |
I want to iterate through each row and find the number of initial NaN values in the original columns (Heading 1, Heading 2 and Heading 3) and the number of non-NaN values in the extra columns (Unnamed: 1, Unnamed: 2 and Unnamed: 3). This should be calculated for every row and returned in a dictionary where the key is the row index and the value is a list: the first element is the number of initial NaN values in the original columns and the second element is the number of non-NaN values in the extra columns.
So the result for the above dataframe would be:
{0 : [1, 1],
1 : [2, 2],
2 : [3, 3],
3 : [0, 0],
4 : [2, 0],
5 : [1, 0],
6 : [0, 0],
7 : [1, 1]}
Notice how in row 3 and row 7 the original columns contain 1 and 2 NaNs respectively, but only the initial NaNs are counted, not the in-between ones!
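To make the requirement concrete, here is a minimal sketch that rebuilds the example dataframe from the table above and computes the dictionary with a plain row-wise loop (this is only an illustration of the requirement, not one of the answers discussed below):

```python
import numpy as np
import pandas as pd

# the dataframe from the table above
df = pd.DataFrame({
    "Heading 1":  [np.nan, np.nan, np.nan, 4,      np.nan, np.nan, 22,     np.nan],
    "Heading 2":  [34,     np.nan, np.nan, np.nan, np.nan, 34,     34,     34],
    "Heading 3":  [24,     24,     np.nan, 24,     4,      24,     24,     np.nan],
    "Unnamed: 1": [45,     45,     45,     np.nan, np.nan, np.nan, np.nan, 45],
    "Unnamed: 2": [np.nan, 11,     45,     np.nan, np.nan, np.nan, np.nan, np.nan],
    "Unnamed: 3": [np.nan, np.nan, 33,     np.nan, np.nan, np.nan, np.nan, np.nan],
})

original = ["Heading 1", "Heading 2", "Heading 3"]
extra = ["Unnamed: 1", "Unnamed: 2", "Unnamed: 3"]

result = {}
for idx, row in df.iterrows():
    # count NaNs only at the start of the original columns (stop at the first non-NaN)
    leading_nan = 0
    for col in original:
        if pd.isna(row[col]):
            leading_nan += 1
        else:
            break
    # count non-NaN values anywhere in the extra columns
    non_nan_extra = int(row[extra].notna().sum())
    result[idx] = [leading_nan, non_nan_extra]

print(result)
# {0: [1, 1], 1: [2, 2], 2: [3, 3], 3: [0, 0], 4: [2, 0], 5: [1, 0], 6: [0, 0], 7: [1, 1]}
```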
UPDATE / RESULTS:
Both @mozway and @Panda Kim gave correct solutions for the dataframe above, but @mozway's solution does not work at all for another test dataframe.
@Panda Kim gave 2 solutions, but his two methods (cumsum() and x.first_valid_index()) give slightly different results for this other dataframe:
| Heading 1 | Heading 2 | Heading 3 | Unnamed: 1 | Unnamed: 2 | Unnamed: 3 | Unnamed: 4 |
| ------------ | ------------ | ------------ | ------------ | ------------ | ------------ | ------------ |
| NaN | 34 | 24 | 45 | NaN | NaN | NaN |
| NaN | NaN | 24 | 45 | 11 | NaN | NaN |
| NaN | NaN | NaN | 45 | 45 | 33 | NaN |
| 4 | NaN | 24 | NaN | NaN | NaN | NaN |
| NaN | NaN | 4 | NaN | NaN | NaN | NaN |
| NaN | 34 | 24 | NaN | NaN | NaN | NaN |
| 22 | 34 | 24 | NaN | NaN | NaN | NaN |
| NaN | 34 | NaN | 45 | NaN | NaN | NaN |
| NaN | NaN | NaN | NaN | 12 | 22 | 45 |
| NaN | NaN | NaN | NaN | NaN | 11 | 69 |
| NaN | NaN | NaN | NaN | 12 | NaN | 45 |
| NaN | NaN | NaN | NaN | NaN | NaN | 45 |
| NaN | NaN | NaN | NaN | NaN | 44 | NaN |
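For reference, the second test dataframe can be rebuilt from the table above like this (second_df is just an illustrative name):

```python
import numpy as np
import pandas as pd

nan = np.nan
second_df = pd.DataFrame({
    "Heading 1":  [nan, nan, nan, 4,   nan, nan, 22,  nan, nan, nan, nan, nan, nan],
    "Heading 2":  [34,  nan, nan, nan, nan, 34,  34,  34,  nan, nan, nan, nan, nan],
    "Heading 3":  [24,  24,  nan, 24,  4,   24,  24,  nan, nan, nan, nan, nan, nan],
    "Unnamed: 1": [45,  45,  45,  nan, nan, nan, nan, 45,  nan, nan, nan, nan, nan],
    "Unnamed: 2": [nan, 11,  45,  nan, nan, nan, nan, nan, 12,  nan, 12,  nan, nan],
    "Unnamed: 3": [nan, nan, 33,  nan, nan, nan, nan, nan, 22,  11,  nan, nan, 44],
    "Unnamed: 4": [nan, nan, nan, nan, nan, nan, nan, nan, 45,  69,  45,  45,  nan],
})
```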
For the above df, here are the results:
@Panda Kim (first_valid_index())
{0: [1, 1], 1: [2, 2], 2: [3, 3], 3: [0, 0], 4: [2, 0], 5: [1, 0], 6: [0, 0], 7: [1, 1], 8: [3, 3], 9: [3, 2], 10: [3, 2], 11: [3, 1], 12: [3, 1]}
@Panda Kim (cumsum())
{0: [1, 1], 1: [2, 2], 2: [3, 3], 3: [0, 0], 4: [2, 0], 5: [1, 0], 6: [0, 0], 7: [1, 1], 8: [4, 3], 9: [5, 2], 10: [4, 2], 11: [6, 1], 12: [5, 1]}
@mozway solution
{0: [1, 1], 1: [2, 2], 2: [3, 3], 3: [0, 0], 4: [2, 0], 5: [1, 0], 6: [0, 0], 7: [1, 1], 8: [3, 0], 9: [3, 0], 10: [3, 0], 11: [3, 0], 12: [3, 0]}
2 Answers
ha5z0ras1#
You can use:
Output:
As a dictionary:
Output:
n6lpvg4x2#
First
divide the dataframe into df1 (the original Heading columns) and df2 (the extra Unnamed columns), using iloc, filter, or similar
Second
count the initial NaNs in df1 and count the notnull values in df2
Last
concat and make the dict, as sketched below
result:
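The answer's code blocks and their output are not shown above; the following is a minimal sketch of the three steps as described, using a cumsum-based count and the first_valid_index variant mentioned in the question's update. Here df is the question's dataframe (for example as built in the sketch near the top), and the exact expressions are assumptions, not the answer's verbatim code:

```python
import pandas as pd

# df is the question's dataframe (e.g. as built in the sketch near the top)

# First: divide the dataframe into the original and the extra columns
df1 = df.filter(like="Heading")   # Heading 1..3
df2 = df.filter(like="Unnamed")   # Unnamed: 1..3

# Second: count initial NaNs in df1 (a cell counts as a leading NaN while the
# running total of non-NaN values in its row is still 0) and non-null values in df2
leading_nan = df1.notna().cumsum(axis=1).eq(0).sum(axis=1)
non_nan_extra = df2.notna().sum(axis=1)

# first_valid_index variant of the same leading-NaN count: the position of the
# first non-NaN column, or the full width of df1 if the row is entirely NaN
leading_nan_alt = df1.apply(
    lambda r: df1.columns.get_loc(r.first_valid_index())
    if r.first_valid_index() is not None else df1.shape[1],
    axis=1,
)

# Last: concat the two counts and turn them into the requested dict
out = pd.concat([leading_nan, non_nan_extra], axis=1)
result = {idx: [int(a), int(b)] for idx, (a, b) in out.iterrows()}
print(result)
# {0: [1, 1], 1: [2, 2], 2: [3, 3], 3: [0, 0], 4: [2, 0], 5: [1, 0], 6: [0, 0], 7: [1, 1]}
```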
Update
result:
Anyone can see that this is different from the questioner's result (@Panda Kim (cumsum())). Of course, if the function is not applied to df1, the result is different. Let's apply the cumsum code to df instead of df1 to get the wrong result:
wrong result (the same as the "@Panda Kim (cumsum())" dictionary above, which the questioner thinks is the output of my code):
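To see the difference concretely, here is a toy sketch with a single row shaped like row 9 of the second test dataframe; the cumsum expression is only one plausible way to write the leading-NaN count, not necessarily the answer's exact code:

```python
import numpy as np
import pandas as pd

# a single row shaped like row 9 of the second test dataframe above
df = pd.DataFrame(
    [[np.nan, np.nan, np.nan, np.nan, np.nan, 11, 69]],
    columns=["Heading 1", "Heading 2", "Heading 3",
             "Unnamed: 1", "Unnamed: 2", "Unnamed: 3", "Unnamed: 4"],
)
df1 = df.filter(like="Heading")

# leading-NaN count restricted to the Heading columns (the intended usage)
print(df1.notna().cumsum(axis=1).eq(0).sum(axis=1)[0])  # 3

# the same cumsum expression applied to the full df keeps counting the leading-NaN
# run into the Unnamed columns, which reproduces the 5 reported for this row above
print(df.notna().cumsum(axis=1).eq(0).sum(axis=1)[0])   # 5
```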
It is common to apply code differently and get different results, but that is something the questioner should check himself before asking endless follow-up questions.