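I have an xlsx file that contains a large amount of data. However, the column named UniversalIDS contains duplicate values, and I want to replace them with randomly generated IDs using pandas.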
So far I have tried several approaches that I found by googling, but none of them worked. For example, I tried this:
import pandas as pd
import uuid
df = pd.read_excel('C:/Users/Nervous/Downloads/ss.xlsx')
df.loc[df.duplicated(subset=["UniversalIDS"]), "UniversalIDS"] = uuid.uuid4()
df.to_excel("C:/Users/Nervous/Downloads/tt.xlsx")
print("done!")
I also tried other alternatives that I saw on this site, for example:
df2 = df.assign(UniversalIDS=df['UniversalIDS'].where(
~df.duplicated(['UniversalIDS']), uuid.uuid4()))
This one did not work either:
df.loc[df["UniversalIDS"].duplicated(), "test"]
Here is a snippet of the xlsx data:
UniversalIDS
f6112cd7-0868-4cc9-b5ab-d7381cc23bdf
f3e75641-f328-429f-ae32-41399d8a0bf0
08dccc5c-5774-4614-925c-ad9373821075
79a8ebed-154c-47c7-b16d-cbba5d8469eb
396f8e63-1950-4c36-9524-c1dec8728ffd
62cba3bd-a855-4141-8460-7ff838ecea62
b7f4f753-b479-413a-abcd-34d08436eb85
c0fd6e61-edb1-4dce-94ac-7d0529860e1f
c42f8c98-c285-4552-af9f-2f4d8e89b9e8
8cb77021-eb4f-4cfa-a4a3-e649f6a53c03
cb7f4b8d-976a-4481-919e-c8ff7d384cc6
e15fd2bb-5409-4a8b-9fdc-bf1e5862db58
27b97893-aae7-4a9a-aae1-0fc21a099209
1abc2c2f-94f2-4546-b912-b85bc4ed0cb8
6bf264fb-1b82-48e3-966a-14e48a61a63e
9653faeb-7b3d-408e-93e3-bc729f259c75
a09f3eb6-0059-4a77-bf2f-4f7436508ba8
65e06948-2e6c-413f-a768-c3faf8108a6c
291ff491-4ff0-4fb2-b095-b3e66f2d7ca0
653535c7-0389-4077-8e72-3835fbd72d4d
61408fc8-4f45-48e0-b83a-40b6bfd76ad5
3ae8d547-bf4b-42ac-b083-a1662f1a5c82
4955c673-c5da-464c-8e14-a897df0774eb
a39bad90-5235-4679-945e-534bb47b8347
264a1f6e-adf4-45a7-b1b1-e6f3fc073447
a855025b-ee84-46d5-aedb-cbac9a5b1920
71b16a5b-3f6d-4d30-8a65-203959fe87a2
4f3f86f2-4e61-475a-bc1d-eb2112f23953
59da45de-c192-4885-8a55-9138ca49b33a
8f41df73-d9dc-4663-9f64-d090d7c5ca77
84f7103f-e9de-444f-b046-c02d75af0ed1
2738f733-7438-494c-9368-5fb700df93d1
777a3cd7-19ae-4181-b91d-9be8eaf30523
b6083731-a43e-4b5a-ac9a-94a3202103e7
f22873c1-6811-4025-8f0d-47d72d49e499
f262c369-f44a-4b90-8219-d29b33bc14e8
ea0d26f5-d8c4-4082-983a-8eeea29c6c54
ea0d26f5-d8c4-4082-983a-8eeea29c6c54
ea0d26f5-d8c4-4082-983a-8eeea29c6c54
ea0d26f5-d8c4-4082-983a-8eeea29c6c54
ea0d26f5-d8c4-4082-983a-8eeea29c6c54
ea0d26f5-d8c4-4082-983a-8eeea29c6c54
ea0d26f5-d8c4-4082-983a-8eeea29c6c54
ea0d26f5-d8c4-4082-983a-8eeea29c6c54
ea0d26f5-d8c4-4082-983a-8eeea29c6c54
ea0d26f5-d8c4-4082-983a-8eeea29c6c54
ea0d26f5-d8c4-4082-983a-8eeea29c6c54
ea0d26f5-d8c4-4082-983a-8eeea29c6c54
ea0d26f5-d8c4-4082-983a-8eeea29c6c54
ea0d26f5-d8c4-4082-983a-8eeea29c6c54
ea0d26f5-d8c4-4082-983a-8eeea29c6c54
ea0d26f5-d8c4-4082-983a-8eeea29c6c54
ea0d26f5-d8c4-4082-983a-8eeea29c6c54
ea0d26f5-d8c4-4082-983a-8eeea29c6c54
ea0d26f5-d8c4-4082-983a-8eeea29c6c54
ea0d26f5-d8c4-4082-983a-8eeea29c6c54
ea0d26f5-d8c4-4082-983a-8eeea29c6c54
ea0d26f5-d8c4-4082-983a-8eeea29c6c54
ea0d26f5-d8c4-4082-983a-8eeea29c6c54
ea0d26f5-d8c4-4082-983a-8eeea29c6c54
ea0d26f5-d8c4-4082-983a-8eeea29c6c54
ea0d26f5-d8c4-4082-983a-8eeea29c6c54
ea0d26f5-d8c4-4082-983a-8eeea29c6c54
ea0d26f5-d8c4-4082-983a-8eeea29c6c54
ea0d26f5-d8c4-4082-983a-8eeea29c6c54
ea0d26f5-d8c4-4082-983a-8eeea29c6c54
ea0d26f5-d8c4-4082-983a-8eeea29c6c54
ea0d26f5-d8c4-4082-983a-8eeea29c6c54
ea0d26f5-d8c4-4082-983a-8eeea29c6c54
ea0d26f5-d8c4-4082-983a-8eeea29c6c54
ea0d26f5-d8c4-4082-983a-8eeea29c6c54
ea0d26f5-d8c4-4082-983a-8eeea29c6c54
ea0d26f5-d8c4-4082-983a-8eeea29c6c54
ea0d26f5-d8c4-4082-983a-8eeea29c6c54
ea0d26f5-d8c4-4082-983a-8eeea29c6c54
ea0d26f5-d8c4-4082-983a-8eeea29c6c54
ea0d26f5-d8c4-4082-983a-8eeea29c6c54
ea0d26f5-d8c4-4082-983a-8eeea29c6c54
ea0d26f5-d8c4-4082-983a-8eeea29c6c54
ea0d26f5-d8c4-4082-983a-8eeea29c6c54
ea0d26f5-d8c4-4082-983a-8eeea29c6c54
ea0d26f5-d8c4-4082-983a-8eeea29c6c54
ea0d26f5-d8c4-4082-983a-8eeea29c6c54
ea0d26f5-d8c4-4082-983a-8eeea29c6c54
ea0d26f5-d8c4-4082-983a-8eeea29c6c54
ea0d26f5-d8c4-4082-983a-8eeea29c6c54
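As shown above, the UniversalIDS column contains duplicate values. It is also worth mentioning that the data has other columns, but for simplicity only the column causing the problem is shown here.
So my question is: how can I replace the duplicate values in the UniversalIDS column with new unique IDs?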
2 Answers
dldeef671#
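Your expression (the df.loc[...] = uuid.uuid4() assignment) is valid Python, but it calls uuid.uuid4() only once and assigns that single uuid to every duplicated element, which means those elements are still duplicates of each other after it runs. You should create a Series with a different uuid for each row instead.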
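A minimal sketch of that idea, not necessarily the exact code this answer originally showed. The small DataFrame is hypothetical sample data standing in for the spreadsheet, and the names mask and new_ids are made up for illustration:

```python
import uuid

import pandas as pd

# Hypothetical sample data standing in for the real spreadsheet.
df = pd.DataFrame({"UniversalIDS": ["a", "b", "c", "c", "c"]})

# True for every repeat of a value after its first occurrence.
mask = df.duplicated(subset=["UniversalIDS"])

# Build one fresh uuid per duplicated row; the index matches those rows
# so that the assignment below aligns each uuid with its own row.
new_ids = pd.Series(
    [str(uuid.uuid4()) for _ in range(int(mask.sum()))],
    index=df.index[mask],
)

df.loc[mask, "UniversalIDS"] = new_ids
```

The first occurrence of each value is kept as-is, and only the later repeats get new IDs. With the real file you would read it with pd.read_excel and write it back with to_excel, as in the question.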
gk7wooem2#
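When I reproduced your problem I found essentially the same thing as the previous answer, but I came up with a slightly different solution, so I am posting it for reference.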
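One possible variant, sketched under the assumption that it is fine to generate a candidate uuid for every row and only use the ones that land on duplicated rows. This may differ from the code this answer originally contained. The sample DataFrame and the names is_dup and candidates are hypothetical:

```python
import uuid

import pandas as pd

# Hypothetical sample data standing in for the real spreadsheet.
df = pd.DataFrame({"UniversalIDS": ["a", "b", "b", "c", "b"]})

# True for every repeat of a value after its first occurrence.
is_dup = df["UniversalIDS"].duplicated()

# One candidate uuid per row; each list element comes from a separate
# uuid.uuid4() call, so every candidate is different.
candidates = pd.Series(
    [str(uuid.uuid4()) for _ in range(len(df))],
    index=df.index,
)

# Keep the original value where the row is not a duplicate,
# otherwise take the candidate uuid for that row.
df["UniversalIDS"] = df["UniversalIDS"].where(~is_dup, candidates)
```

This generates more uuids than strictly needed, but it avoids any index bookkeeping because the candidates cover every row. It also shows why the where attempt in the question failed: passing a single uuid.uuid4() value as the replacement fills every duplicated row with the same ID.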