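So, I have a dataframe in which I want to apply a threshold: any value below 0.119 should be replaced with "NA", meaning I want to treat it as having no value. When I run this code: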
import pandas as pd
df = pd.read_csv("ups_core.csv")
for i in df.values:
    df.replace(i<0.119, "NA")
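I get the error: TypeError: '<' not supported between instances of 'str' and 'float'. Can you help me figure out what I'm doing wrong?
I will post a picture of part of the dataframe. Thanks!
Edit: output of df.head().to_dict('list'):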
df = pd.DataFrame({'gene.id': ['ENSG00000013275', 'ENSG00000053900', 'ENSG00000078140', 'ENSG00000078747', 'ENSG00000087191'], 'Adrenal Gland': [1.7052697835359134, 0.5864174746159394, 1.3103934038583631, 1.1328838852957983, 1.6132835184442524], 'Artery Aorta': [1.11728713807222, 0.7422617853145246, 1.5368751812880124, 1.3472335768656902, 1.0792282365044272], 'Artery Coronary': [1.4142135623730951, nan, 1.6934906247250543, 0.8408964152537145, 1.3947436663504054], 'Artery Tibial': [1.0069555500567189, nan, 1.7411011265922482, 0.8766057213160351, 1.0643701824533598], 'Brain Cerebellum': [0.7371346086455506, nan, 1.681792830507429, 1.11728713807222, 0.8408964152537145], 'Brain Cortex': [1.3947436663504054, 0.6155722066724582, 3.1601652474535085, 1.4742692172911012, 1.5368751812880124], 'Breast': [1.4845235706290492, 0.7071067811865476, 0.9659363289248456, 0.8950250709279725, 1.4044448757379973], 'Colon Sigmoid': [1.0570180405613805, 2.1584564730088545, 2.732080513508791, 1.086734862526058, 1.0792282365044272], 'Colon Transverse': [1.0210121257071934, 1.086734862526058, 2.027918959580058, 1.0570180405613805, 0.9330329915368074], 'GE junction': [1.1328838852957983, nan, 2.3133763678105748, 1.189207115002721, 1.1328838852957983], 'Esophagus Mucosa': [1.2834258975629045, 0.9592641193252645, 2.084931521682243, 1.4142135623730951, 1.3195079107728942], 'Esophagus Muscle': [1.0792282365044272, 1.905275996087875, 2.9485384345822023, 1.248330548901612, 1.1328838852957983], 'Heart Atrial': [1.6358041171155622, 0.9862327044933592, 2.329467172936912, 1.1566881839052874, 1.6132835184442524], 'Heart Ventricle': [1.827662900458801, 2.411615655381521, 2.5668517951258085, 1.0210121257071934, 1.7654059925813097], 'Liver': [1.6021397551792442, nan, 2.3456698984637576, 1.681792830507429, 1.7532114426320702], 'Lung': [1.0792282365044272, nan, 1.11728713807222, 1.0281138266560663, 1.1250584846888094], 'Minor Salivary': [1.3103934038583631, nan, 2.445280555384137, 0.8705505632961241, 
1.2397076999389869], 'Muscle Skeletal': [2.0139111001134378, 0.5625292423444047, 2.3456698984637576, 1.4539725173203106, 2.0139111001134378], 'Nerve Tibial': [1.1974787046189286, 1.0570180405613805, 0.9201876506248752, 1.5583291593209998, 1.0570180405613805], 'Ovary': [0.9330329915368074, 0.8645372313078652, 0.7845840978967508, 1.0942937012607394, 1.0281138266560663], 'Pancreas': [1.248330548901612, 1.248330548901612, 1.515716566510398, 0.757858283255199, 1.214194884395047], 'Pituitary': [1.2397076999389869, 0.946057646725596, 2.23457427614444, 0.7737824967711949, 1.624504792712471], 'Prostate': [1.0281138266560663, nan, 2.8088897514759945, 1.0717734625362931, 1.1250584846888094], 'Skin Unexpo': [1.3660402567543954, nan, 1.4142135623730951, 0.9726549474122856, 1.2834258975629045], 'Skin SunExpo': [1.4640856959456254, nan, 1.6132835184442524, 1.0792282365044272, 1.4948492486349385], 'Small Intestine': [1.1407637158684236, 0.9794202975869268, 2.6026837108838667, 0.9265880618903708, 1.1328838852957983], 'Spleen': [1.1328838852957983, 0.993092495437036, 1.3566043274476718, 1.013959479790029, 1.109569472067845], 'Stomach': [1.148698354997035, 0.6597539553864471, 2.5491212546385245, 0.8526348917679567, 1.1647335864684558], 'Testis': [1.5052467474110671, nan, 1.0352649238413776, 1.0210121257071934, 1.4640856959456254], 'Thyroid': [0.946057646725596, 0.8705505632961241, 1.6358041171155622, 0.9794202975869268, 0.9726549474122856], 'Uterus': [0.8950250709279725, nan, 1.2226402776920684, 1.1647335864684558, 1.0069555500567189], 'Vagina': [1.0424657608411214, nan, 1.7411011265922482, 1.3103934038583631, 1.1407637158684236]})
4 Answers
qaxu7uf2 #1
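This error indicates that the dataframe contains strings. Each element of df.values is a whole row (a NumPy array), not a single value, so i<0.119 compares every entry in the row and fails as soon as it reaches a string. Here the strings come from the ID column. There are several ways to handle that: use a slice of the dataframe that excludes that column; set that column as the index; or replace i<0.119 with code that first checks whether the value is a float and only then checks whether it is below 0.119. One way to build a mask that handles both the row-wise iteration and the string column is
df.applymap(lambda x: isinstance(x, float) and x < .119)
(however, this will not catch anything stored as an int).
wz3gfoph #2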
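You should be able to use mask. However, it looks like the first column contains strings, so you should probably set that column as the index first.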
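The code for this answer did not survive, so the following is only a minimal sketch of what indexing by the ID column and then calling mask could look like. It assumes a small stand-in frame instead of ups_core.csv; the 0.05 entry and the names THRESHOLD, values, masked, and kept are made up for illustration. The second half shows a variant that leaves gene.id as a regular column and only touches numeric columns.

```python
import numpy as np
import pandas as pd

# Small stand-in for ups_core.csv. Most values come from the question's
# sample; the 0.05 entry is invented so something falls below the threshold.
df = pd.DataFrame({
    "gene.id": ["ENSG00000013275", "ENSG00000053900", "ENSG00000078140"],
    "Adrenal Gland": [1.7052697835359134, 0.05, 1.3103934038583631],
    "Artery Coronary": [1.4142135623730951, np.nan, 1.6934906247250543],
})

THRESHOLD = 0.119  # hypothetical name for the cutoff from the question

# Move the string column into the index so every remaining column is numeric.
values = df.set_index("gene.id")

# mask() replaces entries where the condition is True. The default
# replacement is NaN, which pandas treats as a missing value.
masked = values.mask(values < THRESHOLD)

# Variant that keeps gene.id as a normal column: restrict the comparison
# to the numeric columns and write the masked values back.
numeric_cols = df.select_dtypes(include="number").columns
kept = df.copy()
kept[numeric_cols] = kept[numeric_cols].mask(kept[numeric_cols] < THRESHOLD)

print(masked)
```

Leaving the replacement as NaN rather than the string "NA" keeps the columns numeric, so later calls such as mean() or dropna() still work. Note also that mask returns a new dataframe, so the result has to be assigned to a name.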
chhkpiq4 #3
I guess i is a string; maybe float(i) would solve it.
ffx8fchx #4