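I am trying to train a model. Initially I had a dataset of 5,000 images and training worked fine. I have since added some more images, and my dataset now contains 6,423 images. I am using Python 3.6.1 on Ubuntu 18.04, with TensorFlow 1.15 and numpy 1.16 (I had the same versions before and they worked fine). Now when I run: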
python model_main.py --logtostderr --pipeline_config_path=training/faster_rcnn_resnet50_coco.config --model_dir=training
it spends a few minutes setting up, and then, after the following log lines:
INFO:tensorflow:Saving checkpoints for 0 into training/model.ckpt.
I1123 10:26:21.548237 140482563244160 basic_session_run_hooks.py:606] Saving checkpoints for 0 into training/model.ckpt.
2019-11-23 10:28:30.801453: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
I get the following error:
2019-11-23 10:08:38.843259: W tensorflow/core/framework/op_kernel.cc:1651] OP_REQUIRES failed at lookup_table_op.cc:788 : Not found: Resource localhost/_3_hash_table_2/N10tensorflow6lookup15LookupInterfaceE does not exist.
2019-11-23 10:08:38.843323: W tensorflow/core/framework/op_kernel.cc:1651] OP_REQUIRES failed at lookup_table_op.cc:788 : Not found: Resource localhost/_1_hash_table_1/N10tensorflow6lookup15LookupInterfaceE does not exist.
2019-11-23 10:08:38.843345: W tensorflow/core/framework/op_kernel.cc:1651] OP_REQUIRES failed at lookup_table_op.cc:788 : Not found: Resource localhost/_2_hash_table/N10tensorflow6lookup15LookupInterfaceE does not exist.
2019-11-23 10:08:38.851405: W tensorflow/core/framework/op_kernel.cc:1651] OP_REQUIRES failed at lookup_table_op.cc:788 : Not found: Resource localhost/_3_hash_table_2/N10tensorflow6lookup15LookupInterfaceE does not exist.
2019-11-23 10:08:38.851488: W tensorflow/core/framework/op_kernel.cc:1651] OP_REQUIRES failed at lookup_table_op.cc:788 : Not found: Resource localhost/_1_hash_table_1/N10tensorflow6lookup15LookupInterfaceE does not exist.
2019-11-23 10:08:38.851512: W tensorflow/core/framework/op_kernel.cc:1651] OP_REQUIRES failed at lookup_table_op.cc:788 : Not found: Resource localhost/_2_hash_table/N10tensorflow6lookup15LookupInterfaceE does not exist.
2019-11-23 10:08:38.851807: W tensorflow/core/framework/op_kernel.cc:1651] OP_REQUIRES failed at lookup_table_op.cc:788 : Not found: Resource localhost/_1_hash_table_1/N10tensorflow6lookup15LookupInterfaceE does not exist.
2019-11-23 10:08:38.851848: W tensorflow/core/framework/op_kernel.cc:1651] OP_REQUIRES failed at lookup_table_op.cc:788 : Not found: Resource localhost/_2_hash_table/N10tensorflow6lookup15LookupInterfaceE does not exist.
2019-11-23 10:08:38.851899: W tensorflow/core/framework/op_kernel.cc:1651] OP_REQUIRES failed at lookup_table_op.cc:788 : Not found: Resource localhost/_3_hash_table_2/N10tensorflow6lookup15LookupInterfaceE does not exist.
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/tensorflow_core/python/client/session.py", line 1365, in _do_call
    return fn(*args)
  File "/usr/local/lib/python3.6/site-packages/tensorflow_core/python/client/session.py", line 1350, in _run_fn
    target_list, run_metadata)
  File "/usr/local/lib/python3.6/site-packages/tensorflow_core/python/client/session.py", line 1443, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.InvalidArgumentError: 2 root error(s) found.
  (0) Invalid argument: Cannot add tensor to the batch: number of elements does not match. Shapes are: [tensor]: [585,1024,3], [batch]: [600,799,3]
     [[{{node IteratorGetNext}}]]
     [[ToAbsoluteCoordinates_118/Assert/AssertGuard/Assert/data_0/_5709]]
  (1) Invalid argument: Cannot add tensor to the batch: number of elements does not match. Shapes are: [tensor]: [585,1024,3], [batch]: [600,799,3]
     [[{{node IteratorGetNext}}]]
0 successful operations.
0 derived errors ignored.
and training stops.
5 Answers
l5tcr1uw1#
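The new images you added appear to have a shape of 585x1024 (height x width), which differs from the 600x799 shape of the images already in the batch.
If that is the case, the solution is to resize those new images so that they match.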
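To find which files need resizing, one option is to compare every image's shape against the most common shape in the dataset. The sketch below assumes the shapes have already been collected into a dictionary (for instance with Pillow's `Image.open(path).size`, which returns `(width, height)`); the function name and file names are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple

Shape = Tuple[int, int]  # (height, width)


def find_outlier_shapes(shapes: Dict[str, Shape]) -> List[str]:
    """Return the file names whose shape differs from the most common one."""
    if not shapes:
        return []
    # most_common(1) yields [(shape, count)] for the dominant shape.
    dominant_shape, _ = Counter(shapes.values()).most_common(1)[0]
    return sorted(name for name, shape in shapes.items() if shape != dominant_shape)


# Hypothetical file names; the shapes mirror the ones in the error message.
example = {
    "old_0001.jpg": (600, 799),
    "old_0002.jpg": (600, 799),
    "new_0001.jpg": (585, 1024),
}
outliers = find_outlier_shapes(example)
print(outliers)
```

The files it reports are the candidates to resize or inspect before rebuilding the training records.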
z6psavjg2#
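If you need a batch size > 1, you can use a suitable image_resizer in the config (one of those defined in the image_resizer protobuf file, which I assume is what parses that section of the config) to resize the images to a uniform size. For example (borrowed from here):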
This seems to have solved my problem.
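A minimal sketch of what such an image_resizer block might look like, assuming the `fixed_shape_resizer` option from the Object Detection API's image_resizer proto. The height and width values are illustrative and are not taken from the asker's config:

```
# Inside the model's faster_rcnn { ... } section of the pipeline config.
image_resizer {
  fixed_shape_resizer {
    height: 600  # illustrative value
    width: 800   # illustrative value
  }
}
```

A fixed shape distorts images whose aspect ratio differs from the target. If that matters, `keep_aspect_ratio_resizer` with `pad_to_max_dimension: true` may be an alternative, since it pads every image to the same square size instead of stretching it.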
sf6xfgos3#
Changing batch_size to 1 solved this problem for me.
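In the pipeline config this setting lives in the `train_config` section. A sketch of the relevant fragment, with every other field left as it already is in your file:

```
train_config {
  batch_size: 1
  # ... other train_config fields unchanged ...
}
```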
k10s72fa4#
Within a mini-batch, all images must have the same size, so you must either resize all of your images to the same size or set the batch size to 1.
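To see why an aspect-preserving resizer can still produce mismatched shapes, here is a simplified sketch of that kind of resize logic. It assumes the `min_dimension: 600` and `max_dimension: 1024` values found in the stock Faster R-CNN sample configs; it approximates the behaviour and is not the library's own code, and the source sizes are hypothetical.

```python
from typing import Iterable, Tuple


def keep_aspect_ratio_shape(height: int, width: int,
                            min_dimension: int = 600,
                            max_dimension: int = 1024) -> Tuple[int, int]:
    """Approximate the output (height, width) of an aspect-preserving resize."""
    # First try scaling so the shorter side equals min_dimension.
    scale = min_dimension / min(height, width)
    new_height, new_width = round(height * scale), round(width * scale)
    # If the longer side overshoots, scale so it equals max_dimension instead.
    if max(new_height, new_width) > max_dimension:
        scale = max_dimension / max(height, width)
        new_height, new_width = round(height * scale), round(width * scale)
    return new_height, new_width


def can_batch(shapes: Iterable[Tuple[int, int]]) -> bool:
    """Images can be stacked into one batch only if all shapes are identical."""
    return len(set(shapes)) <= 1


# Hypothetical source sizes (height, width): a 4:3 photo and a 2:1 panorama.
resized = [keep_aspect_ratio_shape(768, 1024), keep_aspect_ratio_shape(1000, 2000)]
print(resized, can_batch(resized))
```

Images with different aspect ratios come out with different shapes, which is why a batch size above 1 needs either a fixed output shape or padding to a common size.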
pb3skfrl5#
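I just removed data augmentation and it worked for me. If you like, you can also try removing the augmentations one at a time, but removing all of them is what worked for me.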