Chinese-CLIP: training and deploying with the ViT-H/14 script (after switching the ViT-B/16 script over to ViT-H/14, every training metric keeps dropping, and when the model is deployed with TensorRT and run on images, the prediction quality is worryingly poor)

mspsb9vt · posted 4 months ago in Other

Training script:


#!/usr/bin/env bash

# Guide:
# This script supports distributed training on multi-GPU workers (as well as single-worker training).
# Please set the options below according to the comments.
# For multi-GPU worker training, these options should be manually set for each worker.
# After setting the options, please run the script on each worker.
# Command: bash run_scripts/muge_finetune_vit-b-16_rbt-base.sh ${DATAPATH}

# Number of GPUs per GPU worker
GPUS_PER_NODE=1
# Number of GPU workers, for single-worker training, please set to 1
WORKER_CNT=1
# The IP address of the rank-0 worker, for single-worker training, please set to localhost
export MASTER_ADDR=localhost
# The port for communication
export MASTER_PORT=8514
# The rank of this worker, should be in {0, ..., WORKER_CNT-1}, for single-worker training, please set to 0
export RANK=0
export PYTHONPATH=${PYTHONPATH}:`pwd`/cn_clip/
DATAPATH=${1}

# data options
train_data=${DATAPATH}/datasets/xfxb/lmdb/train
val_data=${DATAPATH}/datasets/xfxb/lmdb/valid # if val_data is not specified, validation will be automatically disabled

# restore options
resume=${DATAPATH}/pretrained_weights/clip_cn_vit-h-14.pt # or specify your custom ckpt path to resume
reset_data_offset="--reset-data-offset"
reset_optimizer="--reset-optimizer"

# output options
output_base_dir=${DATAPATH}/experiments/
name=muge_finetune_vit-h-14_roberta-base_bs128_8gpu
save_step_frequency=999999 # disable it
save_epoch_frequency=1
log_interval=1
report_training_batch_acc="--report-training-batch-acc"
# report_training_batch_acc=""

# training hyper-params
context_length=52
warmup=100
batch_size=32
valid_batch_size=32
accum_freq=1
lr=5e-5
wd=0.001
max_epochs=30 # or you can alternatively specify --max-steps
valid_step_interval=150
valid_epoch_interval=1
vision_model=ViT-H-14
text_model=RoBERTa-wwm-ext-large-chinese
use_augment="--use-augment"
# use_augment=""

/home/xfxb/anaconda3/envs/clip/bin/python -m torch.distributed.launch --use_env --nproc_per_node=${GPUS_PER_NODE} --nnodes=${WORKER_CNT} --node_rank=${RANK} \
          --master_addr=${MASTER_ADDR} --master_port=${MASTER_PORT} cn_clip/training/main.py \
          --train-data=${train_data} \
          --val-data=${val_data} \
          --resume=${resume} \
          ${reset_data_offset} \
          ${reset_optimizer} \
          --name=${name} \
          --save-step-frequency=${save_step_frequency} \
          --save-epoch-frequency=${save_epoch_frequency} \
          --log-interval=${log_interval} \
          ${report_training_batch_acc} \
          --context-length=${context_length} \
          --warmup=${warmup} \
          --batch-size=${batch_size} \
          --valid-batch-size=${valid_batch_size} \
          --valid-step-interval=${valid_step_interval} \
          --valid-epoch-interval=${valid_epoch_interval} \
          --accum-freq=${accum_freq} \
          --lr=${lr} \
          --wd=${wd} \
          --max-epochs=${max_epochs} \
          --vision-model=${vision_model} \
          ${use_augment} \
          --text-model=${text_model}
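One way to tell whether the poor TensorRT predictions come from the export rather than the fine-tuning itself (not from the thread; a hedged diagnostic sketch) is to run the same images through the PyTorch checkpoint and through the TensorRT engine, dump both feature matrices, and compare them row by row with cosine similarity. Everything below is illustrative: synthetic arrays stand in for the two feature dumps.

```python
import numpy as np

def pairwise_cosine(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Cosine similarity between corresponding rows of a and b."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return (a * b).sum(axis=1)

# In practice, feats_pt / feats_trt would be the image features produced by
# the fine-tuned PyTorch checkpoint and by the TensorRT engine on the SAME
# batch of images (e.g. saved with np.save). Synthetic data stands in here.
rng = np.random.default_rng(0)
feats_pt = rng.standard_normal((8, 1024)).astype(np.float32)
feats_trt = feats_pt + 1e-3 * rng.standard_normal((8, 1024)).astype(np.float32)

sims = pairwise_cosine(feats_pt, feats_trt)
print(f"min={sims.min():.4f} mean={sims.mean():.4f}")
```

If the export is faithful, the per-image similarities should sit very close to 1.0; values noticeably below that point to a conversion or precision problem (FP16 overflow is a known risk in very large ViT backbones) rather than to the fine-tuned weights.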
nom7f22z (answer 2):

Same question here — why does this happen? The metrics drop across the board. Fine-tuning ViT-H/14 on the Flickr30k-CNA dataset with ViT-B/16's hyperparameters gives results much worse than ViT-B/16.
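Not an answer from the thread, but a commonly tried adjustment when a larger backbone regresses under hyperparameters tuned for a smaller one is to lower the learning rate and lengthen warmup, keeping the effective batch size up via gradient accumulation. A hypothetical tweak to the script's hyper-params section (values are illustrative, not verified on Flickr30k-CNA):

```shell
# training hyper-params (illustrative values for ViT-H-14, not from the thread)
warmup=500     # longer warmup to stabilize early fine-tuning of the larger model
lr=2e-5        # smaller peak LR than the 5e-5 used for ViT-B/16
batch_size=16  # ViT-H/14 may force a smaller per-GPU batch...
accum_freq=2   # ...so compensate with gradient accumulation
```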
