Ludwig: retraining a previously fine-tuned adapter

Posted by 6rqinv9w 5 months ago in Other

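Hello, this problem is probably caused by something going wrong with your model during training. If you can share more details, such as the model type, dataset, and code, I can help you resolve it more effectively.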

ludwig

Source: https://github.com/ludwig-ai/ludwig/issues/3932

4 answers

efzxgjgh #1

It looks like the problem was with the model I generated on my end.

Posted 5 months ago

kmbjn2e3 #2

Hello, we are trying to run incremental training but hit the error below.
Full log file:

Read the training data into a dataframe..
Reading the training data into a dataframe has been completed..
Setting up the HuggingFace API Token..
Huggingface token is added in the environment..
Load the Ludwig configuration YAML file..
Loading the Ludwig configuration YAML file has been completed..
Loading the Base Model..
Setting generation max_new_tokens to 512 to correspond with the max sequence length assigned to the output feature or the global max sequence length. This will ensure that the correct number of tokens are generated at inference time. To override this behavior, set `generation.max_new_tokens` to a different value in your Ludwig config.
Loading the trained Base Model has been completed..
Starting the Fine Tuning..

╒════════════════════════╕
│ EXPERIMENT DESCRIPTION │
╘════════════════════════╛

╒══════════════════╤═══════════════════════════════════════════════════════════════════════════════════════════════════╕
│ Experiment name  │ api_experiment                                                                                    │
├──────────────────┼───────────────────────────────────────────────────────────────────────────────────────────────────┤
│ Model name       │ run                                                                                               │
├──────────────────┼───────────────────────────────────────────────────────────────────────────────────────────────────┤
│ Output directory │ /home/ubuntu/results/api_experiment_run_19                                                        │
├──────────────────┼───────────────────────────────────────────────────────────────────────────────────────────────────┤
│ ludwig_version   │ '0.9.3'                                                                                           │
├──────────────────┼───────────────────────────────────────────────────────────────────────────────────────────────────┤
│ command          │ '/home/ubuntu/train_llama-2_7b_Log_Analytics_8bit_merged_v8/codebase/train_llama_using_ludwig.py' │
├──────────────────┼───────────────────────────────────────────────────────────────────────────────────────────────────┤
│ random_seed      │ 42                                                                                                │
├──────────────────┼───────────────────────────────────────────────────────────────────────────────────────────────────┤
│ data_format      │ "<class 'pandas.core.frame.DataFrame'>"                                                           │
├──────────────────┼───────────────────────────────────────────────────────────────────────────────────────────────────┤
│ torch_version    │ '2.1.0+cu121'                                                                                     │
├──────────────────┼───────────────────────────────────────────────────────────────────────────────────────────────────┤
│ compute          │ {   'arch_list': [   'sm_50',                                                                     │
│                  │                      'sm_60',                                                                     │
│                  │                      'sm_70',                                                                     │
│                  │                      'sm_75',                                                                     │
│                  │                      'sm_80',                                                                     │
│                  │                      'sm_86',                                                                     │
│                  │                      'sm_90'],                                                                    │
│                  │     'devices': {   0: {   'device_capability': (8, 6),                                            │
│                  │                           'device_properties': "_CudaDeviceProperties(name='NVIDIA "              │
│                  │                                                "A10G', major=8, minor=6, "                        │
│                  │                                                'total_memory=22723MB, '                           │
│                  │                                                'multi_processor_count=80)',                       │
│                  │                           'gpu_type': 'NVIDIA A10G'}},                                            │
│                  │     'gencode_flags': '-gencode compute=compute_50,code=sm_50 -gencode '                           │
│                  │                      'compute=compute_60,code=sm_60 -gencode '                                    │
│                  │                      'compute=compute_70,code=sm_70 -gencode '                                    │
│                  │                      'compute=compute_75,code=sm_75 -gencode '                                    │
│                  │                      'compute=compute_80,code=sm_80 -gencode '                                    │
│                  │                      'compute=compute_86,code=sm_86 -gencode '                                    │
│                  │                      'compute=compute_90,code=sm_90',                                             │
│                  │     'gpus_per_node': 1,                                                                           │
│                  │     'num_nodes': 1}                                                                               │
╘══════════════════╧═══════════════════════════════════════════════════════════════════════════════════════════════════╛

╒═══════════════╕
│ LUDWIG CONFIG │
╘═══════════════╛

User-specified config (with upgrades):

{   'adapter': {   'alpha': 16,
                   'bias_type': 'none',
                   'dropout': 0.05,
                   'postprocessor': {   'merge_adapter_into_base_model': True,
                                        'progressbar': True},
                   'pretrained_adapter_weights': None,
                   'r': 8,
                   'target_modules': None,
                   'type': 'lora'},
    'backend': {'type': 'local'},
    'base_model': '/home/ubuntu/results/api_experiment_run_15/model/model_weights',
    'input_features': [   {   'name': 'prompt',
                              'preprocessing': {'max_sequence_length': 1024},
                              'type': 'text'}],
    'ludwig_version': '0.9.3',
    'model_type': 'llm',
    'output_features': [   {   'name': 'Response',
                               'preprocessing': {'max_sequence_length': 512},
                               'type': 'text'}],
    'preprocessing': {'sample_ratio': 1.0},
    'prompt': {   'template': '### Instruction:\n'
                              '{Instruction}\n'
                              '\n'
                              '### Context:\n'
                              '{Context}\n'
                              '\n'
                              '### Response:\n'},
    'quantization': {'bits': 8},
    'trainer': {   'batch_size': 1,
                   'enable_gradient_checkpointing': True,
                   'epochs': 3,
                   'gradient_accumulation_steps': 1,
                   'learning_rate': 0.0001,
                   'learning_rate_scheduler': {'warmup_fraction': 0.01},
                   'max_batch_size': 1,
                   'type': 'finetune'}}

Full config saved to:
/home/ubuntu/results/api_experiment_run_19/api_experiment/model/model_hyperparameters.json

╒═══════════════╕
│ PREPROCESSING │
╘═══════════════╛

No cached dataset found at /home/ubuntu/eeff4f02cfeb11ee808f12abaaebd043.training.hdf5. Preprocessing the dataset.
Using full dataframe
Building dataset (it may take a while)
Loaded HuggingFace implementation of /home/ubuntu/results/api_experiment_run_15/model/model_weights tokenizer
Asking to truncate to max_length but no maximum length is provided and the model has no predefined maximum length. Default to no truncation.
Max length of feature 'None': 143 (without start and stop symbols)
Max sequence length is 143 for feature 'None'
Loaded HuggingFace implementation of /home/ubuntu/results/api_experiment_run_15/model/model_weights tokenizer
Asking to truncate to max_length but no maximum length is provided and the model has no predefined maximum length. Default to no truncation.
Max length of feature 'Response': 144 (without start and stop symbols)
Max sequence length is 144 for feature 'Response'
Loaded HuggingFace implementation of /home/ubuntu/results/api_experiment_run_15/model/model_weights tokenizer
Asking to truncate to max_length but no maximum length is provided and the model has no predefined maximum length. Default to no truncation.
Loaded HuggingFace implementation of /home/ubuntu/results/api_experiment_run_15/model/model_weights tokenizer
Asking to truncate to max_length but no maximum length is provided and the model has no predefined maximum length. Default to no truncation.
Building dataset: DONE
Writing preprocessed training set cache to /home/ubuntu/eeff4f02cfeb11ee808f12abaaebd043.training.hdf5
Writing preprocessed validation set cache to /home/ubuntu/eeff4f02cfeb11ee808f12abaaebd043.validation.hdf5
Writing preprocessed test set cache to /home/ubuntu/eeff4f02cfeb11ee808f12abaaebd043.test.hdf5
Writing train set metadata to /home/ubuntu/eeff4f02cfeb11ee808f12abaaebd043.meta.json

Dataset Statistics
╒════════════╤═══════════════╤════════════════════╕
│ Dataset    │   Size (Rows) │ Size (In Memory)   │
╞════════════╪═══════════════╪════════════════════╡
│ Training   │            31 │ 7.39 Kb            │
├────────────┼───────────────┼────────────────────┤
│ Validation │             4 │ 1.06 Kb            │
├────────────┼───────────────┼────────────────────┤
│ Test       │             9 │ 2.23 Kb            │
╘════════════╧═══════════════╧════════════════════╛

╒═══════╕
│ MODEL │
╘═══════╛

Warnings and other logs:
Loading large language model...
We will use 90% of the memory on device 0 for storing the model, and 10% for the buffer to avoid OOM. You can set `max_memory` in to a higher value to use more memory (at your own risk).

Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]
Loading checkpoint shards:  50%|█████     | 1/2 [01:15<01:15, 75.45s/it]
Loading checkpoint shards: 100%|██████████| 2/2 [01:42<00:00, 46.69s/it]
Loading checkpoint shards: 100%|██████████| 2/2 [01:42<00:00, 51.01s/it]
Done.
Loaded HuggingFace implementation of /home/ubuntu/results/api_experiment_run_15/model/model_weights tokenizer
Already found a `peft_config` attribute in the model. This will lead to having multiple adapters in the model. Make sure to know what you are doing!
==================================================
Trainable Parameter Summary For Fine-Tuning
Fine-tuning with adapter: lora
trainable params: 4,194,304 || all params: 6,742,609,920 || trainable%: 0.06220594176090199
==================================================
Gradient checkpointing enabled for training.

╒══════════╕
│ TRAINING │
╘══════════╛

Creating fresh model training run.
Training for 93 step(s), approximately 3 epoch(s).
Early stopping policy: 5 round(s) of evaluation, or 155 step(s), approximately 5 epoch(s).

Starting with step 0, epoch: 0

Training:   0%|          | 0/93 [00:00<?, ?it/s]/opt/conda/envs/ludwig_train_env/lib/python3.10/site-packages/bitsandbytes/autograd/_functions.py:322: UserWarning: MatMul8bitLt: inputs will be cast from torch.float32 to float16 during quantization
  warnings.warn(f"MatMul8bitLt: inputs will be cast from {A.dtype} to float16 during quantization")
Unable to complete the finetuning due to error Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument mat2 in method wrapper_CUDA_mm)

Training:   0%|          | 0/93 [00:00<?, ?it/s]

Can you help us resolve this?
Hello @all,
I am running into the same problem.
Has anyone solved it?
Thanks!

Posted 5 months ago
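
One way to narrow down the `cuda:0` and `cpu` mismatch in the log above is to list which tensors sit on which device before training starts. The sketch below is a minimal, hypothetical helper (the names `group_tensors_by_device` and `report_device_mismatch` are not from Ludwig or this thread). It assumes the object passed in behaves like a `torch.nn.Module`, that is, it exposes `named_parameters()` and `named_buffers()` yielding tensors with a `.device` attribute. It does not fix the error; it only shows where the stray tensors are.

```python
from collections import defaultdict


def group_tensors_by_device(module):
    """Map each device string to the names of parameters and buffers on it.

    Assumption: `module` behaves like a torch.nn.Module, i.e. it has
    named_parameters() and named_buffers(), each yielding (name, tensor)
    pairs whose tensors carry a `.device` attribute.
    """
    by_device = defaultdict(list)
    for name, tensor in module.named_parameters():
        by_device[str(tensor.device)].append(name)
    for name, tensor in module.named_buffers():
        by_device[str(tensor.device)].append(name)
    return dict(by_device)


def report_device_mismatch(module, preview=5):
    """Return a short text report, or "" if everything shares one device."""
    by_device = group_tensors_by_device(module)
    if len(by_device) <= 1:
        return ""
    lines = []
    for device, names in sorted(by_device.items()):
        sample = ", ".join(names[:preview])
        lines.append(f"{device}: {len(names)} tensors (e.g. {sample})")
    return "\n".join(lines)
```

Calling `print(report_device_mismatch(model))` on the loaded torch module right before training would show whether, for example, only the newly added LoRA weights ended up on `cpu`. That would be consistent with the `peft_config` warning in the log, though the thread does not confirm the cause.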

xxb16uws #3

Hello, can anyone help me resolve this error?

Posted 5 months ago

biswetbf #4

I am trying to do the same thing, but I get an error even earlier:

PyTorch version 2.2.0 available.
███████████████████████
█ █ █ █  ▜█ █ █ █ █   █
█ █ █ █ █ █ █ █ █ █ ███
█ █   █ █ █ █ █ █ █ ▌ █
█ █████ █ █ █ █ █ █ █ █
█     █  ▟█     █ █   █
███████████████████████
ludwig v0.9.3 - Train

Traceback (most recent call last):
  File "/home/azureuser/ludwig/venv/bin/ludwig", line 8, in <module>
    sys.exit(main())
  File "/home/azureuser/ludwig/venv/lib/python3.10/site-packages/ludwig/cli.py", line 197, in main
    CLI()
  File "/home/azureuser/ludwig/venv/lib/python3.10/site-packages/ludwig/cli.py", line 72, in __init__
    getattr(self, args.command)()
  File "/home/azureuser/ludwig/venv/lib/python3.10/site-packages/ludwig/cli.py", line 77, in train
    train.cli(sys.argv[2:])
  File "/home/azureuser/ludwig/venv/lib/python3.10/site-packages/ludwig/train.py", line 395, in cli
    train_cli(**vars(args))
  File "/home/azureuser/ludwig/venv/lib/python3.10/site-packages/ludwig/train.py", line 176, in train_cli
    model = LudwigModel(
  File "/home/azureuser/ludwig/venv/lib/python3.10/site-packages/ludwig/api.py", line 317, in __init__
    self.config_obj = ModelConfig.from_dict(self._user_config)
  File "/home/azureuser/ludwig/venv/lib/python3.10/site-packages/ludwig/schema/model_types/base.py", line 141, in from_dict
    config_obj: ModelConfig = schema.load(config)
  File "/home/azureuser/ludwig/venv/lib/python3.10/site-packages/marshmallow_dataclass/__init__.py", line 730, in load
    return clazz(**all_loaded)
  File "<string>", line 18, in __init__
  File "/home/azureuser/ludwig/venv/lib/python3.10/site-packages/ludwig/schema/model_types/base.py", line 73, in __post_init__
    set_llm_parameters(self)
  File "/home/azureuser/ludwig/venv/lib/python3.10/site-packages/ludwig/schema/model_types/utils.py", line 314, in set_llm_parameters
    _set_generation_max_new_tokens(config)
  File "/home/azureuser/ludwig/venv/lib/python3.10/site-packages/ludwig/schema/model_types/utils.py", line 401, in _set_generation_max_new_tokens
    max_possible_sequence_length = _get_maximum_possible_sequence_length(config, _DEFAULT_MAX_SEQUENCE_LENGTH)
  File "/home/azureuser/ludwig/venv/lib/python3.10/site-packages/ludwig/schema/model_types/utils.py", line 377, in _get_maximum_possible_sequence_length
    model_config = AutoConfig.from_pretrained(config.base_model)
  File "/home/azureuser/ludwig/venv/lib/python3.10/site-packages/transformers/models/auto/configuration_auto.py", line 1100, in from_pretrained
    config_dict, unused_kwargs = PretrainedConfig.get_config_dict(pretrained_model_name_or_path, **kwargs)
  File "/home/azureuser/ludwig/venv/lib/python3.10/site-packages/transformers/configuration_utils.py", line 634, in get_config_dict
    config_dict, kwargs = cls._get_config_dict(pretrained_model_name_or_path, **kwargs)
  File "/home/azureuser/ludwig/venv/lib/python3.10/site-packages/transformers/configuration_utils.py", line 689, in _get_config_dict
    resolved_config_file = cached_file(
  File "/home/azureuser/ludwig/venv/lib/python3.10/site-packages/transformers/utils/hub.py", line 356, in cached_file
    raise EnvironmentError(
OSError: /home/azureuser/ludwig/results/experiment_run_50/model/model_weights does not appear to have a file named config.json. Checkout 'https://huggingface.co//home/azureuser/ludwig/results/experiment_run_50/model/model_weights/None' for available files.

How did you get the model to load in the first place?

Posted 5 months ago
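
The `OSError` in the traceback above is raised by `AutoConfig.from_pretrained`, which expects a `config.json` in the `base_model` directory. A quick look at the directory contents shows whether it holds a full Hugging Face model or only adapter files. The sketch below is a hypothetical helper (`classify_weights_dir` is not a Ludwig or transformers function). It assumes the usual file names, `config.json` for a full model and `adapter_config.json` for a PEFT adapter; whether a given Ludwig run writes one or the other is not confirmed by this thread.

```python
from pathlib import Path


def classify_weights_dir(path):
    """Guess what kind of checkpoint a directory holds from its file names.

    Returns:
        "missing"      if the directory does not exist,
        "full_model"   if config.json is present,
        "adapter_only" if adapter_config.json is present but config.json is not,
        "unknown"      otherwise.
    """
    directory = Path(path)
    if not directory.is_dir():
        return "missing"
    if (directory / "config.json").is_file():
        return "full_model"
    if (directory / "adapter_config.json").is_file():
        return "adapter_only"
    return "unknown"
```

For the path in the traceback, a result of `adapter_only` would suggest the earlier run saved just the adapter rather than a merged model, in which case that directory cannot serve as `base_model` on its own.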
