Thursday, December 26, 2024

Getting Started with LLaMA-Factory: Installation and Configuration Guide

Image by author | DALL-E 3

Training large language models has always been tedious. Even with extensive support from public platforms like Hugging Face, the process involves setting up different scripts for each stage of the pipeline. From preparing data for pre-training, fine-tuning, or RLHF to configuring the model for quantization and LoRA, LLM training requires labor-intensive manual effort and tuning.

LLaMA-Factory, released in 2024, aims to solve exactly this problem. The GitHub repository makes it convenient to set up model training for every stage of the LLM lifecycle. From pre-training to SFT and even RLHF, it provides built-in support for configuring and training a wide range of recent LLMs.

Supported models and data formats

The repository supports a wide range of recent models, including LLaMA, LLaVA, Mixtral Mixture-of-Experts, Qwen, Phi, and Gemma, among others. The full list can be found here. It supports pre-training, SFT, and major RL techniques including DPO, PPO, and ORPO, and it enables tuning approaches ranging from full tuning and freeze tuning to LoRA, QLoRA, and agent tuning.

Moreover, the repository provides sample datasets for each stage of training. The sample datasets generally follow the Alpaca template, although the ShareGPT format is also supported. The Alpaca data formats are shown below to clarify how to set up your own proprietary data.

Note that when using your own data, you must add information about your dataset to the dataset_info.json file in the LLaMA-Factory/data folder.

Pre-training data

The data is stored in a JSON file, and only the text column is used for LLM training. To set up pre-training, your data must be in the format below.

[
  {"text": "document"},
  {"text": "document"}
]
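As a minimal sketch of producing this layout, the Python snippet below wraps raw document strings in the {"text": ...} records shown above and writes them to a JSON file. The function names and the sample documents are illustrative assumptions, not part of LLaMA-Factory.

```python
import json
import tempfile
from pathlib import Path


def build_pretraining_records(documents):
    """Wrap each non-empty document string in a {"text": ...} record."""
    return [{"text": doc.strip()} for doc in documents if doc and doc.strip()]


def write_pretraining_file(documents, path):
    """Write the records as a JSON array and return them."""
    records = build_pretraining_records(documents)
    Path(path).write_text(
        json.dumps(records, ensure_ascii=False, indent=2), encoding="utf-8"
    )
    return records


if __name__ == "__main__":
    # Hypothetical sample documents, written to a temporary directory.
    docs = ["First raw document.", "   ", "Second raw document.\n"]
    with tempfile.TemporaryDirectory() as tmp:
        target = Path(tmp) / "my_pretrain_data.json"
        write_pretraining_file(docs, target)
        print(target.read_text(encoding="utf-8"))
```

Blank or whitespace-only entries are dropped here so that the file never contains empty training documents; whether that filtering suits your corpus is a design choice.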

Supervised tuning data

SFT data uses three main fields: instruction, input, and output. As the template below indicates, instruction and output are required, while input is optional. The system and history fields can also be passed optionally and, if present in the dataset, are used during training.

The general alpaca format for SFT data is as follows:

[
  {
    "instruction": "human instruction (required)",
    "input": "human input (optional)",
    "output": "model response (required)",
    "system": "system prompt (optional)",
    "history": [
      ["human instruction in the first round (optional)", "model response in the first round (optional)"],
      ["human instruction in the second round (optional)", "model response in the second round (optional)"]
    ]
  }
]
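Before training on your own data, it can help to check that every record follows this layout. The following Python sketch is one possible validator; the function name and the exact checks are assumptions made for illustration and are not part of LLaMA-Factory.

```python
REQUIRED_KEYS = ("instruction", "output")
OPTIONAL_KEYS = ("input", "system", "history")


def validate_alpaca_record(record):
    """Return a list of problems found in one record (empty list means OK)."""
    problems = []
    # Required fields must be non-empty strings.
    for key in REQUIRED_KEYS:
        value = record.get(key)
        if not isinstance(value, str) or not value.strip():
            problems.append(f"missing or empty required field: {key}")
    # Flag keys that are not part of the Alpaca layout shown above.
    unknown = sorted(set(record) - set(REQUIRED_KEYS) - set(OPTIONAL_KEYS))
    if unknown:
        problems.append(f"unexpected fields: {unknown}")
    # History, when present, must be a list of [instruction, response] pairs.
    history = record.get("history", [])
    pairs_ok = isinstance(history, list) and all(
        isinstance(turn, (list, tuple)) and len(turn) == 2 for turn in history
    )
    if not pairs_ok:
        problems.append("history must be a list of [instruction, response] pairs")
    return problems


if __name__ == "__main__":
    good = {"instruction": "Translate to French", "input": "Hello", "output": "Bonjour"}
    bad = {"instruction": "Translate to French", "history": [["only one item"]]}
    print(validate_alpaca_record(good))
    print(validate_alpaca_record(bad))
```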

Reward modeling data

LLaMA-Factory supports training an LLM for preference alignment using RLHF. The data must provide two different responses to the same instruction, which together express the alignment preference.

The preferred response is passed under the chosen key, and the less preferred response under the rejected key. The data format is as follows:

[
  {
    "instruction": "human instruction (required)",
    "input": "human input (optional)",
    "chosen": "chosen answer (required)",
    "rejected": "rejected answer (required)"
  }
]
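A small helper can make it harder to build malformed preference pairs. The sketch below, with a hypothetical function name and sample strings, assembles one record in the layout above and rejects pairs whose two answers are identical, since such a pair carries no preference signal.

```python
def make_preference_record(instruction, chosen, rejected, input_text=""):
    """Build one preference record with chosen/rejected answers."""
    if not instruction.strip():
        raise ValueError("instruction is required")
    if not chosen.strip() or not rejected.strip():
        raise ValueError("both chosen and rejected answers are required")
    if chosen.strip() == rejected.strip():
        raise ValueError("chosen and rejected answers must differ")
    return {
        "instruction": instruction,
        "input": input_text,
        "chosen": chosen,
        "rejected": rejected,
    }


if __name__ == "__main__":
    record = make_preference_record(
        instruction="Explain what LoRA is in one sentence.",
        chosen="LoRA fine-tunes a model by training small low-rank adapter matrices.",
        rejected="LoRA is a kind of GPU.",
    )
    print(record)
```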

Configuration and installation

The GitHub repository supports straightforward installation using a setup.py file and a requirements file. However, it is advisable to use a clean Python environment when setting up the repository to avoid dependency and package conflicts.

Although Python 3.8 is the minimum requirement, installing Python 3.11 or later is recommended. Clone the repository from GitHub using the commands below:

git clone --depth 1 https://github.com/hiyouga/LLaMA-Factory.git
cd LLaMA-Factory

We can now create a fresh Python environment using the commands below:

python3.11 -m venv venv
source venv/bin/activate

Now we need to install the required packages and dependencies using the setup.py file. We can install them using the command below:

pip install -e ".[torch,metrics]"

This installs all required dependencies, including Torch, TRL, Accelerate, and others. To verify the installation, we should now be able to use the LLaMA-Factory command-line interface. Running the command below should display usage help information in the terminal, as shown in the image.

XXXXXX
If the installation was successful, this information should be printed on the command line.
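For a scripted check of the environment, the Python sketch below verifies that the interpreter meets the minimum version mentioned earlier and that the llamafactory-cli executable is on the PATH of the active virtual environment. It only looks up the executable and does not call any subcommand; the function name is an assumption for illustration.

```python
import shutil
import sys

MIN_PYTHON = (3, 8)  # minimum version stated in this guide


def check_environment(cli_name="llamafactory-cli"):
    """Report whether Python is recent enough and where the CLI is installed."""
    return {
        "python_ok": sys.version_info >= MIN_PYTHON,
        # shutil.which returns the full path, or None if the CLI is not found.
        "cli_path": shutil.which(cli_name),
    }


if __name__ == "__main__":
    status = check_environment()
    print(f"Python version OK: {status['python_ok']}")
    if status["cli_path"] is None:
        print("llamafactory-cli not found; is the virtual environment active?")
    else:
        print(f"llamafactory-cli found at {status['cli_path']}")
```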

Fine-tuning LLM

We can now start training an LLM! It is as simple as writing a configuration file and running a shell command.

Note that a GPU is required for LLM training with LLaMA-Factory.

We choose a smaller model to save GPU memory and training resources. In this example, we perform LoRA-based SFT on Phi-3.5-mini-instruct. We use a YAML configuration file, but a JSON file works as well.

Create a new config.yaml file as follows. This configuration file is for SFT training; more examples for other methods can be found in the examples directory.

### model
model_name_or_path: microsoft/Phi-3.5-mini-instruct

### method
stage: sft
do_train: true
finetuning_type: lora
lora_target: all

### dataset
dataset: alpaca_en_demo
template: phi
cutoff_len: 1024
max_samples: 1000
overwrite_cache: true
preprocessing_num_workers: 16

### output
output_dir: saves/phi-3/lora/sft
logging_steps: 10
save_steps: 500
plot_loss: true
overwrite_output_dir: true

### train
per_device_train_batch_size: 1
gradient_accumulation_steps: 8
learning_rate: 1.0e-4
num_train_epochs: 3.0
lr_scheduler_type: cosine
warmup_ratio: 0.1
bf16: true
ddp_timeout: 180000000

### eval
val_size: 0.1
per_device_eval_batch_size: 1
eval_strategy: steps
eval_steps: 500
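
To get a feel for what the training values in this configuration imply, the following Python sketch estimates the effective batch size and the number of optimizer steps. It assumes a single GPU, that the val_size fraction is taken out of max_samples before training, and that partial batches are rounded up; the trainer's exact rounding may differ, so treat the numbers as approximate. The function name is hypothetical.

```python
import math


def training_schedule(max_samples, val_size, per_device_batch, grad_accum,
                      epochs, warmup_ratio, num_gpus=1):
    """Estimate batch size and step counts from the config values."""
    eval_samples = int(max_samples * val_size)
    train_samples = max_samples - eval_samples
    # Samples consumed per optimizer update across all devices.
    effective_batch = per_device_batch * grad_accum * num_gpus
    steps_per_epoch = math.ceil(train_samples / effective_batch)
    total_steps = int(steps_per_epoch * epochs)
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    return {
        "train_samples": train_samples,
        "effective_batch_size": effective_batch,
        "steps_per_epoch": steps_per_epoch,
        "total_steps": total_steps,
        "warmup_steps": warmup_steps,
    }


if __name__ == "__main__":
    # Values copied from the configuration above.
    schedule = training_schedule(
        max_samples=1000, val_size=0.1, per_device_batch=1,
        grad_accum=8, epochs=3.0, warmup_ratio=0.1,
    )
    for name, value in schedule.items():
        print(f"{name}: {value}")
```

Under these assumptions the run has roughly 339 optimizer steps, which is fewer than the save_steps and eval_steps values of 500, so periodic checkpoints and evaluations may not trigger before training ends. Lowering those two values is worth considering for short runs.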

Although the file is largely self-explanatory, two parts of it deserve closer attention.

Configuring the dataset for training

The dataset name you provide is a key parameter. Before training, further details about the dataset must be added to the dataset_info.json file in the data directory. These details include the actual path of the data file, the data format it follows, and the columns to be used from the data.

In this tutorial, we use the alpaca_en_demo dataset, which contains English-language questions and answers. You can view the entire dataset here.

The data is then loaded automatically based on the information provided. Moreover, the dataset key accepts a comma-separated list of dataset names; when a list is given, all of the datasets are loaded and used for training.
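
Registering a custom dataset can be scripted. The Python sketch below adds an Alpaca-style entry to a dataset_info.json file and builds the comma-separated value for the dataset key. The entry layout (a file_name plus a columns mapping of prompt, query, and response) reflects my reading of the repository's data documentation and should be checked against the version you install; the dataset name my_sft_data and the helper names are hypothetical.

```python
import json
import tempfile
from pathlib import Path


def register_alpaca_dataset(info_path, name, file_name):
    """Add or update one Alpaca-style entry in dataset_info.json."""
    info_path = Path(info_path)
    info = {}
    if info_path.exists():
        info = json.loads(info_path.read_text(encoding="utf-8"))
    # Assumed entry layout: data file plus a mapping to the Alpaca columns.
    info[name] = {
        "file_name": file_name,
        "columns": {
            "prompt": "instruction",
            "query": "input",
            "response": "output",
        },
    }
    info_path.write_text(
        json.dumps(info, ensure_ascii=False, indent=2), encoding="utf-8"
    )
    return info[name]


def dataset_value(names):
    """Build the comma-separated value for the `dataset` config key."""
    return ",".join(names)


if __name__ == "__main__":
    # Demonstrated on a temporary file rather than the real data directory.
    with tempfile.TemporaryDirectory() as tmp:
        info_file = Path(tmp) / "dataset_info.json"
        register_alpaca_dataset(info_file, "my_sft_data", "my_sft_data.json")
        print(info_file.read_text(encoding="utf-8"))
    print("dataset:", dataset_value(["alpaca_en_demo", "my_sft_data"]))
```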

Configuring model training

Changing the training type in LLaMA-Factory is as simple as changing a configuration parameter. As shown below, only the following parameters are needed to set up LoRA-based SFT for an LLM.

### method
stage: sft
do_train: true
finetuning_type: lora
lora_target: all

We can replace SFT with pre-training or reward modeling by using the corresponding configuration files available in the examples directory. Converting SFT to reward modeling is a matter of changing these parameters.
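
To illustrate how little changes between stages, the Python sketch below derives a reward-modeling configuration from the SFT method settings by swapping the stage, dataset, and output directory. The stage value rm, the dataset name my_preference_data, and the output path are assumptions for illustration; consult the files in the examples directory for the exact values your version expects.

```python
def switch_stage(config, stage, dataset, output_dir):
    """Return a copy of the config with a new stage, dataset, and output dir."""
    new_config = dict(config)  # leave the original config untouched
    new_config["stage"] = stage
    new_config["dataset"] = dataset
    new_config["output_dir"] = output_dir
    return new_config


if __name__ == "__main__":
    sft_config = {
        "stage": "sft",
        "do_train": True,
        "finetuning_type": "lora",
        "lora_target": "all",
        "dataset": "alpaca_en_demo",
        "output_dir": "saves/phi-3/lora/sft",
    }
    # Hypothetical preference dataset registered in dataset_info.json.
    rm_config = switch_stage(
        sft_config,
        stage="rm",
        dataset="my_preference_data",
        output_dir="saves/phi-3/lora/reward",
    )
    for key, value in rm_config.items():
        print(f"{key}: {value}")
```

Writing to a separate output directory keeps the reward-model adapter from overwriting the SFT adapter.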

Start your LLM training

Now everything is set up. All that remains is to run the shell command, passing the configuration file as a command-line argument.

Run the command below:

llamafactory-cli train config.yaml

The program automatically sets up all required datasets, models, and pipelines for training. It took me 10 minutes to train one epoch on a Tesla T4 GPU. The output model is saved in the output directory specified in config.yaml.
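
If you prefer to launch training from a script, for example as part of a larger pipeline, the same command can be wrapped with Python's subprocess module. This is a minimal sketch with hypothetical helper names; it simply runs the train command shown above and raises an error if the CLI is missing or exits with a failure.

```python
import shutil
import subprocess


def build_train_command(config_path, cli_name="llamafactory-cli"):
    """Return the argument list for the training command shown above."""
    return [cli_name, "train", str(config_path)]


def run_training(config_path):
    """Run training and return the exit code; raise if the CLI is missing."""
    command = build_train_command(config_path)
    if shutil.which(command[0]) is None:
        raise FileNotFoundError(f"{command[0]} not found on PATH")
    # check=True raises CalledProcessError on a non-zero exit status.
    return subprocess.run(command, check=True).returncode


if __name__ == "__main__":
    print("Would run:", " ".join(build_train_command("config.yaml")))
    # Uncomment to start training on a machine with a GPU:
    # run_training("config.yaml")
```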

Inference

Inference is even simpler than training. We need a configuration file similar to the training one, containing the base model and the path to the trained LoRA adapter.

Create a new infer_config.yaml file and provide values for the following keys:

model_name_or_path: microsoft/Phi-3.5-mini-instruct
adapter_name_or_path: saves/phi-3/lora/sft/  # Path to the trained adapter (output_dir from training)
template: phi
finetuning_type: lora

We can talk to the trained model directly on the command line with this command:

llamafactory-cli chat infer_config.yaml

This loads the model with the trained adapter, and you can chat with it from the command line, much as you would with other tools like Ollama.
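
A common source of inference errors is a mismatch between the training and inference configurations, such as an adapter path that does not point at the training output directory. The Python sketch below compares the two configurations as plain dictionaries; the function name and the set of compared keys are assumptions chosen for illustration.

```python
import os

# Keys that should normally be identical in both configurations.
SHARED_KEYS = ("model_name_or_path", "template", "finetuning_type")


def find_mismatches(train_cfg, infer_cfg):
    """Return a list of inconsistencies between the two configs."""
    problems = []
    for key in SHARED_KEYS:
        if key in train_cfg and key in infer_cfg and train_cfg[key] != infer_cfg[key]:
            problems.append(f"{key}: {train_cfg[key]!r} != {infer_cfg[key]!r}")
    # Normalize paths so a trailing slash does not count as a difference.
    out_dir = os.path.normpath(train_cfg.get("output_dir", ""))
    adapter = os.path.normpath(infer_cfg.get("adapter_name_or_path", ""))
    if out_dir != adapter:
        problems.append(f"adapter path {adapter!r} != output_dir {out_dir!r}")
    return problems


if __name__ == "__main__":
    train_cfg = {
        "model_name_or_path": "microsoft/Phi-3.5-mini-instruct",
        "finetuning_type": "lora",
        "output_dir": "saves/phi-3/lora/sft",
    }
    infer_cfg = {
        "model_name_or_path": "microsoft/Phi-3.5-mini-instruct",
        "finetuning_type": "lora",
        "adapter_name_or_path": "saves/phi-3/lora/sft/",
    }
    print(find_mismatches(train_cfg, infer_cfg) or "configs are consistent")
```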

An example response on the terminal is shown in the image below:

Inference result

WebUI

If that wasn’t simple enough, LLaMA-Factory also offers a no-code option for training and inference through LlamaBoard.

You can launch the GUI using the bash command:

This launches the web GUI on a local server, as shown in the image. We can choose the model and training parameters, load and preview the dataset, set hyperparameters, and run training and inference, all within the GUI.

Screenshot of the LlamaBoard web interface

Conclusion

LLaMA-Factory is quickly gaining popularity, with more than 30,000 stars on GitHub at the time of writing. It makes configuring and training an LLM from scratch considerably easier, eliminating the need to manually set up the training pipeline for each method.

It supports a wide range of recent methods and models, and claims to be 3.7 times faster than ChatGLM's P-Tuning while using less GPU memory. This makes it easier for everyday users and enthusiasts to train LLMs with minimal code.

Kanwal Mehreen is a machine learning engineer and technical writer with a deep passion for data science and the intersection of artificial intelligence and medicine. She is co-author of the e-book “Maximizing Productivity with ChatGPT”. As a 2022 Google Generation Scholar for APAC, she promotes diversity and academic excellence. She is also recognized as a Teradata Diversity in Tech Scholar, a Mitacs Globalink Research Scholar, and a Harvard WeCode Scholar. Kanwal is a staunch advocate for change and founded FEMCodes to empower women in STEM fields.
