Reformatted Alignment
The paper introduces REALIGN, a novel approach for enhancing the quality of existing instruction data so that large language models (LLMs) better align with human values. REALIGN rewrites the responses of an instruction dataset into a format that matches pre-established criteria and collected evidence, minimizing the manual annotation effort, hallucination, and scalability issues that affect other data-improvement approaches. The method is orthogonal to existing alignment techniques and yields significant improvements in LLM performance without requiring new data or advanced training techniques.
Method Overview
REALIGN operates in three main steps: criteria definition, where format preferences for various scenarios are defined; retrieval augmentation, which broadens the knowledge base for knowledge-intensive tasks; and reformatting, which rewrites each response so it matches the criteria and the retrieved evidence. Here’s the overview:
REALIGN overview
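To make the pipeline concrete, here is a minimal sketch of the three steps for a single example, assuming access to an OpenAI-style chat model. The criteria text, the `retrieve_evidence` stub, and the prompt are illustrative assumptions, not the authors’ actual implementation:

```python
# Hypothetical sketch of the three REALIGN steps for one example; the
# criteria, retrieval stub, and prompt are illustrative, not the paper's code.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Step 1: criteria definition -- hand-written format preferences per scenario.
CRITERIA = {
    "math": "Reason step by step, then give the final answer on its own "
            "line prefixed with 'Answer:'.",
    "open_qa": "Answer directly first, then support the answer with evidence.",
}

def retrieve_evidence(query: str) -> str:
    """Step 2: retrieval augmentation (stub). A real system would query a
    search engine or knowledge base for knowledge-intensive tasks."""
    return ""  # placeholder: no external evidence

def realign(query: str, response: str, task: str) -> str:
    """Step 3: reformatting -- rewrite the response to satisfy the criteria
    and incorporate the evidence, without changing what it says."""
    evidence = retrieve_evidence(query)
    prompt = (
        f"Rewrite the response to follow these criteria:\n{CRITERIA[task]}\n"
        f"Evidence (may be empty):\n{evidence}\n\n"
        f"Query: {query}\nOriginal response: {response}\n\n"
        "Rewritten response:"
    )
    completion = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
    )
    return completion.choices[0].message.content
```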
So, given a dataset of (query, response) pairs, REALIGN reformats each response and produces a new dataset, which is then used for fine-tuning instead of the original one. Here’s a qualitative example of how a response looks before and after applying REALIGN:
Original response (left) vs REALIGN response (right)
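Continuing the sketch above, the dataset-level transformation is just a map of `realign` over the (query, response) pairs; the resulting file then replaces the original data in an otherwise unchanged fine-tuning setup (the JSONL layout here is an assumption for illustration):

```python
import json

# Hypothetical usage: reformat every response in an instruction dataset and
# save the result; fine-tuning then runs on this file instead of the original.
dataset = [
    {"query": "What is 17 * 24?", "response": "408", "task": "math"},
    # ... more (query, response) pairs
]

with open("realigned_train.jsonl", "w") as f:
    for example in dataset:
        example["response"] = realign(example["query"], example["response"],
                                      example["task"])
        f.write(json.dumps(example) + "\n")
```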
Results
Experiments show that REALIGN significantly improves LLMs in terms of general alignment, math reasoning, factuality, and readability. For instance, it boosts the math reasoning accuracy of LLaMA-2-13B on GSM8K from 46.77% to 56.63%.
Math reasoning results
Moreover, using only 5% of the REALIGN data already yields a 67% boost in general alignment ability, indicating the method’s efficiency and effectiveness. These results highlight the potential of REALIGN to enhance the performance of LLMs across various tasks without additional data or complex training methodologies.
Conclusion
REALIGN presents a simple and efficient method for improving the alignment of LLMs with human values by reformatting existing instruction data. For more details, please consult the full paper and the code.
Congrats to the authors for their great work!
Fan, Run-Ze, et al. “Reformatted Alignment.” arXiv preprint arXiv:2402.12219 (2024).