Reinforcement Learning from Reflective Feedback (RLRF): Aligning and Advancing LLMs through Fine-Grained Self-Reflection
Authors: Kyungjae Lee, Dasol Hwang, Sunghyun Park, Youngsoo Jang, Moontae Lee
Summary: Despite the promise of RLHF in aligning LLMs with human preferences, it often leads to superficial alignment, prioritizing stylistic changes over improving the downstream performance of LLMs. Underspecified preferences can obscure the direction in which to align the models, and a lack of exploration restricts the identification of desirable outputs for improving them. To overcome these challenges, we propose a novel framework: Reinforcement Learning from Reflective Feedback (RLRF), which leverages fine-grained feedback based on detailed criteria to improve the core capabilities of LLMs. RLRF employs a self-reflection mechanism to systematically explore and refine LLM responses, then fine-tunes the models via an RL algorithm using these promising responses. Our experiments across Just-Eval, Factuality, and Mathematical Reasoning demonstrate the efficacy and transformative potential of RLRF beyond superficial surface-level adjustment.
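
The abstract describes a two-stage loop: explore candidate responses, score and refine them via fine-grained self-reflection, then pass the promising ones to RL fine-tuning. The sketch below illustrates that loop under stated assumptions; it is not the authors' implementation. The function names (`generate`, `reflect`, `refine`), the criteria rubric, and the acceptance threshold are all hypothetical placeholders, and the RL fine-tuning stage itself is omitted.

```python
"""A minimal, hypothetical sketch of the exploration-and-refinement stage
of RLRF as described in the abstract. All names and the scoring rubric
are illustrative assumptions, not the paper's actual implementation."""

from dataclasses import dataclass
from typing import Callable, Dict, List

# Assumed fine-grained criteria that reflective feedback scores against.
CRITERIA = ["factuality", "reasoning", "instruction-following"]


@dataclass
class Feedback:
    scores: Dict[str, float]  # criterion -> score in [0, 1]
    critique: str             # natural-language reflection on the response

    @property
    def total(self) -> float:
        """Average score across all criteria."""
        return sum(self.scores.values()) / len(self.scores)


def rlrf_explore(
    prompt: str,
    generate: Callable[[str], str],               # policy LLM: prompt -> response
    reflect: Callable[[str, str], Feedback],      # self-reflection: (prompt, response) -> feedback
    refine: Callable[[str, str, Feedback], str],  # revise a response using its critique
    n_candidates: int = 4,
    threshold: float = 0.8,
) -> List[str]:
    """Sample candidate responses, refine low-scoring ones via self-reflection,
    and return the promising responses for a later RL fine-tuning stage."""
    promising: List[str] = []
    for _ in range(n_candidates):
        response = generate(prompt)
        feedback = reflect(prompt, response)
        if feedback.total < threshold:
            # Refinement step: the model revises its own output,
            # conditioned on the fine-grained critique.
            response = refine(prompt, response, feedback)
            feedback = reflect(prompt, response)
        if feedback.total >= threshold:
            promising.append(response)
    return promising

# The promising responses would then serve as training signal for an RL
# fine-tuning algorithm (e.g., PPO or DPO); that stage is not sketched here.
```

In this reading, self-reflection substitutes for human preference labels: the detailed per-criterion scores give the policy a concrete direction to improve in, addressing the underspecified-preference and limited-exploration issues the abstract raises.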