An Exploration of the Application of Reinforcement Learning in Large Model Training

Fuguang Huang

Abstract


This paper conducts a systematic study of the core principles of reinforcement learning and its application mechanisms in the training of large models. First, the paper outlines the foundational framework of reinforcement learning based on Markov decision processes, which includes value functions, policy optimization, and three major algorithm types. Building on this foundation, it analyzes the integrated architecture of large models and reinforcement learning. Starting from three key modules (agents, environment interaction, and reward design), the paper examines the adaptation logic of key algorithms such as RLHF, PPO, and Actor-Critic, and explores their application stages and optimization pathways throughout the entire training process of large models. The study demonstrates that reinforcement learning can effectively align large models with human preferences, thereby enhancing output quality and training stability. Through framework optimization, algorithmic improvements, and multidimensional validation, a closed-loop optimization system can be established, significantly improving the decision-making capabilities and generalization performance of large models. This provides a feasible technical pathway for the efficient training and alignment optimization of large models.

Keywords


Reinforcement Learning; Large Model Training; Application Research





DOI: http://dx.doi.org/10.70711/aitr.v3i11.9349
