dTRPO uses a two-stage trajectory reduction technique to enable efficient policy optimization of diffusion large language models (dLLMs). This repo provides the training code, scripts, and configs ...