(trl trainer code below) how do you scale language model search? can guided resampling alone, without training the model whatsoever, steer a language model towards solving new problems that it usually cannot solve? turns out the answer is sorta yes. this has important implications for post-training and safety research.