Deriving the DPO Objective Under the Plackett-Luce Model
#aifinetuning #directpreferenceoptimization #reinforcementlearning #languagemodels #languagemodeloptimization #rewardmodeling #bradleyterrymodel #plackettlucemodel
https://hackernoon.com/deriving-the-dpo-objective-under-the-plackett-luce-model
Hackernoon
Learn how the Plackett-Luce model is used to derive the DPO objective.