CarperAI is working on human preference learning at scale, using an approach that combines representation learning with reinforcement learning (RL).