PKU-Alignment/safe-rlhf
Safe-RLHF: Constrained Value Alignment via Safe Reinforcement Learning from Human Feedback
Language: Python
#ai_safety #alpaca #datasets #deepspeed #large_language_models #llama #llm #llms #reinforcement_learning #reinforcement_learning_from_human_feedback #rlhf #safe_reinforcement_learning #safe_reinforcement_learning_from_human_feedback #safe_rlhf #safety #transformers #vicuna
Stars: 279 Issues: 0 Forks: 14
https://github.com/PKU-Alignment/safe-rlhf
Gen-Verse/OpenClaw-RL
OpenClaw-RL: Personalize openclaw simply by talking to it
Language: TypeScript
#async #grpo #memory_systems #on_policy_distillation #open_claw #openclaw_skills #rlhf #sglang #skill_learning #slime
Stars: 672 Issues: 3 Forks: 60
https://github.com/Gen-Verse/OpenClaw-RL