Organizations

@thu-ml @sail-sg @MLNLP-World

Code for "Echo Chamber: RL Post-training Amplifies Behaviors Learned in Pretraining"

Python 6 Updated Apr 9, 2025

Safety at Scale: A Comprehensive Survey of Large Model Safety

145 3 Updated Feb 19, 2025
Python 11 2 Updated Apr 14, 2025

Understanding R1-Zero-Like Training: A Critical Perspective

Python 845 39 Updated Apr 14, 2025

An Open-source RL System from ByteDance Seed and Tsinghua AIR

1,099 46 Updated Apr 10, 2025

The official repository for SkyLadder: Better and Faster Pretraining via Context Window Scheduling

Python 29 Updated Mar 20, 2025

The official implementation of "LightTransfer: Your Long-Context LLM is Secretly a Hybrid Model with Effortless Adaptation"

17 Updated Mar 17, 2025

Official Implementation of "Error Analyses of Auto-Regressive Video Diffusion Models: A Unified Framework"

5 Updated Mar 12, 2025

V1: Toward Multimodal Reasoning by Designing Auxiliary Task

Python 28 Updated Apr 14, 2025

Reproducing R1 for Code with Reliable Rewards

Python 167 11 Updated Apr 7, 2025

🌾 OAT: A research-friendly framework for LLM online alignment, including preference learning, reinforcement learning, etc.

Python 319 20 Updated Apr 8, 2025

QwQ is the reasoning model series developed by Qwen team, Alibaba Cloud.

Python 451 17 Updated Mar 27, 2025

Exploration of automated dataset selection approaches at large scales.

Python 37 2 Updated Mar 4, 2025

The official repository of 'Unnatural Languages Are Not Bugs but Features for LLMs'

Python 13 Updated Mar 5, 2025

Search-R1: An Efficient, Scalable RL Training Framework for Reasoning & Search Engine Calling interleaved LLM based on veRL

Python 1,846 135 Updated Apr 11, 2025

RAGEN leverages reinforcement learning to train LLM reasoning agents in interactive, stochastic environments.

Python 1,358 98 Updated Apr 10, 2025

Awesome Large Reasoning Model (LRM) Safety. This repository collects security-related research on large reasoning models such as DeepSeek-R1 and OpenAI o1, which are currently very popular.

Python 61 4 Updated Apr 14, 2025

LongSpec: Long-Context Speculative Decoding with Efficient Drafting and Verification

Python 44 Updated Mar 2, 2025

My learning notes/codes for ML SYS.

Python 1,787 109 Updated Apr 14, 2025

Fully open data curation for reasoning models

Python 1,697 146 Updated Apr 7, 2025

Sky-T1: Train your own O1 preview model within $450

Python 3,192 322 Updated Apr 8, 2025

🤗 Evaluate: A library for easily evaluating machine learning models and datasets.

Python 2,178 272 Updated Jan 10, 2025

A framework for few-shot evaluation of language models.

Python 8,619 2,301 Updated Apr 10, 2025

Contains all assets to run with Moonshot Library (Connectors, Datasets and Metrics)

Python 29 29 Updated Apr 6, 2025

A lightweight reproduction of DeepSeek-R1-Zero with in-depth analysis of self-reflection behavior.

Python 222 10 Updated Apr 4, 2025

verl: Volcano Engine Reinforcement Learning for LLMs

Python 6,567 699 Updated Apr 14, 2025

Improving Your Model Ranking on Chatbot Arena by Vote Rigging

Python 20 2 Updated Feb 25, 2025
Python 2,633 238 Updated Apr 10, 2025

🔱 Sailor2: Sailing in South-East Asia with Inclusive Multilingual LLMs

55 3 Updated Mar 21, 2025