- Sea AI Lab
- Singapore
- https://p2333.github.io/
- @TianyuPang1
Stars
Code for "Echo Chamber: RL Post-training Amplifies Behaviors Learned in Pretraining"
Safety at Scale: A Comprehensive Survey of Large Model Safety
Understanding R1-Zero-Like Training: A Critical Perspective
An Open-source RL System from ByteDance Seed and Tsinghua AIR
sail-sg / SkyLadder (forked from jzhang38/TinyLlama)
The official repository for SkyLadder: Better and Faster Pretraining via Context Window Scheduling
The official implementation of "LightTransfer: Your Long-Context LLM is Secretly a Hybrid Model with Effortless Adaptation"
Official Implementation of "Error Analyses of Auto-Regressive Video Diffusion Models: A Unified Framework"
V1: Toward Multimodal Reasoning by Designing Auxiliary Task
🌾 OAT: A research-friendly framework for LLM online alignment, including preference learning, reinforcement learning, etc.
QwQ is the reasoning model series developed by Qwen team, Alibaba Cloud.
Exploration of automated dataset selection approaches at large scales.
The official repository of "Unnatural Languages Are Not Bugs but Features for LLMs"
Search-R1: An Efficient, Scalable RL Training Framework for LLMs that Interleave Reasoning and Search Engine Calls, based on veRL
RAGEN leverages reinforcement learning to train LLM reasoning agents in interactive, stochastic environments.
Awesome Large Reasoning Model (LRM) Safety. This repository collects security-related research on large reasoning models such as DeepSeek-R1 and OpenAI o1, which are currently very popular.
LongSpec: Long-Context Speculative Decoding with Efficient Drafting and Verification
My learning notes and code for ML systems (MLSys).
Fully open data curation for reasoning models
Sky-T1: Train your own O1 preview model for under $450
🤗 Evaluate: A library for easily evaluating machine learning models and datasets.
A framework for few-shot evaluation of language models.
Contains all assets to run with Moonshot Library (Connectors, Datasets and Metrics)
A lightweight reproduction of DeepSeek-R1-Zero with in-depth analysis of self-reflection behavior.
verl: Volcano Engine Reinforcement Learning for LLMs
Improving Your Model Ranking on Chatbot Arena by Vote Rigging
🔱 Sailor2: Sailing in South-East Asia with Inclusive Multilingual LLMs