- Singapore
- https://lxtgh.github.io/
- @xtl994
Stars
- Unifying 3D Mesh Generation with Language Models
- OMG-LLaVA and OMG-Seg codebase [CVPR-24 and NeurIPS-24]
- Code release of Harmonizing Visual Representations for Unified Multimodal Understanding and Generation
- ByteCheckpoint: A Unified Checkpointing Library for LFMs
- 📖 A repository organizing papers, code, and other resources related to unified multimodal models.
- Memory-optimized training library for diffusion models
- verl: Volcano Engine Reinforcement Learning for LLMs
- 🔥 Sa2VA: Marrying SAM2 with LLaVA for Dense Grounded Understanding of Images and Videos
- Train transformer language models with reinforcement learning.
- Qwen2.5-Omni is an end-to-end multimodal model from the Qwen team at Alibaba Cloud, capable of understanding text, audio, images, and video, and of generating speech in real time.
- 10 Lessons to Get Started Building AI Agents
- [CVPR 2025] DreamRelation: Bridging Customization and Relation Generation
- Official PyTorch implementation of "Large Language Diffusion Models"
- Official PyTorch implementation of "EdgeSAM: Prompt-In-the-Loop Distillation for On-Device Deployment of SAM"
- Paper list on inference/test-time scaling and computing
- (TPAMI 2024) A Survey on Open Vocabulary Learning
- [TPAMI 2024] Transformer-Based Visual Segmentation: A Survey
- A repository tracking the latest autoregressive visual generation papers.
- Fast and memory-efficient exact attention
- Implementation of [CVPR 2025] "DiffSensei: Bridging Multi-Modal LLMs and Diffusion Models for Customized Manga Generation"
- HunyuanVideo-I2V: A Customizable Image-to-Video Model based on HunyuanVideo
- A Unified Tokenizer for Visual Generation and Understanding
- [ICLR 2025] Official implementation of Meissonic: Revitalizing Masked Generative Transformers for Efficient High-Resolution Text-to-Image Synthesis
- Open reproduction of MUSE for fast text-to-image generation.