Pixel-Space Generative Models

Python 150 6 Updated Apr 13, 2025

Unifying 3D Mesh Generation with Language Models

Python 1,006 55 Updated Mar 28, 2025

LLM inference in C/C++

C++ 78,173 11,406 Updated Apr 15, 2025

OMG-LLaVA and OMG-Seg codebase [CVPR-24 and NeurIPS-24]

Python 1,275 48 Updated Dec 11, 2024

Code Release of Harmonizing Visual Representations for Unified Multimodal Understanding and Generation

Python 63 Updated Apr 12, 2025

Code for D-DiT

Jupyter Notebook 22 3 Updated Apr 1, 2025

ByteCheckpoint: An Unified Checkpointing Library for LFMs

Python 186 5 Updated Apr 2, 2025

📖 This is a repository for organizing papers, code, and other resources related to unified multimodal models.

498 23 Updated Apr 9, 2025

This is the project repo for 'PSG-4D-LLM'.

CSS 8 Updated Apr 7, 2025

Python 554 55 Updated Apr 11, 2025

Memory-optimized training library for diffusion models

Python 1,045 116 Updated Apr 12, 2025

verl: Volcano Engine Reinforcement Learning for LLMs

Python 6,677 716 Updated Apr 16, 2025

🔥 Sa2VA: Marrying SAM2 with LLaVA for Dense Grounded Understanding of Images and Videos

Python 1,033 67 Updated Apr 15, 2025

Train transformer language models with reinforcement learning.

Python 13,223 1,800 Updated Apr 15, 2025

Qwen2.5-Omni is an end-to-end multimodal model by Qwen team at Alibaba Cloud, capable of understanding text, audio, vision, video, and performing real-time speech generation.

Jupyter Notebook 2,521 188 Updated Apr 15, 2025

10 Lessons to Get Started Building AI Agents

Jupyter Notebook 14,993 3,594 Updated Apr 14, 2025

[CVPR 2025] DreamRelation: Bridging Customization and Relation Generation

Python 7 Updated Apr 5, 2025

Official PyTorch implementation for "Large Language Diffusion Models"

Python 1,468 107 Updated Apr 7, 2025

Official PyTorch implementation of "EdgeSAM: Prompt-In-the-Loop Distillation for On-Device Deployment of SAM"

Jupyter Notebook 994 46 Updated Aug 12, 2024

Paper List of Inference/Test Time Scaling/Computing

Python 176 5 Updated Apr 9, 2025

(TPAMI 2024) A Survey on Open Vocabulary Learning

917 51 Updated Mar 23, 2025

[T-PAMI-2024] Transformer-Based Visual Segmentation: A Survey

738 52 Updated Aug 25, 2024

This is a repo to track the latest autoregressive visual generation papers.

255 4 Updated Apr 15, 2025

Fast and memory-efficient exact attention

Python 16,899 1,603 Updated Apr 13, 2025

Implementation of [CVPR 2025] "DiffSensei: Bridging Multi-Modal LLMs and Diffusion Models for Customized Manga Generation"

Python 767 63 Updated Feb 5, 2025

HunyuanVideo-I2V: A Customizable Image-to-Video Model based on HunyuanVideo

Python 1,310 105 Updated Mar 28, 2025

CogView4, CogView3-Plus and CogView3 (ECCV 2024)

Python 992 71 Updated Mar 29, 2025

A Unified Tokenizer for Visual Generation and Understanding

Python 254 5 Updated Apr 15, 2025

[ICLR 2025] Official Implementation of Meissonic: Revitalizing Masked Generative Transformers for Efficient High-Resolution Text-to-Image Synthesis

Python 303 10 Updated Mar 20, 2025

Open reproduction of MUSE for fast text2image generation.

Python 348 29 Updated Jun 1, 2024