- Singapore
- https://lxtgh.github.io/
- @xtl994
Stars
- Unifying 3D Mesh Generation with Language Models
- OMG-LLaVA and OMG-Seg codebase [CVPR-24 and NeurIPS-24]
- Code release of Harmonizing Visual Representations for Unified Multimodal Understanding and Generation
- ByteCheckpoint: A Unified Checkpointing Library for LFMs
- 📖 A repository organizing papers, code, and other resources related to unified multimodal models.
- Memory-optimized training library for diffusion models
- verl: Volcano Engine Reinforcement Learning for LLMs
- 🔥 Sa2VA: Marrying SAM2 with LLaVA for Dense Grounded Understanding of Images and Videos
- Train transformer language models with reinforcement learning.
- Qwen2.5-Omni is an end-to-end multimodal model from the Qwen team at Alibaba Cloud, capable of understanding text, audio, images, and video, and of generating speech in real time.
- 10 Lessons to Get Started Building AI Agents
- [CVPR 2025] DreamRelation: Bridging Customization and Relation Generation
- Official PyTorch implementation of "Large Language Diffusion Models"
- Official PyTorch implementation of "EdgeSAM: Prompt-In-the-Loop Distillation for On-Device Deployment of SAM"
- Paper list on inference/test-time scaling and computing
- (TPAMI 2024) A Survey on Open Vocabulary Learning
- [TPAMI 2024] Transformer-Based Visual Segmentation: A Survey
- A repository tracking the latest autoregressive visual generation papers.
- Fast and memory-efficient exact attention
- Implementation of [CVPR 2025] "DiffSensei: Bridging Multi-Modal LLMs and Diffusion Models for Customized Manga Generation"
- HunyuanVideo-I2V: A Customizable Image-to-Video Model based on HunyuanVideo
- A Unified Tokenizer for Visual Generation and Understanding
- [ICLR 2025] Official implementation of Meissonic: Revitalizing Masked Generative Transformers for Efficient High-Resolution Text-to-Image Synthesis
- Open reproduction of MUSE for fast text-to-image generation.