Stars
Lumina-mGPT 2.0: Stand-alone Autoregressive Image Modeling
FlexTok: Resampling Images into 1D Token Sequences of Flexible Length
Frequency Autoregressive Image Generation with Continuous Tokens
[ArXiv 2025] VLIPP: Towards Physically Plausible Video Generation with Vision and Language Informed Physical Prior
Official Code for our MedIA paper "Mutual Consistency Learning for Semi-supervised Medical Image Segmentation" (ESI Highly Cited Paper)
MulimgViewer is a multi-image viewer that can open multiple images in one interface, which is convenient for image comparison and image stitching.
[CVPR 2025] MIDI: Multi-Instance Diffusion for Single Image to 3D Scene Generation
Vector (and Scalar) Quantization, in PyTorch
Liquid: Language Models are Scalable and Unified Multi-modal Generators
[CVPR 2025 Oral] VGGT: Visual Geometry Grounded Transformer
LBM: Latent Bridge Matching for Fast Image-to-Image Translation ✨
Infinite Photorealistic Worlds using Procedural Generation
Motion-Controllable Video Diffusion via Warped Noise
Benchmarking physical understanding in generative video models
A PyTorch library for implementing flow matching algorithms, featuring continuous and discrete flow matching implementations. It includes practical examples for both text and image modalities.
Official Electron build of draw.io
Implementing Trust in Non-Small Cell Lung Cancer Diagnosis with a Conformalized Uncertainty-Aware AI Framework in Whole-Slide Images
An open-source remote desktop application designed for self-hosting, as an alternative to TeamViewer.
Code release for https://kovenyu.com/WonderWorld/
Novel View Synthesis with multiplane/multilayer representation: CVPR 2022, WACV 2023
A new markup-based typesetting system that is powerful and easy to learn.
DimensionX: Create Any 3D and 4D Scenes from a Single Image with Controllable Video Diffusion
[CVPR'25] DepthSplat: Connecting Gaussian Splatting and Depth
[Single/Sparse View-to-Scene on a 4090(24G)] VistaDream: Sampling multiview consistent images for single-view scene reconstruction
Text- and image-to-video generation: CogVideoX (2024) and CogVideo (ICLR 2023)
[ICLR 2025] Pyramidal Flow Matching for Efficient Video Generative Modeling
A programmer's guide to cooking at home (Simplified Chinese only).