Forked from NVlabs/Sana

Waifu Image Synthesis with Linear Diffusion Transformer


⚡️Waifu: Efficient High-Resolution Waifu Synthesis


Training in progress!

[training logs]

Prompt: 1girl, solo, animal ears, bow, teeth, jacket, tail, open mouth, brown hair, orange background, bowtie, orange nails, simple background, cat ears, orange eyes, blue bow, animal ear fluff, cat tail, looking at viewer, upper body, shirt, school uniform, hood, striped bow, striped, white shirt, black jacket, blue bowtie, fingernails, long sleeves, cat girl, bangs, fangs, collared shirt, striped bowtie, short hair, tongue, hoodie, sharp teeth, facial mark, claw pose

19.12: [sample images]

20.12: [sample images]

Money burned so far: ~$1,000. Please let us know if you can contribute money or GPU time toward training an open-source waifu model. Contact: recoilme

💡 Introduction

tl;dr: we just need a model to generate waifu.

We introduce Waifu, a text-to-image framework that can efficiently generate images at up to 768 × 768 resolution in 80+ languages. Our goal was to create a small model that is easy to fully fine-tune on a consumer GPU, without compromising on quality. It is like SD 1.5, but developed in 2024 using the most advanced components currently available. Waifu can synthesize high-resolution, high-quality waifu images with strong text-image alignment at remarkably fast speed, and it is deployable on a laptop GPU.

Core designs include:

(1) AuraDiffusion/16ch-vae: a fully open-source 16-channel VAE, natively trained in fp16.
(2) Linear DiT: a 1.6B-parameter diffusion transformer (DiT) with linear attention.
(3) MEXMA-SigLIP: a model that combines the MEXMA multilingual text encoder with an image encoder from SigLIP, giving us a high-performance CLIP-style model for 80 languages (see the loading sketch after this list).
(4) Other: we use the Flow-Euler sampler, an Adafactor-fused optimizer, and bf16 precision for training, and we combine efficient caption labeling (MoonDream, CogVLM) with danbooru tags to accelerate convergence.
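
For illustration, here is a minimal sketch of loading the 16-channel VAE with diffusers. It assumes the AuraDiffusion/16ch-vae checkpoint is published in AutoencoderKL format and uses the usual 8× spatial downsampling; check the model card before relying on either assumption.

    import torch
    from diffusers import AutoencoderKL

    # Load the open 16-channel VAE (assumption: diffusers AutoencoderKL format).
    vae = AutoencoderKL.from_pretrained(
        "AuraDiffusion/16ch-vae", torch_dtype=torch.float16
    ).to("cuda")

    # Round-trip a dummy 768x768 RGB image through the 16-channel latent space.
    image = torch.randn(1, 3, 768, 768, dtype=torch.float16, device="cuda")
    with torch.no_grad():
        latents = vae.encode(image).latent_dist.sample()  # expected: (1, 16, 96, 96)
        recon = vae.decode(latents).sample                # expected: (1, 3, 768, 768)
    print(latents.shape, recon.shape)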

As a result, Waifu-2b is very competitive with modern giant diffusion models (e.g. Flux-12B) while being 20× smaller and 100+× faster in measured throughput. Moreover, Waifu-2b can be deployed on a 16GB laptop GPU and takes less than 1 second to generate a 768 × 768 image. Waifu enables waifu creation at low cost.
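
As an illustration, here is a minimal text-to-image sketch. It assumes this fork keeps the SanaPipeline API from upstream Sana's app/sana_pipeline.py; the config and checkpoint names are taken from the Example below and may differ in your setup.

    import torch
    from torchvision.utils import save_image
    from app.sana_pipeline import SanaPipeline  # assumption: upstream Sana's pipeline

    # Config and checkpoint as used in the Example section below.
    pipe = SanaPipeline("configs/sana_config/576ms/waifu-2b-576.yaml")
    pipe.from_pretrained("waifu-2b-v01.pth")

    image = pipe(
        prompt="1girl, solo, cat ears, orange eyes, school uniform",
        height=768,
        width=768,
        guidance_scale=5.0,
        num_inference_steps=18,
        generator=torch.Generator(device="cuda").manual_seed(42),
    )
    save_image(image, "waifu.png", normalize=True, value_range=(-1, 1))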

Example: setup and training

    # environment setup
    apt update
    git clone https://github.com/recoilme/waifu
    cd waifu/
    pip install -e .
    pip install flash-attn --no-build-isolation
    cd ..

    # prepare dataset buckets (--h prints the available options;
    # a toy bucketing sketch follows this block)
    python waifu/train_scripts/make_buckets_new.py --h
    python waifu/train_scripts/make_buckets_new.py --config_path waifu/configs/sana_config/576ms/waifu-2b-576.yaml --load_from waifu-2b-v01.pth
    cd waifu

    # check GPUs and configure accelerate
    nvidia-smi
    accelerate config

    # launch training in the background and follow the log
    nohup accelerate launch train_scripts/train_waifu.py --config configs/sana_config/576ms/waifu-2b-576.yaml --name 33 --load_from /workspace/waifu-2b-v01.pth &
    tail -f nohup.out
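
make_buckets_new.py is not documented here; as a rough, hypothetical illustration, dataset bucketing usually means assigning each image to the predefined resolution whose aspect ratio is closest, so batches can be formed from a single bucket. A toy sketch (the bucket sizes are made up, not taken from the script):

    from collections import defaultdict

    # Hypothetical bucket resolutions around a ~576px budget; the real
    # script's buckets come from its config, not from this list.
    BUCKETS = [(576, 576), (512, 640), (640, 512), (448, 704), (704, 448)]

    def nearest_bucket(w: int, h: int) -> tuple[int, int]:
        """Pick the bucket whose aspect ratio is closest to the image's."""
        ar = w / h
        return min(BUCKETS, key=lambda b: abs(b[0] / b[1] - ar))

    def group_by_bucket(sizes):
        """Map each image index to its nearest bucket resolution."""
        groups = defaultdict(list)
        for i, (w, h) in enumerate(sizes):
            groups[nearest_bucket(w, h)].append(i)
        return dict(groups)

    print(group_by_bucket([(1024, 1024), (832, 1216), (1216, 832)]))
    # {(576, 576): [0], (448, 704): [1], (704, 448): [2]}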

// AiArtLab team
