Starred repositories
A Datacenter-Scale Distributed Inference Serving Framework
High-performance inference framework for large language models, focusing on efficiency, flexibility, and availability.
Efficient and easy multi-instance LLM serving
A Flexible Framework for Experiencing Cutting-edge LLM Inference Optimizations
A lazy, high-throughput, and blazing-fast structured text generation backend.
Efficient LLM Inference over Long Sequences
The simplest online-softmax notebook for explaining Flash Attention (see the sketch after this list).
KV cache compression for high-throughput LLM inference
A throughput-oriented high-performance serving framework for LLMs
DSPy: The framework for programming—not prompting—language models
Deploy high-performance AI models and inference pipelines on FastAPI with built-in batching, streaming and more.
Transformers-compatible library for applying various compression algorithms to LLMs for optimized deployment with vLLM
A lightweight UI for interfacing with the Zoo Text-to-CAD API, built with SvelteKit.
Generate music based on natural language prompts using LLMs running locally
SGLang is a fast serving framework for large language models and vision language models.
Unified KV Cache Compression Methods for Auto-Regressive Models
A framework for few-shot evaluation of language models.
Universal LLM Deployment Engine with ML Compilation
Tools for merging pretrained large language models.
A deployment, monitoring, and autoscaling service for serverless LLM serving.
LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
[NeurIPS'24 Spotlight, ICLR'25] To speed up long-context LLM inference, approximates attention with dynamic sparse computation, reducing inference latency by up to 10x for pre-filling on an …
Code for Husky, an open-source language agent that solves complex, multi-step reasoning tasks. Husky v1 addresses numerical, tabular, and knowledge-based reasoning.
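As a companion to the online-softmax notebook starred above: a minimal sketch of the one-pass (online) softmax recurrence that Flash Attention builds on. The function name and the NumPy setup are illustrative assumptions, not code taken from that repository.

```python
import numpy as np

def online_softmax(x):
    """One-pass softmax: keep a running maximum m and a running
    normalizer d, so the input can be consumed as a stream.
    This is the recurrence Flash Attention uses to avoid
    materializing the full attention row before normalizing.
    (Illustrative sketch; names are this example's, not the repo's.)
    """
    m = float("-inf")  # running maximum seen so far
    d = 0.0            # running sum of exp(x_i - m)
    for xi in x:
        m_new = max(m, xi)
        # Rescale the old sum to the new maximum before adding the new term;
        # np.exp(-inf) == 0.0, so the first iteration is handled cleanly.
        d = d * np.exp(m - m_new) + np.exp(xi - m_new)
        m = m_new
    x = np.asarray(x, dtype=float)
    return np.exp(x - m) / d

# Matches the standard two-pass (max-subtracted) softmax.
x = np.random.randn(16)
ref = np.exp(x - x.max()) / np.exp(x - x.max()).sum()
assert np.allclose(online_softmax(x), ref)
```

The key step is the rescaling factor exp(m - m_new): whenever a new maximum appears, the accumulated normalizer is re-expressed relative to it, so the single pass yields exactly the numerically stable softmax.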