yejingfu · Shanghai, China


Starred repositories


A Datacenter Scale Distributed Inference Serving Framework

Rust · 3,753 stars · 304 forks · Updated Apr 18, 2025

High-performance inference framework for large language models, focusing on efficiency, flexibility, and availability.

Python · 1,091 stars · 72 forks · Updated Apr 18, 2025

Efficient and easy multi-instance LLM serving

Python · 375 stars · 31 forks · Updated Apr 18, 2025

A Flexible Framework for Experiencing Cutting-edge LLM Inference Optimizations

Python · 13,583 stars · 953 forks · Updated Apr 18, 2025

Python · 3 stars · Updated Oct 31, 2024

A lazy, high-throughput, and blazing-fast structured text generation backend.

Rust · 5 stars · Updated Nov 15, 2024

Efficient LLM Inference over Long Sequences

Python · 368 stars · 19 forks · Updated Feb 14, 2025

Simplest online-softmax notebook for explaining Flash Attention

Jupyter Notebook · 9 stars · Updated Dec 27, 2024

KV cache compression for high-throughput LLM inference

Python · 126 stars · 5 forks · Updated Feb 5, 2025

A throughput-oriented high-performance serving framework for LLMs

CUDA · 796 stars · 34 forks · Updated Sep 21, 2024

DSPy: The framework for programming—not prompting—language models

Python · 23,573 stars · 1,818 forks · Updated Apr 17, 2025

Deploy high-performance AI models and inference pipelines on FastAPI with built-in batching, streaming and more.

Python · 3,062 stars · 199 forks · Updated Apr 10, 2025

Python · 122 stars · 7 forks · Updated Feb 15, 2025

Transformers-compatible library for applying various compression algorithms to LLMs for optimized deployment with vLLM

Python · 1,236 stars · 118 forks · Updated Apr 17, 2025

A lightweight UI for interfacing with the Zoo text-to-cad API, built with SvelteKit.

Svelte · 208 stars · 30 forks · Updated Apr 11, 2025

SOTA Open Source TTS

Python · 20,719 stars · 1,639 forks · Updated Apr 12, 2025

Generate music based on natural language prompts using LLMs running locally

Rust · 965 stars · 97 forks · Updated Feb 9, 2025

Whisper with Medusa heads

Python · 830 stars · 52 forks · Updated Feb 26, 2025

SGLang is a fast serving framework for large language models and vision language models.

Python · 13,307 stars · 1,537 forks · Updated Apr 18, 2025

Unified KV Cache Compression Methods for Auto-Regressive Models

Python · 1,007 stars · 131 forks · Updated Jan 4, 2025

A framework for few-shot evaluation of language models.

Python · 8,658 stars · 2,309 forks · Updated Apr 16, 2025

Universal LLM Deployment Engine with ML Compilation

Python · 20,423 stars · 1,712 forks · Updated Apr 6, 2025

Tools for merging pretrained large language models.

Python · 5,560 stars · 526 forks · Updated Apr 10, 2025

A deployment, monitoring, and autoscaling service for serverless LLM serving.

Python · 151 stars · 23 forks · Updated Mar 3, 2025

LMDeploy is a toolkit for compressing, deploying, and serving LLMs.

Python · 6,127 stars · 530 forks · Updated Apr 17, 2025

[NeurIPS'24 Spotlight, ICLR'25] Speeds up long-context LLM inference by computing attention with approximate, dynamic sparsity, which reduces inference latency by up to 10x for pre-filling on an …

Python · 971 stars · 47 forks · Updated Apr 16, 2025

Code for Husky, an open-source language agent that solves complex, multi-step reasoning tasks. Husky v1 addresses numerical, tabular and knowledge-based reasoning tasks.

Python · 340 stars · 30 forks · Updated Jun 16, 2024

LLM inference in C/C++

C++ · 78,328 stars · 11,440 forks · Updated Apr 18, 2025

3D Visualization of a GPT-style LLM

TypeScript · 4,624 stars · 521 forks · Updated Aug 24, 2024