Stars
Unofficial implementation of "TableNet: Deep Learning model for end-to-end Table detection and Tabular data extraction from Scanned Document Images"
[CVPR 2025 Oral] VGGT: Visual Geometry Grounded Transformer
[ICRA 2025] OpenGS-SLAM: Open-Set Dense Semantic SLAM with 3D Gaussian Splatting for Object-Level Scene Understanding
Examples and guides for using the VLM Run API
[ICRA 2025] PyTorch Code for Local Policies Enable Zero-shot Long-Horizon Manipulation
[CVPR 2025] MASt3R-SLAM: Real-Time Dense SLAM with 3D Reconstruction Priors
RDT-1B: a Diffusion Foundation Model for Bimanual Manipulation
Combining ViT and GPT-2 for image captioning. Trained on MS-COCO. The model was implemented mostly from scratch.
Robot kinematics implemented in PyTorch
Minimal, clean, single-file implementations of common robotics controllers in MuJoCo.
[RSS 2024] 3D Diffusion Policy: Generalizable Visuomotor Policy Learning via Simple 3D Representations
Agent-to-Sim: Learning Interactive Behavior from Casual Videos.
This project is a 2D simulation focused on learning and implementing differential drive kinematics and PID control from scratch using Pygame. The goal is to explore and understand the mathematical …
Kris' Locomotion and Manipulation Planning Toolkit
Implementation of stereographic SEW (shoulder-elbow-wrist) angle for 7-DOF robot arms as well as inverse kinematics solutions for several 7-DOF arms from "Redundancy parameterization and inverse ki…
This repository includes the inverse kinematics solver code for 7-DoF anthropomorphic manipulators and a redundancy resolution strategy with global configuration control, joint limit and singularit…
An inverse kinematics solver written with KDL and URDF libraries
The open-source iOS app that's making quality voice transcription more accessible on mobile devices.
Action Chunking Transformer implementation for low-cost robots
Placeholder repo for my 3D printed brushless robot arm.
LLocalSearch is a completely locally running search aggregator using LLM Agents. The user can ask a question and the system will use a chain of LLMs to find the answer. The user can see the progres…
Automate browser-based workflows with LLMs and Computer Vision
InterfaceAgent: a versatile framework designed to create system and interface agents capable of managing mobile and desktop applications and features.