kavgan

Starred repositories

Starter code to solve real-world text data problems. Includes: Gensim Word2Vec, phrase embeddings, text classification with logistic regression, word count with PySpark, simple text preprocessing, …

Jupyter Notebook · 1,168 stars · 792 forks · Updated Dec 2, 2020

Robustness Gym is an evaluation toolkit for machine learning.

Python · 441 stars · 38 forks · Updated Jun 28, 2022

Curated list of blog posts from Opinosis Analytics

2 stars · 1 fork · Updated Aug 14, 2021

Python word cloud library for use within Jupyter notebooks and Python apps.

Jupyter Notebook · 48 stars · 14 forks · Updated May 15, 2024

Library of deep learning models and datasets designed to make deep learning more accessible and accelerate ML research.

Python · 16,037 stars · 3,579 forks · Updated Jun 2, 2023

Open Source Neural Machine Translation and (Large) Language Models in PyTorch

Python · 6,859 stars · 2,257 forks · Updated Mar 6, 2025

Detect common phrases in large amounts of text using a data-driven approach. Discovered phrases can be of arbitrary length. Can be used with languages other than English.

Python · 128 stars · 45 forks · Updated Jul 15, 2019

Jupyter Notebook · 28 stars · 13 forks · Updated Sep 30, 2016

Jupyter Notebook · 46 stars · 47 forks · Updated Feb 25, 2018

A few exercises for use at events.

Jupyter Notebook · 1,446 stars · 678 forks · Updated Apr 27, 2021

CNN text classification using Keras

Python · 15 stars · 6 forks · Updated Nov 27, 2017

ROUGE automatic summarization evaluation toolkit. Supports ROUGE-[N, L, S, SU], stemming and stopwords in different languages, Unicode text evaluation, and CSV output.

Java · 214 stars · 37 forks · Updated Apr 9, 2020

Cool links & research papers related to Machine Learning applied to source code (MLonCode)

6,353 stars · 840 forks · Updated Dec 3, 2020

Discovering Related Clinical Concepts using Large Amounts of Clinical Notes. An unsupervised graphical approach to mine related concepts by leveraging the volume within large amounts of clinical no…

26 stars · 11 forks · Updated Jan 22, 2018

OpinRank Dataset. Dataset containing user reviews of cars and hotels. Full reviews from Tripadvisor (~259,000 reviews) and Edmunds (~42,230 reviews).

44 stars · 12 forks · Updated May 28, 2021

MALLET is a Java-based package for statistical natural language processing, document classification, clustering, topic modeling, information extraction, and other machine learning applications to t…

Java · 997 stars · 348 forks · Updated Mar 14, 2024

Examples of code in Spark

Python · 10 stars · 6 forks · Updated Dec 2, 2017

RxNLP APIs for clustering sentences, extracting topics, counting words and n-grams, extracting text from HTML or a URL, computing similarity between texts, and more.

15 stars · 8 forks · Updated Jan 24, 2020

This repo contains the code and dataset for the Opinosis Summarization Framework.

51 stars · 18 forks · Updated Nov 14, 2019

Implementation of various string similarity and distance algorithms: Levenshtein, Jaro-Winkler, n-Gram, Q-Gram, Jaccard index, Longest Common Subsequence edit distance, cosine similarity ...

Java · 2,711 stars · 419 forks · Updated Jun 1, 2022