Starred repositories
Starter code to solve real-world text data problems. Includes: Gensim Word2Vec, phrase embeddings, text classification with logistic regression, word count with PySpark, simple text preprocessing, …
Robustness Gym is an evaluation toolkit for machine learning.
Curated List of Blog Posts From Opinosis Analytics
Python word cloud library for use within Jupyter notebook and Python apps.
Library of deep learning models and datasets designed to make deep learning more accessible and accelerate ML research.
Open Source Neural Machine Translation and (Large) Language Models in PyTorch
Detect common phrases in large amounts of text using a data-driven approach. Discovered phrases can be of arbitrary length. Can be used with languages other than English.
A few exercises for use at events.
CNN text classification using Keras.
ROUGE automatic summarization evaluation toolkit. Support for ROUGE-[N, L, S, SU], stemming and stopwords in different languages, unicode text evaluation, CSV output.
Cool links & research papers related to Machine Learning applied to source code (MLonCode)
Discovering Related Clinical Concepts using Large Amounts of Clinical Notes. An unsupervised graphical approach to mine related concepts by leveraging the volume within large amounts of clinical no…
OpinRank Dataset. A dataset of user reviews for two kinds of entities, cars and hotels. Full reviews from Tripadvisor (~259,000 reviews) and Edmunds (~42,230 reviews).
MALLET is a Java-based package for statistical natural language processing, document classification, clustering, topic modeling, information extraction, and other machine learning applications to t…
RxNLP APIs for clustering sentences, extracting topics, counting words & n-grams, extracting text from HTML or a URL, computing similarity between texts, and more.
This repo contains the code and dataset for the Opinosis Summarization Framework.
Implementation of various string similarity and distance algorithms: Levenshtein, Jaro-Winkler, n-Gram, Q-Gram, Jaccard index, Longest Common Subsequence edit distance, cosine similarity ...