Project-Based Text Mining in Python
In this course, students learn the basics of text mining and build on them to perform document categorization, document grouping, and subjective analysis.
Introduction
Meet the Instructor
Course Outline
Starter Code
Theoretical Concepts of Text Representation
Structuring a One-Document Corpus
Structuring a Multiple-Document Corpus
Setting Parameters
Using TF-IDF Representation
Reading Data from a Labeled Dataset
Using a Textual Dataset from the UCI Repository
Machine Learning Overview
K-Nearest Neighbors Classifier
Naive Bayes Classifier
Decision Tree Classifier
Linear Classifier
Concluding Remarks on Classifiers
Implementing Classifiers with Default Settings
Classifiers with Different Parameter Settings
Classification with a UCI Repository Dataset
Introduction to Clustering
K-means Clustering
Implementing Partitional Clustering
Agglomerative Clustering with Default Settings
Agglomerative Clustering with Parameters
Clustering a UCI Repository Dataset
Cross Validation
Validation
K-Fold Cross Validation
Leave-One-Out Validation
Classifier Evaluation
Predictive Accuracy of KNN Using K-Fold
Precision, Recall and F1-measure
Confusion Matrix
Putting It All Together
Clustering Evaluation Techniques
Implementing Clustering Evaluation
Text Normalization
Lowercase, Whitespace, Punctuation
Removing Stopwords
Stemming and Lemmatization
Regular Expressions
Applying Regular Expressions
Part-of-Speech Tagging
Data Acquisition
Text Segmentation and Tokenization