Machine learning
nlp
Dec 23, 2018     7 minutes read

1. What is NLP and why should you care?

2. NLP subjects and how to approach them

Basic concepts

The basic concepts are document, corpus, vector, and model. You’ll find their definitions here.
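To make the four terms a bit more concrete, here is a minimal sketch in plain Python. It assumes the common reading of the terms: a document is one piece of text, a corpus is a collection of documents, a vector is a numeric representation of a document (here a bag-of-words count), and a model is a transformation from one vector representation to another. The example sentences, the whitespace tokenization, and the names `to_vector` and `term_frequency_model` are hypothetical choices made for illustration only.

```python
from collections import Counter

# A document is a single piece of text; a corpus is a collection of documents.
corpus = [
    "the cat sat on the mat",
    "the dog chased the cat",
]

# Vocabulary: every distinct token in the corpus, sorted so the order is stable.
vocabulary = sorted({token for document in corpus for token in document.split()})


def to_vector(document: str) -> list:
    """Return a bag-of-words count vector aligned with `vocabulary`."""
    counts = Counter(document.split())
    return [counts[token] for token in vocabulary]


def term_frequency_model(vector: list) -> list:
    """A toy 'model' (assumption): rescale raw counts to relative frequencies."""
    total = sum(vector)
    return [count / total for count in vector] if total else list(vector)


vectors = [to_vector(document) for document in corpus]
frequencies = [term_frequency_model(vector) for vector in vectors]

print(vocabulary)
print(vectors)
```

Each vector has one slot per vocabulary word, so documents of different lengths end up in the same fixed-size numeric space, which is what downstream models need.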

Preprocessing

The input data is usually very messy. Just as in traditional machine learning (by which I mean classical, tabular data that can be turned into a matrix), the data must be ‘clean’, or the results will come out messy as well. What does ‘clean data’ mean in NLP? In general, running the data through the following steps should give us a fairly usable, valid dataset:
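As a rough illustration of what a cleaning pass can look like, here is a minimal sketch using only the Python standard library. The particular steps (lowercasing, replacing non-letter characters, whitespace tokenization, stop-word removal) are assumptions picked because they are commonly used, not necessarily the exact steps this post has in mind. The `clean` function and the tiny `STOP_WORDS` set are hypothetical.

```python
import re

# Hypothetical, deliberately tiny stop-word list; real projects use a larger one.
STOP_WORDS = {"a", "an", "the", "is", "of", "and", "to", "in"}


def clean(text: str) -> list:
    """Lowercase, strip non-letters, tokenize on whitespace, drop stop words."""
    text = text.lower()
    # Replace everything except ASCII letters and whitespace with a space.
    text = re.sub(r"[^a-z\s]", " ", text)
    tokens = text.split()
    return [token for token in tokens if token not in STOP_WORDS]


tokens = clean("The quick, brown fox is in the GARDEN!!")
print(tokens)  # ['quick', 'brown', 'fox', 'garden']
```

Note that the regular expression here keeps only ASCII letters, so it would mangle accented or non-Latin text; for such data the character class would need to be adjusted.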

Transforming

Exploration

Learning (classification)

Other interesting resources