CERTIFICATE COURSE IN NATURAL LANGUAGE PROCESSING
Programme Code: SCPS/CS/NLP/2022-23
Class: I BSc IT Total Hours: 36 hours Modules: 5
SYLLABUS
UNIT I (6 hours)
Introduction to NLP – Various stages of NLP – The Ambiguity of Language: Why NLP Is Difficult – Parts of Speech: Nouns and Pronouns; Words: Determiners and Adjectives; Verbs; Phrase Structure – Statistics: Essential Information Theory: Entropy, Perplexity, The Relation to Language, Cross-entropy – NLTK, Python 3 and the Jupyter Notebook
UNIT II (5 hours)
Textual Sources and Formats 1: “What’s in a Text?”- Sources 2: APIs, Social Media, Web Scraping- Building your Corpus
UNIT III (7 hours)
Tokenization, N-grams and Scriptio continua- Stemming and Lemmatization, Synsets and Hypernyms- Tokenizing your Corpus- POS Tagging and Stopwords- Text “Features” and TF-IDF Classification- The “Words” in a “Text”
UNIT IV (6 hours)
Named Entity Recognition (NER)- Sentiment Analysis- What Kind of Text is it? (Machine Learning Approaches to Textual Data)- Topic Modeling Basics- Topic Modeling: Strengths, Weaknesses, Correlations
UNIT V (6 hours)
Stylometry & Stylometric Analysis- Dendrograms, PCA scatterplots & k-means- Plotting the Text, Finding the Plot- Document Clustering and Word Vectors- Doc2vec, Word2vec- Advanced Vector Analyses