Query +91 (0)484 - 2972720

Siena College of Professional Studies has been NAAC Accredited with a CGPA of 2.85 on a seven-point scale at B++ Grade.

NATURAL LANGUAGE PROCESSING

CERTIFICATE COURSE IN NATURAL LANGUAGE PROCESSING

 

Programme Code: SCPS/CS/NLP/2022-23

Class: I BSc IT      Total Hours: 36 hours          Modules:  5

SYLLABUS

 

UNIT I (6 hours)

Introduction to NLP – Various stages of NLP – The Ambiguity of Language: Why NLP Is Difficult – Parts of Speech: Nouns and Pronouns, Determiners and Adjectives, Verbs – Phrase Structure – Statistics – Essential Information Theory: Entropy, Perplexity, The Relation to Language, Cross-entropy – NLTK, Python 3 and the Jupyter Notebook
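As an illustration of the entropy and perplexity topics in this unit, the following is a minimal sketch, not course material. It assumes the simplest possible setting: the empirical unigram distribution of a short token list, with perplexity taken as 2 raised to the entropy in bits. The function names and the sample sentence are hypothetical.

```python
import math
from collections import Counter


def entropy(tokens):
    """Shannon entropy, in bits, of the empirical unigram distribution."""
    counts = Counter(tokens)
    total = len(tokens)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def perplexity(tokens):
    """Perplexity of the same distribution: 2 raised to the entropy in bits."""
    return 2 ** entropy(tokens)


# Hypothetical sample text, split on whitespace for simplicity.
sample = "the cat sat on the mat".split()
print(f"entropy = {entropy(sample):.3f} bits")
print(f"perplexity = {perplexity(sample):.3f}")
```

A uniform distribution over four distinct tokens gives an entropy of 2 bits and a perplexity of 4, which matches the reading of perplexity as an effective number of equally likely choices.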

UNIT II (5 hours)

Textual Sources and Formats 1: “What’s in a Text?” - Sources 2: APIs, Social Media, Web Scraping - Building your Corpus

UNIT III (7 hours)

Tokenization, N-grams and Scriptio continua - Stemming and Lemmatization, Synsets and Hypernyms - Tokenizing your Corpus - POS Tagging and Stopwords - Text “Features” and TF-IDF Classification - The “Words” in a “Text”
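The tokenization, n-gram and TF-IDF topics listed for this unit can be sketched with the Python standard library alone. This is an illustrative example under stated assumptions, not the course's own code: the regular-expression tokenizer is a simplification, the TF-IDF weighting shown (relative term frequency times the natural log of N / document frequency) is only one common variant, and all function names and sample documents are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep runs of letters, digits and apostrophes."""
    return re.findall(r"[a-z0-9']+", text.lower())


def ngrams(tokens, n):
    """Return every contiguous n-token window as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def tf_idf(docs):
    """Return one {term: weight} dict per document.

    Weight = (term count / document length) * ln(N / document frequency).
    """
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        scores.append(
            {t: (c / len(toks)) * math.log(n_docs / df[t]) for t, c in tf.items()}
        )
    return scores


# Hypothetical two-document corpus.
docs = ["The cat sat", "The dog barked"]
print(ngrams(tokenize(docs[0]), 2))
print(tf_idf(docs)[0])
```

With this weighting, a term that occurs in every document (here "the") receives a weight of zero, while terms unique to one document receive a positive weight.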

UNIT IV (6 hours)

Named Entity Recognition (NER) - Sentiment Analysis - What Kind of Text is it? (Machine Learning Approaches to Textual Data) - Topic Modeling Basics - Topic Modeling: Strengths, Weaknesses, Correlations

UNIT V (6 hours)

Stylometry & Stylometric Analysis - Dendrograms, PCA scatterplots & k-means - Plotting the Text, Finding the Plot - Document Clustering and Word Vectors - Doc2vec, Word2vec - Advanced Vector Analyses

 
