Course Outline

Detailed training outline

  1. Introduction to NLP
    • Understanding NLP
    • NLP Frameworks
    • Commercial applications of NLP
    • Scraping data from the web
    • Working with various APIs to retrieve text data
    • Working with and storing text corpora, saving content and relevant metadata
    • Advantages of using Python, and an NLTK crash course
  2. Practical Understanding of a Corpus and Dataset
    • Why do we need a corpus?
    • Corpus Analysis
    • Types of data attributes
    • Different file formats for corpora
    • Preparing a dataset for NLP applications
  3. Understanding the Structure of Sentences
    • Components of NLP
    • Natural language understanding
    • Morphological analysis: stems, words, tokens, part-of-speech tags
    • Syntactic analysis
    • Semantic analysis
    • Handling ambiguity
  4. Text data preprocessing
    • Corpus: raw text
      • Sentence tokenization
      • Stemming of raw text
      • Lemmatization of raw text
      • Stop word removal
    • Corpus: raw sentences
      • Word tokenization
      • Word lemmatization
    • Working with Term-Document/Document-Term matrices
    • Text tokenization into n-grams and sentences
    • Practical and customized preprocessing
  5. Analyzing Text data
    • Basic features of NLP
      • Parsers and parsing
      • POS tagging and taggers
      • Named entity recognition
      • N-grams
      • Bag of words
    • Statistical features of NLP
      • Concepts of Linear algebra for NLP
      • Probability theory for NLP
      • TF-IDF
      • Vectorization
      • Encoders and Decoders
      • Normalization
      • Probabilistic Models
    • Advanced feature engineering and NLP
      • Basics of word2vec
      • Components of word2vec model
      • Logic of the word2vec model
      • Extension of the word2vec concept
      • Application of word2vec model
    • Case study applying bag of words: automatic text summarization using the simplified and true versions of Luhn's algorithm
  6. Document Clustering, Classification and Topic Modeling
    • Document clustering and pattern mining (hierarchical clustering, k-means clustering, etc.)
    • Comparing and classifying documents using TF-IDF, Jaccard and cosine distance measures
    • Document classification using Naïve Bayes and Maximum Entropy
  7. Identifying Important Text Elements
    • Reducing dimensionality: Principal Component Analysis, Singular Value Decomposition, Non-negative Matrix Factorization
    • Topic modeling and information retrieval using Latent Semantic Analysis
  8. Entity Extraction, Sentiment Analysis and Advanced Topic Modeling
    • Positive vs. negative: degree of sentiment
    • Item Response Theory
    • Part-of-speech tagging and its application: finding people, places and organizations mentioned in text
    • Advanced topic modeling: Latent Dirichlet Allocation
  9. Case studies
    • Mining unstructured user reviews
    • Sentiment classification and visualization of Product Review Data
    • Mining search logs for usage patterns
    • Text classification
    • Topic modeling
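
As a taste of the material in modules 4 to 6, the following is a minimal sketch of the path from raw text to document comparison: tokenization with stop word removal, n-grams, bag of words, TF-IDF weighting, and Jaccard and cosine similarity. It is not taken from the course materials. The function names, the tiny stop word list and the sample sentences are all assumptions made for illustration, and only the Python standard library is used so the steps stay visible; in practice a library such as NLTK would typically handle the preprocessing.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; real lists are much longer).
STOP_WORDS = {"the", "a", "an", "is", "on", "and", "of", "in"}


def tokenize(text):
    """Lowercase the text, keep alphabetic word tokens, drop stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]


def ngrams(tokens, n):
    """Return all n-grams (as tuples) over a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def tfidf_vectors(docs):
    """Build one sparse TF-IDF vector (dict of term -> weight) per document."""
    bags = [Counter(tokenize(d)) for d in docs]  # bag of words per document
    n_docs = len(docs)
    # Document frequency: in how many documents each term occurs.
    doc_freq = Counter(term for bag in bags for term in bag)
    vectors = []
    for bag in bags:
        total = sum(bag.values())
        vectors.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in bag.items()
        })
    return vectors


def cosine_similarity(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either has zero length."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def jaccard_similarity(tokens_a, tokens_b):
    """Size of the shared vocabulary divided by the size of the combined one."""
    set_a, set_b = set(tokens_a), set(tokens_b)
    if not set_a and not set_b:
        return 0.0
    return len(set_a & set_b) / len(set_a | set_b)


# Hypothetical three-document corpus.
docs = [
    "The cat sat on the mat",
    "The cat chased a mouse",
    "Stock prices rose sharply",
]
vectors = tfidf_vectors(docs)
print(ngrams(tokenize(docs[0]), 2))
print(cosine_similarity(vectors[0], vectors[1]))
print(cosine_similarity(vectors[0], vectors[2]))
print(jaccard_similarity(tokenize(docs[0]), tokenize(docs[1])))
```

The first two documents share the term "cat", so their cosine similarity is positive, while the third shares no vocabulary with the first and scores zero. The corresponding distance measures are obtained as one minus the similarity.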

Requirements

Knowledge and awareness of NLP principles and an appreciation of AI applications in business

 21 hours
