CS 272 Statistical NLP (34770)

Statistical Natural Language Processing

Instructor: Sameer Singh
Lectures: SH 174 TuTh 12:30-13:50
Office Hours: DBH 4204 (by appointment)
Course Code: 34770

TA: Sai Prameela Konduru (spkondur@uci.edu), Office Hours: Mon 13:30-14:00 (ICS2 215), 14:00-15:30 (ICS2 216)

Campuswire: https://campuswire.com/c/G7DACEAF6

Resources for the course (books, datasets, papers): see the Resources page

A computer’s ability to read, learn, and understand language is becoming critically important: we have access to enormous amounts of digitized text (that we can’t possibly read), personal communication is increasingly digital (so much that we can’t possibly remember it), and autonomous agents are becoming a bigger part of our everyday lives (agents we need to talk to). This course will introduce historical and recent approaches to natural language processing, focusing in particular on the computational tasks and the machine learning techniques behind NLP's major successes.

Tentatively, the course will cover the following topics:

  • Introduction: what is NLP? Applications and challenges, review of probability and statistics
  • Word and Bag of Words Representations: vector space models, word representations, word embeddings, text classification, naive Bayes, discriminative classifiers, logistic regression, feed-forward neural networks, convolutional neural networks
  • N-grams and Sequence Modeling: language models, featurized language models, neural language models, sequence modeling, part-of-speech tagging, named entity recognition, hidden Markov models, conditional random fields, recurrent neural networks
  • Sentence Structure Modeling: context-free grammars, probabilistic CFGs, PCFG parsing, constituency parsing, dependency parsing, semantic role labeling, recursive neural networks, neural parsing, sequence to sequence mapping with LSTMs
  • Information Extraction: sentence-level relation extraction, corpus-level relation extraction, within-document coreference, cross-document coreference, entity linking, question answering
  • Text Generation and Other Topics: machine translation, text summarization, textual entailment, reading comprehension

Prerequisites

At minimum:

  • An introductory machine learning course (CS 178, CS 273A, or equivalent); an advanced course such as CS 274B is a plus.
  • An introductory artificial intelligence course (CS 171 or equivalent).
  • Working familiarity with Python, as well as with data structures and algorithms, since the programming assignments require both.

Contact me if you are concerned about your background for the course.

Grading Policy

  • 4 programming assignments: 40%
  • 3 paper summaries: 15%
  • Final project: 30%
  • Participation (quizzes, Campuswire, course evaluations): 15%

Academic Honesty

Academic honesty is a requirement for passing this class. Any student who compromises the academic integrity of this course is subject to a failing grade. The work you submit must be your own. Academic dishonesty includes, but is not limited to, copying answers from another student, allowing another student to copy your answers, communicating exam answers to other students during an exam, attempting to use notes or other aids during an exam, or tampering with an exam after it has been corrected and then returning it for more credit. Any such act violates the UCI Policies on Academic Honesty (see link). It is your responsibility to read and understand these policies. Note that any instance of academic dishonesty will be reported to the Academic Integrity Administrative Office for disciplinary action and may be cause for a failing grade in the course.
