Bryn Mawr College
CMSC 325/LING B325: Computational Linguistics
Fall 2013
Course Materials
Prof. Deepak Kumar
General Information
Lecture Hours: Tuesdays & Thursdays, 9:45a to 11:15a
Room: Park 338
Lab: Wednesdays 10a to noon in Room 231
Laboratories:
- Computer Science Lab Room 231 (Science Building)
- You will also be able to use your own computer to do the labs
for this course.
Lab Assistants: The following lab assistants
will be available during the week to help with lab assignments
(names and schedules will be posted by the end of this week).
- Shohini Bhattasali, Hours Mondays 8:00p to 10:00p and Wednesdays 3:00p to 5:00p in Room 252 PSB
Texts & Software
Speech & Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition by Daniel Jurafsky & James H. Martin, 2nd Edition, Pearson Prentice Hall, 2008.
Available at the Campus Bookstore.
This book is available at Amazon.com ($99.69 new, $33.25 rent)
and at the Bryn Mawr Bookstore ($165.25 new, $124.00 used, and $82.54 rent).
Python+NLTK
This software is installed on all computers in the CS Lab. It can also be installed on your computers. Await instructions in the lectures about installing on your own computers.
Important Dates
September 3 : First lecture
October 1: Exam 1
November 7: Exam 2
December 12: Last lecture/Exam 3
Assignments
- Assignment#1: (Due on Thursday, September 19, 2013) Click here for details.
- Assignment#2: (Due on Thursday, September 26, 2013) Click here for details.
- Assignment#3: (Due on Thursday, October 24, 2013) Click here for details.
- Assignment#4: (Due on Thursday, October 31, 2013) Click here for details.
- Assignment#5: (Due on Thursday, November 7, 2013) Click here for details.
- Assignment#6: (Due on Tuesday, November 26, 2013) Click here for details.
Lectures
- Week 1 (September 3, 5)
September 3: Introduction to Computational Linguistics. Course overview. Examples of language processing: Google Search; machine translation (Google Translate, iTranslate iPhone app, Microsoft Demo, Nov. 2012). Identifying language tasks and the knowledge required for these tasks. Language processing versus data processing.
Read: Chapter 1 from Jurafsky & Martin.
September 5: The NLP pipeline. Common themes/issues in NLP systems: Ambiguity, general architecture, successful algorithms and paradigms, accuracy, coverage. Formal versus Natural languages. Formal languages.
Read: Start reading Chapter 2 from J&M
- Week 2 (September 10,12)
September 10: Regular Expressions: for searching and specifying languages.
Basic elements of regular expressions: expressions, anchors, counters,
operator precedence, substitution, memory, examples.
Read: Chapter 2 from Jurafsky & Martin.
Do: Search in Google for "Microsoft Word regular expression search" and look for a link to a page on the office.microsoft.com site about using regular expressions in Word. Follow the tutorial to learn how to use regular expressions in Word, and note the small differences in how patterns are specified and used in Word versus how we did them in class.
Read and do the Python/NLTK tutorials: Part 1, Part 2.
September 12: Putting regular expressions to work. The Python re library. Introduction to NLTK. How to acquire text, text corpora, and web pages.
Assignment#1: (Due on Thursday, September 19, 2013) Click here for details.
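The regular expression elements listed for this week (anchors, counters, memory, substitution) can be tried out with the Python re library. The following is a minimal sketch, not taken from the lectures; the sample sentence and variable names are made up for illustration.

```python
import re

# Made-up sample text for illustration only.
text = "The cat sat on the mat. The the dog barked at 9:45 and 11:15."

# Anchor: ^ ties the match to the start of the string.
starts_with_the = re.search(r"^The\b", text) is not None

# Counters: \d{1,2} matches one or two digits, \d{2} exactly two.
times = re.findall(r"\b\d{1,2}:\d{2}\b", text)

# Memory: \1 must repeat whatever group 1 matched (here, a doubled word).
doubled = re.findall(r"\b(\w+) \1\b", text, flags=re.IGNORECASE)

# Substitution: replace each doubled word with a single copy.
fixed = re.sub(r"\b(\w+) \1\b", r"\1", text, flags=re.IGNORECASE)

print(starts_with_the)  # True
print(times)            # ['9:45', '11:15']
print(doubled)          # ['The']
print(fixed)
```

Word's search dialog uses a different pattern syntax, so these patterns will not carry over unchanged.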
- Week 3 (September 17, 19)
September 17: Finite state automata, deterministic and non-deterministic FSAs. The equivalence of deterministic and non-deterministic FSAs. Formal Languages. The equivalence between regular expressions, regular languages, and finite state automata.
Read: Chapter 2 from Jurafsky & Martin.
September 19: Discussion of Assignment#1. Words: Morphology, morphemes, affixes, inflection, derivation, compounding, cliticization.
Read: Chapter 3 from J&M.
Assignment#2: (Due on Thursday, September 26, 2013) Click here for details.
- Week 4 (September 24, 26)
September 24 : Words: Morphology, morphemes, affixes, inflection, derivation, compounding, cliticization. FST for morphological parsing.
Read: Chapter 3 from J&M.
September 26 : FSTs for morphological parsing. Spelling errors: detection and correction. Minimum Edit Distance using dynamic programming.
- Week 5 (October 1, 3)
October 1: Exam 1 is today.
October 3: Review of Exam. Minimum Edit Distance. Top-down versus bottom-up algorithms. Dynamic Programming.
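The contrast between top-down and bottom-up algorithms for minimum edit distance can be sketched as below. This is an illustrative sketch, not the implementation used in class. The function names and the sub_cost parameter are assumptions: insertions and deletions cost 1, and the substitution cost defaults to 1 (plain Levenshtein) but can be set to 2, the variant Jurafsky & Martin also discuss.

```python
from functools import lru_cache


def med_top_down(source, target, sub_cost=1):
    """Top-down: recurse from the full strings and memoize subproblems."""

    @lru_cache(maxsize=None)
    def dist(i, j):
        # Distance between source[:i] and target[:j].
        if i == 0:
            return j  # insert the j remaining target characters
        if j == 0:
            return i  # delete the i remaining source characters
        same = source[i - 1] == target[j - 1]
        return min(
            dist(i - 1, j) + 1,                           # deletion
            dist(i, j - 1) + 1,                           # insertion
            dist(i - 1, j - 1) + (0 if same else sub_cost),  # match/substitution
        )

    return dist(len(source), len(target))


def med_bottom_up(source, target, sub_cost=1):
    """Bottom-up: fill the dynamic programming table from the empty prefixes."""
    n, m = len(source), len(target)
    table = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        table[i][0] = i
    for j in range(1, m + 1):
        table[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            same = source[i - 1] == target[j - 1]
            table[i][j] = min(
                table[i - 1][j] + 1,
                table[i][j - 1] + 1,
                table[i - 1][j - 1] + (0 if same else sub_cost),
            )
    return table[n][m]


print(med_bottom_up("intention", "execution"))              # 5
print(med_bottom_up("intention", "execution", sub_cost=2))  # 8
```

Both functions solve the same subproblems; the top-down version computes only those it reaches, while the bottom-up version fills every cell of the table.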
- Week 6 (October 8, 10)
October 8: Minimum edit distance: two implementations. Word segmentation. Segmenting hashtags using the MaxMatch algorithm.
Assignment#3: (Due on Thursday, October 24, 2013) Click here for details.
October 10: Parts of Speech. Traditional parts of speech versus tagsets. Tag ambiguity. Today we will watch Schoolhouse Rock/Grammar Rock videos to get a quick introduction to basic POS categories.
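The MaxMatch segmentation mentioned for October 8 is a greedy algorithm: at each position it takes the longest dictionary word that starts there. Here is a minimal sketch under stated assumptions: the function name, the tiny vocabularies, and the single-character fallback for unknown material are illustrative choices, not the course's code.

```python
def max_match(text, vocabulary):
    """Greedily split text into the longest vocabulary words, left to right."""
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until a word is found.
        for j in range(len(text), i, -1):
            if text[i:j] in vocabulary:
                break
        else:
            # Assumption: with no dictionary match, emit a single character.
            j = i + 1
        words.append(text[i:j])
        i = j
    return words


# Toy vocabularies, made up for illustration.
vocab = {"computational", "linguistics", "is", "fun"}
hashtag = "#ComputationalLinguisticsIsFun"
print(max_match(hashtag.lstrip("#").lower(), vocab))
# ['computational', 'linguistics', 'is', 'fun']

# Greedy matching can go wrong when a longer word hides the intended split.
tricky = {"the", "theta", "table", "bled", "own", "down", "there"}
print(max_match("thetabledownthere", tricky))
# ['theta', 'bled', 'own', 'there']
```

The second example shows why the quality of the result depends heavily on the dictionary.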
- Week 7 (October 15, 17)
No Classes. Fall Break
- Week 8 (October 22, 24)
October 22: POS Tagging using tagsets. Rule-based versus Stochastic tagging algorithms. Computing empirical frequencies using corpora in NLTK.
Python for Linguistics Tutorial, Part 3 (NLTK corpora, tagged corpora, frequency distributions, etc.): Click here to access.
Read: Start reading Chapter 5
October 24: POS Tagging. Rule-based Taggers (Regular Expression Rules), Frequency-based Taggers (Unigram, Bigram, etc.). Cascading Taggers.
Assignment#4: (Due on Thursday, October 31, 2013) Click here for details.
Python for Linguistics Tutorial, Part 4 (A Tour of NLTK Taggers) Click here to access
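The idea of cascading taggers from this week (a frequency-based tagger that falls back to regular expression rules, then to a default tag) can be sketched in plain Python. NLTK provides its own tagger classes for this; the sketch below deliberately avoids NLTK so that it runs without any corpus downloads. The toy training data, rule list, and function names are all assumptions made for illustration.

```python
import re
from collections import Counter, defaultdict

# Made-up toy training data: (word, tag) pairs.
TOY_TAGGED = [
    ("the", "DT"), ("dog", "NN"), ("runs", "VBZ"),
    ("the", "DT"), ("run", "NN"), ("run", "VB"), ("run", "VB"),
]

# Illustrative suffix and digit rules, tried in order.
REGEX_RULES = [(r".*ing$", "VBG"), (r".*ed$", "VBD"), (r"^\d+$", "CD")]


def train_unigram(tagged_words):
    """Map each word to the tag it carries most often in the training data."""
    counts = defaultdict(Counter)
    for word, tag in tagged_words:
        counts[word.lower()][tag] += 1
    return {word: tags.most_common(1)[0][0] for word, tags in counts.items()}


def regex_tag(word):
    """Return the tag of the first matching rule, or None."""
    for pattern, tag in REGEX_RULES:
        if re.match(pattern, word):
            return tag
    return None


def cascade_tag(words, unigram_model, default="NN"):
    """Unigram lookup first, then regex rules, then the default tag."""
    return [
        (w, unigram_model.get(w.lower()) or regex_tag(w) or default)
        for w in words
    ]


model = train_unigram(TOY_TAGGED)
print(cascade_tag(["the", "run", "walked", "jumping", "42", "cat"], model))
```

Each stage only handles words the previous stage could not tag, which is what makes the cascade useful for unseen words.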
- Week 9 (October 29, 31)
October 29: Hidden Markov Models for POS Tagging.
October 30: TODAY ONLY: Shohini's TA Hours will be from 2:30p to 4:30p
October 31: HMMs for POS tagging. Decoding (example using Viterbi Algorithm), general issues relating to HMMs. Syntax & Grammars. Classification of sentences (declarative, imperative, wh-questions, yes-no questions).
Read: Start reading Chapter 12 from J&M.
Assignment#5: (Due on Thursday, November 7, 2013) Click here for details.
- Week 10 (November 5, 7)
November 5: Syntax and grammars: an introduction.
November 7: Exam 2 is today.
- Week 11 (November 12, 14)
November 12: Context Free Grammars: Formal definition, issues in modeling Noun Phrases, Verb Phrases. Examples. Language equivalence, Chomsky Normal Form. Probabilistic CFGs.
Read: Chapter 12 from J&M
November 14: Probabilistic Grammars. Parsing: Top-Down vs Bottom-Up. Recursive Descent parsers, Shift-Reduce parsers. Recursive Transition Networks (RTNs), Augmented Transition Networks (ATNs).
Read: Start reading Chapter 13.
Python for Linguists Tutorial Part#5 (Parsing in NLTK)
Assignment#6: (Due on Tuesday, November 26, 2013) Click here for details.
- Week 12 (November 19, 21)
November 19: Parsing contd. Augmented Transition Networks. Dynamic Programming based parsers: Bottom-Up (CKY), Top-Down (Earley).
Read: Chapter 13.
November 21: Semantics: Meaning representation formalisms: an overview. First-Order Predicate Calculus.
Read: Chapter 17
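The bottom-up dynamic programming parser (CKY) from November 19 works on grammars in Chomsky Normal Form. The following is a minimal recognizer sketch: it only answers whether a sentence is in the language and does not build parse trees. The toy grammar, the dictionary-based grammar encoding, and the function name are assumptions chosen for illustration.

```python
def cky_recognize(words, lexical, binary, start="S"):
    """Return True if the CNF grammar derives the word sequence from start.

    lexical maps a word to the set of nonterminals A with a rule A -> word.
    binary maps a pair (B, C) to the set of nonterminals A with A -> B C.
    """
    n = len(words)
    # table[i][j] holds the nonterminals that can span words[i:j].
    table = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for i, word in enumerate(words):
        table[i][i + 1] = set(lexical.get(word, ()))
    for span in range(2, n + 1):           # width of the span
        for i in range(0, n - span + 1):   # left edge
            j = i + span                   # right edge
            for k in range(i + 1, j):      # split point
                for left in table[i][k]:
                    for right in table[k][j]:
                        table[i][j] |= binary.get((left, right), set())
    return start in table[0][n]


# Toy grammar, made up for illustration.
LEXICAL = {"the": {"Det"}, "dog": {"N"}, "cat": {"N"}, "chased": {"V"}}
BINARY = {("Det", "N"): {"NP"}, ("V", "NP"): {"VP"}, ("NP", "VP"): {"S"}}

print(cky_recognize("the dog chased the cat".split(), LEXICAL, BINARY))  # True
print(cky_recognize("dog the chased".split(), LEXICAL, BINARY))          # False
```

Because every span is filled from shorter spans, each cell is computed once, which is where the dynamic programming saving comes from.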
- Week 13 (November 26, 28)
November 26: Semantics: Attachment rules to CFG for semantics. Syntax-based semantics. Lambda expressions and reductions and their uses in semantics.
Read: Chapter 8.
November 28: No class. Thanksgiving!
- Week 14 (December 3, 5)
December 3: No class today, Deepak is out of town (at Purdue University)
Watch: IBM Watson takes on Champions of Jeopardy! Day 1 (Part1, Part2), Day 2 (Part1, Part2), Day 3 (Part1, Part2)
Read: Building Watson: An Overview of the DeepQA Project, Ferrucci et al (AI Magazine, Vol. 31, Number 3, Fall 2010, AAAI Press).
December 5: No class today, Deepak is out of town (at Purdue University)
- Week 15 (December 10, 12)
December 10: Today's class is optional due to the Snow Day. Blue Bird buses are not running between campuses, so many of you will not make it to class. I do not plan to do anything new. Just prepare for Exam 3 on Thursday. I will be in my office all day in case you need to see me for anything. It was great having you all in class this semester!
December 12: Exam 3 is today.
Grading
All graded work will receive a grade of 4.0, 3.7, 3.3, 3.0, 2.7, 2.3, 2.0, 1.7, 1.3, 1.0, or 0.0. At the end of the semester, final grades will be calculated as a weighted average of all grades according to the following weights:
Exam 1: 15%
Exam 2: 15%
Exam 3: 15%
Labs & Written Work: 55%
Total: 100%
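As a worked example of the weighted average with the weights above, here is a small sketch. It assumes the labs and written work have already been combined into a single grade on the same 4.0 scale; how that combination is done is not specified here. The sample grades are hypothetical.

```python
# Weights from the grading table above.
WEIGHTS = {
    "Exam 1": 0.15,
    "Exam 2": 0.15,
    "Exam 3": 0.15,
    "Labs & Written Work": 0.55,
}


def final_grade(grades):
    """Weighted average of component grades on the 4.0 scale."""
    return sum(WEIGHTS[name] * grades[name] for name in WEIGHTS)


# Hypothetical grades for illustration.
sample = {"Exam 1": 3.7, "Exam 2": 3.3, "Exam 3": 4.0, "Labs & Written Work": 3.7}
print(round(final_grade(sample), 3))  # 3.685
```

Here the three exams contribute 0.15 x (3.7 + 3.3 + 4.0) = 1.65 and the labs contribute 0.55 x 3.7 = 2.035, for a total of 3.685.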
Links
Text's Home Page (Jurafsky & Martin)
The Association for Computational Linguistics (ACL)
The Language Computer Q&A demo
An online version of ELIZA
NLTK Home page
NLTK LITE Tutorials
NLTK LITE API Documentation
Created on August 8, 2013.