Bryn Mawr College
CMSC 325: Computational Linguistics
Fall 2022
Course Materials
Prof. Deepak Kumar
General Information
Instructor(s)
Deepak Kumar
202 Park Science Building
526-7485
dkumar at brynmawr dot edu
https://cs.brynmawr.edu/~dkumar/
Lecture Hours: Mondays & Wednesdays from 10:10a to 11:30a
Office Hours: Mon 11:40a to 12:30p, Tue 10:20a to 11:30a, or by appointment.
Lecture Room: Room 245 Park Science Building
Lab: Mondays 11:40a to 1:00p in Room 231 Park Science Building
Laboratories
- Computer Science lab Room 231 (Science Building)
- You will also be able to use your own computer to do labs for some assignments in this course.
Texts & Software
Main Texts (Required)
- Speech and Language Processing, 3rd Edition. Pearson-Prentice Hall, forthcoming in 2022-23.
The book is not yet available for purchase; the authors have made electronic copies of the chapters available: Click here.
- Natural Language Processing with Python, by Steven Bird, Ewan Klein, and Edward Loper.
Electronic copies available: Click here.
- Software: Python 3.x (download here) and NLTK 3.x (download here)
Syllabus
Course Description (Class Number: 2062)
Introduction to computational models of understanding and processing human languages. How elements of linguistics, computer science, and artificial intelligence can be combined to help computers process human language and to help linguists understand language through computer models. Topics covered: syntax, semantics, pragmatics, generation, and knowledge representation techniques. Prerequisite: CMSC 206, or H106 and CMSC 231, or permission of instructor. Haverford: Natural Science (NA)
Enrollment Limit: 24.
Lab Attendance: Lab attendance is generally optional, but will be required during specific weeks; look for announcements below during the semester. Prof. Kumar will be available in the Lab during all Lab times throughout the semester.
Important Dates
August 29 | First class meeting
September 28 | Exam 1
November 7 | Exam 2
December 5 | Last class meeting
December 7 | Exam 3
Creating a Welcoming Environment
All members of the Instruction Staff are dedicated to the cause of improving diversity, equity, and inclusion in the field of computing, and to supporting the wellness and mental health of our students.
Diversity and Inclusion
It is essential that all members of the course community – the instructor, TAs, and students – work together to create a supportive, inclusive environment that welcomes all students, regardless of their race, ethnicity, gender identity, sexuality, or socioeconomic status. All participants in this course deserve to and should expect to be treated with respect by other members of the community.
Class meetings, lab sessions, office hours, and group working time should be spaces where everyone feels welcome and included. In order to foster a welcoming environment, students of this course are expected to: exercise consideration and respect in their speech and actions; attempt collaboration and consideration, including listening to opposing perspectives and authentically and respectfully raising concerns, before conflict; refrain from demeaning, discriminatory, or harassing behavior and speech.
Wellness
Additionally, your mental health and wellness are of utmost importance to the course Instruction Staff, if not the College as a whole. All members of the instruction staff will be happy to chat or just to listen if you need someone to talk to, even if it’s not specifically about this course.
If you or someone you know is in distress and urgently needs to speak with someone, please do not hesitate to contact BMC Counseling Services: 610-526-7360 (610-526-7778 nights and weekends). If you are uncomfortable reaching out to Counseling Services, any member of the Instruction Staff will be happy to contact them on your behalf.
We understand that student life can be extremely difficult, both mentally and emotionally. If you are living with mental health issues such as anxiety, depression, ADHD, or other conditions that may affect you this semester, you are encouraged to discuss these with the Instructor. Although the details are up to you to disclose, the Instruction Staff will do their best to support and accommodate you in order to ensure that you can succeed in this course while staying healthy.
Assignments
- Assignment#1 is posted (Due on Monday, September 19): Click here for details.
- Assignment#2 is posted (Due on Monday, September 26): Click here for details.
- Assignment#3 is posted (Due on Wed, October 5): Click here for details.
- Assignment#4 is posted (Due on Wed, November 2): Click here for details.
- Assignment#5 is posted (Due on Monday, December 5, 2022): Click here for details.
Lectures
- Week 1 (August 29, 31)
August 29: Introduction to Computational Linguistics. Course overview. Examples of language processing: Google Search, machine translation: Google Translate, iTranslate iPhone app, Microsoft Demo (Nov. 2012), Dall-e. Identifying language tasks and the knowledge required for these tasks. Language processing versus data processing.
Slides: Click here.
Read: Chapter 1 from Jurafsky & Martin.
August 31: Language formalisms: Chomsky Hierarchy. Regular Languages, Regular Expressions. Regular Expressions: an introduction.
Read: Section 2.1 from Chapter 2 of J&M, 3rd Edition.
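The regular-expression material above can be tried directly with Python's `re` module. A minimal sketch (the pattern and sample text are invented for illustration):

```python
import re

# Match words ending in "ing" -- \b marks word boundaries, \w+ one or more
# word characters before the suffix
pattern = re.compile(r"\b\w+ing\b")

text = "The parser is running while we are singing and testing."
matches = pattern.findall(text)
print(matches)  # ['running', 'singing', 'testing']
```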
- Week 2 (September 5, 7)
September 5: Labor Day, no class.
September 7:
Regular expressions in Linux and Python - a demo. How to access text files and web pages through Python. Linux tools for processing texts: tr, sort, uniq, etc. How to count the number of "words" and their frequencies.
Read: Sections 1 and 2 from Chapter 1 of NLTK book. Read and work through Python tutorials: Python for Linguists Part 1, Python for Linguists Part 2.
Assignment#1 is posted (Due on Monday, September 19): Click here for details.
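The word-counting pipeline demoed above (tr | sort | uniq -c in Linux) has a compact pure-Python equivalent; the sample sentence is invented for illustration:

```python
from collections import Counter
import re

text = "the cat sat on the mat and the cat slept"
# Lowercase and extract alphabetic runs -- a crude notion of "word"
words = re.findall(r"[a-z]+", text.lower())
freq = Counter(words)
print(freq.most_common(2))  # [('the', 3), ('cat', 2)]
```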
- Week 3 (September 12, 14)
September 12: Words: types & tokens. Corpora. Text Normalization: tokenizing, word normalization, sentence segmentation. Using REs for tokenization.
Read: Chapter 2 from J&M.
September 14:
Lemmatization. Morphological parsing (overview). Stemming in nltk: Porter Stemmer, RE Stemmer. Detecting and correcting spelling errors: Levenshtein Distance (Minimum Edit Distance) algorithm and implementation.
Read: Chapter 2 from J&M.
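An RE stemmer of the kind mentioned above can be sketched in a few lines; the suffix list is a toy one, far less careful than the Porter stemmer covered in class:

```python
import re

def re_stem(word):
    # Strip one common suffix from the end of the word, if present.
    # Alternatives are tried in order, so "es" is preferred over bare "s".
    return re.sub(r"(ing|ed|ly|es|s)$", "", word)

print([re_stem(w) for w in ["walking", "walked", "walks", "quickly"]])
# ['walk', 'walk', 'walk', 'quick']
```

Naive suffix stripping over-stems words like "sing" or "red"; that is exactly the weakness the rule-ordered Porter stemmer addresses.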
- Week 4 (September 19, 21)
September 19: Minimum Edit Distance algorithm. Suggesting spelling corrections using Minimum Edit Distance (Python implementation). Maxmatch algorithm for word/hashtag segmentation. Python implementation and examples.
Assignment#2 is posted (Due on Monday, September 26): Click here for details.
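The Minimum Edit Distance algorithm above is a small dynamic program; a sketch with unit costs for all three operations (note that J&M's version charges 2 for substitutions, which gives 8 rather than 5 for the classic intention/execution pair):

```python
def min_edit_distance(s, t):
    # Levenshtein distance: unit cost for insert, delete, and substitute
    m, n = len(s), len(t)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i          # delete all of s[:i]
    for j in range(n + 1):
        d[0][j] = j          # insert all of t[:j]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            sub = 0 if s[i - 1] == t[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + sub)  # substitution (or match)
    return d[m][n]

print(min_edit_distance("intention", "execution"))  # 5
```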
September 21:
N-Gram Language Models. Unigrams, bigrams, trigrams, etc. Computing word probabilities using N-grams. Maximum Likelihood Estimates (MLE). The Google N-Gram dataset. Applications of N-Grams.
Read: Chapter 3 (Up to section 3.2) from J&M.
Demo: Visualizing Trigrams by Christopher Harrison.
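The Maximum Likelihood Estimates above reduce to counting: P(w | w_prev) = C(w_prev, w) / C(w_prev). A sketch using the small "Sam" corpus from J&M's n-gram chapter:

```python
from collections import Counter

corpus = ("<s> i am sam </s> <s> sam i am </s> "
          "<s> i do not like green eggs and ham </s>").split()
bigrams = list(zip(corpus, corpus[1:]))
unigram_counts = Counter(corpus)
bigram_counts = Counter(bigrams)

def p_mle(w_prev, w):
    # MLE bigram probability: relative frequency of the bigram
    # among all bigrams starting with w_prev
    return bigram_counts[(w_prev, w)] / unigram_counts[w_prev]

print(p_mle("<s>", "i"))  # 0.666... (2 of the 3 sentences start with "i")
print(p_mle("i", "am"))   # 0.666...
```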
- Week 5 (September 26, 28)
September 26: Sequence labeling: parts of speech (POS), named entity recognition (NER). An introduction to parts of speech by way of Grammar Rock from Schoolhouse Rock.
Read: Chapter 8 (up to and including Sections 8.1 and 8.2) from J&M.
Assignment#3 is posted (Due on Wed, October 5): Click here for details.
September 28:
Exam 1 is today.
- Week 6 (October 3, 5)
October 3: Exam 1 Review. POS Tagging. Tagsets: Brown, Penn Treebank, Google's Universal tag set, Universal Dependencies tag set. Tagging algorithms; rule-based; stochastic.
Read: Chapter 8 up to and including Sections 8.1 and 8.2 from J&M.
Files: MinimumEditDist.py, maxmatch.py
October 5:
POS Tagging: Rule-based taggers (regular expression Taggers), Stochastic Taggers (default taggers (every word is a NOUN!), Unigram Taggers, Bigram Taggers, cascading Taggers). Evaluating taggers: Accuracy against a gold standard.
Read: Chapter 8 up to and including Sections 8.1 and 8.2 from J&M. NLTK: Accessing and working with corpora, A Tour of Taggers in NLTK.
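A unigram tagger with a default-tagger backoff, as described above, can be sketched without NLTK; the tiny tagged "corpus" below is invented for illustration:

```python
from collections import Counter, defaultdict

# A toy "gold standard" of (word, tag) pairs standing in for a tagged corpus
tagged = [("the", "DET"), ("dog", "NOUN"), ("barks", "VERB"),
          ("the", "DET"), ("cat", "NOUN"), ("the", "DET"), ("dog", "NOUN")]

# Unigram tagger: each word gets its most frequent tag in the training data
counts = defaultdict(Counter)
for word, t in tagged:
    counts[word][t] += 1
model = {w: c.most_common(1)[0][0] for w, c in counts.items()}

def tag(word):
    # Back off to NOUN for unseen words -- the "every word is a NOUN" default
    return model.get(word, "NOUN")

print([(w, tag(w)) for w in ["the", "dog", "meows"]])
# [('the', 'DET'), ('dog', 'NOUN'), ('meows', 'NOUN')]
```

NLTK's UnigramTagger and DefaultTagger, covered in the tagger tour, implement the same idea with a `backoff` parameter.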
- Week 7 (October 10, 12)
Fall Break, no classes.
- Week 8 (October 17, 19)
October 17: POS Tagging: Stochastic Taggers. Finding the most probable tag sequence: Hidden Markov Models. Introduction to Markov Chains. Formulating the tagging process.
Read: Section 8.4 from J&M.
October 19:
Hidden Markov Models (HMM) for POS tagging.
Read: Section 8.4 from J&M.
Assignment#4 is posted (Due on Wed, November 2): Click here for details.
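Decoding an HMM tagger means finding the most probable state sequence, which the Viterbi algorithm does by dynamic programming. A minimal sketch; the transition and emission probabilities below are invented toy values, not estimates from any corpus:

```python
states = ["NOUN", "VERB"]
start_p = {"NOUN": 0.6, "VERB": 0.4}
trans_p = {"NOUN": {"NOUN": 0.3, "VERB": 0.7},
           "VERB": {"NOUN": 0.6, "VERB": 0.4}}
emit_p = {"NOUN": {"dogs": 0.5, "bark": 0.1},
          "VERB": {"dogs": 0.1, "bark": 0.6}}

def viterbi(words):
    # v[t][s] = probability of the best tag sequence ending in state s at time t
    v = [{s: start_p[s] * emit_p[s].get(words[0], 1e-6) for s in states}]
    back = [{}]
    for t in range(1, len(words)):
        v.append({})
        back.append({})
        for s in states:
            prev = max(states, key=lambda r: v[t - 1][r] * trans_p[r][s])
            v[t][s] = v[t - 1][prev] * trans_p[prev][s] * emit_p[s].get(words[t], 1e-6)
            back[t][s] = prev
    # Follow backpointers from the best final state
    best = max(states, key=lambda s: v[-1][s])
    path = [best]
    for t in range(len(words) - 1, 0, -1):
        best = back[t][best]
        path.insert(0, best)
    return path

print(viterbi(["dogs", "bark"]))  # ['NOUN', 'VERB']
```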
- Week 9 (October 24, 26)
October 24: Sequence Labeling: Named Entity Recognition. NER schemes. NER in NLTK. Syntax - Formal Grammars. Context Free Grammars.
Read: Chapter 8 and Chapter 12 (up to Section 12.3) from J&M.
October 26:
Syntax - Formal Grammars. Context Free Grammars.
A simple CFG for some English sentences.
Read: Chapter 8 and Chapter 12 (up to and including Section 12.3) from J&M.
- Week 10 (October 31, November 2)
October 31: CFGs for English. Structure of noun phrases, verb phrases, and other phrases. Common sentence-level constructions: declarative, imperative, yes-no questions, wh-questions.
Read: Sections 12.1 through 12.3 from J&M.
November 2:
Grammars and Parsing: Top Down vs Bottom Up Parsing. Sentence-level ambiguity: structural, coordination, local ambiguity. Probabilistic CFGs. Recursive Transition Network (RTN) Grammars. Grammar Equivalence.
- Week 11 (November 7, 9)
November 7: Exam 2 is today.
November 9:
Exam 2 Review. Parsing algorithms: top-down (Recursive Descent Parsing); bottom-up (Shift-Reduce Parsing).
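Top-down parsing can be sketched as a naive recursive descent recognizer over a toy CFG; the grammar and sentences below are invented for illustration, and NLTK's RecursiveDescentParser does the same with real grammars:

```python
# A tiny CFG: each nonterminal maps to a list of right-hand sides
grammar = {
    "S":   [["NP", "VP"]],
    "NP":  [["DET", "N"]],
    "VP":  [["V", "NP"], ["V"]],
    "DET": [["the"]],
    "N":   [["dog"], ["cat"]],
    "V":   [["chases"], ["sleeps"]],
}

def parse(symbols, words):
    # Try to expand the symbol list to cover exactly the word list.
    # (Pure top-down search: it would loop forever on left-recursive rules.)
    if not symbols:
        return not words
    head, rest = symbols[0], symbols[1:]
    if head not in grammar:  # terminal symbol: must match the next word
        return bool(words) and words[0] == head and parse(rest, words[1:])
    return any(parse(rhs + rest, words) for rhs in grammar[head])

print(parse(["S"], "the dog chases the cat".split()))  # True
print(parse(["S"], "dog the chases".split()))          # False
```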
- Week 12 (November 14, 16)
November 14: CKY Parsing. Chomsky Normal Form. Converting CFGs to CNF. Tracing the CKY algorithm. Examples.
Read: Chapter 13 (up to and including Section 13.2)
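The CKY algorithm above fills a triangular table bottom-up over a grammar in Chomsky Normal Form. A recognizer sketch; the CNF rules below are invented for illustration:

```python
from itertools import product

# CNF rules: binary rules (A -> B C) and lexical rules (A -> word)
binary = {("NP", "VP"): {"S"}, ("DET", "N"): {"NP"}, ("V", "NP"): {"VP"}}
lexical = {"the": {"DET"}, "dog": {"N"}, "cat": {"N"}, "chases": {"V"}}

def cky_recognize(words):
    n = len(words)
    # table[i][j] = set of nonterminals that derive words[i:j]
    table = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for i, w in enumerate(words):
        table[i][i + 1] = set(lexical.get(w, set()))
    for span in range(2, n + 1):          # widths, shortest first
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):     # every split point
                for b, c in product(table[i][k], table[k][j]):
                    table[i][j] |= binary.get((b, c), set())
    return "S" in table[0][n]

print(cky_recognize("the dog chases the cat".split()))  # True
print(cky_recognize("chases the dog".split()))          # False
```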
November 16:
Earley Parsing. Tracing the Earley algorithm for some examples.
Parsing in NLTK Tutorial: Click here.
Assignment#5 is posted (Due on Monday, December 5, 2022): Click here for details.
- Week 13 (November 21, 23)
November 21: Parsing using Augmented Transition Networks (ATNs). From parse trees to meaning: meaning representations, desiderata. The meaning of meaning representations.
Read: Start reading Chapter 15 from J&M.
November 23:
No class today. Safe travels and Happy Thanksgiving!
- Week 14 (November 28, 30)
November 28: Representing meaning of sentences. Meaning representation formalisms/languages. First-Order Predicate Calculus (FOPC): syntax, semantics. Lambda-notation and Lambda Reduction. Semantic attachments for grammar rules. Examples of extracting meaning.
Read: Chapter 15 (up to and including Section 15.3), Chapter 18 from the 2nd Edition (up to and including Section 18.3)
November 30:
Semantic attachments for grammar rules. Examples of extracting meaning.
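Lambda-style semantic composition, as in the semantic attachments above, can be mimicked with Python lambdas; the predicate names and the tiny lexicon are invented for illustration:

```python
lexicon = {
    "Mary": "Mary",
    "John": "John",
    # A transitive verb: takes its object first, then its subject,
    # yielding a logical-form string
    "loves": lambda obj: lambda subj: f"loves({subj},{obj})",
}

def interpret(subject, verb, obj):
    # Function application mirrors beta-reduction: applying "loves" to the
    # object yields a one-place predicate, then applied to the subject
    return lexicon[verb](lexicon[obj])(lexicon[subject])

print(interpret("Mary", "loves", "John"))  # loves(Mary,John)
```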
- Week 15 (December 5, 7)
December 5: Course Wrap up.
Slides: Click here.
December 7:
Exam 3 is today.
Course Policies
Submission and Late Policy
No assignment will be accepted after it is past due.
No past work can be "made up" after it is due.
No regrade requests will be entertained more than one week after the graded work is returned in class.
Any extensions will be given only in the case of verifiable medical excuses or other such dire circumstances, if requested in advance and supported by your Academic Dean.
Communication
As you will discover, we are proponents of two-way communication and we welcome feedback during the semester about the course. We are available to answer student questions, listen to concerns, and talk about any course-related topic (or otherwise!). Come to office hours! This helps us get to know you. You are welcome to stop by and chat. There are many more exciting topics to talk about that we won't have time to cover in-class.
Please stay in touch with us, particularly if you feel stuck on a topic or project and can't figure out how to proceed. Often a quick e-mail, phone call or face-to-face conference can reveal solutions to problems and generate renewed creative and scholarly energy. It is essential that you begin assignments early, since we will be covering a variety of challenging topics in this course.
Grading
All graded work will receive a grade: 4.0, 3.7, 3.3, 3.0, 2.7, 2.3, 2.0, 1.7, 1.3, 1.0, or 0.0. At the end of the semester, final grades will be calculated as a weighted average of all grades according to the following weights:
Exams | 75%
Exam 1 | 25%
Exam 2 | 25%
Exam 3 | 25%
Assignments | 25%
Incomplete grades will be given only for verifiable medical illness or other such dire circumstances.
Study Groups
All submitted work should be solely your individual work. We encourage you to discuss the material and work together to understand it. Here are our thoughts on collaborating with other students:
- The readings and lecture topics are group work. Please discuss the readings and associated topics with each other. Work together to understand the material. We highly recommend forming a reading group to discuss the material -- we will explore many ideas and it helps to have multiple people working together to understand them.
- It is fine to discuss the topics covered in the homeworks, to discuss approaches to problems, and to sketch out general solutions. However, you MUST write up the homework answers, solutions, and programs individually without sharing specific solutions, mathematical results, program code, etc. If you made any notes or worked out something on a white board with another person while you were discussing the homework, you shouldn't use those notes while writing up your answer.
- Under ABSOLUTELY NO circumstances should you share computer code with another student. You are not permitted to use or consult code found on the internet for any of your assignments.
If you have any questions as to what types of collaborations are allowed, please feel free to ask.
Created on August 1, 2022.