Bryn Mawr College
CS 325: Computational Linguistics
Lab Assignment #1
Due in class on Thursday, September 15
Description: Write a program that accesses the web site of The Onion
(www.theonion.com) and extracts the headline of its Top Story.
For example, a possible output from your program may be:
On September 8, 2005 the top story at The Onion was:
God Outdoes Terrorists Yet Again
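The two ends of the program, fetching the page and printing a report like the one above, can be sketched with the Python standard library alone. This is a minimal sketch, not a reference solution: `fetch_html` and `format_report` are hypothetical helper names, and falling back to UTF-8 when the server does not declare a character set is an assumption.

```python
import datetime
import urllib.request


def fetch_html(url: str) -> str:
    """Download a page and decode it to text.

    Uses the charset declared by the server; falling back to UTF-8
    when none is declared is an assumption of this sketch.
    """
    with urllib.request.urlopen(url) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def format_report(day: datetime.date, headline: str) -> str:
    """Render the result in the style of the sample output."""
    # day.day avoids the zero padding that %d would add ("September 08").
    return f"On {day:%B} {day.day}, {day.year} the top story at The Onion was:\n{headline}"
```

Once a headline has been extracted, something like `print(format_report(datetime.date.today(), headline))` would produce output in the sample's format. Note that `%B` yields English month names only under the default C or an English locale.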
Notes
- First, visit the web site and see for yourself what the top story is;
then think about how you identified it.
- Look at the HTML source of the web page for textual markers.
- You will need to make generous use of regular expressions. Python's regular
expression module, re, is especially well suited for this purpose. Refer to
the tutorials posted on the home page for this class.
- This is basically a text processing task.
- Work incrementally to accomplish the task.
- Remember that in this domain, problems tend to be ill-defined and
solutions tend to be imperfect.
- This exercise is designed to help you confront that reality while still
exploring and developing your own solution(s) to the problem.
- Try to document your thought process at each step.
- Once done, write down the process by which you arrived at the final solution.
- Hand in a report containing the write-up from the previous step, your
well-commented program(s), and a sample output. Also include a final section
with your reflections on the exercise, the process, and how you arrived at the
solution. Is your solution general enough? For example, could it extract the
same information from another, similar source? What changes would another
source require?
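The notes above suggest working incrementally with regular expressions on the page's HTML source. One way to do that is in two stages: first isolate the region of the page that holds the top story, then pull the headline out of that region. The sketch below assumes a hypothetical marker, `<div class="top-story">`; the real page's markup has to be read off its HTML source, and `SAMPLE_PAGE` and `extract_top_headline` are illustrative names only.

```python
import html
import re
from typing import Optional

# Hypothetical markup for illustration; not the real page's HTML.
SAMPLE_PAGE = """
<html><body>
  <div class="sidebar"><a href="/other">Some Other Story</a></div>
  <div class="top-story">
    <h2><a href="/content/node/1234">God Outdoes Terrorists Yet Again</a></h2>
  </div>
</body></html>
"""

# Stage 1: isolate the block after the (assumed) top-story marker.
# The lazy .*? stops at the first closing </div>, so this assumes no nested divs.
REGION_RE = re.compile(r'<div class="top-story">(.*?)</div>', re.DOTALL | re.IGNORECASE)
# Stage 2: inside that block, capture the text of the first link.
LINK_TEXT_RE = re.compile(r"<a\b[^>]*>(.*?)</a>", re.DOTALL | re.IGNORECASE)
# Removes any inline tags (e.g. <i>, <b>) left inside the link text.
TAG_RE = re.compile(r"<[^>]+>")


def extract_top_headline(page: str) -> Optional[str]:
    region = REGION_RE.search(page)
    if region is None:
        return None  # marker not found: the layout has probably changed
    link = LINK_TEXT_RE.search(region.group(1))
    if link is None:
        return None
    text = TAG_RE.sub("", link.group(1))
    # Decode entities such as &amp; and collapse runs of whitespace.
    return " ".join(html.unescape(text).split())


print(extract_top_headline(SAMPLE_PAGE))  # God Outdoes Terrorists Yet Again
```

Searching the isolated region first keeps the sidebar link, which appears earlier in the page, from being mistaken for the headline. Returning `None` when a marker is missing makes a layout change show up as an explicit failure rather than a wrong answer.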
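For the question of generality, one possible design keeps everything site-specific in data rather than in code, so that adapting the program to another source means adding a pattern rather than rewriting the extractor. The site entries, URLs, and markup below are hypothetical placeholders, not the real HTML of The Onion or any other site; each pattern is assumed to capture the headline in group 1, and `SOURCES` and `extract` are illustrative names.

```python
import html
import re
from typing import Dict, Optional

# Hypothetical per-site settings: only these entries should need to change
# when the program is pointed at a different source.
SOURCES: Dict[str, Dict[str, str]] = {
    "onion": {
        "url": "http://www.theonion.com/",  # where the page would be fetched from (unused here)
        "pattern": r'<div class="top-story">.*?<a\b[^>]*>(.*?)</a>',
    },
    "other_site": {
        "url": "http://news.example.com/",  # placeholder site
        "pattern": r'<h1 id="lead">(.*?)</h1>',
    },
}


def extract(page: str, source: str) -> Optional[str]:
    """Apply the configured pattern for `source`; group 1 holds the headline."""
    match = re.search(SOURCES[source]["pattern"], page, re.DOTALL | re.IGNORECASE)
    if match is None:
        return None
    text = re.sub(r"<[^>]+>", "", match.group(1))  # drop inline tags
    return " ".join(html.unescape(text).split())  # decode entities, tidy spaces


onion_page = '<div class="top-story"><h2><a href="/x">God Outdoes Terrorists Yet Again</a></h2></div>'
other_page = '<body><h1 id="lead">Local Headline Here</h1></body>'
print(extract(onion_page, "onion"))
print(extract(other_page, "other_site"))
```

With this split, the answer to "what changes would another source require?" becomes concrete: a new entry in `SOURCES`, plus whatever cleanup that site's markup happens to need.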