Topics in Computational Linguistics

LIN 6932 (Section 2702)

University of Florida

Spring Semester 2007

 

 

 

 

Instructor:             Hana Filip

Time:                     T7 (1:55-2:45) &  R7-8 (1:55-3:50)

Place:                     ARCH (Architecture) 120

Office:                   370 Dauer

Office hours:         T/R6 (12:50-1:40) & by appointment

E-mail:                   hana.filip@gmail.com

Office phone:        392-2101 ext 217

Web page:             http://plaza.ufl.edu/hfilip

 

Course Description: This course is an introductory overview of the field of natural language processing and computational linguistics.  It pursues two main goals.  First, it covers finite-state methods, parsing and grammars, computational semantics, information extraction and information retrieval, question answering, web search, discourse processing, and empirical corpus-based linguistics, including the creation and annotation of large-scale corpora.

      Second, the course is a computer literacy class, designed to enable you to make a computer do exactly what you want it to do.  To this end, tools for the working computational linguist will be introduced.  The focus of this class is on writing scripts that use available online implementations of NLP/CL applications, rather than on implementing complete applications.  In this connection, the class will also cover the Unix operating system.

 

Requirements

•    Homework: 7 homework assignments (see the Homework Collaboration Policy below).  Each assignment is due before class starts on its due date.  LATE HOMEWORK WILL NOT BE ACCEPTED.  I will drop your lowest homework grade.

•    Readings: To be read before the class period in which they will be discussed.  You will be expected to do a significant amount of textbook reading in this course.

 

Grading

•     84% homework (best 6 of 7 assignments)

•     16% class participation

 

Required Texts

Selected book chapters and articles, available online as PDFs.

The abbreviation ‘J+M’ in the syllabus refers to Jurafsky, Daniel and James Martin. 2007. Speech and Language Processing:  An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition. Prentice-Hall.

 

Recommended Texts

Wynne, Martin. A Course In The Unix Operating System.   http://www.comp.lancs.ac.uk/computing/users/eiamjw/unix/ (online manual)

 

This is just one of many Unix tutorials and manuals available online.  You may want to look around yourself to see what is available.  If you’d rather have a book in hand, you may consider purchasing:

 

Peek, Jerry, Grace Todino-Gonguet, John Strang.  2002. Learning the Unix Operating System. (Fifth Edition.) O'Reilly Media.   http://www.oreilly.com/catalog/lunix5/ also available at amazon.com.

SYLLABUS 

(subject to changes)

Wk

Date

HW

 

Lec

Topic and Readings

1

Jan 9

 

 

Introduction

1

Jan 11

 

Lec 1 (ppt)

Overview of Computer Speech and Language Processing, Regular Expressions

 

    * J+M Old Chapter 1: Introduction

2

Jan 16

 

Unix 1

(ppt)

Unix

2

Jan 18

HW1

Lec 2 (ppt)

Finite Automata

 

    * J+M Old Chapter 2: Regular Expressions and Automata

    * J. Weizenbaum. ELIZA: A Computer Program for the Study of Natural Language Communication between Man and Machine. CACM, Vol. 9, No. 1, 1966

    * OPTIONAL ADVANCED READING: J+M New Chapter 3: Finite-State Transducers, Morphology, and Edit Distance

    * FUN STUFF: Downloadable Turing Machine Simulator (Princeton University, Computer Science Department): http://www.cs.princeton.edu/introcs/75turing/

3

Jan 23

 

Unix 2

(ppt)

Unix

3

Jan 25

 

 

Part of Speech Tagging and Intro to Probabilistic Modeling

 

    * J+M New Chapter 5: Word Classes and Part of Speech Tagging

4

Jan 30

 

Unix 2

(ppt)

Unix

4

Feb 1

 

Lec 4 (ppt)

Part of Speech Tagging (II)

 

    * J+M New Chapter 5: Word Classes and Part of Speech Tagging

5

Feb 6

 

 

Unix

5

Feb 8

 

Lec 5 (ppt)

N-grams

 

    * J+M New Chapter 6: Hidden Markov Models and Loglinear Models (pages 1-13 only)

    * J+M New Chapter 4: N-Grams (pages 1-17 only)

6

Feb 13

 

 

Unix

6

Feb 15

 

Lec 6 (ppt)

Grammars and Parsing

 

    * J+M New Chapter 11: Formal Grammars of English

7

Feb 20

 

Unix 3

(ppt)

Unix

7

Feb 22

 

Lec 7 (ppt)

Grammars and Parsing (II)

 

    * Chomsky Hierarchy

8

Feb 27

 

Unix 4 (ppt)

Unix

8

Mar 1

 

Lec 8 (ppt)

Grammars and Parsing (III)

 

    * J+M New Chapter 12: Parsing with Context-Free Grammars

9

Mar 6

 

Unix 5

(ppt)

Unix

9

Mar 8

 

 

Question Answering

 

    * Chapter 1 of Christopher D. Manning, Prabhakar Raghavan and Hinrich Schütze, Introduction to Information Retrieval, Cambridge University Press. 2007.

    * D. Moldovan, S. Harabagiu, M. Pasca, R. Mihalcea, R. Goodrum, R. Girju, and V. Rus. 1999. LASSO: A tool for surfing the answer net. In Proceedings of the Eighth Text Retrieval Conference (TREC-8), 1999.

    * E. Brill, S. Dumais and M. Banko. 2002. An analysis of the  AskMSR question-answering system. Proceedings of EMNLP 2002.

10

Mar 13

 

SPRING BREAK

 

10

Mar 15

11

Mar 20

 

 

Unix

11

Mar 22

 

Lec 9 (ppt)

Machine Translation:  Statistical MT

 

    * J+M New Chapter 24 (pages 1-46)

12

Mar 27

 

Unix 5

(ppt)

Unix

12

Mar 29

 

Lec 10 (ppt)

Computational Lexical Semantics 1

 

    * J+M New Chapter 19 (pages 1-28)

13

Apr 3

HW6.II.sol

Unix 6 (ppt)

Unix

13

Apr 5

 

Lec 11 (ppt)

Computational Lexical Semantics 2

 

    * J+M New Chapter 19 (pages 28-51)

14

Apr 10

 

Unix 7  (ppt)

Unix

14

Apr 12

 

 

 

Information Extraction: FASTUS, TextPro

     * Introduction to Information Extraction Technology:

        A Tutorial Prepared for IJCAI-99 by Douglas E. Appelt

        and David J. Israel

     * Using Information Extraction to Improve Document Retrieval

         by John Bear et al., 1998

     * Douglas Appelt: TextPro Documentation

     * The (Non)Utility of Predicate-Argument Frequencies for

         Pronoun Interpretation by Douglas Appelt and Andy Kehler

 

15

Apr 17

 

 

Unix

15

Apr 19

 

 

Discourse

 

    * J+M New Chapter 20

16

Apr 24

 

Unix 8  (ppt)

Unix

 

Homework Collaboration Policy (This policy is taken directly from Chris Manning’s CS 224N course at Stanford University.)

 

•      You may talk to anybody you want about the assignments, including working through problems together in groups.  Indeed, we encourage you to work in groups, and to work with different people through the quarter.

•      However, for written problem sets (there may not be any for this class; I'll let you know):

1.    you must state on your written assignment the people you discussed problems with

2.    you are not allowed to take detailed notes in any group sessions that will appear verbatim in assignment write-ups.  Everybody has to turn in written homework answers that are written solely by himself/herself.

•    Programming parts/projects: These can be done by oneself or in groups of at most 3, and people may submit a joint submission or identical material, which is assumed to be the joint work of all partners.

 

 

*********

Plagiarism or cheating on homework assignments will not be tolerated.  Any instance of academic dishonesty will be subject to the rules and regulations set forth under the headings “Standard of Ethical Conduct”, “Academic Honesty”, and “Student Conduct Code” in The University Record 2004-5, Sec.1, pp.8-9.

 

If you miss more than four sessions, please drop this course.  For details, see “Attendance Policies”, The University Record 2004-5, Sec.2, p.13.  Failure to attend and actively participate will result in a lowering of grades.