Sunday, April 11, 2010

lin1997identifying Identifying topics by position

Lin, C.-Y. and Hovy, E. (1997). Identifying topics by position. In Proceedings of the Fifth Conference on Applied Natural Language Processing, pages 283-290, San Francisco, CA, USA.



= = = = = = = = = =
[das2007survey]
Lin and Hovy (1997) studied the importance of a single feature: sentence position. Weighting a sentence purely by its position in the text, which the authors call the "position method", rests on the idea that texts generally follow a predictable discourse structure, and that sentences of greater topic centrality tend to occur in certain specifiable locations (e.g., title, abstract, etc.).

However, since discourse structure varies significantly across domains, the position method cannot be defined as naively as in Baxendale (1958).

The paper makes an important contribution by investigating techniques for tailoring the position method so that it is optimal for a given genre, and by showing how its effectiveness can be evaluated.

A newswire corpus was used: the collection of Ziff-Davis texts produced under the TIPSTER program. It consists of texts about computers and related hardware, each accompanied by a set of key topic words and a short abstract of six sentences. For each document in the corpus, the authors measured the yield of each sentence position against the topic keywords. They then ranked the sentence positions by their average yield to produce the Optimal Position Policy (OPP) for topic positions in the genre.
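
To make the OPP procedure concrete, here is a minimal Python sketch. It is not the authors' code. It assumes that each document is a list of paragraphs, each a list of sentences; that a position is a 1-based (paragraph, sentence) pair such as P2S1; and that a sentence's "yield" is the fraction of the document's topic keywords it contains. The paper's exact yield measure, tokenization, and treatment of titles may differ, and all function names here are hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical document format: a list of paragraphs, each a list of sentence strings.
# A sentence position is a 1-based (paragraph, sentence) pair, e.g. (2, 1) = P2S1.


def tokens(text):
    """Lowercased alphanumeric tokens of a string."""
    return re.findall(r"[a-z0-9]+", text.lower())


def sentence_yield(sentence, keywords):
    """Assumed yield: fraction of the document's topic keywords found in the sentence."""
    if not keywords:
        return 0.0
    return len(set(tokens(sentence)) & keywords) / len(keywords)


def optimal_position_policy(corpus):
    """Rank sentence positions by average yield over a training corpus.

    corpus: iterable of (paragraphs, keyword_set) pairs.
    Returns positions sorted from highest to lowest average yield.
    """
    total = defaultdict(float)
    count = defaultdict(int)
    for paragraphs, keywords in corpus:
        for p, paragraph in enumerate(paragraphs, start=1):
            for s, sentence in enumerate(paragraph, start=1):
                total[(p, s)] += sentence_yield(sentence, keywords)
                count[(p, s)] += 1
    average = {pos: total[pos] / count[pos] for pos in total}
    # Break ties by earlier position so the ranking is deterministic.
    return sorted(average, key=lambda pos: (-average[pos], pos))


def extract(paragraphs, policy, k):
    """Pick up to k sentences from a new document, following the policy order."""
    picked = []
    for p, s in policy:
        if len(picked) >= k:
            break
        # Skip positions that do not exist in this (possibly shorter) document.
        if p <= len(paragraphs) and s <= len(paragraphs[p - 1]):
            picked.append(paragraphs[p - 1][s - 1])
    return picked


# Toy, made-up training data (not from the Ziff-Davis collection).
toy_corpus = [
    ([["Intel unveils new chip.", "The chip is fast."], ["Prices drop."]],
     {"intel", "chip"}),
    ([["New laptop released.", "The laptop uses an Intel chip."], ["Reviews are mixed."]],
     {"laptop", "intel", "chip"}),
]

policy = optimal_position_policy(toy_corpus)
print(policy)  # → [(1, 2), (1, 1), (2, 1)]
print(extract(toy_corpus[0][0], policy, 2))
```

Because averages are taken per position, positions that occur only in long documents are estimated from fewer samples, so a usable policy would need enough training text to cover every position it ranks.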

Two kinds of evaluation were performed. Previously unseen text was used to test whether the same procedure would carry over to a different domain. The first evaluation showed contours exactly like those of the training documents. In the second evaluation, the word overlap between the manual abstracts and the extracted sentences was measured: windows in the abstracts were compared with windows in the selected sentences, and the corresponding precision and recall values were computed. The high degree of coverage indicated that the position method is effective.
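
The window comparison in the second evaluation can be sketched as follows, purely as an illustration. The sketch assumes that a window is a run of n consecutive lowercased words, that precision is the share of windows from the selected sentences that also occur in the abstract, and that recall is the share of abstract windows covered by the selected sentences. The paper's actual window sizes and any stemming or stopword handling are not reproduced, and the function names are hypothetical.

```python
import re


def windows(text, n):
    """Set of contiguous n-word windows over lowercased alphanumeric tokens."""
    toks = re.findall(r"[a-z0-9]+", text.lower())
    return {tuple(toks[i:i + n]) for i in range(len(toks) - n + 1)}


def window_precision_recall(abstract, extracted, n=1):
    """Precision and recall of extracted-sentence windows against abstract windows.

    Windows are built per sentence so that no window spans two extracted sentences.
    """
    reference = windows(abstract, n)
    candidate = set().union(*(windows(s, n) for s in extracted))
    overlap = len(reference & candidate)
    precision = overlap / len(candidate) if candidate else 0.0
    recall = overlap / len(reference) if reference else 0.0
    return precision, recall


# Made-up example, not data from the paper.
abstract = "Intel unveils a faster chip."
extracted = ["Intel unveils new chip.", "The chip is fast."]
for n in (1, 2):
    print(n, window_precision_recall(abstract, extracted, n))
```

Raising n from 1 to 2 in this example drops recall from 3/5 to 1/4, which shows how quickly exact-match overlap falls as windows get longer.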

In later work, Lin (1999) broke away from the assumption that features are independent of each other and modeled sentence extraction with decision trees instead of a naive Bayes classifier.
= = = = = = = = = =
