Sunday, April 11, 2010

barzilay1997using Using lexical chains for text summarization

Barzilay, R. and Elhadad, M. (1997). Using lexical chains for text summarization. In Proceedings ISTS'97.


= = = = = = = = = =
[das2007survey]
Barzilay and Elhadad (1997) describe work that used a considerable amount of linguistic analysis to perform summarization. To understand their method, we first need to define a lexical chain: a sequence of related words in a text, spanning short distances (adjacent words or sentences) or long distances (the entire text).

The authors' method proceeds in the following steps: segmentation of the text, identification of lexical chains, and use of strong lexical chains to identify the sentences worth extracting. They tried to reach a middle ground between McKeown and Radev (1995) and Luhn (1958): the former relied on the deep semantic structure of the text, while the latter relied on word statistics of the documents.

The authors describe the notion of cohesion in text as a means of sticking together different parts of the text.

Lexical cohesion is a notable example, in which semantically related words are used. Consider the following two sentences:

    John bought a Jag. He loves the car.

Here, the word car refers to the word Jag in the previous sentence and exemplifies lexical cohesion. Cohesion occurs not only between single words but also across word sequences, resulting in lexical chains, which the authors used as a source representation for summarization. Semantically related words and word sequences were identified in the document, and several chains were extracted to form a representation of the document. To find lexical chains, the authors used WordNet (Miller, 1995), applying three generic steps:

1. Selecting a set of candidate words.
2. For each candidate word, finding an appropriate chain, relying on a relatedness criterion among members of the chains.
3. If it is found, inserting the word in the chain and updating it accordingly.
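
The three steps above can be sketched as a simple greedy loop. This is only an illustration, not the authors' implementation: the `RELATED` table is a hypothetical stand-in for a WordNet-based relatedness test, the function names are made up for this example, and starting a new chain when no related chain is found is an assumption (the summary does not say what happens in that case). The sketch also makes no attempt at word sense disambiguation.

```python
# Toy stand-in for a WordNet-based relatedness test (hypothetical data).
RELATED = {
    frozenset({"jag", "car"}),
    frozenset({"car", "vehicle"}),
    frozenset({"bank", "money"}),
}


def related(a, b):
    """Return True if two words are identical or listed as related in the toy table."""
    return a == b or frozenset({a, b}) in RELATED


def build_chains(candidates):
    """Greedily place each candidate word in the first chain with a related member."""
    chains = []
    for word in candidates:                 # step 1: iterate over candidate words
        for chain in chains:                # step 2: look for an appropriate chain
            if any(related(word, member) for member in chain):
                chain.append(word)          # step 3: insert the word and update the chain
                break
        else:
            chains.append([word])           # assumption: no match starts a new chain
    return chains


chains = build_chains(["jag", "car", "bank", "vehicle", "money", "car"])
print(chains)
```

With this toy input, "jag", "car" and "vehicle" end up in one chain and "bank" and "money" in another, mirroring the Jag/car example above.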


Relatedness was measured in terms of WordNet distance. Simple nouns and noun compounds were used as the starting point for finding the set of candidates. In the final step, strong lexical chains were used to create the summaries. The chains were scored by their length and homogeneity. The authors then used a few heuristics to select the significant sentences.
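
As a rough illustration of scoring by length and homogeneity, the sketch below ranks chains by length times a homogeneity index. The exact formula is an assumption made for this example (homogeneity taken as one minus the ratio of distinct members to chain length); the summary above does not spell it out, nor does it give the cutoff that makes a chain "strong", so the sketch only ranks chains. The names `chain_score` and `rank_chains` are hypothetical.

```python
def chain_score(chain):
    """Score a chain as length * homogeneity (assumed formula, see lead-in)."""
    length = len(chain)                          # number of word occurrences
    homogeneity = 1 - len(set(chain)) / length   # fewer distinct words -> higher value
    return length * homogeneity


def rank_chains(chains):
    """Return chains sorted from highest to lowest score."""
    return sorted(chains, key=chain_score, reverse=True)


example_chains = [
    ["bank", "money", "bank", "money"],
    ["summit"],
    ["car", "jag", "car", "vehicle", "car", "car"],
]
ranked = rank_chains(example_chains)
for chain in ranked:
    print(chain_score(chain), chain)
```

Under this assumed formula, a chain whose members never repeat scores zero, so long chains built around a few recurring words rank highest.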
= = = = = = = = = = 
