Sunday, April 18, 2010

jing2000cut Cut and Paste Based Text Summarization

Jing, Hongyan and Kathleen McKeown. 2000. Cut and paste based summarization. In Proceedings of the First Conference of the North American Chapter of the Association for Computational Linguistics, pages 178–185, Seattle.

barzilay2001extracting Extracting paraphrases from a parallel corpus

Barzilay, Regina and Kathleen McKeown. 2001. Extracting paraphrases from a parallel corpus. In Proceedings of the ACL/EACL, pages 50–57, Toulouse, France.

Abstract
While paraphrasing is critical both for interpretation and generation of natural language, current systems use manual or semi-automatic methods to collect paraphrases. We present an unsupervised learning algorithm for identification of paraphrases from a corpus of multiple English translations of the same source text. Our approach yields phrasal and single word lexical paraphrases as well as syntactic paraphrases.

Wednesday, April 14, 2010

barzilay2005sentence Sentence Fusion for Multidocument News Summarization

Regina Barzilay, Kathleen McKeown
"Sentence Fusion for Multidocument News Summarization",
Computational Linguistics, 2005.

A system that can produce informative summaries, highlighting common information found in many online documents, will help Web users to pinpoint information that they need without extensive reading. In this article, we introduce sentence fusion, a novel text-to-text generation technique for synthesizing common information across documents. Sentence fusion involves bottom-up local multisequence alignment to identify phrases conveying similar information and statistical generation to combine common phrases into a sentence. Sentence fusion moves the summarization field from the use of purely extractive methods to the generation of abstracts that contain sentences not found in any of the input documents and can synthesize information across sources.

1. Introduction
Redundancy in large text collections cuts both ways for natural language systems:
problems: difficulties for end users of search engines and news providers
opportunities: can be exploited to identify important and accurate information for applications such as summarization and question answering

It would be highly desirable to have a mechanism that could identify common information among multiple related documents and fuse it into a coherent text. This article presents a method for sentence fusion that exploits redundancy to achieve this task in the context of multidocument summarization.


2. Framework for Sentence Fusion: MultiGen
3. Sentence Fusion
4. Sentence Fusion Evaluation
5. Related Work
6. Conclusions and Future Work

Tuesday, April 13, 2010

hahn2000challenges The Challenges of Automatic Summarization

+not added to my BibTeX yet

Udo Hahn, Inderjeet Mani: The Challenges of Automatic Summarization. IEEE Computer 33(11): 29-36 (2000)

sekine2003survey A Survey for Multi-Document Summarization

not yet added to my BibTeX

A Survey for Multi-Document Summarization
Satoshi Sekine and Chikashi Nobata
The Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics, Workshop on Text Summarization ; 2003; Edmonton, Canada

Abstract
Automatic Multi-Document summarization is still hard to realize. Under such circumstances, we believe, it is important to observe how humans are doing the same task, and look around for different strategies.

We prepared 100 document sets similar to the ones used in the DUC multi-document summarization task.  For each document set, several people prepared the following data and we conducted a survey.
A) Free style summarization
B) Sentence Extraction type summarization
C) Axis (type of main topic)
D) Table style summary
In particular, we will describe the last two in detail, as these could lead to a new direction for multi-summarization research.

1 Introduction
- challenge: single-document and multi-document summarization performance is not far above the baseline
- a new approach to summarization is needed: imitating how humans do it

- the authors tried producing summaries by: highlighting important phrases or sentences, then connecting them to obtain the main or common topics, in the form of a list or a table. The result: good summaries
- although the result does not consist of easy-to-read sentences
- questions: in general, how many kinds of "main topic" can be made, and what percentage of summaries suit a table format

- this paper is about manual summaries of 100 DUC-like documents

2 Document Sets
3 Task and annotator
4 Free style summarization

5 Sentence Extraction
6 Axis
7 Table
8 Discussion
9 Future Work

afantenos2005summarization Summarization from medical documents: a survey

Summary

Objective: The aim of this paper is to survey the recent work in medical documents summarization.

Background: During the last decade, documents summarization got increasing attention by the AI research community. More recently it also attracted the interest of the medical research community as well, due to the enormous growth of information that is available to the physicians and researchers in medicine, through the large and growing number of published journals, conference proceedings, medical sites and portals on the World Wide Web, electronic medical records, etc.

Methodology: This survey gives first a general background on documents summarization, presenting the factors that summarization depends upon,  discussing evaluation issues and describing briefly the various types of summarization techniques. It then examines the characteristics of the medical domain through the different types of medical documents. Finally, it presents and discusses the summarization techniques used so far in the medical domain, referring to the corresponding systems and their characteristics.

Discussion and conclusions: The paper discusses thoroughly the promising paths for future research in medical documents summarization. It mainly focuses on the issue of scaling to large collections of documents in various languages and from different media, on personalization issues, on portability to new sub-domains, and on the integration of summarization technology in practical applications.

KEYWORDS
Summarization from medical documents;
Single-document summarization;
Multi-document summarization;
Multi-media summarization;
Extractive summarization;
Abstractive summarization;
Cognitive summarization

1. Introduction

2. Summarization roadmap

3. The medical domain
4. Summarization techniques in the medical domain
5. Promising paths for future research
6. Conclusions

Monday, April 12, 2010

survey paper review: jones2007automatic & das2007survey

At the beginning of this study I plan to read survey articles in depth and to skim some important papers they refer to. I did not find many existing survey papers. For now I have two: (Jones 2007, shortened as J) and (Das 2007, shortened as D); the latter is unpublished but quite good.


In short, (J) provides a new understanding of the general framework of summarization, while (D) explains summarization methods quite clearly and in detail.

What is new in these papers, and what are the advantages of these new contributions?

One characteristic of a good review article is that it provides a framework, a systematic categorization of the topics discussed. (J) presents three summarization frameworks: for the system itself, for system evaluation, and for the factors affecting summarization. (D), although it does not deliver a new framework, has an interesting subcategory related to text summarization research (i.e., "other approaches to summarization").

A framework helps map out the research topic, although the 'factors affecting summarization' framework is, in my opinion, too detailed (too many parts), for example in "input: other factors".

Were the papers easy to follow?
(D) is easier to understand, while (J) is more difficult because of its more abstract discussion.

Describing the role of machine learning techniques comprehensively and in detail?
(J) discusses this briefly, while (D) is longer, more detailed, and clearer.


Comparing methods and performance among summarization systems?
To compare and observe relationships among the methods used, it would be easier if a categorization were made in advance, perhaps as a tree. The same holds for the features used, or more generally the text representation. (J) covers this, but too briefly, and it needs a more systematic and clearer division.

Explaining early works?
Neither paper, particularly (J), presents a clear picture of the early systems. Early systems that influenced subsequent research are important enough to be explained. However, in accordance with its title, (J) concentrates on "state of the art" systems of the last decade (~1997-2007). (D) describes clearly and at length the early multi-document summarization systems, which are not that old (dating from 1995). Multi-document summarization is an application and research area of text summarization that is important and dominant at the moment.

Evaluation systems
(J) presents a comprehensive framework, while (D) explains two automatic evaluation systems in detail.


Single-document vs. multi-document summarization. What are the challenges for multi-document summarization?
(J) has no clear discussion of the single- vs. multi-document summarization issue. For news corpora, the opening sentences are adequate as a summary, so automatic single-document summarization is considered less useful. The majority of current research is on multi-document summarization. Unfortunately, neither paper describes the challenges of multi-document summarization and its components, such as information ordering.


Application / implementation of summarization systems.
Neither article provides a sufficient explanation of summarization implementations and their benefits (with examples from real cases), especially for current conditions. Information about these applications is important to give readers a comprehensive picture.

Natural language (NL) knowledge and NL resources used by summarization systems.

Both papers describe systems that use an NL approach (deeper NL knowledge), but they do not give a clear and structured account of which kinds of NL knowledge and which NL resources are used to support performance.

Multi-language summarization.
There is still not much research on multi-language summarization. (D) discusses it briefly, while (J) does not. For non-English speakers, such as Indonesians, the need for multi-language summarization is higher, because so many news sources are in English. One question for English texts is how to get a good Indonesian summary from them: summarize first (in English) and then translate, or translate first and then summarize? Another issue is what form such a translated summary should take, considering that machine translation results are often unsatisfactory.


Explaining factors that influence summarization?
(J) explains this clearly, while (D) does not.

Criticizing current state-of-the-art systems?
Both papers clearly explain several current summarization systems, but (D) does so in more detail. It would be better if the weaknesses of the systems were described more deeply.

Describing the main evaluation programmes (DUC, TAC)?
(J) gives a clear overall picture of these, while (D) does not. As these programmes have had a deep effect on the development of summarization research, an explanation like (J)'s is essential to help readers understand them.


Describing current trends and predicting future ones?
Neither presents a dedicated section on research trends and future predictions. Current trends are, however, discussed throughout both articles, with explanations and examples. Unfortunately, there is no discussion predicting future research.


Explaining the main challenges of summarization research (summarization systems, evaluation systems)?
Explained in quite a detailed and clear way, especially in (J) and especially for evaluation systems.


Features used
Both papers convey the features used, but it would be better if there were a discussion comparing the roles of the features that can be or have been used (in relation to performance improvements).


Performance comparison
It would be better if there were a table comparing performance along with related data, such as the evaluation metric and the corpus used.


Presenting examples?
(D) contains more detailed and clearer examples.


Non-extractive summarization
Extraction is the most widely used approach to automatic text summarization, though humans favor it less because extracts are harder to read (usually still tolerably so). Abstractive summarization is a complex process and so far has not produced satisfying results. (J) discusses non-extractive approaches only briefly.


Dividing / breaking down / modularizing the summarization system?
If the summarization system can be divided into separate parts (some perhaps independent of each other), this helps both in understanding and in designing a summarization system. Consider the division based on the summarization stages in (J): "interpretation" from the source text into a source representation, "transformation" from the source representation into a summary representation, and "generation" from the summary representation into the final summary.
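The interpretation/transformation/generation staging from (J) can be sketched as a small pipeline. The stage bodies below are deliberately trivial placeholders (naive sentence splitting, a longest-sentence importance proxy), not anything proposed by either survey; only the three-stage decomposition itself reflects the framework.

```python
def interpret(text):
    """Interpretation: source text -> source representation (here, a sentence list)."""
    return [s.strip() for s in text.split(".") if s.strip()]

def transform(source_rep, k=1):
    """Transformation: source representation -> summary representation.
    A crude importance proxy: keep the k longest sentences."""
    return sorted(source_rep, key=len, reverse=True)[:k]

def generate(summary_rep):
    """Generation: summary representation -> final summary text."""
    return ". ".join(summary_rep) + "."

def summarize(text, k=1):
    return generate(transform(interpret(text), k))

print(summarize("Short one. This is the much longer important sentence. Tiny."))
# → This is the much longer important sentence.
```

Swapping any one stage (e.g., a learned sentence ranker inside transform) leaves the other two untouched, which is the point of the modularization discussed above.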

mani2001recent Recent Developments in Text Summarization

mckeown1999towards Towards multidocument summarization by reformulation: Progress and prospects

McKeown, K., Klavans, J., Hatzivassiloglou, V., Barzilay, R., and Eskin, E. (1999). Towards multidocument summarization by reformulation: Progress and prospects. In AAAI/IAAI, pages 453-460.

Abstract
By synthesizing information common to retrieved documents, multi-document summarization can help users of information retrieval systems to find relevant documents with a minimal amount of reading. We are developing a multidocument summarization system to automatically generate a concise summary by identifying and synthesizing similarities across a set of related documents. Our approach is unique in
its integration of machine learning and statistical techniques to identify similar paragraphs, intersection of similar phrases within paragraphs, and language generation to reformulate the wording of the summary. Our evaluation of system components shows that learning over multiple extracted linguistic
features is more effective than information retrieval approaches at identifying similar text units for summarization and that it is possible to generate a fluent summary that conveys similarities among documents even when full semantic interpretations of the input text are not available.

barzilay1999information Information fusion in the context of multi-document summarization

Barzilay, R., McKeown, K., and Elhadad, M. (1999). Information fusion in the context of multi-document summarization.

Abstract
We present a method to automatically generate a concise summary by identifying and synthesizing
similar elements across related text from a set of multiple documents. Our approach is unique in its usage of language generation to reformulate the wording of the summary.

mckeown1995generating & radev1998generating Generating summaries of multiple news articles

McKeown, K. R. and Radev, D. R. (1995). Generating summaries of multiple news articles. In Proceedings of SIGIR '95, pages 74-82, Seattle, Washington.

Radev, D. R. and McKeown, K. (1998). Generating natural language summaries from multiple on-line sources. Computational Linguistics, 24(3):469-500.

We present a methodology for summarization of news about current events in the form of briefings  that include appropriate background (historical) information. The system that we developed, SUMMONS, uses the output of systems developed for the DARPA Message Understanding Conferences (MUC) to generate summaries of multiple documents on the same or related events, presenting similarities and differences, contradictions, and generalizations among sources of information. We describe the various components of the system, showing how information from multiple articles is combined, organized into a paragraph, and finally, realized as English sentences. A feature of our work is the extraction of descriptions of entities such as people and places for reuse to enhance a briefing.


= = = = = = = = = =
[das2007survey]
As far as we know, SUMMONS (McKeown and Radev, 1995; Radev and McKeown, 1998) is the first historical example of a multi-document summarization system. It tackles single events about a narrow domain (news articles about terrorism) and produces a briefing merging relevant information about each event and how reports by different news agencies have evolved over time.

The whole thread of reports is then presented, as illustrated in the following example of a "good" summary:

"In the afternoon of February 26, 1993, Reuters reported that a suspect bomb killed at least five people in the World Trade Center. However, Associated Press announced that exactly five people were killed in the blast. Finally, Associated Press announced that Arab terrorists were possibly responsible for the terrorist act."

Rather than working with raw text, SUMMONS reads a database previously built by a template-based message understanding system. A full multi-document summarizer is built by concatenating the two systems, first processing full text as input and filling template slots, and then synthesizing a summary from the extracted information. The architecture of SUMMONS consists of two major components: a content planner that selects the information to include in the summary through combination of the input templates, and a linguistic generator that selects the right words to express the information in grammatical and coherent text.

The linguistic generator was devised by adapting existing language generation tools, namely the FUF/SURGE system.

Content planning is made through summary operators, a set of heuristic rules that perform operations like "change of perspective", "contradiction", "refinement", etc. Some of these operations require resolving conflicts, i.e., contradictory information among different sources or time instants; others complete pieces of information that are included in some articles and not in others, combining them into a single template. At the end, the linguistic generator gathers all the combined information and uses connective phrases to synthesize a summary.
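The operator-based content planning described above can be sketched as rules that compare slot values across successive templates. This is an illustrative toy, not the SUMMONS implementation; the template slots, operator names, and report data are assumptions modeled on the "good" summary quoted earlier.

```python
def plan_content(templates):
    """Compare templates pairwise, in report order, and emit plan items."""
    plan = [("report", templates[0])]  # open with the first report's content
    for prev, curr in zip(templates, templates[1:]):
        for slot in ("victims", "perpetrator"):
            old, new = prev.get(slot), curr.get(slot)
            if old is None and new is not None:
                plan.append(("refinement", slot, new, curr["source"]))
            elif old is not None and new is not None and old != new:
                plan.append(("contradiction", slot, old, new, curr["source"]))
    return plan

reports = [
    {"source": "Reuters", "victims": "at least five", "perpetrator": None},
    {"source": "AP", "victims": "exactly five", "perpetrator": None},
    {"source": "AP", "victims": "exactly five", "perpetrator": "Arab terrorists"},
]
for item in plan_content(reports):
    print(item)
```

A linguistic generator would then realize each plan item with connective phrases ("However, ...", "Finally, ..."), as in the example summary.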

While this framework seems promising when the domain is narrow enough so that the templates can be designed by hand, a generalization for broader domains would be problematic. This was improved later by McKeown et al. (1999) and Barzilay et al. (1999), where the input is now a set of related documents in raw text, like those retrieved by a standard search engine in response to a query.

The system starts by identifying themes, i.e., sets of similar text units (usually paragraphs). This is formulated as a clustering problem. To compute a similarity measure between text units, these are mapped to vectors of features, that include single words weighted by their TF-IDF scores, noun phrases, proper nouns, synsets from the Wordnet database and a database of semantic classes of verbs. For each pair of paragraphs, a vector is computed that represents matches on the different features. Decision rules
that were learned from data are then used to classify each pair of text units either as similar or dissimilar; this in turn feeds a subsequent algorithm that places the most related paragraphs in the same theme.
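The theme-identification step just described can be approximated in a few lines: paragraphs become TF-IDF word vectors, pairwise cosine similarity substitutes for the learned decision rules, and similar pairs are grouped into themes. The richer features (noun phrases, WordNet synsets, verb classes) are omitted, and the threshold value is arbitrary.

```python
import math
from collections import Counter

def tfidf_vectors(paragraphs):
    """Map each paragraph to a dict of word -> TF-IDF weight."""
    docs = [p.lower().split() for p in paragraphs]
    df = Counter(w for d in docs for w in set(d))
    n = len(docs)
    return [{w: c * math.log(n / df[w]) for w, c in Counter(d).items()}
            for d in docs]

def cosine(u, v):
    dot = sum(u[w] * v[w] for w in u if w in v)
    norm = math.sqrt(sum(x * x for x in u.values())) * \
           math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0

def themes(paragraphs, threshold=0.1):
    """Group paragraphs whose pairwise similarity exceeds the threshold."""
    vecs = tfidf_vectors(paragraphs)
    parent = list(range(len(paragraphs)))  # union-find over paragraph indices
    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i
    for i in range(len(vecs)):
        for j in range(i + 1, len(vecs)):
            if cosine(vecs[i], vecs[j]) >= threshold:
                parent[find(i)] = find(j)
    clusters = {}
    for i in range(len(paragraphs)):
        clusters.setdefault(find(i), []).append(i)
    return list(clusters.values())

paras = ["the bomb killed five people",
         "a bomb killed at least five",
         "the stock market rose today"]
print(themes(paras))  # → [[0, 1], [2]]
```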

Once themes are identified, the system enters its second stage: information fusion. The goal is to decide which sentences of a theme should be included in the summary. Rather than just picking a sentence that is a group representative, the authors propose an algorithm which compares and intersects predicate argument structures of the phrases within each theme to determine which are repeated often enough to be included in the summary. This is done as follows: first, sentences are parsed through Collins' statistical parser (Collins, 1999) and converted into dependency trees, which allows capturing the predicate-argument structure and identify functional roles. Determiners and auxiliaries are dropped; Fig. 3 shows a sentence representation.

The comparison algorithm then traverses these dependency trees recursively, adding identical nodes to the output tree. Once full phrases (a verb with at least two constituents) are found, they are marked to be included in the summary. If two phrases, rooted at some node, are not identical but yet similar, the hypothesis that they are paraphrases of each other is considered; to take this into account, corpus driven
paraphrasing rules are written to allow paraphrase intersection. Once the summary content (represented as predicate-argument structures) is decided, a grammatical text is generated by translating those structures into the arguments expected by the FUF/SURGE language generation system
= = = = = = = = = =
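The predicate-argument intersection described in the excerpt above can be illustrated with a toy recursive routine over dependency trees encoded as (head, children) tuples. Paraphrase rules, functional roles, and the "full phrase" check are omitted; only exact head-word matches are kept.

```python
def intersect(t1, t2):
    """Return the tree structure shared by t1 and t2, or None if heads differ."""
    head1, kids1 = t1
    head2, kids2 = t2
    if head1 != head2:
        return None
    shared = []
    for k1 in kids1:
        for k2 in kids2:
            sub = intersect(k1, k2)
            if sub is not None:  # keep the common subtree, first match wins
                shared.append(sub)
                break
    return (head1, shared)

a = ("killed", [("bomb", [("a", [])]), ("people", [("five", [])])])
b = ("killed", [("bomb", [("suspect", [])]), ("five", [])])
print(intersect(a, b))  # → ('killed', [('bomb', [])])
```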

1. Introduction
Some characteristics that distinguish a briefing from the general concept of a summary are:
- ..keep a person up to date on a certain event. ..
- ..focus on certain types of information .. more user-centered than general summaries


novel techniques:
• It briefs the user on information of interest using tools related to information extraction, conceptual combination, and text generation.
• It combines information from multiple news articles into a coherent summary using symbolic techniques.
• It augments the resulting summaries using descriptions of entities obtained from on-line sources.

In order to extract information of interest to the user, SUMMONS makes use of components from several MUC systems

The right side of the figure shows how proper nouns and their descriptions are
extracted from past news. An entity extractor identifies proper nouns in the past
newswire archives, along with descriptions. Descriptions are then categorized using
the WordNet hierarchy. Finally, an FD or functional description (Elhadad 1993) for
the description is generated so that it can be reused in fluent ways in the final summary.
FDs mix functional, semantic, syntactic, and lexical information in a recursive
attribute-value format that serves as the basic data structure for all information within
FUF / SURGE.

2. Related Work
3. System Overview


The full content is then passed through a sentence
generator, implemented using the FUF/SURGE language generation system (Elhadad
1993; Robin 1994). FUF is a functional unification formalism that uses a large systemic
grammar of English, called SURGE, to fill in syntactic constraints, build a syntactic
tree, choose closed class words, and eventually linearize the tree as a sentence.

4. Generating the Summary


eight different planning operators,

4.1 Overview of the Summarization Component
two main components:
-content planner: selects information from an underlying knowledge base to include in a text
   produces a conceptual representation of text meaning (e.g., a frame, a logical form, or an internal representation of text) and typically does not include any linguistic information.
-linguistic component: selects words to refer to concepts contained in the selected information and arranges those words, appropriately inflecting them, to form an English sentence.
uses a lexicon and a grammar of English

4.2 Methodology: Collecting and Using a Summary Corpus
used available online corpora... then extracted manually, and after careful investigation, several hundred language constructions that we found relevant to the types of summaries we want to produce.
cue phrases collected from the corpus,

4.3 Summary Operators for Content Planning
developed a set of heuristics derived from the corpora that decide what
types of simple sentences constitute a summary, in what order they need to be listed, as
well as the ways in which simple sentences are combined into more complex ones.


4.4 Algorithm
4.4.1 Input.
4.4.2 Preprocessing.
4.4.3 Heuristic Combination.
4.4.4 Discourse Planning.
4.4.5 Ordering of Templates and Linguistic Generation.

4.5 An Example of System Operation

Article 1:

Article 2:
Article 3:
Article 4:

Figure 7 Template for article one.
Figure 8 Template for article two.
Figure 9 Template for article three.
Figure 10 Template for article four.

templates are generated manually from the input newswire texts.

Figure 11.
The first two sentences are generated from template one. The subsequent sentences
are generated using different operators that are triggered according to changing values
for certain attributes in the three remaining templates.



5. Generating Descriptions
profile manager

5.1 Creation of a Database of Profiles
5.1.1 Extraction of Entity Names from Old Newswire.
5.1.2 Extraction of Descriptions.

5.1.3 Categorization of Descriptions.
WordNet to group extracted descriptions into categories. e.g. "profession, "nationality," "organization." Each of these concepts is triggered by one or more words (which we call trigger terms) in the description.
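The trigger-term idea can be sketched directly: each category owns a set of words that trigger it, and a description is assigned every category whose trigger set it hits. The table below is illustrative, not the system's actual trigger list.

```python
TRIGGERS = {  # illustrative stand-in for WordNet-derived trigger terms
    "profession": {"minister", "senator", "spokesman"},
    "nationality": {"egyptian", "american", "french"},
    "organization": {"group", "agency", "party"},
}

def categorize(description):
    """Return all categories whose trigger terms appear in the description."""
    words = set(description.lower().split())
    return sorted(cat for cat, terms in TRIGGERS.items() if words & terms)

print(categorize("Egyptian foreign minister"))  # → ['nationality', 'profession']
```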


5.1.4 Organization of Descriptions in a Database of Profiles.

5.2 Generation of Descriptions
  improved summary by merging information extracted from the input
articles with information from the other sources (Radev and McKeown 1997).

5.2.1 Transformation of Descriptions into Functional Descriptions.
5.2.2 Regenerating Descriptions.

6. System Status
6.1 Summary Generation

6.2 The Description Generator
6.3 Portability
6.4 Suggested Evaluation


7. Future Work

8. Conclusion

Our prototype system demonstrates the feasibility of generating briefings of a series of domain-specific news articles on the same event, highlighting changes over time as well as similarities and differences among sources and including some historical information about the participants. The ability to automatically provide summaries of heterogeneous material will critically help in the effective use of the Internet in order to avoid overload with information. We show how planning operators can be used to
synthesize summary content from a set of templates, each representing a single article. These planning operators are empirically based, coming from analysis of existing summaries, and allow for the generation of concise briefings. Our framework allows for experimentation with summaries of different lengths and for the combination of multiple, independent summary operators to produce more complex summaries with added descriptions.

Sunday, April 11, 2010

marcu1997rhetorical The Rhetorical Parsing, Summarization, and Generation of Natural Language Texts

Marcu, D. C. (1998b). The rhetorical parsing, summarization, and generation of natural language texts. PhD thesis, University of Toronto. Adviser-Graeme Hirst.

= = = = = = = = = =

describes the details of a rhetorical parser producing a discourse tree.
rhetorical structure theory (RST) tree

marcu1998improving Improving summarization through rhetorical parsing tuning

Marcu, D. (1998a). Improving summarization through rhetorical parsing tuning. In Proceedings of The Sixth Workshop on Very Large Corpora, pages 206-215, Montreal, Canada.


= = = = = = = = = =
[das2007survey]


Deep Natural Language Analysis Methods
.....
Marcu (1998a) describes a unique approach towards summarization that, unlike most other previous work, does not assume that the sentences in a document form a flat sequence. This paper used discourse based heuristics with the traditional features that have been used in the summarization literature. The discourse theory used in this paper is the Rhetorical Structure Theory (RST) that holds between
two non-overlapping pieces of text spans: the nucleus and the satellite. The author mentions that the distinction between nuclei and satellites comes from the empirical observation that the nucleus expresses what is more essential to the writer's purpose than the satellite; and that the nucleus of a rhetorical relation is comprehensible independent of the satellite, but not vice versa.

Marcu (1998b) describes the details of a rhetorical parser producing a discourse tree. Figure 2 shows an
example discourse tree for a text example detailed in the paper. ... etc.

ono1994abstract Abstract Generation Based On Rhetorical Structure Extraction

Ono, K., Sumita, K., and Miike, S. (1994). Abstract generation based on rhetorical structure extraction. In Proceedings of Coling '94, pages 344-348, Morristown, NJ, USA.



= = = = = = = = = =
[das2007survey]

Deep Natural Language Analysis Methods
.....

Ono et al. (1994) put forward a computational model of discourse for Japanese expository writings, where they elaborate a practical procedure for extracting the discourse rhetorical structure, a binary tree representing relations between chunks of sentences (rhetorical structure trees are used more intensively in (Marcu, 1998a), as we will see below). This structure was extracted using a series of NLP steps: sentence analysis, rhetorical relation extraction, segmentation, candidate generation and preference judgement. Evaluation was based on the relative importance of rhetorical relations. In the following step, the nodes of the rhetorical structure tree were pruned to reduce the sentence, keeping its important parts. The same was done for paragraphs to finally produce the summary. Evaluation was done with respect to sentence coverage, and 30 editorial articles of a Japanese newspaper were used as the dataset. The articles had corresponding sets of key sentences and most important key sentences judged by human subjects. The key sentence coverage was about 51% and the most important key sentence coverage was 74%, indicating encouraging results.
= = = = = = = = = =
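The pruning idea in the excerpt above can be shown with a toy binary RST tree: nuclei are always kept, satellites only near the root. This is a caricature of Ono et al.'s procedure (which scored relations and judged candidates); the node layout and depth budget are made up for illustration.

```python
def rst_summary(node, budget):
    """node is a leaf string or ('relation', nucleus, satellite)."""
    if isinstance(node, str):
        return [node]
    _, nucleus, satellite = node
    kept = rst_summary(nucleus, budget - 1)
    if budget > 0:  # keep satellite material only near the root
        kept += rst_summary(satellite, budget - 1)
    return kept

tree = ("evidence",
        ("elaboration", "Smoking is harmful.", "It damages the lungs."),
        "Many studies confirm this.")
print(" ".join(rst_summary(tree, 0)))  # → Smoking is harmful.
print(" ".join(rst_summary(tree, 1)))  # → Smoking is harmful. Many studies confirm this.
```

Raising the budget admits more satellite material, which is the knob that trades summary length against informativeness here.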

barzilay1997using Using lexical chains for text summarization

Barzilay, R. and Elhadad, M. (1997). Using lexical chains for text summarization. In Proceedings ISTS'97.


= = = = = = = = = =
[das2007survey]
Barzilay and Elhadad (1997) describe a work that used a considerable amount of linguistic analysis for performing the task of summarization. For a better understanding of their method, we need to define a lexical chain: it is a sequence of related words in a text, spanning short (adjacent words or sentences) or long distances (entire text).

The authors' method progressed with the following steps: segmentation of the text, identification of lexical chains, and using strong lexical chains to identify the sentences worthy of extraction. They tried to reach a middle ground between (McKeown and Radev, 1995) and (Luhn, 1958) where the former relied on deep semantic structure of the text, while the latter relied on word statistics of the documents.

The authors describe the notion of cohesion in text as a means of sticking together different parts of the text.

Lexical cohesion is a notable example where semantically related words are used. For example, let us take a look at the following sentence.

                                            John bought a Jag. He loves the car

Here, the word car refers to the word Jag in the previous sentence, and exemplifies lexical cohesion. The phenomenon of cohesion occurs not only at the word level, but at word sequences too, resulting in lexical chains, which the authors used as a source representation for summarization. Semantically related words and word sequences were identified in the document, and several chains were extracted, that form a representation of the document. To find out lexical chains, the authors used Wordnet (Miller, 1995), applying three generic steps:

1. Selecting a set of candidate words.
2. For each candidate word, finding an appropriate chain relying on a relatedness criterion among members of the chains.
3. If such a chain is found, inserting the word in the chain and updating it accordingly.


The relatedness was measured in terms of WordNet distance. Simple nouns and noun compounds were used as a starting point to find the set of candidates. In the final steps, strong lexical chains were used to create the summaries. The chains were scored by their length and homogeneity. The authors then used a few heuristics to select the significant sentences.
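The three generic steps above can be sketched as follows. This is a minimal illustration: a hand-made synonym table stands in for the WordNet-distance relatedness criterion, and the word groups here are hypothetical.

```python
# Toy stand-in for WordNet relatedness: two words are "related" if they
# share a group in this hand-made table (hypothetical data for illustration).
GROUPS = [
    {"car", "jag", "vehicle", "wheel"},
    {"john", "person", "he"},
]

def related(w1, w2):
    """Relatedness criterion: a stand-in for a WordNet distance threshold."""
    return any(w1 in g and w2 in g for g in GROUPS)

def build_chains(candidate_words):
    """The three generic steps: for each candidate word, find a chain whose
    members satisfy the relatedness criterion; insert the word there, or
    start a new chain if none qualifies."""
    chains = []
    for word in candidate_words:
        for chain in chains:
            if all(related(word, member) for member in chain):
                chain.append(word)   # step 3: insert and update the chain
                break
        else:
            chains.append([word])    # no suitable chain: start a new one
    return chains

print(build_chains(["john", "car", "jag", "he", "wheel"]))
# [['john', 'he'], ['car', 'jag', 'wheel']]
```

In the actual system, relatedness came from WordNet relations and distances rather than a fixed table, and chains were then scored by length and homogeneity.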
= = = = = = = = = = 

svore2007enhancing Enhancing single-document summarization by combining RankNet and third-party sources

Svore, K., Vanderwende, L., and Burges, C. (2007). Enhancing single-document summarization by combining RankNet and third-party sources. In Proceedings of the EMNLP-CoNLL, pages 448-457.

Abstract
We present a new approach to automatic summarization based on neural nets, called NetSum. We extract a set of features from each sentence that helps identify its importance in the document. We apply novel features based on news search query logs and Wikipedia entities. Using the RankNet learning algorithm, we train a pair-based sentence ranker to score every sentence in the document and identify the most important sentences. We apply our system to documents gathered from CNN.com, where
each document includes highlights and an article. Our system significantly outperforms the standard baseline in the ROUGE-1 measure on over 70% of our document set.
= = = = = = = = = =
[das2007survey]
In 2001-02, DUC issued a task of creating a 100-word summary of a single news article. However, the best performing systems in the evaluations could not outperform the baseline with statistical significance. This extremely strong baseline has been analyzed by Nenkova (2005) and corresponds to the selection of the first n sentences of a newswire article. This surprising result has been attributed to the journalistic convention of putting the most important part of an article in the initial paragraphs. After 2002, the task of single-document summarization for newswire was dropped from DUC.

Svore et al. (2007) propose an algorithm based on neural nets and the use of third-party datasets to tackle the problem of extractive summarization, outperforming the baseline with statistical significance.





The authors used a dataset containing 1365 documents gathered from CNN.com, each consisting of the title, timestamp, three or four human-generated story highlights, and the article text. They considered the task of creating three machine highlights. The human-generated highlights were not verbatim extractions from the article itself. The authors evaluated their system using two metrics: the first concatenated the three highlights produced by the system and the three human-generated highlights, and compared these two blocks; the second considered the ordering and compared the sentences on an individual level.

Svore et al. (2007) trained a model from the labels and the features for each sentence of an article, which could infer the proper ranking of sentences in a test document. The ranking was accomplished using RankNet (Burges et al., 2005), a pair-based neural network algorithm designed to rank a set of inputs, which uses gradient descent for training. For the training set, they used ROUGE-1 (Lin, 2004) to score the similarity of a human-written highlight and a sentence in the document. These similarity scores were used as soft labels during training, contrasting with other approaches where sentences are "hard-labeled" as selected or not.
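The soft labeling can be illustrated with a small unigram-recall computation in the spirit of ROUGE-1. This is a sketch; the example strings are invented.

```python
from collections import Counter

def rouge1(reference, candidate):
    """Unigram-overlap recall of the reference, as in ROUGE-1 (Lin, 2004)."""
    ref = Counter(reference.lower().split())
    cand = Counter(candidate.lower().split())
    overlap = sum(min(ref[w], cand[w]) for w in ref)
    return overlap / sum(ref.values())

# Soft label: how well a document sentence matches a human highlight.
highlight = "the senate passed the budget bill"
sentence = "the senate passed a revised budget bill on friday"
print(round(rouge1(highlight, sentence), 2))  # 0.83 (5 of 6 reference unigrams covered)
```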

Some of the features used, based on position or n-gram frequencies, had been explored in previous work. However, the novelty of the framework lay in the use of features that derived information from query logs of Microsoft's news search engine and from Wikipedia entries. The authors conjectured that if a document sentence contained keywords used in the news search engine, or entities found in Wikipedia articles, then there was a greater chance of that sentence appearing in the highlight. The extracts were evaluated using ROUGE-1 and ROUGE-2, and showed statistically significant improvements over the baseline of selecting the first three sentences in a document.

= = = = = = = = = =

nenkova2005automatic Automatic text summarization of newswire: Lessons learned from the document understanding conference

Nenkova, A. (2005). Automatic text summarization of newswire: Lessons learned from the document understanding conference. In Proceedings of AAAI 2005, Pittsburgh, USA.

Abstract
Since 2001, the Document Understanding Conferences have been the forum for researchers in automatic text summarization to compare methods and results on common test sets. Over the years, several types of summarization tasks have been addressed—single document summarization, multi-document summarization, summarization focused by question, and headline generation. This paper is an overview
of the achieved results in the different types of summarization tasks. We compare both the broader classes of baselines, systems and humans, as well as individual pairs of summarizers (both human and automatic). An analysis of variance model is fitted, with summarizer and input set as independent
variables, and the coverage score as the dependent variable, and simulation-based multiple comparisons were performed. The results document the progress in the field as a whole, rather than focusing on a single system, and thus can serve as a future reference on the work done to date, as well as a starting point in the formulation of future tasks. Results also indicate that most progress in the field has been achieved in generic multi-document summarization and that the most challenging task is that of producing a focused summary in answer to a question/topic.

= = = = = = = = = =
[das2007survey]
In 2001-02, DUC issued a task of creating a 100-word summary of a single news article. However, the best performing systems in the evaluations could not outperform the baseline with statistical significance. This extremely strong baseline has been analyzed by Nenkova (2005) and corresponds to the selection of the first n sentences of a newswire article. This surprising result has been attributed to the journalistic convention of putting the most important part of an article in the initial paragraphs. After 2002, the task of single-document summarization for newswire was dropped from DUC.

Svore et al. (2007) propose an algorithm based on neural nets and the use of third-party datasets to tackle the problem of extractive summarization, outperforming the baseline with statistical significance.

= = = = = = = = = =

osborne2002using Using maximum entropy for sentence extraction

Osborne, M. (2002). Using maximum entropy for sentence extraction. In Proceedings of the ACL'02 Workshop on Automatic Summarization, pages 1-8, Morristown, NJ, USA.

= = = = = = = = = =
[das2007survey]"Osborne (2002) claims that existing approaches to summarization have always assumed feature independence. The author used log-linear models to obviate this assumption and showed empirically that the system produced better extracts than a naive-Bayes model, with a prior appended to both models. Let c be a label, s the item we are interested in labeling, f_i the i-th feature, and λ_i the corresponding feature weight. The conditional log-linear model used by Osborne (2002) can be stated as follows: ... etc."
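The elided model can be written out in its standard form (this is a reconstruction of the usual conditional log-linear model from the variables defined above, not a verbatim quote from the paper):

```latex
p(c \mid s) = \frac{1}{Z(s)} \exp\Big(\sum_i \lambda_i f_i(c, s)\Big),
\qquad
Z(s) = \sum_{c'} \exp\Big(\sum_i \lambda_i f_i(c', s)\Big)
```

where Z(s) is the partition function normalizing over the possible labels.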
= = = = = = = = = =

conroy2001tex Text summarization via hidden markov models

Conroy, J. M. and O'leary, D. P. (2001). Text summarization via hidden markov models. In Proceedings of SIGIR '01, pages 406-407, New York, NY, USA.

There is also a longer tech report version by these authors, which goes beyond HMMs alone: "Text Summarization via Hidden Markov Models and Pivoted QR Matrix Decomposition".

= = = = = = = = = =
[das2007survey]

"In contrast with previous approaches, which were mostly feature-based and nonsequential, Conroy and O'leary (2001) modeled the problem of extracting a sentence from a document using a hidden Markov model (HMM). The basic motivation for using a sequential model is to account for local dependencies between sentences. Only three features were used: position of the sentence in the document (built into the state structure of the HMM), number of terms in the sentence, and likelihood of the sentence terms given the document terms. ... etc."

lin1999training Training a selection function for extraction

Lin, C.-Y. (1999). Training a selection function for extraction. In Proceedings of CIKM '99, pages 55-62, New York, NY, USA.

ABSTRACT
In this paper we compare performance of several heuristics in generating informative generic/query-oriented extracts for newspaper articles in order to learn how topic prominence affects the performance of each heuristic. We study how different query types can affect the performance of each heuristic and discuss the possibility of using machine learning algorithms to automatically learn good combination functions to combine several heuristics. We also briefly describe the design, implementation, and
performance of a multilingual text summarization system SUMMARIST.

= = = = = = = = = =
[das2007survey]
In later work, Lin (1999) broke away from the assumption that features are independent of each other and tried to model the problem of sentence extraction using decision trees, instead of a naive-Bayes classifier. He examined a wide range of features and their effect on sentence extraction.

The data used in this work was a publicly available collection of texts, classified into various topics, provided by the TIPSTER-SUMMAC evaluations, targeted towards information retrieval systems. The dataset contains essential text fragments (phrases, clauses, and sentences) which must be included in summaries to answer some TREC topics. These fragments were each evaluated by a human judge. The experiments described in the paper are with the SUMMARIST system developed at the University of Southern California. The system extracted sentences from the documents, and those were matched against human extracts, like most early work on extractive summarization.

Some novel features were the query signature (a normalized score given to sentences depending on the number of query words that they contain), the IR signature (the m most salient words in the corpus, similar to the signature words of (Aone et al., 1999)), numerical data (boolean value 1 given to sentences that contain a number), proper name (boolean value 1 given to sentences that contain a proper name), pronoun or adjective (boolean value 1 given to sentences that contain a pronoun or adjective), weekday or month (similar to the previous feature), and quotation (similar to the previous feature). It is worth noting that some features, like the query signature, are question-oriented because of the setting of the evaluation, unlike a generalized summarization framework.

The author experimented with various baselines, like using only the positional feature, or using a simple combination of all features by adding their values. When evaluated by matching machine-extracted and human-extracted sentences, the decision tree classifier was clearly the winner for the whole dataset, but for three topics, a naive combination of features beat it. Lin conjectured that this happened because some of the features were independent of each other. Feature analysis suggested that the IR signature was a valuable feature, corroborating the early findings of Luhn (1958).

lin1997identifying Identifying topics by position

Lin, C.-Y. and Hovy, E. (1997). Identifying topics by position. In Proceedings of the Fifth conference on Applied natural language processing, pages 283-290, San Francisco, CA, USA.



= = = = = = = = = =
[das2007survey]
Lin and Hovy (1997) studied the importance of a single feature, sentence position. Just weighing a sentence by its position in the text, which the authors term the "position method", arises from the idea that texts generally follow a predictable discourse structure, and that sentences of greater topic centrality tend to occur in certain specifiable locations (e.g. title, abstracts, etc.).

However, since the discourse structure significantly varies over domains, the position method cannot be defined as naively as in (Baxendale, 1958).

The paper makes an important contribution by investigating techniques for tailoring the position method towards optimality over a genre and how it can be evaluated for effectiveness.

A newswire corpus was used, the collection of Ziff-Davis texts produced from the TIPSTER program; it consists of text about computers and related hardware, accompanied by a set of key topic words and a small abstract of six sentences. For each document in the corpus, the authors measured the yield of each sentence position against the topic keywords. They then ranked the sentence positions by their average yield to produce the Optimal Position Policy (OPP) for topic positions for the genre.

Two kinds of evaluation were performed. Previously unseen text was used for testing whether the same procedure would work in a different domain. The first evaluation showed contours exactly like the training documents. In the second evaluation, word overlap of manual abstracts with the extracted sentences was measured. Windows in abstracts were compared with windows on the selected sentences and corresponding precision and recall values were measured. A high degree of coverage indicated the effectiveness of the position method.

In later work, Lin (1999) broke away from the assumption that features are independent of each other and tried to model the problem of sentence extraction using decision trees instead of a naive-Bayes classifier."
= = = = = = = = = =

larsen1999trainable A trainable summarizer with knowledge acquired from robust NLP techniques

Aone, C., Okurowski, M. E., Gorlinsky, J., and Larsen, B. (1999). A trainable summarizer with knowledge acquired from robust nlp techniques. In Mani, I. and Maybury, M. T., editors, Advances in Automatic Text Summarization, pages 71-80. MIT Press.

= = = = = = = = = =
[das2007survey]
Aone et al. (1999) also incorporated a naive-Bayes classifier, but with richer features. They describe a system called DimSum that made use of features like term frequency (tf) and inverse document frequency (idf) to derive signature words.

Statistically derived two-noun word collocations were used as units for counting, along with single words. A named-entity tagger was used and each entity was considered as a single token. They also employed some shallow discourse analysis, like references to the same entities in the text, maintaining cohesion.

Synonyms and morphological variants were also merged while considering lexical terms.

corpora .. newswire .. TREC
= = = = = = = = = =

kupiec1995trainable A trainable document summarizer

Kupiec, J., Pedersen, J., and Chen, F. (1995). A trainable document summarizer. In Proceedings SIGIR '95, pages 68-73, New York, NY, USA.

abstract
-To summarize is to reduce in complexity, and hence in length, while retaining some of the essential qualities of the original.
- This paper focuses on document extracts, a particular kind of computed document summary.
- Document extracts consisting of roughly 20% of the original can be as informative as the full text of a document, which suggests that even shorter extracts may be useful indicative
summaries.
- The trends in our results are in agreement with those of Edmundson who used a subjectively weighted combination of features as opposed to training the feature weights using a corpus.
- We have developed a trainable summarization program

= = = = = = = = = =
[das2007survey]
Kupiec et al. (1995) describe a method derived from Edmundson (1969) that is able to learn from data. The classification function categorizes each sentence as worthy of extraction or not, using a naive-Bayes classifier. ... etc."
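Under the naive-Bayes independence assumption, the classification function has the familiar form (reconstructed here from the standard naive-Bayes formulation, not quoted verbatim from the paper), where s is a sentence, S the set of sentences in the summary, and F_1, ..., F_k the features:

```latex
P(s \in S \mid F_1, \ldots, F_k)
= \frac{P(s \in S)\, \prod_{j=1}^{k} P(F_j \mid s \in S)}{\prod_{j=1}^{k} P(F_j)}
```

Sentences are then ranked by this score and the top ones are selected for the extract.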
= = = = = = = = = =

edmundson1969new New methods in automatic extracting

Edmundson, H. P. (1969). New methods in automatic extracting. Journal of the ACM, 16(2):264-285.


= = = = = = = = = =

[das2007survey]
Edmundson (1969) describes a system that produces document extracts. His primary contribution was the development of a typical structure for an extractive summarization experiment. At first, the author developed a protocol for creating manual extracts, which was applied to a set of 400 technical documents. The two features of word frequency and positional importance were incorporated from the previous two works. Two other features were used: the presence of cue words (words like significant or hardly), and the skeleton of the document (whether the sentence is a title or heading). Weights were attached to each of these features manually to score each sentence. During evaluation, it was found that about 44% of the auto-extracts matched the manual extracts.
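Edmundson's manually weighted scoring reduces to a linear combination over the four feature families (cue, key, title/skeleton, location). A sketch with hypothetical weights and feature values:

```python
# Hypothetical weights: Edmundson tuned his manually; these are for illustration.
WEIGHTS = {"cue": 1.0, "key": 0.5, "title": 1.5, "location": 1.0}

def edmundson_score(features):
    """Linear combination of the four feature scores for one sentence."""
    return sum(WEIGHTS[name] * value for name, value in features.items())

# Feature values for a hypothetical sentence (cue word present, some key
# words, not a title/heading, near the start of the document).
sentence_features = {"cue": 1, "key": 0.4, "title": 0, "location": 0.8}
print(edmundson_score(sentence_features))  # 1.0*1 + 0.5*0.4 + 1.5*0 + 1.0*0.8 = 2.0
```

Sentences would then be ranked by this score and the highest-scoring ones extracted.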
= = = = = = = = = =

luhn1958automatic The automatic creation of literature abstracts

Luhn, H. P. (1958). The automatic creation of literature abstracts. IBM Journal of Research Development, 2(2):159-165.


= = = = = = = = = =
[das2007survey]
"In his work, Luhn proposed that the frequency of a particular word in an article provides a useful measure of its significance. There are several key ideas put forward in this paper that have assumed importance in later work on summarization. As a first step, words were stemmed to their root forms, and stop words were deleted. Luhn then compiled a list of content words sorted by decreasing frequency, the index providing a significance measure of the word.

On a sentence level, a significance factor was derived that reflects the number of occurrences of significant words within a sentence, and the linear distance between them due to the intervention of non-significant words. All sentences are ranked in order of their significance factor, and the top-ranking sentences are finally selected to form the auto-abstract."
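Luhn's significance factor can be sketched as follows: within a sentence, find the densest span bracketed by significant words (with at most four non-significant words between consecutive significant ones) and score it as the squared count of significant words divided by the span length. The example sentence and significant-word set below are invented.

```python
def luhn_factor(words, significant, max_gap=4):
    """Significance factor of one sentence, in the spirit of Luhn (1958)."""
    positions = [i for i, w in enumerate(words) if w in significant]
    best = 0.0
    for a in range(len(positions)):
        for b in range(a, len(positions)):
            # adjacent significant words must be at most max_gap words apart
            if any(positions[k + 1] - positions[k] - 1 > max_gap
                   for k in range(a, b)):
                break
            span_len = positions[b] - positions[a] + 1
            best = max(best, (b - a + 1) ** 2 / span_len)
    return best

words = "the new engine design improves the engine cooling significantly".split()
print(round(luhn_factor(words, {"engine", "design", "cooling"}), 2))  # 2.67
```

Sentences are ranked by this factor and the top-ranking ones form the auto-abstract.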
= = = = = = = = = =

lin2006information An information-theoretic approach to automatic evaluation of summaries

Lin, C.-Y., Cao, G., Gao, J., and Nie, J.-Y. (2006). An information-theoretic approach to automatic evaluation of summaries. In Proceedings of HLT-NAACL '06, pages 463-470, Morristown, NJ, USA.

performance vs ROUGE
       for single document: the same
       for multi document: better

= = = = = = = = = =

[das2007survey]
"The central idea is to use a divergence measure between a pair of probability distributions, in this case the Jensen-Shannon divergence, where the first distribution is derived from an automatic summary
and the second from a set of reference summaries. This approach has the advantage of suiting both the single-document and the multi-document summarization scenarios. ... etc."
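The Jensen-Shannon divergence between the two word distributions can be computed directly. A sketch; the toy distributions below are invented.

```python
from math import log2

def js_divergence(p, q):
    """Jensen-Shannon divergence between two discrete word distributions
    (dicts mapping word -> probability), using base-2 logarithms."""
    words = set(p) | set(q)
    m = {w: 0.5 * (p.get(w, 0.0) + q.get(w, 0.0)) for w in words}

    def kl(a):
        # KL divergence from a to the mixture m (terms with a(w)=0 vanish)
        return sum(a.get(w, 0.0) * log2(a.get(w, 0.0) / m[w])
                   for w in words if a.get(w, 0.0) > 0)

    return 0.5 * kl(p) + 0.5 * kl(q)

summary_dist = {"budget": 0.5, "senate": 0.5}      # from the automatic summary
reference_dist = {"budget": 0.5, "vote": 0.5}      # from the reference summaries
print(js_divergence(summary_dist, reference_dist))  # 0.5: half the mass disagrees
```

Lower divergence indicates a summary whose word distribution is closer to that of the references.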

= = = = = = = = = =

lin2002manual Manual and automatic evaluation of summaries

Lin, C.-Y. and Hovy, E. (2002). Manual and automatic evaluation of summaries. In Proceedings of the ACL-02 Workshop on Automatic Summarization, pages 45-51, Morristown, NJ, USA.

Abstract
In this paper we discuss manual and automatic evaluations of summaries using data from the Document Understanding Conference 2001 (DUC-2001). We first show the instability of the manual evaluation. Specifically, the low inter human agreement indicates that more reference summaries are needed. To
investigate the feasibility of automated summary evaluation based on the recent BLEU method from machine translation, we use accumulative n-gram overlap scores between system and human summaries. The initial results provide encouraging correlations with human judgments, based on the Spearman rank-order correlation coefficient. However, relative ranking of systems needs to take into account the
instability.

= = = = = = = = = =


lebanon2007locally The Locally Weighted Bag of Words Framework for Document Representation

Lebanon, G., Mao, Y., and Dillon, J. (2007). The locally weighted bag of words framework for document representation. J. Mach. Learn. Res., 8:2405-2441.

= = = = = = = = = =

[das2007survey]
"Lebanon et al. (2007) suggested representing a document as a simplicial curve (i.e. a curve in the probability simplex), yielding the locally weighted bag-of-words (lowbow) model. According to this representation ... etc."

= = = = = = = = = = 

salton1975vector A vector space model for automatic indexing

Salton, G., Wong, A., and Yang, A. C. S. (1975). A vector space model for automatic indexing. Communications of the ACM, 18:229-237.

= = = = = = = = = =

[das2007survey]
"In the bag-of-words representation (Salton et al., 1975), each document is represented as a sparse vector in a very large Euclidean space, indexed by words in the vocabulary V. A well-known technique in information retrieval to capture word correlation is latent semantic indexing (LSI), which aims to find a linear subspace of dimension k ≪ |V| in which documents may be approximately represented by their projections. ... etc."
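LSI can be sketched with a truncated SVD of a toy term-document matrix. The matrix values below are invented, and numpy is assumed available.

```python
import numpy as np

# Toy term-document matrix (rows: |V| = 4 vocabulary words, cols: 3 documents).
X = np.array([[2., 0., 1.],
              [1., 0., 0.],
              [0., 3., 1.],
              [0., 1., 2.]])

# LSI: a truncated SVD keeps the top-k singular directions (k << |V|);
# documents are represented by their projections onto that subspace.
k = 2
U, S, Vt = np.linalg.svd(X, full_matrices=False)
doc_coords = np.diag(S[:k]) @ Vt[:k]   # k-dimensional document representations

print(doc_coords.shape)  # (2, 3): each of the 3 documents is now a 2-d vector
```

Similarity between documents is then measured in the reduced k-dimensional space rather than over the full vocabulary.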

knight2000statistics Statistics-based summarization-step one: Sentence compression

Knight, K. and Marcu, D. (2000). Statistics-based summarization - step one: Sentence compression. In AAAI/IAAI, pages 703-710.

Abstract
When humans produce summaries of documents, they do not simply extract sentences and concatenate them. Rather, they create new sentences that are grammatical, that cohere with one another, and that capture the most salient pieces of information in the original document. Given that large collections of text/abstract pairs are available online, it is now possible to envision algorithms that are trained to mimic this process. In this paper, we focus on sentence compression, a simpler version of this larger challenge. We aim to achieve two goals simultaneously: our compressions should be grammatical, and they should retain the most important pieces of information. These two goals can conflict. We devise both noisy-channel and decision-tree approaches to the problem, and we evaluate results against manual  compressions and a simple baseline.


= = = = = = = = = =

[das2007survey] "Knight and Marcu (2000) introduced a statistical approach to sentence compression. The authors believe that understanding the simpler task of compressing a sentence may be a fruitful first step to later tackling the problems of single- and multi-document summarization"

= = = = = = = = = =

das2007survey A Survey on Automatic Text Summarization

Unpublished paper
A Survey on Automatic Text Summarization
Dipanjan Das and André F.T. Martins
Literature Survey for the Language and Statistics II course at Carnegie Mellon University, 2007



giving special emphasis to empirical methods and extractive techniques. 
Some promising approaches that concentrate on specific details of the summarization problem are also discussed.  Special attention is devoted to automatic evaluation of summarization systems, as future research on summarization is strongly dependent on progress in this area.

1 Introduction

extraction is the procedure of identifying important sections of the text and producing them verbatim;
abstraction aims to produce important material in a new way;
fusion combines extracted parts coherently;
compression aims to throw out unimportant sections of the text (Radev et al., 2002).


Earliest ... scientific documents ... extracting salient sentences ... using features like 
word and phrase frequency (Luhn, 1958), 
position in the text (Baxendale, 1958) and 
key phrases (Edmundson, 1969).


extractive summarization is mainly concerned .. summary content 
abstractive summarization .. grammatical summary..  advanced language generation techniques.


A crucial issue that will certainly drive future research on summarization is evaluation.

this survey, .. how empirical methods have been used to build summarization systems


2 Single-Document Summarization
describe some eminent extractive techniques.
we look at 
- early work from the 1950s and 60s that kicked off research on summarization. 
- concentrate on approaches involving machine learning techniques published in the 1990s to today. 
- some techniques that use a more complex natural language analysis to tackle the problem.

2.1 Early Work
focused on technical documents.
the most cited (Luhn, 1958)

"In his work, Luhn proposed that the frequency of a particular word in an article provides a useful measure of its significance. There are several key ideas put forward in this paper that have assumed importance in later work on summarization. As a first step, words were stemmed to their root forms, and stop words were deleted. Luhn then compiled a list of content words sorted by decreasing frequency, the index providing a significance measure of the word.

On a sentence level, a significance factor was derived that reflects the number of occurrences of significant words within a sentence, and the linear distance between them due to the intervention of non-significant words. All sentences are ranked in order of their significance factor, and the top-ranking sentences are finally selected to form the auto-abstract.

Edmundson (1969) describes a system that produces document extracts. His primary contribution was the development of a typical structure for an extractive summarization experiment. At first, the author developed a protocol for creating manual extracts, which was applied to a set of 400 technical documents. The two features of word frequency and positional importance were incorporated from the previous two works. Two other features were used: the presence of cue words (words like significant or hardly), and the skeleton of the document (whether the sentence is a title or heading).

2.2 Machine Learning Methods
initially most systems assumed feature independence and relied on naive-Bayes methods; others have focused on the choice of appropriate features and on learning algorithms that make no independence assumptions. Other significant approaches involved hidden Markov models and log-linear models to improve extractive summarization. A very recent paper, in contrast, used neural networks and third-party features (like common words in search engine queries) to improve purely extractive single-document summarization.

Kupiec et al. (1995) describe a method derived from Edmundson (1969) that is able to learn from data. The classification function categorizes each sentence as worthy of extraction or not, using a naive-Bayes classifier.

Aone et al. (1999) also incorporated a naive-Bayes classifier, but with richer features. They describe a system called DimSum that made use of features like term frequency (tf ) and inverse document frequency (idf) to derive signature words.

Statistically derived two-noun word collocations were used as units for counting, along with single words. A named-entity tagger was used and each entity was considered as a single token. They also employed some shallow discourse analysis, like references to the same entities in the text, maintaining cohesion.

Synonyms and morphological variants were also merged while considering lexical terms.

Lin and Hovy (1997) studied the importance of a single feature, sentence position. Just weighing a sentence by its position in the text, which the authors term the "position method", arises from the idea that texts generally follow a predictable discourse structure, and that sentences of greater topic centrality tend to occur in certain specifiable locations (e.g. title, abstracts, etc.).

However, since the discourse structure significantly varies over domains, the position method cannot be defined as naively as in (Baxendale, 1958).

The paper makes an important contribution by investigating techniques for tailoring the position method towards optimality over a genre and how it can be evaluated for effectiveness.

A newswire corpus was used, the collection of Ziff-Davis texts produced from the TIPSTER program; it consists of text about computers and related hardware, accompanied by a set of key topic words and a small abstract of six sentences. For each document in the corpus, the authors measured the yield of each sentence position against the topic keywords. They then ranked the sentence positions by their average yield to produce the Optimal Position Policy (OPP) for topic positions for the genre.


Two kinds of evaluation were performed. Previously unseen text was used for testing whether the same procedure would work in a different domain. The first evaluation showed contours exactly like the training documents. In the second evaluation, word overlap of manual abstracts with the extracted sentences was measured. Windows in abstracts were compared with windows on the selected sentences and corresponding precision and recall values were measured. A high degree of coverage indicated the effectiveness of the position method.


In later work, Lin (1999) broke away from the assumption that features are independent of each other and tried to model the problem of sentence extraction using decision trees instead of a naive-Bayes classifier."
Some novel features were the query signature (normalized score given to sentences depending on number of query words that they contain), IR signature 

HMM
"In contrast with previous approaches, which were mostly feature-based and nonsequential, Conroy and O'leary (2001) modeled the problem of extracting a sentence from a document using a hidden Markov model (HMM). The basic motivation for using a sequential model is to account for local dependencies between sentences. Only three features were used: position of the sentence in the document (built into the state structure of the HMM), number of terms in the sentence, and likelihood of the sentence terms given the document terms.

Log-Linear Models

"Osborne (2002) claims that existing approaches to summarization have always assumed feature independence. The author used log-linear models to obviate this assumption and showed empirically that the system produced better extracts than a naive-Bayes model, with a prior appended to both models.

Neural Networks and Third Party Features
Best results so far: significantly outperforms the baseline.



2.3 Deep Natural Language Analysis Methods

None of these papers solve the problem using machine learning, but rather use a set of heuristics to create document extracts. Most of these techniques try to model the text's discourse structure.



3 Multi-Document Summarization
.. since the mid-1990s, .. news articles. .. Google News, NewsBlaster, News In Essence.
.. multiple sources of information that overlap and supplement each other, and are occasionally contradictory.

.. key tasks are not only identifying and coping with redundancy across documents, but also recognizing novelty and ensuring that the final summary is both coherent and complete.

.. pioneered .. Columbia University (McKeown and Radev, 1995),  SUMMONS 
Extractive techniques .. use of similarity measures between pairs of sentences.
Approaches vary on how these similarities are used: some 

- identify common themes through clustering and then select one sentence to represent each cluster, 
- generate a composite sentence from each cluster, 
- dynamically include each candidate passage only if it is considered novel with respect to the previously included passages, via MMR
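The MMR criterion referenced above (Carbonell and Goldstein, 1998) selects, at each step, the candidate that balances relevance to the query Q against redundancy with the passages already selected (S is the set of selected passages, R the set of candidates; this is the standard formulation, quoted from memory rather than from the survey):

```latex
\mathrm{MMR} = \arg\max_{D_i \in R \setminus S}
\Big[ \lambda \, \mathrm{Sim}_1(D_i, Q)
      - (1 - \lambda) \max_{D_j \in S} \mathrm{Sim}_2(D_i, D_j) \Big]
```

The parameter λ trades off relevance against novelty: λ = 1 gives pure relevance ranking, smaller λ penalizes redundancy more heavily.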


Some recent work extends multi-document summarization to multilingual environments.


3.1 Abstraction and Information Fusion


3.2 Topic-driven Summarization and MMR
3.3 Graph Spreading Activation
3.4 Centroid-based Summarization

3.5 Multilingual Multi-document Summarization






4 Other Approaches to Summarization
some unconventional approaches that investigate details underlying the summarization process and may have a role to play in future research on this field.
 

4.1 Short Summaries (headline)
Extractive summarization is not very powerful for very short summaries.
Witbrock and Mittal (1999): statistical models of both the order and the likelihood of the appearance of tokens in the target documents; the two models are used to co-constrain each other.
(Brown et al., 1993): a translation model between a document and its summary; a mapping between a word in the document and the likelihood of some word appearing in the summary.
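The idea of co-constraining a word-selection model and an ordering model can be sketched as a greedy decoder (an illustrative toy only; the probability tables, back-off constant, and greedy search are assumptions, not Witbrock and Mittal's actual training or decoding procedure):

```python
def gen_headline(doc_words, select_prob, bigram_prob, length=4):
    """Greedy headline generation sketch.

    select_prob[w] : likelihood that document word w appears in the headline
                     (the 'translation'-style content model).
    bigram_prob[(prev, w)] : ordering model; 1e-6 is an assumed back-off.
    At each step, pick the word maximizing the product of both models.
    """
    headline = ["<s>"]
    pool = [w for w in doc_words if w in select_prob]
    for _ in range(length):
        prev = headline[-1]
        scored = [(select_prob[w] * bigram_prob.get((prev, w), 1e-6), w)
                  for w in pool]
        if not scored:
            break
        _, best = max(scored)
        headline.append(best)
        pool.remove(best)
    return headline[1:]
```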

Evaluation: compared against the actual headlines for a set of input newswire stories. Since phrasing could not be compared directly, the generated headlines were compared against the actual headlines, as well as against the top-ranked summary sentence of each story.



4.2 Sentence Compression
step to later tackle the problems of more complex single and multi-document summarization.
techniques such as: noisy-channel model, decision trees.

4.3 Sequential document representation

document representation, with applications in summarization
bag-of-words representation
                  vector space model salton1975vector
                  simplicial curve lebanon2007locally

5 Evaluation
Evaluating a summary is a difficult task because there does not exist an ideal summary for a given  document or set of documents. 
Agreement between human summarizers is quite low, both for evaluating and generating summaries.
Widespread use of disparate metrics.

5.1 Human and Automatic Evaluation
ISI SEE

5.2 ROUGE
n-gram overlap, inspired by BLEU from machine translation
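The n-gram overlap behind ROUGE-N can be sketched in a few lines (a simplified illustration of clipped n-gram recall; real ROUGE adds stemming, stopword handling, and multiple references):

```python
from collections import Counter

def ngrams(tokens, n):
    """Multiset of n-grams in a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def rouge_n(candidate, reference, n=1):
    """ROUGE-N recall: clipped n-gram overlap / reference n-gram count."""
    cand, ref = ngrams(candidate, n), ngrams(reference, n)
    overlap = sum(min(count, cand[g]) for g, count in ref.items())
    total = sum(ref.values())
    return overlap / total if total else 0.0
```

Because the denominator counts reference n-grams, this is a recall-oriented measure, which is the key difference from BLEU's precision orientation.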

5.3 Information-theoretic Evaluation of Summaries
performance is better for multi-doc summary


6 Conclusion
Still a long trail to walk in this field.
Attention has drifted from summarizing scientific articles to news articles, electronic mail messages, advertisements, and blogs.
Both abstractive and extractive approaches have been attempted.
Abstractive .. requires heavy machinery for language generation, difficult to extend to broader domains.
Extraction of sentences .. satisfactory results in large-scale applications, especially in multi-document summarization.

This survey emphasizes extractive approaches to summarization using statistical methods. 
A lot of interesting work is being done far from the mainstream research in this field .. relevant to future research, even if they focus only on small details related to a general summarization ..

some recent trends in automatic evaluation of summarization systems











Saturday, April 10, 2010

chen2008tsinghua Tsinghua University at the summarization track of TAC 2008

Shouyuan Chen, Yuanming Yu, Chong Long, Feng Jin, Lijing Qin, Minlie Huang, Xiaoyan Zhu, "Tsinghua University at the summarization track of TAC 2008.", Text Analysis Conference(TAC 2008). 2008.11.17-19


What is specific about this method for handling 'update' summarization (compared, for example, with generic summarization)?









abstract

proposed two novel methods,
- based on the information distance theory, and
- based on the sentence centrality .. from the centrality concept in the graph theory.
results .. very competitive for generating extractive summaries.

  Introduction


TAC 2008 update summarization
write a short (~ 100 words) summary of a set of newswire articles

evaluation: readability and content (based on Pyramid Method)

1st system: based on Kolmogorov complexity and information distance theory.
- optimal summary:  a summary with the smallest information distance to all original news articles
- text summarization problem is converted into an optimization problem limited by the summary's information content
- to solve this optimization problem, we proposed an approach to approximate K(.) and D(.,.).
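The paper's exact approximations of K(.) and D(.,.) are not reproduced in these notes; a standard stand-in for compression-based information distance is the normalized compression distance, sketched here with zlib (the choice of compressor and the concatenation scheme are illustrative assumptions):

```python
import zlib

def c(s):
    """Compressed length as a computable proxy for Kolmogorov complexity K(s)."""
    return len(zlib.compress(s.encode("utf-8")))

def ncd(x, y):
    """Normalized compression distance: approximates the normalized
    information distance D(x, y) between two strings."""
    cx, cy, cxy = c(x), c(y), c(x + " " + y)
    return (cxy - min(cx, cy)) / max(cx, cy)
```

Under this proxy, a candidate summary with the smallest NCD to the article set would be preferred, mirroring the "smallest information distance" criterion in the notes above.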

2nd system: centrality concepts within the graph theory
nodes: sentences,  edges: similarities between sentences calculated by an LSI algorithm.
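A minimal sketch of graph-based sentence centrality (using raw bag-of-words cosine in place of the paper's LSI similarities; the edge threshold is an assumed parameter):

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def degree_centrality(sentences, threshold=0.2):
    """Degree centrality: for each sentence (node), count how many other
    sentences it is connected to by an edge (similarity > threshold)."""
    vecs = [Counter(s.lower().split()) for s in sentences]
    return [sum(1 for j, vj in enumerate(vecs)
                if i != j and cosine(vi, vj) > threshold)
            for i, vi in enumerate(vecs)]
```

High-degree sentences sit in dense regions of the graph and are candidates for the extractive summary.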

2 The first system: information-distance based update summarization
Kolmogorov complexity and information distance






3 The second system: sentence centrality based update summarization

mani1998machine Machine learning of generic and user-focused summarization

Mani, I. & Bloedorn, E. Machine learning of generic and user-focused summarization PROCEEDINGS OF THE NATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE, 1998, 821-826

Goal:
learn rules that can be easily edited by humans
both generic and user-focused summarization
query in user-focused is a user abstract
does not require manual tagging of the training data
summary:
   generic: author abstract
   user-focused: automatically generated user "abstract". The user chooses relevant texts; important words are then automatically chosen from them to serve as queries for sentence selection.


Introduction
Overall Approach
Features: locational, statistical, proper name, synonym
Training Data: 198 texts, 4-10 pages each. Compression rate: 5% (different rates considered)
Learning Methods: 3 learning methods
 - Standardized Canonical Discriminant Function (SCDF) - SPSS 97, a multiple regression technique
 - C4.5 rules
 - AQ15C

Evaluation metric: accuracy, recall, precision, f-measure

Result
The best generic F: 0.69, user focused F: 0.89

Friday, April 9, 2010

Chowdary2009 USUM: Update Summary Generation System

 C. Ravindranath Chowdary and P. Sreenivasa Kumar, "USUM: Update Summary Generation System", In Advances in Computational Linguistics. Research in Computing Science Vol. 41, 2009, pp. 229-240.

Paper ID (BibTexKey)    : Chowdary2009
Paper category               : (main reference)
Abstract keyphrases       :
query-based, extractive, multi-document, update summarization
scenario where the source documents are not accessible
embed the sentences of the current summary into the new document and then perform query specific
summary generation on that document. 
performance..good ..quality and efficiency.
graph-based page-rank approach 

1.    INTRODUCTION
a.    Paper type (no survey)
b.    Something new/novel (e.g. algorithm, method, technique, approach)? (Y)
Problem: given an extractive summary that has been generated for a given query on a set of documents, upon the arrival of a new document the summary has to be updated without considering the initial set of documents. This problem is not addressed in the literature.

c.     Paper readability/understandability? (easy)

d.    Interesting material? (Very)
e.    Detail enough? (Y)
f.    Google Scholar citation count, count date, date of publication: 0, 10/4/2010, Feb 2009
g.    Publisher tier: NA
h.    Summarization category: graph-based

2.    CONTENT
a.    Background
Often the information pertaining to a topic is present across several web pages. It would be of great help to the user if a query-specific multi-document summary were generated.
In multi-document summary generation, other issues like time, ordering of extracted sentences, scalability etc. will arise.

b.    Description in brief the core content, something new proposed by the authors 

Generating Summary-Embedded Document
We ..  graph based approach .. update summarization.
sentence .. node and edges .. similarity score between them.

Algorithm 1 sketches the details of the embedding of the current summary into the new document


Similarity (calculated using Equation 1)

Update Summary Generation
Summary generation on the embedded document is discussed.
Score of the node is calculated based on the query posed by the user i.e., the node gets score based on its relevance to the query.

Node score calculation is based on the Equation 2.
The sentences in the summary generated using Algorithm 2 are rearranged in the document order.
This summary is complete, coherent and also non-redundant.

Experimental Setup
Initial summary .. generated using MEAD (for the first 15 of the 25 documents). The 16th document will be the new document into which the summary is to be embedded.
The summary is generated for the given query on the embedded document, and this generated summary will be embedded into the 17th document. The process is repeated until the summary on the last embedded document (the 25th) is generated.


c.    Machine Learning/Text Mining technique used. NO

d.    Corpus used  DUC 2006

e.    NL knowledge used: NO

f.    Evaluation metric used ROUGE

g.    Performance, and how it compares with others
ROUGE-1 0.38980
ROUGE-2 0.08179
ROUGE-W 0.09429
ROUGE-SU4 0.13757
No comparison with other methods
h.    The strength and weakness of core content (according to authors)
Complete, Coherent, Quality Summary, Efficient

i.    Future works: NA


3 COMMENTS
What if the new data is not the same as the old data for the same information/topic?