Sunday, April 18, 2010
jing2000cut Cut and Paste Based Text Summarization
Jing, Hongyan and Kathleen McKeown. 2000. Cut and paste based summarization. In Proceedings of the First Conference of the North American Chapter of the Association for Computational Linguistics, pages 178–185, Seattle.
barzilay2001extracting Extracting paraphrases from a parallel corpus
Barzilay, Regina and Kathleen McKeown. 2001. Extracting paraphrases from a parallel corpus. In Proceedings of the ACL/EACL, pages 50–57, Toulouse, France.
Abstract
While paraphrasing is critical both for interpretation and generation of natural language, current systems use manual or semi-automatic methods to collect paraphrases. We present an unsupervised learning algorithm for identification of paraphrases from a corpus of multiple English translations of the same source text. Our approach yields phrasal and single word lexical paraphrases as well as syntactic paraphrases.
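A toy sketch of the idea, written by me and not taken from the paper (the authors actually use a co-training algorithm over lexical and contextual features): given two sentence-aligned English translations of the same source, words that differ but share identical immediate neighbours are proposed as single-word paraphrase candidates. Function and variable names are my own.

# Toy sketch (mine, not the paper's co-training algorithm): propose
# single-word paraphrase candidates from two sentence-aligned English
# translations of the same source text.
def paraphrase_candidates(translation_a, translation_b):
    """translation_a, translation_b: lists of sentences, assumed pre-aligned."""
    candidates = set()
    for sent_a, sent_b in zip(translation_a, translation_b):
        # Pad so every token has a left and a right neighbour.
        tok_a = ["<s>"] + sent_a.lower().split() + ["</s>"]
        tok_b = ["<s>"] + sent_b.lower().split() + ["</s>"]
        for i in range(1, len(tok_a) - 1):
            for j in range(1, len(tok_b) - 1):
                same_context = (tok_a[i - 1] == tok_b[j - 1]
                                and tok_a[i + 1] == tok_b[j + 1])
                if same_context and tok_a[i] != tok_b[j]:
                    candidates.add((tok_a[i], tok_b[j]))
    return candidates

t1 = ["emma burst into tears and he tried to comfort her"]
t2 = ["emma burst into tears and he tried to console her"]
print(paraphrase_candidates(t1, t2))   # {('comfort', 'console')}

The (comfort, console) pair comes from the authors' own Madame Bovary translation example; everything else above is my simplification.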
Wednesday, April 14, 2010
barzilay2005sentence Sentence Fusion for Multidocument News Summarization
Regina Barzilay, Kathleen McKeown
"Sentence Fusion for Multidocument News Summarization",
Computational Linguistics, 2005.
A system that can produce informative summaries, highlighting common information found in many online documents, will help Web users to pinpoint information that they need without extensive reading. In this article, we introduce sentence fusion, a novel text-to-text generation technique for synthesizing common information across documents. Sentence fusion involves bottom-up local multisequence alignment to identify phrases conveying similar information and statistical generation to combine common phrases into a sentence. Sentence fusion moves the summarization field from the use of purely extractive methods to the generation of abstracts that contain sentences not found in any of the input documents and can synthesize information across sources.
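To fix the intuition for myself (this is not MultiGen's algorithm, which aligns dependency trees and then regenerates a sentence with statistical generation), here is a word-level sketch that only marks the material two related sentences share; the names are my own.

# Intuition only: mark the word-level material two related sentences share.
# MultiGen itself aligns dependency trees and regenerates text; this sketch
# is my own illustration.
from difflib import SequenceMatcher

def shared_phrases(sent_a, sent_b, min_len=2):
    """Return in-order token spans that appear in both sentences."""
    tok_a = sent_a.lower().split()
    tok_b = sent_b.lower().split()
    matcher = SequenceMatcher(a=tok_a, b=tok_b, autojunk=False)
    return [" ".join(tok_a[m.a:m.a + m.size])
            for m in matcher.get_matching_blocks() if m.size >= min_len]

s1 = "the storm caused severe flooding in the coastal towns on tuesday"
s2 = "severe flooding in the coastal towns forced thousands to evacuate"
print(shared_phrases(s1, s2))   # ['severe flooding in the coastal towns']

In the real system the shared fragments are then re-ordered and completed into a grammatical sentence by the generation component; the sketch stops at identification.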
1. Introduction
Redundancy in large text collections poses, for natural language systems:
- problems: difficulties for end users of search engines and news providers
- opportunities: it can be exploited to identify important and accurate information for applications such as summarization and question answering
It would be highly desirable to have a mechanism that could identify common information among multiple related documents and fuse it into a coherent text. This article presents a method for sentence fusion that exploits redundancy to achieve this task in the context of multidocument summarization.
2. Framework for Sentence Fusion: MultiGen
3. Sentence Fusion
4. Sentence Fusion Evaluation
5. Related Work
6. Conclusions and Future Work
"Sentence Fusion for Multidocument News Summarization",
Computational Linguistics, 2005.
A system that can produce informative summaries, highlighting common information found in many online documents, will help Web users to pinpoint information that they need without extensive reading. In this article, we introduce sentence fusion, a novel text-to-text generation technique for synthesizing common information across documents. Sentence fusion involves bottom-up local multisequence alignment to identify phrases conveying similar information and statistical generation to combine common phrases into a sentence. Sentence fusion moves the summarization field from the use of purely extractive methods to the generation of abstracts that contain sentences not found in any of the input documents and can synthesize information across sources.
1. Introduction
Redundancy in large text collections, for natural language systems:
problems : difficulties for end users of search engines and news providers
opportunitie: can be exploited to identify important and accurate information for applications such as summarization and question answering
It would be highly desirable to have a mechanism that could identify common information among multiple related documents and fuse it into a coherent text. This article presents a method for sentence fusion that exploits redundancy to achieve this task in the context of multidocument summarization.
2. Framework for Sentence Fusion: MultiGen
3. Sentence Fusion
4. Sentence Fusion Evaluation
5. Related Work
6. Conclusions and Future Work
Tuesday, April 13, 2010
hahn2000challenges The Challenges of Automatic Summarization
+ not yet added to my BibTeX
Udo Hahn, Inderjeet Mani: The Challenges of Automatic Summarization. IEEE Computer 33(11): 29-36 (2000)
sekine2003survey A Survey for Multi-Document Summarization
not yet added to my BibTeX
A Survey for Multi-Document Summarization
Satoshi Sekine and Chikashi Nobata
The Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics, Workshop on Text Summarization; 2003; Edmonton, Canada
Abstract
Automatic Multi-Document summarization is still hard to realize. Under such circumstances, we believe, it is important to observe how humans are doing the same task, and look around for different strategies.
We prepared 100 document sets similar to the ones used in the DUC multi-document summarization task. For each document set, several people prepared the following data and we conducted a survey.
A) Free style summarization
B) Sentence Extraction type summarization
C) Axis (type of main topic)
D) Table style summary
In particular, we will describe the last two in detail, as these could lead to a new direction for multi-summarization research.
1 Introduction
- challenge: single-document and multi-document summarization performance is not far above the baselines
- a new approach to summarization is needed: imitating how humans do it
- the authors tried producing summaries by highlighting important phrases or sentences, then linking them together to obtain the main or general topics, either as a list or as a table. Result: good summaries
- although the result is not made up of easily readable sentences
- questions: in general, how many kinds of "main topic" can be identified, and what percentage of summaries suit a table format
- this paper is about manual summarization of 100 DUC-like document sets
2 Document Sets
3 Task and annotator
4 Free style summarization
5 Sentence Extraction
6 Axis
7 Table
8 Discussion
9 Future Work
afantenos2005summarization Summarization from medical documents: a survey
Summary
Objective: The aim of this paper is to survey the recent work in medical documents summarization.
Background: During the last decade, documents summarization got increasing attention by the AI research community. More recently it also attracted the interest of the medical research community as well, due to the enormous growth of information that is available to the physicians and researchers in medicine, through the large and growing number of published journals, conference proceedings, medical sites and portals on the World Wide Web, electronic medical records, etc.
Methodology: This survey gives first a general background on documents summarization, presenting the factors that summarization depends upon, discussing evaluation issues and describing briefly the various types of summarization techniques. It then examines the characteristics of the medical domain through the different types of medical documents. Finally, it presents and discusses the summarization techniques used so far in the medical domain, referring to the corresponding systems and their characteristics.
Discussion and conclusions: The paper discusses thoroughly the promising paths for future research in medical documents summarization. It mainly focuses on the issue of scaling to large collections of documents in various languages and from different media, on personalization issues, on portability to new sub-domains, and on the integration of summarization technology in practical applications.
KEYWORDS
Summarization from medical documents;
Single-document summarization;
Multi-document summarization;
Multi-media summarization;
Extractive summarization;
Abstractive summarization;
Cognitive summarization
1. Introduction
2. Summarization roadmap
3. The medical domain
4. Summarization techniques in the medical domain
5. Promising paths for future research
6. Conclusions