Monday, April 12, 2010

survey paper review: jones2007automatic & das2007survey

At the start of this study I plan to read survey articles in depth and skim some of the important papers they cite. I found few existing survey papers. So far I have two: Jones 2007 (shortened to J) and Das 2007 (shortened to D). The latter is unpublished but quite good.


In short, (J) offers a new view of several general frameworks for summarization, while (D) explains summarization methods clearly and in detail.

What is new in these papers, and what are the advantages of the new material?

One characteristic of a good review article is that it provides a framework: a systematic categorization of the topic under discussion. (J) presents three summarization frameworks: one for the system itself, one for system evaluation, and one for the factors affecting summarization. (D) does not deliver a new framework, but it has an interesting subcategory that relates to and supports text summarization research (i.e. "other approach to summarization").

A framework helps map out the research topic. In my opinion, though, the "factors affecting summarization" framework is too detailed (it has too many parts), for example in "input: other factors".

Were the papers easy to follow?
(D) is easier to understand, while (J) is more difficult because its discussion is more abstract.

Describing the role of machine learning techniques comprehensively and in detail?
(J) discusses it briefly, while (D) gives a longer, more detailed, and clearer treatment.


Comparing (methods, performance) among summarization systems?
Comparing the methods and seeing how they relate would be easier if they were categorized in advance, perhaps as a tree. The same applies to the features used or, more generally, to the text representation. (J) covers this, but too briefly; it needs more systematic and clearer divisions.

Explaining early works?
Neither paper, (J) in particular, presents a clear picture of the early systems. Early systems that influenced subsequent research are important enough to be explained. However, in keeping with its title, (J) concentrates on "state of the art" systems of the last decade (~1997-2007). (D) describes clearly and at length the early multi-document summarization systems, which are not that old, dating from 1995. Multi-document summarization is an application and research area of text summarization that is important and dominant at the moment.

Evaluation system
(J) provides a comprehensive framework, while (D) explains two automatic evaluation systems in detail.


Single-doc, multi-doc summarization. What are the challenges for multi-doc summarization?
(J) has no clear discussion of the single- vs. multi-document summarization issue. For a news corpus, the opening sentences are adequate enough to serve as a summary, so automatic summarization is considered less useful there. The majority of current research is on multi-document summarization. Unfortunately, neither paper describes the challenges of multi-document summarization and its components, such as information ordering.
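The point about news text can be made concrete with a lead baseline, which simply returns the first few sentences of a document. The following is a minimal sketch, not taken from (J) or (D); the function name `lead_summary`, the parameter `n_sentences`, and the naive regex sentence splitter are all assumptions made for illustration.

```python
import re


def lead_summary(text, n_sentences=3):
    """Return the first n_sentences of text as a summary (lead baseline).

    Hypothetical helper: the sentence splitter is deliberately naive and
    will mis-split abbreviations such as "Dr." or "e.g.".
    """
    # Split after '.', '!' or '?' when followed by whitespace.
    sentences = [
        s.strip()
        for s in re.split(r"(?<=[.!?])\s+", text.strip())
        if s.strip()
    ]
    return " ".join(sentences[:n_sentences])
```

Any summarizer for news text would have to beat this baseline to be worth using, which is one way to read the remark that automatic summarization is considered less useful for that genre.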


Application / implementation of summarization systems.
Neither article sufficiently explains how summarization is applied in practice or what its benefits are (with examples from real cases), especially under current conditions. Information about these applications is important to give readers a comprehensive picture.

Natural language (NL) knowledge used, NL resources used by summarization systems.

Both papers describe systems that use an NL approach (a deeper NL knowledge approach), but neither lays out in a clear and structured way which kinds of NL knowledge and NL resources are used and have been found to improve performance.

Multi-language summarization.
There is still little research on multi-language summarization. (D) discusses it briefly, while (J) does not. For non-English speakers, such as Indonesians, the need for multi-language summaries is higher, because so many news sources are in English. One question to study is how to get a good Indonesian summary from English texts: summarize first (in English) and then translate, or translate first and then summarize? Another issue is what form the translated summary should take, given that machine translation output is often unsatisfactory.
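The two orderings can be written down as two compositions of the same components. This is only a sketch: `summarize` and `translate` are hypothetical callables that the caller must supply (an English summarizer and an English-to-Indonesian translator, for instance), and nothing here comes from (J) or (D).

```python
def summarize_then_translate(text, summarize, translate):
    """Summarize in the source language first, then translate the summary.

    `summarize` and `translate` are hypothetical callables taking and
    returning a string. Only the short summary passes through translation.
    """
    return translate(summarize(text))


def translate_then_summarize(text, summarize, translate):
    """Translate the full text first, then summarize in the target language.

    Here `summarize` must work on target-language text, and translation
    errors in the full text can affect what the summarizer selects.
    """
    return summarize(translate(text))
```

With real components the two functions will generally return different summaries, so comparing their outputs on the same English news texts is one way to study the question above.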


Explaining factors that influence summarization?
(J) explains them clearly, while (D) does not cover them.

Criticizing current state of the art systems?
Both papers clearly explain several current summarization systems, but (D) goes into more detail. It would be better if the weaknesses of the systems were described in more depth.

Describing the main evaluation programmes (DUC, TAC)?
(J) gives a clear overall picture of these, while (D) does not. Because these events have had a deep effect on the development of summarization research, an explanation like the one in (J) is essential to help readers understand them.


Describing current trends and predicting future ones?
Neither paper has a dedicated section on research trends and future predictions. Current trends are nevertheless covered, spread throughout each article with explanations and examples. Unfortunately, neither offers any insight into the direction of future research.


Explaining the main challenges of summarization research? (summarization system, evaluation system)
Both explain these in reasonable detail and fairly clearly; (J) is the stronger of the two, particularly on evaluation systems.


Features used 
Both describe the features used, but it would be better if they also compared the roles of the features that can be or have been used (in terms of how much each improves performance).


Performance comparison
It would be better if there were a table comparing the performance of the systems, along with related data such as the evaluation metric and the corpus used.


Presenting examples? 
(D) contains more detailed and clearer examples.


Non extraction summarization
Extraction is the most widely used approach to automatic text summarization. Humans favor it less because the result is harder to read, though usually still tolerable. Abstractive summarization is a complex process and so far has not produced satisfactory results. (J) discusses non-extractive summarization only briefly.


Dividing / breaking down / modularizing the summarization system?
If a summarization system can indeed be divided into separate parts (some of them perhaps independent of each other), this would help both in understanding and in designing such a system. One possible division follows the summarization stages in (J): "interpretation" of the source text into a source representation, "transformation" of the source representation into a summary representation, and "generation" of the final summary from the summary representation.
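To see what such a modular design could look like, here is a minimal sketch that maps the three stages onto three functions. It is only one illustrative instantiation (a simple frequency-based extractive summarizer), not the method of (J); the function names, the dictionary-based representation, and the scoring rule are all assumptions.

```python
import re
from collections import Counter


def interpret(text):
    """Interpretation: source text -> source representation.

    Assumed representation: the sentences plus their lowercase tokens.
    """
    sentences = [
        s.strip()
        for s in re.split(r"(?<=[.!?])\s+", text.strip())
        if s.strip()
    ]
    tokens = [re.findall(r"[a-z']+", s.lower()) for s in sentences]
    return {"sentences": sentences, "tokens": tokens}


def transform(source_rep, k=2):
    """Transformation: source representation -> summary representation.

    Assumed scoring: average corpus frequency of a sentence's words.
    Returns the indices of the k best sentences, in document order.
    """
    freq = Counter(w for toks in source_rep["tokens"] for w in toks)
    scores = [
        sum(freq[w] for w in toks) / len(toks) if toks else 0.0
        for toks in source_rep["tokens"]
    ]
    # Rank by score (descending); ties go to the earlier sentence.
    ranked = sorted(range(len(scores)), key=lambda i: (-scores[i], i))
    return sorted(ranked[:k])


def generate(source_rep, selected):
    """Generation: summary representation -> final summary text."""
    return " ".join(source_rep["sentences"][i] for i in selected)


def summarize(text, k=2):
    """Run the three stages in sequence."""
    rep = interpret(text)
    return generate(rep, transform(rep, k))
```

Because each stage only depends on the representation passed to it, one stage can be replaced (for example, a different scoring rule in `transform`) without touching the others, which is the benefit of the division suggested above.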
