Abstract
bkb@research.apple.com

Traditionally, the document summarisation task has been tackled either as a natural language processing problem, with an instantiated meaning template being rendered into coherent prose, or as a passage extraction problem, where certain fragments (typically sentences) of the source document are deemed to be highly representative of its content, and thus delivered as meaningful "approximations" of it. Balancing the conflicting requirements of depth and accuracy of a summary, on the one hand, and document and domain independence, on the other, has proven a very hard problem. This paper describes a novel approach to content characterisation of text documents. It is domain- and genre-independent, by virtue of not requiring an in-depth analysis of the full meaning. At the same time, it remains closer to the core meaning by choosing a different granularity of its representations (phrasal expressions rather than sentences or paragraphs), by exploiting a notion of discourse contiguity and coherence for the purposes of uniform coverage and context maintenance, and by utilising a strong linguistic notion of salience, as a more appropriate and representative measure of a document's "aboutness".

1 Capsule overviews

The majority of techniques for "summarisation", as applied to average-length documents, fall within two broad categories: those that rely on template instantiation and those that rely on passage extraction. Work in the former framework traces its roots to some pioneering research by DeJong [7] and Tait [29]; more recently, the DARPA-sponsored TIPSTER programme ([2]), and in particular the message understanding conferences (MUC; e.g. [6] and [1]), have provided fertile ground for such work, by placing the emphasis of document analysis on the identification and extraction of certain core entities and facts in a document, which are "packaged" together in a template. There are shared intuitions among researchers that generation of smooth prose from this template would yield a summary of
the document's core content; recent work, most notably by McKeown and colleagues (cf. [21]), focuses on making these intuitions more concrete. While providing a rich context for research in generation, this framework requires an analysis front end capa-
1 Also,