A New Approach to Automatic Keyword Generation Using Inverse Vector Space Model (키워드 자동 생성에 대한 새로운 접근법: 역 벡터공간모델을 이용한 키워드 할당 방법)

  • Cho, Won-Chin;Rho, Sang-Kyu;Yun, Ji-Young Agnes;Park, Jin-Soo
    • Asia pacific journal of information systems / v.21 no.1 / pp.103-122 / 2011
  • Recently, numerous documents have been made available electronically. Internet search engines and digital libraries commonly return query results containing hundreds or even thousands of documents. In this situation, it is virtually impossible for users to examine complete documents to determine whether they might be useful. For this reason, some online documents are accompanied by a list of keywords specified by the authors to guide users and facilitate the filtering process. A set of keywords is thus often considered a condensed version of the whole document and plays an important role in document retrieval, Web page retrieval, document clustering, summarization, text mining, and so on. Since many academic journals ask authors to provide a list of five or six keywords on the first page of an article, keywords are most familiar in the context of journal articles. However, many other types of documents could also benefit from the use of keywords, including Web pages, email messages, news reports, magazine articles, and business papers. Although the potential benefit is large, implementation is the obstacle: manually assigning keywords to all documents is a daunting, perhaps impractical, task because it is extremely tedious and time-consuming and requires a certain level of domain knowledge. Therefore, it is highly desirable to automate the keyword generation process. There are two main approaches to achieving this aim: the keyword assignment approach and the keyword extraction approach. Both approaches use machine learning methods and require, for training purposes, a set of documents with keywords already attached. In the former approach, there is a given vocabulary, and the aim is to match its terms to the texts. In other words, the keyword assignment approach seeks to select the words from a controlled vocabulary that best describe a document. 
Although this approach is domain dependent and is not easy to transfer and expand, it can generate implicit keywords that do not appear in a document. In the latter approach, on the other hand, the aim is to extract keywords according to their relevance in the text, without a prior vocabulary. In this approach, automatic keyword generation is treated as a classification task, and keywords are commonly extracted using supervised learning techniques. Thus, keyword extraction algorithms classify candidate keywords in a document into positive or negative examples. Several systems, such as Extractor and Kea, were developed using the keyword extraction approach. The most indicative words in a document are selected as keywords for that document; as a result, keyword extraction is limited to terms that appear in the document. Therefore, keyword extraction cannot generate implicit keywords that are not included in a document. According to the experimental results of Turney, about 64% to 90% of keywords assigned by the authors can be found in the full text of an article. Conversely, this means that 10% to 36% of the keywords assigned by the authors do not appear in the article and cannot be generated by keyword extraction algorithms. Our preliminary experiment also shows that 37% of keywords assigned by the authors are not included in the full text. This is the reason why we decided to adopt the keyword assignment approach. In this paper, we propose a new approach to automatic keyword assignment, namely IVSM (Inverse Vector Space Model). The model is based on the vector space model, a conventional information retrieval model that represents documents and queries as vectors in a multidimensional space. IVSM generates an appropriate keyword set for a specific document by measuring the distance between the document and the keyword sets. 
The keyword assignment process of IVSM is as follows: (1) calculating the vector length of each keyword set based on each keyword weight; (2) preprocessing and parsing a target document that does not have keywords; (3) calculating the vector length of the target document based on the term frequency; (4) measuring the cosine similarity between each keyword set and the target document; and (5) generating keywords that have high similarity scores. Two keyword generation systems were implemented applying IVSM: an IVSM system for a Web-based community service and a stand-alone IVSM system. The first IVSM system is implemented in a community service for sharing knowledge and opinions on current trends such as fashion, movies, social problems, and health information. The stand-alone IVSM system is dedicated to generating keywords for academic papers and has been tested on a number of academic papers, including those published by the Korean Association of Shipping and Logistics, the Korea Research Academy of Distribution Information, the Korea Logistics Society, the Korea Logistics Research Association, and the Korea Port Economic Association. We measured the performance of IVSM by the number of matches between the IVSM-generated keywords and the author-assigned keywords. According to our experiment, the precision of IVSM applied to the Web-based community service and to academic journals was 0.75 and 0.71, respectively. The performance of both systems is much better than that of baseline systems that generate keywords based on simple probability. IVSM also shows performance comparable to that of Extractor, a representative system of the keyword extraction approach developed by Turney. As the number of electronic documents increases, we expect that the IVSM proposed in this paper can be applied to many electronic documents in Web-based communities and digital libraries.
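
The five steps listed in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how keyword weights are learned or how documents are parsed, so the sketch assumes each candidate keyword is already represented by a hypothetical dictionary of term weights (`keyword_vectors`), uses a deliberately naive tokenizer, and returns the `top_k` keywords with a non-zero cosine similarity. All function and variable names are illustrative.

```python
import math
import re
from collections import Counter


def vector_length(weights):
    """Steps 1 and 3: Euclidean length of a sparse vector {term: weight}."""
    return math.sqrt(sum(w * w for w in weights.values()))


def tokenize(text):
    """Step 2 (assumed preprocessing): lowercase and keep alphabetic runs."""
    return re.findall(r"[a-z]+", text.lower())


def cosine_similarity(a, b):
    """Step 4: cosine of the angle between two sparse vectors."""
    len_a, len_b = vector_length(a), vector_length(b)
    if len_a == 0 or len_b == 0:
        return 0.0
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    return dot / (len_a * len_b)


def assign_keywords(document, keyword_vectors, top_k=3):
    """Step 5: return up to top_k keywords most similar to the document."""
    doc_vector = Counter(tokenize(document))  # raw term frequencies
    scored = [
        (cosine_similarity(vec, doc_vector), keyword)
        for keyword, vec in keyword_vectors.items()
    ]
    # Highest similarity first; ties broken alphabetically for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [keyword for score, keyword in scored[:top_k] if score > 0]


# Hypothetical keyword vectors; in practice these weights would be learned
# from training documents that already carry author-assigned keywords.
keyword_vectors = {
    "logistics": {"shipping": 2.0, "port": 1.0, "cargo": 1.0},
    "fashion": {"dress": 2.0, "style": 1.0},
    "information retrieval": {"query": 2.0, "document": 1.0, "search": 1.0},
}

sample_document = "The port handles cargo shipping for the region."
suggested = assign_keywords(sample_document, keyword_vectors, top_k=2)
```

In this toy example the word "logistics" never occurs in the sample document, yet it is the only keyword whose vector overlaps with the document's terms. This is the kind of implicit keyword that the abstract says an extraction approach cannot produce.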

The Study about 「The Discourse on the Constitutional Symptoms and Diseases」 of Sasangin on the 『Dongyi Suse Bowon』 (『동의수세보원(東醫壽世保元)』 태소음양인(太少陰陽人)의 「병증론(病證論)」에 관(關)한 연구(硏究))

  • Lee, Su-kyung;Song, Il-byung
    • Journal of Sasang Constitutional Medicine / v.11 no.2 / pp.1-26 / 1999
  • This paper was written to understand the constitutional symptoms and diseases from two aspects. The first was to trace how the constitutional symptoms and diseases developed from those of Oriental medicine, through "Dongyi Bogam" and original texts such as "Shanghanlun". The second was to analyze the constitutional diseases in light of Lee Je-ma's own view of human beings and society, on which "Dongyi Suse Bowon" was based. The original concepts of 'The Interior Disease' and 'The Exterior Disease' were based on the Nature and the Emotion, the Environmental Frames and the Human Affairs, and the Ears, Eyes, Nose, Mouth and the Lung, Spleen, Liver, Kidney. The exterior diseases were caused by the abilities of the ears to listen, the eyes to see, the nose to smell, and the mouth to taste with respect to the environmental frames, which relate to one's recognition of society. The interior diseases were caused by the abilities of the lung to study, the spleen to ask, the liver to think, and the kidney to judge with respect to human affairs, which relate to the relationship between oneself and others. The titles of the constitutional diseases were named according to these views in his first writing of "Dongyi Suse Bowon" in 1894. The titles of the Taeyangin diseases, 'The Lumbar Vertebrae Disease Induced by Exopathogen' and 'The Small Intestine Disease Induced by Endopathogen', still remain as in the first writing. But the titles of the constitutional diseases were rewritten into their present form in 1900. In order to express the pathology and mechanism of the constitutional diseases exactly, he rewrote the titles to contain the manifestation sites of the diseases, the febrile and cold symptoms, and the different congenital formations of the organs. The exterior diseases and interior diseases had three characteristics. The first was that the exterior disease, injured by the nature, tended to progress slowly, while the interior disease, injured by the emotion, tended to progress rapidly. 
The second was that the interior disease and the exterior disease were not separate; rather, one influenced the other, and they appeared together as one disease when the diseases continued for a long time. The third was that even when the diseases occurred together, they were included under the beginning disease. The symptoms in ordinary times were the origin and the clue for recognizing the constitutional symptoms and diseases. They made it possible to establish a constitutional medicine that treats patients in different ways according to constitution. In the appearance of diseases, it had three characteristics that differed from Traditional Chinese Medicine. The first was that the disease progressed to the next step from the symptoms in ordinary times. The second was that, under the same disease, each constitution had different symptoms, which were due to the symptoms in ordinary times. The third was that, within the same constitution, the manifestations of disease differed from the symptoms in ordinary times. But the most important thing was that Lee Je-ma recognized these symptoms in ordinary times in four categories and presented the constitutional symptoms and constitutional diseases. The four categories were his method of recognizing human beings and diseases. When the symptoms and diseases of Sasang Constitutional Medicine were compared with those of Traditional Chinese Medicine, the constitutional diseases of "Dongyi Suse Bowon" could be classified into two groups. The first group consisted of unique diseases and symptoms that were not found in Traditional Chinese Medicine and were established by Lee Je-ma. These included the diseases of taeyangin, the exterior disease of taeumin, and the exterior disease of soyangin. The second group used unique methods of treatment that were not found in Traditional Chinese Medicine and were established by Lee Je-ma. 
This group included the interior disease of taeumin, the delirium diseases from the MangYin of soyangin, and the treatment to help the Yang-Qi ascend and to supplement the Qi in the exterior disease of soeumin. In particular, the diseases of taeyangin and taeumin, which were caused by metabolic disorders of Qi-Yack(氣液), were a great achievement in establishing the constitutional symptoms and diseases. The discourse on taeyangin diseases presented his original thought on recognizing symptoms and diseases through the Shin Gi Hyul Jeong(神氣血精) and the Qi-Yack; the discourse on taeumin diseases presented the dispersal of Qi-Yack through the forward and backward of sweat; the discourse on soyangin diseases presented the sweat of the hands and feet, which showed that the yin qi of the spleen descended to the yin qi of the kidney, and the bowel movement, which showed that the yang qi of the large intestine ascended to the head, face and four extremities; and the discourse on soeumin diseases presented the Jueyin syndrome without abdominal pain and diarrhea as an exterior disease and gave importance to the nervous mind. The classification into exterior diseases and interior diseases was based not on pharmacology but on the symptoms and diseases according to constitution.

A study on the Greeting's Types of Ganchal in Joseon Dynasty (간찰(簡札)의 안부인사(安否人事)에 대한 유형(類型) 연구(硏究))

  • Jeon, Byeong-yong
    • (The)Study of the Eastern Classic / no.57 / pp.467-505 / 2014
  • I have been working for many years on a series of Korean linguistic studies of Ganchal (old-style letters in Korea), and this study, as part of that series, addresses the typology of the [Safety Expression]. For this purpose, the [Safety Expression] was divided into formal types and semantic types, targeting the Chinese Ganchal and Hangul Ganchal of the modern Korean language period (16th century-19th century). Formal types can be divided according to whether the expression is in Normal position, whether there is omission, whether the letter is a Sending letter, and whether there is a relationship of high and low. The Normal position and complete form was made the first type, which shows well the typicality of the [Safety Expression]. Original position with [Own Safety] omitted was made the second type, Original position with [Opposite Safety] omitted the third type, and Original position with [Safety Expression] omitted the fourth type. The Inversion type, which is the most severe solecism in the [Safety Expression], was made the fifth type. The first type refers to the Original position type, in which [Opposite Safety] precedes [Own Safety], and the completion type, which is full of semantic elements. This type can be regarded as the most typical and normative in that it is equipped with all components of the [Safety Expression]. The second type is one in which the [Safety Expression] is composed of only the [Opposite Safety]. Although this type is inferior to the first type in terms of set pattern, it is not outdone in frequency of appearance. It is used because, as long as [Opposite Safety] is asked faithfully, omitting [Own Safety] does not greatly deviate from politeness and makes the Ganchal easy to write. The third type is the Original position type with the configuration [Opposite Safety+Own Safety], but with [Opposite Safety] omitted. The fourth type is an Original position type with the configuration [Opposite Safety+Own Safety], but with the [Safety Expression] omitted. 
This type is divided into A, in which the [Safety Expression] is entirely omitted, and B, in which a conventional expression such as 'saving trouble' replaces the [Safety Expression]. The fifth type is the inversion type, which shows the structure [Own Safety+Opposite Safety], unlike the Original position type. This type is the most severe solecism, and real examples are very rare. This is because leading with [Own Safety] and only afterwards asking about [Opposite Safety] to save face offends against common decency. In addition, it can be divided into the direct type, in which [Opposite Safety] and [Own Safety] are directly connected, and the indirect type, in which they are separated by the [story]. The semantic types of the [Safety Expression] can be classified according to whether the letter is a Sending letter, whether it is fast or slow, whether the relationship is intimate, and whether there is isolation. For a Sending letter, the [Safety Expression] consists of [Opposite Safety(Climate+Inquiry after health+Mental state)+Own safety(status+Inquiry after health+Mental state)]. In [Opposite safety], [Climate] can be subdivided into [Season] information and [Climate(weather)] information. Also, [Mental state] is divided into the receiver's [Family Safety Mental state] and [Individual Safety Mental state]. In [Own Safety], [Status] is divided into the receiver's traditional situation, [Recent condition], and the receiver's ongoing situation, [Present condition]. [Inquiry after health] is also subdivided into the receiver's [Family Safety] and [Individual Safety], and [Safety] into [Family Safety] and [Individual Safety]. Thus, [Inquiry after health] and [Safety] are usually used in pairs, along the dimensions of [Family] and [Individual]. This phenomenon seems to have arisen from the large family system, in which one takes care of one's parents or grandparents. 
As for the Written Reply, the [Safety Expression] consists of [Opposite Safety (Reception+Inquiry after health+Mental state)+Own safety(status+Inquiry after health+Mental state)], and it differs in semantic structure from the Sending letter only in [Opposite safety]. In [Opposite Safety], [Reception] is divided into [Letter], a Ganchal that is received directly, and [Message], news that is received indirectly from people. [Safety] is divided into [Family Safety] and [Individual Safety], and [Mental state] likewise into [Family Safety Mental state] and [Individual Safety Mental state].