• Title/Summary/Keyword: English words


Restoring Omitted Sentence Constituents in Encyclopedia Documents Using Structural SVM (Structural SVM을 이용한 백과사전 문서 내 생략 문장성분 복원)

  • Hwang, Min-Kook;Kim, Youngtae;Ra, Dongyul;Lim, Soojong;Kim, Hyunki
    • Journal of Intelligence and Information Systems / v.21 no.2 / pp.131-150 / 2015
  • Omission of noun phrases in obligatory cases is a common phenomenon in Korean and Japanese sentences that is not observed in English. In encyclopedia texts, an argument of a predicate is omitted even more readily when it can be filled with a noun phrase co-referential with the title. The omitted noun phrase is called a zero anaphor or zero pronoun. Encyclopedias like Wikipedia are a major source for information extraction by intelligent application systems such as information retrieval and question answering systems. However, omission of noun phrases degrades the quality of information extraction. This paper deals with the problem of developing a system that can restore omitted noun phrases in encyclopedia documents. The problem that our system deals with is very similar to zero anaphora resolution, which is one of the important problems in natural language processing. A noun phrase existing in the text that can be used for restoration is called an antecedent. An antecedent must be co-referential with the zero anaphor. While the candidates for the antecedent are only noun phrases in the same text in the case of zero anaphora resolution, the title is also a candidate in our problem. In our system, the first stage is in charge of detecting the zero anaphor. In the second stage, antecedent search is carried out by considering the candidates. If antecedent search fails, an attempt is made in the third stage to use the title as the antecedent. The main characteristic of our system is its use of a structural SVM for finding the antecedent. The noun phrases in the text that appear before the position of the zero anaphor comprise the search space. The main technique used in the methods proposed in previous research is to perform binary classification for all the noun phrases in the search space. The noun phrase classified as an antecedent with the highest confidence is selected as the antecedent. 
However, we propose in this paper that antecedent search be viewed as the problem of assigning antecedent indicator labels to a sequence of noun phrases. In other words, sequence labeling is employed for antecedent search in the text. We are the first to suggest this idea. To perform sequence labeling, we suggest using a structural SVM that receives a sequence of noun phrases as input and returns a sequence of labels as output. An output label takes one of two values: one indicating that the corresponding noun phrase is the antecedent and the other indicating that it is not. The structural SVM we used is based on the modified Pegasos algorithm, which exploits a subgradient descent methodology used for optimization problems. To train and test our system, we selected a set of Wikipedia texts and constructed an annotated corpus that provides gold-standard answers such as zero anaphors and their possible antecedents. Training examples are prepared from the annotated corpus and used to train the SVMs and test the system. For zero anaphor detection, sentences are parsed by a syntactic analyzer, and omitted subject or object cases are identified. Thus the performance of our system depends on that of the syntactic analyzer, which is a limitation of our system. When an antecedent is not found in the text, our system tries to use the title to restore the zero anaphor. This step is based on binary classification using a regular SVM. The experiment showed that our system's performance is F1 = 68.58%. This means that a state-of-the-art system can be developed with our technique. Future work that enables the system to utilize semantic information is expected to lead to a significant performance improvement.
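
The sequence-labeling view of antecedent search can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that exactly one noun phrase per sequence is the antecedent, uses a plain Pegasos-style subgradient update with a 0/1 loss, and runs on made-up two-dimensional feature vectors. The names `predict`, `train`, and `examples` are hypothetical.

```python
import random

def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def predict(w, seq):
    """Label every noun phrase: 1 for the predicted antecedent, 0 otherwise."""
    best = max(range(len(seq)), key=lambda i: dot(w, seq[i]))
    return [1 if i == best else 0 for i in range(len(seq))]

def train(examples, dim, lam=0.1, epochs=50, seed=0):
    """Pegasos-style training of a structural SVM (simplified sketch).

    Each example is (seq, gold): a list of feature vectors, one per candidate
    noun phrase, and the index of the true antecedent (an assumption made
    here to keep inference trivial).
    """
    rng = random.Random(seed)
    w = [0.0] * dim
    t = 0
    for _ in range(epochs):
        order = list(range(len(examples)))
        rng.shuffle(order)
        for k in order:
            seq, gold = examples[k]
            t += 1
            eta = 1.0 / (lam * t)  # Pegasos step size
            # Loss-augmented inference: wrong positions get a margin bonus of 1.
            viol = max(range(len(seq)),
                       key=lambda i: dot(w, seq[i]) + (0.0 if i == gold else 1.0))
            # Shrink step from the regularizer.
            w = [(1.0 - eta * lam) * wi for wi in w]
            # Subgradient step toward the gold labeling if the margin is violated.
            if viol != gold:
                w = [wi + eta * (g - v)
                     for wi, g, v in zip(w, seq[gold], seq[viol])]
    return w

# Toy data: feature 0 is a made-up distance score, feature 1 a made-up
# agreement flag that happens to mark the true antecedent.
examples = [
    ([[1.0, 0.0], [0.5, 1.0], [0.2, 0.0]], 1),
    ([[0.3, 1.0], [0.9, 0.0]], 0),
    ([[0.8, 0.0], [0.1, 0.0], [0.4, 1.0]], 2),
]
w = train(examples, dim=2)
labels = predict(w, [[0.9, 0.0], [0.2, 1.0]])
```

In contrast to per-candidate binary classification, the whole label sequence is scored jointly, so the candidates compete with each other during both training and prediction.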

A Study on Effect of B/L's Exemption Clauses Relating to the Governing Law of English Law (영국법의 준거법과 관련한 선하증권 면책약관의 효력에 관한 연구)

  • Han, Nak-Hyun;Jung, Jun-Sik
    • Journal of Korea Port Economic Association / v.22 no.4 / pp.1-17 / 2006
  • The Bill of Lading in The Irbenskiy Proliv is not subject to the Hague-Visby Rules in accordance with paragraphs (A) and/or (E) of cl.1, nor to the Hague Rules in accordance with paragraphs (B) and/or (D) of cl.1. The Irbenskiy Proliv is a very rare case in which the literal words of the Bill of Lading were held effective to exempt the carrier. The action concerns cargoes of perishable goods shipped from Brazil to Japan under Bills of Lading, each of which contained an extensive carrier's exemption clause. A preliminary issue was ordered to be determined on the question of whether cl.4 is effective to exempt the carriers from any potential liability for the claims in this case. The court held that there is no reason to reject cl.4 as part of each of the contracts contained in or evidenced by the bills of lading, and that it protects the carrier where damage to the goods shipped results from such causes. It is therefore effective to exempt the carriers from any potential liability for those claims.


Analysis of Research Trends of 'Word of Mouth (WoM)' through Main Path and Word Co-occurrence Network (주경로 분석과 연관어 네트워크 분석을 통한 '구전(WoM)' 관련 연구동향 분석)

  • Shin, Hyunbo;Kim, Hea-Jin
    • Journal of Intelligence and Information Systems / v.25 no.3 / pp.179-200 / 2019
  • Word-of-mouth (WoM) is defined as consumer activities of sharing information concerning consumption. WoM activities have long been recognized as important in corporate marketing processes and have received much attention, especially in the marketing field. Recently, with the development of the Internet, the ways in which people exchange information in online news and online communities have expanded, and WoM has diversified into forms such as word of mouth, scores, ratings, and likes. Social media gives online users easy access to information, and online WoM is considered a key source of information. Although various studies on WoM have been conducted in response to this phenomenon, there is no meta-analysis study that comprehensively analyzes them. This study proposes a method that applies text mining techniques to extract major studies and grasp their main issues, in order to find the trend of WoM research using scholarly big data. To this end, a total of 4389 documents published from 1941 to 2018 were collected with the keyword 'Word-of-mouth' from Scopus (www.scopus.com), a citation database, and the data were refined through preprocessing such as English morphological analysis, stopword removal, and noun extraction. To carry out this study, we adopted main path analysis (MPA) and word co-occurrence network analysis. MPA detects key studies, is used to track the development trajectory of an academic field, and presents the research trend from a macro perspective. For this, we constructed a citation network based on the collected data. In the citation network, a node represents a document and a link represents a citation relation. We then detected the key-route main path by applying SPC (Search Path Count) weights. As a result, a main path composed of 30 documents was extracted from the citation network. 
The main path confirmed how the academic field developed along with the times, reflecting industrial changes across various industry groups. The results of MPA revealed that WoM research can be divided into five periods: (1) establishment of aspects and critical elements of WoM, (2) relationship analysis between WoM variables, (3) beginning of research on online WoM, (4) relationship analysis between WoM and purchase, and (5) broadening of topics. Changes within the industry, such as online development and social media, were reflected in the results. Very recent studies showed that the topics and approaches related to WoM are diversifying in response to circumstantial changes. However, the results showed that even though WoM was studied in diverse fields, the mainstream of WoM research from start to end was related to marketing and to identifying the influential factors that proliferate WoM. By applying word co-occurrence network analysis, the research trend is presented from a micro perspective. A word co-occurrence network was constructed to analyze the relationships between keywords, and social network analysis (SNA) was utilized. We divided the data into three periods to investigate the periodic changes and trends in the discussion of WoM. SNA showed that Period 1 (1941~2008) consisted of clusters regarding relationship, source, and consumers. Period 2 (2009~2013) contained clusters of satisfaction, community, social networks, review, and internet. Clusters of Period 3 (2014~2018) involved satisfaction, medium, review, and interview. The periodic changes of clusters showed a transition from offline to online WoM. The media of WoM have become an important factor in spreading the word. This study conducted a quantitative meta-analysis based on scholarly big data regarding WoM. 
The main contribution of this study is that it provides a micro perspective on the research trend of WoM as well as the macro perspective. The limitation of this study is that the citation network constructed in this study is a network based on the direct citation relation of the collected documents for MPA.
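
The SPC weighting step can be sketched on a toy citation network. This is an illustration under stated assumptions rather than the procedure used in the study: edges are assumed to point from the cited document to the citing document, the network is assumed to be acyclic, and the sketch extracts a simple global main path (the source-to-sink path with the largest SPC sum) instead of the key-route main path named in the abstract. The document IDs `P1` to `P5` and the function names are hypothetical.

```python
from collections import defaultdict

def spc_weights(edges):
    """Search Path Count per edge: (# source-to-u paths) * (# v-to-sink paths)."""
    succ, pred, nodes = defaultdict(list), defaultdict(list), set()
    for u, v in edges:
        succ[u].append(v)
        pred[v].append(u)
        nodes.update((u, v))
    # Kahn's algorithm: topological order of the acyclic citation graph.
    indeg = {n: len(pred[n]) for n in nodes}
    ready = sorted(n for n in nodes if indeg[n] == 0)
    order = []
    while ready:
        n = ready.pop()
        order.append(n)
        for m in succ[n]:
            indeg[m] -= 1
            if indeg[m] == 0:
                ready.append(m)
    from_src, to_sink = {}, {}
    for n in order:
        from_src[n] = sum(from_src[p] for p in pred[n]) if pred[n] else 1
    for n in reversed(order):
        to_sink[n] = sum(to_sink[s] for s in succ[n]) if succ[n] else 1
    spc = {(u, v): from_src[u] * to_sink[v] for u, v in edges}
    return spc, order, succ

def global_main_path(edges):
    """Source-to-sink path with the largest total SPC (dynamic programming)."""
    spc, order, succ = spc_weights(edges)
    best, nxt = {}, {}
    for n in reversed(order):
        best[n], nxt[n] = 0, None
        for s in succ[n]:
            score = spc[(n, s)] + best[s]
            if score > best[n]:
                best[n], nxt[n] = score, s
    cited_by_someone = {v for _, v in edges}
    sources = [n for n in order if n not in cited_by_someone]
    node = max(sources, key=lambda n: best[n])
    path = [node]
    while nxt[node] is not None:
        node = nxt[node]
        path.append(node)
    return path

# Toy network: an edge (a, b) means document b cites document a.
edges = [("P1", "P2"), ("P1", "P3"), ("P2", "P4"),
         ("P3", "P4"), ("P2", "P5"), ("P4", "P5")]
weights, _, _ = spc_weights(edges)
path = global_main_path(edges)
```

On this toy network the edges `P1`→`P2` and `P4`→`P5` each lie on two of the three source-to-sink paths, so they carry the highest SPC weight and anchor the extracted path.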

A Study on Views of Vital Capital in the Film 'Parasite' (영화 <기생충>에 나타난 생명자본의 관점에 관한 연구)

  • Kang, Byoung-Ho
    • Journal of Korea Entertainment Industry Association / v.15 no.3 / pp.75-88 / 2021
  • The film 'Parasite' won the Golden Palm Award at the Cannes Film Festival and received an Academy Award as a non-English-language film in February 2020. It has been hailed as a landmark in world film history. Overall, this film is about class conflict, and critics describe its theme as a "badly twisted class gap" and "anger from class." The film expresses an intrinsic conflict embodied in culture as a "tragedy in which no bad person appears," rather than the dichotomous composition of the classical Marxist class struggle. In other words, it can be seen as expressing the substrated class relationship of modern society that Pierre Bourdieu argued for. The film has become a focus of controversy in Korean society, which suffers from an excess of ideology. The keyword 'parasite' has been adopted in political disputes, not only in the world of cultural content. Paradoxically, socialist China did not allow the film 'Parasite' to be released. On the other hand, Lee O-Yong argues that the movie "Parasite" does not look at social phenomena through a dichotomous perspective but through a "double perspective," and judges that it does not lose its gaze on human beings amid the tension. This view is based upon 'Vital Capitalism'. Lee O-Yong looks at the movie "Parasite" from the perspective of "Vital Capitalism". The theory of Vital Capitalism does not seek the root of historical development in class struggle; rather, in understanding history and society, it pays attention to the intrinsic characteristics of life: topophilia, neophilia, and biophilia. Lee O-Yong argues that theories of civilization have evolved from the stage of Hobbesian and Darwinian predation to Michel Serres's stage of host versus parasite, and on to Margulis's stage of 'win-win' (interdependence). 
In this paper, after an overview of the concept of vital capital and of previous research, scenes from the film are reinterpreted in terms of field, habitus, and cultural capital. This exploration seeks an alternative to the excess of ideology in Korean society.

Research Trends in The Journal of Daesoon Academy of Sciences : 『The Journal of Daesoon』 Vol.1-Vol.25 (1996~2015) (『대순사상논총』의 연구 동향에 관한 연구- 『대순사상논총』 1집-25집(1996~2015) -)

  • Chang, In-ho
    • Journal of the Daesoon Academy of Sciences / v.27 / pp.201-243 / 2016
  • This paper analyzes the research trends in 358 scholarly articles published in the Journal of Daesoon Academy of Sciences, from the first volume in 1996 to the most recent, the 25th volume in 2015, and proposes ideas for improvement. First of all, "The Journal of Daesoon Academy of Sciences" does not meet the standards required by the National Research Foundation, falling short of the most important conditions for registration, such as the periodicity and punctuality expected of academic journals. Furthermore, in terms of bibliometric analysis, the number of articles published by the journal is decreasing, and consistency in the rules and principles regulating publication details and bibliography formats is nonexistent. Although a variety of authors appear to contribute on the surface, the ratio of co-authored articles is too small. Securing researchers specializing in Daesoon Thought in order to expand the size of the journal is important, but it is also important to diversify the research topics through the exchange of ideas among researchers from various organizations. Here are some ideas for the improvement of the Journal of Daesoon Academy of Sciences: First, in order to meet the standards for punctuality and periodicity, it would be best to publish the journal twice a year with 12 to 15 articles. Second, the journal must become searchable through the creation of a database. Third, the keywords and abstracts of articles must be written in Korean and English to facilitate the sharing of articles among researchers. Fourth, the journal must have a diverse and outstanding editorial board which takes into account the geographical situations of its board members. Fifth, the journal must include articles on relevant topics that reflect the core topics of Daesoon Thought and other studies. Sixth, articles must have a front page that contains bibliographical items to convey information to the reader. 
Seventh, it is essential that the journal have a clear publication date detailing the year, month, and day, as well as a standard numbering scheme (i.e., Vol. and No.).

A Study on the Keyboard of Jawi Script (Arabic-Malay Script) (아랍식-말레이문자(Jawi Script) 키보드(Keyboard)에 관한 연구)

  • KANG, Kyoung Seok
    • SUVANNABHUMI / v.3 no.1 / pp.47-66 / 2011
  • Malay society is rooted in Islamic concepts. Islam influenced every corner of Malay society, which had once been on the edge of the civilizations of the Indus and Ganges. At one time the script of the Hindu religion, namely Sanskrit, was adopted in Malay society for the purpose of putting the Malay language, Bahasa Melayu, into practical writing, but in vain. Sanskrit was too complicated for Malay society to imitate and put into practice in everyday life, because it was a totally different type of script with many similar allographs for a single sound. In the end Malay society gave it up and simply used the Malay language without any script of its own. A few centuries later, Islam entered Malay society, bringing Arabic letters with it. With the wide spread of Islam, this influenced not only Malay culture but also religious life. Eventually Arabic letters became the very means by which the Malay language was written. That is, Arabic letters had formerly been used for the Arabic language, but they became, in a similar form, the script for a new language, Malay. Arabic letters pose no problems for the Arabic language, whereas they pose some for Malay. Naturally, Arabic letters were designed for no language other than Arabic itself. On account of this, a few problems occurred in writing Malay consonants such as p, ng, g, c, ny and v. These 6 sounds could not be written in Arabic letters, and Arabic speakers had never had occasion to pronounce them. Therefore, Malay society had to create a few new, modified letter forms for these 6 sounds, which occur frequently in Malay. As a result, pa was derived from fa, nga was derived from ain, ga was derived from kaf, ca was derived from jim, nya was derived from tha or ba, and va was derived from wau itself. 
Where should these 6 newly modified letters be placed on the Arabic keyboard? This is the core question of this working paper. As a matter of course, these 6 letters were placed where 6 Arabic signs that are scarcely written in the Malay language had been. Those 6 signs are typed only with the shift key. The 6 newly designed letters took the original places of fatha, kasra, damma, sukun, tanween and so on. The main difference between the 2 sets of 6 is this: the 6 on the original Arabic keyboard are only signs attached to Arabic letters, whereas the 6 Malay ones are real letters. In other words, on the Malay keyboard the 6 newly modified Malay letters were substituted for 6 unused Arabic signs. This type of newly designed Malay Jawi Script keyboard is still used in Malaysia, Brunei and some other Malay countries. But this sort of keyboard also needs to move toward another keyboard system, one that accords with the alphabetically ordered keyboard. That is, alif is typed with the A key, and zai is typed when the Z key is pressed. This keyboard system is called the 'Malay Jawi-English Rumi matching keyboard system', even though it would probably be inconvenient for Malay Jawi experts who are accustomed to the Arabic 'alif-ba-ta' order.


Burqanism from the Origin of the Pastoral Nomadic Koryo Region and the Vision of Korean Livestock Farming (고려의 원시영역 유목초지, 그 부르칸(불함)이즘과 한국축산의 비전)

  • Chu Chae Hyok
    • Journal of The Korean Society of Grassland and Forage Science / v.25 no.1 / pp.71-82 / 2005
  • Khori(高麗) refers to the Chaabog(reindeer) that live on lichens(蘚) on Mt. Soyon(鮮), whose pastures are the cold and dry plateau of North Eurasia. Thus, the origin region of the Khori or Koguryo that are the ancestors of the reindeer-herding pastoral nomads(馴鹿 遊牧民) can be said to be the Steppe-Taiga-Tundra pastoral areas of North Eurasia and North America. When the pastoral nomads moved on to the great mountain(大山) zone of the Jangbaek(長白) to the Baekdu(白頭) Mountains, they could have been in contact with pastoral farmers or agricultural farmers living there and they became the farmers remaining on agricultural farms. They were the Koryo people, the ancestors of Korea. Staying in one place, they gradually forgot the origin of their reindeer-herding pastoral nomadic history in the Northwest area of Mt. Soyon, the small mountain(小山) zone of the Steppe-Taiga-Tundra pastoral areas. In other words, they lost their identity as reindeer-herding pastoral nomads when they entered the agricultural area after leaving the pastoral area. However, since their basic genes had already formed when they lived on the cold and dry plateau of North Eurasia, it is possible to study their pastoral nomadic history focusing on 'the minority living in the broad area(廣域少數)', by utilizing highly advanced biotechnological science and focusing on genes and information technology innovation, and removing various past hindrances in research. Therefore, it is not so difficult to restore the reindeer-herding pastoral nomadic history of the Koguryo(高句麗) people and secure their pastoral nomadic identity, of which the first steps have already been taken into their historical stages. The Eurasian continent and the Korean peninsula, especially the cold and dry plateau of North Eurasia and the Korean peninsula have been closely related to each other ecologically and historically. They can never be regarded as separate spaces. 
The Eurasian continent lies horizontally east to west and thus, the continent forms an isothermal zone. Also, since the time of producing their own foods, it was relatively easy for people with their technology to move to other places owing to the pastoral nomadic characteristic of mobility. Unlike the Chungyen(中原) region, western Asia and the regions covering the Siberia-Manchu-Korean peninsula where food production revolution was first made were connected to the Mongolian lichens route(蘚苔之路: Ni, ukinii jam) and steppe roads. Although the ecological conditions of nature have changed a bit throughout a long history, it was natural for the many tribes in North Asia living on the largest Steppe-Taiga-Tundra area in the world to have believed 'the legends related to animals in relation to their founders and ancestors(獸祖傳說)'. Assuming that Siberian tigers and the tigers living on Mt. Baekdu were connected ecologically and genetically because of the ecological characteristics of the animals, and their migration from plateau to plateau, we would suspect that the Chosun(朝鮮) tribe living on Mt. Baekdu were ethnically and culturally more closely connected to the farther removed Ural-Altai tribes that lived on the cold and dry plateau region than to the Han(漢) tribe who lived in Chungyen(中原) that was close to Mt. Baekdu. More evidence is the structure of the Korean language which has the form of 'Subject + Object + Verb', which is assumed to have originated from the speedy lifestyle of the reindeer-herding pastoral nomads. The structure is quite different from that of the Han(漢) language, which is based on agricultural life. Also, it is natural for reindeer-riding, reindeer-herding pastoral nomads or horse-riding sheep-herding pastoral nomads(騎馬, 羊遊牧民) to have held military and political power over the region and eventually to have established an ancient pastoral nomadic empire in the process of their conquest of agricultural regions. 
The stages for founding global empires in the history of mankind may be largely divided into two, in terms of ecological conditions and occupations. They are the steppes and the oceans. Of course, the steppe-based empires were established based on the skills to deal with horses and the ability to shoot arrows while riding horses, along with the use of iron ware in the 8th century BC. The steppe-based empires became the foundation for an oceanic empire, which could have been established by the use of warships and warship guns since the 15th Century. Based on those facts, we know that Chosun, Puyo(夫餘), and Koguryo are the products of a developmental process of pastoral nomadic empires on the steppes. Maybe we can find the pastoral nomadic identity of the Koguryo more easily than we expected when we trace the origins and history of the Korean tribe living in the pastures located in the northwest area of Mt. Jangbaek by focusing on pastoral nomadic mobility and organization just as we have investigated the historic origins of Anglo-Saxons in America by focusing on the times before the 15th Century. In the process, we should keep in mind that English culture originated from the Industrial Revolution and was directly delivered to the American continent, although America was far from England and was not an intermediate point on long sojourns either. Further, American culture came back to England in a more advanced form later. The most important thing currently to be resolved is to cause Koreans to look back on their own history in a freer way of thinking and with diverse, profound, and sharp insight, taking away the old and existing conventional recognition that is entangled with complicated interests with Korean people and other countries. The meanings of Chosun, Khori, and Solongos have been interpreted arbitrarily without any historic evidence by the scholars who followed conventional tradition of fixed-minded aristocrats in an agricultural society. 
If the Siberian cultural properties of the stone age, the earthenware age, the bronze age, and the iron age are analyzed in such a way, archaeological discovery will never be able to contribute to the restoration of the Koguryo's pastoral nomadic identity. One should transcend the errors that tend to interpret the cultural properties discovered in the pastoral nomadic regions as not being differentiated from those of agricultural regions and just interpret them altogether from the agricultural point of view. More careful attention is required in the interpretation of cultural properties of ancient Korean empires that seem to have been formed due to mutual interactions of pastoral nomadic and agricultural cultures. Also, it is required that the conventional recognition chain of 'reverse-genes' be severed, which has placed more weight on agricultural properties than pastoral nomadic ones, since their settlement on agricultural farms was made after the establishment of their ancient pastoral nomadic empires. There is no reason at all to place priority on stoneware, earthenware, bronze ware, and iron ware over wooden ware(木器) and other ware made of animal skins(皮器), bones and horns(骨角器), in analyzing the history in the regions of reindeer or sheep pastures. Reading ancient Korean history from the perspective of pastoral nomadic history, one feels strongly the instinctive emotions to return to the natural 'mother place'. The reindeer-herding pastoral nomadic identity of the Koguryo people that has been accumulated in volumes in their genes, hidden deep inside, and has interacted organically could be reborn with Burqanism(Burqan refers to 不咸 in Chinese), which was their religion by birth and symbolized as the red willow(紅柳=不咸). 
The mother place of the Koguryo people is the endless vast green pastures of North Eurasia and North America, where we anticipate the development of Korean livestock farming following the inherent properties in the genes of the reindeer-herding pastoral nomads with Korean ancestors. We anticipate that the place would be the core resource that could contribute to the development of life of living creatures following the inherent properties of their genes and biotechnological factors. In other words, biotechnology used for a search for clues on the well-being of humans could be the fruit brought by Burqanism of the Koguryo people and the fruit of the globalization of Korean livestock farming. It is the Chosun farmers in China, who came from the vast nomadic reindeer pastures of North Eurasia, that resolved the food problem of a billion Chinese people with lowland paddy rice seeds (水稻) by transforming Heilongjiang Province(黑龍江省) into an oceanic lowland paddy rice field(水田). Even Mao Tse-tung(毛澤東) could not resolve the food problem by his revolution campaigns for tens of years. Today is the very time that requires the development of special livestock farming following the inherent properties of the ancient Korean reindeer-herding pastoral nomads that respected the dignity of life on the cold and dry plateau of North Eurasia and the American continent. I suggest that research should be started from the pastures of the Dariganga Steppe in East Mongolia that was the homeland of Hanwoo(韓牛) and the central horse-herding steppe place(牧馬場) of Chingis Khan's Mongolia. The Dariganga Steppe is awash with an affluent natural environment for pastoral nomadic living; however, the quality of life of the pastoral nomads there is still low. 
I suggest we Koreans, the descendants of the Koguryo, should take our first steps for our livestock farming business project and develop the Northern nomadic pastures, here at the pastures of the Dariganga Steppe, which is the Mongolian core place of state-of-the-art technology for military weapons.

Color-related Query Processing for Intelligent E-Commerce Search (지능형 검색엔진을 위한 색상 질의 처리 방안)

  • Hong, Jung A;Koo, Kyo Jung;Cha, Ji Won;Seo, Ah Jeong;Yeo, Un Yeong;Kim, Jong Woo
    • Journal of Intelligence and Information Systems / v.25 no.1 / pp.109-125 / 2019
  • As interest in intelligent search engines increases, various studies have been conducted to extract and utilize product-related features intelligently. In particular, when users search for goods in e-commerce search engines, the 'color' of a product is an important feature that describes the product. Therefore, it is necessary to deal with the synonyms of color terms in order to produce accurate results for users' color-related queries. Previous studies have suggested a dictionary-based approach to process synonyms for color features. However, the dictionary-based approach has the limitation that it cannot handle unregistered color-related terms in user queries. In order to overcome this limitation of the conventional methods, this research proposes a model which extracts RGB values from an internet search engine in real time and outputs similar color names based on designated color information. First, a color term dictionary for the basic color search was constructed, which includes color names and the R, G, B values of each color from the Korean color standard digital palette program and the Wikipedia color list. The dictionary was made more robust by adding 138 color names converted from English color names into Korean loanwords, together with their corresponding RGB values. As a result, the final color dictionary includes a total of 671 color names and corresponding RGB values. The method proposed in this research starts from the specific color that a user searched for. Then, the presence of the searched color in the built-in color dictionary is checked. If the color exists in the dictionary, its RGB values in the dictionary are used as the reference values of the retrieved color. If the searched color does not exist in the dictionary, the top-5 Google image search results for the searched color are crawled, and average RGB values are extracted from a certain middle area of each image. 
To extract the RGB values from images, a variety of approaches were attempted, since simply averaging the RGB values of the center area of an image has limits. As a result, clustering the RGB values in a certain area of the image and taking the average value of the cluster with the highest density as the reference values showed the best performance. The reference RGB values of the searched color are then compared with the RGB values of all the colors in the previously constructed color dictionary. Next, a color list is created with the colors within the range of $\pm 50$ for each of the R, G, and B values. Finally, using the Euclidean distance between the above results and the reference RGB values of the searched color, the colors with the highest similarity, up to five, become the final outcome. In order to evaluate the usefulness of the proposed method, we performed an experiment. In the experiment, 300 color names and their corresponding RGB values were obtained through questionnaires. They are used to compare the RGB values obtained from four different methods including the proposed method. The average Euclidean distance in CIE-Lab using our method was about 13.85, which is a relatively low distance compared to 3088 for the case using the synonym dictionary only and 30.38 for the case using the dictionary with the Korean synonym website WordNet. The variant of the proposed method without clustering showed an average Euclidean distance of 13.88, which implies that the DBSCAN clustering of the proposed method can reduce the Euclidean distance. This research suggests a new color synonym processing method based on RGB values that combines the dictionary method with a real-time synonym processing method for new color names. This method overcomes the limitation of the dictionary-based approach, which is the conventional synonym processing method. 
This research can contribute to improving the intelligence of e-commerce search systems, especially their color search feature.
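
The lookup and matching steps described in this abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the five-entry `COLOR_DICT`, the function names, and the `estimate_from_images` callback (standing in for the image crawling and clustering step) are all hypothetical, and the distance is computed in plain RGB space.

```python
import math

# Hypothetical mini-dictionary; the dictionary in the abstract has 671 entries.
COLOR_DICT = {
    "red": (255, 0, 0),
    "crimson": (220, 20, 60),
    "tomato": (255, 99, 71),
    "navy": (0, 0, 128),
    "sky blue": (135, 206, 235),
}


def reference_rgb(term, color_dict=COLOR_DICT, estimate_from_images=None):
    """Return reference RGB values for a searched color term.

    Registered terms are read from the dictionary. For unregistered terms
    the caller supplies `estimate_from_images`, a hypothetical callable that
    stands in for the image-search and clustering step.
    """
    if term in color_dict:
        return color_dict[term]
    if estimate_from_images is None:
        raise KeyError(f"unregistered color term: {term!r}")
    return estimate_from_images(term)


def similar_colors(reference, color_dict=COLOR_DICT, window=50, top_k=5):
    """Rank dictionary colors near `reference`.

    A color is a candidate only if each of its R, G, B values lies within
    +/- `window` of the reference; candidates are then sorted by Euclidean
    distance and at most `top_k` names are returned.
    """
    candidates = []
    for name, rgb in color_dict.items():
        if all(abs(c - r) <= window for c, r in zip(rgb, reference)):
            candidates.append((math.dist(rgb, reference), name))
    candidates.sort()
    return [name for _, name in candidates[:top_k]]


if __name__ == "__main__":
    ref = reference_rgb("salmon pink", estimate_from_images=lambda t: (230, 30, 50))
    print(similar_colors(ref))
```

The per-channel window acts as a cheap pre-filter, so the distance ranking only has to order colors that are already close on every channel.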

Aspect-Based Sentiment Analysis Using BERT: Developing Aspect Category Sentiment Classification Models (BERT를 활용한 속성기반 감성분석: 속성카테고리 감성분류 모델 개발)

  • Park, Hyun-jung;Shin, Kyung-shik
    • Journal of Intelligence and Information Systems
    • /
    • v.26 no.4
    • /
    • pp.1-25
    • /
    • 2020
  • Sentiment Analysis (SA) is a Natural Language Processing (NLP) task that analyzes, from written texts, the sentiments consumers or the public feel about an arbitrary object. Aspect-Based Sentiment Analysis (ABSA) goes further and performs a fine-grained analysis of the sentiments towards each aspect of an object. Because it has more practical business value, ABSA is drawing attention from both academic and industrial organizations. Given a review that says "The restaurant is expensive but the food is really fantastic", for example, general SA evaluates the overall sentiment towards the 'restaurant' as 'positive', while ABSA identifies the restaurant's 'price' aspect as 'negative' and its 'food' aspect as 'positive'. Thus, ABSA enables a more specific and effective marketing strategy. To perform ABSA, it is necessary to identify the aspect terms or aspect categories included in the text and to judge the sentiments towards them. Accordingly, there are four main areas in ABSA: aspect term extraction, aspect category detection, Aspect Term Sentiment Classification (ATSC), and Aspect Category Sentiment Classification (ACSC). ABSA is usually conducted by extracting aspect terms and then performing ATSC to analyze sentiments for the given aspect terms, or by extracting aspect categories and then performing ACSC to analyze sentiments for the given aspect category. Here, an aspect category is expressed by one or more aspect terms, or indirectly inferred from other words. In the preceding example sentence, 'price' and 'food' are both aspect categories, and the aspect category 'food' is expressed by the aspect term 'food' included in the review. If the review sentence includes 'pasta', 'steak', or 'grilled chicken special', these can all be aspect terms for the aspect category 'food'. An aspect category referred to by one or more specific aspect terms in this way is called an explicit aspect. 
On the other hand, an aspect category like 'price', which has no specific aspect term but can be indirectly inferred from a sentiment word such as 'expensive', is called an implicit aspect. So far, the term 'aspect category' has been used to avoid confusion with 'aspect term'. From now on, we treat 'aspect category' and 'aspect' as the same concept and mostly use the word 'aspect' for convenience. Note that ATSC analyzes the sentiment towards given aspect terms, so it deals only with explicit aspects, whereas ACSC handles both explicit and implicit aspects. This study seeks answers to the following issues, which previous studies ignored when applying the BERT pre-trained language model to ACSC, and derives superior ACSC models. First, is it more effective to reflect the output vectors of the aspect category tokens than to use only the final output vector of the [CLS] token as the classification vector? Second, is there any performance difference between the QA (Question Answering) and NLI (Natural Language Inference) types of sentence-pair input? Third, does performance differ according to the position of the sentence containing the aspect category in the QA or NLI type sentence-pair input? To achieve these research objectives, we implemented 12 ACSC models and conducted experiments on 4 English benchmark datasets. As a result, we derived ACSC models that outperform existing studies without expanding the training dataset. In addition, we found that reflecting the output vector of the aspect category token is more effective than using only the output vector of the [CLS] token as the classification vector. We also found that QA type input generally performs better than NLI type input, and that the position of the sentence containing the aspect category in QA type input does not affect performance. 
There may be some differences depending on the characteristics of the dataset, but when using NLI type sentence-pair input, placing the sentence containing the aspect category second seems to provide better performance. The new methodology for designing the ACSC model used in this study could be similarly applied to other studies such as ATSC.
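
To make the QA versus NLI sentence-pair configurations concrete, here is a minimal sketch of how such inputs might be assembled before tokenization. The abstract does not give the actual templates, so the question wording, the bare-aspect NLI form, and the `build_pair` helper are assumptions for illustration only.

```python
def build_pair(review, aspect, style="QA", aspect_first=False):
    """Assemble a BERT-style sentence-pair string for one aspect category.

    style="QA"  wraps the aspect in a question (hypothetical template).
    style="NLI" uses the bare aspect as a pseudo-sentence (hypothetical).
    aspect_first controls whether the aspect sentence comes first or second.
    """
    if style == "QA":
        auxiliary = f"what do you think of the {aspect} of it ?"
    elif style == "NLI":
        auxiliary = aspect
    else:
        raise ValueError(f"unknown style: {style!r}")

    first, second = (auxiliary, review) if aspect_first else (review, auxiliary)
    return f"[CLS] {first} [SEP] {second} [SEP]"


if __name__ == "__main__":
    review = "The restaurant is expensive but the food is really fantastic"
    for style in ("QA", "NLI"):
        for aspect_first in (False, True):
            print(build_pair(review, "price", style, aspect_first))
```

In practice a tokenizer would insert the special tokens itself; they are written out here only to show the ordering that the third research question varies.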