• Title/Summary/Keyword: research vocabulary

Analyzing Contextual Polarity of Unstructured Data for Measuring Subjective Well-Being (주관적 웰빙 상태 측정을 위한 비정형 데이터의 상황기반 긍부정성 분석 방법)

  • Choi, Sukjae;Song, Yeongeun;Kwon, Ohbyung
    • Journal of Intelligence and Information Systems / v.22 no.1 / pp.83-105 / 2016
  • Measuring an individual's subjective wellbeing in an accurate, unobtrusive, and cost-effective manner is a core success factor of the wellbeing support system, which is a type of medical IT service. However, measurements with a self-report questionnaire and wearable sensors are cost-intensive and obtrusive when the wellbeing support system should be running in real-time, despite being very accurate. Recently, reasoning the state of subjective wellbeing with conventional sentiment analysis and unstructured data has been proposed as an alternative to resolve the drawbacks of the self-report questionnaire and wearable sensors. However, this approach does not consider contextual polarity, which results in lower measurement accuracy. Moreover, there is no sentimental word net or ontology for the subjective wellbeing area. Hence, this paper proposes a method to extract keywords and their contextual polarity representing the subjective wellbeing state from the unstructured text in online websites in order to improve the reasoning accuracy of the sentiment analysis. The proposed method is as follows. First, a set of general sentimental words is proposed. SentiWordNet was adopted; this is the most widely used dictionary and contains about 100,000 words such as nouns, verbs, adjectives, and adverbs with polarities from -1.0 (extremely negative) to 1.0 (extremely positive). Second, corpora on subjective wellbeing (SWB corpora) were obtained by crawling online text. A survey was conducted to prepare a learning dataset that includes an individual's opinion and the level of self-report wellness, such as stress and depression. The participants were asked to respond with their feelings about online news on two topics. Next, three data sources were extracted from the SWB corpora: demographic information, psychographic information, and the structural characteristics of the text (e.g., the number of words used in the text, simple statistics on the special characters used). 
These were considered in order to adjust the level of a specific SWB factor. Finally, a set of reasoning rules was generated for each wellbeing factor to estimate the SWB of an individual based on the text written by that individual. The experimental results suggested that using contextual polarity for each SWB factor (e.g., stress, depression) significantly improved the estimation accuracy compared to conventional sentiment analysis methods incorporating SentiWordNet. Even though literature is available on Korean sentiment analysis, such studies used only a limited set of sentimental words. Because of the small number of words, many sentences are overlooked when estimating the level of sentiment. However, the proposed method can identify multiple sentiment-neutral words as sentiment words in the context of a specific SWB factor. The results also suggest that a specific type of senti-word dictionary containing contextual polarity needs to be constructed along with a dictionary based on common sense, such as SenticNet. These efforts will enrich and enlarge the application area of sentic computing. The study is helpful to practitioners and managers of wellness services in that a couple of characteristics of unstructured text have been identified for improving SWB measurement. Consistent with the literature, the results showed that gender and age affect the SWB state when individuals are exposed to an identical cue from the online text. In addition, the length of the textual response and the usage pattern of special characters were found to indicate the individual's SWB. These findings imply that better SWB measurement should involve collecting the textual structure and the individual's demographic conditions. In the future, the proposed method should be improved by automated identification of contextual polarity in order to enlarge the vocabulary in a cost-effective manner.
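The idea of contextual polarity described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the two lexicons, their polarity values, and the function name `score_text` are hypothetical, and the general lexicon only stands in for a SentiWordNet-style dictionary with polarities between -1.0 and 1.0. The point is that words which are neutral in a general dictionary can carry polarity for one SWB factor (here, stress).

```python
# Hypothetical general-purpose lexicon (illustrative values, not from SentiWordNet).
GENERAL_POLARITY = {"good": 0.6, "terrible": -0.8, "happy": 0.7}

# Hypothetical contextual lexicon for one SWB factor ("stress").
# These words are absent from (i.e., neutral in) the general lexicon.
STRESS_POLARITY = {"deadline": -0.5, "overtime": -0.6, "vacation": 0.5}


def score_text(text, general, contextual):
    """Average the polarity of known words; contextual entries take precedence."""
    scores = []
    for token in text.lower().split():
        word = token.strip(".,!?;:\"'")
        if word in contextual:
            scores.append(contextual[word])
        elif word in general:
            scores.append(general[word])
    return sum(scores) / len(scores) if scores else 0.0


text = "Another deadline and more overtime, but the team is good."
general_only = score_text(text, GENERAL_POLARITY, {})
with_context = score_text(text, GENERAL_POLARITY, STRESS_POLARITY)
print(general_only, with_context)
```

With the general lexicon alone, only "good" is scored and the sentence looks positive; once the stress-specific entries are added, the same sentence scores below zero.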

A Research in Applying Big Data and Artificial Intelligence on Defense Metadata using Multi Repository Meta-Data Management (MRMM) (국방 빅데이터/인공지능 활성화를 위한 다중메타데이터 저장소 관리시스템(MRMM) 기술 연구)

  • Shin, Philip Wootaek;Lee, Jinhee;Kim, Jeongwoo;Shin, Dongsun;Lee, Youngsang;Hwang, Seung Ho
    • Journal of Internet Computing and Services / v.21 no.1 / pp.169-178 / 2020
  • Troop reductions and the need to improve combat power have led the Korean Department of Defense to actively adopt 4th Industrial Revolution technologies (artificial intelligence, big data). The defense information system has been developed in various ways according to the tasks and unique characteristics of each military branch. In order to take full advantage of 4th Industrial Revolution technology, it is necessary to improve the closed defense data management system. However, establishing and using data standards across all information systems for the utilization of defense big data and artificial intelligence is limited by security issues, the business characteristics of each military branch, and the difficulty of standardizing large-scale systems. Based on the interworking requirements of each system, data sharing is limited to direct linkage under interoperability agreements between systems. In order to implement smart defense using 4th Industrial Revolution technology, it is urgent to prepare a system that can share defense data and make good use of it. To support this technically, it is critical to develop Multi Repository Meta-Data Management (MRMM), which supports systematic standard management of defense data by managing the enterprise standard and the standard mapping for each system, and which promotes data interoperability through linkage between standards in compliance with the Defense Interoperability Management Development Guidelines. We introduce MRMM and implement it using vocabulary similarity based on machine learning and statistical approaches. Based on MRMM, we expect to simplify the standardization and integration of all military databases using artificial intelligence and big data. This will lead to a large reduction in the defense budget while increasing combat power for implementing smart defense.
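As a rough illustration of mapping system-specific field names onto an enterprise standard by vocabulary similarity, the sketch below uses plain string similarity from Python's standard `difflib` module. This is a stand-in chosen for brevity, not the machine learning and statistical approach the abstract refers to. The standard terms, the field names, the `min_ratio` cutoff, and the helper names `normalize` and `best_match` are all hypothetical.

```python
from difflib import SequenceMatcher

# Hypothetical enterprise-standard vocabulary.
STANDARD_TERMS = ["unit_name", "personnel_count", "equipment_id"]


def normalize(name):
    """Lower-case a field name and unify separators."""
    return name.lower().replace("-", "_").replace(" ", "_")


def best_match(field, standard_terms, min_ratio=0.6):
    """Return (standard term or None, similarity ratio) for one field name."""
    field_n = normalize(field)
    best_term, best_ratio = None, 0.0
    for term in standard_terms:
        ratio = SequenceMatcher(None, field_n, term).ratio()
        if ratio > best_ratio:
            best_term, best_ratio = term, ratio
    if best_ratio < min_ratio:
        return None, best_ratio  # too dissimilar: leave unmapped for manual review
    return best_term, best_ratio


# Hypothetical field names from two different source systems.
mapping = {f: best_match(f, STANDARD_TERMS)[0] for f in ["Unit-Name", "equip_id", "weather"]}
print(mapping)
```

Fields that fall below the cutoff are left unmapped rather than forced onto the nearest standard term, which keeps a human in the loop for ambiguous cases.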

A Landscape of Joseon Dynasty in Late 19th Century through Experience Record of Modern Westerners - Focused on Landscape Vocabulary and Content Analysis - (근대기 서양인들의 조선견문기를 통해 본 19세기 말 조선의 경관 - 경관 관련 어휘와 내용 분석을 중심으로 -)

  • Kim, Dong-Hyun;Shin, Hyun-Sil
    • Journal of the Korean Institute of Traditional Landscape Architecture / v.36 no.1 / pp.20-33 / 2018
  • This study aims to illuminate the landscape of the Joseon Dynasty at the end of the 19th century, when Joseon began to modernize, through the perspective of Westerners. The historical meaning of Westerners' landscape records was examined first, and then landscape types and the Westerners' perceptions of them were analyzed. The results were as follows. First, the Westerners who visited Joseon at that time were involved in the historical and political situation of the Joseon Dynasty or came to understand its culture through long periods of travel. Their records are significant data for analyzing the scenery of that time because common content appears across various books. Second, the landscapes of Joseon that appear in Western records were mainly small towns and villages, natural environments, scenic sites, historic sites, modern facilities, and cultivated areas. Small towns and villages are mainly mentioned together with shabby alleys and dense houses. Natural landscapes were identified as mountain landscapes and diverse geomorphological landscapes with surrounding vegetation along the coast and rivers. Palaces, fortresses, and temples were recorded as the main objects of scenic and historic sites. Western-style buildings such as foreign legations and settlements, churches, and schools were mentioned among the modernized facilities. Cultivated land was found to be underdeveloped and neglected, but as the range of view became wider, it was seen as a peaceful and prosperous rural landscape. Third, Westerners' perception of the Joseon landscape at that time can be characterized as positive or negative. The residential environment was perceived negatively because it was unsanitary and backward. In contrast, outstanding natural landscapes, scenic and historic sites, and upper-class gardens were perceived positively. For modernized landscapes, positive and negative perceptions were mentioned to a similar degree. Positive perceptions arose from the improvement of the landscape through civilization, while negative perceptions arose from damage to traditional landscapes and from heterogeneity.

Degree of Self-Understanding Through "Self-Guided Interpretation" in Yeoncheon, Hantan River UNESCO Geopark: Focusing on Readability and Curriculum Relevance (한탄강 세계지질공원 연천 지역의 자기-안내식 해설 매체를 통한 스스로 이해 가능 정도: 이독성과 교육과정 관련성을 중심으로)

  • Min Ji Kim;Chan-Jong Kim;Eun-Jeong Yu
    • Journal of the Korean Earth Science Society / v.44 no.6 / pp.655-674 / 2023
  • This study examined whether the "self-guided interpretation" media in the Yeoncheon area of the Hantangang River UNESCO Geopark are intelligible for visitors. Accordingly, two on-site investigations were conducted in the Hantangang River Global Geopark in September and November 2022. The Yeoncheon area, known for its diverse geological features and the era of geological attraction formation, was selected for analysis. We analyzed the readability levels, graphic characteristics, and alignment with the science curriculum of the interpretive media specific to geological sites among a total of 36 self-guided interpretive media in the Yeoncheon area. Results indicated that information boards, primarily offering guidance on geological attractions, were the most prevalent type of interpretive media in the Yeoncheon area. The quantity of text in explanatory media surpassed that of a 12th-grade science textbook. The average vocabulary grade was similar to that of 11th- and 12th-grade science textbooks, with somewhat reduced readability due to a high occurrence of complex sentences. Predominant graphic types included illustrative photographs, aiding comprehension of the geological formation process through multi-structure graphics. Regarding scientific terms used in the interpretive media, 86.3% of the terms were within the "Solid Earth" section of the 2015 revised curriculum, with the majority being at the 4th-grade level. The 11th-grade optional curriculum terms comprised the second largest portion, and 13.7% of all science terms were from outside the curriculum. Notably, the complexity of the scientific terminology varied by geological attraction. Specifically, the terminology level on the homepage tended to be generally higher than that on information boards. Through these findings, specific factors impeding visitor comprehension of geological attractions in the Yeoncheon area, based on the interpretation medium, were identified. 
We suggest further research to effect improvements in self-guided interpretation media, fostering geological resource education for general visitors and anticipating advancements in geology education.
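The curriculum-relevance figures reported in this abstract (the share of terms inside the curriculum and their grade levels) amount to a simple coverage calculation, sketched below. The term list, the grade assignments, and the function name `curriculum_coverage` are hypothetical and are not taken from the study or from the 2015 revised curriculum.

```python
from collections import Counter

# Hypothetical mapping from curriculum terms to the grade where they first appear.
CURRICULUM_GRADE = {"volcano": 4, "lava": 4, "basalt": 4, "columnar joint": 11}

# Hypothetical scientific terms extracted from one information board.
board_terms = ["volcano", "lava", "basalt", "columnar joint", "pillow lava"]


def curriculum_coverage(terms, curriculum):
    """Return (percent of terms found in the curriculum, count of terms per grade)."""
    in_curriculum = [t for t in terms if t in curriculum]
    coverage = 100.0 * len(in_curriculum) / len(terms) if terms else 0.0
    by_grade = Counter(curriculum[t] for t in in_curriculum)
    return coverage, by_grade


coverage, by_grade = curriculum_coverage(board_terms, CURRICULUM_GRADE)
print(f"{coverage:.1f}% of terms are in the curriculum; per grade: {dict(by_grade)}")
```

Running the same calculation per medium (for example, homepage versus information boards) would allow the kind of comparison of terminology levels that the abstract reports.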

Intelligent VOC Analyzing System Using Opinion Mining (오피니언 마이닝을 이용한 지능형 VOC 분석시스템)

  • Kim, Yoosin;Jeong, Seung Ryul
    • Journal of Intelligence and Information Systems / v.19 no.3 / pp.113-125 / 2013
  • Every company wants to know its customers' requirements and makes an effort to meet them. Because of this, communication between customers and the company has become a core competency of business, and its importance is increasing continuously. There are several strategies for finding customers' needs, but VOC (voice of the customer) is one of the most powerful communication tools, and gathering VOC through channels such as telephone, post, e-mail, and websites is highly meaningful. Accordingly, almost every company gathers VOC and operates a VOC system. VOC is important not only to business organizations but also to public organizations such as governments, educational institutes, and medical centers that need to raise public service quality and customer satisfaction. These organizations build systems for gathering and analyzing VOC and use them to create and upgrade products and services. In recent years, innovations in the internet and ICT have created diverse channels such as SNS, mobile, websites, and call centers for collecting VOC data. Although a lot of VOC data is collected through these channels, using it properly is still difficult. This is because VOC data consists of highly emotional content in informal voice or text, and the volume of data is very large. Such unstructured big data is difficult for humans to store and analyze. Organizations therefore need a system that automatically collects, stores, classifies, and analyzes unstructured big VOC data. This study proposes an intelligent VOC analyzing system based on opinion mining that classifies unstructured VOC data automatically and determines the polarity as well as the type of VOC. The basis of the VOC opinion analyzing system, a domain-oriented sentiment dictionary, is created, and the corresponding stages are presented in detail. The experiment was conducted with 4,300 VOC records collected from a medical website to measure the effectiveness of the proposed system, and the records were used to develop the sentiment dictionary by determining the special sentiment vocabulary and its polarity values in the medical domain. The experiment showed that positive terms such as "칭찬 (praise), 친절함 (kindness), 감사 (thanks), 무사히 (safely), 잘해 (doing well), 감동 (being moved), 미소 (smile)" have high positive opinion values, and negative terms such as "퉁명 (curt), 뭡니까 (what is this?), 말하더군요 (so they said), 무시하는 (dismissive)" have strongly negative opinion values. These terms are in general use, and the experimental results appear to indicate opinion polarity with high probability. Furthermore, the accuracy of the proposed VOC classification model was compared across thresholds, and the highest classification accuracy of 77.8% was confirmed at an opinion classification threshold of -0.50. Through the proposed intelligent VOC analyzing system, the opinion class and response priority of VOC can be predicted in real time. Ultimately, the system is expected to catch customer complaints at an early stage and deal with them quickly with fewer staff operating the VOC system, freeing up human resources and time in the customer service department. Above all, this study is a new attempt to automatically analyze unstructured VOC data using opinion mining, and it shows that the system can be used to classify the positive or negative polarity of VOC opinions. It is expected to suggest a practical framework for VOC analysis for diverse uses, and the model could serve as a real VOC analyzing system if it is implemented as a system. Despite the experimental results and expectations, this study has several limitations. First of all, the sample data was collected only from a single hospital website. This means that the sentiment dictionary built from the sample data may lean too heavily toward that hospital and website. Therefore, future research should cover several channels such as call centers and SNS, and other domains such as government, financial companies, and educational institutes.
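A dictionary-plus-threshold classifier of the kind this abstract describes might look like the following minimal sketch. The terms come from the abstract, but their polarity values are invented for illustration, and the scoring rule (averaging the values of matched terms and labelling a text negative when the score is at or below the threshold) is an assumption; the abstract does not say how the -0.50 threshold is applied. Substring matching is used here in place of proper Korean morphological analysis.

```python
# Terms are from the abstract; the polarity values are hypothetical.
SENTIMENT_DICT = {"감사": 0.8, "친절함": 0.7, "퉁명": -0.9, "무시하는": -0.8}

THRESHOLD = -0.50  # best-performing threshold reported in the abstract


def opinion_score(text, dictionary):
    """Average the polarity of dictionary terms that occur in the text."""
    hits = [score for term, score in dictionary.items() if term in text]
    return sum(hits) / len(hits) if hits else 0.0


def classify(text, dictionary, threshold=THRESHOLD):
    """Label a VOC text 'negative' when its score is at or below the threshold."""
    return "negative" if opinion_score(text, dictionary) <= threshold else "positive"


for voc in ["직원이 퉁명하게 대답했습니다", "친절함에 감사드립니다"]:
    print(voc, "->", classify(voc, SENTIMENT_DICT))
```

A negative threshold means that only clearly negative texts are flagged, which fits the stated goal of prioritizing responses to complaints.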