• Title/Summary/Keyword: Data dictionary

The Study of Developing Korean SentiWordNet for Big Data Analytics : Focusing on Anger Emotion (빅데이터 분석을 위한 한국어 SentiWordNet 개발 방안 연구 : 분노 감정을 중심으로)

  • Choi, Sukjae;Kwon, Ohbyung
    • The Journal of Society for e-Business Studies
    • /
    • v.19 no.4
    • /
    • pp.1-19
    • /
    • 2014
  • Efforts to identify user perceptions latent in big data are actively under way. These efforts try to score people's views on products, movies, and social issues by analyzing statements posted on Internet bulletin boards or SNS. This study addresses the problem of how to identify emotional vocabulary and determine the degree of each emotional value. For the basic emotional vocabulary and degrees, the study uses the results of previous studies; for the extended emotional vocabulary, it infers them from dictionary glosses. The result is four emotional word lists (vocabularies): a basic emotional list, an extended stratum 1, level 1 list derived from the glosses of the basic vocabulary, an extended stratum 2, level 1 list derived from the glosses of non-emotional words, and an extended stratum 2, level 2 list derived from the glosses of glosses. We obtained the emotional degrees by applying sentence weights and emphasis multiplier values to the basic emotional list. Experimental results indicate that AND and OR sentences take a weight equal to the average degree of the words they contain, and that MULTIPLY sentences take a weight of 1.2 to 1.5 depending on the type of adverb. NOT sentences are assumed to take a degree obtained by reducing and reversing the original word's emotional degree. The emphasis multiplier is taken to be 2 for stratum 1 and 3 for stratum 2.
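
The sentence-level weighting rules in this abstract can be illustrated with a minimal Python sketch. The lexicon entries, the function names, and the `reduction` factor for NOT sentences are assumptions made for illustration; the abstract gives only the averaging rule, the 1.2 to 1.5 adverb range, and the idea of reducing and reversing. The emphasis multiplier is left out because the abstract does not say how it is applied.

```python
from statistics import mean

# Hypothetical base lexicon: word -> emotional degree (illustrative values only).
BASE_DEGREE = {"angry": 0.8, "annoyed": 0.5}


def and_or_degree(words, lexicon=BASE_DEGREE):
    """AND/OR sentence: average degree of the emotional words it contains."""
    degrees = [lexicon[w] for w in words if w in lexicon]
    return mean(degrees) if degrees else 0.0


def multiply_degree(word, adverb_weight, lexicon=BASE_DEGREE):
    """MULTIPLY sentence: an adverb weight in [1.2, 1.5] scales the word's degree."""
    if not 1.2 <= adverb_weight <= 1.5:
        raise ValueError("adverb weight is expected to lie in [1.2, 1.5]")
    return lexicon[word] * adverb_weight


def not_degree(word, reduction=0.5, lexicon=BASE_DEGREE):
    """NOT sentence: reduce and reverse the degree (`reduction` is an assumed factor)."""
    return -reduction * lexicon[word]
```

For example, `and_or_degree(["angry", "annoyed"])` averages the two illustrative degrees, and `not_degree("angry")` returns a smaller value with the opposite sign.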

Development of Beauty Experience Pattern Map Based on Consumer Emotions: Focusing on Cosmetics (소비자 감성 기반 뷰티 경험 패턴 맵 개발: 화장품을 중심으로)

  • Seo, Bong-Goon;Kim, Keon-Woo;Park, Do-Hyung
    • Journal of Intelligence and Information Systems
    • /
    • v.25 no.1
    • /
    • pp.179-196
    • /
    • 2019
  • Recently, the "Smart Consumer" has been emerging. Such consumers are increasingly inclined to search for and purchase products based on personal judgment or expert reviews rather than relying on information delivered through manufacturers' advertising. This is especially true when purchasing cosmetics. Because cosmetics act directly on the skin, consumers respond seriously to dangerous chemical elements they contain or to skin problems they may cause. Above all, cosmetics should fit well with the purchaser's skin type. In addition, changes in global cosmetics consumer trends make it necessary to study this field. The desire to find one's own individualized cosmetics is emerging among consumers around the world and is known as "Finding the Holy Grail." Many consumers show a deep interest in customized cosmetics amid the cultural boom known as "K-Beauty" (an aspect of "Han-Ryu"), the growth of personal grooming, and the emergence of "self-culture" that includes "self-beauty" and "self-interior." These trends have led to the explosive popularity of cosmetics made in Korea in the Chinese and Southeast Asian markets. To meet consumers' needs for customized cosmetics, cosmetics manufacturers and related companies are concentrating on delivering premium services through the convergence of ICT (Information and Communication Technology). Despite the evolution of companies' responses to market trends toward customized cosmetics, there is no "Intelligent Data Platform" that deals holistically with consumers' skin condition experience and thus attaches emotions to products and services. To find the Holy Grail of customized cosmetics, it is important to acquire and analyze consumer data on what they want in order to address their experiences and emotions.
The emotions consumers experience when purchasing cosmetics vary by age, sex, skin type, and specific skin issues and influence what price is considered reasonable. Therefore, it is necessary to classify emotions regarding cosmetics by individual consumer. Because of its importance, consumer emotion analysis has been used for both services and products. Given the trends identified above, we judge that consumer emotion analysis can be used in our study. Therefore, we collected and indexed data on consumers' emotions regarding their cosmetics experiences, focusing on consumers' language. We crawled the cosmetics emotion data from SNS (blogs and Twitter) according to sales ranking (1st to 99th), focusing on the ampoule/serum category. A total of 357 emotional adjectives were collected, and we combined and abstracted similar or duplicate emotional adjectives. We conducted a "Consumer Sentiment Journey" workshop to build a "Consumer Sentiment Dictionary," and this resulted in a total of 76 emotional adjectives regarding cosmetics consumer experience. Using these 76 emotional adjectives, we performed clustering with the Self-Organizing Map (SOM) method. As a result of the analysis, we derived eight final clusters of cosmetics consumer sentiments. Using the vector values of each node for each cluster, the characteristics of each cluster were derived based on the top ten most frequently appearing consumer sentiments. Different characteristics were found in consumer sentiments in each cluster. We also developed a cosmetics experience pattern map. The study results confirmed that recommendation and classification systems that consider consumer emotions and sentiments are needed because each consumer differs in what he or she pursues and prefers.
Furthermore, this study reaffirms that the application of emotion and sentiment analysis can be extended to various fields other than cosmetics, and it implies that consumer insights can be derived using these methods. They can be used not only to build a specialized sentiment dictionary using scientific processes and "Design Thinking Methodology," but we also expect that these methods can help us to understand consumers' psychological reactions and cognitive behaviors. If this study is further developed, we believe that it will be able to provide solutions based on consumer experience, and therefore that it can be developed as an aspect of marketing intelligence.
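
The SOM clustering step mentioned in this abstract can be sketched in plain Python as follows. This is a generic one-dimensional SOM, not the authors' configuration: the map size, learning-rate schedule, neighbourhood function, and the idea of treating each node as one cluster are all assumptions, and `train_som` and `best_matching_unit` are hypothetical names. The default of 8 nodes only mirrors the eight clusters reported.

```python
import math
import random


def best_matching_unit(nodes, v):
    """Index of the node closest to v in squared Euclidean distance."""
    return min(
        range(len(nodes)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(nodes[i], v)),
    )


def train_som(vectors, n_nodes=8, epochs=50, lr0=0.5, seed=0):
    """Train a 1-D SOM on feature vectors scaled to [0, 1]; return node weights."""
    rng = random.Random(seed)
    dim = len(vectors[0])
    nodes = [[rng.random() for _ in range(dim)] for _ in range(n_nodes)]
    radius0 = n_nodes / 2
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1 - frac)                    # learning rate decays linearly
        radius = max(radius0 * (1 - frac), 0.5)  # neighbourhood shrinks over time
        for v in rng.sample(vectors, len(vectors)):  # shuffled pass over the data
            bmu = best_matching_unit(nodes, v)
            for i, node in enumerate(nodes):
                # Gaussian neighbourhood centred on the best matching unit
                h = math.exp(-((i - bmu) ** 2) / (2 * radius ** 2))
                for d in range(dim):
                    node[d] += lr * h * (v[d] - node[d])
    return nodes
```

After training, each adjective vector would be assigned to a cluster with `best_matching_unit(nodes, vector)`.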

A Research in Applying Big Data and Artificial Intelligence on Defense Metadata using Multi Repository Meta-Data Management (MRMM) (국방 빅데이터/인공지능 활성화를 위한 다중메타데이터 저장소 관리시스템(MRMM) 기술 연구)

  • Shin, Philip Wootaek;Lee, Jinhee;Kim, Jeongwoo;Shin, Dongsun;Lee, Youngsang;Hwang, Seung Ho
    • Journal of Internet Computing and Services
    • /
    • v.21 no.1
    • /
    • pp.169-178
    • /
    • 2020
  • The reduction of troops and human resources and the need to improve combat power have led the Korean Department of Defense to actively adopt 4th Industrial Revolution technologies (artificial intelligence, big data). The defense information system has been developed in various ways according to the tasks and unique characteristics of each military branch. To take full advantage of 4th Industrial Revolution technology, the closed defense data management system must be improved. However, establishing and using data standards across all information systems for defense big data and artificial intelligence is limited by security issues, the business characteristics of each branch, and the difficulty of standardizing large-scale systems. Data sharing is currently limited to direct linkage under interoperability agreements between systems, based on each system's interworking requirements. To implement smart defense using 4th Industrial Revolution technology, a system that can share defense data and make good use of it is urgently needed. To support this technically, it is critical to develop Multi Repository Meta-Data Management (MRMM), which supports systematic standard management of defense data by managing the enterprise standard and the standard mapping for each system, and which promotes data interoperability through linkage between standards in compliance with the Defense Interoperability Management Development Guidelines. We introduce MRMM and implement it using vocabulary similarity based on machine learning and statistical approaches. Based on MRMM, we expect to simplify the standardization and integration of all military databases using artificial intelligence and big data. This is expected to substantially reduce the defense budget while increasing combat power for implementing smart defense.
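
As a rough illustration of mapping a system-specific term onto an enterprise standard by vocabulary similarity, the sketch below uses character bigram Jaccard similarity. This is a stand-in technique chosen for brevity, not the machine learning and statistical method the abstract refers to; the function names, the example terms, and the 0.5 threshold are all hypothetical.

```python
def char_ngrams(term, n=2):
    """Set of lower-cased character n-grams, ignoring separators such as '_'."""
    s = "".join(ch for ch in term.lower() if ch.isalnum())
    return {s[i:i + n] for i in range(len(s) - n + 1)} or {s}


def jaccard(a, b):
    """Jaccard similarity between the n-gram sets of two terms."""
    ga, gb = char_ngrams(a), char_ngrams(b)
    return len(ga & gb) / len(ga | gb)


def map_to_standard(term, standard_terms, threshold=0.5):
    """Return the most similar standard term, or None if below the threshold."""
    best = max(standard_terms, key=lambda t: jaccard(term, t))
    return best if jaccard(term, best) >= threshold else None
```

For example, `map_to_standard("UnitCode", ["unit_code", "soldier_name"])` selects `"unit_code"`, while a term with no overlapping bigrams returns `None`.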

The Evaluation of Youth Overeducation and its Impact on the Wage System in Korea (청년층 학력과잉이 임금에 미치는 영향에 대한 분석 - 경제위기 전·후를 중심으로 -)

  • Park, Sung-Joon;Hwang, Sang-In
    • Journal of Labour Economics
    • /
    • v.28 no.3
    • /
    • pp.141-166
    • /
    • 2005
  • The purpose of this study is to evaluate the status of youth overeducation and to analyze its impact on the wage system before and after the financial crisis. We adopt the following method. First, we examine data for 1996 (before the financial crisis) and 2000 (after the financial crisis) from "the Survey Report on the Wage structure" by occupation group, using data from "the Occupational Dictionary", so that we can evaluate the difference in youth overeducation status before and after the financial crisis. Second, we analyze why the difference occurs, using a financial crisis dummy variable and other variables such as sex, occupation, and industry. Third, we examine how the impact of overeducation on the wage rate differs before and after the financial crisis. The main findings are as follows. First, the degree of overeducation is higher in 2000 than in 1996, so the financial crisis played an important role in deepening overeducation. Second, the wage rate of overeducated workers is higher than that of required-educated workers. Both wage rates increased after the financial crisis; however, the gap between them narrowed. This finding means that although both wage rates increased, the wage rate of required-educated workers increased much more than that of overeducated workers after the financial crisis.

Trend Analysis of North Korean Forest Science Research (1962-2016) by Data Mining (데이터 마이닝을 활용한 북한 산림과학 연구 동향 분석(1962~2016))

  • Lim, Joongbin;Kim, Kyoung-Min;Kim, Myung-Kil;Yi, Jong Min;Park, Jin Woo
    • Journal of Korean Society of Forest Science
    • /
    • v.109 no.1
    • /
    • pp.81-98
    • /
    • 2020
  • In this study, forest-related research papers published in North Korean journals were analyzed to understand the research trends in North Korean forest science. The Korea Science and Technology Information Institute (KISTI) North Korea Science and Technology Network (NKtech) is constructing a database related to science and technology in North Korea. From this, a total of 1,389 articles published from 1962 to 2016 were collected with forest science key words based on the South Korean National Science and Technology Standard Classification System. The topics were divided into four categories: afforestation, forest protection, forest use, and forest management. In the field of afforestation, research activities on nursery and agroforestry were active, and the survival rate was emphasized. In the forest protection field, there was a significant research effort into forest pests, and efforts were being made to reduce soil erosion through agroforestry. In the field of forest use, research activities on pulp/paper and mushrooms were active. In the forest management field, activities related to "ecological information" were conspicuous, and efforts were being made to reduce carbon. These results suggest that the perspective of North Korean forest research has changed from nature reorganization to nature protection. Thus, a comparative study on forest science and technology in each sub-sector of the forest research field, along with analysis of the relationship between policy direction and research direction of North Korea over time, would be worthwhile future investigations. To overcome the problem of technical terminology, a compilation/dictionary of inter-Korean forestry terminology would be useful for effective communication between the two Koreas.

Automatic Recommendation of Nearby Tourist Attractions related to Events (이벤트와 관련된 주변 관광지 자동 추천 알고리즘 개발)

  • Ahn, Jinhyun;Im, Dong-Hyuk
    • Journal of the Korea Academia-Industrial cooperation Society
    • /
    • v.21 no.3
    • /
    • pp.407-413
    • /
    • 2020
  • Participating in exhibitions is one of the major activities for tourists. When selecting their next travel destination after participating in an event, they use map services and social network services, such as blogs, to obtain information about tourist attractions. The map services are location-based recommendations, because they can easily retrieve information regarding nearby places. Blogs contain informative content about tourist attractions, thereby providing content-based recommendations. However, few services consider both location and content. In location-based recommendations, tourist attractions that are not related to the content of the event attended might be recommended. Content-based recommendation has a disadvantage in that events located at a distance might get recommended. We propose an algorithm that considers both location and content, based on information from the Korea Tourism Organization's Linked Open Data (LOD), Wikipedia, and a Korean dictionary. By extracting nouns from the description of a tourist attraction and then comparing them with nouns about other attractions, a content-based relationship is determined. The distance to the event is calculated based on the latitude and longitude of each tourist attraction. A weight selected by the user is used for linear combination with the content-based relationship to determine the preference order of the recommendations.
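
The linear combination of location and content described in this abstract can be sketched as below. The haversine formula is a standard way to compute distance from latitude and longitude; the noun-overlap measure, the `1 / (1 + d)` mapping from distance to a proximity score, and all names and dictionary keys are assumptions for illustration, since the abstract does not specify them.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points in kilometres."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def noun_overlap(nouns_a, nouns_b):
    """Jaccard overlap of two noun sets (stand-in for the content-based relationship)."""
    a, b = set(nouns_a), set(nouns_b)
    return len(a & b) / len(a | b) if a | b else 0.0


def score(event, attraction, weight):
    """weight in [0, 1]: 1.0 ranks by content only, 0.0 by proximity only."""
    d = haversine_km(event["lat"], event["lon"], attraction["lat"], attraction["lon"])
    proximity = 1.0 / (1.0 + d)  # assumed mapping from distance to a 0..1 score
    content = noun_overlap(event["nouns"], attraction["nouns"])
    return weight * content + (1.0 - weight) * proximity
```

Sorting candidate attractions by `score(event, attraction, weight)` in descending order would then give the preference order for a user-chosen `weight`.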

An Analysis Study on the Contents of Occupation in Technology & Home Economics Textbooks for Middle School : focusing on preparation for Low Birthrate & Aging Society (저출산·고령사회 대비 관점에서 중학교 기술·가정 교과서에 제시된 직업 내용 분석)

  • Lee, Soo-jeong
    • Journal of vocational education research
    • /
    • v.37 no.1
    • /
    • pp.139-156
    • /
    • 2018
  • This study analyzed occupational content in a total of 24 Technology & Home Economics (1) and (2) textbooks for middle school under the 2009 revised curriculum. By analyzing the types of occupation shown in the textbooks based on the Korean Standard Classification of Occupations (hierarchical classification), their frequency (ratio), and the occupational content in each textbook unit and each data type, this study provides basic data for understanding diverse aspects of occupational content. The results show that, in Technology & Home Economics textbooks for middle school, the major area of 'home life' presented occupational content at a relatively higher ratio than the 'world of technology', while the frequency (ratio) of occupational content differed greatly by publisher, major area, and unit. The occupation names presented in the textbooks provided very limited information, covering only 5.27% of the occupations listed in the Korea Dictionary of Occupations. In particular, because occupational information was concentrated in the professionals and related workers category of the Korean Standard Classification of Occupations (hierarchical classification), opportunities to learn about the diversity of occupations were limited. Based on these results, in addition to introducing diverse occupational content so that students value all occupations, institutional measures related to textbook development and career education should be sought so that students can explore careers in light of their aptitudes and interests.

A Method for Prediction of Quality Defects in Manufacturing Using Natural Language Processing and Machine Learning (자연어 처리 및 기계학습을 활용한 제조업 현장의 품질 불량 예측 방법론)

  • Roh, Jeong-Min;Kim, Yongsung
    • Journal of Platform Technology
    • /
    • v.9 no.3
    • /
    • pp.52-62
    • /
    • 2021
  • Quality control is critical at manufacturing sites and is key to predicting the risk of quality defects before manufacturing. However, the reliability of manual quality control methods is affected by human and physical limitations because manufacturing processes vary across industries. These limitations become particularly obvious in domain areas with numerous manufacturing processes, such as the manufacture of major nuclear equipment. This study proposed a novel method for predicting the risk of quality defects by using natural language processing and machine learning. In this study, production data collected over 6 years at a factory that manufactures main equipment installed in nuclear power plants were used. In the preprocessing stage of text data, a mapping method was applied to the word dictionary so that domain knowledge could be appropriately reflected, and a hybrid algorithm, which combined n-gram, Term Frequency-Inverse Document Frequency, and Singular Value Decomposition, was constructed for sentence vectorization. Next, in the experiment to classify the risky processes resulting in poor quality, k-fold cross-validation was applied to categorize cases from Unigram to cumulative Trigram. Furthermore, to obtain objective experimental results, Naive Bayes and Support Vector Machine were used as classification algorithms, and a maximum accuracy and F1-score of 0.7685 and 0.8641, respectively, were achieved. Thus, the proposed method is effective. The performance of the proposed method was compared with the votes of field engineers, and the results revealed that the proposed method outperformed the field engineers. Thus, the method can be implemented for quality control at manufacturing sites.
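
The vectorization front end described in this abstract (cumulative n-grams weighted by TF-IDF) can be sketched in plain Python as follows. The sketch covers only the n-gram and TF-IDF steps; the SVD reduction and the Naive Bayes and SVM classifiers are omitted. The unsmoothed `log(N / df)` weighting, whitespace tokenization, and the function names are assumptions, not details from the paper.

```python
import math
from collections import Counter


def cumulative_ngrams(tokens, max_n=3):
    """All n-grams for n = 1..max_n ("cumulative trigram" when max_n is 3)."""
    grams = []
    for n in range(1, max_n + 1):
        grams += [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return grams


def tfidf(docs, max_n=3):
    """Return one {ngram: weight} dict per document."""
    counts = [Counter(cumulative_ngrams(d.split(), max_n)) for d in docs]
    df = Counter(g for c in counts for g in c)  # document frequency of each n-gram
    n_docs = len(docs)
    vectors = []
    for c in counts:
        total = sum(c.values())
        # term frequency times unsmoothed inverse document frequency
        vectors.append({g: (k / total) * math.log(n_docs / df[g]) for g, k in c.items()})
    return vectors
```

With this weighting, an n-gram that appears in every document receives weight zero, so only n-grams that distinguish process descriptions contribute to the vector.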

Relationship Analysis between the Box Office Performance and Sentimental Words in Movie Review (영화의 흥행 성과와 리뷰 감정어휘와의 관계 분석)

  • Mun, Seong Min;Ha, Hyo Ji;Lee, Kyung Won
    • Design Convergence Study
    • /
    • v.14 no.4
    • /
    • pp.1-16
    • /
    • 2015
  • This study aims to understand the distribution of sentimental words in each genre and to find the relationship between box office performance and sentimental words in movie reviews, using 673 movies that each have more than 1,000 reviews. For the analysis, we crawled movie reviews and built a dataset composed of movie genre, movie name, sales, attendance, screen, normal attendance, and 7 sentimental words. We analyzed the data using correlation analysis and parallel coordinates. The results are as follows. First, analysis of box office by genre shows that comedy has the highest box office value and horror the lowest. Second, although 'Happy' and 'Surprise' have the highest sentiment values in every genre, fantasy movies evoke a lot of boredom and SF movies evoke a lot of anger. Third, correlation analysis between sentimental words over the total data shows that 'Anger' increases as 'Disgust' increases and that 'Surprise' decreases as 'Happy' increases. Fourth, analysis of sentimental words by box office performance shows that 'Happy' has a linear relationship with box office and 'Fear' has a non-linear relationship with box office.
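
The correlation analysis between sentiment values can be illustrated with a small Pearson correlation function. The abstract does not state which correlation coefficient was used, so Pearson is an assumption here, and the function name is hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)
```

Applied to per-movie sentiment values, a coefficient near +1 would correspond to the reported pattern where one emotion rises with another, and a coefficient near -1 to the pattern where one falls as the other rises.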

A Study on the Sensibility Analysis of School Life and the Will to Farming of Students at Korea National College of Agricultural and Fisheries (한국농수산대학 재학생의 학교생활 감성 분석 및 영농의지에 관한 연구)

  • Joo, J.S.;Lee, S.Y.;Kim, J.S.;Shin, Y.K.;Park, N.B.
    • Journal of Practical Agriculture & Fisheries Research
    • /
    • v.21 no.2
    • /
    • pp.103-114
    • /
    • 2019
  • In this study, we examined preferences regarding college life factors among students at Korea National College of Agriculture and Fisheries (KNCAF). Opinion mining and text mining techniques were used to analyze the unstructured data, and the text mining results were visualized as word clouds. These results were then used for statistical analysis of the students' willingness to farm after graduation. The preference survey consisted of 10 items in 5 areas: university image, self-capacity, dormitory, education system, and future vision. After classifying the positive and negative emotions in the collected questionnaires, a dictionary of positive and negative words was created to evaluate preference. The items 'college image' at the time of application, 'self after 10 years' after graduation, 'self-capacity', and 'present KNCAF' showed high positive emotion. On the other hand, positive emotion was low for 'college dormitory', 'educational course', 'long-term field practice', and 'future of Korean agriculture'. In the cross-analysis of differences in the will to farm by gender, farming base, and entrance motivation, the differences by gender and entrance motivation were statistically significant, but the difference by farming base was not. In the binary logistic regression analysis of the will to farm, the statistically significant variable was 'motivation for admission'.
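
Dictionary-based preference scoring of the kind described here can be sketched as follows. The word lists, the whitespace tokenization, and the `positive_ratio` measure are illustrative assumptions; the abstract does not describe the dictionary contents or the exact scoring rule.

```python
POSITIVE = {"good", "proud", "helpful"}  # illustrative entries only
NEGATIVE = {"bad", "tired", "crowded"}   # illustrative entries only


def positive_ratio(responses, positive=POSITIVE, negative=NEGATIVE):
    """Share of sentiment-bearing words that are positive across all responses."""
    pos = neg = 0
    for text in responses:
        for word in text.lower().split():
            pos += word in positive
            neg += word in negative
    # None signals that no dictionary word was found in any response
    return pos / (pos + neg) if pos + neg else None
```

Computing this ratio separately for the responses to each survey item would give a per-item value that can be compared across items.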