• Title/Summary/Keyword: 텍스트 처리 (text processing)


Analysis of Social Trends for Electric Scooters Using Dynamic Topic Modeling and Sentiment Analysis (동적 토픽 모델링과 감성 분석을 활용한 전동킥보드에 대한 사회적 동향 분석)

  • Kyoungok, Kim;Yerang, Shin
    • KIPS Transactions on Software and Data Engineering / v.12 no.1 / pp.19-30 / 2023
  • The electric scooter (e-scooter), a popular micro-mobility vehicle, has seen rapidly increasing use in many cities. In South Korea, e-scooter use has grown sharply since companies launched e-scooter sharing services in a few large cities, starting with Seoul in 2018. However, e-scooters remain controversial because of issues such as parking and safety. Because perceptions of a means of transportation affect mode choice, tracking trends in how e-scooters are perceived is necessary to promote their use. This study therefore analyzed trends related to e-scooters. We analyzed news articles about e-scooters published from 2014 to 2020, using dynamic topic modeling to extract issues and sentiment analysis to investigate how the degree of positive and negative opinion in the articles changed over time. Topic modeling extracted three topics: micro-mobility technologies, shared e-scooter services, and regulations for micro-mobility. The proportion of the regulation topic increased as shared e-scooter services expanded in recent years. In addition, the top positive words included quick, enjoyable, and easy, whereas the top negative words included threat, complaint, and illegal. This implies that people are satisfied with the convenience of e-scooters and e-scooter sharing services, but that safety and parking issues should be addressed for micro-mobility services to become more widely used. In conclusion, this study clarified how issues and social trends related to e-scooters have changed and identified the issues that need to be addressed. The research framework combining dynamic topic modeling and sentiment analysis is also expected to be helpful in determining social trends in other areas.
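
The sentiment side of such a study can be illustrated with a small lexicon-based sketch. This is not the authors' pipeline: the six lexicon words are the ones quoted in the abstract (spelling normalized), while the scoring rule, the function names, and the sample articles are assumptions made for illustration.

```python
from collections import defaultdict

# Illustrative lexicon: only the six words quoted in the abstract.
# A real study would use a full sentiment lexicon (assumption).
POSITIVE = {"quick", "enjoyable", "easy"}
NEGATIVE = {"threat", "complaint", "illegal"}

def sentiment_score(text):
    """Return (pos - neg) / (pos + neg) over lexicon hits, or 0.0 if there are none."""
    tokens = text.lower().split()
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    total = pos + neg
    return (pos - neg) / total if total else 0.0

def yearly_sentiment(articles):
    """Average the per-article score for each publication year."""
    by_year = defaultdict(list)
    for year, text in articles:
        by_year[year].append(sentiment_score(text))
    return {year: sum(s) / len(s) for year, s in sorted(by_year.items())}

# Hypothetical articles, invented for this example.
articles = [
    (2018, "sharing service is quick and easy"),
    (2020, "illegal parking is a threat and a complaint source"),
    (2020, "riding is enjoyable but parking is illegal"),
]
print(yearly_sentiment(articles))  # {2018: 1.0, 2020: -0.5}
```

Tracking this yearly average over time is one simple way to see whether coverage is becoming more positive or more negative.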

Export Control System based on Case Based Reasoning: Design and Evaluation (사례 기반 지능형 수출통제 시스템 : 설계와 평가)

  • Hong, Woneui;Kim, Uihyun;Cho, Sinhee;Kim, Sansung;Yi, Mun Yong;Shin, Donghoon
    • Journal of Intelligence and Information Systems / v.20 no.3 / pp.109-131 / 2014
  • As worldwide demand for nuclear power plant equipment continues to grow, the importance of handling nuclear strategic materials is also increasing. While the number of cases submitted for the export of nuclear-power commodities and technology is increasing dramatically, pre-adjudication (or, put simply, prescreening) of strategic materials has so far been done by experts with long experience and extensive field knowledge. However, there is a severe shortage of experts in this domain, and it takes a long time to develop one. Because human experts must manually evaluate all the documents submitted for export permission, the current practice of nuclear material export is neither time-efficient nor cost-effective. To alleviate the problem of relying solely on costly human experts, our research proposes a new system designed to help field experts make their decisions more effectively and efficiently. The proposed system is built upon case-based reasoning, which in essence extracts key features from existing cases, compares them with the features of a new case, and derives a solution for the new case by referencing similar cases and their solutions. Our research proposes a framework for a case-based reasoning system, designs a case-based reasoning system for the control of nuclear material exports, and evaluates the performance of alternative keyword extraction methods (fully automatic, fully manual, and semi-automatic). A keyword extraction method is an essential component of the case-based reasoning system because it is used to extract the key features of the cases. The fully automatic method used TF-IDF, a widely used de facto standard method for representative keyword extraction in text mining. TF (Term Frequency) is based on the frequency count of a term within a document and shows how important the term is within that document, while IDF (Inverse Document Frequency) is based on the infrequency of the term within a document set and shows how uniquely the term represents the document. The results show that the semi-automatic approach, which is based on collaboration between machine and human, is the most effective solution regardless of whether the human is a field expert or a student majoring in nuclear engineering. Moreover, we propose a new approach to computing nuclear document similarity along with a new framework for document analysis. The proposed algorithm considers both document-to-document similarity ($\alpha$) and document-to-nuclear-system similarity ($\beta$) to derive the final score ($\gamma$) used to decide whether the presented case involves strategic material. The final score ($\gamma$) represents the document similarity between the past cases and the new case. The score is derived not only from conventional TF-IDF but also from a nuclear system similarity score, which takes the context of the nuclear system domain into account. Finally, the system retrieves the top three documents in the case base that are most similar to the new case and presents them with a degree of credibility. With this final score and the credibility score, it becomes easier for a user to see which documents in the case base are most worth consulting, so that the user can make a proper decision at relatively low cost. The system was evaluated by developing a prototype and testing it with field data. The system workflows and outcomes were verified by field experts. This research is expected to contribute to the growth of the knowledge service industry by proposing a new system that can effectively reduce the burden of relying on costly human experts for the export control of nuclear materials and that can serve as a meaningful example of a knowledge service application.
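
The retrieval step described in this abstract can be sketched roughly as follows. The TF-IDF weighting and cosine similarity are standard, but the abstract does not give the formula for the final score, so the convex combination in `final_score`, the `weight` parameter, and the per-case `system_sim` values are assumptions for illustration only, not the authors' algorithm.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Build one sparse TF-IDF vector (a dict) per tokenized document."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({t: (c / len(doc)) * math.log(n / df[t]) for t, c in tf.items()})
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse vectors; 0.0 if either is all zeros."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def final_score(alpha, beta, weight=0.5):
    # ASSUMPTION: gamma as a convex combination of alpha and beta.
    # The abstract does not state how the two similarities are combined.
    return weight * alpha + (1 - weight) * beta

def top_k_cases(case_docs, new_doc, system_sim, k=3):
    """Rank stored cases by the combined score against the new case."""
    vectors = tfidf_vectors(case_docs + [new_doc])
    query = vectors[-1]
    scored = [(final_score(cosine(vec, query), system_sim[i]), i)
              for i, vec in enumerate(vectors[:-1])]
    return sorted(scored, reverse=True)[:k]

# Hypothetical case base and beta values, invented for this example.
cases = [["pump", "valve", "reactor"], ["valve", "steel", "pipe"],
         ["cable", "motor", "fan"], ["reactor", "coolant", "pump"]]
new_case = ["reactor", "pump", "coolant"]
print(top_k_cases(cases, new_case, system_sim=[0.5, 0.5, 0.5, 0.5]))
```

Each returned pair is (score, index into the case base), so the caller can show the three most similar past cases alongside their scores.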

A Comparative Study on the Acceptability and the Consumption Attitude for Soy Foods between Korean and Canadian University Students (한국과 캐나다 대학생들의 콩가공식품에 대한 수응도 및 소비실태 비교 연구)

  • Ahn Tae-Hyun;Paliyath Gopinadhan
    • KOREAN JOURNAL OF CROP SCIENCE / v.51 no.5 / pp.466-476 / 2006
  • The objective of this study was to compare and analyze the acceptability of, and consumption attitudes toward, soy foods among Korean and Canadian university students as young consumers. The survey was carried out by questionnaire, with n=516 subjects in Korea and n=502 in Canada. In terms of general knowledge, respondents viewed soy foods as healthy (86.5% of Korean and 53.4% of Canadian students) or were neutral (11.6% of Korean and 42.8% of Canadian students), agreed that dairy foods can be substituted by soy foods (51.9% of Korean and 41.8% of Canadian students), and agreed that soy foods are not only for vegetarians and milk allergy patients but also for ordinary people (94.2% of Korean and 87.6% of Canadian students). As the main source of information about soy foods, commercials on TV, radio, or in magazines ranked highest (58.0%) for Korean students, while family or friends ranked highest (35.7%) for Canadian students. Regarding consumption, all of the Korean students had purchased soy foods but only 55.4% of the Canadian students had, and soymilk was by far the most recognized and consumed product, followed by soy beverage and margarine. Of the Korean students, 76.4%, and of the Canadian students, 65.1%, considered soy foods common, popular, and easy to purchase. In terms of price, however, soy foods were perceived as expensive: 'more expensive than dairy foods' was chosen by 59.1% (Korean) and 54.7% (Canadian), and 'similar to dairy foods' by 36.8% (Korean) and 39.9% (Canadian). The major reasons for rarely consuming soy foods were 'I am not interested in soy foods' among Korean students (27.3%) and 'I prefer dairy foods to soy foods' among Canadian students (51.7%). Nevertheless, attitudes toward soy food consumption in both countries are very positive, and consumption is expected to increase.

A Study on Automatic Classification Model of Documents Based on Korean Standard Industrial Classification (한국표준산업분류를 기준으로 한 문서의 자동 분류 모델에 관한 연구)

  • Lee, Jae-Seong;Jun, Seung-Pyo;Yoo, Hyoung Sun
    • Journal of Intelligence and Information Systems / v.24 no.3 / pp.221-241 / 2018
  • As we enter the knowledge society, the importance of information as a new form of capital is being emphasized. Information classification is also becoming more important for efficiently managing the exponentially growing volume of digital information. In this study, we sought to automatically classify and provide tailored information that can help companies make decisions about technology commercialization. To this end, we propose a method for classifying information based on the Korean Standard Industrial Classification (KSIC), which indicates the business characteristics of enterprises. Classification of information or documents has largely relied on machine learning, but there is not enough training data categorized by KSIC. This study therefore applied a method based on calculating similarity between documents. Specifically, we propose a method and a model that present the most appropriate KSIC code by collecting the explanatory texts for each KSIC code and calculating their similarity to the document to be classified using the vector space model. IPC data were collected and classified by KSIC, and the methodology was then verified by comparing the results with the KSIC-IPC concordance table provided by the Korean Intellectual Property Office. The highest agreement was obtained when the LT method, a variant of the TF-IDF formula, was applied. In this case, the first-ranked KSIC code matched 53% of the time, and the cumulative match rate through the fifth rank was 76%. This confirms that the technology, industry, and market information that SMEs need can be classified by KSIC more quantitatively and objectively. In addition, the methods and results of this study can serve as basic data to support experts' qualitative judgment when creating concordance tables between heterogeneous classification systems.
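
The first-rank and cumulative fifth-rank match figures reported above are top-k agreement rates. A minimal sketch of how such a rate could be computed is shown below; the function name and the placeholder codes are hypothetical and are not real KSIC entries or the authors' evaluation script.

```python
def cumulative_match_rate(ranked_codes, reference_codes, k):
    """Fraction of documents whose reference code is among the top-k ranked codes."""
    hits = sum(ref in ranked[:k] for ranked, ref in zip(ranked_codes, reference_codes))
    return hits / len(reference_codes)

# Placeholder codes, not real KSIC entries. Each inner list is one document's
# candidate codes, ordered from most to least similar.
ranked = [["A", "B", "C"], ["B", "A", "C"], ["C", "B", "A"], ["A", "C", "B"]]
truth = ["A", "A", "A", "B"]

print(cumulative_match_rate(ranked, truth, 1))  # 0.25
print(cumulative_match_rate(ranked, truth, 3))  # 1.0
```

With k=1 this is the first-rank match rate, and with k=5 it is the cumulative rate through the fifth rank.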

Analysis of Household Textbooks for MiddleㆍHigh School in Colonial Age (식민지 시대 '가사교과서'에 관한 연구: 1930년대를 중심으로)

  • Jun Mi-Kyung
    • Journal of Korean Home Economics Education Association / v.16 no.3 / pp.1-25 / 2004
  • This study analyzes the external form and the contents of the household textbooks used at girls' middle and high schools during the period of Japanese rule over Korea. To this end, eight household textbooks published from 1928 to 1937 were analyzed. The results are summarized as follows. 1. The household subject became one of the most important subjects for girl students as practical uses were emphasized in education during the period. As a result, household classes ranked second in hours for girl students, after Japanese (the national language). 2. The contents of the household textbooks were intended to convey 'the modern' and 'the newest'. Students were also encouraged to apply the contents of the textbooks to real home life. Many pictures, photos, and illustrations were included to help students understand the subject. 3. The purposes of the household class were the reform of living conditions and home economics. 4. The external characteristics of the household textbooks during the period were as follows. - They were written vertically in Japanese, in A5 size (150/210), on pulp paper of good quality. - The typeface of the body text was Ming style. - The textbooks were ordered as follows: outer cover, title page, pictorial, introduction, table of contents, body, appendix, and back cover. 5. The household textbooks consisted of a first volume and a second volume. The first volume covered clothing and textiles, food and nutrition, and housing. The second volume covered care of the aged, nursing, child care, household economy, and home management. 6. The household textbooks were designed to make women housewives.

Optimal supervised LSA method using selective feature dimension reduction (선택적 자질 차원 축소를 이용한 최적의 지도적 LSA 방법)

  • Kim, Jung-Ho;Kim, Myung-Kyu;Cha, Myung-Hoon;In, Joo-Ho;Chae, Soo-Hoan
    • Science of Emotion and Sensibility / v.13 no.1 / pp.47-60 / 2010
  • Most research on classification has used kNN (k-Nearest Neighbor) and SVM (Support Vector Machine), which are known as learning-based models, and the Bayesian classifier and NNA (Neural Network Algorithm), which are known as statistics-based methods. However, these face limitations of space and time when classifying the very large number of web pages on today's internet. Moreover, most classification studies use a unigram feature representation, which does not represent the real meaning of words well. Korean web page classification poses additional problems because Korean words often have multiple meanings (polysemy). For these reasons, LSA (Latent Semantic Analysis) has been proposed for classification in this environment (large data sets and polysemous words). LSA uses SVD (Singular Value Decomposition), which decomposes the original term-document matrix into three matrices and reduces their dimension. SVD thus creates a new low-dimensional semantic space for representing vectors, which makes classification efficient and allows the latent meaning of words or documents (or web pages) to be analyzed. Although LSA is good at classification, it has some drawbacks. When SVD reduces the dimensions of the matrix and creates the new semantic space, it considers which dimensions represent the vectors well, not which dimensions discriminate between them well. This is one reason why LSA does not improve classification performance as much as expected. In this paper, we propose a new LSA that selects the optimal dimensions for both discriminating and representing vectors, minimizing these drawbacks and improving performance. The proposed method shows better and more stable performance than other LSA variants in low-dimensional space. In addition, we obtain further improvement in classification by creating and selecting features, removing stopwords, and statistically weighting specific values.
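
To make the SVD step concrete, the sketch below computes the leading singular triplets of a small term-document matrix by power iteration with deflation, then projects the documents into the reduced space. It shows plain LSA only. The supervised dimension selection proposed in the paper is not specified in the abstract and is not reproduced here. The matrix values and all function names are invented for illustration.

```python
import math

def matvec(a, x):
    """Multiply matrix a (list of rows) by vector x."""
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]

def transpose(a):
    return [list(col) for col in zip(*a)]

def top_singular_triplet(a, iters=200):
    """Power iteration on A^T A: returns (sigma, u, v) for the largest singular value."""
    at = transpose(a)
    v = [1.0 / math.sqrt(len(at))] * len(at)  # uniform start vector
    for _ in range(iters):
        w = matvec(at, matvec(a, v))
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    av = matvec(a, v)
    sigma = math.sqrt(sum(x * x for x in av))
    u = [x / sigma for x in av] if sigma else av
    return sigma, u, v

def truncated_svd(a, k):
    """Extract k leading triplets by deflation (subtracting each rank-1 component)."""
    residual = [row[:] for row in a]
    triplets = []
    for _ in range(k):
        sigma, u, v = top_singular_triplet(residual)
        triplets.append((sigma, u, v))
        residual = [[rij - sigma * ui * vj for rij, vj in zip(row, v)]
                    for row, ui in zip(residual, u)]
    return triplets

def document_coordinates(triplets):
    """Coordinates of each document (matrix column) in the k-dimensional latent space."""
    n_docs = len(triplets[0][2])
    return [[sigma * v[j] for sigma, _, v in triplets] for j in range(n_docs)]

# Toy term-document matrix (rows = terms, columns = documents), invented here.
# Documents 0-1 share one vocabulary and documents 2-3 share another.
A = [[2, 1, 0, 0],
     [1, 2, 0, 0],
     [0, 0, 1, 1],
     [0, 0, 1, 3]]
triplets = truncated_svd(A, k=2)
print([round(s, 3) for s, _, _ in triplets])  # [3.414, 3.0]
coords = document_coordinates(triplets)
```

In the reduced space, documents 0 and 1 load on one dimension and documents 2 and 3 on the other, which is the kind of latent grouping LSA is meant to expose. For real data a library SVD routine would be the practical choice, since power iteration is used here only to keep the sketch dependency-free.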

Dam-Heon Hong Dae Yong's : A part of the BukHak School's Understanding on The Great Learning (담헌 홍대용의 <대학문의(大學問疑)> : 북학파의 『대학』 이해의 일단(一端))

  • Ahn, Woe Soon
    • (The)Study of the Eastern Classic / no.33 / pp.385-411 / 2008
  • This thesis aims to examine his understanding of The Great Learning, centering on the of the founder of the YiYongHuSaeng School (利用厚生學派: a school that pursued a prosperous economy and the welfare of the people) or the BukHak School (北學派: a positive school that pursued mercantilism) from the Joseon Dynasty, Dam-Heon Hong Dae Yong (1731-1783). 1) From what is indicated in the , his studies mainly focused on the annotations from "DaeHakJangGuDaeJeonJipJu" (大學章句大全集註: a book that edited different phrases from The Great Learning into chapters and paragraphs), "Questioning of the Great Learning", and "JuJaUhRyu" (朱子語類: a book of Confucian literature written by Yeo Jung Deok) of Zhu Xi, who was a representative scholar of Neo-Confucianism in the Song Dynasty. 2) While acknowledging Zhu Xi's arguments as a whole, he takes a critical position by dividing his partial doubts into seven chapters and questioning them. 3) As for the main characteristic and direction of the questioning, he judges that Zhu Xi stressed only the 'means' and 'interior' out of the world of 'means and ends' and 'interior and exterior' in recognizing and handling cases; instead, he emphasized placing equal value on the 'ends' and 'exterior' as well. 4) In fact, these partial questions were misconceived, since they did not rest on a profound understanding or a systematic logical development of what Zhu Xi argued. 5) Despite this, at a time in the late Joseon Dynasty when Neo-Confucian thought had become fixed and weakened, with only its form remaining, his perspective on the study of the Confucian classics, namely that the 'ends' and 'exterior' should be valued equally with the 'means' and 'interior', developed through examining the core text of Neo-Confucianism, The Great Learning, is significant for his YiYongHuSaeng doctrine, which holds that politicians must by all means provide the ruled with economic convenience and welfare and that this is their true virtue.

A Generalized Adaptive Deep Latent Factor Recommendation Model (일반화 적응 심층 잠재요인 추천모형)

  • Kim, Jeongha;Lee, Jipyeong;Jang, Seonghyun;Cho, Yoonho
    • Journal of Intelligence and Information Systems / v.29 no.1 / pp.249-263 / 2023
  • Collaborative filtering, a representative recommendation system methodology, consists of two approaches: neighbor methods and latent factor models. Among these, the latent factor model using matrix factorization decomposes the user-item interaction matrix into two lower-dimensional rectangular matrices and predicts an item's rating through the product of these matrices. Because the factor vectors inferred from rating patterns capture user and item characteristics, this method is superior to neighbor-based methods in scalability, accuracy, and flexibility. However, it has a fundamental drawback: it does not reflect the diversity of different individuals' preferences for items with no ratings. This limitation leads to repetitive and inaccurate recommendations. The Adaptive Deep Latent Factor Model (ADLFM) was developed to address this issue. This model adaptively learns the preferences for each item by using the item description, which provides a detailed summary and explanation of the item. ADLFM takes the item description as input, calculates latent vectors of the user and item, and reflects personal diversity using an attention score. However, because it requires a dataset that includes item descriptions, the domains to which ADLFM can be applied are limited, which restricts its generalizability. This study proposes a Generalized Adaptive Deep Latent Factor Recommendation Model, G-ADLFRM, to address the limitations of ADLFM. First, we use the item ID, commonly used in recommendation systems, as input instead of the item description. Additionally, we apply improved deep learning model structures such as Self-Attention, Multi-head Attention, and Multi-Conv1D. We conducted experiments on various datasets with changes to the input and the model structure. The results showed that when only the input was changed, MAE increased slightly compared to ADLFM because of the accompanying information loss, meaning recommendation performance decreased. However, the average learning speed per epoch improved significantly as the amount of information to be processed decreased. When both the input and the model structure were changed, the best-performing Multi-Conv1D structure showed performance similar to ADLFM, sufficiently counteracting the information loss caused by the input change. We conclude that G-ADLFRM is a new, lightweight, and generalizable model that maintains the performance of the existing ADLFM while enabling fast learning and inference.
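
The baseline that this abstract builds on, a latent factor model that predicts a rating as the product of user and item factor vectors and is evaluated with MAE, can be sketched as below. This is a plain matrix factorization trained by stochastic gradient descent, not ADLFM or G-ADLFRM. The hyperparameters, function names, and ratings are all assumptions chosen for illustration.

```python
import random

def train_mf(ratings, n_users, n_items, k=2, lr=0.05, reg=0.02, epochs=500, seed=0):
    """Fit user/item factor vectors by SGD on observed (user, item, rating) triples."""
    rng = random.Random(seed)
    p = [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n_users)]
    q = [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - sum(pu * qi for pu, qi in zip(p[u], q[i]))
            for f in range(k):
                pu, qi = p[u][f], q[i][f]
                # Gradient step on squared error with L2 regularization.
                p[u][f] += lr * (err * qi - reg * pu)
                q[i][f] += lr * (err * pu - reg * qi)
    return p, q

def predict(p, q, u, i):
    """Predicted rating: dot product of the user and item factor vectors."""
    return sum(a * b for a, b in zip(p[u], q[i]))

def mae(p, q, ratings):
    """Mean absolute error over a list of (user, item, rating) triples."""
    return sum(abs(r - predict(p, q, u, i)) for u, i, r in ratings) / len(ratings)

# Hypothetical ratings: (user index, item index, rating on a 1-5 scale).
ratings = [(0, 0, 5), (0, 1, 3), (1, 0, 4), (1, 2, 1), (2, 1, 2), (2, 2, 5)]
p, q = train_mf(ratings, n_users=3, n_items=3)
print("training MAE:", mae(p, q, ratings))
print("predicted rating for user 0, item 2:", predict(p, q, 0, 2))
```

The last line predicts a rating for a user-item pair that was never observed, which is where the abstract's concern about unrated items arises: the prediction depends only on the learned factor vectors.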

Visualizing the Results of Opinion Mining from Social Media Contents: Case Study of a Noodle Company (소셜미디어 콘텐츠의 오피니언 마이닝결과 시각화: N라면 사례 분석 연구)

  • Kim, Yoosin;Kwon, Do Young;Jeong, Seung Ryul
    • Journal of Intelligence and Information Systems / v.20 no.4 / pp.89-105 / 2014
  • Since the emergence of the Internet, social media built on highly interactive Web 2.0 applications has provided a very user-friendly means for consumers and companies to communicate with each other. Users routinely publish content involving their opinions and interests on social media such as blogs, forums, chat rooms, and discussion boards, and the content is released on the Internet in real time. For that reason, many researchers and marketers regard social media content as a source of information for business analytics to develop business insights, and many studies have reported results on mining business intelligence from social media content. In particular, opinion mining and sentiment analysis, as techniques to extract, classify, understand, and assess the opinions implicit in text content, are frequently applied to social media content analysis because they emphasize determining sentiment polarity and extracting authors' opinions. A number of frameworks, methods, techniques, and tools have been presented by these researchers. However, we found weaknesses in their methods, which are often technically complicated and not sufficiently user-friendly to support business decisions and planning. In this study, we attempted to formulate a more comprehensive and practical approach to conducting opinion mining with visual deliverables. First, we described the entire cycle of practical opinion mining using social media content, from the initial data gathering stage to the final presentation session. Our proposed approach to opinion mining consists of four phases: collecting, qualifying, analyzing, and visualizing. In the first phase, analysts have to choose the target social media. Each target medium requires a different way for analysts to gain access: open APIs, search tools, DB2DB interfaces, purchasing content, and so on. The second phase is pre-processing to generate useful materials for meaningful analysis. If garbage data is not removed, the results of social media analysis will not provide meaningful and useful business insights. To clean social media data, natural language processing techniques should be applied. The next step is the opinion mining phase, in which the cleansed social media content set is analyzed. The qualified data set includes not only user-generated content but also content identification information such as creation date, author name, user ID, content ID, hit counts, review or reply, favorite, etc. Depending on the purpose of the analysis, researchers or data analysts can select a suitable mining tool. Topic extraction and buzz analysis are usually related to market trend analysis, while sentiment analysis is used for reputation analysis. There are also various applications, such as stock prediction, product recommendation, sales forecasting, and so on. The last phase is visualization and presentation of the analysis results. The major focus and purpose of this phase are to explain the results of the analysis and help users comprehend their meaning. Therefore, to the extent possible, deliverables from this phase should be simple, clear, and easy to understand, rather than complex and flashy. To illustrate our approach, we conducted a case study on a leading Korean instant noodle company. We targeted the leading company, NS Food, which holds a 66.5% market share; the firm has kept the No. 1 position in the Korean "Ramen" business for several decades. We collected a total of 11,869 pieces of content, including blogs, forum posts, and news articles. After collecting the social media content data, we generated language resources specific to the instant noodle business for data manipulation and analysis using natural language processing. In addition, we classified the content into more detailed categories such as marketing features, environment, reputation, etc. In these phases, we used freeware packages for R such as tm, KoNLP, ggplot2, and plyr. As a result, we presented several useful visualization outputs, such as domain-specific lexicons, volume and sentiment graphs, topic word clouds, heat maps, valence tree maps, and other visualized images, to provide vivid, full-color examples using open library software packages of the R project. Business actors can detect at a glance which areas are weak, strong, positive, negative, quiet, or loud. The heat map shows the movement of sentiment or volume in a category-by-time matrix, with color density indicating values across time periods. The valence tree map, one of the most comprehensive and holistic visualization models, should be very helpful for analysts and decision makers to quickly understand the "big picture" business situation with a hierarchical structure, since a tree map can present buzz volume and sentiment for a given period in a single visualized result. This case study offers real-world business insights from market sensing, demonstrating to practical-minded business users how they can use these types of results for timely decision making in response to ongoing changes in the market. We believe our approach can provide a practical and reliable guide to opinion mining with visualized results that are immediately useful, not just in the food industry but in other industries as well.
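
The category-by-time matrix behind the heat map described above could be assembled along these lines. The study itself used R packages, so this Python sketch is only an illustration of the aggregation step. The record layout, the function name, and the sample posts are assumptions.

```python
from collections import defaultdict

def sentiment_heatmap(posts):
    """Aggregate (period, category, polarity) records into volume and mean-sentiment grids.

    Rows are categories and columns are periods, both in sorted order.
    Cells with no posts get a volume of 0 and a mean of None.
    """
    cells = defaultdict(list)
    for period, category, polarity in posts:
        cells[(category, period)].append(polarity)
    periods = sorted({p for p, _, _ in posts})
    categories = sorted({c for _, c, _ in posts})
    volume = [[len(cells[(c, p)]) for p in periods] for c in categories]
    mean = [[sum(cells[(c, p)]) / len(cells[(c, p)]) if cells[(c, p)] else None
             for p in periods] for c in categories]
    return categories, periods, volume, mean

# Hypothetical records: polarity is +1 for positive and -1 for negative posts.
posts = [
    ("week1", "marketing", 1),
    ("week1", "marketing", -1),
    ("week2", "marketing", 1),
    ("week1", "reputation", -1),
]
categories, periods, volume, mean = sentiment_heatmap(posts)
print(volume)  # [[2, 1], [1, 0]]
print(mean)    # [[0.0, 1.0], [-1.0, None]]
```

The `volume` grid corresponds to buzz volume and the `mean` grid to sentiment, so either can be passed to a plotting library as the color values of a heat map.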

The Usefulness of Product Display of Online Store by the Product Type of Usage Situation - Focusing on the moderate effect of the product portability - (사용상황별 제품유형에 따른 온라인 점포 제품디스플레이의 유용성 - 제품 휴대성의 조절효과를 중심으로 -)

  • Lee, Dong-Il;Choi, Seung-Hoon
    • Journal of Distribution Research / v.16 no.2 / pp.1-24 / 2011
  • 1. Introduction: In contrast to the offline purchasing environment, an online store cannot offer consumers the sense of touch or direct visual information about its products. The builder of an online shopping mall should therefore provide more concrete and detailed product information (Kim 2008), and Alba (1997) also predicted that the quality of the offered information is determined by the post-purchase consumer satisfaction. In practice, many fashion and apparel online shopping malls show pictures of the product on a real-person model to enhance the usefulness of product information. On the other hand, virtual product experience has been suggested as a way of overcoming online consumers' limited perceptual capability (Jiang & Benbasat 2005). However, adopting and facilitating virtual reality (VR) tools requires high investment and technical expertise compared with offering text and picture product information (Shaffer 2006). This could create an entry barrier to online shopping for small retailers, and it could sometimes demand a high level of perceptual effort from consumers. The expensive technological solution could therefore negatively affect consumer decision-making processes. Nevertheless, most previous research on online product information provision suggests that VR is the more effective tool. 2. Research Model and Hypothesis: The research model suggests that the VR effect could be moderated by product type according to usage situation. Product types are defined as portable products and installed products, and information types as a still picture of the product, a picture of the product with a real-person model, and VR. 3. Methods and Results: 3.1. Experimental design and measured variables: We designed a 2 (product types) × 3 (product information types) experimental setting and measured dependent variables such as information usefulness, attitude toward the shopping mall, overall product quality, purchase intention, and revisit intention. Information usefulness and attitude toward the shopping mall were measured with multi-item scales. In the reliability test, Cronbach's alpha for each variable exceeded 0.6, which ensured the internal consistency of the items. 3.2. Manipulation check: The main concern of this study is to verify the moderating effect of product type by usage situation. The results indicate that our experimental manipulation of the moderating effect of the product type was successful. 3.3. Results: There was a significant main effect of information type on only one dependent variable (attitude toward the shopping mall). As predicted, VR had the highest mean value among the information types. Thus, H1 was partially supported. However, no main effect of product type was found. To evaluate H2 and H3, a two-way ANOVA was conducted. There were interaction effects of information type and product type on three dependent variables (information usefulness, overall product quality, and purchase intention). As predicted, the picture of the product with a real-person model had the highest mean among the information types for portable products. On the other hand, VR had the highest mean among the information types for installed products. Thus, H2 and H3 were supported. 4. Implications: The present study found a moderating effect of product type by usage situation. Based on the findings, the following managerial implications are asserted. First, information type was found to affect only the attitude toward the shopping mall. This means that VR effects alone are not enough to help consumers understand the product itself. Therefore, we must consider when and how to use VR tools. Second, interaction effects were found on information usefulness, overall product quality, and purchase intention. This finding suggests that considering the usage situation helps consumers understand the product and promotes their purchase intention. In conclusion, online retailers must fully consider not only product attributes but also product usage situations when they want to meet the needs of consumers.
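
The reliability check mentioned in section 3.1 uses Cronbach's alpha, which for k items is k/(k-1) multiplied by (1 minus the sum of item variances divided by the variance of the total score). A minimal sketch using sample variances follows. The function name and the survey responses are hypothetical and are not data from the study.

```python
from statistics import variance

def cronbach_alpha(items):
    """Cronbach's alpha for a multi-item scale.

    items: one list of respondent scores per scale item, all of equal length.
    """
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]  # total score per respondent
    item_var = sum(variance(scores) for scores in items)
    return (k / (k - 1)) * (1 - item_var / variance(totals))

# Hypothetical responses: three items answered by five respondents.
usefulness_items = [
    [3, 4, 5, 2, 4],
    [3, 5, 4, 2, 5],
    [4, 4, 5, 1, 4],
]
alpha = cronbach_alpha(usefulness_items)
print(round(alpha, 3))  # 0.911
```

A value above the 0.6 threshold cited in the abstract would be read as acceptable internal consistency for that scale.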

  • (34141) Korea Institute of Science and Technology Information, 245, Daehak-ro, Yuseong-gu, Daejeon
    Copyright (C) KISTI. All Rights Reserved.