• Title/Summary/Keyword: Text Index

Increasing Accuracy of Classifying Useful Reviews by Removing Neutral Terms (중립도 기반 선택적 단어 제거를 통한 유용 리뷰 분류 정확도 향상 방안)

  • Lee, Minsik;Lee, Hong Joo
    • Journal of Intelligence and Information Systems / v.22 no.3 / pp.129-142 / 2016
  • Customer product reviews have become an important factor in purchase decisions. Customers believe that reviews written by others who have already used a product offer more reliable information than that provided by sellers. However, because there are so many products and reviews, the advantages of e-commerce can be outweighed by increasing search costs. Reading all of the reviews to find the pros and cons of a product can be exhausting. To help users find the most useful information without much difficulty, e-commerce companies and online stores have devised various ways for customers to write and rate product reviews and to surface useful ones. Different methods have been developed to classify and recommend useful reviews to customers, primarily using feedback provided by customers about the helpfulness of reviews. Most shopping websites provide customer reviews and offer the following information: the average preference for a product, the number of customers who have participated in preference voting, and the preference distribution. Most information on the helpfulness of product reviews is collected through a voting system. Amazon.com asks customers whether a review of a product is helpful, and it places the most helpful favorable review and the most helpful critical review at the top of the list of product reviews. Some companies also predict the usefulness of a review from attributes such as its length, its author(s), and the words used, and publish only reviews that are likely to be useful. Text mining approaches have been used to classify useful reviews in advance. To apply a text mining approach to all reviews of a product, we need to build a term-document matrix: we extract all words from the reviews and record the number of occurrences of each term in each review.
Since there are many reviews, the term-document matrix becomes very large, which makes it difficult to apply text mining algorithms. Researchers therefore delete sparse terms, since sparse words have little effect on classification or prediction. The purpose of this study is to suggest a better way of building the term-document matrix by deleting terms that are useless for review classification. We propose a neutrality index to select the words to delete. Even after sparse terms are removed, many words still appear in both classes (useful and not useful), and these words have little or even a negative effect on classification performance. We defined words that appear with similar frequency in both classes as neutral terms and deleted them. After deleting sparse words, we selected further words to delete according to their neutrality. We tested our approach with Amazon.com review data from five product categories: Cellphones & Accessories; Movies & TV program; Automotive; CDs & Vinyl; and Clothing, Shoes & Jewelry. We used reviews that received more than four votes, and a 60% ratio of useful votes to total votes was the threshold for classifying reviews as useful or not useful. We randomly selected 1,500 useful and 1,500 not-useful reviews for each product category. We then applied Information Gain and Support Vector Machine (SVM) algorithms to classify the reviews and compared classification performance in terms of precision, recall, and F-measure. Although performance varied across product categories and data sets, deleting terms by both sparsity and neutrality gave the best F-measure for both classification algorithms. However, deleting terms by sparsity alone gave the best recall for Information Gain, and using all terms gave the best precision for SVM.
Thus, term deletion methods and classification algorithms need to be selected carefully according to the data set.
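
The labeling rule and the term-removal idea described in the abstract above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not state the neutrality formula, so `neutrality` below is an assumed measure (one minus the normalized difference between a term's document frequency in the two classes), and the function names, the whitespace tokenizer, and the `0.8` cutoff are hypothetical.

```python
from collections import Counter


def label_review(useful_votes, total_votes, min_votes=4, ratio=0.6):
    """Label a review from its helpfulness votes.

    Follows the rule in the abstract: only reviews with more than
    `min_votes` votes are used, and a 60% useful-vote ratio is the threshold.
    Treating the boundary itself as "useful" (>=) is an assumption.
    """
    if total_votes <= min_votes:
        return None  # too few votes: the review is not used
    return "useful" if useful_votes / total_votes >= ratio else "not_useful"


def document_frequency(docs):
    """Count, for each term, how many documents contain it."""
    df = Counter()
    for doc in docs:
        df.update(set(doc.lower().split()))
    return df


def neutrality(term, df_useful, df_not, n_useful, n_not):
    """ASSUMED measure in [0, 1]; 1 means equal occurrence rates in both classes."""
    p = df_useful[term] / n_useful
    q = df_not[term] / n_not
    if p + q == 0:
        return 0.0
    return 1.0 - abs(p - q) / (p + q)


def remove_neutral_terms(useful_docs, not_useful_docs, cutoff=0.8):
    """Return the vocabulary after dropping terms whose neutrality exceeds `cutoff`."""
    df_u = document_frequency(useful_docs)
    df_n = document_frequency(not_useful_docs)
    vocab = set(df_u) | set(df_n)
    return {
        term
        for term in vocab
        if neutrality(term, df_u, df_n, len(useful_docs), len(not_useful_docs)) <= cutoff
    }
```

With toy inputs, a word such as "battery" that occurs in every review of both classes is dropped, while words that occur in only one class are kept as columns of the term-document matrix.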

A Study for the Necessity of Terminology Standardization in Chuna Medicine (추나의학 용어 표준화 필요성 연구)

  • Kweon, Jeong-Ju;Kim, Min-Woo;Park, Kyung-Moo;Jang, Gun;Cho, Hyun-Cheul;Nam, Hang-Eoo;Shin, Byung-Cheul;Lim, Hyung-Ho;Song, Yun-Kyung
    • The Journal of Churna Manual Medicine for Spine and Nerves / v.7 no.1 / pp.1-13 / 2012
  • Objectives: Although chuna medicine has made notable progress, chuna medical terminology has not yet been standardized. As a result, chuna-related books are difficult to translate and their meaning cannot be conveyed properly, so standardization of chuna medical terminology is essential. The purpose of our study was to develop a standard database of concept terms for chuna medicine and to consider establishing fundamental principles of chuna medical terminology. Methods: To select standard chuna medical terms, we sorted important chuna medical index words, grouped words with the same meaning, and unified each group into a single term. We also extracted index words from 26 domestic and foreign books on manual techniques, sorted them, and, based on these words, translated chuna medical terms into Korean terms. For chuna technique terms, we searched chuna textbooks for terms that were used incorrectly and corrected them by suggesting fundamental principles of terminology. Results: 664 chuna words were selected as standard chuna terms and translated into English. In the process, supplementary words such as anatomical terms and book titles were excluded, and only important words that could serve as index words for chuna terms were selected. Patient position, contact point, segmental contact point, malposition, and procedure method were selected as the essential elements of chuna technique terms. Conclusions: Correcting chuna medical terms within a short period could cause confusion, but in the long term, standardization of chuna medical terminology is necessary for conveying meaning clearly and for education. In this study, chuna medical terms were standardized at the level of broad categories; further studies are needed to standardize terms in the subcategories.

An Index Structure for Substructure Searching In Chemical Databases (화학 데이타베이스에서 부분구조 검색을 위한 인덱스 구조)

  • Lee Hwangu;Cha Jaehyuk
    • Journal of KIISE:Databases / v.31 no.6 / pp.641-649 / 2004
  • The relationship between chemical structures and biological activities is actively researched in the field of medicinal chemistry. In structure-based drug design, medicinal chemists search for existing drugs whose chemical structure is similar to that of a target drug in order to develop a new drug. An automatic system is therefore needed that selects drug files containing a set of chemical moieties matching a user-defined query moiety. Substructure searching is the process of identifying a set of chemical moieties that match a specific query moiety. Testing for substructure searching was developed in the late 1950s. In graph-theoretical terms, this problem corresponds to determining which graphs in a set are subgraph isomorphic to a specified query moiety. Testing for subgraph isomorphism has been proved, in the general case, to be an NP-complete problem. Computational approaches have been developed to overcome this difficulty. In the 1990s, a US patent was granted on an atom-centered indexing scheme used by the RS3 system; it has the virtue that the generated indexes can be searched by direct text comparison. This system is used commercially (http://www.acelrys.com/rs3). We identify the drawbacks of the RS3 system and present a new indexing scheme. The RS3 system treats substructure searching as substring matching by expressing chemical structures as predefined strings. However, its recall and precision are insufficient because it cannot index structures uniquely when they have the same atoms and the same bonds. To resolve this problem, we build a minimum-cost spanning tree for each center atom and describe the structure with paths per level. Expressing a 2D chemical structure as a single 1D string has limits; therefore, we break the 2D chemical structure into 1D structure fragments. In this paper we present a new indexing technique that markedly improves recall and precision.
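
The general idea of an atom-centered, text-searchable descriptor can be illustrated with a small sketch. It is not the RS3 scheme or the paper's index: it uses a plain breadth-first tree around the center atom rather than the minimum-cost spanning tree mentioned in the abstract, it ignores bond types, and the input format and the name `level_fragments` are hypothetical.

```python
from collections import deque


def level_fragments(atoms, bonds, center):
    """Describe the neighborhood of `center` as one sorted string per BFS level.

    atoms: dict mapping atom id -> element symbol (hypothetical input format)
    bonds: iterable of (atom id, atom id) pairs; bond order is ignored here
    """
    neighbors = {atom_id: set() for atom_id in atoms}
    for a, b in bonds:
        neighbors[a].add(b)
        neighbors[b].add(a)

    # Breadth-first search records each atom's distance from the center.
    depth = {center: 0}
    queue = deque([center])
    while queue:
        current = queue.popleft()
        for nxt in neighbors[current]:
            if nxt not in depth:
                depth[nxt] = depth[current] + 1
                queue.append(nxt)

    # Group element symbols by level and sort them so the string is canonical.
    levels = {}
    for atom_id, d in depth.items():
        levels.setdefault(d, []).append(atoms[atom_id])
    return ["".join(sorted(levels[d])) for d in sorted(levels)]


# Heavy atoms of ethanol: C1-C2-O3
ethanol_atoms = {1: "C", 2: "C", 3: "O"}
ethanol_bonds = [(1, 2), (2, 3)]
fragments = level_fragments(ethanol_atoms, ethanol_bonds, center=2)
```

Because each level is reduced to a string, two neighborhoods can be compared by direct text comparison. The sketch also shows the weakness the abstract points to: dropping bond information lets different structures collapse onto the same strings.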

A Method for Evaluating News Value based on Supply and Demand of Information Using Text Analysis (텍스트 분석을 활용한 정보의 수요 공급 기반 뉴스 가치 평가 방안)

  • Lee, Donghoon;Choi, Hochang;Kim, Namgyu
    • Journal of Intelligence and Information Systems / v.22 no.4 / pp.45-67 / 2016
  • Given the recent development of smart devices, users are producing, sharing, and acquiring a variety of information via the Internet and social network services (SNSs). Because users tend to use multiple media simultaneously according to their goals and preferences, domestic SNS users use around 2.09 media concurrently on average. Since the information provided by such media is usually textually represented, recent studies have been actively conducting textual analysis in order to understand users more deeply. Earlier studies using textual analysis focused on analyzing a document's contents without substantive consideration of the diverse characteristics of the source medium. However, current studies argue that analytical and interpretive approaches should be applied differently according to the characteristics of a document's source. Documents can be classified into the following types: informative documents for delivering information, expressive documents for expressing emotions and aesthetics, operational documents for inducing the recipient's behavior, and audiovisual media documents for supplementing the above three functions through images and music. Further, documents can be classified according to their contents, which comprise facts, concepts, procedures, principles, rules, stories, opinions, and descriptions. Documents have unique characteristics according to the source media by which they are distributed. In terms of newspapers, only highly trained people tend to write articles for public dissemination. In contrast, with SNSs, various types of users can freely write any message and such messages are distributed in an unpredictable way. Again, in the case of newspapers, each article exists independently and does not tend to have any relation to other articles. However, messages (original tweets) on Twitter, for example, are highly organized and regularly duplicated and repeated through replies and retweets. 
There have been many studies focusing on the different characteristics between newspapers and SNSs. However, it is difficult to find a study that focuses on the difference between the two media from the perspective of supply and demand. We can regard the articles of newspapers as a kind of information supply, whereas messages on various SNSs represent a demand for information. By investigating traditional newspapers and SNSs from the perspective of supply and demand of information, we can explore and explain the information dilemma more clearly. For example, there may be superfluous issues that are heavily reported in newspaper articles despite the fact that users seldom have much interest in these issues. Such overproduced information is not only a waste of media resources but also makes it difficult to find valuable, in-demand information. Further, some issues that are covered by only a few newspapers may be of high interest to SNS users. To alleviate the deleterious effects of information asymmetries, it is necessary to analyze the supply and demand of each information source and, accordingly, provide information flexibly. Such an approach would allow the value of information to be explored and approximated on the basis of the supply-demand balance. Conceptually, this is very similar to the price of goods or services being determined by the supply-demand relationship. Adopting this concept, media companies could focus on the production of highly in-demand issues that are in short supply. In this study, we selected Internet news sites and Twitter as representative media for investigating information supply and demand, respectively. We present the notion of News Value Index (NVI), which evaluates the value of news information in terms of the magnitude of Twitter messages associated with it. In addition, we visualize the change of information value over time using the NVI. We conducted an analysis using 387,014 news articles and 31,674,795 Twitter messages. 
The analysis results revealed interesting patterns: most issues show a lower NVI than the average across all issues, whereas a few issues show a steadily higher NVI than the average.
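
The supply-demand reading of news value can be illustrated with a short sketch. The abstract does not give the NVI formula, so the ratio below (an issue's share of Twitter messages divided by its share of news articles) is an assumption made only for illustration; `news_value_index` and the example counts are hypothetical and are not data from the study.

```python
def news_value_index(article_counts, tweet_counts):
    """ASSUMED index: an issue's share of tweets divided by its share of articles.

    article_counts: dict mapping issue -> number of news articles (supply)
    tweet_counts:   dict mapping issue -> number of Twitter messages (demand)
    """
    total_articles = sum(article_counts.values())
    total_tweets = sum(tweet_counts.values())
    if total_articles == 0 or total_tweets == 0:
        raise ValueError("both corpora must be non-empty")

    index = {}
    for issue, n_articles in article_counts.items():
        if n_articles == 0:
            continue  # no supply, so the ratio is undefined
        supply_share = n_articles / total_articles
        demand_share = tweet_counts.get(issue, 0) / total_tweets
        index[issue] = demand_share / supply_share
    return index


# Made-up counts: issue "A" is heavily reported but rarely discussed.
example = news_value_index({"A": 50, "B": 50}, {"A": 10, "B": 90})
```

Under this assumed definition, a value above 1 marks an issue that is discussed more than it is reported (in demand but in short supply), and a value below 1 marks an oversupplied issue.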

Exploring Domestic ESG Research Trends: Focusing on Domestic Research on ESG from 2012 to 2021 (국내 ESG 연구동향 탐색: 2012~2021년 진행된 국내 학술연구 중심으로)

  • Park, Jae Hyun;Han, Hyang Won;Kim, Na Ra
    • Asia-Pacific Journal of Business Venturing and Entrepreneurship / v.17 no.1 / pp.191-211 / 2022
  • As the value of highly sustainable companies increases, ESG (Environmental, Social, and Governance) has emerged as the biggest topic of discussion for companies around the world. As more domestic research is being done on ESG in line with global trends, it is necessary to examine ESG research trends. Accordingly, ESG academic papers published over the past 10 years were collected by year, and frequency analysis was conducted on keywords and paper titles using text mining techniques. This paper analyzed the number of selected publications by year and the cumulative number of studies through bibliometric analysis. The findings suggest that the number of ESG papers is increasing each year and that academic interest in ESG-related issues remains strong. Next, frequency analysis of the keywords and titles of the research papers extracted the words "ESG", "company", "society", "responsibility", "management", "investment", and "sustainability". This analysis identified the research fields and keywords that have been relevant to ESG over the past 10 years. A comparison of the major ESG issues presented in recent overseas studies with the common ESG keywords found in this study confirmed that recent studies focus more on the environment than previous studies did. Third, the data used by domestic ESG studies mainly include the KEJI index, the KRX index, and the KCGS ESG evaluation index. An examination of the main research subjects of ESG papers found that 8 out of 152 domestic ESG studies focused on SMEs. This study confirms the ESG research trend and the increase in research, and by classifying research topics and keywords it provides basic data that future researchers can use to select more diverse research topics.
Based on both the arguments of previous ESG studies on SMEs and the results of this study, studies on guidelines for ESG practice and their application to SMEs are lacking, and more ESG research on SMEs will need to be conducted in the future.

A Study on Kiosk Satisfaction Level Improvement: Focusing on Kano, Timko, and PCSI Methodology (키오스크 소비자의 만족수준 연구: Kano, Timko, PCSI 방법론을 중심으로)

  • Choi, Jaehoon;Kim, Pansoo
    • Asia-Pacific Journal of Business Venturing and Entrepreneurship / v.17 no.4 / pp.193-204 / 2022
  • This study measured the satisfaction level of kiosk users and analyzed the impact of improving it. Owing to technological development and improvements in the online environment, the probability that simple labor tasks will disappear in 10 years is reported to be close to 90%. Domestic research likewise predicts that 'simple labor jobs' will disappear under the influence of advanced technology with a probability of about 36%. In particular, as demand for non-face-to-face services increases because of the globally spreading COVID-19 virus, the introduction of kiosks has accelerated, and the global market is expected to grow to 83.5 billion won in 2021, an average annual growth rate of 8.9%. However, because kiosks are unmanned, some consumers still have difficulty using them, and consumers who are not familiar with these technologies have a negative attitude towards service co-production due to rejection of non-face-to-face services and anxiety about service errors. This lack of understanding leads to role conflicts between sales clerks and consumers and creates inequality in service provision relative to generations accustomed to using technology. In addition, since kiosks are a representative technology-based self-service industry, if the user feels uncomfortable or has to perform additional labor, the overall service value decreases and the growth of the kiosk industry itself can be suppressed, which makes user satisfaction an important issue. Therefore, interviews on the main points of direct use were conducted with actual users, and the following items were extracted: display color scheme, text size, device design, device size, internal UI (interface), amount of information, recognition sensor (barcode, NFC, etc.), display brightness, self-event, and reaction speed.
Afterwards, using a questionnaire, each evaluation item was classified by Kano model quality attribute, Timko's customer satisfaction coefficient, which yields precise numerical values, was calculated, and a PCSI Index analysis was additionally performed to classify the improvement impact of each kiosk evaluation item and determine improvement priorities. As a result, the impact of improvement was greatest, in descending order, for internal UI (interface), text size, recognition sensor (barcode, NFC, etc.), reaction speed, self-event, display brightness, amount of information, device size, device design, and display color scheme. Through this, we intend to contribute to a comprehensive comparison of kiosk-based research across fields and to setting the direction for improvement in the venture industry.
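
For readers unfamiliar with Timko's customer satisfaction coefficient, the commonly cited form can be computed from the Kano classification counts of one item as sketched below. The function name and the response counts are hypothetical and are not taken from the study; the PCSI Index step is not covered here.

```python
def timko_coefficients(attractive, one_dimensional, must_be, indifferent):
    """Return (better, worse) for one evaluation item.

    better = (A + O) / (A + O + M + I)   satisfaction gained when the item is fulfilled
    worse  = -(O + M) / (A + O + M + I)  dissatisfaction caused when it is not fulfilled
    Arguments are the numbers of respondents whose answers fell into each
    Kano category (attractive, one-dimensional, must-be, indifferent).
    """
    total = attractive + one_dimensional + must_be + indifferent
    if total == 0:
        raise ValueError("no classifiable responses")
    better = (attractive + one_dimensional) / total
    worse = -(one_dimensional + must_be) / total
    return better, worse


# Made-up counts for a single item, e.g. "text size".
better, worse = timko_coefficients(attractive=40, one_dimensional=30, must_be=20, indifferent=10)
```

A `better` value near 1 means fulfilling the item strongly raises satisfaction, and a `worse` value near -1 means failing to fulfill it strongly raises dissatisfaction.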

Citizen Sentiment Analysis of the Social Disaster by Using Opinion Mining (오피니언 마이닝 기법을 이용한 사회적 재난의 시민 감성도 분석)

  • Seo, Min Song;Yoo, Hwan Hee
    • Journal of Korean Society for Geospatial Information Science / v.25 no.1 / pp.37-46 / 2017
  • Recently, disasters caused by social factors have been occurring frequently in Korea. It is difficult to predict what crisis may happen, which raises citizens' concern. In this study, we developed a program that acquires tweet data using the Python-based Tweepy library, for social disasters such as 'Nonspecific motive crimes' and the 'Oxy' products case. After natural language processing, these data were used to evaluate the psychological trauma and anxiety of citizens through text clustering analysis and opinion mining analysis in R Studio. In the analysis of the 'Oxy' case, the Sewol ferry accident and the continued sale of Oxy products had the highest similarity; for 'Nonspecific motive crimes', the government's coping measures against unexpected incidents such as the screen door 'incident', the Sewol ferry accident, and the 'Nonspecific motive crime' due to misogyny in Busan had the highest similarity. In addition, the average citizen sentiment score for Nonspecific motive crimes was more negative than that for the Oxy case by 11.61%p. Therefore, the findings are expected to be used to predict the mental health of citizens and to prevent future accidents.

A Study of Inquiry Tendency of Earth Science Contents presented in North Korean Textbooks (북한 교과서 중 지구과학 내용의 탐구 경향성 분석)

  • Park, KiRak;Park, Hyun Ju
    • Journal of the Korean earth science society / v.40 no.2 / pp.188-199 / 2019
  • The purpose of this study was to investigate the inquiry tendency of the earth science content presented in North Korean textbooks of the 2013 national curriculum using Romey's method, and to provide basic data for better understanding earth science education in North Korea. The text, figure, question, and activity indices of the earth science content in the textbooks Natural Science 1 and 2 and Chosun Geography 2 for elementary junior high school, and Geography 1 for advanced junior high school, were all analyzed using Romey's method. The results of this study were as follows: First, the atmospheric science questions and the astronomy text showed an inquiry-type tendency. Second, the proportion of oceanography was relatively small. Third, there were many non-inquiry questions or excessively inquiry-oriented questions, and the two types needed to be balanced. Fourth, there was a tendency not to emphasize inquiry learning. Finally, the quantitative and qualitative level of inquiry tendency should be improved. In this paper, we propose using a qualitative method when analyzing earth science content in North Korean textbooks and suggest further comparative analysis of the inquiry tendency of earth science content in South and North Korean textbooks.

Topic Modeling of Profit Adjustment Research Trend in Korean Accounting (텍스트 마이닝을 이용한 이익조정 연구동향 토픽모델링)

  • Kim, JiYeon;Na, HongSeok;Park, Kyung Hwan
    • Journal of Digital Convergence / v.19 no.1 / pp.125-139 / 2021
  • This study identifies trends in Korean accounting research on profit adjustment. We analyzed the abstracts of accounting research articles indexed in the Korea Citation Index (KCI) using text mining techniques. Among papers on profit adjustment, the topics fell into four groups: (i) auditing and audit reports, (ii) corporate taxes and debt ratios, (iii) general management strategy of companies, and (iv) financial statements and accounting principles. Contrary to the expectation that financial statements and accounting principles would be the main topic, auditing turned out to be the most studied area. We analyzed topic trends based on the number of papers per topic and identified the impact of the introduction of K-IFRS on profit adjustment research. By using a Big Data method, this study enabled a division of research themes that was not available in past studies. It allows policy makers and business managers to learn about considerations beyond the accounting principles related to profit adjustment.

An Analysis of the Research Trends for Urban Study using Topic Modeling (토픽모델링을 이용한 도시 분야 연구동향 분석)

  • Jang, Sun-Young;Jung, Seunghyun
    • Journal of the Korea Academia-Industrial cooperation Society / v.22 no.3 / pp.661-670 / 2021
  • Research trends can be used to determine the importance of research topics by period, identify under-researched fields, and discover new fields. In this study, research trends on urban spaces, where various problems are occurring due to population concentration and urbanization, were analyzed by topic modeling. The analysis targets were the abstracts of papers listed in the Korea Citation Index (KCI) and published between 2002 and 2019. Topic modeling is an algorithm-based text mining technique that can discover patterns across an entire body of content and makes clustering easy. In this study, keyword frequency, trends by year, topic derivation, clusters by topic, and trends by topic type were analyzed. Research on urban regeneration is increasing continuously and was identified as a field in which detailed topics could be expanded in the future. Furthermore, urban regeneration is now becoming an established research field. On the other hand, topics related to development/growth and energy/environment have entered a period of stagnation. This study is meaningful because the correlations and trends among keywords were analyzed using topic modeling across all domestic urban studies.