• Title/Summary/Keyword: Recommendation Systems

Search Result 837, Processing Time 0.027 seconds

A Study on Calibration Procedures for Ir-192 High Dose Rate Brachytherapy Sources (고선량률(HDR) 근접치료의 동위원소 Ir-192에 대한 측정방법에 관한 고찰)

  • Baek, Tae-Seong;Lee, Seung-Wook;Na, Soo-Kyong
    • The Journal of Korean Society for Radiation Therapy
    • /
    • v.19 no.1
    • /
    • pp.19-26
    • /
    • 2007
  • Purpose: To compare of the accuracy among various measurement procedures of HDR Brachytherapy, and to evaluate the clinical suitability and usefulness of alternative PMMA (polymethylmethacrylateplastics: $C_5H_8O_2$) plate phantom without any additional cost due to the purchase of measuring apparatus. Materials and Methods: We made a comparative study on three types of measuring systems: well type chamber, source calibration jig, and PMMA plate phantom. Farmer type chamber was used for source calibration jig method and PMMA plate phantom method. Measurement was done 5 times each in comparison with the measurement values from manufacturer. Measurement results from experiment were compared with that from the manufacturer which is offered with the source whenever a source is substituted by a new one and evaluate the accuracy of source activity. Results: As a consequence of Ir-192 source measurement using well type chamber, source calibration jig and PMMA plate phantom, RMS (Root Mean Square) values for the relative error are 0.6%, 1.57%, 2.1%, respectively, compared with the data from manufacturer. And the mean errors with standard deviation are given $-0.2{\pm}0.5%$, $0.97{\pm}1.23%$, $-0.89{\pm}1.87%$ respectively. Conclusion: From the results shown by the three types of measurement system (well type chamber, source calibration jig, and PMMA plate phantom), the measurement with well type chamber produced the best accuracy. It turns out that we can also use the alternative system of PMMA plate phantom clinically without purchasing any additional particular apparatus since the system does not exceed the recommendation of AAPM (American Association of Physicists in Medicine), which requires the error range of within ${\pm}5%$.

  • PDF

A Study on the Design of Case-based Reasoning Office Knowledge Recommender System for Office Professionals (사례기반추론을 이용한 사무지식 추천시스템)

  • Kim, Myong-Ok;Na, Jung-Ah
    • Journal of Intelligence and Information Systems
    • /
    • v.17 no.3
    • /
    • pp.131-146
    • /
    • 2011
  • It is becoming more essential than ever for office professionals to become competent in information collection/gathering and problem solving in today's global business society. In particular, office professionals do not only assist simple chores but are also forced to make decisions as quickly and efficiently as possible in problematic situations that can end in either profit or loss to their company. Since office professionals rely heavily on their tacit knowledge to solve problems that arise in everyday business situations, it is truly helpful and efficient to refer to similar business cases from the past and share or reuse such previous business knowledge for better performance results. Case-based reasoning(CBR) is a problem-solving method which utilizes previous similar cases to solve problems. Through CBR, the closest case to the current business situation can be searched and retrieved from the case or knowledge base and can be referred to for a new solution. This reduces the time and resources needed and increase success probability. The main purpose of this study is to design a system called COKRS(Case-based reasoning Office Knowledge Recommender System) and develop a prototype for it. COKRS manages cases and their meta data, accepts key words from the user and searches the casebase for the most similar past case to the input keyword, and communicates with users to collect information about the quality of the case provided and continuously apply the information to update values on the similarity table. Core concepts like system architecture, definition of a case, meta database, similarity table have been introduced, and also an algorithm to retrieve all similar cases from past work history has also been proposed. In this research, a case is best defined as a work experience in office administration. However, defining a case in office administration was not an easy task in reality. We surveyed 10 office professionals in order to get an idea of how to define a case in office administration and found out that in most cases any type of office work is to be recorded digitally and/or non-digitally. Therefore, we have defined a record or document case as for COKRS. Similarity table was composed of items of the result of job analysis for office professionals conducted in a previous research. Values between items of the similarity table were initially set to those from researchers' experiences and literature review. The results of this study could also be utilized in other areas of business for knowledge sharing wherever it is necessary and beneficial to share and learn from past experiences. We expect this research to be a reference for researchers and developers who are in this area or interested in office knowledge recommendation system based on CBR. Focus group interview(FGI) was conducted with ten administrative assistants carefully selected from various areas of business. They were given a chance to try out COKRS in an actual work setting and make some suggestions for future improvement. FGI has identified the user-interface for saving and searching cases for keywords as the most positive aspect of COKRS, and has identified the most urgently needed improvement as transforming tacit knowledge and knowhow into recorded documents more efficiently. Also, the focus group has mentioned that it is essential to secure enough support, encouragement, and reward from the company and promote positive attitude and atmosphere for knowledge sharing for everybody's benefit in the company.

The Adaptive Personalization Method According to Users Purchasing Index : Application to Beverage Purchasing Predictions (고객별 구매빈도에 동적으로 적응하는 개인화 시스템 : 음료수 구매 예측에의 적용)

  • Park, Yoon-Joo
    • Journal of Intelligence and Information Systems
    • /
    • v.17 no.4
    • /
    • pp.95-108
    • /
    • 2011
  • TThis is a study of the personalization method that intelligently adapts the level of clustering considering purchasing index of a customer. In the e-biz era, many companies gather customers' demographic and transactional information such as age, gender, purchasing date and product category. They use this information to predict customer's preferences or purchasing patterns so that they can provide more customized services to their customers. The previous Customer-Segmentation method provides customized services for each customer group. This method clusters a whole customer set into different groups based on their similarity and builds predictive models for the resulting groups. Thus, it can manage the number of predictive models and also provide more data for the customers who do not have enough data to build a good predictive model by using the data of other similar customers. However, this method often fails to provide highly personalized services to each customer, which is especially important to VIP customers. Furthermore, it clusters the customers who already have a considerable amount of data as well as the customers who only have small amount of data, which causes to increase computational cost unnecessarily without significant performance improvement. The other conventional method called 1-to-1 method provides more customized services than the Customer-Segmentation method for each individual customer since the predictive model are built using only the data for the individual customer. This method not only provides highly personalized services but also builds a relatively simple and less costly model that satisfies with each customer. However, the 1-to-1 method has a limitation that it does not produce a good predictive model when a customer has only a few numbers of data. In other words, if a customer has insufficient number of transactional data then the performance rate of this method deteriorate. In order to overcome the limitations of these two conventional methods, we suggested the new method called Intelligent Customer Segmentation method that provides adaptive personalized services according to the customer's purchasing index. The suggested method clusters customers according to their purchasing index, so that the prediction for the less purchasing customers are based on the data in more intensively clustered groups, and for the VIP customers, who already have a considerable amount of data, clustered to a much lesser extent or not clustered at all. The main idea of this method is that applying clustering technique when the number of transactional data of the target customer is less than the predefined criterion data size. In order to find this criterion number, we suggest the algorithm called sliding window correlation analysis in this study. The algorithm purposes to find the transactional data size that the performance of the 1-to-1 method is radically decreased due to the data sparity. After finding this criterion data size, we apply the conventional 1-to-1 method for the customers who have more data than the criterion and apply clustering technique who have less than this amount until they can use at least the predefined criterion amount of data for model building processes. We apply the two conventional methods and the newly suggested method to Neilsen's beverage purchasing data to predict the purchasing amounts of the customers and the purchasing categories. We use two data mining techniques (Support Vector Machine and Linear Regression) and two types of performance measures (MAE and RMSE) in order to predict two dependent variables as aforementioned. The results show that the suggested Intelligent Customer Segmentation method can outperform the conventional 1-to-1 method in many cases and produces the same level of performances compare with the Customer-Segmentation method spending much less computational cost.

The Effect of Meta-Features of Multiclass Datasets on the Performance of Classification Algorithms (다중 클래스 데이터셋의 메타특징이 판별 알고리즘의 성능에 미치는 영향 연구)

  • Kim, Jeonghun;Kim, Min Yong;Kwon, Ohbyung
    • Journal of Intelligence and Information Systems
    • /
    • v.26 no.1
    • /
    • pp.23-45
    • /
    • 2020
  • Big data is creating in a wide variety of fields such as medical care, manufacturing, logistics, sales site, SNS, and the dataset characteristics are also diverse. In order to secure the competitiveness of companies, it is necessary to improve decision-making capacity using a classification algorithm. However, most of them do not have sufficient knowledge on what kind of classification algorithm is appropriate for a specific problem area. In other words, determining which classification algorithm is appropriate depending on the characteristics of the dataset was has been a task that required expertise and effort. This is because the relationship between the characteristics of datasets (called meta-features) and the performance of classification algorithms has not been fully understood. Moreover, there has been little research on meta-features reflecting the characteristics of multi-class. Therefore, the purpose of this study is to empirically analyze whether meta-features of multi-class datasets have a significant effect on the performance of classification algorithms. In this study, meta-features of multi-class datasets were identified into two factors, (the data structure and the data complexity,) and seven representative meta-features were selected. Among those, we included the Herfindahl-Hirschman Index (HHI), originally a market concentration measurement index, in the meta-features to replace IR(Imbalanced Ratio). Also, we developed a new index called Reverse ReLU Silhouette Score into the meta-feature set. Among the UCI Machine Learning Repository data, six representative datasets (Balance Scale, PageBlocks, Car Evaluation, User Knowledge-Modeling, Wine Quality(red), Contraceptive Method Choice) were selected. The class of each dataset was classified by using the classification algorithms (KNN, Logistic Regression, Nave Bayes, Random Forest, and SVM) selected in the study. For each dataset, we applied 10-fold cross validation method. 10% to 100% oversampling method is applied for each fold and meta-features of the dataset is measured. The meta-features selected are HHI, Number of Classes, Number of Features, Entropy, Reverse ReLU Silhouette Score, Nonlinearity of Linear Classifier, Hub Score. F1-score was selected as the dependent variable. As a result, the results of this study showed that the six meta-features including Reverse ReLU Silhouette Score and HHI proposed in this study have a significant effect on the classification performance. (1) The meta-features HHI proposed in this study was significant in the classification performance. (2) The number of variables has a significant effect on the classification performance, unlike the number of classes, but it has a positive effect. (3) The number of classes has a negative effect on the performance of classification. (4) Entropy has a significant effect on the performance of classification. (5) The Reverse ReLU Silhouette Score also significantly affects the classification performance at a significant level of 0.01. (6) The nonlinearity of linear classifiers has a significant negative effect on classification performance. In addition, the results of the analysis by the classification algorithms were also consistent. In the regression analysis by classification algorithm, Naïve Bayes algorithm does not have a significant effect on the number of variables unlike other classification algorithms. This study has two theoretical contributions: (1) two new meta-features (HHI, Reverse ReLU Silhouette score) was proved to be significant. (2) The effects of data characteristics on the performance of classification were investigated using meta-features. The practical contribution points (1) can be utilized in the development of classification algorithm recommendation system according to the characteristics of datasets. (2) Many data scientists are often testing by adjusting the parameters of the algorithm to find the optimal algorithm for the situation because the characteristics of the data are different. In this process, excessive waste of resources occurs due to hardware, cost, time, and manpower. This study is expected to be useful for machine learning, data mining researchers, practitioners, and machine learning-based system developers. The composition of this study consists of introduction, related research, research model, experiment, conclusion and discussion.

A Comparative Study on Communication of Agricultural Innovation (농업 기술 전파 커뮤니케이션에 관한 비교 연구)

  • Kim, Sung-Soo
    • Journal of Agricultural Extension & Community Development
    • /
    • v.7 no.1
    • /
    • pp.121-136
    • /
    • 2000
  • This study reports on a comparison between the Korean diffusion of agricultural innovation or extension service and the cooperative extension service in the United States of America. It focuses on relevant differences between the two systems and provides recommendation for improvement of the Korean system to insure success in important areas related to the diffusion of agricultural innovations. After a comparative study on diffusion of innovations it is clear that: in order to have a productive agriculture that makes effective and efficient use of natural resources and helps achieve sustainability goals, a mechanism that delivers knowledge to agricultural communities must be established and maintained. This mechanism is clearly an agricultural extension service that is cooperatively funded by federal, state and local governments and that insures participation of constituents in the process of establishing priorities and evaluating achievements. The success of US agriculture, the most productive in the world, is to a large degree to the Cooperative Extension Service. Based on the results of this study and the differences of the United States and Korea, the following recommendations should be emphasized for more effective communication for agricultural innovation and rural development in Korea: 1) In order to insure that extension educators are high caliber professional individuals, it is important to establish a system that nationally recognizes these individuals as such, and that provides a professional development path. 2) The results of the decision of transfer of extension educators to local governments has not yielded positive outcomes, especially in terms of professional status. It is clearly demonstrable that valuable professionals are leaving the service, that local governments do not have the will and resources to implement a successful extension program. 3) Because of the critical importance of diffusing innovations to agricultural producers in order to insure and quality and steady food supply, it is of critical importance that these issues be addressed before the extension service is further deteriorated. Given the cement situation, it is clear that the extension service should become nationally supported again in cooperation with local and state governments and that extension professionals be given appropriate rank at the national level, commesurate with their peers in research and teaching. 4) The common current committee practice of lengthy reporting and short discussion needs to be changed to one that results in char, brief and substantive action oriented goals. Joint participation by researchers, extension educators and farmers should be encouraged in planning, implementation and evaluation of communication for agricultural innovations. Roles and functions of committees for institutional cooperation, and or agricultural extension committees should be enlarged. 5) Extension educators should be encouraged to adopt new communication technologies to improve their diffusion of innovations methods. Agricultural institutions and organizations should be encouraged to adopt farmer-first and or client-oriented approach in agricultural extension and diffusion of agricultural technologies. The number, complexity and rapid change of information in agricultural extension require the development of a computer based information and report system to support agricultural extension. 6) To facilitate and expand the further development of communication for agricultural innovation and rural development, agricultural communication programs in universities especially in colleges of agriculture and life sciences. 7) To strengthening the sense of national and social responsibility communication for agricultural innovation and rural development among students in agricultural colleges and universities through participation in learning activities by proactive recruitment. 8) To establish and reinforce a policy that insures participation in communication for agricultural innovation and regal development activities. 9) To improve further development of communication for agricultural innovation and rural development in Korea, more research activities should be encouraged.

  • PDF

Participant Characteristic and Educational Effects for Cyber Agricultural Technology Training Courses (사이버농업기술교육 참가자의 특성과 교육효과)

  • Kang, Dae-Koo
    • Journal of Agricultural Extension & Community Development
    • /
    • v.21 no.1
    • /
    • pp.35-82
    • /
    • 2014
  • It was main objectives to find the learners characteristics and educational effects of cyber agricultural technology courses in RDA. For the research, it was followed by literature reviews and internet based survey methods. In internet based survey, two staged stratified sampling method was adopted from cyber training members database in RDA along with some key word as open course or certificate course, and enrollment years. Instrument was composed through literature reviews about cyber education effects and educational effect factors. And learner characteristics items were added in survey documents. It was sent to sampled persons by e-mail and 316 data was returned via google survey systems. Through the data cleaning, 303 data were analysed by chi-square, t-test and F-test. It's significance level was .05. The results of the research were as followed; First, the respondent was composed of mainly man(77.9%), and monthly income group was mainly 2,000,000 or 3,000,000 won(24%), bachelor degree(48%), fifty or forty age group was shared to 75%, and their job was changed after learning(12.2%). So major respondents' job was not changed. Their major was not mainly agriculture. Learners' learning style were composed of two or more types as concrete-sequential, mixing, abstract-random, so e-learning course should be developed for the students' type. Second, it was attended at 3.2 days a week, 53.53 minutes a class, totally 172.63 minutes a week. They were very eager or generally eager to study, and attended two or more subjects. The cyber education motives was for farming knowledge, personal competency development, job performance enlarging. They selected subjects along with their interest. A subject person couldn't choose more subjects for little time, others, non interesting subject, but more subject persons were for job performance benefits and previous subjects effectiveness. Most learner was finished their subject, but a fourth was not finished for busy (26.7%). And their entrying behavior was not enough to learn e-course and computer or internet using ability was middle level as software using. And they thought RDA cyber course was comfort in non time or space limit, knowledge acquisition, and personal competency development. Cyber learning group was composed of open course only (12.5%), certificate only(25.7%), both(36.3%). Third, satisfaction and academic achievement of e-learning learners were good, and educational service offering for doing job in learning application category was good, but effect of cyber education was not good, especially, agricultural income increasing was not good because major learner group was not farmer, so they couldn't apply their knowledge to farming. And content structure and design, content comprehension, content amount were good. The more learning subject group responded to good in effects, and both open course and certificate course group satisfied more than open course only group. Based on the results, recommendation was offered as cyber course specialization before main course in RDA training system, support staff and faculty enlargement, building blended learning system with local RDA office, introducing cyber tutor system.

The Need for Paradigm Shift in Semantic Similarity and Semantic Relatedness : From Cognitive Semantics Perspective (의미간의 유사도 연구의 패러다임 변화의 필요성-인지 의미론적 관점에서의 고찰)

  • Choi, Youngseok;Park, Jinsoo
    • Journal of Intelligence and Information Systems
    • /
    • v.19 no.1
    • /
    • pp.111-123
    • /
    • 2013
  • Semantic similarity/relatedness measure between two concepts plays an important role in research on system integration and database integration. Moreover, current research on keyword recommendation or tag clustering strongly depends on this kind of semantic measure. For this reason, many researchers in various fields including computer science and computational linguistics have tried to improve methods to calculating semantic similarity/relatedness measure. This study of similarity between concepts is meant to discover how a computational process can model the action of a human to determine the relationship between two concepts. Most research on calculating semantic similarity usually uses ready-made reference knowledge such as semantic network and dictionary to measure concept similarity. The topological method is used to calculated relatedness or similarity between concepts based on various forms of a semantic network including a hierarchical taxonomy. This approach assumes that the semantic network reflects the human knowledge well. The nodes in a network represent concepts, and way to measure the conceptual similarity between two nodes are also regarded as ways to determine the conceptual similarity of two words(i.e,. two nodes in a network). Topological method can be categorized as node-based or edge-based, which are also called the information content approach and the conceptual distance approach, respectively. The node-based approach is used to calculate similarity between concepts based on how much information the two concepts share in terms of a semantic network or taxonomy while edge-based approach estimates the distance between the nodes that correspond to the concepts being compared. Both of two approaches have assumed that the semantic network is static. That means topological approach has not considered the change of semantic relation between concepts in semantic network. However, as information communication technologies make advantage in sharing knowledge among people, semantic relation between concepts in semantic network may change. To explain the change in semantic relation, we adopt the cognitive semantics. The basic assumption of cognitive semantics is that humans judge the semantic relation based on their cognition and understanding of concepts. This cognition and understanding is called 'World Knowledge.' World knowledge can be categorized as personal knowledge and cultural knowledge. Personal knowledge means the knowledge from personal experience. Everyone can have different Personal Knowledge of same concept. Cultural Knowledge is the knowledge shared by people who are living in the same culture or using the same language. People in the same culture have common understanding of specific concepts. Cultural knowledge can be the starting point of discussion about the change of semantic relation. If the culture shared by people changes for some reasons, the human's cultural knowledge may also change. Today's society and culture are changing at a past face, and the change of cultural knowledge is not negligible issues in the research on semantic relationship between concepts. In this paper, we propose the future directions of research on semantic similarity. In other words, we discuss that how the research on semantic similarity can reflect the change of semantic relation caused by the change of cultural knowledge. We suggest three direction of future research on semantic similarity. First, the research should include the versioning and update methodology for semantic network. Second, semantic network which is dynamically generated can be used for the calculation of semantic similarity between concepts. If the researcher can develop the methodology to extract the semantic network from given knowledge base in real time, this approach can solve many problems related to the change of semantic relation. Third, the statistical approach based on corpus analysis can be an alternative for the method using semantic network. We believe that these proposed research direction can be the milestone of the research on semantic relation.

A Checklist to Improve the Fairness in AI Financial Service: Focused on the AI-based Credit Scoring Service (인공지능 기반 금융서비스의 공정성 확보를 위한 체크리스트 제안: 인공지능 기반 개인신용평가를 중심으로)

  • Kim, HaYeong;Heo, JeongYun;Kwon, Hochang
    • Journal of Intelligence and Information Systems
    • /
    • v.28 no.3
    • /
    • pp.259-278
    • /
    • 2022
  • With the spread of Artificial Intelligence (AI), various AI-based services are expanding in the financial sector such as service recommendation, automated customer response, fraud detection system(FDS), credit scoring services, etc. At the same time, problems related to reliability and unexpected social controversy are also occurring due to the nature of data-based machine learning. The need Based on this background, this study aimed to contribute to improving trust in AI-based financial services by proposing a checklist to secure fairness in AI-based credit scoring services which directly affects consumers' financial life. Among the key elements of trustworthy AI like transparency, safety, accountability, and fairness, fairness was selected as the subject of the study so that everyone could enjoy the benefits of automated algorithms from the perspective of inclusive finance without social discrimination. We divided the entire fairness related operation process into three areas like data, algorithms, and user areas through literature research. For each area, we constructed four detailed considerations for evaluation resulting in 12 checklists. The relative importance and priority of the categories were evaluated through the analytic hierarchy process (AHP). We use three different groups: financial field workers, artificial intelligence field workers, and general users which represent entire financial stakeholders. According to the importance of each stakeholder, three groups were classified and analyzed, and from a practical perspective, specific checks such as feasibility verification for using learning data and non-financial information and monitoring new inflow data were identified. Moreover, financial consumers in general were found to be highly considerate of the accuracy of result analysis and bias checks. We expect this result could contribute to the design and operation of fair AI-based financial services.

An Intelligent Decision Support System for Selecting Promising Technologies for R&D based on Time-series Patent Analysis (R&D 기술 선정을 위한 시계열 특허 분석 기반 지능형 의사결정지원시스템)

  • Lee, Choongseok;Lee, Suk Joo;Choi, Byounggu
    • Journal of Intelligence and Information Systems
    • /
    • v.18 no.3
    • /
    • pp.79-96
    • /
    • 2012
  • As the pace of competition dramatically accelerates and the complexity of change grows, a variety of research have been conducted to improve firms' short-term performance and to enhance firms' long-term survival. In particular, researchers and practitioners have paid their attention to identify promising technologies that lead competitive advantage to a firm. Discovery of promising technology depends on how a firm evaluates the value of technologies, thus many evaluating methods have been proposed. Experts' opinion based approaches have been widely accepted to predict the value of technologies. Whereas this approach provides in-depth analysis and ensures validity of analysis results, it is usually cost-and time-ineffective and is limited to qualitative evaluation. Considerable studies attempt to forecast the value of technology by using patent information to overcome the limitation of experts' opinion based approach. Patent based technology evaluation has served as a valuable assessment approach of the technological forecasting because it contains a full and practical description of technology with uniform structure. Furthermore, it provides information that is not divulged in any other sources. Although patent information based approach has contributed to our understanding of prediction of promising technologies, it has some limitations because prediction has been made based on the past patent information, and the interpretations of patent analyses are not consistent. In order to fill this gap, this study proposes a technology forecasting methodology by integrating patent information approach and artificial intelligence method. The methodology consists of three modules : evaluation of technologies promising, implementation of technologies value prediction model, and recommendation of promising technologies. In the first module, technologies promising is evaluated from three different and complementary dimensions; impact, fusion, and diffusion perspectives. The impact of technologies refers to their influence on future technologies development and improvement, and is also clearly associated with their monetary value. The fusion of technologies denotes the extent to which a technology fuses different technologies, and represents the breadth of search underlying the technology. The fusion of technologies can be calculated based on technology or patent, thus this study measures two types of fusion index; fusion index per technology and fusion index per patent. Finally, the diffusion of technologies denotes their degree of applicability across scientific and technological fields. In the same vein, diffusion index per technology and diffusion index per patent are considered respectively. In the second module, technologies value prediction model is implemented using artificial intelligence method. This studies use the values of five indexes (i.e., impact index, fusion index per technology, fusion index per patent, diffusion index per technology and diffusion index per patent) at different time (e.g., t-n, t-n-1, t-n-2, ${\cdots}$) as input variables. The out variables are values of five indexes at time t, which is used for learning. The learning method adopted in this study is backpropagation algorithm. In the third module, this study recommends final promising technologies based on analytic hierarchy process. AHP provides relative importance of each index, leading to final promising index for technology. Applicability of the proposed methodology is tested by using U.S. patents in international patent class G06F (i.e., electronic digital data processing) from 2000 to 2008. The results show that mean absolute error value for prediction produced by the proposed methodology is lower than the value produced by multiple regression analysis in cases of fusion indexes. However, mean absolute error value of the proposed methodology is slightly higher than the value of multiple regression analysis. These unexpected results may be explained, in part, by small number of patents. Since this study only uses patent data in class G06F, number of sample patent data is relatively small, leading to incomplete learning to satisfy complex artificial intelligence structure. In addition, fusion index per technology and impact index are found to be important criteria to predict promising technology. This study attempts to extend the existing knowledge by proposing a new methodology for prediction technology value by integrating patent information analysis and artificial intelligence network. It helps managers who want to technology develop planning and policy maker who want to implement technology policy by providing quantitative prediction methodology. In addition, this study could help other researchers by proving a deeper understanding of the complex technological forecasting field.

Visualizing the Results of Opinion Mining from Social Media Contents: Case Study of a Noodle Company (소셜미디어 콘텐츠의 오피니언 마이닝결과 시각화: N라면 사례 분석 연구)

  • Kim, Yoosin;Kwon, Do Young;Jeong, Seung Ryul
    • Journal of Intelligence and Information Systems
    • /
    • v.20 no.4
    • /
    • pp.89-105
    • /
    • 2014
  • After emergence of Internet, social media with highly interactive Web 2.0 applications has provided very user friendly means for consumers and companies to communicate with each other. Users have routinely published contents involving their opinions and interests in social media such as blogs, forums, chatting rooms, and discussion boards, and the contents are released real-time in the Internet. For that reason, many researchers and marketers regard social media contents as the source of information for business analytics to develop business insights, and many studies have reported results on mining business intelligence from Social media content. In particular, opinion mining and sentiment analysis, as a technique to extract, classify, understand, and assess the opinions implicit in text contents, are frequently applied into social media content analysis because it emphasizes determining sentiment polarity and extracting authors' opinions. A number of frameworks, methods, techniques and tools have been presented by these researchers. However, we have found some weaknesses from their methods which are often technically complicated and are not sufficiently user-friendly for helping business decisions and planning. In this study, we attempted to formulate a more comprehensive and practical approach to conduct opinion mining with visual deliverables. First, we described the entire cycle of practical opinion mining using Social media content from the initial data gathering stage to the final presentation session. Our proposed approach to opinion mining consists of four phases: collecting, qualifying, analyzing, and visualizing. In the first phase, analysts have to choose target social media. Each target media requires different ways for analysts to gain access. There are open-API, searching tools, DB2DB interface, purchasing contents, and so son. Second phase is pre-processing to generate useful materials for meaningful analysis. If we do not remove garbage data, results of social media analysis will not provide meaningful and useful business insights. To clean social media data, natural language processing techniques should be applied. The next step is the opinion mining phase where the cleansed social media content set is to be analyzed. The qualified data set includes not only user-generated contents but also content identification information such as creation date, author name, user id, content id, hit counts, review or reply, favorite, etc. Depending on the purpose of the analysis, researchers or data analysts can select a suitable mining tool. Topic extraction and buzz analysis are usually related to market trends analysis, while sentiment analysis is utilized to conduct reputation analysis. There are also various applications, such as stock prediction, product recommendation, sales forecasting, and so on. The last phase is visualization and presentation of analysis results. The major focus and purpose of this phase are to explain results of analysis and help users to comprehend its meaning. Therefore, to the extent possible, deliverables from this phase should be made simple, clear and easy to understand, rather than complex and flashy. To illustrate our approach, we conducted a case study on a leading Korean instant noodle company. We targeted the leading company, NS Food, with 66.5% of market share; the firm has kept No. 1 position in the Korean "Ramen" business for several decades. We collected a total of 11,869 pieces of contents including blogs, forum contents and news articles. After collecting social media content data, we generated instant noodle business specific language resources for data manipulation and analysis using natural language processing. In addition, we tried to classify contents in more detail categories such as marketing features, environment, reputation, etc. In those phase, we used free ware software programs such as TM, KoNLP, ggplot2 and plyr packages in R project. As the result, we presented several useful visualization outputs like domain specific lexicons, volume and sentiment graphs, topic word cloud, heat maps, valence tree map, and other visualized images to provide vivid, full-colored examples using open library software packages of the R project. Business actors can quickly detect areas by a swift glance that are weak, strong, positive, negative, quiet or loud. Heat map is able to explain movement of sentiment or volume in categories and time matrix which shows density of color on time periods. Valence tree map, one of the most comprehensive and holistic visualization models, should be very helpful for analysts and decision makers to quickly understand the "big picture" business situation with a hierarchical structure since tree-map can present buzz volume and sentiment with a visualized result in a certain period. This case study offers real-world business insights from market sensing which would demonstrate to practical-minded business users how they can use these types of results for timely decision making in response to on-going changes in the market. We believe our approach can provide practical and reliable guide to opinion mining with visualized results that are immediately useful, not just in food industry but in other industries as well.