• Title/Summary/Keyword: Recommendation system

An Investigation on Expanding Co-occurrence Criteria in Association Rule Mining (연관규칙 마이닝에서의 동시성 기준 확장에 대한 연구)

  • Kim, Mi-Sung;Kim, Nam-Gyu;Ahn, Jae-Hyeon
    • Journal of Intelligence and Information Systems / v.18 no.1 / pp.23-38 / 2012
  • There is a large difference between purchasing patterns in an online shopping mall and in an offline market. This difference may be caused mainly by the difference in accessibility between online and offline markets: the interval between the initial purchasing decision and its realization tends to be relatively short in an online shopping mall, because a customer can place an order immediately. Because of this short interval between a purchasing decision and its realization, an online shopping mall transaction usually contains fewer items than an offline market transaction. In an offline market, customers usually keep several items in mind and buy them all at once a few days after deciding to buy them, instead of buying each item individually and immediately. By contrast, more than 70% of online shopping mall transactions contain only one item. This statistic implies that traditional data mining techniques cannot be directly applied to online market analysis, because hardly any association rules can survive at an acceptable level of support when there are so many null transactions. Most market basket analyses of online shopping mall transactions, therefore, have been performed by expanding the co-occurrence criterion of traditional association rule mining. While the traditional co-occurrence criterion defines items purchased in one transaction as concurrently purchased items, the expanded co-occurrence criterion regards items purchased by a customer during some predefined period (e.g., a day) as concurrently purchased items. In studies using expanded co-occurrence criteria, however, the criterion has been defined arbitrarily by researchers without any theoretical grounds or agreement. The lack of clear grounds for adopting a certain co-occurrence criterion degrades the reliability of the analytical results. Moreover, it is hard to derive new meaningful findings by combining the outcomes of previous individual studies. In this paper, we compare expanded co-occurrence criteria and propose a guideline for selecting an appropriate one. First of all, we compare the accuracy of association rules discovered under various co-occurrence criteria. Through this experiment, we expect to provide a guideline for selecting a co-occurrence criterion that corresponds to the purpose of the analysis. Additionally, we perform similar experiments with several groups of customers segmented by each customer's average duration between orders. Through this experiment, we attempt to discover the relationship between the optimal co-occurrence criterion and the customer's average duration between orders. Finally, through this series of experiments, we expect to provide basic guidelines for developing customized recommendation systems. Our experiments use a real dataset acquired from one of the largest internet shopping malls in Korea: 66,278 transactions of 3,847 customers conducted during the last two years. Overall results show that the accuracy of association rules for frequent shoppers (whose average duration between orders is relatively short) is higher than that for casual shoppers. In addition, we discover that for frequent shoppers the accuracy of association rules is very high when the co-occurrence criterion of the training set corresponds to that of the validation set (i.e., the target set). This implies that the co-occurrence criterion for frequent shoppers should be set according to the period of the application purpose. For example, an analyst should use a day as the co-occurrence criterion if he/she wants to offer a coupon valid only for a day to potential customers who will use the coupon. On the contrary, an analyst should use a month as the co-occurrence criterion if he/she wants to publish a coupon book that can be used for a month. In the case of casual shoppers, the accuracy of association rules appears not to be affected by the period of the application purpose. The accuracy of casual shoppers' association rules becomes higher as a longer co-occurrence criterion is adopted. This implies that an analyst should set the co-occurrence criterion as long as possible, regardless of the period of the application purpose.
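
A minimal sketch of the expanded co-occurrence criterion described above, assuming a transaction table with hypothetical customer_id, order_ts, and item_id columns: instead of treating each online order as one basket, all items a customer ordered within a chosen window (a day, a week, or a month) are pooled into one basket before any association-rule miner (e.g., Apriori) is applied.

```python
import pandas as pd

def build_baskets(transactions: pd.DataFrame, window: str = "D") -> pd.Series:
    """Pool items per customer per time window ('D' = day, 'W' = week, 'M' = month)."""
    baskets = transactions.groupby(
        ["customer_id", pd.Grouper(key="order_ts", freq=window)]
    )["item_id"].apply(set)
    # Single-item baskets cannot contribute to co-occurrence rules, so drop them.
    return baskets[baskets.apply(len) > 1]

# Compare rule sets mined under different co-occurrence criteria:
# df = pd.read_csv("transactions.csv", parse_dates=["order_ts"])
# for window in ["D", "W", "M"]:
#     baskets = build_baskets(df, window)
#     ...  # feed `baskets` into an Apriori implementation and compare rule accuracy
```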

The Jurisdictional Precedent Analysis of Medical Dispute in Dental Field (치과임상영역에서 발생된 의료분쟁의 판례분석)

  • Kwon, Byung-Ki;Ahn, Hyoung-Joon;Kang, Jin-Kyu;Kim, Chong-Youl;Choi, Jong-Hoon
    • Journal of Oral Medicine and Pain / v.31 no.4 / pp.283-296 / 2006
  • Along with the development of scientific technology, health care has grown remarkably, and as quality of life improves and interest in health increases, the demand for medical services is rapidly increasing. However, medical accidents and medical disputes are also increasing rapidly due to various factors such as people's growing awareness of their rights, lack of understanding of the nature of medical practice, excessive expectations of medical techniques, a commercialized medical supply system, moral degeneracy and unawareness of medical jurisprudence among doctors, a widespread trend of mutual distrust, and the lack of a systematized mechanism for resolving medical disputes. This study analysed 30 civil suit cases between 1994 and 2004, selected among medical dispute cases in the dental field whose judgements were collected from organizations related to dentistry and the Department of Oral Medicine, Yonsei University Dental Hospital. The following results were drawn from the analyses: 1. The distribution by year showed a rapid increase in medical disputes after the year 2000. 2. Among the types of medical dispute, suits associated with tooth extraction accounted for 36.7% of all cases. 3. As for the cause of medical dispute, discomfort and dissatisfaction with the treatment accounted for 36.7%, and death and permanent damage for 16.7% each. 4. Winning the suit, compulsory mediation and recommendation for settlement together accounted for 60.0% of the judgement results for the plaintiff. 5. By type of medical organization involved in the dispute, 60.0% were private dental clinics and 30.0% were university dental hospitals. 6. By level of trial, disputes that proceeded to a second or third trial accounted for 30.0%. 7. For the amount of damages claimed, claims between 50 million and 100 million won accounted for 36.7% and claims of more than 100 million won for 13.3%; for the amount awarded, judgements between 10 million and 30 million won accounted for 40.0% and those of more than 100 million won for 6.7%. 8. As for the number of dentists involved in the suit, 26.7% of cases involved two or more dentists. 9. As for the time spent until judgement, 46.7% of cases took 11 to 20 months and 36.7% took 21 to 30 months. 10. Regarding medical malpractice, 46.7% of cases were judged to involve malpractice, and 70% of the cases underwent medical judgement or verification by specialists during the suit. 11. In the cases lost by doctors (18 cases), 72.2% were due to violation of the duty of care in practice and 16.7% were due to failure to explain to the patient. Medical disputes occurring in the field of dentistry usually involve relatively low-risk cases. Hence, the importance of explanation to the patient is emphasized, and since levels of patient satisfaction are subjective, improvement of the dentist-patient relationship and recovery of autonomy within the dental profession are essential in addition to the reduction of technical malpractice. Moreover, measures for managing medical disputes should be established by supplementing the current doctors' and hospitals' medical malpractice insurance, which is operated irrationally, and by establishing a system in which education and consultation on medical disputes, led by dental clinicians and academic scholars, are accessible.

An Intelligent Decision Support System for Selecting Promising Technologies for R&D based on Time-series Patent Analysis (R&D 기술 선정을 위한 시계열 특허 분석 기반 지능형 의사결정지원시스템)

  • Lee, Choongseok;Lee, Suk Joo;Choi, Byounggu
    • Journal of Intelligence and Information Systems / v.18 no.3 / pp.79-96 / 2012
  • As the pace of competition dramatically accelerates and the complexity of change grows, a variety of research has been conducted to improve firms' short-term performance and to enhance their long-term survival. In particular, researchers and practitioners have paid attention to identifying promising technologies that give a firm competitive advantage. Discovery of promising technologies depends on how a firm evaluates the value of technologies, and thus many evaluation methods have been proposed. Approaches based on experts' opinions have been widely accepted for predicting the value of technologies. Whereas this approach provides in-depth analysis and ensures the validity of the results, it is usually cost- and time-ineffective and is limited to qualitative evaluation. Considerable studies have attempted to forecast the value of technology by using patent information to overcome the limitations of the experts' opinion based approach. Patent-based technology evaluation has served as a valuable assessment approach for technological forecasting because a patent contains a full and practical description of a technology in a uniform structure. Furthermore, it provides information that is not divulged in any other source. Although the patent information based approach has contributed to our understanding of the prediction of promising technologies, it has some limitations because predictions have been made based only on past patent information, and the interpretations of patent analyses are not consistent. In order to fill this gap, this study proposes a technology forecasting methodology that integrates the patent information approach with an artificial intelligence method. The methodology consists of three modules: evaluation of the promisingness of technologies, implementation of a technology value prediction model, and recommendation of promising technologies. In the first module, the promisingness of technologies is evaluated from three different and complementary dimensions: impact, fusion, and diffusion. The impact of technologies refers to their influence on the development and improvement of future technologies, and is also clearly associated with their monetary value. The fusion of technologies denotes the extent to which a technology fuses different technologies, and represents the breadth of search underlying the technology. The fusion of technologies can be calculated per technology or per patent, so this study measures two types of fusion index: a fusion index per technology and a fusion index per patent. Finally, the diffusion of technologies denotes their degree of applicability across scientific and technological fields. In the same vein, a diffusion index per technology and a diffusion index per patent are considered respectively. In the second module, the technology value prediction model is implemented using an artificial intelligence method. This study uses the values of the five indexes (i.e., impact index, fusion index per technology, fusion index per patent, diffusion index per technology and diffusion index per patent) at different times (e.g., t-n, t-n-1, t-n-2, ...) as input variables. The output variables are the values of the five indexes at time t, which are used for learning. The learning method adopted in this study is the backpropagation algorithm. In the third module, this study recommends final promising technologies based on the analytic hierarchy process (AHP). AHP provides the relative importance of each index, leading to a final promisingness index for each technology. The applicability of the proposed methodology is tested using U.S. patents in international patent class G06F (i.e., electronic digital data processing) from 2000 to 2008. The results show that the mean absolute error of predictions produced by the proposed methodology is lower than that produced by multiple regression analysis for the fusion indexes. In the other cases, however, the mean absolute error of the proposed methodology is slightly higher than that of multiple regression analysis. These unexpected results may be explained, in part, by the small number of patents. Since this study only uses patent data in class G06F, the number of sample patents is relatively small, leading to incomplete learning for the complex artificial intelligence structure. In addition, the fusion index per technology and the impact index are found to be important criteria for predicting promising technologies. This study attempts to extend existing knowledge by proposing a new methodology for predicting technology value that integrates patent information analysis with an artificial intelligence network. It can help managers who plan technology development and policy makers who implement technology policy by providing a quantitative prediction methodology. In addition, this study could help other researchers by providing a deeper understanding of the complex field of technological forecasting.
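
As a rough illustration of the second module, the sketch below trains a small feed-forward network with backpropagation to map lagged values of the five indexes onto their values at time t. The synthetic arrays, layer sizes, and train/test split are illustrative assumptions, not the study's actual configuration.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.metrics import mean_absolute_error

rng = np.random.default_rng(0)
n_tech, n_lags, n_idx = 200, 3, 5           # technologies, lag steps, five indexes
X = rng.random((n_tech, n_lags * n_idx))    # index values at t-n, ..., t-1 (inputs)
y = rng.random((n_tech, n_idx))             # index values at time t (targets)

# Feed-forward network trained by backpropagation (cf. the second module above).
model = MLPRegressor(hidden_layer_sizes=(32,), max_iter=2000, random_state=0)
model.fit(X[:150], y[:150])
pred = model.predict(X[150:])
print("MAE:", mean_absolute_error(y[150:], pred))   # compare against a regression baseline
```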

A Study on the Expressed Desire at Discharge of Patients to Use Home Nursing and Affecting Factors of the Desire (퇴원환자의 가정간호 이용의사와 관련 요인)

  • Lee, Ji-Hyun;Lee, Young-Eun;Lee, Myung-Hwa;Sohn, Sue-Kyung
    • The Korean Journal of Rehabilitation Nursing / v.2 no.2 / pp.257-270 / 1999
  • The purpose of this study is to investigate factors related to the intent to use home nursing among chronic disease patients discharged from a university hospital. For this purpose, the study selected 153 patients who had been hospitalized at and discharged from K university hospital with diagnoses of cancer, hypertension, diabetes and cerebrovascular accident, and conducted interviews with them and reviews of their medical records. The direct-interview survey and medical record review were conducted from June 28 to Aug. 30, 1998. Frequencies and mean values were computed to describe the characteristics of the study subjects, and the χ²-test, t-test, factor analysis and multiple logistic regression analysis were applied for the analysis of the data. The following results were obtained. 1) Regarding the characteristics of the subjects, men and women accounted for 58.8% and 41.2%, respectively. The subjects were 41.3 years old on average, and a monthly average income of 0.99 million won or below was the most common at 34.6%. Among the total, 87.6% resided in cities and 12.4% in counties. Most left the hospital with a diagnosis of cancer at 51.6%, followed by hypertension at 24.2%, diabetes at 13.7% and cerebrovascular accident at 7.2%. 2) 93.5% of the selected patients had the intent to use home nursing and 6.5% did not. Among those patients having the intent, 85.6% had the intent to pay for home nursing and 14.4% did not. The subjects expected the nursing fee to be 9,143 won on average, and 47.7% of them preferred national authorities as the main providers. 86.3% of the subjects thought that the main advantage of a home nursing service was making it possible to learn nursing methods at home, thereby contributing to improving the ability of patients and their families to solve health problems. 3) Relations between the intent to use home nursing and characteristics of the subjects such as socio-demographic, home environment, disease and physical function characteristics did not show statistically significant differences. Compared to those who had no intent to use home nursing, the group having the intent had more cases of male patients, age 39 or below, residence in cities, five family members or more, no home nursing caregiver available, discharge from a non-hospitalization building, disease duration of five months or less, hospitalization for ten days or more, no hospitalization within the recent month, two or more hospitalizations, discharge with no demand for special treatment, operations undergone, poor results of treatment, discharge with a demand for rehabilitation services, physical disablement and a high score on evaluation of daily life. 4) Among those patients having the intent to use home nursing, 47.6% demanded technical nursing and 55.9% supportive nursing. As for technical nursing, 'injection into a blood vessel' and 'treating pustules and teaching basic prevention methods' topped the list at 57.4% each. Among demands for supportive nursing, 'observing patients' status and referring them to hospitals or community resources as available, if necessary' was the most common at 59.5%. Regarding the intent to pay for home nursing, 39.2% of those patients wishing to use the service responded that they would pay for technical services and 20.2% for supportive services. In detail, 70.0% wanted to pay for the service described as 'injection into a blood vessel', the highest among the former, and 30.7% for the service described as 'teaching exercises needed to make the patient's body move', the highest among the latter. When this was analyzed in terms of the relation between need (the need for home nursing) and demand (the intent to pay for home nursing), the ratio of demand to need was found to be two to three times higher in technical nursing (0.82) than in supportive nursing (0.35). Among aspects of technical nursing, muscle injection (1.26, 1st rank) showed the highest ratio, while among aspects of supportive nursing, the service described as 'teaching exercises needed for making patients move their bodies normally' (0.58, 1st rank) was highest. 5) Factors I (satisfaction with hospital services), II (recognition of disease state), III (economy) and IV (period of disease) accounted for 34.4%, 13.8%, 11.9% and 9.2%, respectively, of the factors related to the subjects' intent to use home nursing, totaling 59.3%. In conclusion, most chronic disease patients have the intent to use hospital-based home nursing, and satisfaction with hospital services is the factor most affecting that intent. Thus a post-discharge management system is needed to continue providing health management to these patients after they leave the hospital. Further, supportive services should be provided so that those who are satisfied with hospital services can return to their communities and live independent lives. Based on these results, the researcher makes the following recommendations. 1) Because home nursing is increasingly needed due to the sharp increase in chronic disease patients and elderly people, related rules and regulations should be enacted and implemented. 2) Hospital nurses specializing in home nursing should be trained.

Dressing Effect of Phosphorus Fertilizer on the Growth of Soil Improving Species (비료목생장(肥料木生長)에 미치는 인산비료(燐酸肥料)의 시비효과(施肥效果))

  • Ma, Sang Kyu
    • Journal of Korean Society of Forest Science / v.45 no.1 / pp.26-36 / 1979
  • Through several trials carried out to establish a fertilization plan for soil improving species, the following results were obtained. 1. In the non-phosphorus dressing plots, soil improving species had a very poor survival ratio and tended to die off step by step. The reason may be that roots cannot form nodules without phosphorus dressing, so the trees could not absorb the nitrogen fixed by the nodules; root competition with weedy species for nutrients and oxygen in the soil could also be a reason when planting in heavily weed-rooted sites. 2. Triple superphosphate, fused magnesium phosphate and fused superphosphate showed remarkable effects on the growth of soil improving species within the third year after planting, but the Korean tablet fertilizer (9-12-4) for forest use produced a considerably lower effect in comparison with the above powder and granular type phosphorus fertilizers. 3. In the case of tablet type fertilizer, tree roots have very little phosphorus absorbing surface, because phosphorus can be utilized only from the tablet surface and roots cannot penetrate into the tablet. This may be the reason for the poor dressing response of the tablet fertilizer, but the tablet fertilizer can continue to be utilized over many years, as in Photo 6. So the recommendation for tablet fertilizer is to apply a large amount in the planting year, so that tree roots have a better chance of absorbing the phosphorus, which can keep survival high and supply phosphorus over many years without further dressing. 4. The granular and powder type phosphates can develop a dense root mat, as in Photo 8, because they provide a large surface for absorbing phosphorus, and weedy roots can approach the nodules to take up nitrogen. So these types seem to be more effective than the tablet type for controlling soil movement, with a stem weight of 200 g per meter (1 meter long × 0.1 m width). When lime was added, no effect could be found; rather, it gave a negative effect. So a Lespedeza seed sowing and phosphorus dressing system seems to be a very reasonable method for supplying raw material for basket making, fodder and fuel wood. 5. On unproductive and dry soil with phosphorus fertilizer, Robinia pseudoacacia and Alnus firma can grow more than 2.3 m in height by the third year, and Alnus inokumae shows rapid height growth of more than 1.8 m by the second year. Given growth like the above examples, short-rotation management becomes possible, and rapid covering of eroded land can be achieved with soil improving species and phosphorus fertilizer. 6. In the Lespedeza sowing plot with 40 g of fused magnesium phosphate dressing per meter on eroded and unproductive forest soil, Lespedeza completely covered this poor land and produced the green. 7. Fused magnesium phosphate and fused superphosphate are good fertilizers for soil improving species, and dressing more than 30 g per seedling is a recommendable amount.

Visualizing the Results of Opinion Mining from Social Media Contents: Case Study of a Noodle Company (소셜미디어 콘텐츠의 오피니언 마이닝결과 시각화: N라면 사례 분석 연구)

  • Kim, Yoosin;Kwon, Do Young;Jeong, Seung Ryul
    • Journal of Intelligence and Information Systems / v.20 no.4 / pp.89-105 / 2014
  • Since the emergence of the Internet, social media with highly interactive Web 2.0 applications has provided a very user-friendly means for consumers and companies to communicate with each other. Users routinely publish content involving their opinions and interests in social media such as blogs, forums, chatting rooms, and discussion boards, and this content is released in real time on the Internet. For that reason, many researchers and marketers regard social media content as a source of information for business analytics to develop business insights, and many studies have reported results on mining business intelligence from social media content. In particular, opinion mining and sentiment analysis, as techniques to extract, classify, understand, and assess the opinions implicit in text content, are frequently applied to social media content analysis because they emphasize determining sentiment polarity and extracting authors' opinions. A number of frameworks, methods, techniques and tools have been presented by these researchers. However, we have found some weaknesses in these methods, which are often technically complicated and not sufficiently user-friendly for supporting business decisions and planning. In this study, we attempted to formulate a more comprehensive and practical approach to conducting opinion mining with visual deliverables. First, we describe the entire cycle of practical opinion mining using social media content, from the initial data-gathering stage to the final presentation session. Our proposed approach to opinion mining consists of four phases: collecting, qualifying, analyzing, and visualizing. In the first phase, analysts have to choose the target social media. Each target medium requires a different way for analysts to gain access: open APIs, search tools, DB2DB interfaces, content purchasing, and so on. The second phase is pre-processing to generate useful material for meaningful analysis. If garbage data are not removed, the results of social media analysis will not provide meaningful and useful business insights; to clean social media data, natural language processing techniques should be applied. The next step is the opinion mining phase, where the cleansed social media content set is analyzed. The qualified data set includes not only user-generated content but also content identification information such as creation date, author name, user id, content id, hit counts, review or reply, favorite, etc. Depending on the purpose of the analysis, researchers or data analysts can select a suitable mining tool. Topic extraction and buzz analysis are usually related to market trend analysis, while sentiment analysis is utilized to conduct reputation analysis. There are also various applications, such as stock prediction, product recommendation, sales forecasting, and so on. The last phase is visualization and presentation of the analysis results. The major focus and purpose of this phase are to explain the results of the analysis and help users comprehend their meaning. Therefore, to the extent possible, deliverables from this phase should be made simple, clear and easy to understand, rather than complex and flashy. To illustrate our approach, we conducted a case study on a leading Korean instant noodle company. We targeted the leading company, NS Food, with a 66.5% market share; the firm has kept the No. 1 position in the Korean "Ramen" business for several decades. We collected a total of 11,869 pieces of content including blogs, forum contents and news articles. After collecting the social media content data, we generated instant noodle business-specific language resources for data manipulation and analysis using natural language processing. In addition, we classified content into more detailed categories such as marketing features, environment, reputation, etc. In this phase, we used freeware such as the TM, KoNLP, ggplot2 and plyr packages of the R project. As a result, we present several useful visualization outputs such as domain-specific lexicons, volume and sentiment graphs, topic word clouds, heat maps, valence tree maps, and other visualized images, providing vivid, full-colored examples built with open library software packages of the R project. Business actors can detect at a glance which areas are weak, strong, positive, negative, quiet or loud. The heat map explains the movement of sentiment or volume as a category-by-time matrix, showing density of color over time periods. The valence tree map, one of the most comprehensive and holistic visualization models, should be very helpful for analysts and decision makers to quickly understand the "big picture" business situation within a hierarchical structure, since a tree map can present buzz volume and sentiment in a visualized result for a certain period. This case study offers real-world business insights from market sensing which demonstrate to practical-minded business users how they can use these types of results for timely decision making in response to on-going changes in the market. We believe our approach can provide a practical and reliable guide to opinion mining with visualized results that are immediately useful, not just in the food industry but in other industries as well.
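
The study itself performed the analysis in R (the TM, KoNLP, ggplot2 and plyr packages); the sketch below only illustrates the analyze-and-visualize phases in Python under assumptions: a hypothetical domain lexicon scores each post, and a category-by-month heat map of mean sentiment is drawn, loosely mirroring the paper's heat-map output.

```python
import pandas as pd
import matplotlib.pyplot as plt

POSITIVE = {"delicious", "tasty", "love"}     # hypothetical domain lexicon terms
NEGATIVE = {"salty", "expensive", "bland"}

def score(text: str) -> int:
    """Count positive minus negative lexicon hits in a post."""
    tokens = text.lower().split()
    return sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)

# posts: one row per social-media post with 'month', 'category', 'text' columns (assumed schema)
posts = pd.DataFrame({
    "month": ["2014-01", "2014-01", "2014-02"],
    "category": ["taste", "price", "taste"],
    "text": ["so delicious", "too expensive", "a bit salty but tasty"],
})
posts["sentiment"] = posts["text"].map(score)

# Heat map of mean sentiment per category over time (cf. the heat maps described above).
heat = posts.pivot_table(index="category", columns="month", values="sentiment", aggfunc="mean")
plt.imshow(heat.fillna(0), cmap="RdYlGn", aspect="auto")
plt.xticks(range(len(heat.columns)), heat.columns)
plt.yticks(range(len(heat.index)), heat.index)
plt.colorbar(label="mean sentiment")
plt.show()
```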

Business Application of Convolutional Neural Networks for Apparel Classification Using Runway Image (합성곱 신경망의 비지니스 응용: 런웨이 이미지를 사용한 의류 분류를 중심으로)

  • Seo, Yian;Shin, Kyung-shik
    • Journal of Intelligence and Information Systems / v.24 no.3 / pp.1-19 / 2018
  • A large amount of data is now available for research and business sectors to extract knowledge from. This data can be in the form of unstructured data such as audio, text, and image data, and can be analyzed by deep learning methodology. Deep learning is now widely used for various estimation, classification, and prediction problems. In particular, the fashion business adopts deep learning techniques for apparel recognition, apparel search and retrieval engines, and automatic product recommendation. The core model of these applications is image classification using Convolutional Neural Networks (CNN). A CNN is made up of neurons which learn parameters such as weights as inputs pass through and reach the outputs. A CNN has a layer structure that is best suited for image classification, as it is composed of convolutional layers for generating feature maps, pooling layers for reducing the dimensionality of feature maps, and fully-connected layers for classifying the extracted features. However, most classification models have been trained using online product images, which are taken under controlled conditions, such as the apparel image itself or a professional model wearing the apparel. Such images may not be an effective way to train a classification model when one wants to classify street fashion or walking images, which are taken in uncontrolled situations and involve people's movement and unexpected poses. Therefore, we propose to train the model with a runway apparel image dataset which captures mobility. This allows the classification model to be trained with far more variable data and enhances adaptation to diverse query images. To achieve both convergence and generalization of the model, we apply transfer learning to our training network. As transfer learning in CNN is composed of pre-training and fine-tuning stages, we divide the training step into two. First, we pre-train our architecture with a large-scale dataset, the ImageNet dataset, which consists of 1.2 million images in 1000 categories including animals, plants, activities, materials, instrumentations, scenes, and foods. We use GoogLeNet as our main architecture, as it has achieved great accuracy with efficiency in the ImageNet Large Scale Visual Recognition Challenge (ILSVRC). Second, we fine-tune the network with our own runway image dataset. For the runway image dataset, we could not find any previously and publicly available dataset, so we collected the dataset from Google Image Search, obtaining 2,426 images of 32 major fashion brands including Anna Molinari, Balenciaga, Balmain, Brioni, Burberry, Celine, Chanel, Chloe, Christian Dior, Cividini, Dolce and Gabbana, Emilio Pucci, Ermenegildo, Fendi, Giuliana Teso, Gucci, Issey Miyake, Kenzo, Leonard, Louis Vuitton, Marc Jacobs, Marni, Max Mara, Missoni, Moschino, Ralph Lauren, Roberto Cavalli, Sonia Rykiel, Stella McCartney, Valentino, Versace, and Yves Saint Laurent. We perform 10-fold experiments to account for the random generation of training data, and our proposed model achieves an accuracy of 67.2% on the final test. Our research suggests several advantages over previous related studies: to the best of our knowledge, there have not been any previous studies which trained a network for apparel image classification based on a runway image dataset. We suggest the idea of training the model with images capturing all possible postures, which we denote as mobility, by using our own runway apparel image dataset. Moreover, by applying transfer learning and using the checkpoint and parameters provided by TensorFlow Slim, we could save time spent on training the classification model, taking about 6 minutes per experiment to train the classifier. This model can be used in many business applications where the query image can be a runway image, product image, or street fashion image. To be specific, a runway query image can be used in a mobile application service during fashion week to facilitate brand search, a street style query image can be classified during fashion editorial tasks to classify and label the brand or style, and a website query image can be processed by an e-commerce multi-complex service providing item information or recommending similar items.
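
A hedged sketch of the two-stage transfer-learning setup described above, using Keras' ImageNet-pre-trained InceptionV3 as a stand-in for the GoogLeNet / TensorFlow Slim checkpoint used in the paper; the directory layout, image size, and hyperparameters are assumptions.

```python
import tensorflow as tf

NUM_BRANDS = 32
base = tf.keras.applications.InceptionV3(include_top=False, weights="imagenet",
                                          pooling="avg", input_shape=(299, 299, 3))
base.trainable = False                      # stage 1: keep pre-trained features frozen

model = tf.keras.Sequential([
    base,
    tf.keras.layers.Dense(NUM_BRANDS, activation="softmax"),  # new 32-way brand classifier
])
model.compile(optimizer=tf.keras.optimizers.Adam(1e-3),
              loss="sparse_categorical_crossentropy", metrics=["accuracy"])

train_ds = tf.keras.utils.image_dataset_from_directory(
    "runway_images/", image_size=(299, 299), batch_size=32)  # hypothetical folder layout
model.fit(train_ds, epochs=5)

# Stage 2: unfreeze the backbone and fine-tune end-to-end at a lower learning rate.
base.trainable = True
model.compile(optimizer=tf.keras.optimizers.Adam(1e-5),
              loss="sparse_categorical_crossentropy", metrics=["accuracy"])
model.fit(train_ds, epochs=5)
```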

Context Sharing Framework Based on Time Dependent Metadata for Social News Service (소셜 뉴스를 위한 시간 종속적인 메타데이터 기반의 컨텍스트 공유 프레임워크)

  • Ga, Myung-Hyun;Oh, Kyeong-Jin;Hong, Myung-Duk;Jo, Geun-Sik
    • Journal of Intelligence and Information Systems / v.19 no.4 / pp.39-53 / 2013
  • The emergence of Internet technology and SNS has increased the flow of information and has changed the way people communicate from one-way to two-way communication. Users not only consume and share information; they can also create it and share it among their friends across social network services. This has also turned social media into one of the most important communication tools, which includes Social TV. Social TV is a form in which people can watch a TV program and at the same time share information or its content with friends through social media. Social news is becoming popular and is also known as a participatory social medium. It influences user interest through the Internet by representing social issues, and builds news credibility based on users' reputation. However, conventional news service platforms focus only on the news recommendation domain. Recent developments in SNS have changed this landscape, allowing users to share and disseminate the news, but conventional platforms do not provide any special way for news to be shared. Currently, social news services only allow users to access the entire news item; they cannot access the parts of the content related to their interests. For example, if a user is interested in only part of a news item and wants to share that content, it is still hard to do so; in the worst case, users might understand the news in a different context. To solve this, a social news service must provide a method for supplying additional information. For example, Yovisto, an academic video search service, provides time-dependent metadata for videos. Users can search and watch part of a video's content according to the time-dependent metadata, and they can also share the content with a friend on social media. Yovisto applies a method that divides or synchronizes a video whenever the presentation slide changes to another page. However, we are not able to employ this method on news videos, since a news video does not incorporate any slide presentation. A segmentation method is therefore required to separate the news video and to create time-dependent metadata. In this paper, a time-dependent-metadata-based framework is proposed to segment news content and to provide time-dependent metadata so that users can use context information to communicate with their friends. The transcript of the news is divided using the proposed story segmentation method. We provide a tag to represent the entire content of the news, and sub tags to indicate the segmented news, which include the starting time of each segment. The time-dependent metadata helps users track the news information. It also allows them to leave a comment on each segment of the news, and to share the news based on the time metadata, either as segmented news or as a whole. Therefore, it helps users understand the shared news. To demonstrate the performance, we evaluate the story segmentation accuracy and the tag generation. For this purpose, we measured the accuracy of the story segmentation through semantic similarity and compared it to a benchmark algorithm. Experimental results show that the proposed method outperforms benchmark algorithms in terms of story segmentation accuracy. It is important to note that sub tag accuracy is the most important part of the proposed framework for sharing a specific news context with others. To extract more accurate sub tags, we created a stop word list of words not related to the content of the news, such as the names of the anchor or reporter, and applied it to the framework. We analyzed the accuracy of the tags and sub tags which represent the context of the news. From the analysis, the proposed framework appears helpful for users to share their opinions with context information on social media and social news.
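
The framework's exact segmentation algorithm is not reproduced here; the sketch below only illustrates the general idea of similarity-based story segmentation with time-stamped sub tags, using TF-IDF cosine similarity between adjacent transcript sentences. The vectorizer, threshold, and stop-word examples (anchor/reporter terms) are assumptions.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def segment_transcript(sentences, start_times, threshold=0.1,
                       stop_words=("anchor", "reporter")):          # hypothetical stop words
    """Return (segment start time, sentence indices) pairs for a news transcript."""
    tfidf = TfidfVectorizer(stop_words=list(stop_words)).fit_transform(sentences)
    segments, current = [], [0]
    for i in range(1, len(sentences)):
        sim = cosine_similarity(tfidf[i - 1], tfidf[i])[0, 0]
        if sim < threshold:                  # low similarity -> assume a topic shift
            segments.append((start_times[current[0]], current))
            current = [i]
        else:
            current.append(i)
    segments.append((start_times[current[0]], current))
    return segments                          # each segment gets its own time-stamped sub tag
```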

Machine learning-based corporate default risk prediction model verification and policy recommendation: Focusing on improvement through stacking ensemble model (머신러닝 기반 기업부도위험 예측모델 검증 및 정책적 제언: 스태킹 앙상블 모델을 통한 개선을 중심으로)

  • Eom, Haneul;Kim, Jaeseong;Choi, Sangok
    • Journal of Intelligence and Information Systems / v.26 no.2 / pp.105-129 / 2020
  • This study uses corporate data from 2012 to 2018, when K-IFRS was applied in earnest, to predict default risks. The data used in the analysis totaled 10,545 rows and 160 columns, including 38 from the statement of financial position, 26 from the statement of comprehensive income, 11 from the statement of cash flows, and 76 financial ratio indexes. Unlike most prior studies, which used default events as the basis for learning about default risk, this study calculated default risk using the market capitalization and stock price volatility of each company based on the Merton model. Through this, it was able to solve the problem of data imbalance due to the scarcity of default events, which had been pointed out as a limitation of the existing methodology, as well as the problem of reflecting the differences in default risk that exist among ordinary companies. Because learning was conducted using only corporate information that is also available for unlisted companies, the default risks of unlisted companies without stock price information can be appropriately derived. The model can therefore provide stable default risk assessment services to unlisted companies, such as small and medium-sized companies and startups, whose default risk is difficult to determine with traditional credit rating models. Although there have recently been active studies predicting corporate default risks using machine learning, model bias issues exist because most studies make predictions based on a single model. A stable and reliable valuation methodology is required for the calculation of default risk, given that an entity's default risk information is very widely utilized in the market and sensitivity to differences in default risk is high. Strict standards are also required for the calculation methods. The credit rating method stipulated by the Financial Services Commission in the Financial Investment Regulations calls for the preparation of evaluation methods, including verification of their adequacy, in consideration of past statistical data and experience on credit ratings and changes in future market conditions. This study reduced individual models' bias by utilizing stacking ensemble techniques that synthesize various machine learning models. This allows us to capture complex nonlinear relationships between default risk and various corporate information, and to maximize the advantages of machine learning-based default risk prediction models that take less time to calculate. To produce the forecasts of each sub-model used as input data for the Stacking Ensemble model, the training data were divided into seven pieces, and the sub-models were trained on the divided sets to produce forecasts. To compare the predictive power of the Stacking Ensemble model, Random Forest, MLP, and CNN models were trained with the full training data, and then the predictive power of each model was verified on the test set. The analysis showed that the Stacking Ensemble model exceeded the predictive power of the Random Forest model, which had the best performance among the single models. Next, to check for statistically significant differences between the Stacking Ensemble model and the forecasts of each individual model, pairs between the Stacking Ensemble model and each individual model were constructed. Because the Shapiro-Wilk normality test showed that none of the pairs followed normality, we used the nonparametric Wilcoxon rank-sum test to check whether the two sets of model forecasts making up each pair showed statistically significant differences. The analysis showed that the forecasts of the Stacking Ensemble model differed in a statistically significant way from those of the MLP model and the CNN model. In addition, this study provides a methodology that allows existing credit rating agencies to apply machine learning-based bankruptcy risk prediction methodologies, given that traditional credit rating models can also be included as sub-models when calculating the final default probability. The stacking ensemble techniques proposed in this study can also help designs meet the requirements of the Financial Investment Business Regulations through the combination of various sub-models. We hope that this research will be used as a resource to increase practical adoption by overcoming and improving the limitations of existing machine learning-based models.
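
A minimal sketch of the stacking setup, assuming a feature matrix X_train (financial-statement and ratio columns) and a Merton-based risk target y_train: out-of-fold forecasts from sub-models feed a meta-learner. Here only a random forest and an MLP stand in for the paper's sub-models (which also included a CNN), and cv=7 mirrors the seven-way split described above.

```python
from sklearn.ensemble import RandomForestRegressor, StackingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.neural_network import MLPRegressor

stack = StackingRegressor(
    estimators=[
        ("rf", RandomForestRegressor(n_estimators=300, random_state=0)),
        ("mlp", MLPRegressor(hidden_layer_sizes=(64, 32), max_iter=1000, random_state=0)),
    ],
    final_estimator=LinearRegression(),   # meta-model combining the sub-model forecasts
    cv=7,                                 # out-of-fold forecasts from a 7-way split
)
# stack.fit(X_train, y_train)             # y_train: Merton-model default risk
# risk = stack.predict(X_test)
```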

Analysis of shopping website visit types and shopping pattern (쇼핑 웹사이트 탐색 유형과 방문 패턴 분석)

  • Choi, Kyungbin;Nam, Kihwan
    • Journal of Intelligence and Information Systems / v.25 no.1 / pp.85-107 / 2019
  • Online consumers browse products belonging to a particular product line or brand for purchase, or simply navigate widely without making a purchase. Research on the behavior and purchases of online consumers has progressed steadily, and related services and applications based on consumer behavior data have been developed in practice. In recent years, customization strategies and recommendation systems have been utilized thanks to the development of big data technology, and attempts are being made to optimize users' shopping experience. Even so, it is very unlikely that online consumers who visit a website actually proceed to the purchase stage. This is because online consumers do not visit a website only to purchase products; they use and browse websites differently according to their shopping motives and purposes. Therefore, it is important to analyze the various types of visits as well as visits that lead to purchase, which is essential for understanding the behavior of online consumers. In this study, we performed a clustering analysis of sessions based on the click stream data of an e-commerce company in order to explain the diversity and complexity of online consumers' search behavior and to typify that behavior. For the analysis, we converted more than 8 million page-level data points into visit-level sessions, resulting in a total of over 500,000 website visit sessions. For each visit session, 12 characteristics such as page views, duration, search diversity, and page type concentration were extracted for the clustering analysis. Considering the size of the data set, we performed the analysis using the Mini-Batch K-means algorithm, which has advantages in terms of learning speed and efficiency while maintaining clustering performance similar to that of the K-means algorithm. The optimal number of clusters was found to be four, and differences in session characteristics and purchase rates were identified for each cluster. An online consumer visits a website several times, learns about the product, and then decides on a purchase. In order to analyze the purchasing process over several visits, we constructed consumers' visit sequence data based on the navigation patterns derived from the clustering analysis. The visit sequence data include the series of visits up to the point at which one purchase is made, and the items constituting one sequence are the cluster labels derived above. We separately constructed sequence data for consumers who made purchases and for consumers who only explored products without making purchases during the same period of time. Sequential pattern mining was then applied to extract frequent patterns from each sequence data set. The minimum support was set to 10%, and the frequent patterns consist of sequences of cluster labels. While there are patterns common to both sequence data sets, there are also frequent patterns derived from only one of them. Through comparative analysis of the extracted frequent patterns, we found that consumers who made purchases showed a visiting pattern of repeatedly searching for a specific product before deciding to purchase it. The implication of this study is that we analyze the search types of online consumers using large-scale click stream data and analyze their patterns to explain the purchasing process from a data-driven point of view. Most studies on the typology of online consumers have focused on the characteristics of the types and on which factors are key in distinguishing them. In this study, we carried out an analysis to typify the behavior of online consumers, and further analyzed the order in which those types are organized into a series of search patterns. In addition, online retailers will be able to improve their purchase conversion through marketing strategies and recommendations tailored to the various visit types, and will be able to evaluate the effect of such strategies through changes in consumers' visit patterns.
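
A minimal sketch of the session-clustering step, assuming a pre-built matrix of the 12 session-level features: Mini-Batch K-means assigns each visit session to one of four visit types, and the resulting labels, ordered per customer, can then be fed to a sequential pattern miner (e.g., PrefixSpan) with 10% minimum support, as described above.

```python
from sklearn.cluster import MiniBatchKMeans
from sklearn.preprocessing import StandardScaler

def cluster_sessions(session_features, k=4, batch_size=10_000, seed=0):
    """Cluster visit sessions into k visit types (k=4 follows the study's result)."""
    X = StandardScaler().fit_transform(session_features)   # 12 behavioural features per session
    km = MiniBatchKMeans(n_clusters=k, batch_size=batch_size, random_state=seed)
    labels = km.fit_predict(X)                              # one visit-type label per session
    return km, labels

# The labels, ordered by customer and visit time, form the visit sequences that are
# subsequently mined for frequent purchase-path patterns.
```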