• Title/Summary/Keyword: Analyzing Performance of Data


Study on Importance-Performance Analysis Regarding Selection Attributes of Rice-Convenience Foods (쌀을 이용한 편의식품의 선택속성에 관한 중요도-수행도 분석(IPA))

  • Park, Hyojin;Oh, Narae;Jang, Jin-A;Yoon, Hei Ryeo;Cho, Mi Sook
    • Journal of the Korean Society of Food Science and Nutrition
    • /
    • v.45 no.4
    • /
    • pp.593-601
    • /
    • 2016
  • This study was carried out to establish an effective marketing strategy based on Importance-Performance Analysis (IPA) of rice-convenience foods. IPA is one of the most efficient and simple methods to evaluate product quality. Data were collected from 652 people (320 males and 332 females) and analyzed with SPSS 19.0. Subjects consumed rice-convenience foods as a snack substitute (19.3%), breakfast (20.7%), lunch (37.4%), dinner (15.2%), and late-night meal (7.4%). The purposes for consuming rice-convenience foods were as follows: a light meal (34.8%), lack of time to prepare a meal (42.2%), no favorite restaurant nearby (2.3%), saving money (3.4%), and outdoor activities (9.7%). All attributes of rice-convenience foods were categorized into intrinsic and extrinsic properties. As a result of factor analysis, health, sensibility, and diversity factors were extracted from the intrinsic property, and dependence and appearance factors were drawn from the extrinsic property. In analyzing the differences between importance and performance, there were significant differences: 16 items in the intrinsic property (P<0.01) and 10 items in the extrinsic property (P<0.001). The IPA matrix is composed of four quadrants, each representing a different strategy: the first, 'keep up the good work'; the second, 'possible overkill'; the third, 'low priority for management'; and the fourth, 'concentrate management'. The factors of rice-convenience foods positioned in the fourth quadrant were 'safety (from food additives, etc.)' and 'price' in the intrinsic property and 'nutrition label' and 'safety of packaging material' in the extrinsic property; these need to be improved immediately. In this study, rice-convenience food factors for continuous maintenance and concentrated improvement were compared by IPA. Based upon these results, it is necessary to develop methods that make efficient use of limited resources, along with practical marketing strategies.
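The quadrant logic of IPA is simple enough to show in a few lines. Below is a minimal sketch on hypothetical attribute scores (the attribute names and Likert means are illustrative, not the paper's data): each attribute is placed into one of the four quadrants by comparing its mean importance and mean performance against the grand means.

```python
# Minimal IPA sketch on hypothetical data: attributes are assigned to the
# four quadrants by comparing each attribute's mean importance and mean
# performance against the grand means, as in a standard IPA matrix.
import pandas as pd

ipa = pd.DataFrame({
    "attribute":   ["safety", "price", "taste", "packaging"],
    "importance":  [4.6, 4.4, 4.2, 3.1],   # mean 5-point Likert scores
    "performance": [3.2, 3.0, 4.1, 3.8],
})

imp_mean = ipa["importance"].mean()
perf_mean = ipa["performance"].mean()

def quadrant(row):
    if row["importance"] >= imp_mean and row["performance"] >= perf_mean:
        return "I: keep up the good work"
    if row["importance"] < imp_mean and row["performance"] >= perf_mean:
        return "II: possible overkill"
    if row["importance"] < imp_mean and row["performance"] < perf_mean:
        return "III: low priority for management"
    return "IV: concentrate management"

ipa["quadrant"] = ipa.apply(quadrant, axis=1)
print(ipa)
```

Attributes landing in quadrant IV (high importance, low performance) are the ones the paper flags for immediate improvement.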

Quality Evaluation of Take-out Services at Restaurants in Chungbuk Province (충청북도지역 외식업체의 테이크아웃서비스 품질특성 분석)

  • Lee, Young-Eun
    • Journal of the Korean Society of Food Science and Nutrition
    • /
    • v.37 no.7
    • /
    • pp.942-952
    • /
    • 2008
  • The purpose of this research was to evaluate the quality of take-out services at restaurants in Chungbuk Province. A questionnaire survey of 450 customers who had used take-out services at such restaurants was conducted, and 378 completed questionnaires were available for statistical evaluation. Statistical analyses of the raw data were performed with SAS V8.2. The scales for rating the importance and performance of service quality were 5-point Likert scales. The main results of this study are as follows. The quality attributes of take-out service were grouped into four factors: food, sanitation, access, and service. The importance scores were higher than the performance scores. IPA showed that 'freshness of food material', 'cleanliness and hygiene in food', 'sanitation of facilities', 'neatness of employees', and 'price in food' were included in the 'focus here' area. There were significantly positive correlations between the factors (food, sanitation, access, service) and overall customer satisfaction (p<.001), between the factors and repurchasing intentions (p<.001), and between customer satisfaction and repurchasing intentions (p<.001). According to multiple regression analysis, 26.27% of the variance in respondents' overall satisfaction scores and 9.21% of the variance in respondents' repurchasing intention scores could be explained by the food, sanitation, access, and service factors.
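The two analyses described here (the importance-performance gap and the regression of satisfaction on the four factors) can be sketched as follows. This runs on synthetic data with hypothetical column names; it is not the study's dataset, only the shape of the analysis.

```python
# Hedged sketch of the two analyses above, on synthetic data: a paired
# t-test of importance vs. performance scores, and a multiple regression of
# overall satisfaction on the four quality factors. Column names are
# hypothetical; n mirrors the study's 378 completed questionnaires.
import numpy as np
import pandas as pd
from scipy import stats
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 378
df = pd.DataFrame({
    "importance":  rng.normal(4.3, 0.5, n),
    "performance": rng.normal(3.6, 0.6, n),
    "food":        rng.normal(3.5, 0.7, n),
    "sanitation":  rng.normal(3.4, 0.7, n),
    "access":      rng.normal(3.6, 0.6, n),
    "service":     rng.normal(3.5, 0.7, n),
})
df["satisfaction"] = (0.3 * df["food"] + 0.2 * df["sanitation"]
                      + 0.1 * df["access"] + 0.2 * df["service"]
                      + rng.normal(0, 0.8, n))

# Importance-performance gap (the paper reports importance > performance).
t, p = stats.ttest_rel(df["importance"], df["performance"])
print(f"paired t = {t:.2f}, p = {p:.4f}")

# Multiple regression: variance in satisfaction explained by the factors.
X = sm.add_constant(df[["food", "sanitation", "access", "service"]])
model = sm.OLS(df["satisfaction"], X).fit()
print(model.rsquared)  # analogue of the reported 26.27%
```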

Analysis of Business Performance in Dental Hygiene Process (ADPIE) in Dental Clinic (치과의료기관의 치위생과정(ADPIE) 경영성과 분석)

  • Oh, Jin-Young;Han, Gyeong-Soon
    • Journal of dental hygiene science
    • /
    • v.15 no.5
    • /
    • pp.585-593
    • /
    • 2015
  • This study compared the value of the dental hygiene process (ADPIE) for business performance among dental clinics in Gyeonggi Province by analyzing financial and non-financial results in a department that provided the dental hygiene process and one that did not. The collected data were analyzed with percentages and t-tests using IBM SPSS Statistics ver. 20.0. In terms of medical cost per patient, Department A (DA), which applied the dental hygiene process, recorded 216,664 Korean won (KRW) in 2013 and 324,810 KRW in 2014, whereas Department B (DB), which did not apply the dental hygiene process, recorded 184,655 KRW in 2013 and 225,698 KRW in 2014 (p<0.01). Regarding the number of daily patients, DA showed an increase of 8.08 patients (p=0.01) while DB showed an increase of 2.42 patients (p>0.05). The medical consent rate was 89.17% in DA and 60.09% in DB in 2013, and 89.68% and 66.98%, respectively, in 2014 (p<0.001). The patients' revisit rate was 87.48% in DA and 44.92% in DB in 2013, and 85.89% and 45.55%, respectively, in 2014 (p<0.001). The rate of regular check-ups was 16.01% in DA and 2.53% in DB in 2013, and 19.03% and 6.84%, respectively, in 2014 (p<0.001). The rate of referred patients was 38.46% in DA and 29.98% in DB in 2013, whereas DA showed 47.59% and DB showed 30.77% in 2014 (p<0.05). According to these results, a medical system with the dental hygiene process is verified to be a premium medical program that can improve satisfaction as well as management effectiveness in dental services.
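The core statistical step here is a between-department comparison of each indicator. A minimal sketch, on synthetic data whose means mirror the reported 2014 cost per patient (the spread and sample sizes are assumptions):

```python
# Sketch of the department comparison on synthetic data: an
# independent-samples t-test of a per-patient metric between a department
# applying the dental hygiene process (DA) and one that does not (DB).
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
da = rng.normal(324_810, 60_000, 200)  # DA, 2014 cost/patient in KRW (spread hypothetical)
db = rng.normal(225_698, 60_000, 200)  # DB, 2014 cost/patient in KRW

t, p = stats.ttest_ind(da, db, equal_var=False)  # Welch's t-test
print(f"t = {t:.2f}, p = {p:.3g}")
```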

Numerical Modelling for the Dilation Flow of Gas in a Bentonite Buffer Material: DECOVALEX-2019 Task A (벤토나이트 완충재에서의 기체 팽창 흐름 수치 모델링: DECOVALEX-2019 Task A)

  • Lee, Jaewon;Lee, Changsoo;Kim, Geon Young
    • Tunnel and Underground Space
    • /
    • v.30 no.4
    • /
    • pp.382-393
    • /
    • 2020
  • The engineered barrier system of a high-level radioactive waste repository must maintain its performance over the long term, because it must slow the rate of leakage to the surrounding rock mass even if a radionuclide leak occurs from the canister. In particular, it is very important to clearly characterize the gas dilation flow phenomenon, which occurs only in media containing a large amount of clay material, such as a bentonite buffer, and which can affect the buffer's long-term performance. Accordingly, DECOVALEX-2019 Task A was conducted to identify the hydraulic-mechanical mechanism of dilation flow and to develop and verify a new numerical analysis technique for quantitative evaluation of gas migration. In this study, building on conventional two-phase flow and mechanical behavior with effective stresses in a porous medium, a hydraulic-mechanical model was developed that uses the concept of damage to simulate the formation of micro-cracks, the expansion of the medium, and the corresponding changes in hydraulic properties. Model verification and validation were conducted through comparison with the results of 1D and 3D gas injection tests. The numerical analysis could reproduce the sudden increases in pore water pressure, stress, and gas inflow and outflow rates caused by the dilation flow induced by gas pressure; however, the influence of the hydraulic-mechanical interaction was underestimated. Nevertheless, this study provides a preliminary model for dilation flow and a basis for developing an advanced model. It can be used not only for analyzing data from laboratory and field tests, but also for long-term performance evaluation of high-level radioactive waste disposal systems.
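The abstract describes coupling a scalar damage variable to the hydraulic properties of a two-phase-flow model. The paper's exact constitutive laws are not given in the abstract, so the sketch below is purely illustrative: it uses an exponential permeability-damage scaling, which is a common choice in the literature, with hypothetical parameter values.

```python
# Illustrative sketch only (not the authors' formulation): a scalar damage
# variable grows as gas overpressure drives the effective stress into
# tension, and intrinsic permeability is enhanced exponentially with damage.
import numpy as np

def effective_stress(total_stress, pore_pressure, biot=1.0):
    # Effective stress in the porous medium (compression positive).
    return total_stress - biot * pore_pressure

def damage(eff_stress, tensile_strength=2.0e6):
    # Hypothetical scalar damage: nonzero once the effective stress is
    # tensile, saturating at full damage (d = 1).
    return np.clip(-eff_stress / tensile_strength, 0.0, 1.0)

def permeability(k0, d, alpha=8.0):
    # Damage-enhanced intrinsic permeability: k = k0 * exp(alpha * d).
    return k0 * np.exp(alpha * d)

k0 = 1e-21  # m^2, typical order of magnitude for saturated bentonite
for gas_p in [2e6, 6e6, 10e6]:  # Pa, rising gas injection pressure
    s_eff = effective_stress(total_stress=5e6, pore_pressure=gas_p)
    d = damage(s_eff)
    print(f"gas p={gas_p:.0e} Pa  damage={d:.2f}  k={permeability(k0, d):.2e} m^2")
```

This reproduces the qualitative behavior the task targets: permeability, and hence gas flow, jumps sharply once the gas pressure is high enough to dilate the medium.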

Research about feature selection that use heuristic function (휴리스틱 함수를 이용한 feature selection에 관한 연구)

  • Hong, Seok-Mi;Jung, Kyung-Sook;Chung, Tae-Choong
    • The KIPS Transactions:PartB
    • /
    • v.10B no.3
    • /
    • pp.281-286
    • /
    • 2003
  • A large number of features are collected for problem solving in real life, but utilizing all of them is difficult, and it is not easy to collect correct data for every feature. When all collected data are used for learning, the resulting model becomes complicated and good performance cannot be obtained. Interrelationships and hierarchical relations also exist among the features. We can reduce the number of features by analyzing the relations among them using heuristic knowledge or statistical methods. A heuristic technique refers to learning through repeated trial and error and experience. Experts can approach the relevant problem domain through an opinion-collection process grounded in experience. These properties can be used to reduce the number of features used in learning: experts generate new, highly abstract features from raw data. This paper describes a machine learning model that reduces the number of features used in learning by means of a heuristic function and uses the abstracted features as the neural network's input values. We applied this model to win/lose prediction in professional baseball games. The results show that the model combining the two techniques not only reduces the complexity of the neural network model but also significantly improves classification accuracy compared with using the neural network or the heuristic model separately.
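A minimal sketch of the two-stage idea, with entirely hypothetical heuristics and data (the paper's actual baseball features are not listed in the abstract): an expert-style heuristic function first compresses many raw statistics into a few abstract features, which then become the neural network's inputs.

```python
# Sketch: heuristic feature abstraction followed by a small neural network.
# The three "expert" heuristics below are hypothetical stand-ins.
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(2)

def heuristic_features(raw):
    # Compress 14 raw per-game stats into 3 abstract features:
    # recent form, batting edge, and pitching edge (all hypothetical).
    recent_form   = raw[:, 0:5].mean(axis=1) - raw[:, 5:10].mean(axis=1)
    batting_edge  = raw[:, 10] - raw[:, 11]
    pitching_edge = raw[:, 13] - raw[:, 12]  # lower ERA is better
    return np.column_stack([recent_form, batting_edge, pitching_edge])

raw = rng.normal(size=(500, 14))            # 14 raw stats per game
y = (heuristic_features(raw).sum(axis=1) + rng.normal(0, 0.5, 500)) > 0

X = heuristic_features(raw)                 # 3 abstract inputs instead of 14
clf = MLPClassifier(hidden_layer_sizes=(8,), max_iter=2000, random_state=0)
clf.fit(X[:400], y[:400])
print("test accuracy:", clf.score(X[400:], y[400:]))
```

The reduction from 14 inputs to 3 is what shrinks the network, which is the complexity benefit the abstract reports.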

A study on the classification of research topics based on COVID-19 academic research using Topic modeling (토픽모델링을 활용한 COVID-19 학술 연구 기반 연구 주제 분류에 관한 연구)

  • Yoo, So-yeon;Lim, Gyoo-gun
    • Journal of Intelligence and Information Systems
    • /
    • v.28 no.1
    • /
    • pp.155-174
    • /
    • 2022
  • From January 2020 to October 2021, more than 500,000 academic studies related to COVID-19 (the respiratory disease caused by the SARS-CoV-2 coronavirus) were published. The rapid increase in the number of papers related to COVID-19 places time and technical constraints on healthcare professionals and policy makers who need to find important research quickly. Therefore, in this study, we propose a method of extracting useful information from the text of an extensive literature using the LDA and Word2vec algorithms. Papers related to the keywords to be searched were extracted from the COVID-19 papers, and detailed topics were identified. The data were drawn from the CORD-19 dataset on Kaggle, a free academic resource prepared by major research groups and the White House in response to the COVID-19 pandemic and updated weekly. The research method has two main parts. First, 41,062 articles were collected through data filtering and pre-processing of the abstracts of 47,110 academic papers with full text. The number of COVID-19 publications by year was analyzed through exploratory data analysis using a Python program, and the top 10 journals with the most active research were identified. The LDA and Word2vec algorithms were used to derive research topics related to COVID-19, and after analyzing related words, similarity was measured. Second, papers containing 'vaccine' and 'treatment' were extracted from the topics derived from all papers: a total of 4,555 papers related to 'vaccine' and 5,971 papers related to 'treatment'. For each collection of papers, detailed topics were analyzed using the LDA and Word2vec algorithms, and clustering through PCA dimension reduction with the t-SNE algorithm was applied to visualize groups of papers with similar themes. A noteworthy result is that topics that did not appear when topic modeling was applied to all COVID-19 papers did appear in the topic modeling results for the individual research subsets. For example, topic modeling of the 'vaccine' papers extracted a new topic, Topic 05 'neutralizing antibodies'. A neutralizing antibody is an antibody that protects cells from infection when a virus enters the body and is said to play an important role in the production of therapeutic agents and in vaccine development. In addition, topic modeling of the 'treatment' papers discovered a new topic, Topic 05 'cytokine'. A cytokine storm occurs when the body's immune cells, rather than defending against an attack, attack normal cells. Hidden topics that could not be found across the entire corpus were thus uncovered by classifying papers according to keywords and performing topic modeling on each subset. In this study, we proposed a method of extracting topics from a large body of literature using the LDA algorithm and extracting similar words using the Skip-gram variant of Word2vec, which predicts surrounding words from a central word. Combining the LDA and Word2vec models aims at better performance by identifying both the relationships between documents and LDA topics and the word-level relationships captured by Word2vec. In addition, as a clustering method, PCA dimension reduction followed by the t-SNE technique was presented for intuitively classifying documents with similar themes into structured groups. In a situation where the efforts of many researchers to overcome COVID-19 cannot keep up with the rapid publication of academic papers, we hope this approach will save the precious time and effort of healthcare professionals and policy makers and help them rapidly gain new insights. It is also expected to serve as basic data for researchers exploring new research directions.
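The LDA-plus-Skip-gram pipeline can be sketched compactly with gensim. The toy corpus below stands in for the CORD-19 abstracts; everything else follows the steps the abstract names (LDA topics, then skip-gram similarity around a topic keyword).

```python
# Hedged sketch of the LDA + Word2vec pipeline on a toy corpus
# (the study used CORD-19 abstracts); gensim 4.x API, sg=1 = skip-gram.
from gensim.corpora import Dictionary
from gensim.models import LdaModel, Word2Vec

docs = [
    "vaccine trial neutralizing antibody immune response".split(),
    "treatment cytokine storm inflammation patient outcome".split(),
    "vaccine antibody efficacy dose immune".split(),
    "treatment antiviral drug patient trial".split(),
]

# LDA: derive latent topics from the bag-of-words corpus.
dictionary = Dictionary(docs)
corpus = [dictionary.doc2bow(d) for d in docs]
lda = LdaModel(corpus, num_topics=2, id2word=dictionary,
               random_state=0, passes=20)
for topic_id, words in lda.print_topics():
    print(topic_id, words)

# Word2vec (skip-gram): find words similar to a topic keyword.
w2v = Word2Vec(docs, vector_size=32, window=3, min_count=1, sg=1, seed=0)
print(w2v.wv.most_similar("vaccine", topn=3))
```

In the study, the same two steps are rerun on the keyword-filtered subsets ('vaccine', 'treatment'), which is where the hidden topics surface; the PCA + t-SNE visualization is then applied to the resulting document vectors.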

Development of a complex failure prediction system using Hierarchical Attention Network (Hierarchical Attention Network를 이용한 복합 장애 발생 예측 시스템 개발)

  • Park, Youngchan;An, Sangjun;Kim, Mintae;Kim, Wooju
    • Journal of Intelligence and Information Systems
    • /
    • v.26 no.4
    • /
    • pp.127-148
    • /
    • 2020
  • A data center is a physical facility for accommodating computer systems and related components, and it is an essential foundation technology for next-generation core industries such as big data, smart factories, wearables, and smart homes. In particular, with the growth of cloud computing, proportional expansion of data center infrastructure is inevitable. Monitoring the health of data center facilities is a way to maintain and manage the system and prevent failure. If a failure occurs in some element of the facility, it may affect not only the relevant equipment but also other connected equipment, and may cause enormous damage. In particular, failures in IT facilities occur irregularly because of interdependence, and it is difficult to identify their causes. Previous studies on failure prediction in data centers treated each server as a single, isolated state, without assuming that the devices interact. Therefore, in this study, data center failures were classified into failures occurring inside a server (Outage A) and failures occurring outside a server (Outage B), and the analysis focused on complex failures occurring within servers. Server-external failures include power, cooling, user errors, and so on; since such failures can be prevented in the early stages of data center facility construction, various solutions are being developed. On the other hand, the causes of failures occurring inside a server are difficult to determine, and adequate prevention has not yet been achieved. In particular, server failures do not occur singly: they can cause failures in other servers or be triggered by failures from other servers. In other words, whereas existing studies analyzed failures on the assumption of a single server unaffected by other servers, this study assumes that failures have effects between servers. In order to define the complex failure situation in the data center, failure history data for each piece of equipment in the data center were used. Four major failures were considered in this study: Network Node Down, Server Down, Windows Activation Services Down, and Database Management System Service Down. The failures occurring for each device were sorted in chronological order, and when a failure occurred in one piece of equipment and another failure occurred within 5 minutes of that time, the failures were defined as simultaneous. After constructing sequences for the devices that failed at the same time, five devices that frequently failed simultaneously within the constructed sequences were selected, and the cases in which the selected devices failed at the same time were confirmed through visualization. Since the server resource information collected for failure analysis is a time series with temporal flow, we used Long Short-Term Memory (LSTM), a deep learning algorithm that can predict the next state from the previous state. In addition, unlike the single-server case, the Hierarchical Attention Network deep learning model structure was used to reflect the fact that the level of influence on a complex failure differs for each server; this algorithm increases prediction accuracy by giving more weight to servers with greater impact on the failure. The study began by defining the types of failure and selecting the analysis targets. In the first experiment, the same collected data were treated as a single-server state and as a multiple-server state and compared. The second experiment improved the prediction accuracy for complex failures by optimizing each server's threshold. In the first experiment, assuming a single server, three of the five servers were predicted to have no failure even though failures had actually occurred; assuming multiple servers, all five servers were correctly predicted to have failed. The experimental results thus support the hypothesis that servers affect one another, and prediction performance was confirmed to be superior when multiple servers were assumed rather than a single server. In particular, applying the Hierarchical Attention Network algorithm, which assumes that the effect of each server differs, improved the analysis, and applying a different threshold for each server further improved prediction accuracy. This study showed that failures whose causes are difficult to determine can be predicted from historical data, and it presents a model that can predict failures occurring on servers in data centers. It is expected that failures can be prevented in advance using these results.
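The architecture described (a shared LSTM per server, with attention weighting the servers by their influence on the complex failure) can be sketched as follows. This is an illustrative PyTorch reading of the abstract, not the authors' exact model; all dimensions are assumptions.

```python
# Illustrative sketch (PyTorch): each server's resource time series is
# encoded by a shared LSTM, and an attention layer weights the server
# encodings so that more influential servers contribute more to the
# complex-failure prediction.
import torch
import torch.nn as nn

class ServerAttentionNet(nn.Module):
    def __init__(self, n_features=8, hidden=32):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        self.attn = nn.Linear(hidden, 1)     # scores each server encoding
        self.head = nn.Linear(hidden, 1)     # failure-probability logit

    def forward(self, x):
        # x: (batch, n_servers, seq_len, n_features)
        b, s, t, f = x.shape
        _, (h, _) = self.lstm(x.reshape(b * s, t, f))
        h = h[-1].reshape(b, s, -1)              # (batch, servers, hidden)
        w = torch.softmax(self.attn(h), dim=1)   # attention over servers
        ctx = (w * h).sum(dim=1)                 # weighted server context
        return torch.sigmoid(self.head(ctx)).squeeze(-1)

model = ServerAttentionNet()
x = torch.randn(4, 5, 60, 8)   # 4 samples, 5 servers, 60 steps, 8 metrics
print(model(x))                 # per-sample probability of a complex failure
```

The per-server thresholds mentioned in the second experiment would then be applied to these probabilities when converting them into failure/no-failure decisions.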

Information Privacy Concern in Context-Aware Personalized Services: Results of a Delphi Study

  • Lee, Yon-Nim;Kwon, Oh-Byung
    • Asia pacific journal of information systems
    • /
    • v.20 no.2
    • /
    • pp.63-86
    • /
    • 2010
  • Personalized services directly and indirectly acquire personal data, in part, to provide customers with higher-value services that are specifically context-relevant (such as place and time). Information technologies continue to mature and develop, providing greatly improved performance. Sensory networks and intelligent software can now obtain context data, and that is the cornerstone for providing personalized, context-specific services. Yet the danger of overflowing personal information is increasing, because the data retrieved by the sensors usually contain private information. Various technical characteristics of context-aware applications have troubling implications for information privacy. In parallel with the increasing use of context for service personalization, information privacy concerns, such as the unrestricted availability of context information, have also increased. Those privacy concerns are consistently regarded as a critical issue facing the success of context-aware personalized services. The entire field of information privacy is growing as an important area of research, with many new definitions and terminologies, because of the need for a better understanding of information privacy concepts. In particular, the factors of information privacy should be revised according to the characteristics of new technologies. However, previous work on the information privacy factors of context-aware applications has at least two shortcomings. First, there has been little overview of the technological characteristics of context-aware computing; existing studies have focused on only a small subset of them, so there has been no mutually exclusive set of factors that uniquely and completely describes information privacy in context-aware applications. Second, most studies have relied on user surveys to identify information privacy factors, despite the limitations of users' knowledge and experience of context-aware computing technology. To date, since context-aware services have not yet been widely deployed on a commercial scale, only very few people have prior experience with context-aware personalized services, and it is difficult to build users' knowledge about context-aware technology even by increasing their understanding in various ways: scenarios, pictures, flash animations, etc. Consequently, conducting a survey on the assumption that participants have sufficient experience with or understanding of the technologies shown in the survey may not be valid. Moreover, some surveys are based on simplifying and hence unrealistic assumptions (e.g., they consider only location information as context data). A better understanding of information privacy concern in context-aware personalized services is therefore needed. Hence, the purpose of this paper is to identify a generic set of factors for elemental information privacy concern in context-aware personalized services and to develop a rank-ordered list of information privacy concern factors. We consider the overall technology characteristics to establish a mutually exclusive set of factors. A Delphi survey, a rigorous data collection method, was deployed to obtain reliable opinions from experts and to produce a rank-ordered list; it therefore lends itself well to obtaining a set of universal factors of information privacy concern and their priority. An international panel of researchers and practitioners with expertise in privacy and context-aware systems was involved in our research. The Delphi rounds faithfully followed the procedure proposed by Okoli and Pawlowski, involving three general rounds: (1) brainstorming for important factors; (2) narrowing down the original list to the most important ones; and (3) ranking the list of important factors. For the brainstorming round only, experts were treated as individuals, not panels. Adapting Okoli and Pawlowski, we outlined the process of administering the study and performed three rounds. In the first and second rounds of the Delphi questionnaire, we gathered a set of exclusive factors for information privacy concern in context-aware personalized services. In the first round, the respondents were asked to provide at least five main factors for the most appropriate understanding of information privacy concern; to aid this, some of the main factors found in the literature were presented to the participants. The second round of the questionnaire discussed the main factors provided in the first round, fleshed out with relevant sub-factors drawn from the literature survey; respondents were requested to evaluate each sub-factor's suitability against the corresponding main factor to determine the final sub-factors from the candidates. Final factors were those selected by over 50% of the experts. In the third round, a list of factors with corresponding questions was provided, and the respondents were requested to assess the importance of each main factor and its corresponding sub-factors. Finally, we calculated the mean rank of each item to produce the final result. While analyzing the data, we focused on group consensus rather than individual insistence; to do so, a concordance analysis, which measures the consistency of the experts' responses over successive Delphi rounds, was adopted during the survey process. As a result, experts reported that context data collection and a highly identifiable level of identical data are the most important main factor and sub-factor, respectively. Additional important sub-factors included the diverse types of context data collected, tracking and recording functionalities, and embedded and disappearing sensor devices. The average score of each factor is very useful for future context-aware personalized service development from the viewpoint of information privacy. The final factors differ from those proposed in other studies in the following ways. First, the concern factors differ from existing studies, which are based on privacy issues that may occur during the lifecycle of acquired user information; our study helped to clarify these sometimes vague issues by determining which privacy concerns are viable given the specific technical characteristics of context-aware personalized services. Since a context-aware service differs in its technical characteristics from other services, we selected the specific characteristics with the higher potential to increase users' privacy concerns. Second, this study considered privacy issues in terms of service delivery and display, which were almost overlooked in existing studies, by introducing IPOS as the factor division. Lastly, for each factor, it correlated the level of importance with professionals' opinions on the extent to which users have privacy concerns. The traditional questionnaire method was not selected because users still have an absolute lack of understanding of, and experience with, context-aware personalized services as a new technology. For understanding users' privacy concerns, the professionals in the Delphi process selected context data collection, tracking and recording, and sensory networks as the most important technological characteristics of context-aware personalized services. For the creation of context-aware personalized services, this study demonstrates the importance of determining an optimal methodology, and which technologies, in what sequence, are needed to acquire which types of users' context information. Most studies focus on which services and systems should be provided and developed by utilizing context information, presupposing the continued development of context-aware technology; however, the results of this study show that, in terms of users' privacy, greater attention must be paid to the activities that acquire context information. Following the evaluation of the sub-factors, additional studies would be needed on approaches for reducing users' privacy concerns toward technological characteristics such as a highly identifiable level of identical data, diverse types of context data collected, tracking and recording functionality, and embedded and disappearing sensor devices. The factor ranked next in importance after input is context-aware service delivery, which is related to output. The results show that delivery and display, which present services to users in context-aware personalized services built on the anywhere-anytime-any-device concept, are regarded as even more important than in previous computing environments. Considering these concern factors when developing context-aware personalized services will help increase service success rates and, hopefully, user acceptance. Our future work will be to adopt these factors for qualifying context-aware service development projects, such as u-city development projects, in terms of service quality and hence user acceptance.
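The concordance analysis the abstract mentions is typically computed as Kendall's coefficient of concordance (W), which measures agreement among the experts' rankings (1 = perfect agreement, 0 = none). A small sketch with hypothetical ratings:

```python
# Sketch of the concordance check used across Delphi rounds: Kendall's W
# over the experts' rank orders of the factors. Ranks are hypothetical.
import numpy as np

def kendalls_w(ranks):
    # ranks: (n_experts, n_items) matrix of rank orders.
    # W = 12 * S / (m^2 * (n^3 - n)), where S is the sum of squared
    # deviations of the column rank sums from their mean.
    m, n = ranks.shape
    col_sums = ranks.sum(axis=0)
    s = ((col_sums - col_sums.mean()) ** 2).sum()
    return 12 * s / (m ** 2 * (n ** 3 - n))

# 5 experts ranking 4 privacy-concern factors (1 = most important).
ranks = np.array([
    [1, 2, 3, 4],
    [1, 3, 2, 4],
    [2, 1, 3, 4],
    [1, 2, 4, 3],
    [1, 2, 3, 4],
])
print(f"Kendall's W = {kendalls_w(ranks):.2f}")
```

Rounds would continue (or stop) depending on whether W indicates that the experts' responses have stabilized.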

A Folksonomy Ranking Framework: A Semantic Graph-based Approach (폭소노미 사이트를 위한 랭킹 프레임워크 설계: 시맨틱 그래프기반 접근)

  • Park, Hyun-Jung;Rho, Sang-Kyu
    • Asia pacific journal of information systems
    • /
    • v.21 no.2
    • /
    • pp.89-116
    • /
    • 2011
  • In collaborative tagging systems such as Delicious.com and Flickr.com, users assign keywords or tags to their uploaded resources, such as bookmarks and pictures, for future use or sharing purposes. The collection of resources and tags generated by a user is called a personomy, and the collection of all personomies constitutes the folksonomy. The most significant need of folksonomy users is to efficiently find useful resources or experts on specific topics. An excellent ranking algorithm would assign higher rankings to more useful resources or experts. What resources are considered useful in a folksonomic system? Does a standard superior to frequency or freshness exist? A resource recommended by more users with more expertise should be worthy of attention. This ranking paradigm can be implemented through a graph-based ranking algorithm. Two well-known representatives of such a paradigm are PageRank by Google and HITS (Hypertext Induced Topic Selection) by Kleinberg. Both PageRank and HITS assign a higher evaluation score to pages linked to by more higher-scored pages. HITS differs from PageRank in that it utilizes two kinds of scores: authority and hub scores. The ranking objects of these algorithms are limited to Web pages, whereas the ranking objects of a folksonomic system are heterogeneous (i.e., users, resources, and tags). Therefore, uniform application of the voting notion of PageRank and HITS to the links of a folksonomy would be unreasonable. In a folksonomic system, each link corresponding to a property can have an opposite direction, depending on whether the property is expressed in the active or the passive voice. The current research stems from the idea that a graph-based ranking algorithm could be applied to the folksonomic system using the concept of mutual interactions between entities, rather than the voting notion of PageRank or HITS. The concept of mutual interactions, proposed for ranking Semantic Web resources, enables the calculation of importance scores of various resources unaffected by link directions. The weights of a property representing the mutual interaction between classes are assigned depending on the relative significance of the property to the resource importance of each class. This class-oriented approach is based on the fact that the Semantic Web contains many heterogeneous classes; thus, applying a different appraisal standard for each class is more reasonable. This is similar to how humans evaluate: different items are assigned specific weights, which are then summed to determine a weighted average. We can also check for missing properties more easily with this approach than with predicate-oriented approaches. A user of a tagging system usually assigns more than one tag to the same resource, and there can be more than one tag with the same subjectivity and objectivity. When many users assign similar tags to the same resource, grading the users differently depending on the assignment order becomes necessary. This idea comes from studies in psychology showing that expertise involves the ability to select the most relevant information for achieving a goal. An expert should be someone who not only has a large collection of documents annotated with a particular tag, but also tends to add documents of high quality to his or her collections; such documents are identified by the number, as well as the expertise, of users who have the same documents in their collections. In other words, there is a relationship of mutual reinforcement between the expertise of a user and the quality of a document. In addition, there is a need to rank entities related more closely to a certain entity. Considering the property of social media that the popularity of a topic is temporary, recent data should carry more weight than old data. We propose a comprehensive folksonomy ranking framework in which all these considerations are addressed and which can be easily customized to each folksonomy site for ranking purposes. To examine the validity of our ranking algorithm and show the mechanism of adjusting property, time, and expertise weights, we first use a dataset designed for analyzing the effect of each ranking factor independently. We then show the ranking results of a real folksonomy site with the ranking factors combined. Because the ground truth of a given dataset is not known when it comes to ranking, we inject simulated data whose ranking results can be predicted into the real dataset and compare the ranking results of our algorithm with those of a previous HITS-based algorithm. Our semantic ranking algorithm based on the concept of mutual interaction appears preferable to the HITS-based algorithm as a flexible folksonomy ranking framework. Some concrete points of difference are as follows. First, with the time concept applied to the property weights, our algorithm shows superior performance in lowering the scores of older data and raising the scores of newer data. Second, by applying the time concept to the expertise weights as well as to the property weights, our algorithm controls the conflicting influence of expertise weights and enhances the overall consistency of time-valued ranking; the expertise weights of the previous study can act as an obstacle to time-valued ranking because the number of followers increases as time goes on. Third, many new properties and classes can be included in our framework, whereas the previous HITS-based algorithm, based on the voting notion, loses ground when the domain consists of more than two classes or when other important properties, such as "sent through twitter" or "registered as a friend," are added to the domain. Fourth, there is a big difference in calculation time and memory use between the two kinds of algorithms: while the multiplication of two matrices has to be executed twice for the previous HITS-based algorithm, this is unnecessary with ours. In our ranking framework, various folksonomy ranking policies can be expressed by combining the ranking factors, and our approach works even if the folksonomy site is not implemented with Semantic Web languages. Above all, the time weight proposed in this paper will be applicable to various domains, including social media, where time value is considered important.
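The direction-free, weight-based propagation idea can be illustrated with a tiny heterogeneous graph. This is only a loose sketch of the mutual-interaction concept: the node types, link weights, and the implicit time-decay baked into them are all hypothetical, and the paper's actual update rules are richer.

```python
# Illustrative sketch: score propagation over a small user-resource-tag
# graph using symmetric "mutual interaction" weights, so scores are
# unaffected by link direction. Weights are hypothetical; a time-decay
# multiplier is assumed to have been applied when building the matrix.
import numpy as np

# Nodes 0-1: users, 2-3: resources, 4: a tag.
W = np.array([
    [0.0, 0.0, 0.9, 0.5, 0.3],
    [0.0, 0.0, 0.0, 0.8, 0.3],
    [0.9, 0.0, 0.0, 0.0, 0.6],
    [0.5, 0.8, 0.0, 0.0, 0.6],
    [0.3, 0.3, 0.6, 0.6, 0.0],
])

scores = np.ones(5) / 5
for _ in range(50):                      # power iteration to convergence
    scores = W @ scores
    scores /= np.linalg.norm(scores)     # normalize each round
print(np.round(scores, 3))               # importance score per node
```

Because W is symmetric, the single iteration loop replaces the two alternating matrix multiplications that a HITS-style authority/hub scheme needs, which is the calculation-cost difference the abstract points out.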

A New Item Recommendation Procedure Using Preference Boundary

  • Kim, Hyea-Kyeong;Jang, Moon-Kyoung;Kim, Jae-Kyeong;Cho, Yoon-Ho
    • Asia pacific journal of information systems
    • /
    • v.20 no.1
    • /
    • pp.81-99
    • /
    • 2010
  • Lately, in consumer markets the number of new items is increasing at an overwhelming rate, while consumers have limited access to information about those new products when making a sensible, well-informed purchase. Therefore, item providers and customers need a system that recommends the right items to the right customers. Whenever new items are released, a recommender system specializing in new items can also help item providers locate and identify potential customers. Currently, new items are added to an existing system without being specially noted to consumers, making it difficult for consumers to identify and evaluate new products introduced to the markets. Most previous approaches for recommender systems have to rely on the usage history of customers; for new items, this content-based (CB) approach is simply not available for recommending those items to potential consumers. Although the collaborative filtering (CF) approach is not directly applicable to the new-item problem, it is a good idea to use the basic principle of CF, which identifies similar customers, i.e. neighbors, and recommends items that those similar customers have liked in the past. This research aims to suggest a hybrid recommendation procedure based on the preference boundary of a target customer, for recommending new items only. The basic principle is that if a new item falls within the preference boundary of a target customer, it is evaluated as preferred by that customer. Customers' preferences and the characteristics of items, including new items, are represented in a feature space, and the scope or boundary of the target customer's preference is extended to those of the neighbors. The new item recommendation procedure consists of three steps. The first step is analyzing the profile of items, which are represented as k-dimensional feature values. The second step is to determine the representative point of the target customer's preference boundary, the centroid, based on a personal information set. To determine the centroid of the preference boundary of a target customer, three algorithms are developed in this research: one uses the centroid of the target customer only (TC), another uses the centroid of a (dummy) big target customer composed of the target customer and his/her neighbors (BC), and the third uses the centroids of the target customer and his/her neighbors (NC). The third step is to determine the range of the preference boundary, the radius; the suggested algorithm uses the average distance (AD) between the centroid and all purchased items. We test whether the CF-based approach to determining the centroid of the preference boundary improves recommendation quality. For this purpose, we developed two hybrid algorithms, BC and NC, which use neighbors when deciding the centroid of the preference boundary, and, to test their validity, a CB algorithm, TC, which uses target customers only. We measured the effectiveness scores of the suggested algorithms and compared them through a series of experiments with a set of real mobile image transaction data. We used the period between 1 June 2004 and 31 July 2004 as the training set and the period between 1 August 2004 and 31 August 2004 as the test set. The training set is used to construct the preference boundary, and the test set is used to evaluate the performance of the suggested hybrid recommendation procedure. The main aim of this research is to compare the hybrid recommendation algorithms with the CB algorithm. To evaluate the performance of each algorithm, we compare the list of new items purchased in the test period with the item list recommended by the suggested algorithms, employing the hit ratio as the evaluation metric. The hit ratio is defined as the ratio of the hit set size to the recommended set size, where the hit set size means the number of successful recommendations in our experiment and the test set size means the number of items purchased during the test period. The experimental results show that the hit ratios of BC and NC are larger than that of TC, which means that using neighbors is more effective for recommending new items; that is, a hybrid algorithm using CF is more effective for recommending new items to consumers than an algorithm using only CB. The reason the hit ratio of BC is smaller than that of NC is that BC is defined as a dummy or virtual customer who purchased all items of the target customer and the neighbors; the centroid of BC thus often shifts away from that of TC and tends to reflect a skewed character of the target customer. The recommendation algorithm using NC therefore shows the best hit ratio, because NC has sufficient information about the target customers and their neighbors without damaging the information about the target customers.
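The three-step procedure is easy to sketch for the NC variant. The feature values below are hypothetical stand-ins for the k-dimensional item profiles; each boundary is a centroid plus the average-distance (AD) radius, and a new item is recommended if it falls inside the target's or any neighbor's boundary.

```python
# Minimal sketch of the preference-boundary procedure (NC variant) on
# hypothetical k=4 feature data: centroid = mean of purchased items,
# radius = average distance to the centroid (AD).
import numpy as np

def boundary(purchases):
    centroid = purchases.mean(axis=0)
    radius = np.linalg.norm(purchases - centroid, axis=1).mean()  # AD
    return centroid, radius

rng = np.random.default_rng(3)
target = rng.normal(0.0, 1.0, size=(10, 4))        # target's purchases
neighbors = [rng.normal(0.3, 1.0, size=(8, 4)) for _ in range(3)]

# NC: one boundary per customer (target plus each neighbor separately).
bounds = [boundary(target)] + [boundary(p) for p in neighbors]

new_items = rng.normal(0.2, 1.2, size=(20, 4))
recommended = [
    i for i, item in enumerate(new_items)
    if any(np.linalg.norm(item - c) <= r for c, r in bounds)
]
print("recommended new items:", recommended)
```

The BC variant would instead pool all purchases into one virtual customer before calling `boundary` once, which is exactly why its centroid can drift away from the target's own preferences, as the abstract notes.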

