• Title/Summary/Keyword: 3 step search algorithm

Search Result 78, Processing Time 0.022 seconds

Resolving the 'Gray sheep' Problem Using Social Network Analysis (SNA) in Collaborative Filtering (CF) Recommender Systems (소셜 네트워크 분석 기법을 활용한 협업필터링의 특이취향 사용자(Gray Sheep) 문제 해결)

  • Kim, Minsung;Im, Il
    • Journal of Intelligence and Information Systems
    • /
    • v.20 no.2
    • /
    • pp.137-148
    • /
    • 2014
  • Recommender system has become one of the most important technologies in e-commerce in these days. The ultimate reason to shop online, for many consumers, is to reduce the efforts for information search and purchase. Recommender system is a key technology to serve these needs. Many of the past studies about recommender systems have been devoted to developing and improving recommendation algorithms and collaborative filtering (CF) is known to be the most successful one. Despite its success, however, CF has several shortcomings such as cold-start, sparsity, gray sheep problems. In order to be able to generate recommendations, ordinary CF algorithms require evaluations or preference information directly from users. For new users who do not have any evaluations or preference information, therefore, CF cannot come up with recommendations (Cold-star problem). As the numbers of products and customers increase, the scale of the data increases exponentially and most of the data cells are empty. This sparse dataset makes computation for recommendation extremely hard (Sparsity problem). Since CF is based on the assumption that there are groups of users sharing common preferences or tastes, CF becomes inaccurate if there are many users with rare and unique tastes (Gray sheep problem). This study proposes a new algorithm that utilizes Social Network Analysis (SNA) techniques to resolve the gray sheep problem. We utilize 'degree centrality' in SNA to identify users with unique preferences (gray sheep). Degree centrality in SNA refers to the number of direct links to and from a node. In a network of users who are connected through common preferences or tastes, those with unique tastes have fewer links to other users (nodes) and they are isolated from other users. Therefore, gray sheep can be identified by calculating degree centrality of each node. We divide the dataset into two, gray sheep and others, based on the degree centrality of the users. Then, different similarity measures and recommendation methods are applied to these two datasets. More detail algorithm is as follows: Step 1: Convert the initial data which is a two-mode network (user to item) into an one-mode network (user to user). Step 2: Calculate degree centrality of each node and separate those nodes having degree centrality values lower than the pre-set threshold. The threshold value is determined by simulations such that the accuracy of CF for the remaining dataset is maximized. Step 3: Ordinary CF algorithm is applied to the remaining dataset. Step 4: Since the separated dataset consist of users with unique tastes, an ordinary CF algorithm cannot generate recommendations for them. A 'popular item' method is used to generate recommendations for these users. The F measures of the two datasets are weighted by the numbers of nodes and summed to be used as the final performance metric. In order to test performance improvement by this new algorithm, an empirical study was conducted using a publically available dataset - the MovieLens data by GroupLens research team. We used 100,000 evaluations by 943 users on 1,682 movies. The proposed algorithm was compared with an ordinary CF algorithm utilizing 'Best-N-neighbors' and 'Cosine' similarity method. The empirical results show that F measure was improved about 11% on average when the proposed algorithm was used

    . Past studies to improve CF performance typically used additional information other than users' evaluations such as demographic data. Some studies applied SNA techniques as a new similarity metric. This study is novel in that it used SNA to separate dataset. This study shows that performance of CF can be improved, without any additional information, when SNA techniques are used as proposed. This study has several theoretical and practical implications. This study empirically shows that the characteristics of dataset can affect the performance of CF recommender systems. This helps researchers understand factors affecting performance of CF. This study also opens a door for future studies in the area of applying SNA to CF to analyze characteristics of dataset. In practice, this study provides guidelines to improve performance of CF recommender systems with a simple modification.

  • A Study for the Minimum Weight Design of a Coastal Fishing Boat (소형 연안 어선의 최소 중량 설계에 관한 연구)

    • Song, Ha-Cheol;Kim, Yong-Sub;Shim, Chun-Sik
      • Journal of Navigation and Port Research
      • /
      • v.32 no.3
      • /
      • pp.223-228
      • /
      • 2008
    • As most of small fishing boats made of FRP have been constructed by experience in Korea, some structural safety problems have been occurred occasionally. To improve the structural strength and reduce the costs for construction and operation, optimum design for small fishing boat was carried out in this study. The weight of fishing boat and the main dimensions of structural members are chosen as objective function and design variables, respectively. By the combination of global and local search methods, a hybrid optimization algorithm was developed to escape the local minima and reduce CPU time in analysis procedure, and finite element analysis was performed to determine the constraint parameters at each iteration step in optimization loop. Optimization results were compared with the real existing fishing boat, and the effects of optimum design were examined from points of view; structural strength, material cost, etc.

    Development of Yóukè Mining System with Yóukè's Travel Demand and Insight Based on Web Search Traffic Information (웹검색 트래픽 정보를 활용한 유커 인바운드 여행 수요 예측 모형 및 유커마이닝 시스템 개발)

    • Choi, Youji;Park, Do-Hyung
      • Journal of Intelligence and Information Systems
      • /
      • v.23 no.3
      • /
      • pp.155-175
      • /
      • 2017
    • As social data become into the spotlight, mainstream web search engines provide data indicate how many people searched specific keyword: Web Search Traffic data. Web search traffic information is collection of each crowd that search for specific keyword. In a various area, web search traffic can be used as one of useful variables that represent the attention of common users on specific interests. A lot of studies uses web search traffic data to nowcast or forecast social phenomenon such as epidemic prediction, consumer pattern analysis, product life cycle, financial invest modeling and so on. Also web search traffic data have begun to be applied to predict tourist inbound. Proper demand prediction is needed because tourism is high value-added industry as increasing employment and foreign exchange. Among those tourists, especially Chinese tourists: Youke is continuously growing nowadays, Youke has been largest tourist inbound of Korea tourism for many years and tourism profits per one Youke as well. It is important that research into proper demand prediction approaches of Youke in both public and private sector. Accurate tourism demands prediction is important to efficient decision making in a limited resource. This study suggests improved model that reflects latest issue of society by presented the attention from group of individual. Trip abroad is generally high-involvement activity so that potential tourists likely deep into searching for information about their own trip. Web search traffic data presents tourists' attention in the process of preparation their journey instantaneous and dynamic way. So that this study attempted select key words that potential Chinese tourists likely searched out internet. Baidu-Chinese biggest web search engine that share over 80%- provides users with accessing to web search traffic data. Qualitative interview with potential tourists helps us to understand the information search behavior before a trip and identify the keywords for this study. Selected key words of web search traffic are categorized by how much directly related to "Korean Tourism" in a three levels. Classifying categories helps to find out which keyword can explain Youke inbound demands from close one to far one as distance of category. Web search traffic data of each key words gathered by web crawler developed to crawling web search data onto Baidu Index. Using automatically gathered variable data, linear model is designed by multiple regression analysis for suitable for operational application of decision and policy making because of easiness to explanation about variables' effective relationship. After regression linear models have composed, comparing with model composed traditional variables and model additional input web search traffic data variables to traditional model has conducted by significance and R squared. after comparing performance of models, final model is composed. Final regression model has improved explanation and advantage of real-time immediacy and convenience than traditional model. Furthermore, this study demonstrates system intuitively visualized to general use -Youke Mining solution has several functions of tourist decision making including embed final regression model. Youke Mining solution has algorithm based on data science and well-designed simple interface. In the end this research suggests three significant meanings on theoretical, practical and political aspects. Theoretically, Youke Mining system and the model in this research are the first step on the Youke inbound prediction using interactive and instant variable: web search traffic information represents tourists' attention while prepare their trip. Baidu web search traffic data has more than 80% of web search engine market. Practically, Baidu data could represent attention of the potential tourists who prepare their own tour as real-time. Finally, in political way, designed Chinese tourist demands prediction model based on web search traffic can be used to tourism decision making for efficient managing of resource and optimizing opportunity for successful policy.

    Automatic Selection of Similar Sentences for Teaching Writing in Elementary School (초등 글쓰기 교육을 위한 유사 문장 자동 선별)

    • Park, Youngki
      • Journal of The Korean Association of Information Education
      • /
      • v.20 no.4
      • /
      • pp.333-340
      • /
      • 2016
    • When elementary students write their own sentences, it is often educationally beneficial to compare them with other people's similar sentences. However, it is impractical for use in most classrooms, because it is burdensome for teachers to look up all of the sentences written by students. To cope with this problem, we propose a novel approach for automatic selection of similar sentences based on a three-step process: 1) extracting the subword units from the word-level sentences, 2) training the model with the encoder-decoder architecture, and 3) using the approximate k-nearest neighbor search algorithm to find the similar sentences. Experimental results show that the proposed approach achieves the accuracy of 75% for our test data.

    Calibration of Car-Following Models Using a Dual Genetic Algorithm with Central Composite Design (중심합성계획법 기반 이중유전자알고리즘을 활용한 차량추종모형 정산방법론 개발)

    • Bae, Bumjoon;Lim, Hyeonsup;So, Jaehyun (Jason)
      • The Journal of The Korea Institute of Intelligent Transport Systems
      • /
      • v.18 no.2
      • /
      • pp.29-43
      • /
      • 2019
    • The calibration of microscopic traffic simulation models has received much attention in the simulation field. Although no standard has been established for it, a genetic algorithm (GA) has been widely employed in recent literature because of its high efficiency to find solutions in such optimization problems. However, the performance still falls short in simulation analyses to support fast decision making. This paper proposes a new calibration procedure using a dual GA and central composite design (CCD) in order to improve the efficiency. The calibration exercise goes through three major sequential steps: (1) experimental design using CCD for a quadratic response surface model (RSM) estimation, (2) 1st GA procedure using the RSM with CCD to find a near-optimal initial population for a next step, and (3) 2nd GA procedure to find a final solution. The proposed method was applied in calibrating the Gipps car-following model with respect to maximizing the likelihood of a spacing distribution between a lead and following vehicle. In order to evaluate the performance of the proposed method, a conventional calibration approach using a single GA was compared under both simulated and real vehicle trajectory data. It was found that the proposed approach enhances the optimization speed by starting to search from an initial population that is closer to the optimum than that of the other approach. This result implies the proposed approach has benefits for a large-scale traffic network simulation analysis. This method can be extended to other optimization tasks using GA in transportation studies.

    A CF-based Health Functional Recommender System using Extended User Similarity Measure (확장된 사용자 유사도를 이용한 CF-기반 건강기능식품 추천 시스템)

    • Sein Hong;Euiju Jeong;Jaekyeong Kim
      • Journal of Intelligence and Information Systems
      • /
      • v.29 no.3
      • /
      • pp.1-17
      • /
      • 2023
    • With the recent rapid development of ICT(Information and Communication Technology) and the popularization of digital devices, the size of the online market continues to grow. As a result, we live in a flood of information. Thus, customers are facing information overload problems that require a lot of time and money to select products. Therefore, a personalized recommender system has become an essential methodology to address such issues. Collaborative Filtering(CF) is the most widely used recommender system. Traditional recommender systems mainly utilize quantitative data such as rating values, resulting in poor recommendation accuracy. Quantitative data cannot fully reflect the user's preference. To solve such a problem, studies that reflect qualitative data, such as review contents, are being actively conducted these days. To quantify user review contents, text mining was used in this study. The general CF consists of the following three steps: user-item matrix generation, Top-N neighborhood group search, and Top-K recommendation list generation. In this study, we propose a recommendation algorithm that applies an extended similarity measure, which utilize quantified review contents in addition to user rating values. After calculating review similarity by applying TF-IDF, Word2Vec, and Doc2Vec techniques to review content, extended similarity is created by combining user rating similarity and quantified review contents. To verify this, we used user ratings and review data from the e-commerce site Amazon's "Health and Personal Care". The proposed recommendation model using extended similarity measure showed superior performance to the traditional recommendation model using only user rating value-based similarity measure. In addition, among the various text mining techniques, the similarity obtained using the TF-IDF technique showed the best performance when used in the neighbor group search and recommendation list generation step.

    Recognition and Request for Medical Direction by 119 Emergency Medical Technicians (119 구급대원들이 지각하는 의료지도의 필요성 인식과 요구도)

    • Park, Joo-Ho
      • The Korean Journal of Emergency Medical Services
      • /
      • v.15 no.3
      • /
      • pp.31-44
      • /
      • 2011
    • Purpose : The purpose of emergency medical services(EMS) is to save human lives and assure the completeness of the body in emergency situations. Those who have been qualified on medical practice to perform such treatment as there is the risk of human life and possibility of major physical and mental injuries that could result from the urgency of time and invasiveness inflicted upon the body. In the emergency medical activities, 119 emergency medical technicians mainly perform the task but they are not able to perform such task independently and they are mandatory to receive medical direction. The purpose of this study is to examine the recognition and request for medical direction by 119 emergency medical technicians in order to provide basic information on the development of medical direction program suitable to the characteristics of EMS as well as for the studies on EMS for the sake of efficient operation of pre-hospital EMS. Method : Questionnaire via e-mail was conducted during July 1-31, 2010 for 675 participants who are emergency medical technicians, nurses and other emergency crews in Gyeongbuk. The effective 171 responses were used for the final analysis. In regards to the emergency medical technicians' scope of responsibilities defined in Attached Form 14, Enforcement regulations of EMS, t-test analysis was conducted by using the means and standard deviation of the level of request for medical direction on the scope of responsibilities of Level 1 & Level 2 emergency medical technicians as the scale of medical direction request. The general characteristics, experience result, the reason for necessity, emergency medical technicians & medical director request level, medical direction method, the place of work of the medical director, feedback content and improvement plan request level were analyzed through frequency and percentage. The level of experience in medical direction and necessity were analyzed through ${\chi}^2$ test. Results : In regards to the medical direction experience per qualification, the experience was the highest with 53.3% for Level 1 emergency medical technicians and 80.3% responded that experience was helpful. As for the recognition on the necessity of medical direction, 71.3% responded as "necessary" and it turned out to be the highest of 76.9% in nurses. As for the reason for responding "necessary", the reason for reducing the risk and side-effects from EMS for patients was the largest(75.4%), and the reason of EMS delay due to the request of medical direction was the highest(71.4%) for the reason for responding "not necessary". In regards to the request level of the task scope of emergency medical technicians, injection of certain amount of solution during a state of shock was the highest($3.10{\pm}.96$) for Level 1 emergency rescuers, and the endotracheal intubation was the highest($3.12{\pm}1.03$) for nurses, and the sublingual administration of nitroglycerine(NTG) during chest pain was the highest($2.62{\pm}1.02$) for Level 2 emergency medical technicians, and regulation of heartbeat using AED was the highest($2.76{\pm}.99$) for other emergency crews. For the revitalization of medical direction, the improvement in the capability of EMS(78.9%) was requested from emergency crew, and the ability to evaluate the medical state of patient was the highest(80.1%) in the level of request for medical director. The prehospital and direct medical direction was the highest(60.8%) for medical direction method, and the emergency medical facility was the highest(52.0%) for the placement of medical director, and the evaluation of appropriateness of EMS was the highest(66.1%) for the feedback content, and the reinforcement of emergency crew(emergency medical technicians) personnel was the highest(69.0%) for the improvement plan. Conclusion : The medical direction is an important policy in the prehospital EMS activity because 119 emergency medical technicians agreed the necessity of medical direction and over 80% of those who experienced medical direction said it was helpful. In addition, the simulation training program using algorithm and case study through feedback are necessary in order to enhance the technical capability of ambulance teams on the item of professional EMS with high level of request in the task scope of emergency medical technicians, and recognition of medical direction is the essence of the EMS field. In regards to revitalizing medical direction, the improvement of the task performance capability of 119 emergency medical technicians and medical directors, reinforcement of emergency medical activity personnel, assurance of trust between emergency medical technicians and the emergency physician, and search for professional operation plan of medical direction center are needed to expand the direct medical direction method for possible treatment beforehand through the participation by medical director even at the step in which emergency situation report is received.

    The Effect of Data Size on the k-NN Predictability: Application to Samsung Electronics Stock Market Prediction (데이터 크기에 따른 k-NN의 예측력 연구: 삼성전자주가를 사례로)

    • Chun, Se-Hak
      • Journal of Intelligence and Information Systems
      • /
      • v.25 no.3
      • /
      • pp.239-251
      • /
      • 2019
    • Statistical methods such as moving averages, Kalman filtering, exponential smoothing, regression analysis, and ARIMA (autoregressive integrated moving average) have been used for stock market predictions. However, these statistical methods have not produced superior performances. In recent years, machine learning techniques have been widely used in stock market predictions, including artificial neural network, SVM, and genetic algorithm. In particular, a case-based reasoning method, known as k-nearest neighbor is also widely used for stock price prediction. Case based reasoning retrieves several similar cases from previous cases when a new problem occurs, and combines the class labels of similar cases to create a classification for the new problem. However, case based reasoning has some problems. First, case based reasoning has a tendency to search for a fixed number of neighbors in the observation space and always selects the same number of neighbors rather than the best similar neighbors for the target case. So, case based reasoning may have to take into account more cases even when there are fewer cases applicable depending on the subject. Second, case based reasoning may select neighbors that are far away from the target case. Thus, case based reasoning does not guarantee an optimal pseudo-neighborhood for various target cases, and the predictability can be degraded due to a deviation from the desired similar neighbor. This paper examines how the size of learning data affects stock price predictability through k-nearest neighbor and compares the predictability of k-nearest neighbor with the random walk model according to the size of the learning data and the number of neighbors. In this study, Samsung electronics stock prices were predicted by dividing the learning dataset into two types. For the prediction of next day's closing price, we used four variables: opening value, daily high, daily low, and daily close. In the first experiment, data from January 1, 2000 to December 31, 2017 were used for the learning process. In the second experiment, data from January 1, 2015 to December 31, 2017 were used for the learning process. The test data is from January 1, 2018 to August 31, 2018 for both experiments. We compared the performance of k-NN with the random walk model using the two learning dataset. The mean absolute percentage error (MAPE) was 1.3497 for the random walk model and 1.3570 for the k-NN for the first experiment when the learning data was small. However, the mean absolute percentage error (MAPE) for the random walk model was 1.3497 and the k-NN was 1.2928 for the second experiment when the learning data was large. These results show that the prediction power when more learning data are used is higher than when less learning data are used. Also, this paper shows that k-NN generally produces a better predictive power than random walk model for larger learning datasets and does not when the learning dataset is relatively small. Future studies need to consider macroeconomic variables related to stock price forecasting including opening price, low price, high price, and closing price. Also, to produce better results, it is recommended that the k-nearest neighbor needs to find nearest neighbors using the second step filtering method considering fundamental economic variables as well as a sufficient amount of learning data.


    (34141) Korea Institute of Science and Technology Information, 245, Daehak-ro, Yuseong-gu, Daejeon
    Copyright (C) KISTI. All Rights Reserved.