• Title/Summary/Keyword: 알고리즘 향상

Search Result 6,850, Processing Time 0.039 seconds

Resolving the 'Gray sheep' Problem Using Social Network Analysis (SNA) in Collaborative Filtering (CF) Recommender Systems (소셜 네트워크 분석 기법을 활용한 협업필터링의 특이취향 사용자(Gray Sheep) 문제 해결)

  • Kim, Minsung;Im, Il
    • Journal of Intelligence and Information Systems
    • /
    • v.20 no.2
    • /
    • pp.137-148
    • /
    • 2014
  • Recommender system has become one of the most important technologies in e-commerce in these days. The ultimate reason to shop online, for many consumers, is to reduce the efforts for information search and purchase. Recommender system is a key technology to serve these needs. Many of the past studies about recommender systems have been devoted to developing and improving recommendation algorithms and collaborative filtering (CF) is known to be the most successful one. Despite its success, however, CF has several shortcomings such as cold-start, sparsity, gray sheep problems. In order to be able to generate recommendations, ordinary CF algorithms require evaluations or preference information directly from users. For new users who do not have any evaluations or preference information, therefore, CF cannot come up with recommendations (Cold-star problem). As the numbers of products and customers increase, the scale of the data increases exponentially and most of the data cells are empty. This sparse dataset makes computation for recommendation extremely hard (Sparsity problem). Since CF is based on the assumption that there are groups of users sharing common preferences or tastes, CF becomes inaccurate if there are many users with rare and unique tastes (Gray sheep problem). This study proposes a new algorithm that utilizes Social Network Analysis (SNA) techniques to resolve the gray sheep problem. We utilize 'degree centrality' in SNA to identify users with unique preferences (gray sheep). Degree centrality in SNA refers to the number of direct links to and from a node. In a network of users who are connected through common preferences or tastes, those with unique tastes have fewer links to other users (nodes) and they are isolated from other users. Therefore, gray sheep can be identified by calculating degree centrality of each node. We divide the dataset into two, gray sheep and others, based on the degree centrality of the users. Then, different similarity measures and recommendation methods are applied to these two datasets. More detail algorithm is as follows: Step 1: Convert the initial data which is a two-mode network (user to item) into an one-mode network (user to user). Step 2: Calculate degree centrality of each node and separate those nodes having degree centrality values lower than the pre-set threshold. The threshold value is determined by simulations such that the accuracy of CF for the remaining dataset is maximized. Step 3: Ordinary CF algorithm is applied to the remaining dataset. Step 4: Since the separated dataset consist of users with unique tastes, an ordinary CF algorithm cannot generate recommendations for them. A 'popular item' method is used to generate recommendations for these users. The F measures of the two datasets are weighted by the numbers of nodes and summed to be used as the final performance metric. In order to test performance improvement by this new algorithm, an empirical study was conducted using a publically available dataset - the MovieLens data by GroupLens research team. We used 100,000 evaluations by 943 users on 1,682 movies. The proposed algorithm was compared with an ordinary CF algorithm utilizing 'Best-N-neighbors' and 'Cosine' similarity method. The empirical results show that F measure was improved about 11% on average when the proposed algorithm was used

    . Past studies to improve CF performance typically used additional information other than users' evaluations such as demographic data. Some studies applied SNA techniques as a new similarity metric. This study is novel in that it used SNA to separate dataset. This study shows that performance of CF can be improved, without any additional information, when SNA techniques are used as proposed. This study has several theoretical and practical implications. This study empirically shows that the characteristics of dataset can affect the performance of CF recommender systems. This helps researchers understand factors affecting performance of CF. This study also opens a door for future studies in the area of applying SNA to CF to analyze characteristics of dataset. In practice, this study provides guidelines to improve performance of CF recommender systems with a simple modification.

  • How to improve the accuracy of recommendation systems: Combining ratings and review texts sentiment scores (평점과 리뷰 텍스트 감성분석을 결합한 추천시스템 향상 방안 연구)

    • Hyun, Jiyeon;Ryu, Sangyi;Lee, Sang-Yong Tom
      • Journal of Intelligence and Information Systems
      • /
      • v.25 no.1
      • /
      • pp.219-239
      • /
      • 2019
    • As the importance of providing customized services to individuals becomes important, researches on personalized recommendation systems are constantly being carried out. Collaborative filtering is one of the most popular systems in academia and industry. However, there exists limitation in a sense that recommendations were mostly based on quantitative information such as users' ratings, which made the accuracy be lowered. To solve these problems, many studies have been actively attempted to improve the performance of the recommendation system by using other information besides the quantitative information. Good examples are the usages of the sentiment analysis on customer review text data. Nevertheless, the existing research has not directly combined the results of the sentiment analysis and quantitative rating scores in the recommendation system. Therefore, this study aims to reflect the sentiments shown in the reviews into the rating scores. In other words, we propose a new algorithm that can directly convert the user 's own review into the empirically quantitative information and reflect it directly to the recommendation system. To do this, we needed to quantify users' reviews, which were originally qualitative information. In this study, sentiment score was calculated through sentiment analysis technique of text mining. The data was targeted for movie review. Based on the data, a domain specific sentiment dictionary is constructed for the movie reviews. Regression analysis was used as a method to construct sentiment dictionary. Each positive / negative dictionary was constructed using Lasso regression, Ridge regression, and ElasticNet methods. Based on this constructed sentiment dictionary, the accuracy was verified through confusion matrix. The accuracy of the Lasso based dictionary was 70%, the accuracy of the Ridge based dictionary was 79%, and that of the ElasticNet (${\alpha}=0.3$) was 83%. Therefore, in this study, the sentiment score of the review is calculated based on the dictionary of the ElasticNet method. It was combined with a rating to create a new rating. In this paper, we show that the collaborative filtering that reflects sentiment scores of user review is superior to the traditional method that only considers the existing rating. In order to show that the proposed algorithm is based on memory-based user collaboration filtering, item-based collaborative filtering and model based matrix factorization SVD, and SVD ++. Based on the above algorithm, the mean absolute error (MAE) and the root mean square error (RMSE) are calculated to evaluate the recommendation system with a score that combines sentiment scores with a system that only considers scores. When the evaluation index was MAE, it was improved by 0.059 for UBCF, 0.0862 for IBCF, 0.1012 for SVD and 0.188 for SVD ++. When the evaluation index is RMSE, UBCF is 0.0431, IBCF is 0.0882, SVD is 0.1103, and SVD ++ is 0.1756. As a result, it can be seen that the prediction performance of the evaluation point reflecting the sentiment score proposed in this paper is superior to that of the conventional evaluation method. In other words, in this paper, it is confirmed that the collaborative filtering that reflects the sentiment score of the user review shows superior accuracy as compared with the conventional type of collaborative filtering that only considers the quantitative score. We then attempted paired t-test validation to ensure that the proposed model was a better approach and concluded that the proposed model is better. In this study, to overcome limitations of previous researches that judge user's sentiment only by quantitative rating score, the review was numerically calculated and a user's opinion was more refined and considered into the recommendation system to improve the accuracy. The findings of this study have managerial implications to recommendation system developers who need to consider both quantitative information and qualitative information it is expect. The way of constructing the combined system in this paper might be directly used by the developers.

    A Study on Usefulness of Clinical Application of Metal Artifact Reduction Algorithm in Radiotherapy (방사선치료 시 Metal artifact reduction Algorithm의 임상적용 유용성평가)

    • Park, Ja Ram;Kim, Min Su;Kim, Jeong Mi;Chung, Hyeon Suk;Lee, Chung Hwan;Back, Geum Mun
      • The Journal of Korean Society for Radiation Therapy
      • /
      • v.29 no.2
      • /
      • pp.9-17
      • /
      • 2017
    • Purpose: The tissue description and electron density indicated by the Computed Tomography(CT) number (also known as Hounsfield Unit) in radiotherapy are important in ensuring the accuracy of CT-based computerized radiotherapy planning. The internal metal implants, however, not only reduce the accuracy of CT number but also introduce uncertainty into tissue description, leading to development of many clinical algorithms for reducing metal artifacts. The purpose of this study was, therefore, to investigate the accuracy and the clinical applicability by analyzing date from SMART MAR (GE) used in our institution. Methode: and material: For assessment of images, the original images were obtained after forming ROIs with identical volumes by using CIRS ED phantom and inserting rods of six tissues and then non-SMART MAR and SMART MAR images were obtained and compared in terms of CT number and SD value. For determination of the difference in dose by the changes in CT number due to metal artifacts, the original images were obtained by forming PTV at two sites of CIRS ED phantom CT images with Computerized Treatment Planning (CTP system), the identical treatment plans were established for non-SMART MAR and SMART MAR images by obtaining unilateral and bilateral titanium insertion images, and mean doses, Homogeneity Index(HI), and Conformity Index(CI) for both PTVs were compared. The absorbed doses at both sites were measured by calculating the dose conversion constant (cCy/nC) from ylinder acrylic phantom, 0.125cc ionchamber, and electrometer and obtaining non-SMART MAR and SMART MAR images from images resulting from insertions of unilateral and bilateral titanium rods, and compared with point doses from CTP. Result: The results of image assessment showed that the CT number of SMART MAR images compared to those of non-SMART MAR images were more close to those of original images, and the SD decreased more in SMART compared to non-SMART ones. The results of dose determinations showed that the mean doses, HI and CI of non-SMART MAR images compared to those of SMART MAR images were more close to those of original images, however the differences did not reach statistical significance. The results of absorbed dose measurement showed that the difference between actual absorbed dose and point dose on CTP in absorbed dose were 2.69 and 3.63 % in non-SMRT MAR images, however decreased to 0.56 and 0.68 %, respectively in SMART MAR images. Conclusion: The application of SMART MAR in CT images from patients with metal implants improved quality of images, being demonstrated by improvement in accuracy of CT number and decrease in SD, therefore it is considered that this method is useful in dose calculation and forming contour between tumor and normal tissues.

    • PDF

    Rear Vehicle Detection Method in Harsh Environment Using Improved Image Information (개선된 영상 정보를 이용한 가혹한 환경에서의 후방 차량 감지 방법)

    • Jeong, Jin-Seong;Kim, Hyun-Tae;Jang, Young-Min;Cho, Sang-Bok
      • Journal of the Institute of Electronics and Information Engineers
      • /
      • v.54 no.1
      • /
      • pp.96-110
      • /
      • 2017
    • Most of vehicle detection studies using the existing general lens or wide-angle lens have a blind spot in the rear detection situation, the image is vulnerable to noise and a variety of external environments. In this paper, we propose a method that is detection in harsh external environment with noise, blind spots, etc. First, using a fish-eye lens will help minimize blind spots compared to the wide-angle lens. When angle of the lens is growing because nonlinear radial distortion also increase, calibration was used after initializing and optimizing the distortion constant in order to ensure accuracy. In addition, the original image was analyzed along with calibration to remove fog and calibrate brightness and thereby enable detection even when visibility is obstructed due to light and dark adaptations from foggy situations or sudden changes in illumination. Fog removal generally takes a considerably significant amount of time to calculate. Thus in order to reduce the calculation time, remove the fog used the major fog removal algorithm Dark Channel Prior. While Gamma Correction was used to calibrate brightness, a brightness and contrast evaluation was conducted on the image in order to determine the Gamma Value needed for correction. The evaluation used only a part instead of the entirety of the image in order to reduce the time allotted to calculation. When the brightness and contrast values were calculated, those values were used to decided Gamma value and to correct the entire image. The brightness correction and fog removal were processed in parallel, and the images were registered as a single image to minimize the calculation time needed for all the processes. Then the feature extraction method HOG was used to detect the vehicle in the corrected image. As a result, it took 0.064 seconds per frame to detect the vehicle using image correction as proposed herein, which showed a 7.5% improvement in detection rate compared to the existing vehicle detection method.

    Visualization and Localization of Fusion Image Using VRML for Three-dimensional Modeling of Epileptic Seizure Focus (VRML을 이용한 융합 영상에서 간질환자 발작 진원지의 3차원적 가시화와 위치 측정 구현)

    • 이상호;김동현;유선국;정해조;윤미진;손혜경;강원석;이종두;김희중
      • Progress in Medical Physics
      • /
      • v.14 no.1
      • /
      • pp.34-42
      • /
      • 2003
    • In medical imaging, three-dimensional (3D) display using Virtual Reality Modeling Language (VRML) as a portable file format can give intuitive information more efficiently on the World Wide Web (WWW). The web-based 3D visualization of functional images combined with anatomical images has not studied much in systematic ways. The goal of this study was to achieve a simultaneous observation of 3D anatomic and functional models with planar images on the WWW, providing their locational information in 3D space with a measuring implement using VRML. MRI and ictal-interictal SPECT images were obtained from one epileptic patient. Subtraction ictal SPECT co-registered to MRI (SISCOM) was performed to improve identification of a seizure focus. SISCOM image volumes were held by thresholds above one standard deviation (1-SD) and two standard deviations (2-SD). SISCOM foci and boundaries of gray matter, white matter, and cerebrospinal fluid (CSF) in the MRI volume were segmented and rendered to VRML polygonal surfaces by marching cube algorithm. Line profiles of x and y-axis that represent real lengths on an image were acquired and their maximum lengths were the same as 211.67 mm. The real size vs. the rendered VRML surface size was approximately the ratio of 1 to 605.9. A VRML measuring tool was made and merged with previous VRML surfaces. User interface tools were embedded with Java Script routines to display MRI planar images as cross sections of 3D surface models and to set transparencies of 3D surface models. When transparencies of 3D surface models were properly controlled, a fused display of the brain geometry with 3D distributions of focal activated regions provided intuitively spatial correlations among three 3D surface models. The epileptic seizure focus was in the right temporal lobe of the brain. The real position of the seizure focus could be verified by the VRML measuring tool and the anatomy corresponding to the seizure focus could be confirmed by MRI planar images crossing 3D surface models. The VRML application developed in this study may have several advantages. Firstly, 3D fused display and control of anatomic and functional image were achieved on the m. Secondly, the vector analysis of a 3D surface model was defined by the VRML measuring tool based on the real size. Finally, the anatomy corresponding to the seizure focus was intuitively detected by correlations with MRI images. Our web based visualization of 3-D fusion image and its localization will be a help to online research and education in diagnostic radiology, therapeutic radiology, and surgery applications.

    • PDF

    Target Word Selection Disambiguation using Untagged Text Data in English-Korean Machine Translation (영한 기계 번역에서 미가공 텍스트 데이터를 이용한 대역어 선택 중의성 해소)

    • Kim Yu-Seop;Chang Jeong-Ho
      • The KIPS Transactions:PartB
      • /
      • v.11B no.6
      • /
      • pp.749-758
      • /
      • 2004
    • In this paper, we propose a new method utilizing only raw corpus without additional human effort for disambiguation of target word selection in English-Korean machine translation. We use two data-driven techniques; one is the Latent Semantic Analysis(LSA) and the other the Probabilistic Latent Semantic Analysis(PLSA). These two techniques can represent complex semantic structures in given contexts like text passages. We construct linguistic semantic knowledge by using the two techniques and use the knowledge for target word selection in English-Korean machine translation. For target word selection, we utilize a grammatical relationship stored in a dictionary. We use k- nearest neighbor learning algorithm for the resolution of data sparseness Problem in target word selection and estimate the distance between instances based on these models. In experiments, we use TREC data of AP news for construction of latent semantic space and Wail Street Journal corpus for evaluation of target word selection. Through the Latent Semantic Analysis methods, the accuracy of target word selection has improved over 10% and PLSA has showed better accuracy than LSA method. finally we have showed the relatedness between the accuracy and two important factors ; one is dimensionality of latent space and k value of k-NT learning by using correlation calculation.

    Automatic Interpretation of Epileptogenic Zones in F-18-FDG Brain PET using Artificial Neural Network (인공신경회로망을 이용한 F-18-FDG 뇌 PET의 간질원인병소 자동해석)

    • 이재성;김석기;이명철;박광석;이동수
      • Journal of Biomedical Engineering Research
      • /
      • v.19 no.5
      • /
      • pp.455-468
      • /
      • 1998
    • For the objective interpretation of cerebral metabolic patterns in epilepsy patients, we developed computer-aided classifier using artificial neural network. We studied interictal brain FDG PET scans of 257 epilepsy patients who were diagnosed as normal(n=64), L TLE (n=112), or R TLE (n=81) by visual interpretation. Automatically segmented volume of interest (VOI) was used to reliably extract the features representing patterns of cerebral metabolism. All images were spatially normalized to MNI standard PET template and smoothed with 16mm FWHM Gaussian kernel using SPM96. Mean count in cerebral region was normalized. The VOls for 34 cerebral regions were previously defined on the standard template and 17 different counts of mirrored regions to hemispheric midline were extracted from spatially normalized images. A three-layer feed-forward error back-propagation neural network classifier with 7 input nodes and 3 output nodes was used. The network was trained to interpret metabolic patterns and produce identical diagnoses with those of expert viewers. The performance of the neural network was optimized by testing with 5~40 nodes in hidden layer. Randomly selected 40 images from each group were used to train the network and the remainders were used to test the learned network. The optimized neural network gave a maximum agreement rate of 80.3% with expert viewers. It used 20 hidden nodes and was trained for 1508 epochs. Also, neural network gave agreement rates of 75~80% with 10 or 30 nodes in hidden layer. We conclude that artificial neural network performed as well as human experts and could be potentially useful as clinical decision support tool for the localization of epileptogenic zones.

    • PDF

    T-Cache: a Fast Cache Manager for Pipeline Time-Series Data (T-Cache: 시계열 배관 데이타를 위한 고성능 캐시 관리자)

    • Shin, Je-Yong;Lee, Jin-Soo;Kim, Won-Sik;Kim, Seon-Hyo;Yoon, Min-A;Han, Wook-Shin;Jung, Soon-Ki;Park, Se-Young
      • Journal of KIISE:Computing Practices and Letters
      • /
      • v.13 no.5
      • /
      • pp.293-299
      • /
      • 2007
    • Intelligent pipeline inspection gauges (PIGs) are inspection vehicles that move along within a (gas or oil) pipeline and acquire signals (also called sensor data) from their surrounding rings of sensors. By analyzing the signals captured in intelligent PIGs, we can detect pipeline defects, such as holes and curvatures and other potential causes of gas explosions. There are two major data access patterns apparent when an analyzer accesses the pipeline signal data. The first is a sequential pattern where an analyst reads the sensor data one time only in a sequential fashion. The second is the repetitive pattern where an analyzer repeatedly reads the signal data within a fixed range; this is the dominant pattern in analyzing the signal data. The existing PIG software reads signal data directly from the server at every user#s request, requiring network transfer and disk access cost. It works well only for the sequential pattern, but not for the more dominant repetitive pattern. This problem becomes very serious in a client/server environment where several analysts analyze the signal data concurrently. To tackle this problem, we devise a fast in-memory cache manager, called T-Cache, by considering pipeline sensor data as multiple time-series data and by efficiently caching the time-series data at T-Cache. To the best of the authors# knowledge, this is the first research on caching pipeline signals on the client-side. We propose a new concept of the signal cache line as a caching unit, which is a set of time-series signal data for a fixed distance. We also provide the various data structures including smart cursors and algorithms used in T-Cache. Experimental results show that T-Cache performs much better for the repetitive pattern in terms of disk I/Os and the elapsed time. Even with the sequential pattern, T-Cache shows almost the same performance as a system that does not use any caching, indicating the caching overhead in T-Cache is negligible.

    The Estimation Model of an Origin-Destination Matrix from Traffic Counts Using a Conjugate Gradient Method (Conjugate Gradient 기법을 이용한 관측교통량 기반 기종점 OD행렬 추정 모형 개발)

    • Lee, Heon-Ju;Lee, Seung-Jae
      • Journal of Korean Society of Transportation
      • /
      • v.22 no.1 s.72
      • /
      • pp.43-62
      • /
      • 2004
    • Conventionally the estimation method of the origin-destination Matrix has been developed by implementing the expansion of sampled data obtained from roadside interview and household travel survey. In the survey process, the bigger the sample size is, the higher the level of limitation, due to taking time for an error test for a cost and a time. Estimating the O-D matrix from observed traffic count data has been applied as methods of over-coming this limitation, and a gradient model is known as one of the most popular techniques. However, in case of the gradient model, although it may be capable of minimizing the error between the observed and estimated traffic volumes, a prior O-D matrix structure cannot maintained exactly. That is to say, unwanted changes may be occurred. For this reason, this study adopts a conjugate gradient algorithm to take into account two factors: estimation of the O-D matrix from the conjugate gradient algorithm while reflecting the prior O-D matrix structure maintained. This development of the O-D matrix estimation model is to minimize the error between observed and estimated traffic volumes. This study validates the model using the simple network, and then applies it to a large scale network. There are several findings through the tests. First, as the consequence of consistency, it is apparent that the upper level of this model plays a key role by the internal relationship with lower level. Secondly, as the respect of estimation precision, the estimation error is lied within the tolerance interval. Furthermore, the structure of the estimated O-D matrix has not changed too much, and even still has conserved some attributes.

    Noise-robust electrocardiogram R-peak detection with adaptive filter and variable threshold (적응형 필터와 가변 임계값을 적용하여 잡음에 강인한 심전도 R-피크 검출)

    • Rahman, MD Saifur;Choi, Chul-Hyung;Kim, Si-Kyung;Park, In-Deok;Kim, Young-Pil
      • Journal of the Korea Academia-Industrial cooperation Society
      • /
      • v.18 no.12
      • /
      • pp.126-134
      • /
      • 2017
    • There have been numerous studies on extracting the R-peak from electrocardiogram (ECG) signals. However, most of the detection methods are complicated to implement in a real-time portable electrocardiograph device and have the disadvantage of requiring a large amount of calculations. R-peak detection requires pre-processing and post-processing related to baseline drift and the removal of noise from the commercial power supply for ECG data. An adaptive filter technique is widely used for R-peak detection, but the R-peak value cannot be detected when the input is lower than a threshold value. Moreover, there is a problem in detecting the P-peak and T-peak values due to the derivation of an erroneous threshold value as a result of noise. We propose a robust R-peak detection algorithm with low complexity and simple computation to solve these problems. The proposed scheme removes the baseline drift in ECG signals using an adaptive filter to solve the problems involved in threshold extraction. We also propose a technique to extract the appropriate threshold value automatically using the minimum and maximum values of the filtered ECG signal. To detect the R-peak from the ECG signal, we propose a threshold neighborhood search technique. Through experiments, we confirmed the improvement of the R-peak detection accuracy of the proposed method and achieved a detection speed that is suitable for a mobile system by reducing the amount of calculation. The experimental results show that the heart rate detection accuracy and sensitivity were very high (about 100%).


    (34141) Korea Institute of Science and Technology Information, 245, Daehak-ro, Yuseong-gu, Daejeon
    Copyright (C) KISTI. All Rights Reserved.