• Title/Summary/Keyword: Sparse data

Search results: 415

Collaborative Filtering using Co-Occurrence and Similarity information (상품 동시 발생 정보와 유사도 정보를 이용한 협업적 필터링)

  • Na, Kwang Tek;Lee, Ju Hong
    • Journal of Internet Computing and Services / v.18 no.3 / pp.19-28 / 2017
  • Collaborative filtering (CF) is a system that interprets the relationship between users and products and recommends products to a specific user. The CF model has the advantage that it can recommend products using only rating data, without additional information such as content. However, users often consume only a small portion of all products, and in many cases they do not give a rating even after consuming a product. As a result, very few ratings are observed and the user rating matrix is very sparse. This sparsity of the rating data makes it difficult to improve CF performance. In this paper, we concentrate on improving the performance of the latent factor model (especially SVD). We propose a new model that incorporates product similarity information and co-occurrence information into SVD. The similarity and co-occurrence information obtained from the rating data increased the expressiveness of the latent space in terms of latent factors. As a result, Recall increased by 16%, and Precision and NDCG increased by 8% and 7%, respectively. The proposed method is expected to outperform existing methods when combined with other recommender systems in the future.
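Both side-information signals mentioned above can be derived from the rating data alone. The sketch below assumes ratings are stored as a dict mapping user to {item: rating}. It counts how many users rated each item pair (co-occurrence) and computes the cosine similarity between item rating vectors. The function names are hypothetical, and the abstract does not describe how the paper injects these quantities into SVD, so that step is not shown.

```python
from collections import defaultdict
from itertools import combinations
from math import sqrt


def item_cooccurrence(ratings):
    """Count, for each unordered item pair, how many users rated both items."""
    counts = defaultdict(int)
    for items in ratings.values():
        for i, j in combinations(sorted(items), 2):
            counts[(i, j)] += 1
    return dict(counts)


def item_cosine(ratings):
    """Cosine similarity between item rating vectors; unrated entries count as 0."""
    dots = defaultdict(float)
    norms = defaultdict(float)
    for items in ratings.values():
        for i, r in items.items():
            norms[i] += r * r
        for i, j in combinations(sorted(items), 2):
            dots[(i, j)] += items[i] * items[j]
    return {(i, j): d / (sqrt(norms[i]) * sqrt(norms[j])) for (i, j), d in dots.items()}


# Hypothetical toy data: three users, three items.
ratings = {
    "u1": {"A": 5, "B": 3},
    "u2": {"A": 4, "C": 2},
    "u3": {"A": 1, "B": 1, "C": 5},
}
co = item_cooccurrence(ratings)
sim = item_cosine(ratings)
print(co[("A", "B")], co[("B", "C")])  # 2 1
```

Pairs are keyed in sorted order, so each unordered pair is stored once. With a sparse matrix, most item pairs never co-occur and are simply absent from both dictionaries.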

Analysis of Manganese Nodule Abundance in KODOS Area (KODOS 지역의 망간단괴 부존률 분포해석)

  • Jung, Moon Young;Kim, In Kee;Sung, Won Mo;Kang, Jung Keuk
    • Economic and Environmental Geology / v.28 no.3 / pp.199-211 / 1995
  • A deep-sea camera system makes it possible to obtain detailed information on nodule distribution, but it is difficult to estimate nodule abundance quantitatively from such images. To estimate nodule abundance quantitatively from deep-seabed photographs, a nodule abundance equation was derived from box core data obtained in the KODOS area (long.: $154^{\circ}{\sim}151^{\circ}W$, lat.: $9^{\circ}{\sim}12^{\circ}N$) during two survey cruises carried out in 1989 and 1990. The regression equation, derived by adding the extent of nodule burial to Handa's equation, compensates for the abundance error caused by the partial burial of some nodules in sediment. The average long axis and average extent of burial of nodules in each photographed area are determined from the surface textures of the nodules, and nodule coverage is calculated by image analysis. The average nodule abundance estimated from seabed photographs with this equation is approximately 92% of the actual average abundance in the KODOS area. The sampling points measured by box core or free-fall grab are generally very sparse, so the nodule abundance distribution must be interpolated and extrapolated from the measured data to uncharacterized areas. Another goal of this study is to depict the continuous distribution of nodule abundance in the KODOS area using a PC version of a geostatistical model in which several stages are carried out systematically. Geostatistics was used to analyze the spatial structure and distribution of the regionalized variable (nodule abundance) within sets of real data. To investigate the spatial structure of nodule abundance in the KODOS area, experimental variograms were calculated and fitted to spherical models under isotropy and anisotropy, respectively. The spherical structure models were then used to map the distribution of nodule abundance for the isotropic and anisotropic models by kriging.
The result from the anisotropic model is much more reliable than that from the isotropic model. The distribution map of nodule abundance produced by the PC version of the geostatistical model indicates that, under the anisotropic model, approximately 40% of the KODOS area can be considered promising for mining (nodule abundance > $5\,\mathrm{kg/m^2}$).
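The variogram step described above can be sketched as follows. This is a minimal illustration, assuming samples are given as (x, y, value) tuples in planar coordinates. It computes the classical semivariogram $\hat\gamma(h) = \frac{1}{2N(h)}\sum (z_i - z_j)^2$ per lag class, with an optional direction filter for anisotropy, and evaluates the spherical model $\gamma(h) = c_0 + c\,(1.5\,h/a - 0.5\,(h/a)^3)$ for $0 < h < a$. The lag classing by rounding and the 22.5° angular tolerance are assumptions, not the paper's settings. Model fitting and the kriging step itself are not shown.

```python
from math import atan2, degrees, hypot


def spherical(h, nugget, psill, a):
    """Spherical variogram model with nugget c0, partial sill c and range a."""
    if h <= 0:
        return 0.0
    if h >= a:
        return nugget + psill
    r = h / a
    return nugget + psill * (1.5 * r - 0.5 * r ** 3)


def experimental_variogram(points, lag, n_lags, direction=None, tol=22.5):
    """Classical (Matheron) semivariogram estimate.

    points: list of (x, y, value). If direction (degrees from the x axis) is
    given, only pairs whose separation vector lies within +/- tol degrees of it
    are used, which gives a directional variogram for studying anisotropy.
    Returns {lag distance: (gamma, number of pairs)}.
    """
    sums = {}
    for i in range(len(points)):
        xi, yi, zi = points[i]
        for j in range(i + 1, len(points)):
            xj, yj, zj = points[j]
            dx, dy = xj - xi, yj - yi
            if direction is not None:
                ang = degrees(atan2(dy, dx))
                # Fold the angular difference into [-90, 90): a pair has no orientation.
                if abs((ang - direction + 90.0) % 180.0 - 90.0) > tol:
                    continue
            k = int(hypot(dx, dy) / lag + 0.5)  # nearest lag class
            if 1 <= k <= n_lags:
                s, n = sums.get(k, (0.0, 0))
                sums[k] = (s + (zi - zj) ** 2, n + 1)
    return {k * lag: (s / (2 * n), n) for k, (s, n) in sorted(sums.items())}


pts = [(0, 0, 0.0), (1, 0, 1.0), (2, 0, 2.0)]
print(experimental_variogram(pts, 1.0, 2))  # {1.0: (0.5, 2), 2.0: (2.0, 1)}
```

Comparing directional variograms computed along different azimuths is what reveals the anisotropy that the abstract reports as giving the more reliable map.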

High Resolution Time Resolved Contrast Enhanced MR Angiography Using k-t FOCUSS (k-t FOCUSS 알고리듬을 이용한 고분해능 4-D MR 혈관 조영 영상 기법)

  • Jung, Hong;Kim, Eung-Yeop;Ye, Jong-Chul
    • Investigative Magnetic Resonance Imaging / v.14 no.1 / pp.10-20 / 2010
  • Purpose : The Recon Challenge at the 2009 ISMRM workshop on Data Sampling and Image Reconstruction in Sedona, Arizona was held to evaluate the feasibility of highly accelerated acquisition for time-resolved contrast-enhanced MR angiography. This paper provides a step-by-step description of the winning results of k-t FOCUSS in this competition. Materials and Methods : In previous works, we showed that the k-t FOCUSS algorithm successfully solves the compressed sensing problem even for less sparse cardiac cine applications. Therefore, k-t FOCUSS can be used to reconstruct time-resolved contrast-enhanced MR angiography very accurately. Accelerated radial trajectory data were synthesized from X-ray cerebral angiography images and provided by the organizing committee, and radiologists evaluated each reconstruction result against the ground-truth data in a double-blind manner. Results : The results reconstructed at various acceleration factors demonstrate that each component of compressed sensing, such as the sparsifying transform and the incoherent sampling pattern, can have a profound effect on the final reconstruction. Conclusion : The reconstructed results show that the compressed sensing dynamic MR imaging algorithm k-t FOCUSS enables high-resolution time-resolved contrast-enhanced MR angiography.
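The core idea behind FOCUSS is a reweighted minimum-norm solution to an underdetermined system $Ax = y$. Each iteration solves $x \leftarrow W^2 A^\top (A W^2 A^\top + \lambda I)^{-1} y$ with $W = \mathrm{diag}(|x|^{1-p/2})$. The sketch below is a generic, real-valued toy version of that iteration. It is not the k-t FOCUSS reconstruction, which applies the idea to dynamic MR data in the x-f domain with Fourier sampling. The values of `p`, `lam` and `iters` are illustrative assumptions. A small linear solver is included so that the block needs only the standard library.

```python
def solve(M, b):
    """Solve the square system M z = b by Gaussian elimination with partial pivoting."""
    n = len(M)
    aug = [list(row) + [b[i]] for i, row in enumerate(M)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = aug[r][n] - sum(aug[r][c] * z[c] for c in range(r + 1, n))
        z[r] = s / aug[r][r]
    return z


def focuss(A, y, p=0.5, lam=1e-9, iters=30):
    """Regularized FOCUSS: x <- W^2 A^T (A W^2 A^T + lam I)^-1 y, W = diag(|x|^(1 - p/2))."""
    m, n = len(A), len(A[0])
    x = [1.0] * n  # uniform start: the first iterate is the minimum-norm solution
    for _ in range(iters):
        w2 = [abs(v) ** (2.0 - p) for v in x]  # diagonal of W^2
        G = [[sum(A[i][k] * w2[k] * A[j][k] for k in range(n)) + (lam if i == j else 0.0)
              for j in range(m)] for i in range(m)]
        z = solve(G, y)
        x = [w2[k] * sum(A[i][k] * z[i] for i in range(m)) for k in range(n)]
    return x


A = [[1.0, 0.0, 1.0],
     [0.0, 1.0, 1.0]]
y = [1.0, 1.0]
x = focuss(A, y)
print([round(v, 6) for v in x])  # [0.0, 0.0, 1.0]
```

For this toy system, the minimum-norm solution is [1/3, 1/3, 2/3]. The reweighting drives the iterates toward the sparsest consistent solution [0, 0, 1], which is the behavior compressed sensing relies on.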

A Study on VaR Stability for Operational Risk Management (운영리스크 VaR 추정값의 안정성검증 방법 연구)

  • Kim, Hyun-Joong;Kim, Woo-Hwan;Lee, Sang-Cheol;Im, Jong-Ho;Cho, Sang-Hee;Kim, Ah-Hyoun
    • Communications for Statistical Applications and Methods / v.15 no.5 / pp.697-708 / 2008
  • Operational risk is defined as the risk of loss resulting from inadequate or failed internal processes, people and systems, or from external events. The advanced measurement approach proposed by the Basel Committee uses the loss distribution approach (LDA), which quantifies operational loss based on a bank's own historical data and measurement system. LDA involves fitting two distributions (frequency and severity) and then generating the aggregate loss distribution by mathematical convolution. Because operational risk measurement allows flexibility and subjective judgement in calculating regulatory capital, an objective validation of the measurement is essential. However, methodologies for verifying the soundness of operational risk measurement have not been fully developed, because internal operational loss data have been extremely sparse and modeling the extreme tail is very difficult. In this paper, we propose a methodology for validating operational risk measurement based on bootstrap confidence intervals for operational VaR (value at risk). We derive two methods for generating confidence intervals for operational VaR.
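The LDA pipeline and a bootstrap interval around its VaR can be sketched as follows. The abstract does not specify the paper's two interval methods, so this block shows a generic nonparametric percentile bootstrap. It rests on several assumptions: Poisson frequency, lognormal severity, Monte Carlo simulation in place of analytic convolution, and hypothetical function names and default sample sizes.

```python
import math
import random


def sample_poisson(lam, rng):
    """Knuth's multiplicative method; adequate for small annual frequencies."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def fit_lda(annual_counts, losses):
    """Fit Poisson frequency (mean count) and lognormal severity (MLE on log losses)."""
    lam = sum(annual_counts) / len(annual_counts)
    logs = [math.log(x) for x in losses]
    mu = sum(logs) / len(logs)
    sigma = math.sqrt(sum((v - mu) ** 2 for v in logs) / len(logs))
    return lam, mu, sigma


def simulate_var(lam, mu, sigma, alpha, n_years, rng):
    """Monte Carlo convolution: simulate annual aggregate losses, take the alpha-quantile."""
    totals = sorted(
        sum(rng.lognormvariate(mu, sigma) for _ in range(sample_poisson(lam, rng)))
        for _ in range(n_years)
    )
    idx = min(n_years - 1, math.ceil(alpha * n_years) - 1)
    return totals[idx]


def bootstrap_var_ci(annual_counts, losses, alpha=0.999, level=0.90,
                     n_boot=200, n_years=10_000, seed=0):
    """Percentile bootstrap interval for operational VaR."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_boot):
        # Resample the historical data with replacement and refit both distributions.
        counts_b = [rng.choice(annual_counts) for _ in annual_counts]
        losses_b = [rng.choice(losses) for _ in losses]
        lam, mu, sigma = fit_lda(counts_b, losses_b)
        estimates.append(simulate_var(lam, mu, sigma, alpha, n_years, rng))
    estimates.sort()
    lower = estimates[int((1 - level) / 2 * n_boot)]
    upper = estimates[int((1 + level) / 2 * n_boot) - 1]
    return lower, upper


# Hypothetical data: yearly event counts and individual loss amounts.
counts = [3, 5, 4, 6, 2]
losses = [12.0, 3.5, 40.0, 7.2, 150.0, 9.9, 22.0, 5.1, 61.0, 18.0]
print(bootstrap_var_ci(counts, losses, alpha=0.99, n_boot=50, n_years=2000))
```

With sparse loss data, the width of such an interval is itself informative: a very wide interval signals that the VaR estimate is unstable.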

Lifetime Prevalence and Comorbidity in Obsessive-Compulsive Disorder and Subclinical Obsessive-Compulsive Disorder in Korea (강박장애 및 아임상형 강박장애의 평생 유병률과 병발성)

  • Hong, Jin-Pyo;Lee, Dong-Eun;Hahm, Bong-Jin;Lee, Jun-Young;Suh, Tong-Woo;Cho, Seong-Jin;Park, Jong-Ik;Lee, Dong-Woo;Bae, Jae-Nam;Park, Su-Bin;Cho, Maeng-Je
    • Anxiety and mood / v.5 no.1 / pp.29-35 / 2009
  • Background : Despite the worldwide relevance of obsessive-compulsive disorder (OCD), reported prevalence, sex ratios, comorbidity patterns, and sociodemographic correlates differ considerably. Data on subclinical OCD have been sparse to date. Methods : Data were drawn from the Korea Epidemiologic Catchment Area (KECA) study, which was carried out from April to December 2001. The Korean version of the DSM-IV-adapted Composite International Diagnostic Interview was administered to a representative sample of 6,275 community residents aged 18-64. DSM-IV-based criteria for subclinical OCD were applied. Results : The lifetime prevalence rates of OCD and subclinical OCD were 0.8% and 6.6%, respectively. For both OCD and subclinical OCD, the rates for males and females did not differ significantly. OCD was associated with depressive disorder, bipolar disorder, social phobia, generalized anxiety disorder, and alcohol and nicotine dependence. Subclinical OCD was additionally associated with posttraumatic stress disorder and somatoform disorders. Comorbidity rates in subclinical OCD were lower than those in OCD. Conclusions : The lifetime prevalence rate of OCD was less than 1% in the Korean general population. The age distribution and comorbidity patterns suggest that subclinical OCD represents a broad and heterogeneous syndrome rather than simply a milder form of OCD.

Estimation of Monthly Precipitation in North Korea Using PRISM and Digital Elevation Model (PRISM과 상세 지형정보에 근거한 북한지역 강수량 분포 추정)

  • Kim, Dae-Jun;Yun, Jin-I.
    • Korean Journal of Agricultural and Forest Meteorology / v.13 no.1 / pp.35-40 / 2011
  • While high-definition precipitation maps with a 270 m spatial resolution are available for South Korea, there is little information on the geospatial availability of precipitation water for famine-plagued North Korea. Restricted data access and sparse observations prevent the widely used PRISM (Parameter-elevation Regressions on Independent Slopes Model) from being applied to North Korea for fine-resolution precipitation mapping. A hybrid method that complements the PRISM grid with a sub-grid-scale elevation function is suggested to estimate precipitation for remote, data-poor areas such as North Korea. Fine-scale elevation-precipitation regressions for four slope aspects were derived from 546 observation points in South Korea. A 'virtual' elevation surface at 270 m grid spacing was generated by inverse distance weighted averaging of the station elevations of the 78 KMA (Korea Meteorological Administration) synoptic stations. A 'real' elevation surface made up of both the 78 synoptic stations and 468 automated weather stations (AWS) was also generated and subtracted from the virtual surface to obtain the elevation difference at each point. The same procedure was applied to monthly precipitation to obtain the precipitation difference at each point. Regression analysis was then applied to derive the aspect-specific coefficient of precipitation change per unit increase in elevation. The elevation difference between the 'virtual' and 'real' surfaces was calculated for each 270 m grid point across North Korea, and the regression coefficients were applied to obtain precipitation corrections for the PRISM grid. The correction terms are then added to the low-resolution (~2.4 km) PRISM precipitation map to produce a 270 m high-resolution map compatible with those available for South Korea.
According to the final product, the spatially averaged precipitation for the entire territory of North Korea is 1,196 mm for a climatological normal year (1971-2000), with a standard deviation of 298 mm.
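The correction procedure described above can be sketched in a few functions. This is a simplified illustration that makes several assumptions:

- Stations are (x, y) points in planar coordinates.
- The 'real' value at a station is its own observation.
- The regression is an ordinary least-squares slope per aspect class.

Differences are taken here as real minus virtual. The abstract's virtual-minus-real convention yields the same slope, and the correction is then subtracted instead of added. All names are hypothetical.

```python
from math import hypot


def idw(x, y, stations, power=2.0):
    """Inverse distance weighted average of (sx, sy, value) stations at (x, y)."""
    num = den = 0.0
    for sx, sy, value in stations:
        d = hypot(x - sx, y - sy)
        if d == 0.0:
            return value  # exact hit on a station
        w = d ** -power
        num += w * value
        den += w
    return num / den


def ols_slope(xs, ys):
    """Least-squares slope of ys on xs (the intercept is estimated but discarded)."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    return sxy / sxx


def fit_aspect_coefficients(dense_points, synoptic):
    """Precipitation change per metre of elevation, one coefficient per aspect class.

    synoptic: (x, y, elevation, precip) for the sparse station network.
    dense_points: (x, y, elevation, precip, aspect) for all stations.
    """
    elev_st = [(sx, sy, e) for sx, sy, e, _ in synoptic]
    prcp_st = [(sx, sy, p) for sx, sy, _, p in synoptic]
    groups = {}
    for x, y, elev, prcp, aspect in dense_points:
        d_elev, d_prcp = groups.setdefault(aspect, ([], []))
        d_elev.append(elev - idw(x, y, elev_st))  # real minus virtual elevation
        d_prcp.append(prcp - idw(x, y, prcp_st))  # real minus virtual precipitation
    return {aspect: ols_slope(de, dp) for aspect, (de, dp) in groups.items()}


def corrected_precip(prism_value, dem_elev, virtual_elev, aspect, coefs):
    """Add the sub-grid elevation correction to a coarse PRISM value."""
    return prism_value + coefs[aspect] * (dem_elev - virtual_elev)


# Hypothetical data: flat, uniform synoptic network; denser points gain 0.5 mm per metre.
synoptic = [(0, 0, 0.0, 1000.0), (10, 0, 0.0, 1000.0), (0, 10, 0.0, 1000.0), (10, 10, 0.0, 1000.0)]
dense = [(2, 3, 100.0, 1050.0, "N"), (5, 5, 300.0, 1150.0, "N"), (7, 1, 200.0, 1100.0, "N")]
coefs = fit_aspect_coefficients(dense, synoptic)
print(round(coefs["N"], 6), round(corrected_precip(1000.0, 200.0, 0.0, "N", coefs), 3))
```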

3-stage Portfolio Selection Ensemble Learning based on Evolutionary Algorithm for Sparse Enhanced Index Tracking (부분복제 지수 상향 추종을 위한 진화 알고리즘 기반 3단계 포트폴리오 선택 앙상블 학습)

  • Yoon, Dong Jin;Lee, Ju Hong;Choi, Bum Ghi;Song, Jae Won
    • Smart Media Journal / v.10 no.3 / pp.39-47 / 2021
  • Enhanced index tracking builds on index tracking, which follows the market return, by optimizing an objective function to generate returns above the index. To avoid problems such as large transaction costs and illiquidity, we construct the portfolio by selecting only some of the stocks included in the index. Commonly used enhanced index tracking methods try to find the optimal portfolio with a single objective function over all test periods, but it is almost impossible to find one strategy that always works well in a volatile financial market. In addition, because the statistical characteristics of financial markets change significantly over time, it is important to improve generalization performance rather than merely optimize the objective function on the training data, yet existing methods do not address this directly. To solve these problems, this paper proposes ensemble learning that composes a portfolio by combining several objective functions, together with a 3-stage portfolio selection algorithm that selects a portfolio by applying criteria other than the objective function to the training data. In an experiment on the S&P500 index, the proposed method achieves a Sharpe ratio 27% higher than those of the index and the existing methods, showing that the 3-stage portfolio selection algorithm and ensemble learning are effective in selecting an enhanced index portfolio.
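To make "sparse enhanced index tracking with an evolutionary algorithm" concrete, the sketch below runs a plain genetic search over equal-weight portfolios of exactly `k` stocks. It is not the paper's 3-stage selection or ensemble method. The fitness is an assumed objective: mean excess return over the index minus a tracking-error penalty. The operator choices (elitism, union crossover, swap mutation), the population sizes and the zero risk-free rate in the Sharpe ratio are also illustrative assumptions.

```python
import random
import statistics


def sharpe(returns):
    """Sharpe ratio of a return series, ignoring the risk-free rate."""
    return statistics.mean(returns) / statistics.stdev(returns)


def portfolio_returns(subset, asset_returns):
    """Equal-weight portfolio over the selected assets."""
    T = len(asset_returns[0])
    return [sum(asset_returns[a][t] for a in subset) / len(subset) for t in range(T)]


def fitness(subset, asset_returns, index_returns, penalty=1.0):
    """Mean excess return over the index minus a tracking-error penalty."""
    excess = [p - b for p, b in zip(portfolio_returns(subset, asset_returns), index_returns)]
    return statistics.mean(excess) - penalty * statistics.pstdev(excess)


def evolve(asset_returns, index_returns, k, pop_size=30, generations=40, seed=0):
    """Genetic search over k-asset subsets with elitism, crossover and swap mutation."""
    rng = random.Random(seed)
    n = len(asset_returns)

    def score(subset):
        return fitness(subset, asset_returns, index_returns)

    population = [rng.sample(range(n), k) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=score, reverse=True)
        elite = population[: pop_size // 2]  # the better half survives unchanged
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            child = rng.sample(sorted(set(a) | set(b)), k)  # crossover: k from the parents' union
            if rng.random() < 0.3:  # mutation: swap one holding for an unselected asset
                outside = [i for i in range(n) if i not in child]
                child[rng.randrange(k)] = rng.choice(outside)
            children.append(child)
        population = elite + children
    return max(population, key=score)


# Hypothetical daily returns for 20 stocks; the "index" is their equal-weight average.
rng = random.Random(1)
assets = [[rng.gauss(0.0005, 0.02) for _ in range(60)] for _ in range(20)]
index = [sum(a[t] for a in assets) / len(assets) for t in range(60)]
best = evolve(assets, index, k=5)
print(sorted(best), round(sharpe(portfolio_returns(best, assets)), 3))
```

Because the elite half is always kept, the best fitness never decreases across generations. However, a high in-sample fitness says nothing about out-of-sample performance, which is the generalization gap the abstract highlights.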

Epidemiological investigation of porcine pseudorabies virus and its coinfection rate in Shandong Province in China from 2015 to 2018

  • Ma, Zicheng;Han, Zifeng;Liu, Zhaohu;Meng, Fanliang;Wang, Hongyu;Cao, Longlong;Li, Yan;Jiao, Qiulin;Liu, Sidang;Liu, Mengda
    • Journal of Veterinary Science / v.21 no.3 / pp.36.1-36.9 / 2020
  • Background: Pseudorabies, also known as Aujeszky's disease, is caused by the pseudorabies virus (PRV) and is recognized as a critical disease affecting the pig industry and a wide range of animals around the world, resulting in great economic losses each year. Shandong Province, one of the most important food animal-breeding regions in China, has a very dense pig population in which pseudorabies infections have been detected in recent years. However, data on PRV epidemiology and on the coinfection rates of PRV with other major swine diseases are sparse. Objectives: This study aimed to investigate PRV epidemiology in Shandong and to analyze current control measures. Methods: A total of 16,457 serum samples and 1,638 tissue samples, collected from 362 intensive pig farms (≥ 300 sows/farm) covering all cities in Shandong, were tested by enzyme-linked immunosorbent assay (ELISA) and polymerase chain reaction (PCR). Results: Based on ELISA results, 52.7% and 91.5% of the serum samples were positive for PRV-gE and PRV-gB, respectively. In addition, 15.7% of the tissue samples were PCR positive for PRV. Coinfection rates of PRV with porcine circovirus type 2 (PCV2), porcine reproductive and respiratory syndrome virus, and classical swine fever virus were measured; the coinfection rate with PCV2 (35.0%) was higher than those with the other two viruses. Macroscopic and microscopic lesions were observed in various tissues during histopathological examination. Conclusions: The results describe PRV prevalence and coinfection rates in Shandong Province and indicate that pseudorabies is endemic in pig farms in this region. This study provides epidemiological data that can be useful for the prevention and control of pseudorabies in Shandong, China.

Attention based Feature-Fusion Network for 3D Object Detection (3차원 객체 탐지를 위한 어텐션 기반 특징 융합 네트워크)

  • Sang-Hyun Ryoo;Dae-Yeol Kang;Seung-Jun Hwang;Sung-Jun Park;Joong-Hwan Baek
    • Journal of Advanced Navigation Technology / v.27 no.2 / pp.190-196 / 2023
  • Recently, with the development of LIDAR technology, which can measure the distance to objects, interest in LIDAR-based 3D object detection networks has grown. Previous networks produce inaccurate localization results because spatial information is lost during voxelization and downsampling. In this study, we propose an attention-based fusion method and a camera-LIDAR fusion system to obtain high-level features and high positional accuracy. First, by introducing an attention mechanism into Voxel-RCNN, a grid-based 3D object detection network, multi-scale sparse 3D convolution features are effectively fused to improve 3D object detection performance. Additionally, we propose a late-fusion mechanism that fuses the outputs of the 3D and 2D object detection networks to remove false positives. Comparative experiments with existing algorithms are performed on the KITTI dataset, which is widely used in autonomous driving research. The proposed method improves performance in both 2D object detection on the BEV (bird's-eye view) and 3D object detection. In particular, precision improved by about 0.54% over Voxel-RCNN for the Car class at moderate difficulty.
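The abstract does not say how the 3D and 2D outputs are combined. The sketch below shows one simple way a late-fusion filter for false positives could work. It assumes each 3D detection has already been projected onto the image plane as an axis-aligned box (this requires camera calibration, which is not shown). A 3D detection is kept only if a 2D detection overlaps it with IoU at or above a threshold, or if its own score is very high. Both thresholds and the detection format are hypothetical.

```python
def iou(a, b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def late_fusion(dets_3d, boxes_2d, iou_thr=0.5, keep_score=0.9):
    """Keep a 3D detection if a 2D detector confirms it, or if it is very confident."""
    kept = []
    for det in dets_3d:
        # det["box2d"]: the 3D box already projected onto the image plane
        support = max((iou(det["box2d"], b) for b in boxes_2d), default=0.0)
        if support >= iou_thr or det["score"] >= keep_score:
            kept.append(det)
    return kept


dets_3d = [
    {"box2d": (0, 0, 10, 10), "score": 0.6},        # confirmed by the 2D detector
    {"box2d": (50, 50, 60, 60), "score": 0.6},      # unconfirmed, low score: dropped
    {"box2d": (100, 100, 110, 110), "score": 0.95}, # unconfirmed but very confident
]
boxes_2d = [(1, 1, 10, 10)]
print(len(late_fusion(dets_3d, boxes_2d)))  # 2
```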

Recommender Systems using Structural Hole and Collaborative Filtering (구조적 공백과 협업필터링을 이용한 추천시스템)

  • Kim, Mingun;Kim, Kyoung-Jae
    • Journal of Intelligence and Information Systems / v.20 no.4 / pp.107-120 / 2014
  • This study proposes a novel recommender system that uses structural hole analysis to reflect qualitative and emotional information in the recommendation process. Although collaborative filtering (CF) is known as the most popular recommendation algorithm, it has some limitations, including scalability and sparsity problems. The scalability problem arises when the numbers of users and items become very large: as they grow on real-world e-commerce sites, CF cannot scale up because finding neighbors in the user-item matrix takes a long computation time. Sparsity is a common problem in most recommender systems because users generally evaluate only a small portion of all items. The cold-start problem is a special case of the sparsity problem that arises when users or items are newly added to the system with no ratings at all. When users' preference data are sparse, two users or items are unlikely to have common ratings, so CF ends up predicting ratings from a very limited number of similar users. Moreover, it may produce biased recommendations, because similarity weights may be estimated from only a small portion of the rating data. In this study, we identify a further limitation of conventional CF: it does not consider qualitative and emotional information about users in the recommendation process, because it uses only the users' preference scores in the user-item matrix. To address this limitation, this study proposes a cluster-indexing CF model with structural hole analysis. In general, a structural hole is a position that connects two separate actors without any redundant connections in the network. The actor who occupies a structural hole can easily access non-redundant, diverse, and fresh information.
Therefore, the actor who occupies a structural hole may be an important person in the focal network and may be the representative person of the focal subgroup. Thus, his or her characteristics may represent the general characteristics of the users in that subgroup. In this sense, structural hole analysis can distinguish the friends and strangers of the focal user. This study uses structural hole analysis to select the structural holes in subgroups as initial seeds for cluster analysis. First, we gather data on users' preference ratings for items and their social network information, using a data collection system developed for this study. Then, we perform structural hole analysis to find the structural holes of the social network. Next, we use these structural holes as cluster centroids for the clustering algorithm. Finally, we make recommendations using CF within each user's cluster and compare the recommendation performance of the comparative models. To evaluate the proposed model, we conduct two experiments. The first is the structural hole analysis, for which this study employs UCINET version 6, a software package for the analysis of social network data. The second performs the modified clustering and then CF using the result of the cluster analysis; for this, we develop an experimental system using VBA (Visual Basic for Applications) in Microsoft Excel 2007. For the modified clustering experiment, the similarity measure is the Pearson correlation between users' preference rating vectors. In addition, this study uses the 'all-but-one' approach for the CF experiment. To validate the effectiveness of the proposed model, we apply three comparative types of CF models to the same dataset.
The experimental results show that the proposed model outperforms the other comparative models. In particular, according to the statistical significance test, the proposed model performs significantly better than the two comparative models that use cluster analysis. However, the difference between the proposed model and the naive model is not statistically significant.
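The pipeline above (structural holes as seeds, Pearson-based clustering, then CF within each cluster) can be sketched compactly. The sketch makes several assumptions:

- Burt's constraint is used as the structural-hole measure on an undirected, unweighted graph. UCINET offers this family of measures, but the abstract does not name the exact index used.
- Clustering is a single assignment pass to the most-correlated seed, rather than an iterative algorithm.
- Prediction uses a mean-centred, Pearson-weighted average.

All function names and data are hypothetical.

```python
from math import sqrt


def burt_constraint(adj):
    """Burt's constraint for an undirected, unweighted graph {node: set(neighbors)}.

    Low constraint means the node bridges otherwise unconnected contacts,
    i.e. it spans a structural hole. Isolated nodes are skipped.
    """
    def p(i, j):
        return 1.0 / len(adj[i]) if j in adj[i] else 0.0

    result = {}
    for i, nbrs in adj.items():
        if not nbrs:
            continue
        result[i] = sum(
            (p(i, j) + sum(p(i, q) * p(q, j) for q in nbrs if q != j)) ** 2 for j in nbrs
        )
    return result


def pearson(ra, rb):
    """Pearson correlation over the items both users rated (0.0 if fewer than 2)."""
    common = [i for i in ra if i in rb]
    if len(common) < 2:
        return 0.0
    ma = sum(ra[i] for i in common) / len(common)
    mb = sum(rb[i] for i in common) / len(common)
    num = sum((ra[i] - ma) * (rb[i] - mb) for i in common)
    den = sqrt(sum((ra[i] - ma) ** 2 for i in common) * sum((rb[i] - mb) ** 2 for i in common))
    return num / den if den else 0.0


def assign_clusters(ratings, seeds):
    """Single assignment pass: each user joins the seed it correlates with most."""
    clusters = {s: [s] for s in seeds}
    for u in ratings:
        if u not in clusters:
            clusters[max(seeds, key=lambda s: pearson(ratings[u], ratings[s]))].append(u)
    return clusters


def predict(ratings, cluster, user, item):
    """Mean-centred weighted average over cluster members who rated the item."""
    mean_u = sum(ratings[user].values()) / len(ratings[user])
    num = den = 0.0
    for v in cluster:
        if v == user or item not in ratings[v]:
            continue
        w = pearson(ratings[user], ratings[v])
        mean_v = sum(ratings[v].values()) / len(ratings[v])
        num += w * (ratings[v][item] - mean_v)
        den += abs(w)
    return mean_u + num / den if den else mean_u


ratings = {
    "a": {1: 5, 2: 4, 3: 1, 4: 1}, "b": {1: 4, 2: 5, 3: 2, 4: 1},
    "c": {1: 1, 2: 2, 3: 5, 4: 4}, "d": {1: 2, 2: 1, 3: 4, 4: 5},
}
seeds = ["a", "c"]  # in the paper's setup these would come from structural hole analysis
clusters = assign_clusters(ratings, seeds)
# All-but-one: hide b's rating of item 1, then predict it from b's cluster.
profile = dict(ratings, b={k: v for k, v in ratings["b"].items() if k != 1})
print(clusters, round(predict(profile, clusters["a"], "b", 1), 3))
```

In a star network, the hub has constraint 1/3 and each leaf has constraint 1. The hub is therefore the natural seed for that subgroup.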