• Title/Summary/Keyword: similarity metric

Resolving the 'Gray sheep' Problem Using Social Network Analysis (SNA) in Collaborative Filtering (CF) Recommender Systems (소셜 네트워크 분석 기법을 활용한 협업필터링의 특이취향 사용자(Gray Sheep) 문제 해결)

  • Kim, Minsung; Im, Il
    • Journal of Intelligence and Information Systems / v.20 no.2 / pp.137-148 / 2014
  • Recommender systems have become one of the most important technologies in e-commerce. For many consumers, the ultimate reason to shop online is to reduce the effort of searching for information and making purchases, and recommender systems are a key technology for serving these needs. Many past studies of recommender systems have been devoted to developing and improving recommendation algorithms, and collaborative filtering (CF) is known to be the most successful one. Despite its success, however, CF has several shortcomings, such as the cold-start, sparsity, and gray sheep problems. To generate recommendations, ordinary CF algorithms require evaluations or preference information directly from users. CF therefore cannot produce recommendations for new users who have no evaluations or preference information (cold-start problem). As the numbers of products and customers increase, the scale of the data increases exponentially and most of the data cells are empty. Such a sparse dataset makes computing recommendations extremely hard (sparsity problem). Since CF is based on the assumption that there are groups of users sharing common preferences or tastes, CF becomes inaccurate if there are many users with rare and unique tastes (gray sheep problem). This study proposes a new algorithm that uses Social Network Analysis (SNA) techniques to resolve the gray sheep problem. We use 'degree centrality' in SNA to identify users with unique preferences (gray sheep). Degree centrality in SNA refers to the number of direct links to and from a node. In a network of users who are connected through common preferences or tastes, those with unique tastes have fewer links to other users (nodes) and are isolated from other users. Therefore, gray sheep can be identified by calculating the degree centrality of each node. We divide the dataset into two, gray sheep and others, based on the degree centrality of the users. 
Then, different similarity measures and recommendation methods are applied to these two datasets. The algorithm in more detail is as follows. Step 1: Convert the initial data, which is a two-mode network (user to item), into a one-mode network (user to user). Step 2: Calculate the degree centrality of each node and separate those nodes having degree centrality values lower than the pre-set threshold. The threshold value is determined by simulations such that the accuracy of CF for the remaining dataset is maximized. Step 3: An ordinary CF algorithm is applied to the remaining dataset. Step 4: Since the separated dataset consists of users with unique tastes, an ordinary CF algorithm cannot generate recommendations for them. A 'popular item' method is used to generate recommendations for these users. The F measures of the two datasets are weighted by the numbers of nodes and summed to be used as the final performance metric. To test the performance improvement from this new algorithm, an empirical study was conducted using a publicly available dataset, the MovieLens data from the GroupLens research team. We used 100,000 evaluations by 943 users on 1,682 movies. The proposed algorithm was compared with an ordinary CF algorithm using the 'Best-N-neighbors' and 'Cosine' similarity methods. The empirical results show that the F measure improved by about 11% on average when the proposed algorithm was used. Past studies to improve CF performance typically used additional information other than users' evaluations, such as demographic data. Some studies applied SNA techniques as a new similarity metric. This study is novel in that it uses SNA to separate the dataset. It shows that the performance of CF can be improved, without any additional information, when SNA techniques are used as proposed. This study has several theoretical and practical implications. It empirically shows that the characteristics of a dataset can affect the performance of CF recommender systems, which helps researchers understand the factors affecting CF performance. It also opens the door for future studies that apply SNA to CF to analyze the characteristics of datasets. In practice, this study provides guidelines for improving the performance of CF recommender systems with a simple modification.
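
    Steps 1 and 2 of the abstract, and its node-weighted F measure, might be sketched as follows. This is only an illustration under stated assumptions: the abstract does not say how user-to-user links are defined or how the weighted sum is normalized, so the "share at least one item" rule, the weighted average, the toy ratings and all function names here are hypothetical.

```python
from collections import defaultdict


def project_to_users(ratings):
    """Step 1: turn a two-mode (user, item) list into a one-mode user network.

    Assumption: two users are linked when they share at least one item.
    """
    users_by_item = defaultdict(set)
    for user, item in ratings:
        users_by_item[item].add(user)
    neighbors = {user: set() for user, _ in ratings}
    for users in users_by_item.values():
        for u in users:
            neighbors[u] |= users - {u}
    return neighbors


def split_gray_sheep(neighbors, threshold):
    """Step 2: separate users whose degree centrality is below the threshold."""
    n = len(neighbors)
    # Normalized degree centrality: links divided by the n - 1 possible links.
    centrality = {u: len(nb) / max(n - 1, 1) for u, nb in neighbors.items()}
    gray = {u for u, c in centrality.items() if c < threshold}
    return gray, set(neighbors) - gray


def weighted_f(f_gray, n_gray, f_rest, n_rest):
    """F measures of the two datasets, weighted by node counts (assumed average)."""
    return (f_gray * n_gray + f_rest * n_rest) / (n_gray + n_rest)


# Hypothetical toy data: user "d" shares no item with anyone else.
ratings = [("a", "m1"), ("b", "m1"), ("c", "m1"), ("a", "m2"),
           ("b", "m2"), ("c", "m3"), ("d", "m4")]
neighbors = project_to_users(ratings)
gray, rest = split_gray_sheep(neighbors, threshold=0.5)
```

    In the study the threshold is chosen by simulation; here it is simply passed in as a parameter.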

    Seasonal Variations in the Species Composition of Fisheries Resources Caught by Trammel Net in the Uljin Marine Ranching Area, East Sea (울진바다목장에서 자망으로 어획된 수산자원의 종조성과 계절변동)

    • Yoon, Byoung Sun; Park, Jeong-Ho; Yoon, Sang Chul; Yang, Jae Hyeong; Lee, Sung-Il; Kim, Jong-Bin; Choi, Young-Min; Sohn, Myoung Ho
      • Korean Journal of Fisheries and Aquatic Sciences / v.48 no.6 / pp.947-959 / 2015
    • Variations in the species composition, biomass and size distribution of fisheries resources in the Uljin marine ranching area were investigated using trammel nets at two stations (artificial reef and natural rocky area) from 2009 to 2010. During the survey, a total of 74 species were sampled with a mean density of 132 ind./net and mean biomass of 21.56 kg/net. In the natural rocky area, a total of 45 species were sampled at a mean density of 202 ind./net and mean biomass of 28.81 kg/net, while in the artificial reef area, samples included a total of 56 species, with means of 62 ind./net and 14.30 kg/net. The dominant species, comprising over 3% of the total number of individuals, were Suberites ficus (30.8%), Ovalipes punctatus (19.2%), Paralichthys olivaceus (11.7%), Pleuronectes herzensteini (4.7%), Kareius bicoloratus (3.5%), Pseudopleuronectes yokohamae (3.5%) and Eopsetta grigorjewi (3.0%). The dominant species, in terms of biomass, comprising over 5% of the total biomass, were P. olivaceus (22.1%), S. ficus (18.7%), O. punctatus (7.2%), Hexagrammos otakii (6.6%), P. yokohamae (5.7%), K. bicoloratus and P. herzensteini (5.3%). Cluster analysis and non-metric multidimensional scaling (nMDS) analysis, based on the Bray-Curtis similarity of fourth-root-transformed data for the number of species and individuals, divided the samples into two groups: the artificial reef area (group A) and the natural rocky area (group B).
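
    The Bray-Curtis similarity of fourth-root-transformed counts, which this and several later abstracts rely on, can be illustrated with a short sketch. The species labels and counts below are hypothetical, and the 0 to 100 scaling is an assumption about how the similarity is reported.

```python
def bray_curtis_similarity(counts_a, counts_b):
    """Bray-Curtis similarity (0 to 100) after a fourth-root transform.

    counts_a and counts_b map species names to abundances in two samples.
    """
    species = set(counts_a) | set(counts_b)
    # The fourth-root transform down-weights very abundant species.
    a = {s: counts_a.get(s, 0) ** 0.25 for s in species}
    b = {s: counts_b.get(s, 0) ** 0.25 for s in species}
    diff = sum(abs(a[s] - b[s]) for s in species)
    total = sum(a[s] + b[s] for s in species)
    if total == 0:
        raise ValueError("both samples are empty")
    return 100.0 * (1.0 - diff / total)


# Hypothetical counts per net at two stations.
reef = {"sp1": 16, "sp2": 81}
rocky = {"sp1": 16, "sp3": 1}
similarity = bray_curtis_similarity(reef, rocky)
```

    Cluster analysis and nMDS would then operate on the matrix of such pairwise similarities; that step is not shown here.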

    Characteristics of distribution and community structure of macrobenthic invertebrates caught in the coastal waters of middle East Sea, Korea (동해 중부해역 저서무척추동물의 분포특성 및 군집구조)

    • YOON, Byoung-Sun; CHOI, Young-Min; SOHN, Myong-Ho; KIM, Jong-Bin; YANG, Jae-Hyeong; PARK, Jeong-Ho
      • Journal of the Korean Society of Fisheries and Ocean Technology / v.52 no.4 / pp.372-385 / 2016
    • This study investigated the distribution and community structure of macrobenthic invertebrates through a survey of commercial Danish seine fisheries from 2011 to 2013. A total of 28 species were sampled, with a mean density of $32,568ind./km^2$ and mean biomass of $1,649.5kg/km^2$. The dominant species, comprising over 1.0% of the total number of individuals, were Chionoecetes opilio ($11,203ind./km^2$, 34.4%), Pandalus eous ($9,247ind./km^2$, 28.4%), Ophiuridae spp. ($5,750ind./km^2$, 17.7%), Argis lar ($2,631ind./km^2$, 8.1%), Neocrangon communis ($994ind./km^2$, 3.1%), Berryteuthis magister ($612ind./km^2$, 1.9%), Sepiola birostrata ($499ind./km^2$, 1.5%) and Strongylocentrotidae sp. ($424ind./km^2$, 1.3%). The dominant species, in terms of biomass, comprising over 1.0% of the total biomass, were C. opilio ($1,167.2kg/km^2$, 70.8%), B. magister ($130.3kg/km^2$, 7.9%), P. eous ($102.4kg/km^2$, 6.2%), Ophiuridae spp. ($84.6kg/km^2$, 5.1%), Enteroctopus dofleini ($45.5kg/km^2$, 2.8%), A. lar ($35.7kg/km^2$, 2.2%), Strongylocentrotidae sp. ($25.0kg/km^2$, 1.5%) and S. birostrata ($22.1kg/km^2$, 1.3%). Among them, S. birostrata, E. dofleini, Strongylocentrotidae sp. and Ophiuridae spp. had higher abundance and biomass in shallow water (<200 meters in depth), whereas C. opilio, P. eous, A. lar, N. communis and B. magister were higher in deep water (301 ~ 500 meters in depth). Cluster analysis and non-metric multidimensional scaling (nMDS) analysis, based on the Bray-Curtis similarity of fourth-root-transformed data for the number of species and individuals, divided the macrobenthic invertebrate community from the Danish seine survey into two station groups: shallow water (<200 meters in depth, Group A) and deep water (201 ~ 500 meters in depth, Group B). The major species by number of individuals were S. birostrata, Ophiuridae spp. and immature C. opilio in Group A, and P. eous, A. lar, B. magister and mature C. opilio in Group B.

    Spatio-temporal Variation of Fish Communities in Open Estuary, Seomjin River Estuary and Gwangyang Bay Coast (열린 하구인 섬진강 하구 및 광양만 연안 어류 군집의 시공간적 변화)

    • Sun Ho Lee; Won-Seok Kim; Jae-Won Park; Hyunbin Jo; Wan-Ok Lee; Tae Sik Yu; Hyo Gyeom Kim; Chang Woo Ji; Ihn-Sil Kwak
      • Korean Journal of Ecology and Environment / v.55 no.2 / pp.132-144 / 2022
    • The fish community along the Seomjin River-Seomjin River Estuary-Gwangyang Bay coast continuum was investigated three times from March 2019 to October 2019. A total of 49 species belonging to 31 families, including two endangered species, were collected at the eight sites during the survey period. According to Bray-Curtis similarities, observations were divided into four groups based on fish community composition: two groups (groups 1 and 2) and two uncategorized groups (groups 3 and 4). ANOSIM (analysis of similarities) based on spatial and temporal groupings indicated that the spatial differences in fish communities (R=0.398, P=0.001) were relatively more important than the temporal differences (R=0.273, P=0.002). In particular, there were significant differences between groups 1 and 2 (R=0.556, P=0.001), and similarity percentage analysis revealed that Argyrosomus argentatus (9.4%), Favonigobius gymnauchen (6.9%) and Konosirus punctatus (5.9%) contributed to these differences between the fish assemblages of the two groups. The fish fauna distributed in the Seomjin River-Gwangyang Bay ecosystem were spatially divided, and the number of species and number of individuals showed seasonal differences. This study could serve as a basis for understanding changes in the fish community and implementing conservation and management strategies for major species within the river-estuary-ocean continuum.
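
    The R values reported by ANOSIM compare rank dissimilarities between groups with those within groups. A minimal sketch of the standard R statistic follows, assuming the usual definition R = (mean between-group rank - mean within-group rank) / (M/2), where M is the number of sample pairs. The sample names and dissimilarities are hypothetical, and the permutation test that yields P is left out.

```python
from itertools import combinations


def anosim_r(dissimilarity, groups):
    """ANOSIM R statistic from pairwise dissimilarities and group labels.

    dissimilarity maps frozenset({i, j}) to a value; groups maps sample to label.
    """
    pairs = list(combinations(sorted(groups), 2))
    values = [dissimilarity[frozenset(p)] for p in pairs]
    # Rank the dissimilarities (1 = smallest), averaging ranks over ties.
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while (end + 1 < len(order)
               and values[order[end + 1]] == values[order[start]]):
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1
        start = end + 1
    within = [r for r, (i, j) in zip(ranks, pairs) if groups[i] == groups[j]]
    between = [r for r, (i, j) in zip(ranks, pairs) if groups[i] != groups[j]]
    mean_within = sum(within) / len(within)
    mean_between = sum(between) / len(between)
    return (mean_between - mean_within) / (len(pairs) / 2)


# Hypothetical data: two well-separated groups of two samples each.
groups = {"s1": "A", "s2": "A", "s3": "B", "s4": "B"}
dissimilarity = {
    frozenset({"s1", "s2"}): 10, frozenset({"s3", "s4"}): 20,
    frozenset({"s1", "s3"}): 60, frozenset({"s1", "s4"}): 70,
    frozenset({"s2", "s3"}): 80, frozenset({"s2", "s4"}): 90,
}
r_stat = anosim_r(dissimilarity, groups)
```

    An R near 1 indicates that samples within a group are more similar to each other than to samples from other groups, while an R near 0 indicates little separation.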

    A Study on the Effect of Network Centralities on Recommendation Performance (네트워크 중심성 척도가 추천 성능에 미치는 영향에 대한 연구)

    • Lee, Dongwon
      • Journal of Intelligence and Information Systems / v.27 no.1 / pp.23-46 / 2021
    • Collaborative filtering, which is often used in personalized recommendation, is recognized as a very useful technique for finding similar customers and recommending products to them based on their purchase history. However, because traditional collaborative filtering calculates similarities based on direct connections and common features among customers, it has difficulty calculating similarity for new customers or products. For this reason, hybrid techniques were designed that also use content-based filtering. Meanwhile, efforts have been made to solve these problems by applying the structural characteristics of social networks. This approach calculates the similarity between two customers indirectly, through the similar customers placed between them. That is, a customer network is created from purchasing data, and the similarity between two customers is calculated from the features of the network that indirectly connects them. Such similarity can be used as a measure to predict whether the target customer accepts recommendations. The centrality metrics of networks can be used to calculate these similarities. Different centrality metrics have important implications in that they may have different effects on recommendation performance. Furthermore, this study considers that the effect of these centrality metrics on recommendation performance may vary depending on the recommender algorithm. In addition, recommendation techniques using network analysis can be expected to increase recommendation performance not only for new customers or products but also for all customers or products. By considering a customer's purchase of an item as a link generated between the customer and the item on the network, the prediction of user acceptance of a recommendation is solved as a prediction of whether a new link will be created between them. As classification models fit the purpose of solving the binary problem of whether the link is created or not, decision tree, k-nearest neighbors (KNN), logistic regression, artificial neural network, and support vector machine (SVM) models are selected in the research. The data for performance evaluation were order data collected from an online shopping mall over four years and two months. The first three years and eight months of data were organized into the social network used in the experiment. The next four months' records were used to train and evaluate the recommender models. Experiments with the centrality metrics applied to each model show that the recommendation acceptance rates of the centrality metrics differ for each algorithm at a meaningful level. In this work, we analyzed only four commonly used centrality metrics: degree centrality, betweenness centrality, closeness centrality, and eigenvector centrality. Eigenvector centrality records the lowest performance in all models except support vector machines. Closeness centrality and betweenness centrality show similar performance across all models. Degree centrality ranks moderately across all models, while betweenness centrality always ranks higher than degree centrality. Finally, closeness centrality is characterized by distinct differences in performance according to the model. It ranks first in logistic regression, artificial neural network, and decision tree with numerically high performance. However, it records very low rankings in support vector machine and k-nearest neighbors, with low performance levels. As the experimental results reveal, in a classification model, network centrality metrics over a subnetwork that connects two nodes can effectively predict the connectivity between those nodes in a social network. Furthermore, each metric performs differently depending on the type of classification model. This result implies that choosing appropriate metrics for each algorithm can lead to higher recommendation performance. In general, betweenness centrality can guarantee a high level of performance in any model. Introducing closeness centrality could be considered to obtain higher performance for certain models.
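
    Two of the four centrality metrics named above, degree and closeness centrality, can be sketched with their standard textbook definitions. The abstract does not give its formulas or explain how the subnetwork between two customers is built, so the toy customer-item network and function names below are hypothetical; betweenness and eigenvector centrality are omitted for brevity.

```python
from collections import deque


def degree_centrality(graph):
    """Share of the other nodes that each node is directly linked to."""
    n = len(graph)
    return {node: len(nbrs) / max(n - 1, 1) for node, nbrs in graph.items()}


def closeness_centrality(graph):
    """Reachable-node count divided by the sum of shortest-path lengths."""
    result = {}
    for source in graph:
        dist = {source: 0}
        queue = deque([source])
        while queue:  # breadth-first search from source
            node = queue.popleft()
            for nbr in graph[node]:
                if nbr not in dist:
                    dist[nbr] = dist[node] + 1
                    queue.append(nbr)
        total = sum(dist.values())
        result[source] = (len(dist) - 1) / total if total else 0.0
    return result


# Hypothetical network: customers c1..c3 linked to the items they bought.
graph = {
    "c1": {"i1"},
    "c2": {"i1", "i2"},
    "c3": {"i2"},
    "i1": {"c1", "c2"},
    "i2": {"c2", "c3"},
}
degree = degree_centrality(graph)
closeness = closeness_centrality(graph)
```

    Values like these, computed over the subnetwork linking a customer and an item, would then serve as input features for the classifiers listed in the abstract.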

    Community Structure of Macrobenthic Assemblages around the Wolseong Nuclear Power Plant, East Sea of Korea (월성 원자력발전소 주변해역에 서식하는 대형저서동물의 군집구조)

    • Seo, In-Soo; Moon, Hyung-Tae; Choi, Byoung-Mi; Kim, Mi-Hyang; Kim, Dae-Ik; Yun, Jae-Seong; Byun, Ju-Young; Choi, Hue-Chang; Son, Min-Ho
      • Korean Journal of Environmental Biology / v.27 no.4 / pp.341-352 / 2009
    • This study was carried out to investigate the community structure of macrobenthic assemblages around the Wolseong Nuclear Power Plant, East Sea of Korea; seasonal sampling was performed from October 2007 to July 2008. A total of 163 macrobenthic fauna were collected. The overall average macrobenthos density and biomass were 1,005 individuals $m^{-2}$ and $21.81\;gWWt\;m^{-2}$, respectively. Based on the LeBris (1988) index, there were 10 dominant species accounting for approximately 69.00% of total individuals. The major dominant species were the polychaetes Spiophanes bombyx (349 inds. $m^{-2}$), Mediomastus californiensis (82 inds. $m^{-2}$), Sigambra tentaculata (55 inds. $m^{-2}$), Magelona japonica (50 inds. $m^{-2}$), Scoletoma longifolia (33 inds. $m^{-2}$) and an unidentified amphipod (Amphipoda spp., 72 inds. $m^{-2}$). Conventional multivariate statistics (cluster analysis and non-metric multi-dimensional scaling) were applied to assess spatial variation in macrobenthic assemblages. Cluster analysis and nMDS ordination analysis based on the Bray-Curtis similarity identified 2 major station groups. Major group 1 was associated with sand-dominated stations and was characterized by high abundance of the bivalves Mactra chinensis, Siliqua pulchella and the polychaete Protodorvillea egena. On the other hand, major group 2 was associated with mud-dominated stations and was numerically dominated by the polychaetes M. californiensis, M. japonica, Sternaspis scutata, S. longifolia and the bivalves Thyasira tokunagai and Theora fragilis. However, no significant differences in macrobenthic community structure were found between the environmental variables (sediment type and depth) and heated discharge.

    Community Structure of Macrobenthic Assemblages around Gijang Province, East Sea of Korea (동해 기장군 주변해역에 서식하는 대형저서동물의 군집구조)

    • Kim, Dae-Ik; Seo, In-Soo; Moon, Chang-Ho; Choi, Byoung-Mi; Jung, Rae-Hong; Son, Min-Ho
      • The Sea: Journal of the Korean Society of Oceanography / v.16 no.2 / pp.97-105 / 2011
    • This study investigated the community structure and spatio-temporal variation of macrobenthic assemblages around Gijang Province, East Sea of Korea. Macrobenthos were collected seasonally using a modified van Veen grab sampler from March to November 2006. A total of 157 macrobenthic fauna were collected. The overall average macrobenthos density was $552 \;ind/m^2$. The number of macrobenthos species ranged from 62 in winter and spring to 122 in autumn. Abundance, on the other hand, fluctuated between 6,540 (in spring) and 17,920 (in autumn) inds./$18m^2$. Cluster analysis and non-metric multi-dimensional scaling (nMDS) were applied to assess the spatio-temporal fluctuation in the macrobenthic assemblages. Cluster analysis and nMDS ordination analysis based on the Bray-Curtis similarity identified 3 station groups. Group 1 (stations 8~10, 12, 13, 17 and 18) was characterized by high abundance of the polychaete Lumbrineris longifolia, the bivalve Ennucula tenuis and Amphipoda spp., with mean phi ranging from $6.2{\Phi}$ to $7.1{\Phi}$ (above 50 m water depth). Group 2 (stations 5~7, 11, 14~16) was numerically dominated by the polychaete Ampharete arctica and the bivalve Theora fragilis (mean phi: $6.0{\sim}7.0{\Phi}$; within 40 m water depth). Finally, group 3 (stations 1~4) was characterized by high density of the polychaetes Magelona japonica and Sternaspis scutata, with mean phi ranging from $3.5{\Phi}$ to $6.9{\Phi}$ (below 30 m water depth). In conclusion, the macrobenthic community structure showed a distinct spatial and temporal trend, which seemed to be related to water depth and sediment composition.

    A Comparative Study on a Macrobenthic Community Structure from the Theory of Island Biogeography (도서생물지리설의 관점에서 대형무척추동물 군집 비교)

    • Seo, In-Soo; Choi, Byoung-Mi; Kim, Mi-Hyang; Yun, Jae-Seong; Park, Jae-Yeong; Lee, Sang-Yeop
      • Korean Journal of Environmental Biology / v.28 no.4 / pp.179-187 / 2010
    • The Theory of Island Biogeography states that the number of species on an island is affected by island area and distance from the mainland. This study compared and analyzed the community structure of the macro-invertebrates on three isolated islands in Korean waters in terms of the Theory of Island Biogeography. Macrobenthic animals were collected using a modified underwater quadrat in August 2009. A total of 104 macrobenthic species were sampled with a mean density of 399 individuals $m^{-2}$ and biomass of 1,506.70 g $m^{-2}$. Based on the abundance and biomass data, there were 10 dominant species accounting for approximately 67.17% of total individuals. The highest densities were found in the amphipoda Amphipoda spp., the bivalvia Modiolus agripetus and Mytilus coruscus, the Sipunculida Phascolosoma scolops and the polychaeta Syllidae unid. In contrast, the top ten species made up 95.66% of the total biomass, with the three highest being the bivalves M. coruscus, Streostria circumpicta and M. agripetus. Conventional multivariate statistics (cluster analysis and non-metric multi-dimensional scaling) were applied to assess spatial variation in macrobenthic assemblages. Cluster analysis and nMDS ordination analysis based on the Bray-Curtis similarity identified 2 station groups. Group 1 consisted of the Gageodo (except for the lower station at Transect 2) and Dokdo stations and was numerically dominated by the polychaetes Eunice antennata and Syllidae unid., the cirripedia Megabalanus rosa and the bivalvia M. coruscus. Group 2, however, was associated with the Sohwado station and was characterized by high abundance of the anomura Petrolisthes japonicus, the gastropoda Lirularia pygmaea and the brachiopoda Coptothyris grayi. In conclusion, these results suggest that the species diversity and community structure of macrobenthos on the three isolated islands are slightly related to island area and distance from the mainland.

    Environmental characteristics on habitats of Viola diamantiaca Nakai and its RAPD analysis (금강제비꽃(Viola diamantiaca Nakai) 자생지의 환경특성과 RAPD 분석)

    • Seo, Won-Bok; Yoo, Ki-Oug
      • Korean Journal of Plant Taxonomy / v.41 no.1 / pp.66-80 / 2011
    • This study investigated environmental factors and conducted a RAPD analysis to better understand the environmental characteristics and regional genetic variation of Viola diamantiaca in samples from 18 different areas. The habitats are mostly located on north-facing mountain slopes at altitudes ranging from 614 m to 1,462 m above sea level, with angles of inclination ranging from 3 degrees to 30 degrees. A total of 268 vascular plant taxa were identified in 35 quadrats across 18 habitats. The importance value of V. diamantiaca is 11.58%, and four highly ranked species, Sasa borealis (5.61%), Meehania urticifolia (5.21%), Ainsliaea acerifolia (3.62%) and Pseudostellaria palibiniana (3.60%), are considered to have an affinity with V. diamantiaca in their habitats. The average species diversity is 1.36, while evenness and dominance are 0.89 and 0.07, respectively. The average field capacity of the soil is 25.99%, with organic matter at 17.47% and a pH of 5.19. The soil texture was sandy loam at eleven habitats and loam at seven. In the RAPD analysis, 64 (84.6%) of the 78 bands amplified with a primer showed polymorphism. The eighteen populations could be classified into five groups with similarity coefficient values ranging from 0.53 to 0.86. The Mt. Jiri population, which is geographically segregated, shows basal branching within the 18 populations. Five populations, including two in the southern district of Gangwon-do and three in Chungcheongbuk-do, form a distinct clade. Four populations in the central district of Gangwon-do and Mt. Bohyeon in the Gyeongsangbuk-do clade form a sister to the clade containing two populations in Gyeonggi-do and five populations of the northern district in Gangwon-do. The Mt. Gariwang population is placed between the southern district and the central district in the Gangwon-do clades.

    Analysis of Trophic Structure and Energy Flows in the Uljin Marine Ranching Area, Korean East Sea (울진 바다목장 생태계의 영양구조와 에너지 흐름)

    • Kim, Hyung Chul; Lee, Jae Kyung; Kim, Mi Hyang; Choi, Byoung-Mi; Seo, In-Soo; Na, Jong Hun
      • Journal of the Korean Society of Marine Environment & Safety / v.24 no.6 / pp.750-763 / 2018
    • This study surveyed 10 sampling sites four times from March to October 2013 to determine the trophic structure and energy flow of the marine ecosystem in the Uljin marine ranching area, Korean East Sea. Species were grouped according to their ecological characteristics, using non-metric multidimensional scaling based on species similarity. A total of 19 species groups were classified: top predators, seabirds, large pelagic fishes, small pelagic fishes, rockfishes, pleuronectiformes, benthic fishes, semi-benthic fishes, cephalopods, benthic feeders, epifauna, bivalves, abalone, Cnidaria, zooplankton, benthic algae, microalgae, phytoplankton and detritus. The biomass, production/biomass, consumption/biomass and diet composition data of each species group were used as input to the Ecopath model to estimate the trophic structure and energy flow of the marine ecosystem in the Uljin marine ranching area. The trophic levels of the species groups were estimated to range from 1 to 5.687. The sum of all consumption was estimated at $229.7t/km^2/yr$ and the sum of all exports at $3,432.4t/km^2/yr$. Total system throughput was $6,796.2t/km^2/yr$, and the sum of all production was estimated at $3,613.1t/km^2/yr$. According to these results, net system production was estimated at $3,490.3t/km^2/yr$ and total biomass (excluding detritus) at $167.3t/km^2/yr$ in the Uljin marine ranching area.

