• Title/Summary/Keyword: Distributed Clustering

Big Data Analysis of News on Purchasing Second-hand Clothing and Second-hand Luxury Goods: Identification of Social Perception and Current Situation Using Text Mining (중고의류와 중고명품 구매 관련 언론 보도 빅데이터 분석: 텍스트마이닝을 활용한 사회적 인식과 현황 파악)

  • Hwa-Sook Yoo
    • Human Ecology Research
    • /
    • v.61 no.4
    • /
    • pp.687-707
    • /
    • 2023
  • This study sought useful information for the development of the future second-hand fashion market by examining the current situation through unstructured text data distributed as news articles related to 'purchase of second-hand clothing' and 'purchase of second-hand luxury goods'. Text-based unstructured data was collected daily from Naver news from January 1 to December 31, 2022, using 'purchase of second-hand clothing' and 'purchase of second-hand luxury goods' as collection keywords. The data was analyzed using text mining, and the results are as follows. First, in terms of frequency, the volume of collected data related to the purchase of second-hand luxury goods was almost four times that related to the purchase of second-hand clothing, indicating that the purchase of second-hand luxury goods is receiving more social attention. Second, the data obtained with the two collection keywords shared some common words but also contained words specific to each keyword. For second-hand clothing, words related to donations, sharing, and compensation sales were mainly mentioned, indicating that the purchase of second-hand clothing tends to be recognized as an eco-friendly transaction. For second-hand luxury goods, words related to resale and authenticity controversies in second-hand luxury transactions, second-hand trading platforms, and luxury brands were frequently mentioned. Third, as a result of clustering, data related to the purchase of second-hand clothing were divided into five groups, and data related to the purchase of second-hand luxury goods were divided into six groups.

Water resources monitoring technique using multi-source satellite image data fusion (다종 위성영상 자료 융합 기반 수자원 모니터링 기술 개발)

  • Lee, Seulchan;Kim, Wanyub;Cho, Seongkeun;Jeon, Hyunho;Choi, Minhae
    • Journal of Korea Water Resources Association
    • /
    • v.56 no.8
    • /
    • pp.497-508
    • /
    • 2023
  • Agricultural reservoirs are crucial structures for water resources monitoring, especially in Korea, where water resources are unevenly distributed across seasons. Optical and Synthetic Aperture Radar (SAR) satellites, which are used as tools for monitoring reservoirs, each have their own limitations: optical sensors are sensitive to weather conditions, and SAR sensors are sensitive to noise and multiple scattering over dense vegetation. In this study, we tried to improve water body detection accuracy through optical-SAR data fusion and to quantitatively analyze the complementary effects. We first detected water bodies at the Edong and Cheontae reservoirs by applying the K-means clustering technique to the Normalized Difference Water Index (NDWI) derived from the Compact Advanced Satellite 500 (CAS500), Kompsat-3/3A, and Sentinel-2, and to the SAR backscattering coefficient from Sentinel-1. After that, the improvements in accuracy were analyzed by applying K-means clustering to the 2-D grid space consisting of NDWI and SAR. Kompsat-3/3A was found to have the best accuracy (0.98 at both reservoirs), followed by Sentinel-2 (0.83 at Edong, 0.97 at Cheontae), Sentinel-1 (0.93 at both), and CAS500 (0.69, 0.78). By applying K-means clustering to the 2-D space at Cheontae reservoir, the accuracy of CAS500 improved by around 22% (resulting accuracy: 0.95), with an improvement in precision (85%) and a degradation in recall (14%). The precision of Kompsat-3A (Sentinel-2) improved by 3% (5%), and recall degraded by 4% (7%). More precise water resources monitoring is expected to become possible with the development of high-resolution SAR satellites, including CAS500-5, and of image fusion and water body detection techniques.
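
The fusion step described in this abstract, K-means clustering in a 2-D space of NDWI and SAR backscatter, can be sketched roughly as follows. This is only an illustration under stated assumptions: the pixel values are synthetic, NDWI is taken as the common (Green - NIR) / (Green + NIR) form, both features are standardized before clustering, and the cluster with the higher NDWI center is treated as water. None of these choices is confirmed by the abstract, and the helper names are hypothetical.

```python
import math

def ndwi(green, nir):
    """Normalized Difference Water Index, assumed here as (Green - NIR) / (Green + NIR)."""
    return (green - nir) / (green + nir)

def standardize(values):
    """Scale a feature to zero mean and unit variance so both axes weigh equally."""
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values)) or 1.0
    return [(v - mean) / std for v in values]

def kmeans_2d(points, k=2, iters=50):
    """Plain Lloyd's algorithm on 2-D points with a deterministic start (requires k >= 2)."""
    ordered = sorted(points)
    centers = [ordered[i * (len(ordered) - 1) // (k - 1)] for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign every point to its nearest center (squared Euclidean distance).
        labels = [
            min(range(k), key=lambda c: (p[0] - centers[c][0]) ** 2 + (p[1] - centers[c][1]) ** 2)
            for p in points
        ]
        # Move each center to the mean of its members; keep it if the cluster is empty.
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centers.append((sum(m[0] for m in members) / len(members),
                                    sum(m[1] for m in members) / len(members)))
            else:
                new_centers.append(centers[c])
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers

# Synthetic pixels (illustrative only): (green reflectance, NIR reflectance, SAR backscatter in dB).
pixels = [
    (0.10, 0.03, -21.0), (0.09, 0.02, -19.5), (0.11, 0.04, -22.0),  # water-like
    (0.08, 0.30, -8.0), (0.07, 0.35, -7.5), (0.09, 0.28, -9.0),     # land-like
]

ndwi_std = standardize([ndwi(g, n) for g, n, _ in pixels])
sar_std = standardize([s for _, _, s in pixels])
labels, centers = kmeans_2d(list(zip(ndwi_std, sar_std)), k=2)

# Assumption: the cluster whose center has the higher NDWI is the water class.
water_cluster = max(range(2), key=lambda c: centers[c][0])
water_mask = [lab == water_cluster for lab in labels]
print(water_mask)
```

Clustering on the joint feature space lets a pixel that is ambiguous in one sensor be resolved by the other, which is the complementary effect the study quantifies.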

A Two-Stage Learning Method of CNN and K-means RGB Cluster for Sentiment Classification of Images (이미지 감성분류를 위한 CNN과 K-means RGB Cluster 이-단계 학습 방안)

  • Kim, Jeongtae;Park, Eunbi;Han, Kiwoong;Lee, Junghyun;Lee, Hong Joo
    • Journal of Intelligence and Information Systems
    • /
    • v.27 no.3
    • /
    • pp.139-156
    • /
    • 2021
  • The main reason for using a deep learning model in image classification is that it can consider the relationships between regions by extracting each region's features from the overall information of the image. However, a CNN model may not be suitable for emotional image data that lacks such regional features. To address the difficulty of classifying emotion images, researchers propose CNN-based architectures suited to emotion images every year. Studies on the relationship between color and human emotion have also been conducted, and they found that different emotions are induced by different colors. Among studies using deep learning, some have applied color information to image sentiment classification. Using the image's color information in addition to the image itself improves the accuracy of classifying image emotions compared with training the classification model on the image alone. This study proposes two ways to increase accuracy by adjusting the result value after the model classifies an image's emotion. Both methods improve accuracy by modifying the result value based on statistics derived from the colors of the picture. The two-color combinations most widely distributed across all training data were found first; at test time, the two-color combinations most widely distributed in each test image were found, and the result values were corrected according to the color combination distribution. This method weights the result value obtained after the model classifies an image's emotion using an expression based on the log function and the exponential function. Emotion6, classified into six emotions, and Artphoto, classified into eight categories, were used as the image data. Densenet169, Mnasnet, Resnet101, Resnet152, and Vgg19 architectures were used for the CNN model, and performance was compared before and after applying the two-stage learning to the CNN model. 
Inspired by color psychology, which deals with the relationship between colors and emotions, we studied how to improve the accuracy of a model that classifies an image's sentiment by modifying the result values based on color. Sixteen colors were used: red, orange, yellow, green, blue, indigo, purple, turquoise, pink, magenta, brown, gray, silver, gold, white, and black. Each color carries its own meaning. Using scikit-learn's clustering, the seven colors most widely distributed in the image are identified. Then, the RGB coordinate values of the colors from the image are compared with the RGB coordinate values of the 16 colors listed above; that is, each color is converted to the closest of the 16 colors. If combinations of three or more colors are selected, too many color combinations occur and the distribution becomes scattered, so each combination has less influence on the result value. Therefore, to solve this problem, two-color combinations were found and used to weight the model's output. Before training, the most widely distributed color combinations were found for all training data images. The distribution of color combinations for each class was stored in a Python dictionary to be used during testing. During the test, the two-color combinations most widely distributed in each test image are found. After that, we checked how those color combinations were distributed in the training data and corrected the result accordingly. We devised several equations to weight the result value from the model based on the extracted color, as described above. The data set was randomly divided 80:20, and the model was verified using 20% of the data as a test set. After splitting the remaining 80% of the data into five folds to perform 5-fold cross-validation, the model was trained five times using different validation datasets. Finally, performance was checked using the test dataset that had been separated earlier. 
Adam was used as the optimizer, and the learning rate was set to 0.01. Training ran for up to 20 epochs, and if the validation loss did not decrease for five consecutive epochs, training was stopped. Early stopping was set to load the model with the best validation loss. Classification accuracy was better when the information extracted from color properties was used together with the CNN architecture than when only the CNN architecture was used.
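
The color step in this abstract (snapping dominant colors to the nearest of 16 reference colors, then taking the two most widely distributed colors as a combination) might look something like the sketch below. It rests on several assumptions that the abstract does not state: the RGB coordinates for the 16 names are taken from common CSS named colors, distance is plain Euclidean distance in RGB, and the dominant colors with their pixel shares are hypothetical stand-ins for the cluster centers that a K-means run would produce. The log/exponential weighting equations are not given in the abstract and are not reproduced here.

```python
from collections import Counter, defaultdict

# Assumed RGB coordinates for the 16 reference colors (CSS named-color values).
REFERENCE_COLORS = {
    "red": (255, 0, 0), "orange": (255, 165, 0), "yellow": (255, 255, 0),
    "green": (0, 128, 0), "blue": (0, 0, 255), "indigo": (75, 0, 130),
    "purple": (128, 0, 128), "turquoise": (64, 224, 208), "pink": (255, 192, 203),
    "magenta": (255, 0, 255), "brown": (165, 42, 42), "gray": (128, 128, 128),
    "silver": (192, 192, 192), "gold": (255, 215, 0), "white": (255, 255, 255),
    "black": (0, 0, 0),
}

def nearest_color(rgb):
    """Return the reference color name with the smallest squared RGB distance."""
    return min(
        REFERENCE_COLORS,
        key=lambda name: sum((a - b) ** 2 for a, b in zip(rgb, REFERENCE_COLORS[name])),
    )

def top_two_combination(dominant_colors):
    """dominant_colors: list of (rgb, pixel_share) pairs, e.g. cluster centers and sizes.

    Shares of clusters that map to the same reference color are summed, and the two
    largest names are returned as a sorted tuple so that order does not matter.
    """
    shares = Counter()
    for rgb, share in dominant_colors:
        shares[nearest_color(rgb)] += share
    return tuple(sorted(name for name, _ in shares.most_common(2)))

# Hypothetical dominant colors for one image (values invented for illustration).
image_colors = [((250, 10, 5), 0.40), ((10, 10, 240), 0.30),
                ((240, 240, 240), 0.20), ((20, 20, 20), 0.10)]
combo = top_two_combination(image_colors)
print(combo)

# Per-class distribution of combinations, kept in a dictionary as the abstract describes.
combo_counts = defaultdict(Counter)
combo_counts["joy"][combo] += 1  # "joy" is a placeholder class label
```

Sorting the pair makes ("red", "blue") and ("blue", "red") count as the same combination, which keeps the per-class distribution from being split across orderings.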

A Study on Influence of Location Factors of Food Service Business Start-up Real Estate Store on Business Performance: Mediated Effect of Start-up Business Satisfaction (외식창업부동산점포의 입지요인이 경영성과에 미치는 영향: 창업만족도의 매개효과)

  • Lee, Mu-Seon
    • Asia-Pacific Journal of Business Venturing and Entrepreneurship
    • /
    • v.12 no.2
    • /
    • pp.77-86
    • /
    • 2017
  • Location selection in a food service start-up is a shortcut to achieving business performance, and in this context it is no exaggeration to say that the food service industry is a real estate industry. This study examined how location factors in food service start-ups influence sales performance, and sought to verify whether location factors ultimately influence business performance through their influence on start-up business satisfaction. To this end, this study took food service owner-operators as its research subjects and surveyed restaurant operators located in Anyang-si from December 1, 2016 to January 30, 2017. A total of 300 questionnaires were distributed and 245 were collected, of which 198 were used for the empirical study after excluding those with insincere responses. The 198 questionnaires were analyzed empirically using the SPSS 22.0 statistical package, applying frequency analysis, factor analysis, and regression analysis. The major results of this study are as follows. First, this study divided the location factors of food service start-up stores into four: accessibility, clustering property, placeness, and visibility. Second, the results showed that accessibility, clustering property, placeness, and visibility all had a significant influence on sales performance. Third, start-up business satisfaction had a partial mediating effect on the influence of location factors on sales performance. As a result, this study confirmed the location-selection characteristics specific to food service start-ups and sought to identify the major factors and the points that differ from location selection for stores in other fields. 
These results imply that efforts to increase the sales performance of food service start-ups should be concentrated from the location selection phase onward, and that efforts should also be made to increase start-up business satisfaction.

Predicting the Performance of Recommender Systems through Social Network Analysis and Artificial Neural Network (사회연결망분석과 인공신경망을 이용한 추천시스템 성능 예측)

  • Cho, Yoon-Ho;Kim, In-Hwan
    • Journal of Intelligence and Information Systems
    • /
    • v.16 no.4
    • /
    • pp.159-172
    • /
    • 2010
  • The recommender system is one of the possible solutions to assist customers in finding the items they would like to purchase. To date, a variety of recommendation techniques have been developed. One of the most successful recommendation techniques is Collaborative Filtering (CF), which has been used in a number of different applications such as recommending Web pages, movies, music, articles, and products. CF identifies customers whose tastes are similar to those of a given customer, and recommends items those customers have liked in the past. Numerous CF algorithms have been developed to increase the performance of recommender systems. Broadly, there are memory-based CF algorithms, model-based CF algorithms, and hybrid CF algorithms, which combine CF with content-based techniques or other recommender systems. While many researchers have focused their efforts on improving CF performance, the theoretical justification of CF algorithms is lacking. That is, much remains unknown about how CF works. Furthermore, the relative performance of CF algorithms is known to be domain and data dependent. It is very time-consuming and expensive to implement and launch a CF recommender system, and a system unsuited to the given domain provides customers with poor-quality recommendations that easily annoy them. Therefore, predicting the performance of CF algorithms in advance is of practical importance. In this study, we propose an efficient approach to predict the performance of CF. Social Network Analysis (SNA) and Artificial Neural Network (ANN) are applied to develop our prediction model. CF can be modeled as a social network in which customers are nodes and purchase relationships between customers are links. SNA facilitates an exploration of the topological properties of the network structure that are implicit in data for CF recommendations. 
An ANN model is developed through an analysis of network topology measures such as network density, inclusiveness, clustering coefficient, network centralization, and Krackhardt's efficiency. While network density, expressed as a proportion of the maximum possible number of links, captures the density of the whole network, the clustering coefficient captures the degree to which the overall network contains localized pockets of dense connectivity. Inclusiveness refers to the number of nodes that are included within the various connected parts of the social network. Centralization reflects the extent to which connections are concentrated in a small number of nodes rather than distributed equally among all nodes. Krackhardt's efficiency characterizes how dense the social network is beyond what is barely needed to keep the social group even indirectly connected. We use these social network measures as input variables of the ANN model. As an output variable, we use the recommendation accuracy measured by the F1-measure. To evaluate the effectiveness of the ANN model, sales transaction data from H department store, one of the well-known department stores in Korea, was used. A total of 396 experimental samples were gathered, and we used 40%, 40%, and 20% of them for training, test, and validation, respectively. 5-fold cross-validation was also conducted to enhance the reliability of our experiments. The process of measuring the input variables consists of the following three steps: analysis of customer similarities, construction of a social network, and analysis of social network patterns. We used Net Miner 3 and UCINET 6.0 for SNA, and Clementine 11.1 for ANN modeling. The experiments showed that the ANN model has 92.61% estimated accuracy and 0.0049 RMSE. Thus, our prediction model can help decide whether CF is useful for a given application with certain data characteristics.
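
Three of the network measures named in this abstract can be illustrated on a toy undirected graph. This is a sketch under assumptions rather than the authors' procedure: density is taken as 2m / (n(n - 1)), inclusiveness as the share of non-isolated nodes, and the clustering coefficient as the average of local coefficients with nodes of degree below 2 counted as 0. The SNA tools cited in the abstract may define these differently, and the graph below is invented.

```python
from itertools import combinations

def build_adjacency(nodes, edges):
    """Build an undirected adjacency map: node -> set of neighbours."""
    adj = {n: set() for n in nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj

def density(adj):
    """Observed links as a proportion of the maximum possible number of links."""
    n = len(adj)
    m = sum(len(nb) for nb in adj.values()) / 2  # each link is stored twice
    return 2 * m / (n * (n - 1))

def inclusiveness(adj):
    """Share of nodes that have at least one link (assumed definition)."""
    return sum(1 for nb in adj.values() if nb) / len(adj)

def average_clustering(adj):
    """Mean local clustering coefficient; nodes with fewer than 2 neighbours count as 0."""
    total = 0.0
    for nb in adj.values():
        k = len(nb)
        if k < 2:
            continue
        # Count links that exist between pairs of this node's neighbours.
        links = sum(1 for u, v in combinations(nb, 2) if v in adj[u])
        total += 2 * links / (k * (k - 1))
    return total / len(adj)

# Toy customer network: a triangle a-b-c, a pendant node d, and an isolated node e.
nodes = ["a", "b", "c", "d", "e"]
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d")]
adj = build_adjacency(nodes, edges)

features = {
    "density": density(adj),
    "inclusiveness": inclusiveness(adj),
    "clustering": average_clustering(adj),
}
print(features)
```

A feature dictionary of this kind, computed once per experimental sample, is the sort of input vector the abstract describes feeding to the ANN.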

The Risk Assessment of the Fire Occurrence According to Urban Facilities in Jinju-si (진주시 도시시설물별 화재발생 위험도 평가)

  • Bae, Gyu Han;Won, Tae Hong;Yoo, Hwan Hee
    • Journal of Korean Society for Geospatial Information Science
    • /
    • v.24 no.1
    • /
    • pp.43-50
    • /
    • 2016
  • Urbanization in Korea has advanced significantly, and various facilities have rapidly become concentrated in urban areas as the urban population has grown. Accordingly, damage from a variety of disasters has occurred. In particular, among social disasters, fire caused the most severe damage in urban areas, along with traffic accidents. In 2015, 44,432 fires occurred in Korea, killing 253 people and causing property damage of 4,50 billion won. Despite efforts to reduce such damage, fire danger remains high. In this regard, this study collected data on fires that occurred from 2007 to 2014 through the Jinju Fire Department and the National Fire Data System (NFDS), and calculated fire risk by analyzing the clustering of fire cases and facilities in Jinju-si based on the current facilities DB provided by the Ministry of Government Administration and Home Affairs. As a result, the risk ratings of fire occurrence were classified into four grades under the standards of the US Society of Fire Protection Engineers (SFPE). Business facilities, entertainment facilities, and automobile facilities were classified as the highest A grade; detached houses, apartment houses, education facilities, sales facilities, accommodation, set of facilities, medical facilities, industrial facilities, and life service facilities were classified as U grade; and other facilities were classified as EU grade. Finally, hazardous production facilities were classified as BEU grade, the lowest grade. In addition, when loss of life was used as the standard, the highest-risk facilities were hazardous production facilities, while when property damage was used as the standard, a set of facilities and industrial facilities showed the highest risk. 
In this regard, this study is expected to be useful for establishing fire reduction measures for facilities distributed in urban space, by calculating risk grades based on occurrence frequency, casualties, and property damage through a classification of urban fires by facility.

A Statistical Analysis of Phenotypic Diversity Based on Genetic Traits in Barley Germplasms (특성평가 정보를 활용한 보리 유전자원 형태적 형질 다양성의 통계적 분석)

  • Yu, Dong Su;Shin, Myoung-Jae;Park, Jin-Cheon;Kang, Manjung
    • Korean Journal of Plant Resources
    • /
    • v.35 no.5
    • /
    • pp.641-651
    • /
    • 2022
  • Biodiversity research on barley, a functional food, is under way to conserve germplasms and develop new barley cultivars with improved functional effects. In this study of 25,104 barley germplasms in the National Agrobiodiversity Center, South Korea, the biodiversity index of species was much lower (1.17) than that of origins (24.73) because of the presence of a dominant species, Hordeum vulgare subsp. vulgare, but the species and origins of the germplasms differed significantly with regard to genetic traits. In the clustering analysis based on genetic traits, we found that 97% of barley germplasms were distributed among clusters 1 to 7 out of a total of 15 clusters; 'normal and uzu type', 'lodging', and 'loose smut' were commonly represented in clusters 1 to 7, and some clusters showed specific differences in five genetic traits, including 'growth habit'. In the correlation analysis of the genetic traits, infection with 'barley yellow mosaic virus' was highly correlated with 'number of grains per spike'. '1000 grain weight' was weakly correlated with seven genetic traits, including 'number of grains per spike'. Our analysis of barley biodiversity can provide a useful guide to the phenotypes that need to be collected to conserve biodiversity and to breed new barley varieties.

Population Genetic Variation of Ulmus davidiana var. japonica in South Korea Based on ISSR Markers (ISSR 표지자를 이용한 느릅나무 자연집단의 유전변이 분석)

  • Ahn, Ji Young;Hong, Kyung Nak;Lee, Jei Wan;Yang, Byung Hoon
    • Journal of Korean Society of Forest Science
    • /
    • v.102 no.4
    • /
    • pp.560-565
    • /
    • 2013
  • Population genetic structure and diversity of Ulmus davidiana var. japonica in South Korea were studied using ISSR markers. A total of 45 polymorphic ISSR amplicons were obtained from 7 ISSR primers and 171 individuals of 7 populations. The average number of effective alleles and the proportion of polymorphic loci were 1.5 and 89%, respectively. Shannon's diversity index (I) was 0.435, and the expected heterozygosity from the frequentist method ($H_e$) and from Bayesian inference (hs) was 0.289 and 0.323, respectively. From AMOVA, 4.2% of the total genetic variation in the elm populations was explained by differences among populations (${\Phi}_{ST}=0.042$), and the other 95.8% was distributed within populations. The ${\theta}^{II}$ value from the Bayesian method, which is comparable to $F_{ST}$, was 0.043. Thus the level of genetic diversity in the elm populations was similar to that in the genus Ulmus, and the level of genetic differentiation was lower than that of others. No population showed a significant difference in the population-specific fixation indices (average of $PS-F_{IS}=0.822$) or the population-specific genetic differentiations (average of $PS-F_{ST}=0.101$). The seven populations were allocated into 3 groups in both the UPGMA and the PCA, but the grouping patterns differed. Also, we could not confirm any geographic trend from Bayesian clustering.
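
For readers unfamiliar with the diversity statistics quoted in this abstract, the sketch below shows one common way they are computed for a dominant marker such as ISSR. It assumes Hardy-Weinberg equilibrium, so the null-allele frequency is estimated as the square root of the band-absence frequency, and it uses the usual two-allele forms: $H_e = 1 - (p^2 + q^2)$, effective alleles $= 1 / (p^2 + q^2)$, and Shannon's $I = -(p \ln p + q \ln q)$. The abstract does not say which estimators or software the authors used, and the band counts below are invented.

```python
import math

def allele_freqs_dominant(n_absent, n_total):
    """Estimate (p, q) for a dominant marker, assuming Hardy-Weinberg equilibrium.

    Individuals without a band are taken to be null homozygotes, so q = sqrt(absent share).
    """
    q = math.sqrt(n_absent / n_total)
    return 1.0 - q, q

def locus_stats(p, q):
    """Return (expected heterozygosity, effective number of alleles, Shannon index)."""
    homozygosity = p * p + q * q
    he = 1.0 - homozygosity
    ne = 1.0 / homozygosity
    shannon = -sum(x * math.log(x) for x in (p, q) if x > 0)
    return he, ne, shannon

# Invented example: the band is absent in 25 of 100 individuals at one locus.
p, q = allele_freqs_dominant(25, 100)
he, ne, shannon = locus_stats(p, q)
print(round(he, 3), round(ne, 3), round(shannon, 3))
```

Population-level figures such as those in the abstract would then be averages of these per-locus values across all scored loci.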

Distribution of Plasmodiophora brassicae Causing Clubroot Disease of Chinese Cabbage in Soil (배추무사마귀병균의 토양내 분포)

  • 김충회;조원대;김홍모
    • Research in Plant Disease
    • /
    • v.6 no.1
    • /
    • pp.27-33
    • /
    • 2000
  • Population density of Plasmodiophora brassicae in the soil of severely infested Chinese cabbage fields decreased as soil depth increased. More than 97% of the total population was found in surface soil (0-5 cm depth), and a few resting spores of the pathogen were also detected in soil 40 cm deep. The clubroot pathogen was evenly distributed over the surface soil without clustering around Chinese cabbage plants. Density of P. brassicae in soil at 23 Chinese cabbage fields in Pyongchang, Kangwon province, ranged widely from less than $10^4$ resting spores/g soil to above $10^6$ resting spores/g soil. Little or no P. brassicae was found in virgin soil without any cropping history; density was intermediate, at $0.36$-$2.75 \times 10^4$ resting spores/g soil, in fields of other crops; and a population more than 10 times higher was found in severely infected Chinese cabbage fields. Density of P. brassicae was highest in fields under monocropping of crucifers, with some exceptions, but was low in fields rotated with corn, rye, medicinal crops, or other non-host vegetables. Pathogen density in soil decreased rapidly when rye or medicinal crops were cultivated after Chinese cabbage, suggesting that survival of the clubroot pathogen is greatly influenced by the cropping system. The improved method for detecting resting spores of P. brassicae in soil used in this study appeared adequate for estimating the population density of P. brassicae in soil, offering clearer staining, increased detection sensitivity, and simpler preparation.

Distribution of DArT Markers in a Genetic Linkage Map of Tomato (토마토 유전자연관지도 상의 DarT 마커 분포)

  • Truong, Hai Thi Hong;Graham, Elaine;Esch, Elisabeth;Wang, Jaw-Fen;Hanson, Peter
    • Horticultural Science & Technology
    • /
    • v.28 no.4
    • /
    • pp.664-671
    • /
    • 2010
  • A genetic linkage map was constructed using 188 $F_9$ RILs derived from a cross between $Solanum$ $lycopersicum$ H7996 (resistant to bacterial wilt) and $S.$ $pimpinellifolium$ WVa700 (highly susceptible to bacterial wilt). The map consisted of 361 markers including 260 DArTs, 74 AFLPs, 4 RFLPs, 1 SNP, and 22 SSRs. The resulting linkage map was comprised of 13 linkage groups covering 2042.7 cM. The genetic linkage map had an average map distance between markers of 5.7 cM, with an average DArT marker density of 1/7.9 cM. Based on the distribution of anchor SSR markers, 11 linkage groups were assigned to 10 chromosomes of tomato except chromosomes 5 and 12. The DArT markers were distributed across the genome in a similar way as other markers and showed the highest frequency of clustering (38.8%) at ${\leq}$ 0.5 cM intervals between adjacent markers, which is 3 times higher than AFLPs (13.5%). The present study is the first utilization of DArT markers in tomato linkage map construction.
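
The summary figures in this abstract can be cross-checked with a few lines of arithmetic. The sketch assumes that the reported averages are the total map length divided by the number of markers, which is an inference from the numbers and not something the abstract states.

```python
# Figures taken directly from the abstract.
total_map_length_cm = 2042.7
marker_counts = {"DArT": 260, "AFLP": 74, "RFLP": 4, "SNP": 1, "SSR": 22}

total_markers = sum(marker_counts.values())

# Assumed definition: average spacing = total map length / number of markers.
avg_marker_distance = total_map_length_cm / total_markers
dart_density = total_map_length_cm / marker_counts["DArT"]

# Ratio of clustering frequencies at <= 0.5 cM intervals (DArT vs. AFLP).
clustering_ratio = 38.8 / 13.5

print(total_markers)                  # number of markers on the map
print(round(avg_marker_distance, 1))  # cM per marker
print(round(dart_density, 1))         # cM per DArT marker
print(round(clustering_ratio, 1))     # DArT clustering relative to AFLP
```

Under that assumption the marker counts sum to the reported 361, and the computed values round to the reported 5.7 cM, 1 DArT per 7.9 cM, and roughly a threefold clustering ratio.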