• Title/Summary/Keyword: Clustering Strategy

Water Pollution Source Tracing Using FDC and Correlation Analysis in Geumho River Basin (FDC 및 상관관계 분석을 이용한 금호강 유역에서의 오염원추적)

  • Park, Kyung Ok;Lee, Chang Hee;Cha, Il Geun
    • Journal of Wetlands Research / v.18 no.3 / pp.232-243 / 2016
  • To establish a watershed water quality management strategy under the Total Maximum Daily Load (TMDL) program, it is necessary to understand how water quality components affect one another and to identify how waste treatment plant (WTP) discharges and upstream/tributary loads affect the downstream target point of watershed water quality management. In this study, we determined the relationships among water quality contaminants and traced water pollution sources using Ministry of Environment monitoring data for the tributaries and main stream, together with WTP monitoring data. The study area is the Geumho River basin, which has both urban and rural characteristics and comprises the GeumhoA, GeumhoB, and GeumhoC TMDL watershed units. Using flow duration curve (FDC) analysis, the discharge data were clustered into five flow grades and correlation analysis was performed, which more clearly identified the points and contaminants degrading water quality at the downstream target point. FDC analysis can thus serve as a pollutant-tracing tool and should help establish more effective watershed water quality management strategies for TMDL target points.
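
The abstract does not state the boundaries of the five flow grades. As a minimal sketch, assuming the five exceedance-probability intervals commonly used for load duration curves (high flows 0-10%, moist 10-40%, mid-range 40-60%, dry 60-90%, low 90-100%) and the Weibull plotting position rank/(n+1), each daily flow can be graded and a Pearson coefficient computed within each grade. All function names and the grade boundaries are illustrative assumptions, not the authors' settings.

```python
"""Sketch: grade daily flows on a flow duration curve (FDC) and correlate two
concentration series within each grade. Grade boundaries are assumed."""
import math

# Assumed upper bounds (exceedance probability, %) of the five flow grades.
GRADES = [(10.0, "high"), (40.0, "moist"), (60.0, "mid-range"),
          (90.0, "dry"), (100.0, "low")]


def exceedance_probabilities(flows):
    """Weibull plotting position: largest flow gets rank 1, p = 100 * rank / (n + 1)."""
    n = len(flows)
    order = sorted(range(n), key=lambda i: flows[i], reverse=True)
    probs = [0.0] * n
    for rank, i in enumerate(order, start=1):
        probs[i] = 100.0 * rank / (n + 1)
    return probs


def flow_grade(p):
    """Map an exceedance probability (%) to one of the five assumed grades."""
    for upper, name in GRADES:
        if p <= upper:
            return name
    return "low"


def pearson(x, y):
    """Pearson correlation; returns NaN if either series is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)


def correlation_by_grade(flows, upstream, downstream, min_points=3):
    """Correlate upstream vs. downstream concentrations within each flow grade."""
    grades = [flow_grade(p) for p in exceedance_probabilities(flows)]
    result = {}
    for _, name in GRADES:
        idx = [i for i, g in enumerate(grades) if g == name]
        if len(idx) >= min_points:  # skip grades with too few days
            result[name] = pearson([upstream[i] for i in idx],
                                   [downstream[i] for i in idx])
    return result
```

Grading first and correlating second means a tributary that only drives downstream quality at low flow shows a strong coefficient in the "low" grade even if its all-flow correlation is weak.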

Strategy for Store Management Using SOM Based on RFM (RFM 기반 SOM을 이용한 매장관리 전략 도출)

  • Jeong, Yoon Jeong;Choi, Il Young;Kim, Jae Kyeong;Choi, Ju Choel
    • Journal of Intelligence and Information Systems / v.21 no.2 / pp.93-112 / 2015
  • Following changes in consumer consumption patterns, existing retail shops have evolved into hypermarkets or convenience stores that mostly offer groceries and daily products. It is therefore important to maintain appropriate inventory levels and product configurations in order to use the limited space of a retail store effectively and increase sales. Accordingly, this study proposes a product configuration and inventory level strategy based on the RFM (Recency, Frequency, Monetary) model and the SOM (self-organizing map) for managing retail shops effectively. The RFM model analyzes customer behavior based on past purchasing activity, and it can distinguish important customers in large data sets using three variables. R (recency) refers to the most recent purchase; customers who purchased more recently have a larger R. F (frequency) is the number of transactions in a given period, and M (monetary) is the amount spent in that period. The RFM method is therefore known to be a very effective model for customer segmentation. In this study, SOM cluster analysis was performed on normalized RFM values. The SOM is regarded as one of the most distinguished unsupervised artificial neural network models. It is a popular tool for clustering and visualizing high-dimensional data so that similar items are grouped spatially close to one another, and it has been successfully applied to pattern finding in various technical fields. Our procedure seeks sales patterns by analyzing product sales records in terms of recency, frequency, and monetary values. To suggest a business strategy, we then build a decision tree on the SOM results. To validate the proposed procedure, we used M-mart data collected from 2014.01.01 to 2014.12.31. 
Each product receives R, F, and M values, and the products are grouped into 9 clusters using SOM. We also performed three tests, using weekday data, weekend data, and the whole data set, to analyze changes in sales patterns. To propose a strategy for each cluster, we examined the criteria of product clustering; the SOM clusters can be explained by the characteristics identified by the decision trees. As a result, the proposed procedure yields an inventory management strategy for each of the 9 clusters. Products in the cluster with the highest values of all three variables (R, F, M) need high inventory levels and should be placed where they lengthen the customer's path through the store. In contrast, products in the cluster with the lowest values of all three need low inventory levels and can be placed where visibility is low. Products in the highest-R cluster are usually new releases and should be placed at the front of the store. Managers should gradually decrease inventory for the highest-F cluster, whose products were purchased frequently in the past, because we assume this cluster has lower R and M values than the average product; it can be deduced that these products have sold poorly recently and that their total sales will be lower than their frequency suggests. The procedure presented in this study is expected to contribute to raising the profitability of retail stores. The paper is organized as follows. Chapter 2 briefly reviews the related literature. Chapter 3 presents the proposed research procedure, and Chapter 4 applies it to actual product sales data. Finally, Chapter 5 presents the conclusions and directions for further research.
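
The abstract states that SOM was run on normalized RFM values and produced 9 clusters. A minimal pure-Python sketch, assuming min-max normalization, recency measured as days since the last sale (inverted so that more recent sales give a larger R, as the abstract describes), and a 3 x 3 map with a Gaussian neighborhood and linearly decaying learning rate. These settings and all function names are illustrative assumptions, not the authors' configuration.

```python
import math
import random


def min_max(values, invert=False):
    """Scale to [0, 1]; invert=True maps the smallest raw value to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    scaled = [(v - lo) / (hi - lo) for v in values]
    return [1.0 - s for s in scaled] if invert else scaled


def rfm_vectors(days_since_last_sale, frequency, monetary):
    """One normalized (R, F, M) vector per product."""
    r = min_max(days_since_last_sale, invert=True)  # more recent -> larger R
    f = min_max(frequency)
    m = min_max(monetary)
    return [list(t) for t in zip(r, f, m)]


def best_matching_unit(weights, x):
    """Grid node whose weight vector is closest (squared Euclidean) to x."""
    return min(weights, key=lambda node: sum(
        (w - v) ** 2 for w, v in zip(weights[node], x)))


def train_som(data, rows=3, cols=3, epochs=200, lr0=0.5, sigma0=1.5, seed=0):
    """Online SOM training on a rows x cols grid (3 x 3 = 9 clusters)."""
    rng = random.Random(seed)
    dim = len(data[0])
    weights = {(i, j): [rng.random() for _ in range(dim)]
               for i in range(rows) for j in range(cols)}
    for t in range(epochs):
        frac = t / epochs
        lr = lr0 * (1.0 - frac)               # learning rate decays linearly
        sigma = sigma0 * (1.0 - frac) + 0.01  # neighborhood radius shrinks
        for x in rng.sample(data, len(data)):  # visit samples in random order
            bmu = best_matching_unit(weights, x)
            for node, w in weights.items():
                d2 = (node[0] - bmu[0]) ** 2 + (node[1] - bmu[1]) ** 2
                h = math.exp(-d2 / (2.0 * sigma ** 2))  # Gaussian neighborhood
                for k in range(dim):
                    w[k] += lr * h * (x[k] - w[k])
    return weights


def assign_clusters(weights, data):
    """Cluster label of each product = its best-matching grid node."""
    return [best_matching_unit(weights, x) for x in data]
```

The resulting node labels could then serve as the target variable for a decision tree, which is how the abstract describes explaining each cluster.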

Hierarchical Clustering Approach of Multisensor Data Fusion: Application of SAR and SPOT-7 Data on Korean Peninsula

  • Lee, Sang-Hoon;Hong, Hyun-Gi
    • Proceedings of the KSRS Conference / 2002.10a / pp.65-65 / 2002
  • In remote sensing, images of the same area are acquired by sensors with different spectral ranges (from the visible to the microwave) and/or with different numbers, positions, and widths of spectral bands. These images are generally partially redundant, since they represent the same scene, and partially complementary. For many image classification applications, the information provided by a single sensor is often incomplete or imprecise, resulting in misclassification. Fusion with redundant data can support more consistent inferences for interpreting the scene and can thereby improve classification accuracy. The common approach to classifying multisensor data, as a pixel-level data fusion scheme, is to concatenate the data into one vector as if they were measurements from a single sensor. However, multiband data acquired by a single multispectral sensor or by two or more different sensors are not completely independent, and a certain degree of informative overlap may exist between the observation spaces of the different bands. This dependence may make the data less informative and should be properly modeled in the analysis so that its effect can be eliminated. To model and eliminate the effect of such dependence, this study employs a strategy using self and conditional information variation measures. The self information variation reflects the self-certainty of the individual bands, while the conditional information variation reflects the degree of dependence between different bands. One data set may be much less reliable than the others and may even degrade the classification results; such an unreliable data set should be excluded from the analysis. To account for this, the self information variation is used to measure the degree of reliability. A group of positively dependent bands can jointly gather more information than a group of independent ones. 
However, when bands are negatively dependent, their combined analysis may yield worse information. Using the conditional information variation measure, the multiband data are split into two or more subsets according to the dependence between the bands. Each subset is classified separately, and a decision-level data fusion scheme is applied to integrate the individual classification results. In this study, a two-level algorithm using a hierarchical clustering procedure is used for unsupervised image classification. The hierarchical clustering algorithm is based on similarity measures between all pairs of candidates being considered for merging. In the first level, the image is partitioned into any number of regions, which are sets of spatially contiguous pixels, such that no union of adjacent regions is statistically uniform. In the second level, the regions resulting from the first level are clustered into a parsimonious number of groups according to their statistical characteristics. The algorithm has been applied to satellite multispectral data and airborne SAR data.
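
The second level merges regions by comparing all candidate pairs. A minimal sketch of that agglomerative step, assuming Ward's criterion (the increase in within-group sum of squares) as the merge cost in place of the paper's own statistical similarity measure, and assuming each first-level region is summarized by its pixel count and mean spectral vector. The data layout and function names are hypothetical.

```python
from itertools import combinations


def ward_cost(a, b):
    """Increase in within-group sum of squares if clusters a and b are merged."""
    d2 = sum((x - y) ** 2 for x, y in zip(a["mean"], b["mean"]))
    return a["n"] * b["n"] / (a["n"] + b["n"]) * d2


def agglomerate(regions, k):
    """Merge regions pairwise (cheapest pair first) until k groups remain.

    regions: list of (pixel_count, mean_vector) from the first level.
    """
    clusters = [{"n": n, "mean": list(m), "members": [i]}
                for i, (n, m) in enumerate(regions)]
    while len(clusters) > k:
        # Evaluate every candidate pair, as hierarchical clustering requires.
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda p: ward_cost(clusters[p[0]], clusters[p[1]]))
        a, b = clusters[i], clusters[j]
        n = a["n"] + b["n"]
        mean = [(a["n"] * x + b["n"] * y) / n for x, y in zip(a["mean"], b["mean"])]
        merged = {"n": n, "mean": mean, "members": a["members"] + b["members"]}
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)]
        clusters.append(merged)
    return clusters
```

Weighting by pixel count keeps a few large, homogeneous regions from being absorbed by many small ones; the exhaustive pair search is O(n^2) per merge, which is acceptable only because the first level has already reduced pixels to regions.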

Cluster-based Delay-adaptive Sensor Scheduling for Energy-saving in Wireless Sensor Networks (센서네트워크에서 클러스터기반의 에너지 효율형 센서 스케쥴링 연구)

  • Choi, Wook;Lee, Yong;Chung, Yoo-Jin
    • Journal of the Korea Society for Simulation / v.18 no.3 / pp.47-59 / 2009
  • Because wireless sensor networks are application-specific, sensitivity to requirements such as data reporting latency varies with the type of application. This calls for application-specific algorithm and protocol design paradigms that maximize energy conservation and, in turn, network lifetime. In this paper, we propose a novel delay-adaptive sensor scheduling scheme for energy-saving data gathering based on two-phase clustering (TPC). The ultimate goal is to extend network lifetime by giving sensors high adaptability to application-dependent and time-varying delay requirements. TPC requires sensors to construct two types of links: direct links and relay links. Direct links are used for control and for forwarding time-critical sensed data. Relay links, on the other hand, are used only for data forwarding according to user delay constraints, which allows sensors to opportunistically use the most energy-saving links and form multi-hop paths. Simulation results demonstrate that the cluster-based delay-adaptive data gathering strategy (CD-DGS) saves a significant amount of energy in dense sensor networks by adapting to user delay constraints.
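
The per-packet choice between a direct link and a relay path can be illustrated as follows. This is a minimal sketch, not CD-DGS itself: it assumes the widely used first-order radio energy model (the constants below are assumed values), a fixed per-hop delay, and a rule that time-critical data always use the direct link while other data take the relay path only if it meets the delay budget and costs less energy. All names are hypothetical.

```python
from dataclasses import dataclass

E_ELEC = 50e-9     # J/bit for radio electronics (assumed constant)
EPS_AMP = 100e-12  # J/bit/m^2 for the transmit amplifier (assumed constant)


def tx_energy(bits, distance):
    """First-order radio model: transmit cost grows with distance squared."""
    return bits * (E_ELEC + EPS_AMP * distance ** 2)


def rx_energy(bits):
    return bits * E_ELEC


@dataclass
class Path:
    hop_distances: list   # metres per hop
    per_hop_delay: float  # seconds per hop (assumed constant)

    def energy(self, bits):
        # Every hop transmits once; every intermediate relay also receives.
        tx = sum(tx_energy(bits, d) for d in self.hop_distances)
        rx = rx_energy(bits) * (len(self.hop_distances) - 1)
        return tx + rx

    def delay(self):
        return self.per_hop_delay * len(self.hop_distances)


def choose_path(direct, relay, bits, deadline, time_critical):
    """Direct link for time-critical data or when relaying misses the deadline;
    otherwise whichever path spends less energy."""
    if time_critical or relay.delay() > deadline:
        return "direct"
    return "relay" if relay.energy(bits) < direct.energy(bits) else "direct"
```

For a 100 m direct hop versus two 50 m relay hops, relaying costs about 0.65 mJ instead of 1.05 mJ per 1000 bits but doubles the hop delay, which is exactly the trade-off a looser user delay constraint lets the scheme exploit.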

Developing the Strategies of Redesigning the Role of Retail Stores Using Cluster Analysis: The Case of Mongolian Retail Company (클러스터링을 통한 유통매장의 역할 재설계 전략 수립: 몽골유통사를 대상으로)

  • Tsatsral Telmentugs;KwangSup Shin
    • The Journal of Bigdata / v.8 no.1 / pp.131-156 / 2023
  • The traditional retail industry has changed significantly over the past decade due to mobile and online technologies. This change has been accompanied by a shift in consumers' purchasing behavior. Despite the rise of online shopping, there are still specific product categories, such as "Processed food" in Mongolia, for which traditional shopping remains the preferred purchase method. To prepare for the inevitable future of retail, firms need to closely analyze the performance of their offline stores to plan their next actions in a new multi-channel environment. Retailers must integrate diverse channels into their operations to stay relevant and adjust to the shifting market. In this research, we analyzed performance data of offline stores, such as sales, profit, and amount of sales, using a clustering approach. By comparing the circumstances and performance of retail stores across the clusters, we found several distinct insights. For certain retail stores, we propose three strategies based on store characteristics: a fulfillment hub store between online and offline channels, an experience store that lengthens customers' time on the premises, and a merger of two unrelated channels that complement each other to increase traffic. The proposed strategies may enhance both user experience and profit.
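
The abstract does not name the clustering algorithm. As a hedged sketch, assuming k-means on z-score standardized store metrics (so that sales, profit, and quantity, which sit on very different scales, contribute comparably), the grouping step could look like this. The choice of k-means, k, and the function names are illustrative assumptions.

```python
import math
import random


def standardize(rows):
    """Z-score each column; a constant column gets standard deviation 1."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    sds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
           for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(r, means, sds)] for r in rows]


def kmeans(points, k, iters=100, seed=0):
    """Lloyd's algorithm: assign to nearest center, recompute centers, repeat."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]  # k distinct stores
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: sum(
            (a - b) ** 2 for a, b in zip(p, centers[c]))) for p in points]
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            # Keep the old center if a cluster becomes empty.
            new_centers.append([sum(x) / len(members) for x in zip(*members)]
                               if members else centers[c])
        if new_centers == centers:  # converged
            break
        centers = new_centers
    return labels, centers
```

Each row would be one store, for example [sales, profit, quantity]; the cluster centers, read back in original units, are what a strategy such as "fulfillment hub" or "experience store" would be matched against.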

Interest-based Customer Segmentation Methodology Using Topic Modeling (토픽 분석을 활용한 관심 기반 고객 세분화 방법론)

  • Hyun, Yoonjin;Kim, Namgyu;Cho, Yoonho
    • Journal of Information Technology Applications and Management / v.22 no.1 / pp.77-93 / 2015
  • As the range of customer choices becomes more diverse, the average life span of companies' products and services is becoming shorter. Most companies strive to maximize revenue by understanding customers' needs and providing customized products and services. However, determining each individual customer's needs imposes a significant burden in time and cost. Therefore, an alternative method is employed: customers are grouped into categories based on certain criteria, and a marketing strategy is tailored to each group. Customer segmentation and clustering are typically performed using demographic and behavioral information. Demographic information includes sex, age, income level, etc., while behavioral information is usually inferred indirectly from customers' purchase and search histories. However, companies' behavioral information about customers is limited, because it is usually obtained from the limited data a customer provides on the company's own website, and the pattern a customer shows on one particular site may not represent that customer's general tendencies. Therefore, instead of relying on patterns from a particular site, this study identifies a customer's interests from that customer's access records for external news and, on this basis, proposes a customer segmentation methodology. In addition, main issues were extracted through topic analysis of approximately 3,000 Internet news articles, a customer segmentation experiment was performed, and the applicability of the proposed methodology was analyzed.
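
The abstract does not detail how topic-analysis output becomes segments. A minimal sketch, assuming a topic model such as LDA has already produced a topic distribution for each news article: each customer's interest profile is the mean distribution of the articles they accessed, and the customer joins the segment of their dominant topic. The data layout and function names are hypothetical.

```python
def customer_interest(article_topics, access_log):
    """Average the topic distributions of the articles each customer read.

    article_topics: article_id -> list of topic probabilities (assumed to come
                    from a topic model such as LDA)
    access_log:     customer_id -> list of article_ids the customer accessed
    """
    profiles = {}
    for customer, articles in access_log.items():
        vecs = [article_topics[a] for a in articles if a in article_topics]
        if not vecs:  # no usable access records for this customer
            continue
        profiles[customer] = [sum(col) / len(vecs) for col in zip(*vecs)]
    return profiles


def segment_by_dominant_topic(profiles):
    """Group customers by the topic with the highest share of their interest."""
    segments = {}
    for customer, profile in profiles.items():
        topic = max(range(len(profile)), key=profile.__getitem__)
        segments.setdefault(topic, []).append(customer)
    return segments
```

Repeated reads of the same article count once per access here, so heavier readers of a topic shift their profile further toward it; a clustering algorithm over the full profiles could replace the dominant-topic rule if mixed interests matter.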

Patterning Waterbird Assemblages on Rice Fields Using Self-Organizing Map and Random Forest (자기조직화지도(Self-organizing map)와 랜덤 포레스트 분석(Random forest)을 이용한 논습지에 도래하는 수조류 군집 특성 파악)

  • Nam, Hyung-Kyu;Choi, Seung-Hye;Yoo, Jeong-Chil
    • Korean Journal of Environmental Agriculture / v.34 no.3 / pp.168-177 / 2015
  • BACKGROUND: In recent years, there has been great concern regarding agricultural land use and its importance for biodiversity conservation. Rice fields are uniquely managed wetlands for wildlife, especially waterbirds. Comprehensive monitoring of the waterbird assemblage was carried out in a rice ecosystem in South Korea to understand changes in its patterns. This rice ecosystem has been recognized as one of the most important for waterbird conservation. METHODS AND RESULTS: Biweekly monitoring was conducted over 4 years, from April 2009 to March 2010 and from April 2011 to March 2014, and 32 species of waterbirds were observed. Self-organizing map (SOM) and random forest analyses were applied to the waterbird dataset to identify the characteristics of waterbird distribution. SOM and random forest analysis clearly classified the data into four clusters and extracted ecological information from the waterbird dataset. Waterbird assemblages showed strong seasonality and habitat use that differed by waterbird group, such as shorebirds, herons, and waterfowl. CONCLUSION: Our results showed that combining SOM and random forest analysis could be useful for ecosystem assessment and management. Furthermore, we strongly suggest a strict management strategy for rice fields to conserve waterbirds, and this strategy could be season- and species-specific.

Design of Distributed Hadoop Full Stack Platform for Big Data Collection and Processing (빅데이터 수집 처리를 위한 분산 하둡 풀스택 플랫폼의 설계)

  • Lee, Myeong-Ho
    • Journal of the Korea Convergence Society / v.12 no.7 / pp.45-51 / 2021
  • With the rapid shift to non-face-to-face environments and mobile-first strategies, the explosive yearly growth of structured and unstructured data demands new big-data-driven decision making and services in all fields. However, there have been few reference cases that use the Hadoop ecosystem to collect rapidly growing big data, load it into a standard platform applicable in a practical environment, and then store and process the refined data in a relational database. Therefore, this study designed and implemented a platform that collects unstructured data retrieved by keyword search from social network services, using Hadoop 2.0 on three virtual machine servers in a Spring Framework environment; loads the collected unstructured data into the Hadoop Distributed File System (HDFS) and HBase; and then stores the standardized big data in a relational database using a morpheme analyzer. Future work should pursue clustering, classification, and machine-learning analysis using Hive or Mahout for deeper data analysis.
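
The last stage of the design turns analyzed text into standardized relational records. A minimal sketch of that stage only, assuming Python's built-in `sqlite3` as a stand-in for the relational database and a simple whitespace tokenizer as a stand-in for the Korean morpheme analyzer; HDFS, HBase, and the Spring collector are not modeled. The table and function names are hypothetical.

```python
import sqlite3
from collections import Counter

PUNCT = ".,!?#"


def tokenize(text):
    """Stand-in for a morpheme analyzer: lowercase whitespace split, punctuation trimmed."""
    return [t.strip(PUNCT).lower() for t in text.split() if t.strip(PUNCT)]


def load_posts(conn, posts):
    """Store per-post term counts as rows (post_id, term, cnt)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS term_counts ("
        "post_id INTEGER, term TEXT, cnt INTEGER, PRIMARY KEY (post_id, term))")
    for post_id, text in posts:
        for term, cnt in Counter(tokenize(text)).items():
            conn.execute("INSERT OR REPLACE INTO term_counts VALUES (?, ?, ?)",
                         (post_id, term, cnt))
    conn.commit()


def top_terms(conn, limit=3):
    """Most frequent terms across all posts; ties broken alphabetically."""
    return conn.execute(
        "SELECT term, SUM(cnt) AS total FROM term_counts "
        "GROUP BY term ORDER BY total DESC, term LIMIT ?", (limit,)).fetchall()
```

Once text is reduced to (post, term, count) rows, ordinary SQL aggregation can answer keyword questions without touching the raw unstructured store.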

Technology Development Strategy of Piggyback Transportation System Using Topic Modeling Based on LDA Algorithm

  • Jun, Sung-Chan;Han, Seong-Ho;Kim, Sang-Baek
    • Journal of the Korea Society of Computer and Information / v.25 no.12 / pp.261-270 / 2020
  • In this study, we identify promising technologies for the Piggyback transportation system by analyzing relevant patent information. To this end, we first built a patent database by extracting relevant technology keywords from pioneering research papers on the Piggyback flatcar system. We then applied text mining to identify frequently referenced words in the patent database and, using these words, applied the LDA (Latent Dirichlet Allocation) algorithm to identify "topics" corresponding to "key" technologies for the Piggyback system. Finally, we employed the ARIMA model to forecast the trends of these "key" technologies and identify the promising technologies for the Piggyback system. The patent analysis was based on a keyword search method. The results identify data-driven integrated management systems, operation planning systems, and special cargo (especially fluid and gas) handling/storage technologies as the "key" promising technologies for the future of the Piggyback system, and show that data reception/analysis techniques must be developed to improve system performance. The proposed procedure and analysis method provide useful insights for developing the R&D strategy and technology roadmap for the Piggyback system.
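
The abstract does not report the ARIMA order. As an illustration only, assuming an ARIMA(1,1,0) model without drift fitted by conditional least squares to a yearly count series for one topic, the forecasting step reduces to differencing the series, estimating the AR(1) coefficient of the differences as phi = sum(d_t * d_(t-1)) / sum(d_(t-1)^2), and integrating the forecast differences back into levels. Function names are hypothetical.

```python
def fit_arima_110(series):
    """Conditional least-squares AR(1) coefficient of the first differences."""
    if len(series) < 3:
        raise ValueError("need at least 3 observations")
    d = [b - a for a, b in zip(series, series[1:])]  # first differences
    num = sum(x1 * x0 for x0, x1 in zip(d, d[1:]))    # sum d_t * d_(t-1)
    den = sum(x0 * x0 for x0 in d[:-1])               # sum d_(t-1)^2
    return num / den if den else 0.0


def forecast_arima_110(series, steps):
    """Point forecasts: each future difference is phi times the previous one."""
    phi = fit_arima_110(series)
    level = series[-1]
    diff = series[-1] - series[-2]
    out = []
    for _ in range(steps):
        diff = phi * diff  # forecast next difference
        level += diff      # integrate back to the level of the series
        out.append(level)
    return out
```

For the series [1, 2, 4, 8, 16] the differences are [1, 2, 4, 8], phi is 42/21 = 2, and the next two forecasts are 32 and 64. A production analysis would instead use a library implementation with order selection and prediction intervals.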

Associations Between Conventional Healthy Behaviors and Social Distancing During the COVID-19 Pandemic: Evidence From the 2020 Community Health Survey in Korea

  • Rang Hee, Kwon;Minsoo, Jung
    • Journal of Preventive Medicine and Public Health / v.55 no.6 / pp.568-577 / 2022
  • Objectives: Many studies have shown that social distancing, a non-pharmaceutical intervention (NPI) and one of various measures against coronavirus disease 2019 (COVID-19), is an effective preventive measure for suppressing the spread of infectious diseases. This study explored the relationships between conventional health-related behaviors in Korea and social distancing practices during the COVID-19 pandemic. Methods: Data were obtained from the 2020 Community Health Survey conducted by the Korea Disease Control and Prevention Agency (n=98 149). The dependent variable was the degree of social distancing practiced to cope with the COVID-19 epidemic. Independent variables included health-risk behaviors and health-promoting behaviors. The moderators were vaccination and unmet medical needs. Predictors of social distancing practice were identified through hierarchical multiple logistic regression analysis. Results: Smokers (adjusted odds ratio [aOR], 0.924) and frequent drinkers (aOR, 0.933) were less likely to practice social distancing. A greater degree of physical activity was associated with a higher likelihood of practicing social distancing (aOR, 1.029). People vaccinated against influenza were more likely to practice social distancing than those who were not (aOR, 1.150). However, people with unmet medical needs were less likely to practice social distancing than those without unmet medical needs (aOR, 0.757). Conclusions: Social distancing practices were related to conventional health behaviors such as smoking, drinking, and physical activity, and these patterns showed a clustering effect of health inequality. Therefore, strategies to strengthen social distancing should be accompanied by strategies to protect vulnerable groups.
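
The reported aORs are multiplicative changes in the odds of practicing social distancing. A short sketch of the arithmetic, assuming the usual logistic-regression link in which a coefficient beta corresponds to OR = exp(beta), with an optional Wald confidence interval exp(beta ± z·SE). The 2x2 helper gives only a crude (unadjusted) ratio, since adjustment requires the full multivariable model. Function names are illustrative.

```python
import math


def crude_odds_ratio(a, b, c, d):
    """Unadjusted OR from a 2x2 table:
    a = exposed & outcome,   b = exposed & no outcome,
    c = unexposed & outcome, d = unexposed & no outcome."""
    return (a * d) / (b * c)


def odds_ratio_from_coef(beta, se=None, z=1.96):
    """Logistic-regression coefficient -> OR = exp(beta), with optional Wald CI."""
    ratio = math.exp(beta)
    if se is None:
        return ratio, None
    return ratio, (math.exp(beta - z * se), math.exp(beta + z * se))


def percent_change_in_odds(odds_ratio):
    """An OR of 0.924 means the odds are 7.6% lower than in the reference group."""
    return (odds_ratio - 1.0) * 100.0


for label, aor in [("smokers", 0.924), ("frequent drinkers", 0.933),
                   ("influenza vaccinated", 1.150), ("unmet medical needs", 0.757)]:
    print(f"{label}: {percent_change_in_odds(aor):+.1f}% odds of practicing social distancing")
```

Running it prints changes of -7.6%, -6.7%, +15.0%, and -24.3% in the odds. These are changes in odds, not in probabilities, which is why an aOR close to 1 can still be statistically meaningful in a sample of this size.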