• Title/Summary/Keyword: Root clustering


A Method for Determining the Peak Level of Risk in Root Industry Work Environment using Machine Learning (기계학습을 이용한 뿌리산업 작업 환경 위험도 피크레벨 결정방법)

  • Sang-Min Lee;Jun-Yeong Kim;Suk-Chan Kang;Kyung-Jun Kim
    • The Journal of the Korea institute of electronic communication sciences / v.19 no.1 / pp.127-136 / 2024
  • Because the hazardous working environments and high labor intensity of the root industry can affect workers' health, current regulations focus on measuring and controlling environmental factors on a semi-annual basis. However, quantitative criteria that address workers' health conditions, as opposed to the physical work environment, are lacking. This gap makes it difficult to prevent occupational diseases resulting from continuous exposure to harmful substances below regulatory thresholds. Therefore, this paper proposes a machine learning-based method for determining the peak level of risk in root industry work environments, which enables real-time safety assessment in workplaces.

Spatial Region Estimation for Autonomous CoT Clustering Using Hidden Markov Model

  • Jung, Joon-young;Min, Okgee
    • ETRI Journal / v.40 no.1 / pp.122-132 / 2018
  • This paper proposes a hierarchical dual filtering (HDF) algorithm to estimate the spatial region between a Cloud of Things (CoT) gateway and an Internet of Things (IoT) device. The accuracy of the spatial region estimation is important for autonomous CoT clustering. We conduct spatial region estimation using a hidden Markov model (HMM) with a raw Bluetooth received signal strength indicator (RSSI). However, the accuracy of the region estimation using the validation data is only 53.8%. To increase the accuracy of the spatial region estimation, the HDF algorithm removes the high-frequency signals hierarchically and alters the parameters according to whether the IoT device moves. The accuracy of spatial region estimation using raw RSSI, a Kalman filter, and HDF is compared to evaluate the effectiveness of the HDF algorithm. With raw RSSI, a Kalman filter, and HDF, the success rates over all regions are 0.538, 0.622, and 0.75, and the root mean square errors (RMSE) are 0.997, 0.812, and 0.5, respectively. The HDF algorithm attains the best results in terms of the success rate and RMSE of spatial region estimation using HMM.
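As an illustration of the Kalman filter baseline mentioned in this abstract (not the HDF algorithm itself, whose details the abstract does not give), the following is a minimal sketch of a one-dimensional Kalman filter applied to an RSSI series. The function name `kalman_smooth`, the constant-signal model, and the noise variances are assumptions chosen for illustration.

```python
def kalman_smooth(rssi, process_var=0.01, meas_var=4.0):
    """Smooth an RSSI series with a scalar Kalman filter.

    Assumes the true signal is roughly constant between samples.
    process_var and meas_var are hypothetical tuning values.
    """
    if not rssi:
        return []
    estimate = float(rssi[0])  # start from the first measurement
    error = 1.0                # initial estimate variance (assumed)
    smoothed = [estimate]
    for z in rssi[1:]:
        error += process_var                # predict: uncertainty grows
        gain = error / (error + meas_var)   # Kalman gain
        estimate += gain * (z - estimate)   # update with the new reading
        error *= (1.0 - gain)               # uncertainty shrinks
        smoothed.append(estimate)
    return smoothed
```

With these assumed variances, a single spike such as the third value in `[-60, -60, -40, -60]` is pulled most of the way back toward -60, which is the kind of high-frequency suppression that filtering is meant to provide before the HMM step.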

The Effect of Input Variables Clustering on the Characteristics of Ensemble Machine Learning Model for Water Quality Prediction (입력자료 군집화에 따른 앙상블 머신러닝 모형의 수질예측 특성 연구)

  • Park, Jungsu
    • Journal of Korean Society on Water Environment / v.37 no.5 / pp.335-343 / 2021
  • Water quality prediction is essential for the proper management of water supply systems. Increased suspended sediment concentration (SSC) has various effects on water supply systems, such as increased treatment cost, and consequently there have been various efforts to develop a model for predicting SSC. However, SSC is affected by both the natural and anthropogenic environment, making it challenging to predict. Recently, advanced machine learning models have increasingly been used for water quality prediction. This study developed an ensemble machine learning model to predict SSC using the XGBoost (XGB) algorithm. The observed discharge (Q) and SSC at two field monitoring stations were used to develop the model. The input variables were clustered into two groups with low and high ranges of Q using the k-means clustering algorithm. Each group of data was then used separately to optimize XGB (Model 1). The model performance was compared with that of the XGB model using the entire data (Model 2). The models were evaluated by the mean squared error-observation standard deviation ratio (RSR) and the root mean squared error. The RSR values were 0.51 and 0.57 at the two monitoring stations for Model 2, while the model performance improved to RSR values of 0.46 and 0.55, respectively, for Model 1.
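For readers unfamiliar with the RSR metric in this abstract, the sketch below computes it under the commonly used definition: the root mean squared error divided by the standard deviation of the observations. That definition, the function name `rsr`, and any sample values are assumptions; the abstract does not state the formula.

```python
import math

def rsr(observed, predicted):
    """Ratio of RMSE to the standard deviation of observations.

    Lower is better; a model that always predicts the observed
    mean scores exactly 1.0 under this definition.
    """
    if not observed or len(observed) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    mean_obs = sum(observed) / len(observed)
    sq_err = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    sq_dev = sum((o - mean_obs) ** 2 for o in observed)
    if sq_dev == 0:
        raise ValueError("observations have zero variance")
    # The 1/n factors in RMSE and the standard deviation cancel.
    return math.sqrt(sq_err) / math.sqrt(sq_dev)
```

Read this way, the reported drop from 0.51 to 0.46 means the clustered model's error is a smaller fraction of the natural spread in observed SSC.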

Detection and Control of Variation Source for a Production Unit

  • Xu, Jichao;Akpolat, Hasan
    • International Journal of Quality Innovation / v.4 no.1 / pp.148-159 / 2003
  • Variation is the archenemy of quality. To reduce or control the variation in a complex production unit, we first need to identify the location of the root cause of the variation. This paper discusses the detection of variability and the techniques used to reduce variation in a production unit consisting of many processes. In the first part of this paper, the background of variability detection in production systems is introduced, followed by a weighted network corresponding to the correlation matrix of all processes. Based on the network and the maximum spanning tree clustering criterion, a classification of all processes is derived. Furthermore, the variation of each process in a class is determined by residual analysis. In the last part, the use of robust design methods for the processes with larger variability is discussed.
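The abstract does not spell out its clustering criterion, so the following is only a sketch of one common way to cluster with a maximum spanning tree: treat the absolute correlation between two processes as an edge weight, build the tree with Kruskal's algorithm, and stop once k groups remain, which is equivalent to cutting the k-1 weakest tree edges. The function name `mst_clusters`, the use of absolute correlation, and the choice of k are assumptions.

```python
def mst_clusters(corr, k):
    """Split processes into k groups using a maximum spanning tree.

    corr is a symmetric correlation matrix (list of lists).
    """
    n = len(corr)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of processes")
    parent = list(range(n))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Strongest correlations first (Kruskal on descending weights).
    edges = sorted(
        ((abs(corr[i][j]), i, j) for i in range(n) for j in range(i + 1, n)),
        reverse=True,
    )
    components = n
    for _, i, j in edges:
        if components == k:
            break
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            components -= 1

    groups = {}
    for node in range(n):
        groups.setdefault(find(node), []).append(node)
    return sorted(groups.values())
```

For a four-process matrix in which processes 0 and 1 correlate at 0.9 and processes 2 and 3 at 0.8, with all cross pairs at 0.2 or below, `mst_clusters(corr, 2)` groups them as `[[0, 1], [2, 3]]`.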

Tree-Based Clustering Protocol for Energy Efficient Wireless Sensor Networks (에너지 효율적 무선 센서 네트워크를 위한 트리 기반 클러스터링 프로토콜)

  • Kim, Kyung-Tae;Youn, Hee-Yong
    • The KIPS Transactions:PartC / v.17C no.1 / pp.69-80 / 2010
  • Wireless sensor networks (WSN) consisting of a large number of sensors aim to gather data in a variety of environments and are being applied in many different fields. The sensor nodes composing a sensor network operate on batteries of limited power, and as a result, high energy efficiency and long network lifetime are major goals of WSN research. In this paper we propose a novel tree-based clustering approach for energy efficient wireless sensor networks. The proposed scheme forms the cluster, and the nodes in a cluster construct a tree rooted at the cluster-head. The height of the tree is the distance of the member nodes to the cluster-head. Computer simulation shows that the proposed scheme enhances energy efficiency and balances the energy consumption among the nodes, and thus significantly extends the network lifetime compared to existing schemes such as LEACH, PEGASIS, and TREEPSI.

Performance Improvement of Word Clustering Using Ontology (온톨로지를 이용한 단어 군집화 성능 개선)

  • Park Eun-Jin;Kim Jae-Hoon;Ock Cheol-Young
    • The KIPS Transactions:PartB / v.13B no.3 s.106 / pp.337-344 / 2006
  • In this paper, we describe the design and implementation of a word clustering system that uses the definition of an entry word in a dictionary, called a dictionary definition. Word clustering generally needs various features, such as words, and the performance of a word clustering system depends on which kinds of features are used. A dictionary definition describes the meaning of an entry in detail, but the words in the definition are implicative or abstract, and the definition itself is short. Word clustering that uses only features extracted from the dictionary definition results in many small clusters. To form larger clusters and improve performance, we need to transform the features into more general words while keeping the original meaning of the dictionary definition as intact as possible. In this paper, we propose two methods for extending the dictionary definition using an ontology. One extends the dictionary definition to parent words in the ontology, and the other extends it to words at a fixed depth from the root of the ontology. Through our experiments, we have observed that the proposed systems outperform the system without feature extension, and that the latter extension method outperforms the former. We have also observed that verbs are very useful for extending features in word clustering.

Hierarchical Overlapping Clustering to Detect Complex Concepts (중복을 허용한 계층적 클러스터링에 의한 복합 개념 탐지 방법)

  • Hong, Su-Jeong;Choi, Joong-Min
    • Journal of Intelligence and Information Systems / v.17 no.1 / pp.111-125 / 2011
  • Clustering is a process of grouping similar or relevant documents into a cluster and assigning a meaningful concept to the cluster. By this process, clustering facilitates fast and correct search for the relevant documents by narrowing down the range of searching only to the collection of documents belonging to related clusters. For effective clustering, techniques are required for identifying similar documents and grouping them into a cluster, and discovering a concept that is most relevant to the cluster. One of the problems often appearing in this context is the detection of a complex concept that overlaps with several simple concepts at the same hierarchical level. Previous clustering methods were unable to identify and represent a complex concept that belongs to several different clusters at the same level in the concept hierarchy, and also could not validate the semantic hierarchical relationship between a complex concept and each of simple concepts. In order to solve these problems, this paper proposes a new clustering method that identifies and represents complex concepts efficiently. We developed the Hierarchical Overlapping Clustering (HOC) algorithm that modified the traditional Agglomerative Hierarchical Clustering algorithm to allow overlapped clusters at the same level in the concept hierarchy. The HOC algorithm represents the clustering result not by a tree but by a lattice to detect complex concepts. We developed a system that employs the HOC algorithm to carry out the goal of complex concept detection. This system operates in three phases; 1) the preprocessing of documents, 2) the clustering using the HOC algorithm, and 3) the validation of semantic hierarchical relationships among the concepts in the lattice obtained as a result of clustering. The preprocessing phase represents the documents as x-y coordinate values in a 2-dimensional space by considering the weights of terms appearing in the documents. 
First, it goes through a refinement process that applies stopword removal and stemming to extract index terms. Then, each index term is assigned a TF-IDF weight value, and the x-y coordinate value for each document is determined by combining the TF-IDF values of the terms in it. The clustering phase uses the HOC algorithm, in which the similarity between documents is calculated using the Euclidean distance. Initially, a cluster is generated for each document by grouping those documents that are closest to it. Then, the distance between any two clusters is measured, and the closest clusters are grouped into a new cluster. This process is repeated until the root cluster is generated. In the validation phase, a feature selection method is applied to validate the appropriateness of the cluster concepts built by the HOC algorithm and to see whether they have meaningful hierarchical relationships. Feature selection is a method of extracting key features from a document by identifying and assigning weight values to important and representative terms in the document. To select key features correctly, a method is needed to determine how each term contributes to the class of the document. Among the several methods that achieve this goal, this paper adopted the $\chi^2$ statistic, which measures the degree of dependency of a term t on a class c and represents the relationship between t and c by a numerical value. To demonstrate the effectiveness of the HOC algorithm, a series of performance evaluations was carried out using the well-known Reuters-21578 news collection. The results showed that the HOC algorithm greatly contributes to detecting and producing complex concepts by generating the concept hierarchy in a lattice structure.
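The chi-square statistic used for feature selection in this abstract can be illustrated with the standard 2x2 contingency form for a term t and a class c. The sketch below assumes that standard form; the function name `chi_square` and the argument layout are illustrative, and the abstract does not confirm that the paper uses exactly this variant.

```python
def chi_square(a, b, c, d):
    """Chi-square dependency between a term t and a class c.

    a: documents in c that contain t
    b: documents outside c that contain t
    c: documents in c that do not contain t
    d: documents outside c that do not contain t
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        return 0.0  # a row or column is empty; treat as no evidence
    return n * (a * d - c * b) ** 2 / denom
```

A value of 0 indicates that the term and the class are independent, and larger values indicate stronger dependency, which is why high-scoring terms are kept as key features.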

Study on the grading standard of Panax notoginseng seedlings

  • Chen, Lijuan;Yang, Ye;Ge, Jin;Cui, Xiuming;Xiong, Yin
    • Journal of Ginseng Research / v.42 no.2 / pp.208-217 / 2018
  • Background: The quality differences in seedlings of medicinal herbs often affect the quality of medicinal parts. The establishment of the grading standard of Panax notoginseng seedlings is significant for the stable quality of medicinal parts of P. notoginseng. Methods: To establish the grading standard of P. notoginseng seedlings, a total of 36,000 P. notoginseng seedlings were collected from 30 producing areas, of which the fresh weight, root length, root diameter, bud length, bud diameter, and rootlet number were measured. The K-means clustering method was applied to grade seedlings and establish the grading standard. Results: The fresh weight and rootlet number of P. notoginseng seedlings were determined as the final indices of grading. P. notoginseng seedlings from different regions of Yunnan could be preliminarily classified into four grades: the special grade, the premium grade, the standard grade, and culled seedlings. Conclusion: The grading standard was proven to be reasonable according to the agronomic characters, emergence rate, and photosynthetic efficiency of seedlings after transplantation, and the yields and contents of active constituents of the medicinal parts from different grades of seedlings.

Performance Analysis of ILEACH and LEACH Protocols for Wireless Sensor Networks

  • Miah, Md. Sipon;Koo, Insoo
    • Journal of information and communication convergence engineering / v.10 no.4 / pp.384-389 / 2012
  • In this paper, we examine the problems of the low energy adaptive clustering hierarchy (LEACH) protocol and present ideas for improving the selection of the cluster head node. The main problem with LEACH lies in the random selection of cluster heads. There is a probability that the resulting cluster heads are unbalanced and concentrated in one part of the network, which makes some parts of the network unreachable. We present a new version of the LEACH protocol called the improved LEACH (ILEACH) protocol, in which a cluster head is selected based on the ratio of its current energy level to its initial energy level, multiplied by the square root of its number of neighbor nodes. The simulation results show that the proposed ILEACH increases the energy efficiency and network lifetime.
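The selection quantity described in this abstract can be written down directly. The sketch below computes the ratio of current to initial energy multiplied by the square root of the neighbor count. How ILEACH combines this quantity with the LEACH election threshold is not stated in the abstract, so picking the highest-scoring node in `pick_head` is purely an assumption for illustration, as are both function names.

```python
import math

def ileach_score(current_energy, initial_energy, neighbor_count):
    """Energy ratio multiplied by the square root of the neighbor count."""
    if initial_energy <= 0:
        raise ValueError("initial_energy must be positive")
    if neighbor_count < 0:
        raise ValueError("neighbor_count must be non-negative")
    return (current_energy / initial_energy) * math.sqrt(neighbor_count)

def pick_head(nodes):
    """Return the id of the node with the highest score.

    nodes maps a node id to (current_energy, initial_energy, neighbors).
    Choosing the maximum is a hypothetical selection rule.
    """
    return max(nodes, key=lambda node_id: ileach_score(*nodes[node_id]))
```

Under this scoring, a node with half its energy left and four neighbors scores 1.0, the same as a fully charged node with one neighbor, which shows how the measure trades off remaining energy against local connectivity.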

An Analysis of Stock Return Behavior using Financial Big Data (금융 빅 데이터를 이용한 주식수익률 행태 분석)

  • Jung, Heon-Yong;Kim, Sang-Sik
    • Proceedings of the Korean Institute of Information and Commucation Sciences Conference / 2014.10a / pp.708-710 / 2014
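  • Recently, the financial sector has been building stock price prediction models using big data. In particular, the volatility clustering of financial time series is being analyzed with financial big data in order to study the co-movement of world stock markets. This paper uses the daily and intraday stock index returns of Korea and China to examine more closely whether volatility clustering exists in the representative stock index time series of the two countries, and analyzes the co-movement of the two stock markets. The results show that the index return time series of Korea's KOSPI and China's Shanghai Composite Index have no unit root and exhibit volatility clustering. In addition, volatility clustering is stronger in the Chinese stock market than in the Korean stock market, and this pattern is more pronounced in the intraday index return time series.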
