• Title/Summary/Keyword: Group Model Clustering

Search Result 98, Processing Time 0.022 seconds

Empirical Comparison of Word Similarity Measures Based on Co-Occurrence, Context, and a Vector Space Model

  • Kadowaki, Natsuki;Kishida, Kazuaki
    • Journal of Information Science Theory and Practice
    • /
    • v.8 no.2
    • /
    • pp.6-17
    • /
    • 2020
  • Word similarity is often measured to enhance system performance in the information retrieval field and other related areas. This paper reports on an experimental comparison of values for word similarity measures that were computed based on 50 intentionally selected words from a Reuters corpus. There were three targets, including (1) co-occurrence-based similarity measures (for which a co-occurrence frequency is counted as the number of documents or sentences), (2) context-based distributional similarity measures obtained from a latent Dirichlet allocation (LDA), nonnegative matrix factorization (NMF), and Word2Vec algorithm, and (3) similarity measures computed from the tf-idf weights of each word according to a vector space model (VSM). Here, a Pearson correlation coefficient for a pair of VSM-based similarity measures and co-occurrence-based similarity measures according to the number of documents was highest. Group-average agglomerative hierarchical clustering was also applied to similarity matrices computed by individual measures. An evaluation of the cluster sets according to an answer set revealed that VSM- and LDA-based similarity measures performed best.

An Empirical Comparison and Verification Study on the Seaport Clustering Measurement Using Meta-Frontier DEA and Integer Programming Models (메타프론티어 DEA모형과 정수계획모형을 이용한 항만클러스터링 측정에 대한 실증적 비교 및 검증연구)

  • Park, Ro-Kyung
    • Journal of Korea Port Economic Association
    • /
    • v.33 no.2
    • /
    • pp.53-82
    • /
    • 2017
  • The purpose of this study is to show the clustering trend and compare empirical results, as well as to choose the clustering ports for 3 Korean ports (Busan, Incheon, and Gwangyang) by using meta-frontier DEA (Data Envelopment Analysis) and integer models on 38 Asian container ports over the period 2005-2014. The models consider 4 input variables (birth length, depth, total area, and number of cranes) and 1 output variable (container TEU). The main empirical results of the study are as follows. First, the meta-frontier DEA for Chinese seaports identifies as most efficient ports (in decreasing order) Shanghai, Hongkong, Ningbo, Qingdao, and Guangzhou, while efficient Korean seaports are Busan, Incheon, and Gwangyang. Second, the clustering results of the integer model show that the Busan port should cluster with Dubai, Hongkong, Shanghai, Guangzhou, Ningbo, Qingdao, Singapore, and Kaosiung, while Incheon and Gwangyang should cluster with Shahid Rajaee, Haifa, Khor Fakkan, Tanjung Perak, Osaka, Keelong, and Bangkok ports. Third, clustering through the integer model sharply increases the group efficiency of Incheon (401.84%) and Gwangyang (354.25%), but not that of the Busan port. Fourth, the efficiency ranking comparison between the two models before and after the clustering using the Wilcoxon signed-rank test is matched with the average level of group efficiency (57.88 %) and the technology gap ratio (80.93%). The policy implication of this study is that Korean port policy planners should employ meta-frontier DEA, as well as integer models when clustering is needed among Asian container ports for enhancing the efficiency. In addition Korean seaport managers and port authorities should introduce port development and management plans accounting for the reference and clustered seaports after careful analysis.

Adaptive Clustering Algorithm for Recycling Cell Formation: An Application of Fuzzy ART Neural Networks

  • Seo, Kwang-Kyu;Park, Ji-Hyung
    • Journal of Mechanical Science and Technology
    • /
    • v.18 no.12
    • /
    • pp.2137-2147
    • /
    • 2004
  • The recycling cell formation problem means that disposal products are classified into recycling part families using group technology in their end-of-life phase. Disposal products have the uncertainties of product status by usage influences during product use phase, and recycling cells are formed design, process and usage attributes. In order to deal with the uncertainties, fuzzy set theory and fuzzy logic-based neural network model are applied to recycling cell formation problem for disposal products. Fuzzy C-mean algorithm and a heuristic approach based on fuzzy ART neural network is suggested. Especially, the modified Fuzzy ART neural network is shown that it has a good clustering results and gives an extension for systematically generating alternative solutions in the recycling cell formation problem. Disposal refrigerators are shown as examples.

Biomedical Ontologies and Text Mining for Biomedicine and Healthcare: A Survey

  • Yoo, Ill-Hoi;Song, Min
    • Journal of Computing Science and Engineering
    • /
    • v.2 no.2
    • /
    • pp.109-136
    • /
    • 2008
  • In this survey paper, we discuss biomedical ontologies and major text mining techniques applied to biomedicine and healthcare. Biomedical ontologies such as UMLS are currently being adopted in text mining approaches because they provide domain knowledge for text mining approaches. In addition, biomedical ontologies enable us to resolve many linguistic problems when text mining approaches handle biomedical literature. As the first example of text mining, document clustering is surveyed. Because a document set is normally multiple topic, text mining approaches use document clustering as a preprocessing step to group similar documents. Additionally, document clustering is able to inform the biomedical literature searches required for the practice of evidence-based medicine. We introduce Swanson's UnDiscovered Public Knowledge (UDPK) model to generate biomedical hypotheses from biomedical literature such as MEDLINE by discovering novel connections among logically-related biomedical concepts. Another important area of text mining is document classification. Document classification is a valuable tool for biomedical tasks that involve large amounts of text. We survey well-known classification techniques in biomedicine. As the last example of text mining in biomedicine and healthcare, we survey information extraction. Information extraction is the process of scanning text for information relevant to some interest, including extracting entities, relations, and events. We also address techniques and issues of evaluating text mining applications in biomedicine and healthcare.

A sequential outlier detecting method using a clustering algorithm (군집 알고리즘을 이용한 순차적 이상치 탐지법)

  • Seo, Han Son;Yoon, Min
    • The Korean Journal of Applied Statistics
    • /
    • v.29 no.4
    • /
    • pp.699-706
    • /
    • 2016
  • Outlier detection methods without performing a test often do not succeed in detecting multiple outliers because they are structurally vulnerable to a masking effect or a swamping effect. This paper considers testing procedures supplemented to a clustering-based method of identifying the group with a minority of the observations as outliers. One of general steps is performing a variety of t-test on individual outlier-candidates. This paper proposes a sequential procedure for searching for outliers by changing cutoff values on a cluster tree and performing a test on a set of outlier-candidates. The proposed method is illustrated and compared to existing methods by an example and Monte Carlo studies.

Improving Web Service Recommendation using Clustering with K-NN and SVD Algorithms

  • Weerasinghe, Amith M.;Rupasingha, Rupasingha A.H.M.
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.15 no.5
    • /
    • pp.1708-1727
    • /
    • 2021
  • In the advent of the twenty-first century, human beings began to closely interact with technology. Today, technology is developing, and as a result, the world wide web (www) has a very important place on the Internet and the significant task is fulfilled by Web services. A lot of Web services are available on the Internet and, therefore, it is difficult to find matching Web services among the available Web services. The recommendation systems can help in fixing this problem. In this paper, our observation was based on the recommended method such as the collaborative filtering (CF) technique which faces some failure from the data sparsity and the cold-start problems. To overcome these problems, we first applied an ontology-based clustering and then the k-nearest neighbor (KNN) algorithm for each separate cluster group that effectively increased the data density using the past user interests. Then, user ratings were predicted based on the model-based approach, such as singular value decomposition (SVD) and the predictions used for the recommendation. The evaluation results showed that our proposed approach has a less prediction error rate with high accuracy after analyzing the existing recommendation methods.

Extraction of Classes and Hierarchy from Procedural Software (절차지향 소프트웨어로부터 클래스와 상속성 추출)

  • Choi, Jeong-Ran;Park, Sung-Og;Lee, Moon-Kun
    • Journal of KIISE:Software and Applications
    • /
    • v.28 no.9
    • /
    • pp.612-628
    • /
    • 2001
  • This paper presents a methodology to extract classes and inheritance relations from procedural software. The methodology is based on the idea of generating all groups of class candidates, based on the combinatorial groups of object candidates, and their inheritance with all possible combinations and selecting a group of object candidates, and their inheritance with all possible combinations and selecting a group with the best or optimal combination of candidates with respect to the degree of relativity and similarity between class candidates in the group and classes in a domain model. The methodology has innovative features in class candidates in the group and classes in a domain model. The methodology has innovative features in class and inheritance extraction: a clustering method based on both static (attribute) and dynamic (method) clustering, the combinatorial cases of grouping class candidate cases based on abstraction, a signature similarity measurement for inheritance relations among n class candidates or m classes, two-dimensional similarity measurement for inheritance relations among n class candidates or m classes, two-dimensional similarity measurement, that is, the horizontal measurement for overall group similarity between n class candidates and m classes, and the vertical measurement for specific similarity between a set of classes in a group of class candidates and a set of classes with the same class hierarchy in a domain model, etc. This methodology provides reengineering experts with a comprehensive and integrated environment to select the best or optimal group of class candidates.

  • PDF

Clustering of parental and peer variables associated with adolescent risk behaviors and their characteristics -Using Mixture Model- (청소년의 위험행동에 영향을 주는 부모변인과 또래변인을 중심으로 한 집단 구분 및 그 특성 - Mixture Model을 이용하여 -)

  • Lee, Ji-Min;Kwak, Young-Sik
    • Korean Journal of Human Ecology
    • /
    • v.16 no.5
    • /
    • pp.899-908
    • /
    • 2007
  • Clusters of parental and peer variables associated with adolescent risk behaviors are explored using the mixture model. Questionnaires were completed by 917 high school freshmen in the Daegu Kyungpook area and included measures of risk behaviors, parental attachment, autonomy, parental monitoring, and peers' risk behaviors and desirable behaviors. As a result of the mixture model, five clusters were produced. Two of the subgroups were consistent with the literature of showing linear relationships among adolescent risk behaviors and above variables; a group of higher parental attachment and autonomy as well as parental monitoring, lower friends' risk behaviors, and lower adolescent risk behaviors, and a group of lower parental attachment and autonomy as well as parental monitoring, higher friends' risk behaviors, and higher adolescent risk behaviors. Two other subgroups were similar in parental attachment and autonomy, but differed in parental monitoring, friends' risk behaviors, and adolescent risk behaviors. The last subgroup was characterized by scoring the lowest parental attachment and autonomy, parental monitoring, friends' risk behaviors, and lower adolescent risk behaviors compared to other subgroups. The utility of the mixture model in research on adolescent risk behaviors is discussed in the conclusion.

A Method of Detecting the Aggressive Driving of Elderly Driver (노인 운전자의 공격적인 운전 상태 검출 기법)

  • Koh, Dong-Woo;Kang, Hang-Bong
    • KIPS Transactions on Software and Data Engineering
    • /
    • v.6 no.11
    • /
    • pp.537-542
    • /
    • 2017
  • Aggressive driving is a major cause of car accidents. Previous studies have mainly analyzed young driver's aggressive driving tendency, yet they were only done through pure clustering or classification technique of machine learning. However, since elderly people have different driving habits due to their fragile physical conditions, it is necessary to develop a new method such as enhancing the characteristics of driving data to properly analyze aggressive driving of elderly drivers. In this study, acceleration data collected from a smartphone of a driving vehicle is analyzed by a newly proposed ECA(Enhanced Clustering method for Acceleration data) technique, coupled with a conventional clustering technique (K-means Clustering, Expectation-maximization algorithm). ECA selects high-intensity data among the data of the cluster group detected through K-means and EM in all of the subjects' data and models the characteristic data through the scaled value. Using this method, the aggressive driving data of all youth and elderly experiment participants were collected, unlike the pure clustering method. We further found that the K-means clustering has higher detection efficiency than EM method. Also, the results of K-means clustering demonstrate that a young driver has a driving strength 1.29 times higher than that of an elderly driver. In conclusion, the proposed method of our research is able to detect aggressive driving maneuvers from data of the elderly having low operating intensity. The proposed method is able to construct a customized safe driving system for the elderly driver. In the future, it will be possible to detect abnormal driving conditions and to use the collected data for early warning to drivers.

Development of newly recruited privates on-the-job Training Achievements Group Classification Model (신병 주특기교육 성취집단 예측모형 개발)

  • Kwak, Ki-Hyo;Suh, Yong-Moo
    • Journal of the military operations research society of Korea
    • /
    • v.33 no.2
    • /
    • pp.101-113
    • /
    • 2007
  • The period of military personnel service will be phased down by 2014 according to 'The law of National Defense Reformation' issued by the Ministry of National Defense. For this reason, the ROK army provides discrimination education to 'newly recruited privates' for more effective individual performance in the on-the-job training. For the training to be more effective, it would be essential to predict the degree of achievements by new privates in the training. Thus, we used data mining techniques to develop a classification model which classifies the new privates into one of two achievements groups, so that different skills of education are applied to each group. The target variable for this model is a binary variable, whose value can be either 'a group of general control' or 'a group of special control'. We developed four pure classification models using Neural Network, Decision Tree, Support Vector Machine and Naive Bayesian. We also built four hybrid models, each of which combines k-means clustering algorithm with one of these four mining technique. Experimental results demonstrated that the highest performance model was the hybrid model of k-means and Neural Network. We expect that various military education programs could be supported by these classification models for better educational performance.