• Title/Summary/Keyword: Statistics Classification


Comparison of the performance of classification algorithms using cytotoxicity data (세포독성 자료를 이용한 분류 알고리즘 성능 비교)

  • Yoon, Yeochang;Jeung, Eui Bae;Jo, Na Rae;Ju, Su In;Lee, Sung Duck
    • The Korean Journal of Applied Statistics / v.31 no.3 / pp.417-426 / 2018
  • An alternative developmental toxicity test using mouse embryonic stem cell derived embryoid bodies has been developed. Rather than administering chemicals to animals, this alternative method treats cells with the chemicals. This study suggests the use of Discriminant Analysis, Support Vector Machine, Artificial Neural Network and k-Nearest Neighbor. Algorithm performance was compared using accuracy and a weighted Cohen's kappa coefficient. In the application, various classification techniques were applied to cytotoxicity data to classify drug toxicity, and the results were compared.
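
The two evaluation measures named in the abstract above can be sketched as follows. This is a minimal illustration, not the authors' code: the abstract does not say which weighting scheme was used, so the `weights` argument (linear or quadratic disagreement weights) and the coding of toxicity classes as ordered integers `0..n_classes-1` are assumptions.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def weighted_kappa(y_true, y_pred, n_classes, weights="linear"):
    """Weighted Cohen's kappa for ordinal labels coded 0..n_classes-1.

    Assumption: disagreement weights are |i - j| (linear) or
    |i - j|**2 (quadratic); the abstract does not specify the scheme.
    """
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equal length")
    power = 1 if weights == "linear" else 2

    observed = Counter(zip(y_true, y_pred))  # joint counts
    row = Counter(y_true)                    # marginal counts of true labels
    col = Counter(y_pred)                    # marginal counts of predictions

    observed_disagreement = 0.0
    expected_disagreement = 0.0
    for i in range(n_classes):
        for j in range(n_classes):
            w = abs(i - j) ** power
            observed_disagreement += w * observed[(i, j)] / n
            expected_disagreement += w * (row[i] / n) * (col[j] / n)

    if expected_disagreement == 0:
        return float("nan")  # undefined when all labels fall in one class
    return 1.0 - observed_disagreement / expected_disagreement
```

With ordered classes, a prediction that misses by two categories is penalized more than one that misses by one, which plain accuracy cannot express.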

P2P Traffic Classification using Advanced Heuristic Rules and Analysis of Decision Tree Algorithms (개선된 휴리스틱 규칙 및 의사 결정 트리 분석을 이용한 P2P 트래픽 분류 기법)

  • Ye, Wujian;Cho, Kyungsan
    • Journal of the Korea Society of Computer and Information / v.19 no.3 / pp.45-54 / 2014
  • In this paper, an improved two-step P2P traffic classification scheme is proposed to overcome the limitations of the existing methods. The first step is a signature-based classifier at the packet level. The second step consists of pattern heuristic rules and a statistics-based classifier at the flow level. With pattern heuristic rules, the accuracy can be improved and the amount of traffic to be classified by the statistics-based classifier can be reduced. Based on the analysis of different decision tree algorithms, the statistics-based classifier is implemented with REPTree. In addition, an ensemble algorithm is used to improve the performance of the statistics-based classifier. Verification with real datasets shows that our hybrid scheme provides higher accuracy and lower overhead than other existing schemes.

A Study on the Method of Security Industrial Classification through the Review of Industrial Special Classification (국내산업 특수분류방법을 고려한 보안산업 분류방향 연구)

  • Shin, Eunhee;Chang, Hangbae
    • The Journal of Society for e-Business Studies / v.22 no.4 / pp.175-191 / 2017
  • The basis of economic statistics for evaluating the security industry's growth and inter-industry impacts is a standardized industry classification together with a defined scope of the security industry. The industrial classification should be written so that it complies with the international and domestic standard industrial classifications. The current representative classifications of information security, physical security, and convergence security, as well as the classification of security-related products and services, are not in line with the criteria of industrial classification, which are based on the characteristics of production activities. The results of the convergence security industrial classification study are also a consumer-oriented classification, which differs from the supplier-centric classification officially used in statistics, law, and policy enforcement in Korea. In this study, we first summarized the criteria of Korean and international industrial classification, and then examined whether the current classification of security meets these criteria. Next, to examine the classification directions of newly formed industries such as the security industry, we reviewed some cases and types of domestic industrial special classification, and on that basis proposed industrial classification criteria and a direction for the security industry.

On a Balanced Classification Rule

  • Kim, Hea-Jung
    • Journal of the Korean Statistical Society / v.24 no.2 / pp.453-470 / 1995
  • We describe a constrained optimal classification rule for the case when the prior probability of an observation belonging to one of the two populations is unknown. This is done by suggesting a balanced design for the classification experiment and constructing the optimal rule under the balanced design condition. The rule is characterized by a constrained minimization of the total risk of misclassification; the constraint is constructed by equating the Kullback-Leibler directed divergence measures obtained from the two population conditional densities. The efficacy of the suggested rule is examined through two-group normal classification. The results indicate that, when little is known about the relative population sizes, dramatic gains in classification accuracy can be achieved.
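
For the two-group normal case mentioned above, the two directed Kullback-Leibler divergences have a closed form. The sketch below only computes them for univariate normal densities. It does not reproduce the paper's balanced rule or its constraint, and the function name `kl_normal` is illustrative.

```python
import math


def kl_normal(mu0, sigma0, mu1, sigma1):
    """Directed divergence KL(N(mu0, sigma0^2) || N(mu1, sigma1^2)).

    Closed form for univariate normals:
    log(sigma1/sigma0) + (sigma0^2 + (mu0 - mu1)^2) / (2 sigma1^2) - 1/2
    """
    if sigma0 <= 0 or sigma1 <= 0:
        raise ValueError("standard deviations must be positive")
    return (
        math.log(sigma1 / sigma0)
        + (sigma0 ** 2 + (mu0 - mu1) ** 2) / (2 * sigma1 ** 2)
        - 0.5
    )


# The divergence is directed: swapping the arguments generally changes it.
forward = kl_normal(0.0, 1.0, 0.0, 2.0)
backward = kl_normal(0.0, 2.0, 0.0, 1.0)
```

When the two populations share a variance, the two directed divergences coincide; they differ once the variances differ.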


Classification of Traffic Flows into QoS Classes by Unsupervised Learning and KNN Clustering

  • Zeng, Yi;Chen, Thomas M.
    • KSII Transactions on Internet and Information Systems (TIIS) / v.3 no.2 / pp.134-146 / 2009
  • Traffic classification seeks to assign packet flows to an appropriate quality of service (QoS) class based on flow statistics, without the need to examine packet payloads. Classification proceeds in two steps. Classification rules are first built by analyzing traffic traces, and then the classification rules are evaluated using test data. In this paper, we use a self-organizing map and K-means clustering as unsupervised machine learning methods to identify the inherent classes in traffic traces. Three clusters were discovered, corresponding to transactional, bulk data transfer, and interactive applications. The K-nearest neighbor classifier was found to be highly accurate for the traffic data and significantly better than a minimum mean distance classifier.
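
The two classifiers compared in the abstract above can be contrasted in a few lines. This is a sketch under assumptions: "minimum mean distance" is read here as assigning a flow to the class with the nearest class mean, Euclidean distance is used throughout, and the feature vectors are whatever flow statistics the caller supplies (the abstract does not list them).

```python
import math
from collections import Counter


def knn_predict(train, labels, x, k=3):
    """Label x by majority vote among its k nearest training points."""
    nearest = sorted(
        (math.dist(point, x), label) for point, label in zip(train, labels)
    )[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def min_mean_distance_predict(train, labels, x):
    """Label x with the class whose mean vector is closest to x."""
    groups = {}
    for point, label in zip(train, labels):
        groups.setdefault(label, []).append(point)
    means = {
        label: tuple(sum(coords) / len(points) for coords in zip(*points))
        for label, points in groups.items()
    }
    return min(means, key=lambda label: math.dist(means[label], x))
```

A nearest-mean rule summarizes each class by a single point, so it can misassign flows when a class is elongated or made up of several sub-clusters, whereas K-nearest neighbor follows the local structure of the training data.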

Learning and Classification in the Extensional Object Model (확장개체모델에서의 학습과 계층파악)

  • Kim, Yong-Jae;An, Joon-M.;Lee, Seok-Jun
    • Asia Pacific Journal of Information Systems / v.17 no.1 / pp.33-58 / 2007
  • Quite often, an organization tries to grapple with inconsistent and partial information to generate relevant information to support decision making and action. As such, an organization scans the environment, interprets scanned data, executes actions, and learns from feedback on its actions, which boils down to computational interpretation and learning in terms of machine learning, statistics, and databases. The ExOM proposed in this paper is geared to facilitate such knowledge discovery in large databases in a most flexible manner. It supports a broad range of learning and classification styles and integrates them with traditional database functions. The learning and classification components of the ExOM are tightly integrated so that learning and classification of objects is less burdensome to ordinary users. A brief sketch of a strategy regarding the expressiveness of the terminological language is followed by a description of a prototype implementation of the learning and classification components of the ExOM.

Effect of Experimental Layout on Model Selection under Variance Components Models: A Simulation Study (분산성분모형에서 요인의 배치구조가 모형선택법에 미치는 영향에 대한 실험연구)

  • Lee, Yonghee
    • The Korean Journal of Applied Statistics / v.28 no.5 / pp.1035-1046 / 2015
  • Variance components models incorporate various random factors in the form of linear models. There are two experimental layouts for the classification of factors under variance components models: nested classification and crossed classification. We consider two-way variance components models and investigate the effect of experimental layout on the performance of the model selection criteria AIC and BIC. The effect of experimental layout is studied through a simulation study with various combinations of parameters in a systematic fashion. The simulation study shows differences in the performance of model selection methods between the two classifications. In particular, there is a tendency to prefer a model smaller than the true model when the variance component of a nested factor becomes large relative to that of the nesting factor, and this tendency persists even when the sample size is not small.
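
For reference, the two criteria compared in the abstract above have the standard definitions AIC = 2k - 2 log L and BIC = k log n - 2 log L, where k is the number of estimated parameters, n the sample size, and L the maximized likelihood. The sketch below assumes the caller already has the maximized log-likelihood; how k and n should be counted in a variance components model is a modeling choice the abstract does not settle.

```python
import math


def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 log L."""
    return 2 * n_params - 2 * log_likelihood


def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion: k log n - 2 log L."""
    return n_params * math.log(n_obs) - 2 * log_likelihood


def select_model(candidates, criterion):
    """Return the name of the candidate with the smallest criterion value.

    `candidates` maps a model name to a tuple of arguments for `criterion`.
    """
    return min(candidates, key=lambda name: criterion(*candidates[name]))
```

Because the BIC penalty grows with log n, it penalizes each extra parameter more heavily than AIC once n exceeds about 7 (log n > 2), which is one reason the two criteria can disagree about keeping a variance component.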

Document classification using a deep neural network in text mining (텍스트 마이닝에서 심층 신경망을 이용한 문서 분류)

  • Lee, Bo-Hui;Lee, Su-Jin;Choi, Yong-Seok
    • The Korean Journal of Applied Statistics / v.33 no.5 / pp.615-625 / 2020
  • In text mining, the document-term frequency matrix records the frequencies of terms extracted from documents for which group information exists. In this study, we generated the document-term frequency matrix for document classification according to research field. We applied the traditional term weighting function, term frequency-inverse document frequency (TF-IDF), to the generated document-term frequency matrix. In addition, we applied term frequency-inverse gravity moment (TF-IGM). We also generated a document-keyword weighted matrix by extracting keywords to improve the document classification accuracy. Based on the extracted keyword matrix, we classified documents using a deep neural network. To find the optimal deep neural network model, the accuracy of document classification was verified while changing the number of hidden layers and hidden nodes. Consequently, the model with eight hidden layers showed the highest accuracy, and all TF-IGM document classification accuracies (across parameter changes) were higher than those of TF-IDF. In addition, the deep neural network was confirmed to have better accuracy than the support vector machine. Therefore, we propose a method that applies TF-IGM and a deep neural network to document classification.
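
The first weighting step described above, turning a document-term frequency matrix into TF-IDF weights, can be sketched as follows. This is one common variant (raw term count times log(N / document frequency)), chosen here as an assumption; the abstract does not say which TF-IDF formula was used, and TF-IGM is not shown.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return (vocabulary, weighted matrix) for tokenized documents.

    Assumed variant: weight = term count * log(N / document frequency),
    where N is the number of documents.
    """
    n_docs = len(docs)
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(doc))  # count each term once per document

    vocab = sorted(doc_freq)
    matrix = []
    for doc in docs:
        counts = Counter(doc)
        matrix.append(
            [counts[term] * math.log(n_docs / doc_freq[term]) for term in vocab]
        )
    return vocab, matrix
```

A term that occurs in every document gets weight zero under this variant, so only terms that help separate documents contribute to the classifier's input.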

Classification and discrimination of excel radial charts using the statistical shape analysis (통계적 형상분석을 이용한 엑셀 방사형 차트의 분류와 판별)

  • Seungeon Lee;Jun Hong Kim;Yeonseok Choi;Yong-Seok Choi
    • The Korean Journal of Applied Statistics / v.37 no.1 / pp.73-86 / 2024
  • An Excel radial chart is a very useful graphical method for conveying numerical data. However, it is not easy to discriminate or classify many individuals with it. In this case, we need to represent each individual in the radial chart as a shape and then apply shape analysis. In a radial chart, there are as many landmarks as there are variables representing the characteristics of the object, so we consider the shape formed by connecting them with lines. If the shape becomes complicated because of a large number of variables, it is hard to grasp even when visualized using a radial chart. Principal component analysis (PCA) is therefore performed on the variables to create a visually effective shape. The classification table and classification rate are checked by applying traditional discriminant analysis, support vector machine (SVM), and artificial neural network (ANN), before and after principal component analysis. In addition, the difference in discrimination between generalized Procrustes analysis (GPA) coordinates and Bookstein coordinates is compared. Bookstein coordinates are obtained by transforming the position, rotation, and scale of the shape relative to the base landmarks, and they show a higher classification rate than GPA coordinates.
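
The Bookstein transformation described above can be sketched for two-dimensional landmarks using complex arithmetic. The convention assumed here sends the two base landmarks to (-0.5, 0) and (0.5, 0); other texts send them to (0, 0) and (1, 0). Which landmarks serve as the base is also an assumption (the first two by default).

```python
def bookstein_coordinates(landmarks, base=(0, 1)):
    """Map 2-D landmarks so the base pair lands on (-0.5, 0) and (0.5, 0).

    Translation, rotation, and scale are removed in one step by
    treating each (x, y) landmark as the complex number x + iy.
    """
    points = [complex(x, y) for x, y in landmarks]
    first, second = points[base[0]], points[base[1]]
    if first == second:
        raise ValueError("base landmarks must be distinct")
    transformed = [(p - first) / (second - first) - 0.5 for p in points]
    return [(w.real, w.imag) for w in transformed]
```

Two shapes that differ only by position, rotation, and scale produce the same Bookstein coordinates, so the remaining landmarks can be fed directly to a classifier.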

System Analysis of Disease Classification of Oriental Medicine Diagnosis and Study for Improvement Method (한방진단명의 질병분류체계 분석과 개선방안 연구)

  • Lee, Hyun Ju;Park, Su Bock;Kim, Su Jin;Ko, Seung Yeon
    • Quality Improvement in Health Care / v.12 no.2 / pp.84-92 / 2006
  • Background: This study examines the differences between ICD-10 and The Korean standard classification of disease (oriental medicine), with the aim of improving its practical use as statistical data, which is one of the purposes of disease classification. To that end, we convert the many-to-many correspondence in the oriental medicine classification into a many-to-one correspondence. Method: The study covered 155 patients discharged from a university hospital in Gyeonggi Province that operates both a hospital and an oriental medicine hospital, from July to October this year. The study period was from August 1 to November 18. We compared the diagnoses given by the two services (hospital services and oriental medicine hospital services) over the same period and attempted a many-to-one correspondence classification for the production of statistical data. Result: We investigated the group of patients who had received treatment from both kinds of services at the same time. The same oriental medicine diagnosis was found to correspond to different western medicine diagnoses. 44.5% were in accordance with the western medicine diagnosis. Agreement between the western medicine diagnosis and the first western medicine diagnosis listed in The Korean standard classification of disease (oriental medicine) was 13.5%. For the many-to-one correspondence classification for statistics, one western medicine diagnosis was selected for each oriental medicine diagnosis. When the main diagnosis (I sign) was not sufficient to explain the characteristics of the oriental medicine diagnosis, we allowed multiple other diagnoses, so that another diagnosis (II sign) concerning the cause of the patient's disease could be selected as a supplement after examining the patient's records. Statistics could be produced with this many-to-one correspondence.
Conclusion: The results of this study on the correspondence between western medicine diagnoses and oriental medicine diagnoses confirm that The Korean standard classification of disease (oriental medicine) is hard to standardize against western medicine diagnoses. Therefore, based on this study, we use a new many-to-one correspondence classification, mapping multiple oriental medicine diagnoses to one ICD-10 code, which can be used for statistical data.
