• Title/Summary/Keyword: classification trees


A Two-Phase Shallow Semantic Parsing System Using Clause Boundary Information and Tree Distance (절 경계와 트리 거리를 사용한 2단계 부분 의미 분석 시스템)

  • Park, Kyung-Mi;Hwang, Kyu-Baek
    • Journal of KIISE:Computing Practices and Letters / v.16 no.5 / pp.531-540 / 2010
  • In this paper, we present a two-phase shallow semantic parsing method based on a maximum entropy model. The first phase recognizes semantic arguments (argument identification); the second assigns appropriate semantic roles to the recognized arguments (argument classification). The performance of the first phase is crucial to the entire system, because the second phase operates only on the regions recognized at the identification stage. To improve argument identification, we incorporate syntactic knowledge into its pre-processing step. More precisely, the boundaries of the immediate clause and the upper clauses of a predicate, obtained from clause identification, are used to reduce the search space, and the distance on the parse tree from the parent node of the predicate to the parent node of a parse constituent is exploited. Experimental results show that incorporating syntactic knowledge and separating argument identification from the overall procedure enhance the performance of the shallow semantic parsing system.
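The two-phase structure described above can be sketched in a few lines. The candidate constituents, clause span, and scoring rules here are hypothetical stand-ins for the maximum entropy models and real clause-boundary features of the paper; the sketch only shows how phase 1 prunes the search space before phase 2 labels what remains.

```python
# Minimal sketch of a two-phase shallow semantic parsing pipeline.
# Phase 1 (identification) filters candidate constituents using clause
# boundaries; phase 2 (classification) assigns a role label.

def identify_arguments(candidates, predicate_clause):
    """Phase 1: keep only constituents inside the predicate's immediate
    or upper clause span -- this prunes the search space before scoring."""
    lo, hi = predicate_clause
    return [c for c in candidates if lo <= c["start"] and c["end"] <= hi]

def classify_argument(constituent):
    """Phase 2: stand-in for the maximum entropy classifier -- here a
    trivial rule on parse-tree distance to the predicate."""
    return "ARG0" if constituent["tree_distance"] <= 1 else "ARG1"

candidates = [
    {"start": 0, "end": 2, "tree_distance": 1},   # inside clause, near
    {"start": 3, "end": 5, "tree_distance": 3},   # inside clause, far
    {"start": 9, "end": 12, "tree_distance": 1},  # outside clause span
]
identified = identify_arguments(candidates, predicate_clause=(0, 6))
roles = [classify_argument(c) for c in identified]
print(roles)  # ['ARG0', 'ARG1']
```

The design point the abstract stresses survives even in this toy: the third candidate is never scored by phase 2, because the clause boundaries already excluded it.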

Classification of Viruses Based on the Amino Acid Sequences of Viral Polymerases (바이러스 핵산중합효소의 아미노산 서열에 의한 바이러스 분류)

  • Nam, Ji-Hyun;Lee, Dong-Hun;Lee, Keon-Myung;Lee, Chan-Hee
    • Korean Journal of Microbiology / v.43 no.4 / pp.285-291 / 2007
  • According to the Baltimore Scheme, viruses are classified into 6 main classes based on their replication and coding strategies. Except for some small DNA viruses, most viruses code for their own polymerases: DNA-dependent DNA, RNA-dependent RNA and RNA-dependent DNA polymerases, all of which contain 4 common motifs. We undertook a phylogenetic study to establish the relationship between the Baltimore Scheme and viral polymerases. Amino acid sequence data sets of viral polymerases were taken from NCBI GenBank, and a multiple alignment was performed with the CLUSTAL X program. Phylogenetic trees of viral polymerases constructed from the distance matrices were generally consistent with the Baltimore Scheme, with some minor exceptions. Interestingly, negative-strand RNA viruses (Class V) could be further divided into 2 subgroups with segmented and non-segmented genomes. Thus, the Baltimore Scheme for viral taxonomy could be supported by phylogenetic analysis based on the amino acid sequences of viral polymerases.
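The step from distance matrix to tree can be illustrated with a toy agglomerative (single-linkage) clustering; the labels and distances below are invented, and real work would derive the matrix from CLUSTAL alignments of polymerase amino acid sequences, not from three hand-picked numbers.

```python
# Toy sketch: build a tree from a pairwise distance matrix by repeatedly
# merging the two closest clusters (single linkage), returning nested tuples.

def single_linkage_tree(labels, dist):
    clusters = [(lbl, [i]) for i, lbl in enumerate(labels)]
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = min(dist[i][j] for i in clusters[a][1] for j in clusters[b][1])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        merged = ((clusters[a][0], clusters[b][0]), clusters[a][1] + clusters[b][1])
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
    return clusters[0][0]

labels = ["dsDNA_pol", "ssRNA_pol", "RT_pol"]
dist = [[0.0, 0.8, 0.5],
        [0.8, 0.0, 0.7],
        [0.5, 0.7, 0.0]]
tree = single_linkage_tree(labels, dist)
print(tree)  # ('ssRNA_pol', ('dsDNA_pol', 'RT_pol'))
```

The two polymerases with the smallest pairwise distance group first, mirroring how subgroups (such as the segmented vs. non-segmented Class V split) emerge from the matrix.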

3D based Classification of Urban Area using Height and Density Information of LiDAR (LiDAR의 높이 및 밀도 정보를 이용한 도시지역의 3D기반 분류)

  • Jung, Sung-Eun;Lee, Woo-Kyun;Kwak, Doo-Ahn;Choi, Hyun-Ah
    • Spatial Information Research / v.16 no.3 / pp.373-383 / 2008
  • Unlike satellite imagery and aerial photographs, LiDAR provides irregularly distributed three-dimensional coordinates of the ground surface, enabling three-dimensional modeling. In this study, an urban area was classified based on 3D information collected by LiDAR. Morphological and spatial properties were determined from the ratio of ground to non-ground points, estimated from the number of ground-reflected returns in the LiDAR raw data. With this information, residential and forest areas could be distinguished by the height and density of trees. Signal intensity was thresholded with a statistical method, Jenks natural breaks, and vegetative areas (high or low density) and non-vegetative areas (high or low density) were classified by the reflective ratio of the ground surface.
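The Jenks natural breaks step mentioned above can be sketched for the simplest case of a single break: choose the split of a sorted 1-D sample that minimizes the total within-class sum of squared deviations. The intensity values below are invented, not LiDAR data.

```python
# One-break Jenks natural breaks sketch, as used to threshold return
# intensity: try every split point and keep the one with the lowest
# within-class sum of squared deviations.

def jenks_one_break(values):
    vals = sorted(values)
    def ssd(xs):  # sum of squared deviations from the class mean
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs)
    best_split, best_cost = None, float("inf")
    for k in range(1, len(vals)):
        cost = ssd(vals[:k]) + ssd(vals[k:])
        if cost < best_cost:
            best_split, best_cost = k, cost
    return vals[best_split - 1], vals[best_split]  # break falls between these

intensity = [12, 14, 13, 15, 41, 44, 43, 40]
low_max, high_min = jenks_one_break(intensity)
print(low_max, high_min)  # 15 40
```

The optimal break lands in the wide gap between the two intensity clusters, which is exactly the behavior that makes the method useful for separating reflective classes.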


Early Criticality Prediction Model Using Fuzzy Classification (퍼지 분류를 이용한 초기 위험도 예측 모델)

  • Hong, Euy-Seok;Kwon, Yong-Kil
    • The Transactions of the Korea Information Processing Society / v.7 no.5 / pp.1401-1408 / 2000
  • Criticality prediction models that determine whether a design entity is fault-prone or non-fault-prone play an important role in reducing system development cost, because problems in early phases largely affect the quality of late products. Real-time systems such as telecommunication systems are so large that criticality prediction is especially important in their design. Current models are based on techniques such as discriminant analysis, neural networks and classification trees, but they make it difficult to analyze the causes of their predictions and offer limited extensibility. In this paper, we propose a criticality prediction model using a fuzzy rule base constructed by a genetic algorithm. This model makes it easy to analyze the cause of a result, provides high extensibility and applicability, and places no limit on the number of rules to be found.
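A fuzzy rule base of the kind described can be sketched with two hand-written rules; in the paper the rules are evolved by a genetic algorithm, so the membership functions, rule antecedents, and input names here are invented stand-ins.

```python
# Minimal fuzzy-rule criticality sketch: triangular memberships, two
# rules, and a winner-takes-all comparison of rule activations.

def tri(x, a, b, c):
    """Triangular membership function peaking at b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

def predict_criticality(complexity, size):
    # Rule 1: IF complexity is HIGH AND size is LARGE THEN fault-prone
    high = tri(complexity, 0.4, 1.0, 1.6)
    large = tri(size, 0.5, 1.0, 1.5)
    fault_prone = min(high, large)        # fuzzy AND = min
    # Rule 2: IF complexity is LOW THEN non-fault-prone
    low = tri(complexity, -0.6, 0.0, 0.6)
    return "fault-prone" if fault_prone > low else "non-fault-prone"

print(predict_criticality(0.9, 0.9))  # fault-prone
print(predict_criticality(0.2, 0.9))  # non-fault-prone
```

Because the rules are explicit IF-THEN statements, the cause of any prediction can be read directly from which rule fired, which is the interpretability advantage the abstract claims over neural nets and discriminant analysis.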


Classification of Feature Points Required for Multi-Frame Based Building Recognition (멀티 프레임 기반 건물 인식에 필요한 특징점 분류)

  • Park, Si-young;An, Ha-eun;Lee, Gyu-cheol;Yoo, Ji-sang
    • The Journal of Korean Institute of Communications and Information Sciences / v.41 no.3 / pp.317-327 / 2016
  • The extraction of significant feature points from a video directly affects the performance of the proposed method. In particular, feature points in occlusion regions such as trees or people, or points extracted from the background rather than from objects, such as the sky or mountains, are insignificant and can degrade matching and recognition. This paper classifies the feature points required for building recognition using multiple frames in order to improve recognition performance. First, primary feature points are extracted through SIFT (scale-invariant feature transform) and mismatched feature points are removed. RANSAC (random sample consensus) is then applied to categorize the feature points in occlusion regions. Since the classified feature points are acquired through matching, a single feature point may have multiple descriptors, so a process that consolidates them is also proposed. Experiments verify that the proposed method performs well.
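The RANSAC step can be illustrated with a 2-D line model in place of the geometric model used for real inter-frame building matches; the point set is synthetic, with eight "consistent" matches on one line and two occlusion-style outliers.

```python
# RANSAC sketch: repeatedly fit a model to a minimal random sample and
# keep the model that explains the most points; the rest are outliers.

import random

def ransac_line(points, iters=200, tol=0.1, seed=0):
    rng = random.Random(seed)
    best_inliers = []
    for _ in range(iters):
        (x1, y1), (x2, y2) = rng.sample(points, 2)
        if x1 == x2:
            continue                       # vertical pair: skip this sample
        a = (y2 - y1) / (x2 - x1)          # slope of the candidate line
        b = y1 - a * x1                    # intercept
        inliers = [p for p in points if abs(p[1] - (a * p[0] + b)) < tol]
        if len(inliers) > len(best_inliers):
            best_inliers = inliers
    return best_inliers

# Eight matches on the line y = 2x, plus two occlusion outliers.
pts = [(x, 2.0 * x) for x in range(8)] + [(1.0, 9.0), (5.0, -3.0)]
inliers = ransac_line(pts)
print(len(inliers))  # 8
```

The two outliers never join the consensus set, which is how feature points from occluding trees or people get filtered out of the building model.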

The detection of cavitation in hydraulic machines by use of ultrasonic signal analysis

  • Gruber, P.;Farhat, M.;Odermatt, P.;Etterlin, M.;Lerch, T.;Frei, M.
    • International Journal of Fluid Machinery and Systems / v.8 no.4 / pp.264-273 / 2015
  • This paper describes an experimental approach to detecting cavitation in hydraulic machines using ultrasonic signal analysis. Instead of using the high-frequency pulses (typically 1 MHz) only for transit-time measurement, various other characteristics are extracted from the individual signals and from their correlation with reference signals in order to assess the water condition. Because the pulse repetition rate is high (typically 100 Hz), statistical parameters can be extracted from the signals. The idea is to find patterns in these parameters with a classifier that can distinguish between the different water states. This classification scheme has been applied to several cavitation sections: a sphere in a water flow in a circular tube at the HSLU in Lucerne, and a NACA profile in a cavitation tunnel and two Francis model test turbines, all at LMH in Lausanne. From the raw signal data, several statistical parameters in the time and frequency domains, as well as from the correlation function with reference signals, were determined. Two classifiers were used: feed-forward neural networks and decision trees. For both classification methods, realizations of the lowest possible complexity are of special interest. It is shown that two to three signal characteristics, two from the signal itself and one from the correlation function, are in many cases sufficient for detection. The final goal is to combine these results with operating-point, vibration, acoustic-emission and dynamic-pressure information so that dangerous cavitation can be distinguished from harmless cavitation.
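The feature-plus-low-complexity-classifier idea can be sketched as follows: extract a couple of time-domain statistics per pulse train and separate the two water states with a depth-1 decision tree (a stump). The synthetic "signals" below stand in for the 1 MHz pulse recordings, and the stump stands in for the small decision trees the paper evaluates.

```python
# Sketch: two statistical features per ultrasonic pulse train, classified
# by the best single-feature threshold (a decision stump).

import statistics

def features(signal):
    """Two simple time-domain statistics per pulse train."""
    return (statistics.mean(signal), statistics.pstdev(signal))

def best_stump(samples):
    """Pick the (feature, threshold) pair with the fewest errors."""
    best = None
    for f in (0, 1):
        for s, _ in samples:
            t = s[f]
            errs = sum((s2[f] > t) != lbl for s2, lbl in samples)
            errs = min(errs, len(samples) - errs)  # allow either orientation
            if best is None or errs < best[0]:
                best = (errs, f, t)
    return best

cavitating = [[0.1, 0.9, -0.8, 1.1], [0.0, 1.2, -1.0, 0.8]]   # noisy pulses
quiet = [[0.3, 0.35, 0.3, 0.33], [0.25, 0.28, 0.26, 0.27]]     # calm water
samples = [(features(s), True) for s in cavitating] + \
          [(features(s), False) for s in quiet]
errs, feat, thr = best_stump(samples)
print(errs, feat)  # 0 1  -> the std-dev feature separates perfectly
```

The mean feature overlaps between the classes while the spread does not, echoing the paper's finding that very few well-chosen characteristics suffice for detection.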

Credit Card Bad Debt Prediction Model based on Support Vector Machine (신용카드 대손회원 예측을 위한 SVM 모형)

  • Kim, Jin Woo;Jhee, Won Chul
    • Journal of Information Technology Services / v.11 no.4 / pp.233-250 / 2012
  • In this paper, credit card delinquency means the possibility of bad debt occurring within the near future in normal accounts that currently have no debt, and the problem is to predict, on a monthly basis, the occurrence of delinquency 3 months in advance. This is a typical binary classification problem, but it suffers from data imbalance: instances of the target class are very few. For effective prediction of bad-debt occurrence, a Support Vector Machine (SVM) with the kernel trick is adopted, using credit card usage and payment patterns as inputs. SVM is widely accepted in the data mining community for its prediction accuracy and resistance to overfitting; however, it is known to have difficulty processing large-scale data. To resolve the difficulties in applying SVM to bad-debt prediction, two-stage clustering is suggested as an effective data reduction method, and ensembles of SVM models are adopted to mitigate the data imbalance intrinsic to the target problem. In experiments with real-world data from one of the major domestic credit card companies, the suggested approach shows prediction accuracy superior to traditional data mining approaches using neural networks, decision trees or logistic regression. The SVM ensemble learned from the T2 training set gives the best prediction results among the alternatives considered, and it is noteworthy that the performance of neural networks with T2 is better than that of SVM with T1. These results show that the suggested approach is effective both for SVM training and for the data imbalance problem.
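The ensemble-against-imbalance idea can be sketched independently of SVMs: train one model per balanced bag (all minority cases plus an equal-size slice of the majority), then majority-vote. A nearest-centroid rule stands in for the kernel SVMs, and the two-feature account rows are invented.

```python
# Sketch: ensemble over balanced bags for imbalanced bad-debt data.

def centroid(rows):
    return [sum(col) / len(rows) for col in zip(*rows)]

def make_model(pos_rows, neg_rows):
    """Nearest-centroid stand-in for one SVM trained on a balanced bag."""
    cp, cn = centroid(pos_rows), centroid(neg_rows)
    def model(x):
        dp = sum((a - b) ** 2 for a, b in zip(x, cp))
        dn = sum((a - b) ** 2 for a, b in zip(x, cn))
        return 1 if dp < dn else 0
    return model

# 3 delinquent accounts (minority) vs 9 normal accounts (majority).
pos = [[0.9, 0.8], [1.0, 0.7], [0.8, 0.9]]
neg = [[0.1, 0.2], [0.2, 0.1], [0.0, 0.3], [0.1, 0.1], [0.3, 0.2],
       [0.2, 0.3], [0.1, 0.0], [0.0, 0.1], [0.2, 0.2]]
models = [make_model(pos, neg[i:i + 3]) for i in range(0, 9, 3)]

def vote(x):
    return sum(m(x) for m in models) >= 2  # majority of 3 models

print(vote([0.85, 0.8]), vote([0.15, 0.2]))  # True False
```

Each bag is small and balanced, which addresses both of the abstract's concerns at once: SVM's trouble with large-scale data and the scarcity of the target class.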

Incomplete data handling technique using decision trees (결정트리를 이용하는 불완전한 데이터 처리기법)

  • Lee, Jong Chan
    • Journal of the Korea Convergence Society / v.12 no.8 / pp.39-45 / 2021
  • This paper discusses how to handle incomplete data containing missing values. Optimal processing of a missing value means obtaining, from the information contained in the training data, the estimate closest to the original value and replacing the missing value with it. The means to achieve this is a decision tree completed in the course of classification. This decision tree is obtained by training the C4.5 classifier only on the complete records, those without missing values, among all training data. The nodes of the tree carry classification-variable information; nodes closer to the root carry more information, and each leaf defines a classification region through its path from the root. The average of the data events classified into each region is also recorded. An event containing a missing value is fed into this decision tree, the region closest to the event is found by traversing the tree according to the information at each node, and the average recorded in that region is taken as the estimate of the missing value, completing the imputation.
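The routing-and-leaf-average idea can be sketched with a tiny hand-built tree; the two feature tests and leaf averages below are invented stand-ins for a tree grown by C4.5 on the complete records, and averaging both branches when the tested feature is missing is a simplified way of reaching the "closest region".

```python
# Sketch: impute a missing value by routing the incomplete event down a
# decision tree and using the average stored at the reached region.

# Each internal node: (feature, threshold, left_subtree, right_subtree).
# Each leaf: average of the target over training events in that region.
tree = ("a", 0.5,
        ("b", 0.5, 10.0, 20.0),   # region where a <= 0.5
        ("b", 0.5, 30.0, 40.0))   # region where a > 0.5

def impute(tree, event):
    """Follow the path; if the tested feature is itself missing,
    average the estimates from both branches (simplifying assumption)."""
    if not isinstance(tree, tuple):
        return tree
    feat, thr, left, right = tree
    if event.get(feat) is None:
        return (impute(left, event) + impute(right, event)) / 2
    branch = left if event[feat] <= thr else right
    return impute(branch, event)

print(impute(tree, {"a": 0.3, "b": 0.8}))   # 20.0  (complete event)
print(impute(tree, {"a": 0.9, "b": None}))  # 35.0  (b missing: mean of 30 and 40)
```

Because nodes near the root carry the most discriminative variables, an event usually travels several informative tests before a missing feature forces any averaging.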

A Study on Obtaining Tree Data from Green Spaces in Parks Using Unmanned Aerial Vehicle Images: Focusing on Mureung Park in Chuncheon

  • Lee, Do-Hyung;Kil, Sung-Ho;Lee, Su-Been
    • Journal of People, Plants, and Environment / v.24 no.4 / pp.441-450 / 2021
  • Background and objective: The purpose of this study is to analyze the three-dimensional (3D) structure of green spaces in a park by creating a 3D model from unmanned aerial vehicle (UAV) images. Methods: After producing a digital surface model (DSM) and a digital terrain model (DTM) from UAV images taken in Mureung Park in Chuncheon-si, we generated a digital tree height model (DHM). In addition, we used the mean shift algorithm to test classification accuracy, and obtained accurate tree height and volume measures through a field survey. Results: Most of the trees planted in Mureung Park were Pinus koraiensis, followed by Pinus densiflora and Zelkova serrata, and most of the shrubs were Rhododendron yedoense, followed by Buxus microphylla and Spiraea prunifolia. The average tree height measured on site was 7.8 m, and the average height estimated by the model was 7.5 m, a difference of about 0.3 m. A t-test found no significant difference between the height values of the field survey and the model. The green coverage and volume of the study site estimated using the UAV were 5,019 m2 and 14,897 m3, respectively, while the field survey measured 6,339 m2 and 17,167 m3; the green coverage differed by about 21% and the volume by about 13%. Conclusion: The UAV equipped with RTK (Real-Time Kinematic) and GNSS (Global Navigation Satellite System) modules used in this study could collect tree height, green coverage, and volume information with relatively high accuracy within a short period of time. This could serve as an alternative that overcomes the time and cost limitations of previous field surveys, using remote sensing techniques.
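The DHM step is a cell-by-cell subtraction: taking the terrain model (DTM) away from the surface model (DSM) leaves canopy height above ground. The two 2x2 elevation grids below (in meters) are invented, not the study's data.

```python
# Sketch: digital tree height model (DHM) = DSM - DTM, cell by cell.

dsm = [[108.2, 109.0],
       [105.5, 104.1]]  # top-of-canopy elevations from UAV photogrammetry
dtm = [[100.0, 101.2],
       [100.3, 104.1]]  # bare-earth elevations

# Round to the grid's 0.1 m precision to avoid float noise.
dhm = [[round(s - t, 1) for s, t in zip(srow, trow)]
       for srow, trow in zip(dsm, dtm)]
print(dhm)  # [[8.2, 7.8], [5.2, 0.0]] -- the 0.0 cell is bare ground, no tree
```

Summing canopy-height cells times cell area is then the natural route to the coverage and volume figures the study compares against its field survey.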

Development of Type 2 Diabetes Prediction Algorithm Based on Big Data (빅데이터 기반 2형 당뇨 예측 알고리즘 개발)

  • Hyun Sim;HyunWook Kim
    • The Journal of the Korea institute of electronic communication sciences / v.18 no.5 / pp.999-1008 / 2023
  • Early prediction of chronic diseases such as diabetes is an important issue, and improving the accuracy of diabetes prediction is especially important. Various machine learning and deep learning methodologies are being introduced for diabetes prediction, but these techniques require large amounts of data to outperform other methodologies, and the learning cost is high due to complex data models. In this study, we aim to verify the claim that a DNN using the Pima dataset and k-fold cross-validation reduces the efficiency of diabetes diagnosis models. Machine learning classification methods such as decision trees, SVM, random forests, logistic regression, KNN, and various ensemble techniques were used to determine which algorithm produces the best prediction results. After training and testing all classification models, the proposed system gave the best results with an XGBoost classifier combined with the ADASYN method: an accuracy of 81%, an F1 score of 0.81, and an AUC of 0.84. Additionally, a domain adaptation method was implemented to demonstrate the versatility of the proposed system. An explainable AI approach using the LIME and SHAP frameworks was implemented to understand how the model arrives at its final prediction.
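The oversampling step used before training can be sketched in simplified form: generate synthetic minority samples by interpolating between pairs of minority points. Real ADASYN additionally weights generation toward minority points surrounded by the majority class; this toy version picks pairs uniformly, and the diabetic-patient feature rows (glucose, BMI) are invented.

```python
# Simplified ADASYN/SMOTE-style oversampling sketch: new minority
# samples are convex combinations of existing minority pairs.

import random

def oversample(minority, n_new, seed=0):
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        a, b = rng.sample(minority, 2)
        lam = rng.random()  # interpolation weight in [0, 1)
        synthetic.append([ai + lam * (bi - ai) for ai, bi in zip(a, b)])
    return synthetic

diabetic = [[148.0, 33.6], [183.0, 23.3], [137.0, 43.1]]  # glucose, BMI
new_rows = oversample(diabetic, n_new=4)
print(len(diabetic) + len(new_rows))  # 7 minority samples after balancing
```

Every synthetic row lies between two real minority rows, so the classifier sees a denser minority region without any fabricated out-of-range values.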