• Title/Summary/Keyword: TREE FEATURE

A Framework for Semantic Interpretation of Noun Compounds Using Tratz Model and Binary Features

  • Zaeri, Ahmad;Nematbakhsh, Mohammad Ali
    • ETRI Journal / v.34 no.5 / pp.743-752 / 2012
  • Semantic interpretation of the relationship between noun compound (NC) elements has been a challenging issue due to the lack of contextual information, the unbounded number of combinations, and the absence of a universally accepted categorization system. Current models require a huge corpus to extract contextual information, which limits their use in many situations. In this paper, a new semantic relation interpreter for NCs based on lightweight binary features, some of them novel, is proposed. In addition, the interpreter uses a new feature selection method. With these new features and techniques, the proposed method removes the need for huge corpora. Implemented in a modular, plugin-based framework and trained on the largest and most recent fine-grained data set, the method achieves higher accuracy than previously reported methods that rely on large corpora. This gain in accuracy, together with superior efficiency, comes not only from improving existing features with techniques such as semantic scattering and sense collocation, but also from various novel features and a maximum entropy classifier. The maximum entropy classifier is also shown to be more accurate than other classifiers, such as a support vector machine, Naïve Bayes, and a decision tree.
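
A maximum entropy classifier is the same model as multinomial logistic regression, so the combination described above (binary features fed to a max-ent classifier) can be sketched as below, trained by plain stochastic gradient ascent on the log-likelihood. This is not the authors' feature set: the toy lexicon, the feature templates (`mod=<class>`, `head=<class>`, `pair=<class>|<class>`), and the relation labels `MATERIAL` and `PURPOSE` are hypothetical stand-ins for WordNet-style semantic classes and the Tratz relation inventory.

```python
import math
from collections import defaultdict

# Hypothetical toy lexicon: word -> coarse semantic class (stand-in for WordNet senses).
LEXICON = {
    "cotton": "material", "steel": "material", "silk": "material",
    "cancer": "disease", "headache": "disease", "flu": "disease",
    "shirt": "artifact", "bridge": "artifact", "tie": "artifact",
    "drug": "artifact", "pill": "artifact", "vaccine": "artifact",
}


def binary_features(modifier, head):
    """Return the set of (hypothetical) binary features that fire for a compound."""
    m = LEXICON.get(modifier, "unknown")
    h = LEXICON.get(head, "unknown")
    return {f"mod={m}", f"head={h}", f"pair={m}|{h}"}


class MaxEnt:
    """Multinomial logistic regression (maximum entropy) over binary features."""

    def __init__(self, labels):
        self.labels = list(labels)
        self.w = defaultdict(float)  # (label, feature) -> weight

    def probs(self, feats):
        scores = [sum(self.w[(y, f)] for f in feats) for y in self.labels]
        top = max(scores)  # subtract the max score for numerical stability
        exps = [math.exp(s - top) for s in scores]
        z = sum(exps)
        return [e / z for e in exps]

    def fit(self, data, epochs=50, lr=0.5):
        for _ in range(epochs):
            for feats, gold in data:
                p = self.probs(feats)
                # Gradient of the log-likelihood: observed minus expected feature counts.
                for y, py in zip(self.labels, p):
                    grad = (1.0 if y == gold else 0.0) - py
                    for f in feats:
                        self.w[(y, f)] += lr * grad
        return self

    def predict(self, feats):
        p = self.probs(feats)
        return self.labels[p.index(max(p))]


# Hypothetical training compounds with hypothetical relation labels.
train = [
    (binary_features("cotton", "shirt"), "MATERIAL"),
    (binary_features("steel", "bridge"), "MATERIAL"),
    (binary_features("cancer", "drug"), "PURPOSE"),
    (binary_features("headache", "pill"), "PURPOSE"),
]
model = MaxEnt(["MATERIAL", "PURPOSE"]).fit(train)
print(model.predict(binary_features("silk", "tie")))     # MATERIAL
print(model.predict(binary_features("flu", "vaccine")))  # PURPOSE
```

Because every feature is binary, an example's score for a label is just the sum of the weights of the features that fire, which keeps both training and prediction cheap and needs no corpus statistics at prediction time.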

A Basis of Database Semantics: from Feature Structures to Tables (데이터베이스 의미론의 기초: 자질 구조에서 테이블로)

  • Lee, Ki-Yong
    • Annual Conference on Human and Language Technology / 1999.10e / pp.297-303 / 1999
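  • Today, large amounts of diverse linguistic information are exchanged in everyday language over computer networks, so language information processing systems that can handle such information efficiently are needed. Hausser (1999) and Lee Ki-Yong (1999) proposed database semantics as such a system. A characteristic of this semantics is that it uses commercial database management systems to build information processing systems for natural language. One problem that arises here is representation, because the representation methods of linguistics differ from those of database management systems. In particular, relational database management systems (RDBMS) represent all kinds of information in the form of tables. The main focus of this paper is therefore to find ways of replacing the representations commonly used in linguistics, namely trees representing the syntactic structure of sentences, logical forms representing semantic structure, and feature structures representing the properties of words or phrases, with tables. Moreover, because an RDBMS supports various operations on tables, in particular linking two tables, through which information can be integrated or filtered, it is easy to represent related information in a single table, or to store data in a distributed manner while preserving its integrity. The purpose of this paper is to explain these points with examples that are as simple as possible.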
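
To make the idea concrete, the sketch below flattens nested feature structures into rows of one relational table and then links the table to itself with a join, pairing nouns and verbs whose agreement features match. This is the kind of table operation the abstract relies on to integrate or filter information. The schema `feature(word, path, value)`, the dotted attribute paths, and the toy English lexicon are assumptions made for illustration, not the representation used in the paper; it uses Python's built-in `sqlite3` module as the RDBMS.

```python
import sqlite3


def flatten(fs, prefix=""):
    """Turn a nested feature structure (a dict) into (path, value) pairs."""
    rows = []
    for attr, val in fs.items():
        path = prefix + attr
        if isinstance(val, dict):
            rows.extend(flatten(val, path + "."))  # descend into a sub-structure
        else:
            rows.append((path, str(val)))
    return rows


# Hypothetical lexicon: each entry is a feature structure [CAT, AGR [NUM, PER]].
FEATURE_LEXICON = {
    "dog": {"CAT": "N", "AGR": {"NUM": "sg", "PER": 3}},
    "dogs": {"CAT": "N", "AGR": {"NUM": "pl", "PER": 3}},
    "barks": {"CAT": "V", "AGR": {"NUM": "sg", "PER": 3}},
    "bark": {"CAT": "V", "AGR": {"NUM": "pl", "PER": 3}},
}


def build_db(lexicon):
    """Store every feature structure as (word, path, value) rows in one table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE feature (word TEXT, path TEXT, value TEXT)")
    for word, fs in lexicon.items():
        conn.executemany(
            "INSERT INTO feature VALUES (?, ?, ?)",
            [(word, path, value) for path, value in flatten(fs)],
        )
    return conn


# Link the table to itself: pair nouns and verbs that agree in number.
AGREEMENT = """
SELECT n.word, v.word
FROM feature AS n
JOIN feature AS v ON n.path = v.path AND n.value = v.value
WHERE n.path = 'AGR.NUM'
  AND n.word IN (SELECT word FROM feature WHERE path = 'CAT' AND value = 'N')
  AND v.word IN (SELECT word FROM feature WHERE path = 'CAT' AND value = 'V')
ORDER BY n.word
"""

conn = build_db(FEATURE_LEXICON)
print(conn.execute(AGREEMENT).fetchall())  # [('dog', 'barks'), ('dogs', 'bark')]
```

Because each attribute path becomes a row, one table can hold feature structures of any depth, and related information is recovered with joins instead of being stored redundantly.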

Feature Analysis on Industrial Accidents of Manufacturing Businesses Using QUEST Algorithm

  • Leem, Young-Moon;Rogers, K.J.;Hwang, Young-Seob
    • International Journal of Safety / v.5 no.1 / pp.37-41 / 2006
  • The major objective of statistical analysis of industrial accidents is to determine safety factors, so that future accidents can be prevented or reduced by educating the workers in a given industrial field in safety management. So far, however, there is no quantitative method for evaluating the danger related to industrial accidents. Therefore, as a step toward a quantitative evaluation technique, this study presents a feature analysis of industrial accidents in the manufacturing sector using the QUEST algorithm. A retrospective analysis was performed on 10,536 subjects (10,313 injured people and 223 deaths), drawn from data on manufacturing businesses in Korea over a three-year period (2002–2004). The analysis, carried out with SPSS AnswerTree, identified the most important variables affecting the injured, such as the occurrence type, the company size, and the time of occurrence. The classification system built with the QUEST algorithm was also found to be quite reliable.
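
QUEST (Quick, Unbiased, Efficient Statistical Tree) chooses the variable to split on at each node with a significance test rather than an exhaustive search over all splits; for categorical predictors that test is a Pearson chi-square test of independence with the class, and the predictor with the smallest p-value wins. The sketch below covers only that selection step for categorical predictors. It omits the ANOVA F-test QUEST applies to ordered predictors and the step that chooses the split point, and the record fields (`occurrence_type`, `company_size`, `outcome`) and the eight records are hypothetical, not the study's data.

```python
import math
from collections import Counter


def chi2_sf(x, df):
    """P(X > x) for a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        q, k = math.exp(-x / 2), 2
    else:
        q, k = math.erfc(math.sqrt(x / 2)), 1
    # Recurrence: Q(x; k + 2) = Q(x; k) + (x/2)^(k/2) * exp(-x/2) / Gamma(k/2 + 1)
    while k < df:
        q += math.exp((k / 2) * math.log(x / 2) - x / 2 - math.lgamma(k / 2 + 1))
        k += 2
    return q


def pearson_chi2(xs, ys):
    """Pearson chi-square statistic and degrees of freedom for two categorical lists."""
    n = len(xs)
    joint, rows, cols = Counter(zip(xs, ys)), Counter(xs), Counter(ys)
    stat = sum(
        (joint[(a, b)] - rows[a] * cols[b] / n) ** 2 / (rows[a] * cols[b] / n)
        for a in rows
        for b in cols
    )
    return stat, (len(rows) - 1) * (len(cols) - 1)


def select_split_variable(records, target, predictors):
    """QUEST-style choice: the predictor most significantly associated with the class."""
    ys = [r[target] for r in records]
    best_name, best_p = None, 2.0
    for name in predictors:
        stat, df = pearson_chi2([r[name] for r in records], ys)
        p = chi2_sf(stat, df) if df > 0 else 1.0
        if p < best_p:
            best_name, best_p = name, p
    return best_name, best_p


# Hypothetical toy records (not the study's data).
RECORDS = [
    {"occurrence_type": "fall", "company_size": "small", "outcome": "death"},
    {"occurrence_type": "fall", "company_size": "large", "outcome": "death"},
    {"occurrence_type": "fall", "company_size": "small", "outcome": "death"},
    {"occurrence_type": "fall", "company_size": "large", "outcome": "injury"},
    {"occurrence_type": "caught", "company_size": "small", "outcome": "injury"},
    {"occurrence_type": "caught", "company_size": "large", "outcome": "injury"},
    {"occurrence_type": "caught", "company_size": "small", "outcome": "injury"},
    {"occurrence_type": "caught", "company_size": "large", "outcome": "injury"},
]
print(select_split_variable(RECORDS, "outcome", ["occurrence_type", "company_size"])[0])
# occurrence_type
```

Comparing p-values rather than raw chi-square statistics is what keeps the selection from favoring predictors that simply have more categories.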

Comparing the Performance of 17 Machine Learning Models in Predicting Human Population Growth of Countries

  • Otoom, Mohammad Mahmood
    • International Journal of Computer Science & Network Security / v.21 no.1 / pp.220-225 / 2021
  • The human population growth rate is an important parameter for real-world planning. Common approaches rely on fixed parameters, such as population, mortality rate, and fertility rate, that are collected historically to determine a region's population growth rate. The literature offers no solution for areas without such historical knowledge. In these areas, machine learning can solve the problem, but the multitude of machine learning algorithms makes it difficult to determine the best approach. Missing features are also a common real-world problem. Thus, it is essential to compare machine learning techniques and select those that perform best and are most robust when features are missing. This study compares the performance of 17 machine learning techniques (base learners and ensemble learners) in predicting countries' human population growth rates. Among them, random forest outperformed all other techniques both in predictive performance and in robustness to missing features. The study thus demonstrates and compares machine learning techniques for predicting the human population growth rate in settings where historical data and feature information are not available, and it identifies the best algorithm for this prediction task.

Analysis of Feature Importance of Ship's Berthing Velocity Using Classification Algorithms of Machine Learning (머신러닝 분류 알고리즘을 활용한 선박 접안속도 영향요소의 중요도 분석)

  • Lee, Hyeong-Tak;Lee, Sang-Won;Cho, Jang-Won;Cho, Ik-Soon
    • Journal of the Korean Society of Marine Environment & Safety / v.26 no.2 / pp.139-148 / 2020
  • The most important factor affecting the berthing energy generated when a ship berths is the berthing velocity, so an accident may occur if the berthing velocity is extremely high. Several ship features influence the berthing velocity; however, previous studies have mostly focused on vessel size. Therefore, this study analyzes various features that influence berthing velocity and determines their relative importance. The analysis used berthing velocity data for ships at a jetty in Korea. Using these data, machine learning classification algorithms (decision tree, random forest, logistic regression, and perceptron) were compared, with metrics derived from the confusion matrix used for evaluation. The perceptron performed best, and the features ranked in importance as follows: DWT, jetty number, and state. Hence, when a ship berths, the berthing velocity should be determined with various features in mind, such as the size of the ship, the position of the jetty, and the loading condition of the cargo.
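
The abstract evaluates the classifiers with indexes derived from the confusion matrix but does not say which ones. The sketch below computes the common ones (accuracy, plus per-class precision, recall, and F1 score) in plain Python; the two velocity classes `low` and `high` and the predictions are hypothetical, not the study's data.

```python
def confusion_matrix(y_true, y_pred, labels):
    """Rows are true classes, columns are predicted classes."""
    index = {label: i for i, label in enumerate(labels)}
    m = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        m[index[t]][index[p]] += 1
    return m


def accuracy(m):
    total = sum(sum(row) for row in m)
    return sum(m[i][i] for i in range(len(m))) / total


def class_metrics(m, k):
    """Precision, recall, and F1 score for class index k."""
    tp = m[k][k]
    fp = sum(row[k] for row in m) - tp  # predicted k, actually another class
    fn = sum(m[k]) - tp                 # actually k, predicted another class
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


LABELS = ["low", "high"]  # hypothetical berthing-velocity classes
y_true = ["low", "low", "high", "high", "low"]
y_pred = ["low", "high", "high", "high", "low"]
m = confusion_matrix(y_true, y_pred, LABELS)
print(m)            # [[2, 1], [0, 2]]
print(accuracy(m))  # 0.8
```

For a safety question like excessive berthing velocity, recall on the high-velocity class is usually the number to watch, since a missed fast approach costs more than a false alarm.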

Combined Image Retrieval System using Clustering and Condensation Method (클러스터링과 차원축약 기법을 통합한 영상 검색 시스템)

  • Lee Se-Han;Cho Jungwon;Choi Byung-Uk
    • Journal of the Institute of Electronics Engineers of Korea CI / v.43 no.1 s.307 / pp.53-66 / 2006
  • This paper proposes a combined image retrieval system that returns the same results as exhaustive search while considerably improving performance. The system combines two different retrieval methods, each of which gives the same results as full exhaustive search. Both are two-stage methods: one uses condensation of feature vectors, and the other uses binary-tree clustering. In the first stage, each method extracts candidate images that always include the correct answers; in the second stage, it filters out the incorrect images. Because the second stage applies the same similarity computation as exhaustive search to a candidate set that always contains the correct answers, these methods obtain the same results as full exhaustive search. The first method reduces the dimension of the feature vectors and uses these condensed vectors to compute the similarity between the query and the images in the database. There is an optimal condensation ratio that minimizes the overall retrieval time, and this ratio is applied in the first stage of the method. The binary-tree clustering method, which searches with recursive 2-means clustering, dynamically divides the data into clusters of the same radius. To preserve relevance, the query range must be compensated in the first stage. After the candidate clusters are selected, the final results are retrieved by recomputing similarities in the second stage. The proposed system combines these two methods; because they are independent of each other, the combined retrieval system achieves a remarkable improvement in performance.
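
Both methods are exact because their first stage can only add false positives, never drop a true answer. The sketch below shows one way to get that guarantee for a range query; the condensation rule is an assumption, since the abstract does not give it: each run of `block` dimensions is summed and scaled by 1/sqrt(block), which by the Cauchy-Schwarz inequality makes the condensed distance a lower bound on the full Euclidean distance. The cluster check illustrates the compensated query range with the triangle inequality, assuming each cluster is summarized by a center and a radius. Function names are hypothetical; `math.dist` needs Python 3.8 or later.

```python
import math


def condense(vec, block):
    """Sum each run of `block` dimensions and scale by 1/sqrt(block).

    By Cauchy-Schwarz, the Euclidean distance between two condensed vectors
    never exceeds the distance between the original vectors.
    """
    scale = 1.0 / math.sqrt(block)
    return [sum(vec[i:i + block]) * scale for i in range(0, len(vec), block)]


def two_stage_range_search(query, database, radius, block):
    """Ids of all vectors within `radius` of `query`; same answer as a full scan."""
    q_small = condense(query, block)
    condensed_db = [condense(v, block) for v in database]  # normally precomputed offline
    # Stage 1: cheap filter on condensed vectors (the 1e-9 guards float rounding).
    candidates = [i for i, c in enumerate(condensed_db)
                  if math.dist(q_small, c) <= radius + 1e-9]
    # Stage 2: exact distances on the survivors remove the false positives.
    return [i for i in candidates if math.dist(query, database[i]) <= radius]


def clusters_to_visit(query, clusters, radius):
    """A cluster (center, cluster_radius) can hold an answer only if
    dist(query, center) <= radius + cluster_radius (triangle inequality)."""
    return [k for k, (center, c_radius) in enumerate(clusters)
            if math.dist(query, center) <= radius + c_radius]


db = [[1, 2, 3, 4], [4, 3, 2, 1], [1, 2, 3, 5], [10, 10, 10, 10]]
print(two_stage_range_search([1, 2, 3, 4], db, radius=1.5, block=2))  # [0, 2]
print(clusters_to_visit([1, 2, 3, 4], [([1, 2, 3, 4.5], 0.5), ([10, 10, 10, 10], 0.0)], 1.5))  # [0]
```

The trade-off behind the optimal condensation ratio shows up directly here: larger blocks make stage 1 cheaper but looser, so more candidates reach the exact stage 2 computation.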

A Study on the Prediction Models of Used Car Prices Using Ensemble Model And SHAP Value: Focus on Feature of the Vehicle Type (앙상블 모델과 SHAP Value를 활용한 국내 중고차 가격 예측 모델에 관한 연구: 차종 특성을 중심으로)

  • Seungjun Yim;Joungho Lee;Choonho Ryu
    • Journal of Service Research and Studies / v.14 no.1 / pp.27-43 / 2024
  • The market share of online platform services in the used car market continues to expand. These platforms provide users with vehicle specifications, accident history, inspection details, detailed options, and prices of used cars. SUVs' share of the domestic automobile market is projected to exceed 50% in 2023, and hybrid vehicle sales have doubled from the previous year; both vehicle types are also gaining popularity in the used car market. Prior research has proposed used car price prediction models that apply machine learning to all vehicles or to vehicles grouped by brand. However, although SUVs and hybrids keep rising in popularity in the domestic market, it was difficult to find a study proposing a price prediction model for these vehicle types. This study selects a used car price prediction model for each vehicle type, using vehicle specifications and options for sedans, SUVs, and hybrid vehicles produced by domestic brands. Features were first selected with a Lasso regression model, and ensemble models were then run sequentially on the same sample to select the best model for each vehicle type. The CBR model was selected as the best model for every vehicle type, and the contribution and direction of each feature were confirmed by visualizing Tree SHAP values for each type's best model. This study is expected to offer sellers on online platform services a price prediction model by vehicle type, clarify the contribution and direction of features, and help resolve problems caused by information asymmetry between sellers and platform users.
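
Lasso works as a feature selector because its L1 penalty drives some coefficients exactly to zero. The sketch below implements cyclic coordinate descent with soft-thresholding for the objective (1/(2n))·‖y − Xw‖² + α·‖w‖₁ and keeps the features whose coefficients survive. It assumes standardized, mean-centered features and a centered target, so no intercept is fitted; the feature names and data are hypothetical, and the paper's actual Lasso settings are not given in the abstract.

```python
def soft_threshold(rho, lam):
    """Shrink rho toward zero by lam; values inside [-lam, lam] become exactly 0."""
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0


def lasso_coordinate_descent(X, y, alpha, n_iter=100):
    """Minimize (1/(2n))*||y - Xw||^2 + alpha*||w||_1, one coordinate at a time."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            # Correlation of feature j with the residual that ignores feature j.
            rho = sum(
                X[i][j] * (y[i] - sum(X[i][k] * w[k] for k in range(p) if k != j))
                for i in range(n)
            ) / n
            z = sum(X[i][j] ** 2 for i in range(n)) / n
            w[j] = soft_threshold(rho, alpha) / z if z else 0.0
    return w


def selected_features(names, w):
    """Keep the features whose Lasso coefficient is non-zero."""
    return [name for name, wj in zip(names, w) if wj != 0.0]


# Hypothetical centered data: the target depends only on the first feature.
X = [[1, 1], [-1, 1], [1, -1], [-1, -1]]
y = [2, -2, 2, -2]
w = lasso_coordinate_descent(X, y, alpha=0.1)
print(selected_features(["mileage", "color_code"], w))  # ['mileage']
```

In the pipeline the abstract describes, only the surviving features would then be passed to the ensemble models.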

Investigating the Performance of Bayesian-based Feature Selection and Classification Approach to Social Media Sentiment Analysis (소셜미디어 감성분석을 위한 베이지안 속성 선택과 분류에 대한 연구)

  • Chang Min Kang;Kyun Sun Eo;Kun Chang Lee
    • Information Systems Review / v.24 no.1 / pp.1-19 / 2022
  • Social media-based communication has become a crucial part of our personal and official lives, so it is no surprise that social media sentiment analysis has emerged as an important way for all kinds of companies to detect sentiment trends among potential customers. However, social media sentiment analysis suffers from the huge number of sentiment features obtained in the process. To address this, this study proposes a novel method based on a Bayesian network, in which MBFS (Markov Blanket-based Feature Selection) is used to reduce the number of sentiment features. To validate the proposed model, we used online review data from Yelp, a well-known social media platform for evaluating and recommending restaurants, bars, and beauty salons. Benchmark feature selection methods included correlation-based feature selection, information gain, and gain ratio, and several machine learning classifiers were used for validation, including TAN, NBN, Sons & Spouses BN (Bayesian Network), and Augmented Markov Blanket. Furthermore, we conducted a Bayesian network-based what-if analysis to see how the knowledge map between the target node and related explanatory nodes could yield meaningful insight into the sentiments underlying the target dataset.
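
In a Bayesian network, the Markov blanket of the target node (its parents, its children, and its children's other parents) makes the target conditionally independent of every other node, which is why the blanket can serve as a reduced feature set. The sketch below assumes the network structure is already known and only reads the blanket off it; MBFS as used in the study also has to learn that structure from the review data, which is not shown. All node names are hypothetical.

```python
def markov_blanket(parents, target):
    """Markov blanket of `target` in a DAG given as {node: set of parent nodes}:
    its parents, its children, and the other parents of its children (spouses)."""
    children = {node for node, ps in parents.items() if target in ps}
    spouses = {p for child in children for p in parents[child]} - {target}
    return set(parents.get(target, set())) | children | spouses


# Hypothetical structure: node names are illustrative only.
PARENTS = {
    "Price": set(),
    "Service": set(),
    "Sentiment": {"Price"},
    "great": {"Sentiment"},
    "awful": {"Sentiment", "Service"},
    "menu": {"Price"},
}
print(sorted(markov_blanket(PARENTS, "Sentiment")))
# ['Price', 'Service', 'awful', 'great']
```

Here `menu` is dropped: it is connected to the target only through `Price`, which is already in the blanket, so it adds no information about `Sentiment`.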

Disparity Estimation Algorithm using Variable Blocks and Search Ranges (가변블록 및 가변 탐색구간을 이용한 시차추정 알고리즘)

  • Koh Je hyun;Song Hyok;Yoo Ji sang
    • The Journal of Korean Institute of Communications and Information Sciences / v.30 no.4C / pp.253-261 / 2005
  • In this paper, we propose an efficient block-based disparity estimation algorithm for multiple-view image coding in EE2 and EE3 of 3DAV. The proposed method emphasizes visual quality improvement to satisfy the requirements for multiple-view generation. We therefore perform adaptive disparity estimation that constructs variable blocks according to the given image features. The search range of each block is set by examining the features around it, and, because stereo images can have large disparities, binary-tree coding is applied together with quad-tree coding, which reduces complexity and side information compared with quad-tree coding alone. Experimental results show that the proposed method improves PSNR by about 1 to 2 dB over other existing methods and reduces computational complexity by up to 68% compared with FBMA.
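
FBMA (full-search block matching), the baseline in the abstract, evaluates every candidate disparity in a fixed search range. The sketch below shows that baseline with a sum-of-absolute-differences (SAD) cost, but takes the search range as per-block arguments, so a narrower range chosen from neighboring blocks (the idea behind the proposed method) directly reduces the number of cost evaluations. It assumes rectified grayscale images stored as lists of rows and fixed square blocks; the paper's variable block sizes from binary-tree and quad-tree splitting, and its rule for setting the range, are not reproduced.

```python
def sad(left, right, y, x, d, size):
    """Sum of absolute differences between the left-image block at (y, x)
    and the right-image block shifted d pixels to the left."""
    return sum(
        abs(left[y + i][x + j] - right[y + i][x + j - d])
        for i in range(size)
        for j in range(size)
    )


def block_disparity(left, right, y, x, size, d_min, d_max):
    """Search disparities d_min..d_max; return (best disparity, SAD evaluations)."""
    best_d, best_cost, evaluations = None, float("inf"), 0
    for d in range(d_min, d_max + 1):
        if x - d < 0:  # the shifted block would leave the right image
            break
        cost = sad(left, right, y, x, d, size)
        evaluations += 1
        if cost < best_cost:
            best_d, best_cost = d, cost
    return best_d, evaluations


# Hypothetical 2-row images: the left row is the right row shifted by 2 pixels.
RIGHT = [[0, 0, 0, 5, 9, 5, 0, 0, 0, 0]] * 2
LEFT = [[0, 0, 0, 0, 0, 5, 9, 5, 0, 0]] * 2
print(block_disparity(LEFT, RIGHT, 0, 5, 2, 0, 4))  # (2, 5)  full range
print(block_disparity(LEFT, RIGHT, 0, 5, 2, 1, 3))  # (2, 3)  narrowed range
```

Both searches find the same disparity, but the narrowed range needs three SAD evaluations instead of five; savings of this kind are what a neighbor-informed search range trades on.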

A study on object recognition using morphological shape decomposition

  • Ahn, Chang-Sun;Eum, Kyoung-Bae
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 1999.05a / pp.185-191 / 1999
  • Mathematical morphology, based on set theory, has been applied to various areas of image processing. Pitas proposed an object recognition algorithm using Morphological Shape Decomposition (MSD) and a new representation scheme called Morphological Shape Representation (MSR). Pitas's algorithm is a simple and adequate approach for recognizing objects rotated in 45-degree increments with respect to the model object. However, this recognition scheme fails for arbitrary rotations. This disadvantage may be compensated for by using smaller angle increments, but doing so may greatly increase computational complexity, because smaller steps require more rotations. In this paper, we propose a new MSD-based method for object recognition. Our method first decomposes a binary shape into a union of simple binary shapes and then constructs a new tree structure that can represent the relations among the binary shapes in an object. Finally, we extract features invariant to rotation, translation, and scaling from the tree and calculate matching scores using an efficient matching measure. Because our method does not need to rotate the object under test, it can be more efficient than Pitas's. MSR has an intricate structure, which can make matching scores difficult to calculate even for slightly complex objects, whereas our tree has a simpler structure and its matching score is easier to calculate. We experimented with 20 test images: scaled, rotated, and translated versions of five kinds of automobile images. The simulation using octagonal structuring elements shows a 95% correct recognition rate. Experimental results using approximated circular structuring elements are also examined, and the effect of noise on the MSR scheme is considered.
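
The decomposition step can be sketched with set-based binary morphology: repeatedly find the largest r for which the remaining shape survives erosion by rB, keep the opening (erosion followed by dilation by rB) as one simple component of size r, and subtract it. This follows the general MSD idea only. The 3×3 square structuring element B, which makes rB a (2r+1)×(2r+1) square, is an assumption chosen for simplicity, whereas the abstract reports results with octagonal and approximated circular elements; the tree construction and matching measure are not shown.

```python
def erode(shape, r):
    """Erode a pixel set by rB, with B the 3x3 square: a (2r+1)x(2r+1) square."""
    return {
        (y, x) for (y, x) in shape
        if all((y + dy, x + dx) in shape
               for dy in range(-r, r + 1) for dx in range(-r, r + 1))
    }


def dilate(points, r):
    """Dilate a pixel set by the same (2r+1)x(2r+1) square."""
    return {
        (y + dy, x + dx)
        for (y, x) in points
        for dy in range(-r, r + 1)
        for dx in range(-r, r + 1)
    }


def msd(shape):
    """Split a binary shape into (radius, component) pairs in decreasing radius."""
    remaining = set(shape)
    parts = []
    while remaining:
        r = 0
        while erode(remaining, r + 1):  # grow r while the shape survives erosion
            r += 1
        component = dilate(erode(remaining, r), r)  # opening: a subset of remaining
        remaining -= component
        parts.append((r, component))
    return parts


# A 5x5 square with a one-pixel tail on its right edge.
SHAPE = {(y, x) for y in range(5) for x in range(5)} | {(2, 5)}
print([(r, len(c)) for r, c in msd(SHAPE)])  # [(2, 25), (0, 1)]
```

Each component is fully described by its radius and the eroded points it grew from, which is the compact information a tree over the components can store and compare.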
