• Title/Summary/Keyword: semantic features

Feature Generation of Dictionary for Named-Entity Recognition based on Machine Learning (기계학습 기반 개체명 인식을 위한 사전 자질 생성)

  • Kim, Jae-Hoon;Kim, Hyung-Chul;Choi, Yun-Soo
    • Journal of Information Management / v.41 no.2 / pp.31-46 / 2010
  • Named-entity recognition (NER), a part of information extraction, is now used in information retrieval as well as in question-answering systems. Unlike ordinary words, named entities (NEs) are continually created and changed in Web documents, newspapers, and other sources. This creates an unknown-word problem and makes it difficult to build applications that rely on NER. To alleviate the problem, this paper proposes a new feature generation method for machine learning-based NER. Features in machine learning-based NER are generally tied to words, whereas entries in named-entity dictionaries are phrases, so the entries cannot be used directly as features of an NER system. This paper proposes an encoding scheme, used as a feature generation method, that converts phrase entities into word-level features. Furthermore, with this scheme, entities carrying semantic information in WordNet can also be converted into features of the NER system. Our experiments show that the F1 score increases by about 6% and that errors are reduced by about 38%.
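
The abstract does not spell out the encoding scheme, so the following is only a minimal sketch of one common way to turn phrase-level dictionary entries into word-level features: longest-match lookup with B-/I-/O tags. The function name `dictionary_features`, the `gazetteer` layout, and the sample entries are assumptions made for illustration, not the authors' method.

```python
from typing import Dict, List, Tuple


def dictionary_features(tokens: List[str],
                        gazetteer: Dict[Tuple[str, ...], str]) -> List[str]:
    """Give every token a B-/I-/O feature from the longest matching dictionary phrase."""
    features = ["O"] * len(tokens)
    max_len = max((len(phrase) for phrase in gazetteer), default=0)
    i = 0
    while i < len(tokens):
        matched = False
        # Try the longest candidate phrase first so "New York Times" beats "New York".
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            phrase = tuple(t.lower() for t in tokens[i:i + n])
            label = gazetteer.get(phrase)
            if label is not None:
                features[i] = "B-" + label
                for j in range(i + 1, i + n):
                    features[j] = "I-" + label
                i += n
                matched = True
                break
        if not matched:
            i += 1
    return features


# Hypothetical dictionary entries (phrases) and a tokenized sentence (words).
gazetteer = {
    ("new", "york"): "LOC",
    ("new", "york", "times"): "ORG",
    ("seoul",): "LOC",
}
tokens = ["The", "New", "York", "Times", "opened", "an", "office", "in", "Seoul"]
print(dictionary_features(tokens, gazetteer))
```

Each token now carries one feature value, so a word-based learner can consume dictionary knowledge that was originally stored per phrase.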

An Experimental Study on Feature Selection Using Wikipedia for Text Categorization (위키피디아를 이용한 분류자질 선정에 관한 연구)

  • Kim, Yong-Hwan;Chung, Young-Mee
    • Journal of the Korean Society for Information Management / v.29 no.2 / pp.155-171 / 2012
  • In text categorization, core terms of an input document can hardly be selected as classification features if they do not occur in the training document set. In addition, synonymous terms expressing the same concept are usually treated as different features. This study aims to improve text categorization performance by merging synonyms into a single feature and by using Wikipedia to replace input terms that are absent from the training document set with the most similar term that occurs in the training documents. For the selection of classification features, experiments were performed in various settings combining three conditions: the use of category information of non-training terms, the part of Wikipedia used for measuring term-term similarity, and the type of similarity measure. The categorization performance of a kNN classifier improved by 0.35-1.85% in $F_1$ value in all experimental settings when non-training terms were replaced by the training term with the highest similarity above the threshold value. Although the improvement is not as large as expected, several semantic as well as structural devices of Wikipedia could be used for selecting more effective classification features.
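
The replacement step described above can be sketched as follows. This is an illustration under stated assumptions: each term is represented by a hypothetical sparse vector (for example, weights over Wikipedia categories), similarity is plain cosine, and the names `map_term`, `map_document`, and the threshold of 0.5 are invented for the example. The study itself compares several similarity measures and parts of Wikipedia.

```python
import math
from typing import Dict, List, Optional, Set

Vector = Dict[str, float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * b.get(k, 0.0) for k, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def map_term(term: str, training_terms: Set[str],
             vectors: Dict[str, Vector], threshold: float) -> Optional[str]:
    """Keep a training term; otherwise return the most similar training term above the threshold."""
    if term in training_terms:
        return term
    if term not in vectors:
        return None
    best, best_sim = None, threshold
    for candidate in sorted(training_terms):
        sim = cosine(vectors[term], vectors.get(candidate, {}))
        if sim > best_sim:
            best, best_sim = candidate, sim
    return best


def map_document(terms: List[str], training_terms: Set[str],
                 vectors: Dict[str, Vector], threshold: float) -> List[str]:
    """Replace non-training terms and drop those with no sufficiently similar training term."""
    mapped = (map_term(t, training_terms, vectors, threshold) for t in terms)
    return [t for t in mapped if t is not None]


# Hypothetical term vectors; keys stand in for Wikipedia categories.
vectors = {
    "automobile": {"Vehicle": 1.0, "Transport": 1.0},
    "car": {"Vehicle": 1.0, "Transport": 1.0, "Road": 1.0},
    "banana": {"Fruit": 1.0},
    "quark": {"Physics": 1.0},
}
training_terms = {"car", "banana"}
print(map_document(["automobile", "banana", "quark"], training_terms, vectors, 0.5))
```

Here "automobile" is mapped to "car", while "quark" is dropped because no training term exceeds the threshold.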

Research on Principles to Transcribe Geographical Names in English for English Version Electronic Map Service (영문판 전자지도서비스를 위한 지명 영문표기의 세부기준과 원칙에 관한 연구)

  • Yi, Mi Sook;Ahn, Jong Wook
    • Spatial Information Research / v.21 no.5 / pp.53-61 / 2013
  • The objective of this study is to suggest detailed rules and principles for transcribing geographical names in English for an English-version electronic map service. To this end, the study examined the guidelines used in Korea for the English transcription of local geographical names and the current state of English transcription of Korean geographical names in foreign electronic map services. The examination showed that the English transcription of geographical names at home and abroad causes confusion because it is not standardized. To identify which of the transcription methods currently in mixed use foreigners find easy and prefer, a preference survey was conducted among foreigners. The survey results showed that foreigners prefer a Romanized name accompanied by a word conveying its meaning to Romanization alone. Reflecting these results, Korean geographical names were classified into natural features, cultural features and man-made structures, and administrative units, and detailed English transcription rules and principles were suggested for each class.

Classification of Brain Magnetic Resonance Images using 2 Level Decision Tree Learning (2 단계 결정트리 학습을 이용한 뇌 자기공명영상 분류)

  • Kim, Hyung-Il;Kim, Yong-Uk
    • Journal of KIISE:Software and Applications / v.34 no.1 / pp.18-29 / 2007
  • In this paper we present a system that classifies brain MR images using two-level decision tree learning. Two kinds of information can be obtained from images. One is low-level features, such as size, color, texture, and contour, which can be acquired directly from the raw images; the other is high-level features, such as the existence of a certain object or the spatial relations between different parts, which must be obtained by interpreting segmented images. To classify images according to their semantic meaning, learning and classification should be based on the high-level features. The proposed system applies decision tree learning to each level separately, and the high-level features are synthesized from the results of low-level classification. Experimental results on a set of brain MR images with tumors are discussed, and several results that show the effectiveness of the proposed system are presented.
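
The two-level arrangement can be illustrated with a toy sketch. Everything here is an assumption made for the example: depth-one decision stumps stand in for full decision tree learning, each region has two made-up low-level features (mean intensity and area), and the single high-level feature is the number of regions that the first level labels abnormal. The abstract does not give the paper's actual features or tree learner.

```python
from typing import List, Sequence, Tuple

Stump = Tuple[int, float]  # (feature index, threshold): predict 1 when x[index] > threshold


def fit_stump(X: Sequence[Sequence[float]], y: Sequence[int]) -> Stump:
    """Pick the single feature/threshold split with the fewest training errors."""
    best, best_errors = (0, 0.0), len(y) + 1
    for f in range(len(X[0])):
        values = sorted({row[f] for row in X})
        # Candidate thresholds are midpoints between consecutive observed values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            errors = sum((row[f] > t) != bool(label) for row, label in zip(X, y))
            if errors < best_errors:
                best, best_errors = (f, t), errors
    return best


def predict(stump: Stump, x: Sequence[float]) -> int:
    f, t = stump
    return int(x[f] > t)


def abnormal_count(image: Sequence[Sequence[float]], stump: Stump) -> List[float]:
    """High-level feature synthesized from low-level results: number of abnormal regions."""
    return [float(sum(predict(stump, region) for region in image))]


# Level 1: learn to label regions from low-level features [mean intensity, area].
region_X = [[0.9, 30], [0.8, 25], [0.2, 40], [0.3, 35]]
region_y = [1, 1, 0, 0]
region_stump = fit_stump(region_X, region_y)

# Level 2: learn to label whole images from the synthesized high-level feature.
images = [
    [[0.85, 28], [0.25, 38]],
    [[0.2, 40], [0.3, 33]],
    [[0.9, 20], [0.95, 22], [0.1, 50]],
    [[0.15, 45]],
]
image_y = [1, 0, 1, 0]
image_features = [abnormal_count(img, region_stump) for img in images]
image_stump = fit_stump(image_features, image_y)

new_image = [[0.7, 30], [0.2, 41]]
print(predict(image_stump, abnormal_count(new_image, region_stump)))
```

The point of the sketch is the data flow: the second learner never sees raw region features, only features derived from the first learner's output.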

Large Language Models-based Feature Extraction for Short-Term Load Forecasting (거대언어모델 기반 특징 추출을 이용한 단기 전력 수요량 예측 기법)

  • Jaeseung Lee;Jehyeok Rew
    • Journal of Korea Society of Industrial Information Systems / v.29 no.3 / pp.51-65 / 2024
  • Accurate electrical load forecasting is important to the effective operation of power systems in smart grids. With recent developments in machine learning, artificial intelligence-based models for predicting power demand are being actively researched. However, since existing models take input variables as numerical features, forecasting accuracy may decrease because the models do not reflect the semantic relationships between these features. In this paper, we propose a scheme for short-term load forecasting that uses features extracted from the input data by a large language model. We first convert the input variables into a sentence-like prompt format. Then, we use a large language model with frozen weights to derive embedding vectors that represent the features of the prompt. These vectors are used to train the forecasting model. Experimental results show that the proposed scheme outperformed models based on numerical data; by visualizing the attention weights of the large language model on the prompts, we also identified the information that significantly influences predictions.
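
The first step, converting input variables into a sentence-like prompt, might look like the sketch below. The field names (`date`, `weekday`, `hour`, `temp_c`, `prev_load_kw`) and the sentence template are assumptions for illustration; the abstract does not list the actual variables or wording. The embedding step is omitted because it depends on which language model is used.

```python
from typing import Mapping

# Hypothetical sentence template; the paper's actual prompt wording is not given.
TEMPLATE = (
    "On {date} ({weekday}) at {hour:02d}:00, the temperature was "
    "{temp_c:.1f} degrees Celsius and the load one hour earlier was "
    "{prev_load_kw:.0f} kW."
)


def to_prompt(record: Mapping[str, object]) -> str:
    """Render one row of numeric and categorical inputs as a sentence-like prompt."""
    return TEMPLATE.format(**record)


record = {
    "date": "2024-01-15",
    "weekday": "Monday",
    "hour": 9,
    "temp_c": -2.5,
    "prev_load_kw": 1830.0,
}
print(to_prompt(record))
```

The resulting string would then be passed to a frozen language model, and its embedding vector used as the input feature of the forecasting model.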

The Evaluation Structure of Auditory Images on the Streetscapes - The Semantic Issues of Soundscape based on the Students' Fieldwork - (거리경관에 대한 청각적 이미지의 평가구조 - 대학생들의 음풍경 체험을 통한 의미론적 고찰 -)

  • Han Myung-Ho
    • The Journal of the Acoustical Society of Korea / v.24 no.8 / pp.481-491 / 2005
  • The purpose of this study is to interpret the evaluation structure of auditory images of streetscapes in an urban area from the semantic viewpoint of soundscapes. Using the caption evaluation method, which is a new method, a total of 45 college students took part in fieldwork from 2001 to 2005 to find the images of sounds while walking along the main streets of Namwon city. The fieldwork yielded various data, including the elements, features, impressions, and preferences of the auditory scene. In Namwon city, the elements that form auditory images are classified into natural sounds and artificial sounds, the latter including machinery sounds, community sounds, and signal sounds. The features of the auditory scene are classified by kind of sound, behavior, condition, character, relationship with the surroundings, and image. Finally, the impressions of the auditory scene are classified into three categories: human emotions, the atmosphere of the streets, and the characteristics of the sound itself. In terms of the relationship between the auditory scene and its evaluation, the elements, features, and impressions of the auditory scene consist of items with positive, neutral, and negative images. The evaluation model of streetscapes in Namwon city also made it possible to grasp the characteristics of the auditory image of a place or space.

Sign Language Generation with Animation by Adverbial Phrase Analysis (부사어를 활용한 수화 애니메이션 생성)

  • Kim, Sang-Ha;Park, Jong-C.
    • 한국HCI학회:학술대회논문집 / 2008.02a / pp.27-32 / 2008
  • Sign languages, commonly used in aurally challenged communities, are a kind of visual language expressing sign words with motion. Spatiality and motility of a sign language are conveyed mainly via sign words as predicates. A predicate is modified by an adverbial phrase with an accompanying change in its semantics so that the adverbial phrase can also affect the overall spatiality and motility of expressions of a sign language. In this paper, we analyze the semantic features of adverbial phrases which may affect the motion-related semantics of a predicate in converting expressions in Korean into those in a sign language and propose a system that generates corresponding animation by utilizing these features.

On Implementation of Korean-English Machine Translation System through Program Reuse (프로그램 재사용을 통한 한/영 기계번역시스템의 구현에 관한 연구)

  • Kim, Hion-Gun;Yang, Gi-Chul;Choi, Key-Sun
    • Annual Conference on Human and Language Technology / 1993.10a / pp.559-570 / 1993
  • In this article we present the rapid development of a Korean-to-English translation system with the help of a general English generator, PENMAN. PENMAN is an English sentence generation system whose input language, Sentence Planning Language (SPL), was specially devised for sentence generation. SPL has various features necessary for generating sentences, covering both syntactic and semantic features. In this development we integrated a Korean parser based on dependency grammar with the English sentence generator PENMAN, bridging the two systems through a conversion module that converts the dependency structures produced by the Korean parser into SPL for PENMAN.

Query-by-emotion sketch for local emotion-based image retrieval (지역 감성기반 영상 검색을 위한 감성 스케치 질의)

  • Lee, Kyoung-Mi
    • Journal of Internet Computing and Services / v.10 no.6 / pp.113-121 / 2009
  • To retrieve images that carry different emotions in different regions, this paper proposes an image retrieval system that uses an emotion sketch. The proposed system divides an image into $17 \times 17$ sub-regions and extracts emotion features from each sub-region. To extract the emotion features, this paper uses the emotion colors for 160 emotion words from H. Nagumo's color scheme imaging chart. We calculate a histogram for each sub-region and take the emotion word with the maximal value as the representative emotion word of that sub-region. Experimental results on the Corel image database demonstrate the effectiveness of the proposed emotion sketch and show that the system retrieves images successfully.
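
The per-region step (histogram, then the word with the maximal count) can be sketched as below. The sketch assumes each pixel has already been mapped to an emotion word through some color-to-emotion lookup, which is not shown; the function name `region_emotions` and the three sample words are hypothetical. The grid size defaults to 17 as in the abstract, but the demo uses a 2-by-2 grid on a tiny image to stay readable.

```python
from collections import Counter
from typing import List


def region_emotions(pixel_words: List[List[str]], grid: int = 17) -> List[List[str]]:
    """Return the most frequent emotion word in each cell of a grid x grid partition."""
    height, width = len(pixel_words), len(pixel_words[0])
    result = []
    for gy in range(grid):
        y0, y1 = gy * height // grid, (gy + 1) * height // grid
        row = []
        for gx in range(grid):
            x0, x1 = gx * width // grid, (gx + 1) * width // grid
            # Histogram of emotion words over the pixels of this sub-region.
            histogram = Counter(
                pixel_words[y][x] for y in range(y0, y1) for x in range(x0, x1)
            )
            # An empty cell (image smaller than the grid) gets an empty label.
            row.append(histogram.most_common(1)[0][0] if histogram else "")
        result.append(row)
    return result


# A 4 x 4 "image" whose pixels are already labeled with hypothetical emotion words.
pixel_words = [
    ["calm", "calm", "warm", "warm"],
    ["calm", "vivid", "warm", "warm"],
    ["vivid", "vivid", "calm", "warm"],
    ["vivid", "calm", "calm", "calm"],
]
print(region_emotions(pixel_words, grid=2))
```

A sketch query could then be compared cell by cell against these representative words to rank database images.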

A Step towards the Improvement in the Performance of Text Classification

  • Hussain, Shahid;Mufti, Muhammad Rafiq;Sohail, Muhammad Khalid;Afzal, Humaira;Ahmad, Ghufran;Khan, Arif Ali
    • KSII Transactions on Internet and Information Systems (TIIS) / v.13 no.4 / pp.2162-2179 / 2019
  • The performance of text classification is highly related to the feature selection method. Usually, two tasks are performed when a feature selection method is applied to construct a feature set: 1) assign a score to each feature and 2) select the top-N features. In existing filter-based feature selection methods, the selection of the top-N features is biased by their discriminative power and by the empirical process followed to determine the value of N. To improve text classification performance by presenting a more illustrative feature set, we present an approach based on a potent representation learning technique, namely the DBN (Deep Belief Network). This algorithm learns the semantic representation of documents and uses feature vectors for their formulation. The number of nodes, the number of iterations, and the number of hidden layers are the main parameters of the DBN, which can be tuned to improve the classifier's performance. The experimental results indicate that the proposed method is effective in increasing classification performance and helps developers make effective decisions in certain domains.
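
For reference, the two-task filter baseline that the abstract contrasts with (score every feature, then keep the top N) can be sketched as follows. The scoring function here, the absolute difference in document frequency between two classes, is a deliberately simple stand-in chosen for the example; it is not the DBN approach the paper proposes, and the names `score_features` and `top_n` are hypothetical.

```python
from typing import Dict, List, Sequence, Set


def score_features(docs: Sequence[Set[str]], labels: Sequence[int]) -> Dict[str, float]:
    """Task 1: score each term by how differently it occurs in the two classes."""
    pos = [d for d, y in zip(docs, labels) if y == 1]
    neg = [d for d, y in zip(docs, labels) if y == 0]
    vocabulary = set().union(*docs)
    scores = {}
    for term in vocabulary:
        pos_rate = sum(term in d for d in pos) / max(len(pos), 1)
        neg_rate = sum(term in d for d in neg) / max(len(neg), 1)
        scores[term] = abs(pos_rate - neg_rate)
    return scores


def top_n(scores: Dict[str, float], n: int) -> List[str]:
    """Task 2: keep the n best-scoring terms, breaking ties alphabetically."""
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:n]]


# Toy corpus: documents as term sets, labels 1 = sports, 0 = politics.
docs = [
    {"goal", "match", "team"},
    {"goal", "team", "coach"},
    {"vote", "party", "team"},
    {"vote", "election", "party"},
]
labels = [1, 1, 0, 0]
scores = score_features(docs, labels)
print(top_n(scores, 3))
```

The choice of N is exactly the empirical step the abstract identifies as a source of bias: changing `n` from 3 to 4 here would admit one of several terms tied at the same score.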