• Title/Summary/Keyword: Linguistic segmentation

The Language Determinant Analysis of Investment Among APEC Member Economies (APEC국가간 언어의 투자 결정요인 분석)

  • Shen, Zhi-Feng;Kim, Tae-In
    • Asia-Pacific Journal of Business / v.7 no.2 / pp.61-76 / 2016
  • This study examines how language acts as a determinant of investment decisions among Asian countries, where many different languages are in use. According to the analysis, linguistic segmentation in the investing country increases investment, whereas linguistic segmentation in the host country decreases it. The analysis also finds that the greater the linguistic distance between the investing country and the host country, the more investment increases. With respect to linguistic distance and the timing of investment, foreign direct investment is found to respond more elastically to increasing linguistic distance than other forms of investment. These results suggest that the more linguistically segmented the host country is, the higher the cost of investing; the linguistic-distance result, in turn, can be explained by diversification across host countries and by the establishment of a bridgehead in the host region.

A Study on the Linguistic Manifestation of 'Couple Look' (Couple Look의 언어적 표현)

  • Han, Myung-Sook
    • The Research Journal of the Costume Culture / v.13 no.5 s.58 / pp.756-762 / 2005
  • The objective of this research is to examine the psychological desires of college students who express themselves by wearing so-called 'couple look' attire, a dressing habit that reflects responses to various psychological states and to society. The message that wearers try to convey to others by dressing this way, and whether that message is actually conveyed, are analyzed by applying linguistic classification theory to this specific term. After a preliminary study based on thorough interviews with 70 male and female college students, the main study collected data through a questionnaire administered to 450 male and female college students. The results were compared, reviewed, and analyzed by applying Geoffrey Leech's theory of meaning segmentation in linguistics, with the aim of defining how meaning segmentation expressed through language can be applied to self-expression through clothing. The research results are as follows. 1. The psychological desires behind wearing couple look attire are to express that the partners like and love each other and are dating, and to showcase their intimacy. 2. Clothing items appropriate for expressing the couple look are T-shirts, jeans, pants, sweaters, and mufflers, and accessories such as tennis shoes, hats, shoes, bags, rings, watches, and earrings. 3. Among both people who have tried the couple look and those who have not, those who said they were willing to dress in couple look were mostly those with experience of doing so.

Ambiguity Resolution in Chinese Word Segmentation

  • Maosong, Sun;T'sou, Benjamin-K.
    • Proceedings of the Korean Society for Language and Information Conference / 1995.02a / pp.121-126 / 1995
  • A new method for Chinese word segmentation named Conditional F&BMM (Forward and Backward Maximal Matching), which incorporates both bigram statistics (i.e., mutual information and difference of t-test between Chinese characters) and linguistic rules for ambiguity resolution, is proposed in this paper. The key characteristics of this model are the use of: (i) statistics which can be automatically derived from any raw corpus, (ii) a rule base for disambiguation with consistency and controlled size to be built up in a systematic way.
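
The forward and backward maximal matching idea named in this abstract can be illustrated with a minimal sketch. This is only the plain dictionary-based baseline, not the paper's Conditional F&BMM: the bigram statistics and rule base are omitted, and the toy lexicon, the `max_len` limit, and all function names are assumptions made for illustration. A disagreement between the two passes is treated here as a signal that the string is ambiguous.

```python
def forward_max_match(text, lexicon, max_len=4):
    """Scan left to right, taking the longest lexicon word at each position."""
    tokens, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            # A single character is always accepted as a fallback token.
            if size == 1 or piece in lexicon:
                tokens.append(piece)
                i += size
                break
    return tokens


def backward_max_match(text, lexicon, max_len=4):
    """Scan right to left, taking the longest lexicon word ending at each position."""
    tokens, j = [], len(text)
    while j > 0:
        for size in range(min(max_len, j), 0, -1):
            piece = text[j - size:j]
            if size == 1 or piece in lexicon:
                tokens.insert(0, piece)
                j -= size
                break
    return tokens


def is_ambiguous(text, lexicon):
    """Flag strings on which the two scanning directions disagree."""
    return forward_max_match(text, lexicon) != backward_max_match(text, lexicon)


# Hypothetical toy lexicon for a classic overlapping-ambiguity string.
LEXICON = {"研究", "研究生", "生命", "起源"}
SENTENCE = "研究生命起源"

print(forward_max_match(SENTENCE, LEXICON))   # ['研究生', '命', '起源']
print(backward_max_match(SENTENCE, LEXICON))  # ['研究', '生命', '起源']
print(is_ambiguous(SENTENCE, LEXICON))        # True
```

In a fuller system, the flagged spans would then be passed to a statistical or rule-based disambiguation step such as the one the abstract describes.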

Ambiguity Types of the Homonymic & Heterographic Units for Improving Korean Voice Recognition System - a Preliminary Research (한국어 음성인식 시스템 향상을 위한 동음이철 단위의 중의성 유형 분류)

  • Yoon, Ae-Sun;Kang, Mi-Young
    • Speech Sciences / v.15 no.4 / pp.67-81 / 2008
  • The accuracy rate of P2G (Phoneme-to-Grapheme) conversion is one of the important factors determining the quality of unlimited voice recognition (VR) systems. Few studies, however, have been conducted to reduce the ambiguities of a phoneme string that can be segmented into a variety of different linguistic units (i.e., morphemes, words, eo-jeols) and thus be transformed into more than one grapheme string. This paper is a preliminary study for building a large knowledge base of such homonymic and heterographic units (HHUs), which will provide unlimited Korean VR systems with more accurate P2G information. It analyzes two main factors generating HHUs: (1) boundary determination of the prosodic unit; (2) its segmentation into linguistic units. The linguistic characteristics determining variable boundaries of a prosodic unit are investigated, and the ambiguity types of HHUs are classified in accordance with their morphological and syntactic structures as well as the phonological rules governing them.
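
The second factor, one string admitting several segmentations into linguistic units, can be shown with a small enumeration sketch. It is a toy illustration only: it uses an English letter string and a hypothetical lexicon rather than Korean phoneme strings, and `all_segmentations` is a name introduced here, not something taken from the paper.

```python
def all_segmentations(string, lexicon):
    """Return every way to split `string` into a sequence of lexicon units."""
    if not string:
        return [[]]  # one way to segment the empty string: no units
    results = []
    for end in range(1, len(string) + 1):
        unit = string[:end]
        if unit in lexicon:
            # Keep this unit and segment the remainder recursively.
            for rest in all_segmentations(string[end:], lexicon):
                results.append([unit] + rest)
    return results


# Hypothetical toy lexicon; a real system would use morphemes, words, or eo-jeols.
LEXICON = {"no", "where", "now", "here", "nowhere"}

for segmentation in all_segmentations("nowhere", LEXICON):
    print(segmentation)
# ['no', 'where']
# ['now', 'here']
# ['nowhere']
```

Each distinct segmentation corresponds to a candidate reading that a P2G component would have to choose between.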

Identification of Chinese Personal Names in Unrestricted Texts

  • Cheung, Lawrence;Tsou, Benjamin K.;Sun, Mao-Song
    • Proceedings of the Korean Society for Language and Information Conference / 2002.02a / pp.28-35 / 2002
  • Automatic identification of Chinese personal names in unrestricted texts is a key task in Chinese word segmentation, and can affect other NLP tasks such as word segmentation and information retrieval if it is not properly addressed. This paper (1) demonstrates the problems of Chinese personal name identification in some IT applications, (2) analyzes the structure of Chinese personal names, and (3) further presents the relevant processing strategies. The geographical differences in Chinese personal names between Beijing and Hong Kong are highlighted at the end. The comparison shows that variation in names across different Chinese communities is a critical factor in designing Chinese personal name identification algorithms.

A Comparative study on the Effectiveness of Segmentation Strategies for Korean Word and Sentence Classification tasks (한국어 단어 및 문장 분류 태스크를 위한 분절 전략의 효과성 연구)

  • Kim, Jin-Sung;Kim, Gyeong-min;Son, Jun-young;Park, Jeongbae;Lim, Heui-seok
    • Journal of the Korea Convergence Society / v.12 no.12 / pp.39-47 / 2021
  • The construction of high-quality input features through effective segmentation is essential for increasing the sentence comprehension of a language model. Improving their quality directly affects the performance of the downstream task. This paper comparatively studies segmentation that effectively reflects the linguistic characteristics of Korean for word and sentence classification. The segmentation types are defined in four categories: eojeol, morpheme, syllable, and subchar, and pre-training is carried out using the RoBERTa model structure. By dividing tasks into a sentence group and a word group, we analyze the tendency within each group and the difference between the groups. In sentence classification, the model with subchar-level segmentation outperforms the other strategies by up to +0.62% on NSMC, +2.38% on KorNLI, and +2.41% on KorSTS; in word classification, the model with syllable-level segmentation outperforms them by up to +0.7% on NER and +0.61% on SRL. These experimental results confirm the effectiveness of those schemes.
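
Three of the four segmentation granularities named above can be sketched with the Python standard library alone. The sketch makes two assumptions that the abstract does not confirm: that eojeols are whitespace-delimited, and that "subchar" means decomposition of each Hangul syllable into its jamo, done here via Unicode NFD normalization. Morpheme-level segmentation is left out because it needs a morphological analyzer. The function names are hypothetical.

```python
import unicodedata


def eojeol_tokens(sentence):
    """Split on whitespace; each space-delimited unit is treated as one eojeol."""
    return sentence.split()


def syllable_tokens(sentence):
    """One token per precomposed Hangul syllable (whitespace dropped)."""
    composed = unicodedata.normalize("NFC", sentence)
    return [ch for ch in composed if not ch.isspace()]


def subchar_tokens(sentence):
    """One token per conjoining jamo, obtained by canonical decomposition (NFD)."""
    decomposed = unicodedata.normalize("NFD", sentence)
    return [ch for ch in decomposed if not ch.isspace()]


SENTENCE = "한국어 문장"

print(eojeol_tokens(SENTENCE))         # ['한국어', '문장']
print(len(syllable_tokens(SENTENCE)))  # 5
print(len(subchar_tokens(SENTENCE)))   # 14
```

The finer the granularity, the longer the token sequence for the same sentence, which is the trade-off such a comparison has to weigh against vocabulary size.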

A study on character segmentation and determination of linguistic type for recognition of on-line cursive characters (온라인 연속 필기 문자의 인식을 위한 문자간 구분 및 종류의 결정에 관한 연구)

  • 박강령;전병환;김창수;김우성;김재희
    • Journal of the Korean Institute of Telematics and Electronics C / v.34C no.7 / pp.61-69 / 1997
  • With vigorous research in character recognition, the need to recognize run-on multilingual handwritten characters is increasing, in order to provide users with more comfortable PUI (pen user interface) environments. In general, many intermediate word candidates are generated in run-on multilingual recognition because there is no information about the ending position and linguistic type of each character. To remove the unnecessary word candidates, we classify them into two groups and, using 5 attributes, select the best candidate among the word candidates in the group where the final character is completed. We propose a method for selecting the single best candidate, called WRM (weighted ranking method). The weights are adaptively trained by the LMS (least mean square) learning rule. Results show that decision making using weights performs much better than decision making without weights.
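
A weighted ranking trained with the LMS rule can be sketched as follows. The abstract does not specify the 5 attributes, the training targets, or the learning rate, so the attribute vectors, the 1.0/0.0 targets, `lr`, and every function name below are assumptions; the update itself is the standard LMS (Widrow-Hoff) rule, w ← w + lr · (target − w·x) · x.

```python
def score(weights, attributes):
    """Weighted sum of a candidate's attribute values."""
    return sum(w * a for w, a in zip(weights, attributes))


def lms_update(weights, attributes, target, lr=0.1):
    """One LMS step: move the weights along the input, scaled by the error."""
    error = target - score(weights, attributes)
    return [w + lr * error * a for w, a in zip(weights, attributes)]


def train(samples, n_attributes=5, lr=0.1, epochs=200):
    """Repeatedly apply the LMS step over (attributes, target) pairs."""
    weights = [0.0] * n_attributes
    for _ in range(epochs):
        for attributes, target in samples:
            weights = lms_update(weights, attributes, target, lr)
    return weights


def best_candidate(weights, candidates):
    """Return the label of the candidate with the highest weighted score."""
    return max(candidates, key=lambda c: score(weights, c[1]))[0]


# Hypothetical training data: target 1.0 for a correct candidate, 0.0 otherwise.
GOOD = [1.0, 0.2, 0.0, 0.5, 0.1]
BAD = [0.0, 0.9, 0.3, 0.4, 0.8]
weights = train([(GOOD, 1.0), (BAD, 0.0)])

print(best_candidate(weights, [("candidate A", GOOD), ("candidate B", BAD)]))
# candidate A
```

With the weights fixed at training time, ranking a new set of candidates costs only one dot product per candidate.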

Towards Effective Entity Extraction of Scientific Documents using Discriminative Linguistic Features

  • Hwang, Sangwon;Hong, Jang-Eui;Nam, Young-Kwang
    • KSII Transactions on Internet and Information Systems (TIIS) / v.13 no.3 / pp.1639-1658 / 2019
  • Named entity recognition (NER) is an important technique for improving the performance of data mining and big data analytics. In previous studies, NER systems have been employed to identify named-entities using statistical methods based on prior information or linguistic features; however, such methods are limited in that they are unable to recognize unregistered or unlearned objects. In this paper, a method is proposed to extract objects, such as technologies, theories, or person names, by analyzing the collocation relationship between certain words that simultaneously appear around specific words in the abstracts of academic journals. The method is executed as follows. First, the data is preprocessed using data cleaning and sentence detection to separate the text into single sentences. Then, part-of-speech (POS) tagging is applied to the individual sentences. After this, the appearance and collocation information of the other POS tags is analyzed, excluding the entity candidates, such as nouns. Finally, an entity recognition model is created based on analyzing and classifying the information in the sentences.

Fuzzy Relaxation Based on the Theory of Possibility and FAM

  • Uam, Tae-Uk;Park, Yang-Woo;Ha, Yeong-Ho
    • Journal of Electrical Engineering and information Science / v.2 no.5 / pp.72-78 / 1997
  • This paper presents a fuzzy relaxation algorithm based on possibility and FAM instead of the probability and compatibility coefficients used in most existing probabilistic relaxation algorithms. Because it eliminates the stages for estimating compatibility coefficients and normalizing the probability estimates, the proposed fuzzy relaxation algorithm increases parallelism and has a simple iteration scheme. The construction of the fuzzy relaxation scheme consists of the following three tasks: (1) definition of input/output linguistic variables, their term sets, and possibility; (2) definition of FAM rule bases for relaxation using fuzzy compound relations; (3) construction of the iteration scheme for calculating the new possibility estimate. Applications to region segmentation and edge detection algorithms show that the proposed method can be used not only for reducing image ambiguity and segmentation errors, but also for enhancing the raw edges iteratively.

Text-dependent Speaker Verification System Over Telephone Lines (전화망을 위한 어구 종속 화자 확인 시스템)

  • 김유진;정재호
    • Proceedings of the IEEK Conference / 1999.11a / pp.663-667 / 1999
  • In this paper, we review the conventional speaker verification algorithm and present a text-dependent speaker verification system for use over telephone lines, together with its experimental results. To train speaker models effectively with limited enrollment data, we apply a blind-segmentation algorithm, which segments speech into sub-word units without linguistic information. A world model created from the PBW DB is used for score normalization. Experiments on the implemented system, using a database constructed to simulate a field test, show an EER of 3.3%.
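
The two evaluation ideas mentioned here, world-model score normalization and the equal error rate (EER), can be sketched as follows. The abstract gives no formulas, so this assumes the common log-likelihood-ratio form of normalization and a simple threshold sweep for the EER; the scores are made-up numbers and all function names are hypothetical.

```python
def normalized_score(speaker_loglik, world_loglik):
    """Log-likelihood ratio: claimed speaker model versus the world model."""
    return speaker_loglik - world_loglik


def error_rates(genuine, impostor, threshold):
    """Accept when score >= threshold; return (false accept rate, false reject rate)."""
    far = sum(s >= threshold for s in impostor) / len(impostor)
    frr = sum(s < threshold for s in genuine) / len(genuine)
    return far, frr


def equal_error_rate(genuine, impostor):
    """Sweep observed scores as thresholds; return (EER estimate, threshold)
    at the point where the two error rates are closest."""
    best = None
    for threshold in sorted(set(genuine) | set(impostor)):
        far, frr = error_rates(genuine, impostor, threshold)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2, threshold)
    return best[1], best[2]


# Made-up normalized scores for true-speaker and impostor trials.
GENUINE = [2.0, 1.5, 1.0, 0.5]
IMPOSTOR = [0.8, 0.2, -0.5, -1.0]

eer, threshold = equal_error_rate(GENUINE, IMPOSTOR)
print(eer, threshold)  # 0.25 0.8
```

On real data the two error curves rarely cross exactly at an observed score, so the averaged value at the closest point is only an estimate of the EER.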