• Title/Summary/Keyword: Linguistic Model

Search Result 287, Processing Time 0.028 seconds

A Study on Named Entity Recognition for Effective Dialogue Information Prediction (효율적 대화 정보 예측을 위한 개체명 인식 연구)

  • Go, Myunghyun;Kim, Hakdong;Lim, Heonyeong;Lee, Yurim;Jee, Minkyu;Kim, Wonil
    • Journal of Broadcast Engineering
    • /
    • v.24 no.1
    • /
    • pp.58-66
    • /
    • 2019
  • Recognition of named entity such as proper nouns in conversation sentences is the most fundamental and important field of study for efficient conversational information prediction. The most important part of a task-oriented dialogue system is to recognize what attributes an object in a conversation has. The named entity recognition model carries out recognition of the named entity through the preprocessing, word embedding, and prediction steps for the dialogue sentence. This study aims at using user - defined dictionary in preprocessing stage and finding optimal parameters at word embedding stage for efficient dialogue information prediction. In order to test the designed object name recognition model, we selected the field of daily chemical products and constructed the named entity recognition model that can be applied in the task-oriented dialogue system in the related domain.

Predicate Recognition Method using BiLSTM Model and Morpheme Features (BiLSTM 모델과 형태소 자질을 이용한 서술어 인식 방법)

  • Nam, Chung-Hyeon;Jang, Kyung-Sik
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.26 no.1
    • /
    • pp.24-29
    • /
    • 2022
  • Semantic role labeling task used in various natural language processing fields, such as information extraction and question answering systems, is the task of identifying the arugments for a given sentence and predicate. Predicate used as semantic role labeling input are extracted using lexical analysis results such as POS-tagging, but the problem is that predicate can't extract all linguistic patterns because predicate in korean language has various patterns, depending on the meaning of sentence. In this paper, we propose a korean predicate recognition method using neural network model with pre-trained embedding models and lexical features. The experiments compare the performance on the hyper parameters of models and with or without the use of embedding models and lexical features. As a result, we confirm that the performance of the proposed neural network model was 92.63%.

Applying Meta-model Formalization of Part-Whole Relationship to UML: Experiment on Classification of Aggregation and Composition (UML의 부분-전체 관계에 대한 메타모델 형식화 이론의 적용: 집합연관 및 복합연관 판별 실험)

  • Kim, Taekyung
    • Journal of Intelligence and Information Systems
    • /
    • v.21 no.1
    • /
    • pp.99-118
    • /
    • 2015
  • Object-oriented programming languages have been widely selected for developing modern information systems. The use of concepts relating to object-oriented (OO, in short) programming has reduced efforts of reusing pre-existing codes, and the OO concepts have been proved to be a useful in interpreting system requirements. In line with this, we have witnessed that a modern conceptual modeling approach supports features of object-oriented programming. Unified Modeling Language or UML becomes one of de-facto standards for information system designers since the language provides a set of visual diagrams, comprehensive frameworks and flexible expressions. In a modeling process, UML users need to consider relationships between classes. Based on an explicit and clear representation of classes, the conceptual model from UML garners necessarily attributes and methods for guiding software engineers. Especially, identifying an association between a class of part and a class of whole is included in the standard grammar of UML. The representation of part-whole relationship is natural in a real world domain since many physical objects are perceived as part-whole relationship. In addition, even abstract concepts such as roles are easily identified by part-whole perception. It seems that a representation of part-whole in UML is reasonable and useful. However, it should be admitted that the use of UML is limited due to the lack of practical guidelines on how to identify a part-whole relationship and how to classify it into an aggregate- or a composite-association. Research efforts on developing the procedure knowledge is meaningful and timely in that misleading perception to part-whole relationship is hard to be filtered out in an initial conceptual modeling thus resulting in deterioration of system usability. The current method on identifying and classifying part-whole relationships is mainly counting on linguistic expression. This simple approach is rooted in the idea that a phrase of representing has-a constructs a par-whole perception between objects. If the relationship is strong, the association is classified as a composite association of part-whole relationship. In other cases, the relationship is an aggregate association. Admittedly, linguistic expressions contain clues for part-whole relationships; therefore, the approach is reasonable and cost-effective in general. Nevertheless, it does not cover concerns on accuracy and theoretical legitimacy. Research efforts on developing guidelines for part-whole identification and classification has not been accumulated sufficient achievements to solve this issue. The purpose of this study is to provide step-by-step guidelines for identifying and classifying part-whole relationships in the context of UML use. Based on the theoretical work on Meta-model Formalization, self-check forms that help conceptual modelers work on part-whole classes are developed. To evaluate the performance of suggested idea, an experiment approach was adopted. The findings show that UML users obtain better results with the guidelines based on Meta-model Formalization compared to a natural language classification scheme conventionally recommended by UML theorists. This study contributed to the stream of research effort about part-whole relationships by extending applicability of Meta-model Formalization. Compared to traditional approaches that target to establish criterion for evaluating a result of conceptual modeling, this study expands the scope to a process of modeling. Traditional theories on evaluation of part-whole relationship in the context of conceptual modeling aim to rule out incomplete or wrong representations. It is posed that qualification is still important; but, the lack of consideration on providing a practical alternative may reduce appropriateness of posterior inspection for modelers who want to reduce errors or misperceptions about part-whole identification and classification. The findings of this study can be further developed by introducing more comprehensive variables and real-world settings. In addition, it is highly recommended to replicate and extend the suggested idea of utilizing Meta-model formalization by creating different alternative forms of guidelines including plugins for integrated development environments.

Impact of Semantic Characteristics on Perceived Helpfulness of Online Reviews (온라인 상품평의 내용적 특성이 소비자의 인지된 유용성에 미치는 영향)

  • Park, Yoon-Joo;Kim, Kyoung-jae
    • Journal of Intelligence and Information Systems
    • /
    • v.23 no.3
    • /
    • pp.29-44
    • /
    • 2017
  • In Internet commerce, consumers are heavily influenced by product reviews written by other users who have already purchased the product. However, as the product reviews accumulate, it takes a lot of time and effort for consumers to individually check the massive number of product reviews. Moreover, product reviews that are written carelessly actually inconvenience consumers. Thus many online vendors provide mechanisms to identify reviews that customers perceive as most helpful (Cao et al. 2011; Mudambi and Schuff 2010). For example, some online retailers, such as Amazon.com and TripAdvisor, allow users to rate the helpfulness of each review, and use this feedback information to rank and re-order them. However, many reviews have only a few feedbacks or no feedback at all, thus making it hard to identify their helpfulness. Also, it takes time to accumulate feedbacks, thus the newly authored reviews do not have enough ones. For example, only 20% of the reviews in Amazon Review Dataset (Mcauley and Leskovec, 2013) have more than 5 reviews (Yan et al, 2014). The purpose of this study is to analyze the factors affecting the usefulness of online product reviews and to derive a forecasting model that selectively provides product reviews that can be helpful to consumers. In order to do this, we extracted the various linguistic, psychological, and perceptual elements included in product reviews by using text-mining techniques and identifying the determinants among these elements that affect the usability of product reviews. In particular, considering that the characteristics of the product reviews and determinants of usability for apparel products (which are experiential products) and electronic products (which are search goods) can differ, the characteristics of the product reviews were compared within each product group and the determinants were established for each. This study used 7,498 apparel product reviews and 106,962 electronic product reviews from Amazon.com. In order to understand a review text, we first extract linguistic and psychological characteristics from review texts such as a word count, the level of emotional tone and analytical thinking embedded in review text using widely adopted text analysis software LIWC (Linguistic Inquiry and Word Count). After then, we explore the descriptive statistics of review text for each category and statistically compare their differences using t-test. Lastly, we regression analysis using the data mining software RapidMiner to find out determinant factors. As a result of comparing and analyzing product review characteristics of electronic products and apparel products, it was found that reviewers used more words as well as longer sentences when writing product reviews for electronic products. As for the content characteristics of the product reviews, it was found that these reviews included many analytic words, carried more clout, and related to the cognitive processes (CogProc) more so than the apparel product reviews, in addition to including many words expressing negative emotions (NegEmo). On the other hand, the apparel product reviews included more personal, authentic, positive emotions (PosEmo) and perceptual processes (Percept) compared to the electronic product reviews. Next, we analyzed the determinants toward the usefulness of the product reviews between the two product groups. As a result, it was found that product reviews with high product ratings from reviewers in both product groups that were perceived as being useful contained a larger number of total words, many expressions involving perceptual processes, and fewer negative emotions. In addition, apparel product reviews with a large number of comparative expressions, a low expertise index, and concise content with fewer words in each sentence were perceived to be useful. In the case of electronic product reviews, those that were analytical with a high expertise index, along with containing many authentic expressions, cognitive processes, and positive emotions (PosEmo) were perceived to be useful. These findings are expected to help consumers effectively identify useful product reviews in the future.

Prediction of Break Indices in Korean Read Speech (국어 낭독체 발화의 운율경계 예측)

  • Kim Hyo Sook;Kim Chung Won;Kim Sun Ju;Kim Seoncheol;Kim Sam Jin;Kwon Chul Hong
    • MALSORI
    • /
    • no.43
    • /
    • pp.1-9
    • /
    • 2002
  • This study aims to model Korean prosodic phrasing using CART(classification and regression tree) method. Our data are limited to Korean read speech. We used 400 sentences made up of editorials, essays, novels and news scripts. Professional radio actress read 400sentences for about two hours. We used K-ToBI transcription system. For technical reason, original break indices 1,2 are merged into AP. Differ from original K-ToBI, we have three break index Zero, AP and IP. Linguistic information selected for this study is as follows: the number of syllables in ‘Eojeol’, the location of ‘Eojeol’ in sentence and part-of-speech(POS) of adjacent ‘Eojeol’s. We trained CART tree using above information as variables. Average accuracy of predicting NonIP(Zero and AP) and IP was 90.4% in training data and 88.5% in test data. Average prediction accuracy of Zero and AP was 79.7% in training data and 78.7% in test data.

  • PDF

Training of Fuzzy-Neural Network for Voice-Controlled Robot Systems by a Particle Swarm Optimization

  • Watanabe, Keigo;Chatterjee, Amitava;Pulasinghe, Koliya;Jin, Sang-Ho;Izumi, Kiyotaka;Kiguchi, Kazuo
    • 제어로봇시스템학회:학술대회논문집
    • /
    • 2003.10a
    • /
    • pp.1115-1120
    • /
    • 2003
  • The present paper shows the possible development of particle swarm optimization (PSO) based fuzzy-neural networks (FNN) which can be employed as an important building block in real life robot systems, controlled by voice-based commands. The PSO is employed to train the FNNs which can accurately output the crisp control signals for the robot systems, based on fuzzy linguistic spoken language commands, issued by an user. The FNN is also trained to capture the user spoken directive in the context of the present performance of the robot system. Hidden Markov Model (HMM) based automatic speech recognizers are developed, as part of the entire system, so that the system can identify important user directives from the running utterances. The system is successfully employed in a real life situation for motion control of a redundant manipulator.

  • PDF

Cross-cultural Validation of Instruments Measuring Health Beliefs about Colorectal Cancer Screening among Korean Americans

  • Lee, Shin-Young;Lee, Eunice E.
    • Journal of Korean Academy of Nursing
    • /
    • v.45 no.1
    • /
    • pp.129-138
    • /
    • 2015
  • Purpose: The purpose of this study was to report the instrument modification and validation processes to make existing health belief model scales culturally appropriate for Korean Americans (KAs) regarding colorectal cancer (CRC) screening utilization. Methods: Instrument translation, individual interviews using cognitive interviewing, and expert reviews were conducted during the instrument modification phase, and a pilot test and a cross-sectional survey were conducted during the instrument validation phase. Data analyses of the cross-sectional survey included internal consistency and construct validity using exploratory and confirmatory factor analysis. Results: The main issues identified during the instrument modification phase were (a) cultural and linguistic translation issues and (b) newly developed items reflecting Korean cultural barriers. Cross-sectional survey analyses during the instrument validation phase revealed that all scales demonstrate good internal consistency reliability (Cronbach's alpha=.72~.88). Exploratory factor analysis showed that susceptibility and severity loaded on the same factor, which may indicate a threat variable. Items with low factor loadings in the confirmatory factor analysis may relate to (a) lack of knowledge about fecal occult blood testing and (b) multiple dimensions of the subscales. Conclusion: Methodological, sequential processes of instrument modification and validation, including translation, individual interviews, expert reviews, pilot testing and a cross-sectional survey, were provided in this study. The findings indicate that existing instruments need to be examined for CRC screening research involving KAs.

The Loom-LAG for syntax analysis Adding a language-independent level to LAG

  • Schulze, Markus
    • Proceedings of the Korean Society for Language and Information Conference
    • /
    • 2002.02a
    • /
    • pp.411-420
    • /
    • 2002
  • The left-associative grammar model (LAG) has been applied successfully to the morphologic and syntactic analysis of various european and asian languages. The algebraic definition of the LAG is very well suited for the application to natural language processing as it inherently obeys de Saussure's second law (de Saussure, 1913, p. 103) on the linear nature of language, which phrase-structure grammar (PSG) and categorial grammar (CG) do not. This paper describes the so-called Loom-LAGs (LLAG) -a specialization of LAGs for the analysis of natural language. Whereas the only means of language-independent abstraction in ordinary LAG is the principle of possible continuations, LLAGs introduce a set of more detailed language-independent generalizations that form the so-called loom of a Loom-LAG. Every LLAG uses the very smut loom and adds the language-specific information in the form of a declarative description of the language -much like an ancient mechanised Jacquard-loom would take a program-card providing the specific pattern for the cloth to be woven. The linguistic information is formulated declaratively in so-called syntax plans that describe the sequential structure of clauses and phrases. This approach introduces the explicit notion of phrases and sentence structure to LAG without violating de Saussure's second law iud without leaving the ground of the original algebraic definition of LAG, LLAGS can in fact be shown to be just a notational variant of LAG -but one that is much better suited for the manual development of syntax grammars for the robust analysis of free texts.

  • PDF

Language (Meaning) and Cognitive Science (언어(특히 의미)와 인지과학)

  • Lee, Chung-Min
    • Proceedings of the Korean Society for Cognitive Science Conference
    • /
    • 2005.05a
    • /
    • pp.23-27
    • /
    • 2005
  • Humans perceptually segment events, but models that predict where events will be segmented are limited. Developing a detailed model may be hard because of the overlapping quality of events (i.e., one can smile and walk at the same time, but the endpoint of each event can be different). However, some aspects of events appear to be universally represented in the world's languages. For example, path, the trajectory of an object's movement, is one of the most universally encoded event features. Although it is generally encoded in the prepositions of English (e.g., up), in other languagesit is encoded in the verbs (e.g., descendere). Linguistic universals may represent basic levels of event perception. Here we consider how one of these, path, might be parsed. Because the spatiotemporal projection of paths to an observation point is similar to the spatial projection of objects, we tested the hypothesis that path segmentation and object segmentation would be based on similar image properties, such as discontinuities in orientation.

  • PDF

Effective Feature Extraction in the Individual frequency Sub-bands for Speech Recognition (음성인식을 위한 주파수 부대역별 효과적인 특징추출)

  • 지상문
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.7 no.4
    • /
    • pp.598-603
    • /
    • 2003
  • This paper presents a sub-band feature extraction approach in which the feature extraction method in the individual frequency sub-bands is determined in terms of speech recognition accuracy. As in the multi-band paradigm, features are extracted independently in frequency sub-regions of the speech signal. Since the spectral shape is well structured in the low frequency region, the all pole model is effective for feature extraction. But, in the high frequency region, the nonparametric transform, discrete cosine transform is effective for the extraction of cepstrum. Using the sub-band specific feature extraction method, the linguistic information in the individual frequency sub-bands can be extracted effectively for automatic speech recognition. The validity of the proposed method is shown by comparing the results of speech recognition experiments for our method with those obtained using a full-band feature extraction method.