• Title/Summary/Keyword: Sentence Feature


Variation of Cannonical Sentence Structure in Korean & Japanese Dialects & its Implication

  • Khym, Han-gyoo
    • International Journal of Internet, Broadcasting and Communication / v.7 no.2 / pp.142-148 / 2015
  • The main purpose of this squib is to provide a new principled account of variation in canonical sentence structure in Korean and Japanese, based on linguistic data commonly observed in some dialects of both languages. Unlike English, in which a Comp(lementizer) such as 'that' in an embedded clause drops freely as long as the ECP (Lasnik & Saito 1992) is obeyed, some dialects of Korean and Japanese show data very different from those of English, leading us to reasonably doubt the traditionally accepted paradigm in which the canonical sentence structure is CP for all languages. In this squib I propose, based on Korean and Japanese dialects and by developing the Minimal Structure Principle (MSP) (Bošković 1997, p. 25), that the canonical structure of a sentence is not fixed from the beginning as one single maximal category, CP. Instead, it should be decided to be either CP or IP, based on the feature [±markedness] and the MSP, and the marked (or non-canonical) embedded sentence needs to satisfy the ECP for adjacency (or feature-licensing by the matrix verb in MP terminology).

Modality-Based Sentence-Final Intonation Prediction for Korean Conversational-Style Text-to-Speech Systems

  • Oh, Seung-Shin;Kim, Sang-Hun
    • ETRI Journal / v.28 no.6 / pp.807-810 / 2006
  • This letter presents a prediction model for sentence-final intonation in Korean conversational-style text-to-speech systems, in which we introduce the linguistic feature of 'modality' as a new parameter. Based on their function and meaning, we classify tonal forms in speech data into tone types meaningful for speech synthesis and use the result of this classification to build our prediction model with a tree-structured classification algorithm. To show that modality is more effective for the prediction model than features such as sentence type or speech act, an experiment is performed on a test set of 970 utterances with a training set of 3,883 utterances. The results show that modality contributes more to the determination of sentence-final intonation than sentence type or speech act, and that prediction accuracy improves by up to 25% when the feature of modality is introduced.


Hypernews Detection using Sentence BERT Embedding (Sentence BERT 임베딩을 이용한 과편향 뉴스 판별)

  • Lim, Jungwoo;Whang, Taesun;Oh, Dongsuk;Yang, Kisu;Lim, Heuiseok
    • Annual Conference on Human and Language Technology / 2019.10a / pp.388-391 / 2019
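  • Hyperpartisan news detection is the task of determining whether a news article is biased toward a particular person or political party. A feature-based ELMo + CNN model has been proposed for this task, but it is limited in that it uses the average of word embeddings rather than a document embedding. This paper therefore follows the feature-based approach and proposes a hyperpartisan news detection model built on feature-based SentBERT, which uses the document embeddings of Sentence-BERT (SentBERT). To demonstrate the effectiveness of the proposed model, we ran comparative experiments combining ELMO, BERT, and SBERT with CNN and BiLSTM, and the proposed model outperformed the existing state-of-the-art model by 1.3 percentage points in F1 score.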


Event Sentence Extraction for Online Trend Analysis (온라인 동향 분석을 위한 이벤트 문장 추출 방안)

  • Yun, Bo-Hyun
    • The Journal of the Korea Contents Association / v.12 no.9 / pp.9-15 / 2012
  • Conventional event sentence extraction research does not learn the 3W features in the learning step; it only applies a rule on whether the 3W features exist in the extraction step. This paper presents a sentence-weight-based event sentence extraction method that calculates the weights of the 3W features in the learning step and applies those weights in the extraction step. The experimental results show that keeping the top 30% of features ranked by TF×IDF weighting works well for feature filtering. In the real estate domain of public issues, the performance of the sentence-weight-based method is improved by the who and when features of the 3W set. Moreover, in the same domain, the sentence-weight-based method outperforms the other machine-learning-based extraction methods.
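
The feature-filtering step in the abstract above (keeping the top 30% of features by TF×IDF weight) can be illustrated with a minimal sketch. The abstract does not give the exact weighting formula or how the 3W features are scored, so the function names `tfidf_scores` and `top_fraction`, the summed-score ranking, and the toy documents are all assumptions made for illustration only.

```python
import math
from collections import Counter


def tfidf_scores(docs):
    """Sum each term's TF x IDF weight over a list of tokenized documents.

    Assumed weighting: TF = count / document length, IDF = ln(N / document frequency).
    """
    n_docs = len(docs)
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(doc))  # count each term once per document

    scores = Counter()
    for doc in docs:
        for term, count in Counter(doc).items():
            tf = count / len(doc)
            idf = math.log(n_docs / doc_freq[term])
            scores[term] += tf * idf
    return scores


def top_fraction(scores, fraction=0.3):
    """Keep the highest-scoring fraction of terms (ties broken alphabetically)."""
    ranked = sorted(scores, key=lambda term: (-scores[term], term))
    keep = max(1, math.ceil(len(ranked) * fraction))
    return ranked[:keep]


# Hypothetical toy corpus; not data from the paper.
docs = [
    ["price", "rise", "seoul"],
    ["price", "fall", "busan"],
    ["price", "rise", "tax"],
]
selected = top_fraction(tfidf_scores(docs), fraction=0.3)
```

A term that occurs in every document (here `price`) gets an IDF of zero and is filtered out, which is the usual motivation for ranking features this way.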

Language-Independent Sentence Boundary Detection with Automatic Feature Selection

  • Lee, Do-Gil
    • Journal of the Korean Data and Information Science Society / v.19 no.4 / pp.1297-1304 / 2008
  • This paper proposes a machine learning approach for language-independent sentence boundary detection. The proposed method requires no heuristic rules or language-specific features, such as part-of-speech information, a list of abbreviations, or proper names. Using only language-independent features, we perform experiments on both an inflectional language and an agglutinative language with fairly different characteristics (in this paper, English and Korean, respectively), and we obtain good performance in both languages. We have also experimented with the method under a wide range of experimental conditions, especially for the selection of useful features.


Classification of Cognitive States from fMRI data using Fisher Discriminant Ratio and Regions of Interest

  • Do, Luu Ngoc;Yang, Hyung Jeong
    • International Journal of Contents / v.8 no.4 / pp.56-63 / 2012
  • In recent decades, the analysis of human brain activity has advanced through the use of functional Magnetic Resonance Imaging (fMRI). fMRI data provide a sequence of three-dimensional images of brain activity, which can be used to detect instantaneous cognitive states by applying machine learning methods. In this paper, we propose a new approach for distinguishing cognitive states such as "observing a picture" versus "reading a sentence" and "reading an affirmative sentence" versus "reading a negative sentence". Since fMRI data are high dimensional (about 100,000 features in each sample), extremely sparse, and noisy, feature selection is a very important step for increasing classification accuracy and reducing processing time. We used the Fisher Discriminant Ratio to select the most discriminative features from some Regions of Interest (ROIs). The experimental results showed that our approach achieved the best performance compared to other feature extraction methods, with an average accuracy of approximately 95.83% for the first study and 99.5% for the second study.
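
As a rough illustration of the feature selection named in the abstract above, the sketch below ranks features by a two-class Fisher Discriminant Ratio. The abstract does not state the exact formula the authors used, so the common form (difference of class means squared, divided by the sum of class variances) is assumed here; the helper names and the toy samples are hypothetical, and the restriction to ROIs is not modeled.

```python
from statistics import mean, pvariance


def fisher_ratio(class_a, class_b):
    """Two-class Fisher ratio for one feature: (mean_a - mean_b)^2 / (var_a + var_b)."""
    spread = pvariance(class_a) + pvariance(class_b)
    gap = (mean(class_a) - mean(class_b)) ** 2
    if spread == 0:
        # Both classes are constant: perfectly separable or indistinguishable.
        return float("inf") if gap > 0 else 0.0
    return gap / spread


def select_features(samples_a, samples_b, k):
    """Return the indices of the k features with the highest Fisher ratio."""
    n_features = len(samples_a[0])
    ratios = [
        fisher_ratio([s[j] for s in samples_a], [s[j] for s in samples_b])
        for j in range(n_features)
    ]
    return sorted(range(n_features), key=lambda j: -ratios[j])[:k]


# Hypothetical toy data: rows are samples, columns are features.
picture = [[1.0, 5.0], [2.0, 5.5], [3.0, 4.5]]
sentence = [[1.5, 9.0], [2.5, 9.5], [2.0, 8.5]]
best = select_features(picture, sentence, k=1)
```

Feature 0 has the same mean in both classes and scores zero, while feature 1 separates the classes cleanly, so only feature 1 is kept. With about 100,000 features per sample, the same ranking would be applied per feature and only the top few retained.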

The Effect of the Sentence Location on Arabic Sentiment Analysis

  • Alotaibi, Saud S.
    • International Journal of Computer Science & Network Security / v.22 no.5 / pp.317-319 / 2022
  • Morphologically rich languages such as Arabic need more investigation and methods to improve sentiment analysis. Using all parts of a document in sentiment analysis may add unnecessary information to the classifier. Therefore, this paper presents ongoing work on using sentence location as a feature in Arabic sentiment analysis. Our proposed method employs supervised sentiment classification, enriching the feature space model with some information from the document. The experiments and evaluations conducted in this work show that the proposed feature improves the performance of the classifier for Arabic sentiment analysis compared to the baseline model.

Generic Document Summarization using Coherence of Sentence Cluster and Semantic Feature (문장군집의 응집도와 의미특징을 이용한 포괄적 문서요약)

  • Park, Sun;Lee, Yeonwoo;Shim, Chun Sik;Lee, Seong Ro
    • Journal of the Korea Institute of Information and Communication Engineering / v.16 no.12 / pp.2607-2613 / 2012
  • The results of inherent-knowledge-based generic summarization are influenced by the composition of sentences in the document set. To resolve this problem, this paper proposes a new generic document summarization method that uses clustering of the semantic features of documents and the coherence of document clusters. The proposed method clusters sentences using semantic features derived from NMF (non-negative matrix factorization), which allows it to classify document topic groups because the inherent structure of the documents is well represented by the sentence clusters. In addition, the method can improve the quality of summarization because the important sentences are extracted using the coherence of sentence clusters and cluster refinement by re-clustering. The experimental results demonstrate that applying the proposed method to generic summarization achieves better performance than other generic document summarization methods.

Effectiveness of Fuzzy Graph Based Document Model

  • Aswathy M R;P.C. Reghu Raj;Ajeesh Ramanujan
    • KSII Transactions on Internet and Information Systems (TIIS) / v.18 no.8 / pp.2178-2198 / 2024
  • Graph-based document models have good capabilities to reveal inter-dependencies among unstructured text data. Natural language processing (NLP) systems that use such models as an intermediate representation have shown good performance. This paper proposes a novel fuzzy graph-based document model and demonstrates its effectiveness by applying fuzzy logic tools to text summarization. The proposed system accepts a text document as input and identifies some of its sentence-level features, namely sentence position, sentence length, numerical data, thematic word, proper noun, title feature, upper case feature, and sentence similarity. The fuzzy membership value of each feature is computed from the sentences. We also propose a novel algorithm to construct the fuzzy graph as an intermediate representation of the input document. The Recall-Oriented Understudy for Gisting Evaluation (ROUGE) metric is used to evaluate the model. An evaluation based on different quality metrics was also performed to verify the effectiveness of the model. The ANOVA test confirms the hypothesis that the proposed model improves summarizer performance by 10% compared with state-of-the-art summarizers employing alternate intermediate representations for the input text.

Pseudo Feature Point Removal using Pixel Connectivity Tracing (픽셀 연결성 추적을 이용한 의사 특징점 제거)

  • Kim, Kang;Lee, Keon-Ik
    • Journal of the Korea Society of Computer and Information / v.16 no.8 / pp.95-101 / 2011
  • This paper studies the removal of pseudo feature points using pixel connectivity tracing. The crossing method is commonly used for feature extraction, but it extracts many pseudo feature points. To remove these wrongly extracted features, the eight pixels around a bifurcation are traced: a point that satisfies the conditions is extracted as an actual feature point, and a point that does not is removed as a pseudo feature point. To evaluate performance, the feature points extracted with the crossing method and with pixel connectivity tracing were compared against the actual feature points. The experimental results show that pixel connectivity tracing removed about 47%, 40%, and 30% of the pseudo feature points for arch, whorl, and loop patterns, respectively.
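
The "crossing" method mentioned in the abstract above is not defined there. Assuming it refers to the widely used crossing number technique on a thinned binary image, the sketch below shows how the eight pixels around a candidate point can be inspected. The function names, the 0/1 image encoding, and the toy images are hypothetical, and the paper's own tracing conditions for rejecting pseudo feature points are not reproduced.

```python
# Offsets of the 8 neighbours in clockwise order, starting from the pixel
# to the right (rows grow downward).
NEIGHBOUR_OFFSETS = [
    (0, 1), (1, 1), (1, 0), (1, -1),
    (0, -1), (-1, -1), (-1, 0), (-1, 1),
]


def crossing_number(image, row, col):
    """Half the number of 0/1 transitions met when walking around the 8 neighbours.

    Assumes (row, col) is an interior pixel of a binary image (values 0 or 1).
    """
    ring = [image[row + dr][col + dc] for dr, dc in NEIGHBOUR_OFFSETS]
    transitions = sum(abs(ring[i] - ring[(i + 1) % 8]) for i in range(8))
    return transitions // 2


def classify_point(image, row, col):
    """Label an interior pixel by its crossing number."""
    if image[row][col] != 1:
        return "background"
    labels = {1: "ending", 3: "bifurcation"}
    return labels.get(crossing_number(image, row, col), "other")


# Hypothetical 3x3 patches; the centre pixel is the candidate point.
ending_patch = [
    [0, 0, 0],
    [0, 1, 1],
    [0, 0, 0],
]
fork_patch = [
    [1, 0, 1],
    [0, 1, 0],
    [0, 1, 0],
]
```

Calling `classify_point(ending_patch, 1, 1)` and `classify_point(fork_patch, 1, 1)` labels the two centres as an ending and a bifurcation. A tracing step like the one the abstract describes would then follow the ridge pixels outward from each candidate to decide whether it is an actual or a pseudo feature point.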