• Title/Summary/Keyword: Sentence Type Identification

Search Result 4, Processing Time 0.018 seconds

Sentence Type Identification in Korean Applications to Korean-Sign Language Translation and Korean Speech Synthesis (한국어 문장 유형의 자동 분류 한국어-수화 변환 및 한국어 음성 합성에의 응용)

  • Chung, Jin-Woo;Lee, Ho-Joon;Park, Jong-C.
    • Journal of the HCI Society of Korea
    • /
    • v.5 no.1
    • /
    • pp.25-35
    • /
    • 2010
  • This paper proposes a method of automatically identifying sentence types in Korean and improving naturalness in sign language generation and speech synthesis using the identified sentence type information. In Korean, sentences are usually categorized into five types: declarative, imperative, propositive, interrogative, and exclamatory. However, it is also known that these types are quite ambiguous to identify in dialogues. In this paper, we present additional morphological and syntactic clues for the sentence type and propose a rule-based procedure for identifying the sentence type using these clues. The experimental results show that our method gives a reasonable performance. We also describe how the sentence type is used to generate non-manual signals in Korean-Korean sign language translation and appropriate intonation in Korean speech synthesis. Since the method of using sentence type information in speech synthesis and sign language generation is not much studied previously, it is anticipated that our method will contribute to research on generating more natural speech and sign language expressions.

  • PDF

Automatic Speech Style Recognition Through Sentence Sequencing for Speaker Recognition in Bilateral Dialogue Situations (양자 간 대화 상황에서의 화자인식을 위한 문장 시퀀싱 방법을 통한 자동 말투 인식)

  • Kang, Garam;Kwon, Ohbyung
    • Journal of Intelligence and Information Systems
    • /
    • v.27 no.2
    • /
    • pp.17-32
    • /
    • 2021
  • Speaker recognition is generally divided into speaker identification and speaker verification. Speaker recognition plays an important function in the automatic voice system, and the importance of speaker recognition technology is becoming more prominent as the recent development of portable devices, voice technology, and audio content fields continue to expand. Previous speaker recognition studies have been conducted with the goal of automatically determining who the speaker is based on voice files and improving accuracy. Speech is an important sociolinguistic subject, and it contains very useful information that reveals the speaker's attitude, conversation intention, and personality, and this can be an important clue to speaker recognition. The final ending used in the speaker's speech determines the type of sentence or has functions and information such as the speaker's intention, psychological attitude, or relationship to the listener. The use of the terminating ending has various probabilities depending on the characteristics of the speaker, so the type and distribution of the terminating ending of a specific unidentified speaker will be helpful in recognizing the speaker. However, there have been few studies that considered speech in the existing text-based speaker recognition, and if speech information is added to the speech signal-based speaker recognition technique, the accuracy of speaker recognition can be further improved. Hence, the purpose of this paper is to propose a novel method using speech style expressed as a sentence-final ending to improve the accuracy of Korean speaker recognition. To this end, a method called sentence sequencing that generates vector values by using the type and frequency of the sentence-final ending appearing in the utterance of a specific person is proposed. To evaluate the performance of the proposed method, learning and performance evaluation were conducted with a actual drama script. The method proposed in this study can be used as a means to improve the performance of Korean speech recognition service.

Analysis of Precedents Related to Child Abuse Cases in Child Care Centers and Its Implications (어린이집 아동학대 사건의 판례분석과 시사점 : 아동학대범죄의 처벌 등에 관한 특례법을 중심으로)

  • Jeon, Byeong-Joo;Kim, Keon-Ho
    • The Journal of the Korea Contents Association
    • /
    • v.17 no.4
    • /
    • pp.209-218
    • /
    • 2017
  • In Korea, child abuse in child care centers occurs continuously, and it is becoming a social problem. The government has intensified the supervision of management in child care centers, and through strong countermeasures and prevention against child abuse, it enacted the special act on child abuse in oder to enable children to become healthy members of society. In this study, this researcher gasped the legal application on child abusers, and analysed how the punishment for abusers changed according to the application of the special act on child abuse, through examining precedents of child abuse in child care centers. 21 cases related to child abuse cases were collected by searching homepage of the supreme court and each district court in this study. As a result of analyzing the precedents, the sentence of the defendant did not increase greatly, and it differed from the criminal identification of the people in cases of child abuse on which the special act on child abuse was applied. Therefore, it can be seen that there is a demand for more rigorous legal application for abusers in order to prevent child abuse in child care centers.

Financial Fraud Detection using Text Mining Analysis against Municipal Cybercriminality (지자체 사이버 공간 안전을 위한 금융사기 탐지 텍스트 마이닝 방법)

  • Choi, Sukjae;Lee, Jungwon;Kwon, Ohbyung
    • Journal of Intelligence and Information Systems
    • /
    • v.23 no.3
    • /
    • pp.119-138
    • /
    • 2017
  • Recently, SNS has become an important channel for marketing as well as personal communication. However, cybercrime has also evolved with the development of information and communication technology, and illegal advertising is distributed to SNS in large quantity. As a result, personal information is lost and even monetary damages occur more frequently. In this study, we propose a method to analyze which sentences and documents, which have been sent to the SNS, are related to financial fraud. First of all, as a conceptual framework, we developed a matrix of conceptual characteristics of cybercriminality on SNS and emergency management. We also suggested emergency management process which consists of Pre-Cybercriminality (e.g. risk identification) and Post-Cybercriminality steps. Among those we focused on risk identification in this paper. The main process consists of data collection, preprocessing and analysis. First, we selected two words 'daechul(loan)' and 'sachae(private loan)' as seed words and collected data with this word from SNS such as twitter. The collected data are given to the two researchers to decide whether they are related to the cybercriminality, particularly financial fraud, or not. Then we selected some of them as keywords if the vocabularies are related to the nominals and symbols. With the selected keywords, we searched and collected data from web materials such as twitter, news, blog, and more than 820,000 articles collected. The collected articles were refined through preprocessing and made into learning data. The preprocessing process is divided into performing morphological analysis step, removing stop words step, and selecting valid part-of-speech step. In the morphological analysis step, a complex sentence is transformed into some morpheme units to enable mechanical analysis. In the removing stop words step, non-lexical elements such as numbers, punctuation marks, and double spaces are removed from the text. In the step of selecting valid part-of-speech, only two kinds of nouns and symbols are considered. Since nouns could refer to things, the intent of message is expressed better than the other part-of-speech. Moreover, the more illegal the text is, the more frequently symbols are used. The selected data is given 'legal' or 'illegal'. To make the selected data as learning data through the preprocessing process, it is necessary to classify whether each data is legitimate or not. The processed data is then converted into Corpus type and Document-Term Matrix. Finally, the two types of 'legal' and 'illegal' files were mixed and randomly divided into learning data set and test data set. In this study, we set the learning data as 70% and the test data as 30%. SVM was used as the discrimination algorithm. Since SVM requires gamma and cost values as the main parameters, we set gamma as 0.5 and cost as 10, based on the optimal value function. The cost is set higher than general cases. To show the feasibility of the idea proposed in this paper, we compared the proposed method with MLE (Maximum Likelihood Estimation), Term Frequency, and Collective Intelligence method. Overall accuracy and was used as the metric. As a result, the overall accuracy of the proposed method was 92.41% of illegal loan advertisement and 77.75% of illegal visit sales, which is apparently superior to that of the Term Frequency, MLE, etc. Hence, the result suggests that the proposed method is valid and usable practically. In this paper, we propose a framework for crisis management caused by abnormalities of unstructured data sources such as SNS. We hope this study will contribute to the academia by identifying what to consider when applying the SVM-like discrimination algorithm to text analysis. Moreover, the study will also contribute to the practitioners in the field of brand management and opinion mining.