• Title/Summary/Keyword: Sentence-level

Content-based Korean journal recommendation system using Sentence BERT (Sentence BERT를 이용한 내용 기반 국문 저널추천 시스템)

  • Yongwoo Kim;Daeyoung Kim;Hyunhee Seo;Young-Min Kim
    • Journal of Intelligence and Information Systems / v.29 no.3 / pp.37-55 / 2023
  • With the development of electronic journals and the emergence of various interdisciplinary studies, the selection of journals for publication has become a new challenge for researchers. Even if a paper is of high quality, it may face rejection due to a mismatch between the paper's topic and the scope of the journal. While research on assisting researchers in journal selection has been actively conducted for English-language journals, the same cannot be said for Korean journals. In this study, we propose a system that recommends Korean journals for submission. First, we use SBERT (Sentence BERT) to embed abstracts of previously published papers at the document level, compare the similarity between new documents and published papers, and recommend journals accordingly. Next, the order of the recommended journals is determined by considering the similarity of abstracts, keywords, and titles. Subsequently, journals that are similar to the top recommended journal from the previous stage are added by using a dictionary of words constructed for each journal, thereby enhancing recommendation diversity. The recommendation system built using this approach achieved a Top-10 accuracy of 76.6%, and the validity of the recommendation results was confirmed through user feedback. Furthermore, it was found that each step of the proposed framework contributes to improving recommendation accuracy. This study provides a new approach to recommending academic journals in the Korean language, which has not been actively studied before, and it also has practical implications, as the proposed framework can be easily applied to services.
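The first stage described in the abstract above (ranking journals by embedding similarity) can be sketched roughly as follows. This is only an illustration under stated assumptions: the embeddings are taken to be precomputed by some SBERT model, the function names `cosine` and `recommend_journals` are hypothetical, and scoring each journal by its single best-matching paper is an assumption, since the abstract does not say how paper-level similarities are aggregated. The keyword/title re-ranking and the per-journal word dictionary stages are not modelled.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def recommend_journals(query_vec, papers, top_k=10):
    """Rank journals by their most similar published abstract.

    papers: list of (journal_name, abstract_embedding) pairs; the
    embeddings are assumed to come from an SBERT model (not run here).
    """
    best = {}
    for journal, vec in papers:
        score = cosine(query_vec, vec)
        # Assumption: a journal's score is its best-matching paper.
        if journal not in best or score > best[journal]:
            best[journal] = score
    ranked = sorted(best.items(), key=lambda kv: kv[1], reverse=True)
    return [journal for journal, _ in ranked[:top_k]]

# Toy two-dimensional "embeddings" standing in for real SBERT vectors.
papers = [("J-A", [1.0, 0.0]), ("J-B", [0.0, 1.0]), ("J-A", [0.6, 0.8])]
print(recommend_journals([0.9, 0.1], papers, top_k=2))
```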

A Study on the Natural Language Generation by Machine Translation (영한 기계번역의 자연어 생성 연구)

  • Hong Sung-Ryong
    • Journal of Digital Contents Society / v.6 no.1 / pp.89-94 / 2005
  • In machine translation, the goal of natural language generation is to produce a target sentence that conveys the meaning of the source sentence by using a parse tree of the source sentence and target expressions. It provides the generator with linguistic structures, word mappings, parts of speech, and lexical information. The purpose of this study is to investigate the characteristics of Korean that could be used to establish an algorithm for speech recognition and speech synthesis. This is part of realizing a plan for automatic machine translation. The MT process is divided into morphemic analysis, semantic analysis, and syntactic construction.

The Study of Developing Korean SentiWordNet for Big Data Analytics : Focusing on Anger Emotion (빅데이터 분석을 위한 한국어 SentiWordNet 개발 방안 연구 : 분노 감정을 중심으로)

  • Choi, Sukjae;Kwon, Ohbyung
    • The Journal of Society for e-Business Studies / v.19 no.4 / pp.1-19 / 2014
  • Efforts to identify user perceptions embedded in big data are actively under way. These efforts try to score people's views of products, movies, and social issues by analyzing statements posted on Internet bulletin boards or SNS. This study therefore addresses how to find emotional vocabulary and determine its emotional degree. The method uses the results of previous studies for the basic emotional vocabulary and its degrees, and infers the extended emotional vocabulary from dictionary glosses. The result is four lists of emotional words: the basic emotional list, an extended stratum 1, level 1 list from the glosses of the basic vocabulary, an extended stratum 2, level 1 list from the glosses of non-emotional words, and an extended stratum 2, level 2 list from the glosses of glosses. We obtained the emotional degrees by applying sentence weights and emphasis multiplier values on the basis of the basic emotional list. The experimental results indicate that AND and OR sentences take the average degree of the included words as their weight, and that MULTIPLY sentences take a weight of 1.2 to 1.5 depending on the type of adverb. A NOT sentence is assumed to take a degree obtained by reducing and reversing the original word's emotional degree. The emphasis multiplier is taken to be 2 for stratum 1 and 3 for stratum 2.
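The sentence weighting rules reported in the abstract above can be illustrated with a small sketch. The function name `sentence_degree` and its parameters are hypothetical, and `not_factor` in particular is an assumed placeholder: the abstract says only that a NOT sentence reduces and reverses the original degree, without giving the amount. The stratum emphasis multipliers (2 and 3) are not modelled, because the abstract does not say how they combine with the sentence weights.

```python
def sentence_degree(kind, degrees, adverb_weight=1.2, not_factor=0.5):
    """Combine word-level emotional degrees according to sentence type.

    kind: "AND", "OR", "MULTIPLY" or "NOT".
    degrees: emotional degrees of the emotional words in the sentence.
    adverb_weight: 1.2 to 1.5, chosen by adverb type (range from the abstract).
    not_factor: assumed reduction factor for negation (not in the abstract).
    """
    if not degrees:
        raise ValueError("at least one emotional degree is required")
    if kind in ("AND", "OR"):
        # AND / OR sentences: average degree of the included words.
        return sum(degrees) / len(degrees)
    if kind == "MULTIPLY":
        if not 1.2 <= adverb_weight <= 1.5:
            raise ValueError("adverb_weight must lie in [1.2, 1.5]")
        return degrees[0] * adverb_weight
    if kind == "NOT":
        # Reduce and reverse the original word's degree.
        return -not_factor * degrees[0]
    raise ValueError(f"unknown sentence type: {kind}")

print(sentence_degree("AND", [2.0, 4.0]))
print(sentence_degree("MULTIPLY", [2.0], adverb_weight=1.5))
print(sentence_degree("NOT", [4.0]))
```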

An Automatic Extraction of English-Korean Bilingual Terms by Using Word-level Presumptive Alignment (단어 단위의 추정 정렬을 통한 영-한 대역어의 자동 추출)

  • Lee, Kong Joo
    • KIPS Transactions on Software and Data Engineering / v.2 no.6 / pp.433-442 / 2013
  • A set of bilingual terms is one of the most important resources for building language-related applications such as machine translation systems and cross-lingual information systems. In this paper, we introduce a new approach that automatically extracts candidate English-Korean bilingual terms by using a bilingual parallel corpus and a basic English-Korean lexicon. The approach can be useful even when the parallel corpus is small. Sentence alignment is first performed on the document-level parallel corpus. Words in a pair of aligned sentences are then aligned by referencing the basic bilingual lexicon. For words that remain unaligned within a pair of aligned sentences, several assumptions are applied in order to align bilingual term candidates across the two languages. Examples of these assumptions include the location of a sentence, relations between words, and linguistic information about the two languages. An experimental result shows approximately 71.7% accuracy for the English-Korean bilingual term candidates automatically extracted from a bilingual parallel corpus of size 1,000.
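The lexicon-based word alignment step in the abstract above might look roughly like this. It is a minimal sketch, not the author's method: `align_with_lexicon` is a hypothetical name, the lexicon is assumed to map an English word to a set of Korean translations, and the assumption-based alignment of the remaining words is not implemented, so the leftover words are simply returned as raw material for term candidates.

```python
def align_with_lexicon(en_tokens, ko_tokens, lexicon):
    """Align words of one aligned sentence pair using a basic lexicon.

    lexicon: dict mapping an English word to a set of Korean translations
    (an assumed data layout). Returns (aligned pairs, unaligned English
    words, unaligned Korean words).
    """
    aligned, used_en, used_ko = [], set(), set()
    for i, en in enumerate(en_tokens):
        translations = lexicon.get(en, set())
        for j, ko in enumerate(ko_tokens):
            # Each Korean token may be aligned at most once.
            if j not in used_ko and ko in translations:
                aligned.append((en, ko))
                used_en.add(i)
                used_ko.add(j)
                break
    left_en = [t for i, t in enumerate(en_tokens) if i not in used_en]
    left_ko = [t for j, t in enumerate(ko_tokens) if j not in used_ko]
    return aligned, left_en, left_ko

# Toy sentence pair; "translation" is missing from the toy lexicon,
# so it and its Korean counterpart are left over as a candidate pair.
lexicon = {"machine": {"기계"}, "system": {"시스템"}}
result = align_with_lexicon(
    ["machine", "translation", "system"], ["기계", "번역", "시스템"], lexicon
)
print(result)
```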

Recognition of Continuous Spoken Korean Language using HMM and Level Building (은닉 마르코프 모델과 레벨 빌딩을 이용한 한국어 연속 음성 인식)

  • 김경현;김상균;김항준
    • Journal of the Korean Institute of Telematics and Electronics C / v.35C no.11 / pp.63-75 / 1998
  • Because many co-articulation problems occur in continuous spoken Korean, several studies use words as the basic recognition unit. Although the word unit can solve this problem, it requires much memory and has difficulty fitting input speech to a word list. In this paper, we propose a hidden Markov model (HMM)-based recognition model that is an interconnection network of word HMMs for a syntax of sentences. To match the input sentence to the continuous word list in the network, we use a level building search algorithm. This system represents a large sentence set with relatively little memory and also has good extensibility. The experimental result for an airplane reservation system shows that it is a suitable method for a practical recognition system.

Vowel Compression due to Syllable Number in English and Korean

  • Yun, Il-Sung
    • Speech Sciences / v.9 no.4 / pp.165-173 / 2002
  • Strong compression effects in a stressed vowel due to the addition of syllables have been adopted as evidence for stress-timing. In relation to this, Yun (2002) investigated the compression effects of the number of syllables on Korean vowels. The results generally revealed that Korean had neither significant nor consistent anticipatory or backwards compression effects, especially at the sentence level. This led us to claim that Korean would not be a stress-timed language. But the language investigated in the study was only Korean, and further cross-linguistic research was needed to confirm the claim. In this study, Yun's (2002) sentence-level data are compared with Fowler's (1981) English data. The comparison reveals that Korean seems to be similar to English in the backwards compression effect, whereas the two languages are markedly different in the anticipatory compression effect. Thus, if English is a stress-timed language and the strong anticipatory compression effect is evidence in favour of stress-timing as is claimed, the present cross-linguistic study confirms Yun's (2002) suggestion that Korean is unlikely to be stress-timed. In addition, compression effects are revisited: the differences in vowel compression between English and Korean are discussed from the syntactic and phonological points of view.

English Critique and Verb Dictionary based on Extended Verb Pattern (확장 동사형에 기반한 동사사전과 영어 문장 검사기)

  • 차의영
    • Korean Journal of Cognitive Science / v.3 no.2 / pp.311-328 / 1992
  • The level and accuracy of English sentences generated by a human or machine translator are determined by the content of the verb dictionary and an effective generation algorithm. Conventional English critiques are not adequate for foreigners because they do not have a verb dictionary that includes verb patterns or the important grammatical constraints. In this paper, I propose a verb dictionary structure and an English sentence critique based on extended verb patterns, which is useful for checking and correcting mistakes in English sentences generated by a machine translator.

Scaling Reuse Detection in the Web through Two-way Boosting with Signatures and LSH

  • Kim, Jong Wook
    • Journal of Korea Multimedia Society / v.16 no.6 / pp.735-745 / 2013
  • The emergence of Web 2.0 technologies, such as blogs and wikis, enables even naive users to easily create and share content on the Web using freely available content sharing tools. The wide availability of almost free data and the promiscuous sharing of content through social networking platforms have created a content borrowing phenomenon, where the same content appears (in many cases in the form of extensive quotations) in different outlets. An immediate side effect of this phenomenon is that identifying which content is reused by whom is becoming a critical tool in social network analysis, including expert identification and analysis of information flow. Internet-scale reuse detection, however, poses extremely challenging scalability issues: considering the large size of user-created data on the web, it is essential that the techniques developed for content-reuse detection be fast and scalable. Thus, in this paper, we propose the $qSign_{lsh}$ algorithm, a mechanism for identifying multi-sentence content reuse among documents by efficiently combining sentence-level evidence. The experimental results show that $qSign_{lsh}$ significantly improves reuse detection speed and provides high recall.
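The abstract above does not spell out how $qSign_{lsh}$ works, so the following is not that algorithm. It is a generic sketch of the underlying idea (sentence signatures plus LSH banding, with sentence-level hits combined per document pair), using plain MinHash as a swapped-in technique. All function names, the band and row counts, and the `min_shared` threshold are assumptions made for illustration.

```python
import hashlib
import re
from collections import defaultdict
from itertools import combinations

def _hash(seed, token):
    """Deterministic 64-bit hash of a token under a given seed."""
    digest = hashlib.sha1(f"{seed}:{token}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")

def minhash_signature(sentence, num_hashes=8):
    """MinHash signature over the set of words in a sentence."""
    words = set(re.findall(r"\w+", sentence.lower()))
    if not words:
        return None
    return tuple(min(_hash(seed, w) for w in words) for seed in range(num_hashes))

def candidate_reuse_pairs(docs, bands=4, rows=2, min_shared=2):
    """Return document pairs with at least min_shared colliding sentence pairs."""
    buckets = defaultdict(set)  # (band index, band values) -> {(doc id, sentence index)}
    for doc_id, text in docs.items():
        sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
        for idx, sentence in enumerate(sentences):
            sig = minhash_signature(sentence, bands * rows)
            if sig is None:
                continue
            for b in range(bands):
                buckets[(b, sig[b * rows:(b + 1) * rows])].add((doc_id, idx))
    shared = defaultdict(set)  # (doc a, doc b) -> colliding sentence index pairs
    for members in buckets.values():
        for (da, ia), (db, ib) in combinations(sorted(members), 2):
            if da != db:
                shared[(da, db)].add((ia, ib))
    # Combine sentence-level evidence: require several colliding sentences.
    return {pair for pair, hits in shared.items() if len(hits) >= min_shared}

docs = {
    "a": "The cat sat on the mat. Dogs bark loudly at night. Unique line here.",
    "b": "Dogs bark loudly at night. The cat sat on the mat.",
    "c": "Completely different vocabulary appears within.",
}
print(candidate_reuse_pairs(docs))
```

Banding means that only sentences falling into the same bucket are ever compared, which is what keeps this kind of approach from needing all-pairs sentence comparison.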

The Relationship Between Young Children's Comprehension Ability and Story Making : The Development of Narrative (내러티브 발달 : 유아의 이야기 내용이해 및 꾸미기 능력간의 관계 분석)

  • Hwang, Yoon-Se
    • Korean Journal of Child Studies / v.28 no.2 / pp.39-53 / 2007
  • This study investigated the relationship between young children's comprehension and story making (narrative) by age and gender. Subjects were 109 3-, 4-, and 5-year-olds at two child care centers in K Province. Data were analyzed by two-way ANOVA and simple regression analysis. Results showed differences in comprehension among 3-, 4-, and 5-year-old children and differences in story making ability between 3- and 5-year-old children. Children's comprehension and story making were positively related. Specifically, there were significant relationships between children's comprehension and story construct concept, sentence structure level, and language (vocabulary and sentence structure). In sum, the results of this study reveal that young children's comprehension ability is partially related to story making ability by age.

An analysis of English pronunciation for high-level proficiency adult learners (발음 숙련도 상위 성인 학습자들의 영어 발음에 대한 분석)

  • Kim, Ji-Eun
    • Phonetics and Speech Sciences / v.10 no.2 / pp.39-44 / 2018
  • The purpose of this study is to investigate the English pronunciation of high-level adult Korean speakers based on a pronunciation proficiency test. For this purpose, suprasegmental features of one native English speaker and eight Korean speakers, such as sentence F0, the standard deviation of vowels, and the F0, duration, and intensity of stressed and unstressed vowels, were measured and analyzed. The major results show that (1) the sentence F0 of high-level adult Korean speakers was similar to that of the native English speaker, (2) their vowel durations were less diverse than those of the native English speaker, and (3) high-level adult Korean speakers use vowel duration more actively than F0 to indicate the stress assignment of vowels.