• Title/Summary/Keyword: morphological analysis corpus

Research Analysis in Automatic Fake News Detection (자동화기반의 가짜 뉴스 탐지를 위한 연구 분석)

  • Jwa, Hee-Jung;Oh, Dong-Suk;Lim, Heui-Seok
    • Journal of the Korea Convergence Society / v.10 no.7 / pp.15-21 / 2019
  • Research on detecting fake information has attracted considerable interest since the 2016 US presidential election. Information from unknown sources is produced in the form of news, and its rapid spread is fueled by the public's attraction to sensational and interesting issues. The wide use of mass communication platforms such as social network services makes this phenomenon worse. The Poynter Institute created the International Fact-Checking Network (IFCN) to provide guidelines for fact-checking by skilled professionals and released a "Code of Ethics" for fact-checking agencies. However, this approach is costly because a large number of experts is required to verify the authenticity of each article. Therefore, research on automated fake news detection technology that can identify fake news efficiently is gaining more attention. In this paper, we survey fake news detection systems and research, which are developing rapidly, mainly thanks to recent advances in deep learning. We also organize the shared tasks and training corpora that have been released in various forms, so that researchers can easily participate in this field, which deserves considerable research effort.

A Study on the Development of English Inflectional Morphemes Based on the CHILDES Corpus (CHILDES 코퍼스를 기반으로 한 아동의 영어 굴절형태소 발달 연구)

  • Min, Myung Sook;Jun, Jongsup;Lee, Sun-Young
    • Korean Journal of Cognitive Science / v.24 no.3 / pp.203-235 / 2013
  • The goal of this paper is to test findings in the literature about English-speaking children's acquisition of inflectional morphemes using a large-scale database. To this end, we obtained a 4.7-million-word corpus from the CHILDES (Child Language Data Exchange System) database and analyzed 1,630 British and American children's uses of English inflectional morphemes up to age 7. We analyzed the type and token frequencies, the type-token ratio (TTR), and the lexical diversity (D) for inflectional morphemes such as the present progressive -ing, the past tense -(e)d, and the comparative and superlative -er/-est, with reference to the children's nationality and age group. In summary, the correlation between the D value and children's age varied from morpheme to morpheme: we found no correlation for -ing, a marginal correlation for -ed, and a strong correlation for -er/-est. Our findings are consistent with Brown's (1973) classic observation that children learn progressive forms earlier than the past tense marker. In addition, overgeneralization errors were frequent for -ed but rare for -ing, showing a U-shaped developmental pattern at ages 2-3. Finally, American children showed higher D scores than British children, indicating that American children used inflectional morphemes with more word types than British children did. The significance of the present study lies in testing earlier findings in the literature by setting up a well-defined methodology for analyzing the entire CHILDES database.
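
The type frequency, token frequency, and type-token ratio mentioned in the abstract above can be illustrated with a minimal sketch. This is not the authors' method: the function name `suffix_stats`, the sample sentence, and the naive string-suffix matching are all assumptions made for illustration. A real analysis would rely on morphological annotation, since plain suffix matching also catches uninflected words such as "thing" or "king", and the lexical diversity measure D is not computed here.

```python
from collections import Counter


def suffix_stats(tokens, suffix):
    """Return (token count, type count, TTR) for words ending in `suffix`.

    Hypothetical helper: matches on the raw string ending only, which is
    a rough stand-in for real morphological tagging.
    """
    matches = [t.lower() for t in tokens if t.lower().endswith(suffix)]
    counts = Counter(matches)          # frequency of each distinct word form
    n_tokens = len(matches)            # every occurrence counts as a token
    n_types = len(counts)              # each distinct form counts once
    ttr = n_types / n_tokens if n_tokens else 0.0
    return n_tokens, n_types, ttr


# Made-up sample utterance, not taken from CHILDES.
sample = "he is running and jumping and running again".split()
n_tokens, n_types, ttr = suffix_stats(sample, "ing")
print(n_tokens, n_types, round(ttr, 3))
```

Here "running" occurs twice and "jumping" once, so the sample yields three tokens and two types. TTR is sensitive to sample size, which is one reason measures such as D are used alongside it.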

A Study on the Integration of Information Extraction Technology for Detecting Scientific Core Entities based on Large Resources (대용량 자원 기반 과학기술 핵심개체 탐지를 위한 정보추출기술 통합에 관한 연구)

  • Choi, Yun-Soo;Cheong, Chang-Hoo;Choi, Sung-Pil;You, Beom-Jong;Kim, Jae-Hoon
    • Journal of Information Management / v.40 no.4 / pp.1-22 / 2009
  • Large-scale information extraction plays an important role in advanced information retrieval as well as in question answering and summarization. Information extraction can be defined as the process of converting unstructured documents into formalized, tabular information; it consists of named-entity recognition, terminology extraction, coreference resolution, and relation extraction. Because these elementary technologies have so far been studied independently, integrating all the necessary information extraction processes is not trivial, owing to the diversity of their input/output formats, approaches, and operating environments. As a result, it is difficult to process scientific documents so that both named entities and technical terms are extracted at once. In this study, we define scientific core entities as a set of 10 types of named entities and technical terminologies in the biomedical domain. To extract these entities from scientific documents automatically and at once, we develop a framework for scientific core entity extraction that embraces all the pivotal language processors: a named-entity recognizer, a coreference resolver, and a terminology extractor. Each module of the integrated system has been evaluated with various corpora as well as KEEC 2009. The system will be used in various information service areas such as information retrieval, question answering (Q&A), document indexing, and dictionary construction.