• Title/Summary/Keyword: corpus-based vocabulary list (코퍼스 기반 어휘 목록)

A Corpus-based English Syntax Academic Word List Building and its Lexical Profile Analysis (코퍼스 기반 영어 통사론 학술 어휘목록 구축 및 어휘 분포 분석)

  • Lee, Hye-Jin;Lee, Je-Young
    • The Journal of the Korea Contents Association / v.21 no.12 / pp.132-139 / 2021
  • This corpus-driven study compiled the most frequently occurring academic words in the domain of syntax and compared the extracted word list with the Academic Word List (AWL) of Coxhead (2000) and the General Service List (GSL) of West (1953) to examine their distribution and coverage within the syntax corpus. A specialized corpus of 546,074 tokens, composed of widely used, must-read syntax textbooks for English education majors, was loaded into and analyzed with AntWordProfiler 1.4.1. In terms of lexical frequency, the analysis identified 288 (50.5%) AWL word forms that appeared 16 times or more and 218 (38.2%) AWL items that occurred 15 times or fewer. The analysis also showed that the AWL and the GSL covered 9.19% and 78.92% of all tokens, respectively, so the two lists combined accounted for 88.11%. Given that the AWL is meant to serve broad disciplinary needs, the study highlights the need to compile a domain-specific academic word list as a lexical repertoire for promoting academic literacy and competence.
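
The coverage figures in this abstract are shares of running tokens that fall on a given word list. The following is a minimal Python sketch of that calculation, not the procedure AntWordProfiler itself runs: the tokenizer, the tiny `toy_gsl` and `toy_awl` sets, and the sample sentence are all hypothetical stand-ins, and the sketch matches exact word forms rather than word families.

```python
import re

def tokenize(text):
    """Lowercase the text and keep alphabetic tokens (a simplifying assumption)."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())

def coverage(tokens, word_list):
    """Return the percentage of running tokens that appear in word_list."""
    if not tokens:
        return 0.0
    hits = sum(1 for t in tokens if t in word_list)
    return 100.0 * hits / len(tokens)

# Hypothetical miniature lists; the real GSL and AWL are far larger.
toy_gsl = {"the", "a", "of", "is", "in", "and", "verb", "word"}
toy_awl = {"structure", "analyze", "constituent"}

sample = ("The structure of a clause is analyzed in terms of "
          "constituent structure and the verb.")
tokens = tokenize(sample)

gsl_cov = coverage(tokens, toy_gsl)   # share of tokens found in the toy GSL
awl_cov = coverage(tokens, toy_awl)   # share of tokens found in the toy AWL
combined = gsl_cov + awl_cov          # valid only because the lists do not overlap

print(f"GSL {gsl_cov:.2f}%  AWL {awl_cov:.2f}%  combined {combined:.2f}%")
```

Because the sketch does no lemmatization, "analyzed" is not counted under "analyze"; a family-based profiler would count it. The same additive logic underlies the reported totals: 9.19% plus 78.92% gives 88.11%.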

A Comparative Study of a New Approach to Keyword Analysis: Focusing on NBC (키워드 분석에 대한 최신 접근법 비교 연구: 성경 코퍼스를 중심으로)

  • Ha, Myoungho
    • Journal of Digital Convergence / v.19 no.7 / pp.33-39 / 2021
  • This paper analyzes the lexical properties of keyword lists extracted from the NLT Old Testament Corpus (NOTC), the NLT New Testament Corpus (NNTC), and the NLT Bible Corpus (NBC) to show that text dispersion keyness is more effective than corpus frequency keyness. For this purpose, the NOTC, with around 570,000 running words, and the NNTC, with about 200,000, were compiled from files downloaded from the NLT website of Bible Hub. Scott's (2020) WordSmith 8.0 was used to extract keyword lists by comparing a target corpus with a reference corpus. The results showed that text dispersion keyness captured the lexical properties of the keyword lists better than corpus frequency keyness and that it was the superior measure for generating keyword lists that satisfy both content generalizability and content distinctiveness.
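
The contrast between the two keyness measures can be illustrated with a short Python sketch. It assumes a two-cell log-likelihood statistic applied either to token counts (corpus frequency keyness) or to the number of texts containing the word (text dispersion keyness). The abstract does not state which statistic or settings were used in WordSmith, so the function names, the toy texts, and the choice of log-likelihood are all assumptions made for illustration.

```python
import math

def log_likelihood(a, b, c, d):
    """Two-cell log-likelihood: a, b are observed counts in the target and
    reference; c, d are the corresponding totals."""
    e1 = c * (a + b) / (c + d)   # expected count in the target
    e2 = d * (a + b) / (c + d)   # expected count in the reference
    ll = 0.0
    if a > 0:
        ll += a * math.log(a / e1)
    if b > 0:
        ll += b * math.log(b / e2)
    return 2.0 * ll

def frequency_keyness(word, target_texts, reference_texts):
    """Keyness from token counts over the whole corpus."""
    a = sum(t.count(word) for t in target_texts)
    b = sum(t.count(word) for t in reference_texts)
    c = sum(len(t) for t in target_texts)
    d = sum(len(t) for t in reference_texts)
    return log_likelihood(a, b, c, d)

def dispersion_keyness(word, target_texts, reference_texts):
    """Keyness from the number of texts in which the word occurs."""
    a = sum(1 for t in target_texts if word in t)
    b = sum(1 for t in reference_texts if word in t)
    return log_likelihood(a, b, len(target_texts), len(reference_texts))

# Hypothetical toy corpora: each text is a list of tokens.
target = [
    ["ark"] * 10 + ["grace"] + ["the"] * 9,   # "ark" is bunched in one text
    ["grace"] + ["the"] * 19,
    ["grace"] + ["the"] * 19,
]
reference = [["the"] * 20 for _ in range(3)]

freq_ark = frequency_keyness("ark", target, reference)
freq_grace = frequency_keyness("grace", target, reference)
disp_ark = dispersion_keyness("ark", target, reference)
disp_grace = dispersion_keyness("grace", target, reference)
```

In this toy case "ark" ranks above "grace" by frequency because of one text that repeats it, while "grace" ranks above "ark" by dispersion because it occurs in every target text. The statistic is unsigned, so a fuller implementation would also check which corpus the word is over-represented in.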

Vocabulary Difference of South and North Korean English Textbook (남북한 영어교과서 어휘의 차이)

  • Kim, Jeong-ryeol
    • The Journal of the Korea Contents Association / v.20 no.1 / pp.107-116 / 2020
  • This paper explores the vocabulary differences between South and North Korean English textbooks as a first step toward a unified vocabulary list. To this end, South and North Korean English textbooks from the 2000s and 2010s were digitized into a corpus of text files, and a vocabulary list was constructed from the corpus, with reference to concordances for vocabulary use and context, using AntConc 3.5.7. The vocabulary lists of the two sets of textbooks were compared in terms of the quantity and quality of the English vocabulary taught, and both quantitative and qualitative differences were found between the South and North Korean textbook corpora. Both South and North Korea aim for students to learn about 3,000 words over the course of their English education. The North Korean English textbooks contain more specialized academic vocabulary, while the South Korean English textbooks are constrained by strict vocabulary control that does not allow such flexibility. Differences in vocabulary and its use stem from the capitalist market economy of the South and the socialist planned economy of the North. Differences are also attributed to the appearance of religious words and grammatical vocabulary.

A Diachronic Lexical Analysis of the North Korean English Textbooks (북한 영어 교과서 어휘의 통시적 분석)

  • Kim, Jiyoung;Lee, Je-Young;Kim, Jeong-ryeol
    • The Journal of the Korea Contents Association / v.17 no.4 / pp.331-341 / 2017
  • This paper analyzes the English vocabulary of North Korean textbooks diachronically using a purpose-built English textbook corpus. The North Korean English textbooks, obtained from the Information Center on North Korea of the Ministry of Unification, were divided into two sets, before and after the Kim Jong-Il era, at 1996, the year in which the curriculum was revised. They were stored as text files and analyzed with WordSmith Tools 7.0. The vocabulary size of the revised textbooks increased after the curriculum reorganization, but the number of vocabulary types and vocabulary diversity decreased. After the curriculum revision, many words related to the establishment of the Kim Jong-Il system appeared as keywords. Some vocabulary also reflected the economic and social life of North Korea. In addition, a comparison of the 100 highest-frequency words with the keywords suggests that the vocabulary of North Korean English textbooks is gradually shifting from content related to written language toward communicative content.
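
A text can grow in running words while its lexical diversity falls, which is one way to read the pattern this abstract reports. The Python sketch below uses a plain type-token ratio as the diversity measure. The abstract does not say which measure was computed, so the function and the two toy token lists are assumptions for illustration only.

```python
def type_token_ratio(tokens):
    """Distinct word forms (types) divided by running words (tokens)."""
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)

# Hypothetical toy data: the "after" text is longer but more repetitive.
before = "we read a book on the farm today".split()
after = "we go we go to the shop to the shop".split()

ttr_before = type_token_ratio(before)
ttr_after = type_token_ratio(after)

print(len(before), len(set(before)), round(ttr_before, 2))
print(len(after), len(set(after)), round(ttr_after, 2))
```

A raw type-token ratio is sensitive to text length, so comparisons between corpora of different sizes are usually made on equal-sized samples or with a length-normalized variant.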

Construction and Evaluation of a Sentiment Dictionary Using a Web Corpus Collected from Game Domain (게임 도메인 웹 코퍼스를 이용한 감성사전 구축 및 평가)

  • Jeong, Woo-Young;Bae, Byung-Chull;Cho, Sung Hyun;Kang, Shin-Jin
    • Journal of Korea Game Society / v.18 no.5 / pp.113-122 / 2018
  • This paper describes an approach to building and evaluating a sentiment dictionary using a web corpus in the game domain. To build the dictionary, we collected vocabulary from game-related web documents on a domestic portal site using the Twitter Korean Processor. From the collected vocabulary, we selected the words whose POS tags are verb or adjective and assigned a sentiment score to each selected word. To evaluate the constructed dictionary, we calculated F1 scores from precision and recall, using Korean-SWN, which is based on the English SentiWordNet (SWN). The evaluation results show average F1 scores of 0.85 for adjectives and 0.77 for verbs.
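
The F1 scores reported here combine precision and recall as their harmonic mean. The Python sketch below shows one way such an evaluation could be set up, treating a reference lexicon as the gold standard for a single polarity label. The abstract does not describe the exact protocol, so the function, the labels, and the English toy entries are hypothetical, and words missing from the reference are counted as errors by assumption.

```python
def precision_recall_f1(predicted, gold, label):
    """Score one polarity label of a built dictionary against a reference.

    predicted, gold: dicts mapping a word to its polarity label.
    """
    tp = sum(1 for w, l in predicted.items() if l == label and gold.get(w) == label)
    fp = sum(1 for w, l in predicted.items() if l == label and gold.get(w) != label)
    fn = sum(1 for w, l in gold.items() if l == label and predicted.get(w) != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Hypothetical toy entries standing in for the built dictionary and the reference.
built = {"fun": "pos", "boring": "neg", "great": "pos", "laggy": "pos"}
reference = {"fun": "pos", "boring": "neg", "great": "pos",
             "laggy": "neg", "smooth": "pos"}

precision, recall, f1 = precision_recall_f1(built, reference, "pos")
print(f"precision={precision:.2f} recall={recall:.2f} F1={f1:.2f}")
```

Averaging the per-label F1 values within a part of speech would give figures comparable in kind to the 0.85 and 0.77 reported for adjectives and verbs.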