• Title/Abstract/Keywords: Corpus Analysis


Corpus Annotation for the Linguistic Analysis of Reference Relations between Event and Spatial Expressions in Text (텍스트 내 사건-공간 표현 간 참조 관계 분석을 위한 말뭉치 주석)

  • Chung, Jin-Woo;Lee, Hee-Jin;Park, Jong C.
    • Language and Information
    • /
    • v.18 no.2
    • /
    • pp.141-168
    • /
    • 2014
  • Recognizing spatial information associated with events expressed in natural language text is essential not only for the interpretation of such events but also for the understanding of the relations among them. However, spatial information is mentioned far less often than events, and the association between event and spatial expressions is also highly implicit in a text. This makes it difficult to automate the extraction of spatial information associated with events from the text. In this paper, we give a linguistic analysis of how spatial expressions are associated with event expressions in a text. We first present issues in annotating narrative texts with reference relations between event and spatial expressions, and then discuss surface-level linguistic characteristics of such relations based on the annotated corpus, to provide insight into developing an automated recognition method.


Predicting CEFR Levels in L2 Oral Speech, Based on Lexical and Syntactic Complexity

  • Hu, Xiaolin
    • Asia Pacific Journal of Corpus Research
    • /
    • v.2 no.1
    • /
    • pp.35-45
    • /
    • 2021
  • With the widespread adoption of the Common European Framework of Reference (CEFR) scales, many studies attempt to apply them in routine teaching and rater training, while more evidence regarding criterial features at different CEFR levels is still urgently needed. The current study aims to explore complexity features that distinguish and predict CEFR proficiency levels in oral performance. Using a quantitative, corpus-based approach, this research analyzed lexical and syntactic complexity features over 80 transcriptions (covering the A1, A2, and B1 CEFR levels and native speakers), based on an interview test, the Standard Speaking Test (SST). ANOVA and correlation analysis were conducted to exclude insignificant complexity indices before the discriminant analysis. As a result, distinctive differences in complexity between CEFR speaking levels were observed, and with a combination of six major complexity features as predictors, 78.8% of the oral transcriptions were classified into the appropriate CEFR proficiency levels. This further confirms the possibility of predicting the CEFR level of L2 learners based on their objective linguistic features. This study can serve as an empirical reference in language pedagogy, especially for L2 learners' self-assessment and teachers' prediction of students' proficiency levels. It also offers implications for the validation of the rating criteria and the improvement of the rating system.

A Study to Rethink the Components of Teaching Korean Genitive Particle '의': Based on the Errors in Korean Learners' Corpus (한국어 학습자 대상 관형격 조사 '의'의 교육 내용 재고: 학습자 말뭉치에 나타난 오류를 바탕으로)

  • Soo-Hyun Lee;Ji-Young Sim
    • Journal of the Korean Society of Industry Convergence
    • /
    • v.26 no.3
    • /
    • pp.443-454
    • /
    • 2023
  • The purpose of this study is to reveal Korean learners' usage patterns of the genitive particle '의' according to semantic classification, so that they can inform decisions about the content and methods of related education. The study adopts a quantitative analysis using the learner corpus established by the National Institute of Korean Language. The analysis shows that, as proficiency increases, both the overall frequency of '의' and the number of senses used increase. However, the frequency of errors also increases. As for the usage pattern of each sense, the meaning of 'ownership, belonging' is the most frequent, followed by 'acting entity', 'kinship, social relations', and 'relationship(area)'. In conclusion, the meanings of 'acting entity' and 'relationship(area)' need to be supplemented with explicit instruction. Other meanings need further discussion, and decisions should be made in consideration of learning purpose and proficiency.

Text Mining Analysis Technique on ECDIS Accident Report (텍스트 마이닝 기법을 활용한 ECDIS 사고보고서 분석)

  • Lee, Jeong-Seok;Lee, Bo-Kyeong;Cho, Ik-Soon
    • Journal of the Korean Society of Marine Environment & Safety
    • /
    • v.25 no.4
    • /
    • pp.405-412
    • /
    • 2019
  • SOLAS requires that ECDIS be installed on ships of more than 500 gross tonnage engaged in international navigation by the first inspection after July 1, 2018. Several accidents related to the use of ECDIS have occurred since its installation as a new major navigation instrument. The 12 incident reports issued by MAIB, BSU, BEAmer, DMAIB, and DSB were analyzed, and the causes of the accidents were determined to be related to the operation of the navigator and the ECDIS system. The text was analyzed using the R-program to quantitatively analyze words related to the causes of the accidents. We used text mining techniques such as Wordcloud, Wordnetwork and Wordweight to represent the importance of words according to their frequency of derivation. Wordcloud uses the N-gram model as a way of expressing the frequency of used words in cloud form. In the uni-gram analysis of the N-gram model, the word "ECDIS" occurred most often, and the bi-gram analysis showed that the term "Safety Contour" was used most frequently. Based on the bi-gram analysis, the causative words are classified into the officer and the ECDIS system, and the related words are represented by Wordnetwork. Finally, the words related to the officer and the ECDIS system were compiled into word corpora, and Wordweight was applied to analyze the change in corpus frequency by year. Analyzing the tendency of corpus variation with a trend line graph shows that, more recently, the corpus of the officer has decreased while the corpus of the ECDIS system has gradually increased.
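
The uni-gram and bi-gram frequency counts described in this abstract can be illustrated with a minimal sketch. It is written in Python rather than the R program the authors used, and the helper `ngram_counts` and the sample sentences are hypothetical, not taken from the incident reports.

```python
from collections import Counter
import re


def ngram_counts(text, n):
    """Count n-grams over lowercase alphabetic tokens (hypothetical helper)."""
    tokens = re.findall(r"[a-z]+", text.lower())
    # Slide a window of width n over the token list.
    # Note: this simple version lets n-grams cross sentence boundaries.
    grams = zip(*(tokens[i:] for i in range(n)))
    return Counter(" ".join(gram) for gram in grams)


# Invented sample text, used only to exercise the counting logic.
sample = (
    "The officer set the safety contour on the ECDIS. "
    "The ECDIS alarm used the safety contour value."
)

unigrams = ngram_counts(sample, 1)
bigrams = ngram_counts(sample, 2)

print(unigrams.most_common(3))
print(bigrams.most_common(3))
```

A real analysis would also remove stop words such as "the" before ranking, since they otherwise dominate the uni-gram counts.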

MHC Class II+ (HLA-DP-like) Cells in the Cow Reproductive Tract: I. Immunolocalization and Distribution of MHC Class II+ Cells in Uterus at Different Phases of the Estrous Cycle

  • Eren, U.;Sandikci, M.;Kum, S.;Eren, V.
    • Asian-Australasian Journal of Animal Sciences
    • /
    • v.21 no.1
    • /
    • pp.35-41
    • /
    • 2008
  • This study was undertaken to investigate the distribution of major histocompatibility complex class II positive (MHC II+) (HLA-DP-like) cells in the cow uterus (cervix, corpus and cornu uteri) and to compare these cells between the estrus and diestrus phases of the estrous cycle. Twenty-nine multiparous cows were used. Tissue samples from the middle of the cervix, the corpus and the right cornu were taken immediately after slaughter at the estrus or diestrus phase. Streptavidin-biotin peroxidase complex staining was used to detect MHC II+ cells. The number of MHC II+ cells per unit area of tissue was counted using image analysis software under a light microscope. Numerous MHC II+ cells were found in the endometrium (cervix, corpus and cornu uteri) in both estrus and diestrus. MHC II+ cells were found in the surface epithelium of the cervix uteri in diestrus, but in the corpus uteri in both estrus and diestrus and in the cornu uteri in estrus. MHC II+ cells were also found freely in the lumen of the glands and between the gland epithelia of the corpus and cornu uteri in both estrus and diestrus. There were also MHC II+ cells in the connective tissue of the myometrium and perimetrium (outside the endometrium) and around the blood vessels. Endothelial cells were frequently positive for MHC II staining. More MHC II+ cells were found in the endometrium than outside the endometrium in both estrus and diestrus (p<0.001). However, there was no difference in the numbers of positive cells between estrus and diestrus either in the endometrium or outside it. These results are the first evidence for HLA-DP-like MHC II+ cells in the bovine uterus. They indicate that antigen presentation by HLA-DP-like MHC II+ cells of the uterus is not influenced by hormonal status.

The f0 distribution of Korean speakers in a spontaneous speech corpus

  • Yang, Byunggon
    • Phonetics and Speech Sciences
    • /
    • v.13 no.3
    • /
    • pp.31-37
    • /
    • 2021
  • The fundamental frequency, or f0, is an important acoustic measure in the prosody of human speech. The current study examined the f0 distribution of a corpus of spontaneous speech in order to provide normative data for Korean speakers. The corpus consists of 40 speakers talking freely about their daily activities and their personal views. Praat scripts were created to collect f0 values, and a majority of obvious errors were corrected manually by watching and listening to the f0 contour on a narrow-band spectrogram. Statistical analyses of the f0 distribution were conducted using R. The results showed that the f0 values of all the Korean speakers were right-skewed, with a pointy distribution. The speakers produced spontaneous speech within a frequency range of 274 Hz (from 65 Hz to 339 Hz), excluding statistical outliers. The mode of the total f0 data was 102 Hz. The female f0 range, with a bimodal distribution, appeared wider than that of the male group. Regression analyses based on age and f0 values yielded negligible R-squared values. As the mode of an individual speaker could be predicted from the median, either the median or mode could serve as a good reference for the individual f0 range. Finally, an analysis of the continuous f0 points of intonational phrases revealed that the initial and final segments of the phrases yielded several f0 measurement errors. From these results, we conclude that an examination of a spontaneous speech corpus can provide linguists with useful measures to generalize acoustic properties of f0 variability in a language by an individual or groups. Further studies on the use of statistical measures to secure reliable f0 values of individual speakers would be desirable.

Analysis of the English Textbooks in North Korean First Middle School (북한 제1중학교 영어교과서 분석)

  • Hwang, Seo-yeon;Kim, Jeong-ryeol
    • The Journal of the Korea Contents Association
    • /
    • v.17 no.11
    • /
    • pp.242-251
    • /
    • 2017
  • For the purposes of this research, a corpus of words was created from the English textbooks of the "First Middle School" for the gifted in North Korea, and using the corpus, their linguistic characteristics were analyzed. Although there have been many studies that identified the traits of English textbooks in North Korea's general middle schools, not much focus has been placed on the English textbooks used at North Korea's First Middle School. Initially, the structure of the English textbooks of the first, second, fourth, and sixth grades, procured from the Information Center on North Korea, was reviewed, after which their corpus was created. Then, using Wordsmith Tools 7.0, the linguistic properties and high-frequency content words that appeared in the first-grade English textbook were analyzed in detail. Basic statistical data indicated that while the size of the vocabulary did not increase as students progressed through the grades, the words used tended to diversify incrementally. Meanwhile, the distribution of high-frequency content words by grade showed a large difference between the content words used in the English texts of each grade, and it was the subject matter of the texts that determined this difference.

Examining Suicide Tendency Social Media Texts by Deep Learning and Topic Modeling Techniques (딥러닝 및 토픽모델링 기법을 활용한 소셜 미디어의 자살 경향 문헌 판별 및 분석)

  • Ko, Young Soo;Lee, Ju Hee;Song, Min
    • Journal of the Korean BIBLIA Society for library and Information Science
    • /
    • v.32 no.3
    • /
    • pp.247-264
    • /
    • 2021
  • This study aims to create a deep learning-based classification model that classifies suicide tendency, using a suicide corpus constructed for the present study. To analyze suicide factors, the study also divided the suicide tendency corpus into detailed topics by using topic modeling, an analysis technique that automatically extracts topics. For this purpose, 2,011 documents of the suicide-related corpus collected from the social media service Naver Knowledge iN were manually annotated as suicide-tendency or non-suicide-tendency documents based on the suicide prevention education manual issued by the Central Suicide Prevention Center, and we also evaluated the performance of deep learning models (LSTM, BERT, ELECTRA) for classification, using the annotated corpus data. In addition, LDA, one of the topic modeling techniques, was used to identify suicide factors by classifying documents by topic, and co-word analysis and visualization were conducted to analyze the factors in depth.

Morphological differences between Water deer and Sika deer ovaries during estrus and pregnancy

  • Ji-Hye Lee;Yong-Su Park;Min-Gee Oh;Sang-Hwan Kim
    • Journal of Animal Reproduction and Biotechnology
    • /
    • v.38 no.2
    • /
    • pp.62-69
    • /
    • 2023
  • Background: Research on the reproductive physiology of Water deer and Sika deer, endemic to Korea, remains incomplete. This study analyzed the ovarian development and morphological characteristics of wild Water deer and Sika deer. Methods: Water deer and Sika deer ovaries were collected from the Korean Peninsula and the Russia-Korean Peninsula border during the estrus and pregnancy seasons, respectively. Morphological and physiological analyses and immunohistochemistry were conducted to confirm the detection of Ca2+ and assess the morphological changes in the ovaries. Results: In the morphological analysis of ovaries during pregnancy and estrus, the development of the corpus luteum and follicles of Water deer showed patterns similar to those of other mammals. In contrast, the corpus luteum of Sika deer differed in tissue morphology and composition from that of Water deer. Ca2+ related to tissue metabolism was detected in the theca cell zone of Water deer during estrus and was highly detected in the luteum cell zone during pregnancy. The hormone receptor protein expression patterns were generally higher in the ovaries of Water deer during estrus and pregnancy than in Sika deer. In Sika deer, the expression of the LH receptor was relatively low in the lutein cell zone, unlike in Water deer. The expression of VEGF also differed from that in Water deer, and the expression of all development-related proteins in Sika deer was very low compared with Water deer. Conclusions: The results show that the composition of the corpus luteum of Sika deer is less clearly defined than that of Water deer, and that there are many differences in the functional and morphological formation of the corpus luteum.

A Deterministic Method for Structural Analysis of Compound Words in Japanese

  • Han, Dongli;Ito, Takeshi;Furugori, Teiji
    • Proceedings of the Korean Society for Language and Information Conference
    • /
    • 2002.02a
    • /
    • pp.79-91
    • /
    • 2002
  • Structural analysis of compound words is a necessary and important process in natural language processing. Proposed here is a corpus- and statistics-based method for the structural analysis of compound words in Japanese. We determine the structure of a compound word by using an Internet corpus and calculating the strength of word association among its constituent words. Experiments with 5, 6, 7, and 8 kanji compound words show that our method works well and that its performance is better than that of other comparable studies.
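
As a rough illustration of choosing a compound's structure from word association strength, the sketch below brackets a three-constituent compound by comparing the association of its two adjacent pairs. The abstract does not specify the association measure, so pointwise mutual information (PMI) is an assumption here, and the functions `pmi` and `bracket` and all counts are hypothetical.

```python
import math


def pmi(pair_count, count_a, count_b, total):
    """Pointwise mutual information from raw counts (assumed measure)."""
    if pair_count == 0:
        return float("-inf")  # the pair was never observed together
    return math.log2(pair_count * total / (count_a * count_b))


def bracket(a, b, c, unigram, bigram, total):
    """Group the more strongly associated adjacent pair first."""
    left = pmi(bigram.get((a, b), 0), unigram[a], unigram[b], total)
    right = pmi(bigram.get((b, c), 0), unigram[b], unigram[c], total)
    return ((a, b), c) if left >= right else (a, (b, c))


# Invented counts for illustration only; not taken from any real corpus.
unigram = {"自然": 100, "言語": 80, "処理": 120}
bigram = {("自然", "言語"): 40, ("言語", "処理"): 30}

structure = bracket("自然", "言語", "処理", unigram, bigram, total=10_000)
print(structure)
```

With these toy counts the left pair has the higher PMI, so the compound is bracketed as ((自然 言語) 処理). Longer compounds would need the comparison applied repeatedly, which this sketch does not attempt.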
