• Title/Summary/Keyword: Corpus-based Study


Investigation on the Effect of Multi-Vector Document Embedding for Interdisciplinary Knowledge Representation

  • Park, Jongin;Kim, Namgyu
    • Knowledge Management Research
    • /
    • v.21 no.1
    • /
    • pp.99-116
    • /
    • 2020
  • Text is the most widely used means of exchanging and expressing knowledge and information in the real world. Recently, research on structuring unstructured text data for text analysis has been actively pursued. One of the most representative document embedding methods (i.e., doc2Vec) generates a single vector for each document using all of the words in the document. As a result, the document vector is affected not only by core words but also by miscellaneous words. Additionally, traditional document embedding algorithms map each document to only one vector, so it is difficult to properly represent a complex document covering interdisciplinary subjects as a single vector. In this paper, we introduce a multi-vector document embedding method to overcome these limitations of traditional document embedding methods. After introducing the previous study on multi-vector document embedding, we visually analyze the effects of the method. First, the new method vectorizes the document using only predefined keywords instead of all of its words. Second, the new method separates the various subjects included in the document and generates multiple vectors for each document. Experiments on about three thousand academic papers revealed that the traditional single-vector approach cannot properly map complex documents because of interference among subjects in each vector. With the multi-vector method, we ascertained that the information and knowledge in complex documents can be represented more accurately by eliminating the interference among subjects.
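
The contrast between one vector per document and one vector per subject can be illustrated with a minimal Python sketch. It is not the authors' implementation: the toy 2-D keyword vectors, the keyword-to-subject mapping, and the function names are all hypothetical, and simple averaging stands in for whatever embedding and decomposition steps the paper actually uses.

```python
from collections import defaultdict

# Hypothetical toy keyword vectors (2-D for readability); a real system
# would take these from a trained embedding model.
KEYWORD_VECTORS = {
    "neural": (1.0, 0.0),
    "network": (0.8, 0.2),
    "protein": (0.0, 1.0),
    "genome": (0.2, 0.8),
}

# Hypothetical mapping from each predefined keyword to a subject label.
KEYWORD_SUBJECT = {
    "neural": "computing",
    "network": "computing",
    "protein": "biology",
    "genome": "biology",
}


def mean_vector(vectors):
    """Component-wise average of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return tuple(sum(component) / n for component in zip(*vectors))


def single_vector(tokens):
    """Baseline: one vector per document, averaged over all known keywords."""
    vectors = [KEYWORD_VECTORS[t] for t in tokens if t in KEYWORD_VECTORS]
    return mean_vector(vectors) if vectors else None


def multi_vectors(tokens):
    """One vector per subject, built only from predefined keywords."""
    by_subject = defaultdict(list)
    for t in tokens:
        if t in KEYWORD_VECTORS:
            by_subject[KEYWORD_SUBJECT[t]].append(KEYWORD_VECTORS[t])
    return {subject: mean_vector(vs) for subject, vs in by_subject.items()}


doc = "the neural network predicts protein structure from the genome".split()
print(single_vector(doc))   # the two subjects blur into one midpoint
print(multi_vectors(doc))   # each subject keeps its own vector
```

In this toy setting the single vector lands midway between the two subjects, which is the kind of interference the abstract describes, while the per-subject vectors stay close to their own keywords.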

A study of Corpus Annotation for Aspect Based Sentiment Analysis of Korean financial texts (한국어 경제 도메인 텍스트 속성 기반 감성 분석을 위한 말뭉치 주석 요소 연구)

  • Seoyoon Park;Yeonji Jang;Yejee Kang;Hyerin Kang;Hansaem Kim
    • Annual Conference on Human and Language Technology
    • /
    • 2022.10a
    • /
    • pp.232-237
    • /
    • 2022
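  • In this paper, we study a methodology for semi-automatically constructing an aspect-based sentiment analysis (ABSA) dataset for financial reports, a type of economic-domain text, using fine-tuning and unsupervised learning techniques. During construction, we attached polarity and aspect category information from among the ABSA annotation elements. The annotation elements were attached automatically through fine-tuning and BERTopic, an unsupervised learning technique, and were then reviewed manually to improve the quality of the dataset. In experiments on the dataset, semi-automatic polarity annotation showed performance comparable to that of previously constructed datasets. Qualitative analysis further showed that, even when automatic construction was carried out in the same way, the results differed depending on the principle and maturity of the technique used, confirming that there is still room for improvement in constructing ABSA datasets for the economic domain.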


Critical Discourse Analysis of Diversity, Equity, and Inclusion in Contemporary Fashion -Analyzing Articles on Race in The New York Times- (현대 패션의 DE&I에 대한 비판적 담론분석 -뉴욕타임즈의 인종 기사를 중심으로-)

  • Myeongseon Yi;Eunhyuk Yim
    • Journal of the Korean Society of Clothing and Textiles
    • /
    • v.47 no.3
    • /
    • pp.544-559
    • /
    • 2023
  • Social discourses surrounding diversity, equity, and inclusion (DE&I) in the fashion industry are vital because they extend beyond language and encompass social practices. This study aimed to understand how discourses on DE&I within the fashion industry are reconstructed and practiced in society. To this end, it analyzed DE&I in the fashion industry by focusing on New York Times articles, combining a quantitative research model based on corpus analysis with a qualitative approach through critical discourse analysis. First, the analysis of textual practice showed that the New York Times emphasized black individuals as the central discourse and created a critical racial narrative regarding DE&I in the fashion industry, characterized by a dichotomy of black vs. white confrontation. Second, the analysis of discourse practice revealed that the dichotomy of racial confrontation in the New York Times articles tended to select the subject of discourse related to racial DE&I in the fashion industry based on social and historical context. Third, the analysis of sociocultural practice indicated that the dichotomous racial discourse between black and white, propagated by the New York Times, spread across social media, transforming fashion from an industry into a domain where black individuals struggle for human rights.

Experimental study trends on the prevention and treatment effects of herbal medicine for gastroesophageal reflux disease (GERD) - based on Pubmed (천연물의 위식도역류질환 예방, 치료 효과에 대한 실험연구 현황 – Pubmed를 중심으로)

  • YongBin Kim;Young-Sik Kim
    • Herbal Formula Science
    • /
    • v.31 no.4
    • /
    • pp.389-413
    • /
    • 2023
  • Objectives: This study aimed to review the current trends in experimental studies on the use of natural products for the treatment of gastroesophageal reflux disease (GERD). Methods: Experimental studies assessing the efficacy of natural products against GERD were searched on PubMed. Articles were selected based on predefined inclusion and exclusion criteria and then analyzed for experimental methods, interventions, and result analysis techniques. Results: A total of 37 studies were included in this review. Most were in vivo experiments in which GERD was induced surgically in 7-week-old male Sprague-Dawley rats by ligating the pylorus and the transitional junction between the corpus and the forestomach. The acute induction model, in which animals are sacrificed after a single administration following GERD induction, was mainly used. Cell experiments were relatively infrequent and focused on assessing antioxidant and anti-inflammatory effects by treating the RAW 264.7 cell line with lipopolysaccharide. Glycyrrhizae Radix et Rhizoma, Pinelliae Tuber, Ginseng Radix, and Zingiberis Rhizoma were used as single ingredients, and the herbal formulas STW-5 (Iberogast), Rikkunshito (六君子湯), Banhasasim-tang (半夏瀉心湯), and Hewei Jiangni granule (和胃降逆湯) were also used. Outcome analysis methods encompassed macroscopic evaluation, esophageal function assessment, blood biomarker analysis, histological examination, protein analysis, gene expression analysis, and gastric juice analysis. Proton pump inhibitors were predominantly employed as positive controls. Conclusions: This study revealed the current trends in non-clinical research evaluating natural products for GERD. Based on these results, we expect that non-clinical research on clinically effective natural products will be revitalized.

Generation of Zero Pronouns using Center Transition of Preceding Utterances (선행 발화의 중심 전이를 이용한 영형 생성)

  • Roh, Ji-Eun;Na, Seung-Hoon;Lee, Jong-Hyeok
    • Journal of KIISE:Software and Applications
    • /
    • v.32 no.10
    • /
    • pp.990-1002
    • /
    • 2005
  • To generate coherent texts, it is important to produce appropriate pronouns to refer to previously mentioned entities in a discourse. Specifically, we focus on pronominalization by zero pronouns, which occur frequently in Korean. This paper investigates zero pronouns in Korean based on cost-based centering theory, focusing especially on the center transitions of adjacent utterances. In previous work on centering, only one type of nominal entity has been considered as the target of pronominalization, even though other entities are frequently pronominalized as zero pronouns. To resolve this problem and explain the reference phenomena of real texts, four types of nominal entity (Npair, Ninter, Nintra, and Nnon) from centering theory are defined using the concepts of inter-, intra-, and pairwise salience. For each entity type, a case study of zero phenomena is performed by analyzing a corpus and building a pronominalization model. This study shows that the zero phenomena of entities that have been neglected in previous centering work can be explained via the center transition of the second previous utterance. We also show that, for Ninter, Nintra, and Nnon, the pronominalization accuracy achieved by a complex combination of several types of features is completely or nearly matched across genres by using only the transition of the second previous utterance.

The Stream of Uncertainty in Scientific Knowledge using Topic Modeling (토픽 모델링 기반 과학적 지식의 불확실성의 흐름에 관한 연구)

  • Heo, Go Eun
    • Journal of the Korean Society for information Management
    • /
    • v.36 no.1
    • /
    • pp.191-213
    • /
    • 2019
  • Scientific knowledge is obtained through research. Researchers deal with the uncertainty of science and establish the certainty of scientific knowledge. In other words, dealing with uncertainty is an essential step in obtaining scientific knowledge. Existing studies have predominantly examined hedging from a linguistic perspective or, in computational linguistics, manually constructed corpora of uncertainty words. They have only been able to identify the characteristics of uncertainty in a particular research field based on simple frequency. Therefore, in this study, we examine how patterns of scientific knowledge, as reflected in uncertainty words, change over time in the biomedical literature, where the biomedical claims made in sentences play an important role. For this purpose, biomedical propositions are analyzed based on semantic predications provided by UMLS, and DMR topic modeling, a useful method for identifying patterns in disciplines, is applied to understand the trends of entity-based topics with uncertainty. The results confirm that, as research develops over time, uncertainty in scientific knowledge tends to decrease.

The cephalometric study on the depth of the mandibular antegonial notch as an indicator of mandibular growth pattern (Antegonial notch depth 에 따른 하악골 성장에 관한 두부방사선 계측학적 연구)

  • Kang, Sin-Ae;Ryu, Young-Kyu
    • The korean journal of orthodontics
    • /
    • v.19 no.1 s.27
    • /
    • pp.77-93
    • /
    • 1989
  • The purpose of the present study was to determine whether the depth of the mandibular antegonial notch can be used as an indicator of mandibular growth potential. The sample consisted of 76 patients, who were classified into three groups based on the depth of the mandibular antegonial notch: Deep notch group (more than 3 mm), Neutral notch group (1-3 mm), and Shallow notch group (less than 1 mm). For each case, the first lateral cephalogram was taken prior to the start of treatment and the second 3-4 years later. The results were as follows: 1. The Deep notch group had a shorter corpus, less ramus height, and a greater gonial angle than the Shallow notch group. 2. The Deep notch group had a more retrusive mandibular position than the Shallow notch group. 3. The Deep notch group had a longer total anterior facial height and a longer anterior lower facial height than the Shallow notch group. 4. The Deep notch group showed a vertical, clockwise growth pattern, while the Shallow notch group showed a horizontal, counterclockwise growth pattern. 5. The Deep notch group had less mandibular growth than the Shallow notch group during the observation period.


Teaching Grammar for Spoken Korean to English-speaking Learners: Reported Speech Marker '-dae'. (영어권 학습자를 위한 한국어 구어 문법 교육 - 보고 표지 '-대'를 중심으로 -)

  • Kim, Young A;Cho, In Jung
    • Journal of Korean language education
    • /
    • v.23 no.1
    • /
    • pp.1-23
    • /
    • 2012
  • The development of corpora in recent years has led to increased research on spoken Korean. Nevertheless, these research outcomes are yet to be meaningfully and adequately reflected in Korean language textbooks. The reported speech marker '-dae' is one of the areas that need more attention. This study investigates whether textbooks explain '-dae' clearly enough to English-speaking learners to prevent confusion and misuse. Based on a contrastive analysis of Korean and English, this study argues three points. Firstly, '-dae' should be introduced to Korean learners as an independent sentence ender rather than as a contracted form of '-dago hae'. Secondly, it is necessary to teach English-speaking learners that '-dae' is not equivalent to the English reported speech form. It functions more or less as a third person marker in Korean. Learners should be informed that '-dae' is used where English would use a plain statement, if the statement is hearsay but the source of information does not need to be specified. This is a very distinctive difference between Korean and English and should be emphasized in class when '-dae' is taught. Thirdly, '-dae' should be introduced before indirect speech constructions, because it is mainly used in simple statements and its frequency is very high in spoken Korean.

Myelin Content in Mild Traumatic Brain Injury Patients with Post-Concussion Syndrome: Quantitative Assessment with a Multidynamic Multiecho Sequence

  • Roh-Eul Yoo;Seung Hong Choi;Sung-Won Youn;Moonjung Hwang;Eunkyung Kim;Byung-Mo Oh;Ji Ye Lee;Inpyeong Hwang;Koung Mi Kang;Tae Jin Yun;Ji-hoon Kim;Chul-Ho Sohn
    • Korean Journal of Radiology
    • /
    • v.23 no.2
    • /
    • pp.226-236
    • /
    • 2022
  • Objective: This study aimed to explore the myelin volume change in patients with mild traumatic brain injury (mTBI) with post-concussion syndrome (PCS) using a multidynamic multiecho (MDME) sequence and automatic whole-brain segmentation. Materials and Methods: Forty-one consecutive mTBI patients with PCS and 29 controls, who had undergone MRI including the MDME sequence between October 2016 and April 2018, were included. Myelin volume fraction (MVF) maps were derived from the MDME sequence. After three-dimensional T1-based brain segmentation, the average MVF was analyzed at the bilateral cerebral white matter (WM), bilateral cerebral gray matter (GM), corpus callosum, and brainstem. The Mann-Whitney U-test was performed to compare MVF and myelin volume between patients with mTBI and controls. Myelin volume was correlated with neuropsychological test scores using the Spearman rank correlation test. Results: The average MVF at the bilateral cerebral WM was lower in mTBI patients with PCS (median [interquartile range], 25.2% [22.6%-26.4%]) than that in controls (26.8% [25.6%-27.8%]) (p = 0.004). The region-of-interest myelin volume was lower in mTBI patients with PCS than that in controls at the corpus callosum (1.87 cm³ [1.70-2.05 cm³] vs. 2.21 cm³ [1.86-3.46 cm³]; p = 0.003) and brainstem (9.98 cm³ [9.45-11.00 cm³] vs. 11.05 cm³ [10.10-11.53 cm³]; p = 0.015). The total myelin volume was lower in mTBI patients with PCS than that in controls at the corpus callosum (0.45 cm³ [0.39-0.48 cm³] vs. 0.48 cm³ [0.45-0.54 cm³]; p = 0.004) and brainstem (1.45 cm³ [1.28-1.59 cm³] vs. 1.54 cm³ [1.42-1.67 cm³]; p = 0.042). No significant correlation was observed between myelin volume parameters and neuropsychological test scores, except for the total myelin volume at the bilateral cerebral WM and verbal learning test (delayed recall) (r = 0.425; p = 0.048). Conclusion: MVF quantified from the MDME sequence was decreased at the bilateral cerebral WM in mTBI patients with PCS. The total myelin volumes at the corpus callosum and brainstem were decreased in mTBI patients with PCS due to atrophic changes.

Network Analysis between Uncertainty Words based on Word2Vec and WordNet (Word2Vec과 WordNet 기반 불확실성 단어 간의 네트워크 분석에 관한 연구)

  • Heo, Go Eun
    • Journal of the Korean Society for Library and Information Science
    • /
    • v.53 no.3
    • /
    • pp.247-271
    • /
    • 2019
  • Uncertainty in scientific knowledge refers to a state in which propositions are currently neither true nor false. Existing studies have analyzed the propositions written in the academic literature and have evaluated the performance of rule-based and machine learning-based approaches using corpora. Although they recognized the importance of constructing uncertainty word lists, there have been insufficient attempts to expand these lists by analyzing the meaning of uncertainty words. Meanwhile, analyzing network structure with bibliometric and text mining techniques is widely used as a method for understanding the intellectual structure and relationships in various disciplines. Therefore, in this study, semantic relations were analyzed by applying Word2Vec to existing uncertainty words. In addition, WordNet, an English vocabulary database and thesaurus, was applied to perform a network analysis based on the hypernym, hyponym, and synonym relations linked to uncertainty words. The semantic and lexical relationships of uncertainty words were thereby identified structurally. As a result, we identified the possibility of automatically expanding uncertainty word lists.
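
One way to picture a similarity-based word network of this kind is the following minimal Python sketch. It is an illustration under stated assumptions, not the study's procedure: the 3-D vectors are hypothetical stand-ins for trained Word2Vec embeddings, the word list and the 0.9 threshold are arbitrary, and the WordNet relations mentioned in the abstract are not modeled here.

```python
import math
from itertools import combinations

# Hypothetical toy embeddings standing in for trained Word2Vec vectors.
VECTORS = {
    "may": (0.9, 0.1, 0.0),
    "might": (0.8, 0.2, 0.0),
    "possibly": (0.7, 0.3, 0.1),
    "suggest": (0.1, 0.9, 0.2),
    "indicate": (0.0, 0.8, 0.3),
}


def cosine(u, v):
    """Cosine similarity between two non-zero vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def build_network(vectors, threshold):
    """Undirected edges between word pairs whose similarity meets the threshold."""
    edges = {}
    for w1, w2 in combinations(sorted(vectors), 2):
        similarity = cosine(vectors[w1], vectors[w2])
        if similarity >= threshold:
            edges[(w1, w2)] = similarity
    return edges


def degrees(vectors, edges):
    """Number of neighbours per word, a simple network centrality measure."""
    degree = {word: 0 for word in vectors}
    for w1, w2 in edges:
        degree[w1] += 1
        degree[w2] += 1
    return degree


edges = build_network(VECTORS, threshold=0.9)  # threshold is an assumption
for (w1, w2), similarity in edges.items():
    print(f"{w1} -- {w2}: {similarity:.3f}")
print(degrees(VECTORS, edges))
```

Words that cluster with an existing uncertainty word in such a network would be natural candidates for list expansion, which is the possibility the abstract points to.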