• Title/Abstract/Keywords: Corpus-based Study

Search results: 204 items (processing time: 0.022 seconds)

Investigation on the Effect of Multi-Vector Document Embedding for Interdisciplinary Knowledge Representation

  • 박종인;김남규
    • 지식경영연구
    • /
    • Vol. 21, No. 1
    • /
    • pp.99-116
    • /
    • 2020
  • Text is the most widely used means of exchanging and expressing knowledge and information in the real world. Recently, research on structuring unstructured text data for text analysis has been actively conducted. One of the most representative document embedding methods (i.e., Doc2Vec) generates a single vector for each document using all of the words in the document. As a result, the document vector is affected not only by core words but also by miscellaneous words. Additionally, traditional document embedding algorithms map each document to only one vector, so it is difficult to represent a complex document covering interdisciplinary subjects properly with a single vector. In this paper, we introduce a multi-vector document embedding method to overcome these limitations of traditional document embedding methods. After introducing a previous study on multi-vector document embedding, we visually analyze the effects of the method. First, the new method vectorizes the document using only predefined keywords instead of all words. Second, the new method decomposes the various subjects included in the document and generates multiple vectors for each document. Experiments on about three thousand academic papers revealed that the traditional single-vector approach cannot properly map complex documents because of interference among subjects within each vector. With the multi-vector method, we ascertained that the information and knowledge in complex documents can be represented more accurately by eliminating the interference among subjects.
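
As a rough illustration of the two ideas described in this abstract (keyword-only vectorization and one vector per subject), the following Python sketch contrasts a single averaged vector with per-subject vectors. It is not the authors' implementation: the toy 2-d word vectors, the `keyword_subject` mapping, and all function names are assumptions made for illustration, and plain averaging stands in for whatever embedding model the paper actually uses.

```python
from collections import defaultdict


def mean_vector(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def single_vector(doc_words, word_vectors):
    """Baseline: average every word in the document that has a vector."""
    return mean_vector([word_vectors[w] for w in doc_words if w in word_vectors])


def multi_vector(doc_words, word_vectors, keyword_subject):
    """One vector per subject, built only from predefined keywords."""
    groups = defaultdict(list)
    for w in doc_words:
        if w in keyword_subject and w in word_vectors:
            groups[keyword_subject[w]].append(word_vectors[w])
    return {subject: mean_vector(vs) for subject, vs in groups.items()}


# Toy 2-d word vectors and keyword list (hypothetical values for illustration).
word_vectors = {
    "gene": [1.0, 0.0], "protein": [0.8, 0.2],
    "market": [0.0, 1.0], "price": [0.2, 0.8],
    "the": [0.5, 0.5],
}
keyword_subject = {
    "gene": "biology", "protein": "biology",
    "market": "economics", "price": "economics",
}
doc = ["the", "gene", "protein", "market", "price"]

single = single_vector(doc, word_vectors)                # subjects blended together
multi = multi_vector(doc, word_vectors, keyword_subject)  # one vector per subject
```

In this toy case the single vector sits midway between the two subjects, while the per-subject vectors stay close to their own keywords, which is the kind of interference the abstract describes.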

A Study of Corpus Annotation for Aspect-Based Sentiment Analysis of Korean Financial Texts

  • 박서윤;장연지;강예지;강혜린;김한샘
    • 한국정보과학회 언어공학연구회:학술대회논문집(한글 및 한국어 정보처리)
    • /
    • 한국정보과학회언어공학연구회, 34th 한글 및 한국어 정보처리 학술대회 (2022)
    • /
    • pp.232-237
    • /
    • 2022
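  • This paper studies a methodology for semi-automatically constructing an aspect-based sentiment analysis (ABSA) dataset for financial reports, a type of economic-domain text, using fine-tuning and unsupervised learning techniques. During construction, polarity and aspect category information, two of the ABSA annotation elements, were attached. The annotation elements were attached automatically through fine-tuning and BERTopic, an unsupervised learning technique, and then manually reviewed to improve the quality of the dataset. In experiments on the dataset, semi-automatic polarity annotation showed performance comparable to that of previously constructed datasets. Meanwhile, qualitative analysis showed that, even when automatic construction was carried out in the same way, the results differed depending on the principle and maturity of the technique used, confirming that there is still room for improvement in building ABSA datasets for the economic domain.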

Critical Discourse Analysis of Diversity, Equity, and Inclusion in Contemporary Fashion -Analyzing Articles on Race in The New York Times-

  • 이명선;임은혁
    • 한국의류학회지
    • /
    • Vol. 47, No. 3
    • /
    • pp.544-559
    • /
    • 2023
  • Social discourses surrounding diversity, equity, and inclusion (DE&I) in the fashion industry are vital because they extend beyond language and encompass social practices. This study aimed to understand how discourses on DE&I within the fashion industry are reconstructed and practiced in society. To this end, it analyzed DE&I in the fashion industry by focusing on New York Times articles, employing a quantitative research model based on corpus analysis and a qualitative approach through critical discourse analysis. First, the analysis of textual practice showed that the New York Times placed black individuals at the center of the discourse and created a critical racial narrative regarding DE&I in the fashion industry, characterized by a dichotomy of black versus white confrontation. Second, the analysis of discourse practice revealed that, within this dichotomy of racial confrontation, the New York Times articles tended to select the subjects of discourse related to racial DE&I in the fashion industry based on social and historical context. Third, the analysis of sociocultural practice indicated that the dichotomous racial discourse between black and white propagated by the New York Times spread across social media, transforming fashion from an industry into a domain where black individuals struggle for human rights.

Experimental study trends on the prevention and treatment effects of herbal medicine for gastroesophageal reflux disease (GERD) - based on PubMed

  • 김용빈;김영식
    • 대한한의학방제학회지
    • /
    • Vol. 31, No. 4
    • /
    • pp.389-413
    • /
    • 2023
  • Objectives: This study aimed to review current trends in experimental studies on the use of natural products for the treatment of gastroesophageal reflux disease (GERD). Methods: Experimental studies assessing the efficacy of natural products against GERD were searched on PubMed. Articles were selected based on predefined inclusion and exclusion criteria and then analyzed for experimental methods, interventions, and result analysis techniques. Results: A total of 37 studies were included in this review. In vivo experiments predominated, most of which induced GERD surgically in 7-week-old male Sprague-Dawley rats by ligating the pylorus and the transitional junction between the corpus and the forestomach. The acute induction model, in which animals are sacrificed after a single administration following GERD induction, was mainly used. Cell experiments were relatively infrequent and focused on assessing antioxidant and anti-inflammatory effects by treating the RAW 264.7 cell line with lipopolysaccharide. Glycyrrhizae Radix et Rhizoma, Pinelliae Tuber, Ginseng Radix, and Zingiberis Rhizoma were used as single ingredients, and STW-5 (Iberogast), Rikkunshito (六君子湯), Banhasasim-tang (半夏瀉心湯), and Hewei Jiangni granule (和胃降逆湯) were used as herbal formulas. Outcome analysis methods encompassed macroscopic evaluation, esophageal function assessment, blood biomarker analysis, histological examination, protein analysis, gene expression analysis, and gastric juice analysis. Proton pump inhibitors were predominantly employed as positive controls. Conclusions: This study revealed the current trends in non-clinical research evaluating natural products for GERD. Based on these results, we expect that non-clinical research on clinically effective natural products will be revitalized.

Generation of Zero Pronouns using Center Transition of Preceding Utterances

  • 노지은;나승훈;이종혁
    • 한국정보과학회논문지:소프트웨어및응용
    • /
    • Vol. 32, No. 10
    • /
    • pp.990-1002
    • /
    • 2005
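  • To generate natural text, pronominalization, the process of referring back to an entity that has already been mentioned, is essential; in particular, it is important to naturally generate zero pronouns, which occur frequently in Korean. This paper applies cost-based centering theory to examine how the center transition of the preceding utterance affects zero pronouns in the current utterance. To this end, nouns that can be realized as zero pronouns are classified, based on centering theory, into four types (Npair, Ninter, Nintra, Nnon) according to whether they have inter-sentential salience, intra-sentential salience, or both, and zero pronoun phenomena are examined for each type. The results show that nouns previously excluded from centering theory can be explained by the center transition of the preceding utterance. In addition, a zero pronoun generation model using the center transition of the preceding utterance was built, and its performance was compared with that of zero pronoun generation models that use various other features.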

The Stream of Uncertainty in Scientific Knowledge using Topic Modeling

  • 허고은
    • 정보관리학회지
    • /
    • Vol. 36, No. 1
    • /
    • pp.191-213
    • /
    • 2019
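  • Scientific knowledge is obtained through the work of researchers. Researchers deal with uncertainty in science and build up the certainty of scientific knowledge. In other words, uncertainty is recognized as an essential stage that must be passed through in order to obtain scientific knowledge. Research on characterizing uncertainty was introduced through hedging studies in linguistics, and computational linguistics has built corpora of uncertainty words manually. Previous studies have gone no further than characterizing uncertainty in a particular academic field based on the simple frequency of uncertainty words. This study therefore examines how patterns of scientific knowledge based on uncertainty words change over time in biomedical literature, where biomedical claims within sentences play an important role. To this end, biomedical propositions were analyzed based on the semantic predicates provided by UMLS, a biomedical ontology, and DMR topic modeling, which is well suited to identifying patterns in an academic field, was applied to comprehensively identify trends in uncertainty-based topics of biomedical entities. The results confirmed that, over time, research has progressed in a pattern in which the uncertainty in expressions of scientific knowledge decreases.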

The cephalometric study on the depth of the mandibular antegonial notch as an indicator of mandibular growth pattern

  • 강신애;유영규
    • 대한치과교정학회지
    • /
    • Vol. 19, No. 1
    • /
    • pp.77-93
    • /
    • 1989
  • The purpose of the present study was to determine whether the depth of the mandibular antegonial notch can be used as an indicator of mandibular growth potential. The sample consisted of 76 patients, who were classified into three groups based on the depth of the mandibular antegonial notch: Deep notch group (more than 3 mm), Neutral notch group (1-3 mm), and Shallow notch group (less than 1 mm). For each case, the first lateral cephalogram was taken prior to the start of treatment and the second 3-4 years later. The results were as follows: 1. The Deep notch group had a shorter corpus, less ramus height, and a greater gonial angle than the Shallow notch group. 2. The Deep notch group had a more retrusive mandibular position than the Shallow notch group. 3. The Deep notch group had a longer total anterior facial height and a longer anterior lower facial height than the Shallow notch group. 4. The Deep notch group showed a vertical, clockwise growth pattern, while the Shallow notch group showed a horizontal, counterclockwise growth pattern. 5. The Deep notch group had less mandibular growth than the Shallow notch group during the observation period.
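
The three-group rule stated in this abstract can be written as a small Python function. This is only a sketch of the stated cutoffs: the function name and group labels are hypothetical, and treating exactly 1 mm and exactly 3 mm as "neutral" is an assumption, since the abstract does not say how boundary values were handled.

```python
def classify_notch(depth_mm: float) -> str:
    """Assign a notch depth (in mm) to a group using the abstract's cutoffs."""
    if depth_mm > 3.0:
        return "deep"      # more than 3 mm
    if depth_mm < 1.0:
        return "shallow"   # less than 1 mm
    return "neutral"       # 1-3 mm; inclusive boundaries are an assumption


# Hypothetical depths, used only to exercise each branch.
groups = [classify_notch(d) for d in (0.4, 1.0, 2.5, 3.0, 3.6)]
```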

Teaching Grammar for Spoken Korean to English-speaking Learners: Reported Speech Marker '-dae'

  • 김영아;조인정
    • 한국어교육
    • /
    • Vol. 23, No. 1
    • /
    • pp.1-23
    • /
    • 2012
  • The development of corpora in recent years has led to increased research on spoken Korean. Nevertheless, these research outcomes are yet to be meaningfully and adequately reflected in Korean language textbooks. The reported speech marker '-dae' is one of the areas that need more attention. This study investigates whether textbooks explain '-dae' clearly enough to English-speaking learners to prevent confusion and misuse. Based on a contrastive analysis of Korean and English, this study argues three points. First, '-dae' should be introduced to Korean learners as an independent sentence ender rather than as a contracted form of '-dago hae'. Second, it is necessary to teach English-speaking learners that '-dae' is not equivalent to the English reported speech form. It functions more or less as a third person marker in Korean. Learners should be informed that '-dae' is used where English would use a plain statement, if the statement is hearsay but the source of information does not need to be specified. This is a very distinctive difference between Korean and English and should be emphasized in class when '-dae' is taught. Third, '-dae' should be introduced before indirect speech constructions, because it is mainly used in simple statements and its frequency is very high in spoken Korean.

Myelin Content in Mild Traumatic Brain Injury Patients with Post-Concussion Syndrome: Quantitative Assessment with a Multidynamic Multiecho Sequence

  • Roh-Eul Yoo;Seung Hong Choi;Sung-Won Youn;Moonjung Hwang;Eunkyung Kim;Byung-Mo Oh;Ji Ye Lee;Inpyeong Hwang;Koung Mi Kang;Tae Jin Yun;Ji-hoon Kim;Chul-Ho Sohn
    • Korean Journal of Radiology
    • /
    • Vol. 23, No. 2
    • /
    • pp.226-236
    • /
    • 2022
  • Objective: This study aimed to explore the myelin volume change in patients with mild traumatic brain injury (mTBI) with post-concussion syndrome (PCS) using a multidynamic multiecho (MDME) sequence and automatic whole-brain segmentation. Materials and Methods: Forty-one consecutive mTBI patients with PCS and 29 controls, who had undergone MRI including the MDME sequence between October 2016 and April 2018, were included. Myelin volume fraction (MVF) maps were derived from the MDME sequence. After three-dimensional T1-based brain segmentation, the average MVF was analyzed at the bilateral cerebral white matter (WM), bilateral cerebral gray matter (GM), corpus callosum, and brainstem. The Mann-Whitney U-test was performed to compare MVF and myelin volume between patients with mTBI and controls. Myelin volume was correlated with neuropsychological test scores using the Spearman rank correlation test. Results: The average MVF at the bilateral cerebral WM was lower in mTBI patients with PCS (median [interquartile range], 25.2% [22.6%-26.4%]) than that in controls (26.8% [25.6%-27.8%]) (p = 0.004). The region-of-interest myelin volume was lower in mTBI patients with PCS than that in controls at the corpus callosum (1.87 cm³ [1.70-2.05 cm³] vs. 2.21 cm³ [1.86-3.46 cm³]; p = 0.003) and brainstem (9.98 cm³ [9.45-11.00 cm³] vs. 11.05 cm³ [10.10-11.53 cm³]; p = 0.015). The total myelin volume was lower in mTBI patients with PCS than that in controls at the corpus callosum (0.45 cm³ [0.39-0.48 cm³] vs. 0.48 cm³ [0.45-0.54 cm³]; p = 0.004) and brainstem (1.45 cm³ [1.28-1.59 cm³] vs. 1.54 cm³ [1.42-1.67 cm³]; p = 0.042). No significant correlation was observed between myelin volume parameters and neuropsychological test scores, except for the total myelin volume at the bilateral cerebral WM and verbal learning test (delayed recall) (r = 0.425; p = 0.048). Conclusion: MVF quantified from the MDME sequence was decreased at the bilateral cerebral WM in mTBI patients with PCS. The total myelin volumes at the corpus callosum and brainstem were decreased in mTBI patients with PCS due to atrophic changes.

Network Analysis between Uncertainty Words based on Word2Vec and WordNet

  • 허고은
    • 한국문헌정보학회지
    • /
    • Vol. 53, No. 3
    • /
    • pp.247-271
    • /
    • 2019
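  • In science, the uncertainty of knowledge refers to a state in which a proposition is, at present, neither true nor false. Previous studies have analyzed propositions expressed in scholarly literature to manually compile words that signal uncertainty, and have carried out rule-based and machine-learning-based performance evaluations on the resulting corpora. Although the importance of compiling uncertainty words is recognized, there have been few attempts to expand them automatically by analyzing word meaning. Meanwhile, studies that identify network structure using informetrics or text mining techniques are widely used in various academic fields as a way to understand intellectual structure and relationships. This study therefore applied Word2Vec to existing uncertainty words to analyze their semantic relationships, and applied WordNet, an English lexical database and thesaurus, to perform network analysis based on the hypernym, hyponym, and synonym relations connected to uncertainty words. In this way, the semantic and lexical relationships of uncertainty words were identified structurally, and the potential for extending the automatic construction of uncertainty words in the future was suggested.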