• Title/Summary/Keyword: R Language

Quantitative Text Mining for Social Science: Analysis of Immigrant in the Articles (사회과학을 위한 양적 텍스트 마이닝: 이주, 이민 키워드 논문 및 언론기사 분석)

  • Yi, Soo-Jeong;Choi, Doo-Young
    • The Journal of the Korea Contents Association / v.20 no.5 / pp.118-127 / 2020
  • The paper introduces trends and methodological challenges in quantitative Korean text analysis through case studies of academic and news media articles on "migration" and "immigration" published in 2017-2019. Quantitative text analysis, based on natural language processing (NLP) technology, has become an essential tool for social science. It is a part of data science that converts documents into structured data, uses those data for hypothesis discovery and verification, and visualizes them. Furthermore, we examined the statistical models commonly applied in social-scientific quantitative text analysis, using NLP with R programming and Quanteda.
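
The abstract above describes converting documents into structured data. A minimal sketch of that step, written in Python rather than the R/Quanteda stack the abstract names, is a document-feature matrix built from token counts. The tokenizer, the function names, and the two sample sentences are illustrative assumptions, not the authors' pipeline.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (\\w+ also matches Hangul)."""
    return re.findall(r"\w+", text.lower())


def document_feature_matrix(docs):
    """Return (features, rows): a sorted vocabulary and one count row per document."""
    counts = [Counter(tokenize(d)) for d in docs]
    features = sorted(set().union(*counts))
    # Counter returns 0 for features a document does not contain.
    rows = [[c[f] for f in features] for c in counts]
    return features, rows


# Hypothetical two-document corpus, for illustration only.
docs = [
    "Migration policy and migration research",
    "Immigration policy in the news",
]
features, dfm = document_feature_matrix(docs)
for doc_row in dfm:
    print(dict(zip(features, doc_row)))
```

Each row of the matrix can then be fed to a statistical model or a plot; real Korean text would additionally need morphological analysis, which this sketch omits.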

SimKoR: A Sentence Similarity Dataset based on Korean Review Data and Its Application to Contrastive Learning for NLP (SimKoR: 한국어 리뷰 데이터를 활용한 문장 유사도 데이터셋 제안 및 대조학습에서의 활용 방안)

  • Jaemin Kim;Yohan Na;Kangmin Kim;Sang Rak Lee;Dong-Kyu Chae
    • Annual Conference on Human and Language Technology / 2022.10a / pp.245-248 / 2022
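  • Research on contrastive learning, which aims to capture contextual meaning, has recently been active in natural language processing. For contrastive learning it is important to use high-quality training and validation data. For Korean, however, most of the available datasets are English datasets that were machine-translated into Korean and then reviewed, which makes them dependent on the performance of machine translation. In this paper, we propose a simple method for building a validation dataset from Korean review data that can measure how well embeddings reflect meaning, and we present SimKoR (Similarity Korean Review dataset), a dataset built with this method. We perform contrastive learning using the proposed validation dataset and demonstrate its effectiveness.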
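
As background for the abstract above, the following is a minimal sketch of one common contrastive objective, an in-batch InfoNCE-style loss over cosine similarities. The abstract does not say which loss the authors used, so the loss form, the temperature value, and the toy two-dimensional "embeddings" are all assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity of two non-zero vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def info_nce(anchors, positives, temperature=0.05):
    """Mean in-batch contrastive loss.

    positives[i] is the positive example for anchors[i]; every other
    entry of positives acts as a negative for that anchor.
    """
    losses = []
    for i, anchor in enumerate(anchors):
        logits = [cosine(anchor, p) / temperature for p in positives]
        m = max(logits)  # subtract the max for numerical stability
        log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
        losses.append(log_denominator - logits[i])
    return sum(losses) / len(losses)


# Toy embeddings (hypothetical): aligned pairs should give a lower loss.
anchors = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[0.9, 0.1], [0.1, 0.9]]
swapped = [[0.1, 0.9], [0.9, 0.1]]
loss_aligned = info_nce(anchors, aligned)
loss_swapped = info_nce(anchors, swapped)
print(loss_aligned < loss_swapped)
```

A validation set such as the one the abstract proposes would be used alongside a loss like this to check whether similarity scores from the learned embeddings track human-labelled similarity.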

A Big Data Analysis of Yumentingzheng: Weiwenqiju as an Example (어문청정 빅데이터 분석: 위문기거 일례)

  • Snowberger, Aaron Daniel;Lee, Choong Ho
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2021.10a / pp.624-626 / 2021
  • Yumentingzheng, which records the Qing emperor's discussions with his subjects, is an important document comparable to the Annals of Joseon in Korea. This paper describes the method and steps for big data analysis of Yumentingzheng, which is written in the Manchu script. Big data analysis of documents written in Manchu script raises many problems that must be solved in advance, and research on these problems should come first. This paper proposes a method of big data analysis using the R language that applies once the Manchu text has been transliterated into Latin characters, a preliminary step left to a future study. In the proposed method, the Apkai method was adopted for the transliteration of Yumentingzheng, and the results of big data analysis are presented using the text of Weiwenqiju.

MLOps workflow language and platform for time series data anomaly detection

  • Sohn, Jung-Mo;Kim, Su-Min
    • Journal of the Korea Society of Computer and Information / v.27 no.11 / pp.19-27 / 2022
  • In this study, we propose a language and platform for describing and managing MLOps (Machine Learning Operations) workflows for time series anomaly detection. Time series data are collected in many fields, such as IoT sensors, system performance indicators, and user access, and are used in many applications such as system monitoring and anomaly detection. Performing prediction and anomaly detection on time series data requires an MLOps platform that can quickly and flexibly deploy the analyzed model to the production environment. We therefore developed the Python-based AI/ML Modeling Language (AMML) to make MLOps workflows easy to configure and execute; Python is widely used in data analysis. Using AMML, the proposed MLOps platform can extract and preprocess time series data from various data sources (R-DB, NoSQL DB, log files, etc.) and make predictions with a deep learning model. To verify the applicability of AMML, a workflow for building a deep learning model that predicts transformer oil temperature was configured with AMML, and training was confirmed to run normally.

Clinical Findings and Genetic Analysis of Isolated Hypermethioninemia Patients in Korea (단독성 고메티오닌혈증 환아들의 임상적 특성과 유전자 분석)

  • Yoo, Sang Soo;Rhee, Min Hee;Lee, Jeongho;Lee, Dong Hwan
    • Journal of The Korean Society of Inherited Metabolic disease / v.13 no.2 / pp.98-103 / 2013
  • Purpose: MAT-I/III deficiency caused by MAT1A gene mutation leads to isolated hypermethioninemia, which is considered a clinically benign disease. In some patients, however, mental retardation, developmental delay, or myelination disorder may appear. This study was performed to identify the clinical manifestations and genetic characteristics of patients with isolated hypermethioninemia. Methods: Clinical, biochemical, and genetic analyses were performed on 10 patients with isolated hypermethioninemia who were referred to the Department of Pediatrics, Soonchunhyang University Hospital, from March 1999 to March 2012. Results: At the first visit, the mean plasma methionine level of all patients was 5.5 mg/dL (range 2.1-14.6), and there was no increase in other amino acid levels, including homocystine, in any patient. Serum homocysteine level was evaluated in seven patients who visited after 2003 and ranged from 4.96 to 11.15 $\mu$mol/L (normal < 25 $\mu$mol/L). A methionine-restricted diet was started in all patients. Nine patients who were managed regularly showed normal development, but one patient whose initial plasma methionine level was 14.6 mg/dL showed language delay at 1 year of age and was diagnosed with mild mental retardation (IQ=66) at 6 years of age. Genetic analysis was performed on eight patients; the R264H mutation was identified in seven of them, and both the R299C and R356Q mutations were identified in one. Conclusion: Clinical findings in patients with isolated hypermethioninemia were generally good, but one patient showed mental retardation and language difficulty. The R264H mutation, which is usually inherited as an autosomal dominant trait, was the most frequent in our patients, and the R299C/R356Q mutations were also identified.

Effects of Inter-phoneme Probabilities on the Acceptability Judgment of Korean CVC Nonwords

  • Lee, Yong-Eun
    • Speech Sciences / v.14 no.4 / pp.41-52 / 2007
  • Recent experimental studies have shown that language users' knowledge of the statistical characteristics of their native language plays a key role in their task performance. One specific instance of this, and the focus of the current study, is the effect of phonotactic probabilities on speakers' wordlikeness judgments of nonwords. In this paper, I explore whether Korean-speaking subjects' judgments of the wordlikeness of Korean nonsense words are influenced by the degree of association between two-phoneme sequences in Korean. The current results suggest that an objective measure of the correlation (expressed by $r_{\phi}$ values) between an onset consonant and a vowel inside Korean syllables plays an important role in Korean speakers' nonword processing. The results additionally indicate an effect of the correlations of two-phoneme sequences consisting of vowels and coda consonants on nonword processing. Implications of these findings for how Korean speakers learn the correlations between adjacent segments inside the syllable are discussed.
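
The $r_{\phi}$ values mentioned above are phi coefficients, which measure association in a 2x2 contingency table. The sketch below uses the standard textbook formula; how the study actually built its tables is not stated in the abstract, so the cell layout (onset present/absent against vowel present/absent) and the counts are hypothetical.

```python
import math


def phi_coefficient(n11, n10, n01, n00):
    """Phi coefficient of a 2x2 table.

    n11: syllables with both the onset and the vowel
    n10: onset without the vowel
    n01: vowel without the onset
    n00: neither
    """
    row1, row0 = n11 + n10, n01 + n00
    col1, col0 = n11 + n01, n10 + n00
    denominator = math.sqrt(row1 * row0 * col1 * col0)
    if denominator == 0:
        raise ValueError("phi is undefined when a row or column total is zero")
    return (n11 * n00 - n10 * n01) / denominator


# Hypothetical counts for one onset-vowel pair in a syllable corpus.
r_phi = phi_coefficient(n11=30, n10=10, n01=20, n00=40)
print(round(r_phi, 3))
```

Values near +1 mean the two phonemes co-occur far more often than chance, values near -1 mean they avoid each other, and values near 0 mean no association.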

Correlations between pronunciation test scores given by Korean/Native/ILT (Interactive Language Tutor) raters against the Korean-spoken English sentences (한국인의 영어 문장 발음에 대한 한국인/원어민/ILT(Interactive Language Tutor) 평가 점수 사이의 상관관계)

  • Rhee Seok-Chae;Park Jeon Gue
    • Proceedings of the KSPS conference / 2003.10a / pp.83-88 / 2003
  • This study carried out an experimental English pronunciation assessment to examine how the scores given by different categories of raters relate to one another. The results show that i) the correlation between Korean raters and American native-speaker raters is high enough (r = .98) to be considered reliable, ii) prior instruction on the assessment rubric and knowledge of English phonetics and phonology exert little influence on the rating scores, and iii) the correlation between the automatic ILT (Interactive Language Tutor) rating, which uses speech recognition technology, and the native speakers' rating is stronger than that between the ILT and the Koreans' rating.
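
The r values reported above are correlations between two sets of rating scores. A minimal sketch of the Pearson correlation coefficient follows, assuming that is the statistic used (the abstract does not name it). The score lists are invented for illustration and are not data from the study.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of the same length, at least 2 items")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return covariance / (spread_x * spread_y)


# Hypothetical scores given to the same five sentences by two rater groups.
korean_scores = [3.0, 4.5, 2.0, 5.0, 3.5]
native_scores = [2.8, 4.6, 2.2, 4.9, 3.4]
r = pearson_r(korean_scores, native_scores)
print(round(r, 2))
```

Comparing the coefficient for each pair of rater groups (Korean vs. native, ILT vs. native, ILT vs. Korean) gives the kind of comparison the abstract reports.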

Modelling Foreign Language Learning Courseware for Korean Speakers (한국어 화자를 위한 외국어 학습 코스웨어의 모델링)

  • Yoon, Ae-Sun;Kim, Kyung-Hee
    • Annual Conference on Human and Language Technology / 1999.10e / pp.418-425 / 1999
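  • To explore how foreign language learning courseware for Korean speakers can be modelled independently of the target language, this paper discusses foreign language learning theories and the types of material presentation used on the web, and demonstrates feasibility by examining the previously developed platform LangEdu. The platform is designed to support systematic presentation of learning materials, close interaction among users, and easy management, so that subject-matter experts without specialized computing knowledge can build individual foreign language learning courseware without much difficulty. The methodology is therefore not only cost-effective but also encourages the active participation of subject-matter experts, contributing to the production of high-quality courseware.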

Question, Document, Response Validator for Question Answering System (질의 응답 시스템을 위한 질의, 문서, 답변 검증기)

  • Tae Hong Min;Jae Hong Lee;Soo Kyo In;Kiyoon Moon;Hwiyeol Jo;Kyungduk Kim
    • Annual Conference on Human and Language Technology / 2022.10a / pp.604-607 / 2022
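  • This paper describes the QDR validator, which verifies whether the answer returned by a question answering system responds to the user's query correctly and on the basis of the document. The task resembles natural language inference (NLI), which judges a claim about a document, but it is more difficult than NLI because it takes three kinds of input: the query (Q) in addition to the document (D) and the claim (R). To carry out the QDR validation task, we generated about 16,000 data instances and, through experiments with various input formats, additional training on NLI task data, and threshold tuning, achieved a strong final performance of 83.05%.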

Design of Sentence Semantic Model for Cause-Effect Graph Automatic Generation from Natural Language Oriented Informal Requirement Specifications (비정형 요구사항으로부터 원인-결과 그래프 자동 발생을 위한 문장 의미 모델(Sentence Semantic Model) 설계)

  • Jang, Woo Sung;Jung, Se Jun;Kim, R. Young Chul
    • Annual Conference on Human and Language Technology / 2020.10a / pp.215-219 / 2020
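  • A great deal of language analysis research has been carried out in Korean linguistics. Meanwhile, the requirements engineering area of software engineering needs clear requirements definition and analysis, and extracting test cases from informal requirement specifications is a very important issue. Yet there are almost no methods for automatically generating decision table-based test cases, by way of a cause-effect graph, from natural language requirement specifications. Solving this problem requires applying Korean semantic analysis techniques to the requirements engineering area. This paper proposes a method for automatically generating a Sentence Semantic Model from requirements, an intermediate step in the process of generating test cases from informal requirements. The model makes it possible to verify the accuracy of the cause-effect graph generated from the requirements.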
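
To illustrate the downstream step the abstract refers to, the sketch below derives a decision table from a small cause-effect relation by enumerating every combination of cause values. It does not implement the paper's Sentence Semantic Model; the login requirement, the cause and effect names, and the function `decision_table` are hypothetical.

```python
from itertools import product


def decision_table(causes, effects):
    """Enumerate all cause combinations and evaluate each effect rule.

    causes:  list of cause names
    effects: dict mapping an effect name to a function that takes a
             dict of cause values and returns True or False
    """
    table = []
    for values in product([True, False], repeat=len(causes)):
        conditions = dict(zip(causes, values))
        actions = {name: rule(conditions) for name, rule in effects.items()}
        table.append((conditions, actions))
    return table


# Hypothetical requirement: "If the ID and the password are valid, log in;
# otherwise show an error."
causes = ["valid_id", "valid_password"]
effects = {
    "login_ok": lambda c: c["valid_id"] and c["valid_password"],
    "show_error": lambda c: not (c["valid_id"] and c["valid_password"]),
}
table = decision_table(causes, effects)
for conditions, actions in table:
    print(conditions, "->", actions)
```

Each row of the table corresponds to one candidate test case: the conditions are the inputs and the actions are the expected results.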
