• Title/Summary/Keyword: t-distributed stochastic neighbor embedding

Search Result 8

Violation Pattern Analysis for Good Manufacturing Practice for Medicine using t-SNE Based on Association Rule and Text Mining (우수 의약품 제조 기준 위반 패턴 인식을 위한 연관규칙과 텍스트 마이닝 기반 t-SNE분석)

  • Jun-O, Lee;So Young, Sohn
    • Journal of Korean Society for Quality Management / v.50 no.4 / pp.717-734 / 2022
  • Purpose: This study aims to effectively detect simultaneous violations of Good Manufacturing Practice that drug manufacturers have concealed. Methods: We present a framework for analyzing regulatory violation patterns using Association Rule Mining (ARM), text mining, and t-distributed Stochastic Neighbor Embedding (t-SNE) to increase the effectiveness of on-site inspection. Results: A number of simultaneous violation patterns were discovered by applying Association Rule Mining to FDA inspection data collected from October 2008 to February 2022. Among them were 'concurrent violation patterns' derived from the similar regulatory ranges of two or more regulations. These patterns do not help to predict violations that appear simultaneously but belong to different regulations. These unnecessary patterns were excluded by applying t-SNE based on text mining. Conclusion: Our proposed approach enables the recognition of simultaneous violation patterns during on-site inspection. It is expected to decrease detection time by increasing the likelihood of finding intentionally concealed violations.
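The association-rule step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the regulation codes, the `inspections` records, the `pair_rules` helper, and the `min_support` threshold are all hypothetical, and only rules between pairs of items are considered. It computes the standard support, confidence, and lift measures for co-occurring violations.

```python
from itertools import combinations

# Hypothetical inspection records: each set holds the regulation codes cited
# in one inspection. The codes are illustrative, not taken from the FDA data.
inspections = [
    {"211.22", "211.100", "211.192"},
    {"211.22", "211.192"},
    {"211.100", "211.160"},
    {"211.22", "211.100", "211.192"},
    {"211.160"},
]

def pair_rules(transactions, min_support=0.4):
    """Return (antecedent, consequent, support, confidence, lift) for item pairs."""
    n = len(transactions)
    items = sorted(set().union(*transactions))
    # Number of transactions that contain each single item.
    count = {i: sum(i in t for t in transactions) for i in items}
    rules = []
    for a, b in combinations(items, 2):
        both = sum(a in t and b in t for t in transactions)
        support = both / n
        if support < min_support:
            continue  # the pair co-occurs too rarely to be of interest
        # Emit the rule in both directions: a -> b and b -> a.
        for x, y in ((a, b), (b, a)):
            confidence = both / count[x]
            lift = confidence / (count[y] / n)
            rules.append((x, y, support, confidence, lift))
    return rules

rules = pair_rules(inspections)
for x, y, s, c, l in rules:
    print(f"{x} -> {y}: support={s:.2f} confidence={c:.2f} lift={l:.2f}")
```

A lift above 1 indicates that the two violations co-occur more often than independence would predict. Whether such a pair is informative or merely reflects overlapping regulatory text is the question the abstract addresses with text mining and t-SNE.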

Research Trends of Ergonomics in Occupational Safety and Health through MEDLINE Search: Focus on Abstract Word Modeling using Word Embedding (MEDLINE 검색을 통한 산업안전보건 분야에서의 인간공학 연구동향 : 워드임베딩을 활용한 초록 단어 모델링을 중심으로)

  • Kim, Jun Hee;Hwang, Ui Jae;Ahn, Sun Hee;Gwak, Gyeong Tae;Jung, Sung Hoon
    • Journal of the Korean Society of Safety / v.36 no.5 / pp.61-70 / 2021
  • This study aimed to analyze research trends in the abstracts of ergonomic studies registered in MEDLINE, a medical bibliographic database, using word embedding. Medical-related ergonomic studies mainly focus on work-related musculoskeletal disorders, and no studies have analyzed words as data using natural language processing techniques such as word embedding. In this study, the abstracts of ergonomic studies were extracted with a Python program written with the Selenium and BeautifulSoup modules. The abstract data were embedded using the word2vec model, which vectorized the words found in the abstracts. The vectorized data were visualized in two dimensions using t-Distributed Stochastic Neighbor Embedding (t-SNE). The word "ergonomics" and ten of the most frequently used words in the abstracts were selected as keywords. The results revealed that the most frequently used words in the abstracts of ergonomics studies include "use", "work", and "task". In addition, the t-SNE technique revealed that words such as "workplace", "design", and "engineering" exhibited the highest relevance to ergonomics. The keywords observed in the abstracts of ergonomic studies using t-SNE were classified into four groups. Ergonomics studies registered with MEDLINE have investigated the risk factors associated with workers performing an operation or task using tools, and in this study, ergonomics studies were characterized by the relationships between keywords using word embedding. The results of this study will provide useful and diverse insights into future research directions for ergonomic studies.
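The keyword-selection step in this abstract (choosing the most frequently used words) can be sketched with the standard library alone. The sample `abstracts`, the `STOPWORDS` set, and the `top_words` helper are hypothetical stand-ins, not the MEDLINE data or the authors' code, and the word2vec and t-SNE stages are deliberately left out.

```python
import re
from collections import Counter

# Hypothetical abstract snippets standing in for the scraped MEDLINE abstracts.
abstracts = [
    "Workers use hand tools to perform the task at work.",
    "The design of the workplace affects how workers use tools.",
    "Task design and work posture were assessed.",
]

# A tiny illustrative stopword list; a real analysis would use a fuller one.
STOPWORDS = {"the", "of", "to", "at", "and", "how", "were", "a"}

def top_words(texts, k=3):
    """Return the k most frequent non-stopword tokens as (word, count) pairs."""
    tokens = []
    for text in texts:
        # Lowercase the text and keep alphabetic runs only.
        words = re.findall(r"[a-z]+", text.lower())
        tokens += [w for w in words if w not in STOPWORDS]
    return Counter(tokens).most_common(k)

print(top_words(abstracts, k=6))
```

In a pipeline like the one described, the words selected this way would then be looked up in the trained embedding model and projected to two dimensions for plotting.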

Physiological Signal-Based Emotion Recognition in Conversations Using T-SNE (생체신호 기반의 T-SNE를 활용한 대화 내 감정 인식)

  • Subeen Leem;Byeongcheon Lee;Jihoon Moon
    • Proceedings of the Korea Information Processing Society Conference / 2023.05a / pp.703-705 / 2023
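  • This study proposes a more accurate and more widely applicable technique for emotion recognition that uses physiological signal data recorded during conversations. To this end, we first equalize the number of measurements across conversations of different lengths. To compare and analyze effective combinations of physiological signal data, we then apply the dimensionality reduction technique T-SNE (T-distributed Stochastic Neighbor Embedding) and examine the distribution of emotion labels. In addition, we use AutoML (Automated Machine Learning) on the reduced data to classify emotions and to predict arousal and valence, thereby identifying the combination of physiological signal data that recognizes emotion best.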

A Study in Relationship between Facial Expression and Action Unit (Manifold Learning을 통한 표정과 Action Unit 간의 상관성에 관한 연구)

  • Kim, Sunbin;Kim, Hyeoncheol
    • Proceedings of the Korea Information Processing Society Conference / 2018.10a / pp.763-766 / 2018
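  • Facial expressions are a powerful nonverbal means of conveying emotion between people. Facial expression recognition is one of the most important fields in machine learning, and the machine learning models used for it show human-level performance. Despite this performance, however, these models cannot provide grounds or explanations for their recognition results. To use Facial Action Coding Units (AUs) as the grounds for facial expression recognition, this study takes a Convolutional Neural Network (CNN) model trained for facial expression recognition on the CK+ Dataset, visualizes the features it extracts using t-distributed stochastic neighbor embedding (t-SNE), and then examines the relationship between the distributions of the recognized expressions and the AUs.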

Stochastic Strength Analysis according to Initial Void Defects in Composite Materials (복합재 초기 공극 결함에 따른 횡하중 강도 확률론적 분석)

  • Seung-Min Ji;Sung-Wook Cho;S.S. Cheon
    • Composites Research / v.37 no.3 / pp.179-185 / 2024
  • This study quantitatively evaluated and investigated the changes in transverse tensile strength of unidirectional fiber-reinforced composites with initial void defects using a Representative Volume Element (RVE) model. After calculating the appropriate sample size based on margin of error and confidence level for initial void defects, a sample group of 5000 RVE models with initial void defects was generated. Dimensional reduction and density-based clustering analysis were conducted on the sample group to assess similarity, confirming and verifying that the sample group was unbiased. The validated sample analysis results were represented using a Weibull distribution, allowing them to be applied to the reliability analysis of composite structures.
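The abstract does not say which formula was used to derive the sample size from the margin of error and confidence level. As one common possibility, the sketch below applies the standard normal-approximation formula for estimating a proportion, n = z² p(1 - p) / e². The function name `sample_size`, the default `proportion=0.5`, and the example inputs are assumptions for illustration and are not taken from the paper.

```python
import math
from statistics import NormalDist

def sample_size(margin_of_error, confidence_level, proportion=0.5):
    """Minimum sample size for estimating a proportion (normal approximation).

    Assumes an effectively infinite population. proportion=0.5 is the most
    conservative choice because it maximizes p * (1 - p).
    """
    # Two-sided critical value, e.g. about 1.96 for a 95% confidence level.
    z = NormalDist().inv_cdf(0.5 + confidence_level / 2)
    n = z**2 * proportion * (1 - proportion) / margin_of_error**2
    return math.ceil(n)

# Example inputs chosen for illustration only.
n = sample_size(margin_of_error=0.05, confidence_level=0.95)
print(n)
```

Tightening the margin of error increases the required sample size quadratically, which is why simulation studies that need narrow error bounds end up with sample groups in the thousands.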

Odorant receptors in cancer

  • Chung, Chan;Cho, Hee Jin;Lee, ChaeEun;Koo, JaeHyung
    • BMB Reports / v.55 no.2 / pp.72-80 / 2022
  • Odorant receptors (ORs), the largest subfamily of G protein-coupled receptors, detect odorants in the nose. In addition, ORs were recently shown to be expressed in many nonolfactory tissues and cells, indicating that these receptors have physiological and pathophysiological roles beyond olfaction. Many ORs are expressed by tumor cells and tissues, suggesting that they may be associated with cancer progression or may be cancer biomarkers. This review describes OR expression in various types of cancer and the association of these receptors with various types of signaling mechanisms. In addition, the clinical relevance and significance of the levels of OR expression were evaluated. Namely, levels of OR expression in cancer were analyzed based on RNA-sequencing data reported in the Cancer Genome Atlas; OR expression patterns were visualized using t-distributed stochastic neighbor embedding (t-SNE); and the associations between patient survival and levels of OR expression were analyzed. These analyses of the relationships between patient survival and expression patterns obtained from an open mRNA database in cancer patients indicate that ORs may be cancer biomarkers and therapeutic targets.

Detection and Classification of Demagnetization and Short-Circuited Turns in Permanent Magnet Synchronous Motors

  • Youn, Young-Woo;Hwang, Don-Ha;Song, Sung-ju;Kim, Yong-Hwa
    • Journal of Electrical Engineering and Technology / v.13 no.4 / pp.1614-1622 / 2018
  • Research on fault diagnosis in permanent magnet synchronous motors (PMSMs) has attracted considerable attention in recent years because various faults, such as permanent magnet demagnetization and short-circuited turns, can occur and result in unexpected failure of motor-related systems. Several conventional current and back electromotive force (BEMF) analysis techniques have been proposed to detect certain faults in PMSMs; however, they generally deal with a single fault only, whereas cases of multiple faults are common in PMSMs. We propose a fault diagnosis method for PMSMs with single and multiple combined faults. Our method uses three-phase BEMF voltages together with the fast Fourier transform (FFT), a support vector machine (SVM), and visualization tools to identify fault types and severities in PMSMs. Principal component analysis (PCA) and t-distributed stochastic neighbor embedding (t-SNE) are used to visualize the high-dimensional data in two-dimensional space. Experimental results show good visualization performance and high classification accuracy in identifying fault types and severities for single and multiple faults in PMSMs.

Research Trends in Record Management Using Unstructured Text Data Analysis (비정형 텍스트 데이터 분석을 활용한 기록관리 분야 연구동향)

  • Deokyong Hong;Junseok Heo
    • Journal of Korean Society of Archives and Records Management / v.23 no.4 / pp.73-89 / 2023
  • This study aims to analyze the frequency of keywords used in Korean abstracts, which are unstructured text data in the domestic record management research field, using text mining techniques to identify domestic record management research trends through distance analysis between keywords. To this end, 1,157 keywords of 77,578 journals were visualized by extracting 1,157 articles from 7 journal types (28 types) searched by major category (complex study) and middle category (literature informatics) from the institutional statistics (registered site, candidate site) of the Korean Citation Index (KCI). Analysis of t-Distributed Stochastic Neighbor Embedding (t-SNE) and Scattertext using Word2vec was performed. As a result of the analysis, first, it was confirmed that keywords such as "record management" (889 times), "analysis" (888 times), "archive" (742 times), "record" (562 times), and "utilization" (449 times) were treated as significant topics by researchers. Second, Word2vec analysis generated vector representations between keywords, and similarity distances were investigated and visualized using t-SNE and Scattertext. In the visualization results, the research area for record management was divided into two groups, with keywords such as "archiving," "national record management," "standardization," "official documents," and "record management systems" occurring frequently in the first group (past). On the other hand, keywords such as "community," "data," "record information service," "online," and "digital archives" in the second group (current) were garnering substantial focus.