• Title/Abstract/Keywords: data learning process

준지도학습 기반 반도체 공정 이상 상태 감지 및 분류 (Semi-Supervised Learning for Fault Detection and Classification of Plasma Etch Equipment)

  • 이용호;최정은;홍상진
    • 반도체디스플레이기술학회지
    • Vol. 19, No. 4
    • pp.121-125
    • 2020
  • With the miniaturization of semiconductors, the manufacturing process has become more complex, and small undetected changes in equipment state have unexpectedly changed process results. A fault detection and classification (FDC) system that conducts more active data analysis with advanced machine learning methods can achieve more precise manufacturing process control. However, applying machine learning, especially supervised learning, requires an arduous data labeling process to construct the training data. In this paper, we propose a semi-supervised learning approach to minimize the data labeling work in data preprocessing. We employed equipment status variable identification (SVID) data and optical emission spectroscopy (OES) data from silicon etching with an SF6/O2/Ar gas mixture, and the results show labeling accuracy as high as 95.2% with the suggested semi-supervised learning algorithm.

Character Recognition Algorithm using Accumulation Mask

  • Yoo, Suk Won
    • International Journal of Advanced Culture Technology
    • Vol. 6, No. 2
    • pp.123-128
    • 2018
  • The learning data is composed of 100 characters in 10 different fonts, and the test data is composed of 10 characters in a new font that is not used in the learning data. To account for the variety of the learning data across fonts, 10 learning masks are constructed by accumulating the pixel values of the same character in the 10 different fonts. This process eliminates minute differences between characters in different fonts. After the maximum values of the learning masks are found, the test data is expanded by multiplying it by these maximum values. The algorithm calculates the sum of the differences between corresponding pixel values of the expanded test data and each learning mask. The learning mask with the smallest of these 10 sums is selected as the recognition result for the test data. The proposed algorithm can recognize various types of fonts, and the learning data can be modified easily by adding a new font. The recognition process is also easy to understand, and the algorithm produces satisfactory results for character recognition.
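The accumulation and matching steps described in this abstract can be sketched as follows. This is a minimal illustration rather than the author's implementation: the function names, the tiny 3x3 binary glyphs, the use of one maximum per mask, and the use of absolute pixel differences are all assumptions, since the abstract does not specify them.

```python
def build_masks(training):
    """Accumulate same-character glyphs from every font into one mask per character."""
    masks = {}
    for label, glyphs in training.items():
        rows, cols = len(glyphs[0]), len(glyphs[0][0])
        masks[label] = [
            [sum(glyph[r][c] for glyph in glyphs) for c in range(cols)]
            for r in range(rows)
        ]
    return masks


def recognize(test_glyph, masks):
    """Return the label whose mask is closest to the expanded test glyph."""
    scores = {}
    for label, mask in masks.items():
        peak = max(max(row) for row in mask)  # assumption: one maximum per mask
        scores[label] = sum(
            abs(peak * t - m)  # assumption: absolute pixel difference
            for test_row, mask_row in zip(test_glyph, mask)
            for t, m in zip(test_row, mask_row)
        )
    return min(scores, key=scores.get)


# Hypothetical 3x3 binary glyphs: two characters, two fonts each.
training = {
    "I": [
        [[0, 1, 0], [0, 1, 0], [0, 1, 0]],
        [[0, 1, 0], [0, 1, 0], [1, 1, 1]],
    ],
    "L": [
        [[1, 0, 0], [1, 0, 0], [1, 1, 1]],
        [[1, 0, 0], [1, 0, 0], [1, 1, 0]],
    ],
}
masks = build_masks(training)
unseen_font_i = [[0, 1, 0], [0, 1, 0], [0, 1, 1]]  # an "I" in a font not seen in training
result = recognize(unseen_font_i, masks)
print(result)
```

Adding a font under this scheme only requires appending one more glyph to each character's list and rebuilding the masks, which matches the abstract's claim that the learning data is easy to modify.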

Advanced Information Data-interactive Learning System Effect for Creative Design Project

  • Park, Sangwoo;Lee, Inseop;Lee, Junseok;Sul, Sanghun
    • KSII Transactions on Internet and Information Systems (TIIS)
    • Vol. 16, No. 8
    • pp.2831-2845
    • 2022
  • Although project-based learning has been studied extensively, data-driven design project-based learning has not reached a meaningful consensus on the most valid and reliable method for assessing design creativity. This article proposes an advanced information data-interactive learning system for creative design, using a service design process that incorporates design thinking. We propose a service framework that improves the convergence design process between students and advanced information data analysis, allowing students to participate actively in data visualization and research using patent data. By solving a design problem through a process of discovery and interpretation, the advanced information-interactive learning framework allows students to verify the value of creative ideas or to ideate new factors and the associated feasible solutions. Students can analyze the patent data on a business intelligence platform. Most new ideas for solving design projects are evaluated through complete patent data analysis and visualization at the beginning of the service design process. In this article, we propose adapting advanced information data to teach the service design process, allowing students to evaluate their own ideas and redefine the problems iteratively until they are satisfied. Quantitative evaluation results show that the advanced information data-driven learning system can improve design project-based learning results in terms of design creativity. Our findings can contribute to data-driven project-based learning with advanced information data, which plays a crucial role in convergence design, related standards, and other linked smart educational fields.

금융데이터의 성능 비교를 통한 연합학습 기법의 효용성 분석 (Utility Analysis of Federated Learning Techniques through Comparison of Financial Data Performance)

  • 장진혁;안윤수;최대선
    • 정보보호학회논문지
    • Vol. 32, No. 2
    • pp.405-416
    • 2022
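  • AI technology improves quality of life through data-driven machine learning. When machine learning is used, transmitting distributed data and gathering it in one place carries a risk of privacy violations, so the data goes through a de-identification process. De-identified data suffers from damaged and missing information, which degrades the performance of the machine learning process and complicates preprocessing. In response, Google announced federated learning in 2017, a method that learns without de-identifying data and without gathering it on a single server. Using real financial data, this paper analyzes utility by comparing the learning performance of data de-identified with k-anonymity, differential privacy, and synthetic data against the performance of federated learning, and thereby aims to show the superiority of federated learning. In the experiments, training on the original data reached 91% accuracy; training on k-anonymized data reached 79% at k=2, 76% at k=5, and 62% at k=7; training on data with differential privacy reached 52% at ε=2, 50% at ε=1, and 36% at ε=0.1; training on synthetic data reached 82%; and federated learning reached 86%, the second-highest performance.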
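To make the federated setting concrete, the sketch below shows one server-side aggregation round. The abstract does not say which aggregation rule the authors used; this example assumes federated averaging (FedAvg), in which the server combines client model weights in proportion to each client's sample count. The function name and the toy weights are hypothetical.

```python
def federated_average(client_weights, client_sizes):
    """Combine client weight vectors, weighting each by its local sample count.

    Only model weights reach the server; the raw records stay with each client.
    """
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [
        sum(weights[i] * size for weights, size in zip(client_weights, client_sizes)) / total
        for i in range(dim)
    ]


# Hypothetical round: two institutions with 1 and 3 local samples.
global_weights = federated_average([[1.0, 2.0], [3.0, 4.0]], [1, 3])
print(global_weights)
```

Because the second client holds three times as much data, the averaged weights sit closer to its local model.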

개방형 e-Learning 플랫폼 기반 학습 프로세스 마이닝 기술 (Learning process mining techniques based on open education platforms)

  • 김현아
    • 문화기술의 융합
    • Vol. 5, No. 2
    • pp.375-380
    • 2019
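  • The core topic of this paper is learning process mining and analytics technology based on open education platforms. It covers the design and implementation of a learning process mining framework that discovers and analyzes meaningful learning process knowledge from the individual learning history logs of open education platforms such as MOOCs (Massive Open Online Courseware), in which interest and use have recently grown rapidly. The core technologies of this framework consist of techniques for representing, extracting, analyzing, and visualizing learning processes, and techniques for providing improved learning-process-related educational services based on the mined and analyzed learning process knowledge.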

Text Classification with Heterogeneous Data Using Multiple Self-Training Classifiers

  • William Xiu Shun Wong;Donghoon Lee;Namgyu Kim
    • Asia pacific journal of information systems
    • Vol. 29, No. 4
    • pp.789-816
    • 2019
  • Text classification is a challenging task, especially when dealing with a huge amount of text data. The performance of a classification model can vary depending on the types of words contained in the document corpus and the types of features generated for classification. Rather than proposing a modified version of an existing algorithm or creating a new algorithm, we attempt to modify the use of data. Classifier performance is usually affected by the quality of the learning data, because the classifier is built from this training data. We assume that data from different domains may have different noise characteristics, which can be utilized in the process of learning the classifier. Therefore, we attempt to enhance the robustness of the classifier by artificially injecting heterogeneous data into the learning process in order to improve classification accuracy. A semi-supervised approach was applied to utilize the heterogeneous data in the process of learning the document classifier. However, the performance of the document classifier might be degraded by the unlabeled data. Therefore, we further propose an algorithm that extracts only the documents that contribute to improving the accuracy of the classifier.
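The idea of keeping only pseudo-labeled documents that help the classifier can be sketched as below. This is an illustrative simplification, not the authors' algorithm: it uses a single nearest-centroid classifier instead of multiple self-training classifiers, one-dimensional toy features instead of text features, and a held-out validation set as the acceptance test. All names and data are hypothetical.

```python
def centroids(samples):
    """Fit a nearest-centroid model: label -> mean feature vector."""
    sums, counts = {}, {}
    for vec, label in samples:
        total = sums.setdefault(label, [0.0] * len(vec))
        for i, value in enumerate(vec):
            total[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in total] for label, total in sums.items()}


def predict(model, vec):
    """Return the label of the closest centroid (squared Euclidean distance)."""
    return min(model, key=lambda label: sum((a - b) ** 2 for a, b in zip(vec, model[label])))


def accuracy(model, samples):
    return sum(predict(model, vec) == label for vec, label in samples) / len(samples)


def self_train(labeled, unlabeled, validation):
    """Add a pseudo-labeled sample only if validation accuracy does not drop."""
    train = list(labeled)
    model = centroids(train)
    best = accuracy(model, validation)
    for vec in unlabeled:
        candidate = train + [(vec, predict(model, vec))]
        candidate_model = centroids(candidate)
        candidate_accuracy = accuracy(candidate_model, validation)
        if candidate_accuracy >= best:
            train, model, best = candidate, candidate_model, candidate_accuracy
    return model, train


# Hypothetical one-dimensional features for two classes.
labeled = [([0.0], "a"), ([4.0], "b")]
validation = [([1.0], "a"), ([3.0], "b"), ([2.4], "b")]
unlabeled = [[0.5], [2.0], [3.5]]
model, train = self_train(labeled, unlabeled, validation)
print(len(train), model)
```

In this toy run the borderline sample `[2.0]` is rejected because pseudo-labeling it would shift a centroid enough to misclassify a validation point, while the two clearer samples are accepted.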

기업교육을 위한 인터넷 원격훈련 학습과정 모니터링 연구 (Learning Process Monitoring of e-Learning for Corporate Education)

  • 김도헌;정효정
    • 산경연구논집
    • Vol. 9, No. 8
    • pp.35-40
    • 2018
  • Purpose - The purpose of this study is to monitor the learning process for e-learning content. The study has two research objectives. First, by monitoring the learning process, we aim to explore the implications for content development that reflects future student needs. Second, we want to collect basic empirical data for estimating an appropriate amount of learning. Research design, data, and methodology - This study is a case study of learners' learning processes in e-learning. After the learning was completed, a test was conducted to measure the total cognitive load and the level of engagement that occurred during the learning process, followed by an in-depth interview. The tool used to measure cognitive load is NASA-TLX, a subjective cognitive load measurement method. In the monitoring process, we observe external phenomena such as page movement and mouse movement paths, and identify cognitive activities using techniques such as think-aloud. Results - Of the three courses studied, two took longer than the designated learning time and one took less. This has the following implications for content development. First, it is necessary to consider the importance of selecting the target and content level according to the level of the subject. Second, it is necessary to design learner participation activities that meet the learning goal level and to calculate the appropriate time accordingly. Third, it is necessary to design an appropriate learning support strategy for each learning task, and this should be considered in designing lessons. Fourth, it is necessary to revitalize content design centered on learning activities such as simulation. Conclusions - The implications for the examination system are as follows. First, it can be confirmed that it is difficult to calculate the amount of learning based on learning time and to secure objectivity. Second, various variables besides the amount of content affect the actual learning time. Third, the system for examining the amount of learning, which is centered on 'learning time', needs to be reviewed.

LIME을 활용한 준지도 학습 기반 이상 탐지 모델: 반도체 공정을 중심으로 (Anomaly Detection Model Based on Semi-Supervised Learning Using LIME: Focusing on Semiconductor Process)

  • 안강민;신주은;백동현
    • 산업경영시스템학회지
    • Vol. 45, No. 4
    • pp.86-98
    • 2022
  • Recently, many studies have been conducted to improve quality by applying machine learning models to semiconductor manufacturing process data. However, in the semiconductor manufacturing process the ratio of good products is much higher than that of defective products, so data imbalance is a serious problem for machine learning. In addition, because the data used in machine learning has a very large number of features, it is important to extract only the important features in order to increase accuracy and usability. This study proposes an anomaly detection methodology that can learn effectively despite the data imbalance and high dimensionality of semiconductor process data. The methodology applies the SMOTE method and the RFECV method, and then applies the LIME algorithm. The proposed methodology analyzes the classification results of the anomaly classification model, detects the cause of the anomaly, and identifies the semiconductor process that requires action. Its applicability and feasibility were confirmed through a case application.
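The first stage of this pipeline, oversampling the rare defective class, can be illustrated on its own. The sketch below follows the general SMOTE idea of interpolating between a minority sample and one of its nearest minority neighbours; it is not the authors' code, the feature values are made up, and the RFECV and LIME stages are not shown. In practice a library implementation such as the one in imbalanced-learn would normally be used instead.

```python
import random


def smote(minority, n_new, k=2, seed=0):
    """Create n_new synthetic minority samples by neighbour interpolation."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # The k nearest other minority samples, by squared Euclidean distance.
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to other
        synthetic.append([a + gap * (b - a) for a, b in zip(base, other)])
    return synthetic


# Hypothetical two-feature readings for three defective wafers.
minority = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1]]
synthetic = smote(minority, n_new=4)
print(synthetic)
```

Each synthetic point lies on a line segment between two real defective samples, so the minority class grows without simply duplicating records.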

사용자 건강 상태알림 서비스의 상황인지를 위한 기계학습 모델의 학습 데이터 생성 방법 (Generating Training Dataset of Machine Learning Model for Context-Awareness in a Health Status Notification Service)

  • 문종혁;최종선;최재영
    • 정보처리학회논문지:소프트웨어 및 데이터공학
    • Vol. 9, No. 1
    • pp.25-32
    • 2020
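  • Context-aware systems, which are used in a variety of fields, have traditionally relied on rule-based artificial intelligence in the abstraction process for acquiring context information. However, as user requirements for services diversify and growing data volumes make the rules more complex, rule-based models become difficult to maintain and struggle to process unstructured data. To overcome these limitations, many studies have applied machine learning to context-aware systems, and using such machine learning models in a context-aware system requires supplying training data periodically. Prior work on machine-learning-based context-aware systems has shown processes for generating and supplying training data for several machine learning models, but only a limited range of models can be applied, so extensibility needs to be considered. This paper proposes a method for generating training data for machine learning models that takes the extensibility of machine-learning-based context-aware systems into account. The proposed method defines a training data generation model that can reflect the requirements of each machine learning model, and generates the training data for each model through a training data generation module. To verify the extensibility of the system, the experiment defines a training data generation model based on a training data generation schema for a heart-rate status analysis model used in a health status notification service for the elderly, and applies the defined model to software in a real environment to generate training data. To verify the validity of the generated training data, the paper also shows the process of training the machine learning models in use on the generated data and comparing their accuracy.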

Privacy-Preserving Deep Learning using Collaborative Learning of Neural Network Model

  • Hye-Kyeong Ko
    • International journal of advanced smart convergence
    • Vol. 12, No. 2
    • pp.56-66
    • 2023
  • The goal of deep learning is to extract complex features from multidimensional data and use those features to create models that connect input and output. Deep learning is a process of learning nonlinear features and functions from complex data, and the user data employed to train deep learning models has become the focus of privacy concerns. Companies that collect users' sensitive personal information, such as images and voices, hold this data for indefinite periods of time. Users cannot delete their personal information, and they cannot limit the purposes for which the data is used. This study designed a deep learning method that employs privacy protection technology based on distributed collaborative learning, so that multiple participants can use neural network models collaboratively without sharing their input datasets. Unlike traditional deep learning, participants are not shown the training datasets during model training, which prevents direct leaks and protects the personal information in the data. The study used a method that can selectively share subsets via an optimization algorithm based on modified distributed stochastic gradient descent, and the results showed that it was possible to learn with improved accuracy while protecting personal information.
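One way to picture selective sharing in distributed stochastic gradient descent is sketched below. The abstract does not state how the shared subset is chosen, so this example assumes that each participant uploads only the largest-magnitude entries of its local gradient and that a server applies them to the shared parameters. The function names, the sharing fraction, and the numbers are hypothetical.

```python
def select_largest(gradient, fraction):
    """Return {index: value} for the largest-magnitude share of a gradient."""
    keep = max(1, int(len(gradient) * fraction))
    ranked = sorted(range(len(gradient)), key=lambda i: abs(gradient[i]), reverse=True)
    return {i: gradient[i] for i in ranked[:keep]}


def apply_shared(global_params, shared, learning_rate):
    """Apply one participant's shared gradient entries to the global parameters."""
    updated = list(global_params)
    for index, value in shared.items():
        updated[index] -= learning_rate * value
    return updated


# Hypothetical local gradient computed on a participant's private data.
local_gradient = [0.125, -1.0, 0.0625, 0.5]
shared = select_largest(local_gradient, fraction=0.5)
global_params = apply_shared([1.0, 1.0, 1.0, 1.0], shared, learning_rate=0.5)
print(shared, global_params)
```

Only two of the four gradient entries leave the participant, and the training data itself is never transmitted.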