• Title/Summary/Keyword: heterogeneity learning (이질성 학습)

A Reference Architecture for Blockchain-based Federated Learning (블록체인 기반 연합학습을 위한 레퍼런스 아키텍처)

  • Goh, Eunsu;Mun, Jong-Hyeon;Lee, Kwang-Kee;Sohn, Chae-bong
    • Proceedings of the Korean Society of Broadcast Engineers Conference / 2022.11a / pp.119-122 / 2022
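  • Federated learning is a technique in which multiple distributed edge devices or servers that hold data samples cooperate to solve a machine learning problem without sharing their raw data. Because each client uses its own raw data only to train its local model, federated learning protects data owners' privacy and can address the fragmentation of data ownership and use. Federated learning requires solving the problems of statistical heterogeneity and system heterogeneity, and a wide range of studies aim to improve AI model accuracy and system performance. Recently, blockchain-integrated federated learning has attracted attention as a way to overcome the problems of central-server-dependent federated learning, to ensure data integrity and traceability, and to reward data owners and federated learning participants effectively. In this study, we define and implement a federated learning reference architecture compatible with Ethereum-based blockchain infrastructure, and we run experiments on representative federated learning algorithms and datasets to verify the architecture's practicality and scalability.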
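
The abstract does not name the federated learning algorithms used in the experiments. Federated Averaging (FedAvg) is one widely used representative algorithm, so the sketch below shows only its server-side aggregation step, assuming each client reports a flat list of model parameters together with its local sample count. The function name is hypothetical, and the Ethereum/blockchain layer of the reference architecture (recording updates, rewarding participants) is not modeled.

```python
from typing import List, Sequence


def fedavg_aggregate(client_params: Sequence[Sequence[float]],
                     client_sizes: Sequence[int]) -> List[float]:
    """Server-side FedAvg step: average client parameters weighted by local sample counts.

    Raw data never leaves the clients; only parameters and sample counts are sent.
    """
    if not client_params or len(client_params) != len(client_sizes):
        raise ValueError("need at least one client and one sample count per client")
    total = sum(client_sizes)
    dim = len(client_params[0])
    return [
        sum(params[j] * n for params, n in zip(client_params, client_sizes)) / total
        for j in range(dim)
    ]
```

For example, `fedavg_aggregate([[1.0, 2.0], [3.0, 4.0]], [1, 3])` returns `[2.5, 3.5]`: the client with three times as many samples pulls the global model three times as hard.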

Stress Affect Detection At Wearable Devices Via Clustered Federated Learning Based On Number of Samples Mahalanobis Distance (웨어러블 기기에서 데이터수 기반 마하라노비스 군집화 연합학습을 통한 스트레스 및 감정탐지)

  • Tae-Hwan Yoon;Bong-Jun Choi
    • Proceedings of the Korea Information Processing Society Conference / 2024.05a / pp.764-767 / 2024
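  • Wearable devices can collect a variety of user metadata. However, collecting such data, which contains personal information, exposes users to privacy risks. This paper therefore adopts federated learning as a privacy-preserving way to use wearable-device data. Federated learning itself still has open problems; among them, we use clustering to address data heterogeneity. In addition, because conventional cosine-similarity-based clustering does not reflect parameter importance, we propose a clustering method based on the Number of Samples Mahalanobis Distance. On the WESAD (Wearable Stress and Affect Detection) dataset, where data heterogeneity exists across subjects, the proposed method showed better training stability than conventional federated learning.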
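
The abstract does not give the exact form of the Number of Samples Mahalanobis Distance. The sketch below is one hypothetical reading, not the authors' method: the per-parameter variance across client updates is estimated with each client weighted by its sample count, a diagonal Mahalanobis distance between updates divides each squared difference by that variance, and clients are grouped by a simple greedy threshold (leader) clustering. The sample-count weighting, the diagonal covariance, the threshold, and all function names are assumptions.

```python
import math
from typing import List, Sequence


def weighted_param_variance(updates: Sequence[Sequence[float]],
                            n_samples: Sequence[int]) -> List[float]:
    """Per-parameter variance across clients, weighted by each client's sample count (assumed weighting)."""
    total = sum(n_samples)
    dim = len(updates[0])
    mean = [sum(n * u[j] for u, n in zip(updates, n_samples)) / total for j in range(dim)]
    return [
        sum(n * (u[j] - mean[j]) ** 2 for u, n in zip(updates, n_samples)) / total
        for j in range(dim)
    ]


def diag_mahalanobis(a: Sequence[float], b: Sequence[float],
                     variance: Sequence[float], eps: float = 1e-8) -> float:
    """Mahalanobis distance with a diagonal covariance: differences in low-variance parameters count more."""
    return math.sqrt(sum((x - y) ** 2 / (v + eps) for x, y, v in zip(a, b, variance)))


def cluster_clients(updates: Sequence[Sequence[float]], n_samples: Sequence[int],
                    threshold: float) -> List[int]:
    """Greedy leader clustering: each unassigned client starts a cluster and absorbs close clients."""
    variance = weighted_param_variance(updates, n_samples)
    labels = [-1] * len(updates)
    next_label = 0
    for i, leader in enumerate(updates):
        if labels[i] != -1:
            continue
        labels[i] = next_label
        for j in range(i + 1, len(updates)):
            if labels[j] == -1 and diag_mahalanobis(leader, updates[j], variance) <= threshold:
                labels[j] = next_label
        next_label += 1
    return labels
```

Each resulting cluster would then train its own federated model. A cosine-similarity baseline would treat every parameter equally, whereas the variance term here rescales each parameter's contribution.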

Temporal Fusion Transformers and Deep Learning Methods for Multi-Horizon Time Series Forecasting (Temporal Fusion Transformers와 심층 학습 방법을 사용한 다층 수평 시계열 데이터 분석)

  • Kim, InKyung;Kim, DaeHee;Lee, Jaekoo
    • KIPS Transactions on Software and Data Engineering / v.11 no.2 / pp.81-86 / 2022
  • Given that time series are used in various fields, such as finance, IoT, and manufacturing, data analysis methods for accurate time-series forecasting can increase operational efficiency. Among time-series analysis methods, multi-horizon forecasting provides a better understanding of the data because it can extract meaningful statistics and other characteristics of the entire time series. Furthermore, time-series data with exogenous information can be predicted accurately using multi-horizon forecasting methods. However, traditional deep learning-based time-series models do not account for the heterogeneity of their inputs. We propose an improved time-series forecasting method, the temporal fusion transformer method, which combines multi-horizon forecasting with interpretable insights into temporal dynamics. Various real-world data, such as stock prices, fine dust concentrations, and electricity consumption, were used in the experiments. The experimental results showed that our temporal fusion transformer method achieves better time-series forecasting performance than existing models.
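
The temporal fusion transformer network itself is too large to sketch here, but two generic ingredients of multi-horizon forecasting can be shown compactly: turning a series into (lookback window, next-`horizon` targets) pairs so that one model predicts several future steps at once, and the quantile (pinball) loss that is commonly used to train quantile forecasters such as TFT. The function names and window sizes are illustrative assumptions; the abstract does not describe the training setup.

```python
from typing import List, Sequence, Tuple


def make_multi_horizon_windows(series: Sequence[float], lookback: int,
                               horizon: int) -> Tuple[List[list], List[list]]:
    """Split a series into (past `lookback` values, next `horizon` values) pairs."""
    inputs, targets = [], []
    for t in range(lookback, len(series) - horizon + 1):
        inputs.append(list(series[t - lookback:t]))
        targets.append(list(series[t:t + horizon]))
    return inputs, targets


def quantile_loss(y_true: Sequence[float], y_pred: Sequence[float], q: float) -> float:
    """Mean pinball loss for quantile q, averaged over all forecast horizons.

    Under-prediction is penalized with weight q and over-prediction with weight (1 - q).
    """
    total = 0.0
    for y, p in zip(y_true, y_pred):
        error = y - p
        total += max(q * error, (q - 1) * error)
    return total / len(y_true)
```

With `make_multi_horizon_windows(list(range(10)), lookback=3, horizon=2)` there are six pairs, the first pairing input `[0, 1, 2]` with target `[3, 4]`. Training with q = 0.9 pushes forecasts toward an upper bound, which is how quantile forecasts express uncertainty.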

Enhancing LoRA Fine-tuning Performance Using Curriculum Learning

  • Daegeon Kim;Namgyu Kim
    • Journal of the Korea Society of Computer and Information / v.29 no.3 / pp.43-54 / 2024
  • Recently, there has been extensive research on using language models, and large language models have achieved innovative results in various tasks. However, their practical application is limited by the resources and costs they require. Consequently, attention has recently turned to methods for using models effectively within given resources. Curriculum Learning, a methodology that orders training data by difficulty and learns from it sequentially, has been attracting attention, but its methods for measuring difficulty are often complex or not generalizable. Therefore, in this study, we propose a data heterogeneity-based Curriculum Learning methodology that measures the difficulty of data using reliable prior information and can be applied easily across various tasks. To evaluate the proposed methodology, experiments were conducted using 5,000 specialized documents in the field of information and communication technology and 4,917 documents in the field of healthcare. The results confirm that the proposed methodology achieves higher classification accuracy than conventional fine-tuning in both LoRA fine-tuning and full fine-tuning.
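
The abstract does not specify how heterogeneity-based difficulty is derived from prior information, so the sketch below takes a per-example difficulty score as a given, hypothetical input. It shows only the generic staged curriculum idea: rank examples from easiest to hardest and train on progressively larger subsets that add harder examples at each stage. The function name and `num_stages` parameter are assumptions.

```python
import math
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def curriculum_stages(examples: Sequence[T], difficulty: Sequence[float],
                      num_stages: int) -> List[List[T]]:
    """Return cumulative training subsets ordered from easiest to hardest.

    Stage s (1-based) holds the easiest ceil(len * s / num_stages) examples,
    so the last stage is the full dataset.
    """
    if len(examples) != len(difficulty) or num_stages < 1:
        raise ValueError("need one difficulty score per example and num_stages >= 1")
    order = sorted(range(len(examples)), key=lambda i: difficulty[i])
    ranked = [examples[i] for i in order]
    return [ranked[:math.ceil(len(ranked) * s / num_stages)] for s in range(1, num_stages + 1)]
```

Each stage would be used for one or more fine-tuning epochs (LoRA or full) before moving to the next; the scoring function is the part a heterogeneity-based method would replace.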

The Effects of Grouping by Middle School Students' Collectivism in Science Cooperative Learning and Their Perceptions (과학 협동학습에서 중학생들의 집단주의 성향에 따른 집단구성의 효과 및 학생들의 인식)

  • Joo, Young;Kim, Kyungsun;Noh, Taehee
    • Journal of The Korean Association For Science Education / v.32 no.10 / pp.1551-1566 / 2012
  • In this study, we investigated the effects of grouping by students' collectivism, within a cooperative learning strategy applied to middle school science classes, on students' academic achievement, science learning motivation, and perceptions of the science learning environment. Students' perceptions of cooperative learning were also studied through a survey and interviews. The students were assigned to control, heterogeneous, and homogeneous groups and taught for 12 class hours. The analyses revealed interaction effects between the instruction and the level of collectivism on the test scores for achievement, science learning motivation, and relevance, and main effects on the test scores for confidence, perceptions of the science learning environment, affiliation, and rule clarity. The achievement test scores of students with low collectivism in the homogeneous group were significantly higher than those in the heterogeneous group. The science learning motivation and relevance scores of students with high collectivism in the homogeneous and heterogeneous groups were significantly higher than those in the control group. In addition, the confidence and affiliation scores in the treatment groups were significantly higher than those in the control group. The scores for perceptions of the science learning environment and rule clarity in the homogeneous groups were significantly higher than those in the control group. Perceptions of science cooperative learning also differed according to students' collectivism.

Drivers' Rational Belief Formation under Bounded Traffic Environments (한정된 교통환경하에서 운전자의 합리적 신념형성에 관한 연구)

  • Do, Myeong-Sik
    • Journal of Korean Society of Transportation / v.25 no.3 / pp.87-97 / 2007
  • This paper examines how drivers form rational beliefs under a bounded traffic environment. The aim is to escape the criticism that excessive rationality (e.g., a driver's calculating ability and memory capacity) is required of drivers. In bounded traffic environments, drivers do not have structural knowledge of traffic conditions or of other drivers' decisions. Simulations were carried out using a program written in C. The author found that drivers' learning processes and the value of information differ according to route conditions and the characteristics of driver groups. It was also found that, in a bounded traffic environment, rational drivers form different beliefs about traffic conditions even when they face the same traffic environment.

Improving the Accuracy of Document Classification by Learning Heterogeneity (이질성 학습을 통한 문서 분류의 정확성 향상 기법)

  • Wong, William Xiu Shun;Hyun, Yoonjin;Kim, Namgyu
    • Journal of Intelligence and Information Systems / v.24 no.3 / pp.21-44 / 2018
  • In recent years, the rapid development of internet technology and the popularization of smart devices have resulted in massive amounts of text data. These data are produced and distributed through various media platforms such as the World Wide Web, Internet news feeds, microblogs, and social media. However, this enormous amount of easily obtained information lacks organization. This problem has drawn the interest of many researchers seeking to manage such large volumes of information, and it also calls for professionals capable of classifying relevant information; hence, text classification was introduced. Text classification, which assigns a text document to one or more predefined categories or classes, is a challenging task in modern data analysis. Various text classification techniques are available, such as K-Nearest Neighbor, Naïve Bayes, Support Vector Machine, Decision Tree, and Artificial Neural Network. However, when dealing with huge amounts of text data, model performance and accuracy become a challenge. The performance of a text classification model can vary according to the type of words used in the corpus and the type of features created for classification. Most attempts have focused on proposing a new algorithm or modifying an existing one, and such research has arguably reached its limits for further improvement. In this study, rather than proposing or modifying an algorithm, we focus on finding a way to modify how data are used. It is widely known that classifier performance is influenced by the quality of the training data on which the classifier is built. Real-world datasets often contain noise, that is, noisy data, which can affect the decisions made by classifiers built from them.
In this study, we consider that data from different domains, that is, heterogeneous data, may have noise-like characteristics that can be exploited in the classification process. Machine learning algorithms build classifiers on the assumption that the characteristics of the training data and the target data are the same or very similar. However, for unstructured data such as text, features are determined by the vocabulary contained in the documents. If the training data and target data are written from different viewpoints, their features may differ. In this study, we attempt to improve classification accuracy by strengthening the robustness of the document classifier through artificially injecting noise into the process of constructing it. Data coming from various sources are likely to be formatted differently. This causes difficulties for traditional machine learning algorithms, which are not designed to recognize different types of data representation at once and combine them into the same generalization. Therefore, to utilize heterogeneous data in the learning process of the document classifier, we apply semi-supervised learning. However, unlabeled data may degrade the performance of the document classifier. We therefore propose a method called the Rule Selection-Based Ensemble Semi-Supervised Learning Algorithm (RSESLA), which selects only the documents that contribute to improving the classifier's accuracy. RSESLA creates multiple views by manipulating the features using different types of classification models and different types of heterogeneous data. The most confident classification rules are selected and applied for the final decision.
In this paper, three types of real-world data sources were used: news, Twitter, and blogs.
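
RSESLA's rule extraction and view construction are not described here in enough detail to reproduce. The sketch below illustrates only the general idea behind selective semi-supervised learning: keep an unlabeled document as a pseudo-labeled training example only when several views agree on its label with high average confidence. It is a generic agreement-plus-confidence filter, not RSESLA itself; the view predictions, the threshold, and the function name are hypothetical inputs.

```python
from typing import Dict, List, Sequence, Tuple

Prediction = Tuple[str, float]  # (predicted label, confidence in [0, 1])


def select_confident(doc_ids: Sequence[str],
                     view_predictions: List[Dict[str, Prediction]],
                     threshold: float) -> Dict[str, str]:
    """Keep unlabeled documents on which every view agrees and mean confidence >= threshold."""
    selected: Dict[str, str] = {}
    for doc in doc_ids:
        preds = [view[doc] for view in view_predictions]
        labels = {label for label, _ in preds}
        if len(labels) != 1:
            continue  # views disagree: skip rather than risk a wrong pseudo-label
        mean_conf = sum(conf for _, conf in preds) / len(preds)
        if mean_conf >= threshold:
            selected[doc] = labels.pop()
    return selected
```

The selected documents would be added to the labeled set and the classifiers retrained; filtering this way is what guards against unlabeled data degrading the classifier.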

Development of Intelligent Agent Based Inclination Test Grouping E-learning System (IIGS) (취향검사 지능적 에이전트기반 학습공동체 그룹핑 E-learning 시스템 설계 및 개발)

  • Kim, Myung-Sook;Cho, Young-Im
    • Journal of Korea Multimedia Society / v.8 no.4 / pp.544-553 / 2005
  • In this paper, we developed inclination test items to form desirable online learning communities in which social interaction is maximized, dropout rates are lowered, and learners' feelings of isolation are eliminated. The developed inclination test items were classified into homogeneous and heterogeneous items. Based on these results, we developed the Intelligent agent based Inclination Test Grouping e-learning System (IIGS), in which an intelligent agent automatically groups learners into online learning communities. In a real-world trial of the grouping system with 1,000 teachers, 151 groups were formed automatically. Of these, 34% showed very high learning satisfaction and intended to maintain their groups in the future.

An analysis of effect for grouping methods corresponding to ecological niche overlap of 7th graders' photosynthesis concepts (7학년 광합성 개념의 지위 중복 변화에 따른 소집단 구성의 효과 분석)

  • Jang, Hye-ji;Kim, Youngshin
    • Journal of Science Education / v.41 no.2 / pp.195-212 / 2017
  • Small group learning is an educational approach that allows students to solve problems and achieve a common goal together. In science education in particular, small group learning is one of the most important educational approaches and is effective in ensuring understanding of a topic, and small groups of three students maximize student understanding and learning efficiency. However, the reported effects of small group learning on achievement differ depending on the grouping method (homogeneous or heterogeneous). This study investigated the effects of grouping method on differences in the ecological niche of photosynthesis concepts. To this end, 1,107 7th-grade students were organized into homogeneous and heterogeneous groups based on top, middle, and bottom levels. The photosynthesis unit was divided into four categories: the site of photosynthesis, the substances of photosynthesis, the materials required for photosynthesis, and environmental factors affecting photosynthesis. The questionnaire consisted of concepts with a frequency of 4% or more in prior studies on changes in the ecological status of photosynthesis concepts, and it was scored for relativity and understanding of each concept in the four categories. The results are as follows. 1) Learning of science concepts improved in small group classes of three students. 2) To raise the group average, groups should be formed homogeneously; to reduce the deviation between members, groups should be formed heterogeneously. This study is expected to prompt further studies that verify differences in, or effects on, concept niche overlap according to group composition.
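
The two grouping methods compared above can be sketched as follows, assuming students are ranked by a single prior score and grouped in threes: homogeneous groups take consecutive ranks, while heterogeneous groups take one student from each of the top, middle, and bottom tiers. The scoring input, tie handling, and function name are assumptions; the study's exact assignment procedure is not described in the abstract.

```python
from typing import Dict, List


def make_groups(scores: Dict[str, float], size: int = 3,
                homogeneous: bool = True) -> List[List[str]]:
    """Form groups of `size` students from prior scores.

    Students left over when len(scores) is not a multiple of `size` are not assigned.
    """
    ranked = sorted(scores, key=scores.get, reverse=True)  # highest score first
    n_groups = len(ranked) // size
    if homogeneous:
        # Consecutive ranks: top students together, bottom students together.
        return [ranked[g * size:(g + 1) * size] for g in range(n_groups)]
    # Heterogeneous: split the ranking into `size` tiers (top, middle, bottom for size 3)
    # and draw one student from each tier per group.
    tiers = [ranked[t * n_groups:(t + 1) * n_groups] for t in range(size)]
    return [[tier[g] for tier in tiers] for g in range(n_groups)]
```

With six students, the homogeneous option yields a high-scoring and a low-scoring triad, while the heterogeneous option mixes levels within each triad, which matches the trade-off the study reports between raising the average and reducing within-group deviation.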

Online news-based stock price forecasting considering homogeneity in the industrial sector (산업군 내 동질성을 고려한 온라인 뉴스 기반 주가예측)

  • Seong, Nohyoon;Nam, Kihwan
    • Journal of Intelligence and Information Systems / v.24 no.2 / pp.1-19 / 2018
  • Since forecasting stock movements is an important issue both academically and practically, studies on stock price prediction have been actively conducted. Stock price forecasting research can be classified by whether it uses structured or unstructured data, and in more detail into technical analysis, fundamental analysis, and media effect analysis. In the big data era, research combining big data with stock price prediction is actively underway, and stock prediction research based on large amounts of data mainly focuses on machine learning techniques. In particular, research methods that incorporate media effects have recently attracted attention, and studies that analyze online news and use it to forecast stock prices are becoming mainstream. Previous studies predicting stock prices from online news mostly perform sentiment analysis of news, build a separate corpus for each company, or build a dictionary that predicts stock prices by recording responses to past stock prices. Existing studies have thus examined the impact of online news on individual companies; for example, the stock movements of Samsung Electronics are predicted using only online news about Samsung Electronics. Methods that consider influences among highly related companies have also been studied recently; for example, the stock movements of Samsung Electronics are predicted using news about Samsung Electronics and a highly related company such as LG Electronics. These previous studies examine the effects of news from a homogeneous industrial sector on an individual company, where homogeneous industries are classified according to the Global Industry Classification Standard (GICS). In other words, existing studies assumed that industries divided under the Global Industry Classification Standard are homogeneous.
However, existing studies are limited in that they do not account for highly related, influential companies or reflect the heterogeneity that exists within the same Global Industry Classification Standard sectors. Examining various sectors, we found that some industrial sectors are not homogeneous groups. To overcome this limitation, our study proposes a methodology that captures the heterogeneous effects of the industrial sector on stock prices by applying k-means clustering. Multiple Kernel Learning, which is mainly used to integrate data with various characteristics, has several kernels, each of which receives different data and makes predictions. We used Multiple Kernel Learning to incorporate the effects of the target firm and its related firms simultaneously. Each kernel was assigned to predict stock prices using financial news variables from either the target firm or one of the industry groups produced by k-means cluster analysis. To test whether the proposed methodology is appropriate, experiments were conducted using three years of online news and stock prices. The results of this study are as follows. (1) News about industrial sectors related to the target company contains meaningful information for predicting the target company's stock movements, and the machine learning algorithm has better predictive power when the news of related companies is considered together with the target company's news. (2) It is important to vary the number of clusters according to the level of homogeneity in the industrial sector. In other words, when stock prices within an industrial sector are homogeneous, it is important to use relational effects at the industry-group level without cluster analysis, or to use only a small number of clusters.
When stock prices within an industry group are heterogeneous, it is important to cluster the firms into groups. This study contributes by showing that firms classified under the same Global Industry Classification Standard sector are heterogeneous, and by suggesting that relevance should be defined through machine learning and statistical analysis rather than simply by the Global Industry Classification Standard. It also demonstrates the effectiveness of a prediction model that reflects heterogeneity.
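
A minimal sketch of the clustering step, assuming each firm in a sector is described by a hypothetical numeric feature vector (for example, summary statistics of its stock returns); the paper's actual clustering features are not stated in the abstract. It uses plain Lloyd's k-means with the first k points as initial centroids so the result is deterministic (k-means++ initialization is the usual choice in practice). The Multiple Kernel Learning stage, in which each cluster's news would feed a separate kernel, is not shown.

```python
from typing import List, Optional, Sequence, Tuple


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: Sequence[Sequence[float]], k: int,
           max_iter: int = 100) -> Tuple[List[int], List[List[float]]]:
    """Lloyd's k-means; returns (cluster label per point, centroids)."""
    centroids = [list(p) for p in points[:k]]  # deterministic init for illustration
    labels: Optional[List[int]] = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [min(range(k), key=lambda c: squared_distance(p, centroids[c]))
                      for p in points]
        if new_labels == labels:
            break  # assignments stable: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids
```

Choosing k per sector mirrors the finding above: a homogeneous sector needs few clusters (or none), while a heterogeneous one benefits from splitting firms into several related groups.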