• Title/Summary/Keyword: learning distribution

Discretization of Continuous-Valued Attributes considering Data Distribution (데이터 분포를 고려한 연속 값 속성의 이산화)

  • Lee, Sang-Hoon;Park, Jung-Eun;Oh, Kyung-Whan
    • Journal of the Korean Institute of Intelligent Systems / v.13 no.4 / pp.391-396 / 2003
  • This paper proposes a new approach that converts continuous-valued attributes into categorical ones by considering the distribution of the target attributes (classes). The approach obtains optimal interval boundaries from the distribution of the data itself, without requiring any parameters. For each attribute, the distribution of the target attributes is projected onto a one-dimensional space, and this space is clustered according to criteria such as the density of each target attribute and the amount of overlap among the densities of the target attributes. The resulting clusters are based on the probabilities of predicting the target attribute of instances, so their interval boundaries minimize the loss of information from the original data. The improved performance of the proposed discretization method was validated using the C4.5 algorithm and data sets from the UCI Machine Learning Repository.
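As a rough illustration of cutting an attribute where the dominant class changes, the Python sketch below estimates each class's density along one attribute with a shared histogram and places a boundary wherever the class with the highest normalized density changes between adjacent bins. It is a simplified stand-in, not the authors' clustering criterion (which also weighs the overlap between class densities), and unlike their parameter-free method it needs a bin count; the function name and `n_bins` are assumptions.

```python
from collections import Counter, defaultdict

def discretize_by_class_density(values, labels, n_bins=10):
    """Return cut points for one continuous attribute (hypothetical helper).

    Each class's values are histogrammed on a shared grid; a cut point is placed
    at the left edge of the first bin whose dominant class differs from the
    dominant class of the previous non-empty bin.
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # guard against a constant attribute
    counts = defaultdict(Counter)      # counts[b][c]: class-c instances in bin b
    class_totals = Counter(labels)
    for v, c in zip(values, labels):
        b = min(int((v - lo) / width), n_bins - 1)
        counts[b][c] += 1

    def dominant(b):
        # Normalize by class size so densities, not raw counts, are compared.
        if not counts[b]:
            return None
        return max(counts[b], key=lambda c: counts[b][c] / class_totals[c])

    cuts, prev = [], None
    for b in range(n_bins):
        cur = dominant(b)
        if cur is None:
            continue  # empty bins do not change the dominant class
        if prev is not None and cur != prev:
            cuts.append(lo + b * width)
        prev = cur
    return cuts

if __name__ == "__main__":
    xs = list(range(1, 11))
    ys = ["a"] * 5 + ["b"] * 5
    print(discretize_by_class_density(xs, ys, n_bins=9))  # [6.0]
```

Normalizing by class size keeps a large class from dominating every bin simply because it has more instances.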

Stock News Dataset Quality Assessment by Evaluating the Data Distribution and the Sentiment Prediction

  • Alasmari, Eman;Hamdy, Mohamed;Alyoubi, Khaled H.;Alotaibi, Fahd Saleh
    • International Journal of Computer Science & Network Security / v.22 no.2 / pp.1-8 / 2022
  • This work provides a reliable, classified stock dataset merged with Saudi stock news. The dataset allows researchers to analyze and better understand the realities, impacts, and relationships between stock news and stock fluctuations. The data were collected from the Saudi stock market via the Corporate News (CN) and Historical Data Stocks (HDS) datasets. As their names suggest, CN contains news, and HDS provides information on how stock values change over time. Both datasets cover the period from 2011 to 2019 and have 30,098 rows and 16 variables, four of which they share and 12 of which differ. The combined dataset presented here therefore includes 30,098 published news pieces and information about stock fluctuations across nine years. Because native Arabic speakers associated with the stock domain interpret stock news polarity in various ways, the polarity was categorized manually based on Arabic semantics. As the Saudi stock market contributes massively to the international economy, this dataset is essential for stock investors and analysts. The dataset was prepared for educational and scientific purposes, motivated by the scarcity of data describing the impact of Saudi stock news on stock activities. It will therefore be useful across many fields, including stock market analytics, data mining, statistics, machine learning, and deep learning. The data are evaluated in two ways: the distribution of the data over classes and the accuracy of sentiment prediction. The results show that the distribution of polarity over sectors can be considered balanced. A naive Bayes (NB) model was developed to evaluate data quality through sentiment classification, and it demonstrated the data's reliability by achieving 68% accuracy. The evaluation results thus confirm the dataset's reliability, readiness, and high quality for any use.
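The evaluation above rests on two checks: the share of each polarity class and the accuracy of an NB sentiment classifier. The abstract does not state the features or NB variant, so the sketch below assumes a multinomial NB over whitespace-separated tokens with Laplace smoothing; the function names and the toy English headlines (standing in for the Arabic news text) are hypothetical.

```python
import math
from collections import Counter, defaultdict

def class_distribution(labels):
    """Share of each class, used to check whether the categories are balanced."""
    counts = Counter(labels)
    return {c: n / len(labels) for c, n in counts.items()}

def train_nb(docs, labels, alpha=1.0):
    """Fit a multinomial naive Bayes model with Laplace smoothing `alpha`."""
    class_docs = Counter(labels)
    word_counts = defaultdict(Counter)  # word_counts[c][w]: occurrences of w in class c
    vocab = set()
    for doc, c in zip(docs, labels):
        tokens = doc.split()
        word_counts[c].update(tokens)
        vocab.update(tokens)
    params = {}
    for c, n_c in class_docs.items():
        denom = sum(word_counts[c].values()) + alpha * len(vocab)
        log_lik = {w: math.log((word_counts[c][w] + alpha) / denom) for w in vocab}
        params[c] = (math.log(n_c / len(docs)), log_lik)
    return vocab, params

def predict_nb(model, doc):
    """Return the class with the highest log posterior; unseen tokens are ignored."""
    vocab, params = model

    def score(c):
        log_prior, log_lik = params[c]
        return log_prior + sum(log_lik[w] for w in doc.split() if w in vocab)

    return max(params, key=score)

def accuracy(model, docs, labels):
    """Fraction of documents whose predicted class matches the label."""
    hits = sum(predict_nb(model, d) == y for d, y in zip(docs, labels))
    return hits / len(labels)

if __name__ == "__main__":
    train = ["profits rise strongly", "shares gain on profits",
             "losses widen sharply", "shares fall on losses"]
    y = ["pos", "pos", "neg", "neg"]
    model = train_nb(train, y)
    print(class_distribution(y))              # {'pos': 0.5, 'neg': 0.5}
    print(predict_nb(model, "profits gain"))  # pos
```

Working in log space avoids underflow when many token probabilities are multiplied; a real evaluation would of course measure accuracy on held-out data rather than the training set.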

A Novel Two-Stage Training Method for Unbiased Scene Graph Generation via Distribution Alignment

  • Dongdong Jia;Meili Zhou;Wei WEI;Dong Wang;Zongwen Bai
    • KSII Transactions on Internet and Information Systems (TIIS) / v.17 no.12 / pp.3383-3397 / 2023
  • Scene graphs serve as semantic abstractions of images and play a crucial role in enhancing visual comprehension and reasoning. However, the performance of Scene Graph Generation (SGG) is often compromised when working with biased data in real-world situations. Many existing systems use a single stage of learning for both feature extraction and classification, and some employ Class-Balancing strategies such as Re-weighting, Data Resampling, and Transfer Learning from head to tail classes. In this paper, we propose a novel approach that decouples the feature extraction and classification phases of the scene graph generation process. For feature extraction, we leverage a transformer-based architecture and design an adaptive calibration function specifically for predicate classification. This function enables us to dynamically adjust the classification scores for each predicate category. Additionally, we introduce a Distribution Alignment technique that balances the class distribution after the feature extraction phase reaches a stable state, thereby facilitating the retraining of the classification head. Importantly, our Distribution Alignment strategy is model-independent and requires no additional supervision, making it applicable to a wide range of SGG models. Evaluated with the scene graph diagnostic toolkit on Visual Genome and several popular models, our model achieved significant improvements over previous state-of-the-art methods. Compared to the TDE model, it improved mR@100 by 70.5% for PredCls, 84.0% for SGCls, and 97.6% for SGDet.
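The abstract does not specify the adaptive calibration function or the Distribution Alignment step. As a hedged illustration of the general idea of re-balancing predicate scores toward tail classes, the sketch below applies post-hoc logit adjustment (subtracting tau times the log class prior from each score), which is a separate, well-known technique swapped in here, not the paper's method. The predicate names, counts, scores, and `tau` are hypothetical.

```python
import math

def logit_adjust(scores, class_counts, tau=1.0):
    """Post-hoc logit adjustment: score_c - tau * log(prior_c).

    log(prior_c) is negative, so every score rises, but rare (tail) predicates
    rise more than frequent (head) ones.
    """
    total = sum(class_counts.values())
    return {c: s - tau * math.log(class_counts[c] / total) for c, s in scores.items()}

def predict(scores):
    """Pick the predicate with the highest (possibly adjusted) score."""
    return max(scores, key=scores.get)

if __name__ == "__main__":
    # Hypothetical training counts for a head and a tail predicate.
    counts = {"on": 900, "parked on": 100}
    raw = {"on": 2.0, "parked on": 1.5}
    print(predict(raw))                        # on
    print(predict(logit_adjust(raw, counts)))  # parked on
```

With `tau=0` the scores are unchanged; larger values push predictions further toward rare, more informative predicates, which is the trade-off that mean recall (mR@K) rewards.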

A Study on Analyses of e-Learning Contents Development Cost and Rational Alternatives for Policy Making (이러닝 콘텐츠 개발단가 분석과 합리화 정책방안 연구)

  • Han, Tae-In
    • Journal of Digital Convergence / v.10 no.6 / pp.361-368 / 2012
  • The e-Learning content production industry is in a difficult position because of excessive competition and the small scale of companies in an unprofitable content development market. One of the most important issues is the low unit cost paid for e-Learning content development. This paper analyzes e-Learning content development costs and proposes rational policy alternatives. To this end, it examines various content development costs, compares them, and presents an e-Learning content development cost model. Based on the results, it suggests rational policy measures regarding e-Learning content development costs.

Japanese Nursing Students' Learning Experience, Self-directed Learning Ability, and Self-efficacy in Nursing Practice Utilizing Portfolios (일본 간호학생의 학습포트폴리오를 활용한 임상실습교육의 학습경험과 자기주도학습능력 및 자기효능감)

  • Lee, Hye Young;Shimotakahara, Rie;Kim, Hye Weon;Ogata, Shige Mitsu
    • The Journal of Korean Academic Society of Nursing Education / v.23 no.3 / pp.279-289 / 2017
  • Purpose: The purpose of this study is to investigate the learning experience, self-directed learning ability, and self-efficacy of Japanese nursing students undergoing portfolio-based clinical practicums. Methods: The self-directed learning ability and self-efficacy of nursing students were measured using two scales. Using a text-mining approach, we performed correspondence analysis followed by cluster analysis of responses to open-ended questions. Results: The mean score for self-directed learning ability was 60.89 ± 5.28, and that for generalized self-efficacy was 68.37 ± 11.56. Scores for self-directed learning ability were positively correlated with scores for generalized self-efficacy. In the correspondence analysis, the distribution of extracted words showed that "record" was located on the negative side in the third quadrant, in relation to the first principal component, and that "patient" was located on the positive side in the first quadrant, contributing greatly to the second principal component. Conclusion: The results of this study suggest that approaching "confidence, pride, stability" and "growth and intention to develop" offers a key to developing self-directed learning ability. Students record what they see and learn the importance of visualizing it in learning portfolios. "Detailed expression of the learned content" and "concerns for which objective evaluation is suggested" are important to the students.

Effects of Executive Compassion and Forgiving Behavior on Organizational Activities and Performance (중소기업에서 경영자의 배려와 용서가 학습조직 활동과 조직성과에 미치는 영향)

  • Park, Soo-Yong;Hawang, Moon-Young;Chol, Eun-Soo
    • Journal of Distribution Science / v.13 no.6 / pp.105-118 / 2015
  • Purpose - Strengthening the competitiveness of small and medium-sized enterprises (SMEs) is currently a key economic issue. However, many SMEs lack the internal competence required to cope with a rapidly changing market structure. Such problems can act as an obstacle to economic development, and most SMEs in Korea face them today. A company's source of competitive advantage is shifting from quantity to quality, from facilities to knowledge, and from hard work to creativity. Under such circumstances, a company should prioritize learning, sharing knowledge, and continuously creating new knowledge. This study aims to identify, through a theoretical comparison, the effect of a chief executive officer's (CEO) compassion and forgiveness, positive factors in organizational emotion, on learning organization activities and organizational performance. Research design, data, and methodology - SMEs based in the Daejeon and Chungcheong area were selected for this study. To secure the credibility of the data, respondents were selected among employees who had worked at the business for six months or longer. The survey was conducted, both offline and online, for 30 days from March 5, 2015 to April 5, 2015. Fifty subject companies (25 from Daejeon, 10 from Chungnam, 10 from Chungbuk, and five from Sejong) were selected, and the objective, target, and survey content were explained to a manager at each company either face-to-face or by phone. Of the 700 questionnaires distributed via mail or e-mail, 550 (78.6%) were returned. After excluding 44 incomplete questionnaires, the remaining 506 were used for analysis. Results - This study analyzed how the CEO's compassion and forgiveness affect learning organization activities and organizational performance.
First, the compassion of SME CEOs directly affected learning organization activities and indirectly affected organizational performance. Second, the forgiveness of SME CEOs did not affect learning organization activities or organizational performance, either directly or indirectly. Conclusions - The study's conclusions are as follows. First, the compassionate behavior of SME CEOs was a significant variable that directly and indirectly affected learning organization activities and organizational performance. Therefore, the CEO of an SME can create a positive organizational atmosphere through compassionate behavior. Second, the forgiving behavior of the CEO had no direct or indirect effect on learning organization activities or organizational performance. Nevertheless, a CEO should continue forgiving behavior, because it strengthens employees' resilience, commitment, and self-efficacy and protects the organization from negative influences such as layoffs, risks, wrongdoing, and other external negative factors.

Evaluation of Machine Learning Algorithm Utilization for Lung Cancer Classification Based on Gene Expression Levels

  • Podolsky, Maxim D;Barchuk, Anton A;Kuznetcov, Vladimir I;Gusarova, Natalia F;Gaidukov, Vadim S;Tarakanov, Segrey A
    • Asian Pacific Journal of Cancer Prevention / v.17 no.2 / pp.835-838 / 2016
  • Background: Lung cancer remains one of the most common cancers in the world, both in terms of new cases (about 13% of the total per year) and deaths (nearly one cancer death in five), because of its high case fatality. Errors in determining the lung cancer type or malignant growth lead to degraded treatment efficacy, because anticancer strategy depends on tumor morphology. Materials and Methods: We evaluated the effectiveness of machine learning algorithms for lung cancer classification based on gene expression levels, using four publicly available data sets. The Dana-Farber Cancer Institute data set contains 203 samples, and the task was to classify four cancer types and normal tissue samples. With the University of Michigan data set of 96 samples, the task was binary classification of adenocarcinoma and non-neoplastic tissues. The University of Toronto data set contains 39 samples, and the task was to detect recurrence, while with the Brigham and Women's Hospital data set of 181 samples, the task was binary classification of malignant pleural mesothelioma and adenocarcinoma. We used the k-nearest neighbor algorithm (k=1, k=5, k=10), a naive Bayes classifier assuming either a normal distribution of attributes or a histogram-based distribution, a support vector machine, and the C4.5 decision tree. The effectiveness of the algorithms was evaluated with the Matthews correlation coefficient. Results: The support vector machine showed the best results on the data sets from the Dana-Farber Cancer Institute and Brigham and Women's Hospital. All algorithms except the C4.5 decision tree achieved maximum potential effectiveness on the University of Michigan data set. However, the C4.5 decision tree showed the best results on the University of Toronto data set.
Conclusions: Machine learning algorithms can be used for lung cancer morphology classification and similar tasks based on gene expression level evaluation.
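The study scores every classifier with the Matthews correlation coefficient (MCC), MCC = (TP·TN − FP·FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)). The sketch below computes the binary form, which fits tasks such as adenocarcinoma versus non-neoplastic tissue; the five-class Dana-Farber task would need the multi-class generalization, not shown here. Returning 0.0 when the denominator is zero is an assumed convention, and the labels in the example are hypothetical.

```python
import math

def matthews_corrcoef(y_true, y_pred, positive=1):
    """Binary Matthews correlation coefficient (MCC) in [-1, 1].

    Returns 0.0 when any marginal count is zero and MCC is undefined
    (a common convention, assumed here).
    """
    tp = tn = fp = fn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive:
            if t == positive:
                tp += 1
            else:
                fp += 1
        elif t == positive:
            fn += 1
        else:
            tn += 1
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

if __name__ == "__main__":
    # 1 = adenocarcinoma, 0 = non-neoplastic (hypothetical labels)
    y_true = [1, 1, 1, 0, 0, 0]
    y_pred = [1, 1, 0, 0, 0, 1]
    print(matthews_corrcoef(y_true, y_pred))  # 0.333... (TP=2, TN=2, FP=1, FN=1)
```

Unlike plain accuracy, MCC stays informative on small, imbalanced sets such as the 39-sample Toronto data, because it uses all four cells of the confusion matrix.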

The Effects of an English Lecture for a Korean Business Student: Enhancing Understanding and Learning Outcomes (유통기업을 위한 대학의 영어전공강의 성과분석: 이해도 제고와 학습성과를 중심으로)

  • Kim, Myoung-Sook;Kang, Shin-Ae
    • Journal of Distribution Science / v.14 no.10 / pp.127-136 / 2016
  • Purpose - This study investigated the effects of English-medium lectures (EML) on understanding and learning outcomes. Sixty percent of EML lectures in Korea also use Korean for additional support, so the specific impact of EML classes on learning outcomes needs to be clearly distinguished. Here, the same English materials, including PowerPoint slides and video content, were used in both the Korean and English lectures, so the only difference between the lectures was whether they were delivered in Korean or English. This allows us to identify clearly whether the language of delivery makes any difference in learning outcomes. Research design, data, and methodology - Our sample consisted of 91 students taking an international business course in the spring of 2015. All course materials, including textbooks, PowerPoint slides, exams, video, and support content, were presented in English. Survey data and exam results were used. Students provided their student identification numbers and names, so the surveys could be matched against the exam results. Results - First, whether the lecture was delivered in English or Korean was an important factor when students chose the class. Second, English proficiency related to international business and general English levels were higher in the English class than in the Korean class. However, understanding of key concepts and the ability to read international business newspapers were the same for students in both classes. Third, teaching materials and lectures were the most important resources for understanding key concepts in the business major. Fourth, exam performance did not differ between students in the English and Korean classes. This indicates that EML classes were not necessarily detrimental to understanding the lecture's major concepts. Thus, it is important that researchers carefully design empirical settings to study the effectiveness of EML.
Conclusions - The English lecture can be as helpful as the Korean lecture for enhancing knowledge in the business major. Future research could consider various forms of English lectures to distinguish their effects.

Development of Auto Tracking System for Baseball Pitching (투구된 공의 실시간 위치 자동추적 시스템 개발)

  • Lee, Ki-Chung;Bae, Sung-Jae;Shin, In-Sik
    • Korean Journal of Applied Biomechanics / v.17 no.1 / pp.81-90 / 2007
  • Identifying the position of a moving object in real time has been an issue not only in sport biomechanics but also in other academic areas. To address this issue, this study tracked the movement of a pitched ball, which may be easier to predict because the object is clearly in focus and moves simply. Machine learning has led research on extracting information, such as object tracks, from continuous images. Although rule-based methods in artificial intelligence prevailed for decades, the field has evolved toward statistical approaches that find the maximum a posteriori location in the image. The development of machine learning, together with advances in recording technology and computing power, has made it possible to extract the trajectory of a pitched baseball from recorded images. We present a baseball tracking method based on object tracking methods in machine learning. We introduce three state-of-the-art studies on object tracking and show how they can be combined into a novel engine that finds the trajectory from continuous pitching images. The first is the mean shift method, which finds the mode of an assumed continuous distribution from a set of data. The second explains how to find the mode and the object region effectively given the object's location and region in the previous image. The third concerns representing data as features that can be processed; from those features, a distribution can be established to generate a set of data for mean shift. In this paper, we combine the three works to track the baseball's location across continuous image frames. From the locations in two sets of images, the real 3-D trajectory of the pitched ball can be reconstructed. We show how this works on real pitching images.
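The first ingredient above, mean shift, repeatedly moves a point to the kernel-weighted mean of nearby samples until it settles on a mode of the estimated density. The sketch below shows only that step on 2-D points; the Gaussian kernel, `bandwidth`, tolerance, and sample points are assumptions, and the paper's use of the previous frame's location and its feature representation are not reproduced. In a tracker, `start` would play the role of the ball's location in the previous frame.

```python
import math

def mean_shift_mode(points, start, bandwidth=1.0, tol=1e-6, max_iter=100):
    """Climb from `start` to a mode of a Gaussian-kernel density estimate."""
    x, y = start
    for _ in range(max_iter):
        num_x = num_y = den = 0.0
        for px, py in points:
            d2 = (px - x) ** 2 + (py - y) ** 2
            w = math.exp(-d2 / (2 * bandwidth ** 2))  # Gaussian kernel weight
            num_x += w * px
            num_y += w * py
            den += w
        if den == 0.0:
            break  # all weights underflowed: start is too far from the data
        new_x, new_y = num_x / den, num_y / den  # kernel-weighted mean
        shift = math.hypot(new_x - x, new_y - y)
        x, y = new_x, new_y
        if shift < tol:
            break
    return x, y

if __name__ == "__main__":
    # Hypothetical 2-D samples: one blob near (5, 5) and a smaller one near (0, 0).
    pts = [(4, 5), (6, 5), (5, 4), (5, 6), (5, 5), (0, 0), (0.5, 0), (0, 0.5)]
    print(mean_shift_mode(pts, start=(4.5, 4.5)))  # close to (5, 5)
```

Because each step only needs samples near the current estimate, the search stays local, which is why a good starting point from the previous frame matters.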

An Efficient Deep Learning Ensemble Using a Distribution of Label Embedding

  • Park, Saerom
    • Journal of the Korea Society of Computer and Information / v.26 no.1 / pp.27-35 / 2021
  • In this paper, we propose a new stacking ensemble framework for deep learning models that reflects the distribution of label embeddings. The framework consists of two phases: training the baseline deep learning classifier, and training sub-classifiers based on the clustering results of the label embeddings. It aims to divide a multi-class classification problem into small sub-problems based on the clustering results. The clustering is conducted on label embeddings obtained from the weights of the last layer of the baseline classifier. After clustering, sub-classifiers are constructed to classify the sub-classes within each cluster. The experimental results show that the label embeddings reflect the relationships between classification labels well and that our ensemble framework can improve classification performance on the CIFAR-100 dataset.
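The grouping step treats each row of the baseline classifier's last-layer weight matrix as that class's label embedding and clusters the rows. The abstract does not name the clustering algorithm or the number of clusters, so the sketch below assumes plain k-means with the first `k` rows as initial centroids; the class names and toy 2-D weights are hypothetical, and training the per-cluster sub-classifiers is omitted.

```python
def kmeans(vectors, k, n_iter=50):
    """Plain k-means; the first k vectors seed the centroids (an assumption)."""
    centroids = [list(v) for v in vectors[:k]]
    assign = [0] * len(vectors)
    for _ in range(n_iter):
        # Assignment step: nearest centroid by squared Euclidean distance.
        assign = [
            min(range(k), key=lambda j: sum((a - b) ** 2 for a, b in zip(v, centroids[j])))
            for v in vectors
        ]
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [v for v, a in zip(vectors, assign) if a == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return assign

def group_labels(last_layer_weights, class_names, k):
    """Cluster classes by their label embeddings (one weight row per class)."""
    assign = kmeans(last_layer_weights, k)
    groups = {}
    for name, a in zip(class_names, assign):
        groups.setdefault(a, []).append(name)
    return list(groups.values())

if __name__ == "__main__":
    # Hypothetical 2-D embeddings: animals point one way, vehicles another.
    w = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
    names = ["cat", "dog", "truck", "car"]
    print(group_labels(w, names, k=2))  # [['cat', 'dog'], ['truck', 'car']]
```

Each resulting group would then get its own sub-classifier, so confusable classes such as "cat" and "dog" are separated by a model that sees only that smaller sub-problem.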