• Title/Summary/Keyword: heterogeneous data learning

Causal inference from nonrandomized data: key concepts and recent trends (비실험 자료로부터의 인과 추론: 핵심 개념과 최근 동향)

  • Choi, Young-Geun; Yu, Donghyeon
    • The Korean Journal of Applied Statistics / v.32 no.2 / pp.173-185 / 2019
  • Causal questions are prevalent in scientific research: for example, how effective a treatment was in preventing an infectious disease, how much a policy increased utility, or which advertisement would give the highest click rate for a given customer. Causal inference theory in statistics interprets these questions as inferring the effect of a given intervention (treatment or policy) in the data-generating process. Causal inference has been used in medicine, public health, and economics; in addition, it has recently received attention as a tool for data-driven decision making. Many recent datasets are observational rather than experimental, which makes causal inference more complex. This review introduces key concepts and recent trends in statistical causal inference for observational studies. We first introduce the Neyman-Rubin potential outcome framework, which formalizes causal questions as average treatment effects, and discuss popular methods for estimating treatment effects, such as propensity score approaches and regression approaches. For recent trends, we briefly discuss (1) conditional (heterogeneous) treatment effects and machine learning-based approaches, (2) the curse of dimensionality in the estimation of treatment effects and its remedies, and (3) Pearl's structural causal model for dealing with more complex causal relationships, and its connection to the Neyman-Rubin potential outcome model.
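
To make the propensity score approach mentioned in the abstract above concrete, here is a minimal sketch of an inverse-probability-weighted (IPW) estimate of the average treatment effect. It is not taken from the review itself: the function name `ipw_ate`, the toy data, and the assumption that the propensity scores are already known (rather than estimated from covariates) are all illustrative.

```python
def ipw_ate(outcomes, treatments, propensities):
    """Inverse-probability-weighted (Horvitz-Thompson) estimate of the ATE.

    outcomes:     observed outcome Y_i for each unit
    treatments:   1 if unit i was treated, 0 otherwise
    propensities: e(X_i) = P(T_i = 1 | X_i), assumed known and strictly in (0, 1)
    """
    n = len(outcomes)
    if n == 0 or not (n == len(treatments) == len(propensities)):
        raise ValueError("inputs must be non-empty and of equal length")
    treated_sum = 0.0
    control_sum = 0.0
    for y, t, e in zip(outcomes, treatments, propensities):
        if not 0.0 < e < 1.0:
            raise ValueError("propensity scores must lie strictly between 0 and 1")
        treated_sum += t * y / e                 # reweights treated units
        control_sum += (1 - t) * y / (1.0 - e)   # reweights control units
    return (treated_sum - control_sum) / n


# Toy data (hypothetical): a balanced design where every unit has e(X) = 0.5.
toy_ate = ipw_ate([3.0, 1.0, 4.0, 2.0], [1, 0, 1, 0], [0.5, 0.5, 0.5, 0.5])
```

In observational data the propensity scores would usually have to be estimated, for example with a logistic regression on the covariates, and extreme scores near 0 or 1 can make this estimator unstable.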

Estimation of Fractional Urban Tree Canopy Cover through Machine Learning Using Optical Satellite Images (기계학습을 이용한 광학 위성 영상 기반의 도시 내 수목 피복률 추정)

  • Sejeong Bae; Bokyung Son; Taejun Sung; Yeonsu Lee; Jungho Im; Yoojin Kang
    • Korean Journal of Remote Sensing / v.39 no.5_3 / pp.1009-1029 / 2023
  • Urban trees play a vital role in urban ecosystems, significantly reducing impervious surfaces and impacting carbon cycling within the city. Although previous research has demonstrated the efficacy of combining artificial intelligence with airborne light detection and ranging (LiDAR) data to generate urban tree information, the limited availability and cost of LiDAR data pose constraints. Consequently, this study used freely accessible, high-resolution multispectral satellite imagery (i.e., Sentinel-2 data) and machine learning techniques to estimate fractional tree canopy cover (FTC) within the urban confines of Suwon, South Korea. The study used a median composite image derived from a time series of Sentinel-2 images. To account for the diverse land cover found in urban areas, the model incorporated three types of input variables: the average (mean) and standard deviation (std) within each 30 m grid cell of optical indices computed from Sentinel-2 at 10 m resolution, and the fractional coverage of distinct land cover classes within each 30 m grid cell from the existing level 3 land cover map. Four schemes with different combinations of input variables were compared. Notably, when all three factors (i.e., mean, std, and fractional cover) were used to capture the variation of land cover in urban areas (Scheme 4, S4), the machine learning model performed better than when only the mean of the optical indices was used (Scheme 1). Of the models tested, the random forest (RF) model with S4 performed best, achieving an R² of 0.8196, a mean absolute error (MAE) of 0.0749, and a root mean squared error (RMSE) of 0.1022. According to the variable importance analysis, the std variable had the highest impact on model outputs within heterogeneous land covers. The trained RF model with S4 was then applied to the entire Suwon region, consistently delivering robust results with an R² of 0.8702, an MAE of 0.0873, and an RMSE of 0.1335. The FTC estimation method developed in this study is expected to be applicable to various regions, providing fundamental data for a better understanding of carbon dynamics in urban ecosystems.
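
The mean and std inputs described in the abstract above can be pictured as block statistics: each 30 m grid cell covers a 3 x 3 block of 10 m pixels. The sketch below is only an illustration under that assumption; the function name `block_stats`, the toy index values, and the use of the population standard deviation are choices made here, not details confirmed by the paper.

```python
from statistics import fmean, pstdev


def block_stats(grid, block=3):
    """Aggregate a 2-D grid of 10 m index values into coarser cells.

    Returns a 2-D list of (mean, std) tuples, one per block x block window.
    Trailing rows/columns that do not fill a whole window are ignored.
    The std here is the population standard deviation (an assumption).
    """
    rows, cols = len(grid), len(grid[0])
    result = []
    for r in range(0, rows - rows % block, block):
        result_row = []
        for c in range(0, cols - cols % block, block):
            cells = [grid[i][j]
                     for i in range(r, r + block)
                     for j in range(c, c + block)]
            result_row.append((fmean(cells), pstdev(cells)))
        result.append(result_row)
    return result


# Hypothetical vegetation-index values: a uniform block next to a mixed block.
toy_grid = [
    [0.5, 0.5, 0.5, 0.0, 0.0, 0.0],
    [0.5, 0.5, 0.5, 0.0, 0.0, 0.0],
    [0.5, 0.5, 0.5, 0.9, 0.9, 0.9],
]
toy_stats = block_stats(toy_grid)
```

The uniform block yields a std of zero while the mixed block yields a large one, which is the kind of within-cell heterogeneity signal the abstract attributes to the std variable.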

A Methodology of Customer Churn Prediction based on Two-Dimensional Loyalty Segmentation (이차원 고객충성도 세그먼트 기반의 고객이탈예측 방법론)

  • Kim, Hyung Su; Hong, Seung Woo
    • Journal of Intelligence and Information Systems / v.26 no.4 / pp.111-126 / 2020
  • Most industries have recently become aware of the importance of customer lifetime value as they are exposed to a competitive environment. As a result, preventing customer churn is becoming a more important business issue than acquiring new customers. Retaining existing customers is far more economical than acquiring new ones; the acquisition cost of a new customer is known to be five to six times the cost of retaining an existing one. Companies that effectively prevent churn and improve customer retention rates are also known to benefit not only from higher profitability but also from a better brand image through improved customer satisfaction. Customer churn prediction, once a sub-area of CRM research, has recently become more important as a big data-based performance marketing theme owing to the development of business machine learning technology. Until now, research on customer churn prediction has been actively carried out in sectors such as mobile telecommunications, finance, distribution, and gaming, which are highly competitive and where churn management is urgent. These studies focused on improving the performance of the churn prediction model itself, for example by comparing the performance of various models, exploring features that are effective in forecasting churn, or developing new ensemble techniques. They were limited in practical use because most of them treated the entire customer base as a single group when developing a predictive model. In short, the main purpose of existing research was to improve the predictive model itself, and there has been relatively little research on improving the overall churn prediction process. In practice, customers have different behavioral characteristics due to heterogeneous transaction patterns, and their churn rates differ accordingly, so it is unreasonable to treat all customers as a single group. It is therefore desirable to segment customers according to classification criteria such as loyalty and to operate an appropriate churn prediction model for each segment in order to predict churn effectively in heterogeneous industries. Some studies have segmented customers using clustering techniques and applied a churn prediction model to each customer group. Although this approach can produce better predictions than a single model for the entire customer population, there is still room for improvement, because clustering is a mechanical, exploratory grouping technique that computes distances from the inputs and does not reflect an organization's strategic intent, such as loyalty. This study proposes a segment-based churn prediction process (CCP/2DL: Customer Churn Prediction based on Two-Dimensional Loyalty segmentation) built on two-dimensional customer loyalty, on the assumption that successful churn management is better achieved by improving the overall process than by improving the model itself. CCP/2DL is a churn prediction process that segments customers along two loyalty dimensions, quantitative and qualitative, performs a secondary grouping of the customer segments according to churn patterns, and then independently applies heterogeneous churn prediction models to each churn pattern group. To assess the relative merit of the proposed process, its performance was compared with that of the two most commonly applied processes, the General churn prediction process and the Clustering-based churn prediction process. The General churn prediction process, as used in this study, applies a machine learning model to all target customers as a single group, which is the most common churn prediction method. The Clustering-based churn prediction process first segments customers using clustering techniques and then builds a churn prediction model for each group. In cooperation with a global NGO, the proposed CCP/2DL showed better performance than the other churn prediction methodologies. The proposed process is not only effective in predicting churn but can also serve as a strategic basis for obtaining a variety of customer insights and carrying out other related performance marketing activities.

Analysis of Verbal Interaction within a Homogeneous Group in Inquiry Activity of the 'Use of Lenses' Unit in Elementary School (초등학교 '렌즈의 이용' 단원 탐구활동에서 나타나는 동질 모둠별 언어적 상호작용의 특징 분석)

  • Chung, Hee-Jung; Kwon, Gyeong-Pil
    • Korean Journal of Optics and Photonics / v.28 no.6 / pp.327-333 / 2017
  • The purpose of this research was to analyze the characteristics of verbal interactions within each homogeneous group during the 6th-grade 'Use of Lenses' unit. For this research, six learning sessions were conducted in one 6th-grade class composed of a high-academic-achievement group, an intermediate-academic-achievement group, and a low-academic-achievement group. All lessons were recorded to analyze the verbal interactions of each group, and the transcribed data were analyzed using the verbal-interaction analytic framework. The results were as follows. In the upper group, although opinions were presented more frequently, there were many negative verbal interactions in completing the tasks. The middle group was observed to accept peer opinions critically in its observational activities, and its members were more active in presenting their own opinions than in listening to others' opinions. The lower group had difficulty drawing conclusions because its members lacked the ability to persuade peers or to respect peers' opinions, even though the frequency of verbal interactions was higher than in the other groups. Therefore, a homogeneous group structure is suitable for a simple activity involving a simple inquiry or an exchange of opinions, while a heterogeneous group structure is more effective in activities focused on understanding scientific concepts and knowledge.

Multi-Object Goal Visual Navigation Based on Multimodal Context Fusion (멀티모달 맥락정보 융합에 기초한 다중 물체 목표 시각적 탐색 이동)

  • Jeong Hyun Choi; In Cheol Kim
    • KIPS Transactions on Software and Data Engineering / v.12 no.9 / pp.407-418 / 2023
  • Multi-Object Goal Visual Navigation (MultiOn) is a visual navigation task in which an agent must visit multiple object goals in an unknown indoor environment in a given order. Existing models for the MultiOn task are limited in that they cannot exploit an integrated view of multimodal context, because they use only a unimodal context map. To overcome this limitation, we propose a novel deep neural network-based agent model for the MultiOn task. The proposed model, MCFMO, uses a multimodal context map containing visual appearance features, semantic features of environmental objects, and goal object features. Moreover, the proposed model effectively fuses these three heterogeneous features into a global multimodal context map by using a point-wise convolutional neural network module. Lastly, the proposed model adopts an auxiliary task learning module that predicts the observation status, the goal direction, and the goal distance, which can guide efficient learning of the navigation policy. Through various quantitative and qualitative experiments using the Habitat-Matterport3D simulation environment and scene dataset, we demonstrate the superiority of the proposed model.
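
The point-wise fusion step named in the abstract above can be sketched as a 1 x 1 convolution: at every map cell, the channels of the three feature maps are concatenated and passed through the same linear projection. The code below is a plain-Python illustration of that general operation only. The function name `pointwise_fuse`, the tiny one-channel maps, and the weights are hypothetical, and nothing here reproduces the actual MCFMO architecture or its learned parameters.

```python
def pointwise_fuse(feature_maps, weights, bias):
    """Fuse several H x W x C_i feature maps with a 1 x 1 (point-wise) convolution.

    feature_maps: list of maps, each a nested list indexed [row][col][channel]
    weights:      one row of weights per output channel, each of length sum(C_i)
    bias:         one bias value per output channel
    Each cell is processed independently of its neighbours.
    """
    height, width = len(feature_maps[0]), len(feature_maps[0][0])
    fused = []
    for i in range(height):
        fused_row = []
        for j in range(width):
            # Concatenate the channels of all maps at this cell.
            x = [value for fmap in feature_maps for value in fmap[i][j]]
            # Apply the shared linear projection.
            fused_row.append([
                sum(w_k * x_k for w_k, x_k in zip(w_row, x)) + b
                for w_row, b in zip(weights, bias)
            ])
        fused.append(fused_row)
    return fused


# Hypothetical 1 x 2 maps with one channel each: visual, semantic, goal.
visual = [[[1.0], [2.0]]]
semantic = [[[10.0], [20.0]]]
goal = [[[100.0], [200.0]]]
fused_map = pointwise_fuse(
    [visual, semantic, goal],
    weights=[[1.0, 1.0, 1.0], [1.0, 0.0, -1.0]],
    bias=[0.0, 0.5],
)
```

In a trained network the weights would be learned and the projection would typically be followed by a nonlinearity; both are omitted here to keep the sketch short.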

Automatic Target Recognition Study using Knowledge Graph and Deep Learning Models for Text and Image data (지식 그래프와 딥러닝 모델 기반 텍스트와 이미지 데이터를 활용한 자동 표적 인식 방법 연구)

  • Kim, Jongmo; Lee, Jeongbin; Jeon, Hocheol; Sohn, Mye
    • Journal of Internet Computing and Services / v.23 no.5 / pp.145-154 / 2022
  • Automatic Target Recognition (ATR) technology is emerging as a core technology of Future Combat Systems (FCS). Conventional ATR is performed on IMINT (image information) collected from SAR sensors, using various image-based deep learning models. However, although the development of IT and sensing technology is expanding ATR-related data and information to HUMINT (human information) and SIGINT (signal information), ATR still relies only on image-oriented IMINT data. In complex and diversified battlefield situations, it is difficult to guarantee high ATR accuracy and generalization performance with image data alone. Therefore, in this paper we propose a knowledge graph-based ATR method that can use image and text data simultaneously. The main idea of the knowledge graph and deep model-based ATR method is to convert ATR images and text into graphs according to the characteristics of each data type, align them to the knowledge graph, and connect the heterogeneous ATR data through the knowledge graph. To convert an ATR image into a graph, an object-tag graph with object tags as nodes is generated from the image using a pre-trained image object recognition model and the vocabulary of the knowledge graph. For ATR text, a pre-trained language model, TF-IDF, a co-occurrence word graph, and the vocabulary of the knowledge graph are used to generate a word graph whose nodes are key vocabulary for ATR. The two types of generated graphs are connected to the knowledge graph using an entity alignment model to improve ATR performance from images and text. To demonstrate the superiority of the proposed method, 227 web documents and 61,714 RDF triples from DBpedia were collected, and comparison experiments were performed on precision, recall, and F1-score from the perspective of entity alignment.

Development and Application of the Scientific Inquiry Tasks for Small Group Argumentation (소집단의 논변활동을 위한 과학 탐구 과제의 개발과 적용)

  • Yun, Sun-Mi; Kim, Heui-Baik
    • Journal of The Korean Association For Science Education / v.31 no.5 / pp.694-708 / 2011
  • In this study, we developed tasks that include cognitive scaffolding to help students explain scientific phenomena using valid evidence in the science classroom, and investigated how the tasks influence the development of small group scientific argumentation. Small groups that were heterogeneous in gender and achievement were organized in one classroom, and the tasks were applied to the class. Students were asked to write down their own ideas, share individual ideas, and then choose the most plausible opinion as a group. One group was chosen to investigate the effect of the tasks on the development of small group argumentation through analysis of the group's discourse transcripts over 10 lessons, semi-structured student interviews, field notes, and students' pre- and post-argument tests. Discrepant argument examples were included in the tasks so that students could refute an argument by presenting evidence. Moreover, comparing opinions within the group and persuading others were included in the tasks to prompt small group argumentation. As a result, students' post-argument test grades were higher than their pre-test grades, and their arguments involved evidence and reasoning. High-level arguments appeared, with a higher ratio of advanced utterances and longer reasoning chains as the lessons went on. Students made elaborate claims involving valid evidence and reasoning through reflective and critical thinking while discussing the tasks. In addition, tasks that allowed various warrants based on the data led to students' spontaneous participation. Therefore, this study is significant for understanding the context in which small group argumentation develops and for providing information about teaching and learning contexts that prompt students to construct arguments in middle school science inquiry lessons.

Entry, Exit, and Aggregate Productivity Growth: Evidence on Korean Manufacturing (진입·퇴출의 창조적 파괴과정과 총요소생산성 증가에 대한 실증분석)

  • Hahn, Chin Hee
    • KDI Journal of Economic Policy / v.25 no.2 / pp.3-53 / 2003
  • Using plant-level panel data on Korean manufacturing during the 1990-98 period, this study assesses the role of entry and exit in enhancing aggregate productivity, both qualitatively and quantitatively. The main findings are summarized as follows. First, plant entry and exit rates in Korean manufacturing seem quite high: they are higher than in the U.S. or several developing countries for which comparable studies exist. Second, in line with existing studies on other countries, plant turnover reflects underlying productivity differentials in Korean manufacturing, with the "shadow of death" effect as well as selection and learning effects all present. Third, plant entry and exit account for as much as 45 and 65 percent of manufacturing productivity growth during cyclical upturns and downturns, respectively. The findings show that the entry and exit of plants has been an important source of productivity growth in Korean manufacturing. Plant birth and death are mainly a process of resource reallocation from plants with relatively low and declining productivity to a group of heterogeneous plants, some of which have the potential to become highly efficient in the future. The most obvious lesson from this study is that it is important to establish a policy or institutional environment in which efficient businesses can succeed and inefficient businesses fail.

Classification of latent classes and analysis of influencing factors on longitudinal changes in middle school students' mathematics interest and achievement: Using multivariate growth mixture model (중학생들의 수학 흥미와 성취도의 종단적 변화에 따른 잠재집단 분류 및 영향요인 탐색: 다변량 성장혼합모형을 이용하여)

  • Rae Yeong Kim; Sooyun Han
    • The Mathematical Education / v.63 no.1 / pp.19-33 / 2024
  • This study investigates longitudinal patterns in middle school students' mathematics interest and achievement using panel data from the 4th to 6th year of the Gyeonggi Education Panel Study. Results from the multivariate growth mixture model confirmed the existence of heterogeneous characteristics in the longitudinal trajectory of students' mathematics interest and achievement. Students were classified into four latent classes: a low-level class with weak interest and achievement, a high-level class with strong interest and achievement, a middle-level-increasing class where interest and achievement rise with grade, and a middle-level-decreasing class where interest and achievement decline with grade. Each class exhibited distinct patterns in the change of interest and achievement. Moreover, an examination of the correlation between intercepts and slopes in the multivariate growth mixture model reveals a positive association between interest and achievement with respect to their initial values and growth rates. We further explore predictive variables influencing latent class assignment. The results indicated that students' educational ambition and time spent on private education positively affect mathematics interest and achievement, and the influence of prior learning varies based on its intensity. The perceived instruction method significantly impacts latent class assignment: teacher-centered instruction increases the likelihood of belonging to higher-level classes, while learner-centered instruction increases the likelihood of belonging to lower-level classes. This study has significant implications as it presents a new method for analyzing the longitudinal patterns of students' characteristics in mathematics education through the application of the multivariate growth mixture model.

A Study on Automatic Classification Model of Documents Based on Korean Standard Industrial Classification (한국표준산업분류를 기준으로 한 문서의 자동 분류 모델에 관한 연구)

  • Lee, Jae-Seong; Jun, Seung-Pyo; Yoo, Hyoung Sun
    • Journal of Intelligence and Information Systems / v.24 no.3 / pp.221-241 / 2018
  • As we enter the knowledge society, the importance of information as a new form of capital is being emphasized. The importance of information classification is also increasing for the efficient management of digital information, which is being produced exponentially. In this study, we sought to automatically classify and provide tailored information that can help companies make technology commercialization decisions. To that end, we propose a method for classifying information based on the Korean Standard Industrial Classification (KSIC), which indicates the business characteristics of enterprises. The classification of information or documents has largely been based on machine learning, but there is not enough training data categorized on the basis of KSIC. Therefore, this study applied a method that calculates the similarity between documents. Specifically, we propose a method and a model that present the most appropriate KSIC code by collecting the explanatory texts of each KSIC code and calculating their similarity with the document to be classified using the vector space model. IPC data were collected and classified by KSIC, and the methodology was then verified by comparing the results with the KSIC-IPC concordance table provided by the Korean Intellectual Property Office. In the verification, the highest agreement was obtained when the LT method, a kind of TF-IDF calculation formula, was applied. In this case, the first-ranked KSIC code matched in 53% of cases, and the cumulative match rate up to the fifth rank was 76%. This confirms that the technology, industry, and market information that SMEs need can be classified by KSIC more quantitatively and objectively. In addition, the methods and results of this study can serve as basic data to support the qualitative judgment of experts in creating a concordance table between heterogeneous classification systems.
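
The similarity-based classification described in the abstract above can be sketched as follows: build TF-IDF vectors for the explanatory text of each code and for the document to be classified, then rank the codes by cosine similarity. This is a generic illustration, not the paper's model. The weighting formula is a common smoothed TF-IDF variant rather than the LT method named in the abstract, the whitespace tokenizer is a simplification, and the code labels `X1` and `X2` and their descriptions are invented for the example.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict of term -> weight) per document."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    # Assumed weighting: raw term frequency times (log(N / df) + 1).
    idf = {term: math.log(n_docs / df) + 1.0 for term, df in doc_freq.items()}
    return [{term: count * idf[term] for term, count in Counter(tokens).items()}
            for tokens in tokenized]


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_codes(code_texts, document):
    """Rank classification codes by similarity of their description to document."""
    codes = list(code_texts)
    vectors = tfidf_vectors([code_texts[code] for code in codes] + [document])
    doc_vector = vectors[-1]
    scored = [(cosine(vec, doc_vector), code) for code, vec in zip(codes, vectors)]
    # Highest similarity first; ties broken by code label for determinism.
    return [code for _, code in sorted(scored, key=lambda s: (-s[0], s[1]))]


# Hypothetical code descriptions and query document.
toy_codes = {
    "X1": "crop farming and grain production",
    "X2": "software development and programming services",
}
toy_ranking = rank_codes(toy_codes, "a programming tool for software teams")
```

Returning a full ranking rather than a single best code mirrors the way the abstract reports both first-rank and cumulative fifth-rank agreement.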