Search | Korea Science

A Hybrid Oversampling Technique for Imbalanced Structured Data based on SMOTE and Adapted CycleGAN (불균형 정형 데이터를 위한 SMOTE와 변형 CycleGAN 기반 하이브리드 오버샘플링 기법)

Jung-Dam Noh;Byounggu Choi
- Information Systems Review, v.24 no.4, pp.97-118, 2022
As generative adversarial network (GAN) based oversampling techniques have achieved impressive results on class imbalance in unstructured datasets such as images, many studies have begun to apply them to the problem of imbalance in structured datasets. However, these studies have failed to reflect the characteristics of structured data because they convert the data into an unstructured format. To overcome this limitation, this study adapted CycleGAN to reflect the characteristics of structured data and proposed a hybridization of the synthetic minority oversampling technique (SMOTE) and the adapted CycleGAN. In particular, this study used a one-dimensional convolutional neural network, unlike previous studies that used two-dimensional convolutional neural networks. The proposed oversampling method was tested on various datasets, and its performance was compared with existing oversampling methods such as SMOTE and adaptive synthetic sampling (ADASYN). The results indicated that the proposed hybrid oversampling method outperformed the existing methods when the data have more dimensions or a higher degree of imbalance. This study implies that classification performance on oversampled structured data can be improved by the proposed hybrid oversampling method, which considers the characteristics of structured data.
https://doi.org/10.14329/isr.2022.24.4.097
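
For readers unfamiliar with SMOTE, the following is a minimal sketch of its interpolation step only. It does not represent the adapted CycleGAN or the hybrid method from the abstract above. The function name `smote_sample`, its parameters, and the sample points are all hypothetical, chosen for illustration.

```python
import random

def smote_sample(minority, k=2, n_new=4, seed=0):
    """Create n_new synthetic points, each lying on the segment between a
    minority sample and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by squared Euclidean distance to base.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic

# Hypothetical two-feature minority class.
minority = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.2], [1.2, 2.5]]
new_points = smote_sample(minority)
```

Because each synthetic point is a convex combination of two existing minority samples, it always stays inside the range already covered by the minority class.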

User-Perspective Issue Clustering Using Multi-Layered Two-Mode Network Analysis (다계층 이원 네트워크를 활용한 사용자 관점의 이슈 클러스터링)

Kim, Jieun;Kim, Namgyu;Cho, Yoonho
- Journal of Intelligence and Information Systems, v.20 no.2, pp.93-107, 2014
In this paper, we report our observations on user-perspective issue clustering based on multi-layered two-mode network analysis. This work is significant in the context of how companies collect data about customer needs. Most companies have failed to properly uncover such needs for products or services from demographic data such as age, income level, and purchase history. Because of excessive reliance on limited internal data, most recommendation systems do not provide decision makers with business information appropriate to current business circumstances. Part of the problem is the increasing regulation of personal data gathering and privacy. This makes demographic or transaction data collection more difficult, and is a significant hurdle for traditional recommendation approaches because these systems demand a great deal of personal data or transaction logs. Our motivation for presenting this paper is our strong belief, supported by evidence, that most customers' requirements for products can be effectively and efficiently analyzed from unstructured textual data such as Internet news text. To derive users' requirements from textual data obtained online, the proposed approach constructs two two-mode networks, a user-news network and a news-issue network, and integrates them into one quasi-network as the input for issue clustering. One contribution of this research is a methodology that uses enormous amounts of unstructured textual data for user-oriented issue clustering by leveraging existing text mining and social network analysis. Building multi-layered two-mode networks of news logs requires tools for text mining and topic analysis. We used SAS Enterprise Miner 12.1, which provides a text miner module and cluster module for textual data analysis, as well as NetMiner 4 for network visualization and analysis.
Our approach for user-perspective issue clustering is composed of six main phases: crawling, topic analysis, access pattern analysis, network merging, network conversion, and clustering. In the first phase, we collect visit logs for news sites with a crawler. After gathering unstructured news article data, the topic analysis phase extracts issues from each news article in order to build an article-issue network. For simplicity, 100 topics are extracted from 13,652 articles. In the third phase, a user-article network is constructed with access patterns derived from web transaction logs. The two two-mode networks are then merged into a user-issue quasi-network. Finally, in the user-oriented issue-clustering phase, we classify issues through structural equivalence and compare the results with clustering results from statistical tools and network analysis. An experiment with a large dataset was performed to build a multi-layer two-mode network, and the issue-clustering results from SAS were then compared with those from network analysis. The experimental dataset came from a website-ranking site and the biggest portal site in Korea. The sample dataset contains 150 million transaction logs and 13,652 news articles for 5,000 panel members over one year. User-article and article-issue networks are constructed and merged into a user-issue quasi-network using NetMiner. Our issue-clustering results, obtained with the Partitioning Around Medoids (PAM) algorithm and Multidimensional Scaling (MDS), are consistent with the results from SAS clustering. Despite extensive efforts to provide user information with recommendation systems, most projects succeed only when companies have sufficient data about users and transactions. Our proposed methodology, user-perspective issue clustering, can provide practical support for decision-making in companies because it enhances user-related data using unstructured textual data.
To overcome the problem of insufficient data from traditional approaches, our methodology infers customers' real interests by utilizing web transaction logs. In addition, we suggest topic analysis and issue clustering as a practical means of issue identification.
https://doi.org/10.13088/jiis.2014.20.2.093 KSCI
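
The network-merging phase in the abstract above can be illustrated with a small sketch. The abstract does not state the exact merge rule, so this example assumes the user-issue weight is the sum, over articles, of visit count times article-issue weight (a sparse matrix product). The function name `merge_two_mode` and all data are hypothetical.

```python
def merge_two_mode(user_article, article_issue):
    """Combine user->article visit counts and article->issue weights into
    user->issue weights (assumed rule: sum of visits * weight)."""
    user_issue = {}
    for user, articles in user_article.items():
        row = user_issue.setdefault(user, {})
        for article, visits in articles.items():
            # Articles with no extracted issue contribute nothing.
            for issue, weight in article_issue.get(article, {}).items():
                row[issue] = row.get(issue, 0) + visits * weight
    return user_issue

# Hypothetical two-mode networks.
user_article = {"u1": {"a1": 2, "a2": 1}, "u2": {"a2": 3}}
article_issue = {"a1": {"economy": 1}, "a2": {"economy": 1, "sports": 1}}
user_issue = merge_two_mode(user_article, article_issue)
```

The resulting user-issue table is the kind of input a clustering step could then group by similar issue profiles.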

A Study on the Shelf-Life Prediction of the Domestic Single Base Propellants Ammunition : Based on 105mm High Explosive Propellants (국내 단기추진제 탄약의 저장수명 예측에 관한 연구 : 105미리 고폭탄 추진체를 중심으로)

Choi, Myoungjin;Park, Hyungju;Yang, Jaekyung;Baek, Janghyun
- Journal of Korean Society of Industrial and Systems Engineering, v.37 no.3, pp.36-42, 2014
The domestic 105mm HE (High Explosive) shell is composed of three parts: fuze, projectile, and propellant. Of these, the propelling charge of the propellant part consists of single base propellants. The lifespan of single base propellants is known to be affected by the storage period. This is because nitrocellulose (NC), the main component of propelling gunpowder, naturally decomposes into unstable substances, like other nitric acid esters. Although this decomposition cannot be fundamentally prevented, the stabilizer DPA (diphenylamine), which reacts readily with the decomposition products ($NO_2$, $NO_3$, and $HNO_3$), is added to the propellant to restrain autocatalysis by those products. The decay rate of the stabilizer is in turn affected by the production rate of the NC decomposition products. Therefore, an accurate prediction of the shelf-life is required to guard against risks such as explosion. This paper presents a new methodology to estimate the shelf-life of single base propellants using ASRP (Ammunition Stockpile Reliability Program) data for the domestic 105mm HE shell (propelling charge of the propellant part). We selected four attributes that are inferred to influence the distribution of the DPA amount in a propellant from the ASRP dataset through data mining processes. The selected attributes were then used as independent variables in a regression analysis to estimate the shelf-life of single base propellants.
https://doi.org/10.11627/jkise.2014.37.3.36 KSCI

Analysis on the National R&D Trends Related to Agro-Healing Using NTIS R&D DATA in Korea (NTIS 국가연구개발사업 정보를 활용한 치유농업 국가 R&D 동향 분석)

Jung, Yeo-Joo;Kim, Jeong-Eun;Ryu, Jin-Seok;Yang, Myung-Seok;Kim, Dae-Sik
- Journal of Korean Society of Rural Planning, v.27 no.3, pp.85-92, 2021
As the green paradigm has expanded as the core of sustainable development in Korea, agro-healing projects have increasingly become a priority in national policy and investment. However, little is known about the current state of national research and development (R&D) related to agro-healing. The aim of this study was to investigate the trends of national R&D related to agro-healing over the past five years. Data were gathered from the National Science & Technology Information Service (NTIS), and word cloud techniques were applied. The main results showed that both the number and the funding of agro-healing projects have been increasing. In particular, the Rural Development Administration had the highest number of research projects, and the Ministry of Trade, Industry and Energy was found to have spent a large amount of money on agro-healing. As a result, it is necessary to expand the scope of agro-healing projects, especially at the multisectoral and intersectoral level, to improve health, well-being, and a sustainable future.
https://doi.org/10.7851/ksrp.2021.27.3.085 KSCI

Multi-Label Classification Approach to Effective Aspect-Mining (효과적인 애스팩트 마이닝을 위한 다중 레이블 분류접근법)

Jong Yoon Won;Kun Chang Lee
- Information Systems Review, v.22 no.3, pp.81-97, 2020
Recent sentiment analysis research has focused on single-label classification approaches. However, because a review comment by one person usually covers several topics or aspects, it would be better to classify the sentiment for each aspect separately. This paper has two purposes. First, based on the fact that one sentence contains various aspects, aspect mining is performed to classify the emotions for each aspect. Second, we apply a multi-label classification method to analyze two or more dependent variables (output values) at once. To validate the proposed approach, online review comments about musical performances were gathered from a domestic online platform, and the multi-label classification approach was applied to the dataset. The results were promising, and the potential of the proposed approach is discussed.
https://doi.org/10.14329/isr.2020.22.3.081

The Role of stock market management and social media - Analyzing the types of individual investor and topic - (주식시장관리제도와 소셜 미디어의 역할 - 개인 투자자 집단 유형과 토픽 분석 -)

Kim, Jung-Su;Lee, Suk-Jun
- Management & Information Systems Review, v.34 no.5, pp.23-47, 2015
In the Korean stock market, individual investors have perceived stocks as a short-term arbitrage investment rather than a long-term investment strategy. To reinforce stock market transparency and soundness, it is important to enforce stock market management measures. In particular, stock market events caused by financial policy can give individual investors negative information regarding stock trading. Thus, there is a need to investigate whether the comprehensive review of listing eligibility effectively influences individual investors' responses and trading behaviors. The purpose of this study is to examine the relationship between such stock market management and changes in individual investors' trading types and responses before and after the event. Using a dataset of users' text messages on 9 firms posted on firm-based social media (i.e., Naver, Daum, Paxnet) from 2009 to 2014, we performed text clustering and topic modeling by keyword to classify users into investor and non-investor groups, and categorized two types of investors according to the main topic transition across event windows in the comprehensive review of listing eligibility. The results indicated that a variety of stockholder types existed. The ratio of the non-investor group decreased, whereas the proportion of the investor group reverted toward its pre-event pattern after the comprehensive review of listing eligibility. A distinctive feature of our study is that it explains the influence of stock market management on changes in individual investors' responses and categorizes them as time progresses. Implications and suggestions for future research are also discussed.

An Empirical Analysis of In-app Purchase Behavior in Mobile Games (모바일 게임 인앱구매에 영향을 주는 요인에 관한 연구)

Moonkyoung Jang;Changkeun Kim;Byungjoon Yoo
- Information Systems Review, v.22 no.2, pp.43-52, 2020
The mobile game industry has become one of the fastest growing industries, with an astonishing market size. Despite its industrial importance, few studies have empirically examined actual purchasing behavior in mobile games rather than the intention to purchase. Therefore, this paper investigates the key drivers of in-app purchases by analyzing a game-log dataset provided by a mobile game company in Korea. Specifically, the effects of goal-directed, habitual, and social-interaction playing behavior on in-app purchases are analyzed. Furthermore, the recursive relationship between playing and purchasing behaviors is also considered. The results show that all suggested factors have positive impacts on in-app purchases in the current period. In addition, previous habitual playing has a positive impact, but social-interaction playing and in-app purchases in the previous period have negative impacts on in-app purchases in the current period. These findings can improve our understanding of the impact of game playing on in-app purchases in mobile games, and provide meaningful insights for researchers and practitioners.
https://doi.org/10.14329/isr.2020.22.2.043

The Effect of Data Size on the k-NN Predictability: Application to Samsung Electronics Stock Market Prediction (데이터 크기에 따른 k-NN의 예측력 연구: 삼성전자주가를 사례로)

Chun, Se-Hak
- Journal of Intelligence and Information Systems, v.25 no.3, pp.239-251, 2019
Statistical methods such as moving averages, Kalman filtering, exponential smoothing, regression analysis, and ARIMA (autoregressive integrated moving average) have been used for stock market predictions. However, these statistical methods have not produced superior performance. In recent years, machine learning techniques, including artificial neural networks, SVM, and genetic algorithms, have been widely used in stock market predictions. In particular, a case-based reasoning method known as k-nearest neighbor is also widely used for stock price prediction. Case-based reasoning retrieves several similar cases from previous cases when a new problem occurs, and combines the class labels of the similar cases to create a classification for the new problem. However, case-based reasoning has some problems. First, it tends to search for a fixed number of neighbors in the observation space and always selects the same number of neighbors rather than the most similar neighbors for the target case. As a result, it may have to take more cases into account even when fewer cases are applicable, depending on the subject. Second, case-based reasoning may select neighbors that are far away from the target case. Thus, it does not guarantee an optimal pseudo-neighborhood for various target cases, and predictability can be degraded by deviation from the desired similar neighbors. This paper examines how the size of the learning data affects stock price predictability with k-nearest neighbor, and compares the predictability of k-nearest neighbor with the random walk model according to the size of the learning data and the number of neighbors. In this study, Samsung Electronics stock prices were predicted by dividing the learning dataset into two types. To predict the next day's closing price, we used four variables: opening value, daily high, daily low, and daily close.
In the first experiment, data from January 1, 2000 to December 31, 2017 were used for the learning process. In the second experiment, data from January 1, 2015 to December 31, 2017 were used for the learning process. The test data are from January 1, 2018 to August 31, 2018 for both experiments. We compared the performance of k-NN with the random walk model using the two learning datasets. When the learning data was small, the mean absolute percentage error (MAPE) was 1.3497 for the random walk model and 1.3570 for k-NN. When the learning data was large, however, the MAPE was 1.3497 for the random walk model and 1.2928 for k-NN. These results show that prediction power is higher when more learning data are used than when less learning data are used. This paper also shows that k-NN generally produces better predictive power than the random walk model for larger learning datasets, but not when the learning dataset is relatively small. Future studies need to consider macroeconomic variables related to stock price forecasting in addition to opening price, low price, high price, and closing price. Also, to produce better results, it is recommended that k-nearest neighbor find nearest neighbors using a second-step filtering method that considers fundamental economic variables, together with a sufficient amount of learning data.
https://doi.org/10.13088/jiis.2019.25.3.239 KSCI
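
The comparison described in the abstract above rests on three pieces: a MAPE metric, a random walk forecast, and a k-NN forecast. The sketch below shows one plausible form of each. It assumes MAPE is expressed in percent, that the random walk predicts each close as the previous close, and that k-NN averages the next-day closes of the nearest days by squared Euclidean distance. None of these details are confirmed by the abstract, and all function names and prices are hypothetical.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)

def random_walk_forecast(closes):
    """Predict each day's close as the previous day's close."""
    return closes[:-1]

def knn_forecast(train_x, train_y, query, k=2):
    """Average the next-day closes of the k training days nearest to query."""
    ranked = sorted(
        range(len(train_x)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_x[i], query)),
    )
    nearest = ranked[:k]
    return sum(train_y[i] for i in nearest) / len(nearest)

# Hypothetical daily rows: [open, high, low, close].
days = [
    [100.0, 103.0, 99.0, 102.0],
    [102.0, 104.0, 100.0, 101.0],
    [101.0, 105.0, 100.0, 104.0],
    [104.0, 106.0, 103.0, 105.0],
]
closes = [d[3] for d in days]

# Random walk error over the days that have a previous close.
rw_error = mape(closes[1:], random_walk_forecast(closes))

# k-NN: learn from the first three days, predict the close after the last day.
train_x, train_y = days[:3], closes[1:]
knn_prediction = knn_forecast(train_x, train_y, days[3], k=2)
```

With a fixed k, the forecast always averages k neighbours even when some are far from the query, which is the limitation the abstract raises.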

Empirical Analysis of DEA models Validity for R&D Project Performance Evaluation : Focusing on Rank Correlation with Normalization Index (R&D 프로젝트 성과평가를 위한 DEA모형의 타당성 실증분석 : 정규화지표와의 순위상관을 중심으로)

Park, Sung-Min
- IE interfaces, v.24 no.4, pp.314-322, 2011
This study analyzes the relationship between Data Envelopment Analysis (DEA) efficiency scores and a normalization index in order to examine the validity of DEA models. The normalization index considered in this study is 'sales per R&D project fund', which is regarded as a crucial R&D project performance evaluation index in practice. For this correlation analysis, three distinct DEA models are selected: the DEA basic model, the DEA/AR-I revised model (i.e., the DEA basic model with Assurance Region Type I constraints), and the Super-Efficiency (SE) model. The SE model is adopted so that efficient R&D projects (i.e., Decision Making Units, DMUs) with a DEA efficiency score of unity from the DEA basic model can be further differentiated in rank. Considering non-normality and outliers, two rank correlation coefficients, Spearman's ${\rho}_s$ and Kendall's ${\tau}_B$, are investigated in addition to Pearson's ${\gamma}$. With a large, up-to-date empirical dataset of n = 482 R&D projects associated with the R&D Loan Program of the Korea Information Communication Promotion Fund in 2011, statistically significant positive correlations are verified between the normalization index and every model's DEA efficiency scores for all three correlation coefficients. The congruence verified in this empirical analysis can be a useful reference for enhancing practitioners' acceptance of DEA efficiency scores as a real-world R&D project performance evaluation index.
https://doi.org/10.7232/IEIF.2011.24.4.314 KSCI
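
The two rank correlation coefficients named in the abstract above can be computed as in the following sketch. Spearman's coefficient is taken as the Pearson correlation of average ranks, and Kendall's tau-b is computed by pair counting with the usual tie correction. The efficiency scores and index values are hypothetical and are not taken from the study.

```python
from math import sqrt

def average_ranks(values):
    """Rank values from 1, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))

def spearman_rho(x, y):
    return pearson(average_ranks(x), average_ranks(y))

def kendall_tau_b(x, y):
    conc = disc = untied_x = untied_y = 0
    for i in range(len(x)):
        for j in range(i + 1, len(x)):
            dx, dy = x[i] - x[j], y[i] - y[j]
            untied_x += dx != 0  # pairs not tied in x
            untied_y += dy != 0  # pairs not tied in y
            if dx * dy > 0:
                conc += 1
            elif dx * dy < 0:
                disc += 1
    return (conc - disc) / sqrt(untied_x * untied_y)

# Hypothetical DEA efficiency scores and 'sales per R&D project fund' values.
efficiency = [0.9, 0.7, 1.0, 0.4, 0.6]
sales_per_fund = [3.1, 2.0, 2.8, 0.9, 1.5]
rho = spearman_rho(efficiency, sales_per_fund)
tau = kendall_tau_b(efficiency, sales_per_fund)
```

Both coefficients depend only on the ordering of the values, which is why they are less sensitive to outliers and non-normality than Pearson's coefficient.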

A Data Based Methodology for Estimating the Unconditional Model of the Latent Growth Modeling (잠재성장모형의 무조건적 모델 추정을 위한 데이터 기반 방법론)

Cho, Yeong Bin
- Journal of Digital Convergence, v.16 no.6, pp.85-93, 2018
Latent Growth Modeling (LGM) is an emerging method for analyzing longitudinal data, and it can be classified into an unconditional model and a conditional model. The unconditional model requires estimated values of the intercept and slope to achieve model fit. However, existing LGM lacks a structured methodology for estimating the slope when the longitudinal data follow neither a simple linear function nor a pre-defined function. This study used the Sequential Pattern technique of Association Rule Mining to calculate the slope of the unconditional model. The applied dataset is 'the Youth Panel 2001-2006' from the Korea Employment Information Service. The proposed methodology improved the fit of the model compared with the existing simple linear function and visualized the process of slope estimation.
https://doi.org/10.14400/JDC.2018.16.6.085 KSCI
