Search | Korea Science

A Study on Market Size Estimation Method by Product Group Using Word2Vec Algorithm (Word2Vec을 활용한 제품군별 시장규모 추정 방법에 관한 연구)

Jung, Ye Lim;Kim, Ji Hui;Yoo, Hyoung Sun
- Journal of Intelligence and Information Systems
- /
- v.26 no.1
- /
- pp.1-21
- /
- 2020
With the rapid development of artificial intelligence technology, various techniques have been developed to extract meaningful information from unstructured text data which constitutes a large portion of big data. Over the past decades, text mining technologies have been utilized in various industries for practical applications. In the field of business intelligence, it has been employed to discover new market and/or technology opportunities and support rational decision making of business participants. The market information such as market size, market growth rate, and market share is essential for setting companies' business strategies. There has been a continuous demand in various fields for specific product level-market information. However, the information has been generally provided at industry level or broad categories based on classification standards, making it difficult to obtain specific and proper information. In this regard, we propose a new methodology that can estimate the market sizes of product groups at more detailed levels than that of previously offered. We applied Word2Vec algorithm, a neural network based semantic word embedding model, to enable automatic market size estimation from individual companies' product information in a bottom-up manner. The overall process is as follows: First, the data related to product information is collected, refined, and restructured into suitable form for applying Word2Vec model. Next, the preprocessed data is embedded into vector space by Word2Vec and then the product groups are derived by extracting similar products names based on cosine similarity calculation. Finally, the sales data on the extracted products is summated to estimate the market size of the product groups. As an experimental data, text data of product names from Statistics Korea's microdata (345,103 cases) were mapped in multidimensional vector space by Word2Vec training. We performed parameters optimization for training and then applied vector dimension of 300 and window size of 15 as optimized parameters for further experiments. We employed index words of Korean Standard Industry Classification (KSIC) as a product name dataset to more efficiently cluster product groups. The product names which are similar to KSIC indexes were extracted based on cosine similarity. The market size of extracted products as one product category was calculated from individual companies' sales data. The market sizes of 11,654 specific product lines were automatically estimated by the proposed model. For the performance verification, the results were compared with actual market size of some items. The Pearson's correlation coefficient was 0.513. Our approach has several advantages differing from the previous studies. First, text mining and machine learning techniques were applied for the first time on market size estimation, overcoming the limitations of traditional sampling based- or multiple assumption required-methods. In addition, the level of market category can be easily and efficiently adjusted according to the purpose of information use by changing cosine similarity threshold. Furthermore, it has a high potential of practical applications since it can resolve unmet needs for detailed market size information in public and private sectors. Specifically, it can be utilized in technology evaluation and technology commercialization support program conducted by governmental institutions, as well as business strategies consulting and market analysis report publishing by private firms. The limitation of our study is that the presented model needs to be improved in terms of accuracy and reliability. The semantic-based word embedding module can be advanced by giving a proper order in the preprocessed dataset or by combining another algorithm such as Jaccard similarity with Word2Vec. Also, the methods of product group clustering can be changed to other types of unsupervised machine learning algorithm. Our group is currently working on subsequent studies and we expect that it can further improve the performance of the conceptually proposed basic model in this study.
https://doi.org/10.13088/jiis.2020.26.1.001 인용 PDF KSCI

The Prediction of DEA based Efficiency Rating for Venture Business Using Multi-class SVM (다분류 SVM을 이용한 DEA기반 벤처기업 효율성등급 예측모형)

Park, Ji-Young;Hong, Tae-Ho
- Asia pacific journal of information systems
- /
- v.19 no.2
- /
- pp.139-155
- /
- 2009
For the last few decades, many studies have tried to explore and unveil venture companies' success factors and unique features in order to identify the sources of such companies' competitive advantages over their rivals. Such venture companies have shown tendency to give high returns for investors generally making the best use of information technology. For this reason, many venture companies are keen on attracting avid investors' attention. Investors generally make their investment decisions by carefully examining the evaluation criteria of the alternatives. To them, credit rating information provided by international rating agencies, such as Standard and Poor's, Moody's and Fitch is crucial source as to such pivotal concerns as companies stability, growth, and risk status. But these types of information are generated only for the companies issuing corporate bonds, not venture companies. Therefore, this study proposes a method for evaluating venture businesses by presenting our recent empirical results using financial data of Korean venture companies listed on KOSDAQ in Korea exchange. In addition, this paper used multi-class SVM for the prediction of DEA-based efficiency rating for venture businesses, which was derived from our proposed method. Our approach sheds light on ways to locate efficient companies generating high level of profits. Above all, in determining effective ways to evaluate a venture firm's efficiency, it is important to understand the major contributing factors of such efficiency. Therefore, this paper is constructed on the basis of following two ideas to classify which companies are more efficient venture companies: i) making DEA based multi-class rating for sample companies and ii) developing multi-class SVM-based efficiency prediction model for classifying all companies. First, the Data Envelopment Analysis(DEA) is a non-parametric multiple input-output efficiency technique that measures the relative efficiency of decision making units(DMUs) using a linear programming based model. It is non-parametric because it requires no assumption on the shape or parameters of the underlying production function. DEA has been already widely applied for evaluating the relative efficiency of DMUs. Recently, a number of DEA based studies have evaluated the efficiency of various types of companies, such as internet companies and venture companies. It has been also applied to corporate credit ratings. In this study we utilized DEA for sorting venture companies by efficiency based ratings. The Support Vector Machine(SVM), on the other hand, is a popular technique for solving data classification problems. In this paper, we employed SVM to classify the efficiency ratings in IT venture companies according to the results of DEA. The SVM method was first developed by Vapnik (1995). As one of many machine learning techniques, SVM is based on a statistical theory. Thus far, the method has shown good performances especially in generalizing capacity in classification tasks, resulting in numerous applications in many areas of business, SVM is basically the algorithm that finds the maximum margin hyperplane, which is the maximum separation between classes. According to this method, support vectors are the closest to the maximum margin hyperplane. If it is impossible to classify, we can use the kernel function. In the case of nonlinear class boundaries, we can transform the inputs into a high-dimensional feature space, This is the original input space and is mapped into a high-dimensional dot-product space. Many studies applied SVM to the prediction of bankruptcy, the forecast a financial time series, and the problem of estimating credit rating, In this study we employed SVM for developing data mining-based efficiency prediction model. We used the Gaussian radial function as a kernel function of SVM. In multi-class SVM, we adopted one-against-one approach between binary classification method and two all-together methods, proposed by Weston and Watkins(1999) and Crammer and Singer(2000), respectively. In this research, we used corporate information of 154 companies listed on KOSDAQ market in Korea exchange. We obtained companies' financial information of 2005 from the KIS(Korea Information Service, Inc.). Using this data, we made multi-class rating with DEA efficiency and built multi-class prediction model based data mining. Among three manners of multi-classification, the hit ratio of the Weston and Watkins method is the best in the test data set. In multi classification problems as efficiency ratings of venture business, it is very useful for investors to know the class with errors, one class difference, when it is difficult to find out the accurate class in the actual market. So we presented accuracy results within 1-class errors, and the Weston and Watkins method showed 85.7% accuracy in our test samples. We conclude that the DEA based multi-class approach in venture business generates more information than the binary classification problem, notwithstanding its efficiency level. We believe this model can help investors in decision making as it provides a reliably tool to evaluate venture companies in the financial domain. For the future research, we perceive the need to enhance such areas as the variable selection process, the parameter selection of kernel function, the generalization, and the sample size of multi-class.
PDF KSCI

User-Perspective Issue Clustering Using Multi-Layered Two-Mode Network Analysis (다계층 이원 네트워크를 활용한 사용자 관점의 이슈 클러스터링)

Kim, Jieun;Kim, Namgyu;Cho, Yoonho
- Journal of Intelligence and Information Systems
- /
- v.20 no.2
- /
- pp.93-107
- /
- 2014
In this paper, we report what we have observed with regard to user-perspective issue clustering based on multi-layered two-mode network analysis. This work is significant in the context of data collection by companies about customer needs. Most companies have failed to uncover such needs for products or services properly in terms of demographic data such as age, income levels, and purchase history. Because of excessive reliance on limited internal data, most recommendation systems do not provide decision makers with appropriate business information for current business circumstances. However, part of the problem is the increasing regulation of personal data gathering and privacy. This makes demographic or transaction data collection more difficult, and is a significant hurdle for traditional recommendation approaches because these systems demand a great deal of personal data or transaction logs. Our motivation for presenting this paper to academia is our strong belief, and evidence, that most customers' requirements for products can be effectively and efficiently analyzed from unstructured textual data such as Internet news text. In order to derive users' requirements from textual data obtained online, the proposed approach in this paper attempts to construct double two-mode networks, such as a user-news network and news-issue network, and to integrate these into one quasi-network as the input for issue clustering. One of the contributions of this research is the development of a methodology utilizing enormous amounts of unstructured textual data for user-oriented issue clustering by leveraging existing text mining and social network analysis. In order to build multi-layered two-mode networks of news logs, we need some tools such as text mining and topic analysis. We used not only SAS Enterprise Miner 12.1, which provides a text miner module and cluster module for textual data analysis, but also NetMiner 4 for network visualization and analysis. Our approach for user-perspective issue clustering is composed of six main phases: crawling, topic analysis, access pattern analysis, network merging, network conversion, and clustering. In the first phase, we collect visit logs for news sites by crawler. After gathering unstructured news article data, the topic analysis phase extracts issues from each news article in order to build an article-news network. For simplicity, 100 topics are extracted from 13,652 articles. In the third phase, a user-article network is constructed with access patterns derived from web transaction logs. The double two-mode networks are then merged into a quasi-network of user-issue. Finally, in the user-oriented issue-clustering phase, we classify issues through structural equivalence, and compare these with the clustering results from statistical tools and network analysis. An experiment with a large dataset was performed to build a multi-layer two-mode network. After that, we compared the results of issue clustering from SAS with that of network analysis. The experimental dataset was from a web site ranking site, and the biggest portal site in Korea. The sample dataset contains 150 million transaction logs and 13,652 news articles of 5,000 panels over one year. User-article and article-issue networks are constructed and merged into a user-issue quasi-network using Netminer. Our issue-clustering results applied the Partitioning Around Medoids (PAM) algorithm and Multidimensional Scaling (MDS), and are consistent with the results from SAS clustering. In spite of extensive efforts to provide user information with recommendation systems, most projects are successful only when companies have sufficient data about users and transactions. Our proposed methodology, user-perspective issue clustering, can provide practical support to decision-making in companies because it enhances user-related data from unstructured textual data. To overcome the problem of insufficient data from traditional approaches, our methodology infers customers' real interests by utilizing web transaction logs. In addition, we suggest topic analysis and issue clustering as a practical means of issue identification.
https://doi.org/10.13088/jiis.2014.20.2.093 인용 PDF KSCI

VKOSPI Forecasting and Option Trading Application Using SVM (SVM을 이용한 VKOSPI 일 중 변화 예측과 실제 옵션 매매에의 적용)

Ra, Yun Seon;Choi, Heung Sik;Kim, Sun Woong
- Journal of Intelligence and Information Systems
- /
- v.22 no.4
- /
- pp.177-192
- /
- 2016
Machine learning is a field of artificial intelligence. It refers to an area of computer science related to providing machines the ability to perform their own data analysis, decision making and forecasting. For example, one of the representative machine learning models is artificial neural network, which is a statistical learning algorithm inspired by the neural network structure of biology. In addition, there are other machine learning models such as decision tree model, naive bayes model and SVM(support vector machine) model. Among the machine learning models, we use SVM model in this study because it is mainly used for classification and regression analysis that fits well to our study. The core principle of SVM is to find a reasonable hyperplane that distinguishes different group in the data space. Given information about the data in any two groups, the SVM model judges to which group the new data belongs based on the hyperplane obtained from the given data set. Thus, the more the amount of meaningful data, the better the machine learning ability. In recent years, many financial experts have focused on machine learning, seeing the possibility of combining with machine learning and the financial field where vast amounts of financial data exist. Machine learning techniques have been proved to be powerful in describing the non-stationary and chaotic stock price dynamics. A lot of researches have been successfully conducted on forecasting of stock prices using machine learning algorithms. Recently, financial companies have begun to provide Robo-Advisor service, a compound word of Robot and Advisor, which can perform various financial tasks through advanced algorithms using rapidly changing huge amount of data. Robo-Adviser's main task is to advise the investors about the investor's personal investment propensity and to provide the service to manage the portfolio automatically. In this study, we propose a method of forecasting the Korean volatility index, VKOSPI, using the SVM model, which is one of the machine learning methods, and applying it to real option trading to increase the trading performance. VKOSPI is a measure of the future volatility of the KOSPI 200 index based on KOSPI 200 index option prices. VKOSPI is similar to the VIX index, which is based on S&P 500 option price in the United States. The Korea Exchange(KRX) calculates and announce the real-time VKOSPI index. VKOSPI is the same as the usual volatility and affects the option prices. The direction of VKOSPI and option prices show positive relation regardless of the option type (call and put options with various striking prices). If the volatility increases, all of the call and put option premium increases because the probability of the option's exercise possibility increases. The investor can know the rising value of the option price with respect to the volatility rising value in real time through Vega, a Black-Scholes's measurement index of an option's sensitivity to changes in the volatility. Therefore, accurate forecasting of VKOSPI movements is one of the important factors that can generate profit in option trading. In this study, we verified through real option data that the accurate forecast of VKOSPI is able to make a big profit in real option trading. To the best of our knowledge, there have been no studies on the idea of predicting the direction of VKOSPI based on machine learning and introducing the idea of applying it to actual option trading. In this study predicted daily VKOSPI changes through SVM model and then made intraday option strangle position, which gives profit as option prices reduce, only when VKOSPI is expected to decline during daytime. We analyzed the results and tested whether it is applicable to real option trading based on SVM's prediction. The results showed the prediction accuracy of VKOSPI was 57.83% on average, and the number of position entry times was 43.2 times, which is less than half of the benchmark (100 times). A small number of trading is an indicator of trading efficiency. In addition, the experiment proved that the trading performance was significantly higher than the benchmark.
https://doi.org/10.13088/jiis.2016.22.4.177 인용 PDF KSCI

Development of a Feasibility Evaluation Model for Apartment Remodeling with the Number of Households Increasing at the Preliminary Stage (노후공동주택 세대수증가형 리모델링 사업의 기획단계 사업성평가 모델 개발)

Koh, Won-kyung;Yoon, Jong-sik;Yu, Il-han;Shin, Dong-woo;Jung, Dae-woon
- Korean Journal of Construction Engineering and Management
- /
- v.20 no.4
- /
- pp.22-33
- /
- 2019
The government has steadily revised and developed laws and systems for activating remodeling of apartments in response to the problems of aged apartments. However, despite such efforts, remodeling has yet to be activated. For many reasons, this study noted that there were no tools for reasonable profitability judgements and decision making in the preliminary stages of the remodeling project. Thus, the feasibility evaluation model was developed. Generally, the profitability judgements are made after the conceptual design. However, decisions to drive remodeling projects are made at the preliminary stage. So a feasibility evaluation model is required at the preliminary stage. Accordingly, In this study, a feasibility evaluation model was developed for determining preliminary stage profitability. Construction costs, business expenses, financial expenses, and generally sales revenue were calculated using the initial available information and remodeling variables derived through the existing cases. Through this process, we developed an algorithm that can give an overview of the return on investment. In addition, the preliminary stage feasibility evaluation model developed was applied to three cases to verify the applicability of the model. Although applied in three cases, the difference between the model's forecast and actual case values is less than 5%, which is considered highly applicable. If cases are expanded in the future, it will be a useful tool that can be used in actual work. The feasibility evaluation model developed in this study will support decision making by union members, and if the model is applied in different regions, it will be expected to help local governments to understand the size of possible remodeling projects.
https://doi.org/10.6106/KJCEM.2019.20.4.022 인용 PDF KSCI HTML

MDP(Markov Decision Process) Model for Prediction of Survivor Behavior based on Topographic Information (지형정보 기반 조난자 행동예측을 위한 마코프 의사결정과정 모형)

Jinho Son;Suhwan Kim
- Journal of Intelligence and Information Systems
- /
- v.29 no.2
- /
- pp.101-114
- /
- 2023
In the wartime, aircraft carrying out a mission to strike the enemy deep in the depth are exposed to the risk of being shoot down. As a key combat force in mordern warfare, it takes a lot of time, effot and national budget to train military flight personnel who operate high-tech weapon systems. Therefore, this study studied the path problem of predicting the route of emergency escape from enemy territory to the target point to avoid obstacles, and through this, the possibility of safe recovery of emergency escape military flight personnel was increased. based problem, transforming the problem into a TSP, VRP, and Dijkstra algorithm, and approaching it with an optimization technique. However, if this problem is approached in a network problem, it is difficult to reflect the dynamic factors and uncertainties of the battlefield environment that military flight personnel in distress will face. So, MDP suitable for modeling dynamic environments was applied and studied. In addition, GIS was used to obtain topographic information data, and in the process of designing the reward structure of MDP, topographic information was reflected in more detail so that the model could be more realistic than previous studies. In this study, value iteration algorithms and deterministic methods were used to derive a path that allows the military flight personnel in distress to move to the shortest distance while making the most of the topographical advantages. In addition, it was intended to add the reality of the model by adding actual topographic information and obstacles that the military flight personnel in distress can meet in the process of escape and escape. Through this, it was possible to predict through which route the military flight personnel would escape and escape in the actual situation. The model presented in this study can be applied to various operational situations through redesign of the reward structure. In actual situations, decision support based on scientific techniques that reflect various factors in predicting the escape route of the military flight personnel in distress and conducting combat search and rescue operations will be possible.
https://doi.org/10.13088/jiis.2023.29.2.101 인용 PDF

Nonlinear Modeling and Application of PI Control on Pre-cooling Session of a Carbon Dioxide Storage Tank at Normal Temperature and Pressure (상온 상압의 이산화탄소 저장용 탱크를 위한 예냉과정의 비선형 모델링 및 비례-적분 제어 적용)

Lim, Yu Kyung;Lee, Seok Goo;Dan, Seungkyu;Ko, Min Su;Lee, Jong Min
- Korean Chemical Engineering Research
- /
- v.52 no.5
- /
- pp.574-580
- /
- 2014
Storage tanks of Carbon dioxide ($CO_2$) carriers utilized for the purpose of carbon capture and storage (CCS) into subsea strata have to undergo a pre-cooling session before beginning to load cryogenic liquid cargos in order to prevent physical and thermal deterioration of tanks which may result from cryogenic $CO_2$ contacting tank walls directly. In this study we propose dynamic model to calculate the tank inflow of $CO_2$ gas injected for precooling process and its dynamic simulation results under proportional-integral control algorithm. We selected two cases in which each of them had one controlled variable (CV) as either the tank pressure or the tank temperature and discussed the results of that decision-making on the pre-cooling process. As a result we demonstrated that the controlling instability arising from nonlinearity and singularity of the mathematical model could be avoided by choosing tank pressure as CV instead of tank temperature.
https://doi.org/10.9713/kcer.2014.52.5.574 인용 PDF KSCI

Construction of Artificial Intelligence Training Platform for Multi-Center Clinical Research (다기관 임상연구를 위한 인공지능 학습 플랫폼 구축)

Lee, Chung-Sub;Kim, Ji-Eon;No, Si-Hyeong;Kim, Tae-Hoon;Yoon, Kwon-Ha;Jeong, Chang-Won
- KIPS Transactions on Computer and Communication Systems
- /
- v.9 no.10
- /
- pp.239-246
- /
- 2020
In the medical field where artificial intelligence technology is introduced, research related to clinical decision support system(CDSS) in relation to diagnosis and prediction is actively being conducted. In particular, medical imaging-based disease diagnosis area applied AI technologies at various products. However, medical imaging data consists of inconsistent data, and it is a reality that it takes considerable time to prepare and use it for research. This paper describes a one-stop AI learning platform for converting to medical image standard R_CDM(Radiology Common Data Model) and supporting AI algorithm development research based on the dataset. To this, the focus is on linking with the existing CDM(common data model) and model the system, including the schema of the medical imaging standard model and report information for multi-center research based on DICOM(Digital Imaging and Communications in Medicine) tag information. And also, we show the execution results based on generated datasets through the AI learning platform. As a proposed platform, it is expected to be used for various image-based artificial intelligence researches.
https://doi.org/10.3745/KTCCS.2020.9.10.239 인용 PDF KSCI

Analysis of Galvanic Skin Response Signal for High-Arousal Negative Emotion Using Discrete Wavelet Transform (이산 웨이브렛 변환을 이용한 고각성 부정 감성의 GSR 신호 분석)

Lim, Hyun-Jun;Yoo, Sun-Kook;Jang, Won Seuk
- Science of Emotion and Sensibility
- /
- v.20 no.3
- /
- pp.13-22
- /
- 2017
Emotion has a direct influence such as decision-making, perception, etc. and plays an important role in human life. For the convenient and accurate recognition of high-arousal negative emotion, the purpose of this paper is to design an algorithm for analysis using the bio-signal. In this study, after two emotional induction using the 'normal' / 'fear' emotion types of videos, we measured the Galvanic Skin Response (GSR) signal which is the simple of bio-signals. Then, by decomposing Tonic component and Phasic component in the measured GSR and decomposing Skin Conductance Very Slow Response (SCVSR) and Skin Conductance Slow Response (SCSR) in the Phasic component associated with emotional stimulation, extracting the major features of the components for an accurate analysis, we used a discrete wavelet transform with excellent time-frequency localization characteristics, not the method used previously. The extracted features are maximum value of Phasic component, amplitude of Phasic component, zero crossing rate of SCVSR and zero crossing rate of SCSR for distinguishing high-arousal negative emotion. As results, the case of high-arousal negative emotion exhibited higher value than the case of low-arousal normal emotion in all 4 of the features, and the more significant difference between the two emotion was found statistically than the previous analysis method. Accordingly, the results of this study indicate that the GSR may be a useful indicator for a high-arousal negative emotion measurement and contribute to the development of the emotional real-time rating system using the GSR.
https://doi.org/10.14695/KJSOS.2017.20.3.13 인용 PDF KSCI

A Change of Peak Outflows due to Decision of Flow Path in Storm Sewer Network (우수관망 노선 결정에 따른 첨두유출량 변화 분석)

Lee, Jung-Ho
- Journal of the Korea Academia-Industrial cooperation Society
- /
- v.11 no.12
- /
- pp.5151-5156
- /
- 2010
In the previous researches for storm sewer design, the flow paths in overall network were determined to minimize the construction cost and then, it was not considered the superposition effect of runoff hydrographs in the sewer pipes. However, in this research, the flow paths are determined considering the superposition effect to reduce the inundation risk by controlling and distributing the flows in the sewer pipes. This is accomplished by distributing the inflows that enter into each junction by changing the flow path in which pipes are connected between junctions. In this paper, the superposition effect and peak outflows at outlet were analyzed considering the changes of the flow paths in the sewer network. Then, the flow paths are determined using genetic algorithm and the objective function is to minimize the peak outflow at outlet. As the applied result for the sample sewer network, the difference between maximum and minimum peak outflows which are caused by the change of flow path was about 5.6% for the design rainfall event of 10 years frequency with 30 min. duration. Also, the typhoon 'Rusa' which occurred at 2002 was applied to verify the reduction of inundation risk for the excessive rainfall, and then, the amount of overflows was reduced to about 31%.
https://doi.org/10.5762/KAIS.2010.11.12.5151 인용 PDF KSCI

Search Result 2,359, Processing Time 0.032 seconds

이메일무단수집거부

이용약관

제 1 장 총칙

제 2 장 이용계약의 체결

제 3 장 계약 당사자의 의무

제 4 장 서비스의 이용

제 5 장 계약 해지 및 이용 제한

제 6 장 손해배상 및 기타사항

Detail Search

Image Search (β)