• Title/Summary/Keyword: High Accuracy

A Study on the Medical Application and Personal Information Protection of Generative AI (생성형 AI의 의료적 활용과 개인정보보호)

  • Lee, Sookyoung
    • The Korean Society of Law and Medicine / v.24 no.4 / pp.67-101 / 2023
  • The utilization of generative AI in the medical field is also being rapidly researched. Access to vast data sets reduces the time and energy spent in selecting information. However, as the effort put into content creation decreases, there is a greater likelihood of associated issues arising. For example, with generative AI, users must discern the accuracy of results themselves, as these AIs learn from data within a set period and generate outcomes. While the answers may appear plausible, their sources are often unclear, making it challenging to determine their veracity. Additionally, the possibility of presenting results from a biased or distorted perspective cannot be discounted at present on ethical grounds. Despite these concerns, the field of generative AI is continually advancing, with an increasing number of users leveraging it in various sectors, including biomedical and life sciences. This raises important legal considerations regarding who bears responsibility and to what extent for any damages caused by these high-performance AI algorithms. A general overview of issues with generative AI includes those discussed above, but another perspective arises from its fundamental nature as a large-scale language model ('LLM') AI. There is a civil law concern regarding "the memorization of training data within artificial neural networks and its subsequent reproduction". Medical data, by nature, often reflects personal characteristics of patients, potentially leading to issues such as the regeneration of personal information. The extensive application of generative AI in scenarios beyond traditional AI brings forth the possibility of legal challenges that cannot be ignored. 
Examining the technical characteristics of generative AI with a focus on legal issues, especially the protection of personal information, shows that current personal information protection laws are inadequate, particularly where health and medical data are used. These laws provide processes for anonymizing and de-identifying specific personal information but fall short when generative AI is applied as software in medical devices. To address the functionalities of generative AI in clinical software, a reevaluation and adjustment of existing laws for the protection of personal information are imperative.

The Usefulness of 18F-FDG PET to Differentiate Subtypes of Dementia: The Systematic Review and Meta-Analysis

  • Seunghee Na;Dong Woo Kang;Geon Ha Kim;Ko Woon Kim;Yeshin Kim;Hee-Jin Kim;Kee Hyung Park;Young Ho Park;Gihwan Byeon;Jeewon Suh;Joon Hyun Shin;YongSoo Shim;YoungSoon Yang;Yoo Hyun Um;Seong-il Oh;Sheng-Min Wang;Bora Yoon;Hai-Jeon Yoon;Sun Min Lee;Juyoun Lee;Jin San Lee;Hak Young Rhee;Jae-Sung Lim;Young Hee Jung;Juhee Chin;Yun Jeong Hong;Hyemin Jang;Hongyoon Choi;Miyoung Choi;Jae-Won Jang;Korean Dementia Association
    • Dementia and Neurocognitive Disorders / v.23 no.1 / pp.54-66 / 2024
  • Background and Purpose: Dementia subtypes, including Alzheimer's dementia (AD), dementia with Lewy bodies (DLB), and frontotemporal dementia (FTD), pose diagnostic challenges. This review examines the effectiveness of 18F-Fluorodeoxyglucose Positron Emission Tomography (18F-FDG PET) in differentiating these subtypes for precise treatment and management. Methods: A systematic review following the Preferred Reporting Items for Systematic reviews and Meta-Analyses guidelines was conducted using databases such as PubMed and Embase to identify studies on the diagnostic utility of 18F-FDG PET in dementia. The search included studies up to November 16, 2022, focusing on peer-reviewed journals and applying the gold-standard clinical diagnosis for dementia subtypes. Results: From 12,815 articles, 14 were selected for final analysis. For AD versus FTD, the sensitivity was 0.96 (95% confidence interval [CI], 0.88-0.98) and the specificity was 0.84 (95% CI, 0.70-0.92). For AD versus DLB, 18F-FDG PET showed a sensitivity of 0.93 (95% CI, 0.88-0.98) and a specificity of 0.92 (95% CI, 0.70-0.92). Lastly, when differentiating AD from non-AD dementias, the sensitivity was 0.86 (95% CI, 0.80-0.91) and the specificity was 0.88 (95% CI, 0.80-0.91). The studies mostly used case-control designs with visual and quantitative assessments. Conclusions: 18F-FDG PET exhibits high sensitivity and specificity in differentiating dementia subtypes, particularly AD, FTD, and DLB. This method, while not a standalone diagnostic tool, significantly enhances diagnostic accuracy in uncertain cases, complementing clinical assessments and structural imaging.
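
For readers less familiar with the two measures reported above, the following minimal sketch shows how sensitivity and specificity are derived from a 2x2 table. The counts are hypothetical and were chosen only so that the arithmetic reproduces the AD-versus-FTD point estimates; they are not data from the review, and the function name is an assumption.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity) from 2x2 confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # share of true cases the test detects
    specificity = tn / (tn + fp)  # share of non-cases the test rules out
    return sensitivity, specificity


# Hypothetical counts (not taken from the review).
sens, spec = sensitivity_specificity(tp=48, fn=2, tn=21, fp=4)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

A pooled meta-analytic estimate is obtained with a statistical model across studies rather than by summing counts, so this only illustrates the per-study definitions.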

Model Evaluation for Predicting the Full Bloom Date of Apples Based on Air Temperature Variations in South Korea's Major Production Regions (기온 변화에 따른 우리나라 사과 주산지 만개일 예측을 위한 모델 평가)

  • Jae Hoon Jeong;Jeom Hwa Han;Jung Gun Cho;Dong Yong Lee;Seul Ki Lee;Si Hyeong Jang;Suhyun Ryu
    • Journal of Bio-Environment Control / v.32 no.4 / pp.501-512 / 2023
  • This study aimed to assess and determine the optimal model for predicting the full bloom date of 'Fuji' apples across South Korea. We evaluated the performance of four distinct models: the Development Rate Model (DVR)1, DVR2, the Chill Days (CD) model, and a sequentially integrated approach that combined the Dynamic model (DM) and the Growing Degree Hours (GDH) model. The full bloom dates and air temperatures were collected over a three-year period from six orchards located in the major apple production regions of South Korea: Pocheon, Hwaseong, Geochang, Cheongsong, Gunwi, and Chungju. Among these models, the one that combined DM for calculating chilling accumulation and the GDH model for estimating heat accumulation in sequence demonstrated the most accurate predictive performance, in contrast to the CD model, which exhibited the lowest predictive precision. Furthermore, the DVR1 model exhibited an underestimation error at the orchard located in Hwaseong: it projected earlier full bloom dates than were actually observed. This area is characterized by minimal diurnal temperature ranges, where the daily minimum temperature is high and the daily maximum temperature is relatively low. Therefore, to achieve a comprehensive prediction of the blooming date of 'Fuji' apples across South Korea, it is recommended to integrate the DM for calculating the chilling accumulation necessary to break dormancy with the GDH model for estimating the heat accumulation required for flowering after dormancy release. The resulting combined DM+GDH model was the most effective approach. However, further data collection and evaluation from different regions are needed to refine its accuracy and applicability.
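
The sequential idea described above (accumulate chilling until dormancy is released, then accumulate heat until bloom) can be sketched as follows. This is a simplified illustration, not the authors' implementation: a plain chilling-hour count stands in for the Dynamic model, the heat term is a linear growing-degree-hour rule, and the base temperature, optimum temperature, and both requirements are assumed values.

```python
BASE_TEMP_C = 4.5   # assumed base temperature for heat accumulation
OPT_TEMP_C = 25.0   # assumed optimum; warmer hours earn no extra credit


def growing_degree_hours(temp_c):
    """Linear GDH for one hour: degrees above the base, capped at the optimum."""
    return max(0.0, min(temp_c, OPT_TEMP_C) - BASE_TEMP_C)


def is_chill_hour(temp_c):
    """Simple chilling-hour rule (a stand-in for the Dynamic model)."""
    return 0.0 <= temp_c <= 7.2


def predict_bloom_hour(hourly_temps, chill_required, heat_required):
    """Return the index of the hour at which bloom is predicted, or None."""
    chill = 0
    heat = 0.0
    dormancy_released = False
    for hour, temp in enumerate(hourly_temps):
        if not dormancy_released:
            # Phase 1: count chilling until the requirement is met.
            if is_chill_hour(temp):
                chill += 1
            if chill >= chill_required:
                dormancy_released = True
        else:
            # Phase 2: accumulate heat only after dormancy release.
            heat += growing_degree_hours(temp)
            if heat >= heat_required:
                return hour
    return None


# Hypothetical temperature series: 10 cold hours, then 10 mild hours.
temps = [5.0] * 10 + [14.5] * 10
bloom = predict_bloom_hour(temps, chill_required=10, heat_required=50.0)
```

The key design point is the ordering: heat is not credited before the chilling requirement is satisfied, which is what distinguishes a sequential model from one that accumulates both in parallel.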

A Study of Equipment Accuracy and Test Precision in Dual Energy X-ray Absorptiometry (골밀도검사의 올바른 질 관리에 따른 임상적용과 해석 -이중 에너지 방사선 흡수법을 중심으로-)

  • Dong, Kyung-Rae;Kim, Ho-Sung;Jung, Woon-Kwan
    • Journal of radiological science and technology / v.31 no.1 / pp.17-23 / 2008
  • Purpose: In bone densitometry, results vary with the scanning environment, the equipment, and the precision and accuracy of the operator, so quality control must be managed systematically. Equipment failures caused by overload, owing to aging equipment and a growing number of patients, occurred frequently. The resulting replacement of equipment and purchase of additional bone density scanners created a compatibility problem in tracking patients. This study examines whether clinical changes in a patient's bone density are reflected accurately and precisely when new equipment is used interchangeably with the existing equipment after replacement and expansion. Materials and methods: Two GE Lunar Prodigy Advance units (P1 and P2) and the HOLOGIC Spine Road phantom (HSP) were used to measure equipment precision. Each device scanned the phantom 20 times to acquire precision data (Group 1). Operator precision was measured by scanning the same patient twice, with 15 patients for each of the target devices, among 120 women (average age 48.78, 20-60 years old) (Group 2). In addition, operator precision and cross-calibration data were obtained by scanning the HSP 20 times on each device, based on the data from the daily morning quality control with the phantom (ASP) (Group 3). Operator precision and cross-calibration data were also obtained from 120 women (average age 48.78, 20-60 years old) by scanning each patient once on each device in alternation (Group 4). Results: According to the daily QC data, the equipment was stable, with a value of $0.996\;\mathrm{g/cm^2}$ and a coefficient of variation (%CV) of 0.08. In Group 1, the ALP mean $\pm$ SD and %CV were P1: $1.064 \pm 0.002\;\mathrm{g/cm^2}$, %CV = 0.190; P2: $1.061 \pm 0.003\;\mathrm{g/cm^2}$, %CV = 0.192. 
In Group 2, the mean $\pm$ SD and %CV were P1: $1.187 \pm 0.002\;\mathrm{g/cm^2}$, %CV = 0.164; P2: $1.198 \pm 0.002\;\mathrm{g/cm^2}$, %CV = 0.163. In Group 3, the mean error $\pm$ 2SD and %CV were P1 (spine: $0.001 \pm 0.03\;\mathrm{g/cm^2}$, %CV = 0.94; femur: $0.001 \pm 0.019\;\mathrm{g/cm^2}$, %CV = 0.96) and P2 (spine: $0.002 \pm 0.018\;\mathrm{g/cm^2}$, %CV = 0.55; femur: $0.001 \pm 0.013\;\mathrm{g/cm^2}$, %CV = 0.48). In Group 4, the mean error $\pm$ 2SD, %CV, and r value were spine: $0.006 \pm 0.024\;\mathrm{g/cm^2}$, %CV = 0.86, r = 0.995; femur: $0 \pm 0.014\;\mathrm{g/cm^2}$, %CV = 0.54, r = 0.998. Conclusion: Both the LUNAR ASP %CV and the HOLOGIC Spine Phantom fall within the normal error range of $\pm 2\%$ defined by the ISCD. The BMD measurements remained relatively constant, showing excellent repeatability. The phantom is homogeneous, but it is limited in reflecting clinical factors such as variations in patients' body weight or body fat. Quality control using a phantom is therefore considered useful for checking miscalibration of the equipment in use. In the Bland-Altman plots comparing Group 3 with Group 4, the values from scanning a patient twice on one device and from cross-scanning on the two devices all fell within 2SD. An r value of 0.99 or higher in the linear regression analysis indicated high precision and correlation. Using the two devices interchangeably therefore did not affect patient tracking. Regular testing of the equipment and of operator skill, followed by appropriate calibration, is needed to obtain reliable BMD values.
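
The two statistics this abstract relies on, the coefficient of variation for repeated scans and the Bland-Altman limits for paired devices, can be sketched as below. The measurement values are hypothetical, the function names are assumptions, and the limits use mean difference ± 2 SD to match the "2SD" convention in the abstract.

```python
from statistics import mean, stdev


def percent_cv(values):
    """Coefficient of variation in percent: 100 * sample SD / mean."""
    return 100.0 * stdev(values) / mean(values)


def bland_altman_limits(a, b):
    """Return (bias, lower, upper): mean paired difference and bias ± 2 SD."""
    diffs = [x - y for x, y in zip(a, b)]
    bias = mean(diffs)
    sd = stdev(diffs)
    return bias, bias - 2 * sd, bias + 2 * sd


# Hypothetical repeated phantom scans on one device (g/cm^2).
phantom_scans = [1.062, 1.064, 1.066, 1.063, 1.065]
cv = percent_cv(phantom_scans)

# Hypothetical BMD of the same five patients on two devices (g/cm^2).
device_p1 = [1.180, 1.195, 1.210, 1.172, 1.188]
device_p2 = [1.183, 1.190, 1.214, 1.170, 1.191]
bias, lower, upper = bland_altman_limits(device_p1, device_p2)

# True when every paired difference lies inside the limits of agreement.
within = all(lower <= x - y <= upper for x, y in zip(device_p1, device_p2))
```

Note that %CV is a dimensionless ratio, which is why it carries no g/cm² unit.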

The Study in Objectification of the diagnosis of Sasang Constitution(According to Analysis of the Past Questionnaires) (사상체질진단(四象體質診斷)의 객관화(客觀化)에 관한 연구(硏究)(기존(旣存) 설문지(說問紙)의 분석(分析)을 중심(中心)으로))

  • Kim, Young-woo;Kim, Jong-won
    • Journal of Sasang Constitutional Medicine / v.11 no.2 / pp.151-183 / 1999
  • The subjects of this study were 200 patients treated at the Oriental Medical Hospital at Dong Eui Medical Center during the 9 months from Jan. 1999 to Sept. 1999. We determined Sasang Constitution using the 'Questionnaire of Sasang Constitution Classification (I)', the 'Questionnaire of Sasang Constitution Classification II (QSCCII)', and diagnosis by a medical specialist. The following conclusions were drawn from comparing Sasang Constitution with the questionnaires. 1. We selected the 84 items that were statistically meaningful out of 196 items ('Questionnaire of Sasang Constitution Classification (I)' had 71 items and 'Questionnaire of Sasang Constitution Classification II (QSCCII)' had 121 items). Because some items were duplicated, we then narrowed the 84 items to 73 ('Questionnaire of Sasang Constitution Classification (I)' had 33 items and 'Questionnaire of Sasang Constitution Classification II (QSCCII)' had 40 items). 2. We constructed a new questionnaire of 85 items, including items whose statistical value had been confirmed in 'A CLINICAL STUDY OF THE JUDGMENT OF SASANG CONSTITUTION ACCORDING TO QUESTIONNAIRE' and 'A CLINICAL STUDY OF THE TYPE OF DISEASE AND SYMPTOM ACCORDING TO SASANG CONSTITUTION CLASSIFICATION'. 
Of these, 7 items asked about physique and body form, 7 about external appearance and posture, 3 about habits and character, 3 about physiology and pathology, 4 about frequently experienced phenomena, 3 about eating, 14 about frequently experienced symptoms, 6 about work and strengths and weaknesses, 7 about social relationships, 5 about usual state of mind, 1 about emotional inclination, 10 about behavioral inclination, and 15 about character. 3. In the new questionnaire, 84 items were relevant to Soyang, 87 to Soeum, and 70 to Taeeum. We assigned a score to each item according to its statistical ratio. The total score was 7785.04 for Soyang, 7742.80 for Soeum, and 7746.60 for Taeeum. 4. Comparing the Sasang Constitution judgments from clinical diagnosis by a medical specialist with those from the new questionnaire, the diagnostic accuracy of the new questionnaire was 73.33%. The diagnostic accuracy was low for Soyang and high for the others. Taeyang was excluded.

A Store Recommendation Procedure in Ubiquitous Market for User Privacy (U-마켓에서의 사용자 정보보호를 위한 매장 추천방법)

  • Kim, Jae-Kyeong;Chae, Kyung-Hee;Gu, Ja-Chul
    • Asia pacific journal of information systems / v.18 no.3 / pp.123-145 / 2008
  • Recently, as information and communication technology has developed, the ubiquitous environment has been discussed from diverse perspectives. A ubiquitous environment is one in which data can be transferred through networks regardless of physical space, virtual space, time, or location. Realizing it requires Pervasive Sensing technology, which enables users' data to be recognized without a border between physical and virtual space. In addition, the latest and most diverse technologies are needed, such as Context-Awareness technology, which constructs the context around the user by sharing the data accessed through Pervasive Sensing, and linkage technology, which prevents information loss across wired and wireless networks and databases. In particular, Pervasive Sensing is regarded as an essential technology that enables user-oriented services by recognizing users' needs even before they ask. Through these technologies, the ubiquitous environment has many characteristics, such as ubiquity, abundance of data, mutuality, high information density, individualization, and customization. Among them, information density refers to the amount and quality of accessible information, which is stored in bulk with assured quality through Pervasive Sensing technology. Using this, companies have become able to provide personalized content (or information) to a target customer. Above all, a growing body of research addresses recommender systems that provide what customers need even when they do not explicitly ask for it. Recommender systems are well known for their positive effect of enlarging selling opportunities and reducing customers' search costs, since in a commerce environment they find and provide information in advance according to customers' traits and preferences. 
Recommender systems have proved their usability through several methodologies and experiments conducted in many different fields since the mid-1990s. Most research on recommender systems to date has taken products or information in an internet or mobile context as its object, and there is not enough research on recommending suitable stores to customers in a ubiquitous environment. In a ubiquitous environment, customers' behavior can be tracked in the same way as in an online market space, even when they are purchasing in an offline marketplace. Unlike in the existing internet space, in the ubiquitous environment there is growing interest in stores that provide information according to customers' traffic lines. In other words, the same product can be purchased in several different stores, and the preferred store can differ among customers according to personal preferences such as the traffic line between stores, location, atmosphere, quality, and price. Krulwich(1997) developed Lifestyle Finder, which recommends a product and a store using demographic and purchasing information generated in internet commerce. Fano(1998) created Shopper's Eye, an information-providing system that shows information about the store closest to the customer's present location once the customer has sent a to-buy list. Sadeh(2003) developed MyCampus, which recommends appropriate information and a store in accordance with the schedule saved in a customer's mobile device. Moreover, Keegan and O'Hare(2004) came up with EasiShop, which provides suitable store information, including price, after-sales service, and accessibility, after analyzing the customer's to-buy list and current location. 
However, Krulwich(1997) does not reflect the characteristics of physical space because it is based on the online commerce context, and Keegan and O'Hare(2004) only provide information about stores related to a product, while Fano(1998) does not fully consider the relationship between store preference and the store itself. The most recent of these studies, Sadeh(2003), experimented on a campus with a recommender system that reflects situation and preference information in addition to the characteristics of the physical space. Yet there is a potential problem, since these studies rely on customers' location and preference information, which is connected to the invasion of privacy. According to Al-Muhtadi(2002), Beresford and Stajano(2003), and Ren(2006), the invasion of privacy and personal information is the primary source of controversy in a ubiquitous environment. Additionally, as noted in Srivastava(2000), individuals want to remain anonymous to protect their personal information. Therefore, in this paper, we suggest a methodology for recommending stores in a U-market, based on the ubiquitous environment, that does not use personal information, in order to protect individual information and privacy. The main idea behind our methodology is based on the Feature Matrices model (FM model, Shahabi and Banaei-Kashani, 2003), which uses clusters of customers' similar transaction data and is similar to Collaborative Filtering. Unlike Collaborative Filtering, however, this methodology overcomes the problems of personal information and privacy, since it does not know exactly who the customer is. The methodology is compared with a single-trait model (vector model), such as visitor logs, to examine the actual improvement in recommendation when context information is used. Because real U-market data are not easy to find, we experimented with factual data from a real department store, supplemented with context information. 
The U-market recommendation procedure proposed in this paper is divided into four major phases. The first phase collects and preprocesses data for analyzing customers' shopping patterns; the traits of shopping patterns are expressed as N-dimensional feature matrices. In the second phase, similar shopping patterns are grouped into clusters and the representative pattern of each cluster is derived. The distance between shopping patterns is calculated by the Projected Pure Euclidean Distance (Shahabi and Banaei-Kashani, 2003). The third phase finds the representative pattern that is most similar to a target customer while the customer's shopping information is traced and saved dynamically. In the fourth phase, the next store is recommended based on the physical distance between the stores of the representative pattern and the target customer's present location. In this research, we evaluated the accuracy of the recommendation method on factual data derived from a department store. Because tracking on a real-time basis is technologically difficult, we extracted purchasing-related information and added context information to each transaction. As a result, recommendation based on the FM model, which applies purchasing and context information, is more stable and accurate than recommendation based on the vector model. Additionally, recommendation results became more precise as more shopping information accumulated. Realistically, because of the limitations in realizing a ubiquitous environment, we were not able to reflect all kinds of context, but more explicit analysis is expected to be attainable once a practical system is implemented.
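
The third and fourth phases described above can be illustrated with a small sketch. It is not the authors' method: ordinary Euclidean distance stands in for the Projected Pure Euclidean Distance, flat feature vectors stand in for feature matrices, and all patterns, store names, coordinates, and function names are hypothetical. Note that the input is only the current session's shopping features, with no customer identity.

```python
from math import dist

# Hypothetical representative patterns, one per cluster (phase 2 output).
REPRESENTATIVE_PATTERNS = [
    {"features": (0.9, 0.1, 0.0), "stores": ["cosmetics", "shoes", "cafe"]},
    {"features": (0.1, 0.8, 0.3), "stores": ["electronics", "books", "cafe"]},
]

# Hypothetical store coordinates on a floor plan.
STORE_LOCATIONS = {
    "cosmetics": (0.0, 0.0),
    "shoes": (5.0, 1.0),
    "cafe": (2.0, 2.0),
    "electronics": (9.0, 9.0),
    "books": (8.0, 3.0),
}


def nearest_pattern(session_features, patterns):
    """Phase 3: pick the representative pattern closest to the session."""
    return min(patterns, key=lambda p: dist(session_features, p["features"]))


def recommend_next_store(session_features, visited, location):
    """Phase 4: among unvisited stores of the matched pattern, pick the nearest."""
    pattern = nearest_pattern(session_features, REPRESENTATIVE_PATTERNS)
    candidates = [s for s in pattern["stores"] if s not in visited]
    if not candidates:
        return None
    return min(candidates, key=lambda s: dist(location, STORE_LOCATIONS[s]))


choice = recommend_next_store(
    (0.8, 0.2, 0.1), visited={"cosmetics"}, location=(1.0, 1.0)
)
```

In a fuller version the session features would be updated after every visit, so the matched pattern can change as more shopping information accumulates.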

One-stop Evaluation Protocol of Ischemic Heart Disease: Myocardial Fusion PET Study (허혈성 심장 질환의 One-stop Evaluation Protocol: Myocardial Fusion PET Study)

  • Kim, Kyong-Mok;Lee, Byung-Wook;Lee, Dong-Wook;Kim, Jeong-Su;Jang, Yeong-Do;Bang, Chan-Seok;Baek, Jong-Hun;Lee, In-Su
    • The Korean Journal of Nuclear Medicine Technology / v.14 no.2 / pp.33-37 / 2010
  • Purpose: In the early days of PET/CT, the CT component was used for attenuation correction, but CT with MDCT is now commonly used and works well for anatomical diagnosis. Our hospital has improved the accuracy and convenience of diagnosing and evaluating coronary heart disease by running myocardial perfusion SPECT together with myocardial FDG PET and coronary CT angiography (coronary CTA) on a 64-slice PET/CT. This report presents the protocol and images based on results from about 400 coronary heart disease examinations performed since the 64-channel PET/CT was installed in July 2007. Materials and Methods: The equipment used was the Discovery VCT (DVCT) by GE Healthcare, which consists of a 64-slice CT and a PET with BGO ($Bi_4Ge_3O_{12}$) scintillation crystals. To reduce patient waiting time and obtain a quick diagnosis and evaluation, myocardial perfusion SPECT with pharmacologic stress is performed first, followed immediately by myocardial FDG PET and coronary CTA without a break. The one-stop evaluation protocol for ischemic heart disease is as follows. 1) Myocardial perfusion SPECT with pharmacologic stress: The patient is injected with 10 mCi of $^{99m}Tc$-MIBI, does not eat any fatty food in preparation for the myocardial PET examination, and drinks plain water with 100 mg of ursodeoxycholic acid; the SPECT image is acquired one hour later. 2) Myocardial FDG PET: To reduce blood fat content and increase FDG uptake, we used our own oral glucose loading method with insulin and Acipimox according to blood acid content. The patient is injected with 5 mCi of $^{18}F$-FDG to reduce radiation exposure; a gated image is acquired one hour later, and a delayed image is acquired when needed. 3) Coronary CTA: The most important points are controlling the heart rate and obtaining the patient's cooperation with breath-holding. 
To bring the heart rate below 65 beats per minute, the patient takes 50 mg to 200 mg of a beta blocker after consultation with a doctor and practices breath-holding before the examination. Right before the examination, isosorbide dinitrate is sprayed 3 to 5 times to lower vessel wall tension and dilate the patient's blood vessels, which improves the depiction of the anatomy. During scanning, CT contrast is injected at high pressure, so the patient practices sufficiently beforehand to avoid problems. To reduce radiation exposure, ECG-triggered X-ray tube modulation is used. Results: Coronary artery stenosis is evaluated with coronary CTA, the correlation between stenosis and reduced perfusion (culprit vessel check) is studied by combining myocardial perfusion SPECT with pharmacologic stress and coronary CTA, and the viability of infarcted or hibernating myocardium can be checked with FDG PET. Conclusion: The examination helps set the direction of treatment (drug treatment, PCI, CABG) because the lesion site, its severity, and the expected effect of treatment can be estimated. An additional advantage is that the whole process takes just 3 hours in one stop, since all examinations run in succession. The method is therefore useful for the one-stop evaluation of ischemic heart disease.

Development of Predictive Models for Rights Issues Using Financial Analysis Indices and Decision Tree Technique (경영분석지표와 의사결정나무기법을 이용한 유상증자 예측모형 개발)

  • Kim, Myeong-Kyun;Cho, Yoonho
    • Journal of Intelligence and Information Systems / v.18 no.4 / pp.59-77 / 2012
  • This study focuses on predicting which firms will increase capital by issuing new stocks in the near future. Many stakeholders, including banks, credit rating agencies, and investors, perform a variety of analyses of firms' growth, profitability, stability, activity, productivity, etc., and regularly report the firms' financial analysis indices. In this paper, we develop predictive models for rights issues using these financial analysis indices and data mining techniques. We approach the building of the predictive models from two perspectives. The first is the analysis period: we divide it into the periods before and after the IMF financial crisis and examine whether the two differ. The second is the prediction time: to predict when firms will increase capital by issuing new stocks, the prediction time is categorized as one, two, or three years later. A total of six prediction models are therefore developed and analyzed. We employ the decision tree technique to build the prediction models for rights issues. The decision tree is the most widely used prediction method; it builds trees that label or categorize cases into a set of known classes. In contrast to neural networks, logistic regression, and SVM, decision tree techniques are well suited for high-dimensional applications and have strong explanation capabilities. Well-known decision tree induction algorithms include CHAID, CART, QUEST, and C5.0. Among them, we use the C5.0 algorithm, which is the most recently developed and yields better performance than the others. We obtained data on rights issues and financial analysis from TS2000 of the Korea Listed Companies Association. A record of financial analysis data consists of 89 variables, which include 9 growth indices, 30 profitability indices, 23 stability indices, 6 activity indices, and 8 productivity indices. 
For model building and testing, we used 10,925 financial analysis records from a total of 658 listed firms. PASW Modeler 13 was used to build C5.0 decision trees for the six prediction models. A total of 84 variables from the financial analysis data were selected as the input variables of each model, and the rights issue status (issued or not issued) was defined as the output variable. To develop prediction models using the C5.0 node (Node Options: Output type = Rule set, Use boosting = false, Cross-validate = false, Mode = Simple, Favor = Generality), we used 60% of the data for model building and 40% for model testing. The experimental results show that the prediction accuracies for data after the IMF financial crisis (68.78% to 71.41%) are about 10 percentage points higher than those for data before the crisis (59.04% to 60.43%). These results indicate that since the IMF financial crisis, the reliability of financial analysis indices has increased and firms' intentions regarding rights issues have become more apparent. The experimental results also show that stability-related indices have a major impact on rights issues in the case of short-term prediction. On the other hand, long-term prediction of rights issues is affected by financial analysis indices on profitability, stability, activity, and productivity. All the prediction models include the industry code as one of the significant variables, which means that companies in different industries show different patterns of rights issues. We conclude that it is desirable for stakeholders to take into account stability-related indices for short-term prediction and a wider variety of financial analysis indices for long-term prediction. The current study has several limitations. First, we need to compare the differences in accuracy obtained with other data mining techniques such as neural networks, logistic regression, and SVM. 
Second, we need to develop and evaluate new prediction models that include variables that research on capital structure theory has identified as relevant to rights issues.
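
The 60%/40% build-and-test procedure described above can be illustrated with a deliberately tiny stand-in. This sketch does not implement C5.0 or use PASW Modeler; it trains a one-level decision stump (a single threshold on a single index) on the first 60% of the records and measures accuracy on the remaining 40%. The records, the choice of a debt-ratio feature, and all function names are hypothetical.

```python
def train_stump(rows):
    """Choose the threshold and direction that maximize training accuracy.

    rows: list of (feature_value, label); label is True if a rights issue occurred.
    """
    best = None
    for threshold in sorted({x for x, _ in rows}):
        for above_is_positive in (True, False):
            # Prediction is positive when x >= threshold (or the reverse).
            correct = sum(
                ((x >= threshold) == above_is_positive) == y for x, y in rows
            )
            if best is None or correct > best[0]:
                best = (correct, threshold, above_is_positive)
    _, threshold, above_is_positive = best
    return threshold, above_is_positive


def predict(model, x):
    threshold, above_is_positive = model
    return (x >= threshold) == above_is_positive


def accuracy(model, rows):
    return sum(predict(model, x) == y for x, y in rows) / len(rows)


# Hypothetical records: (debt ratio in %, rights issue within the horizon).
records = [
    (120, True), (40, False), (150, True), (60, False), (130, True),
    (55, False), (140, True), (45, False), (110, True), (70, False),
]

split = int(len(records) * 0.6)          # 60% for building, 40% for testing
train_rows, test_rows = records[:split], records[split:]

model = train_stump(train_rows)
test_accuracy = accuracy(model, test_rows)
```

Holding out the test portion is what makes the reported accuracies an estimate of performance on unseen firms rather than a measure of fit to the training data.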

A Study on Market Size Estimation Method by Product Group Using Word2Vec Algorithm (Word2Vec을 활용한 제품군별 시장규모 추정 방법에 관한 연구)

  • Jung, Ye Lim;Kim, Ji Hui;Yoo, Hyoung Sun
    • Journal of Intelligence and Information Systems / v.26 no.1 / pp.1-21 / 2020
  • With the rapid development of artificial intelligence technology, various techniques have been developed to extract meaningful information from unstructured text data, which constitutes a large portion of big data. Over the past decades, text mining technologies have been utilized in various industries for practical applications. In the field of business intelligence, text mining has been employed to discover new market and/or technology opportunities and to support rational decision making by business participants. Market information such as market size, market growth rate, and market share is essential for setting companies' business strategies. There has been a continuous demand in various fields for market information at the level of specific products. However, such information has generally been provided at the industry level or in broad categories based on classification standards, making it difficult to obtain specific and proper information. In this regard, we propose a new methodology that can estimate the market sizes of product groups at more detailed levels than those previously offered. We applied the Word2Vec algorithm, a neural network-based semantic word embedding model, to enable automatic market size estimation from individual companies' product information in a bottom-up manner. The overall process is as follows: First, the data related to product information is collected, refined, and restructured into a form suitable for applying the Word2Vec model. Next, the preprocessed data is embedded into a vector space by Word2Vec, and the product groups are then derived by extracting similar product names based on cosine similarity. Finally, the sales data for the extracted products is summed to estimate the market size of the product groups. As experimental data, product names from Statistics Korea's microdata (345,103 cases) were mapped into a multidimensional vector space by Word2Vec training. 
We performed parameters optimization for training and then applied vector dimension of 300 and window size of 15 as optimized parameters for further experiments. We employed index words of Korean Standard Industry Classification (KSIC) as a product name dataset to more efficiently cluster product groups. The product names which are similar to KSIC indexes were extracted based on cosine similarity. The market size of extracted products as one product category was calculated from individual companies' sales data. The market sizes of 11,654 specific product lines were automatically estimated by the proposed model. For the performance verification, the results were compared with actual market size of some items. The Pearson's correlation coefficient was 0.513. Our approach has several advantages differing from the previous studies. First, text mining and machine learning techniques were applied for the first time on market size estimation, overcoming the limitations of traditional sampling based- or multiple assumption required-methods. In addition, the level of market category can be easily and efficiently adjusted according to the purpose of information use by changing cosine similarity threshold. Furthermore, it has a high potential of practical applications since it can resolve unmet needs for detailed market size information in public and private sectors. Specifically, it can be utilized in technology evaluation and technology commercialization support program conducted by governmental institutions, as well as business strategies consulting and market analysis report publishing by private firms. The limitation of our study is that the presented model needs to be improved in terms of accuracy and reliability. The semantic-based word embedding module can be advanced by giving a proper order in the preprocessed dataset or by combining another algorithm such as Jaccard similarity with Word2Vec. 
Also, the product group clustering method could be replaced with other types of unsupervised machine learning algorithms. Our group is currently working on follow-up studies, which we expect will further improve the performance of the basic model conceptually proposed in this study.
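
The grouping and summation steps described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the three-dimensional vectors, product names, sales figures, and the function name `estimate_market_size` are all hypothetical, and the toy vectors merely stand in for embeddings that a trained Word2Vec model would supply.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def estimate_market_size(index_vec, product_vecs, sales, threshold):
    """Group products whose vectors are close to an index word, then sum their sales."""
    members = [name for name, vec in product_vecs.items()
               if cosine(index_vec, vec) >= threshold]
    return members, sum(sales[name] for name in members)

# Hypothetical embeddings (assumption: real ones would come from Word2Vec training).
product_vecs = {
    "led lamp": [0.9, 0.1, 0.0],
    "led bulb": [0.8, 0.2, 0.1],
    "steel pipe": [0.0, 0.1, 0.9],
}
# Hypothetical company-level sales per product name.
sales = {"led lamp": 120.0, "led bulb": 80.0, "steel pipe": 300.0}
# Hypothetical vector for a classification index word.
index_vec = [1.0, 0.0, 0.0]

members, size = estimate_market_size(index_vec, product_vecs, sales, threshold=0.8)
narrow_members, narrow_size = estimate_market_size(index_vec, product_vecs, sales, threshold=0.98)
print(members, size)
print(narrow_members, narrow_size)
```

Raising the threshold narrows the product group, which mirrors the abstract's point that the granularity of the market category can be adjusted through the cosine similarity threshold.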

A Study on the Revitalization of Tourism Industry through Big Data Analysis (한국관광 실태조사 빅 데이터 분석을 통한 관광산업 활성화 방안 연구)

  • Lee, Jungmi;Liu, Meina;Lim, Gyoo Gun
    • Journal of Intelligence and Information Systems
    • /
    • v.24 no.2
    • /
    • pp.149-169
    • /
    • 2018
  • Korea is currently accumulating a large amount of data in public institutions under its open public data policy and "Government 3.0". A great deal of data is being accumulated in the tourism field in particular. However, academic discussion that makes use of tourism data is still limited. Moreover, data on restaurants, hotels, and online tourism information remain only partly open, and ways of using SNS big data in tourism are still limited. As a result, tourism big data analysis is still underused. In this paper, we analyzed the factors influencing foreign tourists' satisfaction in Korea from numerical data, using data mining techniques and R programming. We sought ways to revitalize the tourism industry by analyzing about 36,000 records from the "Survey on the actual situation of foreign tourists from 2013 to 2015" conducted by the Korea Culture & Tourism Research Institute. To do this, we analyzed the factors that strongly influence the 'Satisfaction', 'Revisit intention', and 'Recommendation' variables of foreign tourists. Furthermore, we analyzed the practical influence of these variables. As the procedure of this study, we first integrated the survey data on foreign tourists collected by the Korea Culture & Tourism Research Institute, stored in the tourist information system, for 2013 to 2015, and eliminated variables that were inconsistent with the research purpose. Some variables were modified to improve the accuracy of the analysis. We then analyzed the factors affecting the dependent variables using data mining methods in IBM SPSS Modeler 16.0: decision trees (C5.0, CART, CHAID, QUEST), an artificial neural network, and logistic regression. For each dependent variable, the seven variables with the greatest effect were derived. 
The data analysis showed that the seven major variables influencing 'overall satisfaction' were sightseeing spot attraction, food satisfaction, accommodation satisfaction, traffic satisfaction, guide service satisfaction, number of visiting places, and country. Of these, food satisfaction and sightseeing spot attraction had the greatest influence. The seven variables with the greatest influence on 'revisit intention' were country, travel motivation, activity, food satisfaction, best activity, guide service satisfaction, and sightseeing spot attraction. The most influential were food satisfaction and travel motivation for Korean style. Lastly, the seven variables with the greatest influence on 'recommendation intention' were country, sightseeing spot attraction, number of visiting places, food satisfaction, activity, tour guide service satisfaction, and cost. Of these, country, sightseeing spot attraction, and food satisfaction had the greatest influence. In addition, to examine the influence of each independent variable in more depth, we used R programming. As a result, food satisfaction and sightseeing spot attraction had a greater effect on overall satisfaction than the other influential variables. For revisit intention, travel motivation with the Korean Wave as its purpose had a higher $\beta$ value than the other variables. Policies will be needed that lead to substantial revisits by enhancing tourist attractions related to the Korean Wave. Lastly, recommendation showed the same result as satisfaction: sightseeing spot attraction and food satisfaction had higher $\beta$ values than the other variables. 
From this analysis, we found that 'food satisfaction' and 'sightseeing spot attraction' were the common factors influencing all three dependent variables ('Overall satisfaction', 'Revisit intention', and 'Recommendation'), and that these factors significantly affected satisfaction with travel in Korea. The purpose of this study is to examine how to revitalize inbound tourism in Korea through big data analysis. The results are expected to serve as basic data for analyzing tourism data, establishing effective tourism policy, and drawing up a revitalization plan that contributes to tourism development in Korea.
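
The comparison of $\beta$ values mentioned in this abstract can be illustrated with a small sketch. Several things here are assumptions: the abstract does not say which regression model was fitted in R, so the sketch assumes ordinary linear regression with two predictors and uses the closed-form standardized coefficients; Python is swapped in for R; and the rating data and function names are hypothetical, not taken from the survey.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length samples."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def standardized_betas(x1, x2, y):
    """Standardized coefficients of y regressed on x1 and x2 (two-predictor closed form)."""
    r1, r2, r12 = pearson(x1, y), pearson(x2, y), pearson(x1, x2)
    denom = 1 - r12 ** 2
    return (r1 - r2 * r12) / denom, (r2 - r1 * r12) / denom

# Hypothetical ratings for four respondents (not survey data).
food = [2, 2, 4, 4]
sights = [2, 4, 2, 4]
overall = [1.5, 2.5, 3.5, 4.5]

beta_food, beta_sights = standardized_betas(food, sights, overall)
print(round(beta_food, 4), round(beta_sights, 4))
```

Because the coefficients are standardized, they can be compared directly: in this toy data the larger value for `food` indicates the stronger association with the overall rating.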