• Title/Summary/Keyword: analysis of algorithms


The Analysis on the Relationship between Firms' Exposures to SNS and Stock Prices in Korea (기업의 SNS 노출과 주식 수익률간의 관계 분석)

  • Kim, Taehwan;Jung, Woo-Jin;Lee, Sang-Yong Tom
    • Asia pacific journal of information systems / v.24 no.2 / pp.233-253 / 2014
  • Can the stock market really be predicted? Stock market prediction has attracted much attention from many fields, including business, economics, statistics, and mathematics. Early research on stock market prediction was based on random walk theory (RWT) and the efficient market hypothesis (EMH). According to the EMH, stock markets are largely driven by new information rather than by present and past prices. Since new information is unpredictable, the stock market follows a random walk. Despite these theories, Schumaker [2010] asserted that people keep trying to predict the stock market using artificial intelligence, statistical estimates, and mathematical models. Mathematical approaches include Percolation Methods, Log-Periodic Oscillations, and Wavelet Transforms to model future prices. Examples of artificial intelligence approaches that deal with optimization and machine learning are Genetic Algorithms, Support Vector Machines (SVM), and Neural Networks. Statistical approaches typically predict the future from past stock market data. Recently, financial engineers have started to predict stock price movement patterns using SNS data. SNS is a place where people's opinions and ideas flow freely and affect others' beliefs. Through word of mouth in SNS, people share product usage experiences, subjective feelings, and the accompanying sentiment or mood with others. An increasing number of empirical analyses of sentiment and mood are based on textual collections of public user-generated data on the web. Opinion mining is a domain of data mining that extracts the public opinions expressed in SNS. There have been many studies on opinion mining from Web sources such as product reviews, forum posts, and blogs. In relation to this literature, we try to understand the effects of firms' SNS exposures on stock prices in Korea. Similarly to Bollen et al. 
[2011], we empirically analyze the impact of SNS exposures on stock return rates. We use Social Metrics by Daum Soft, an SNS big data analysis company in Korea. Social Metrics provides trends and public opinions on Twitter and blogs using natural language processing and analysis tools. It collects the sentences circulated on Twitter in real time, breaks these sentences down into word units, and then extracts keywords. In this study, we classify firms' exposures in SNS into two groups: positive and negative. To test the correlation and causation between SNS exposures and stock price returns, we first collect the stock prices of 252 firms and the KRX100 index on the Korea Stock Exchange (KRX) from May 25, 2012 to September 1, 2012. We also gather the public attitudes (positive, negative) toward these firms from Social Metrics over the same period. We conduct regression analysis between stock prices and the number of SNS exposures. Having checked the correlation between the two variables, we perform a Granger causality test to determine the direction of causation between them. The results show that the total number of SNS exposures is positively related to stock market returns. The number of positive mentions also has a positive relationship with stock market returns. In contrast, the number of negative mentions has a negative relationship with stock market returns, but this relationship is not statistically significant. This means that the impact of positive mentions is statistically bigger than the impact of negative mentions. We also investigate whether the impacts are moderated by industry type and firm size. We find that the impact of SNS exposures is bigger for IT firms than for non-IT firms, and bigger for small firms than for large firms. The results of the Granger causality test show that changes in stock price returns are caused by SNS exposures, while causation in the other direction is not significant. 
Therefore, the relationship between SNS exposures and stock prices shows unidirectional causality. The more a firm is exposed in SNS, the more likely its stock price is to increase, while stock price changes may not cause more SNS mentions.
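
The Granger causality test mentioned in the abstract above can be illustrated with a minimal sketch. It compares a regression of a series on its own lags against one that also includes lags of the other series, and reports the F statistic. The function names, the single lag, and the synthetic `mentions` and `returns` series are assumptions made for illustration; they are not the authors' data or code.

```python
import numpy as np

def lagged_design(target, other, lags, include_other):
    """Regression matrix: intercept, lags of target, optionally lags of other."""
    n = len(target)
    cols = [np.ones(n - lags)]
    for k in range(1, lags + 1):
        cols.append(target[lags - k:n - k])
    if include_other:
        for k in range(1, lags + 1):
            cols.append(other[lags - k:n - k])
    return np.column_stack(cols)

def granger_f_stat(cause, effect, lags=1):
    """F statistic for H0: lags of `cause` do not help predict `effect`."""
    cause = np.asarray(cause, dtype=float)
    effect = np.asarray(effect, dtype=float)
    y = effect[lags:]
    rss = []
    for include_other in (False, True):  # restricted model, then full model
        X = lagged_design(effect, cause, lags, include_other)
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        resid = y - X @ beta
        rss.append(float(resid @ resid))
    rss_restricted, rss_full = rss
    dof = len(y) - (2 * lags + 1)  # observations minus full-model parameters
    return ((rss_restricted - rss_full) / lags) / (rss_full / dof)

# Synthetic data (hypothetical): yesterday's mentions drive today's returns.
rng = np.random.default_rng(0)
mentions = rng.normal(size=300)
returns = np.zeros(300)
returns[1:] = 0.8 * mentions[:-1] + 0.1 * rng.normal(size=299)

f_forward = granger_f_stat(mentions, returns)  # mentions -> returns
f_reverse = granger_f_stat(returns, mentions)  # returns -> mentions
```

A large `f_forward` alongside a small `f_reverse` corresponds to the one-directional pattern the abstract reports. A real analysis would compare each statistic with an F critical value and would normally use a tested statistics library.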

Latent topics-based product reputation mining (잠재 토픽 기반의 제품 평판 마이닝)

  • Park, Sang-Min;On, Byung-Won
    • Journal of Intelligence and Information Systems / v.23 no.2 / pp.39-70 / 2017
  • Data-driven analytics techniques have recently been applied to public surveys. Instead of simply gathering survey results or expert opinions to research the preference for a recently launched product, enterprises need a way to collect and analyze various types of online data and then accurately figure out customer preferences. In existing data-based survey methods, the sentiment lexicon for a particular domain is first constructed by domain experts, who judge the positive, neutral, or negative meanings of frequently used words in the collected text documents. To research the preference for a particular product, the existing approach (1) collects review posts related to the product from several product review web sites; (2) extracts sentences (or phrases) from the collection after pre-processing steps such as stemming and stop-word removal; (3) classifies the polarity (positive or negative) of each sentence (or phrase) based on the sentiment lexicon; and (4) estimates the positive and negative ratios of the product by dividing the numbers of positive and negative sentences (or phrases) by the total number of sentences (or phrases) in the collection. Furthermore, the existing approach automatically finds important sentences (or phrases) that express positive or negative meaning toward the product. As a motivating example, given a product like the Sonata made by Hyundai Motors, customers often want to see a summary of the positive points in the 'car design' aspect as well as the negative points in the same aspect. They also want more useful information regarding other aspects such as 'car quality', 'car performance', and 'car service.' Such information will enable customers to make good choices when they purchase brand-new vehicles. 
In addition, automobile makers will be able to figure out the preference and positive/negative points for new models on the market. In the near future, the weak points of the models can be improved with the help of sentiment analysis. For this, the existing approach computes the sentiment score of each sentence (or phrase) and then selects the top-k sentences (or phrases) with the highest positive and negative scores. However, the existing approach has several shortcomings that limit its use in real applications. Its main disadvantages are as follows: (1) The main aspects (e.g., car design, quality, performance, and service) of a product (e.g., Hyundai Sonata) are not considered. Because sentiment analysis is performed without considering aspects, customers and car makers receive only a summary containing the positive and negative ratios of the product and the top-k sentences (or phrases) with the highest sentiment scores in the entire corpus. This is not enough; the main aspects of the target product need to be considered in the sentiment analysis. (2) In general, since the same word has different meanings across domains, a sentiment lexicon suited to each domain needs to be constructed. An efficient way to construct the sentiment lexicon per domain is required because lexicon construction is labor intensive and time consuming. To address these problems, we propose a novel product reputation mining algorithm that (1) extracts topics hidden in review documents written by customers; (2) mines main aspects based on the extracted topics; (3) measures the positive and negative ratios of the product using the aspects; and (4) presents a digest in which a few important sentences with positive and negative meanings are listed for each aspect. Unlike the existing approach, using hidden topics lets experts construct the sentiment lexicon easily and quickly. 
Furthermore, by reinforcing topic semantics, we can improve the accuracy of the product reputation mining algorithm beyond that of the existing approach. In the experiments, we collected a large set of review documents on domestic vehicles such as the K5, SM5, and Avante; measured the positive and negative ratios of the three cars; showed top-k positive and negative summaries per aspect; and conducted statistical analysis. Our experimental results clearly show the effectiveness of the proposed method compared with the existing method.
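
Steps (3) and (4) of the existing lexicon-based approach described in the abstract above (classify each sentence by polarity, then divide the counts by the total number of sentences) can be sketched as follows. The toy lexicon, the example sentences, and the function names are hypothetical and chosen only for illustration; this is not the proposed topic-based algorithm.

```python
def sentence_polarity(sentence, lexicon):
    """Label a sentence by summing the lexicon scores of its words."""
    score = sum(lexicon.get(word.strip(".,!?").lower(), 0)
                for word in sentence.split())
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

def polarity_ratios(sentences, lexicon):
    """Share of positive and negative sentences among all sentences."""
    labels = [sentence_polarity(s, lexicon) for s in sentences]
    total = len(labels)
    if total == 0:
        return {"positive": 0.0, "negative": 0.0}
    return {
        "positive": labels.count("positive") / total,
        "negative": labels.count("negative") / total,
    }

# Hypothetical lexicon and reviews, for illustration only.
toy_lexicon = {"sleek": 1, "comfortable": 1, "noisy": -1, "cramped": -1}
toy_reviews = [
    "The design is sleek.",
    "Seats are comfortable!",
    "The cabin is noisy.",
    "It has four doors.",
]
ratios = polarity_ratios(toy_reviews, toy_lexicon)
```

Here two of the four sentences are positive and one is negative, so the ratios are 0.5 and 0.25. Applying the same function separately to the sentences assigned to each aspect would give per-aspect ratios.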

Preliminary Inspection Prediction Model to select the on-Site Inspected Foreign Food Facility using Multiple Correspondence Analysis (차원축소를 활용한 해외제조업체 대상 사전점검 예측 모형에 관한 연구)

  • Hae Jin Park;Jae Suk Choi;Sang Goo Cho
    • Journal of Intelligence and Information Systems / v.29 no.1 / pp.121-142 / 2023
  • As the number and weight of imported foods steadily increase, safety management of imported food to prevent food safety accidents is becoming more important. The Ministry of Food and Drug Safety conducts on-site inspections of foreign food facilities before customs clearance as well as import inspections at the customs clearance stage. However, because time, cost, and resources are limited, a data-based safety management plan for imported food is needed. In this study, we tried to increase the efficiency of on-site inspection by preparing a machine learning prediction model that pre-selects the companies expected to fail before the on-site inspection. We collected basic information on 303,272 foreign food facilities and processing businesses from the Integrated Food Safety Information Network, together with 1,689 on-site inspection records from 2019 to April 2022. After preprocessing the foreign food facility data, only the data subject to on-site inspection were extracted using the foreign food facility_code. The resulting data set consisted of 1,689 records and 103 variables. Of the 103 variables, those scoring '0' on the Theil-U index were removed, and after dimension reduction by Multiple Correspondence Analysis, 49 characteristic variables were finally derived. We build eight different models and perform hyperparameter tuning through 5-fold cross validation. Then, the performance of the generated models is evaluated. The aim in selecting companies for on-site inspection is to maximize recall, which is the probability of judging nonconforming companies as nonconforming. Among the machine learning algorithms applied, the Random Forest model, which had the highest Recall_macro, AUROC, Average PR, F1-score, and Balanced Accuracy, was evaluated as the best model. 
Finally, we apply Kernel SHAP (SHapley Additive exPlanations) to present the reasons for selecting individual nonconforming facilities, and discuss applicability to the on-site inspection facility selection system. The results of this study are expected to contribute to the efficient use of limited resources such as manpower and budget by establishing an imported food management system built on a data-based scientific risk management model.
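
The recall criterion described in the abstract above can be made concrete with a small sketch of macro-averaged recall: recall is computed per class and then averaged, so the rare nonconforming class counts as much as the common conforming class. The labels below are made up for illustration (1 = nonconforming, 0 = conforming), and the function name is an assumption rather than the study's code.

```python
def recall_macro(y_true, y_pred):
    """Average of per-class recall, giving each class equal weight."""
    recalls = []
    for cls in sorted(set(y_true)):
        support = sum(1 for t in y_true if t == cls)
        hits = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
        recalls.append(hits / support)
    return sum(recalls) / len(recalls)

# Hypothetical inspection outcomes: 3 nonconforming, 5 conforming facilities.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 1]
score = recall_macro(y_true, y_pred)  # (2/3 + 4/5) / 2
```

In this toy case the model catches two of three nonconforming facilities (recall 2/3) and four of five conforming ones (recall 4/5), so the macro recall is their plain average.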

Retrieval of Sulfur Dioxide Column Density from TROPOMI Using the Principle Component Analysis Method (주성분분석방법을 이용한 TROPOMI로부터 이산화황 칼럼농도 산출 연구)

  • Yang, Jiwon;Choi, Wonei;Park, Junsung;Kim, Daewon;Kang, Hyeongwoo;Lee, Hanlim
    • Korean Journal of Remote Sensing / v.35 no.6_3 / pp.1173-1185 / 2019
  • We retrieved, for the first time, sulfur dioxide (SO2) vertical column density (VCD) in industrial and volcanic areas from the TROPOspheric Monitoring Instrument (TROPOMI) using the Principal Component Analysis (PCA) algorithm. Furthermore, SO2 VCDs retrieved by the PCA algorithm from TROPOMI raw data were compared with those retrieved by the Differential Optical Absorption Spectroscopy (DOAS) algorithm (TROPOMI Level 2 SO2 product). In East Asia, where large amounts of SO2 are released at the surface by anthropogenic sources such as fossil fuels, the mean SO2 VCD retrieved by the PCA (DOAS) algorithm was 0.05 DU (-0.02 DU). The correlation between SO2 VCDs retrieved by the PCA algorithm and those retrieved by the DOAS algorithm was low (slope = 0.64; correlation coefficient (R) = 0.51) under cloudy conditions. However, with a cloud fraction of less than 0.5, the slope and correlation coefficient between the two outputs increased to 0.68 and 0.61, respectively. This means that the sensitivity of SO2 retrieval to the surface is reduced in both algorithms when the cloud fraction is high. Furthermore, the correlation between volcanic SO2 VCDs retrieved by the PCA algorithm and those retrieved by the DOAS algorithm was high (R = 0.90) under cloudy conditions. This good agreement between the two data sets for volcanic SO2 is thought to be due to the higher accuracy of satellite-based SO2 VCD retrieval for SO2 that is mainly distributed in the upper troposphere or lower stratosphere in volcanic regions.

Application of an empirical method to improve radar rainfall estimation using cross governmental dual-pol. radars (범부처 이중편파레이더의 강우 추정 향상을 위한 경험적 방법의 적용)

  • Yoon, Jungsoo;Suk, Mi-Kyung;Nam, Kyung-Yeub;Park, Jong-Sook
    • Journal of Korea Water Resources Association / v.49 no.7 / pp.625-634 / 2016
  • Three leading agencies under different ministries - the Korea Meteorological Administration (KMA) in the Ministry of Environment, the Han River Control Office in the Ministry of Land, Infrastructure and Transport (MOLIT), and the Weather Group of the ROK Air Force in the Ministry of National Defense (MND) - have been operating radars for the purposes of observing weather, hydrology, and military operational weather in Korea. Eight S-band dual-pol. radars were newly installed or replaced by these ministries at different sites by 2015. However, each ministry has different aims in operating radars, observation strategies, data processing algorithms, etc. Because of these differences, accuracy varies widely across the observed radar data as well as the composite images made from the cross-governmental radar measurements. Achieving a fairly high level of accuracy in radar data obtained by the different agencies has been a shared concern of the ministries. Thus, "an agreement of harmonizing weather and hydrological radar products" was made by the three ministries in 2010. This is particularly important for producing better rainfall estimates from the cross-governmental radar measurements. The Weather Radar Center (WRC) in KMA has developed an empirical method using measurements observed by the Yongin testbed radar. This study examines the efficiency of the empirical method in improving the accuracy of radar rainfall estimated from cross-governmental dual-pol. radar measurements. As a result, the radar rainfall of three radars (Baengnyeongdo, Biseulsan, and Sobaeksan) showed accuracy (1-NE) improving to as much as 70% using data from May to October 2015. Also, the range of accuracies in radar rainfall estimation, which was from 30% to 60% before adjusting polarimetric variables, narrowed to 65% to 70% after adjusting polarimetric variables.

Evaluation of Artificial Intelligence Accuracy by Increasing the CNN Hidden Layers: Using Cerebral Hemorrhage CT Data (CNN 은닉층 증가에 따른 인공지능 정확도 평가: 뇌출혈 CT 데이터)

  • Kim, Han-Jun;Kang, Min-Ji;Kim, Eun-Ji;Na, Yong-Hyeon;Park, Jae-Hee;Baek, Su-Eun;Sim, Su-Man;Hong, Joo-Wan
    • Journal of the Korean Society of Radiology / v.16 no.1 / pp.1-6 / 2022
  • Deep learning is a collection of algorithms that enable learning by summarizing the key contents of large amounts of data; it is being developed to diagnose lesions in the medical imaging field. To evaluate the accuracy of the cerebral hemorrhage diagnosis, we used a convolutional neural network (CNN) to derive the diagnostic accuracy of cerebral parenchyma computed tomography (CT) images and the cerebral parenchyma CT images of areas where cerebral hemorrhages are suspected of having occurred. We compared the accuracy of CNN with different numbers of hidden layers and discovered that CNN with more hidden layers resulted in higher accuracy. The analysis results of the derived CT images used in this study to determine the presence of cerebral hemorrhages are expected to be used as foundation data in studies related to the application of artificial intelligence in the medical imaging industry.

Evaluation and Comparison of Contrast to Noise Ratio and Signal to Noise Ratio According to Change of Reconstruction on Breast PET/CT (Breast PET CT 영상 재구성 변화에 따른 대조도 대 잡음비와 신호 대 잡음비의 비교평가)

  • Lee, Jea-Young;Lee, Eul-Kyu;Kim, Ki-Won;Jeong, Hoi-Woun;Lyu, Kwang-Yeul;Park, Hoon-Hee;Son, Jin-Hyun;Min, Jung-Whan
    • Journal of radiological science and technology / v.40 no.1 / pp.79-85 / 2017
  • The purpose of this study was to measure the contrast to noise ratio (CNR) and signal to noise ratio (SNR) of regions of interest (ROI) under different reconstruction methods in breast positron emission tomography-computed tomography (PET-CT), and to analyze the CNR and SNR statistically. We examined breast PET-CT images of 100 patients in a university-affiliated hospital in Seoul, Korea. CNR and SNR were calculated for each patient's breast PET-CT image using ImageJ. Differences in CNR and SNR among four reconstruction algorithms were tested by ANOVA in SPSS Statistics 21 and were statistically significant (p<0.05). We analyzed socio-demographic variables, CNR and SNR by reconstruction image, 95% confidence intervals of CNR and SNR by reconstruction, and differences in mean CNR and SNR. For SNR, the quality of the distributions was in the order of PSF_TOF, Iterative and Iterative-TOF, FBP-TOF. For CNR, the quality of the distributions followed the same order: PSF_TOF, Iterative and Iterative-TOF, FBP-TOF. The CNR and SNR of breast PET-CT reconstruction methods would be useful for evaluating breast diseases.
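
The abstract above does not state how CNR and SNR were computed, so the following is only a sketch under commonly used definitions, assumed here: SNR as the mean ROI value divided by the standard deviation of a background region, and CNR as the difference between the ROI and background means divided by the same standard deviation. The pixel values and function names are hypothetical.

```python
from statistics import mean, pstdev

def snr(roi, background):
    """Assumed definition: mean ROI signal over background noise (std dev)."""
    return mean(roi) / pstdev(background)

def cnr(roi, background):
    """Assumed definition: ROI-to-background contrast over background noise."""
    return (mean(roi) - mean(background)) / pstdev(background)

# Hypothetical pixel values sampled from a lesion ROI and a background ROI.
roi_pixels = [10, 12, 14]
background_pixels = [1, 3]

snr_value = snr(roi_pixels, background_pixels)  # 12 / 1
cnr_value = cnr(roi_pixels, background_pixels)  # (12 - 2) / 1
```

Computing both values for each reconstruction method on the same regions gives the per-method figures that a test such as ANOVA can then compare.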

Computational estimation of the earthquake response for fibre reinforced concrete rectangular columns

  • Liu, Chanjuan;Wu, Xinling;Wakil, Karzan;Jermsittiparsert, Kittisak;Ho, Lanh Si;Alabduljabbar, Hisham;Alaskar, Abdulaziz;Alrshoudi, Fahed;Alyousef, Rayed;Mohamed, Abdeliazim Mustafa
    • Steel and Composite Structures / v.34 no.5 / pp.743-767 / 2020
  • Due to its impressive flexural performance, enhanced compressive strength, and more constrained crack propagation, fibre-reinforced concrete (FRC) has been widely employed in construction applications. The majority of experimental studies have focused on the seismic behavior of FRC columns. Based on valid experimental data obtained from previous studies, the current study evaluates the seismic response and compressive strength of FRC rectangular columns using hybrid metaheuristic techniques. Due to the non-linearity of seismic data, an adaptive neuro-fuzzy inference system (ANFIS) has been combined with metaheuristic algorithms. 317 datasets from FRC column tests have been used as one database in order to determine the most influential factor on the ultimate strengths of FRC rectangular columns subjected to simulated seismic loading. ANFIS has been used in combination with Particle Swarm Optimization (PSO) and Genetic Algorithm (GA). For the analysis of the attained results, the extreme learning machine (ELM), an established prediction method, has been used concurrently. The variable selection procedure chooses the most dominant parameters affecting the ultimate strengths of FRC rectangular columns subjected to simulated seismic loading. The results show that ANFIS-PSO successfully predicted the seismic lateral load with R2 = 0.857 and 0.902 for the test and train phases, respectively, and it is nominated as the lateral load prediction estimator. On the other hand, for compressive strength prediction, ELM predicts the compressive strength with R2 = 0.657 and 0.862 for the test and train phases, respectively. The results show that the seismic lateral force trend is more predictable than the compressive strength of FRC rectangular columns, with the best results belonging to the lateral force prediction. 
Compressive strength prediction showed a significant deviation above 40 MPa, which could be related to considerable non-linearity and possible empirical shortcomings. Finally, employing ANFIS-GA and ANFIS-PSO techniques to evaluate the seismic response of FRC is a promising and reliable approach that can replace costly and time-consuming experimental tests.
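
The R2 values quoted in the abstract above compare predicted with measured responses. A minimal sketch of the coefficient of determination follows, assuming the usual definition of one minus the ratio of residual to total sum of squares; the paper may instead report a squared correlation coefficient. The load values and names are hypothetical.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1 - ss_res / ss_tot

# Hypothetical measured vs. predicted lateral loads (arbitrary units).
measured = [1.0, 2.0, 3.0]
estimated = [1.5, 2.0, 2.5]
fit_quality = r_squared(measured, estimated)  # 1 - 0.5 / 2.0
```

A value of 1 means the predictions match the measurements exactly, and a value of 0 means they do no better than always predicting the mean, which is why a test-phase R2 of 0.657 signals a noticeably weaker fit than 0.857.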

Association Between Gestational Diabetes Mellitus and Subsequent Risk of Cancer: a Systematic Review of Epidemiological Studies

  • Tong, Gui-Xian;Cheng, Jing;Chai, Jing;Geng, Qing-Qing;Chen, Peng-Lai;Shen, Xin-Rong;Liang, Han;Wang, De-Bin
    • Asian Pacific Journal of Cancer Prevention / v.15 no.10 / pp.4265-4269 / 2014
  • Purpose: This study aimed to summarize epidemiological evidence on the association between gestational diabetes mellitus (GDM) and subsequent risk of cancer. Materials and Methods: We searched Medline, Embase, Cancer Lit and CINAHL, using highly inclusive algorithms, for epidemiological studies published by February 1, 2014 that examined the risk of cancer in patients with a history of GDM. Information about first author, year of publication, country of study, study design, cancer sites, sample sizes, attained age of subjects, and methods used for determining GDM status was extracted by two researchers, and Stata version 11.0 was used to perform the meta-analysis and estimate the pooled effects. Results: A total of 9 articles documenting 5 cohort and 4 case-control studies, containing 10,630 cancer cases and 14,608 women with a history of GDM, were included in this review. Taken together, the pooled odds ratio (OR) between GDM and breast cancer risk was 1.01 (0.87-1.17); yet the pooled ORs of case-control and cohort studies were 0.87 (0.71-1.06) and 1.25 (1.00-1.56), respectively. There are indications that GDM is strongly associated with higher risk of pancreatic cancer (HR=8.68) and hematologic malignancies (HR=4.53), but no relationships were detected between GDM and other types of cancer. Conclusions: Although GDM increases the risk of certain types of cancer, these results should be interpreted with caution because of some methodological flaws. The issue merits further investigation and coordinated efforts between researchers, antenatal clinics, and cancer treatment and registration agencies to attain a better understanding.
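
The abstract above reports pooled odds ratios without naming the pooling model. As one possible illustration, the sketch below uses fixed-effect inverse-variance pooling on the log scale, recovering each study's standard error from its 95% confidence interval. The choice of model, the function name, and the two input studies are assumptions; the inputs are invented and are not taken from the review.

```python
import math

Z95 = 1.96  # normal quantile for a 95% confidence interval

def pool_odds_ratios(studies):
    """Fixed-effect inverse-variance pooling.

    studies: list of (odds_ratio, ci_low, ci_high) with 95% intervals.
    Returns (pooled_or, pooled_ci_low, pooled_ci_high).
    """
    weighted_sum = 0.0
    total_weight = 0.0
    for odds_ratio, ci_low, ci_high in studies:
        # Standard error of log(OR), recovered from the CI width.
        se = (math.log(ci_high) - math.log(ci_low)) / (2 * Z95)
        weight = 1.0 / se ** 2
        weighted_sum += weight * math.log(odds_ratio)
        total_weight += weight
    pooled_log = weighted_sum / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return (
        math.exp(pooled_log),
        math.exp(pooled_log - Z95 * pooled_se),
        math.exp(pooled_log + Z95 * pooled_se),
    )

# Two hypothetical studies with identical estimates, for illustration only.
pooled_or, pooled_low, pooled_high = pool_odds_ratios(
    [(2.0, 1.0, 4.0), (2.0, 1.0, 4.0)]
)
```

With two identical studies the pooled estimate stays at 2.0 while the interval narrows, which shows how pooling adds precision. When studies disagree, as the case-control and cohort results do here, a random-effects model is often preferred.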

Prediction of Prognosis in Glioblastoma Using Radiomics Features of Dynamic Contrast-Enhanced MRI

  • Elena Pak;Kyu Sung Choi;Seung Hong Choi;Chul-Kee Park;Tae Min Kim;Sung-Hye Park;Joo Ho Lee;Soon-Tae Lee;Inpyeong Hwang;Roh-Eul Yoo;Koung Mi Kang;Tae Jin Yun;Ji-Hoon Kim;Chul-Ho Sohn
    • Korean Journal of Radiology / v.22 no.9 / pp.1514-1524 / 2021
  • Objective: To develop a radiomics risk score based on dynamic contrast-enhanced (DCE) MRI for prognosis prediction in patients with glioblastoma. Materials and Methods: One hundred and fifty patients (92 male [61.3%]; mean age ± standard deviation, 60.5 ± 13.5 years) with glioblastoma who underwent preoperative MRI were enrolled in the study. Six hundred and forty-two radiomic features were extracted from volume transfer constant (Ktrans), fractional volume of vascular plasma space (Vp), and fractional volume of extravascular extracellular space (Ve) maps of DCE MRI, wherein the regions of interest were based on both T1-weighted contrast-enhancing areas and non-enhancing T2 hyperintense areas. Using feature selection algorithms, salient radiomic features were selected from the 642 features. Next, a radiomics risk score was developed using a weighted combination of the selected features in the discovery set (n = 105); the risk score was validated in the validation set (n = 45) by investigating the difference in prognosis between the "radiomics risk score" groups. Finally, multivariable Cox regression analysis for progression-free survival was performed using the radiomics risk score and clinical variables as covariates. Results: 16 radiomic features obtained from non-enhancing T2 hyperintense areas were selected among the 642 features identified. The radiomics risk score was used to stratify high- and low-risk groups in both the discovery and validation sets (both p < 0.001 by the log-rank test). The radiomics risk score and presence of isocitrate dehydrogenase (IDH) mutation showed independent associations with progression-free survival in opposite directions (hazard ratio, 3.56; p = 0.004 and hazard ratio, 0.34; p = 0.022, respectively). Conclusion: We developed and validated the "radiomics risk score" from the features of DCE MRI based on non-enhancing T2 hyperintense areas for risk stratification of patients with glioblastoma. 
It was associated with progression-free survival independently of IDH mutation status.
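
The risk score described in the abstract above is a weighted combination of selected features that is then used to split patients into high- and low-risk groups. A minimal sketch of that idea follows. The feature names, the weights, and the use of the discovery-set median as the cut-off are all assumptions made for illustration and do not come from the study.

```python
from statistics import median

def risk_score(features, weights):
    """Weighted sum of the selected feature values for one patient."""
    return sum(weights[name] * features[name] for name in weights)

def stratify(scores, threshold):
    """Label each score as high or low risk relative to a threshold."""
    return ["high" if s > threshold else "low" for s in scores]

# Hypothetical weights and feature values.
weights = {"f1": 0.5, "f2": -0.25}
discovery_patients = [
    {"f1": 1.0, "f2": 0.0},
    {"f1": 0.0, "f2": 1.0},
    {"f1": 2.0, "f2": 0.0},
]
scores = [risk_score(p, weights) for p in discovery_patients]
cutoff = median(scores)  # assumed cut-off rule
groups = stratify(scores, cutoff)
```

In a validation step, the same weights and cut-off would be applied unchanged to new patients, and survival in the two resulting groups would be compared, for example with a log-rank test.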