• Title/Summary/Keyword: logistic regression analysis technique (로지스틱회귀분석기법)

A Tracking Method of Same Drug Sales Accounts through Similarity Analysis of Instagram Profiles and Posts

  • Eun-Young Park;Jiyeon Kim;Chang-Hoon Kim
    • Journal of the Korea Society of Computer and Information
    • /
    • v.29 no.2
    • /
    • pp.109-118
    • /
    • 2024
  • With the increasing number of social media users worldwide, cases of social media being abused to perpetrate various crimes are increasing. Specifically, drug distribution through social media is emerging as a serious social problem. Clever marketing on social media channels stimulates teenagers' curiosity about drugs. Further, social media facilitates drug purchases because drug sellers and consumers can easily reach each other. Among various social media platforms, we focused on Instagram, which is the most used social media platform among young adults aged 19 to 24 years in South Korea. We collected four types of information on Instagram: profile photos, introductions, posts in the form of images, and posts in the form of text; then, we analyzed the similarity within each type of collected information. Profile photos and image posts were analyzed for similarity based on the SSIM (Structural Similarity Index Measure), while introductions and text posts were analyzed using Jaccard and Cosine similarity techniques. Through the similarity analysis, the similarity among accounts was measured for each information type, and accounts with similarity above the significance level were determined to be the same drug sales account. By performing logistic regression analysis on these information types, we confirmed that profile photos, introductions, and text posts, but not image posts, were valid information for tracking the same drug sales account.
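The two text similarity measures named in this abstract can be illustrated with a minimal sketch. It assumes simple lowercase whitespace tokenization and raw term-frequency vectors; the abstract does not state the authors' preprocessing, and the function names and sample strings below are hypothetical.

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase whitespace tokenization (an assumption, not the paper's method)."""
    return text.lower().split()


def jaccard_similarity(a: str, b: str) -> float:
    """Size of the token-set intersection divided by the size of the union."""
    set_a, set_b = set(tokenize(a)), set(tokenize(b))
    if not set_a and not set_b:
        return 0.0
    return len(set_a & set_b) / len(set_a | set_b)


def cosine_similarity(a: str, b: str) -> float:
    """Cosine of the angle between the two term-frequency vectors."""
    vec_a, vec_b = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(vec_a[t] * vec_b[t] for t in vec_a.keys() & vec_b.keys())
    norm_a = math.sqrt(sum(c * c for c in vec_a.values()))
    norm_b = math.sqrt(sum(c * c for c in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical account introductions, invented for illustration only.
intro_1 = "fast delivery contact via telegram"
intro_2 = "fast safe delivery contact telegram"
print(jaccard_similarity(intro_1, intro_2))  # 4 shared tokens out of 6 distinct
print(cosine_similarity(intro_1, intro_2))
```

Jaccard ignores how often a token occurs, while cosine similarity weights repeated tokens, which may be one reason to report both.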

A Study on Self-sufficiency for Hospital Injury Inpatients in Korea (우리나라 의료기관 입원손상환자의 자체충족도에 관한 연구)

  • Lee, Hee-Won;Park, Jong-Ho;Kang, Sung-Hong;Kim, Won-Joong
    • Journal of the Korea Academia-Industrial cooperation Society
    • /
    • v.12 no.12
    • /
    • pp.5779-5788
    • /
    • 2011
  • This study was conducted to assess the current status of regional self-sufficiency for hospital injury inpatients and, based on this, to prepare measures for improving self-sufficiency. For this purpose, the 2005 and 2008 Patient Survey data, regional medical utilization data of the National Health Insurance Corporation, the yearbook of the Central Emergency Medical Center, and evaluation results of emergency medical institutions were obtained. Frequency analysis, cross-tabulation, decision tree, and logistic regression techniques were used to analyze the data. Self-sufficiency at the 'metropolitan city/Do' level was lowest for Chungcheongnam-do in both 2005 and 2008, followed by Gyeongsangbuk-do, Gyeonggi-do, and Jeollanam-do. As for self-sufficiency at the 'Si/Gun/Gu' level with regard to local medical supply, in both 2005 and 2008 it was higher when a general hospital, district emergency medical center, regional emergency medical center, or regional emergency medical institution existed in the residential area. It was also found that the higher the quality level of the local emergency medical institution, the higher the self-sufficiency. It was confirmed that, when promoting national policy for injury patients, priority should be placed on 'Do' areas where the level of emergency medical supply is low, and that enhancing the quality level of emergency medical institutions helps improve self-sufficiency.

Classification Model of Chronic Gastritis According to The Feature Extraction Method of Radial Artery Pulse Signal (맥파의 특징점 추출 방법에 따른 만성위염 판별 모형)

  • Choi, Sang-Ho;Shin, Ki-Young;Kim, Jeauk;Jin, Seung-Oh;Lee, Tea-Bum
    • Journal of the Institute of Electronics and Information Engineers
    • /
    • v.51 no.1
    • /
    • pp.185-194
    • /
    • 2014
  • One in every 10 persons suffers from chronic gastritis in Korea. Endoscopy is most commonly used to diagnose chronic gastritis. Endoscopic diagnosis is precise, but it is accompanied by pain and high cost. According to pulse diagnosis in Traditional East Asian Medicine, health problems in the stomach can be diagnosed with radial pulse signals at the 'Guan' location of the right wrist, which is non-invasive and cost-effective. In this study, we developed a classification model of chronic gastritis using pulse signals at the right 'Guan' location. We applied both a linear discrimination method and a logistic regression model to pulse features obtained with a peak-valley detection algorithm and a Gaussian model. As a result, we obtained sensitivity ranging from 77% to 89% and specificity ranging from 72% to 83%, depending on the classification model and feature extraction method, and the average classification rates were approximately 80% irrespective of the model. Specifically, the Gaussian model showed superior sensitivities (89.1% and 87.5%), while the peak-valley detection method showed superior specificities (82.8% and 81.3%), and the average classification rate (sensitivity + specificity) of the Gaussian model was 80.9%, which was 1.2% ahead of the peak-valley method. In conclusion, we obtained a reliable classification model for chronic gastritis based on the radial pulse feature extraction algorithms, in which the Gaussian model offered better sensitivity and the peak-valley method offered better specificity.

Development of Spatial Landslide Information System and Application of Spatial Landslide Information (산사태 공간 정보시스템 개발 및 산사태 공간 정보의 활용)

  • 이사로;김윤종;민경덕
    • Spatial Information Research
    • /
    • v.8 no.1
    • /
    • pp.141-153
    • /
    • 2000
  • The purpose of this study is to develop and apply a spatial landslide information system using a Geographic Information System (GIS) to handle spatial data. Landslide locations detected from aerial photo interpretation and field survey, together with topographic, soil, forest, and geological maps of the study area, Yongin, were collected and built into a spatial database using GIS. As landslide occurrence factors, slope, aspect, and curvature of the topography were calculated from the topographic database. Texture, material, drainage, and effective thickness of soil were extracted from the soil database, and type, age, diameter, and density of wood were extracted from the forest database. Lithology was extracted from the geological database, and land use was classified from the Landsat TM satellite image. In addition, objects vulnerable to landslide damage, such as buildings, roads, railways, and other facilities, were extracted from the topographic database. Landslide susceptibility was analyzed from the landslide occurrence factors using probability, logistic regression, and neural network methods. The spatial landslide information system was developed to retrieve the constructed GIS database and the landslide susceptibility. The system was developed using the ArcView script language (Avenue) and consists of pull-down and icon menus for ease of use. The constructed database can also be retrieved through the World Wide Web (WWW) using Internet GIS technology.

Group Classification on Management Behavior of Diabetic Mellitus (당뇨 환자의 관리행태에 대한 군집 분류)

  • Kang, Sung-Hong;Choi, Soon-Ho
    • Journal of the Korea Academia-Industrial cooperation Society
    • /
    • v.12 no.2
    • /
    • pp.765-774
    • /
    • 2011
  • The purpose of this study is to provide informative statistics that can be used for effective diabetes management programs. We collected and analyzed the data of 666 people with diabetes who had participated in the Korean National Health and Nutrition Examination Survey in 2007 and 2008. Group classification of diabetes mellitus management behavior is based on the K-means clustering method. The Decision Tree method and Multiple Regression Analysis were used to study factors of the management behavior. People with diabetes were broadly classified into three categories: Health Behavior Program Group, Focused Management Program Group, and Complication Test Program Group. First, the Health Behavior Program Group consists of people who, even though drug therapy and complication tests are being performed well, still need to improve their health behavior, such as exercising regularly and avoiding drinking and smoking. Second, the Focused Management Program Group consists of people who show an uncooperative attitude toward treatment and complication tests and are also passive about improving their health behavior. Third, the Complication Test Program Group consists of people who take a positive attitude toward treatment and improving their health behavior but pay no attention to the complication tests that detect acute and chronic disease early. The main factor in group classification was whether the person had hyperlipidemia. This varied widely with an individual's gender, income, age, occupation, and self-rated health. To improve the rate of diabetes management, specialized diabetes management programs should be applied according to each group's characteristics.

An Optimized Combination of π-fuzzy Logic and Support Vector Machine for Stock Market Prediction (주식 시장 예측을 위한 π-퍼지 논리와 SVM의 최적 결합)

  • Dao, Tuanhung;Ahn, Hyunchul
    • Journal of Intelligence and Information Systems
    • /
    • v.20 no.4
    • /
    • pp.43-58
    • /
    • 2014
  • As the use of trading systems has increased rapidly, many researchers have become interested in developing effective stock market prediction models using artificial intelligence techniques. Stock market prediction involves multifaceted interactions between market-controlling factors and unknown random processes. A successful stock prediction model achieves the most accurate result from minimum input data with the least complex model. In this research, we develop a combination model of π-fuzzy logic and support vector machine (SVM) models, using a genetic algorithm to optimize the parameters of the SVM and π-fuzzy functions, as well as feature subset selection to improve the performance of stock market prediction. To evaluate the performance of our proposed model, we compare the performance of our model to other comparative models, including the logistic regression, multiple discriminant analysis, classification and regression tree, artificial neural network, SVM, and fuzzy SVM models, with the same data. The results show that our model outperforms all other comparative models in prediction accuracy as well as return on investment.

Correlation Analysis of between Patient and Equipment Factors and Radiation Dose in Chest Low Dose and Abdominal Non-contrast CT (흉부 저선량 및 복부 비조영 CT 검사에서 환자 및 장비 인자와 선량과의 상관관계 분석)

  • Shim, Jina;Lee, Youngjin
    • Journal of the Korean Society of Radiology
    • /
    • v.15 no.2
    • /
    • pp.117-123
    • /
    • 2021
  • This paper aims to establish a basis for a dose reduction strategy by confirming, from dose records in low-dose chest CT and abdominal non-contrast CT, which factors are correlated with radiation dose. To identify the causes of unnecessary exposure, the correlation between CT dose and seven factors (age, gender, height, weight, BMI, patient status [inpatient or outpatient], and use of dose modulation) was examined. Logistic regression was used as the statistical analysis for verifying correlation. In low-dose chest CT, greater height, higher BMI, and dose modulation being off were associated with a lower risk of exceeding the Diagnostic Reference Levels (DRL) (odds ratio<1, p<0.05). However, being female rather than male and higher weight were associated with a higher risk of exceeding the DRL (odds ratio>1, p<0.05). In abdominal CT, dose modulation being off was associated with a lower risk of exceeding the DRL (odds ratio<1, p<0.05). Therefore, further research is needed on the relationship between the various factors affecting radiation exposure and patient radiation dose in order to reduce the dose.
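As a reading aid for the odds ratios reported in this abstract, the sketch below shows how an odds ratio relates to a logistic regression coefficient: a one-unit increase in a factor multiplies the odds of the outcome (here, exceeding the DRL) by exp(coefficient). The coefficient values and factor names are hypothetical and are not taken from the paper.

```python
import math


def predicted_probability(intercept: float, coefs: list[float], x: list[float]) -> float:
    """Logistic model: p = 1 / (1 + exp(-(intercept + sum(coef_i * x_i))))."""
    z = intercept + sum(b * v for b, v in zip(coefs, x))
    return 1.0 / (1.0 + math.exp(-z))


def odds(p: float) -> float:
    """Odds corresponding to probability p."""
    return p / (1.0 - p)


def odds_ratio(coef: float) -> float:
    """Odds ratio for a one-unit increase in the factor with this coefficient."""
    return math.exp(coef)


# Hypothetical fitted coefficients, invented for illustration only.
hypothetical_coefs = {"height_cm": -0.05, "weight_kg": 0.08}

for factor, coef in hypothetical_coefs.items():
    ratio = odds_ratio(coef)
    direction = "lower" if ratio < 1 else "higher"
    print(f"{factor}: odds ratio {ratio:.3f} -> {direction} risk per unit increase")
```

A negative coefficient gives an odds ratio below 1 (lower risk) and a positive coefficient gives an odds ratio above 1 (higher risk), which is the convention the abstract's results follow.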

A Study on the Prediction Model of Stock Price Index Trend based on GA-MSVM that Simultaneously Optimizes Feature and Instance Selection (입력변수 및 학습사례 선정을 동시에 최적화하는 GA-MSVM 기반 주가지수 추세 예측 모형에 관한 연구)

  • Lee, Jong-sik;Ahn, Hyunchul
    • Journal of Intelligence and Information Systems
    • /
    • v.23 no.4
    • /
    • pp.147-168
    • /
    • 2017
  • Academia has long studied accurate stock market forecasting, and many forecasting models using various techniques now exist. Recently, many attempts have been made to predict the stock index using machine learning methods, including deep learning. Although both fundamental analysis and technical analysis are used in traditional stock investment, technical analysis is more useful for short-term transaction prediction and for applying statistical and mathematical techniques. Most studies using technical indicators have modeled stock price prediction as a binary classification (rising or falling) of future market fluctuations, usually for the next trading day. However, binary classification has many drawbacks for predicting trends, identifying trading signals, or signaling portfolio rebalancing. In this study, we predict the stock index by extending the existing binary scheme to a multi-class system of stock index trends (upward trend, boxed, downward trend). Techniques such as Multinomial Logistic Regression Analysis (MLOGIT), Multiple Discriminant Analysis (MDA), and Artificial Neural Networks (ANN) can be applied to this multi-class problem; we propose an optimization model that uses a Genetic Algorithm as a wrapper to improve the performance of Multi-classification Support Vector Machines (MSVM), which have proved superior in prediction performance. In particular, the proposed model, named GA-MSVM, is designed to maximize performance by optimizing not only the kernel function parameters of MSVM but also the selection of input variables (feature selection) and of training instances (instance selection).
To verify the performance of the proposed model, we applied it to real data. The results show that the proposed method is more effective than the conventional MSVM, which has been known to show the best prediction performance so far, as well as existing artificial intelligence and data mining techniques such as MDA, MLOGIT, and CBR. In particular, 'instance selection' was confirmed to play a very important role in predicting the stock index trend, contributing more to model improvement than other factors. To verify the usefulness of GA-MSVM, we applied it to trend forecasting for Korea's KOSPI200 stock index. Our research is primarily aimed at predicting trend segments in order to capture trading signals or short-term trend transition points. The experimental data set includes technical indicators of the KOSPI200 stock index, such as the price and volatility index (2004 ~ 2017), and macroeconomic data (interest rate, exchange rate, S&P 500, etc.). Using a variety of statistical methods, including one-way ANOVA and stepwise MDA, 15 indicators were selected as candidate independent variables. The dependent variable, trend classification, took one of three states: 1 (upward trend), 0 (boxed), and -1 (downward trend). For each class, 70% of the data was used for training and the remaining 30% for validation. To verify the performance of the proposed model, comparative experiments with MDA, MLOGIT, CBR, ANN, and MSVM were conducted. MSVM adopted the One-Against-One (OAO) approach, which is known as the most accurate among the various MSVM approaches. Although there are some limitations, the final experimental results demonstrate that the proposed model, GA-MSVM, performs at a significantly higher level than all comparative models.
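The per-class 70%/30% split described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' procedure: the abstract does not say whether samples were shuffled or split chronologically, so the random shuffle, the function name `stratified_split`, and the class counts below are all assumptions.

```python
import random
from collections import defaultdict


def stratified_split(labels, train_fraction=0.7, seed=0):
    """Split sample indices so each class contributes train_fraction to training."""
    rng = random.Random(seed)
    indices_by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        indices_by_class[label].append(idx)

    train, holdout = [], []
    for indices in indices_by_class.values():
        rng.shuffle(indices)  # assumption: random assignment within each class
        cut = round(len(indices) * train_fraction)
        train.extend(indices[:cut])
        holdout.extend(indices[cut:])
    return sorted(train), sorted(holdout)


# Hypothetical trend labels: 1 (upward), 0 (boxed), -1 (downward).
labels = [1] * 30 + [0] * 50 + [-1] * 20
train_idx, holdout_idx = stratified_split(labels)
print(len(train_idx), len(holdout_idx))
```

Splitting within each class keeps the three trend states represented in the same proportions in both subsets, which matters when one state (such as "boxed") dominates the data.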

Short-term Mortality Prediction of Recurrence Patients with ST-segment Elevation Myocardial Infarction (ST 분절 급상승 심근경색 환자들의 단기 재발 사망 예측)

  • Lim, Kwang-Hyeon;Ryu, Kwang-Sun;Park, Soo-Ho;Shon, Ho-Sun;Ryu, Keun-Ho
    • Journal of the Korea Society of Computer and Information
    • /
    • v.17 no.10
    • /
    • pp.145-154
    • /
    • 2012
  • Cardiovascular disease has recently increased owing to causes such as westernized diets, smoking, and obesity. In particular, acute myocardial infarction (AMI) accounts for 50% of deaths from cardiovascular disease. Following this trend, research has been carried out to discover AMI risk factors based on national data. However, there is a lack of diagnostic measures suitable for Koreans. The objective of this paper is to develop a classifier for predicting short-term relapse mortality in cardiovascular disease patients, based on prognosis data provided by KAMIR (Korea Acute Myocardial Infarction Registry). Through this study, we concluded that ANN is the most suitable method for predicting the short-term relapse mortality of patients with ST-segment elevation myocardial infarction. In addition, the data set obtained by logistic regression analysis yielded better performance than the existing data set. This work is therefore expected to contribute to prognosis estimation through proper classification of high-risk patients.

Development of a Gangwon Province Forest Fire Prediction Model using Machine Learning and Sampling (머신러닝과 샘플링을 이용한 강원도 지역 산불발생예측모형 개발)

  • Chae, Kyoung-jae;Lee, Yu-Ri;Cho, Yong-ju;Park, Ji-Hyun
    • The Journal of Bigdata
    • /
    • v.3 no.2
    • /
    • pp.71-78
    • /
    • 2018
  • This study applies machine learning techniques to increase the accuracy of a forest fire prediction model. It used 14 years of data, from 2003 to 2016, for Gangwon-do, where forest fires were the most frequent. To reduce weather data errors, Gangwon-do was divided into nine areas, and weather data from each area was used. However, dividing the forecast model into nine zones creates a large imbalance between the number of days with fires and the number of days without fires. Such imbalance can degrade model performance. To address this, several sampling methods were applied. To increase the accuracy of the model, five indices from the Canadian Forest Fire Weather Index (FWI) were used as derived variables. For modeling, logistic regression was used as the statistical method, and random forest and XGBoost were used as the machine learning methods. The selection criteria for each zone's final model took accuracy, sensitivity, and specificity into account. Across the nine zones, the models correctly predicted 80 of the 104 fires that occurred and 7426 of the 9758 non-fires. Overall accuracy was 76.1%.
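The overall accuracy quoted in this abstract can be checked from the counts it reports. The sketch below uses only the four numbers given in the abstract; the variable names are illustrative, and the sensitivity and specificity values are derived here rather than quoted from the paper.

```python
# Counts reported in the abstract.
correctly_predicted_fires = 80        # true positives
actual_fires = 104
correctly_predicted_non_fires = 7426  # true negatives
actual_non_fires = 9758

# Sensitivity: share of actual fires that were predicted.
sensitivity = correctly_predicted_fires / actual_fires
# Specificity: share of actual non-fires that were predicted as non-fires.
specificity = correctly_predicted_non_fires / actual_non_fires
# Accuracy: share of all cases that were predicted correctly.
accuracy = (correctly_predicted_fires + correctly_predicted_non_fires) / (
    actual_fires + actual_non_fires
)

print(f"sensitivity = {sensitivity:.1%}")
print(f"specificity = {specificity:.1%}")
print(f"accuracy    = {accuracy:.1%}")  # matches the 76.1% stated in the abstract
```

Because non-fire days outnumber fire days by roughly 94 to 1 in these counts, accuracy is dominated by specificity, which is why the abstract's model selection also considers sensitivity.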