• Title/Abstract/Keyword: Automatic validation

184 search results

An Ensemble Approach to Detect Fake News Spreaders on Twitter

  • Sarwar, Muhammad Nabeel;UlAmin, Riaz;Jabeen, Sidra
    • International Journal of Computer Science & Network Security
    • /
    • Vol. 22, No. 5
    • /
    • pp.294-302
    • /
    • 2022
  • Detection of fake news is a complex and challenging task. The generation of fake news is very hard to stop; only steps to control its circulation can help minimize its impact, since humans tend to believe misleading false information. Researchers started with social media sites, categorizing content as real or fake news. False information misleads individuals and organizations and can cause major failures and financial losses. Automatic detection of false information circulating on social media is an emerging area of research that has been gaining the attention of both industry and academia since the 2016 US presidential elections. Fake news has severe negative effects on individuals and organizations, extending its hostile effects to society at large, so predicting fake news in a timely manner is important. This research focuses on the detection of fake news spreaders. In total, 6 models were developed, trained, and tested on the PAN 2020 dataset: four N-gram-based models, a user-statistics-based model, and an ensemble model, each trained with different hyperparameter values. Extensive grid search with cross-validation was applied to each machine learning model. For the N-gram-based models, out of numerous machine learning algorithms, this research focused on those yielding better results, as assessed by a close reading of state-of-the-art related work in the field: Random Forest, Logistic Regression, SVM, and XGBoost. All four algorithms were trained with cross-validated grid-search hyperparameters. The advantage of this research over previous work is the user-statistics-based model and the ensemble learning model, which were designed to classify Twitter users as fake news spreaders or not with the highest reliability. The user statistical model used 17 features, on the basis of which it categorized a Twitter user as malicious. A new dataset was then constructed from the predictions of the machine learning models, and three combination techniques, simple mean, logistic regression, and random forest, were applied in the ensemble model. Logistic regression combined in the ensemble model gave the best training and testing results, achieving an accuracy of 72%.
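
The ensemble step described here, base models whose predictions form a new dataset that a logistic-regression combiner learns from, corresponds to stacking. Below is a minimal scikit-learn sketch under stated assumptions: the XGBoost model and the actual N-gram and 17 user-statistics features are omitted, and the PAN 2020 data is replaced with a synthetic placeholder.

```python
# Sketch of the stacking-style ensemble: base models tuned with grid
# search + cross-validation, combined by a logistic-regression
# meta-learner. Features below are synthetic placeholders.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

# Placeholder for PAN 2020 features (N-gram counts / user statistics).
X, y = make_classification(n_samples=600, n_features=17, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Grid-search each base model with cross-validation.
rf = GridSearchCV(RandomForestClassifier(random_state=0),
                  {"n_estimators": [100, 300]}, cv=5).fit(X_tr, y_tr)
svm = GridSearchCV(SVC(probability=True),
                   {"C": [0.1, 1, 10]}, cv=5).fit(X_tr, y_tr)

# Combine base predictions with a logistic-regression meta-learner.
ensemble = StackingClassifier(
    estimators=[("rf", rf.best_estimator_), ("svm", svm.best_estimator_)],
    final_estimator=LogisticRegression(), cv=5)
ensemble.fit(X_tr, y_tr)
print("test accuracy:", ensemble.score(X_te, y_te))
```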

Is Text Mining on Trade Claim Studies Applicable? Focused on Chinese Cases of Arbitration and Litigation Applying the CISG

  • Yu, Cheon;Choi, DongOh;Hwang, Yun-Seop
    • Journal of Korea Trade
    • /
    • Vol. 24, No. 8
    • /
    • pp.171-188
    • /
    • 2020
  • Purpose - This exploratory study aims to apply text mining techniques, which computationally extract words from large-scale text data, to legal documents in order to quantify the contents of trade claims and enable statistical analysis. Design/methodology - The study is designed to verify the validity of text mining techniques as a quantitative methodology for trade claim studies, which have relied mainly on qualitative approaches. The subjects are 81 cases of arbitration and court judgments from China, published on the UNCITRAL website, in which the CISG was applied. Validation is performed by comparing the manually analyzed results with the automatically analyzed results. The manual analysis is a cluster analysis in which the researcher reads and codes the cases; the automatic analysis applies text mining techniques to the result of the cluster analysis. Topic modeling and semantic network analysis are applied for the statistical approach. Findings - The results of the cluster analysis and the text mining are consistent with each other, confirming internal validity. The degree centrality of words that play a key role in a topic is high, as are the betweenness centrality of words useful for grasping the topic and the eigenvector centrality of the important words in the topic. This indicates that text mining techniques can be applied to content analysis of trade claims for statistical purposes. Originality/value - First, the validity of text mining techniques in the study of trade claim cases is confirmed; prior studies on trade claims have relied on traditional approaches. Second, this study is original in that it examines trade claim cases quantitatively, whereas prior cases were studied mainly via qualitative methods. Lastly, this study shows that text mining can lower the barrier to acquiring information from large amounts of digitized text.
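
A minimal sketch of the two statistical techniques named above, topic modeling and semantic-network centralities, using scikit-learn and networkx; the documents below are illustrative placeholders, not the 81 Chinese CISG cases.

```python
# Sketch: LDA topic modeling plus degree / betweenness / eigenvector
# centrality on a word co-occurrence network. Placeholder documents.
from itertools import combinations
import networkx as nx
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

docs = ["seller failed to deliver conforming goods",
        "buyer declared the contract avoided under cisg",
        "arbitration tribunal awarded damages for non delivery"]

# Topic modeling on the document-term matrix.
vec = CountVectorizer()
dtm = vec.fit_transform(docs)
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(dtm)

# Co-occurrence network: words in the same document share an edge.
G = nx.Graph()
for doc in docs:
    for w1, w2 in combinations(set(doc.split()), 2):
        G.add_edge(w1, w2)

print(nx.degree_centrality(G)["cisg"])
print(nx.betweenness_centrality(G)["cisg"])
print(nx.eigenvector_centrality(G)["cisg"])
```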

Three-Dimensional Evaluation of Skeletal Stability following Surgery-First Orthognathic Approach: Validation of a Simple and Effective Method

  • Nabil M. Mansour;Mohamed E. Abdelshaheed;Ahmed H. El-Sabbagh;Ahmed M. Bahaa El-Din;Young Chul Kim;Jong-Woo Choi
    • Archives of Plastic Surgery
    • /
    • Vol. 50, No. 3
    • /
    • pp.254-263
    • /
    • 2023
  • Background The three-dimensional (3D) evaluation of skeletal stability after orthognathic surgery is a time-consuming and complex procedure, and the complexity increases further when evaluating the surgery-first orthognathic approach (SFOA). Herein, we propose and validate a simple, time-saving method of 3D analysis using a single software package, demonstrating high accuracy and repeatability. Methods This retrospective cohort study included 12 patients with skeletal Class III malocclusion who underwent bimaxillary surgery without any presurgical orthodontics. Computed tomography (CT)/cone-beam CT images of each patient were obtained at three time points (preoperation [T0], immediately postoperation [T1], and 1 year after surgery [T2]) and reconstructed into 3D images. After automatic surface-based alignment of the three models based on the anterior cranial base, five easily located anatomical landmarks were defined on each model. A set of angular and linear measurements was automatically calculated and used to quantify the amount of movement (T1-T0) and the amount of relapse (T2-T1). To evaluate reproducibility, two independent observers processed all the cases, and one of them repeated the steps after 2 weeks to assess intraobserver variability. Intraclass correlation coefficients (ICCs) were calculated at a 95% confidence interval. The time required to evaluate each case was recorded. Results Both the intra- and interobserver comparisons showed high ICC values (more than 0.95) with low measurement variations (mean linear variation: 0.18 mm; mean angular variation: 0.25 degrees). The time needed for the evaluation process ranged from 3 to 5 minutes. Conclusion This approach is time-saving, semiautomatic, and easy to learn, and can be used to effectively evaluate stability after SFOA.
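
The linear and angular measurements between 3D landmarks described above reduce to simple vector arithmetic. A generic numpy sketch follows; the landmark names and coordinates are hypothetical illustrations, not the five landmarks used in the study.

```python
# Sketch: linear distance and angle between 3D landmarks, the kind of
# measurement computed between time points. Placeholder coordinates.
import numpy as np

# Hypothetical landmark coordinates (mm) at two time points.
a_point_t1 = np.array([0.0, 62.1, -48.3])
a_point_t2 = np.array([0.4, 61.2, -48.9])
nasion = np.array([0.0, 70.0, -30.0])
sella = np.array([0.0, 50.0, -60.0])

# Linear movement: Euclidean distance between the same landmark at T1/T2.
movement = np.linalg.norm(a_point_t2 - a_point_t1)

# Angular measurement: angle at nasion between sella and the A-point.
u = sella - nasion
v = a_point_t1 - nasion
cos_angle = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
angle = np.degrees(np.arccos(cos_angle))

print(f"movement: {movement:.2f} mm, angle: {angle:.1f} deg")
```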

Automated Verification of Livestock Manure Transfer Management System Handover Document using Gradient Boosting

  • 황종휘;김화경;류재학;김태호;신용태
    • 한국IT서비스학회지
    • /
    • Vol. 22, No. 4
    • /
    • pp.97-110
    • /
    • 2023
  • In this study, we propose a technique to automatically generate transfer documents using sensor data from livestock manure transfer systems. We analyze the sensor data and apply machine learning, specifically the Gradient Boosting algorithm, to derive optimized outcomes for livestock manure transfer documents, and we present a method for automatic document generation by comparing the results against existing documents. The objective is to enhance the efficiency of livestock manure and liquid byproduct management. Currently, stakeholders including producers, transporters, and processors manually input data into the livestock manure transfer management system during the disposal of manure and liquid byproducts. This manual process consumes additional labor, leads to data inconsistency, and complicates the management of distribution and treatment. By utilizing sensor data from livestock manure and liquid byproduct transport vehicles and employing machine learning algorithms, we establish a system that automates the validation of transfer documents, reducing the burden on producers, transporters, and processors. This efficient management system is anticipated to create a transparent environment for the distribution and treatment of livestock manure and liquid byproducts.
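
A minimal sketch of the validation step under stated assumptions: a gradient-boosting classifier trained on vehicle sensor features to flag handover documents whose declared quantities look inconsistent. The feature names and data are placeholders, not the system's actual schema.

```python
# Sketch: gradient boosting to validate handover documents against
# sensor data. Features and labels are synthetic placeholders.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 1000
# Hypothetical features: declared load (t), sensor-measured load (t),
# trip duration (min), number of stops.
declared = rng.uniform(1, 10, n)
measured = declared + rng.normal(0, 0.2, n)
duration = rng.uniform(10, 120, n)
stops = rng.integers(0, 5, n)
X = np.column_stack([declared, measured, duration, stops])
# Label: document is invalid when declared and measured loads diverge.
y = (np.abs(declared - measured) > 0.3).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
clf = GradientBoostingClassifier(random_state=0).fit(X_tr, y_tr)
print("validation accuracy:", clf.score(X_te, y_te))
```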

Development of Holter ECG Monitor with Improved ECG R-peak Detection Accuracy

  • 최정현;강민호;박준호;권기구;배태욱;박준모
    • 융합신호처리학회논문지
    • /
    • Vol. 23, No. 2
    • /
    • pp.62-69
    • /
    • 2022
  • With the recent rise of digital healthcare in clinical practice, research on measuring various forms of biosignals is being actively conducted. Among biosignals, the electrocardiogram (ECG) is the most important, and continuous monitoring of the ECG signal is especially critical for patients with arrhythmia. Arrhythmias take various forms depending on their origin, including sinus node disorders, sinus tachycardia, atrial premature beats (APB), and ventricular fibrillation, and because the prognosis after onset is poor, continuous monitoring in daily life is very important for the early diagnosis of arrhythmia and for setting the direction of treatment. The ECG signal of an arrhythmia patient is highly unstable, which makes it difficult to accurately detect the R-peak, the key feature point for automatic arrhythmia detection. In this study, we developed a Holter ECG monitoring device for continuous measurement together with analysis software, and confirmed the effectiveness of R-peak detection in ECG signals using an arrhythmia database. Future work should address algorithms for the morphological classification and prediction of arrhythmias of various origins, along with validation based on clinical data.
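
A minimal R-peak detection sketch in the spirit of the classic Pan-Tompkins approach (bandpass filtering, squaring, peak picking with a refractory period), using scipy. This illustrates the kind of detection such a monitor performs; it is not the paper's actual algorithm, and the signal is synthetic.

```python
# Sketch of R-peak detection: bandpass filter to emphasize the QRS
# complex, square the signal, then pick peaks separated by a
# refractory period. Synthetic ECG-like signal.
import numpy as np
from scipy.signal import butter, filtfilt, find_peaks

fs = 360  # sampling rate (Hz), common in arrhythmia databases
t = np.arange(0, 10, 1 / fs)
# Synthetic ECG-like trace: sharp spikes roughly once per second + noise.
ecg = np.zeros_like(t)
ecg[(t % 1.0) < (1 / fs)] = 1.0
ecg = np.convolve(ecg, np.hanning(15), mode="same")
ecg += 0.05 * np.random.randn(len(t))

# 5-15 Hz bandpass emphasizes QRS energy.
b, a = butter(2, [5 / (fs / 2), 15 / (fs / 2)], btype="band")
energy = filtfilt(b, a, ecg) ** 2

# Peaks at least 250 ms apart (refractory period), above a threshold.
peaks, _ = find_peaks(energy, distance=int(0.25 * fs),
                      height=0.3 * energy.max())
print(f"detected {len(peaks)} R-peaks in 10 s")
```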

Genome Analysis and Optimization of Caproic Acid Production of Clostridium butyricum GD1-1 Isolated from the Pit Mud of Nongxiangxing Baijiu

  • Min Li;Tao Li;Jia Zheng;Zongwei Qiao;Kaizheng Zhang;Huibo Luo;Wei Zou
    • Journal of Microbiology and Biotechnology
    • /
    • Vol. 33, No. 10
    • /
    • pp.1337-1350
    • /
    • 2023
  • Caproic acid is a precursor for the synthesis of ethyl caproate, the main flavor substance of nongxiangxing baijiu liquor. In this study, Clostridium butyricum GD1-1, a strain producing a high caproic acid concentration (3.86 g/l), was isolated from the storage pit mud of nongxiangxing baijiu for sequencing and analysis. The strain's genome was 3,840,048 bp in length with 4,050 open reading frames. In addition, virulence factor annotation analysis showed C. butyricum GD1-1 to be safe at the genetic level. However, annotation using the Kyoto Encyclopedia of Genes and Genomes Automatic Annotation Server predicted deficiencies in the strain's synthesis of alanine, methionine, and biotin; these predictions were confirmed by essential nutrient factor validation experiments. The optimized medium for caproic acid production by strain GD1-1 was (g/l): glucose 30, NaCl 5, yeast extract 10, peptone 10, beef paste 10, sodium acetate 11, L-cysteine 0.6, biotin 0.004, starch 2, and 2.0% ethanol. The optimized fermentation conditions for caproic acid production by C. butyricum GD1-1, determined on a single-factor basis, were: 5% inoculum volume, 35℃, pH 7, and 90% loading volume. Under these optimal conditions, the caproic acid concentration of strain GD1-1 reached 5.42 g/l, 1.40 times the initial concentration. C. butyricum GD1-1 could be further used in caproic acid production, nongxiangxing baijiu (NXXB) pit mud strengthening and maintenance, and artificial pit mud preparation.

A Deep Learning Approach for Covid-19 Detection in Chest X-Rays

  • Sk. Shalauddin Kabir;Syed Galib;Hazrat Ali;Fee Faysal Ahmed;Mohammad Farhad Bulbul
    • International Journal of Computer Science & Network Security
    • /
    • Vol. 24, No. 3
    • /
    • pp.125-134
    • /
    • 2024
  • The novel coronavirus disease 2019, called COVID-19, has spread swiftly worldwide. Early diagnosis is important to control its rapid spread. Medical imaging techniques, chest computed tomography and chest X-ray, are playing a vital role in the identification and testing of COVID-19 in the present epidemic. Chest X-ray is a cost-effective method for COVID-19 detection; however, manual X-ray analysis is time consuming, given that the number of infected individuals keeps growing rapidly. For this reason, it is very important to develop an automated COVID-19 detection process to help control this pandemic. In this study, we address the task of automatic detection of COVID-19 using a popular deep learning model, VGG19. We used 1300 healthy and 1300 confirmed COVID-19 chest X-ray images in this experiment. We performed three experiments by freezing different blocks and layers of VGG19 and, finally, used an SVM machine learning classifier for detecting COVID-19. In every experiment, we used five-fold cross-validation to train and validate the model, ultimately achieving 98.1% overall classification accuracy. Experimental results show that the proposed method using the deep learning-based VGG19 model can serve as a tool to aid radiologists and play a crucial role in the timely diagnosis of COVID-19.
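
A minimal sketch of the described pipeline under stated assumptions: a frozen VGG19 backbone extracts features from chest X-ray images and an SVM classifies them with five-fold cross-validation. The image arrays are random placeholders, and the exact blocks the authors froze are not reproduced here.

```python
# Sketch: frozen VGG19 as a feature extractor, SVM as the classifier.
# Images and labels are random placeholders, not chest X-ray data.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC
from tensorflow.keras.applications import VGG19
from tensorflow.keras.applications.vgg19 import preprocess_input

# Placeholder batch of "X-ray" images (N, 224, 224, 3).
images = np.random.rand(40, 224, 224, 3).astype("float32") * 255
labels = np.random.randint(0, 2, 40)  # 0 = healthy, 1 = COVID-19

# VGG19 backbone with frozen ImageNet weights; top classifier removed.
backbone = VGG19(weights="imagenet", include_top=False, pooling="avg",
                 input_shape=(224, 224, 3))
backbone.trainable = False

features = backbone.predict(preprocess_input(images), verbose=0)

# Five-fold cross-validation of an SVM on the extracted features.
scores = cross_val_score(SVC(kernel="rbf"), features, labels, cv=5)
print("CV accuracy:", scores.mean())
```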

Clinically Available Software for Automatic Brain Volumetry: Comparisons of Volume Measurements and Validation of Intermethod Reliability

  • Ji Young Lee;Se Won Oh;Mi Sun Chung;Ji Eun Park;Yeonsil Moon;Hong Jun Jeon;Won-Jin Moon
    • Korean Journal of Radiology
    • /
    • Vol. 22, No. 3
    • /
    • pp.405-414
    • /
    • 2021
  • Objective: To compare two clinically available MR volumetry software packages, NeuroQuant® (NQ) and Inbrain® (IB), and examine the inter-method reliabilities and differences between them. Materials and Methods: This study included 172 subjects (age range, 55-88 years; mean age, 71.2 years), comprising 45 normal healthy subjects, 85 patients with mild cognitive impairment, and 42 patients with Alzheimer's disease. Magnetic resonance imaging scans were analyzed with IB and NQ. Mean differences were compared with the paired t test. Inter-method reliability was evaluated with Pearson's correlation coefficients and intraclass correlation coefficients (ICCs). Effect sizes were also obtained to document the standardized mean differences. Results: The paired t test showed significant volume differences between the two methods in most regions except the amygdala. Nevertheless, inter-method measurements between IB and NQ showed good to excellent reliability (0.72 < r < 0.96, 0.83 < ICC < 0.98) except for the pallidum, which showed poor reliability (left: r = 0.03, ICC = 0.06; right: r = -0.05, ICC = -0.09). In terms of effect size, volume differences were large in most regions (0.05 < r < 6.15). The effect size was largest in the pallidum and smallest in the cerebellum. Conclusion: Comparisons between IB and NQ showed significantly different volume measurements with large effect sizes. However, they showed good to excellent inter-method reliability in volumetric measurements for all brain regions, with the exception of the pallidum. Clinicians using these commercial software packages should take into consideration that different volume measurements can be obtained depending on the software used.
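
A minimal sketch of the reliability statistics used above: Pearson's r via scipy, and a two-way random, absolute-agreement, single-measures ICC, i.e. ICC(2,1), computed directly from its ANOVA definition. The volume arrays are synthetic placeholders, and the study's exact ICC model is an assumption here.

```python
# Sketch: Pearson correlation plus ICC(2,1) for two methods/raters.
# Volumes below are synthetic placeholders, not NQ/IB measurements.
import numpy as np
from scipy.stats import pearsonr

def icc_2_1(x):
    """ICC(2,1): two-way random effects, absolute agreement, single
    measure. x has shape (n_subjects, k_raters)."""
    n, k = x.shape
    grand = x.mean()
    row_means = x.mean(axis=1)
    col_means = x.mean(axis=0)
    msr = k * np.sum((row_means - grand) ** 2) / (n - 1)   # subjects
    msc = n * np.sum((col_means - grand) ** 2) / (k - 1)   # raters
    sse = np.sum((x - row_means[:, None] - col_means[None, :] + grand) ** 2)
    mse = sse / ((n - 1) * (k - 1))                        # residual
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)

rng = np.random.default_rng(0)
vol_nq = rng.normal(1500, 100, 50)        # e.g., regional volume, mm^3
vol_ib = vol_nq + rng.normal(30, 20, 50)  # systematic offset + noise

r, _ = pearsonr(vol_nq, vol_ib)
icc = icc_2_1(np.column_stack([vol_nq, vol_ib]))
print(f"Pearson r = {r:.3f}, ICC(2,1) = {icc:.3f}")
```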

Modeling and Mapping Fuel Moisture Content Using Equilibrium Moisture Content Computed from Weather Data of the Automatic Mountain Meteorology Observation System (AMOS)

  • 이훈택;원명수;윤석희;장근창
    • 한국지리정보학회지
    • /
    • Vol. 22, No. 3
    • /
    • pp.21-36
    • /
    • 2019
  • This study was conducted to develop a method for estimating the 10-hour fuel moisture content (10-h FMC), a key factor in forest fire risk prediction, from weather data of the automatic mountain meteorology observation system (AMOS). Weather variables and 10-h FMC were measured at automatic weather stations at Anseong (an urban site) and at two sites at Hongneung (inside and outside the forest), and 10-h FMC estimation equations were derived from these measurements. Using the derived equations, the 10-h FMC on days with frequent forest fires over the past six years (2013-2018) was analyzed, and a nationwide 10-h FMC map was produced. Regression analysis between the weather variables (air temperature, wind speed, wood equilibrium moisture content, and rainfall) and 10-h FMC showed that the wood equilibrium moisture content explained 10-h FMC most effectively. The 10-h FMC estimation equation derived from the equilibrium moisture content showed high goodness of fit in both model fitting and validation. When the equation derived at one site was applied to the other site, the goodness of fit decreased compared with applying an equation at its own site, but the results were still satisfactory. The reduced fit appears to arise because the regression does not reflect differences in post-rainfall drying response between 10-h FMC and the equilibrium moisture content, or the effect of vegetation on 10-h FMC. Finally, spatial analysis using the derived equations confirmed that more than 70% of the forest fires on fire-prone days in the past six years occurred under 10-h FMC conditions of 10.5% or below. These results can be used, in conjunction with the AMOS network, to estimate 10-h FMC across forested areas nationwide. The 10-h FMC estimates can serve as baseline data for forest fire risk prediction and contribute to national disaster-related policy decisions.
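
A minimal sketch of the estimation chain under stated assumptions: equilibrium moisture content (EMC) is computed from temperature and relative humidity, here with the commonly used Simard (1968) formulation rather than the paper's wood-EMC formula, and 10-h FMC is then fit as a linear function of EMC. All coefficients and observations below are illustrative placeholders.

```python
# Sketch: EMC from weather (Simard 1968 formulation; T in Fahrenheit,
# RH in percent), then a linear regression 10-h FMC ~ EMC. The
# observed FMC values are synthetic placeholders.
import numpy as np

def simard_emc(temp_c, rh):
    """Equilibrium moisture content (%) per Simard (1968)."""
    t_f = temp_c * 9 / 5 + 32
    if rh < 10:
        return 0.03229 + 0.281073 * rh - 0.000578 * rh * t_f
    if rh < 50:
        return 2.22749 + 0.160107 * rh - 0.01478 * t_f
    return 21.0606 + 0.005565 * rh**2 - 0.00035 * rh * t_f - 0.483199 * rh

# Hypothetical AMOS-style weather observations.
temps = np.array([22.0, 25.0, 18.0, 30.0, 15.0])   # degC
rhs = np.array([35.0, 60.0, 80.0, 25.0, 90.0])     # %
emc = np.array([simard_emc(t, h) for t, h in zip(temps, rhs)])

# Placeholder 10-h FMC observations from fuel moisture sticks.
fmc = 1.1 * emc + 0.8 + np.random.default_rng(0).normal(0, 0.3, emc.size)

# Least-squares fit: 10-h FMC = a * EMC + b.
a, b = np.polyfit(emc, fmc, 1)
print(f"10-h FMC ≈ {a:.2f} * EMC + {b:.2f}")
```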

Estimation of Near Surface Air Temperature Using MODIS Land Surface Temperature Data and Geostatistics

  • 신휴석;장은미;홍성욱
    • Spatial Information Research
    • /
    • Vol. 22, No. 1
    • /
    • pp.55-63
    • /
    • 2014
  • Near-surface air temperature data, essential in hydrology, meteorology, and climatology, are growing in importance as their applications expand into fields such as public health, biology, and the environment. However, because ground-based observation is strongly constrained in space and time, measured air temperature data have low spatiotemporal resolution, which severely limits their use in research requiring high resolution. As one alternative, many studies have estimated near-surface air temperature from land surface temperature (LST) data obtained from satellite imagery, which offers relatively high spatiotemporal resolution. As part of this line of research, this study estimated near-surface air temperature over South Korea using 2010 air temperature data acquired from the Automatic Weather Stations operated by the Korea Meteorological Administration (AWS data), the representative MODIS Land Surface Temperature product (LST data: MOD11A1), auxiliary data that can influence air temperature such as land cover data and a DEM (digital elevation model), and various geostatistical techniques. Before estimation, the RMSE (Root Mean Square Error) between the full-year 2010 (365-day) LST data and the AWS data was analyzed by season and by land cover: the coefficient of variation of RMSE across seasons was 0.86, whereas across land cover types it was 0.00746, indicating that seasonal differences were larger than land cover differences. Seasonal RMSE was lowest in winter, and linear regression of the AWS data on the LST and auxiliary data likewise yielded its highest coefficient of determination in winter (0.818), versus 0.078 in summer, a very large seasonal difference. Based on these results, near-surface air temperature was estimated using widely used kriging methods, ordinary kriging, universal kriging, co-kriging, and regression kriging, and cross-validation was performed to assess model accuracy. The RMSE was 1.71 for both ordinary kriging and universal kriging, and 1.848 and 1.63 for co-kriging and regression kriging, respectively, indicating that regression kriging produced the most accurate estimates.
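
A minimal regression-kriging sketch under stated assumptions: a linear regression of station air temperature on LST and elevation, followed by ordinary kriging of the residuals using the pykrige package. Station coordinates and values are synthetic placeholders, not the AWS/MODIS data.

```python
# Sketch: regression kriging = linear regression on covariates (LST,
# elevation) + ordinary kriging of the residuals. Synthetic stations.
import numpy as np
from pykrige.ok import OrdinaryKriging
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n = 80
lon, lat = rng.uniform(126, 130, n), rng.uniform(34, 38, n)
lst = rng.uniform(5, 30, n)        # MODIS LST at stations (degC)
dem = rng.uniform(0, 1200, n)      # elevation (m)
# Placeholder AWS air temperature with a lapse-rate-like structure.
temp = 0.8 * lst - 0.0065 * dem + rng.normal(0, 0.8, n)

# Step 1: regression on the covariates.
X = np.column_stack([lst, dem])
reg = LinearRegression().fit(X, temp)
resid = temp - reg.predict(X)

# Step 2: ordinary kriging of the regression residuals.
ok = OrdinaryKriging(lon, lat, resid, variogram_model="spherical")
res_pred, _ = ok.execute("points", np.array([127.5]), np.array([36.5]))

# Prediction at a new point = regression trend + kriged residual.
x_new = np.array([[20.0, 300.0]])  # LST 20 degC, elevation 300 m
print("estimated air temperature:", reg.predict(x_new)[0] + res_pred[0])
```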