• Title/Summary/Keyword: Standard Dataset


A study on the development of an automatic detection algorithm for trees suspected of being damaged by forest pests (산림병해충 피해의심목 자동탐지 알고리즘 개발 연구)

  • Hoo-Dong, LEE;Seong-Hee, LEE;Young-Jin, LEE
    • Journal of the Korean Association of Geographic Information Studies
    • /
    • v.25 no.4
    • /
    • pp.151-162
    • /
    • 2022
  • Recently, forests in Korea have accumulated damage from continuous forest disasters, and the need for technologies to monitor forest management has been raised. Because the affected areas cover large expanses of terrain, technologies using drones, artificial intelligence, and big data are being studied. In this study, a standard dataset was constructed to develop an algorithm that automatically detects trees suspected of being damaged by forest pests using deep learning and drones. In experiments with YOLO-family object detection models, the YOLOv4-P7 model showed the highest recall of 69.69% and precision of 69.15%. Considering that the detection target is an ortho-image with a very large image size, it was confirmed that YOLOv4-P7 should be used as the automatic detection model for trees suspected of being damaged by forest pests.
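
Because the detection target is a very large orthoimage, a common practical step is to split it into overlapping tiles before running the detector. Below is a minimal sketch of that tiling idea, assuming a generic callable detector in place of the trained YOLOv4-P7 model; the tile size, overlap, and helper names are illustrative assumptions, not the authors' code.

```python
# Sketch: tile a large orthomosaic and run an (assumed) detector per tile.
import numpy as np

TILE = 1280       # assumed tile size in pixels
OVERLAP = 128     # assumed overlap so trees on tile borders are not missed

def iter_tiles(image: np.ndarray, tile: int = TILE, overlap: int = OVERLAP):
    """Yield (x, y, window) covering the whole orthoimage with overlap."""
    h, w = image.shape[:2]
    step = tile - overlap
    for y in range(0, max(h - overlap, 1), step):
        for x in range(0, max(w - overlap, 1), step):
            yield x, y, image[y:y + tile, x:x + tile]

def detect_suspect_trees(image: np.ndarray, model):
    """Run a detector tile by tile and map boxes back to full-image coordinates."""
    boxes = []
    for x, y, patch in iter_tiles(image):
        for bx, by, bw, bh, score in model(patch):   # model: any callable detector
            boxes.append((bx + x, by + y, bw, bh, score))
    return boxes

def no_op_model(patch):
    """Placeholder detector; a trained YOLO model would go here in practice."""
    return []

if __name__ == "__main__":
    dummy_ortho = np.zeros((4000, 4000, 3), dtype=np.uint8)   # stand-in orthoimage
    print(len(detect_suspect_trees(dummy_ortho, no_op_model)))
```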

Comparison of Feature Selection Methods Applied on Risk Prediction for Hypertension (고혈압 위험 예측에 적용된 특징 선택 방법의 비교)

  • Khongorzul, Dashdondov;Kim, Mi-Hye
    • KIPS Transactions on Software and Data Engineering
    • /
    • v.11 no.3
    • /
    • pp.107-114
    • /
    • 2022
  • In this paper, we enhanced the risk prediction of hypertension using feature selection methods on the Korean National Health and Nutrition Examination Survey (KNHANES) database of the Korea Centers for Disease Control and Prevention. The study identified various risk factors correlated with chronic hypertension. The paper is divided into three parts. First, the data preprocessing step removes missing values and performs z-transformation. Next, the feature selection (FS) step applies factor analysis (FA)-based feature selection to the dataset, and feature importance (FI) and multicollinearity analysis (MC) are compared as part of FS. Finally, in the predictive analysis stage, the selected features are used to detect and predict the risk of hypertension. We compare the accuracy, F-score, area under the ROC curve (AUC), and mean squared error (MSE) of each classification model. In the test, the proposed MC-FA-RF model achieved the highest accuracy of 80.12%, MSE of 0.106, F-score of 83.49%, and AUC of 85.96%. These results demonstrate that the proposed MC-FA-RF method outperforms the other methods for hypertension risk prediction.
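
As a rough illustration of the FA-based feature selection followed by a random forest classifier that the abstract describes, here is a minimal sketch on synthetic data; the KNHANES fields, factor count, and model settings are assumptions, not the authors' configuration.

```python
# Sketch: z-transform -> factor-analysis feature reduction -> random forest,
# evaluated with accuracy, F-score, AUC, and MSE, on synthetic data.
from sklearn.datasets import make_classification
from sklearn.decomposition import FactorAnalysis
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score, mean_squared_error, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)  # stand-in for KNHANES
X = StandardScaler().fit_transform(X)                                      # z-transformation step
X = FactorAnalysis(n_components=8, random_state=0).fit_transform(X)        # FA-based reduction

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)

pred = clf.predict(X_te)
proba = clf.predict_proba(X_te)[:, 1]
print("accuracy:", accuracy_score(y_te, pred))
print("F-score :", f1_score(y_te, pred))
print("AUC     :", roc_auc_score(y_te, proba))
print("MSE     :", mean_squared_error(y_te, pred))
```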

Construction of Web-Based Medical Image Standard Dataset Conversion and Management System (웹기반 의료영상 표준 데이터셋 변환 및 관리 시스템 구축)

  • Kim, Ji-Eon;Lim, Dong Wook;Yu, Yeong Ju;Noh, Si-Hyeong;Lee, ChungSub;Kim, Tae-Hoon;Jeong, Chang-Won
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2021.05a
    • /
    • pp.282-284
    • /
    • 2021
  • Recently, with the Fourth Industrial Revolution, AI technologies based on medical big data have been developing rapidly. In particular, technologies for detecting, segmenting, and quantifying lesions in medical images, as well as for automated diagnosis and prediction, are being released as AI products. AI technology development requires large amounts of training data, and clinical validation requires verification at two or more institutions rather than a single institution. However, development still commonly relies on data from a single institution, split into training, test, and validation sets. This paper describes a system for converting the image data needed for AI development into standardized datasets and managing them. Because the criteria by which each institution collects and stores medical image data are not clearly defined, standardization is required before multi-center data can be gathered. The proposed system can not only standardize and store the medical image data of a single institution or a multi-center research group, but also lets researchers search for the medical image data they need through a medical image viewer and image list and provides the results as various datasets. As a system that supports collection, conversion, and management, it is expected to invigorate image-based machine learning research.
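
A first step such a conversion system needs is pulling a consistent set of DICOM tags out of heterogeneous institutional archives. The sketch below shows that idea with pydicom on an assumed directory layout; it is not the authors' system, and the selected tags and CSV output are illustrative choices.

```python
# Sketch: index DICOM files into a flat CSV of standard tags for dataset curation.
import csv
from pathlib import Path

import pydicom

def dicom_to_record(path: Path) -> dict:
    """Read header-only DICOM metadata and flatten a few standard tags."""
    ds = pydicom.dcmread(path, stop_before_pixels=True)
    return {
        "file": str(path),
        "patient_id": getattr(ds, "PatientID", ""),
        "modality": getattr(ds, "Modality", ""),
        "study_date": getattr(ds, "StudyDate", ""),
        "body_part": getattr(ds, "BodyPartExamined", ""),
    }

def build_index(dicom_dir: str, out_csv: str = "dataset_index.csv") -> None:
    """Walk a directory of .dcm files and write one metadata row per image."""
    records = [dicom_to_record(p) for p in Path(dicom_dir).rglob("*.dcm")]
    fieldnames = list(records[0].keys()) if records else ["file"]
    with open(out_csv, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(records)

if __name__ == "__main__":
    build_index("./dicom_incoming")   # assumed input directory
```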

Anatomy of Sentiment Analysis of Tweets Using Machine Learning Approach

  • Misbah Iram;Saif Ur Rehman;Shafaq Shahid;Sayeda Ambreen Mehmood
    • International Journal of Computer Science & Network Security
    • /
    • v.23 no.10
    • /
    • pp.97-106
    • /
    • 2023
  • Sentiment analysis using social network platforms such as Twitter has achieved tremendous results. Twitter is an online social networking site that contains a rich amount of data. The platform is known as an information channel covering many different sites and categories. Tweets are most often publicly accessible, with very few limitations and security options. Twitter also provides powerful tools that enhance its utility, including a search system that makes recently posted tweets publicly accessible by keyword. As a popular social medium, Twitter interconnects information, reviews, and updates, all of which are important for engaging a target population. In this work, numerous methods for classifying tweet sentiment on Twitter are discussed. There has been a great deal of work in the field of sentiment analysis of Twitter data. This study provides a comprehensive analysis of the most standard and widely applicable machine learning and lexicon-based techniques for opinion mining, along with their metrics. The proposed work helps analyze the information in tweets, where opinions are highly unstructured, heterogeneous, and polarized as positive, negative, or neutral. To validate the performance of the proposed framework, an extensive series of experiments was performed on a real-world Twitter dataset, demonstrating the effectiveness of the framework. This research effort also highlights recent challenges in the field of sentiment analysis along with the future scope of the proposed work.
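
As a concrete example of the machine-learning side of the techniques surveyed, here is a minimal TF-IDF plus logistic regression polarity classifier; the tiny inline corpus and labels are purely illustrative, not the paper's Twitter dataset.

```python
# Sketch: a standard ML baseline for tweet polarity (positive/negative/neutral).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

tweets = [
    "I love this new phone, battery life is great",
    "worst service ever, totally disappointed",
    "the update is fine, nothing special",
    "amazing experience, highly recommend",
    "terrible app, keeps crashing",
    "it works as expected",
]
labels = ["positive", "negative", "neutral", "positive", "negative", "neutral"]

# Unigram+bigram TF-IDF features feeding a multinomial logistic regression.
model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression(max_iter=1000))
model.fit(tweets, labels)

print(model.predict(["battery keeps crashing, very disappointed"]))
```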

Association between exposure to particulate matter and school absences in Korean asthmatic adolescents

  • Seongmin Jo;Kiook Baek;Joon Sakong;Chulyong Park
    • Annals of Occupational and Environmental Medicine
    • /
    • v.34
    • /
    • pp.21.1-21.13
    • /
    • 2022
  • Background: Because particulate matter (PM) and asthma are closely related, the prevalence of school absence among adolescents with asthma can be affected by the concentration of PM. We aimed to investigate the relationship between school absences due to asthma and the total number of days on which the PM concentration exceeded the standard. Methods: We used data from the 16th Korea Youth Risk Behavior Survey and the PM levels of 17 metropolitan cities and provinces gathered from AirKorea. Information on the characteristics of asthmatic adolescents and the prevalence of school absence was obtained using a questionnaire, while the PM levels, expressed as the total number of days with poor and very poor PM grades, were collected from the AirKorea website. Both the χ2 test and logistic regression analysis were performed using the weights provided in the original dataset. Results: For particulate matter of 10 microns in diameter or smaller (PM10), the odds ratio (OR) for absence due to asthma, after adjusting for confounders (sex, school year, body mass index, smoking history, diagnosis of allergic rhinitis, diagnosis of atopic dermatitis, and city size), was 1.07 (95% confidence interval [CI]: 1.01-1.13) when the total number of days with poor and very poor PM10 grades (81 ㎍/m3 or higher) increased by 1 day. For particulate matter of 2.5 microns in diameter or smaller (PM2.5), the adjusted OR for absence due to asthma was 1.01 (95% CI: 1.00-1.03) when the total number of days with poor and very poor PM2.5 grades (36 ㎍/m3 or higher) increased by 1 day. Conclusions: A significant association was observed between the total number of days with poor and very poor PM10 and PM2.5 grades and school absence due to asthma; PM can cause asthma exacerbation and affect academic life.
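
The odds ratios above come from logistic regression with the exceedance-day count as a continuous predictor. The sketch below shows how such an OR and its 95% CI are typically computed, using synthetic data (not the KYRBS/AirKorea data) and an assumed effect size.

```python
# Sketch: odds ratio per additional PM-exceedance day from a logistic regression fit.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 5000
exceed_days = rng.poisson(20, n)                   # synthetic count of poor/very-poor PM days
true_logit = -2.0 + 0.05 * exceed_days             # assumed effect for the simulation only
absence = rng.binomial(1, 1 / (1 + np.exp(-true_logit)))

X = sm.add_constant(exceed_days.astype(float))
fit = sm.Logit(absence, X).fit(disp=0)

or_per_day = np.exp(fit.params[1])                 # OR for +1 exceedance day
ci_low, ci_high = np.exp(fit.conf_int()[1])        # 95% confidence interval
print(f"OR per day: {or_per_day:.2f} (95% CI {ci_low:.2f}-{ci_high:.2f})")
```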

Prediction of Shelf-life for 81mm Mortar High Explosive Ammunition Using Multiple Regression Model (다중 회귀 모델을 활용한 81mm 박격포 고폭탄 저장수명 예측)

  • Young-Jin Jung;Ji-Soo Hong;Kang-Young Lee;Sung-Woo Kang
    • Journal of the Korea Safety Management & Science
    • /
    • v.26 no.3
    • /
    • pp.1-9
    • /
    • 2024
  • This study aims to develop a regression model using data from the Ammunition Stockpile Reliability Program (ASRP) to predict the shelf life of 81mm mortar high-explosive shells. Ammunition is a single-use item that is discarded after use, and its quality is managed through sampling inspections. In particular, shelf life is closely related to the performance of the propellant. The regression model was built from 107 ASRP data points. The dependent variable was 'Storage Period', while the independent variables were 'Mean Ammunition Velocity', 'Standard Deviation of Mean Ammunition Velocity', and 'Stabilizer'. The regression model had an explanatory power (R-squared) of 0.662. The results indicated that it takes approximately 55 years for the storage grade to change from A to C and about 62 years to change from C to D. The proposed model enhances the reliability of ammunition management, prevents unnecessary disposal, and contributes to the efficient use of defense resources. However, the model's explanatory power is somewhat limited due to the small dataset, and future research is expected to improve the model with additional data collection. Expanding the research to other types of ammunition may further aid in improving the military's ammunition management system.
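
A multiple regression of this form can be fitted directly with ordinary least squares. The sketch below mirrors the variable set named in the abstract on synthetic data; the simulated relationship, units, and noise level are assumptions, not the ASRP measurements.

```python
# Sketch: OLS regression of storage period on velocity statistics and stabilizer content.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 107                                             # same sample size as the abstract
mean_velocity = rng.normal(250, 5, n)               # synthetic, m/s
velocity_sd = rng.normal(2.0, 0.3, n)               # synthetic
stabilizer = rng.normal(1.0, 0.2, n)                # synthetic, %
storage_years = 60 - 0.1 * mean_velocity + 3 * velocity_sd - 20 * stabilizer + rng.normal(0, 3, n)

X = sm.add_constant(np.column_stack([mean_velocity, velocity_sd, stabilizer]))
fit = sm.OLS(storage_years, X).fit()

print("R-squared   :", round(fit.rsquared, 3))
print("coefficients:", fit.params.round(3))
```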

A Folksonomy Ranking Framework: A Semantic Graph-based Approach (폭소노미 사이트를 위한 랭킹 프레임워크 설계: 시맨틱 그래프기반 접근)

  • Park, Hyun-Jung;Rho, Sang-Kyu
    • Asia pacific journal of information systems
    • /
    • v.21 no.2
    • /
    • pp.89-116
    • /
    • 2011
  • In collaborative tagging systems such as Delicious.com and Flickr.com, users assign keywords or tags to their uploaded resources, such as bookmarks and pictures, for their future use or sharing purposes. The collection of resources and tags generated by a user is called a personomy, and the collection of all personomies constitutes the folksonomy. The most significant need of folksonomy users is to efficiently find useful resources or experts on specific topics. An excellent ranking algorithm would assign higher rankings to more useful resources or experts. What resources are considered useful in a folksonomic system? Does a standard superior to frequency or freshness exist? A resource recommended by more users with more expertise should be worthy of attention. This ranking paradigm can be implemented through a graph-based ranking algorithm. Two well-known representatives of such a paradigm are PageRank by Google and HITS (Hypertext Induced Topic Selection) by Kleinberg. Both PageRank and HITS assign a higher evaluation score to pages linked to by more higher-scored pages. HITS differs from PageRank in that it utilizes two kinds of scores: authority and hub scores. The ranking objects of these algorithms are limited to Web pages, whereas the ranking objects of a folksonomic system are heterogeneous (i.e., users, resources, and tags). Therefore, uniform application of the voting notion of PageRank and HITS to the links of a folksonomy would be unreasonable. In a folksonomic system, each link corresponding to a property can have an opposite direction, depending on whether the property is in the active or the passive voice. The current research stems from the idea that a graph-based ranking algorithm could be applied to the folksonomic system using the concept of mutual interactions between entities, rather than the voting notion of PageRank or HITS. The concept of mutual interactions, proposed for ranking Semantic Web resources, enables the calculation of importance scores for various resources unaffected by link directions. The weights of a property representing the mutual interaction between classes are assigned depending on the relative significance of the property to the resource importance of each class. This class-oriented approach is based on the fact that, in the Semantic Web, there are many heterogeneous classes; thus, applying a different appraisal standard to each class is more reasonable. This is similar to human evaluation, where different items are assigned specific weights, which are then summed to determine the weighted average. We can check for missing properties more easily with this approach than with other predicate-oriented approaches. A user of a tagging system usually assigns more than one tag to the same resource, and there can be more than one tag with the same subjectivity and objectivity. When many users assign similar tags to the same resource, grading the users differently depending on the assignment order becomes necessary. This idea comes from studies in psychology wherein expertise involves the ability to select the most relevant information for achieving a goal. An expert should be someone who not only has a large collection of documents annotated with a particular tag, but also tends to add documents of high quality to his/her collections. Such documents are identified by the number, as well as the expertise, of users who have the same documents in their collections.
In other words, there is a relationship of mutual reinforcement between the expertise of a user and the quality of a document. In addition, there is a need to rank entities related more closely to a certain entity. Considering that in social media the popularity of a topic is temporary, recent data should have more weight than old data. We propose a comprehensive folksonomy ranking framework in which all these considerations are dealt with and that can be easily customized to each folksonomy site for ranking purposes. To examine the validity of our ranking algorithm and show the mechanism of adjusting property, time, and expertise weights, we first use a dataset designed for analyzing the effect of each ranking factor independently. We then show the ranking results of a real folksonomy site, with the ranking factors combined. Because the ground truth of a given dataset is not known when it comes to ranking, we inject simulated data whose ranking results can be predicted into the real dataset and compare the ranking results of our algorithm with those of a previous HITS-based algorithm. Our semantic ranking algorithm based on the concept of mutual interaction seems to be preferable to the HITS-based algorithm as a flexible folksonomy ranking framework. Some concrete points of difference are as follows. First, with the time concept applied to the property weights, our algorithm shows superior performance in lowering the scores of older data and raising the scores of newer data. Second, applying the time concept to the expertise weights, as well as to the property weights, our algorithm controls the conflicting influence of expertise weights and enhances the overall consistency of time-valued ranking. The expertise weights of the previous study can act as an obstacle to time-valued ranking because the number of followers increases as time goes on. Third, many new properties and classes can be included in our framework. The previous HITS-based algorithm, based on the voting notion, loses ground when the domain consists of more than two classes, or when other important properties, such as "sent through twitter" or "registered as a friend," are added to the domain. Fourth, there is a big difference in calculation time and memory use between the two kinds of algorithms. While the multiplication of two matrices has to be executed twice for the previous HITS-based algorithm, this is unnecessary with our algorithm. In our ranking framework, various folksonomy ranking policies can be expressed with the ranking factors combined, and our approach works even if the folksonomy site is not implemented with Semantic Web languages. Above all, the time weight proposed in this paper will be applicable to various domains, including social media, where time value is considered important.
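
The core computation behind such graph-based ranking, whether PageRank, HITS, or the mutual-interaction variant described here, is iterative score propagation over a weighted graph of users, resources, and tags. The sketch below shows that propagation as a plain power iteration on a small illustrative weight matrix; the entities and property weights are assumptions, not the paper's dataset or its exact update rule.

```python
# Sketch: importance propagation over a heterogeneous user/resource/tag graph.
import numpy as np

# Nodes 0-1: users, 2-3: resources, 4: tag. W[i, j] is the (property-weighted)
# strength with which node j's importance flows to node i.
W = np.array([
    [0.0, 0.0, 0.5, 0.3, 0.0],   # users gain importance from resources they bookmarked
    [0.0, 0.0, 0.2, 0.6, 0.0],
    [0.4, 0.1, 0.0, 0.0, 0.3],   # resources gain importance from users and tags
    [0.2, 0.5, 0.0, 0.0, 0.4],
    [0.0, 0.0, 0.6, 0.5, 0.0],   # the tag gains importance from resources annotated with it
])

def rank(weights: np.ndarray, iters: int = 100, tol: float = 1e-9) -> np.ndarray:
    """Power iteration with L1 normalisation; returns a stationary importance vector."""
    score = np.ones(weights.shape[0]) / weights.shape[0]
    for _ in range(iters):
        new = weights @ score
        new /= new.sum()
        if np.abs(new - score).sum() < tol:
            break
        score = new
    return score

print(rank(W).round(3))
```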

Prediction of Shear Wave Velocity on Sand Using Standard Penetration Test Results : Application of Artificial Neural Network Model (표준관입시험결과를 이용한 사질토 지반의 전단파속도 예측 : 인공신경망 모델의 적용)

  • Kim, Bum-Joo;Ho, Joon-Ki;Hwang, Young-Cheol
    • Journal of the Korean Geotechnical Society
    • /
    • v.30 no.5
    • /
    • pp.47-54
    • /
    • 2014
  • Although shear wave velocity ($V_s$) is an important design factor in seismic design, it is not usually measured in typical field investigations due to time and economic limitations. In the present study, an investigation was made to predict the $V_s$ of sand based on standard penetration test (SPT) results by using an artificial neural network (ANN) model. A total of 650 data sets, composed of the SPT-N value ($N_{60}$), water content, fines content, and specific gravity as input data and $V_s$ as output data, were used to build and train the ANN model. A sensitivity analysis was then performed on the trained ANN to examine the effect of the input variables on $V_s$. The ANN model was also compared with seven existing empirical models in terms of performance. The sensitivity analysis revealed that the effect of the SPT-N value on $V_s$ is significantly greater than that of the other input variables. Also, when compared with the empirical models using the Nash-Sutcliffe Model Efficiency Coefficient (NSE) and Root Mean Square Error (RMSE), the ANN model was found to exhibit the highest prediction capability.
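
As a rough analogue of the workflow described, the sketch below trains a small feed-forward network on synthetic SPT-style inputs and reports RMSE and the Nash-Sutcliffe efficiency; the network size and the simulated relation between the inputs and $V_s$ are assumptions, not the paper's 650 measured records.

```python
# Sketch: ANN regression from SPT-derived inputs to shear wave velocity, with NSE and RMSE.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n = 650                                                  # same dataset size as the abstract
N60 = rng.uniform(5, 50, n)
water_content = rng.uniform(5, 30, n)
fines_content = rng.uniform(0, 40, n)
specific_gravity = rng.normal(2.65, 0.03, n)
Vs = 80 * N60**0.3 + 2 * fines_content + rng.normal(0, 15, n)   # assumed synthetic relation

X = np.column_stack([N60, water_content, fines_content, specific_gravity])
X = StandardScaler().fit_transform(X)
X_tr, X_te, y_tr, y_te = train_test_split(X, Vs, test_size=0.2, random_state=0)

model = MLPRegressor(hidden_layer_sizes=(16, 8), max_iter=5000, random_state=0).fit(X_tr, y_tr)
pred = model.predict(X_te)

rmse = np.sqrt(np.mean((y_te - pred) ** 2))
nse = 1 - np.sum((y_te - pred) ** 2) / np.sum((y_te - y_te.mean()) ** 2)
print(f"RMSE = {rmse:.1f} m/s, NSE = {nse:.3f}")
```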

Construction of Artificial Intelligence Training Platform for Multi-Center Clinical Research (다기관 임상연구를 위한 인공지능 학습 플랫폼 구축)

  • Lee, Chung-Sub;Kim, Ji-Eon;No, Si-Hyeong;Kim, Tae-Hoon;Yoon, Kwon-Ha;Jeong, Chang-Won
    • KIPS Transactions on Computer and Communication Systems
    • /
    • v.9 no.10
    • /
    • pp.239-246
    • /
    • 2020
  • In the medical field, where artificial intelligence technology is being introduced, research on clinical decision support systems (CDSS) for diagnosis and prediction is actively being conducted. In particular, AI technologies have been applied to various products in the area of medical imaging-based disease diagnosis. However, medical imaging data are inconsistent, and in practice it takes considerable time to prepare them for research use. This paper describes a one-stop AI learning platform for converting medical images to the standard R_CDM (Radiology Common Data Model) and supporting AI algorithm development research based on the resulting datasets. To this end, the focus is on linking with the existing CDM (common data model) and modeling the system, including the schema of the medical imaging standard model and report information for multi-center research based on DICOM (Digital Imaging and Communications in Medicine) tag information. We also show execution results for datasets generated through the AI learning platform. The proposed platform is expected to be used for various image-based artificial intelligence research.
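
Once tags are extracted, an R_CDM-style store ultimately comes down to loading per-image metadata into relational tables linked to the common data model. The sketch below shows that loading step with SQLite and a deliberately simplified table; the schema and column names are assumptions, not the actual R_CDM definition.

```python
# Sketch: load standardized per-image metadata into a simplified CDM-style table.
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS image_occurrence (
    image_id      INTEGER PRIMARY KEY,
    person_id     TEXT NOT NULL,
    modality      TEXT,
    study_date    TEXT,
    body_part     TEXT,
    source_site   TEXT
);
"""

def load_records(db_path: str, records: list[dict]) -> None:
    """Insert one row per image, using metadata already extracted from DICOM tags."""
    with sqlite3.connect(db_path) as con:
        con.executescript(SCHEMA)
        con.executemany(
            "INSERT INTO image_occurrence (person_id, modality, study_date, body_part, source_site) "
            "VALUES (:person_id, :modality, :study_date, :body_part, :source_site)",
            records,
        )

if __name__ == "__main__":
    load_records("rcdm_demo.db", [
        {"person_id": "P001", "modality": "CT", "study_date": "20200101",
         "body_part": "CHEST", "source_site": "hospital_A"},
    ])
```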

Mixed dentition analysis using a multivariate approach (다변량 기법을 이용한 혼합치열기 분석법)

  • Seo, Seung-Hyun;An, Hong-Seok;Lee, Shin-Jae;Lim, Won Hee;Kim, Bong-Rae
    • The korean journal of orthodontics
    • /
    • v.39 no.2
    • /
    • pp.112-119
    • /
    • 2009
  • Objective: To develop a mixed dentition analysis method that takes the normal variation of tooth sizes into consideration. Methods: According to the tooth sizes of the maxillary central incisor, maxillary 1st molar, mandibular central incisor, mandibular lateral incisor, and mandibular 1st molar, 307 normal occlusion subjects were clustered into smaller and larger tooth-size groups. Multiple regression analyses were then performed to predict the sizes of the canine and premolars for the 2 groups and for each gender separately. For a cross-validation dataset, 504 malocclusion patients were assigned to the 2 groups, and the multiple regression equations were then applied. Results: The maximum residual standard deviations of the predicted space for the canine and 1st and 2nd premolars were 0.71 mm and 0.82 mm for the normal occlusion and malocclusion groups, respectively. For malocclusion patients, the prediction errors did not differ significantly by type of malocclusion or by tooth-size group. The frequencies of prediction errors greater than 1 mm and 2 mm were 17.3% and 1.8%, respectively. The overall prediction accuracy was dramatically improved compared to that of previous studies. Conclusions: The computer-aided calculation method used in this study appeared to be more efficient.
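
The cluster-then-regress idea can be illustrated compactly: split subjects into two tooth-size groups, then fit a separate multiple regression per group and report the residual standard deviation. The sketch below does this on synthetic tooth-size data; the measurements, coefficients, and group split are illustrative assumptions, not the study's data.

```python
# Sketch: cluster subjects into smaller/larger tooth-size groups, then fit per-group regressions.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n = 307                                            # same number of subjects as the abstract
# Measured teeth (mm): illustrative means for the five predictor teeth.
measured = rng.normal([8.6, 10.3, 5.4, 5.9, 11.2], 0.4, size=(n, 5))
# Target: combined mesiodistal width of canine + premolars (synthetic relation).
target = measured @ np.array([0.4, 0.3, 0.5, 0.5, 0.3]) + rng.normal(0, 0.5, n)

groups = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(measured)

for g in (0, 1):
    mask = groups == g
    fit = LinearRegression().fit(measured[mask], target[mask])
    resid_sd = np.std(target[mask] - fit.predict(measured[mask]), ddof=measured.shape[1] + 1)
    print(f"group {g}: n={mask.sum()}, residual SD = {resid_sd:.2f} mm")
```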