• Title/Summary/Keyword: Research dataset

Intrusion Detection System Modeling Based on Learning from Network Traffic Data

  • Midzic, Admir;Avdagic, Zikrija;Omanovic, Samir
    • KSII Transactions on Internet and Information Systems (TIIS) / v.12 no.11 / pp.5568-5587 / 2018
  • This research uses artificial intelligence methods to model a computer network intrusion detection system. Primary classification is performed by self-organizing maps (SOMs) arranged in two levels, while secondary classification of ambiguous data is performed by a Sugeno-type Fuzzy Inference System (FIS) created with an Adaptive Neuro-Fuzzy Inference System (ANFIS). The main challenge was to detect attacks that are either unknown or represented by a very small percentage of samples in the training dataset. An improved algorithm for the second-layer SOMs and for FIS creation was developed for this purpose. The number of clusters in the second SOM layer is optimized with the improved algorithm to minimize the amount of ambiguous data forwarded to the FIS. The FIS is created using ANFIS built on the ambiguous training data, clustered by another SOM whose size is determined dynamically. The proposed hybrid model was created and tested using the NSL-KDD dataset, which is especially interesting for this research because of its overlapping class distribution. The objectives were to detect intrusions that make up a small percentage of total traffic during early detection stages, to deal with overlapping data by separating ambiguous data, and to maximize detection rate (DR) while minimizing false alarm rate (FAR). On test data, the proposed hybrid model achieved a DR of 0.8883 and a FAR of 0.2415, which the authors consider acceptable. The authors report that the objectives were achieved, based on a comparison with similar studies on the NSL-KDD dataset. The proposed model can be used in further research in this domain as well as in other research areas.
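
The abstract above reports a detection rate (DR) and a false alarm rate (FAR) without defining them. The sketch below assumes the usual intrusion-detection definitions, DR = TP / (TP + FN) and FAR = FP / (FP + TN); the paper may define them differently. The function name `detection_metrics` and the 0/1 label encoding are illustrative only.

```python
def detection_metrics(y_true, y_pred):
    """Return (DR, FAR) for binary labels: 1 = attack, 0 = normal traffic.

    Assumed definitions (not taken from the paper):
      DR  = TP / (TP + FN)   share of attacks that were flagged
      FAR = FP / (FP + TN)   share of normal records flagged as attacks
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    dr = tp / (tp + fn) if (tp + fn) else 0.0
    far = fp / (fp + tn) if (fp + tn) else 0.0
    return dr, far


# Toy labels, purely illustrative (not NSL-KDD data).
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 1, 0, 0, 1]
dr, far = detection_metrics(y_true, y_pred)
print(f"DR={dr:.2f} FAR={far:.2f}")
```

Under these definitions, a DR of 0.8883 would mean that about 89% of attack records were flagged, and a FAR of 0.2415 that about 24% of normal records raised an alarm.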

A Study on Record Selection Strategy and Procedure in Dataset for Administrative Information (행정정보 데이터세트 기록의 선별 기준 및 절차 연구)

  • Cho, Eun-Hee;Yim, Jin-Hee
    • The Korean Journal of Archival Studies / no.19 / pp.251-291 / 2009
  • With the computerization of public-sector business services and the push for e-government, both the volume and the variety of records produced in electronic systems are growing. Among these record types, datasets are attracting attention because they are being produced rapidly. Although the number of administrative information systems designated as electronic record production systems is increasing, they remain a blind spot for records management: a system can become obsolete, and its records can be lost when a new system is developed. In addition, because these systems were designed without considering records management, they are managed unsatisfactorily and do not meet the feature and quality requirements of a records management system. Advanced countries have recognized the importance of datasets, managed dataset archives, and carried out projects on management systems and preservation formats for keeping data. Korea is also conducting research on datasets and individual administrative information systems, but no official scheme has been established yet. This study presents the records management items that should be reflected when an administrative information system is designed, from two perspectives: an identification method and quality requirements. The major directions are as follows. First, because a dataset is a kind of electronic record, this must be reflected from the design stage, prior to production. Second, the system should be established by integrating the records management strategy into the information strategy of the whole organization. Based on these two directions, the study suggests strategies for establishing dataset identification within the framework of e-government. Archiving issues, including preservation formats and management procedures for dataset archives, are not covered in this research. More research on those topics, and on datasets in general, is expected.

A Crowdsourcing-Based Paraphrased Opinion Spam Dataset and Its Implication on Detection Performance (크라우드소싱 기반 문장재구성 방법을 통한 의견 스팸 데이터셋 구축 및 평가)

  • Lee, Seongwoon;Kim, Seongsoon;Park, Donghyeon;Kang, Jaewoo
    • KIISE Transactions on Computing Practices / v.22 no.7 / pp.338-343 / 2016
  • Today, opinion reviews on the Web are often used as a means of information exchange. As the importance of opinion reviews continues to grow, the number of opinion spam issues also increases. Even though many research studies on detecting spam reviews have been conducted, some limitations of gold-standard datasets hinder research. Therefore, we introduce a new dataset called "Paraphrased Opinion Spam (POS)" that contains a new type of review spam that imitates truthful reviews. We have noticed that spammers refer to existing truthful reviews to fabricate spam reviews. To create such a seemingly truthful review spam dataset, we asked task participants to paraphrase truthful reviews to create new deceptive reviews. The experimental results show that classifying our POS dataset is more difficult than classifying the existing spam datasets, since the reviews in our dataset look linguistically more like truthful reviews. Training volume was also found to be an important factor for classification model performance.

Using Data Mining Techniques to Predict Win-Loss in Korean Professional Baseball Games (데이터마이닝을 활용한 한국프로야구 승패예측모형 수립에 관한 연구)

  • Oh, Younhak;Kim, Han;Yun, Jaesub;Lee, Jong-Seok
    • Journal of Korean Institute of Industrial Engineers / v.40 no.1 / pp.8-17 / 2014
  • In this research, we employed various data mining techniques to build predictive models for win-loss prediction in Korean professional baseball games. Historical data on players and teams was obtained from the official materials provided on the KBO website. From the collected raw data, we prepared two additional types of dataset, in ratio and binary format respectively. The ratio dataset was generated by dividing the away team's records by the corresponding home team's records, while the binary dataset was obtained by comparing the record values. We applied seven classification techniques to the three (raw, ratio, and binary) datasets: decision tree, random forest, logistic regression, neural network, support vector machine, linear discriminant analysis, and quadratic discriminant analysis. Among the 21 (= 3 datasets × 7 techniques) prediction scenarios, the most accurate model was the random forest based on the binary dataset, with a prediction accuracy of 84.14%. Using the ratio and binary datasets also helped to build better prediction models than using the raw data. Using the variable selection capability of decision tree, random forest, and stepwise logistic regression, we found that annual salary, earned runs, strikeouts, pitcher's winning percentage, and bases on balls are important winning factors of a game. This research is distinct from existing studies in that it used three different types of data and various data mining techniques for win-loss prediction in Korean professional baseball games.
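
The ratio and binary datasets described in the abstract above can be illustrated with a minimal sketch. The abstract only says that away-team records are divided by home-team records and that record values are compared; the field names, the choice of 1 when the away value is larger, and the handling of a zero denominator are all assumptions made here for illustration.

```python
def make_ratio_and_binary(away, home):
    """Derive ratio and binary features from matching away/home records.

    away, home: dicts mapping a statistic name to a number (hypothetical
    field names). Ratio = away / home; binary = 1 if away > home, else 0
    (the direction of the comparison is an assumption).
    """
    ratio = {}
    binary = {}
    for name, away_value in away.items():
        home_value = home[name]
        # A zero home value has no defined ratio; mark it as missing.
        ratio[name] = away_value / home_value if home_value != 0 else None
        binary[name] = 1 if away_value > home_value else 0
    return ratio, binary


# Illustrative numbers only, not KBO data.
away = {"hits": 9, "strikeouts": 6}
home = {"hits": 6, "strikeouts": 8}
ratio, binary = make_ratio_and_binary(away, home)
print(ratio)   # {'hits': 1.5, 'strikeouts': 0.75}
print(binary)  # {'hits': 1, 'strikeouts': 0}
```

Both derived forms express each game as a comparison between the two teams rather than as two sets of absolute statistics, which may be one reason they outperformed the raw data in the study.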

Study of Posture Evaluation Method in Chest PA Examination based on Artificial Intelligence (인공지능 기반 흉부 후전방향 검사에서 자세 평가 방법에 관한 연구)

  • Ho Seong Hwang;Yong Seok Choi;Dae Won Lee;Dong Hyun Kim;Ho Chul Kim
    • Journal of Biomedical Engineering Research / v.44 no.3 / pp.167-175 / 2023
  • Chest PA is the basic radiographic examination, and demand for it is constantly increasing because of the increase in respiratory diseases. However, this demand is not being met because of problems such as a shortage of radiological technologists, sexual embarrassment caused by patient contact, and the spread of infectious diseases. Artificial intelligence has been applied in many cases to address such problems. The purpose of this research is therefore to build an artificial intelligence dataset for Chest PA and to find a posture evaluation method. To construct the posture dataset, posture images were acquired during actual and simulated examinations and classified as correct or incorrect patient posture. To evaluate the method, a posture estimation algorithm was used to preprocess the dataset, and an artificial intelligence classification algorithm was then applied. The Chest PA posture dataset was validated with over 95% accuracy for all artificial intelligence classifiers, and accuracy improved with the top-down posture estimation algorithm AlphaPose combined with the InceptionV3 classification algorithm. Based on this, it will be possible to build a non-face-to-face automatic Chest PA examination system using artificial intelligence.

A Comprehensive Analysis of Deformable Image Registration Methods for CT Imaging

  • Kang Houn Lee;Young Nam Kang
    • Journal of Biomedical Engineering Research / v.44 no.5 / pp.303-314 / 2023
  • This study aimed to assess the practical feasibility of advanced deformable image registration (DIR) algorithms in radiotherapy by employing two distinct datasets. The first dataset included 14 4D lung CT scans and 31 head and neck CT scans. In the 4D lung CT dataset, we employed the DIR algorithm to register organs at risk and tumors based on respiratory phases. The second dataset comprised pre-, mid-, and post-treatment CT images of the head and neck region, along with organ at risk and tumor delineations. These images underwent registration using the DIR algorithm, and Dice similarity coefficients (DSCs) were compared. In the 4D lung CT dataset, registration accuracy was evaluated for the spinal cord, lung, lung nodules, esophagus, and tumors. The average DSCs for the non-learning-based SyN and NiftyReg algorithms were 0.92±0.07 and 0.88±0.09, respectively. Deep learning methods, namely Voxelmorph, Cyclemorph, and Transmorph, achieved average DSCs of 0.90±0.07, 0.91±0.04, and 0.89±0.05, respectively. For the head and neck CT dataset, the average DSCs for SyN and NiftyReg were 0.82±0.04 and 0.79±0.05, respectively, while Voxelmorph, Cyclemorph, and Transmorph showed average DSCs of 0.80±0.08, 0.78±0.11, and 0.78±0.09, respectively. Additionally, the deep learning DIR algorithms demonstrated faster transformation times compared to other models, including commercial and conventional mathematical algorithms (Voxelmorph: 0.36 sec/image, Cyclemorph: 0.3 sec/image, Transmorph: 5.1 sec/image, SyN: 140 sec/image, NiftyReg: 40.2 sec/image). In conclusion, this study highlights the varying clinical applicability of deep learning-based DIR methods in different anatomical regions. While challenges were encountered in head and neck CT registrations, 4D lung CT registrations exhibited favorable results, indicating the potential for clinical implementation. Further research and development in DIR algorithms tailored to specific anatomical regions are warranted to improve the overall clinical utility of these methods.
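
For readers unfamiliar with the metric used in the abstract above, the Dice similarity coefficient between two binary masks A and B is 2|A ∩ B| / (|A| + |B|). The sketch below is a plain-Python illustration on flattened masks; the function name and the convention of returning 1.0 for two empty masks are assumptions made here, not details from the paper.

```python
def dice_coefficient(mask_a, mask_b):
    """Dice similarity coefficient of two equal-length binary masks.

    mask_a, mask_b: flat sequences of 0/1 (or False/True) voxel labels.
    Returns 2 * |A and B| / (|A| + |B|). Two empty masks are treated as a
    perfect match (1.0), which is a convention chosen for this sketch.
    """
    a = [bool(v) for v in mask_a]
    b = [bool(v) for v in mask_b]
    if len(a) != len(b):
        raise ValueError("masks must have the same number of voxels")
    intersection = sum(1 for x, y in zip(a, b) if x and y)
    total = sum(a) + sum(b)
    if total == 0:
        return 1.0
    return 2.0 * intersection / total


# Toy masks: a registered contour versus a reference contour.
registered = [1, 1, 1, 0, 0]
reference = [0, 1, 1, 1, 0]
print(round(dice_coefficient(registered, reference), 3))  # 0.667
```

A DSC of 1 means the registered and reference structures overlap perfectly, and 0 means no overlap, so the reported averages of about 0.8 to 0.9 indicate substantial but imperfect agreement.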

A GMDH-based estimation model for axial load capacity of GFRP-RC circular columns

  • Mohammed Berradia;El Hadj Meziane;Ali Raza;Mohamed Hechmi El Ouni;Faisal Shabbir
    • Steel and Composite Structures / v.49 no.2 / pp.161-180 / 2023
  • In previous research, axial compressive capacity models for glass fiber-reinforced polymer (GFRP)-reinforced circular concrete compression elements restrained with a GFRP helix were proposed on the basis of small, noisy datasets and a limited number of parameters, and were therefore less accurate. Consequently, it is important to recommend an accurate model based on a refined, large testing dataset that considers various parameters of such components. The core objective and novelty of the current research is to propose a deep learning model for the axial compressive capacity of GFRP-reinforced circular concrete columns restrained with a GFRP helix, using various parameters of a large experimental dataset to maximize the precision of the estimates. To this end, a test dataset of 61 GFRP-reinforced circular concrete columns restrained with a GFRP helix was compiled from prior studies. Fifteen theoretical models were assessed on this dataset using different statistical coefficients, and a novel model based on the group method of data handling (GMDH) was put forward. The proposed model, which accounts for the axial contribution of the GFRP main bars and the confining effect of the transverse GFRP helix, performed well on the compiled dataset and gave the highest precision compared with the previously recommended equations, with MAE = 195.67, RMSE = 255.41, and R² = 0.94. The GMDH model also performed well with respect to the normal distribution of its estimates, with only a 2.5% discrepancy from unity. The recommended model can accurately calculate the axial compressive capacity of FRP-reinforced concrete compression elements and can be considered for further analysis and design of such components in structural engineering.
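
The three statistics quoted in the abstract above (MAE, RMSE, R²) can be computed as in the sketch below. It assumes the common definitions, with R² taken as the coefficient of determination 1 - SS_res / SS_tot; the paper may instead use the squared correlation coefficient. The function name and the sample values are illustrative and are not taken from the study's dataset.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return (MAE, RMSE, R2) for measured versus predicted values.

    R2 is computed as 1 - SS_res / SS_tot (coefficient of determination),
    which is an assumption about the definition used in the paper.
    """
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    errors = [p - t for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    ss_res = sum(e * e for e in errors)
    rmse = math.sqrt(ss_res / n)
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R2 is undefined when all measured values are equal")
    r2 = 1.0 - ss_res / ss_tot
    return mae, rmse, r2


# Hypothetical measured and predicted capacities (same unit for both).
measured = [1000.0, 2000.0, 3000.0]
predicted = [1100.0, 1900.0, 3000.0]
mae, rmse, r2 = regression_metrics(measured, predicted)
print(f"MAE={mae:.2f} RMSE={rmse:.2f} R2={r2:.2f}")
```

MAE and RMSE carry the unit of the predicted quantity, so the reported values of 195.67 and 255.41 are only meaningful relative to the magnitude of the column capacities in the dataset.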

Analysis and Implications of Australian National Data Service(ANDS) (오스트레일리아의 과학데이터 서비스체제(ANDS) 분석과 시사점)

  • Park, Dong-Jin
    • Journal of Digital Convergence / v.9 no.3 / pp.1-10 / 2011
  • Korea does not currently have a concrete national-level policy for the management and preservation of scientific datasets. Scientists and research groups carrying out research projects are unable to search for or share information about datasets. With a major increase in the amount of research that uses digitized datasets, the ability to share and reuse scientific data among researchers is recognized as very important. Korea therefore needs a newly formulated policy for managing scientific data at the national level. This paper draws implications for strategic planning in Korea by analyzing advanced cases from other countries. We selected Australia because its government-driven research environment, research infrastructure, and information services are very similar to Korea's. Specifically, we analyzed ANDS (Australian National Data Service) and drew out implications that could be applied to Korea. Finally, we propose basic principles that need to be reflected when formulating a policy on Korea's scientific data.

A Study on the Improvement Model of Administrative Information Dataset Records Management Environment: Focused on the Dataset of Picture Archiving and Communication System (행정정보 데이터세트 기록관리 환경개선 모델 연구: 의료영상저장전송시스템(PACS)의 데이터세트를 중심으로)

  • Lee, Sun-kyung
    • Journal of Korean Society of Archives and Records Management / v.22 no.2 / pp.51-73 / 2022
  • Currently, an implementation plan of administrative information dataset record management has been prepared; however, analyzing the specificity of various administrative information systems and preparing a reasonable level of management reference table by applying about 1.3% (EA portal registration system: 16,199, consulting system: 214) has its limitations. This study started by recognizing the importance of the records management environment in administrative information datasets. Based on the described information, the current records management environment was analyzed by dividing the six areas of the management reference table of the picture archiving and communication system (PACS) into three groups. Thus, a systematic environmental improvement model was proposed, enhancing the effectiveness of dataset records management in the field. Although there is a limitation in analyzing one of the dataset records management environments of various institutions, it is intended to help broaden the horizons of records management research.

Utilizing Artificial Neural Networks for Establishing Hearing-Loss Predicting Models Based on a Longitudinal Dataset and Their Implications for Managing the Hearing Conservation Program

  • Thanawat Khajonklin;Yih-Min Sun;Yue-Liang Leon Guo;Hsin-I Hsu;Chung Sik Yoon;Cheng-Yu Lin;Perng-Jy Tsai
    • Safety and Health at Work / v.15 no.2 / pp.220-227 / 2024
  • Background: Though the artificial neural network (ANN) technique has been used to predict noise-induced hearing loss (NIHL), the established prediction models have primarily relied on cross-sectional datasets, and hence, they may not comprehensively capture the chronic nature of NIHL as a disease linked to long-term noise exposure among workers. Methods: A comprehensive dataset was utilized, encompassing eight-year longitudinal personal hearing threshold levels (HTLs) as well as information on seven personal variables and two environmental variables to establish NIHL predicting models through the ANN technique. Three subdatasets were extracted from the aforementioned comprehensive dataset to assess the advantages of the present study in NIHL predictions. Results: The dataset was gathered from 170 workers employed in a steel-making industry, with a median cumulative noise exposure and HTL of 88.40 dBA-year and 19.58 dB, respectively. Utilizing the longitudinal dataset demonstrated superior prediction capabilities compared to cross-sectional datasets. Incorporating the more comprehensive dataset led to improved NIHL predictions, particularly when considering variables such as noise pattern and use of personal protective equipment. Despite fluctuations observed in the measured HTLs, the ANN predicting models consistently revealed a discernible trend. Conclusions: A consistent correlation was observed between the measured HTLs and the results obtained from the predicting models. However, it is essential to exercise caution when utilizing the model-predicted NIHLs for individual workers due to inherent personal fluctuations in HTLs. Nonetheless, these ANN models can serve as a valuable reference for the industry in effectively managing its hearing conservation program.