• Title/Abstract/Keywords: Combined dataset

158 search results, processing time 0.021 s

Mitigation of Phishing URL Attack in IoT using H-ANN with H-FFGWO Algorithm

  • Gopal S. B;Poongodi C
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • Vol. 17, No. 7
    • /
    • pp.1916-1934
    • /
    • 2023
  • Phishing is an emerging threat on the internet in which attackers try to obtain user credentials, such as login information or Internet banking details, through fake websites. Using that information, they log in to the original website and try to modify or steal data. Traditional defense systems such as firewalls can stop only certain types of attack because they rely on a fixed set of rules. A client-side defense mechanism is therefore needed that can learn potential attack vectors and detect and prevent both known and unknown types of attack. Feature selection plays a key role in machine learning by retaining only the relevant features of a real-time dataset and eliminating the irrelevant ones. The proposed model uses Hyperparameter Optimized Artificial Neural Networks (H-ANN) combined with a Hybrid Firefly and Grey Wolf Optimization algorithm (H-FFGWO) to detect and block phishing websites in Internet of Things (IoT) applications. In this paper, H-FFGWO is used for feature selection from the phishing datasets ISCX-URL, OpenPhish, the UCI Machine Learning Repository, the Mendeley website dataset, and PhishTank. The results show that the proposed model achieves an accuracy of 98.07%, a recall of 98.04%, a precision of 98.43%, and an F1-score of 98.24%.
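
As a quick consistency check on the metrics quoted above, the F1-score can be recomputed as the harmonic mean of precision and recall. This is only a sketch of the standard formula; the function name `f1_score` is illustrative and is not taken from the paper.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both in the same units)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall percentages as quoted in the abstract.
f1 = f1_score(98.43, 98.04)
print(f"F1 = {f1:.2f}%")
```

The recomputed value agrees with the reported 98.24% to within the rounding of the two inputs.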

Corporate Characteristics and Occupational Injuries by Industry

  • Sunyoung Park;Myung-Joong Kim
    • Safety and Health at Work
    • /
    • Vol. 14, No. 3
    • /
    • pp.259-266
    • /
    • 2023
  • Background: Recent research on occupational injuries in companies has struggled to obtain representative data, so studies have relied on surveys or case studies. Moreover, few studies examine how a company's industry characteristics affect occupational injuries. This study aims to address these limitations. Methods: We collected 11 years of disclosure data from 1,247 companies listed on the Korean stock market and combined them with occupational injury histories collected by the Republic of Korea Occupational Safety and Health Agency (KOSHA) to build a dataset. We estimated a linear panel model after dividing the dataset into manufacturing, construction, and other industries. Results: A higher proportion of full-time employees and better job skills correlate with fewer occupational injuries in other industries. Wage increases reduce occupational injuries in manufacturing and other industries, but the substitution effect produces the opposite outcome in construction. Increases in foreign ownership and credit ratings also reduce occupational injuries, mainly in the manufacturing industry. Conclusion: Our results suggest that explaining the relationship between corporate characteristics and occupational injuries requires closer consideration of the nature of the industry; in particular, employment and labor policies for preventing occupational injuries need to be applied selectively by industry. Further detailed studies are needed to address the limitations and increase the usability of these results.

Improve the Performance of Semi-Supervised Side-channel Analysis Using HWFilter Method

  • Hong Zhang;Lang Li;Di Li
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • Vol. 18, No. 3
    • /
    • pp.738-754
    • /
    • 2024
  • Side-channel analysis (SCA) is a cryptanalytic technique that exploits physical leakages, such as power consumption or electromagnetic emanations, from cryptographic devices to extract the secret keys used in cryptographic algorithms. Recent studies have shown that training SCA models with semi-supervised learning can effectively overcome the problem of having few labeled power traces. However, training SCA models with semi-supervised learning generates many pseudo-labels, and some of these pseudo-labels can reduce the performance of the SCA model. To solve this issue, we propose the HWFilter method to improve semi-supervised SCA. This method uses a Hamming Weight Pseudo-label Filter (HWPF) to filter the pseudo-labels generated by the semi-supervised SCA model, which enhances the model's performance. Furthermore, we introduce a normal distribution method for constructing the HWPF. In this method, the Hamming weights (HWs) of power traces are obtained from the normal distribution of power points. These HWs are filtered and combined into an HWPF. The HWFilter was tested using the ASCADv1 database and the AES_HD dataset. The experimental results demonstrate that the HWFilter method can significantly enhance the performance of semi-supervised SCA models. On the ASCADv1 database, the model with HWFilter requires only 33 power traces to recover the key. On the AES_HD dataset, the model with HWFilter outperforms the current best semi-supervised SCA model by 12%.
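
The filtering idea described above can be sketched in a few lines. This is a minimal illustration, assuming that a pseudo-label is kept only when its Hamming weight equals an independently estimated Hamming weight for the same trace. The function names and the hand-written `estimated_hws` list are hypothetical; in the paper the Hamming weights come from a normal distribution of power points, which is not reproduced here.

```python
def hamming_weight(value: int) -> int:
    """Number of 1 bits in a non-negative integer."""
    return bin(value).count("1")


def filter_pseudo_labels(pseudo_labels, estimated_hws):
    """Return indices of traces whose pseudo-label agrees with the estimated HW."""
    return [
        i
        for i, (label, hw) in enumerate(zip(pseudo_labels, estimated_hws))
        if hamming_weight(label) == hw
    ]


# Hypothetical byte-valued pseudo-labels and per-trace HW estimates.
pseudo_labels = [0x0F, 0xA1, 0xFF, 0x10]
estimated_hws = [4, 2, 8, 1]

kept = filter_pseudo_labels(pseudo_labels, estimated_hws)
print(kept)  # index 1 is dropped: HW(0xA1) is 3, not 2
```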

Human Action Recognition Using Pyramid Histograms of Oriented Gradients and Collaborative Multi-task Learning

  • Gao, Zan;Zhang, Hua;Liu, An-An;Xue, Yan-Bing;Xu, Guang-Ping
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • Vol. 8, No. 2
    • /
    • pp.483-503
    • /
    • 2014
  • This paper proposes human action recognition using pyramid histograms of oriented gradients and collaborative multi-task learning. First, we accumulate global activities and construct a motion history image (MHI) for the RGB and depth channels separately to encode the dynamics of an action in different modalities; different action descriptors are then extracted from the depth and RGB MHIs to represent the global textural and structural characteristics of the actions. Specifically, the average value in hierarchical blocks, GIST, and pyramid histograms of oriented gradients descriptors are employed to represent human motion. To demonstrate the superiority of the proposed method, we evaluate it with KNN, SVM with linear and RBF kernels, SRC, and CRC models on the DHA dataset, a well-known dataset for human action recognition. Large-scale experimental results show that our descriptors are robust, stable, and efficient, and that they outperform state-of-the-art methods. In addition, we further investigate the descriptors by combining them on the DHA dataset and observe that the combined descriptors perform much better than any single descriptor. With multimodal features, we also propose a collaborative multi-task learning method for model learning and inference based on transfer learning theory. The main contributions lie in four aspects: 1) the proposed encoding scheme can filter out the stationary parts of the human body and reduce noise interference; 2) different kinds of features and models are assessed, and neighboring gradient information and pyramid layers prove very helpful for representing these actions; 3) the proposed model can fuse features from different modalities regardless of sensor type, value range, and feature dimension; 4) the latent common knowledge among different modalities can be discovered by transfer learning to boost performance.
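
For readers unfamiliar with motion history images, the sketch below shows the commonly used MHI update rule: a pixel that moved is set to a maximum value `tau`, and every other pixel decays by one per frame. It is an assumption that the paper uses this form; `tau`, `threshold`, and the tiny single-row frames are illustrative values only.

```python
def update_mhi(mhi, prev_frame, frame, tau=5, threshold=10):
    """One MHI step: moving pixels are set to tau, all others decay toward 0."""
    rows, cols = len(frame), len(frame[0])
    updated = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            moved = abs(frame[r][c] - prev_frame[r][c]) > threshold
            updated[r][c] = tau if moved else max(0, mhi[r][c] - 1)
    return updated


# Three hypothetical 1x3 grayscale frames: motion sweeps left to right.
frames = [[[0, 0, 0]], [[50, 0, 0]], [[50, 50, 0]]]

mhi = [[0, 0, 0]]
for prev_frame, frame in zip(frames, frames[1:]):
    mhi = update_mhi(mhi, prev_frame, frame)

print(mhi)  # [[4, 5, 0]]: the older motion has faded by one step
```

Because stationary pixels never exceed zero, an image built this way keeps only the moving parts of the scene, which is consistent with the first contribution listed in the abstract.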

Predictive Optimization Adjusted With Pseudo Data From A Missing Data Imputation Technique

  • 김정우
    • 한국산학기술학회논문지
    • /
    • Vol. 20, No. 2
    • /
    • pp.200-209
    • /
    • 2019
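  • When predicting future values, a model estimated by minimizing the training error can often produce a large test error. This is the overfitting problem: the estimated model concentrates only on the given dataset and becomes overly complex. Some regularization and resampling methods have been introduced to mitigate this problem and reduce test error, but these methods are also designed to work only within the given dataset. This paper proposes a new optimization method that reduces test error by transforming the test error minimization problem into a training error minimization problem. To carry out this transformation, new data, called pseudo data, are added to the given dataset. Three types of missing data imputation are used to generate suitable pseudo data. Linear regression, autoregressive, and ridge regression models are used as prediction models, and the pseudo data method is applied to each. Applications of the pseudo-data-adjusted optimization method to environmental and financial data are also presented. The results show that the proposed method reduces test error compared with the original prediction models.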
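
The general idea of fitting a model on observed data plus imputed pseudo data can be illustrated with a toy example. This is a rough sketch under strong assumptions: the unseen targets are filled in by mean imputation (one simple imputation technique, not necessarily one of the three used in the paper), the model is a one-variable least-squares line, and all numbers are made up.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - slope * mx, slope


def mean_impute(observed_ys, n_missing):
    """Fill n_missing unknown targets with the mean of the observed targets."""
    return [mean(observed_ys)] * n_missing


# Hypothetical training data and two future inputs whose targets are unknown.
xs, ys = [1, 2, 3, 4], [2.0, 4.1, 5.9, 8.0]
pseudo_xs = [5, 6]
pseudo_ys = mean_impute(ys, len(pseudo_xs))

_, slope_plain = fit_line(xs, ys)
_, slope_pseudo = fit_line(xs + pseudo_xs, ys + pseudo_ys)
print(slope_plain, slope_pseudo)
```

In this toy case the pseudo data pull the fitted slope toward zero, much as a regularizer would; whether that lowers test error depends on the data and on the imputation method chosen.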

A Study of Unified Framework with Light Weight Artificial Intelligence Hardware for Broad range of Applications

  • 전석훈;이재학;한지수;김병수
    • 한국전자통신학회논문지
    • /
    • Vol. 14, No. 5
    • /
    • pp.969-976
    • /
    • 2019
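  • Lightweight artificial intelligence (AI) hardware takes multimodal sensor data as input to solve a variety of problems, performs feature selection, extraction, dimensionality reduction, and normalization, and then derives prediction results with an AI engine. To achieve high performance across diverse applications, the hyperparameters of such hardware and the configuration of the overall preprocessing system need to be optimized for the data. This paper proposes a unified framework for the efficient control and optimization of lightweight AI hardware. The proposed framework can flexibly reconfigure the data preprocessing and the neuromorphic-based lightweight AI engine, and it can generate an optimal model. For functional verification, a handwritten image dataset and a fall detection dataset based on inertial sensor data were used; the experiments confirmed that the proposed framework generates optimal models with an accuracy of at least 90% on each dataset.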

IoT botnet attack detection using deep autoencoder and artificial neural networks

  • Deris Stiawan;Susanto;Abdi Bimantara;Mohd Yazid Idris;Rahmat Budiarto
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • Vol. 17, No. 5
    • /
    • pp.1310-1338
    • /
    • 2023
  • As Internet of Things (IoT) applications and devices grow rapidly, cyber-attacks on IoT networks and systems are also increasing, raising the threat to security and privacy. Botnets are among the dominant threats because they can easily compromise devices attached to IoT networks and systems. Compromised devices behave like normal ones, which makes them difficult to recognize. Several intelligent approaches, including deep learning and machine learning techniques, have been introduced to improve the detection accuracy for this type of cyber-attack. Moreover, dimensionality reduction methods are applied during the preprocessing stage. This work proposes a deep Autoencoder dimensionality reduction method combined with an Artificial Neural Network (ANN) classifier as a botnet detection system for IoT networks and systems. Experiments were carried out using 3-layer, 4-layer, and 5-layer pre-processing of data from the MedBIoT dataset. The experimental results show that the 5-layer Autoencoder performs best, with an accuracy of 99.72%, precision of 99.82%, sensitivity of 99.82%, specificity of 99.31%, and F1-score of 99.82%. In addition, the 5-layer Autoencoder model reduced the dataset size from 152 MB to 12.6 MB (equivalent to a reduction of 91.2%). Experiments on the N_BaIoT dataset also yielded very high accuracy, up to 99.99%.
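
The evaluation measures named above are all derived from the four cells of a binary confusion matrix. The sketch below uses the standard definitions; the counts are hypothetical and are not the paper's results.

```python
def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Standard detection metrics from confusion-matrix counts."""
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "precision": tp / (tp + fp),
        "sensitivity": tp / (tp + fn),  # true positive rate (recall)
        "specificity": tn / (tn + fp),  # true negative rate
    }


# Hypothetical counts: 100 botnet flows and 100 benign flows.
metrics = binary_metrics(tp=90, fp=5, tn=95, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Sensitivity measures how many attacks are caught, while specificity measures how many benign flows are left alone, which is why detection papers usually report both alongside accuracy.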

Output Power Prediction of Combined Cycle Power Plant using Logic-based Tree Structured Fuzzy Neural Networks

  • 한창욱;이돈규
    • 전기전자학회논문지
    • /
    • Vol. 23, No. 2
    • /
    • pp.529-533
    • /
    • 2019
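  • Combined cycle power plants are now widely used for electricity generation, and predicting power output from operating parameters has recently become a major concern. This paper presents a method that uses computational intelligence techniques to predict the output of a combined cycle power plant. Computational intelligence techniques have been developed continuously and applied to many real-world problems. In this paper, a tree-structured fuzzy neural network is used to predict power output. The tree-structured fuzzy neural network has the advantage of reducing the number of rules by using fuzzy neurons as nodes and optimally selecting the relevant inputs. A two-step optimization method is used to optimize the network. A genetic algorithm first optimizes the binary structure of the network by selecting the optimal nodes and leaves; random signal-based learning then fine-tunes the optimized binary connections within the unit interval. To verify the usefulness of the proposed method, combined cycle power plant data obtained from the UCI Machine Learning Repository database are used.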
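
Logic-based fuzzy neural networks are typically built from AND and OR fuzzy neurons. The sketch below is an assumption about what such nodes look like, using the product as the t-norm and the probabilistic sum as the s-norm; the abstract does not state which operators or neuron definitions the authors use, and all names here are illustrative.

```python
from functools import reduce


def t_norm(a: float, b: float) -> float:
    """Product t-norm (fuzzy AND)."""
    return a * b


def s_norm(a: float, b: float) -> float:
    """Probabilistic sum s-norm (fuzzy OR)."""
    return a + b - a * b


def and_neuron(inputs, weights):
    """AND neuron: t-norm over (x OR w); a weight of 1 masks its input."""
    return reduce(t_norm, (s_norm(x, w) for x, w in zip(inputs, weights)), 1.0)


def or_neuron(inputs, weights):
    """OR neuron: s-norm over (x AND w); a weight of 0 masks its input."""
    return reduce(s_norm, (t_norm(x, w) for x, w in zip(inputs, weights)), 0.0)


inputs = [0.8, 0.3]  # membership degrees in [0, 1]
print(and_neuron(inputs, [0.0, 1.0]))  # only the first input is connected
print(or_neuron(inputs, [1.0, 0.0]))   # only the first input is connected
```

With weights restricted to 0 or 1 the connections are purely structural, which is the kind of binary structure a genetic algorithm can search; letting the weights move within the unit interval afterwards corresponds to the fine-tuning step mentioned in the abstract.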

Semantic Segmentation of Drone Images Based on Combined Segmentation Network Using Multiple Open Datasets

  • 송아람
    • 대한원격탐사학회지
    • /
    • Vol. 39, No. 5_3
    • /
    • pp.967-978
    • /
    • 2023
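  • This study proposes and validates a combined segmentation network (CSN) that learns effectively from diverse drone image datasets to improve semantic segmentation accuracy. To account for the diversity of three drone datasets, CSN shares the entire encoding part, while the decoding parts are trained independently. Because CSN considers the loss values of all datasets during training, its training efficiency was lower than that of training on a single dataset with U-Net or the pyramid scene parsing network (PSPNet). However, when CSN was applied to domestic (Korean) autonomous-driving drone imagery, it was confirmed that, compared with PSPNet, CSN could classify the pixels in the images into appropriate classes even without initial training. This study suggests that CSN can serve as an important tool for learning effectively from diverse drone image datasets and for improving object recognition accuracy in new regions.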

Possibility of the Use of Public Microarray Database for Identifying Significant Genes Associated with Oral Squamous Cell Carcinoma

  • Kim, Ki-Yeol;Cha, In-Ho
    • Genomics & Informatics
    • /
    • Vol. 10, No. 1
    • /
    • pp.23-32
    • /
    • 2012
  • Many studies have attempted to identify expression changes in oral squamous cell carcinoma, but most include too few samples to apply statistical methods for detecting significant gene sets. This study combined two small microarray datasets from a public database and identified significant genes associated with the progression of oral squamous cell carcinoma. The two datasets had different expression scales even though they were generated on the same platform, Affymetrix U133A gene chips. To detect more reliable information, we discretized the gene expression values of the two datasets, adjusting for the differences between them. From the combination of the two datasets, we detected 51 significant genes that were upregulated in oral squamous cell carcinoma. Most of them have been reported in previous studies as cancer-related genes, including in oral squamous cell carcinoma. From these selected genes, significant genetic pathways associated with expression changes were identified. By combining several datasets from a public database, sufficient samples can be obtained for detecting reliable information. Several previously unknown genes can be biologically evaluated in further studies.
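
One simple way to make datasets with different expression scales comparable is to discretize each dataset against its own distribution before pooling. The sketch below assumes a median split per dataset; the abstract does not specify the discretization rule the authors applied, and the expression values are invented for illustration.

```python
from statistics import median


def discretize(values):
    """Code each value as 1 if above its own dataset's median, else 0."""
    cutoff = median(values)
    return [1 if v > cutoff else 0 for v in values]


# Hypothetical expression values for one gene measured in two datasets
# whose scales differ by roughly two orders of magnitude.
dataset_a = [2.1, 2.5, 7.9, 8.3]
dataset_b = [210.0, 260.0, 800.0, 820.0]

combined = discretize(dataset_a) + discretize(dataset_b)
print(combined)  # [0, 0, 1, 1, 0, 0, 1, 1]
```

After discretization the pooled samples share a common coding, so a single statistical test can be run on the combined set without the scale difference dominating the result.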