• Title/Summary/Keyword: Preprocessing Process

Vulnerability Threat Classification Based on XLNET AND ST5-XXL model

  • Chae-Rim Hong;Jin-Keun Hong
    • International Journal of Internet, Broadcasting and Communication
    • /
    • v.16 no.3
    • /
    • pp.262-273
    • /
    • 2024
  • We provide a detailed analysis of the data processing and model training process for vulnerability classification using Transformer-based language models, specifically sentence text-to-text transformers (ST5)-XXL and XLNet. The main purpose of this study is to compare the performance of the two models, identify the strengths and weaknesses of each, and determine the optimal learning rate to increase the efficiency and stability of model training. We performed data preprocessing, constructed and trained the models, and evaluated performance on data sets with various characteristics. We confirmed that the XLNet model performed well at learning rates of 1e-05 and 1e-04 and had a significantly lower loss value than the ST5-XXL model, which indicates that XLNet learns more efficiently. We also confirmed that the learning rate has a significant impact on model performance. The results highlight the usefulness of the ST5-XXL and XLNet models for classifying security vulnerabilities and the importance of setting an appropriate learning rate. Future research should include more comprehensive analyses using diverse data sets and additional models.

Machine learning-based evaluation technology of 3D spatial distribution of residual radioactivity in large-scale radioactive structures

  • UkJae Lee;Phillip Chang;Nam-Suk Jung;Jonghun Jang;Jimin Lee;Hee-Seock Lee
    • Nuclear Engineering and Technology
    • /
    • v.56 no.8
    • /
    • pp.3199-3209
    • /
    • 2024
  • During the decommissioning of nuclear and particle accelerator facilities, a considerable amount of large-scale radioactive waste may be generated. Accurately defining the activation level of the waste is crucial for proper disposal. However, directly measuring the internal radioactivity distribution poses challenges. This study introduces a novel technology that uses machine learning to assess the internal radioactivity distribution from external measurements. Random radioactivity distributions within a structure were established, and the photon spectrum measured by detectors outside the structure was simulated using the FLUKA Monte Carlo code. An evaluation model for radioactivity was then developed by training on the simulated spectrum data corresponding to the various radioactivity distributions. Convolutional Neural Network and Transformer methods were used to build the evaluation model. The machine learning construction involved 5425 simulation datasets, and 603 datasets were used to obtain the evaluation results. Preprocessing was applied to the datasets, but the evaluation model using raw spectrum data showed the best evaluation results. The intensity and shape of the radioactivity distribution inside the structure were estimated with a relative error of 10%. Additionally, evaluation with the constructed model takes only a few seconds.

Ultrasonographic Analysis of the Size and Shape of the Muscles (근육의 크기와 형태의 초음파적 분석)

  • Kim, Kwang-Baek
    • Journal of the Institute of Electronics Engineers of Korea CI
    • /
    • v.48 no.2
    • /
    • pp.9-15
    • /
    • 2011
  • In this paper, we propose a method to extract the external oblique abdominal muscle from ultrasound images, which previous methods often fail to extract because of image distortion. In the preprocessing phase of the proposed method, we remove noise from the initial ultrasound images and then enhance the brightness contrast with the Ends-in search stretching algorithm. We then apply average binarization in the vertical direction to extract candidate fascia areas. After removing non-fascia areas using morphological characteristics, the parts of the fascia lost during this process are restored using the same characteristic information together with location information. The skin area is then removed using information from the arc that appears in convex imaging, and the candidate muscle areas are extracted by overlapping the two results of a two-way up-down search algorithm. A further noise removal step determines the muscle area. If the result is unclear, the muscle area is restored by a smearing method and the muscle thickness is then measured by the min square method. The experiment verifies that the proposed method is more effective than previously used methods for analyzing the size and shape of abdominal muscles in ultrasonography.
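
The contrast-enhancement step mentioned in this abstract can be illustrated with a minimal sketch of ends-in search stretching as it is commonly described: values below a low threshold are clipped to 0, values above a high threshold are clipped to 255, and the rest are stretched linearly. The function name `ends_in_stretch` and the thresholds `low` and `high` are assumptions for illustration; the abstract does not say how the authors choose them.

```python
def ends_in_stretch(image, low, high):
    """Contrast-stretch a grayscale image given as a list of rows of 0-255 ints.

    Values at or below `low` become 0, values at or above `high` become 255,
    and values in between are scaled linearly onto 0-255.
    `low` and `high` are hypothetical thresholds chosen by the caller.
    """
    if high <= low:
        raise ValueError("high must be greater than low")
    scale = 255 / (high - low)
    return [
        [0 if p <= low else 255 if p >= high else round((p - low) * scale)
         for p in row]
        for row in image
    ]


# Toy 2x3 "image"; thresholds picked by hand for the example.
image = [[40, 60, 80], [100, 120, 140]]
stretched = ends_in_stretch(image, low=60, high=120)
print(stretched)
```

In practice the two thresholds are often taken from histogram percentiles so that a fixed fraction of pixels saturates at each end.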

Simple Frame Marker: Implementation of In-Marker Image and Character Recognition and Tracking Method (심플 프레임 마커: 마커 내부 이미지 및 문자 패턴의 인식 및 추적 기법 구현)

  • Kim, Hye-Jin;Woo, Woon-Tack
    • 한국HCI학회:학술대회논문집
    • /
    • 2009.02a
    • /
    • pp.558-561
    • /
    • 2009
  • In this paper, we propose the Simple Frame Marker (SFMarker) to support recognition of characters and images embedded in a marker in augmented reality. If characters are inserted inside the marker and recognized using Optical Character Recognition (OCR), no marker learning process is needed before execution. Because characters are familiar to users, the marker also causes less visual disturbance than a 2D barcode marker. The proposed SFMarker distinguishes the Square SFMarker, which embeds images, from the Rectangle SFMarker, which embeds characters, according to the aspect ratio of the marker, and applies a different recognition algorithm to each. In addition, to reduce the preprocessing needed for character recognition, SFMarker embeds direction information in the marker border and extracts it so that character recognition runs quickly and correctly. Finally, since running character recognition on every frame slows down tracking, we speed up recognition by reusing the character recognition result from the previous frame when the frame difference is low.
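
The last idea in this abstract, reusing the previous recognition result when the frame difference is low, can be sketched as a small cache around a recognizer. Everything here is an assumption for illustration: the class `CachedRecognizer`, the mean-absolute-difference measure, the threshold value, and the stand-in `fake_ocr` function are hypothetical and are not taken from the paper. This sketch compares each frame with the last frame that was actually recognized, so slow drift eventually triggers a fresh recognition.

```python
def mean_abs_diff(frame_a, frame_b):
    """Mean absolute difference between two equal-length grayscale frames."""
    return sum(abs(a - b) for a, b in zip(frame_a, frame_b)) / len(frame_a)


class CachedRecognizer:
    """Wraps a recognizer and skips it when the frame has barely changed."""

    def __init__(self, recognize, threshold):
        self.recognize = recognize      # expensive function: frame -> result
        self.threshold = threshold      # hypothetical tuning parameter
        self.ref_frame = None           # last frame that was fully recognized
        self.ref_result = None
        self.calls = 0                  # how often the recognizer really ran

    def process(self, frame):
        if (self.ref_frame is not None
                and mean_abs_diff(frame, self.ref_frame) < self.threshold):
            return self.ref_result      # low difference: reuse cached result
        self.ref_frame = frame
        self.ref_result = self.recognize(frame)
        self.calls += 1
        return self.ref_result


def fake_ocr(frame):
    """Stand-in for a real OCR call; labels a frame by its first pixel."""
    return "A" if frame[0] < 128 else "B"


rec = CachedRecognizer(fake_ocr, threshold=5.0)
frames = [[10, 10, 10, 10], [11, 10, 12, 10], [200, 200, 200, 200]]
results = [rec.process(f) for f in frames]
print(results, rec.calls)
```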

A Study on the Method for Extracting the Purpose-Specific Customized Information from Online Product Reviews based on Text Mining (텍스트 마이닝 기반의 온라인 상품 리뷰 추출을 통한 목적별 맞춤화 정보 도출 방법론 연구)

  • Kim, Joo Young;Kim, Dong soo
    • The Journal of Society for e-Business Studies
    • /
    • v.21 no.2
    • /
    • pp.151-161
    • /
    • 2016
  • In the era of Web 2.0, characterized by openness, sharing, and participation, it is easy for internet users to produce and share data. The amount of unstructured data, which accounts for most of the digital world's data, has increased exponentially. Personal online product reviews, one kind of unstructured data, are necessary both for the companies that produce the products and for potential customers who are interested in them. Extracting useful information from large amounts of scattered review data requires a process of collecting, storing, preprocessing, and analyzing the data and drawing conclusions. We therefore introduce a text-mining methodology that applies natural language processing technology to text data such as product reviews in order to extract structured data using R programming. We also introduce data mining to derive purpose-specific customized information from the structured review information produced by the text mining.

Text extraction in images using simplify color and edges pattern analysis (색상 단순화와 윤곽선 패턴 분석을 통한 이미지에서의 글자추출)

  • Yang, Jae-Ho;Park, Young-Soo;Lee, Sang-Hun
    • Journal of the Korea Convergence Society
    • /
    • v.8 no.8
    • /
    • pp.33-40
    • /
    • 2017
  • In this paper, we propose a text extraction method based on contour pattern analysis for effective text detection in images. Text extraction algorithms that use edge-based methods perform well on images with simple backgrounds but poorly on images with complex backgrounds. The proposed method simplifies the colors of the image using K-means clustering in the preprocessing step in order to detect the character regions in the image. It then enhances object boundaries with a high-pass filter to compensate for the boundary inaccuracy introduced by the color simplification. Next, the edges of objects are detected using the difference between morphological dilation and erosion, and character candidate regions are identified by analyzing the contour patterns of the acquired regions so that unnecessary regions (pictures, background) can be removed. As a final result, we show that the characters in the candidate character regions are extracted once the unnecessary regions have been removed.
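
The edge-detection step described here, the difference between dilation and erosion, is usually called the morphological gradient. A minimal pure-Python sketch follows, assuming a grayscale image stored as a list of rows and a 3x3 square structuring element with the window clipped at the borders; the structuring element and border handling are assumptions, since the abstract does not specify them.

```python
def _window(image, r, c):
    """Pixel values in the 3x3 neighborhood of (r, c), clipped at the borders."""
    height, width = len(image), len(image[0])
    return [image[i][j]
            for i in range(max(0, r - 1), min(height, r + 2))
            for j in range(max(0, c - 1), min(width, c + 2))]


def morphological_gradient(image):
    """Dilation (local max) minus erosion (local min) at every pixel."""
    gradient = []
    for r in range(len(image)):
        row = []
        for c in range(len(image[0])):
            values = _window(image, r, c)
            row.append(max(values) - min(values))
        gradient.append(row)
    return gradient


# A vertical step edge: dark left half, bright right half.
image = [[0, 0, 255, 255] for _ in range(4)]
edges = morphological_gradient(image)
for row in edges:
    print(row)
```

Flat regions give zero because the local maximum and minimum coincide, while pixels next to the step respond strongly, which is why the result outlines object boundaries.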

Item Recommendation Technique Using Spark (Spark를 이용한 항목 추천 기법에 관한 연구)

  • Yun, So-Young;Youn, Sung-Dae
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.22 no.5
    • /
    • pp.715-721
    • /
    • 2018
  • With the spread of mobile devices, the number of users of social network services and e-commerce sites has increased dramatically, and the amount of data they produce has increased exponentially. E-commerce companies face the task of extracting useful information from the vast amount of data produced by users. To solve this problem, various studies have applied big data processing techniques. In this paper, we propose a collaborative filtering method that applies tag weights on the Apache Spark platform. To improve recommendation accuracy, the proposed method refines the tag data in the preprocessing step, categorizes the items, and then applies period information and tag weights to the estimated ratings of the items. After generating an RDD, we calculate item similarity and prediction values and recommend items to users. The experimental results indicate that the proposed method processes large amounts of data quickly and improves the appropriateness of recommendations.
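
The "item similarity and prediction values" step can be illustrated with plain item-based collaborative filtering. This is a simplified sketch under stated assumptions: it uses cosine similarity and a similarity-weighted average, runs in ordinary Python rather than on Spark RDDs, and leaves out the tag weights and period information that the paper adds. The function names and the toy ratings are hypothetical.

```python
import math


def invert(ratings):
    """Turn {user: {item: rating}} into {item: {user: rating}}."""
    items = {}
    for user, rated in ratings.items():
        for item, value in rated.items():
            items.setdefault(item, {})[user] = value
    return items


def item_similarity(a, b):
    """Cosine similarity between two item vectors keyed by user."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[u] * b[u] for u in common)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)


def predict(ratings, user, target):
    """Similarity-weighted average of the user's ratings on other items."""
    items = invert(ratings)
    target_vector = items.get(target, {})
    numerator = denominator = 0.0
    for item, value in ratings[user].items():
        if item == target:
            continue
        sim = item_similarity(target_vector, items[item])
        numerator += sim * value
        denominator += abs(sim)
    return numerator / denominator if denominator else None


# Toy data: predict how user "u1" would rate item "c".
ratings = {
    "u1": {"a": 5, "b": 4},
    "u2": {"a": 4, "b": 5, "c": 2},
    "u3": {"a": 1, "c": 5},
}
prediction = predict(ratings, "u1", "c")
print(prediction)
```

On Spark, the same computation would typically be expressed as transformations over distributed collections; the sketch above only shows the arithmetic.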

Shadow Removal based on Chromaticity and Brightness Distortion for Effective Moving Object Tracking (효과적인 이동물체 추적을 위한 색도와 밝기 왜곡 기반의 그림자 제거)

  • Kim, Yeon-Hee;Kim, Jae-Ho;Kim, Yoon-Ho
    • The Journal of Korea Institute of Information, Electronics, and Communication Technology
    • /
    • v.8 no.4
    • /
    • pp.249-256
    • /
    • 2015
  • Shadows are a common physical phenomenon in natural images and may cause problems in computer vision tasks. Shadow removal is therefore an essential preprocessing step for effective moving object tracking in video. In this paper, we propose a shadow removal algorithm that uses chromaticity, brightness distortion, and the direction of the shadow candidate. The proposed method consists of two steps. The first step removes the primary shadow candidate region using chromaticity and brightness distortion. The second step takes the final shadow candidate region and obtains a direction feature of the shadow, which is estimated by a thinning algorithm after calculating the lowest pixel position of the moving object. To verify the proposed approach, experiments were conducted to compare it with a conventional method. The experimental results show that the proposed method is simple yet robust and well suited to shadow removal.
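
The abstract does not give formulas for its chromaticity and brightness distortion measures. The sketch below uses one widely used formulation as an assumption: brightness distortion is the scale factor that best maps the background color onto the observed pixel, and chromaticity distortion is the distance left over after that scaling. A pixel that is darker than the background but has nearly the same chromaticity is treated as a shadow candidate. The thresholds `alpha_low`, `alpha_high`, and `cd_max` are hypothetical values, not taken from the paper.

```python
import math


def distortions(pixel, background):
    """Return (brightness distortion, chromaticity distortion) for RGB tuples.

    Brightness distortion alpha minimizes |pixel - alpha * background|^2;
    chromaticity distortion is the residual distance at that alpha.
    """
    energy = sum(b * b for b in background)
    if energy == 0:
        raise ValueError("background pixel must not be pure black")
    alpha = sum(p * b for p, b in zip(pixel, background)) / energy
    residual = math.sqrt(sum((p - alpha * b) ** 2
                             for p, b in zip(pixel, background)))
    return alpha, residual


def is_shadow_candidate(pixel, background,
                        alpha_low=0.4, alpha_high=0.95, cd_max=10.0):
    """Darker than the background, with almost unchanged chromaticity."""
    alpha, residual = distortions(pixel, background)
    return alpha_low <= alpha <= alpha_high and residual <= cd_max


background = (200, 180, 160)
shadowed = (120, 108, 96)      # same color at 60% brightness
foreground = (60, 180, 40)     # a different color altogether

print(is_shadow_candidate(shadowed, background))
print(is_shadow_candidate(foreground, background))
```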

Personalized Mobile Junk Message Filtering System (사용자 맞춤형 스팸 문자 필터링 시스템)

  • Lee, Seung-Jae;Choi, Deok-Jai
    • The Journal of the Korea Contents Association
    • /
    • v.11 no.12
    • /
    • pp.122-135
    • /
    • 2011
  • Mobile spam messages annoy recipients and impose unnecessary social costs. Unwanted junk messages arriving on a smartphone undermine the main purpose of a smart work system, which is to enhance productivity, so this area needs study. In this paper, we propose a novel spam filter for the smartphone that reduces computation and improves accuracy by feeding misclassified results back into the training sample set. Because the spam classifier operates independently on the smartphone and is trained only on the user's received data, it can reflect user preferences. An authorized personal computer handles the heavy work, such as preprocessing, feature selection, and training, while the smartphone handles the light work of blocking junk messages. Experimental results showed an accuracy rate of over 95%, and we found that the application used a constant amount of computing resources while running on the phone.

Real-time Volume Rendering using Point-Primitive (포인트 프리미티브를 이용한 실시간 볼륨 렌더링 기법)

  • Kang, Dong-Soo;Shin, Byeong-Seok
    • Journal of Korea Multimedia Society
    • /
    • v.14 no.10
    • /
    • pp.1229-1237
    • /
    • 2011
  • Volume ray-casting is a direct volume rendering method that produces high-quality images and can handle semi-transparent objects. Although volume ray-casting produces high-quality images by sampling in the region of interest, its rendering speed is slow because color acquisition requires repeated memory references and accumulation of sample values. GPU-based acceleration techniques have recently been introduced, but they require preprocessing or additional memory. In this paper, we propose an efficient point-primitive-based method that avoids the complicated computation of GPU ray-casting. It renders semi-transparent objects, yet it requires neither preprocessing nor additional memory. Our method is fast because it generates point primitives from the volume dataset during sampling and projects them onto the image plane. It can also easily cope with OTF changes because point primitives can be added or deleted in real time.