• Title/Summary/Keyword: automatic processing


Prerequisite Research for the Development of an End-to-End System for Automatic Tooth Segmentation: A Deep Learning-Based Reference Point Setting Algorithm (자동 치아 분할용 종단 간 시스템 개발을 위한 선결 연구: 딥러닝 기반 기준점 설정 알고리즘)

  • Kyungdeok Seo;Sena Lee;Yongkyu Jin;Sejung Yang
    • Journal of Biomedical Engineering Research / v.44 no.5 / pp.346-353 / 2023
  • In this paper, we propose an approach that uses deep learning to find optimal reference points for precise tooth segmentation in three-dimensional tooth point cloud data. A dataset of 350 aligned maxillary and mandibular point clouds was used as input, and the coordinates of both ends of each tooth were used as ground truth. A two-dimensional image was created by projecting the rendered point cloud data along the Z-axis, and images of individual teeth were then extracted using an object detection algorithm. The proposed algorithm adds several modules to the U-Net model that allow effective learning over a narrow range, and it detects both end points of each tooth from the generated tooth image. In an evaluation using DSC, Euclidean distance, and MAE as metrics, it achieved superior performance compared to other U-Net-based models. In future research, we will develop an algorithm that finds the reference points in the point cloud by back-projecting the reference points detected in the image into three dimensions, and, based on this, an algorithm that separates individual teeth in the point cloud using image processing techniques.

F_MixBERT: Sentiment Analysis Model using Focal Loss for Imbalanced E-commerce Reviews

  • Fengqian Pang;Xi Chen;Letong Li;Xin Xu;Zhiqiang Xing
    • KSII Transactions on Internet and Information Systems (TIIS) / v.18 no.2 / pp.263-283 / 2024
  • Users' comments after online shopping are critical to product reputation and business improvement. These comments, often called e-commerce reviews, influence other customers' purchasing decisions. To cope with the large volume of e-commerce reviews, automatic analysis based on machine learning and deep learning is drawing increasing attention, and sentiment analysis is a core task within it. However, e-commerce reviews exhibit the following characteristics: (1) inconsistency between the comment content and the star rating; (2) a large amount of unlabeled data, i.e., comments without a star rating; and (3) data imbalance caused by the scarcity of negative comments. This paper employs Bidirectional Encoder Representations from Transformers (BERT), one of the best natural language processing models, as the base model. Based on these data characteristics, we propose the F_MixBERT framework to use inconsistent, low-quality, and unlabeled data more effectively and to resolve the data imbalance problem. In the framework, the proposed MixBERT incorporates the MixMatch approach into BERT's high-dimensional vectors to train on the unlabeled and low-quality data with generated pseudo labels. Meanwhile, data imbalance is addressed with focal loss, which reduces the contribution of majority-class data and easily identifiable data to the total loss. Comparative experiments demonstrate that the proposed framework outperforms BERT and MixBERT for sentiment analysis of e-commerce comments.
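
The role of focal loss described in this abstract can be illustrated with a minimal sketch. It uses the standard focal loss form, FL = -alpha * (1 - p_t)^gamma * log(p_t); the abstract does not state the authors' exact formulation, so the function names and the `gamma` and `alpha` defaults below are assumptions.

```python
import math


def cross_entropy(probs, target):
    """Plain cross-entropy for one example (probs: class probabilities)."""
    p_t = min(max(probs[target], 1e-12), 1.0)  # clamp to avoid log(0)
    return -math.log(p_t)


def focal_loss(probs, target, gamma=2.0, alpha=1.0):
    """Standard focal loss for one example.

    gamma and alpha are illustrative defaults (assumptions), not values
    taken from the abstract. With gamma=0 and alpha=1 this reduces to
    plain cross-entropy.
    """
    p_t = min(max(probs[target], 1e-12), 1.0)
    # (1 - p_t) ** gamma shrinks the loss of confidently correct examples.
    return -alpha * (1.0 - p_t) ** gamma * math.log(p_t)


if __name__ == "__main__":
    easy = [0.9, 0.1]  # confidently correct prediction for class 0
    hard = [0.3, 0.7]  # poorly predicted example whose true class is 0
    for name, probs in (("easy", easy), ("hard", hard)):
        print(name, cross_entropy(probs, 0), focal_loss(probs, 0))
```

Relative to cross-entropy, the easy example is scaled down far more strongly than the hard one, which is how the loss shifts training emphasis toward rare or difficult samples.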

IoT-Based Automatic Water Quality Monitoring System with Optimized Neural Network

  • Anusha Bamini A M;Chitra R;Saurabh Agarwal;Hyunsung Kim;Punitha Stephan;Thompson Stephan
    • KSII Transactions on Internet and Information Systems (TIIS) / v.18 no.1 / pp.46-63 / 2024
  • Water contamination is one of the biggest dangers in the world, and water is a necessity for human survival. In most cities, the digging of borewells is restricted, and in some cities borewells are allowed only for drinking water. Hence, the scarcity of drinking water is a vital issue for industries and villas. Most of the water sources in and around cities are also polluted, which causes significant health issues. Real-time quality monitoring is necessary to guarantee a safe supply of drinking water. To address this issue, we present a low-cost system for monitoring water quality in real time using IoT. The potential for real-world applications has expanded with the introduction of IoT and sensors. The proposed system consists of multiple sensors that identify the physical and chemical properties of the water, measuring parameters such as temperature, pH, and turbidity. A core controller, implemented on an Arduino, processes the values measured by the sensors. The sensor data are forwarded over Wi-Fi to a cloud-based database, where they are stored for further processing. Because it is not easy to analyze the water quality every time, an optimized neural network-based automation system identifies water quality from remote locations. The performance of the feed-forward neural network classifier is further enhanced with a hybrid GA-PSO algorithm. The optimized neural network yields 91% accuracy in water quality prediction. Optimizing the network parameters increases the accuracy of the developed model by 20% compared to the traditional feed-forward neural network. The proposed work also shows significant improvement in precision and recall.

Multi-Label Classification for Corporate Review Text: A Local Grammar Approach (머신러닝 기반의 기업 리뷰 다중 분류: 부분 문법 적용을 중심으로)

  • HyeYeon Baek;Young Kyun Chang
    • Information Systems Review / v.25 no.3 / pp.27-41 / 2023
  • Unlike the previous works focusing on the state-of-the-art methodologies to improve the performance of machine learning models, this study improves the 'quality' of training data used in machine learning. We propose a method to enhance the quality of training data through the processing of 'local grammar,' frequently used in corpus analysis. We collected a vast amount of unstructured corporate review text data posted by employees working in the top 100 companies in Korea. After improving the data quality using the local grammar process, we confirmed that the classification model with local grammar outperformed the model without it in terms of classification performance. We defined five factors of work engagement as classification categories, and analyzed how the pattern of reviews changed before and after the COVID-19 pandemic. Through this study, we provide evidence that shows the value of the local grammar-based automatic identification and classification of employee experiences, and offer some clues for significant organizational cultural phenomena.

A Study on the Development of a Problem Bank in an Automated Assessment Module for Data Visualization Based on Public Data

  • HakNeung Go;Sangsu Jeong;Youngjun Lee
    • Journal of the Korea Society of Computer and Information / v.29 no.5 / pp.203-211 / 2024
  • Using programming languages for data visualization can improve efficiency and effectiveness with respect to data volume, processing time, and flexibility. However, practice is required to become proficient in programming. Therefore, a problem bank based on public data was developed for practicing data visualization in an automatic programming assessment system. Public data were collected based on topics suggested in the curriculum and were preprocessed to make them suitable for users to visualize. The problem bank was linked to the mathematics curriculum so that students can learn various data visualization methods. The developed problems underwent expert review and pilot testing, which validated the level of the questions and the potential of integrating data visualization into math education. However, feedback indicated a lack of student interest in the topics, leading us to develop additional questions using student-centered data. The developed problem bank is expected to be useful for students who have learned Python, whether in primary school gifted programs in informatics or in middle school and above, when they learn data visualization.

Development of Operation System for Satellite Laser Ranging on Geochang Station (거창 인공위성 레이저 추적을 위한 운영 시스템 개발)

  • Ki-Pyoung Sung;Hyung-Chul Lim;Man-Soo Choi;Sung-Yeol Yu
    • Journal of Space Technology and Applications / v.4 no.2 / pp.169-183 / 2024
  • The Korea Astronomy and Space Science Institute (KASI) developed the Geochang satellite laser ranging (SLR) system for scientific research on space geodesy as well as for national space missions, including precise orbit determination and space surveillance. The operation system was developed on a server-client communication structure; it controls the SLR subsystems, provides manual and automatic observation modes based on the observation algorithm, generates the range data between satellites and SLR stations, and carries out post-processing to remove noise. In this study, we analyzed the requirements of the operation system and presented the development environment, the software structure, and the observation algorithm for the server-client communications. We also obtained laser ranging data for a ground target and a space geodetic satellite, and then compared the ranging precision of the Geochang SLR station with that of the International Laser Ranging Service (ILRS) network stations in order to verify the operation system.

Automatic Detection of Stage 1 Sleep Utilizing Simultaneous Analyses of EEG Spectrum and Slow Eye Movement (느린 안구 운동(SEM)과 뇌파의 스펙트럼 동시 분석을 이용한 1단계 수면탐지)

  • Shin, Hong-Beom;Han, Jong-Hee;Jeong, Do-Un;Park, Kwang-Suk
    • Sleep Medicine and Psychophysiology / v.10 no.1 / pp.52-60 / 2003
  • Objectives: Stage 1 sleep provides important information for the interpretation of nocturnal polysomnography, particularly regarding sleep onset. It is a short transition period from wakeful consciousness to sleep. The lack of prominent sleep events characterizing stage 1 sleep is a major obstacle in automatic sleep stage scoring. In this study, we attempted to detect stage 1 sleep automatically using simultaneous processing and analysis of EEG and EOG. Methods: Relative powers of the alpha waves and the theta waves were calculated from spectral estimation. A relative power of alpha waves less than 50% or a relative power of theta waves more than 23% was regarded as stage 1 sleep. SEM (slow eye movement) was defined as movement of both eyes lasting from 1.5 to 4 seconds, and was also regarded as stage 1 sleep. If any one of these three criteria was met, the epoch was regarded as stage 1 sleep. Results were compared to manual ratings by two polysomnography experts. Results: A total of 169 epochs were analyzed. The agreement rate for stage 1 sleep between automatic detection and manual scoring was 79.3%, and Cohen’s Kappa was 0.586 (p<0.01). A significant portion (32%) of automatically detected stage 1 sleep included SEM. Conclusion: Generally, digitally scored sleep staging shows accuracy of up to 70%. Considering the potential difficulty of stage 1 sleep scoring, the accuracy of 79.3% in this study seems sufficiently strong. Simultaneous analysis of EOG differentiates this study from previous ones, which mainly depended on EEG analysis. The issue of the close relationship between SEM and stage 1 sleep raised by Kinnari remains a valid one in this study.
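
The three-criterion rule stated in the Methods can be written as a short sketch. The thresholds (alpha below 50%, theta above 23%, SEM lasting 1.5 to 4 seconds) come from the abstract; the function name, the use of fractions for relative power, the pre-detected list of eye-movement durations, and the inclusive bounds are assumptions made for illustration.

```python
def is_stage1_epoch(alpha_rel_power, theta_rel_power, eye_movement_durations_s):
    """Return True if an epoch meets any of the three stage 1 criteria.

    alpha_rel_power, theta_rel_power: relative band powers as fractions
        in [0, 1] (assumed representation).
    eye_movement_durations_s: durations in seconds of both-eye movements
        already detected in the epoch (hypothetical input; the detection
        step itself is not shown).
    """
    alpha_criterion = alpha_rel_power < 0.50
    theta_criterion = theta_rel_power > 0.23
    # SEM: a both-eye movement lasting 1.5 to 4 s (bounds assumed inclusive).
    sem_criterion = any(1.5 <= d <= 4.0 for d in eye_movement_durations_s)
    return alpha_criterion or theta_criterion or sem_criterion


if __name__ == "__main__":
    print(is_stage1_epoch(0.60, 0.10, []))     # no criterion met
    print(is_stage1_epoch(0.60, 0.10, [2.0]))  # SEM criterion met
```

Because the criteria are combined with a logical OR, a single satisfied condition is enough to label the epoch as stage 1.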


An Implementation Method of the Character Recognizer for the Sorting Rate Improvement of an Automatic Postal Envelope Sorting Machine (우편물 자동구분기의 구분율 향상을 위한 문자인식기의 구현 방법)

  • Lim, Kil-Taek;Jeong, Seon-Hwa;Jang, Seung-Ick;Kim, Ho-Yon
    • Journal of Korea Society of Industrial Information Systems / v.12 no.4 / pp.15-24 / 2007
  • The recognition of postal address images is indispensable for the automatic sorting of postal envelopes. Address image recognition consists of three steps: address image preprocessing, character recognition, and address interpretation. The character images extracted in the preprocessing step are forwarded to the character recognition step, in which multiple candidate characters with reliability scores are obtained for each extracted character image. Using those scored character candidates, we obtain the final valid address for the input envelope image through the address interpretation step. The envelope sorting rate depends on the performance of all three steps, among which the character recognition step is particularly important. A good character recognizer is one that produces valid candidates with highly reliable scores, which makes the address interpretation step easier. In this paper, we propose a method of generating character candidates with reliable recognition scores. We use the existing MLP (multilayer perceptron) neural network of the address recognition system in current automatic postal envelope sorters as the classifier for each image from the preprocessing step. The MLP is well known to be one of the best classifiers in terms of processing speed and recognition rate. However, false alarms may occur in its recognition results, which makes address interpretation hard. To ease address interpretation and improve the envelope sorting rate, we propose methods to re-estimate the recognition score (confidence) of the existing MLP classifier: a method that generates the statistical recognition properties of the classifier, and a method that combines the MLP with a subspace classifier acting as a re-estimator of the confidence. To confirm the superiority of the proposed method, we used character images of real postal envelopes from sorters in the post office. The experimental results show that the proposed method achieves high reliability in terms of error and rejection for individual characters and non-characters.


Gridding of Automatic Mountain Meteorology Observation Station (AMOS) Temperature Data Using Optimal Kriging with Lapse Rate Correction (기온감률 보정과 최적크리깅을 이용한 산악기상관측망 기온자료의 우리나라 500미터 격자화)

  • Youjeong Youn;Seoyeon Kim;Jonggu Kang;Yemin Jeong;Soyeon Choi;Yungyo Im;Youngmin Seo;Myoungsoo Won;Junghwa Chun;Kyungmin Kim;Keunchang Jang;Joongbin Lim;Yangwon Lee
    • Korean Journal of Remote Sensing / v.39 no.5_1 / pp.715-727 / 2023
  • To provide detailed and appropriate meteorological information in mountainous areas, the Korea Forest Service has established an Automatic Mountain Meteorology Observation Station (AMOS) network in major mountainous regions since 2012, and 464 stations are currently operated. In this study, we proposed an optimal kriging technique with lapse rate correction to produce gridded temperature data suitable for Korean forests using AMOS point observations. First, the outliers of the AMOS temperature data were removed through statistical processing. Then, an optimized theoretical variogram, which best approximates the empirical variogram, was derived to perform the optimal kriging with lapse rate correction. A 500-meter resolution Kriging map for temperature was created to reflect the elevation variations in Korean mountainous terrain. A blind evaluation of the method using a spatially unbiased validation sample showed a correlation coefficient of 0.899 to 0.953 and an error of 0.933 to 1.230℃, indicating a slight accuracy improvement compared to regular kriging without lapse rate correction. However, the critical advantage of the proposed method is that it can appropriately represent the complex terrain of Korean forests, such as local variations in mountainous areas and coastal forests in Gangwon province and topographical differences in Jirisan and Naejangsan and their surrounding forests.
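
One common way to combine interpolation with a lapse rate correction is to reduce station temperatures to a reference elevation, interpolate, and then restore the elevation effect on the grid. The sketch below shows only the two elevation adjustments. The abstract does not specify the authors' correction or lapse rate, so the function names and the constant 0.0065 °C/m (the standard-atmosphere value) are assumptions, and the kriging step itself is omitted.

```python
# Assumed constant lapse rate in degrees C per metre (standard atmosphere).
# The lapse rate actually used in the study is not stated in the abstract.
LAPSE_RATE_C_PER_M = 0.0065


def to_sea_level(temp_c, elevation_m, lapse_rate=LAPSE_RATE_C_PER_M):
    """Reduce a station temperature to its sea-level equivalent."""
    return temp_c + lapse_rate * elevation_m


def from_sea_level(sea_level_temp_c, elevation_m, lapse_rate=LAPSE_RATE_C_PER_M):
    """Restore the elevation effect at a grid cell of known elevation."""
    return sea_level_temp_c - lapse_rate * elevation_m


if __name__ == "__main__":
    # A station at 1000 m reporting 10.0 C corresponds to about 16.5 C at
    # sea level; the same field at a 500 m grid cell is about 13.25 C.
    t0 = to_sea_level(10.0, 1000.0)
    print(t0, from_sea_level(t0, 500.0))
```

In a full workflow, the sea-level values from all stations would be interpolated to the grid before `from_sea_level` is applied with each cell's elevation from a digital elevation model.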

Design and Implementation of MongoDB-based Unstructured Log Processing System over Cloud Computing Environment (클라우드 환경에서 MongoDB 기반의 비정형 로그 처리 시스템 설계 및 구현)

  • Kim, Myoungjin;Han, Seungho;Cui, Yun;Lee, Hanku
    • Journal of Internet Computing and Services / v.14 no.6 / pp.71-84 / 2013
  • Log data, which record the multitude of information created when operating computer systems, are utilized in many processes, from carrying out computer system inspection and process optimization to providing customized user optimization. In this paper, we propose a MongoDB-based unstructured log processing system in a cloud environment for processing the massive amount of log data of banks. Most of the log data generated during banking operations come from handling a client's business. Therefore, in order to gather, store, categorize, and analyze the log data generated while processing the client's business, a separate log data processing system needs to be established. However, the realization of flexible storage expansion functions for processing a massive amount of unstructured log data and executing a considerable number of functions to categorize and analyze the stored unstructured log data is difficult in existing computer environments. Thus, in this study, we use cloud computing technology to realize a cloud-based log data processing system for processing unstructured log data that are difficult to process using the existing computing infrastructure's analysis tools and management system. The proposed system uses the IaaS (Infrastructure as a Service) cloud environment to provide a flexible expansion of computing resources and includes the ability to flexibly expand resources such as storage space and memory under conditions such as extended storage or rapid increase in log data. Moreover, to overcome the processing limits of the existing analysis tool when a real-time analysis of the aggregated unstructured log data is required, the proposed system includes a Hadoop-based analysis module for quick and reliable parallel-distributed processing of the massive amount of log data. 
Furthermore, because HDFS (Hadoop Distributed File System) stores data by generating copies of the block units of the aggregated log data, the proposed system offers automatic restore functions so that the system can continue to operate after it recovers from a malfunction. Finally, by establishing a distributed database using the NoSQL-based MongoDB, the proposed system provides methods of effectively processing unstructured log data. Relational databases such as MySQL have complex schemas that are inappropriate for processing unstructured log data. Further, databases with strict schemas, such as relational databases, cannot easily expand nodes when the stored data must be distributed across various nodes as the amount of data rapidly increases. NoSQL does not provide the complex computations that relational databases may provide, but it can easily expand the database through node dispersion when the amount of data increases rapidly; it is a non-relational database with an appropriate structure for processing unstructured data. NoSQL data models are usually classified as key-value, column-oriented, and document-oriented types. Of these, the representative document-oriented data model, MongoDB, which has a free schema structure, is used in the proposed system. MongoDB is introduced to the proposed system because it makes it easy to process unstructured log data through a flexible schema structure, facilitates flexible node expansion when the amount of data is rapidly increasing, and provides an Auto-Sharding function that automatically expands storage. The proposed system is composed of a log collector module, a log graph generator module, a MongoDB module, a Hadoop-based analysis module, and a MySQL module. 
When the log data generated over the entire client business process of each bank are sent to the cloud server, the log collector module collects and classifies data according to the type of log data and distributes it to the MongoDB module and the MySQL module. The log graph generator module generates the results of the log analysis of the MongoDB module, Hadoop-based analysis module, and the MySQL module per analysis time and type of the aggregated log data, and provides them to the user through a web interface. Log data that require a real-time log data analysis are stored in the MySQL module and provided real-time by the log graph generator module. The aggregated log data per unit time are stored in the MongoDB module and plotted in a graph according to the user's various analysis conditions. The aggregated log data in the MongoDB module are parallel-distributed and processed by the Hadoop-based analysis module. A comparative evaluation is carried out against a log data processing system that uses only MySQL for inserting log data and estimating query performance; this evaluation proves the proposed system's superiority. Moreover, an optimal chunk size is confirmed through the log data insert performance evaluation of MongoDB for various chunk sizes.