• Title/Summary/Keyword: transactions

Search Result 45,713, Processing Time 0.055 seconds

Analyzing Korean Math Word Problem Data Classification Difficulty Level Using the KoEPT Model (KoEPT 기반 한국어 수학 문장제 문제 데이터 분류 난도 분석)

  • Rhim, Sangkyu;Ki, Kyung Seo;Kim, Bugeun;Gweon, Gahgene
    • KIPS Transactions on Software and Data Engineering
    • /
    • v.11 no.8
    • /
    • pp.315-324
    • /
    • 2022
  • In this paper, we propose KoEPT, a Transformer-based generative model for automatic math word problems solving. A math word problem written in human language which describes everyday situations in a mathematical form. Math word problem solving requires an artificial intelligence model to understand the implied logic within the problem. Therefore, it is being studied variously across the world to improve the language understanding ability of artificial intelligence. In the case of the Korean language, studies so far have mainly attempted to solve problems by classifying them into templates, but there is a limitation in that these techniques are difficult to apply to datasets with high classification difficulty. To solve this problem, this paper used the KoEPT model which uses 'expression' tokens and pointer networks. To measure the performance of this model, the classification difficulty scores of IL, CC, and ALG514, which are existing Korean mathematical sentence problem datasets, were measured, and then the performance of KoEPT was evaluated using 5-fold cross-validation. For the Korean datasets used for evaluation, KoEPT obtained the state-of-the-art(SOTA) performance with 99.1% in CC, which is comparable to the existing SOTA performance, and 89.3% and 80.5% in IL and ALG514, respectively. In addition, as a result of evaluation, KoEPT showed a relatively improved performance for datasets with high classification difficulty. Through an ablation study, we uncovered that the use of the 'expression' tokens and pointer networks contributed to KoEPT's state of being less affected by classification difficulty while obtaining good performance.

Sign Language Dataset Built from S. Korean Government Briefing on COVID-19 (대한민국 정부의 코로나 19 브리핑을 기반으로 구축된 수어 데이터셋 연구)

  • Sim, Hohyun;Sung, Horyeol;Lee, Seungjae;Cho, Hyeonjoong
    • KIPS Transactions on Software and Data Engineering
    • /
    • v.11 no.8
    • /
    • pp.325-330
    • /
    • 2022
  • This paper conducts the collection and experiment of datasets for deep learning research on sign language such as sign language recognition, sign language translation, and sign language segmentation for Korean sign language. There exist difficulties for deep learning research of sign language. First, it is difficult to recognize sign languages since they contain multiple modalities including hand movements, hand directions, and facial expressions. Second, it is the absence of training data to conduct deep learning research. Currently, KETI dataset is the only known dataset for Korean sign language for deep learning. Sign language datasets for deep learning research are classified into two categories: Isolated sign language and Continuous sign language. Although several foreign sign language datasets have been collected over time. they are also insufficient for deep learning research of sign language. Therefore, we attempted to collect a large-scale Korean sign language dataset and evaluate it using a baseline model named TSPNet which has the performance of SOTA in the field of sign language translation. The collected dataset consists of a total of 11,402 image and text. Our experimental result with the baseline model using the dataset shows BLEU-4 score 3.63, which would be used as a basic performance of a baseline model for Korean sign language dataset. We hope that our experience of collecting Korean sign language dataset helps facilitate further research directions on Korean sign language.

The Estimation of the Population by Using the Estimated Appropriate Rate Based on Customized Classification of Agriculture, Livestock and Food Industry (농축산식품산업 특수분류 기반 추정적격률을 이용한 모집단 추정 )

  • Wee Seong Seung;Lee MinCheol;Kim Jin Min;Shin Yong Tae
    • KIPS Transactions on Software and Data Engineering
    • /
    • v.12 no.3
    • /
    • pp.117-124
    • /
    • 2023
  • Through reorganization in 2008, The ministry of Agriculture, Food and Rural Affairs integrated management of the food industry by transferred functions which was scattered in the Ministry of Health and Welfare, and established comprehensive policies covering the primary, secondary, and tertiary industries. In the agricultural industry sector, new business concepts such as smart farm and food tech have recently emerged alongside the fourth industrial revolution. In order for the Ministry of Agriculture, Food, and Rural Affairs to develop appropriate policies for the fourth industrial revolution, it is necessary to accurately estimate the size of agricultural and livestock-related businesses. In 2017, the Ministry of Agriculture, Food, and Rural Affairs initiated research for the agriculture, livestock and food industry's special classification, which was approved by the National Statistical Office in 2020. The estimation of the agriculture, livestock and food industry's size based on special classification is crucial because it has a substantial impact on the formulation and significance of policies. In this paper, the appropriate rate was derived from samples extracted from the special classification and the Korean standard industrial classification. Proposed are a method for estimating the population of the agricultural and livestock food industry, as well as a method for calculating the appropriate rate that more accurately reflects the population than the method currently in use.

Comparison of Machine Learning-Based Greenhouse VPD Prediction Models (머신러닝 기반의 온실 VPD 예측 모델 비교)

  • Jang Kyeong Min;Lee Myeong Bae;Lim Jong Hyun;Oh Han Byeol;Shin Chang Sun;Park Jang Woo
    • KIPS Transactions on Software and Data Engineering
    • /
    • v.12 no.3
    • /
    • pp.125-132
    • /
    • 2023
  • In this study, we compared the performance of machine learning models for predicting Vapor Pressure Deficits (VPD) in greenhouses that affect pore function and photosynthesis as well as plant growth due to nutrient absorption of plants. For VPD prediction, the correlation between the environmental elements in and outside the greenhouse and the temporal elements of the time series data was confirmed, and how the highly correlated elements affect VPD was confirmed. Before analyzing the performance of the prediction model, the amount and interval of analysis time series data (1 day, 3 days, 7 days) and interval (20 minutes, 1 hour) were checked to adjust the amount and interval of data. Finally, four machine learning prediction models (XGB Regressor, LGBM Regressor, Random Forest Regressor, etc.) were applied to compare the prediction performance by model. As a result of the prediction of the model, when data of 1 day at 20 minute intervals were used, the highest prediction performance was 0.008 for MAE and 0.011 for RMSE in LGBM. In addition, it was confirmed that the factor that most influences VPD prediction after 20 minutes was VPD (VPD_y__71) from the past 20 minutes rather than environmental factors. Using the results of this study, it is possible to increase crop productivity through VPD prediction, condensation of greenhouses, and prevention of disease occurrence. In the future, it can be used not only in predicting environmental data of greenhouses, but also in various fields such as production prediction and smart farm control models.

The Application Methods of FarmMap Reading in Agricultural Land Using Deep Learning (딥러닝을 이용한 농경지 팜맵 판독 적용 방안)

  • Wee Seong Seung;Jung Nam Su;Lee Won Suk;Shin Yong Tae
    • KIPS Transactions on Software and Data Engineering
    • /
    • v.12 no.2
    • /
    • pp.77-82
    • /
    • 2023
  • The Ministry of Agriculture, Food and Rural Affairs established the FarmMap, an digital map of agricultural land. In this study, using deep learning, we suggest the application of farm map reading to farmland such as paddy fields, fields, ginseng, fruit trees, facilities, and uncultivated land. The farm map is used as spatial information for planting status and drone operation by digitizing agricultural land in the real world using aerial and satellite images. A reading manual has been prepared and updated every year by demarcating the boundaries of agricultural land and reading the attributes. Human reading of agricultural land differs depending on reading ability and experience, and reading errors are difficult to verify in reality because of budget limitations. The farmmap has location information and class information of the corresponding object in the image of 5 types of farmland properties, so the suitable AI technique was tested with ResNet50, an instance segmentation model. The results of attribute reading of agricultural land using deep learning and attribute reading by humans were compared. If technology is developed by focusing on attribute reading that shows different results in the future, it is expected that it will play a big role in reducing attribute errors and improving the accuracy of digital map of agricultural land.

Apartment Price Prediction Using Deep Learning and Machine Learning (딥러닝과 머신러닝을 이용한 아파트 실거래가 예측)

  • Hakhyun Kim;Hwankyu Yoo;Hayoung Oh
    • KIPS Transactions on Software and Data Engineering
    • /
    • v.12 no.2
    • /
    • pp.59-76
    • /
    • 2023
  • Since the COVID-19 era, the rise in apartment prices has been unconventional. In this uncertain real estate market, price prediction research is very important. In this paper, a model is created to predict the actual transaction price of future apartments after building a vast data set of 870,000 from 2015 to 2020 through data collection and crawling on various real estate sites and collecting as many variables as possible. This study first solved the multicollinearity problem by removing and combining variables. After that, a total of five variable selection algorithms were used to extract meaningful independent variables, such as Forward Selection, Backward Elimination, Stepwise Selection, L1 Regulation, and Principal Component Analysis(PCA). In addition, a total of four machine learning and deep learning algorithms were used for deep neural network(DNN), XGBoost, CatBoost, and Linear Regression to learn the model after hyperparameter optimization and compare predictive power between models. In the additional experiment, the experiment was conducted while changing the number of nodes and layers of the DNN to find the most appropriate number of nodes and layers. In conclusion, as a model with the best performance, the actual transaction price of apartments in 2021 was predicted and compared with the actual data in 2021. Through this, I am confident that machine learning and deep learning will help investors make the right decisions when purchasing homes in various economic situations.

Malicious Traffic Classification Using Mitre ATT&CK and Machine Learning Based on UNSW-NB15 Dataset (마이터 어택과 머신러닝을 이용한 UNSW-NB15 데이터셋 기반 유해 트래픽 분류)

  • Yoon, Dong Hyun;Koo, Ja Hwan;Won, Dong Ho
    • KIPS Transactions on Software and Data Engineering
    • /
    • v.12 no.2
    • /
    • pp.99-110
    • /
    • 2023
  • This study proposed a classification of malicious network traffic using the cyber threat framework(Mitre ATT&CK) and machine learning to solve the real-time traffic detection problems faced by current security monitoring systems. We applied a network traffic dataset called UNSW-NB15 to the Mitre ATT&CK framework to transform the label and generate the final dataset through rare class processing. After learning several boosting-based ensemble models using the generated final dataset, we demonstrated how these ensemble models classify network traffic using various performance metrics. Based on the F-1 score, we showed that XGBoost with no rare class processing is the best in the multi-class traffic environment. We recognized that machine learning ensemble models through Mitre ATT&CK label conversion and oversampling processing have differences over existing studies, but have limitations due to (1) the inability to match perfectly when converting between existing datasets and Mitre ATT&CK labels and (2) the presence of excessive sparse classes. Nevertheless, Catboost with B-SMOTE achieved the classification accuracy of 0.9526, which is expected to be able to automatically detect normal/abnormal network traffic.

Design and Implementation of a COncept-based Image Retrieval System: COIRS (개념 기반 이미지 정보 검색 시스템 COIRS의 설계 및 구현)

  • Yang, Hyung-Jeong;Kim, Ho-Young;Yang, Jae-Dong;Hur, Dae-Young
    • The Transactions of the Korea Information Processing Society
    • /
    • v.5 no.12
    • /
    • pp.3025-3035
    • /
    • 1998
  • In this paper, we describe the design and implementationof COIRS COncept,based Image Retricval System). It differs from extant content-based image retrieval systems in that it enables users to query based on concepts- it allows users to get images concepmally relevant. A concept is basically an aggregation of promitive objects in an image. For such a cencept based image retrieval functionality. COIRS aglopts an image descriptor called triple and includes a triple thesaurus used for capturing concepts. There are four facilities in COIRS: a visual image indeses a triple thesaurus, an inverted fiel, and a user query interface. The visnal image indeser facilitates object laeling and the percification of positionof objects. It is an assistant tool designed to minimize manual work when indexing images. The thesarrus captires the concepts by analyzing triples, thereby extracting image semantics. The triples are then for formalating queries as well as indexing images. The user query interiare enables users to formulate...

  • PDF

Quantitative Estimation Method for ML Model Performance Change, Due to Concept Drift (Concept Drift에 의한 ML 모델 성능 변화의 정량적 추정 방법)

  • Soon-Hong An;Hoon-Suk Lee;Seung-Hoon Kim
    • KIPS Transactions on Software and Data Engineering
    • /
    • v.12 no.6
    • /
    • pp.259-266
    • /
    • 2023
  • It is very difficult to measure the performance of the machine learning model in the business service stage. Therefore, managing the performance of the model through the operational department is not done effectively. Academically, various studies have been conducted on the concept drift detection method to determine whether the model status is appropriate. The operational department wants to know quantitatively the performance of the operating model, but concept drift can only detect the state of the model in relation to the data, it cannot estimate the quantitative performance of the model. In this study, we propose a performance prediction model (PPM) that quantitatively estimates precision through the statistics of concept drift. The proposed model induces artificial drift in the sampling data extracted from the training data, measures the precision of the sampling data, creates a dataset of drift and precision, and learns it. Then, the difference between the actual precision and the predicted precision is compared through the test data to correct the error of the performance prediction model. The proposed PPM was applied to two models, a loan underwriting model and a credit card fraud detection model that can be used in real business. It was confirmed that the precision was effectively predicted.

Estimation of Illuminant Chromaticity by Equivalent Distance Reference Illumination Map and Color Correlation (균등거리 기준 조명 맵과 색 상관성을 이용한 조명 색도 추정)

  • Kim Jeong Yeop
    • KIPS Transactions on Software and Data Engineering
    • /
    • v.12 no.6
    • /
    • pp.267-274
    • /
    • 2023
  • In this paper, a method for estimating the illuminant chromaticity of a scene for an input image is proposed. The illuminant chromaticity is estimated using the illuminant reference region. The conventional method uses a certain number of reference lighting information. By comparing the chromaticity distribution of pixels from the input image with the chromaticity set prepared in advance for the reference illuminant, the reference illuminant with the largest overlapping area is regarded as the scene illuminant for the corresponding input image. In the process of calculating the overlapping area, the weights for each reference light were applied in the form of a Gaussian distribution, but a clear standard for the variance value could not be presented. The proposed method extracts an independent reference chromaticity region from a given reference illuminant, calculates the characteristic values in the r-g chromaticity plane of the RGB color coordinate system for all pixels of the input image, and then calculates the independent chromaticity region and features from the input image. The similarity is evaluated and the illuminant with the highest similarity was estimated as the illuminant chromaticity component of the image. The performance of the proposed method was evaluated using the database image and showed an average of about 60% improvement compared to the conventional basic method and showed an improvement performance of around 53% compared to the conventional Gaussian weight of 0.1.