• Title/Summary/Keyword: 다양한 수작업 (various manual work)


A Methodology for Automatic Multi-Categorization of Single-Categorized Documents (단일 카테고리 문서의 다중 카테고리 자동확장 방법론)

  • Hong, Jin-Sung;Kim, Namgyu;Lee, Sangwon
    • Journal of Intelligence and Information Systems / v.20 no.3 / pp.77-92 / 2014
  • The rapid growth of social media and Internet usage has recently produced a large volume of documents containing unstructured data and text. Each document is usually assigned a specific category for the convenience of users. In the past, this categorization was performed manually. Manual categorization, however, cannot guarantee accuracy and requires a large amount of time and cost. Many studies have therefore pursued automatic categorization to overcome the limitations of manual categorization. Unfortunately, most of these methods cannot be applied to complex documents with multiple topics because they assume that one document belongs to only one category. To overcome this limitation, some studies have attempted to assign each document to multiple categories. However, these methods are also limited in that their learning process requires a multi-categorized training set, so they cannot be applied to most documents unless such a training set is provided. To remove this requirement, we propose a new methodology that extends the category of a single-categorized document to multiple categories by analyzing the relationships among categories, topics, and documents. First, we identify the relationship between documents and topics by using the results of topic analysis on single-categorized documents. Second, we construct a correspondence table between topics and categories by investigating the relationship between them. Finally, we calculate the matching score of each document for multiple categories.
Based on the results, a document is classified into a category if and only if its matching score for that category is higher than a predefined threshold. For example, a document can be classified into three categories if its matching scores for those three categories exceed the threshold. The main contribution of our study is that the methodology improves the applicability of traditional multi-category classifiers by generating multi-categorized documents from single-categorized documents. Additionally, we propose a module for verifying the accuracy of the proposed methodology. For performance evaluation, we performed intensive experiments with news articles. News articles are clearly categorized by theme, and they contain less vulgar language and slang than other typical text documents. We collected news articles from July 2012 to June 2013. The number of articles varies widely across categories, because readers have different levels of interest in each category and because events occur at different frequencies in each category. To minimize the distortion caused by the differing number of articles per category, we extracted 3,000 articles from each of the eight categories, so the total number of articles used in our experiments was 24,000. The eight categories were "IT Science," "Economy," "Society," "Life and Culture," "World," "Sports," "Entertainment," and "Politics." Using the collected news articles, we calculated document/category correspondence scores from the topic/category and document/topic correspondence scores. The document/category correspondence score indicates the degree to which each document corresponds to a certain category. As a result, we could present two additional categories for each of the 23,089 documents.
Precision, recall, and F-score were 0.605, 0.629, and 0.617, respectively, when only the top 1 predicted category was evaluated, whereas they were 0.838, 0.290, and 0.431 when the top 1 to 3 predicted categories were considered. Interestingly, precision, recall, and F-score varied widely across the eight categories.
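
The scoring steps described in this abstract can be illustrated with a minimal sketch. It assumes that the document/category score is the sum over topics of the document/topic score times the topic/category score; the abstract does not give the exact formula, so this combination rule, the function names, the toy matrices, and the threshold value are all illustrative assumptions rather than the authors' implementation. The `f_score` helper is the standard harmonic mean of precision and recall.

```python
def document_category_scores(doc_topics, topic_categories):
    """Combine document/topic and topic/category scores (assumed: sum of products)."""
    n_categories = len(topic_categories[0])
    return [
        [sum(doc[t] * topic_categories[t][c] for t in range(len(doc)))
         for c in range(n_categories)]
        for doc in doc_topics
    ]


def assign_categories(scores, categories, threshold):
    """Keep every category whose matching score is higher than the threshold."""
    return [
        [cat for cat, s in zip(categories, row) if s > threshold]
        for row in scores
    ]


def f_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy, made-up inputs: 2 documents, 3 topics, 3 categories.
categories = ["Economy", "Politics", "Sports"]
doc_topics = [
    [0.7, 0.3, 0.0],  # document 0: mostly topic 0
    [0.0, 0.2, 0.8],  # document 1: mostly topic 2
]
topic_categories = [
    [0.9, 0.1, 0.0],  # topic 0 corresponds mainly to "Economy"
    [0.3, 0.7, 0.0],  # topic 1 corresponds mainly to "Politics"
    [0.0, 0.1, 0.9],  # topic 2 corresponds mainly to "Sports"
]

scores = document_category_scores(doc_topics, topic_categories)
labels = assign_categories(scores, categories, threshold=0.25)
print(labels)
```

With these toy numbers the first document receives two categories and the second receives one, which shows how a single-categorized document can end up with several labels once a threshold is applied. Applying `f_score` to the reported precision and recall pairs reproduces the reported F-scores after rounding to three decimals.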

RAUT: An end-to-end tool for automated parsing and uploading river cross-sectional survey in AutoCAD format to river information system for supporting HEC-RAS operation (하천정비기본계획 CAD 형식 단면 측량자료 자동 추출 및 하천공간 데이터베이스 업로딩과 HEC-RAS 지원을 위한 RAUT 툴 개발)

  • Kim, Kyungdong;Kim, Dongsu;You, Hojun
    • Journal of Korea Water Resources Association / v.54 no.12 / pp.1339-1348 / 2021
  • In accordance with the River Law, a basic river maintenance plan is established for domestic rivers every 5-10 years with a considerable national budget, and various river surveys, such as the river cross-section surveys required for HEC-RAS flood level simulation, are conducted. However, river survey data are provided to the River Management Geographic Information System (RIMGIS) only as PDF reports, while the original CAD-format data remain scattered among the designers who prepared each river maintenance plan. As a result, the usability of the data for other purposes is considerably reduced. In addition, tools such as 'Dream' are used to prepare surveyed CAD cross-sectional data for HEC-RAS, but in practice they take almost as much time and cost as manual work. In this study, we developed RAUT (River Information Auto Upload Tool) to address these problems. First, RAUT attempts to automate the complicated steps of manually entering CAD survey data and building the input data for the HEC-RAS one-dimensional model used in practice to establish basic river plans. Second, it directly reads CAD survey data, which are river spatial information, and automatically uploads them to a river spatial information DB based on the standard data model (ArcRiver), enabling river survey data from river maintenance plans to be managed at the national level. In other words, if RIMGIS adopts a tool such as RAUT, it will be able to systematically manage national river survey data such as river cross-sections. As a pilot process, the developed RAUT reads the river spatial information CAD data of the river maintenance master plan for the Jeju-do agar basin, builds a MySQL-based spatial DB from them, and automatically generates topographic data for HEC-RAS one-dimensional simulation from that DB.

Current status and future of insect smart factory farm using ICT technology (ICT기술을 활용한 곤충스마트팩토리팜의 현황과 미래)

  • Seok, Young-Seek
    • Food Science and Industry / v.55 no.2 / pp.188-202 / 2022
  • In the insect industry, as the applications of insects expand from pet insects and natural enemies to feed, edible, and medicinal insects, demand for quality control of insect raw materials and interest in securing the safety of insect products are both increasing. As the industry scales up, controlling the temperature, humidity, and air quality in the insect breeding room and preventing the spread of pathogens and other pollutants are important success factors, and they require an environment controlled by an operating system. European commercial insect breeding facilities have attracted considerable investor interest, and insect companies are building large-scale production facilities, which became possible after the EU approved the use of insect protein as feedstock for fish farming in July 2017. Other fields, such as food and medicine, have also accelerated the application of cutting-edge technology. In the future, the global insect industry is expected to be increasingly subdivided into three kinds of systems: those that purchase eggs or small larvae from suppliers and focus on larval fattening, i.e., the production of raw material, until the insects mature; those that handle the entire production process from egg laying to harvesting and initial pre-treatment of larvae; and large-scale production systems that cover all stages of insect larvae production as well as further processing steps such as milling, fat removal, and protein or fat fractionation. In Korea, research and development of insect smart factory farms using artificial intelligence and ICT is accelerating, so insects are expected to be used as carbon-free materials in secondary industries, such as natural plastics or natural molding materials, as well as in existing feed and food. A Korean-style customized breeding system for shortening the breeding period or enhancing functionality is expected to be developed soon.

A Short Composting Method by the Single Phase Composter for the Production of Oyster Mushroom (느타리버섯 배지 제조기를 이용한 배지의 제조 연구)

  • Lee, Ho-Yong;Shin, Chang-Yup;Lee, Young-Keun;Chang, Hwa-Hyoung;Min, Bong-Hee
    • The Korean Journal of Mycology / v.27 no.1 s.88 / pp.10-14 / 1999
  • A single phase composter was constructed by modifying the conventional sawdust mixer used in the cultivation of the oyster mushroom Pleurotus ostreatus. The machine was designed on the basis of a 3-phase-1 system that controls the prewetting, pasteurization, and fermentation processes. When 200 kg of straw and cotton waste was composted in the machine, it took 20 minutes in the prewetting step and also to hours at $65^{\circ}C$ in the pasteurization process. Postfermentation by aerothermophiles was completed by treating the compost at $45^{\circ}C-50^{\circ}C$ for 48 hours, which was 24 hours shorter than the conventional method. In the postfermentation at high temperature, forced aeration and/or vigorous mixing played a great role in improving spawn quality. Mycelial growth of the oyster mushroom was excellent in the culture that combined 3 parts surface inoculation with 7 parts mechanical mixing.


An Outlier Detection Using Autoencoder for Ocean Observation Data (해양 이상 자료 탐지를 위한 오토인코더 활용 기법 최적화 연구)

  • Kim, Hyeon-Jae;Kim, Dong-Hoon;Lim, Chaewook;Shin, Yongtak;Lee, Sang-Chul;Choi, Youngjin;Woo, Seung-Buhm
    • Journal of Korean Society of Coastal and Ocean Engineers / v.33 no.6 / pp.265-274 / 2021
  • Outlier detection in ocean data has traditionally been performed using statistical and distance-based machine learning algorithms. Recently, AI-based methods have received a lot of attention, and supervised learning methods, which require classification information for the data, are mainly used. Supervised learning requires a lot of time and cost because classification information (labels) must be manually assigned to all training data. In this study, an autoencoder based on unsupervised learning was applied to outlier detection to overcome this problem. Two experiments were designed: univariate learning, in which only sea surface temperature (SST) data from the Deokjeok Island observations were used, and multivariate learning, in which SST, air temperature, wind direction, wind speed, air pressure, and humidity were used. The data cover 25 years, from 1996 to 2020, and were preprocessed in consideration of the characteristics of ocean data. We then tried to detect outliers in real SST data using the trained univariate and multivariate autoencoders. To compare model performance, various outlier detection methods were applied to synthetic data with artificially inserted errors. In the quantitative evaluation, the accuracy was about 96% for the multivariate autoencoder and about 91% for the univariate autoencoder, indicating that the multivariate autoencoder had better outlier detection performance. Outlier detection using an unsupervised learning-based autoencoder is expected to be widely applicable because it can reduce subjective classification errors as well as the cost and time required for data labeling.
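
An autoencoder is commonly used for outlier detection by flagging observations that it reconstructs poorly. The sketch below shows only that final thresholding step and the accuracy check against synthetic labels; it does not train an autoencoder. The reconstructed values are made-up stand-ins for model output, and the Euclidean error measure, the threshold of 1.0, and all function names are assumptions, since the abstract does not state how the authors scored or thresholded reconstruction error.

```python
import math


def reconstruction_errors(observed, reconstructed):
    """Euclidean distance between each observation and its reconstruction."""
    return [
        math.sqrt(sum((o - r) ** 2 for o, r in zip(obs, rec)))
        for obs, rec in zip(observed, reconstructed)
    ]


def flag_outliers(errors, threshold):
    """Mark an observation as an outlier when its error exceeds the threshold."""
    return [e > threshold for e in errors]


def accuracy(predicted, actual):
    """Fraction of observations whose predicted flag matches the true flag."""
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


# Toy multivariate records: [SST, air temperature], in degrees Celsius.
# The third record has an artificially inserted SST error.
observed = [[15.0, 14.0], [15.2, 14.1], [25.0, 14.2], [15.1, 13.9]]
# Hypothetical reconstructions standing in for a trained autoencoder's output.
reconstructed = [[15.1, 14.0], [15.1, 14.1], [15.3, 14.1], [15.0, 14.0]]
true_flags = [False, False, True, False]

errors = reconstruction_errors(observed, reconstructed)
flags = flag_outliers(errors, threshold=1.0)
print(flags, accuracy(flags, true_flags))
```

In practice the threshold would be chosen from the distribution of reconstruction errors on data believed to be normal, rather than fixed by hand as it is here.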

The way to make training data for deep learning model to recognize keywords in product catalog image at E-commerce (온라인 쇼핑몰에서 상품 설명 이미지 내의 키워드 인식을 위한 딥러닝 훈련 데이터 자동 생성 방안)

  • Kim, Kitae;Oh, Wonseok;Lim, Geunwon;Cha, Eunwoo;Shin, Minyoung;Kim, Jongwoo
    • Journal of Intelligence and Information Systems / v.24 no.1 / pp.1-23 / 2018
  • Since the start of the 21st century, a variety of high-quality services have emerged with the growth of the Internet and information and communication technologies (ICT). In particular, the e-commerce industry, in which Amazon and eBay stand out, is growing explosively. As e-commerce grows and more products are registered at online shopping malls, customers can easily find and compare the products they want to buy. However, this growth has also created a problem: with so many products registered, it has become difficult for customers to find what they really need. When customers search with a general keyword, too many products are returned. Conversely, few products are returned when customers enter detailed product attributes, because such concrete attributes are rarely registered. In this situation, automatically recognizing text in images can be a solution. Because the bulk of product details are written in catalogs in image format, most product information cannot be found with text queries in current text-based search systems. If the information in images can be converted to text, customers can search by product details, which makes shopping more convenient. Various existing OCR (Optical Character Recognition) programs can recognize text in images, but they are hard to apply to catalogs because they have trouble recognizing text in certain circumstances, such as when the text is small or the fonts are inconsistent. Therefore, this research proposes a way to recognize keywords in catalogs with deep learning, which has been the state of the art in image recognition since the 2010s.
The Single Shot Multibox Detector (SSD), a well-regarded object detection model, can be used with a structure redesigned to account for the differences between text and objects. However, because the SSD model is trained by supervised learning, as is typical of deep learning algorithms, it needs a large amount of labeled training data. One way to collect such data is to manually label the location and class of the text in catalogs, but manual collection raises many problems. Some keywords would be missed because humans make mistakes while labeling. Collecting the data would also be too time-consuming given the scale of data needed, or too costly if many workers were hired to shorten the time. Furthermore, if specific keywords need to be trained, finding images that contain those words would be difficult as well. To solve this data issue, this research developed a program that creates training data automatically. The program generates catalog-like images containing various keywords and pictures and saves the location information of the keywords at the same time. With this program, not only can data be collected efficiently, but the performance of the SSD model also improves. The SSD model achieved a recognition rate of 81.99% with 20,000 training samples created by the program. Moreover, this research tested the performance of the SSD model under different data conditions to analyze which features of the data influence text recognition in images. The results show that the number of labeled keywords, the addition of overlapping keyword labels, the presence of unlabeled keywords, the spacing among keywords, and the differences among background images are related to the performance of the SSD model. These findings can guide performance improvements, through higher-quality data, for the SSD model and other text recognizers based on deep learning.
The SSD model redesigned to recognize text in images and the program developed for creating training data are expected to contribute to improving search systems in e-commerce. Suppliers can spend less time registering keywords for products, and customers can search for products using the product details written in the catalog.
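
The idea of generating training data automatically, with keyword locations saved at creation time, can be sketched as follows. This is not the authors' program: it only produces the label side (keyword plus bounding box) for a blank canvas and leaves out the actual image rendering and background pictures. The fixed character width, line height, non-overlap rule, and every function and parameter name are hypothetical choices made for illustration.

```python
import random


def boxes_overlap(a, b):
    """Return True if two (x, y, width, height) boxes overlap."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah


def generate_labels(keywords, canvas_w, canvas_h, char_w=12, line_h=20,
                    max_tries=100, seed=0):
    """Place each keyword at a random non-overlapping position and record its box."""
    rng = random.Random(seed)  # seeded so the same layout can be regenerated
    labels = []
    for word in keywords:
        w, h = char_w * len(word), line_h  # assumed fixed-width font
        if w > canvas_w or h > canvas_h:
            continue  # keyword cannot fit on this canvas
        for _ in range(max_tries):
            box = (rng.randint(0, canvas_w - w), rng.randint(0, canvas_h - h), w, h)
            if not any(boxes_overlap(box, other["box"]) for other in labels):
                labels.append({"keyword": word, "box": box})
                break
    return labels


labels = generate_labels(["cotton", "waterproof", "100%"], canvas_w=300, canvas_h=300)
for label in labels:
    print(label["keyword"], label["box"])
```

Because the generator chooses the positions itself, the labels are exact by construction, which is the property that avoids the missed-keyword and cost problems of manual labeling described above. Allowing overlapping boxes or leaving some keywords unlabeled would be one way to vary the data conditions that the abstract reports as affecting SSD performance.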

Automatic Quality Evaluation with Completeness and Succinctness for Text Summarization (완전성과 간결성을 고려한 텍스트 요약 품질의 자동 평가 기법)

  • Ko, Eunjung;Kim, Namgyu
    • Journal of Intelligence and Information Systems / v.24 no.2 / pp.125-148 / 2018
  • Recently, as the demand for big data analysis has increased, cases of analyzing unstructured data and using the results have also increased. Among the various types of unstructured data, text is used as a means of communicating information in almost all fields. In addition, many analysts are interested in text because the amount of data is very large and it is relatively easy to collect compared with other unstructured and structured data. Among the various text analysis applications, the following have been actively studied: document classification, which classifies documents into predetermined categories; topic modeling, which extracts major topics from a large number of documents; sentiment analysis or opinion mining, which identifies emotions or opinions contained in texts; and text summarization, which summarizes the main contents of one or several documents. In particular, text summarization is actively applied in business through news summary services, privacy policy summary services, etc. In academia, much research has followed either the extraction approach, which selectively provides the main elements of the document, or the abstraction approach, which extracts elements of the document and composes new sentences by combining them. However, techniques for evaluating the quality of automatically summarized documents have not made as much progress as techniques for automatic text summarization itself. Most existing studies on summary quality evaluation manually summarize documents, use these summaries as reference documents, and measure the similarity between the automatic summary and the reference document. Specifically, an automatic summary is produced from the full text through various techniques, and its quality is measured by comparing it with the reference document, which is regarded as an ideal summary.
Reference documents are provided in two major ways. The most common is manual summarization, in which a person creates an ideal summary by hand. Since this method requires human intervention in preparing the summary, it takes a lot of time and cost, and the evaluation result may differ depending on the subjectivity of the summarizer. To overcome these limitations, attempts have been made to measure the quality of summary documents without human intervention. As a representative attempt, a method has recently been devised that reduces the size of the full text and measures the similarity between the reduced full text and the automatic summary. In this method, the more the frequent terms of the full text appear in the summary, the better the quality of the summary is judged to be. However, since summarization essentially means condensing a large amount of content while minimizing content omissions, it is unreasonable to say that a "good summary" based only on frequency is always a "good summary" in this essential sense. To overcome the limitations of this previous approach to summarization evaluation, this study proposes an automatic quality evaluation method for text summarization based on the essential meaning of summarization. Specifically, succinctness is defined as an element indicating how little content is duplicated among the sentences of the summary, and completeness is defined as an element indicating how little of the content is missing from the summary. In this paper, we propose a method for automatic quality evaluation of text summarization based on these concepts of succinctness and completeness.
To evaluate the practical applicability of the proposed methodology, we extracted 29,671 sentences from TripAdvisor's hotel reviews, summarized the reviews for each hotel, and present the results of experiments evaluating the quality of the summaries according to the proposed methodology. We also provide a way to integrate completeness and succinctness, which are in a trade-off relationship, into an F-score, and propose a method for performing optimal summarization by changing the sentence-similarity threshold.
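
One possible reading of the completeness and succinctness measures, and of their combination into an F-score, is sketched below. The abstract does not specify the similarity measure or the exact formulas, so the word-level Jaccard similarity, the way each measure is counted, the threshold of 0.5, and all function names are assumptions made for illustration only.

```python
def jaccard(a, b):
    """Word-overlap similarity between two sentences (an illustrative choice)."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    if not sa and not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def succinctness(summary, threshold):
    """Share of summary sentences that do not duplicate an earlier summary sentence."""
    if not summary:
        return 0.0
    unique = 0
    for i, sent in enumerate(summary):
        if all(jaccard(sent, prev) < threshold for prev in summary[:i]):
            unique += 1
    return unique / len(summary)


def completeness(source, summary, threshold):
    """Share of source sentences covered by at least one summary sentence."""
    if not source:
        return 0.0
    covered = sum(
        any(jaccard(src, s) >= threshold for s in summary) for src in source
    )
    return covered / len(source)


def f_score(c, s):
    """Harmonic mean of completeness and succinctness."""
    return 0.0 if c + s == 0 else 2 * c * s / (c + s)


# Toy, made-up hotel review sentences.
source = [
    "the room was clean",
    "the room was very clean",
    "breakfast was great",
    "staff were rude",
]
summary = [
    "the room was clean",
    "the room was very clean",
    "breakfast was great",
]

s = succinctness(summary, threshold=0.5)
c = completeness(source, summary, threshold=0.5)
print(round(s, 3), round(c, 3), round(f_score(c, s), 3))
```

In this toy case the summary repeats one point (lowering succinctness to 2/3) and omits the complaint about staff (lowering completeness to 3/4). Raising or lowering the similarity threshold changes which sentences count as duplicates or as covered, which is the trade-off the abstract proposes to tune.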