• Title/Summary/Keyword: Automatic Machine Learning


Building an Analytical Platform of Big Data for Quality Inspection in the Dairy Industry: A Machine Learning Approach (유제품 산업의 품질검사를 위한 빅데이터 플랫폼 개발: 머신러닝 접근법)

  • Hwang, Hyunseok;Lee, Sangil;Kim, Sunghyun;Lee, Sangwon
    • Journal of Intelligence and Information Systems / v.24 no.1 / pp.125-140 / 2018
  • As one of the processes in the manufacturing industry, quality inspection inspects the intermediate products or final products to separate the good-quality goods that meet the quality management standard and the defective goods that do not. The manual inspection of quality in a mass production system may result in low consistency and efficiency. Therefore, the quality inspection of mass-produced products involves automatic checking and classifying by the machines in many processes. Although there are many preceding studies on improving or optimizing the process using the data generated in the production process, there have been many constraints with regard to actual implementation due to the technical limitations of processing a large volume of data in real time. The recent research studies on big data have improved the data processing technology and enabled collecting, processing, and analyzing process data in real time. This paper aims to propose the process and details of applying big data for quality inspection and examine the applicability of the proposed method to the dairy industry. We review the previous studies and propose a big data analysis procedure that is applicable to the manufacturing sector. To assess the feasibility of the proposed method, we applied two methods to one of the quality inspection processes in the dairy industry: convolutional neural network and random forest. We collected, processed, and analyzed the images of caps and straws in real time, and then determined whether the products were defective or not. The result confirmed that there was a drastic increase in classification accuracy compared to the quality inspection performed in the past.

Building the Outlier Candidate Discrimination Training Data based on Inventory for Automatic Classification of Transferred Records (이관 기록물 분류 자동화를 위한 목록 기반 이상치 판별 학습데이터 구축)

  • Jeong, Ji-Hye;Lee, Gemma;Wang, Hosung;Oh, Hyo-Jung
    • Journal of Korean Society of Archives and Records Management / v.22 no.1 / pp.43-59 / 2022
  • Electronic public records are classified at the time of production and assigned a preservation period; after a certain period, they are transferred to an archive and preserved. This study seeks a way to improve efficiency in classifying transferred records while maintaining consistent standards. To this end, the current record classification workflow of the National Archives of Korea was analyzed and its problems were identified. To minimize the manual work of record classification while incorporating the required improvements, a process for identifying outlier candidates from an inventory list containing the classification information of the transferred records was proposed and systematized. Furthermore, the proposed outlier discrimination process was applied to actual records transferred to the National Archives of Korea. The results were standardized and compiled into a training data format that can be used for machine learning in the future.

A Ship-Wake Joint Detection Using Sentinel-2 Imagery

  • Woojin, Jeon;Donghyun, Jin;Noh-hun, Seong;Daeseong, Jung;Suyoung, Sim;Jongho, Woo;Yugyeong, Byeon;Nayeon, Kim;Kyung-Soo, Han
    • Korean Journal of Remote Sensing / v.39 no.1 / pp.77-86 / 2023
  • Ship detection is widely used in areas such as maritime security, maritime traffic, fisheries management, illegal fishing, and border control, and it is important for rapid response and damage minimization as ship accident rates rise with recent increases in international maritime traffic. Currently, under a number of global and national regulations, ships must be equipped with an automatic identification system (AIS), which reports information such as the location and speed of the ship at regular intervals. However, most small vessels (less than 300 tons) are not obligated to install the transponder, and AIS signals may go untransmitted, whether intentionally or accidentally. There are even cases of misuse of a ship's location information. Therefore, in this study, ship detection was performed using high-resolution optical satellite images, which can periodically observe a wide area remotely and detect small ships. However, optical images can cause false alarms due to noise on the sea surface, such as waves, or due to factors with ship-like brightness, such as clouds and wakes, so it is important to remove these factors to improve the accuracy of ship detection. In this study, false alarms were reduced and the accuracy of ship detection was improved by removing wakes. Ship detection was performed using machine learning-based random forest (RF) and convolutional neural network (CNN) techniques, which have recently been widely used in object detection, and the detection results of the two models were compared and analyzed. In addition, the results of RF and CNN were combined to reduce both ship disconnection and undersized detection. The ship detection results of this study are significant in that they overcame the limitations of each model while maintaining accuracy. In addition, if satellite images with improved spatial resolution are utilized in the future, simultaneous detection of ships and wakes is expected to be performed with higher accuracy.

Study of Smart Integration processing Systems for Sensor Data (센서 데이터를 위한 스마트 통합 처리 시스템 연구)

  • Ji, Hyo-Sang;Kim, Jae-Sung;Kim, Ri-Won;Kim, Jeong-Joon;Han, Ik-Joo;Park, Jeong-Min
    • Asia-pacific Journal of Multimedia Services Convergent with Art, Humanities, and Sociology / v.7 no.8 / pp.327-342 / 2017
  • In this paper, we introduce an integrated processing system for smart sensor data that collects sensor data for IoT services and processes it efficiently. As the IoT field develops on the basis of technologies for collecting sensor data and for sending and receiving it over networks, and as projects such as smart homes and autonomous vehicles progress, autonomous control systems that process and make effective use of sensor data have become an open problem. However, since the sensor data types used to monitor an autonomous control system vary by domain, a sensor data integration processing system that can apply autonomous control across different domains is necessary. Therefore, we introduce the Smart Sensor Data Integrated Processing System and apply it to a window: internal and external sensor data are processed in three stages, 1) receiveData, 2) parseData, and 3) addToDatabase, and on this basis we implement "Smart Window", an automatic window opening/closing system that ventilates under autonomous control to create a comfortable indoor environment. As a result, atmospheric information is collected and monitored, and the stored data enable statistical analysis and machine learning for better autonomous control.
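
The three stages named in this abstract can be sketched as a small pipeline. This is only an illustrative sketch, not the authors' implementation: the JSON payload layout, the `readings` table, the in-memory SQLite store, and the `run_pipeline` helper are all assumptions introduced here; only the stage names come from the abstract.

```python
import json
import sqlite3


def receive_data(raw_messages):
    """Stage 1 (receiveData): yield raw payloads from a source.

    The source here is an in-memory list; a real system would read from a network.
    """
    for message in raw_messages:
        yield message


def parse_data(payload):
    """Stage 2 (parseData): decode an assumed JSON payload into a row tuple."""
    record = json.loads(payload)
    return record["sensor_id"], record["kind"], float(record["value"])


def add_to_database(conn, row):
    """Stage 3 (addToDatabase): store one parsed reading."""
    conn.execute(
        "INSERT INTO readings (sensor_id, kind, value) VALUES (?, ?, ?)", row
    )


def run_pipeline(raw_messages):
    """Hypothetical driver that chains the three stages."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE readings (sensor_id TEXT, kind TEXT, value REAL)")
    for payload in receive_data(raw_messages):
        add_to_database(conn, parse_data(payload))
    conn.commit()
    return conn


# Hypothetical indoor and outdoor readings.
messages = [
    '{"sensor_id": "indoor-1", "kind": "temperature", "value": 24.5}',
    '{"sensor_id": "outdoor-1", "kind": "temperature", "value": 18.0}',
]
conn = run_pipeline(messages)
rows = conn.execute(
    "SELECT sensor_id, value FROM readings ORDER BY sensor_id"
).fetchall()
```

Keeping each stage as a separate function mirrors the abstract's point that sensor data types vary by domain: only the parsing stage would need to change for a new sensor format.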

A Study on Automatic Classification Model of Documents Based on Korean Standard Industrial Classification (한국표준산업분류를 기준으로 한 문서의 자동 분류 모델에 관한 연구)

  • Lee, Jae-Seong;Jun, Seung-Pyo;Yoo, Hyoung Sun
    • Journal of Intelligence and Information Systems / v.24 no.3 / pp.221-241 / 2018
  • As we enter the knowledge society, the importance of information as a new form of capital is being emphasized. The importance of information classification is also increasing for the efficient management of exponentially growing digital information. In this study, we tried to automatically classify and provide tailored information that can help companies make decisions about technology commercialization. To that end, we propose a method to classify information based on the Korean Standard Industrial Classification (KSIC), which indicates the business characteristics of enterprises. The classification of information or documents has largely been based on machine learning, but there is not enough training data categorized on the basis of KSIC. Therefore, this study applied a method of calculating the similarity between documents. Specifically, a method and a model for presenting the most appropriate KSIC code are proposed by collecting the explanatory texts of each KSIC code and calculating their similarity with the document to be classified using the vector space model. IPC data were collected and classified by KSIC, and the methodology was then verified by comparing the results with the KSIC-IPC concordance table provided by the Korean Intellectual Property Office. In this verification, the highest agreement was obtained when the LT method, a variant of the TF-IDF calculation formula, was applied. In that case, the match rate of the first-ranked KSIC code was 53% and the cumulative match rate up to the fifth rank was 76%. This confirms that technology, industry, and market information needed by SMEs can be classified by KSIC more quantitatively and objectively. In addition, the methods and results provided in this study can serve as basic data to support the qualitative judgment of experts in creating a linkage table between heterogeneous classification systems.
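
The vector space matching described in this abstract can be illustrated with a minimal sketch. It assumes whitespace tokenization, a log-scaled term frequency with a smoothed IDF (not necessarily the LT variant the authors evaluated), and hypothetical code labels `A`, `B`, `C` with made-up descriptions standing in for real KSIC explanatory texts.

```python
import math
from collections import Counter


def build_idf(token_lists):
    """Smoothed inverse document frequency over the code descriptions."""
    n = len(token_lists)
    df = Counter(term for tokens in token_lists for term in set(tokens))
    return {term: math.log((1 + n) / (1 + count)) + 1.0 for term, count in df.items()}


def vectorize(tokens, idf):
    """Sparse TF-IDF vector; terms outside the vocabulary are ignored."""
    tf = Counter(t for t in tokens if t in idf)
    return {t: (1.0 + math.log(c)) * idf[t] for t, c in tf.items()}


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_codes(code_texts, document, top_k=5):
    """Return up to top_k code labels ordered by similarity to the document."""
    idf = build_idf([text.split() for text in code_texts.values()])
    doc_vec = vectorize(document.split(), idf)
    scores = {
        code: cosine(vectorize(text.split(), idf), doc_vec)
        for code, text in code_texts.items()
    }
    return sorted(scores, key=scores.get, reverse=True)[:top_k]


# Hypothetical stand-ins for KSIC codes and their explanatory texts.
code_texts = {
    "A": "dairy milk cheese processing",
    "B": "ship building marine engine",
    "C": "software data processing service",
}
ranking = rank_codes(code_texts, "cheese and milk production")
```

Returning the top five candidates rather than a single code matches how the abstract reports results, as a first-rank match rate and a cumulative match rate up to the fifth rank.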

Prediction of Spring Flowering Timing in Forested Area in 2023 (산림지역에서의 2023년 봄철 꽃나무 개화시기 예측)

  • Jihee Seo;Sukyung Kim;Hyun Seok Kim;Junghwa Chun;Myoungsoo Won;Keunchang Jang
    • Korean Journal of Agricultural and Forest Meteorology / v.25 no.4 / pp.427-435 / 2023
  • Changes in flowering time due to weather fluctuations impact plant growth and ecosystem dynamics. Accurate prediction of flowering timing is crucial for effective forest ecosystem management. This study uses a process-based model to predict flowering timing in 2023 for five major tree species in Korean forests. Models are developed based on nine years (2009-2017) of flowering data for Abeliophyllum distichum, Robinia pseudoacacia, Rhododendron schlippenbachii, Rhododendron yedoense f. poukhanense, and Sorbus commixta, distributed across 28 regions in the country, including mountains. Weather data from the Automatic Mountain Meteorology Observation System (AMOS) and the Korea Meteorological Administration (KMA) are utilized as inputs for the models. The Single Triangle Degree Days (STDD) and Growing Degree Days (GDD) models, known for their superior performance, are employed to predict flowering dates. Daily temperature readings at a 1 km spatial resolution are obtained by merging AMOS and KMA data. To improve prediction accuracy nationwide, random forest machine learning is used to generate region-specific correction coefficients. Applying these coefficients results in minimal prediction errors, particularly for Abeliophyllum distichum, Robinia pseudoacacia, and Rhododendron schlippenbachii, with root mean square errors (RMSEs) of 1.2, 0.6, and 1.2 days, respectively. Model performance is evaluated using ten random sampling tests per species, selecting the model with the highest R². The models with applied correction coefficients achieve R² values ranging from 0.07 to 0.7, except for Sorbus commixta, and exhibit a final explanatory power of 0.75-0.9. This study provides valuable insights into seasonal changes in plant phenology, aiding in identifying honey harvesting seasons affected by abnormal weather conditions, such as those of Robinia pseudoacacia. Detailed information on flowering timing for various plant species and regions enhances understanding of the climate-plant phenology relationship.
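
As a rough illustration of how a degree-day model yields a flowering date, the sketch below uses the common averaging form of growing degree days, where each day contributes max(0, (Tmin + Tmax) / 2 - Tbase) and flowering is predicted on the first day the running total reaches a threshold. The abstract does not give the authors' exact GDD or STDD formulation, so the formula, base temperature, threshold, and temperature series here are all assumptions.

```python
def daily_gdd(t_min, t_max, t_base):
    """Degree days contributed by one day (simple averaging form)."""
    return max(0.0, (t_min + t_max) / 2.0 - t_base)


def predict_flowering_day(daily_temps, t_base, required_gdd):
    """Return the 1-based day on which accumulated GDD first reaches the threshold.

    daily_temps is a sequence of (t_min, t_max) pairs in degrees Celsius.
    Returns None if the threshold is never reached.
    """
    total = 0.0
    for day, (t_min, t_max) in enumerate(daily_temps, start=1):
        total += daily_gdd(t_min, t_max, t_base)
        if total >= required_gdd:
            return day
    return None


# Hypothetical four-day temperature series, base temperature, and threshold.
temps = [(2.0, 10.0), (4.0, 14.0), (6.0, 16.0), (8.0, 18.0)]
day = predict_flowering_day(temps, t_base=5.0, required_gdd=10.0)
```

With these made-up inputs the daily contributions are 1, 4, 6, and 8 degree days, so the running total first reaches 10 on day 3.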

Subject-Balanced Intelligent Text Summarization Scheme (주제 균형 지능형 텍스트 요약 기법)

  • Yun, Yeoil;Ko, Eunjung;Kim, Namgyu
    • Journal of Intelligence and Information Systems / v.25 no.2 / pp.141-166 / 2019
  • Recently, channels such as social media and SNS have been generating enormous amounts of data. Within this data, the share of unstructured data represented as text has grown geometrically. Because it is difficult to read all of this text, it is important to access it rapidly and grasp its key points. To meet this need for efficient understanding, many studies on text summarization for handling and using tremendous amounts of text data have been proposed. In particular, many summarization methods using machine learning and artificial intelligence algorithms, known as "automatic summarization", have recently been proposed to generate summaries objectively and effectively. However, almost all text summarization methods proposed to date construct summaries based on the frequency of contents in the original documents. Such summaries tend to omit low-weight subjects that are mentioned less often in the original text. If a summary includes only the contents of major subjects, bias occurs and information is lost, making it hard to ascertain every subject the documents contain. To avoid this bias, one can summarize with a view to balance among the topics of a document so that all subjects can be ascertained, but an imbalance in the distribution of those subjects still remains. To retain a balance of subjects in the summary, it is necessary to consider the proportion of each subject in the original documents and also to allocate portions to subjects equally, so that even sentences on minor subjects are sufficiently included. In this study, we propose a "subject-balanced" text summarization method that secures balance among all subjects and minimizes the omission of low-frequency subjects. For subject-balanced summarization, we use two summary evaluation metrics, "completeness" and "succinctness". Completeness means that a summary should fully include the contents of the original documents, and succinctness means that the summary has minimal internal duplication. The proposed method has three phases. The first phase constructs subject term dictionaries. Topic modeling is used to calculate topic-term weights, which indicate the degree to which each term is related to each topic. From the derived weights, highly related terms can be identified for every topic, and the subjects of the documents can be found from topics composed of terms with similar meanings. A few terms that represent each subject well are then selected; in this method, they are called "seed terms". However, these terms are too few to explain each subject sufficiently, so enough terms similar to the seed terms are needed for a well-constructed subject dictionary. Word2Vec is used for this word expansion, finding terms similar to the seed terms. Word vectors are created by Word2Vec modeling, and from those vectors the similarity between all terms can be derived using cosine similarity. The higher the cosine similarity between two terms, the stronger the relationship between them is taken to be. Terms with high similarity to the seed terms of each subject are therefore selected, and after filtering these expanded terms the subject dictionary is finally constructed. The second phase allocates subjects to every sentence in the original documents. To grasp the contents of all sentences, frequency analysis is first conducted on the terms that make up the subject dictionaries. The TF-IDF weight of each subject is calculated after frequency analysis, which shows how much each sentence says about each subject. However, TF-IDF weights are unbounded, so the weights of every subject in each sentence are normalized to values between 0 and 1. Each sentence is then allocated to the subject with its maximum TF-IDF weight, which finally yields a sentence group for each subject. The last phase is summary generation. Sen2Vec is used to determine the similarity between subject sentences, from which a similarity matrix is formed. By repeatedly selecting sentences, it is possible to generate a summary that fully includes the contents of the original documents while minimizing duplication within the summary itself. For evaluation, 50,000 TripAdvisor reviews are used to construct the subject dictionaries and 23,087 reviews are used to generate summaries. A comparison between summaries from the proposed method and frequency-based summaries verifies that the proposed method better retains the balance of subjects that the documents originally have.
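
The subject allocation phase (normalize per-subject weights to the 0 to 1 range, then assign each sentence to its highest-weighted subject) might look like the sketch below. The abstract does not say which normalization is used, so min-max scaling per subject is an assumption, and the subject names and raw weights are made up for illustration.

```python
def min_max_normalize(column):
    """Scale a list of raw weights into [0, 1]; a constant column maps to zeros."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]
    return [(value - lo) / (hi - lo) for value in column]


def allocate_subjects(weights, subjects):
    """Group sentence indices by the subject with the highest normalized weight.

    weights[i][j] is the raw (unbounded) weight of sentence i for subject j.
    """
    columns = [
        min_max_normalize([row[j] for row in weights]) for j in range(len(subjects))
    ]
    groups = {subject: [] for subject in subjects}
    for i in range(len(weights)):
        best = max(range(len(subjects)), key=lambda j: columns[j][i])
        groups[subjects[best]].append(i)
    return groups


# Hypothetical subjects and raw TF-IDF-style weights for three sentences.
subjects = ["room", "food"]
weights = [[8.0, 1.0], [2.0, 3.0], [0.0, 5.0]]
groups = allocate_subjects(weights, subjects)
```

Normalizing each subject separately keeps a subject with uniformly large raw weights from absorbing every sentence, which is in the spirit of the balance the abstract aims for.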

Automated Analyses of Ground-Penetrating Radar Images to Determine Spatial Distribution of Buried Cultural Heritage (매장 문화재 공간 분포 결정을 위한 지하투과레이더 영상 분석 자동화 기법 탐색)

  • Kwon, Moonhee;Kim, Seung-Sep
    • Economic and Environmental Geology / v.55 no.5 / pp.551-561 / 2022
  • Geophysical exploration methods are very useful for generating high-resolution images of underground structures, and such methods can be applied to investigating buried cultural properties and determining their exact locations. In this study, image feature extraction and image segmentation methods were applied to automatically distinguish the structures of buried relics in high-resolution ground-penetrating radar (GPR) images obtained at the center of the Silla Kingdom, Gyeongju, South Korea. The major purpose of the image feature extraction analyses is to identify the circular features of building remains and the linear features of ancient roads and fences. Feature extraction is implemented by applying the Canny edge detection and Hough transform algorithms. We applied the Hough transform to the edge image resulting from the Canny algorithm in order to determine the locations of the target features. However, the Hough transform requires different parameter settings for each survey sector. For image segmentation, we applied the connected component labeling algorithm and object-based image analysis using Orfeo Toolbox (OTB) in QGIS. The connected-component labeled image shows that the signals associated with the target buried relics are effectively connected and labeled. However, we often find that multiple labels are assigned to a single structure in the given GPR data. Object-based image analysis was conducted using Large-Scale Mean-Shift (LSMS) image segmentation. In this analysis, a vector layer containing pixel values for each segmented polygon was estimated first and then used to build a train-validation dataset by assigning the polygons to one class associated with the buried relics and another class for the background field. With the Random Forest Classifier, we find that the polygons on the LSMS image segmentation layer can be successfully classified into those of the buried relics and those of the background. Thus, we propose that the automatic classification methods applied in this study to GPR images of buried cultural heritage can be useful for obtaining consistent analysis results when planning excavation processes.
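
For readers unfamiliar with connected component labeling, the sketch below labels a small binary grid using breadth-first search with 4-connectivity. It is a generic textbook version, not the authors' workflow: the toy grid stands in for a thresholded GPR image, and the choice of 4-connectivity is an assumption.

```python
from collections import deque


def label_components(grid):
    """Label 4-connected foreground regions of a binary grid.

    Returns (labels, count), where labels has the same shape as grid,
    background cells are 0, and each region gets an integer label from 1.
    """
    rows, cols = len(grid), len(grid[0])
    labels = [[0] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] and not labels[r][c]:
                # Found an unlabeled foreground cell: start a new region.
                count += 1
                labels[r][c] = count
                queue = deque([(r, c)])
                while queue:
                    y, x = queue.popleft()
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        inside = 0 <= ny < rows and 0 <= nx < cols
                        if inside and grid[ny][nx] and not labels[ny][nx]:
                            labels[ny][nx] = count
                            queue.append((ny, nx))
    return labels, count


# Toy binary image with two separate foreground regions.
grid = [
    [1, 1, 0, 0],
    [0, 1, 0, 1],
    [0, 0, 0, 1],
]
labels, count = label_components(grid)
```

A single structure that appears broken in the image produces several regions under this scheme, which is consistent with the abstract's observation that multiple labels are often assigned to one buried structure.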