• Title/Summary/Keyword: anomaly detection analysis (이상탐지분석)


A Study on Searching for Export Candidate Countries of the Korean Food and Beverage Industry Using Node2vec Graph Embedding and Light GBM Link Prediction (Node2vec 그래프 임베딩과 Light GBM 링크 예측을 활용한 식음료 산업의 수출 후보국가 탐색 연구)

  • Lee, Jae-Seong;Jun, Seung-Pyo;Seo, Jinny
    • Journal of Intelligence and Information Systems / v.27 no.4 / pp.73-95 / 2021
  • This study uses the Node2vec graph embedding method and Light GBM link prediction to explore undeveloped export candidate countries for Korea's food and beverage industry. Node2vec addresses a known limitation of existing link prediction methods based on the number of common neighbors, namely their relatively weak representation of the network's structural equivalence. The method is therefore known to perform well in both community detection and structural equivalence of the network. The vectors obtained by embedding the network in this way are generated from walks of constant length that begin at an arbitrarily designated starting node. As a result, the node sequences are easy to use as input to models for downstream tasks such as Logistic Regression, Support Vector Machine, and Random Forest. Based on these features of Node2vec, this study applied the method to international trade data for the Korean food and beverage industry. Through this, we intend to contribute to extensive margin diversification for Korea within the industry's global value chain. The optimal predictive model derived in this study recorded a precision of 0.95, a recall of 0.79, and an F1 score of 0.86. This was superior to the binary classifier based on Logistic Regression set as the baseline model, which recorded a precision of 0.95, a recall of 0.73, and an F1 score of 0.83. In addition, the Light GBM-based optimal prediction model outperformed the link prediction model of a previous study, which was used as the benchmark model in this study.
The predictive model of the previous study recorded a recall of only 0.75, whereas the proposed model achieved a recall of 0.79. The performance gap between the benchmark model and the proposed model stems from the model training strategy. In this study, trades were grouped by trade value, and prediction models were trained separately for these groups. The specific strategies were (1) randomly masking trades across all trade values, with no condition on trade value, and training the model; (2) randomly masking a subset of the trades whose value is at or above the average and training the model; and (3) randomly masking a subset of the trades in the top 25% by trade value and training the model. The experiments confirmed that the model trained by randomly masking some of the trades with above-average trade value performed best and most stably. Additional investigation found that most of the potential export candidate countries for Korea derived from this model were appropriate. Taken together, the results suggest the practical utility of link prediction using Node2vec and Light GBM. They also yield useful implications for weight update strategies that enable better link prediction during model training. The study also has policy utility because it applies graph-embedding-based link prediction to trade transactions, an area little explored in prior link prediction research. The results support a rapid response to changes in the global value chain, such as the recent US-China trade conflict or Japan's export regulations, and the method is sufficiently useful as a tool for policy decision-making.
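The reported F1 scores can be checked against the reported precision and recall values. The following is a minimal sketch using the standard harmonic-mean definition of F1; the function name `f1_score` is introduced here for illustration and is not taken from the paper.

```python
def f1_score(precision: float, recall: float) -> float:
    """Return the F1 score, the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Precision/recall pairs as reported in the abstract.
proposed = f1_score(0.95, 0.79)   # Light GBM-based model
baseline = f1_score(0.95, 0.73)   # Logistic Regression baseline

print(f"proposed F1 = {proposed:.2f}")
print(f"baseline F1 = {baseline:.2f}")
```

Rounded to two decimals, both values agree with the F1 scores stated in the abstract (0.86 and 0.83).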

A Checklist to Improve the Fairness in AI Financial Service: Focused on the AI-based Credit Scoring Service (인공지능 기반 금융서비스의 공정성 확보를 위한 체크리스트 제안: 인공지능 기반 개인신용평가를 중심으로)

  • Kim, HaYeong;Heo, JeongYun;Kwon, Hochang
    • Journal of Intelligence and Information Systems / v.28 no.3 / pp.259-278 / 2022
  • With the spread of Artificial Intelligence (AI), various AI-based services are expanding in the financial sector, such as service recommendation, automated customer response, fraud detection systems (FDS), and credit scoring services. At the same time, problems related to reliability and unexpected social controversy are also arising from the nature of data-based machine learning. Against this background, this study aimed to contribute to improving trust in AI-based financial services by proposing a checklist to secure fairness in AI-based credit scoring services, which directly affect consumers' financial lives. Among the key elements of trustworthy AI, such as transparency, safety, accountability, and fairness, fairness was selected as the subject of the study so that everyone can enjoy the benefits of automated algorithms without social discrimination, from the perspective of inclusive finance. Through a literature review, we divided the entire fairness-related operation process into three areas: data, algorithms, and users. For each area, we constructed four detailed considerations for evaluation, resulting in 12 checklist items. The relative importance and priority of the categories were evaluated through the analytic hierarchy process (AHP). We surveyed three groups that together represent the full range of financial stakeholders: financial field workers, artificial intelligence field workers, and general users. The results were classified and analyzed by stakeholder group. From a practical perspective, specific checks were identified, such as verifying the validity of using training data and non-financial information and monitoring newly incoming data. Moreover, general financial consumers were found to place high importance on the accuracy of result analysis and on bias checks. We expect these results to contribute to the design and operation of fair AI-based financial services.

Improvement and Validation of Convective Rainfall Rate Retrieved from Visible and Infrared Image Bands of the COMS Satellite (COMS 위성의 가시 및 적외 영상 채널로부터 복원된 대류운의 강우강도 향상과 검증)

  • Moon, Yun Seob;Lee, Kangyeol
    • Journal of the Korean earth science society / v.37 no.7 / pp.420-433 / 2016
  • The purpose of this study is to improve the 2-D and 3-D calibration matrices for convective rainfall rate (CRR) using the brightness temperature of the infrared $10.8\,\mu\mathrm{m}$ channel (IR), the brightness temperature difference between the infrared $10.8\,\mu\mathrm{m}$ and water vapor $6.7\,\mu\mathrm{m}$ channels (IR-WV), and the normalized reflectance of the visible channel (VIS) from the COMS satellite, together with rainfall rates from weather radar, for 75 rainy days from April 22, 2011 to October 22, 2011 in Korea. In particular, weather radar rainfall rate data for 24 rainy days in 2011 are used to validate the new 2-D and 3-D CRR calibration matrices suited to the Korean Peninsula. The 2-D and 3-D calibration matrices provide the basic and maximum CRR values ($\mathrm{mm\,h^{-1}}$) by multiplying the rain probability matrix by the mean and maximum rainfall rate matrices, respectively. The rain probability matrix is calculated from the numbers of rainy and non-rainy pixels in the associated 2-D (IR, IR-WV) and 3-D (IR, IR-WV, VIS) matrices. The mean rainfall rate matrix is calculated by dividing the accumulated rainfall rate by the number of rainy pixels, and the maximum rainfall rate matrix from the product of the maximum rain rate for the calibration period and the number of rain occurrences. Finally, the new 2-D and 3-D CRR calibration matrices are obtained experimentally from a regression analysis of both the basic and maximum rainfall rate matrices. As a result, in the new matrices the area with rainfall rates above 10 mm/h is enlarged, and CRR appears in lower class ranges of the IR brightness temperature versus IR-WV brightness temperature difference matrices than in the existing ones. Accuracy and categorical statistics are computed for the CRR events that occurred during the given period.
The mean error (ME), mean absolute error (MAE), and root mean square error (RMSE) of the new 2-D and 3-D CRR calibrations were smaller than those of the existing ones; the false alarm ratio decreased, the probability of detection increased slightly, and the critical success index scores improved. To account for strong rainfall rates in weather events such as thunderstorms and typhoons, a moisture correction factor is applied. This factor is defined as the product of the total precipitable water and the relative humidity (PW RH), a mean value between the surface and the 500 hPa level, obtained from a numerical model or from COMS retrieval data. In this study, when the IR cloud top brightness temperature is lower than 210 K and the relative humidity is greater than 40%, the moisture correction factor is empirically scaled from 1.0 to 2.0 based on the PW RH values. Consequently, when this factor is applied to the new 2-D and 3-D CRR calibrations, the ME, MAE, and RMSE are smaller than those of the new calibrations alone.
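The basic CRR calculation described above can be sketched for a single matrix cell as follows. This is an illustration only: it assumes the rain probability of a cell is the fraction of rainy pixels among all pixels falling in that cell, which the abstract implies but does not state explicitly, and the function name and input counts are hypothetical.

```python
def basic_crr(rainy_count: int, dry_count: int, accumulated_rain: float):
    """Return (rain probability, mean rain rate, basic CRR) for one matrix cell.

    Assumption: rain probability = rainy pixels / (rainy + non-rainy pixels).
    Mean rain rate = accumulated rainfall rate / number of rainy pixels.
    """
    total = rainy_count + dry_count
    if total == 0 or rainy_count == 0:
        # No observations, or no rain ever observed in this cell.
        return 0.0, 0.0, 0.0
    probability = rainy_count / total
    mean_rate = accumulated_rain / rainy_count
    return probability, mean_rate, probability * mean_rate


# Hypothetical cell: 30 rainy pixels, 70 non-rainy pixels,
# 150 mm/h accumulated over the rainy pixels.
p, mean_rate, crr = basic_crr(rainy_count=30, dry_count=70, accumulated_rain=150.0)
print(p, mean_rate, crr)
```

Under this assumption the basic CRR of a cell reduces to the accumulated rainfall rate divided by the total number of pixels in that cell.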

Cloud-cell Tracking Analysis using Satellite Image of Extreme Heavy Snowfall in the Yeongdong Region (영동지역의 극한 대설에 대한 위성관측으로부터 구름 추적)

  • Cho, Young-Jun;Kwon, Tae-Yong
    • Korean Journal of Remote Sensing / v.30 no.1 / pp.83-107 / 2014
  • This study presents the spatial characteristics of clouds, derived from satellite images, during extreme heavy snowfall in the Yeongdong region. Three extreme heavy snowfall events in the Yeongdong region during the recent 12 years (2001 ~ 2012), in which the fresh snow cover exceeded 50 cm/day, are selected. The spatial characteristics of the clouds (minimum brightness temperature, Tmin; cloud size; center of cloud-cell) are analyzed by tracking the main cloud-cell related to each event. These characteristics are compared with radar precipitation in the Yeongdong region to investigate the relationship between cloud and precipitation. The results are summarized as follows. The selected extreme heavy snowfall events are associated with isolated, well-developed, small-scale convective clouds that develop over the Yeongdong region or move from East Korea Bay to the Yeongdong region. During the period of main precipitation, the cloud-cell Tmin is low ($-40 \sim -50\,^{\circ}\mathrm{C}$) and the cloud area is small (17,000 ~ 40,000 $\mathrm{km}^2$). The precipitation area ($\geq$ 0.5 mm/hr) from radar is also small and isolated (4,000 ~ 8,000 $\mathrm{km}^2$). The locations of the cloud and precipitation are similar, and their centers are located close to the coast of the Yeongdong region. In all events, the extreme heavy snowfall occurred while a developed cloud-cell was moving into the coastal waters of the Yeongdong region. However, in one of the three events the developing stages of cloud and precipitation were not well matched. Water vapor images show that the cloud-cell developed on the northern edge of the dry (dark) region. Therefore, based on the analysis of cloud and precipitation, the selected extreme heavy snowfall events are associated with a small-scale secondary cyclone or vortex, not an explosive polar low. Detecting and tracking small-scale cloud-cells is important for real-time forecasting of extreme heavy snowfall in the Yeongdong region.

Migration of the Dokdo Cold Eddy in the East Sea (동해 독도 냉수성 소용돌이의 이동 특성)

  • KIM, JAEMIN;CHOI, BYOUNG-JU;LEE, SANG-HO;BYUN, DO-SEONG;KANG, BOONSOON
    • The Sea:JOURNAL OF THE KOREAN SOCIETY OF OCEANOGRAPHY / v.24 no.2 / pp.351-373 / 2019
  • The cold eddies around the Ulleung Basin in the East Sea were identified from satellite altimeter sea level data using the Winding-Angle method from 1993 to 2015. Among the cold eddies, the Dokdo Cold Eddies (DCEs), which were formed at the first meandering trough of the East Korea Warm Current (EKWC) and were pinched off to the southwest from the eastward flow, were classified and their migration patterns were analyzed. The vertical structures of water temperature, salinity, and flow velocity near the DCE center were also examined using numerical simulation and observation data provided by the Hybrid Coordinate Ocean Model and the National Institute of Fisheries Science, respectively. A total of 112 DCEs were generated for 23 years. Of these, 39 DCEs migrated westward and arrived off the east coast of Korea. The average travel distance was 250.9 km, the average lifespan was 93 days, and the average travel speed was 3.5 cm/s. The other 73 DCEs had moved to the east or had hovered around the generated location until they disappeared. At 50-100 m depth under the DCE, water temperature and salinity (T < $5^{\circ}C$, S < 34.1) were lower than those of ambient water and isotherms made a dome shape. Current faster than 10 cm/s circulates counterclockwise from the surface to 300 m depth at 38 km away from the center of DCE. After the EKWC separates from the coast, it flows eastward and starts to meander near Ulleungdo. The first trough of the meander in the east of Ulleungdo is pushed deep into the southwest and forms a cold eddy (DCE), which is shed from the meander in the south of Ulleungdo. While a DCE moves westward, it circumvents the Ulleung Warm Eddy (UWE) clockwise and follows U shape path toward the east coast of Korea. When the DCE arrives near the coast, the EKWC separates from the coast at the south of DCE and circumvents the DCE. 
As the DCE near the coast weakens and dissipates about 30 days after its arrival, the EKWC flows northward along the coast, recovering its original path. The DCE steadily transports heat and salt from the north to the south, which helps to form a cold water region in the southwest of the Ulleung Basin and brings positive vorticity that changes the separation latitude and path of the EKWC. Some of the DCEs moving west merged into a coastal cold eddy, forming a wide cold water region in the west of the Ulleung Basin and creating an elongated anticlockwise circulation that separated the UWE in the north from the EKWC in the south.

A Study on the Possibility of Short-term Monitoring of Coastal Topography Changes Using GOCI-II (GOCI-II를 활용한 단기 연안지형변화 모니터링 가능성 평가 연구)

  • Lee, Jingyo;Kim, Keunyong;Ryu, Joo-Hyung
    • Korean Journal of Remote Sensing / v.37 no.5_2 / pp.1329-1340 / 2021
  • The intertidal zone, a transitional zone between the ocean and the land, requires continuous monitoring because various changes occur rapidly due to human activity and natural disturbance. Monitoring coastal topography changes with remote sensing is considered effective for overcoming the limited accessibility of the intertidal zone and for observing long-term topographic changes there. Most existing coastal topographic monitoring studies using remote sensing were conducted with high spatial resolution images such as Landsat and Sentinel. This study extracted the waterline using the NDWI from GOCI-II (Geostationary Ocean Color Satellite-II) data, identified changes in the intertidal area in Gyeonggi Bay at various tidal heights, and examined the utility of DEM generation and of observing topographic elevation change over a short period. GOCI-II (249 scenes), Sentinel-2A/B (39 scenes), and Landsat 8 OLI (7 scenes) images were obtained around Gyeonggi Bay from October 8, 2020 to August 16, 2021. To generate an intertidal area DEM, Sentinel and Landsat images required at least 3 months to 1 year of data collection, whereas GOCI-II was able to generate an intertidal area DEM in Gyeonggi Bay from only one day of data covering different tidal heights, and topographic elevation was also observed through exposure frequency. When observing coastal topography changes with GOCI-II, it is advisable to detect topographic changes early by exploiting the short observation cycle and to compensate for the insufficient spatial resolution by interpolating with high-resolution data from multiple remote sensing sources. Based on these results, it is expected that expanding the research area and developing technologies for automatic analysis and detection will make it possible to quickly provide the information needed for up-to-date topographic maps and coastal management of the Korean Peninsula.
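The NDWI step can be sketched as follows. This is a minimal illustration, not the authors' processing chain: the abstract does not state which band pair or threshold was used, so the common green/near-infrared form NDWI = (Green − NIR) / (Green + NIR) and a threshold of 0 are assumptions, and the reflectance values are hypothetical.

```python
def ndwi(green: float, nir: float) -> float:
    """Normalized Difference Water Index for one pixel (green/NIR form, assumed)."""
    denom = green + nir
    return 0.0 if denom == 0 else (green - nir) / denom


def water_mask(green_band, nir_band, threshold=0.0):
    """Return a boolean grid: True where NDWI exceeds the (assumed) threshold."""
    return [
        [ndwi(g, n) > threshold for g, n in zip(g_row, n_row)]
        for g_row, n_row in zip(green_band, nir_band)
    ]


# Hypothetical 2x2 reflectance grids: top row water-like, bottom row land-like.
green = [[0.10, 0.08], [0.06, 0.05]]
nir = [[0.02, 0.03], [0.20, 0.25]]

mask = water_mask(green, nir)
print(mask)
```

The waterline is then the boundary between True and False cells; repeating this for scenes at different tidal heights gives the exposure frequency that the abstract relates to elevation.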

Extraction of Water Body Area using Micro Satellite SAR: A Case Study of the Daecheng Dam of South korea (초소형 SAR 위성을 활용한 수체면적 추출: 대청댐 유역 대상)

  • PARK, Jongsoo;KANG, Ki-Mook;HWANG, Eui-Ho
    • Journal of the Korean Association of Geographic Information Studies / v.24 no.4 / pp.41-54 / 2021
  • Estimating water body area with remote sensing is essential for water resource management and for the analysis and prediction of water disaster damage. Water body detection from satellites has mainly relied on large satellites equipped with optical and SAR sensors. However, because of their long repeat cycle, timely use in the event of a disaster is not possible. The recent active development of micro satellites offers an opportunity to overcome the temporal resolution limits of existing large satellites. The micro satellites currently in active operation are ICEYE of Finland and Capella of the United States, which are operated as constellations for Earth observation. Thanks to constellation operation, they offer a short revisit cycle and high resolution, and their SAR sensors allow observation regardless of weather or time of day. This study describes the operational status and characteristics of micro satellites and applies a water area estimation technique optimized for micro SAR satellite images to the Daecheong Dam basin on the Korean Peninsula. Accuracy was verified against reference water body data derived from the optical satellite Sentinel-2. The Capella satellite showed the smallest difference in area, and all three images showed high correlation. The results confirm that water area can be estimated despite the low NESZ of micro satellites, and it is believed that the limitations of water resource and water disaster monitoring with existing large SAR satellites can be overcome.

A Study on the Emergence Period and Geographic Distribution of Cicadinae (Hemiptera: Cicadidae) in Korea Using Bioacoustic Detection Technique (생물음향 탐지기법을 이용한 한국 매미아과의 출현 시기 및 서식지 분포 특성 연구)

  • Kim, Yoon-Jae;Ki, Kyong-Seok
    • Korean Journal of Environment and Ecology / v.35 no.6 / pp.594-600 / 2021
  • The purpose of this study is to observe the period of mating calls of cicadas in South Korea to identify the emergence period and geographic distribution for each cicada species. The study sites were 19 protection areas nationwide. The mating calls of cicadas were collected over the 12 months of 2019. A bioacoustics measuring device was installed to record the mating calls of cicadas in WAV, 44,100Hz format for 1 minute every hour. The temperature was recorded once or twice every hour using a micro-meteorological measuring device. Nine species of Korean cicadinae were studied. The start and end periods of mating calls were recorded for each cicada species for the subsequent analysis. The analysis results showed that nine cicada species appeared in the 19 protection areas. The chronological order of mating call periods for each species was as follows: Cryptotympana atrata (7/12 - 9/30), Meimuna opalifera (7/27 - 10/20), Hyalessa fuscata (7/25 - 10/9), Graptopsaltria nigrofuscata (7/28 - 9/5), Platypleura kaempferi (7/3 - 9/29), Suisha coreana (9/14 - 10/30), Leptosemia takanonis (6/26 - 8/2), Auritibicen intermedius (7/27 - 9/28), and Meimuna mongolica (8/8 - 9/11). The mating call period was between 35 (Meimuna mongolica) and 89 (Platypleura kaempferi) days, with the average being 62 days. The elevation above sea level for the habitats of each species was as follows: 5 - 386 m for Cryptotympana atrata, 7 - 759 m for Meimuna opalifera, 7 - 967 m for Hyalessa fuscata, 42 - 700m for Graptopsaltria nigrofuscata, 7 - 700 m for Platypleura kaempferi, 5 - 759 m for Suisha coreana, 7 - 759 m for Leptosemia takanonis, 397 - 967 m for Auritibicen intermedius, and 7 - 42 m for Meimuna mongolica. 
The average temperature of the habitats of each species was as follows: 23.9℃ for Cryptotympana atrata, 21.8℃ for Meimuna opalifera, 22℃ for Hyalessa fuscata, 23℃ for Graptopsaltria nigrofuscata, 22.9℃ for Platypleura kaempferi, 14.6℃ for Suisha coreana, 20.6℃ for Leptosemia takanonis, 19.3℃ for Auritibicen intermedius, and 24.4℃ for Meimuna mongolica. In terms of the habitat distribution of species, Meimuna opalifera, Hyalessa fuscata, and Platypleura kaempferi were distributed in more than 15 protection sites. Cryptotympana atrata was distributed in the lowlands in the southwest. Graptopsaltria nigrofuscata was distributed in the western area of the Korean Peninsula. Suisha coreana was distributed in areas excluding high mountain areas and parts of the southeast area. Leptosemia takanonis was distributed in areas near the mountains. Auritibicen intermedius was distributed locally in the high mountain areas. Meimuna mongolica was distributed locally in flat wetlands.
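The reported mating call durations can be reproduced from the listed start and end dates. The sketch below assumes that both the first and last day are counted (inclusive counting) and that all dates fall in 2019, the survey year; the dictionary and function names are introduced here for illustration.

```python
from datetime import date

# (month, day) start and end of mating calls, as listed in the abstract.
calls = {
    "Cryptotympana atrata": ((7, 12), (9, 30)),
    "Meimuna opalifera": ((7, 27), (10, 20)),
    "Hyalessa fuscata": ((7, 25), (10, 9)),
    "Graptopsaltria nigrofuscata": ((7, 28), (9, 5)),
    "Platypleura kaempferi": ((7, 3), (9, 29)),
    "Suisha coreana": ((9, 14), (10, 30)),
    "Leptosemia takanonis": ((6, 26), (8, 2)),
    "Auritibicen intermedius": ((7, 27), (9, 28)),
    "Meimuna mongolica": ((8, 8), (9, 11)),
}


def days_inclusive(start, end, year=2019):
    """Number of days from start to end, counting both endpoints (assumed)."""
    return (date(year, *end) - date(year, *start)).days + 1


durations = {name: days_inclusive(s, e) for name, (s, e) in calls.items()}
shortest = min(durations, key=durations.get)
longest = max(durations, key=durations.get)
mean_days = sum(durations.values()) / len(durations)

print(shortest, durations[shortest])
print(longest, durations[longest])
print(round(mean_days))
```

With inclusive counting, the shortest and longest periods and the rounded mean match the 35 days, 89 days, and 62-day average given in the abstract.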

Analysis of Hydrocarbon Trap in the Southwestern Margin of the Ulleung Basin, East Sea (동해 울릉분지 남서주변부의 탄화수소 트랩 분석)

  • Lee, Minwoo;Kang, Moo-Hee;Yoon, Youngho;Yi, Bo-Yeon;Kim, Kyong-O;Kim, Jinho;Park, Myong-ho;Lee, Keumsuk
    • Economic and Environmental Geology / v.48 no.4 / pp.301-312 / 2015
  • A commercial gas field was found on the southwestern continental shelf of the Ulleung Basin, East Sea, in the late 1990s. To develop an additional gas field, an exploration well was drilled through the coarse infill of a submarine canyon near the gas field, but the hydrocarbons proved uneconomic to develop. Using newly acquired deep seismic reflection data and previous well data, we have identified an additional geological structure with hydrocarbon potential below submarine canyons in the southwestern margin of the basin. Based on the interpretation of the deep seismic reflection and well data, the sequences of the study area can be classified, in relation to tectonic events, into the syn-rift megasequence (MS1), post-rift megasequence (MS2), syn-compressional megasequence (MS3), and post-compressional megasequence (MS4). MS1, deposited simultaneously with basin formation before the middle Miocene, is characterized by chaotic seismic facies with low- to moderate-amplitude and low-frequency reflections. MS2 comprises laterally continuous, low- to moderate-amplitude reflections showing progradational stacking patterns due to high rates of sediment supply during basin expansion in the middle Miocene. MS3 is mainly composed of continuous reflections with high amplitude and moderate to high frequency, which are interpreted as coarse-grained sediments. The coarse-grained sediments of MS3 are widely truncated by several submarine canyons, which are filled with fine-grained sediment of MS4 and thus form a stratigraphic hydrocarbon trap. Therefore, the reservoir and seal of the hydrocarbon trap in the study area are the coarse-grained sediment of MS3 and the submarine canyons filled with fine-grained sediment of MS4, respectively. A flat-spot seismic anomaly, which may indicate the presence of hydrocarbons, is observed within the stratigraphic trap.

Development of a complex failure prediction system using Hierarchical Attention Network (Hierarchical Attention Network를 이용한 복합 장애 발생 예측 시스템 개발)

  • Park, Youngchan;An, Sangjun;Kim, Mintae;Kim, Wooju
    • Journal of Intelligence and Information Systems / v.26 no.4 / pp.127-148 / 2020
  • A data center is a physical facility that houses computer systems and related components, and it is an essential foundation for next-generation core industries such as big data, smart factories, wearables, and smart homes. In particular, with the growth of cloud computing, a proportional expansion of data center infrastructure is inevitable. Monitoring the health of data center facilities is a way to maintain and manage the system and prevent failure. A failure in one element of the facility may affect not only that equipment but also other connected equipment, and may cause enormous damage. IT facilities in particular fail irregularly because of their interdependence, which makes the cause difficult to identify. Previous studies on failure prediction in data centers treated each server as a single, isolated state and did not assume that devices interact. In this study, data center failures were therefore classified into failures occurring inside the server (Outage A) and failures occurring outside the server (Outage B), and the analysis focused on complex failures occurring within servers. Failures external to the server include power, cooling, and user errors. Because such failures can be prevented in the early stages of data center facility construction, various solutions are being developed. In contrast, the cause of a failure occurring inside a server is difficult to determine, and adequate prevention has not yet been achieved. This is because server failures do not occur in isolation: a failure on one server may cause failures on other servers, or be triggered by other servers. In other words, whereas existing studies analyzed failures on the assumption that a single server does not affect other servers, this study analyzed failures on the assumption that servers affect one another.
To define the complex failure situation in the data center, failure history data for each piece of equipment in the data center was used. Four major failure types are considered in this study: Network Node Down, Server Down, Windows Activation Services Down, and Database Management System Service Down. The failures of each device are sorted in chronological order, and when a failure occurs on one device and a failure occurs on another device within 5 minutes of that time, the failures are defined as simultaneous. After building sequences of the devices that failed at the same time, the 5 devices that most frequently failed together within those sequences were selected, and the cases in which the selected devices failed at the same time were confirmed through visualization. Because the server resource information collected for failure analysis is a time series with temporal flow, we used Long Short-term Memory (LSTM), a deep learning algorithm that can predict the next state from previous states. In addition, because each server contributes to a complex failure to a different degree, unlike in the single-server case, the Hierarchical Attention Network deep learning model structure was used. This algorithm increases prediction accuracy by assigning greater weight to servers that have a greater impact on the failure. The study began by defining the failure types and selecting the analysis targets. In the first experiment, the same collected data was analyzed under a single-server assumption and under a multiple-server assumption, and the results were compared. The second experiment improved prediction accuracy in the multiple-server case by optimizing the threshold for each server.
In the first experiment, under the single-server assumption, three of the five servers were predicted to have no failure even though a failure had actually occurred. Under the multiple-server assumption, however, all five servers were predicted to have failed. These results support the hypothesis that servers affect one another. The study confirmed that prediction performance was superior under the multiple-server assumption compared with the single-server assumption. In particular, applying the Hierarchical Attention Network algorithm, which assumes that each server's effect differs, helped improve the analysis. In addition, applying a different threshold to each server further improved prediction accuracy. This study showed that failures whose cause is difficult to determine can be predicted from historical data, and it presents a model that can predict failures occurring in servers in data centers. It is expected that failures can be prevented in advance using the results of this study.
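The 5-minute rule for defining simultaneous failures can be sketched as below. This is an illustration under stated assumptions rather than the authors' procedure: the window is anchored on the first failure of each group, the device names and timestamps are hypothetical, and the function name `group_simultaneous` is introduced here.

```python
from datetime import datetime, timedelta

WINDOW = timedelta(minutes=5)  # co-occurrence window stated in the abstract


def group_simultaneous(events, window=WINDOW):
    """Group (timestamp, device) failures in chronological order.

    Assumption: an event joins the current group if it occurs within
    `window` of the group's first failure; otherwise it starts a new group.
    """
    groups = []
    for ts, device in sorted(events):
        if groups and ts - groups[-1][0][0] <= window:
            groups[-1].append((ts, device))
        else:
            groups.append([(ts, device)])
    return groups


# Hypothetical failure history for one day.
day = datetime(2020, 1, 1)
events = [
    (day.replace(hour=9, minute=0), "server-a"),
    (day.replace(hour=9, minute=3), "db-1"),
    (day.replace(hour=9, minute=4), "server-b"),
    (day.replace(hour=9, minute=20), "server-a"),
    (day.replace(hour=9, minute=40), "net-1"),
    (day.replace(hour=9, minute=44), "server-b"),
]

groups = group_simultaneous(events)
# Keep only groups in which more than one distinct device failed.
co_failures = [
    sorted({d for _, d in g}) for g in groups if len({d for _, d in g}) > 1
]
print(co_failures)
```

Counting how often each device appears in `co_failures` would then give a basis for selecting the devices that most frequently fail together.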