• Title/Summary/Keyword: prediction accuracy (예측정확도)

Search Results: 2,746, Processing Time: 0.033 seconds

Calculation Method of Oil Slick Area on Sea Surface Using High-resolution Satellite Imagery: M/V Symphony Oil Spill Accident (고해상도 광학위성을 이용한 해상 유출유 면적 산출: 심포니호 기름유출 사고 사례)

  • Kim, Tae-Ho;Shin, Hye-Kyeong;Jang, So Yeong;Ryu, Joung-Mi;Kim, Pyeongjoong;Yang, Chan-Su
    • Korean Journal of Remote Sensing, v.37 no.6_1, pp.1773-1784, 2021
  • To minimize damage from oil spill accidents in the ocean, it is essential to map the spilled area as soon as possible, and satellite-based remote sensing is therefore a powerful tool for detecting oil spills at sea. With the recent rapid increase in the number of available satellites, it has become possible to generate a status report of a marine oil spill soon after the accident. In this study, the oil spill area was calculated using various satellite images of the Symphony oil spill accident that occurred off the coast of Qingdao Port, China, on April 27, 2021. In particular, the accuracy of the oil spill area determination was improved by using high-resolution commercial satellite images with a spatial resolution of 2 m. Sentinel-1, Sentinel-2, LANDSAT-8, GEO-KOMPSAT-2B (GOCI-II), and SkySat satellite images were collected from April 27 to May 13, of which five images were usable given the weather conditions. The spilled oil spread northeastward, toward the coastal region of China. This trend was confirmed in the SkySat imagery and was also consistent with the predicted movement of oil particles from the accident location. From this result, the look-alike patch observed in the northern area of the Sentinel-1A image (May 1, 2021) was discriminated as a false alarm. Over the survey period, the spilled oil area tended to increase linearly after the accident. This study showed that high-resolution optical satellites can be used to calculate the distribution area of spilled oil more accurately and can contribute to establishing efficient response strategies for oil spill accidents.
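Once a scene has been classified into oil / not-oil pixels, the area computation itself is simple: count the flagged pixels and scale by the pixel footprint. A minimal sketch, assuming a binary mask; the function name and toy mask are illustrative, not from the paper:

```python
def slick_area_km2(mask, pixel_size_m=2.0):
    """Sum oil-flagged pixels (1 = oil) and convert to km^2.

    With 2 m imagery each pixel covers 4 m^2, so the area scales
    directly with the flagged-pixel count."""
    n_oil = sum(row.count(1) for row in mask)
    return n_oil * (pixel_size_m ** 2) / 1e6

# Toy 2x3 classified scene: three oil pixels -> 12 m^2 -> 1.2e-5 km^2
mask = [[0, 1, 1],
        [0, 1, 0]]
area = slick_area_km2(mask)
```

At 2 m resolution the per-pixel footprint is small enough that the mask quality, not the arithmetic, dominates the accuracy of the estimate.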

Construction of Database System on Amylose and Protein Contents Distribution in Rice Germplasm Based on NIRS Data (벼 유전자원의 아밀로스 및 단백질 성분 함량 분포에 관한 자원정보 구축)

  • Oh, Sejong;Choi, Yu Mi;Lee, Myung Chul;Lee, Sukyeung;Yoon, Hyemyeong;Rauf, Muhammad;Chae, Byungsoo
    • Korean Journal of Plant Resources, v.32 no.2, pp.124-143, 2019
  • This study was carried out to build a database system for the amylose and protein contents of rice germplasm based on NIRS (Near-Infrared Reflectance Spectroscopy) analysis data. The average amylose content of the waxy type was 8.7% in the landrace, variety, and weed types, and 10.3% in the breeding line. In common rice, the average amylose content was 22.3% for the landrace, 22.7% for the variety, 23.6% for the weed type, and 24.2% for the breeding line. Waxy-type resources comprised 5% of the total germplasm collection, whereas low, intermediate, and high amylose content resources accounted for 5.5%, 20.5%, and 69.0% of the collection, respectively. The average protein content was 8.2% for the landrace, 8.0% for the variety, and 7.9% for the weed type and breeding line. The average Variability Index Value for amylose was 0.62 in waxy rice and 0.80 in common rice, and 0.51 for protein content. The accession ratio in arbitrary ranges was 0.45 for landrace amylose contents ranging from 6.4 to 8.7% and 0.26 for protein ranging from 7.3 to 8.2%. In the variety, it was 0.32 for amylose ranging from 20.1 to 22.7% and 0.51 for protein ranging from 6.1 to 8.3%. Similarly, the weed type was 0.67 for amylose ranging from 6.6 to 9.7% and 0.33 for protein ranging from 7.0 to 7.9%, whereas the breeding line was 0.47 for amylose ranging from 10.0 to 12.0% and 0.26 for protein ranging from 7.0 to 7.9%. These results could be helpful in building a database management system for germplasm resources.
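Grouping accessions into the waxy/low/intermediate/high classes above amounts to binning each NIRS-measured amylose value. A minimal sketch; the cut-points below are illustrative assumptions, not the study's definitions:

```python
def amylose_class(pct):
    """Bin an amylose content (%) into the four classes named above.
    Thresholds are illustrative guesses, not the paper's values."""
    if pct < 12.0:
        return "waxy"
    if pct < 20.0:
        return "low"
    if pct < 25.0:
        return "intermediate"
    return "high"

# Class labels for the waxy landrace mean, common landrace mean, and a high accession
labels = [amylose_class(p) for p in (8.7, 22.3, 25.1)]
```

A database system like the one described would store both the raw NIRS value and the derived class so that queries by class stay cheap.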

Non-astronomical Tides and Monthly Mean Sea Level Variations due to Differing Hydrographic Conditions and Atmospheric Pressure along the Korean Coast from 1999 to 2017 (한국 연안에서 1999년부터 2017년까지 해수물성과 대기압 변화에 따른 계절 비천문조와 월평균 해수면 변화)

  • Byun, Do-Seong;Choi, Byoung-Ju;Kim, Hyowon
    • The Sea: Journal of the Korean Society of Oceanography, v.26 no.1, pp.11-36, 2021
  • The solar annual (Sa) and semiannual (Ssa) tides account for much of the non-uniform annual and seasonal variability observed in sea levels. These non-equilibrium tides depend on atmospheric variations, forced by changes in the Sun's distance and declination, as well as on hydrographic conditions. Here we employ tidal harmonic analyses to calculate Sa and Ssa harmonic constants for 21 Korean coastal tidal stations (TS) operated by the Korea Hydrographic and Oceanographic Agency. We used 19-year-long (1999 to 2017), 1-hour-interval sea level records from each site and two conventional harmonic analysis (HA) programs (Task2K and UTide). The stability of the Sa harmonic constants was estimated with respect to the starting date and record length of the data, and we examined the spatial distribution of the calculated Sa and Ssa harmonic constants. HA was performed on Incheon TS (ITS) records using 369-day subsets; the first start date was January 1, 1999, each subsequent subset started 24 hours later, and so on until the final start date of December 27, 2017. Variations in the Sa constants produced by the two HA packages had similar magnitudes and start-date sensitivity. Results from the two packages showed a large difference in phase lag (about 78°) but a relatively small amplitude difference (<1 cm). The phase-lag difference arose largely because Task2K excludes the perihelion astronomical variable. Sensitivity of the ITS Sa constants to data record length (i.e., 1, 2, 3, 5, 9, and 19 years) was also tested to determine the record length needed to yield stable Sa results. The HA results revealed that 5- to 9-year sea level records can estimate Sa harmonic constants with relatively small error, while the best results are produced using 19-year-long records. As noted earlier, Sa amplitudes vary with regional hydrographic and atmospheric conditions. Sa amplitudes at the twenty-one TS ranged from 15.0 to 18.6 cm along the west coast, 10.7 to 17.5 cm along the south coast including Jejudo, and 10.5 to 13.0 cm along the east coast including Ulleungdo. Except at Ulleungdo, the Ssa constituent was found to contribute to asymmetric seasonal sea level variation, delaying (hastening) the highest (lowest) sea levels. Comparisons between monthly mean, air-pressure-adjusted, and steric sea level variations revealed that year-to-year and asymmetric seasonal variations in sea level were largely produced by steric sea level variation and the inverted barometer effect.
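For an evenly sampled record spanning a whole number of years, the Sa amplitude and phase can be recovered by projecting the series onto the annual cosine and sine. This is only the orthogonality shortcut behind full harmonic analysis, not the actual algorithm of Task2K or UTide (which solve for many constituents jointly and handle nodal and perihelion corrections):

```python
import math

def sa_constants(levels, dt_hours=1.0):
    """Estimate annual (Sa) amplitude and phase (deg) from an evenly
    sampled sea-level series covering an integer number of years."""
    n = len(levels)
    omega = 2.0 * math.pi / (365.25 * 24.0)  # annual frequency, rad/hour
    mean = sum(levels) / n
    # Projections onto cos/sin are the least-squares fit when the
    # record spans whole periods (the basis functions are orthogonal).
    a = 2.0 / n * sum((y - mean) * math.cos(omega * i * dt_hours)
                      for i, y in enumerate(levels))
    b = 2.0 / n * sum((y - mean) * math.sin(omega * i * dt_hours)
                      for i, y in enumerate(levels))
    return math.hypot(a, b), math.degrees(math.atan2(b, a)) % 360.0

# Synthetic 1-year hourly record: a 16 cm annual cycle around a 10 m mean
omega = 2.0 * math.pi / (365.25 * 24.0)
series = [10.0 + 0.16 * math.cos(omega * t - math.radians(40.0))
          for t in range(8766)]
amp, phase = sa_constants(series)
```

The record-length sensitivity the paper reports follows directly: with short or non-integer spans the orthogonality breaks down and Sa leaks into neighboring low frequencies.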

A Study on Intelligent Skin Image Identification from Social Media Big Data

  • Kim, Hyung-Hoon;Cho, Jeong-Ran
    • Journal of the Korea Society of Computer and Information, v.27 no.9, pp.191-203, 2022
  • In this paper, we developed a system that intelligently identifies skin image data in big data collected from the social media service Instagram and extracts standardized skin sample data for skin condition diagnosis and management. The proposed system consists of a big data collection and analysis stage, a skin image analysis stage, a training data preparation stage, an artificial neural network training stage, and a skin image identification stage. In the big data collection and analysis stage, big data are collected from Instagram, and image information for skin condition diagnosis and management is stored as the analysis result. In the skin image analysis stage, the skin image is evaluated and analyzed using traditional image processing techniques. In the training data preparation stage, training data are prepared by extracting skin sample data from the skin image analysis results. In the artificial neural network training stage, an artificial neural network, AnnSampleSkin, that intelligently predicts the skin image type is built and trained on these data. In the skin image identification stage, skin samples are extracted from images collected from social media, and the per-sample predictions of the trained network AnnSampleSkin are integrated to intelligently identify the final skin image type. The proposed skin image identification method achieves high identification accuracy of about 92% or more and can provide standardized skin sample image big data. The extracted skin sample set is expected to serve as standardized skin image data that are highly efficient and useful for diagnosing and managing skin conditions.
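The final identification stage, integrating many per-sample predictions into one image-level label, can be as simple as a majority vote. A hedged sketch: the abstract does not specify its integration rule, and `integrate_predictions` is a hypothetical name:

```python
from collections import Counter

def integrate_predictions(sample_preds):
    """Combine per-sample skin-type predictions into one final label
    by majority vote; ties resolve to the label encountered first."""
    return Counter(sample_preds).most_common(1)[0][0]

# Five skin samples cut from one Instagram image, three voting "dry"
final = integrate_predictions(["dry", "oily", "dry", "dry", "oily"])
```

A weighted vote using each sample's softmax confidence would be a natural refinement of the same idea.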

Preliminary Inspection Prediction Model to select the on-Site Inspected Foreign Food Facility using Multiple Correspondence Analysis (차원축소를 활용한 해외제조업체 대상 사전점검 예측 모형에 관한 연구)

  • Hae Jin Park;Jae Suk Choi;Sang Goo Cho
    • Journal of Intelligence and Information Systems, v.29 no.1, pp.121-142, 2023
  • As the number and weight of imported foods steadily increase, safety management of imported food to prevent food safety accidents is becoming more important. The Ministry of Food and Drug Safety conducts on-site inspections of foreign food facilities before customs clearance as well as import inspections at the customs clearance stage. However, given limited time, cost, and resources, a data-based safety management plan for imported food is needed. In this study, we tried to increase the efficiency of on-site inspections by building a machine learning prediction model that pre-selects the facilities expected to fail an on-site inspection. Basic information on 303,272 foreign food facilities and processing businesses in the Integrated Food Safety Information Network and 1,689 cases of on-site inspection information collected from 2019 to April 2022 were gathered. After preprocessing the foreign food facility data, only the records subject to on-site inspection were extracted using the foreign food facility code, yielding a total of 1,689 records with 103 variables. Among the 103 variables, those with a Theil's U of 0 were removed, and after dimensionality reduction with Multiple Correspondence Analysis, 49 characteristic variables were finally derived. We built eight different models, performed hyperparameter tuning through 5-fold cross-validation, and evaluated the performance of the resulting models. Because the purpose of selecting facilities for on-site inspection is to maximize recall, the probability of judging nonconforming facilities as nonconforming, the Random Forest model, which had the highest Recall_macro, AUROC, average precision, F1-score, and balanced accuracy, was evaluated as the best model. Finally, we applied Kernel SHAP (SHapley Additive exPlanations) to present the reasons individual facilities were selected as nonconforming, and we discuss applicability to the on-site inspection facility selection system. Based on these results, establishing an imported food management system around a data-based scientific risk management model is expected to contribute to the efficient operation of limited resources such as manpower and budget.
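Recall_macro, the headline metric here, averages per-class recall so that the rare nonconforming class counts as much as the conforming one. A stdlib-only sketch of the metric, equivalent in spirit to scikit-learn's `recall_score(average='macro')`:

```python
def recall_macro(y_true, y_pred):
    """Mean of per-class recall = TP_c / (TP_c + FN_c) over observed classes."""
    classes = sorted(set(y_true))
    recalls = []
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        recalls.append(tp / (tp + fn))
    return sum(recalls) / len(recalls)

# 1 = nonconforming; one missed nonconforming facility drags the macro score
score = recall_macro([1, 1, 0, 0], [1, 0, 0, 0])  # (0.5 + 1.0) / 2 = 0.75
```

Plain accuracy on the same toy labels would be 0.75 as well, but as the nonconforming class shrinks, accuracy stays flattering while Recall_macro keeps penalizing missed nonconformities.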

A stratified random sampling design for paddy fields: Optimized stratification and sample allocation for effective spatial modeling and mapping of the impact of climate changes on agricultural system in Korea (농지 공간격자 자료의 층화랜덤샘플링: 농업시스템 기후변화 영향 공간모델링을 위한 국내 농지 최적 층화 및 샘플 수 최적화 연구)

  • Minyoung Lee;Yongeun Kim;Jinsol Hong;Kijong Cho
    • Korean Journal of Environmental Biology, v.39 no.4, pp.526-535, 2021
  • Spatial sampling design plays an important role in GIS-based modeling studies because it increases modeling efficiency while reducing sampling cost. In the field of agricultural systems, research demand for high-resolution spatial-data-based modeling to predict and evaluate climate change impacts is growing rapidly, and the need for and importance of spatial sampling design are increasing accordingly. The purpose of this study was to design a spatial sampling scheme for paddy fields in Korea (11,386 grid cells at 1 km spatial resolution) for use in agricultural spatial modeling. A stratified random sampling design was developed and applied for the 2030s, 2050s, and 2080s under the two RCP scenarios 4.5 and 8.5. Twenty-five weather and four soil characteristics were used as stratification variables. Stratification and sample allocation were optimized to ensure the minimum sample size under given precision constraints for 16 target variables such as crop yield, greenhouse gas emission, and pest distribution. The precision and accuracy of the sampling were evaluated through sampling simulations based on the coefficient of variation (CV) and relative bias, respectively. As a result, the paddy fields could be stratified into 5 to 21 optimal strata with 46 to 69 samples. The evaluation showed that the target variables were within the precision constraints (CV < 0.05 except for crop yield) with low bias values (below 3%). These results can contribute to reducing sampling cost and computation time while retaining high predictive power, and the design is expected to be widely used as a representative sample grid in agricultural spatial modeling studies.
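Allocating samples across strata under a precision constraint is commonly done with Neyman allocation, which samples each stratum in proportion to its size times its standard deviation. A sketch of that rule; the study's exact optimizer is not specified in the abstract:

```python
def neyman_allocation(strata, n_total):
    """strata: list of (N_h, S_h) = stratum size and target-variable std dev.
    Returns per-stratum sample sizes proportional to N_h * S_h.
    (Rounding can shift the grand total by a sample or two.)"""
    weights = [n_h * s_h for n_h, s_h in strata]
    total = sum(weights)
    return [round(n_total * w / total) for w in weights]

# Two strata of paddy grid cells: small-but-variable vs. large-but-homogeneous
alloc = neyman_allocation([(100, 2.0), (300, 1.0)], n_total=40)  # [16, 24]
```

The small variable stratum gets proportionally more samples than its share of cells, which is exactly how a design like this keeps CV below a target with only 46 to 69 samples out of 11,386 cells.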

Diagnostic Value of CYFRA 21-1 Measurement in Fine-Needle Aspiration Washouts for Detection of Axillary Recurrence in Postoperative Breast Cancer Patients (유방암 수술 후 액와림프절 재발 진단에 있어서의 미세침세척액 CYFRA 21-1의 진단적 가치)

  • So Yeon Won;Eun-Kyung Kim;Hee Jung Moon;Jung Hyun Yoon;Vivian Youngjean Park;Min Jung Kim
    • Journal of the Korean Society of Radiology, v.81 no.1, pp.147-156, 2020
  • Purpose The objective of this study was to evaluate the diagnostic value and threshold levels of cytokeratin fragment 21-1 (CYFRA 21-1) in fine-needle aspiration (FNA) washouts for detection of lymph node (LN) recurrence in postoperative breast cancer patients. Materials and Methods FNA cytological assessments and CYFRA 21-1 measurement in FNA washouts were performed for 64 axillary LNs suspicious for recurrence in 64 post-operative breast cancer patients. Final diagnosis was made on the basis of FNA cytology and follow-up data over at least 2 years. The concentration of CYFRA 21-1 was compared between recurrent LNs and benign LNs. Diagnostic performance and cut-off value were evaluated using a receiver operating characteristic curve. Results Regardless of the non-diagnostic results, the median concentration of CYFRA 21-1 in recurrent LNs was significantly higher than that in benign LNs (p < 0.001). The optimal diagnostic cut-off value was 1.6 ng/mL. The sensitivity, specificity, positive predictive value, negative predictive value, and accuracy of CYFRA 21-1 for LN recurrence were 90.9%, 100%, 100%, 98.1%, and 98.4%, respectively. Conclusion Measurement of CYFRA 21-1 concentration from ultrasound-guided FNA biopsy aspirates showed excellent diagnostic performance with a cut-off value of 1.6 ng/mL. These results indicate that measurement of CYFRA 21-1 concentration in FNA washouts is useful for the diagnosis of axillary LN recurrence in post-operative breast cancer patients.
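An ROC cut-off like the 1.6 ng/mL reported above is typically chosen by maximizing Youden's J (sensitivity + specificity − 1) over candidate thresholds. A stdlib sketch with toy washout concentrations, not the study's data or its exact selection criterion:

```python
def best_cutoff(values, labels):
    """Return the threshold (classify positive when value >= t)
    that maximizes Youden's J = sensitivity + specificity - 1."""
    best_t, best_j = None, float("-inf")
    for t in sorted(set(values)):
        tp = sum(1 for v, y in zip(values, labels) if v >= t and y == 1)
        fn = sum(1 for v, y in zip(values, labels) if v < t and y == 1)
        tn = sum(1 for v, y in zip(values, labels) if v < t and y == 0)
        fp = sum(1 for v, y in zip(values, labels) if v >= t and y == 0)
        sens = tp / (tp + fn) if tp + fn else 0.0
        spec = tn / (tn + fp) if tn + fp else 0.0
        if sens + spec - 1.0 > best_j:
            best_j, best_t = sens + spec - 1.0, t
    return best_t

# Toy CYFRA 21-1 concentrations (ng/mL); label 1 = recurrent node
cutoff = best_cutoff([0.5, 1.0, 1.6, 2.5, 3.0], [0, 0, 1, 1, 1])  # 1.6
```

With cleanly separated toy values the optimum is the lowest positive concentration; on real overlapping data the same search trades sensitivity against specificity.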

Estimation of Greenhouse Tomato Transpiration through Mathematical and Deep Neural Network Models Learned from Lysimeter Data (라이시미터 데이터로 학습한 수학적 및 심층 신경망 모델을 통한 온실 토마토 증산량 추정)

  • Meanne P. Andes;Mi-young Roh;Mi Young Lim;Gyeong-Lee Choi;Jung Su Jung;Dongpil Kim
    • Journal of Bio-Environment Control, v.32 no.4, pp.384-395, 2023
  • Since transpiration plays a key role in optimal irrigation management, knowledge of the irrigation demand of crops like tomatoes, which are highly susceptible to water stress, is necessary. One way to determine irrigation demand is to measure transpiration, which is affected by environmental factors and growth stage. This study aimed to estimate the transpiration of tomatoes and to find a suitable model among mathematical and deep learning models trained on minute-by-minute data. Pearson correlation revealed that the observed environmental variables correlate significantly with crop transpiration: inside air temperature and outside radiation correlated positively with transpiration, while humidity showed a negative correlation. Multiple Linear Regression (MLR), polynomial regression, Artificial Neural Network (ANN), Long Short-Term Memory (LSTM), and Gated Recurrent Unit (GRU) models were built and their accuracies compared. All models showed potential in estimating transpiration, with R2 values ranging from 0.770 to 0.948 and RMSE from 0.495 mm/min to 1.038 mm/min on the test dataset. The deep learning models outperformed the mathematical models; the GRU demonstrated the best performance on the test data with an R2 of 0.948 and RMSE of 0.495 mm/min, and the LSTM and ANN followed closely with R2 values of 0.946 and 0.944 and RMSE of 0.504 mm/min and 0.511 mm/min, respectively. The GRU model excelled in short-term forecasts and the LSTM in long-term forecasts, but this requires verification on a larger dataset. Compared with the MLR and degree-2 and degree-3 polynomial models, the FAO56 Penman-Monteith (PM) equation had a lower RMSE of 0.598 mm/min, but it performed worst among all models in capturing the variability in transpiration. Therefore, this study recommends the GRU and LSTM models for short-term estimation of tomato transpiration in greenhouses.
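The R2 and RMSE figures used to rank the models above come from their usual definitions. A stdlib sketch of both metrics:

```python
def r2_and_rmse(y_true, y_pred):
    """R^2 = 1 - SS_res/SS_tot; RMSE = sqrt(mean squared error).
    RMSE carries the target's units (here mm/min for transpiration)."""
    n = len(y_true)
    mean = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot, (ss_res / n) ** 0.5

# Perfect agreement -> R^2 = 1, RMSE = 0
r2, rmse = r2_and_rmse([0.2, 0.4, 0.6], [0.2, 0.4, 0.6])
```

The PM equation's result illustrates why both metrics matter: a model can post a moderate RMSE yet a poor R2 when it tracks the mean but misses the variability.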

Stock-Index Invest Model Using News Big Data Opinion Mining (뉴스와 주가 : 빅데이터 감성분석을 통한 지능형 투자의사결정모형)

  • Kim, Yoo-Sin;Kim, Nam-Gyu;Jeong, Seung-Ryul
    • Journal of Intelligence and Information Systems, v.18 no.2, pp.143-156, 2012
  • People readily believe that news and the stock index are closely related: they think that securing news before anyone else will help them forecast stock prices and enjoy great profit, or perhaps capture an investment opportunity. However, it is no easy feat to determine to what extent the two are related, to make investment decisions based on news, or to verify that such investment information is valid. If the significance of news and its impact on the stock market can be analyzed, it becomes possible to extract information that can assist investment decisions. In reality, however, the world is inundated with a massive wave of news in real time, and news is unstructured text. This study proposes a stock-index investment model based on "News Big Data" opinion mining that systematically collects, categorizes, and analyzes news to create investment information. To verify the validity of the model, the relationship between the opinion mining results and the stock index was empirically analyzed using statistics. The mining that converts news into information for investment decision making proceeds as follows. First, news supplied in real time by a news provider is indexed: not only the contents but also information such as the medium, time, and news type are collected and classified, then reworked into variables from which investment decisions can be inferred. Next, the news text is separated into morphemes to derive words whose polarity can be judged, and each word is tagged with positive/negative polarity by comparison with a sentiment dictionary. Third, the positive/negative polarity of each article is judged using the indexed classification information and a scoring rule, and the final investment decision information is derived according to daily scoring criteria.
For this study, the KOSPI index and its fluctuations were collected for the 63 days the stock market was open on the Korea Exchange during the three months from July to September 2011, and news data were collected by parsing 766 articles from economic news outlet M carried in the stock information > news > main news section of the portal site Naver.com. Over the three months, the index rose on 33 days and fell on 30 days, and the news comprised 197 articles published before the market opened, 385 during the session, and 184 after the market closed. Mining the collected news and comparing it with stock prices showed that the positive/negative opinion of the news had a significant relationship with the stock price, and that changes in the index were better explained when news opinion was expressed as a positive/negative ratio rather than as a simple binary positive or negative judgment. To check whether news affected, or at least preceded, stock price fluctuations, stock price changes were compared only with news published before the market opened, and the relationship was again statistically significant. In addition, because the news covered various types of information, such as social, economic, and overseas news, corporate earnings, industry conditions, market outlook, and market conditions, the influence on the stock market was expected to differ by news type; comparing each type with stock price fluctuations showed that market conditions, outlook, and overseas news were the most useful for explaining stock price fluctuations.
In contrast, news about individual companies was not statistically significant, but its opinion mining values tended to move opposite to the stock price, which may reflect promotional and planned news released to keep stock prices from falling. Finally, multiple regression analysis and logistic regression analysis were carried out to derive an investment decision function from the relationship between the positive/negative opinion of news and the stock price. The regression equation using the market conditions, outlook, and overseas news variables from before the market opened was statistically significant, and the classification accuracy of the logistic regression was 70.0% for stock price rises, 78.8% for falls, and 74.6% on average. This study first analyzed the relationship between news and stock prices by quantifying the sentiment of unstructured news contents using opinion mining, a big data analysis technique, and then proposed and verified an intelligent investment decision-making model that systematically carries out opinion mining and derives and supports investment information. This shows that news can be used as a variable to predict the stock index for investment, and the model is expected to serve as a real investment support system if implemented and verified in the future.
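The positive/negative ratio that the study found more informative than a binary label can be sketched with a toy sentiment dictionary. The word lists here are illustrative assumptions, and the real system works on Korean morphemes against a proper sentiment dictionary:

```python
# Illustrative polarity word lists (stand-ins for a sentiment dictionary)
POSITIVE = {"rise", "gain", "rally", "growth"}
NEGATIVE = {"fall", "loss", "slump", "risk"}

def polarity_ratio(tokens):
    """Share of polar words that are positive; 0.5 when no polar words occur."""
    pos = sum(1 for w in tokens if w in POSITIVE)
    neg = sum(1 for w in tokens if w in NEGATIVE)
    return pos / (pos + neg) if pos + neg else 0.5

ratio = polarity_ratio("exports rise on chip gain despite fx risk".split())
```

A daily score would then aggregate these article-level ratios, e.g. over the pre-open articles only, to form the regression inputs the study tests.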

A Study on Knowledge Entity Extraction Method for Individual Stocks Based on Neural Tensor Network (뉴럴 텐서 네트워크 기반 주식 개별종목 지식개체명 추출 방법에 관한 연구)

  • Yang, Yunseok;Lee, Hyun Jun;Oh, Kyong Joo
    • Journal of Intelligence and Information Systems, v.25 no.2, pp.25-38, 2019
  • Selecting high-quality information that meets users' interests and needs from the overflowing contents is becoming more important as the volume of generated information continues to grow. In this flood of information, efforts are being made to better reflect the user's intention in search results, rather than treating an information request as a simple string. Large IT companies such as Google and Microsoft also focus on developing knowledge-based technologies, including search engines, that provide users with satisfaction and convenience. Finance in particular is a field where text data analysis is expected to be useful and promising, because new information is constantly generated and the earlier the information, the more valuable it is. Automatic knowledge extraction can be effective in areas such as finance, where the information flow is vast and new information continues to emerge. However, automatic knowledge extraction faces several practical difficulties. First, it is hard to build corpora from different fields with the same algorithm, and it is difficult to extract good-quality triples. Second, producing labeled text data manually becomes harder as the extent and scope of the knowledge increase and patterns are constantly updated. Third, performance evaluation is difficult due to the characteristics of unsupervised learning. Finally, defining the problem of automatic knowledge extraction is not easy because of the ambiguous conceptual characteristics of knowledge. To overcome these limits and improve the semantic performance of stock-related information search, this study attempts to extract knowledge entities using a neural tensor network and to evaluate their quality. Unlike prior work, the purpose of this study is to extract knowledge entities related to individual stock items.
Various but relatively simple data processing methods are applied in the presented model to solve the problems of previous research and to enhance the model's effectiveness. From these processes, this study has three significances. First, it presents a practical and simple automatic knowledge extraction method that can be readily applied. Second, it demonstrates the possibility of performance evaluation through a simple problem definition. Finally, the expressiveness of the knowledge is increased by generating input data on a sentence basis without complex morphological analysis. The results of the empirical analysis and an objective performance evaluation method are also presented. For the empirical study confirming the usefulness of the presented model, experts' reports on 30 individual stocks, the top 30 items by publication frequency from May 30, 2017 to May 21, 2018, are used. Of the 5,600 reports in total, 3,074 reports, about 55%, are designated as the training set, and the remaining 45% as the testing set. Before constructing the model, all reports in the training set are classified by stock, and their entities are extracted using the KKMA named entity recognition tool. For each stock, the top 100 entities by appearance frequency are selected and vectorized using one-hot encoding. Then, using the neural tensor network, one score function per stock is trained. Thus, when a new entity from the testing set appears, its score can be calculated with every score function, and the stock whose function yields the highest score is predicted as the item related to the entity. To evaluate the presented models, we confirm the prediction power and whether the score functions are well constructed by calculating the hit ratio over all reports in the testing set.
As a result of the empirical study, the presented model shows 69.3% hit accuracy on the testing set of 2,526 reports. This hit ratio is meaningfully high despite some constraints on the research. Looking at the prediction performance per stock, only three stocks, LG Electronics, KiaMtr, and Mando, show far lower performance than average; this may be due to interference from other similar items and the generation of new knowledge. In this paper, we propose a methodology to find the key entities, or combinations of entities, needed to search for related information in accordance with the user's investment intention. Graph data are generated using only the named entity recognition tool and applied to the neural tensor network without learning a corpus or word vectors for the field. The empirical test confirms the effectiveness of the presented model as described above. However, some limits remain; notably, the especially poor performance on a few stocks shows the need for further research. Finally, through the empirical study, we confirmed that the learning method presented here can be used to semantically match new text information with the related stocks.
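The prediction step described above, running a new entity through every per-stock score function and taking the argmax, can be sketched as follows. The toy scorers here are simple dot-product callables standing in for trained neural tensor network score functions:

```python
def predict_stock(entity_vec, score_fns):
    """Score entity_vec with every stock's function and return the
    stock whose score function responds most strongly (argmax)."""
    return max(score_fns, key=lambda stock: score_fns[stock](entity_vec))

# Toy stand-ins: each "score function" is a dot product with a fixed weight
# vector (a trained NTN would use a bilinear tensor term plus nonlinearity)
score_fns = {
    "StockA": lambda e: sum(w * x for w, x in zip([0.9, 0.1, 0.0], e)),
    "StockB": lambda e: sum(w * x for w, x in zip([0.0, 0.2, 0.8], e)),
}
pick = predict_stock([0.0, 0.0, 1.0], score_fns)  # "StockB"
```

The hit ratio the study reports is then just the fraction of testing-set entities whose argmax stock matches the report they came from; the per-stock failures it notes (similar items interfering) are exactly the cases where two score functions respond comparably to the same entity.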