• Title/Summary/Keyword: Exploratory Data Analysis(EDA)

Long-Term Trend Analysis and Exploratory Data Analysis of Geumho River based on Seasonal Mann-Kendall Test (계절 맨-켄달 기법을 이용한 금호강 본류 BOD의 장기 경향 분석 및 탐색적 자료 분석)

  • Jung, Kang-Young;Lee, In Jung;Lee, Kyung-Lak;Cheon, Se-Uk;Hong, Jun Young;Ahn, Jung-Min
    • Journal of Environmental Science International / v.25 no.2 / pp.217-229 / 2016
  • The government has implemented a total maximum daily load (TMDL) program, organized by unit watershed, to maintain stable water quality targets by setting the permitted total amount of each pollutant. In this study, BOD concentration trends in the Geumho River were analyzed over the 10 years from 2005 to 2014. The improvement in water quality over the TMDL implementation period was evaluated using the seasonal Mann-Kendall test and a LOWESS (locally weighted scatterplot smoother) smooth. Both the seasonal Mann-Kendall test and the LOWESS smooth indicated that BOD concentration in the Geumho River had either decreased or remained constant. A quantitative analysis of BOD concentration using exploratory data analysis (EDA) showed that the mean and median BOD concentrations ranked in the order GH8 > GH7 > GH6 > GH5 > GH4 > GH3 > GH2 > GH1. The monthly average BOD concentration ranked in the order Apr > Mar > Feb > May > Jun > Jul > Jan > Aug > Sep > Dec > Nov > Oct. Outliers were most frequent in February, an estimated 1.5 times more frequent than in July, when they were least frequent. From a water quality management standpoint, these outliers need to be examined in order to establish a management plan for contaminants in the watershed.
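As a rough illustration of the seasonal Mann-Kendall test named in this abstract, the sketch below computes the test statistic S and its normal score Z by summing an ordinary Mann-Kendall statistic over each season (here, each calendar month). It is not the authors' code: the function name `seasonal_mann_kendall` and the synthetic series are assumptions made for illustration, and the sketch omits tie correction and serial-correlation adjustments.

```python
import math
from collections import defaultdict


def seasonal_mann_kendall(values, seasons):
    """Return (S, Z) for a seasonal Mann-Kendall test without tie correction.

    values  -- observations in time order
    seasons -- season label (e.g. month 0-11) for each observation
    """
    by_season = defaultdict(list)
    for value, season in zip(values, seasons):
        by_season[season].append(value)  # keeps time order within a season

    s_total, var_total = 0, 0.0
    for series in by_season.values():
        n = len(series)
        # Mann-Kendall S for this season: sign of every later-minus-earlier pair.
        for i in range(n - 1):
            for j in range(i + 1, n):
                diff = series[j] - series[i]
                s_total += (diff > 0) - (diff < 0)
        # Variance of S under the no-trend hypothesis (no ties assumed).
        var_total += n * (n - 1) * (2 * n + 5) / 18.0

    # Continuity-corrected normal score.
    if s_total > 0:
        z = (s_total - 1) / math.sqrt(var_total)
    elif s_total < 0:
        z = (s_total + 1) / math.sqrt(var_total)
    else:
        z = 0.0
    return s_total, z


# Hypothetical data: 10 years x 12 months, falling by one unit per year.
values = [50 - year + (month % 3) for year in range(10) for month in range(12)]
seasons = [month for year in range(10) for month in range(12)]
s, z = seasonal_mann_kendall(values, seasons)
print(s, z)  # S is -540 here; a strongly negative Z indicates a downward trend
```

Comparing like months only is what keeps the seasonal cycle from masking or mimicking a long-term trend.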

The Study of Failure Mode Data Development and Feature Parameter's Reliability Verification Using LSTM Algorithm for 2-Stroke Low Speed Engine for Ship's Propulsion (선박 추진용 2행정 저속엔진의 고장모드 데이터 개발 및 LSTM 알고리즘을 활용한 특성인자 신뢰성 검증연구)

  • Jae-Cheul Park;Hyuk-Chan Kwon;Chul-Hwan Kim;Hwa-Sup Jang
    • Journal of the Society of Naval Architects of Korea / v.60 no.2 / pp.95-109 / 2023
  • In the fourth industrial revolution, changes in the technological paradigm have had a direct impact on ship maintenance systems. The 2-stroke low-speed engine system integrates the core equipment required for propulsive power. Condition Based Management (CBM) is defined as a technology that applies predictive maintenance methods to existing calendar-based or running-time-based maintenance systems by monitoring the condition of machinery and diagnosing and prognosing failures. In this study, we established our own framework for CBM technology development, covering engineering-based failure analysis, data development and management, and data feature analysis and pre-processing, and verified the reliability of the failure mode DB using an LSTM algorithm. We developed various simulated failure mode scenarios for a 2-stroke low-speed engine and produced data on onshore test beds. The normal and abnormal status data acquired through the failure mode simulation experiments were analyzed and pre-processed using various Exploratory Data Analysis (EDA) techniques, extracting not only data on the performance and efficiency of the 2-stroke low-speed engine but also key feature data through multivariate statistical analysis. In addition, we developed an LSTM classification algorithm to verify the reliability of the various failure mode data with time-series characteristics.

An EDA Analysis of Seoul Metropolitan Area's Mountain Usage Patterns of Users in Their 20~30s after COVID-19 Occurrence

  • Lee, BoBae;Yeon, PoungSik
    • Journal of People, Plants, and Environment / v.24 no.2 / pp.229-244 / 2021
  • Background and objective: The purpose of this study was to comprehensively analyze user behavior in order to cope appropriately with the increasing demand for mountain usage among those in their 20s and 30s and to allocate resources efficiently. Methods: To analyze the behavior of mountain hiking users, an exploratory data analysis (EDA) was conducted on data collected in the app Tranggle. The main targets were users in their 20s and 30s who visited mountains in the metropolitan area in 2019-2020. Among them, we selected data on the top 13 mountains by frequency of visits. After data pre-processing, mountain usage patterns were analyzed through statistical analysis and visualization. Results: Compared to 2019, the number of users in 2020 increased 1.36 times. The utilization rate of well-established hiking trails also increased. Mountain usage on weekends (Saturday > Sunday) was still the highest, but the difference in usage between the days of the week decreased. Outside of work hours, early morning usage increased and night-time usage decreased. There was no significant change in usage by activity type, level (experience points), or exercise properties. Conclusion: Since the COVID-19 outbreak, mountain usage has been shifting towards low user density and short-distance trips. In the post-COVID-19 era, the function and role of forests in daily life are expected to increase. To cope with this, further research needs to be carried out that considers wider demographic and social characteristics.

Data Framework Design of EDISON 2.0 Digital Platform for Convergence Research

  • Sunggeun Han;Jaegwang Lee;Inho Jeon;Jeongcheol Lee;Hoon Choi
    • KSII Transactions on Internet and Information Systems (TIIS) / v.17 no.8 / pp.2292-2313 / 2023
  • With improving computing performance, various digital platforms are being developed to enable easy utilization of high-performance computing environments. EDISON 1.0 is an online simulation platform widely used in computational science and engineering education. As the research paradigm changes, there is growing demand to develop the simulation-centered EDISON 1.0 platform into an EDISON 2.0 platform centered on data and artificial intelligence. Herein, a data framework, a core module for data-centric research on the EDISON 2.0 digital platform, is proposed. The proposed data framework provides the following three functions. First, it provides a data repository suitable for the data lifecycle to increase research reproducibility. Second, it provides a new data model that can integrate, manage, search, and utilize heterogeneous data to support a data-driven interdisciplinary convergence research environment. Finally, it provides an exploratory data analysis (EDA) service and data enrichment using an AI model, both developed to strengthen data reliability and maximize the efficiency and effectiveness of research endeavors. Using the EDISON 2.0 data framework, researchers can conduct interdisciplinary convergence research using heterogeneous data and easily perform data pre-processing through the web-based UI. Further, it presents the opportunity to leverage the derived data obtained through AI technology to gain insights and create new research topics.

Development and Validation of Data Science Education Instructional Model (데이터 과학 교육을 위한 수업모형 개발 및 타당성 검증)

  • Bongchul Kim;Bomsol Kim;Jonghoon Kim
    • Journal of The Korean Association of Information Education / v.26 no.5 / pp.417-425 / 2022
  • The 'Comprehensive Plan for Nurturing Digital Talents' reported at the Cabinet meeting of the Ministry of Education in August 2022 focuses on the qualitative and quantitative expansion of informatics education centered on SW and AI education. With the advent of the era of artificial intelligence, data science education is also drawing attention as a field of informatics education. Data science is inherently a field in which various disciplines are fused, and advanced technologies are used for data analysis, modeling, and machine learning. This study devised a draft instructional model for data science education through literature research and analysis of previous studies, and developed a final instructional model through a usability test and expert validation.

Application of Urban Computing to Explore Living Environment Characteristics in Seoul : Integration of S-Dot Sensor and Urban Data

  • Daehwan Kim;Woomin Nam;Keon Chul Park
    • Journal of Internet Computing and Services / v.24 no.4 / pp.65-76 / 2023
  • This paper identifies patterns in living environment elements (PM2.5, PM10, noise) throughout Seoul and the urban characteristics that affect them, using big data from Seoul's S-Dot sensors, which have recently become a hot topic. In other words, it proposes a big-data-based urban computing research methodology and research direction to confirm the relationship between urban characteristics and the living environments that directly affect citizens. The temporal range is 2020 to 2021, the period for which S-Dot time series data are available, and the spatial range is all of Seoul on a 500 m × 500 m grid. First, to analyze specific living environment patterns, simple trends are identified through EDA, and cluster analysis is conducted based on those trends. Then, to derive the urban planning factors specific to each cluster, statistical analyses such as ANOVA, OLS, and MNL were conducted to confirm more specific characteristics. As a result, cluster patterns of the environment elements (PM2.5, PM10, noise) and the urban factors that affect them were identified, and some areas were found to have relatively high or low long-term living environment values compared to other regions. The results are expected to serve as a reference for urban planning and management measures for areas with vulnerable living environments, and as an exploratory study that can provide direction for the urban computing field, especially in relation to environmental data.
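To make the 500 m × 500 m grid aggregation concrete, the following is a minimal sketch of averaging sensor readings per grid cell before any trend or cluster analysis. It assumes coordinates are already in a projected system measured in metres; the function name `grid_cell_means` and the sample readings are hypothetical and not taken from the paper.

```python
import math
from collections import defaultdict


def grid_cell_means(readings, cell_size=500.0):
    """Average sensor values per square grid cell.

    readings  -- iterable of (x_m, y_m, value), coordinates in metres (assumed)
    cell_size -- edge length of a grid cell in metres
    """
    sums = defaultdict(lambda: [0.0, 0])  # cell -> [running total, count]
    for x, y, value in readings:
        cell = (math.floor(x / cell_size), math.floor(y / cell_size))
        sums[cell][0] += value
        sums[cell][1] += 1
    return {cell: total / count for cell, (total, count) in sums.items()}


# Hypothetical PM2.5 readings: two sensors share cell (0, 0), one is in (1, 0).
sample = [(100.0, 100.0, 20.0), (400.0, 450.0, 30.0), (600.0, 100.0, 10.0)]
cell_means = grid_cell_means(sample)
print(cell_means)
```

The per-cell means could then feed a clustering step; which clustering algorithm the authors used is not stated in the abstract.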

Evaluation of Travel Time Prediction Reliability on Highway Using DSRC Data (DSRC 기반 고속도로 통행 소요시간 예측정보 신뢰성 평가)

  • Han, Daechul;Kim, Joohyon;Kim, Seoungbum
    • The Journal of The Korea Institute of Intelligent Transport Systems / v.17 no.4 / pp.86-98 / 2018
  • Since 2015, the Korea Expressway Corporation has provided predicted travel time information, which is reproduced from DSRC systems over the extended expressway network in Korea. When released as public information, it helps travelers choose optimal routes while minimizing traffic congestion and travel cost. However, suitable evaluations of the reliability of travel time forecast information have not been conducted so far. First, this study reviews the literature to identify a measure of effectiveness for evaluating the reliability of travel time forecasts. Second, using this performance measure, the study quantitatively evaluates concurrent travel time forecast information on highways and examines the forecast error through exploratory data analysis. Most highway lines appear to have provided reliable forecast information. However, we found significant over- and under-forecasts on a few links within several long lines, and such minor errors turn out to reduce the overall reliability of the travel time forecast for the corresponding highway lines. This study should help set priorities for quality control of the travel time forecast information system and highlights the importance of periodic and sustainable management of travel time forecast information.
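The abstract does not name the measure of effectiveness it adopts. As an assumed stand-in, the sketch below uses two common forecast-error measures: mean absolute percentage error (MAPE) for overall accuracy and mean signed percentage error to show whether a link is over- or under-forecast. The function name and the travel times are hypothetical.

```python
def forecast_errors(predicted, actual):
    """Return (MAPE, mean signed percentage error), both in percent.

    A positive signed error means over-forecast (predicted > actual);
    a negative one means under-forecast.
    """
    pct = [(p - a) * 100.0 / a for p, a in zip(predicted, actual)]
    mape = sum(abs(e) for e in pct) / len(pct)
    bias = sum(pct) / len(pct)
    return mape, bias


# Hypothetical travel times in minutes for one link over two intervals.
mape, bias = forecast_errors(predicted=[110.0, 90.0], actual=[100.0, 100.0])
print(mape, bias)  # 10.0 and 0.0: errors of +10% and -10% cancel in the bias
```

Reporting both values per link is one way to spot the pattern described above, where a line looks acceptable on average while a few links are consistently over- or under-forecast.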

The Relative Effects of Business-to-Business (vs. Business-to-Consumer) Business Model Innovation on Innovation Performance (B2B (vs. B2C) 비즈니스모델혁신이 혁신성과에 미치는 상대적 효과)

  • Yejin Park;Chaeeun Lee;Wonjoo Yun
    • Asia-Pacific Journal of Business Venturing and Entrepreneurship / v.18 no.6 / pp.159-172 / 2023
  • This study aims to empirically investigate the relative effects of business-to-business (vs. business-to-consumer) business model innovation (BMI) on innovation performance. The research examines the impact of three key components of BMI on innovation performance: 1. value creation, 2. value proposition, and 3. value capture. The 2022 Entrepreneurship Survey data from the Korean Entrepreneurship Foundation were used to analyze 2,879 companies. An exploratory data analysis (EDA) covering categories such as industry, firm, CEO, and technology characteristics was conducted to show the latest startup status in Korea. The results show that the value creation of B2B (vs. B2C) firms has a more positive and significant impact on innovation performance. In contrast, the value proposition of B2C (vs. B2B) firms was found to have a more positive and significant effect on innovation performance. Interestingly, value capture did not show any effect for either type of firm. Additionally, the study employed seemingly unrelated regression (SUR) analysis for robustness checks. These findings provide important insights into the relative effects of B2B-BMI (vs. B2C-BMI).

Export Prediction Using Separated Learning Method and Recommendation of Potential Export Countries (분리학습 모델을 이용한 수출액 예측 및 수출 유망국가 추천)

  • Jang, Yeongjin;Won, Jongkwan;Lee, Chaerok
    • Journal of Intelligence and Information Systems / v.28 no.1 / pp.69-88 / 2022
  • One of the characteristics of South Korea's economic structure is that it is highly dependent on exports. Thus, many businesses are closely tied to the global economy and the diplomatic situation. In addition, small and medium-sized enterprises (SMEs) specializing in exports are struggling due to the spread of COVID-19. Therefore, this study aimed to develop a model that forecasts exports for the next year to support SMEs' export strategy and decision making. This study also proposed a strategy for recommending promising export countries for each item based on the forecasting model. We analyzed important variables used in previous studies, such as country-specific, item-specific, and macro-economic variables, and collected those variables to train our prediction model. Next, exploratory data analysis (EDA) showed that exports, the target variable, have a highly skewed distribution. To deal with this issue and improve predictive performance, we suggest a separated learning method. In a separated learning method, the whole dataset is divided into homogeneous subgroups and a prediction algorithm is applied to each group. Thus, the characteristics of each group can be learned more precisely using different input variables and algorithms. In this study, we divided the dataset into five subgroups based on exports to decrease the skewness of the target variable. After the separation, we found that each group has different characteristics in terms of countries and goods. For example, in Group 1, most of the exporting countries are developing countries and the majority of exported goods are low-value products such as glass and prints. On the other hand, South Korea's major export destinations such as China, the USA, and Vietnam fall into Group 4 and Group 5, and most exported goods in these groups are high-value products. We then used LightGBM (LGBM) and Exponential Moving Average (EMA) for prediction. Considering the characteristics of each group, models were built using LGBM for Groups 1 to 4 and EMA for Group 5. To evaluate the performance of the model, we compared different model structures and algorithms. As a result, the separated learning model was found to perform best among the models compared. After the model was built, we also provided the variable importance of each group using SHAP values to add explainability to our model. Based on the prediction model, we proposed a two-stage recommendation strategy for potential export countries. In the first phase, a BCG matrix was used to find Star and Question Mark markets that are expected to grow rapidly. In the second phase, we calculated scores for each country and made recommendations according to ranking. Using this recommendation framework, potential export countries were selected and information about those countries was presented for each item. This study has several implications. First, most preceding studies have focused on a specific situation or country, whereas this study uses various variables and develops a machine learning model for a wide range of countries and items. Second, to our knowledge, it is the first attempt to adopt a separated learning method for export prediction. By separating the dataset into five homogeneous subgroups, we could enhance the predictive performance of the model. A more detailed explanation of the models by group is also provided using SHAP values. Lastly, this study has several practical implications. Some platforms, including KOTRA, provide trade information, but most of them are based on past data, so it is not easy for companies to predict future trends. By utilizing the model and recommendation strategy in this research, trade-related services on each platform can be improved so that companies, including SMEs, can fully utilize them when making export strategies and decisions.
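The two ideas at the core of this abstract, splitting a skewed target into subgroups and forecasting one subgroup with an exponential moving average, can be sketched as follows. The abstract does not say how the five subgroups were cut, so the equal-size rank split here is an assumption, as are the function names, the smoothing factor `alpha`, and the sample export values. The LightGBM models used for the other groups are not shown.

```python
def assign_groups(targets, n_groups=5):
    """Assign each record to a group 1..n_groups by rank of its target value.

    Equal-size rank bins are an assumption; the paper's cut points are not given.
    """
    order = sorted(range(len(targets)), key=lambda i: targets[i])
    groups = [0] * len(targets)
    for rank, idx in enumerate(order):
        groups[idx] = rank * n_groups // len(targets) + 1
    return groups


def ema_forecast(history, alpha=0.5):
    """Forecast the next value as the exponential moving average of history."""
    ema = history[0]
    for x in history[1:]:
        ema = alpha * x + (1 - alpha) * ema  # newer values weigh more
    return ema


# Hypothetical export amounts with a heavily skewed distribution.
exports = [10, 1000, 5, 50000, 300, 7, 80, 9000, 20, 600]
groups = assign_groups(exports)
print(groups)  # [2, 4, 1, 5, 3, 1, 3, 5, 2, 4]

# Hypothetical yearly history for one high-value record in the top group.
next_year = ema_forecast([100.0, 200.0], alpha=0.5)
print(next_year)  # 150.0
```

Within each group the target spans a much narrower range than in the full dataset, which is the skewness reduction the abstract describes.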