• Title/Summary/Keyword: evaluation models

Search Result 3,815, Processing Time 0.045 seconds

A Performance Evaluation of the e-Gov Standard Framework on PaaS Cloud Computing Environment: A Geo-based Image Processing Case (PaaS 클라우드 컴퓨팅 환경에서 전자정부 표준프레임워크 성능평가: 공간영상 정보처리 사례)

  • KIM, Kwang-Seob;LEE, Ki-Won
    • Journal of the Korean Association of Geographic Information Studies
    • /
    • v.21 no.4
    • /
    • pp.1-13
    • /
    • 2018
  • Both Platform as a Service (PaaS) as one of the cloud computing service models and the e-government (e-Gov) standard framework from the Ministry of the Interior and Safety (MOIS) provide developers with practical computing environments to build their applications in every web-based services. Web application developers in the geo-spatial information field can utilize and deploy many middleware software or common functions provided by either the cloud-based service or the e-Gov standard framework. However, there are few studies for their applicability and performance in the field of actual geo-spatial information application yet. Therefore, the motivation of this study was to investigate the relevance of these technologies or platform. The applicability of these computing environments and the performance evaluation were performed after a test application deployment of the spatial image processing case service using Web Processing Service (WPS) 2.0 on the e-Gov standard framework. This system was a test service supported by a cloud environment of Cloud Foundry, one of open source PaaS cloud platforms. Using these components, the performance of the test system in two cases of 300 and 500 threads was assessed through a comparison test with two kinds of service: a service case for only the PaaS and that on the e-Gov on the PaaS. The performance measurements were based on the recording of response time with respect to users' requests during 3,600 seconds. According to the experimental results, all the test cases of the e-Gov on PaaS considered showed a greater performance. It is expected that the e-Gov standard framework on the PaaS cloud would be important factors to build the web-based spatial information service, especially in public sectors.

Conditional Generative Adversarial Network based Collaborative Filtering Recommendation System (Conditional Generative Adversarial Network(CGAN) 기반 협업 필터링 추천 시스템)

  • Kang, Soyi;Shin, Kyung-shik
    • Journal of Intelligence and Information Systems
    • /
    • v.27 no.3
    • /
    • pp.157-173
    • /
    • 2021
  • With the development of information technology, the amount of available information increases daily. However, having access to so much information makes it difficult for users to easily find the information they seek. Users want a visualized system that reduces information retrieval and learning time, saving them from personally reading and judging all available information. As a result, recommendation systems are an increasingly important technologies that are essential to the business. Collaborative filtering is used in various fields with excellent performance because recommendations are made based on similar user interests and preferences. However, limitations do exist. Sparsity occurs when user-item preference information is insufficient, and is the main limitation of collaborative filtering. The evaluation value of the user item matrix may be distorted by the data depending on the popularity of the product, or there may be new users who have not yet evaluated the value. The lack of historical data to identify consumer preferences is referred to as data sparsity, and various methods have been studied to address these problems. However, most attempts to solve the sparsity problem are not optimal because they can only be applied when additional data such as users' personal information, social networks, or characteristics of items are included. Another problem is that real-world score data are mostly biased to high scores, resulting in severe imbalances. One cause of this imbalance distribution is the purchasing bias, in which only users with high product ratings purchase products, so those with low ratings are less likely to purchase products and thus do not leave negative product reviews. Due to these characteristics, unlike most users' actual preferences, reviews by users who purchase products are more likely to be positive. Therefore, the actual rating data is over-learned in many classes with high incidence due to its biased characteristics, distorting the market. Applying collaborative filtering to these imbalanced data leads to poor recommendation performance due to excessive learning of biased classes. Traditional oversampling techniques to address this problem are likely to cause overfitting because they repeat the same data, which acts as noise in learning, reducing recommendation performance. In addition, pre-processing methods for most existing data imbalance problems are designed and used for binary classes. Binary class imbalance techniques are difficult to apply to multi-class problems because they cannot model multi-class problems, such as objects at cross-class boundaries or objects overlapping multiple classes. To solve this problem, research has been conducted to convert and apply multi-class problems to binary class problems. However, simplification of multi-class problems can cause potential classification errors when combined with the results of classifiers learned from other sub-problems, resulting in loss of important information about relationships beyond the selected items. Therefore, it is necessary to develop more effective methods to address multi-class imbalance problems. We propose a collaborative filtering model using CGAN to generate realistic virtual data to populate the empty user-item matrix. Conditional vector y identify distributions for minority classes and generate data reflecting their characteristics. Collaborative filtering then maximizes the performance of the recommendation system via hyperparameter tuning. This process should improve the accuracy of the model by addressing the sparsity problem of collaborative filtering implementations while mitigating data imbalances arising from real data. Our model has superior recommendation performance over existing oversampling techniques and existing real-world data with data sparsity. SMOTE, Borderline SMOTE, SVM-SMOTE, ADASYN, and GAN were used as comparative models and we demonstrate the highest prediction accuracy on the RMSE and MAE evaluation scales. Through this study, oversampling based on deep learning will be able to further refine the performance of recommendation systems using actual data and be used to build business recommendation systems.

Habitat Quality Analysis and an Evaluation of Gajisan Provincial Park Ecosystem Service Using InVEST Model (InVEST 모델을 이용한 가지산도립공원의 서식지질 분석과 생태계서비스평가)

  • Kwon, Hye-Yeon;Jang, Jung-Eun;Shin, Hae-Seon;Yu, Byeong-Hyeok;Lee, Sang-Cheol;Choi, Song-Hyun
    • Korean Journal of Environment and Ecology
    • /
    • v.36 no.3
    • /
    • pp.318-326
    • /
    • 2022
  • The Convention on Biodiversity (CBD) recommends that 17% of the land be designated as a protected area to counter global environmental problems. Korea also realized a need to designate protected areas according to the international level and explain the significance of designating protected areas. Accordingly, studies on ecosystem services are required. In Korea, the protected areas are designated as national parks, provincial parks, and county parks by hierarchy under the Natural Parks Act. However, as priority was on political and administrative aspects, research on ecosystem service value evaluation and habitat management were concentrated in national parks, and provincial and county parks were relatively neglected. Therefore, more studies on provincial and county parks are necessary. In this study, habitat quality for Gajisan Provincial Park, where there were few studies on habitat management and ecosystem service valuation, was evaluated using the InVEST Habitat Quality model among the InVEST models. The analysis results were compared with 16 mountainous national parks. The results showed that the habitat quality value of Gajisan Provincial Park was 0.83, higher than that of the surrounding areas. The analysis of habitat quality in three districts showed 0,84 for the Tongdosa and Naewonsa districts and 0.83 for the Seoknamsa district. By use district, the nature conservation district, the natural environment district, the cultural heritage district, and the park village district had the highest habitat quality value in that order. Compared with the existing habitat quality analysis results of national parks, Gajisan Provincial Park showed naturalness at the level of Mudeungsan National Park. These results can be used as objective data for establishing policies and management plans to preserve biodiversity and promote ecosystem services in provincial parks.

Analysis of Changes in Tree Height-Diameter Allometry for Major Tree Species in South Korea (우리나라 주요 수종의 수고-직경 상대생장 변화 분석)

  • Moonil Kim;Taejin Park;Youngjin Ko;Go-Mi Choi;Soonchul Son;Yejun Kang;Jaehee Yoo;Minkyeong Kim;Hyeonji Park;Woo-Kyun Lee
    • Journal of Korean Society of Forest Science
    • /
    • v.112 no.1
    • /
    • pp.71-82
    • /
    • 2023
  • Forest biomass is used as a representative indicator of forest size, maturity, and productivity. Therefore, quantitative evaluation is important for management and harvest as well as the evaluation of ecosystem functions and services including CO2 absorption. The allometric equation is a widely used method for estimating the value of each component through the relative growth rate of plants. Recently, studies indicated that the relative growth of trees is changing because of the increased CO2 concentration in the atmosphere and the resulting climate change, raising the need to review the previously developed relative growth models and coefficients. In this study, the height-diameter at breast height (DBH) relationships of four major tree species in Korea [(Pinus densiflora (PD), Larix kaempferi (LK), Quercus variabilis (QV), and Quercus mongolica (QM)] were analyzed using the 5th-7th National Forest Inventory (NFI) data. Furthermore, these results were compared with the present yield table from the National Institute for Forest Science. This analysis revealed that the expected height for the same DBH increased as the NFI progressed. For example, in model analysis, the expected heights for PD, LK, QV, and QM for DBH of 25 cm were 12.48, 19.17, 14.47, and 13.19 m, respectively, in the 5th NFI data. In the 7th NFI data, these values were estimated as 13.61 (+9.1%), 21.58 (+12.7%), 15.76 (+8.9%), and 13.93 m (+5.6%), respectively. These results indicate that the major tree species in South Korean forests currently are more vigorous in height growth than in diameter growth when compared to the height-DBH development trends by tree species identified through past survey data.

Introduction and Evaluation of the Production Method for Chlorophyll-a Using Merging of GOCI-II and Polar Orbit Satellite Data (GOCI-II 및 극궤도 위성 자료를 병합한 Chlorophyll-a 산출물 생산방법 소개 및 활용 가능성 평가)

  • Hye-Kyeong Shin;Jae Yeop Kwon;Pyeong Joong Kim;Tae-Ho Kim
    • Korean Journal of Remote Sensing
    • /
    • v.39 no.6_1
    • /
    • pp.1255-1272
    • /
    • 2023
  • Satellite-based chlorophyll-a concentration, produced as a long-term time series, is crucial for global climate change research. The production of data without gaps through the merging of time-synthesized or multi-satellite data is essential. However, studies related to satellite-based chlorophyll-a concentration in the waters around the Korean Peninsula have mainly focused on evaluating seasonal characteristics or proposing algorithms suitable for research areas using a single ocean color sensor. In this study, a merging dataset of remote sensing reflectance from the geostationary sensor GOCI-II and polar-orbiting sensors (MODIS, VIIRS, OLCI) was utilized to achieve high spatial coverage of chlorophyll-a concentration in the waters around the Korean Peninsula. The spatial coverage in the results of this study increased by approximately 30% compared to polar-orbiting sensor data, effectively compensating for gaps caused by clouds. Additionally, we aimed to quantitatively assess accuracy through comparison with global chlorophyll-a composite data provided by Ocean Colour Climate Change Initiative (OC-CCI) and GlobColour, along with in-situ observation data. However, due to the limited number of in-situ observation data, we could not provide statistically significant results. Nevertheless, we observed a tendency for underestimation compared to global data. Furthermore, for the evaluation of practical applications in response to marine disasters such as red tides, we qualitatively compared our results with a case of a red tide in the East Sea in 2013. The results showed similarities to OC-CCI rather than standalone geostationary sensor results. Through this study, we plan to use the generated data for future research in artificial intelligence models for prediction and anomaly utilization. It is anticipated that the results will be beneficial for monitoring chlorophyll-a events in the coastal waters around Korea.

A Study on Knowledge Entity Extraction Method for Individual Stocks Based on Neural Tensor Network (뉴럴 텐서 네트워크 기반 주식 개별종목 지식개체명 추출 방법에 관한 연구)

  • Yang, Yunseok;Lee, Hyun Jun;Oh, Kyong Joo
    • Journal of Intelligence and Information Systems
    • /
    • v.25 no.2
    • /
    • pp.25-38
    • /
    • 2019
  • Selecting high-quality information that meets the interests and needs of users among the overflowing contents is becoming more important as the generation continues. In the flood of information, efforts to reflect the intention of the user in the search result better are being tried, rather than recognizing the information request as a simple string. Also, large IT companies such as Google and Microsoft focus on developing knowledge-based technologies including search engines which provide users with satisfaction and convenience. Especially, the finance is one of the fields expected to have the usefulness and potential of text data analysis because it's constantly generating new information, and the earlier the information is, the more valuable it is. Automatic knowledge extraction can be effective in areas where information flow is vast, such as financial sector, and new information continues to emerge. However, there are several practical difficulties faced by automatic knowledge extraction. First, there are difficulties in making corpus from different fields with same algorithm, and it is difficult to extract good quality triple. Second, it becomes more difficult to produce labeled text data by people if the extent and scope of knowledge increases and patterns are constantly updated. Third, performance evaluation is difficult due to the characteristics of unsupervised learning. Finally, problem definition for automatic knowledge extraction is not easy because of ambiguous conceptual characteristics of knowledge. So, in order to overcome limits described above and improve the semantic performance of stock-related information searching, this study attempts to extract the knowledge entity by using neural tensor network and evaluate the performance of them. Different from other references, the purpose of this study is to extract knowledge entity which is related to individual stock items. Various but relatively simple data processing methods are applied in the presented model to solve the problems of previous researches and to enhance the effectiveness of the model. From these processes, this study has the following three significances. First, A practical and simple automatic knowledge extraction method that can be applied. Second, the possibility of performance evaluation is presented through simple problem definition. Finally, the expressiveness of the knowledge increased by generating input data on a sentence basis without complex morphological analysis. The results of the empirical analysis and objective performance evaluation method are also presented. The empirical study to confirm the usefulness of the presented model, experts' reports about individual 30 stocks which are top 30 items based on frequency of publication from May 30, 2017 to May 21, 2018 are used. the total number of reports are 5,600, and 3,074 reports, which accounts about 55% of the total, is designated as a training set, and other 45% of reports are designated as a testing set. Before constructing the model, all reports of a training set are classified by stocks, and their entities are extracted using named entity recognition tool which is the KKMA. for each stocks, top 100 entities based on appearance frequency are selected, and become vectorized using one-hot encoding. After that, by using neural tensor network, the same number of score functions as stocks are trained. Thus, if a new entity from a testing set appears, we can try to calculate the score by putting it into every single score function, and the stock of the function with the highest score is predicted as the related item with the entity. To evaluate presented models, we confirm prediction power and determining whether the score functions are well constructed by calculating hit ratio for all reports of testing set. As a result of the empirical study, the presented model shows 69.3% hit accuracy for testing set which consists of 2,526 reports. this hit ratio is meaningfully high despite of some constraints for conducting research. Looking at the prediction performance of the model for each stocks, only 3 stocks, which are LG ELECTRONICS, KiaMtr, and Mando, show extremely low performance than average. this result maybe due to the interference effect with other similar items and generation of new knowledge. In this paper, we propose a methodology to find out key entities or their combinations which are necessary to search related information in accordance with the user's investment intention. Graph data is generated by using only the named entity recognition tool and applied to the neural tensor network without learning corpus or word vectors for the field. From the empirical test, we confirm the effectiveness of the presented model as described above. However, there also exist some limits and things to complement. Representatively, the phenomenon that the model performance is especially bad for only some stocks shows the need for further researches. Finally, through the empirical study, we confirmed that the learning method presented in this study can be used for the purpose of matching the new text information semantically with the related stocks.

Assessment Study on Educational Programs for the Gifted Students in Mathematics (영재학급에서의 수학영재프로그램 평가에 관한 연구)

  • Kim, Jung-Hyun;Whang, Woo-Hyung
    • Communications of Mathematical Education
    • /
    • v.24 no.1
    • /
    • pp.235-257
    • /
    • 2010
  • Contemporary belief is that the creative talented can create new knowledge and lead national development, so lots of countries in the world have interest in Gifted Education. As we well know, U.S.A., England, Russia, Germany, Australia, Israel, and Singapore enforce related laws in Gifted Education to offer Gifted Classes, and our government has also created an Improvement Act in January, 2000 and Enforcement Ordinance for Gifted Improvement Act was also announced in April, 2002. Through this initiation Gifted Education can be possible. Enforcement Ordinance was revised in October, 2008. The main purpose of this revision was to expand the opportunity of Gifted Education to students with special education needs. One of these programs is, the opportunity of Gifted Education to be offered to lots of the Gifted by establishing Special Classes at each school. Also, it is important that the quality of Gifted Education should be combined with the expansion of opportunity for the Gifted. Social opinion is that it will be reckless only to expand the opportunity for the Gifted Education, therefore, assessment on the Teaching and Learning Program for the Gifted is indispensible. In this study, 3 middle schools were selected for the Teaching and Learning Programs in mathematics. Each 1st Grade was reviewed and analyzed through comparative tables between Regular and Gifted Education Programs. Also reviewed was the content of what should be taught, and programs were evaluated on assessment standards which were revised and modified from the present teaching and learning programs in mathematics. Below, research issues were set up to assess the formation of content areas and appropriateness for Teaching and Learning Programs for the Gifted in mathematics. A. Is the formation of special class content areas complying with the 7th national curriculum? 1. Which content areas of regular curriculum is applied in this program? 2. Among Enrichment and Selection in Curriculum for the Gifted, which one is applied in this programs? 3. Are the content areas organized and performed properly? B. Are the Programs for the Gifted appropriate? 1. Are the Educational goals of the Programs aligned with that of Gifted Education in mathematics? 2. Does the content of each program reflect characteristics of mathematical Gifted students and express their mathematical talents? 3. Are Teaching and Learning models and methods diverse enough to express their talents? 4. Can the assessment on each program reflect the Learning goals and content, and enhance Gifted students' thinking ability? The conclusions are as follows: First, the best contents to be taught to the mathematical Gifted were found to be the Numeration, Arithmetic, Geometry, Measurement, Probability, Statistics, Letter and Expression. Also, Enrichment area and Selection area within the curriculum for the Gifted were offered in many ways so that their Giftedness could be fully enhanced. Second, the educational goals of Teaching and Learning Programs for the mathematical Gifted students were in accordance with the directions of mathematical education and philosophy. Also, it reflected that their research ability was successful in reaching the educational goals of improving creativity, thinking ability, problem-solving ability, all of which are required in the set curriculum. In order to accomplish the goals, visualization, symbolization, phasing and exploring strategies were used effectively. Many different of lecturing types, cooperative learning, discovery learning were applied to accomplish the Teaching and Learning model goals. For Teaching and Learning activities, various strategies and models were used to express the students' talents. These activities included experiments, exploration, application, estimation, guess, discussion (conjecture and refutation) reconsideration and so on. There were no mention to the students about evaluation and paper exams. While the program activities were being performed, educational goals and assessment methods were reflected, that is, products, performance assessment, and portfolio were mainly used rather than just paper assessment.

A Study of Anomaly Detection for ICT Infrastructure using Conditional Multimodal Autoencoder (ICT 인프라 이상탐지를 위한 조건부 멀티모달 오토인코더에 관한 연구)

  • Shin, Byungjin;Lee, Jonghoon;Han, Sangjin;Park, Choong-Shik
    • Journal of Intelligence and Information Systems
    • /
    • v.27 no.3
    • /
    • pp.57-73
    • /
    • 2021
  • Maintenance and prevention of failure through anomaly detection of ICT infrastructure is becoming important. System monitoring data is multidimensional time series data. When we deal with multidimensional time series data, we have difficulty in considering both characteristics of multidimensional data and characteristics of time series data. When dealing with multidimensional data, correlation between variables should be considered. Existing methods such as probability and linear base, distance base, etc. are degraded due to limitations called the curse of dimensions. In addition, time series data is preprocessed by applying sliding window technique and time series decomposition for self-correlation analysis. These techniques are the cause of increasing the dimension of data, so it is necessary to supplement them. The anomaly detection field is an old research field, and statistical methods and regression analysis were used in the early days. Currently, there are active studies to apply machine learning and artificial neural network technology to this field. Statistically based methods are difficult to apply when data is non-homogeneous, and do not detect local outliers well. The regression analysis method compares the predictive value and the actual value after learning the regression formula based on the parametric statistics and it detects abnormality. Anomaly detection using regression analysis has the disadvantage that the performance is lowered when the model is not solid and the noise or outliers of the data are included. There is a restriction that learning data with noise or outliers should be used. The autoencoder using artificial neural networks is learned to output as similar as possible to input data. It has many advantages compared to existing probability and linear model, cluster analysis, and map learning. It can be applied to data that does not satisfy probability distribution or linear assumption. In addition, it is possible to learn non-mapping without label data for teaching. However, there is a limitation of local outlier identification of multidimensional data in anomaly detection, and there is a problem that the dimension of data is greatly increased due to the characteristics of time series data. In this study, we propose a CMAE (Conditional Multimodal Autoencoder) that enhances the performance of anomaly detection by considering local outliers and time series characteristics. First, we applied Multimodal Autoencoder (MAE) to improve the limitations of local outlier identification of multidimensional data. Multimodals are commonly used to learn different types of inputs, such as voice and image. The different modal shares the bottleneck effect of Autoencoder and it learns correlation. In addition, CAE (Conditional Autoencoder) was used to learn the characteristics of time series data effectively without increasing the dimension of data. In general, conditional input mainly uses category variables, but in this study, time was used as a condition to learn periodicity. The CMAE model proposed in this paper was verified by comparing with the Unimodal Autoencoder (UAE) and Multi-modal Autoencoder (MAE). The restoration performance of Autoencoder for 41 variables was confirmed in the proposed model and the comparison model. The restoration performance is different by variables, and the restoration is normally well operated because the loss value is small for Memory, Disk, and Network modals in all three Autoencoder models. The process modal did not show a significant difference in all three models, and the CPU modal showed excellent performance in CMAE. ROC curve was prepared for the evaluation of anomaly detection performance in the proposed model and the comparison model, and AUC, accuracy, precision, recall, and F1-score were compared. In all indicators, the performance was shown in the order of CMAE, MAE, and AE. Especially, the reproduction rate was 0.9828 for CMAE, which can be confirmed to detect almost most of the abnormalities. The accuracy of the model was also improved and 87.12%, and the F1-score was 0.8883, which is considered to be suitable for anomaly detection. In practical aspect, the proposed model has an additional advantage in addition to performance improvement. The use of techniques such as time series decomposition and sliding windows has the disadvantage of managing unnecessary procedures; and their dimensional increase can cause a decrease in the computational speed in inference.The proposed model has characteristics that are easy to apply to practical tasks such as inference speed and model management.

Design and Implementation of MongoDB-based Unstructured Log Processing System over Cloud Computing Environment (클라우드 환경에서 MongoDB 기반의 비정형 로그 처리 시스템 설계 및 구현)

  • Kim, Myoungjin;Han, Seungho;Cui, Yun;Lee, Hanku
    • Journal of Internet Computing and Services
    • /
    • v.14 no.6
    • /
    • pp.71-84
    • /
    • 2013
  • Log data, which record the multitude of information created when operating computer systems, are utilized in many processes, from carrying out computer system inspection and process optimization to providing customized user optimization. In this paper, we propose a MongoDB-based unstructured log processing system in a cloud environment for processing the massive amount of log data of banks. Most of the log data generated during banking operations come from handling a client's business. Therefore, in order to gather, store, categorize, and analyze the log data generated while processing the client's business, a separate log data processing system needs to be established. However, the realization of flexible storage expansion functions for processing a massive amount of unstructured log data and executing a considerable number of functions to categorize and analyze the stored unstructured log data is difficult in existing computer environments. Thus, in this study, we use cloud computing technology to realize a cloud-based log data processing system for processing unstructured log data that are difficult to process using the existing computing infrastructure's analysis tools and management system. The proposed system uses the IaaS (Infrastructure as a Service) cloud environment to provide a flexible expansion of computing resources and includes the ability to flexibly expand resources such as storage space and memory under conditions such as extended storage or rapid increase in log data. Moreover, to overcome the processing limits of the existing analysis tool when a real-time analysis of the aggregated unstructured log data is required, the proposed system includes a Hadoop-based analysis module for quick and reliable parallel-distributed processing of the massive amount of log data. Furthermore, because the HDFS (Hadoop Distributed File System) stores data by generating copies of the block units of the aggregated log data, the proposed system offers automatic restore functions for the system to continually operate after it recovers from a malfunction. Finally, by establishing a distributed database using the NoSQL-based Mongo DB, the proposed system provides methods of effectively processing unstructured log data. Relational databases such as the MySQL databases have complex schemas that are inappropriate for processing unstructured log data. Further, strict schemas like those of relational databases cannot expand nodes in the case wherein the stored data are distributed to various nodes when the amount of data rapidly increases. NoSQL does not provide the complex computations that relational databases may provide but can easily expand the database through node dispersion when the amount of data increases rapidly; it is a non-relational database with an appropriate structure for processing unstructured data. The data models of the NoSQL are usually classified as Key-Value, column-oriented, and document-oriented types. Of these, the representative document-oriented data model, MongoDB, which has a free schema structure, is used in the proposed system. MongoDB is introduced to the proposed system because it makes it easy to process unstructured log data through a flexible schema structure, facilitates flexible node expansion when the amount of data is rapidly increasing, and provides an Auto-Sharding function that automatically expands storage. The proposed system is composed of a log collector module, a log graph generator module, a MongoDB module, a Hadoop-based analysis module, and a MySQL module. When the log data generated over the entire client business process of each bank are sent to the cloud server, the log collector module collects and classifies data according to the type of log data and distributes it to the MongoDB module and the MySQL module. The log graph generator module generates the results of the log analysis of the MongoDB module, Hadoop-based analysis module, and the MySQL module per analysis time and type of the aggregated log data, and provides them to the user through a web interface. Log data that require a real-time log data analysis are stored in the MySQL module and provided real-time by the log graph generator module. The aggregated log data per unit time are stored in the MongoDB module and plotted in a graph according to the user's various analysis conditions. The aggregated log data in the MongoDB module are parallel-distributed and processed by the Hadoop-based analysis module. A comparative evaluation is carried out against a log data processing system that uses only MySQL for inserting log data and estimating query performance; this evaluation proves the proposed system's superiority. Moreover, an optimal chunk size is confirmed through the log data insert performance evaluation of MongoDB for various chunk sizes.

Water Quality and Ecosystem Health Assessments in Urban Stream Ecosystems (도심하천 생태계에서의 수질 및 생태건강성 평가)

  • Kim, Hyun-Mac;Lee, Jae-Hoon;An, Kwang-Guk
    • Korean Journal of Environmental Biology
    • /
    • v.26 no.4
    • /
    • pp.311-322
    • /
    • 2008
  • The objectives of the study were to analyze chemical water quality and physical habitat characteristics in the urban streams (Miho and Gap streams) along with evaluations of fish community structures and ecosystem health, throughout fish composition and guild analyses during 2006$\sim$2007. Concentrations of BOD and COD averaged 3.5 and 5.7 mg L$^{-1}$, in the urban streams, while TN and TP averaged 5.1 mg L$^{-1}$ and 274 ${\mu}g$ L$^{-1}$, indicating an eutrophic state. Especially, organic pollution and eutrophication were most intense in the downstream reach of both streams. Total number of fish was 34 species in the both streams, and the most abundant species was Zacco platypus (32$\sim$42% of the total). In both streams, the relative abundance of sensitive species was low (23%) and tolerant and omnivores were high (45%, 52%), indicating an typical tolerance and trophic guilds of urban streams in Korea. According to multi-metric models of Stream Ecosystem Health Assessments (SEHA), model values were 19 and 24 in Miho Stream and Gap Stream, respectively. Habitat analysis showed that QHEI (Qulatitative Habitat Evaluation Index) values were 123 and 135 in the two streams, respectively. The minimum values in the SEHA and QHEI were observed in the both downstreams, and this was mainly attributed to chemical pollutions, as shown in the water quality parameters. The model values of SEHA were strongly correlated with conductivity (r=-0.530, p=0.016), BOD (r=-0.578, p< 0.01), COD (r=-0.603, p< 0.01), and nutrients (TN, TP: r>0.40, p<0.05). This model applied in this study seems to be a useful tool, which could reflect the chemical water quality in the urban streams. Overall, this study suggests that consistent ecological monitoring is required in the urban streams for the conservations along with ecological restorations in the degradated downstrems.