• Title/Summary/Keyword: DISPERSION

Design and Implementation of MongoDB-based Unstructured Log Processing System over Cloud Computing Environment (클라우드 환경에서 MongoDB 기반의 비정형 로그 처리 시스템 설계 및 구현)

  • Kim, Myoungjin;Han, Seungho;Cui, Yun;Lee, Hanku
    • Journal of Internet Computing and Services / v.14 no.6 / pp.71-84 / 2013
  • Log data, which record the multitude of information created when operating computer systems, are utilized in many processes, from carrying out computer system inspection and process optimization to providing customized user optimization. In this paper, we propose a MongoDB-based unstructured log processing system in a cloud environment for processing the massive amount of log data of banks. Most of the log data generated during banking operations come from handling a client's business. Therefore, in order to gather, store, categorize, and analyze the log data generated while processing the client's business, a separate log data processing system needs to be established. However, the realization of flexible storage expansion functions for processing a massive amount of unstructured log data and executing a considerable number of functions to categorize and analyze the stored unstructured log data is difficult in existing computer environments. Thus, in this study, we use cloud computing technology to realize a cloud-based log data processing system for processing unstructured log data that are difficult to process using the existing computing infrastructure's analysis tools and management system. The proposed system uses the IaaS (Infrastructure as a Service) cloud environment to provide a flexible expansion of computing resources and includes the ability to flexibly expand resources such as storage space and memory under conditions such as extended storage or rapid increase in log data. Moreover, to overcome the processing limits of the existing analysis tool when a real-time analysis of the aggregated unstructured log data is required, the proposed system includes a Hadoop-based analysis module for quick and reliable parallel-distributed processing of the massive amount of log data. 
Furthermore, because HDFS (Hadoop Distributed File System) stores data by replicating the blocks of the aggregated log data, the proposed system offers automatic restore functions that allow it to continue operating after it recovers from a malfunction. Finally, by establishing a distributed database using the NoSQL-based MongoDB, the proposed system provides methods of effectively processing unstructured log data. Relational databases such as MySQL have complex schemas that are inappropriate for processing unstructured log data. Further, the strict schemas of relational databases do not readily allow nodes to be added and the stored data to be distributed across them when the amount of data increases rapidly. NoSQL does not provide the complex computations that relational databases may provide, but it can easily expand the database through node dispersion when the amount of data increases rapidly; it is a non-relational database with an appropriate structure for processing unstructured data. NoSQL data models are usually classified as key-value, column-oriented, and document-oriented types. Of these, MongoDB, a representative document-oriented database with a schema-free structure, is used in the proposed system. MongoDB is adopted because its flexible schema structure makes it easy to process unstructured log data, it facilitates flexible node expansion when the amount of data increases rapidly, and it provides an Auto-Sharding function that automatically expands storage. The proposed system is composed of a log collector module, a log graph generator module, a MongoDB module, a Hadoop-based analysis module, and a MySQL module. 
When the log data generated over the entire client business process of each bank are sent to the cloud server, the log collector module collects and classifies the data according to the type of log data and distributes them to the MongoDB module and the MySQL module. The log graph generator module presents the log analysis results of the MongoDB module, the Hadoop-based analysis module, and the MySQL module per analysis time and type of the aggregated log data, and provides them to the user through a web interface. Log data that require real-time analysis are stored in the MySQL module and provided in real time by the log graph generator module. The log data aggregated per unit time are stored in the MongoDB module and plotted in a graph according to the user's various analysis conditions. The aggregated log data in the MongoDB module are processed in a parallel-distributed manner by the Hadoop-based analysis module. A comparative evaluation of log data insertion and query performance is carried out against a log data processing system that uses only MySQL; this evaluation demonstrates the proposed system's superiority. Moreover, an optimal chunk size is confirmed through the log data insert performance evaluation of MongoDB for various chunk sizes.
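
The routing step performed by the log collector module can be pictured with a minimal sketch. The abstract does not give the actual classification rules or storage calls, so the log type names, the `LogCollector` class, and the in-memory stand-ins for the MySQL and MongoDB modules below are all assumptions made for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass, field

# Hypothetical log types that need real-time analysis; the abstract does not list them.
REALTIME_TYPES = {"transaction_error", "login_failure"}


@dataclass
class LogCollector:
    """Classifies each log record by type and routes it to one of two stores."""

    # Stand-in for the MySQL module (real-time logs).
    mysql_store: list = field(default_factory=list)
    # Stand-in for the MongoDB module: one schema-free "collection" per log type.
    mongo_store: dict = field(default_factory=lambda: defaultdict(list))

    def collect(self, record: dict) -> str:
        log_type = record.get("type", "unknown")
        if log_type in REALTIME_TYPES:
            self.mysql_store.append(record)
            return "mysql"
        # Unstructured records keep whatever fields they arrived with.
        self.mongo_store[log_type].append(record)
        return "mongodb"


collector = LogCollector()
routes = [
    collector.collect(record)
    for record in [
        {"type": "login_failure", "user": "u1"},
        {"type": "page_view", "path": "/accounts", "agent": "mobile"},
        {"type": "page_view", "path": "/loans"},
    ]
]
print(routes)
```

The two `page_view` records carry different fields, which is the situation a schema-free document store is meant to accommodate; a real deployment would replace the in-memory stores with database clients.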

Studies on the Occurrence of Upland Weeds and the Competition with Soybeans (전지(田地)와 콩밭에 있어서 잡초(雜草)의 발생(發生) 및 경합(競合)에 관한 조사(調査) 연구(硏究))

  • Lee, Key-Hong;Lee, Eun-Woong
    • Korean Journal of Weed Science / v.2 no.2 / pp.75-113 / 1982
  • Studies were carried out 1) to define the shape, size, and number of sampling quadrats for weed experiments, 2) to characterize the growth and community of major summer weeds under upland conditions, and 3) to investigate the factors influencing competition between weeds and soybeans under weed-free and weedy conditions in early- and late-season cultures. No significant difference in the sampling efficiency of weeds was noted among the different quadrat shapes (regular, rectangular, band, and circular). The results also suggested that the minimum quadrat size was 0.25 $m^2$ and the minimum number of replications was two per plot. About 10 species were dominant in the experimental field, and the total number of weeds ranged from 70 to 1,600 plants per $m^2$. Among the weeds, Digitaria sanguinalis and Portulaca oleracea were the most dominant species. Growth amount and reproductive capability were also measured for each weed species. Five different weed communities were identified in the field. The degree of dispersion of each weed species and the association among weeds were investigated. Intra-specific (within soybeans) and inter-specific (between soybeans and weeds) competition were studied in early- and late-season cultures of soybeans. The average yield of soybeans per plant decreased significantly in both season cultures due to intra-specific competition as the planting density of soybeans increased. On the other hand, the average yield of soybeans per 10a increased in proportion to planting density, and the rate of increase was greater under weedy than under weed-free conditions. Most of the agronomic characteristics of soybeans were affected by weeds, and the effect was greater in sparse planting than in dense planting and in early-season than in late-season culture. 
Digitaria sanguinalis was the most competitive with soybeans in the early season, and both Digitaria sanguinalis and Portulaca oleracea primarily affected the growth of soybeans in the late season, with about the same competitiveness. The occurrence of weeds was significantly decreased in the early season and slightly decreased in the late season by dense planting of soybeans. The total growth amount of weeds was also considerably decreased by increasing soybean planting density in both early- and late-season cultures. The occurrence and growth amount of Digitaria sanguinalis, which was the most dominant in both seasons, decreased significantly as the planting density of soybeans increased. On the other hand, the occurrence of Portulaca oleracea, which was dominant only in the late-season culture, did not respond significantly to the planting density of soybeans.
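
The abstract does not say how the degree of dispersion was measured. As an illustration only, one common choice for quadrat counts is the variance-to-mean ratio, where values well above 1 suggest a clumped distribution and values below 1 a more even one. The function name and the counts below are invented for this sketch and are not data from the study.

```python
from statistics import mean, variance


def dispersion_index(counts):
    """Variance-to-mean ratio of per-quadrat plant counts (sample variance)."""
    m = mean(counts)
    if m == 0:
        raise ValueError("mean count is zero; the index is undefined")
    return variance(counts) / m


# Hypothetical counts per 0.25 m^2 quadrat, invented for illustration only.
clumped = [0, 0, 12, 1, 0, 15, 0, 2]
even = [4, 3, 4, 4, 3, 4, 3, 4]

print("clumped:", round(dispersion_index(clumped), 2))
print("even:", round(dispersion_index(even), 2))
```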

An Analytical Study on the Stem-Growth by the Principal Component and Canonical Correlation Analyses (주성분(主成分) 및 정준상관분석(正準相關分析)에 의(依)한 수간성장(樹幹成長) 해석(解析)에 관(關)하여)

  • Lee, Kwang Nam
    • Journal of Korean Society of Forest Science / v.70 no.1 / pp.7-16 / 1985
  • To grasp the canonical correlations among the growth factors of the stem, their backgrounds, and the characteristics of the stem, synthetical dispersion analysis, principal component analysis, and canonical correlation analysis were applied, as the optimum methods, to Larix leptolepis. The results are as follows: 1) All factors (height ($x_1$), clear height ($x_2$), form height ($x_3$), breast height diameter (D.B.H.: $x_4$), mid diameter ($x_5$), crown diameter ($x_6$) and stem volume ($x_7$)) were correlated with one another to a higher or lower degree, except the normal form factor ($x_8$). In particular, stem volume was highly correlated with D.B.H., height, and mid diameter (cf. table 1). 2) (1) The canonical correlation coefficient and canonical variates between stem volume and the composite variate of the height growth factors ($x_1$, $x_2$ and $x_3$) are ${\gamma}_{u1,v1}=0.82980^{**}$, $\begin{cases}u_1=1.00000x_7\\v_1=1.08323x_1-0.04299x_2-0.07080x_3\end{cases}$. (2) Those between stem volume and the composite variate of the diameter growth factors ($x_4$, $x_5$ and $x_6$) are ${\gamma}_{u1,v1}=0.98198^{**}$, $\begin{cases}u_1=1.00000x_7\\v_1=0.86433x_4+0.11996x_5+0.02917x_6\end{cases}$. (3) Those between stem volume and the composite variate of all six height and diameter factors are ${\gamma}_{u1,v1}=0.98700^{**}$, $\begin{cases}u_1=1.00000x_7\\v_1=0.12948x_1+0.00291x_2+0.03076x_3+0.76707x_4+0.09107x_5+0.02576x_6\end{cases}$. All three cases showed a high canonical correlation. Height in case (1), D.B.H. in case (2), and D.B.H. and height in case (3) make the dominant contribution to the canonical correlation. The synthetical characteristics of each kind of growth are largely determined by these factors. In case (3) in particular, the influence of D.B.H. is the most significant among the six factors (cf. table 2). 
3) The canonical correlation coefficient and canonical variates between the composite variate of the height growth factors and that of the diameter factors are ${\gamma}_{u1,v1}=0.78556^{**}$, $\begin{cases}u_1=1.20569x_1-0.04444x_2-0.21696x_3\\v_1=1.09571x_4-0.14076x_5+0.05285x_6\end{cases}$. As shown above, only height and D.B.H. contributed considerably to the canonical correlation. Thus, the synthetical characteristics of height growth are determined by height and those of growth in thickness by D.B.H. (cf. table 2). 4) The synthetical characteristics (1st to 3rd principal components) derived from the eight growth factors of the stem, targeting an accumulated proportion of 85%, are as follows: 1st principal component ($z_1$): $z_1=0.40192x_1+0.23693x_2+0.37047x_3+0.41745x_4+0.41629x_5+0.33454x_6+0.42798x_7+0.04923x_8$, 2nd principal component ($z_2$): $z_2=-0.09306x_1-0.34707x_2+0.08372x_3-0.03239x_4+0.11152x_5+0.00012x_6+0.02407x_7+0.92185x_8$, 3rd principal component ($z_3$): $z_3=0.19832x_1+0.68210x_2+0.35824x_3-0.22522x_4-0.20876x_5-0.42373x_6-0.15055x_7+0.26562x_8$. The first principal component ($z_1$), a "size factor", showed high information absorption power with a proportion of 63.26%, and its principal component score is determined by stem volume, D.B.H., mid diameter, and height, which have considerably high factor loadings. The second principal component ($z_2$) is a "shape factor" that indicates the cubic similarity of the stem, and its score is formed under the dominant influence of the normal form factor. The third principal component ($z_3$) is a "shape factor" that shows the degree of thickness and length of the stem. These three principal components have satisfactory information absorption power, with an accumulated percentage of variance of 88.36% (cf. table 3). 
5) Thus the principal component and canonical correlation analyses could be applied to the field of forest measurement, judgement of site qualities, management diagnoses for the forest management and the forest products industries, and the other fields which require the assessment of synthetical characteristics.
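
The principal component coefficients quoted in item 4) can be sanity-checked numerically. The sketch below assumes the coefficient of $x_7$ in $z_1$ is read as $+0.42798$, and it assumes the usual convention that principal component loading vectors have unit length and are mutually orthogonal. The variable names are chosen for illustration.

```python
from itertools import combinations
from math import sqrt

# Coefficients on x_1..x_8 as printed in the abstract
# (x_7 in z1 is taken as +0.42798, an assumption about the garbled term).
loadings = {
    "z1": [0.40192, 0.23693, 0.37047, 0.41745, 0.41629, 0.33454, 0.42798, 0.04923],
    "z2": [-0.09306, -0.34707, 0.08372, -0.03239, 0.11152, 0.00012, 0.02407, 0.92185],
    "z3": [0.19832, 0.68210, 0.35824, -0.22522, -0.20876, -0.42373, -0.15055, 0.26562],
}


def dot(a, b):
    """Inner product of two equal-length coefficient vectors."""
    return sum(x * y for x, y in zip(a, b))


# Length of each loading vector; close to 1 if the vectors are normalized.
norms = {name: sqrt(dot(v, v)) for name, v in loadings.items()}

# Pairwise inner products; close to 0 if the components are orthogonal.
cross = {(a, b): dot(loadings[a], loadings[b]) for a, b in combinations(loadings, 2)}

for name, n in norms.items():
    print(f"|{name}| = {n:.4f}")
for (a, b), c in cross.items():
    print(f"{a} . {b} = {c:+.5f}")
```

With these values the three vectors come out normalized and orthogonal to within rounding, which supports the positive reading of the $x_7$ term in $z_1$.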
