• Title/Summary/Keyword: Data Sets

Development an Artificial Neural Network to Predict Infectious Bronchitis Virus Infection in Laying Hen Flocks (산란계의 전염성 기관지염을 예측하기 위한 인공신경망 모형의 개발)

  • Pak Son-Il;Kwon Hyuk-Moo
    • Journal of Veterinary Clinics
    • /
    • v.23 no.2
    • /
    • pp.105-110
    • /
    • 2006
  • A three-layer, feed-forward artificial neural network (ANN) with sixteen input neurons, three hidden neurons, and one output neuron was developed to identify the presence of infectious bronchitis (IB) infection as early as possible in laying hen flocks. Retrospective data from flocks enrolled in an IB surveillance program between May 2003 and November 2005 were used to build the ANN. The data set of 86 flocks was divided randomly into two sets: 77 cases for the training set and 9 cases for the testing set. The input factors were 16 epidemiological findings, including characteristics of the layer house, management practice, and flock size, and the output was either presence or absence of IB. The ANN was trained on the training set with a back-propagation algorithm, and the test set was used to determine the network's capability to predict outcomes that it had never seen. Diagnostic performance of the trained network was evaluated by constructing a receiver operating characteristic (ROC) curve and computing the area under the curve (AUC); the ROC curve was also used to determine the best positivity criterion for the model. Several ANNs with different structures were created. The best-fitted trained network, IBV_D1, correctly classified 73 of 77 cases (diagnostic accuracy 94.8%) in the training set. Sensitivity and specificity of the trained neural network were 95.5% (42/44, 95% CI, 84.5-99.4) and 93.9% (31/33, 95% CI, 79.8-99.3), respectively. For the testing set, the AUC of the ROC curve for the IBV_D1 network was 0.948 (SE=0.086, 95% CI 0.592-0.961) in recognizing IB infection status accurately. At a criterion of 0.7149, the diagnostic accuracy was highest at 88.9%, with the highest sensitivity of 100%. With these values of sensitivity and specificity, together with an assumed IB prevalence of 44%, the IBV_D1 network showed a PPV of 80% and an NPV of 100%. 
Based on these findings, the authors conclude that a neural network can be successfully applied to the development of a screening model for identifying IB infection in laying hen flocks.
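
The diagnostic figures quoted in this abstract can be reproduced with a few lines of arithmetic. The sketch below is illustrative only and is not the authors' code: the function names are hypothetical, the training-set counts (42/44 and 31/33) come from the abstract, and the test-set specificity of 0.80 is an assumed value, since the abstract reports only the sensitivity, the prevalence, and the resulting predictive values.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Return (sensitivity, specificity, accuracy) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy


def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) via Bayes' theorem for a given prevalence."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    true_neg = specificity * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    return true_pos / (true_pos + false_pos), true_neg / (true_neg + false_neg)


# Training-set counts as reported: 42 of 44 infected and 31 of 33 uninfected flocks.
sens, spec, acc = diagnostic_metrics(tp=42, fn=2, tn=31, fp=2)
print(f"sensitivity={sens:.3f} specificity={spec:.3f} accuracy={acc:.3f}")

# Test set: sensitivity 1.0 and prevalence 0.44 are from the abstract;
# specificity 0.80 is an ASSUMED value used only for illustration.
ppv, npv = predictive_values(sensitivity=1.0, specificity=0.80, prevalence=0.44)
print(f"PPV={ppv:.2f} NPV={npv:.2f}")
```

Under that assumed specificity, the predictive values round to the 80% and 100% given in the abstract.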

Semantic Process Retrieval with Similarity Algorithms (유사도 알고리즘을 활용한 시맨틱 프로세스 검색방안)

  • Lee, Hong-Joo;Klein, Mark
    • Asia pacific journal of information systems
    • /
    • v.18 no.1
    • /
    • pp.79-96
    • /
    • 2008
  • One of the roles of Semantic Web services is to execute dynamic intra-organizational services, including the integration and interoperation of business processes. Since different organizations design their processes differently, the retrieval of similar semantic business processes is necessary to support inter-organizational collaboration. Most approaches for finding services that have certain features and support certain business processes have relied on some type of logical reasoning and exact matching. This paper presents our approach of using imprecise matching to expand the results from an exact matching engine when querying the OWL (Web Ontology Language) MIT Process Handbook. The MIT Process Handbook is an electronic repository of best-practice business processes. The Handbook is intended to help people (1) redesign organizational processes, (2) invent new processes, and (3) share ideas about organizational practices. In order to use the MIT Process Handbook for process retrieval experiments, we had to export it into an OWL-based format. We model the Process Handbook meta-model in OWL and export the processes in the Handbook as instances of the meta-model. Next, we need to find a sizable number of queries and their corresponding correct answers in the Process Handbook. Many previous studies devised artificial datasets composed of randomly generated numbers without real meaning and used subjective ratings for correct answers and similarity values between processes. To generate a semantics-preserving test data set, we create 20 variants for each target process that are syntactically different but semantically equivalent, using mutation operators. These variants represent the correct answers for the target process. We devise diverse similarity algorithms based on values of process attributes and structures of business processes. 
We use simple similarity algorithms for text retrieval, such as TF-IDF and Levenshtein edit distance, to devise our approaches, and we use a tree edit distance measure because semantic processes appear to have a graph structure. We also design similarity algorithms that consider the similarity of process structure, such as part process, goal, and exception. Since we can identify relationships between a semantic process and its subcomponents, this information can be used to calculate similarities between processes. Dice's coefficient and Jaccard similarity measures are used to calculate the proportion of overlap between processes in diverse ways. We perform retrieval experiments to compare the performance of the devised similarity algorithms. We measure retrieval performance in terms of precision, recall, and F-measure, the harmonic mean of precision and recall. The tree edit distance shows the poorest performance on all measures. TF-IDF and the method combining the TF-IDF measure with Levenshtein edit distance perform better than the other devised methods. These two measures focus on the similarity between the names and descriptions of processes. In addition, we calculate a rank correlation coefficient, Kendall's tau-b, between the number of process mutations and the ranking of similarity values among the mutation sets. In this experiment, similarity measures based on process structure, such as Dice's, Jaccard, and derivatives of these measures, show greater coefficients than measures based on values of process attributes. However, the Lev-TFIDF-JaccardAll measure, which considers process structure and attribute values together, shows reasonably good performance in both experiments. For retrieving semantic processes, it therefore appears better to consider diverse aspects of process similarity, such as process structure and values of process attributes. 
We generate semantic process data and a dataset for retrieval experiments from the MIT Process Handbook repository. We suggest imprecise query algorithms that expand retrieval results from an exact matching engine such as SPARQL, and compare the retrieval performance of the similarity algorithms. As for limitations and future work, we need to perform experiments with other datasets from other domains. In addition, since there are many similarity values from diverse measures, we may find better ways to identify relevant processes by applying these values simultaneously.
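
The overlap and edit-distance measures named in this abstract have standard textbook definitions, sketched below. This is a minimal illustration and not the authors' implementation: the function names and the sample sub-process names are hypothetical, and the abstract does not say exactly how the measures were applied to Process Handbook entries.

```python
def jaccard(a, b):
    """Jaccard similarity: |A & B| / |A | B|."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def dice(a, b):
    """Dice's coefficient: 2 * |A & B| / (|A| + |B|)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))


def levenshtein(s, t):
    """Levenshtein edit distance between two strings (dynamic programming)."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, 1):
        curr = [i]
        for j, ct in enumerate(t, 1):
            cost = 0 if cs == ct else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


# Hypothetical sub-process names for a target process and one variant.
target = {"identify need", "select supplier", "place order", "receive goods"}
variant = {"identify need", "select supplier", "place order", "pay supplier"}

print(jaccard(target, variant))          # 3 shared of 5 distinct parts
print(dice(target, variant))             # 2 * 3 / (4 + 4)
print(levenshtein("kitten", "sitting"))
```

Set-based measures such as these compare which subcomponents two processes share, while the edit distance compares names or descriptions character by character.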

The Situation of Genetic Exchange in Duroc Breed and Impacts on Genetic Evaluation (국내 듀록의 종돈장간의 교류현황과 유전능력평가에 미치는 효과)

  • Seo, Jae-Ho;Shin, Ji-Seob;Noh, Jae-Kwang;Song, Chi-Eun;Do, Chang-Hee
    • Journal of Animal Science and Technology
    • /
    • v.53 no.5
    • /
    • pp.397-408
    • /
    • 2011
  • This study was carried out to identify the impact of genetic exchange on nation-wide genetic evaluation and to obtain basic material for developing strategies in the Swine Improvement Network Project (SINP). The data consisted of 235,511 pedigree records and 70,747 performance records for Duroc from 1987 to 2010, collected by the Korea Animal Improvement Association. Performance traits included backfat thickness at three points (shoulder, belly, waist), loin area, days to 90 kg, and average daily gain. Exchange of genetic resources across breeding farms was not high, and farms large enough to accommodate within-farm genetic evaluation were scarce. Three data sets (individual farm evaluation: I, two sub-group evaluation: S, and whole eight-farm evaluation: P) were used for genetic analysis. Genetic variances were larger in subordinate farms than in joiner farms for connectedness, and consequently the heritabilities were generally higher in subordinate farms than in joiner farms with I. The standard errors of heritability were small in the order of I, S and P. Estimated average inbreeding coefficients were 1.12%, 0.95% and 1.53% for the joiner and subordinate groups with S and the population with P, respectively. The estimated correlations of breeding values with I and P were lowest. The correlations of breeding values with I and P for the traits ranged from 0.22 to 0.45 for moved parent animals and from 0.24 to 0.72 for all animals. The results of the study suggest that nation-wide evaluation uses more pedigree information and improves accuracy. Furthermore, SINP for connectedness could help to improve the accuracy of evaluation.

Simulation of Local Climate and Crop Productivity in Andong after Multi-Purpose Dam Construction (임하 다목적댐 건설 후 주변지역 기후 및 작물생산력 변화)

  • 윤진일;황재문;이순구
    • KOREAN JOURNAL OF CROP SCIENCE
    • /
    • v.42 no.5
    • /
    • pp.579-596
    • /
    • 1997
  • A simulation study was carried out to delineate potential effects of lake-induced climate change on crop productivity around Lake Imha, which was formed after the construction of a multi-purpose dam in Andong, Korea. Twenty-seven cropping zones were identified within the 30 km by 25 km study area. Five automated weather stations were installed within the study area and operated for five years after the lake formation. A geostatistical method was used to calculate the monthly climatological normals of daily maximum and minimum temperature, solar radiation, and precipitation for each cropping zone before and after the dam construction. Daily weather data sets for 30 years were generated for each cropping zone from the monthly normals representing the "No lake" and "After lake" climatic scenarios, respectively. They were fed into crop models (ORYZA1 for rice, SOYGRO for soybean, CERES-maize for corn) to simulate the yield potential of each cropping zone. Calculated daily maximum temperature was higher after the dam construction for the period of October through March and lower for the remaining months except June and July. A decrease in daily minimum temperature was predicted for the period of April through August. Monthly total radiation was predicted to decrease after the lake formation in all months except February, June, and September, and the largest drop was found in winter. There was, however, no consistent pattern in precipitation change. According to the model calculations, the number of cropping zones showing a decreased yield potential, with a 10 to 17% yield drop, was 2 for soybean and 6 for corn out of 27 zones. Little change in yield potential was found at most cropping zones in the case of paddy rice, but interannual variation was predicted to increase after the lake formation.

A Study on the Transformation of Traditional Laboratories into Instructional Media Centers for Education of Library and Information Science (문헌정보학 실습실의 교수매체 센터화에 관한 연구)

  • Lee, Man-Soo
    • Journal of the Korean Society for Library and Information Science
    • /
    • v.34 no.1
    • /
    • pp.265-295
    • /
    • 2000
  • Education in library and information science must emphasize practical training in a laboratory setting, because it cultivates librarians as information experts who can carry out professional duties and services by applying traditional theory to the practical business of libraries and information. This dissertation suggests a new paradigm: an instructional media center as an advanced laboratory that can faithfully run the library and information science curriculum for cultivating librarians and information experts who can meet the needs of the 21st-century information society. To this end, I considered the opinions of professors and librarians after investigating and analyzing the facilities and furnishings of laboratory rooms and the teaching and learning materials of departments of library and information science at 32 universities. The findings can be summarized as follows: 1) An instructional media center for library and information science education comprises laboratory rooms for practical classification and cataloging classes; laboratory rooms for film media, which can make use of advanced media, listening tools, and practical materials; and information management laboratory rooms, in which students can experience various information research methods through the Internet, cultivate the ability to apply information, and be taught the computer-related parts of the library and information science curriculum. 2) Arrangement plans linked to laboratory rooms for classification and cataloging, one for film media, and one for information proceedings are as follows: , , and . 3) The size of each room is $162\,\mathrm{m}^2$ (49.1 pyeong); the number of persons to be admitted is about 40 to 50; and each room has one media expert and one assistant as operating managers with exclusive responsibility. 
4) Instructional and learning materials that must be provided as instructional media for library and information science include computers and related peripheral devices, listening materials, supplies for ordering books, teaching aids comprising various equipment and tools, textbooks for practice, books on classification and cataloging for practice, and textbooks and reference books related to practical subjects. 5) Instructional media centers for library and information science require, for practice, general furnishings such as bookshelves and various material storage boxes.

A STUDY ON TEMPERATURE VARIATION OF THE UPPER THERMOSPHERE IN THE HIGH LATITUDE THROUGH THE ANALYSIS OF 6300 $\AA$ AIRGLOW DATA (6300 $\AA$ 대기광 자료 분석을 통한 고위도 열권 상부에서의 온도 변화)

  • 정종균;김용하;원영인;이방용
    • Journal of Astronomy and Space Sciences
    • /
    • v.14 no.1
    • /
    • pp.94-108
    • /
    • 1997
  • The temperature of the upper thermosphere generally varies with solar activity and, at high latitudes, largely with geomagnetic activity. The data analyzed in this study were acquired at two ground stations in Greenland: Thule Air Base ($76.5^\circ$ N, $68.4^\circ$ W, $\Lambda = 86^\circ$) and Søndre Strømfjord ($67.0^\circ$ N, $50.9^\circ$ W, $\Lambda = 74^\circ$). Both stations are located at high latitude not only geographically but also geomagnetically. The terrestrial nightglow at 6300 $\AA$ from atomic oxygen was observed with two ground-based Fabry-Perot interferometers, during 1986~1991 at Thule Air Base and 1986~1994 at Søndre Strømfjord. Some features noted in this study are as follows: (1) The correlation between solar activity and the measured thermospheric temperature is highest for $3 \leq Kp \leq 4$ at Thule, and increases with geomagnetic activity at Søndre Strømfjord. (2) The measured temperatures at Thule are generally higher than those at Søndre Strømfjord, but the latter show a steeper slope with solar activity. (3) The harmonic analysis shows that the diurnal variation (24 hrs) is the main feature of the daily temperature variation, with a temperature peak at about 13-14 LT (LT=UT-4). However, the semi-diurnal variation is evident during periods of weak solar activity. (4) Generally, the temperatures predicted by both the MSIS86 and VSH models are lower than the measured temperatures, and this discrepancy grows as solar activity increases. Therefore, we urge modelers to develop a new thermospheric model utilizing broader sets of measurements, especially for high solar activity.
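
A harmonic analysis of the kind mentioned in item (3) can be sketched as a projection of hourly values onto 24-hour and 12-hour sinusoids. The example below is an assumption-laden illustration and not the authors' procedure: the function name is hypothetical, the temperatures are synthetic, and the samples are taken to be evenly spaced over exactly one day, which real nightglow observations would not be.

```python
import math


def harmonic_component(values, k, period_hours=24.0):
    """Amplitude and peak time (hours) of the k-th harmonic.

    Assumes `values` are evenly spaced samples covering exactly one period.
    """
    n = len(values)
    omega = 2.0 * math.pi * k / period_hours  # angular frequency of harmonic k
    step = period_hours / n                   # sampling interval in hours
    a = 2.0 / n * sum(v * math.cos(omega * i * step) for i, v in enumerate(values))
    b = 2.0 / n * sum(v * math.sin(omega * i * step) for i, v in enumerate(values))
    amplitude = math.hypot(a, b)
    # a*cos(wt) + b*sin(wt) = A*cos(w*(t - t_peak)), with t_peak = atan2(b, a) / w
    peak_hour = (math.atan2(b, a) / omega) % (period_hours / k)
    return amplitude, peak_hour


# Synthetic hourly temperatures in kelvin: hypothetical values, not station data.
temps = [
    1000.0
    + 100.0 * math.cos(2 * math.pi * (t - 13.5) / 24)  # diurnal term peaking at 13.5 LT
    + 20.0 * math.cos(2 * math.pi * (t - 3.0) / 12)    # semi-diurnal term
    for t in range(24)
]

diurnal_amp, diurnal_peak = harmonic_component(temps, k=1)
semi_amp, semi_peak = harmonic_component(temps, k=2)
print(f"diurnal: amplitude {diurnal_amp:.1f} K, peak at {diurnal_peak:.1f} LT")
print(f"semi-diurnal: amplitude {semi_amp:.1f} K, first peak at {semi_peak:.1f} LT")
```

Comparing the two amplitudes indicates which component dominates the daily variation. For unevenly sampled data, a least-squares fit of the same sinusoids would be the more appropriate choice.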

Recognizing the Direction of Action using Generalized 4D Features (일반화된 4차원 특징을 이용한 행동 방향 인식)

  • Kim, Sun-Jung;Kim, Soo-Wan;Choi, Jin-Young
    • Journal of the Korean Institute of Intelligent Systems
    • /
    • v.24 no.5
    • /
    • pp.518-528
    • /
    • 2014
  • In this paper, we propose a method to recognize the direction of a human action by developing 4D space-time (4D-ST, [x,y,z,t]) features. For this, we propose 4D space-time interest points (4D-STIPs, [x,y,z,t]), which are extracted using 3D space (3D-S, [x,y,z]) volumes reconstructed from images of a finite number of different views. Since the proposed features are constructed using volumetric information, the features for an arbitrary 2D space (2D-S, [x,y]) viewpoint can be generated by projecting the 3D-S volumes and 4D-STIPs onto the corresponding image planes in the training step. We can recognize the directions of actors in the test video because our training sets, which are projections of 3D-S volumes and 4D-STIPs onto various image planes, contain the direction information. The process for recognizing action direction is divided into two steps: first we recognize the class of the action, and then we recognize the action direction using the direction information. For both action recognition and action direction recognition, we use the projected 3D-S volumes and 4D-STIPs to construct motion history images (MHIs) and non-motion history images (NMHIs), which encode the moving and non-moving parts of an action, respectively. For action recognition, features are trained by support vector data description (SVDD) according to the action class and recognized by support vector domain density description (SVDDD). For action direction recognition after the action has been recognized, each action is trained using SVDD according to the direction class and then recognized by SVDDD. In experiments, we train the models using 3D-S volumes from the INRIA Xmas Motion Acquisition Sequences (IXMAS) dataset and recognize action direction on a new SNU dataset constructed for evaluating action direction recognition.
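
As background for the motion history images mentioned in this abstract, the sketch below shows the commonly used MHI update rule, in which pixels that moved in the current frame are set to a maximum value and all other pixels decay by one. This is an assumption about the general technique, not the authors' code: the abstract does not state the exact update they used or how NMHIs are built, and `update_mhi` and `tau` are hypothetical names.

```python
def update_mhi(mhi, motion_mask, tau):
    """One MHI update step.

    Pixels flagged as moving are set to `tau`; all others decay by 1, down to 0.
    `mhi` and `motion_mask` are equally sized lists of rows.
    """
    return [
        [tau if moving else max(0, h - 1) for h, moving in zip(h_row, m_row)]
        for h_row, m_row in zip(mhi, motion_mask)
    ]


# Toy 1x3 image in which a single moving pixel travels left to right.
masks = [
    [[1, 0, 0]],
    [[0, 1, 0]],
    [[0, 0, 1]],
]
mhi = [[0, 0, 0]]
for mask in masks:
    mhi = update_mhi(mhi, mask, tau=3)

print(mhi)  # brighter values mark more recent motion
```

The intensity gradient that results is what lets a single image encode where motion happened and in what order.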

A Methodology for Automatic Multi-Categorization of Single-Categorized Documents (단일 카테고리 문서의 다중 카테고리 자동확장 방법론)

  • Hong, Jin-Sung;Kim, Namgyu;Lee, Sangwon
    • Journal of Intelligence and Information Systems
    • /
    • v.20 no.3
    • /
    • pp.77-92
    • /
    • 2014
  • Recently, numerous documents containing unstructured data and text have been created due to the rapid increase in the use of social media and the Internet. Each document is usually assigned a specific category for the convenience of users. In the past, categorization was performed manually. However, manual categorization cannot guarantee accuracy, and it also requires a large amount of time and incurs huge costs. Many studies have been conducted on the automatic creation of categories to overcome the limitations of manual categorization. Unfortunately, most of these methods cannot be applied to complex documents with multiple topics because they assume that one document can be assigned to one category only. To overcome this limitation, some studies have attempted to categorize each document into multiple categories. However, these are also limited in that their learning process requires training on a multi-categorized document set. These methods therefore cannot be applied to the multi-categorization of most documents unless multi-categorized training sets are provided. To overcome the requirement of a multi-categorized training set imposed by traditional multi-categorization algorithms, we propose a new methodology that can extend the category of a single-categorized document to multiple categories by analyzing relationships among categories, topics, and documents. First, we find the relationship between documents and topics by using the result of topic analysis for single-categorized documents. Second, we construct a correspondence table between topics and categories by investigating the relationship between them. Finally, we calculate the matching scores of each document against multiple categories. 
The results imply that a document can be classified into a certain category if and only if the matching score is higher than the predefined threshold. For example, we can classify a certain document into three categories whose matching scores exceed the predefined threshold. The main contribution of our study is that our methodology can improve the applicability of traditional multi-category classifiers by generating multi-categorized documents from single-categorized documents. Additionally, we propose a module for verifying the accuracy of the proposed methodology. For performance evaluation, we performed intensive experiments with news articles. News articles are clearly categorized by theme, and they contain less vulgar language and slang than other typical text documents. We collected news articles from July 2012 to June 2013. The number of articles varies widely across categories. This is because readers have different levels of interest in each category. The variation is also attributed to differences in the frequency of events in each category. To minimize the distortion of the results caused by the differing numbers of articles per category, we extracted 3,000 articles from each of the eight categories. Therefore, the total number of articles used in our experiments was 24,000. The eight categories were "IT Science," "Economy," "Society," "Life and Culture," "World," "Sports," "Entertainment," and "Politics." Using the news articles that we collected, we calculated the document/category correspondence scores from the topic/category and document/topic correspondence scores. The document/category correspondence score indicates the degree to which each document corresponds to a certain category. As a result, we could present two additional categories for each of the 23,089 documents. 
Precision, recall, and F-score were 0.605, 0.629, and 0.617, respectively, when only the top 1 predicted category was evaluated, whereas they were 0.838, 0.290, and 0.431 when the top 1-3 predicted categories were considered. Interestingly, precision, recall, and F-score varied widely across the eight categories.
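
The scoring pipeline described in this abstract can be sketched as follows. This is a hedged illustration and not the authors' implementation: combining document/topic and topic/category scores by a sum of products is an assumption, since the abstract does not give the formula, and all function names, weights, and the threshold are hypothetical. Only the precision and recall values passed to `f_score` come from the abstract.

```python
def category_scores(doc_topics, topic_categories):
    """Combine document/topic and topic/category weights into document/category scores.

    ASSUMPTION: scores are combined as a sum of products over topics.
    """
    scores = {}
    for topic, weight in doc_topics.items():
        for category, cat_weight in topic_categories.get(topic, {}).items():
            scores[category] = scores.get(category, 0.0) + weight * cat_weight
    return scores


def assign_categories(scores, threshold):
    """Return categories whose score exceeds the threshold, highest score first."""
    chosen = [c for c, s in scores.items() if s > threshold]
    return sorted(chosen, key=lambda c: scores[c], reverse=True)


def f_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical weights for one document and two topics.
doc_topics = {"t1": 0.6, "t2": 0.4}
topic_categories = {
    "t1": {"Economy": 0.7, "Politics": 0.3},
    "t2": {"IT Science": 0.9, "Economy": 0.1},
}

scores = category_scores(doc_topics, topic_categories)
labels = assign_categories(scores, threshold=0.3)
print(labels)

# Precision and recall as reported in the abstract.
print(round(f_score(0.605, 0.629), 3))
print(round(f_score(0.838, 0.290), 3))
```

Raising the threshold yields fewer but more confident categories per document, which mirrors the precision and recall trade-off reported for the top 1 versus top 1-3 evaluations.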

Estimation of Ground-level PM10 and PM2.5 Concentrations Using Boosting-based Machine Learning from Satellite and Numerical Weather Prediction Data (부스팅 기반 기계학습기법을 이용한 지상 미세먼지 농도 산출)

  • Park, Seohui;Kim, Miae;Im, Jungho
    • Korean Journal of Remote Sensing
    • /
    • v.37 no.2
    • /
    • pp.321-335
    • /
    • 2021
  • Particulate matter (PM10 and PM2.5, with diameters less than 10 and 2.5 µm, respectively) can be absorbed by the human body and adversely affect human health. Although most PM monitoring is based on ground-based observations, these are limited to point-based measurement sites, which leads to uncertainty in PM estimation for regions without observation sites. This spatial limitation can be overcome by using satellite data. In this study, we developed a machine learning-based retrieval algorithm for ground-level PM10 and PM2.5 concentrations using aerosol parameters from the Geostationary Ocean Color Imager (GOCI) satellite and various meteorological parameters from a numerical weather prediction model for January to December 2019. Gradient Boosted Regression Trees (GBRT) and Light Gradient Boosting Machine (LightGBM) were used to estimate PM concentrations. Model performance was examined for two feature sets: all input parameters (Feature set 1) and a subset of input parameters without meteorological and land-cover parameters (Feature set 2). Both models showed higher accuracy (about 10% higher in $R^2$) with Feature set 1 than with Feature set 2. The GBRT model using Feature set 1 was chosen as the final model for further analysis (PM10: $R^2$ = 0.82, nRMSE = 34.9%; PM2.5: $R^2$ = 0.75, nRMSE = 35.6%). The spatial distribution of the seasonal and annual-averaged PM concentrations was similar to in-situ observations, except for the northeastern part of China with bright surface reflectance. Their spatial distribution and seasonal changes matched in-situ measurements well.
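
The two accuracy measures quoted in this abstract can be computed as in the sketch below. The definitions are assumptions, since the abstract does not state them: R2 is taken here as the coefficient of determination (some studies report the squared correlation instead), and nRMSE as RMSE divided by the mean observed value. The function names and sample values are hypothetical.

```python
import math


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def nrmse_percent(observed, predicted):
    """RMSE normalized by the mean observation, in percent (ASSUMED definition)."""
    mse = sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)
    mean_obs = sum(observed) / len(observed)
    return 100.0 * math.sqrt(mse) / mean_obs


# Hypothetical PM concentrations in micrograms per cubic meter, for illustration only.
observed = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 37.0]

r2 = r_squared(observed, predicted)
nrmse = nrmse_percent(observed, predicted)
print(f"R2 = {r2:.3f}, nRMSE = {nrmse:.1f} %")
```

Normalizing the RMSE makes errors comparable between PM10 and PM2.5, whose typical concentrations differ.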

Analysis of the Characteristics of Water Quality Difference Occurring between High Tide and Low Tide in Masan Bay (만조와 간조시 마산만 수질의 농도차 발생 특성의 분석)

  • Yoo, Youngjin;Kim, Sung Jae
    • Journal of Wetlands Research
    • /
    • v.21 no.2
    • /
    • pp.102-113
    • /
    • 2019
  • Slack-tide sampling was carried out at 6 stations at high and low tide over a tidal cycle during the spring tides of early summer (June) and summer (July, August) of 2016 to determine how water quality differs with the tide in Masan Bay, Korea. The mixing regime of all the water quality components investigated was well explained by their correlation with SAL. In early summer and summer, TURB, DSi, and NNN, which mainly flow into the bay from the streams, showed conservative mixing, whereas SS, COD, AMN, and $H_2S$, which mainly indicate internal sink and source materials, showed non-conservative mixing. Conservative mixing showed a good linear relationship between water quality at high and low tide, whereas non-conservative mixing showed patterns of variation that differed from one another. Factor analysis performed on the data sets of concentration differences between high and low tide helped to identify their principal latent variables. In early summer, multiple effects (tidal action, natural influx, internal sinks and sources, etc.) acted in combination, so that the differences were distributed evenly across four factors (VF1~4), because allochthonous inputs were few during the low-water season. In summer, by contrast, the parameters showing a large concentration difference at ST-1, which is affected by stream water, were concentrated in one factor (VF1) and clearly distinguished from the parameters affected by the internal sinks and sources. In fact, no estuary (bay) always maintains steady-state flow conditions. The mixing regime of an estuary may change at any time due to changes in flushing time, and changes in end-member conditions due to internal sinks and sources make concentration differences inevitable. Therefore, when investigating the water quality of an estuary, a sampling method that takes the tide into account is necessary to obtain average water quality data.