• Title/Summary/Keyword: data validation


An Efficient RDF Query Validation for Access Authorization in Subsumption Inference (포함관계 추론에서 접근 권한에 대한 효율적 RDF 질의 유효성 검증)

  • Kim, Jae-Hoon;Park, Seog
    • Journal of KIISE:Databases
    • /
    • v.36 no.6
    • /
    • pp.422-433
    • /
    • 2009
  • As an effort to secure the Semantic Web, this paper introduces an RDF access authorization model based on an ontology hierarchy and RDF triple patterns. We then apply the authorization model to RDF query validation against approved access authorizations. A submitted SPARQL or RQL query, which contains RDF triple patterns, can be denied or granted according to the corresponding access authorizations, each of which also has an RDF triple pattern. To perform the query validation process efficiently, we first analyze the primary authorization conflict conditions under RDF subsumption inference, and then introduce an efficient query validation algorithm using these conflict conditions and a Dewey graph labeling technique. Through experiments, we also show that the proposed validation algorithm provides reasonable validation times and scales well as data and authorizations increase.
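The conflict checks above hinge on deciding subsumption between ontology concepts quickly, which Dewey-style labels reduce to a prefix test. A minimal sketch of that idea (the labels, concept names, and grant/deny rule below are illustrative assumptions, not the paper's actual algorithm):

```python
# Hypothetical sketch: Dewey labels encode the path from the ontology root,
# so subsumption (ancestor-of) becomes a simple prefix comparison.

def is_ancestor(label_a, label_b):
    """True if the concept labeled label_a subsumes (is an ancestor of) label_b."""
    return len(label_a) < len(label_b) and label_b[: len(label_a)] == label_a

# Toy hierarchy: Agent -> Person -> Student, labeled by root path.
AGENT, PERSON, STUDENT = (1,), (1, 1), (1, 1, 1)

def query_granted(query_label, grants, denials):
    """Grant access to a triple pattern iff some grant covers its concept
    and no denial covers it (denial-takes-precedence, as an assumption)."""
    covered = lambda auth: auth == query_label or is_ancestor(auth, query_label)
    return any(covered(g) for g in grants) and not any(covered(d) for d in denials)
```

With this encoding, a denial on `AGENT` overrides a grant on `PERSON` for a query touching `STUDENT`, since both cover it by prefix.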

Digital Forensics: Review of Issues in Scientific Validation of Digital Evidence

  • Arshad, Humaira;Jantan, Aman Bin;Abiodun, Oludare Isaac
    • Journal of Information Processing Systems
    • /
    • v.14 no.2
    • /
    • pp.346-376
    • /
    • 2018
  • Digital forensics is a vital part of almost every criminal investigation, given the amount of information available and the opportunities electronic data offers to investigate and evidence a crime. However, in criminal justice proceedings, these electronic pieces of evidence are often regarded with the utmost suspicion and uncertainty, although on occasion justifiably so. At present, the use of scientifically unproven forensic techniques is highly criticized in legal proceedings. Moreover, the exceedingly distinct and dynamic characteristics of electronic data, together with current legislation and privacy laws, remain challenging obstacles to systematically attesting evidence in a court of law. This article presents a comprehensive study of the issues that must be discussed and resolved for evidence to be properly accepted on scientific grounds. It also examines the state of forensics in emerging sub-fields of digital technology such as cloud computing, social media, and the Internet of Things (IoT), and reviews the challenges that may complicate the systematic validation of electronic evidence. The study further explores various solutions previously proposed by researchers and academics, assessing their appropriateness based on experimental evaluation. Additionally, this article identifies open research areas, highlighting issues and problems in the empirical evaluation of these solutions that deserve immediate attention from researchers and practitioners. Notably, academics must respond to these challenges with appropriate emphasis on methodical verification. To that end, this study reviews the issues in the experimental validation of currently available practices and discusses the difficulty of demonstrating the reliability and validity of these approaches with contemporary evaluation methods. Furthermore, it highlights the development of best practices, reliable tools, and formal testing methods for digital forensic techniques, which could be extremely useful for improving the trustworthiness of electronic evidence in legal proceedings.

Estimation of the Hapcheon Dam Inflow Using HSPF Model (HSPF 모형을 이용한 합천댐 유입량 추정)

  • Cho, Hyun Kyung;Kim, Sang Min
    • Journal of The Korean Society of Agricultural Engineers
    • /
    • v.61 no.5
    • /
    • pp.69-77
    • /
    • 2019
  • The objective of this study was to calibrate and validate the HSPF (Hydrological Simulation Program-Fortran) model for estimating the runoff of the Hapcheon dam watershed. Spatial data, such as watershed, stream, land use, and digital elevation maps, were used as input data for the HSPF model. Observed runoff data from 2000 to 2016 in the study watershed were used for calibration and validation. Hydrologic parameters for runoff calibration were selected based on the user's manual and references, and a trial-and-error method was used for parameter calibration. The $R^2$, RMSE (root-mean-square error), RMAE (relative mean absolute error), and NSE (Nash-Sutcliffe efficiency coefficient) were used to evaluate the model's performance. Calibration and validation results showed that annual mean runoff was within a ±4% error. The model performance criteria for calibration and validation of daily runoff showed that $R^2$ was in the range of 0.78 to 0.83, RMSE was 2.55 to 2.76 mm/day, RMAE was 0.46 to 0.48 mm/day, and NSE was 0.81 to 0.82. The inflow to Hapcheon Dam calculated from the calibrated HSPF model was compared with the observed inflow, showing a -0.9% error. Analysis of the relation between inflow and storage capacity showed that storage increases as inflow increases and decreases as inflow decreases. The correlation $R^2$ between measured inflow and storage was 0.67, and between simulated inflow and storage was 0.61.
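The goodness-of-fit statistics named above have standard definitions that are easy to state in code. A minimal sketch with toy runoff values (only RMSE and NSE are shown, since the paper's exact RMAE formulation is not given here; the data are illustrative):

```python
import math

def rmse(obs, sim):
    """Root-mean-square error between observed and simulated series."""
    return math.sqrt(sum((o - s) ** 2 for o, s in zip(obs, sim)) / len(obs))

def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 matches the mean."""
    mean_obs = sum(obs) / len(obs)
    num = sum((o - s) ** 2 for o, s in zip(obs, sim))
    den = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - num / den

obs = [1.2, 3.4, 2.8, 5.1, 4.0]  # observed daily runoff (mm/day), illustrative
sim = [1.0, 3.6, 2.5, 5.3, 4.2]  # simulated daily runoff (mm/day), illustrative
```

For these toy series, `rmse(obs, sim)` is about 0.22 mm/day and `nse(obs, sim)` about 0.97; the paper's NSE of 0.81-0.82 on real daily runoff would be considered a good fit by the usual thresholds.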

A Study on the Land Cover Classification and Cross Validation of AI-based Aerial Photograph

  • Lee, Seong-Hyeok;Myeong, Soojeong;Yoon, Donghyeon;Lee, Moung-Jin
    • Korean Journal of Remote Sensing
    • /
    • v.38 no.4
    • /
    • pp.395-409
    • /
    • 2022
  • The purpose of this study is to evaluate classification performance and applicability when land cover datasets constructed for AI training are cross-validated against other areas. Gyeongsang-do and Jeolla-do in South Korea were selected as the cross-validation study areas, and training datasets were obtained from AI-Hub. The obtained datasets were applied, for each region, to U-Net, a semantic segmentation algorithm, and accuracy was evaluated on both the same and the other test areas. There was a difference of about 13-15% in overall classification accuracy between the same and other areas. For rice paddies, fields, and buildings, higher accuracy was shown in the Jeolla-do test areas; for roads, higher accuracy was shown in the Gyeongsang-do test areas. In terms of the difference in accuracy by weight, applying the Gyeongsang-do weights gave high accuracy for forests, while applying the Jeolla-do weights gave high accuracy for dry fields. The land cover classification results show that the classification performance of existing datasets differs by area. When constructing land cover maps for AI training, higher-quality datasets can be expected by reflecting the characteristics of various areas. This study is highly scalable from two perspectives: first, the approach can be applied to satellite images for AI-based land cover studies; second, building on satellite images makes it possible to cover large-scale areas that are difficult to access.
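The cross-area comparison above reduces to scoring a model's predicted label map against another region's reference labels. A toy sketch of the overall-accuracy step (the label maps and class codes below are illustrative, not the study's data):

```python
# Hypothetical sketch: score a segmentation model trained on one region
# against reference pixels from the same region and from another region.

def overall_accuracy(reference, predicted):
    """Fraction of pixels whose predicted class matches the reference class."""
    matches = sum(r == p for r, p in zip(reference, predicted))
    return matches / len(reference)

# Toy flattened label maps (classes: 0 = forest, 1 = field, 2 = building, 3 = road).
same_area_ref   = [0, 0, 1, 2, 3, 1, 0, 2]
same_area_pred  = [0, 0, 1, 2, 3, 1, 0, 3]  # model tested on its own region
other_area_pred = [0, 1, 1, 2, 0, 1, 3, 3]  # same model tested on another region

acc_same  = overall_accuracy(same_area_ref, same_area_pred)   # 7/8
acc_other = overall_accuracy(same_area_ref, other_area_pred)  # 4/8
```

The gap between `acc_same` and `acc_other` is the kind of cross-area accuracy drop (about 13-15% in the study) that motivates building training sets from multiple regions.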

Estimation of Pollutant Load Using Genetic-algorithm and Regression Model (유전자 알고리즘과 회귀식을 이용한 오염부하량의 예측)

  • Park, Youn Shik
    • Korean Journal of Environmental Agriculture
    • /
    • v.33 no.1
    • /
    • pp.37-43
    • /
    • 2014
  • BACKGROUND: Water quality data are collected less frequently than flow data because of the cost of collection and analysis, while water quality data corresponding to flow data are required to compute pollutant loads or to calibrate other hydrology models. Regression models can be used to interpolate water quality data corresponding to flow data. METHODS AND RESULTS: A regression model capable of considering flow and time variance was suggested, and its coefficients were calibrated with a genetic algorithm using various measured water quality data. Both LOADEST and the genetic-algorithm regression were evaluated through calibration and validation on 19 water quality data sets. The genetic-algorithm regression displayed behavior similar to LOADEST. The load estimates from both models indicated that using a large proportion of the water quality data does not necessarily lead to load estimates with smaller error relative to the measured load. CONCLUSION: Before regression models are used to interpolate pollutant loads, they need to be calibrated and validated by separating the water quality data into separate calibration and validation sets.
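The calibration step can be sketched as a simple genetic algorithm fitted to synthetic data. The regression form below, a load-flow power law L = a·Q^b, is an assumption for illustration only, since the paper's regression also considers time variance and its exact form is not given here:

```python
import random

def fitness(coeffs, flows, loads):
    """Negated sum of squared errors of the assumed power law L = a * Q**b."""
    a, b = coeffs
    return -sum((l - a * q ** b) ** 2 for q, l in zip(flows, loads))

def calibrate(flows, loads, pop_size=40, generations=200, seed=0):
    """Toy GA: elitist selection, averaging crossover, Gaussian mutation."""
    rng = random.Random(seed)
    pop = [(rng.uniform(0, 5), rng.uniform(0, 3)) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda c: fitness(c, flows, loads), reverse=True)
        parents = pop[: pop_size // 2]              # keep the fitter half
        children = []
        while len(children) < pop_size - len(parents):
            (a1, b1), (a2, b2) = rng.sample(parents, 2)
            children.append(((a1 + a2) / 2 + rng.gauss(0, 0.05),  # crossover
                             (b1 + b2) / 2 + rng.gauss(0, 0.05))) # + mutation
        pop = parents + children
    return max(pop, key=lambda c: fitness(c, flows, loads))

# Synthetic data generated with a = 2.0, b = 1.5 should be recovered closely.
flows = [1.0, 2.0, 4.0, 8.0, 16.0]
loads = [2.0 * q ** 1.5 for q in flows]
a_hat, b_hat = calibrate(flows, loads)
```

As in the paper's setup, the calibrated model would then be validated on a held-out subset of the water quality data, not on the samples used for calibration.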

COMPARISON OF LINEAR AND NON-LINEAR NIR CALIBRATION METHODS USING LARGE FORAGE DATABASES

  • Berzaghi, Paolo;Flinn, Peter C.;Dardenne, Pierre;Lagerholm, Martin;Shenk, John S.;Westerhaus, Mark O.;Cowe, Ian A.
    • Proceedings of the Korean Society of Near Infrared Spectroscopy Conference
    • /
    • 2001.06a
    • /
    • pp.1141-1141
    • /
    • 2001
  • The aim of the study was to evaluate the performance of three calibration methods, modified partial least squares (MPLS), local PLS (LOCAL) and artificial neural networks (ANN), on the prediction of the chemical composition of forages, using a large NIR database. The study used forage samples (n=25,977) from Australia, Europe (Belgium, Germany, Italy and Sweden) and North America (Canada and U.S.A.) with information on moisture, crude protein and neutral detergent fibre content. The spectra of the samples were collected with 10 different Foss NIR Systems instruments, which were either standardized or not standardized to one master instrument. The spectra were trimmed to a wavelength range between 1100 and 2498 nm. Two data sets, one standardized (IVAL) and the other not standardized (SVAL), were used as independent validation sets, but 10% of both sets were omitted and kept for later expansion of the calibration database. The remaining samples were combined into one database (n=21,696), which was split into 75% calibration (CALBASE) and 25% validation (VALBASE). The chemical components in the three validation data sets were predicted with each model derived from CALBASE, using the calibration database before and after it was expanded with 10% of the samples from the IVAL and SVAL data sets. Calibration performance was evaluated using the standard error of prediction corrected for bias (SEP(C)), bias, slope and $R^2$. None of the models appeared to be consistently better across all validation sets. VALBASE was predicted well by all models, with smaller SEP(C) and bias values than for IVAL and SVAL. This was not surprising, as VALBASE was selected from the calibration database and had a sample population similar to CALBASE, whereas IVAL and SVAL were completely independent validation sets. In most cases, the LOCAL and ANN models, but not MPLS, showed considerable improvement in the prediction of IVAL and SVAL after the calibration database had been expanded with the 10% of samples from IVAL and SVAL reserved for calibration expansion. The effects of sample processing, instrument standardization and differences in reference procedures were partially confounded in the validation sets, so it was not possible to determine which factors were most important. Further work on the development of large databases must address the problems of instrument standardization, the harmonization and standardization of laboratory procedures and, even more importantly, the definition of the database population.
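The two headline validation statistics, bias and SEP(C), follow the usual chemometric definitions. A minimal sketch with illustrative reference and predicted values (not data from the study):

```python
import math

def bias(ref, pred):
    """Mean signed difference between predicted and reference values."""
    return sum(p - r for r, p in zip(ref, pred)) / len(ref)

def sep_c(ref, pred):
    """Standard error of prediction corrected for bias, SEP(C)."""
    b = bias(ref, pred)
    n = len(ref)
    return math.sqrt(sum((p - r - b) ** 2 for r, p in zip(ref, pred)) / (n - 1))

ref  = [10.0, 12.0, 14.0, 16.0]  # laboratory reference values (e.g. % crude protein)
pred = [10.5, 12.4, 14.6, 16.5]  # NIR-predicted values
```

Here the predictions carry a systematic offset (bias = 0.5) but scatter very little around it (SEP(C) ≈ 0.08), which is exactly the distinction that correcting SEP for bias is meant to expose.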

Efficient Verifiable Top-k Queries in Two-tiered Wireless Sensor Networks

  • Dai, Hua;Yang, Geng;Huang, Haiping;Xiao, Fu
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.9 no.6
    • /
    • pp.2111-2131
    • /
    • 2015
  • A tiered wireless sensor network is a flexible and robust network model consisting of traditional resource-limited sensor nodes and resource-abundant storage nodes. In such an architecture, data collected by the sensor nodes are periodically submitted to nearby storage nodes for archival. When a query is requested, the storage nodes also process the query and return the qualified data to the base station as the result. This role makes the storage nodes attack-prone and leaves them more vulnerable in a hostile environment: if any of them is compromised, fake data may be injected and/or qualified data may be discarded, and the base station would receive incorrect answers, causing applications to malfunction. In this paper, an efficient verifiable top-k query processing scheme called EVTQ is proposed, which is capable of verifying the authenticity and completeness of the results. Before submission from the sensor nodes to the storage nodes, collected data items are embedded with ordering and adjacency information through a hashed message authentication code function, which serves as a validation code. Any injected or incomplete data in the result returned by a compromised storage node is detected at the base station via the validation codes. To save communication cost, two optimized solutions that fuse and compress validation codes are presented. Experiments on communication cost show the proposed method is more efficient than previous works.
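The validation-code idea can be sketched with a keyed HMAC chaining each sorted item to its successor, so a storage node can neither drop nor inject qualified items without breaking some code. The key handling and message layout below are illustrative assumptions, not EVTQ's actual encoding:

```python
import hashlib
import hmac

KEY = b"sensor-node-key"  # illustrative key shared by sensor node and base station

def code(value, next_value):
    """Validation code binding an item to its successor in sorted order."""
    return hmac.new(KEY, f"{value}|{next_value}".encode(), hashlib.sha256).hexdigest()

def seal(readings):
    """Sensor node: sort descending and chain each item to the next one."""
    s = sorted(readings, reverse=True)
    return [(v, code(v, nxt)) for v, nxt in zip(s, s[1:] + [None])]

def verify_topk(result, boundary):
    """Base station: recompute each code; the last returned item chains to the
    first excluded value (boundary), so no qualified item can be dropped."""
    values = [v for v, _ in result]
    return all(hmac.compare_digest(c, code(v, n))
               for (v, c), n in zip(result, values[1:] + [boundary]))

sealed = seal([7, 3, 9, 5, 1])             # chained as 9 -> 7 -> 5 -> 3 -> 1
top3, boundary = sealed[:3], sealed[3][0]  # top-3 result plus the 4th value
```

A storage node that drops the item 7 from the middle of the chain, or substitutes a value it never received, fails verification because it cannot forge codes without the key.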

Validation Comparison of Credit Rating Models for Categorized Financial Data (범주형 재무자료에 대한 신용평가모형 검증 비교)

  • Hong, Chong-Sun;Lee, Chang-Hyuk;Kim, Ji-Hun
    • Communications for Statistical Applications and Methods
    • /
    • v.15 no.4
    • /
    • pp.615-631
    • /
    • 2008
  • Current credit evaluation models, which are based only on financial data and exclude non-financial data, use continuous data and produce credit scores for ranking. In this work, some problems of credit evaluation models based on transformed continuous financial data are discussed, and we propose improved credit evaluation models based on categorized financial data. After analyzing and comparing goodness-of-fit tests of the two models, the usefulness of credit evaluation models for categorized financial data is explained.

Design and Implementation of a Geospatial Data Visualization System Considering Validation and Independency of GML Documents (GML 문서의 유효성 및 독립성을 고려한 지리공간 데이터 가시화 시스템 설계 및 구현)

  • Jeong, Dong-Won;Kim, Jang-Won;Ahn, Si-Hoon;Jeong, Young-Sik
    • Journal of Information Technology Services
    • /
    • v.7 no.1
    • /
    • pp.205-218
    • /
    • 2008
  • This paper proposes a geospatial data visualization system that supports validation of GML documents. GIS systems manage and use both spatial and non-spatial data. Currently, most GIS systems represent spatial data in GML (Geography Markup Language), developed by OGC. GML is a language for representing and sharing spatial information, and many systems have been developed with it. However, GML does not support the expression of non-spatial data, i.e., relational information about spatial objects, so most systems extend GML to describe non-spatial information. This causes a problem: systems that accept only standard GML documents cannot process the extended documents. In this paper, we propose a new GIS data visualization system to resolve this issue. Our proposed system represents both types of data while keeping spatial and non-spatial data independent, which enhances interoperability with other relevant systems and enables rich, high-quality geospatial information services.

Carbonation depth prediction of concrete bridges based on long short-term memory

  • Youn Sang Cho;Man Sung Kang;Hyun Jun Jung;Yun-Kyu An
    • Smart Structures and Systems
    • /
    • v.33 no.5
    • /
    • pp.325-332
    • /
    • 2024
  • This study proposes a novel long short-term memory (LSTM)-based approach for predicting carbonation depth, with the aim of enhancing the durability evaluation of concrete structures. Conventional carbonation depth prediction relies on statistical methodologies using carbonation influencing factors and in-situ carbonation depth data. However, applying in-situ data to predictive modeling is challenging because time-series data are lacking. To address this limitation, an LSTM-based carbonation depth prediction technique is proposed. First, training data are generated through random sampling from the distribution of carbonation velocity coefficients, which are calculated from in-situ carbonation depth data. Subsequently, Bayes' theorem is applied to tailor the training data to each target bridge, depending on the surrounding environmental conditions. Ultimately, the LSTM model predicts the time-dependent carbonation depth data for the target bridge. To examine the feasibility of this technique, a carbonation depth dataset from 3,960 in-situ bridges was used for training, and untrained time-series data from the Miho River bridge in the Republic of Korea were used for experimental validation. The results demonstrate a significant reduction in prediction error, from 8.19% to 1.75%, compared with the conventional statistical method. Furthermore, the LSTM prediction can be enhanced by sequentially updating the model with actual time-series measurement data.
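The training-data generation step rests on the common square-root-of-time carbonation model, d = k·√t, where k is the carbonation velocity coefficient. A sketch with illustrative in-situ measurements (the Bayesian tailoring and the LSTM itself are omitted; all numbers below are made up):

```python
import math
import random

def velocity_coefficient(depth_mm, age_years):
    """Back-calculate k from one in-situ measurement, assuming d = k * sqrt(t)."""
    return depth_mm / math.sqrt(age_years)

def synthesize_series(k, years):
    """Time-dependent carbonation depth curve for a sampled coefficient k."""
    return [k * math.sqrt(t) for t in years]

# In-situ measurements from several bridges -> empirical distribution of k.
in_situ = [(8.0, 16), (12.5, 25), (6.0, 9)]            # (depth in mm, age in years)
ks = [velocity_coefficient(d, t) for d, t in in_situ]  # [2.0, 2.5, 2.0]

# Random sampling from the distribution yields synthetic depth time series,
# which stand in for the missing in-situ time-series data during training.
rng = random.Random(42)
series = synthesize_series(rng.choice(ks), range(1, 6))
```

Each sampled k produces one full depth-versus-time curve, which is the kind of time-series input an LSTM needs but single-visit in-situ inspections cannot provide.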