• Title/Summary/Keyword: post data processing


Methodology of Automatic Editing for Academic Writing Using Bidirectional RNN and Academic Dictionary (양방향 RNN과 학술용어사전을 이용한 영문학술문서 교정 방법론)

  • Roh, Younghoon;Chang, Tai-Woo;Won, Jongwun
    • The Journal of Society for e-Business Studies
    • /
    • v.27 no.2
    • /
    • pp.175-192
    • /
    • 2022
  • Artificial intelligence-based natural language processing technology plays an important role in helping users write English-language documents. For academic documents in particular, English proofreading services should reflect academic characteristics such as formal style and technical terms, but they usually do not because they are built on general English sentences. In addition, since existing studies mainly aim to improve grammatical completeness, their ability to improve fluency is limited. This study proposes an automatic academic English editing methodology that delivers the clear meaning of sentences through the use of technical terms. The proposed methodology consists of two phases: misspelling correction and fluency improvement. In the first phase, appropriate corrective words are suggested according to the input typo and its context. In the second phase, the fluency of the sentence is improved by an automatic post-editing model based on a bidirectional recurrent neural network that learns from pairs of original and edited sentences. Experiments were performed on actual English editing data, and the superiority of the proposed methodology was verified.
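
The fluency-improvement phase described above is a sequence-to-sequence post-editing model built on a bidirectional RNN. Below is a minimal sketch of that idea, assuming PyTorch and placeholder vocabulary, embedding, and hidden sizes; the paper's actual architecture and hyperparameters may differ.

```python
import torch
import torch.nn as nn

class BiRNNPostEditor(nn.Module):
    """Toy encoder-decoder for automatic post-editing.

    The encoder is a bidirectional GRU over the original sentence; the
    decoder is a unidirectional GRU that generates the edited sentence.
    All sizes are illustrative placeholders.
    """

    def __init__(self, vocab_size=10000, emb_dim=128, hidden=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.encoder = nn.GRU(emb_dim, hidden, batch_first=True,
                              bidirectional=True)
        self.decoder = nn.GRU(emb_dim, hidden * 2, batch_first=True)
        self.out = nn.Linear(hidden * 2, vocab_size)

    def forward(self, src_ids, tgt_ids):
        # Encode the original sentence in both directions.
        _, h = self.encoder(self.embed(src_ids))            # h: (2, B, hidden)
        h0 = torch.cat([h[0], h[1]], dim=-1).unsqueeze(0)   # (1, B, 2*hidden)
        # Decode the edited sentence conditioned on the encoder state.
        dec_out, _ = self.decoder(self.embed(tgt_ids), h0)
        return self.out(dec_out)                            # token logits

# Example with random token ids (batch of 2, length 7).
model = BiRNNPostEditor()
src = torch.randint(0, 10000, (2, 7))
tgt = torch.randint(0, 10000, (2, 7))
logits = model(src, tgt)  # shape: (2, 7, 10000)
```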

High-Resolution Seismic Reflection Profiling on Land with Hydrophones Employed in the Stream-Water Driven Trench (하천수유입과 하이드로폰을 이용한 육상 고분해능 탄성파반사법탐사)

  • Kim Ji-Soo;Han Su-Hyung;Kim Hak-Soo;Choi Won-Suk;Jung Chang-Ho
    • Geophysics and Geophysical Exploration
    • /
    • v.4 no.4
    • /
    • pp.133-144
    • /
    • 2001
  • An effective seismic reflection technique for mapping cavities and the bedrock surface in carbonate rocks is described. High-resolution seismic reflection images were successfully recorded using hydrophones deployed in the stream-water-driven trench and were effectively focused by applying an optimal data-processing sequence. The strategy included enhancement of signal contaminated by large-amplitude scattered noise through pre- and post-stack processing such as time-variant filtering, bad-trace editing, residual statics, velocity analysis, and careful muting after NMO (normal moveout) correction. The major reflections, including the bedrock surface, were mapped with the desired resolution and correlated with crosshole seismic tomographic data. Shallow major reflectors could be identified and analyzed on the AGC (auto gain control)-applied field records. Three subhorizontal layers were identified by their distinct velocities: overburden (<3000 m/s), sediments (3000-4000 m/s), and limestone bedrock (>4000 m/s). Given the absence of diffraction effects in the field records, the gravel-rich overburden and sediments are considered to be well sorted. Based on the consistent images mapped along the whole survey line and the seismic velocity increasing with depth, this area probably lacks sizable cavities, and any cavities present are not air-filled.

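One concrete processing step named above, AGC (auto gain control), rescales each sample by the average amplitude in a sliding window so that weak deep reflections become visible next to strong shallow ones. A minimal NumPy sketch on a synthetic trace follows; the window length and the synthetic events are illustrative assumptions, not the authors' parameters.

```python
import numpy as np

def agc(trace, window=51, eps=1e-10):
    """Auto gain control: divide each sample by the mean absolute
    amplitude inside a centered sliding window."""
    half = window // 2
    padded = np.pad(np.abs(trace), half, mode="edge")
    # Running mean of |amplitude| via convolution with a box window.
    gain = np.convolve(padded, np.ones(window) / window, mode="valid")
    return trace / (gain + eps)

# Synthetic trace: a strong shallow event, a weak deep event, and noise.
t = np.arange(1000)
trace = (1.00 * np.exp(-(t - 200) ** 2 / 50.0)
         + 0.05 * np.exp(-(t - 800) ** 2 / 50.0)
         + np.random.normal(0, 0.01, t.size))
balanced = agc(trace)
print(balanced[200], balanced[800])  # amplitudes are now comparable
```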

The Use and Abuse of Climate Scenarios in Agriculture (농업부문 기후시나리오 활용의 주의점)

  • Kim, Jin-Hee;Yun, Jin I.
    • Korean Journal of Agricultural and Forest Meteorology
    • /
    • v.18 no.3
    • /
    • pp.170-178
    • /
    • 2016
  • It is not obvious how climate scenarios should be applied to assess the impact of climate change in the agricultural sector. Even when the same scenario is applied, the result can vary depending on the temporal and spatial downscaling, the post-processing used to adjust model bias, and the prediction model selected for the impact assessment. The end user of scenario climate data should select climate variables, a spatial extent, and a temporal range appropriate for the objectives of the analysis. It is important to obtain impact-assessment results with minimum uncertainty by evaluating the suitability of the data, including its reproducibility of the past climate, and by producing an appropriate future climate change scenario. This study introduces data-processing methods for reducing the uncertainties that arise when future climate change scenarios are applied by users in the agricultural sector, and aims to provide basic information for using scenario data appropriately in accordance with the study objectives.
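
One of the post-processing steps mentioned above is bias adjustment of model output against observations. The sketch below shows simple empirical quantile mapping with NumPy on hypothetical temperature arrays; it is one common choice, not necessarily the method the authors recommend, and the appropriate correction depends on the variable and the study objective.

```python
import numpy as np

def quantile_map(model_hist, obs_hist, model_future, n_q=100):
    """Empirical quantile mapping: adjust future model values so that the
    historical model distribution matches the observed distribution."""
    q = np.linspace(0, 1, n_q)
    model_q = np.quantile(model_hist, q)
    obs_q = np.quantile(obs_hist, q)
    # Locate each future value's quantile in the historical model
    # distribution, then map it to the observed value at that quantile.
    ranks = np.interp(model_future, model_q, q)
    return np.interp(ranks, q, obs_q)

# Hypothetical daily mean temperatures (degrees C).
rng = np.random.default_rng(0)
obs_hist = rng.normal(12.0, 8.0, 3650)      # observations, 10 years
model_hist = rng.normal(14.0, 9.0, 3650)    # model with a warm, wide bias
model_future = rng.normal(17.0, 9.0, 3650)  # raw future scenario
corrected = quantile_map(model_hist, obs_hist, model_future)
print(model_future.mean(), corrected.mean())
```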

Development of Land fog Detection Algorithm based on the Optical and Textural Properties of Fog using COMS Data

  • Suh, Myoung-Seok;Lee, Seung-Ju;Kim, So-Hyeong;Han, Ji-Hye;Seo, Eun-Kyoung
    • Korean Journal of Remote Sensing
    • /
    • v.33 no.4
    • /
    • pp.359-375
    • /
    • 2017
  • We developed a fog detection algorithm (KNU_FDA) based on the optical and textural properties of fog, using satellite (COMS) and ground observation data. The optical properties are the dual channel difference (DCD: BT3.7 - BT11) and albedo, and the textural properties are the normalized local standard deviations of the IR1 and visible channels. The difference between air temperature and BT11 is used to discriminate fog from other clouds. Fog detection is performed according to the solar zenith angle of each pixel because the availability of satellite data differs between day, night, and dawn/dusk. Post-processing is also performed to increase the probability of detection (POD), in particular at the edge of the main fog area. The fog probability is calculated as a weighted sum of threshold tests. The initial threshold and weighting values are optimized through sensitivity tests over varying thresholds using receiver operating characteristic (ROC) analysis. Validation against ground visibility data showed that KNU_FDA performs relatively consistently, but its skill clearly depends on fog type and time of day. The average POD and FAR (false alarm ratio) for the training and validation cases ranged from 0.76 to 0.90 and from 0.41 to 0.63, respectively. In general, performance is relatively good for fog without high cloud and for strong fog, but it decreases significantly for weak fog. To improve the detection skill and its stability, the threshold and weighting values need to be optimized over a wider variety of training cases.
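
The decision step described above combines several threshold tests (DCD, albedo, local standard deviation, Ta - BT11) into a weighted per-pixel fog probability. A minimal NumPy sketch follows; the thresholds and weights below are made-up placeholders, whereas the actual KNU_FDA values were tuned by sensitivity and ROC analysis.

```python
import numpy as np

def fog_probability(bt37, bt11, albedo, nlsd, t_air,
                    weights=(0.35, 0.25, 0.20, 0.20)):
    """Weighted sum of binary threshold tests -> fog probability in [0, 1].
    Thresholds and weights are illustrative placeholders."""
    dcd = bt37 - bt11                      # dual channel difference (K)
    tests = np.stack([
        dcd < -2.0,                        # fog-top emissivity signature
        albedo > 0.25,                     # bright in the visible channel (day)
        nlsd < 0.02,                       # fog tops are spatially smooth
        (t_air - bt11) < 3.0,              # cloud top near surface temperature
    ]).astype(float)
    w = np.asarray(weights)[:, None, None]
    return (tests * w).sum(axis=0)

# Hypothetical 2 x 2 pixel scene.
bt37 = np.array([[268.0, 270.0], [280.0, 275.0]])
bt11 = np.array([[271.0, 271.5], [279.0, 276.0]])
albedo = np.array([[0.35, 0.30], [0.10, 0.15]])
nlsd = np.array([[0.01, 0.015], [0.05, 0.04]])
t_air = np.array([[272.0, 272.0], [281.0, 278.0]])

prob = fog_probability(bt37, bt11, albedo, nlsd, t_air)
fog_mask = prob >= 0.5    # final detection after thresholding the probability
print(prob)
```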

Implementation of Precise Drone Positioning System using Differential Global Positioning System (차등 위성항법 보정을 이용한 정밀 드론 위치추적 시스템 구현)

  • Chung, Jae-Young
    • Journal of the Korea Academia-Industrial cooperation Society
    • /
    • v.21 no.1
    • /
    • pp.14-19
    • /
    • 2020
  • This paper proposes a precise drone-positioning technique using a differential global positioning system (DGPS). The proposed system consists of a reference station for error correction data production, and a mobile station (a drone), which is the target for real-time positioning. The precise coordinates of the reference station were acquired by post-processing of received satellite data together with the reference station location data provided by government infrastructure. For the system's implementation, low-cost commercial GPS receivers were used. Furthermore, a Zigbee transmitter/receiver pair was used to wirelessly send control signals and error correction data, making the whole system affordable for personal use. To validate the system, a drone-tracking experiment was conducted. The results show that the average real-time position error is less than 0.8 m.
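
The essence of the differential correction described above is that the reference station, whose true coordinates are known from post-processing, measures its own apparent GPS position, and the resulting error is broadcast (here over Zigbee) and subtracted from the drone's raw fix. The sketch below uses hypothetical coordinates in a local frame and is a position-domain simplification; an operational DGPS implementation typically corrects per-satellite pseudoranges instead.

```python
# Position-domain DGPS sketch: coordinates in a local ENU frame (meters).

def dgps_correct(rover_raw, ref_measured, ref_known):
    """Subtract the reference station's instantaneous error
    (measured - known) from the rover's raw position."""
    error = tuple(m - k for m, k in zip(ref_measured, ref_known))
    return tuple(r - e for r, e in zip(rover_raw, error))

# Hypothetical values: both receivers see roughly the same atmospheric error.
ref_known = (0.00, 0.00, 10.00)        # surveyed reference coordinates
ref_measured = (1.30, -0.90, 11.20)    # reference receiver's raw GPS fix
rover_raw = (51.10, 24.30, 31.40)      # drone receiver's raw GPS fix

rover_corrected = dgps_correct(rover_raw, ref_measured, ref_known)
print(rover_corrected)                 # -> (49.8, 25.2, 30.2)
```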

Fast k-NN based Malware Analysis in a Massive Malware Environment

  • Hwang, Jun-ho;Kwak, Jin;Lee, Tae-jin
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.13 no.12
    • /
    • pp.6145-6158
    • /
    • 2019
  • Responding to the large volume of malicious code distributed indiscriminately, as well as to intelligent APT attacks, is a challenge for the current security industry. As a result, studies using machine learning algorithms are being conducted for proactive prevention rather than post-processing. The k-NN algorithm is widely used because it is intuitive and suitable for handling malicious code as unstructured data. In the malware analysis domain, the k-NN algorithm also makes it easy to classify malicious code based on previously analyzed samples; for example, malware families can be classified and variants analyzed through similarity analysis with existing malicious code. However, the main disadvantage of the k-NN algorithm is that its search time increases as the training data grows. We propose a fast k-NN algorithm that addresses this computation-speed problem while retaining the value of the k-NN approach. In the test environment, the proposed algorithm required on average only 19.71 similarity comparisons per query over 6.25 million malicious code samples. Given how the algorithm works, the fast k-NN algorithm can also be used to search any data that can be vectorized, not only malware and SSDEEP hashes. In the future, if the k-NN approach is needed and central nodes can be selected effectively for clustering large amounts of data in various environments, it should be possible to design sophisticated machine-learning-based systems.
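
A common way to realize the "central node" idea above is a two-level search: cluster the corpus offline, compare a query only against cluster representatives, and run exact k-NN only inside the nearest cluster(s). The NumPy sketch below illustrates that pattern on random vectors; the paper's own indexing scheme and its SSDEEP-based features may differ.

```python
import numpy as np

rng = np.random.default_rng(1)

# Offline stage: pick representative centroids and assign every corpus
# vector to its nearest one (a real system would use k-means or similar).
corpus = rng.normal(size=(50_000, 64))                  # malware feature vectors
n_clusters = 100
centroids = corpus[rng.choice(len(corpus), n_clusters, replace=False)]
d2 = ((corpus ** 2).sum(1, keepdims=True)
      + (centroids ** 2).sum(1)
      - 2.0 * corpus @ centroids.T)                     # squared distances
assign = np.argmin(d2, axis=1)                          # cluster id per sample

def fast_knn(query, k=5, n_probe=3):
    """Compare against centroids first, then run exact k-NN only inside
    the n_probe nearest clusters."""
    d_cent = ((centroids - query) ** 2).sum(axis=1)
    probe = np.argsort(d_cent)[:n_probe]
    cand = np.flatnonzero(np.isin(assign, probe))       # candidate sample ids
    d = ((corpus[cand] - query) ** 2).sum(axis=1)
    return cand[np.argsort(d)[:k]]                      # ids of the k nearest

query = rng.normal(size=64)
print(fast_knn(query))
```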

Classifier Integration Model for Image Classification (영상 분류를 위한 분류기 통합모델)

  • Park, Dong-Chul
    • Journal of the Institute of Electronics Engineers of Korea CI
    • /
    • v.49 no.2
    • /
    • pp.96-102
    • /
    • 2012
  • An advanced form of the Partitioned Feature-based Classifier with Expertise Table (PFC-ET) is proposed in this paper. As with the PFC-ET, the proposed model, called the Classifier Integration Model (CIM), does not use the entire feature vector extracted from the original data in concatenated form to classify each datum, but instead uses groups of related features separately. The proposed CIM uses the proportion of selected cluster members, instead of the expertise table of the PFC-ET, to minimize the error in the confusion table. The CIM is applied to classification problems on two data sets, the Caltech data set and a collected terrain data set. Compared with the PFC and PFC-ET models, the proposed CIM shows improvements in terms of classification accuracy and post-processing effort.
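
The core idea above, training separate classifiers on groups of features and integrating their outputs rather than classifying one concatenated feature vector, can be sketched roughly as follows with scikit-learn. The feature groups are made up, and CIM's cluster-member-proportion weighting is simplified here to averaging predicted class probabilities, so this is only an illustration of the general structure.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic data with 20 features split into two hypothetical groups,
# e.g. "color" features and "texture" features of an image.
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
groups = [slice(0, 10), slice(10, 20)]
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# One classifier per feature group instead of one classifier on the
# concatenated feature vector.
experts = [LogisticRegression(max_iter=1000).fit(X_tr[:, g], y_tr)
           for g in groups]

# Integration step: a simple average of per-group class probabilities
# stands in for CIM's cluster-member-proportion weighting.
probs = np.mean([clf.predict_proba(X_te[:, g])
                 for clf, g in zip(experts, groups)], axis=0)
pred = probs.argmax(axis=1)
print("accuracy:", (pred == y_te).mean())
```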

Efficient Subsequence Searching in Sequence Databases : A Segment-based Approach (시퀀스 데이터베이스를 위한 서브시퀀스 탐색 : 세그먼트 기반 접근 방안)

  • Park, Sang-Hyun;Kim, Sang-Wook;Loh, Woong-Kee
    • Journal of KIISE:Databases
    • /
    • v.28 no.3
    • /
    • pp.344-356
    • /
    • 2001
  • This paper deals with the subsequence searching problem under time warping in sequence databases. Our work is motivated by the observation that subsequence searches slow down quadratically as the average length of data sequences increases. To resolve this problem, the Segment-Based Approach for Subsequence Searches (SBASS) is proposed. The SBASS divides data and query sequences into series of segments, and retrieves all data subsequences that satisfy two conditions: (1) the number of segments is the same as the number of segments in the query sequence, and (2) the distance of every segment pair is less than or equal to a tolerance. Our segmentation scheme allows segments to have different lengths; thus, we employ the time warping distance as a similarity measure for each segment pair. For efficient retrieval of similar subsequences, we extract feature vectors from all data segments by exploiting their monotonically changing properties, and build a spatial index using these feature vectors. Using this index, queries are processed in four steps: (1) R-tree filtering, (2) feature filtering, (3) successor filtering, and (4) post-processing. The effectiveness of our approach is verified through extensive experiments.

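The per-segment similarity measure above is the time-warping (DTW) distance, and a candidate subsequence survives only if every segment pair is within the tolerance. The sketch below shows just that segment-matching test in Python; the R-tree, feature, and successor filtering stages are omitted, and the segment boundaries are assumed to be given.

```python
import numpy as np

def dtw(a, b):
    """Classic O(len(a)*len(b)) time-warping distance between 1-D segments."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

def segments_match(query_segs, data_segs, tol):
    """A data subsequence matches only if it has the same number of segments
    as the query and every segment pair is within the tolerance."""
    if len(query_segs) != len(data_segs):
        return False
    return all(dtw(q, s) <= tol for q, s in zip(query_segs, data_segs))

# Hypothetical monotone segments (sequences split where the direction
# of change flips), with different lengths per segment.
query = [np.array([1.0, 2.0, 4.0]), np.array([4.0, 3.0, 1.0])]
data  = [np.array([1.0, 1.5, 2.0, 4.1]), np.array([4.0, 2.5, 1.2])]
print(segments_match(query, data, tol=2.0))
```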

A Study on Classification Models for Predicting Bankruptcy Based on XAI (XAI 기반 기업부도예측 분류모델 연구)

  • Jihong Kim;Nammee Moon
    • KIPS Transactions on Software and Data Engineering
    • /
    • v.12 no.8
    • /
    • pp.333-340
    • /
    • 2023
  • Efficient prediction of corporate bankruptcy is important for financial institutions to make appropriate lending decisions and reduce loan default rates. Many studies have used classification models based on artificial intelligence technology. In the financial industry, however, even if a new predictive model performs well, it should be accompanied by an intuitive explanation of the basis on which each result was determined. Recently, the US, the EU, and South Korea have all introduced the right to request explanations of algorithmic decisions, so transparency in the use of AI in the financial sector must be secured. In this paper, an interpretable, artificial intelligence-based classification model is proposed using publicly available corporate bankruptcy data. First, data preprocessing and 5-fold cross-validation were performed, and classification performance was compared across ten optimized supervised learning classifiers, including logistic regression, SVM, XGBoost, and LightGBM. LightGBM was confirmed as the best-performing model, and SHAP, an explainable artificial intelligence technique, was applied to provide post-hoc explanations of the bankruptcy prediction process.
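
The final modeling step described above, LightGBM plus SHAP for post-hoc explanation, follows the standard pattern sketched below. A synthetic data set stands in for the bankruptcy data, and default hyperparameters are used rather than the paper's tuned ones; the lightgbm and shap packages must be installed.

```python
import lightgbm as lgb
import shap
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the (preprocessed) corporate bankruptcy data.
X, y = make_classification(n_samples=2000, n_features=15, random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=42)

# Train the gradient-boosted classifier.
model = lgb.LGBMClassifier(n_estimators=200, random_state=42)
model.fit(X_tr, y_tr)
print("test accuracy:", model.score(X_te, y_te))

# Post-hoc explanation with SHAP: per-feature contribution to each prediction.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X_te)
shap.summary_plot(shap_values, X_te)   # global view of feature importance
```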

Translation of 3D CAD Data to X3D Dataset Maintaining the Product Structure (3차원 CAD 데이터의 제품구조를 포함하는 X3D 기반 데이터로의 변환 기법)

  • Cho, Gui-Mok;Hwang, Jin-Sang;Kim, Young-Kuk
    • The KIPS Transactions:PartA
    • /
    • v.18A no.3
    • /
    • pp.81-92
    • /
    • 2011
  • There have been a number of attempts to apply 3D CAD data created in the design stage of the product life cycle to applications in the other stages of related industries. However, 3D CAD data requires a large amount of computing resources to process, and it is not suitable for downstream applications such as distributed collaboration, marketing tools, or Interactive Electronic Technical Manuals because of design-information security concerns and license costs. Therefore, various lightweight visualization formats and application systems have been suggested to overcome these problems. However, most of these lightweight formats are proprietary to the companies or organizations that proposed them and cannot be shared with each other. In addition, product structure information is not represented along with the product's geometric information. In this paper, we define a dataset called prod-X3D (Enhanced X3D Dataset for Web-based Visualization of 3D CAD Product Model), based on the international standard graphics format X3D, which can represent the structure information as well as the geometry information of a product, and we propose a method for translating 3D CAD data into prod-X3D.
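
The key point above is that the X3D scene graph itself can carry the assembly structure: each part becomes a named grouping node that nests its children's geometry. A minimal sketch with Python's xml.etree.ElementTree and a made-up two-part assembly follows; the actual prod-X3D schema defined in the paper is richer than this.

```python
import xml.etree.ElementTree as ET

def part_node(name):
    """Represent one node of the product structure as a named X3D Group."""
    return ET.Element("Group", DEF=name)

def add_box_shape(parent, size="1 1 1"):
    """Attach placeholder geometry to a part (real data would carry the
    tessellated CAD geometry, e.g. IndexedFaceSet nodes)."""
    shape = ET.SubElement(parent, "Shape")
    ET.SubElement(shape, "Box", size=size)

# Build an X3D document whose nesting mirrors the assembly tree:
# Assembly -> {PartA, PartB}.
x3d = ET.Element("X3D", profile="Interchange", version="3.3")
scene = ET.SubElement(x3d, "Scene")

assembly = part_node("Assembly")
scene.append(assembly)
for name, size in [("PartA", "2 1 1"), ("PartB", "1 1 3")]:
    part = part_node(name)
    add_box_shape(part, size)
    assembly.append(part)

ET.ElementTree(x3d).write("product.x3d", encoding="utf-8",
                          xml_declaration=True)
```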