• Title/Summary/Keyword: data extract

A Deep Learning Application for Automated Feature Extraction in Transaction-based Machine Learning (트랜잭션 기반 머신러닝에서 특성 추출 자동화를 위한 딥러닝 응용)

  • Woo, Deock-Chae;Moon, Hyun Sil;Kwon, Suhnbeom;Cho, Yoonho
    • Journal of Information Technology Services, v.18 no.2, pp.143-159, 2019
  • Machine learning (ML) fits given data to a mathematical model in order to derive insights or make predictions. In the age of big data, where the amount of available data grows exponentially with the development of information technology and smart devices, ML achieves high prediction performance by detecting patterns without bias. Feature engineering, which generates the features that explain the problem to be solved, has a great influence on ML performance, and its importance is continually emphasized. Despite this importance, it is still considered a difficult task because it requires a thorough understanding of the domain characteristics and the source data, as well as an iterative procedure. We therefore propose methods that apply deep learning to reduce the complexity and difficulty of feature extraction and to improve the performance of ML models. The most commonly cited reason for the superior performance of deep learning on complex unstructured data is that, unlike other techniques, it can extract features from the source data itself. To apply this advantage to business problems, we propose deep learning based methods that automatically extract features from transaction data or directly predict and classify target variables. In particular, exploiting the structural similarity between transaction data and text data, we applied techniques that perform well in text processing. We also verified the suitability of each method according to the characteristics of the transaction data. Our study makes it possible not only to explore automated feature extraction but also to obtain a benchmark model with a certain level of performance before a human performs the feature extraction task. In addition, it is expected to provide guidelines for choosing a suitable deep learning model based on the business problem and the data characteristics.
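The structural similarity between transaction data and text mentioned in this abstract can be illustrated with a minimal sketch: each purchased item plays the role of a word, and a customer's purchase history plays the role of a sentence that is encoded as a fixed-length sequence of token IDs. The function names, the `<pad>`/`<unk>` tokens, and the sample items are assumptions made for illustration and are not taken from the paper.

```python
def build_vocab(histories):
    """Assign an integer ID to every distinct item, like a word vocabulary."""
    vocab = {"<pad>": 0, "<unk>": 1}  # hypothetical special tokens
    for history in histories:
        for item in history:
            vocab.setdefault(item, len(vocab))
    return vocab


def encode(history, vocab, max_len):
    """Map a purchase history to a fixed-length list of token IDs.

    Keeps the most recent max_len items and right-pads shorter histories.
    """
    ids = [vocab.get(item, vocab["<unk>"]) for item in history][-max_len:]
    return ids + [vocab["<pad>"]] * (max_len - len(ids))


# Hypothetical sample data: two customers' ordered purchases.
histories = [["milk", "bread"], ["bread", "eggs", "milk"]]
vocab = build_vocab(histories)
encoded = [encode(h, vocab, max_len=4) for h in histories]
```

Sequences encoded this way could then be fed to any sequence model designed for text, which is the kind of reuse the abstract describes.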

Improved Similarity Detection Algorithm of the Video Scene (개선된 비디오 장면 유사도 검출 알고리즘)

  • Yu, Ju-Won;Kim, Jong-Weon;Choi, Jong-Uk;Bae, Kyoung-Yul
    • The Journal of the Korea Contents Association, v.9 no.2, pp.43-50, 2009
  • In this paper, we propose a similarity detection method for video frame data that extracts feature data from each video frame and creates a 1-D signal. We find the boundaries of similar frames and build representative frames within each boundary in order to measure the similarity between videos. The representative frames are blurred, and feature data are extracted from them using DoG (difference of Gaussians) values. Finally, we convert the feature data into a 1-D signal and compare content similarity. The experimental results show that the proposed algorithm achieves a similarity value above 0.9 under noise addition, rotation, resizing, frame deletion, and frame cutting.
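The pipeline in this abstract (blur, DoG features, 1-D signal, similarity) can be sketched roughly as follows. This is not the authors' algorithm: the abstract does not say how DoG values become a 1-D signal or which similarity measure is used, so the sketch assumes one mean absolute DoG response per frame and a Pearson correlation between the two signals. All function names and sigma values are hypothetical.

```python
import math


def gaussian_kernel(sigma):
    """Normalized 1-D Gaussian kernel with radius of about 3 sigma."""
    radius = max(1, int(3 * sigma))
    weights = [math.exp(-(i * i) / (2 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def blur_rows(image, kernel):
    """Convolve every row with the kernel, clamping indices at the borders."""
    radius = len(kernel) // 2
    out = []
    for row in image:
        n = len(row)
        out.append([
            sum(kernel[k + radius] * row[min(max(x + k, 0), n - 1)]
                for k in range(-radius, radius + 1))
            for x in range(n)
        ])
    return out


def transpose(image):
    return [list(col) for col in zip(*image)]


def gaussian_blur(image, sigma):
    """Separable Gaussian blur: rows first, then columns."""
    kernel = gaussian_kernel(sigma)
    return transpose(blur_rows(transpose(blur_rows(image, kernel)), kernel))


def dog_energy(image, sigma_small=1.0, sigma_large=2.0):
    """Mean absolute difference of two blurs (assumed per-frame feature)."""
    a = gaussian_blur(image, sigma_small)
    b = gaussian_blur(image, sigma_large)
    total = sum(abs(p - q) for ra, rb in zip(a, b) for p, q in zip(ra, rb))
    return total / (len(image) * len(image[0]))


def frames_to_signal(frames):
    """One DoG value per representative frame gives a 1-D signal."""
    return [dog_energy(frame) for frame in frames]


def signal_similarity(a, b):
    """Pearson correlation of two 1-D signals (assumed similarity measure)."""
    n = min(len(a), len(b))
    if n == 0:
        return 0.0
    a, b = a[:n], b[:n]
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    norm = math.sqrt(sum((x - mean_a) ** 2 for x in a)
                     * sum((y - mean_b) ** 2 for y in b))
    return cov / norm if norm else 0.0
```

Frames here are plain lists of grayscale rows. A flat frame yields a DoG value near zero, while a frame containing an edge yields a positive value.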

Using a Cellular Automaton to Extract Medical Information from Clinical Reports

  • Barigou, Fatiha;Atmani, Baghdad;Beldjilali, Bouziane
    • Journal of Information Processing Systems, v.8 no.1, pp.67-84, 2012
  • A substantial amount of clinical data concerning the medical history of a patient is in the form of clinical reports written by doctors. They describe patients, their pathologies, their personal and medical histories, findings made during interviews or procedures, and so forth. They represent a source of precious information that can be used in several applications, such as research information to diagnose new patients, epidemiological studies, decision support, statistical analysis, and data mining. This information is difficult to access, however, as it is often in unstructured text form. To make access to patient data easy, our research aims to develop a system for extracting information from unstructured text. In previous work, a rule-based approach was applied to a corpus of clinical reports on infectious diseases to extract structured data in the form of named entities and properties. In this paper, we propose the use of a Boolean inference engine, based on a cellular automaton, to perform the extraction. Our motivation for adopting this Boolean modeling approach is twofold: first, to optimize storage, and second, to reduce the response time of entity extraction.

RDNN: Rumor Detection Neural Network for Veracity Analysis in Social Media Text

  • SuthanthiraDevi, P;Karthika, S
    • KSII Transactions on Internet and Information Systems (TIIS), v.16 no.12, pp.3868-3888, 2022
  • A widely used social networking service like Twitter can disseminate information to large groups of people, even during a pandemic. At the same time, it is a convenient medium for sharing irrelevant and unverified information online, which poses a potential threat to society. In this research, conventional machine learning algorithms are analyzed to classify data as either rumor or non-rumor. Machine learning techniques have limited tuning capability and make decisions based on what they have learned. To tackle this problem, the authors propose a deep learning-based Rumor Detection Neural Network (RDNN) model to predict rumor tweets in real-world events. The model comprises three layers: an AttCNN layer that extracts local and position-invariant features from the data, an AttBi-LSTM layer that extracts important semantic or contextual information, and an HPOOL layer that combines the downsampled patches of the input feature maps from the average and maximum pooling layers. A dataset from Kaggle and the ground dataset #gaja are used to train the proposed RDNN to determine the veracity of rumors. The experimental results of the RDNN classifier demonstrate accuracies of 93.24% and 95.41% in identifying rumor tweets in real-time events.
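As a rough illustration of the HPOOL idea of combining average and maximum pooling, the sketch below pools a 1-D feature map with both operators and joins the results. The abstract does not state how the two pooled outputs are combined, so the concatenation used here is an assumption, as are the function names and the window size.

```python
def max_pool(values, window):
    """Maximum of each non-overlapping window."""
    return [max(values[i:i + window]) for i in range(0, len(values), window)]


def avg_pool(values, window):
    """Mean of each non-overlapping window."""
    pooled = []
    for i in range(0, len(values), window):
        patch = values[i:i + window]
        pooled.append(sum(patch) / len(patch))
    return pooled


def hybrid_pool(values, window=2):
    """Assumed combination: max-pooled patches followed by avg-pooled patches."""
    return max_pool(values, window) + avg_pool(values, window)


pooled = hybrid_pool([1.0, 3.0, 2.0, 6.0], window=2)
```

Keeping both views preserves the strongest activation in each patch alongside its overall level, which is the usual motivation for mixing the two pooling types.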

Alzheimer progression classification using fMRI data (fMRI 데이터를 이용한 알츠하이머 진행상태 분류)

  • Ju Hyeon-Noh;Hee-Deok Yang
    • Smart Media Journal, v.13 no.4, pp.86-93, 2024
  • The development of functional magnetic resonance imaging (fMRI) has significantly contributed to mapping brain functions and understanding brain networks during rest. This paper proposes a CNN-LSTM-based classification model to classify the progression stages of Alzheimer's disease. Firstly, four preprocessing steps are performed to remove noise from the fMRI data before feature extraction. Secondly, the U-Net architecture is utilized to extract spatial features once preprocessing is completed. Thirdly, the extracted spatial features undergo LSTM processing to extract temporal features, ultimately leading to classification. Experiments were conducted by adjusting the temporal dimension of the data. Using 5-fold cross-validation, an average accuracy of 96.4% was achieved, indicating that the proposed method has high potential for identifying the progression of Alzheimer's disease by analyzing fMRI data.

Trend-based Sequential Pattern Discovery from Time-Series Data (시계열 데이터로부터의 경향성 기반 순차패턴 탐색)

  • 오용생;이동하;남도원;이전영
    • Journal of Intelligence and Information Systems, v.7 no.1, pp.27-45, 2001
  • Sequential pattern discovery from time-series data has mainly been concerned with events or item sets. Recently, this research has started to be applied to numerical data, such as sensor information generated by monitoring the state of a machine. Numerical data rarely take exactly the same values while forming patterns, so it is important to extract a suitable number of pattern features that can be transformed into events or item sets and applied to sequential pattern mining tasks. The popular methods for extracting such patterns are sliding windows and clustering. The results of these methods are sensitive to the window size or the clustering parameters, which forces users to run the data mining task repeatedly and to interpret the results each time. This paper proposes a method that retrieves pattern features by converting numerical data into vectors of an angle and a magnitude. The pattern features retrieved this way make the results easy to understand and make sequential patterns fast to find. We define an inclusion relation among pattern features using the angles and magnitudes of the vectors. Because this relation reduces the data size, we can find sequential patterns faster than other methods that use all the data.
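A minimal sketch of the angle-and-magnitude representation in this abstract: each pair of consecutive samples becomes a vector whose angle encodes the trend direction and whose magnitude encodes the size of the change. The abstract does not spell out the paper's exact construction or its inclusion relation, so the unit time step, the labeling thresholds, and the function names here are assumptions.

```python
import math


def to_trend_vectors(values, dt=1.0):
    """Convert consecutive samples into (angle_in_degrees, magnitude) pairs.

    dt is the assumed spacing between samples on the time axis.
    """
    vectors = []
    for prev, curr in zip(values, values[1:]):
        dy = curr - prev
        angle = math.degrees(math.atan2(dy, dt))
        magnitude = math.hypot(dt, dy)
        vectors.append((angle, magnitude))
    return vectors


def label_trend(angle, flat_tol=10.0):
    """Map an angle to a symbolic event (hypothetical thresholds)."""
    if angle > flat_tol:
        return "up"
    if angle < -flat_tol:
        return "down"
    return "flat"


vectors = to_trend_vectors([0.0, 1.0, 1.0, 0.0])
events = [label_trend(angle) for angle, _ in vectors]
```

Symbolic events like these can be passed to an ordinary sequential pattern miner, which is the transformation to events or item sets that the abstract calls for.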

An Implementation of Markerless Augmented Reality Using Efficient Reference Data Sets (효율적인 레퍼런스 데이터 그룹의 활용에 의한 마커리스 증강현실의 구현)

  • Koo, Ja-Myoung;Cho, Tai-Hoon
    • Journal of the Korea Institute of Information and Communication Engineering, v.13 no.11, pp.2335-2340, 2009
  • This paper presents how to implement markerless augmented reality and how to create and apply reference data sets. The implementation has three parts: camera setup, creation of reference data sets, and tracking. To create effective reference data sets, we need a 3D model such as a CAD model. It is also necessary to create reference data sets from various viewpoints. We extract the feature points from the model image and then extract the 3D positions corresponding to the feature points using ray tracking. These 2D/3D correspondence point sets constitute a reference data set of the model. Reference data sets are constructed for various viewpoints of the model. Fast tracking can be done using the reference data set most frequently matched with the feature points of the present frame, together with model data near that reference data set.

Design and Implementation of XML Web Agent for Data Exchange and Replication between Heterogeneous DBMSs (이기종 DBMS간 데이터 교환과 복제를 위한 XML 웹 에이전트 설계 및 구현)

  • Yu, Sun-Young;Lee, Chun-Keun;Yim, Jae-Hong
    • Journal of Korea Multimedia Society, v.7 no.7, pp.967-975, 2004
  • HTML documents are unstructured because HTML uses a restricted set of tags, which makes it difficult to extract data from them. XML, in contrast, allows user-defined tags, which makes it easy to store information and to extract data from XML documents. For this reason XML has become a standard data exchange format on the Internet and is well suited to exchanging data between heterogeneous DBMSs (database management systems). In this paper, we design and implement an XML web agent for data replication between heterogeneous DBMSs. The XML web agent system controls the data of a DBMS and generates an XML document from that data. The agent also exchanges or replicates data between heterogeneous DBMSs through the medium of XML.
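The idea of replicating data between databases through an intermediate XML document can be sketched as follows. This is not the agent described in the paper: two in-memory SQLite databases stand in for the heterogeneous DBMSs, and the XML layout (`table`, `row`, `column` elements), the function names, and the sample table are assumptions made for illustration.

```python
import sqlite3
import xml.etree.ElementTree as ET


def _check_identifier(name):
    """Reject names that are unsafe to interpolate into SQL."""
    if not name or not name.isidentifier():
        raise ValueError(f"unsafe identifier: {name!r}")
    return name


def table_to_xml(conn, table):
    """Export every row of a table as an XML document (assumed layout)."""
    cursor = conn.execute(f"SELECT * FROM {_check_identifier(table)}")
    columns = [description[0] for description in cursor.description]
    root = ET.Element("table", name=table)
    for row in cursor:
        row_el = ET.SubElement(root, "row")
        for column, value in zip(columns, row):
            cell = ET.SubElement(row_el, "column", name=column)
            cell.text = "" if value is None else str(value)
    return ET.tostring(root, encoding="unicode")


def xml_to_table(conn, xml_text):
    """Insert the rows of an exported XML document into the target database."""
    root = ET.fromstring(xml_text)
    table = _check_identifier(root.get("name"))
    for row_el in root.findall("row"):
        cells = row_el.findall("column")
        columns = [_check_identifier(c.get("name")) for c in cells]
        values = [c.text for c in cells]  # empty cells come back as None
        placeholders = ", ".join("?" for _ in columns)
        conn.execute(
            f"INSERT INTO {table} ({', '.join(columns)}) VALUES ({placeholders})",
            values,
        )
    conn.commit()


# Hypothetical source and target databases with the same schema.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE users (id INTEGER, name TEXT)")
source.executemany("INSERT INTO users VALUES (?, ?)", [(1, "Kim"), (2, "Lee")])
target = sqlite3.connect(":memory:")
target.execute("CREATE TABLE users (id INTEGER, name TEXT)")

document = table_to_xml(source, "users")
xml_to_table(target, document)
replicated = target.execute("SELECT id, name FROM users ORDER BY id").fetchall()
```

Because values travel as text, the target database's column types decide how they are stored. A fuller design would carry type information in the XML as well.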

An Implementation of Markerless Augmented Reality and Creation and Application of Efficient Reference Data Sets (마커리스 증강현실의 구현과 효율적인 레퍼런스 데이터 그룹의 생성 및 활용)

  • Koo, Ja-Myoung;Cho, Tai-Hoon
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference, 2009.10a, pp.204-207, 2009
  • This paper presents how to implement markerless augmented reality and how to create and apply reference data sets. The implementation has three parts: camera setup, creation of reference data sets, and tracking. To create effective reference data sets, we need a 3D model such as a CAD model. It is also necessary to create reference data sets from various viewpoints. We extract the feature points from the model image and then extract the 3D positions corresponding to the feature points using ray tracking. These 2D/3D correspondence point sets constitute a reference data set of the model. Reference data sets are constructed for various viewpoints of the model. Fast tracking can be done using the reference data set most frequently matched with the feature points of the present frame, together with model data near that reference data set.

Effect of Ethanol Concentration on Extraction of Volatile Components in Cinnamon (에탄올의 농도가 계피가 향기성분 용출에 미치는 영향)

  • 김나미;김영희
    • The Korean Journal of Food And Nutrition, v.13 no.1, pp.45-52, 2000
  • To select the optimum ethanol concentration for extracting the volatile components of cinnamon, dried cinnamon was extracted with water and with 30∼90% ethanol. The volatile components of the cinnamon extracts were isolated by the simultaneous distillation extraction method using Likens and Nickerson's extraction apparatus and analyzed by GC-MS. In cinnamon bark powder, 45 components were detected and 21 components were identified. The major component of cinnamon bark powder was cinnamic aldehyde. In the water extract of cinnamon, the volatile components were not extracted sufficiently. The amount of volatile components increased with increasing ethanol concentration up to 70%. The volatile components of the 70% ethanol extract showed a pattern and amount similar to those of cinnamon bark powder. In the 90% ethanol extract, however, the number and amount of volatile components were reduced. These data suggest that 70% ethanol is the most effective solvent for extracting the volatile components of cinnamon.
