• Title/Summary/Keyword: Generate Data

Search Results: 3,066

The proposition of compared and attributably pure confidence in association rule mining (연관 규칙 마이닝에서 비교 기여 순수 신뢰도의 제안)

  • Park, Hee Chang
    • Journal of the Korean Data and Information Science Society / v.24 no.3 / pp.523-532 / 2013
  • Generally, data mining is the process of analyzing big data from different perspectives and summarizing it into useful information. The most widely used data mining technique is the generation of association rules, which finds the relevance between two items in a huge database. This technique has been used to find relationships between sets of items based on interestingness measures such as support, confidence, and lift. Among the many interestingness measures, confidence is the most frequently used, but it has the drawback that it cannot determine the direction of an association. The attributably pure confidence and the compared confidence can determine the direction of an association, but their ranges are not [-1, +1], so the degree of association cannot be interpreted operationally from their values. This paper proposes a compared and attributably pure confidence to compensate for this drawback and describes some properties of the proposed measure. Comparative studies with confidence, compared confidence, attributably pure confidence, and the proposed measure are shown by numerical example. The results show that the compared and attributably pure confidence performs better than the other confidence measures.
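
A minimal Python sketch of the baseline interestingness measures named in this abstract (support, confidence, lift) over a toy transaction list; the paper's proposed compared and attributably pure confidence is not reproduced here, and the item names are purely illustrative.

# Toy transactions for illustrating support, confidence, and lift of a rule A -> B.
transactions = [
    {"milk", "bread"},
    {"milk", "bread", "butter"},
    {"bread"},
    {"milk", "butter"},
]

def support(itemset, transactions):
    # Fraction of transactions containing every item in `itemset`.
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(a, b, transactions):
    # P(B | A) = support(A union B) / support(A).
    return support(a | b, transactions) / support(a, transactions)

def lift(a, b, transactions):
    # Confidence of A -> B normalized by the baseline probability of B.
    return confidence(a, b, transactions) / support(b, transactions)

a, b = {"milk"}, {"bread"}
print(support(a | b, transactions), confidence(a, b, transactions), lift(a, b, transactions))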

EST Analysis system for panning gene

  • Hur, Cheol-Goo;Lim, So-Hyung;Goh, Sung-Ho;Shin, Min-Su;Cho, Hwan-Gue
    • Proceedings of the Korean Society for Bioinformatics Conference / 2000.11a / pp.21-22 / 2000
  • Expressed sequence tags (ESTs) are partial segments of cDNA produced from 5' or 3' single-pass sequencing of cDNA clones; they are error-prone and generated in highly redundant sets. The advancement and expansion of genomics has led biologists to generate huge numbers of ESTs from a variety of organisms, including humans and microorganisms as well as plants, and the cumulative number of ESTs now exceeds 5.3 million. As EST data accumulate ever more rapidly, the need for EST analysis tools that extract biological meaning from EST data grows accordingly. Among the several kinds of EST analysis, the extraction of protein sequences or functional motifs from ESTs is important for identifying their function in vivo. To accomplish that purpose, the precise and accurate identification of the regions containing coding sequences (CDSs) is the primary problem to solve, and it is helpful for extracting and detecting genuine CDSs and protein motifs from EST collections. Although several public tools are available for EST analysis, none accomplishes this objective on its own; furthermore, they target human or microbial ESTs rather than plant ESTs. Thus, to meet the urgent needs of collaborators dealing with plant ESTs and to establish an analysis system that can be used as general-purpose public software, we constructed a pipelined EST analysis system by integrating public software components. The components used are as follows: Phred/Cross-match for quality control and vector screening, NCBI BLAST for similarity searching, ICAtools for EST clustering, Phrap for EST contig assembly, and BLOCKS/Prosite for protein motif searching. The sample data set used for the construction and verification of this system was 1,386 ESTs from human intrathymic T-cells, which were verified using the UniGene and Nr databases of NCBI. The extraction of CDSs from the sample data set was carried out by comparing the sample data against protein sequence/motif databases, determining matched protein sequences/motifs that agree with our defined parameters, and extracting the regions that show similarity. In the near future, software for peptide mass spectrometry fingerprint analysis, one of the proteomics fields, will also be integrated into and served by our system. This pipelined EST analysis system will extend our knowledge of plant ESTs and proteins through the identification of unknown genes.
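
A minimal Python sketch of the pipelined architecture described in this abstract, with each stage reduced to a placeholder function standing in for the external tool named above; all function names, filtering rules, and the toy data model are assumptions for illustration, not the authors' code or the real interfaces of Phred, Cross-match, BLAST, ICAtools, Phrap, or BLOCKS/Prosite.

def quality_control_and_vector_screen(raw_ests):      # Phred / Cross-match stage
    # Placeholder rule: drop very short reads.
    return [e for e in raw_ests if len(e) > 100]

def similarity_search(ests):                           # NCBI BLAST stage
    return {e: "best_hit_placeholder" for e in ests}

def cluster(ests):                                     # ICAtools stage
    return [[e] for e in ests]                         # trivial one-read clusters

def assemble(clusters):                                # Phrap contig assembly stage
    return ["".join(c) for c in clusters]

def motif_search(contigs):                             # BLOCKS / Prosite stage
    return {c: [] for c in contigs}

def run_pipeline(raw_ests):
    ests = quality_control_and_vector_screen(raw_ests)
    hits = similarity_search(ests)
    motifs = motif_search(assemble(cluster(ests)))
    return hits, motifs

print(run_pipeline(["ACGT" * 50, "AT"]))               # second read is filtered out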


Mining Interesting Sequential Pattern with a Time-interval Constraint for Efficient Analyzing a Web-Click Stream (웹 클릭 스트림의 효율적 분석을 위한 시간 간격 제한을 활용한 관심 순차패턴 탐색)

  • Chang, Joong-Hyuk
    • Journal of Korea Society of Industrial Information Systems / v.16 no.2 / pp.19-29 / 2011
  • Due to the development of web technologies and the increasing use of smart devices such as smartphones, various web services are now widely used in many application fields. In this environment, the topic of supporting personalized and intelligent web services has been actively researched, and the analysis of web-click streams generated from web usage logs is one of the essential techniques related to this topic. In this paper, a sequential pattern mining technique is proposed for efficiently analyzing a web-click stream of sequences; it satisfies the basic requirements of data stream processing and finds a refined mining result. For this purpose, the concept of an interesting sequential pattern with a time-interval constraint is defined, which uses not only the order of items in a sequential pattern but also their generation times. In addition, a mining method is proposed to find the interesting sequential patterns efficiently over a data stream such as a web-click stream. The proposed method can be applied effectively to various computing fields such as e-commerce, bioinformatics, and USN environments, which generate data in the form of data streams.
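
A minimal Python sketch of the time-interval constraint idea described in this abstract: a candidate pattern is counted in a timestamped click sequence only if every pair of consecutively matched items falls within a maximum gap. The function name, data layout, and single-pass greedy matching are illustrative assumptions, not the paper's mining algorithm.

def occurs_within_gap(pattern, sequence, max_gap):
    """sequence: list of (timestamp, item) pairs in time order.
    Greedy single-pass check; a full miner would consider all occurrences."""
    idx, last_time = 0, None
    for t, item in sequence:
        if idx < len(pattern) and item == pattern[idx]:
            if last_time is not None and t - last_time > max_gap:
                return False          # gap constraint violated between steps
            last_time = t
            idx += 1
    return idx == len(pattern)        # all pattern items matched in order

clicks = [(0, "home"), (3, "search"), (4, "item"), (20, "cart")]
print(occurs_within_gap(["home", "search", "item"], clicks, max_gap=5))   # True
print(occurs_within_gap(["home", "search", "cart"], clicks, max_gap=5))   # False: 17-unit gap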

Interoperability Test Suite Generation for the TCP Data Part using Experimental Design Techniques (실험계획법을 이용한 TCP 데이터 부분에 대한 상호운용성 시험스위트 생성)

  • Ryu, Ji-Won;Kim, Myung-Chul;Seol, Soon-Uk;Kang, Sung-Won;Lee, Young-Hee;Lee, Keun-Ku
    • Journal of KIISE:Information Networking / v.28 no.2 / pp.277-287 / 2001
  • Test derivation methods suitable for interoperability testing of communication protocols were proposed in [1, 2, 3] and applied to the TCP and ATM protocols. The test cases generated by these methods deal only with the control part of the protocols; in real protocol testing, however, the test cases must also manage the data part. For complete testing, in principle all possible values of the data part must be tested, although doing so is impractical. In this paper, we present a method for generating an interoperability test suite for both the data part and the control part of protocols, using TCP connection establishment as an example. In this process, we make use of experimental design techniques from industrial engineering to minimize the size of the test suite while preserving testing capability. Experimental design techniques have been used for protocol conformance testing but not, so far, for interoperability testing. We generate the test suite for the data part with this method and show that it is possible to test protocol interoperability with a minimum number of test cases while maintaining testing power.
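
A minimal Python sketch of the experimental-design idea described in this abstract: a greedy pairwise (2-way) covering selection keeps only enough test cases to cover every pair of parameter values at least once, instead of enumerating all value combinations. The field names and value sets are hypothetical, not the TCP data-part fields or the design used in the paper.

from itertools import combinations, product

factors = {
    "window_size": [0, 1024, 65535],
    "urgent_flag": [0, 1],
    "payload_len": [0, 1, 1460],
}

names = list(factors)
all_pairs = {((a, va), (b, vb))
             for a, b in combinations(names, 2)
             for va in factors[a] for vb in factors[b]}

suite, covered = [], set()
for case in product(*factors.values()):
    assignment = dict(zip(names, case))
    new = {((a, assignment[a]), (b, assignment[b])) for a, b in combinations(names, 2)}
    if new - covered:                    # keep the case only if it covers a new pair
        suite.append(assignment)
        covered |= new
    if covered == all_pairs:
        break

print(len(suite), "cases instead of", len(list(product(*factors.values()))))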


Predicting Power Generation Patterns Using the Wind Power Data (풍력 데이터를 이용한 발전 패턴 예측)

  • Suh, Dong-Hyok;Kim, Kyu-Ik;Kim, Kwang-Deuk;Ryu, Keun-Ho
    • Journal of the Korea Society of Computer and Information / v.16 no.11 / pp.245-253 / 2011
  • Due to the imprudent consumption of fossil fuels, the environment has been seriously contaminated and the problem of fossil fuel exhaustion has loomed large. People have therefore taken a great interest in alternative energy resources that can solve the problems of fossil fuels, and wind power is one of the most promising of the new and renewable energy sources. However, wind power plants, like traditional power plants, must keep power generation and power consumption in balance. Therefore, analysis and prediction are needed to generate power efficiently using wind energy. In this paper, we performed research to predict power generation patterns using wind power data; prediction approaches from the data mining field can be used to build such a model. The research steps are as follows: 1) We performed preprocessing to handle missing values and anomalous data, and extracted characteristic vector data. 2) Representative patterns were found using the MIA (Mean Index Adequacy) measure and SOM (Self-Organizing Feature Map) clustering on the normalized dataset, and a class label was assigned to each data point. 3) We built a new model for predicting wind power generation with a classification approach. In the experiment, we built a forecasting model that predicts wind power generation patterns using a decision tree.
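
A minimal Python sketch of the two modelling steps described in this abstract, using MiniSom for the SOM clustering and scikit-learn for the decision tree. The feature layout (daily 24-hour power vectors), all parameter values, and the synthetic data are assumptions for illustration, and the MIA-based selection of representative patterns is omitted.

import numpy as np
from minisom import MiniSom
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.random((200, 24))                                            # 200 days x 24 hourly power values (toy data)
X = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0) + 1e-9)     # normalize each hour column

# Step 2: SOM clustering assigns each day's profile to a map node (its pattern class).
som = MiniSom(3, 3, 24, sigma=1.0, learning_rate=0.5, random_seed=0)
som.train_random(X, 1000)
labels = np.array([3 * i + j for i, j in (som.winner(x) for x in X)])

# Step 3: a decision tree predicts the pattern class from (toy) wind condition features.
weather = rng.random((200, 4))
clf = DecisionTreeClassifier(max_depth=4).fit(weather, labels)
print(clf.predict(weather[:5]))                                      # predicted generation-pattern classes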

Fast Combinatorial Programs Generating Total Data (전수데이터를 생성하는 빠른 콤비나토리얼 프로그램)

  • Jang, Jae-Soo;Won, Shin-Jae;Cheon, Hong-Sik;Suh, Chang-Jin
    • Journal of the Korea Academia-Industrial cooperation Society / v.14 no.3 / pp.1451-1458 / 2013
  • This paper deals with programs and algorithms that generate the full data set satisfying a basic combinatorial requirement of combinations, permutations, or partial permutations (r-permutations), which are used in applications such as total-data (exhaustive) testing or simulation input. We collected programs that generate combinations, permutations, and r-permutations, selected the fastest program in each category, and then, with further study, developed new programs that reduce the processing time. Our research performed the following preliminary study. First, hundreds of algorithms and programs found on the internet were collected and corrected so that they were executable. Second, we measured the running time of all the completed programs and selected a few fast ones. Third, the fast programs were analyzed in depth and pseudo-code for them is provided. We succeeded in developing two faster programs: the combination program saves running time by removing the recursive function, and the r-permutation program becomes faster by combining the best combination program with the best permutation program. According to our performance tests, the former and the latter improve running speed by 22% to 34% and 62% to 226%, respectively, compared with the fastest collected program. The programs suggested in this study can easily be applied to particular cases based on the pseudo-code, can be used to predict the execution time spent on data processing and to determine the validity of the processing, and can generate total data with minimal additional programming.
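
A minimal Python sketch in the spirit of the non-recursive speed-up described in this abstract: an iterative, lexicographic-successor combination generator with no recursive calls. It illustrates the general technique only and is not the authors' program.

def combinations_iterative(n, r):
    """Yield all r-combinations of range(n) in lexicographic order, without recursion."""
    if r > n:
        return
    c = list(range(r))
    while True:
        yield tuple(c)
        # Find the rightmost position that can still be incremented.
        i = r - 1
        while i >= 0 and c[i] == i + n - r:
            i -= 1
        if i < 0:
            return
        c[i] += 1
        for j in range(i + 1, r):      # reset the tail to the smallest valid values
            c[j] = c[j - 1] + 1

print(list(combinations_iterative(4, 2)))
# [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]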

The Development of Major Tree Species Classification Model using Different Satellite Images and Machine Learning in Gwangneung Area (이종센서 위성영상과 머신 러닝을 활용한 광릉지역 주요 수종 분류 모델 개발)

  • Lim, Joongbin;Kim, Kyoung-Min;Kim, Myung-Kil
    • Korean Journal of Remote Sensing / v.35 no.6_2 / pp.1037-1052 / 2019
  • In a preceding study, as the first step toward tree species mapping in inaccessible North Korea, we developed a classification model for Korean pine and larch with an accuracy of 98 percent, using Hyperion and Sentinel-2 satellite images, texture information, and geometric information. Considering the shares of the major tree species in North Korea, the classification model needs to be expanded, since oak (29.5%), pine (12.7%), and fir (8.2%) account for large shares in addition to larch (17.5%) and Korean pine (5.8%). In order to classify these five major tree species, the national forest type map of South Korea was used to build 11,039 training and 2,330 validation samples. Sentinel-2 data were used to derive spectral information, PlanetScope data were used to generate texture information, and geometric information was built from SRTM DEM data. Random forest was used as the machine learning algorithm. As a result, the overall classification accuracy was 80%, with a kappa statistic of 0.80. Based on the training data and the classification model constructed in this study, we will extend the application to Mt. Baekdu and the North and South Goseong areas to confirm the applicability of tree species classification on the Korean Peninsula.
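
A minimal Python sketch of the classification step described in this abstract: per-sample spectral, texture, and geometric features are concatenated and fed to a random forest, with overall accuracy and kappa reported. The arrays are synthetic placeholders standing in for Sentinel-2 bands, PlanetScope-derived texture, and SRTM-derived terrain layers; the feature counts are assumptions.

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, cohen_kappa_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
n = 2000
spectral = rng.random((n, 10))     # e.g. Sentinel-2 band reflectances
texture = rng.random((n, 4))       # e.g. texture statistics from PlanetScope imagery
geometric = rng.random((n, 3))     # e.g. elevation, slope, aspect from SRTM DEM
X = np.hstack([spectral, texture, geometric])
y = rng.integers(0, 5, n)          # 5 tree-species classes (toy labels)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
rf = RandomForestClassifier(n_estimators=300, random_state=0).fit(X_tr, y_tr)
pred = rf.predict(X_te)
print(accuracy_score(y_te, pred), cohen_kappa_score(y_te, pred))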

Radarsat-1 ScanSAR Quick-look Signal Processing and Demonstration Using SPECAN Algorithm (SPECAN 알고리즘을 이용한 Radarsat-1 ScanSAR Quick-look 신호 처리 및 검증 알고리즘 구현)

  • Song, Jung-Hwan;Lee, Woo-Kyung;Kim, Dong-Hyun
    • Korean Journal of Remote Sensing / v.26 no.2 / pp.75-86 / 2010
  • As the performance of spaceborne SAR has been dramatically enhanced and demonstrated through advanced missions such as TerraSAR and LRO (Lunar Reconnaissance Orbiter), the need for highly sophisticated and efficient SAR processors has also been highlighted. In Korea, SAR research activity has mainly been concerned with SAR image applications, and current SAR raw data studies are mostly limited to stripmap mode cases. The first Korean spaceborne SAR is scheduled to be operational from 2010 and is expected to deliver vast amounts of SAR raw data acquired under multiple operational scenarios, including ScanSAR mode. Hence there will be an increasing demand for ground processing systems that can analyze the acquired ScanSAR data and generate the corresponding images. In this paper, we develop an efficient ScanSAR processor that can be applied directly to spaceborne ScanSAR mode data. The SPECAN (SPECtral ANalysis) algorithm is employed for this purpose, and its performance is verified using RADARSAT-1 ScanSAR raw data taken over the Korean peninsula. Efficient quick-look processing is carried out to produce a wide-swath SAR image, which is compared with the conventional RDA processing case.
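
A minimal NumPy sketch of the deramp-and-FFT idea behind SPECAN as described in this abstract: the azimuth signal is multiplied by the conjugate of a linear-FM reference (deramping) and a short FFT then maps each point target to a frequency bin offset proportional to its azimuth position. The chirp rate, timing, and single-target signal are toy values, not RADARSAT-1 parameters or the authors' processor.

import numpy as np

fs, ka = 1000.0, 2000.0                          # sampling rate [Hz], azimuth FM rate [Hz/s] (toy)
t = np.arange(-0.25, 0.25, 1 / fs)

# Toy azimuth signal: one point target with a linear-FM phase history centred at t0.
t0 = 0.05
signal = np.exp(1j * np.pi * ka * (t - t0) ** 2)

reference = np.exp(-1j * np.pi * ka * t ** 2)    # deramping reference function
compressed = np.fft.fftshift(np.fft.fft(signal * reference))

# After deramping, the target becomes a tone at frequency -ka*t0, so its peak bin
# is offset from the centre bin in proportion to t0.
peak_bin = int(np.argmax(np.abs(compressed)))
print(peak_bin, len(t) // 2)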

Design and Implementation of Real-time web Crawling distributed monitoring system (실시간 웹 크롤링 분산 모니터링 시스템 설계 및 구현)

  • Kim, Yeong-A;Kim, Gea-Hee;Kim, Hyun-Ju;Kim, Chang-Geun
    • Journal of Convergence for Information Technology / v.9 no.1 / pp.45-53 / 2019
  • In this rapidly changing information era we face problems caused by the excessive information served by websites: little of it is useful, much of it is useless, and a great deal of time is spent selecting the information we need. Many websites, including search engines, use web crawling in order to keep their data updated. Web crawling is usually used to generate copies of all the pages of visited sites, and search engines index those pages for faster searching. For collecting wholesale and order information that changes in real time, however, keyword-oriented web data collection is not adequate, and no alternative for the selective collection of web information in real time has been suggested. In this paper, we propose a method of collecting information from a restricted set of web sites using a web crawling distributed monitoring system (R-WCMS), estimating collection time through detailed analysis of the data, and storing the data in a parallel system. Experimental results show that applying the proposed model to web site information retrieval reduces retrieval time by 15-17%.
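
A minimal Python sketch of the selective, change-driven collection idea described in this abstract: a fixed list of target pages is fetched in parallel and handed to storage only when the page content hash changes. This is an illustration under assumed placeholder URLs, not the R-WCMS implementation.

import hashlib
from concurrent.futures import ThreadPoolExecutor

import requests

TARGETS = ["https://example.com/prices", "https://example.com/orders"]   # placeholder targets
last_hash = {}

def fetch_if_changed(url):
    body = requests.get(url, timeout=10).content
    digest = hashlib.sha256(body).hexdigest()
    if last_hash.get(url) != digest:          # collect only when the page has changed
        last_hash[url] = digest
        return url, body                      # hand off to the storage layer here
    return url, None

with ThreadPoolExecutor(max_workers=8) as pool:
    for url, body in pool.map(fetch_if_changed, TARGETS):
        print(url, "updated" if body is not None else "unchanged")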

Application of Terrestrial LiDAR for Displacement Detecting on Risk Slope (위험 경사면의 변위 검출을 위한 지상 라이다의 활용)

  • Lee, Keun-Wang;Park, Joon-Kyu
    • Journal of the Korea Academia-Industrial cooperation Society / v.20 no.1 / pp.323-328 / 2019
  • In order to construct 3D geospatial information about terrain, surveying with a total station, remote sensing, and GNSS (Global Navigation Satellite System) have been used. However, ground surveys and GNSS surveys have time and economic disadvantages because the measurements must be made directly in the field, and in the case of aerial photographs and satellite images it is difficult to obtain the three-dimensional shape of the terrain. Terrestrial LiDAR can acquire 3D information (X, Y, Z coordinates and shape) by scanning innumerable laser pulses at densely spaced intervals over the surface of the observed object, and the processing can also be automated. In this study, terrestrial LiDAR was used to analyze slope displacement. Slopes in the study area were selected and data were acquired using LiDAR in 2016 and 2017. The data were processed to generate slope cross sections and slope data, and overlay analysis of the generated data identified slope displacements within 0.1 m, suggesting the possibility of using terrestrial LiDAR to manage slopes. If periodic data acquisition and analysis is performed in the future, the method using terrestrial LiDAR will contribute to effective risk slope management.
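
A minimal Python sketch of the displacement check described in this abstract: for each point of the later scan, the distance to its nearest neighbour in the earlier scan is taken as an approximate displacement and compared against the 0.1 m threshold mentioned above. It assumes the two epochs are already registered in one coordinate system and uses synthetic points, not the study data.

import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(1)
scan_2016 = rng.random((10000, 3)) * 50                  # toy XYZ points [m]
scan_2017 = scan_2016 + rng.normal(0, 0.02, scan_2016.shape)
scan_2017[:100] += np.array([0.0, 0.0, -0.3])            # simulate a locally displaced patch

tree = cKDTree(scan_2016)
dist, _ = tree.query(scan_2017, k=1)                     # nearest-neighbour distance per point

moved = dist > 0.1                                       # 0.1 m displacement threshold
print(moved.sum(), "points flagged as displaced out of", len(scan_2017))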