• Title/Abstract/Keywords: Large data

Gene Algorithm of Crowd System of Data Mining

  • Park, Jong-Min
    • Journal of information and communication convergence engineering / Vol. 10, No. 1 / pp.40-44 / 2012
  • Data mining, which is attracting public attention, is the process of extracting knowledge from a large mass of data. The key technique in data mining is grouping that maximizes the similarity within a group and minimizes the similarity between groups. Because grouping in data mining deals with a large mass of data, techniques that reduce the time spent on the source data and shrink the quantity of data to which the algorithm is subjected are actively used. Current grouping algorithms are highly sensitive to static and react to local minima, and the number of groups has to be stated in advance, depending on the initialization value. In this paper we propose a gene algorithm that automatically decides on the number of groups. We try to find the optimal grouping under the fitness function and finally apply it to a data mining problem that deals with a large mass of data.

효과적인 웹 사용자의 패턴 분석을 위한 하둡 시스템의 웹 로그 분석 방안 (A Method for Analyzing Web Log of the Hadoop System for Analyzing a Effective Pattern of Web Users)

  • 이병주;권정숙;고기철;최용락
    • Journal of Information Technology Services (한국IT서비스학회지) / Vol. 13, No. 4 / pp.231-243 / 2014
  • Among the various data that corporations can access, web log data are important for the data analysis that supports customer relationship management strategies. As the volume of accessible data has increased exponentially with the Internet and the popularization of smartphones, web log data have also grown considerably. As a result, it has become difficult to expand storage flexibly to process large amounts of web log data, and extremely hard to implement a system capable of categorizing, analyzing, and processing web log data accumulated over a long period. This study therefore applies Hadoop, a distributed processing system that has recently come into the spotlight for its capacity to process large volumes of data, and proposes an efficient analysis plan for large amounts of web logs. Using Hadoop, the study examined the forms of web logs by effective collection method and by web log level, and proposed analysis techniques and Hadoop organization designs accordingly. The study resolved the difficulty of processing large amounts of web log data and presented the activity patterns of users through web log analysis, demonstrating its advantages as a new means of marketing.

Min-Hash를 이용한 효율적인 대용량 그래프 클러스터링 기법 (An Efficient Large Graph Clustering Technique based on Min-Hash)

  • 이석주;민준기
    • Journal of KIISE (정보과학회 논문지) / Vol. 43, No. 3 / pp.380-388 / 2016
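  • Graph clustering groups vertices with similar characteristics into the same cluster and is widely used to analyze graph data and understand its properties. Recently, large graph data have been generated in diverse applications such as social network services, the World Wide Web, and telephone networks. Accordingly, clustering techniques that process large graph data efficiently are becoming increasingly important. In this paper, we propose a clustering algorithm that efficiently generates the clusters of large graph data. The proposed technique uses Min-Hash to estimate the similarity between clusters in the graph effectively and generates clusters according to the computed similarity. In experiments on real-world data, we compared the proposed technique with existing graph clustering techniques and demonstrated its efficiency.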
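
The abstract above does not give the authors' algorithm, so the following is only a minimal sketch of the general Min-Hash idea it relies on: estimating set similarity from short signatures instead of comparing full sets. It assumes that the similarity in question is the Jaccard similarity of two vertex sets, that vertex ids are non-negative integers, and that random linear hash functions are acceptable; the names `make_hash_funcs`, `minhash_signature`, and `estimate_jaccard` and the sample sets are hypothetical.

```python
import random

# Mersenne prime 2**31 - 1; vertex ids are assumed to be non-negative ints below it.
PRIME = 2_147_483_647


def make_hash_funcs(k, seed=0):
    """Draw k random linear hash functions h(x) = (a*x + b) mod PRIME."""
    rng = random.Random(seed)
    return [(rng.randrange(1, PRIME), rng.randrange(PRIME)) for _ in range(k)]


def minhash_signature(vertex_ids, hash_funcs):
    """Keep, for each hash function, the smallest hash value over the set."""
    return [min((a * v + b) % PRIME for v in vertex_ids) for a, b in hash_funcs]


def estimate_jaccard(sig_a, sig_b):
    """The fraction of agreeing signature positions estimates |A & B| / |A | B|."""
    return sum(x == y for x, y in zip(sig_a, sig_b)) / len(sig_a)


# Hypothetical vertex sets of two clusters (illustrative data only).
cluster_a = set(range(0, 60))
cluster_b = set(range(30, 90))

funcs = make_hash_funcs(k=256)
estimate = estimate_jaccard(minhash_signature(cluster_a, funcs),
                            minhash_signature(cluster_b, funcs))
exact = len(cluster_a & cluster_b) / len(cluster_a | cluster_b)
print(f"estimated={estimate:.2f} exact={exact:.2f}")
```

Each signature has a fixed length `k` regardless of the set size, which is what makes the comparison cheap on large graphs; a larger `k` generally tightens the estimate at the cost of more hashing.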

A Simple and Fast Web Alignment Tool for Large Amount of Sequence Data

  • Lee, Yong-Seok;Oh, Jeong-Su
    • Genomics & Informatics / Vol. 6, No. 3 / pp.157-159 / 2008
  • Multiple sequence alignment (MSA) is the most important step in many biological sequence analyses, homology searches, and protein structural assignments. However, large amounts of data make it difficult for biologists to perform MSA analyses, and aligning many sequences requires much computational time. Here, we have developed a simple and fast web alignment tool for aligning, editing, and visualizing large amounts of sequence data. We used a cluster server with ClustalW-MPI installed, together with web services and the message passing interface (MPI). The tool also enables users to edit multiple sequence alignments manually and to download the input data and results such as alignments and phylogenetic trees.

Privacy Enhanced Data Security Mechanism in a Large-Scale Distributed Computing System for HTC and MTC

  • Rho, Seungwoo;Park, Sangbae;Hwang, Soonwook
    • International Journal of Contents / Vol. 12, No. 2 / pp.6-11 / 2016
  • We developed a pilot-job based large-scale distributed computing system to support HTC and MTC, called HTCaaS (High-Throughput Computing as a Service), which helps scientists solve large-scale scientific problems in areas such as pharmaceutical domains, high-energy physics, nuclear physics and bioscience. Since most of these problems involve critical data that affect the national economy and activate basic industries, data privacy is a very important issue. In this paper, we implement a privacy-enhanced data security mechanism to support HTC and MTC in a large-scale distributed computing system and show how this technique affects performance in our system. With this mechanism, users can securely store data in our system.

The Negative Impact Study on the Information of the Large Discount Retailers

  • Kim, Jong-Jin
    • Journal of Distribution Science (유통과학연구) / Vol. 13, No. 7 / pp.33-40 / 2015
  • Purpose - This study examines how the behaviors that large retailers exhibit when strengthening their market-dominating power, whether in trade relations with small and medium suppliers or in the market, affect consumers. Research design, data, methodology - This study analyzed negative information (news) on large retailers (Lotte Mart, E-Mart and Homeplus) based on monthly data over the five years from 2008 to 2012, and also analyzed the correlation between dependent variables that are likely to affect sales through a large retailer economic index. Results - This study conducted a correlation analysis on the time lag of the factors that affect the negative information and sales of large retailers, in order to analyze how consumers respond in their choice of large retailers' stores (store sales) when they perceive negative information about the unethical behaviors of large retailers. Conclusions - Unfair and negative information on large retailers was significant for the hypothesis that sales are affected by the image of large retailers and by changes in consumer attitudes.

TCP/IP 소켓통신에서 대용량 스트링 데이터의 전송 속도를 높이기 위한 송수신 모델 설계 및 구현 (A design and implementation of transmit/receive model to speed up the transmission of large string-data sets in TCP/IP socket communication)

  • 강동조;박현주
    • Journal of the Korea Institute of Information and Communication Engineering (한국정보통신학회논문지) / Vol. 17, No. 4 / pp.885-892 / 2013
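  • In a transmit/receive model that exchanges data over TCP/IP socket communication, the communication speed between server and client matters little when the data are small and transfer requests are infrequent. Today, however, with requests for large data sets and frequent transfer requests, the communication speed of the transmit/receive model is becoming important. This paper proposes a more efficient TCP/IP transmit/receive model that changes the transmission structure of the server sending large data and the reception structure of the client receiving it, and that can be expected to improve data transfer speed in a multi-core (hereafter CMP: Chip Multi-Processor) environment.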

Asymptotics in Load-Balanced Tandem Networks

  • Lee, Ji-Yeon
    • Journal of the Korean Data and Information Science Society / Vol. 14, No. 3 / pp.715-723 / 2003
  • A tandem network in which all nodes have the same load is considered. We derive bounds on the probability that the total population of the tandem network exceeds a large value by using its relation to the stationary distribution. These bounds imply a stronger asymptotic limit than that in the large deviation theory.

The Effects of Trading-Hour Regulations on Large Stores in Korea

  • Kim, Woohyoung;Lee, Hahn-Shik
    • Journal of Distribution Science (유통과학연구) / Vol. 15, No. 8 / pp.5-14 / 2017
  • Purpose - This study empirically analyses the sales changes in large retail stores directly resulting from increased controls on those stores. More specifically, we discuss the economic impacts of Korean regulations that restrict trading hours and mandate statutory store closure 'holidays' twice per month. Research design, data and methodology - We attempt to empirically analyse the economic effects of trading-hour regulations through quantitative analysis of the sales revenue data of large retail stores. We introduce the data and methods of empirical analysis used to analyse these effects, and we use a panel regression to analyse the sales losses of large retail stores caused by the new constraints on business hours. Results - The results of this study show that the sales of large retail stores fell by an average of 3.4% per month during the regulation periods. However, regulations affecting large retail stores have various economic impacts, including variations in sales, changes in consumption patterns, and influences on consumer welfare and the national economy. Conclusions - Such changes may also be captured by other metrics; accordingly, further research is needed to measure the impact of regulations on economic indicators such as employment and GDP.

대규모의 정보 검색을 위한 효율적인 최소 완전 해시함수의 생성 (Effective Generation of Minimal Perfect hash Functions for Information retrival from large Sets of Data)

  • 김수희;박세영
    • The Transactions of the Korea Information Processing Society (한국정보처리학회논문지) / Vol. 5, No. 9 / pp.2256-2270 / 1998
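  • Developing a high-performance index is very important for retrieving large amounts of information quickly. In this study, we revisit minimal perfect hash functions, which hash 5·m keys into m buckets without collisions. To build optimal indexes over large amounts of information successfully, we improved the MOS algorithm developed by Heath and, on that basis, developed a system that generates minimal perfect hash functions. When the system was applied to large data sets to test it, it computed each minimal perfect hash function more efficiently than Heath's algorithm. The system developed in this study can be used to build indexes over large amounts of information that change infrequently, or over data to be stored on storage media with very slow seek speeds.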
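
To make the term concrete, here is a small sketch of what a minimal perfect hash function provides for a static key set: every key lands in its own bucket and no bucket is left empty. It is not the MOS algorithm or the improved version described in the abstract; a brute-force seed search over SHA-256 is swapped in purely for illustration, it is practical only for a handful of keys, and the names `slot` and `find_seed` and the sample keys are hypothetical.

```python
import hashlib


def slot(key, seed, m):
    """Map a string key to one of m buckets using a seeded SHA-256 digest."""
    digest = hashlib.sha256(f"{seed}:{key}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % m


def find_seed(keys, max_seed=1_000_000):
    """Try seeds in order until every key lands in its own bucket."""
    m = len(keys)
    if len(set(keys)) != m:
        raise ValueError("keys must be distinct")
    for seed in range(max_seed):
        if len({slot(k, seed, m) for k in keys}) == m:
            return seed
    raise ValueError("no collision-free seed found within max_seed")


# Illustrative static key set; m equals the number of keys, so the table is full.
keys = ["index", "bucket", "hash", "key", "search", "storage"]
seed = find_seed(keys)

table = [None] * len(keys)
for k in keys:
    table[slot(k, seed, len(keys))] = k
print(seed, table)
```

Because the seed is computed once for a fixed key set, a lookup afterwards is a single hash evaluation with no collision handling, which matches the abstract's point that such indexes suit data that rarely changes or that sits on slow storage.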
