• Title/Abstract/Keywords: Data Deluge

Search results: 14 items, processing time 0.021 seconds

빅 데이터의 새로운 고객 가치와 비즈니스 창출을 위한 대응 전략 (Correspondence Strategy for Big Data's New Customer Value and Creation of Business)

  • 고준철;이해욱;정지윤;강경식
    • 대한안전경영과학회지
    • /
    • Vol. 14, No. 4
    • /
    • pp.229-238
    • /
    • 2012
  • Within the last 10 years, internet use has become a daily activity, and humankind has had to face the Data Deluge, a dramatic increase in digital data (Economist 2012). Due to the exponential increase in the amount of digital data, large-scale data has become a major issue, and hence the term 'big data' appeared. There is no official agreement on a quantitative and detailed definition of 'big data', but its meaning is expanding to include its value and efficacy. Big data contains not only standardized internal personal information, such as customer information, but also complex external, unstructured, social, and real-time data. Big data technology is a concept that covers a wide range of technologies, including data acquisition, storage/management, analysis, and application. Technologies associated with 'big data' include Big Table, Cassandra, Hadoop, MapReduce, HBase, and NoSQL, and among the sub-techniques, Text Mining, Opinion Mining, Social Network Analysis, and Cluster Analysis are gaining attention. The features that 'big data' needs to have range from the generation of large amounts of individual elements (high resolution) to a variety of high-frequency data. Big data has three defining features, volume, variety, and velocity, which are called the '3V'. Complexity is increasingly counted as a fourth feature, and the more fully all four features are satisfied, the better the data fits the term 'big data'. In this study, we looked at the various reasons why companies need to adopt 'big data', ways of applying it, and advanced domestic and foreign application cases. To respond effectively to the 'big data' revolution, a paradigm shift in data production, distribution, and consumption is needed, and the insight to develop and prepare future business by considering the unpredictable technology market, the industry environment, and the flow of social demand is urgently needed.

Development of Scoring Model on Customer Attrition Probability by Using Data Mining Techniques

  • Han, Sang-Tae;Lee, Seong-Keon;Kang, Hyun-Cheol;Ryu, Dong-Kyun
    • Communications for Statistical Applications and Methods
    • /
    • Vol. 9, No. 1
    • /
    • pp.271-280
    • /
    • 2002
  • Recently, many companies have applied data mining techniques to strengthen their competitiveness in their business markets. In this study, we address how data mining, a technique for discovering knowledge from a deluge of data, is used in an actual project to support enterprise decision making. We also develop a scoring model of customer attrition probability for an automobile insurance company using data mining techniques. The development of a scoring model in the domestic insurance industry is given as a concrete example.

Performance of Distributed Database System built on Multicore Systems

  • Kim, Kangseok
    • 인터넷정보학회논문지
    • /
    • Vol. 18, No. 6
    • /
    • pp.47-53
    • /
    • 2017
  • Recently, huge datasets have been generated rapidly in a variety of fields, so there is an urgent need for technologies that allow efficient and effective processing of huge datasets. Partitioning a huge dataset effectively and reducing the overhead of processing the partitioned data have therefore become critical factors for scalability and performance in distributed database systems. In our work, we used multicore servers to provide scalable service in our distributed system. The partitioning of databases over multicore servers emerged from the need for a new architectural design of distributed database systems, driven by scalability and performance concerns in today's data deluge. The system allows uniform access through a web service interface to concurrently distributed databases over multicore servers, using an SQMD (Single Query Multiple Database) mechanism based on the publish/subscribe paradigm. We present performance results for the distributed database system built on multicore servers, for processing that is time-intensive with traditional architectures. We also discuss future work.
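
    The SQMD idea in this abstract (one query published once, executed by several database partitions, results merged) can be sketched roughly as follows. This is only an illustration under stated assumptions: the `Broker` class, the `item` table, the topic name, and the in-memory SQLite partitions are all hypothetical and are not taken from the paper, which uses a web service interface over multicore servers.

```python
import sqlite3


class Broker:
    """Toy in-process publish/subscribe broker (hypothetical, not the paper's system)."""

    def __init__(self):
        self.subscribers = {}

    def subscribe(self, topic, handler):
        self.subscribers.setdefault(topic, []).append(handler)

    def publish(self, topic, message):
        # Deliver the message to every subscriber and collect their replies.
        return [handler(message) for handler in self.subscribers.get(topic, [])]


def make_partition(rows):
    """Create one in-memory database holding a slice of the data."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE item (id INTEGER, value INTEGER)")
    conn.executemany("INSERT INTO item VALUES (?, ?)", rows)
    return conn


def query_handler(conn):
    """Return a subscriber that runs a published SQL string on its own partition."""
    return lambda sql: conn.execute(sql).fetchall()


broker = Broker()
partitions = [
    make_partition([(1, 10), (2, 20)]),
    make_partition([(3, 30), (4, 40)]),
]
for conn in partitions:
    broker.subscribe("query", query_handler(conn))

# Single query, multiple databases: publish once, merge the partial results.
partial = broker.publish("query", "SELECT id, value FROM item WHERE value >= 20")
merged = sorted(row for rows in partial for row in rows)
print(merged)
```

    In this toy version the subscribers run one after another; a real deployment would presumably execute them concurrently on separate cores or servers.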

A data management system for microbial genome projects

  • Ki-Bong Kim;Hyeweon Nam;Hwajung Seo and Kiejung Park
    • 한국생물정보학회:학술대회논문집
    • /
    • 한국생물정보시스템생물학회, 2000 International Symposium on Bioinformatics
    • /
    • pp.83-85
    • /
    • 2000
  • Many microbial genome sequencing projects have been carried out in genome centers around the world since the first genome, that of Haemophilus influenzae, was sequenced in 1995. The deluge of microbial genome sequence data demands a new, highly automated data flow system so that genome researchers can manage and analyze their own bulky sequence data from low level to high level. To this end, we developed an automatic data management system for microbial genome projects, which consists mainly of a local database, analysis programs, and a user-friendly interface. We designed and implemented the local database for large-scale sequencing projects; it makes systematic and consistent data management and retrieval possible and is tightly coupled with the analysis programs and the web-based user interface. That is, the results of the analysis programs can be parsed and stored in the local database, and users can retrieve the data at any stage of data processing through the web-based graphical user interface. The analysis programs in our system cover contig assembly, homology search, and ORF prediction, which are essential in genome projects. All but the contig assembly program are in the public domain. These programs are connected with each other by a number of utility programs. As a result, this system will maximize cost and time efficiency in genome research.

XML Schema 기반 이질 정보 통합의 충돌 분류와 해결 방안 (Classification and Resolution of Conflicts for Integration of Heterogeneous Information Based on XML Schema)

  • 권석훈;이경하;이규철
    • Journal of Information Technology Applications and Management
    • /
    • Vol. 10, No. 3
    • /
    • pp.55-74
    • /
    • 2003
  • Due to the evolution of computer systems and the proliferation of the Internet, numerous information resources have been constructed. This deluge of information creates a need to integrate information that is distributed across the Internet and handled by heterogeneous systems. Recently, most XML-based information integration systems have used XML DTD (Document Type Definition) to describe the integrated global schema. However, DTD has some limitations in modeling local information resources, such as limited datatype support. Although W3C's XML Schema is more flexible and powerful than XML DTD for specifying an integrated global schema, resolving conflicts is more complex with it than with DTD. In this paper, we provide a taxonomy of conflict problems in integrating information resources using XML Schema and propose a conflict resolution mechanism using XQuery.

Data anomaly detection for structural health monitoring of bridges using shapelet transform

  • Arul, Monica;Kareem, Ahsan
    • Smart Structures and Systems
    • /
    • Vol. 29, No. 1
    • /
    • pp.93-103
    • /
    • 2022
  • With the wider availability of sensor technology through easily affordable sensor devices, several Structural Health Monitoring (SHM) systems are deployed to monitor vital civil infrastructure. The continuous monitoring provides valuable information about the health of the structure that can help provide a decision support system for retrofits and other structural modifications. However, when the sensors are exposed to harsh environmental conditions, the data measured by the SHM systems tend to be affected by multiple anomalies caused by faulty or broken sensors. Given a deluge of high-dimensional data collected continuously over time, research into using machine learning methods to detect anomalies is a topic of great interest to the SHM community. This paper contributes to this effort by proposing a relatively new time series representation named "Shapelet Transform" in combination with a Random Forest classifier to autonomously identify anomalies in SHM data. The shapelet transform is a unique time series representation based solely on the shape of the time series data. Considering the individual characteristics unique to every anomaly, the application of this transform yields a new shape-based feature representation that can be combined with any standard machine learning algorithm to detect anomalous data with no manual intervention. For the present study, the anomaly detection framework consists of three steps: identifying unique shapes from anomalous data, using these shapes to transform the SHM data into a local-shape space, and training machine learning algorithms on this transformed data to identify anomalies. The efficacy of this method is demonstrated by the identification of anomalies in acceleration data from an SHM system installed on a long-span bridge in China. The results show that multiple data anomalies in SHM data can be automatically detected with high accuracy using the proposed method.

댐붕괴시 홍수가 하천하류에 미치는 영향 (The Effect of Flood Discharge due to Dam Breach on Downstream Channel)

  • 안상진;이준근;연인성;유형규
    • 한국수자원학회:학술대회논문집
    • /
    • 한국수자원학회, Proceedings of the 2006 Conference
    • /
    • pp.1666-1670
    • /
    • 2006
  • The purpose of this study is to analyze how a downstream channel is affected by a hypothetical dam failure. The study area is the Hwacheon dam basin within the North Han River basin. The study analyzes the influence on the Pyeonghwa (Peace) dam and the Hwacheon dam, supposing that the Imnam dam in North Korea, upstream on the North Han River, fails at the MFWL (maximum flood water level) because of a deluge of rain. The model applied is the NWS (National Weather Service) FLDWAV (Flood Wave Routing Model). Dam breach characteristics are analyzed through nine hypothetical scenarios based on other studies of the shape and size of dam breaches, time of failure, and so on. The expected peak discharge through the breach is verified as reasonable by comparison with an empirical function developed from dam breach cases in other countries, and the peak discharge is observed to increase as the breach time gets shorter and the breach width gets larger. The results show that even if the Imnam dam were to fail, there would be no influence on the Hwacheon dam downstream, because the extended Pyeonghwa dam downstream properly controls the discharge volume.

내부 사용자에 의한 불법 데이터 유출 방지를 위한 안전한 지식관리 시스템 (Secure Knowledge Management for Prevent illegal data leakage by Internal users)

  • 서대희;백장미;이민경;윤미연;조동섭
    • 인터넷정보학회논문지
    • /
    • Vol. 11, No. 2
    • /
    • pp.73-84
    • /
    • 2010
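  • The rapid development of the Internet is increasing users' appetite for information, and as a result so much information is being created and used that it is called a flood of information. In particular, profit-seeking companies are conducting a variety of research to secure their own proprietary technology. However, damage from the illegal leakage of information by unauthorized external or internal users is emerging as a social problem. This paper therefore proposes a secure knowledge management system for preventing illegal data leakage by internal users. The proposed scheme performs explicit authentication of internal users, provides data on that basis, and uses 2MAC to prevent illegal data leakage by malicious internal users.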

X2RD: XPath를 이용한 XML 데이터의 관계형 데이터베이스로의 저장과 질의 (X2RD: Storing and Querying XML Data Using XPath To Relational Database)

  • 오상윤
    • 한국컴퓨터정보학회논문지
    • /
    • Vol. 14, No. 3
    • /
    • pp.57-64
    • /
    • 2009
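  • XML is becoming established as the standard for information on the web, and with the emergence of web services, the Semantic Web, and the like, information exchange using XML is expected to spread further. Because most data is stored in relational databases, research on using relational databases to store and query XML data has recently attracted attention, and in particular there have been attempts to support XML-related specifications such as XPath and XQuery. This paper analyzes the characteristics of previously proposed architectures for storing XML in a relational database and querying it, and proposes a new method for storing and querying XML using a relational database. The proposed method shreds XML data and represents it as relations, and it translates queries written in XPath, the basis of XQuery, into SQL. We implemented a query processor using the proposed methodology and tested it against an actual RDBMS; the results confirmed that XML data can be effectively stored in and queried from the RDBMS.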
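
    The general approach named in this abstract, shredding XML into relations and translating XPath into SQL, can be illustrated with a minimal sketch. This is not the X2RD schema or query processor: the single `node` table, the path-string encoding, and the restriction to simple absolute paths are assumptions made for the example, and SQLite stands in for the RDBMS.

```python
import re
import sqlite3
import xml.etree.ElementTree as ET


def shred(xml_text, conn):
    """Store every element as one row: its parent, its root-to-node path, its text."""
    conn.execute(
        "CREATE TABLE node (id INTEGER PRIMARY KEY, parent_id INTEGER, "
        "path TEXT, text TEXT)"
    )

    def walk(elem, parent_id, parent_path):
        path = f"{parent_path}/{elem.tag}"
        cur = conn.execute(
            "INSERT INTO node (parent_id, path, text) VALUES (?, ?, ?)",
            (parent_id, path, (elem.text or "").strip()),
        )
        for child in elem:
            walk(child, cur.lastrowid, path)

    walk(ET.fromstring(xml_text), None, "")
    conn.commit()


def xpath_to_sql(xpath):
    """Translate a simple absolute child-axis XPath into (sql, params).

    Predicates, wildcards, attributes, and '//' are deliberately unsupported.
    """
    if not re.fullmatch(r"(/[A-Za-z_][\w.-]*)+", xpath):
        raise ValueError("only simple absolute paths are supported in this sketch")
    return "SELECT text FROM node WHERE path = ? ORDER BY id", (xpath,)


# Hypothetical sample document.
doc = "<library><book><title>A</title></book><book><title>B</title></book></library>"
conn = sqlite3.connect(":memory:")
shred(doc, conn)

sql, params = xpath_to_sql("/library/book/title")
titles = [row[0] for row in conn.execute(sql, params)]
print(titles)
```

    Storing the full path with each row keeps the translation to a single equality test; supporting predicates or descendant steps would need joins on `parent_id` or pattern matching on `path`.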

과학데이터에 관한 입법례와 관리정책 그리고 대응방안 -호주, 미국, 중국을 중심으로- (Legislation Cases, Management Policies and Countermeasures on Scientific Data -Focusing Australia, the United States and China-)

  • 윤종민;김규빈
    • 기술혁신학회지
    • /
    • Vol. 16, No. 1
    • /
    • pp.63-100
    • /
    • 2013
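  • Scientific data means data generated in the form of facts, observations, images, computer program results, records, measurements, or experiences (based on arguments, theories, tests, hypotheses, or other research outputs). As the research paradigm shifts toward data-centered, linked and convergent research, the importance and value of such scientific data are rising sharply. For scientific data to be reused efficiently for creative research and development, it is essential to build a management system for sharing and using it. Building such a management system should be done at the national level, but in Korea the management system falls short of those of Australia, the United States, China, and Europe in both connectivity and efficiency. Australia, the United States, China, and others are continuously expanding mid- to long-term policy making, legal and institutional reform, and infrastructure investment in order to actively promote data use, with relevant agencies collecting, managing, and maintaining scientific data at the national level. This study examines relevant foreign legislation and policy trends for building a national management system for the efficient and fair sharing and use of scientific data and for establishing the legal framework to support it, and presents response measures for Korea.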
