• Title/Summary/Keyword: Pattern mining


The Development of Design Knowledge Management System Using Data Mining (Data Mining 기법을 활용한 디자인 지식경영 시스템 구축)

  • 양종열;오민권;최경은
    • Archives of design research
    • /
    • v.16 no.2
    • /
    • pp.281-290
    • /
    • 2003
  • In today's knowledge- and information-based age, it is fair to say that the competitiveness of each person, enterprise, and nation can be evaluated by how well each manages and maintains the knowledge created from data and information. Since the importance and necessity of knowledge management have been acknowledged, there have been studies on creating, applying, and evaluating design knowledge. Previous studies on this subject fall into three main categories (CRM, online statistical research, and eCRM) according to the materials used to create knowledge. These studies are meaningful in that they create knowledge in their respective fields, but they are somewhat inadequate because designers cannot create as much knowledge as business applications require; design-related consumers demand composite knowledge that integrates the characteristics of all three fields. In other words, they want to know ordinary customers' preferences in the existing offline market (the CRM field), the results of statistical questionnaires on the various elements of design (the statistical research field), and even the preference and consumption patterns of large numbers of unspecified people, regardless of time and place (the eCRM field). This study proposes to solve the problem of web-based design knowledge maintenance through the combined application of CRM, statistical research, and eCRM. The information proposed in the solution can be expected to help designers working at design-related enterprises, as well as research institutes, to develop the knowledge necessary to design more consumer-oriented products.


An Efficient Web Search Method Based on a Style-based Keyword Extraction and a Keyword Mining Profile (스타일 기반 키워드 추출 및 키워드 마이닝 프로파일 기반 웹 검색 방법)

  • Joo, Kil-Hong;Lee, Jun-Hwl;Lee, Won-Suk
    • The KIPS Transactions:PartD
    • /
    • v.11D no.5
    • /
    • pp.1049-1062
    • /
    • 2004
  • With the popularization of the World Wide Web (WWW), the quantity of web information has increased. An efficient search system is therefore needed to give users exact results from diverse information, and it is important to extract and analyze user requirements in a distributed information environment. Conventional search methods use only keywords for web searching. The search method proposed in this paper adds the context information of a keyword for more effective searching. In addition, it extracts keywords with a new keyword extraction method, also proposed in this paper, and executes the web search based on a keyword mining profile generated from the extracted keywords. Unlike conventional methods, which search for information by a representative word alone, the proposed method searches with an example-based query that includes content information as well as a representative word, which makes it more efficient and exact. Moreover, the method builds a domain keyword list in order to perform searches quickly. A domain keyword is a representative word of a particular domain. The performance of the proposed algorithm is analyzed through a series of experiments to identify its various characteristics.

An Evaluation of the Effects of Rehabilitation Practiced in Coal Mining Spoils in Korea: 2. An Evaluation Based on the Physicochemical Properties of Soil

  • Lee, Chang-Seok;Cho, Yong-Chan;Shin, Hyun-Chul;Lee, Seon-Mi;Oh, Woo-Seok;Park, Sung-Ae;Seol, Eun-Sil;Lee, Choong-Hwa;Eom, Ahn-Heum;Cho, Hyun-Je
    • Journal of Ecology and Environment
    • /
    • v.31 no.1
    • /
    • pp.23-29
    • /
    • 2008
  • The effectiveness of rehabilitation programs for coal mining spoils in Samcheok, Jeongsun, and Mungyung was evaluated based on the physicochemical properties of soil in the rehabilitated areas. These spoils were reclaimed by introducing plants such as black locust (Robinia pseudoacacia), pitch pine (Pinus rigida), birch (Betula platyphylla var. japonica), alder (Alnus hirsuta), bush clover (Lespedeza cyrtobotrya), and grass (Lolium perenne) in planting beds covered with forest soil. In the surface soil, the pH, organic matter, total N, available P, and exchangeable Ca showed significant changes over the years after reclamation. The pH and exchangeable Ca content decreased exponentially over time, whereas organic matter increased linearly and total N and available P increased exponentially. Changes in the physicochemical properties of subsurface soils displayed a different pattern. There were significant changes over time in the organic matter, available P, and exchangeable Ca and Mg contents of the soil. Organic matter increased logarithmically with years since rehabilitation and available P increased exponentially. Meanwhile, exchangeable Ca decreased exponentially, and Mg decreased logarithmically. The changes in the subsurface soil were not as dramatic as those in the surface soil. This result suggests that the ameliorating effects of the establishment and growth of plants are more pronounced in the surface soil layer. Stand ordination data showed different relationships with time since rehabilitation in the early and later stages of the rehabilitation process. In the early stages of rehabilitation, stands tended to be arranged in the order of reclamation age. However, in the later stages, there was no clear relationship between reclamation age and vegetation characteristics. This result suggests that soil amelioration is required for the early stages, after which an autogenic effect becomes more prominent as the vegetation becomes better established.

A Geochemical Study on the Dispersion of Heavy Metal Elements in Dusts and Soils in Urban and Industrial Environments (도시 및 산업환경 분진 및 토양중의 중금속 원소들의 분산에 관한 지구화학적 연구)

  • Chon, Hyo-Taek;Choi, Wan-Joo
    • Economic and Environmental Geology
    • /
    • v.25 no.3
    • /
    • pp.317-336
    • /
    • 1992
  • Garden soils, main road dusts, residential road dusts, and playground soils/dusts from the Seoul, Geumsan, Onsan, and Taebaek areas were analyzed in order to investigate the level of heavy metal pollution caused by urbanization and industrialization. The soil pH is in the range of 5.48~8.40 and is generally neutral. The color of the soils and dusts is mainly Raw Umber to dark greyish Raw Umber. Some samples from Taebaek city, a coal mining area, showed a deep black color due to contamination by coal dusts. The major minerals of the dusts and soils are quartz, feldspars, and micas, reflecting the composition of the parent rocks. However, pyrite was found as a major mineral in the industrial road dusts of Onsan, a smelting area, and in the residential road dusts of Taebaek. Thus, the high level of heavy metals in mining and smelting areas can be explained by the sulfide minerals. Observation by scanning electron microscopy showed that the mode of occurrence of heavy metals in Seoul, a comprehensive urbanized area, was related to metallic pollutants and organic materials. In the main road and residential road dusts of the Onsan area, Cd, Zn, and Cu were extremely high. Some industrial road and residential road dusts of the Seoul area showed high Cu, Zn, and Pb contents, whereas some garden soils and residential road dusts of the Taebaek area were high in As content. In general, the heavy metal contents in dust samples were two to three times higher than those in soil samples. Discriminant analysis of the multi-element data showed that main road dust samples were the most reflective of area characteristics. Cadmium, Sb, and Se in the Onsan area, As in the Taebaek area, and Pb and Te in the Seoul area were the most characteristic elements in discriminating the studied areas. Therefore, Cd in smelting areas, As in coal mining areas, and Pb in metropolitan areas can be suggested as the characteristic elements of each pollution pattern. The dispersion of heavy metal elements in urban areas tends to originate in main roads and to end in deposition in garden soils via the atmosphere and residential roads. Heavy metal contamination in Seoul is characteristic of areas with high population, factory, road, and traffic densities. Heavy metal contents are high in the vicinity of the smelters in the Onsan area and decay to background levels beyond one kilometer from the smelters.


Data Mining Algorithm Based on Fuzzy Decision Tree for Pattern Classification (퍼지 결정트리를 이용한 패턴분류를 위한 데이터 마이닝 알고리즘)

  • Lee, Jung-Geun;Kim, Myeong-Won
    • Journal of KIISE:Software and Applications
    • /
    • v.26 no.11
    • /
    • pp.1314-1323
    • /
    • 1999
  • With the widespread use of computers, it has become easy to generate and collect data, and there is a need to acquire useful knowledge from data automatically. In data mining, the acquired knowledge needs to be both accurate and comprehensible. In this paper, we propose an efficient fuzzy rule generation algorithm based on a fuzzy decision tree for data mining. We combine the comprehensibility of rules generated by decision trees such as ID3 and C4.5 with the inference and expressive power of fuzzy sets. In particular, fuzzy rules allow us to effectively classify patterns with non-axis-parallel decision boundaries, which are difficult to handle with methods that draw decision boundaries parallel to the attribute axes. In our algorithm, we first determine an appropriate set of membership functions for each attribute of the data using histogram analysis. Given this set of membership functions, we then construct a fuzzy decision tree in a way similar to ID3 and C4.5. We also apply a genetic algorithm to tune the initial set of membership functions. We tested our algorithm on several benchmark data sets, including the IRIS data, the Wisconsin breast cancer data, and the credit screening data. The experimental results show that our method is more efficient in performance and rule comprehensibility than other methods, including C4.5.
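
To make the idea of membership functions and fuzzy splitting more concrete, the following is a minimal sketch, not the authors' implementation. It assumes triangular membership functions and a membership-weighted class entropy as the split criterion; the function names, the triangular shape, and the toy values are all illustrative assumptions, and the histogram analysis and genetic tuning steps mentioned in the abstract are not shown.

```python
import math


def triangular(x, left, peak, right):
    """Membership degree of x in a triangular fuzzy set (assumed shape)."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)


def fuzzy_entropy(samples, fuzzy_set):
    """Class entropy where each sample counts by its membership degree.

    samples   -- list of (attribute_value, class_label) pairs
    fuzzy_set -- (left, peak, right) of a triangular membership function
    """
    weights = {}
    for value, label in samples:
        mu = triangular(value, *fuzzy_set)
        weights[label] = weights.get(label, 0.0) + mu
    total = sum(weights.values())
    if total == 0:
        return 0.0
    return -sum(
        (w / total) * math.log2(w / total) for w in weights.values() if w > 0
    )


# Toy data (made up): two classes overlapping around the value 5.
toy = [(5.0, "a"), (5.0, "b"), (20.0, "b")]
mixed = fuzzy_entropy(toy, (0.0, 5.0, 10.0))    # both classes fully inside the set
empty = fuzzy_entropy(toy, (30.0, 35.0, 40.0))  # no sample belongs to the set
```

In an ID3-style construction, a lower weighted entropy over the fuzzy sets of an attribute would make that attribute a better candidate for the next split.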

A MapReduce-Based Workflow BIG-Log Clustering Technique (맵리듀스기반 워크플로우 빅-로그 클러스터링 기법)

  • Jin, Min-Hyuck;Kim, Kwanghoon Pio
    • Journal of Internet Computing and Services
    • /
    • v.20 no.1
    • /
    • pp.87-96
    • /
    • 2019
  • In this paper, we propose a MapReduce-supported clustering technique for collecting and classifying distributed workflow enactment event logs as a preprocessing tool. We call these distributed workflow enactment event logs Workflow BIG-Logs because they satisfy, and are well suited to, the 5V properties of BIG-Data: Volume, Velocity, Variety, Veracity, and Value. The clustering technique developed in this paper is devised specifically for the preprocessing phase of a particular workflow process mining and analysis algorithm based on the Workflow BIG-Logs. In other words, it uses the Map-Reduce framework as a Workflow BIG-Logs processing platform, it supports the IEEE XES standard data format, and it is dedicated to the preprocessing phase of the $\rho$-Algorithm, a typical workflow process mining algorithm based on structured information control nets. More precisely, the Workflow BIG-Logs can be classified into two types, activity-based clustering patterns and performer-based clustering patterns, and we implement an activity-based clustering pattern algorithm on the Map-Reduce framework. Finally, we verify the proposed clustering technique through an experimental study on the workflow enactment event log dataset released by the BPI Challenges.
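
The activity-based clustering pattern can be pictured as a map step that keys each event by its activity and a reduce step that gathers the events sharing a key. The sketch below simulates that flow in plain Python under stated assumptions: it does not run on Hadoop, it does not parse IEEE XES files, and the event fields (`case`, `activity`, `performer`) and sample events are hypothetical.

```python
from collections import defaultdict


def map_event(event):
    """Map step: emit (activity, (case_id, performer)) for one log event."""
    return event["activity"], (event["case"], event["performer"])


def reduce_by_key(pairs):
    """Shuffle and reduce step: collect every value emitted under each key."""
    clusters = defaultdict(list)
    for key, value in pairs:
        clusters[key].append(value)
    return dict(clusters)


# Hypothetical enactment events; real input would come from XES-formatted logs.
events = [
    {"case": "c1", "activity": "register", "performer": "alice"},
    {"case": "c1", "activity": "approve", "performer": "bob"},
    {"case": "c2", "activity": "register", "performer": "carol"},
]

clusters = reduce_by_key(map_event(e) for e in events)
```

Keying the map output by `performer` instead of `activity` would give the performer-based variant of the same pattern.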

Predicting the Direction of the Stock Index by Using a Domain-Specific Sentiment Dictionary (주가지수 방향성 예측을 위한 주제지향 감성사전 구축 방안)

  • Yu, Eunji;Kim, Yoosin;Kim, Namgyu;Jeong, Seung Ryul
    • Journal of Intelligence and Information Systems
    • /
    • v.19 no.1
    • /
    • pp.95-110
    • /
    • 2013
  • Recently, the amount of unstructured data generated through a variety of social media has been increasing rapidly, resulting in a growing need to collect, store, search, analyze, and visualize this data. Because of its vast volume and unstructured nature, this kind of data cannot be handled appropriately with the traditional methodologies used for analyzing structured data. In this situation, many attempts are being made to analyze unstructured data such as text files and log files with various commercial and noncommercial analytical tools. Among the contemporary issues dealt with in the literature on unstructured text data analysis, the concepts and techniques of opinion mining have been attracting much attention from pioneering researchers and business practitioners. Opinion mining, or sentiment analysis, refers to a series of processes that analyze participants' opinions, sentiments, evaluations, attitudes, and emotions about selected products, services, organizations, social issues, and so on. In other words, many attempts based on various opinion mining techniques are being made to resolve complicated issues that could not otherwise have been solved by existing traditional approaches. One of the most representative attempts using opinion mining may be the recent research that proposed an intelligent model for predicting the direction of the stock index. This model works mainly on the basis of opinions extracted from an overwhelming number of economic news reports. News content published in various media is a typical example of unstructured text data. Every day, a large volume of new content is created, digitized, and subsequently distributed to us via online or offline channels. Many studies have revealed that we make better decisions on political, economic, and social issues by analyzing news and other related information. In this sense, we expect to predict the fluctuation of stock markets partly by analyzing the relationship between economic news reports and the pattern of stock prices. So far, in the literature on opinion mining, most studies, including ours, have used a sentiment dictionary to elicit sentiment polarity or sentiment value from a large number of documents. A sentiment dictionary consists of pairs of selected words and their sentiment values. Sentiment classifiers refer to the dictionary to determine the sentiment polarity of words, of sentences in a document, and of the whole document. However, most traditional approaches share a limitation: they do not consider the flexibility of sentiment polarity. That is, the sentiment polarity or sentiment value of a word is fixed and cannot be changed in a traditional sentiment dictionary. In the real world, however, the sentiment polarity of a word can vary depending on the time, situation, and purpose of the analysis. It can also be contradictory in nature. The flexibility of sentiment polarity motivated us to conduct this study. In this paper, we argue that sentiment polarity should be assigned not merely on the basis of the inherent meaning of a word but on the basis of its ad hoc meaning within a particular context. To implement our idea, we present an intelligent investment decision-support model based on opinion mining that scrapes and parses massive volumes of economic news on the web, tags sentiment words, classifies the sentiment polarity of the news, and finally predicts the direction of the next day's stock index. In addition, we apply a domain-specific sentiment dictionary instead of a general-purpose one to classify each piece of news as either positive or negative. For the purpose of performance evaluation, we performed intensive experiments and investigated the prediction accuracy of our model. For the experiments to predict the direction of the stock index, we gathered and analyzed 1,072 articles about stock markets published by "M" and "E" media between July 2011 and September 2011.
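
The difference between a general-purpose and a domain-specific sentiment dictionary can be illustrated with a small sketch. This is an assumption-laden toy, not the model from the paper: the dictionaries, the word weights, the summed-score rule, and the choice to label ties as negative are all made up for illustration.

```python
def score_document(tokens, sentiment_dict):
    """Sum the sentiment values of the tokens found in the dictionary."""
    return sum(sentiment_dict.get(token, 0.0) for token in tokens)


def classify(tokens, sentiment_dict):
    """Label a document positive or negative (ties go to negative here)."""
    return "positive" if score_document(tokens, sentiment_dict) > 0 else "negative"


# Hypothetical dictionaries: the same word carries a different polarity
# once it is read in a stock-market context.
general_dict = {"cut": -1.0, "surge": 1.0}
stock_dict = {"cut": 1.0, "surge": 1.0}

headline = ["central", "bank", "announces", "rate", "cut"]
general_label = classify(headline, general_dict)
domain_label = classify(headline, stock_dict)
```

Here the same headline is labeled differently depending on which dictionary is consulted, which is the kind of context dependence the abstract refers to as the flexibility of sentiment polarity.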

The Analysis on the Relationship between Firms' Exposures to SNS and Stock Prices in Korea (기업의 SNS 노출과 주식 수익률간의 관계 분석)

  • Kim, Taehwan;Jung, Woo-Jin;Lee, Sang-Yong Tom
    • Asia pacific journal of information systems
    • /
    • v.24 no.2
    • /
    • pp.233-253
    • /
    • 2014
  • Can the stock market really be predicted? Stock market prediction has attracted much attention from many fields, including business, economics, statistics, and mathematics. Early research on stock market prediction was based on random walk theory (RWT) and the efficient market hypothesis (EMH). According to the EMH, stock markets are largely driven by new information rather than by present and past prices. Since new information is unpredictable, the stock market will follow a random walk. Despite these theories, Schumaker [2010] asserted that people keep trying to predict the stock market by using artificial intelligence, statistical estimates, and mathematical models. Mathematical approaches include Percolation Methods, Log-Periodic Oscillations, and Wavelet Transforms to model future prices. Examples of artificial intelligence approaches, which deal with optimization and machine learning, are Genetic Algorithms, Support Vector Machines (SVM), and Neural Networks. Statistical approaches typically predict the future by using past stock market data. Recently, financial engineers have started to predict stock price movement patterns by using SNS data. SNS is a place where people's opinions and ideas flow freely and affect others' beliefs about certain things. Through word-of-mouth in SNS, people share product usage experiences, subjective feelings, and the accompanying sentiment or mood with others. An increasing number of empirical analyses of sentiment and mood are based on textual collections of public user-generated data on the web. Opinion mining is a domain of data mining that extracts the public opinions expressed in SNS. There have been many studies on opinion mining from Web sources such as product reviews, forum posts, and blogs. In relation to this literature, we try to understand the effects of firms' SNS exposures on stock prices in Korea. Similarly to Bollen et al. [2011], we empirically analyze the impact of SNS exposures on stock return rates. We use Social Metrics by Daum Soft, an SNS big data analysis company in Korea. Social Metrics provides trends and public opinions from Twitter and blogs by using natural language processing and analysis tools. It collects the sentences circulated on Twitter in real time, breaks these sentences down into word units, and then extracts keywords. In this study, we classify firms' exposures in SNS into two groups: positive and negative. To test the correlation and causation relationships between SNS exposures and stock price returns, we first collect the stock prices of 252 firms and the KRX100 index on the Korea Stock Exchange (KRX) from May 25, 2012 to September 1, 2012. We also gather the public attitudes (positive, negative) about these firms from Social Metrics over the same period. We conduct regression analysis between stock prices and the number of SNS exposures. Having checked the correlation between the two variables, we perform a Granger causality test to determine the direction of causation between them. The result is that the total number of SNS exposures is positively related to stock market returns. The number of positive mentions also has a positive relationship with stock market returns. In contrast, the number of negative mentions has a negative relationship with stock market returns, but this relationship is not statistically significant. This means that the impact of positive mentions is statistically stronger than the impact of negative mentions. We also investigate whether the impacts are moderated by industry type and firm size. We find that the impacts of SNS exposures are bigger for IT firms than for non-IT firms, and bigger for small firms than for large firms. The results of the Granger causality test show that changes in stock price returns are caused by SNS exposures, while causation in the other direction is not significant. Therefore, the correlation between SNS exposures and stock prices has unidirectional causality. The more a firm is exposed in SNS, the more its stock price is likely to increase, while stock price changes may not cause more SNS mentions.

A Study on Web-User Clustering Algorithm for Web Personalization (웹 개인화를 위한 웹사용자 클러스터링 알고리즘에 관한 연구)

  • Lee, Hae-Kag
    • Journal of the Korea Academia-Industrial cooperation Society
    • /
    • v.12 no.5
    • /
    • pp.2375-2382
    • /
    • 2011
  • User clustering for web navigation pattern discovery is very useful for obtaining users' preferences and behavior patterns for web pages. In addition, the information obtained by user clustering is essential for web personalization and customer grouping. In this paper, an algorithm for clustering users' web navigation paths is proposed, with which certain special navigation patterns can be recognized. The proposed algorithm has two clustering phases. In the first phase, all paths are classified into k groups on the basis of their similarities. The solution obtained in the first phase is not a global optimum, but it gives a good and feasible initial solution for the second phase. In the second phase, the first-phase solution is improved by a revised k-means algorithm. In the revised k-means algorithm, paths are grouped by a hyperplane instead of by the distance between a path and a group center. Experimental results show that the proposed method is more efficient.
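
The first phase, grouping paths by similarity, can be sketched as follows. The abstract does not state which similarity measure is used, so this example swaps in Jaccard similarity over the sets of visited pages and assigns each path to its most similar seed path; both choices, the function names, and the sample paths are assumptions, and the hyperplane-based second phase is not shown.

```python
def jaccard(path_a, path_b):
    """Jaccard similarity between the sets of pages visited in two paths."""
    a, b = set(path_a), set(path_b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def assign_to_seeds(paths, seeds):
    """Put each path in the group of its most similar seed path."""
    groups = [[] for _ in seeds]
    for path in paths:
        best = max(range(len(seeds)), key=lambda i: jaccard(path, seeds[i]))
        groups[best].append(path)
    return groups


# Hypothetical navigation paths (page names are made up).
paths = [
    ["home", "news"],
    ["home", "news", "sports"],
    ["shop", "cart"],
]
seeds = [["home", "news"], ["shop", "cart", "pay"]]

groups = assign_to_seeds(paths, seeds)
```

A grouping like this would serve only as the initial solution that a second, refining phase then improves.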

Extracting Method of User's Interests by Using SNS Follower's Relationship and Sequential Pattern Evaluation Indices for Keyword (키워드를 위한 시퀀셜 패턴 평가 지표와 SNS 팔로워의 관계를 이용한 사용자 관심사항 추출방법)

  • Shin, Bong-Hi;Jeon, Hye-Kyoung
    • Journal of the Korea Convergence Society
    • /
    • v.8 no.8
    • /
    • pp.71-75
    • /
    • 2017
  • Due to the spread of SNS, web-based consumer-generated data is increasing exponentially. In many fields it is important to accurately extract, from a large amount of data, what matches the user's interests. It is especially important for business managers who set marketing policies to find the right customers among many users. In this paper, we try to obtain important information about customers who are interested in each account through Twitter follower-following relationships. Because Twitter's current follower relationships do not reflect the user's interests, we try to identify the details of those interests by applying keyword extraction methods to followers' tweets. To do this, we select two domestic commercial Twitter accounts and apply the sequential pattern evaluation index to key phrases mined from the text data collected from their followers.