• Title/Summary/Keyword: Apriori Algorithm

Search results: 107

An Efficient Clustering Algorithm based on Heuristic Evolution (휴리스틱 진화에 기반한 효율적 클러스터링 알고리즘)

  • Ryu, Joung-Woo;Kang, Myung-Ku;Kim, Myung-Won
    • Journal of KIISE: Software and Applications / v.29 no.1_2 / pp.80-90 / 2002
  • Clustering is a useful technique for grouping data points such that points within a single group/cluster have similar characteristics. Many clustering algorithms have been developed and used in engineering applications, including pattern recognition and image processing. Recently, clustering has drawn increasing attention as an important technique in data mining. However, clustering algorithms such as K-means and Fuzzy C-means suffer from two difficulties: the number of clusters must be determined a priori, and the result depends on the initial set of clusters, which can fail to produce a desirable clustering. In this paper, we propose a new clustering algorithm that solves these problems. In our method we use an evolutionary algorithm to address the local-optima problem, in which clustering converges to an undesirable state when started from an inappropriate initial set of clusters. We also adopt a new measure of how well data are clustered, determined in terms of both intra-cluster dispersion and inter-cluster separability. Using this measure, the number of clusters is determined automatically as a result of the optimization process. We further combine a heuristic, that is, problem-specific knowledge, with the evolutionary algorithm to speed up its search. We have experimented with our algorithm on several sets of multi-dimensional data, and it outperforms the existing algorithms.
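
The paper's validity measure combines intra-cluster dispersion and inter-cluster separability. As an illustration only, here is a minimal Python sketch of such a ratio-style score (Euclidean distance and labels 0..K-1 are assumptions); the exact measure defined in the paper is not reproduced here.

```python
import numpy as np

def clustering_score(data, labels):
    """Toy cluster-validity score: mean inter-centroid distance divided by
    mean point-to-centroid distance, so higher means tighter, better
    separated clusters. Illustrative only, not the paper's measure."""
    ks = np.unique(labels)                       # assumes labels are 0..K-1
    centroids = np.array([data[labels == k].mean(axis=0) for k in ks])
    # intra-cluster dispersion: average distance of a point to its centroid
    intra = np.mean([np.linalg.norm(x - centroids[k])
                     for x, k in zip(data, labels)])
    # inter-cluster separability: average pairwise centroid distance
    inter = np.mean([np.linalg.norm(centroids[i] - centroids[j])
                     for i in range(len(ks)) for j in range(i + 1, len(ks))])
    return inter / intra

# two well-separated toy clusters yield a high score
data = np.array([[0, 0], [0, 1], [5, 5], [5, 6]], dtype=float)
labels = np.array([0, 0, 1, 1])
print(clustering_score(data, labels))
```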

Design and Implementation of Multi-dimensional Learning Path Pattern Analysis System (다차원 학습경로 패턴 분석 시스템의 설계 및 구현)

  • Baek, Jang-Hyeon;Kim, Yung-Sik
    • The KIPS Transactions: Part A / v.12A no.5 s.95 / pp.461-470 / 2005
  • In a learner-controlled environment, where learners can decide and restructure the contents, methods, and order of learning by themselves, it is possible to apply individualized learning that takes each learner's characteristics into account. The present study analyzed learners' learning path patterns, one of the learner characteristics important in the Web-based teaching-learning process, using the Apriori algorithm, and grouped learners according to their learning path pattern. Based on the result, we designed and implemented a multi-dimensional learning path pattern analysis system that provides individual learners with learning paths, learning contents, learning media, supplementary learning contents, the pattern of material presentation, etc., multi-dimensionally. In a survey on satisfaction with the developed system, satisfaction with the supplementary learning contents was highest ('Highly satisfied' 24.5%, 'Satisfied' 35.7%). By learners' level, satisfaction was higher among low-level learners ('Highly satisfied' 20.2%, 'Satisfied' 31.2%) than among high-level learners ('Highly satisfied' 18.4%, 'Satisfied' 28.54%). The developed system is expected to provide learners with multi-dimensionally meaningful information from various angles using OLAP technologies such as drill-up and drill-down.
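
For concreteness, here is a minimal pure-Python Apriori pass of the kind the study applies to learning paths. The session data and item names are hypothetical stand-ins; the actual system mines learners' logged paths.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Plain Apriori: returns frequent itemsets (frozensets) with their
    support. A transaction is the set of items in one record, e.g. the
    pages a learner visited in one session."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t) / n

    # level 1: frequent single items
    items = {i for t in transactions for i in t}
    frequent = {frozenset([i]): s for i in items
                if (s := support(frozenset([i]))) >= min_support}
    result, k = dict(frequent), 2
    while frequent:
        # candidate generation: join frequent (k-1)-itemsets
        candidates = {a | b for a in frequent for b in frequent
                      if len(a | b) == k}
        frequent = {c: s for c in candidates
                    if (s := support(c)) >= min_support}
        result.update(frequent)
        k += 1
    return result

# hypothetical learning-path data: pages visited per session
sessions = [{"intro", "quiz1", "video2"}, {"intro", "video2"},
            {"intro", "quiz1"}, {"quiz1", "video2", "intro"}]
print(apriori(sessions, min_support=0.5))
```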

A Study on Management of Student Retention Rate Using Association Rule Mining (연관관계 규칙을 이용한 학생 유지율 관리 방안 연구)

  • Kim, Jong-Man;Lee, Dong-Cheol
    • Journal of Korea Society of Industrial Information Systems / v.23 no.6 / pp.67-77 / 2018
  • Currently, there are many problems due to the decline in the school-age population. Moreover, Korea has the largest number of universities relative to its population, and its university enrollment rate is also the highest in the world. As a result, the minimum student retention rate required for the survival of each university is becoming increasingly important. The purpose of this study was to examine the effects of the shrinking number of school graduates and of a social climate that prioritizes employment, and to set the basic direction for managing the student retention rate, which must be maintained from admission to graduation. To that end, we determine the optimal input variables, perform association analysis on them using the Apriori algorithm to collect the training data best suited to retention-rate management, and use the result as base data for developing an efficient Deep Learning module. The accuracy of the Deep Learning module in predicting graduation was 75%, evaluated against a decision-tree analysis of graduation. In the decision tree, the factors that determine graduation are graduating from a general high school and being female, and students residing in urban areas have a high probability of graduating. As a result, the Deep Learning module, rather than the decision tree, was identified as the model that evaluates students' graduation more efficiently.
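
As a hedged illustration of the association-analysis step, the sketch below derives rules and their confidence values from a table of frequent-itemset supports. The attribute names and support values are hypothetical, not the study's actual variables or results.

```python
from itertools import combinations

def association_rules(freq, min_confidence):
    """Derive rules A -> B from frequent itemsets, where
    confidence(A -> B) = support(A | B) / support(A).
    `freq` maps frozensets to supports (e.g. an Apriori result)."""
    rules = []
    for itemset, sup in freq.items():
        if len(itemset) < 2:
            continue
        for r in range(1, len(itemset)):
            for ante in map(frozenset, combinations(itemset, r)):
                conf = sup / freq[ante]
                if conf >= min_confidence:
                    rules.append((set(ante), set(itemset - ante), conf))
    return rules

# hypothetical supports over student records (illustrative values only)
freq = {frozenset(["general_hs"]): 0.75,
        frozenset(["graduated"]): 0.75,
        frozenset(["general_hs", "graduated"]): 0.75}
print(association_rules(freq, min_confidence=0.9))
```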

Trend-based Sequential Pattern Discovery from Time-Series Data (시계열 데이터로부터의 경향성 기반 순차패턴 탐색)

  • 오용생;이동하;남도원;이전영
    • Journal of Intelligence and Information Systems / v.7 no.1 / pp.27-45 / 2001
  • Sequential pattern discovery from time-series data has mainly been concerned with events or itemsets. Recently, the research has started to be applied to numerical data; an example is the sensor information generated while checking a machine's state. Numerical data rarely repeat the same values, so when forming patterns it is important to extract a suitable number of pattern features that can be transformed into events or itemsets and then used in sequential pattern mining tasks. Popular methods for extracting such patterns are the sliding window and clustering, but their results are sensitive to the window size or the clustering parameters, which forces users to apply the data mining task repeatedly and to interpret each result. This paper suggests a method that retrieves pattern features by turning numerical data into vectors, each with an angle and a magnitude. Pattern features retrieved with this method make the results easy to understand and sequential patterns fast to find. We define an inclusion relation among pattern features using the angles and magnitudes of the vectors; using this relation, we can find sequential patterns faster than other methods that use all the data, because the data size is reduced.
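
One possible reading of the vector representation: summarize each window of the series by the direction (angle) and size (magnitude) of its trend. The sketch below, which assumes a least-squares trend line per fixed-length window, is illustrative rather than the paper's exact definition.

```python
import math
import numpy as np

def trend_features(series, window):
    """Represent each non-overlapping window of a time series as an
    (angle, magnitude) pair: the angle of the least-squares trend line
    and the length of the fitted segment. An illustrative reading of
    the paper's vector features, not its exact construction."""
    feats = []
    x = np.arange(window)
    for start in range(0, len(series) - window + 1, window):
        y = np.asarray(series[start:start + window], dtype=float)
        slope, _ = np.polyfit(x, y, 1)            # least-squares trend
        angle = math.degrees(math.atan(slope))    # direction of the trend
        magnitude = math.hypot(window - 1, slope * (window - 1))
        feats.append((angle, magnitude))
    return feats

# rising then falling toy series -> one positive-angle, one negative-angle feature
print(trend_features([1, 2, 4, 7, 7, 6, 5, 5], window=4))
```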

A Study on Projection Image Restoration by Adaptive Filtering (적응적 필터링에 의한 투사영상 복원에 관한 연구)

  • 김정희;김광익
    • Journal of Biomedical Engineering Research / v.19 no.2 / pp.119-128 / 1998
  • This paper describes a filtering algorithm which employs a priori information about SPECT lesion detectability to filter degraded projection images prior to backprojection reconstruction. In this algorithm, we determined m minimum detectable lesion sizes (MDLSs) by assuming m object contrasts chosen uniformly in the range 0.0-1.0, based on a signal/noise model that expresses the detection capability of SPECT in terms of physical factors. A best estimate of the given projection image is formed as a weighted combination of the subimages from m optimal filters, each designed to maximize the local S/N ratio for lesions of its MDLS. These subimages show a relatively larger resolution-recovery effect and a relatively smaller noise-reduction effect as the MDLS decreases, and the weighting of each subimage is controlled by the difference between the subimage and the maximum-resolution-recovered projection image. The proposed filtering algorithm was tested on SPECT image reconstruction problems and produced good results. In particular, the algorithm showed an adaptive effect: it approximately averages the filter outputs in homogeneous areas, while in textured lesion areas of the reconstructed image it depends sensitively on each filter's strength in preserving and enhancing contrast.
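
To illustrate only the weighted-combination step, here is a small numpy sketch in which each subimage's pixel-wise weight shrinks with its deviation from the maximum-resolution-recovered reference. The weighting rule here is an assumption for illustration; the paper's actual rule differs.

```python
import numpy as np

def combine_subimages(subimages, reference, eps=1e-6):
    """Pixel-wise weighted combination of m filtered subimages, weighting
    each pixel inversely to the subimage's deviation from a reference
    image. A minimal sketch of the idea, not the paper's weighting."""
    subimages = np.stack(subimages)                  # shape (m, H, W)
    diff = np.abs(subimages - reference)             # deviation per pixel
    weights = 1.0 / (diff + eps)                     # closer -> heavier
    weights /= weights.sum(axis=0, keepdims=True)    # normalize over filters
    return (weights * subimages).sum(axis=0)

# usage with random stand-in subimages and their mean as reference
imgs = [np.random.rand(8, 8) for _ in range(3)]
restored = combine_subimages(imgs, np.mean(imgs, axis=0))
```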

Performance analysis of Frequent Itemset Mining Technique based on Transaction Weight Constraints (트랜잭션 가중치 기반의 빈발 아이템셋 마이닝 기법의 성능분석)

  • Yun, Unil;Pyun, Gwangbum
    • Journal of Internet Computing and Services / v.16 no.1 / pp.67-74 / 2015
  • In recent years, frequent itemset mining that considers the importance of each item has been intensively studied as one of the important issues in the data mining field. According to the strategies for utilizing item importance, itemset mining approaches are classified as follows: weighted frequent itemset mining, frequent itemset mining using transactional weights, and utility itemset mining. In this paper, we perform an empirical analysis of frequent itemset mining algorithms based on transactional weights. These algorithms compute transactional weights by utilizing the weight of each item in large databases, and then discover weighted frequent itemsets on the basis of item frequency and the weight of each transaction. Consequently, the importance of a certain transaction can be seen through database analysis, because a transaction's weight is higher if it contains many items with high weights. We not only analyze the advantages and disadvantages but also compare the performance of the best-known algorithms in this field. As a representative of frequent itemset mining using transactional weights, WIS introduced the concept and strategies of transactional weights. In addition, there are various other state-of-the-art algorithms, WIT-FWIs, WIT-FWIs-MODIFY, and WIT-FWIs-DIFF, for extracting itemsets with weight information. To mine weighted frequent itemsets efficiently, these three algorithms use a special lattice-like data structure called the WIT-tree. They do not need an additional database scanning operation once construction of the WIT-tree is finished, since each node of the WIT-tree holds item information such as the item and its transaction IDs. In particular, whereas traditional algorithms conduct a number of database scanning operations to mine weighted itemsets, the WIT-tree-based algorithms avoid this overhead by reading the database only once. Additionally, the algorithms generate each new itemset of length N+1 from two different itemsets of length N. To discover new weighted itemsets, WIT-FWIs performs this itemset combination using the information of the transactions that contain both itemsets; WIT-FWIs-MODIFY adds a feature that decreases the operations needed to calculate the frequency of the new itemset; and WIT-FWIs-DIFF utilizes a technique based on the difference of the two itemsets. To compare and analyze the performance of the algorithms in various environments, we use real datasets of two types (dense and sparse) and measure runtime and maximum memory usage. Moreover, a scalability test is conducted to evaluate the stability of each algorithm as the size of the database changes. As a result, WIT-FWIs and WIT-FWIs-MODIFY show the best performance on the dense dataset, while on the sparse dataset WIT-FWIs-DIFF mines more efficiently than the other algorithms. Compared to the WIT-tree-based algorithms, WIS, which is based on the Apriori technique, has the worst efficiency because on average it requires many more computations than the others.
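
As a rough sketch of the two ideas this comparison rests on, transaction weights derived from item weights, and tidset intersection for building length N+1 itemsets without rescanning the database, consider the following. The weight formula (mean of item weights) and the data are illustrative assumptions, not the papers' exact definitions.

```python
def transaction_weight(transaction, item_weights):
    """Weight of a transaction: here, the mean weight of its items.
    An illustrative convention; the surveyed papers define their own."""
    return sum(item_weights[i] for i in transaction) / len(transaction)

def weighted_support(tidset, tweights, total_weight):
    """Weighted support of an itemset from the IDs of the transactions
    containing it, as a fraction of the total transaction weight."""
    return sum(tweights[t] for t in tidset) / total_weight

# hypothetical database and item weights
db = {0: {"a", "b"}, 1: {"a", "c"}, 2: {"a", "b", "c"}, 3: {"b"}}
w = {"a": 0.9, "b": 0.4, "c": 0.7}
tweights = {t: transaction_weight(items, w) for t, items in db.items()}
total = sum(tweights.values())

# tidsets of two length-1 itemsets; their intersection gives the tidset
# of the length-2 itemset {a, b} without rescanning the database
tid_a = {t for t, items in db.items() if "a" in items}
tid_b = {t for t, items in db.items() if "b" in items}
print(weighted_support(tid_a & tid_b, tweights, total))
```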

Development of Intelligent Job Classification System based on Job Posting on Job Sites (구인구직사이트의 구인정보 기반 지능형 직무분류체계의 구축)

  • Lee, Jung Seung
    • Journal of Intelligence and Information Systems / v.25 no.4 / pp.123-139 / 2019
  • The job classification systems of the major job sites differ from site to site and also differ from the job classification system of the SQF (Sectoral Qualifications Framework) proposed for the SW field. Therefore, a new job classification system is needed that SW companies, SW job seekers, and job sites can all understand. The purpose of this study is to establish a standard job classification system that reflects market demand by analyzing the SQF based on the job posting information of major job sites and the NCS (National Competency Standards). For this purpose, we conduct an association analysis between the occupations of the major job sites and derive association rules between the SQF and those occupations. Using these association rules, we propose a data-based intelligent job classification system that maps the job classification systems of the major job sites to the SQF. First, major job sites are selected to obtain information on the job classification systems used in the SW market. Then we identify ways to collect job information from each site and collect the data through open APIs. Focusing on the relationships between the data, we keep only the job postings listed on multiple job sites at the same time and delete the rest. Next, we map the job classification systems across the job sites using the rules derived from the association analysis, complete the mapping between these market systems, discuss it with experts, further map it to the SQF, and finally propose a new job classification system. As a result, more than 30,000 job postings were collected in XML format using the open APIs of 'WORKNET', 'JOBKOREA', and 'saramin', the main job sites in Korea. After filtering down to about 900 job postings simultaneously posted on multiple job sites, 800 association rules were derived by applying the Apriori algorithm, a frequent pattern mining method. Based on these 800 association rules, the job classification systems of WORKNET, JOBKOREA, and saramin and the SQF job classification system were mapped and organized into first through fourth classification levels. In the new job taxonomy, the first primary class, covering IT consulting, computer systems, networks, and security-related jobs, consists of three secondary, five tertiary, and five quaternary classifications. The second primary class, covering databases and system operation jobs, consists of three secondary, three tertiary, and four quaternary classifications. The third primary class, covering Web planning, Web programming, Web design, and games, consists of four secondary, nine tertiary, and two quaternary classifications. The last primary class, covering ICT management and computer and communication engineering technology jobs, consists of three secondary and six tertiary classifications. In particular, the new job classification system has a relatively flexible depth of classification, unlike existing systems: WORKNET divides jobs into three levels, JOBKOREA divides jobs into two levels with further subdivisions given as keywords, and saramin likewise divides jobs into two levels with further subdivisions in keyword form. The newly proposed standard job classification system accepts some keyword-based jobs and treats some product names as jobs.
In the new classification system, some jobs stop at the second classification level, while others are subdivided down to the fourth level; this reflects the idea that not all jobs can be broken down into the same number of steps. We also combined the association rules derived from the collected market data with experts' opinions. Therefore, the newly proposed job classification system can be regarded as a data-based intelligent job classification system that reflects market demand, unlike the existing systems. This study is meaningful in that it suggests a new job classification system that reflects market demand by attempting a data-driven mapping between occupations, through association analysis, rather than relying on the intuition of a few experts. However, this study has a limitation in that it cannot fully reflect market demand as it changes over time, because the data were collected at a single point in time. Since market demand varies with seasonal factors and the timing of major corporate public recruitment, continuous data monitoring and repeated experiments are needed to achieve more accurate matching. The results of this study can be used to suggest directions for improving the SQF in the SW industry, and the successful experience in the SW industry is expected to transfer to other industries.
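
To illustrate how association rules over co-posted listings can induce a cross-site category mapping, here is a simplified sketch. The site names follow the study, but the categories, counting scheme, and confidence threshold are assumptions; the study itself derives its 800 rules with the Apriori algorithm over real postings.

```python
from collections import Counter

def map_categories(postings, min_conf=0.8):
    """Derive candidate category mappings across job sites from co-posted
    listings: the rule (site A, X) -> (site B, Y) gets confidence
    count(X and Y together) / count(X). A simplified stand-in for the
    Apriori-derived rules in the study; assumes every posting carries a
    category label from each site considered."""
    pair_counts, ante_counts = Counter(), Counter()
    for labels in postings:                       # labels: {site: category}
        for a, x in labels.items():
            ante_counts[(a, x)] += 1
        for a, x in labels.items():
            for b, y in labels.items():
                if a != b:
                    pair_counts[(a, x, b, y)] += 1
    return [((a, x), (b, y), n / ante_counts[(a, x)])
            for (a, x, b, y), n in pair_counts.items()
            if n / ante_counts[(a, x)] >= min_conf]

# hypothetical co-posted listings with per-site category labels
postings = [{"WORKNET": "WebDev", "JOBKOREA": "Web-Programming"},
            {"WORKNET": "WebDev", "JOBKOREA": "Web-Programming"},
            {"WORKNET": "WebDev", "JOBKOREA": "Game"}]
print(map_categories(postings, min_conf=0.6))
```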