• Title/Summary/Keyword: Pattern mining

Search Result 624, Processing Time 0.027 seconds

Early Detection of Lung Cancer Risk Using Data Mining

  • Ahmed, Kawsar;Abdullah-Al-Emran, Abdullah-Al-Emran;Jesmin, Tasnuba;Mukti, Roushney Fatima;Rahman, Md. Zamilur;Ahmed, Farzana
    • Asian Pacific Journal of Cancer Prevention / v.14 no.1 / pp.595-598 / 2013
  • Background: Lung cancer is the leading cause of cancer death worldwide. Therefore, identification of genetic as well as environmental factors is very important in developing novel methods of lung cancer prevention. However, this is a multi-layered problem. Therefore, a lung cancer risk prediction system is proposed here which is easy, cost-effective and time-saving. Materials and Methods: Initially, data from 400 cancer and non-cancer patients were collected from different diagnostic centres, pre-processed and clustered using a K-means clustering algorithm to identify relevant and non-relevant data. Next, significant frequent patterns were discovered using AprioriTid and a decision tree algorithm. Results: Finally, using the significant patterns, prediction tools for a lung cancer prediction system were developed. This lung cancer risk prediction system should prove helpful in detecting a person's predisposition for lung cancer. Conclusions: Most people in Bangladesh do not even know they have lung cancer, and the majority of cases are diagnosed at late stages when cure is impossible. Therefore, early prediction of lung cancer should play a pivotal role in the diagnosis process and in an effective preventive strategy.

An Active Candidate Set Management Model on Association Rule Discovery using Database Trigger and Incremental Update Technique (트리거와 점진적 갱신기법을 이용한 연관규칙 탐사의 능동적 후보항목 관리 모델)

  • Hwang, Jeong-Hui;Sin, Ye-Ho;Ryu, Geun-Ho
    • Journal of KIISE:Databases / v.29 no.1 / pp.1-14 / 2002
  • Association rule discovery is a method of mining associated item sets from large databases based on support and confidence thresholds. The discovered association rules can be applied to marketing pattern analysis in e-commerce, large shopping malls and so on. Association rule discovery makes multiple scans over a database storing large volumes of transaction data, so an algorithm with very high overhead may not be useful for real-time association rule discovery in a dynamic environment. Therefore, this paper proposes an active candidate set management model based on triggers and an incremental update mechanism to overcome the non-real-time limitation of association rule discovery. To implement the proposed model, we describe an implementation model for the incremental update operation and also evaluate the performance characteristics of this model experimentally.
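
The support and confidence measures mentioned in the abstract above, together with the idea of updating itemset counts incrementally on each insert instead of rescanning the database, can be illustrated with a minimal sketch. This is not the authors' model: the class name `IncrementalItemsetCounts`, the `on_insert` hook (standing in for a database trigger), the restriction to itemsets of one or two items, and the sample transactions are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


class IncrementalItemsetCounts:
    """Hypothetical in-memory stand-in for a trigger-maintained candidate table."""

    def __init__(self):
        self.n_transactions = 0
        self.counts = Counter()  # frozenset(itemset) -> number of transactions containing it

    def on_insert(self, transaction):
        """Update counts for one new transaction (what a trigger would do on INSERT)."""
        items = sorted(set(transaction))
        self.n_transactions += 1
        # Assumption: only itemsets of size 1 and 2 are tracked, to keep the sketch small.
        for size in (1, 2):
            for combo in combinations(items, size):
                self.counts[frozenset(combo)] += 1

    def support(self, itemset):
        """Fraction of all transactions that contain every item in itemset."""
        if self.n_transactions == 0:
            return 0.0
        return self.counts[frozenset(itemset)] / self.n_transactions

    def confidence(self, antecedent, consequent):
        """support(antecedent and consequent) / support(antecedent)."""
        base = self.counts[frozenset(antecedent)]
        if base == 0:
            return 0.0
        joint = self.counts[frozenset(antecedent) | frozenset(consequent)]
        return joint / base


# Hypothetical transactions, inserted one at a time.
tracker = IncrementalItemsetCounts()
for basket in [{"bread", "milk"}, {"bread", "butter"},
               {"bread", "milk", "butter"}, {"milk"}]:
    tracker.on_insert(basket)

print(tracker.support({"bread", "milk"}))
print(tracker.confidence({"bread"}, {"milk"}))
```

Because the counts are updated per inserted transaction, a rule can be checked against the support and confidence thresholds at any time without another pass over the stored transactions, which is the general motivation the abstract gives for incremental updating.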

Development of the Power Consumption Simulator and Classification of the Types of Household by Using Data Mining Over Smart Grid (스마트 그리드 환경에서 가정의 소비전력 생성 시뮬레이터 개발 및 데이터 마이닝 기법을 이용한 가족 유형 분류)

  • Kim, Ji-Hyun;Lee, Yun-Jin;Kim, Ho-Won
    • The Journal of Korean Institute of Communications and Information Sciences / v.39C no.1 / pp.72-81 / 2014
  • Recently, we have suffered from electric power shortages because of irregular power demand. There is a growing need to adopt the smart grid, which supplies power effectively by using two-way communication across the grid between customers and electric energy providers. If a smart grid is set up in our country, third parties that provide services to customers using the information acquired from the smart grid may become active. In this paper, we suggest a methodology for classifying family types by analysing power consumption patterns using data mining techniques. To build a classifier for categorizing household types, we need power consumption data and the corresponding family types. However, it is hard to obtain both. Therefore, we develop a simulator that generates household power consumption patterns and use it to classify family types. We also present the potential for application services such as customized services for a specific family or goods marketing.

An Insight Study on Keyword of IoT Utilizing Big Data Analysis (빅데이터 분석을 활용한 사물인터넷 키워드에 관한 조망)

  • Nam, Soo-Tai;Kim, Do-Goan;Jin, Chan-Yong
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2017.10a / pp.146-147 / 2017
  • Big data analysis is a technique for effectively analyzing unstructured data such as the Internet, social network services, web documents generated in the mobile environment, e-mail, and social data, as well as well-formed structured data in a database. Most big data analysis techniques, including data mining, machine learning, natural language processing, and pattern recognition, were already used in existing statistics and computer science. Global research institutes have identified analysis of big data as the most noteworthy new technology since 2011. Therefore, companies in most industries are making efforts to create new value through the application of big data. In this study, we carried out the analysis using Social Matrics, a big data analysis tool of Daum Communications. We analyzed public perceptions of the "Internet of things" keyword for one month as of October 8, 2017. The results of the big data analysis are as follows. First, the top related search keyword for "Internet of things" was found to be technology (995). This study suggests theoretical implications based on the results.

Investigation of shear behavior of soil-concrete interface

  • Haeri, Hadi;Sarfarazi, Vahab;Zhu, Zheming;Marji, Mohammad Fatehi;Masoumi, Alireza
    • Smart Structures and Systems / v.23 no.1 / pp.81-90 / 2019
  • The shear behavior of a soil-concrete interface is mainly affected by the surface roughness of the two contact surfaces. The present research investigates the effect of the roughness of the soil-concrete interface on the interface shear behavior in two-layered laboratory testing samples. In these specially prepared samples, a clay silt layer with a density of $2027\,kg/m^3$ was selected to be in contact with a concrete layer to simplify the laboratory testing. Particle size testing and direct shear tests were performed to determine the appropriate particle sizes and their shear strength properties such as cohesion and friction angle. Then, surface undulations in the form of teeth were provided on the surfaces of both the concrete and soil layers in the different tests carried out on these mixed specimens. The soil-concrete samples were prepared in the form of cubes of 10 × 10 × 30 cm. The undulations (inter-surface roughness) were provided in the form of one tooth or two teeth having angles of $15^{\circ}$ and $30^{\circ}$, respectively. Several direct shear tests were carried out under four different normal loads of 80, 150, 300 and 500 kPa with a constant displacement rate of 0.02 mm/min. The test results show that the shear failure mechanism is affected by the number of teeth, the roughness angle and the normal stress applied to the sample. The teeth are sheared from the base under low normal load, while oblique cracks may lead to failure under a higher normal load. As the number of teeth increases, the shear strength of the sample also increases. When the tooth roughness angle increases, a wider portion of the tooth base fails, which means that the shear strength of the sample increases.

Factors Clustering Approach to Parametric Cost Estimates And OLAP Driver

  • JaeHo, Cho;BoSik, Son;JaeYoul, Chun
    • International conference on construction engineering and project management / 2009.05a / pp.707-716 / 2009
  • The role of the cost modeller is to facilitate the design process by systematic application of cost factors so as to maintain a sensible and economic relationship between cost, quantity, utility and appearance, which helps in achieving the client's requirements within an agreed budget. A number of studies have addressed cost estimates in the early design stage, focusing on improving accuracy or on impact factors. From the related research to date, it is common knowledge that cost estimates are undertaken progressively throughout the design stage and make use of the information that is available at each phase. In addition, cost estimates in the early design stage must analyze the information under various preconditions before a more developed design is reached, because a design can be modified and changed at any point depending on clients' requirements. Parametric cost estimating models have been adopted to support decision making in a changeable environment in the early design stage. These models use similar instances or patterns of historical cases composed of project information, geographic design features, data relevant to quantity or cost, etc. The OLAP technique analyzes subject data from multi-dimensional points of view; it supports query, analysis and comparison of the required information through diverse queries. OLAP's data structure matches well with a multiview-analysis framework. Accordingly, this study implements a multi-dimensional information system for case-based quantity data related to design information using OLAP technology, and then analyzes the impact factors of quantity by design criteria or parameters of the same meaning. On the basis of the factors examined above, this study will generate rules on quantity measures and produce resemblance classes using clustering from data mining. This sort of knowledge base consists of a set of classified data as group patterns, which will provide an appropriate basis for the parametric cost estimating method.

Mechanism of failure in the Semi-Circular Bend (SCB) specimen of gypsum-concrete with an edge notch

  • Fu, Jinwei;Sarfarazi, Vahab;Haeri, Hadi;Marji, Mohammad Fatehi;Guo, Mengdi
    • Structural Engineering and Mechanics / v.81 no.1 / pp.81-91 / 2022
  • The effects of the interaction between the concrete-gypsum interface and an edge crack on the failure behavior of specimens in the semicircular bend (SCB) test were studied in the laboratory and also simulated numerically using the discrete element method. Quarter-circular specimens of gypsum and concrete with 5 cm radii and heights were prepared separately. The semicircular testing specimens were then made by attaching one gypsum and one concrete sample to one another using a special glue, and one edge crack was produced (in the interface) by not applying the glue to that part of the interface. The tensile strengths of the concrete and gypsum samples were measured separately as 2.2 MPa and 1.3 MPa, respectively. A constant loading rate of 0.005 mm/s was maintained in all tests. The proposed testing method showed that the mechanism of failure and fracture in brittle materials is mostly governed by the dimensions and number of discontinuities. The fracture toughnesses of the SCB samples were related to the fracture patterns during the failure processes of these specimens. The tensile behaviour of the edge notch was related to the number of induced tensile cracks, which increased as the joint length decreased. The fracture toughness of the samples remained constant as the joint length increased. The failure process and fracture pattern in the notched semi-circular bending specimens were similar for both methods used in this study (i.e., the laboratory tests and the simulation procedure using the particle flow code (PFC2D)).

Response Modeling for the Marketing Promotion with Weighted Case Based Reasoning Under Imbalanced Data Distribution (불균형 데이터 환경에서 변수가중치를 적용한 사례기반추론 기반의 고객반응 예측)

  • Kim, Eunmi;Hong, Taeho
    • Journal of Intelligence and Information Systems / v.21 no.1 / pp.29-45 / 2015
  • Response modeling is a well-known research issue for those who have tried to achieve better performance in predicting customers' responses to marketing promotions. A response model reduces marketing cost by identifying prospective customers in a very large customer database and predicting the purchasing intention of the selected customers, whereas a promotion derived from an undifferentiated marketing strategy results in unnecessary cost. In addition, the big data environment has accelerated the development of response models with data mining techniques such as CBR, neural networks and support vector machines. CBR is one of the major tools in business because it is known to be simple and robust when applied to response models. CBR remains an attractive data mining technique for business applications even though it has not shown high performance compared to other machine learning techniques. Thus many studies have tried to improve CBR and apply it to business data mining with enhanced algorithms or with the support of other techniques such as genetic algorithms, decision trees and AHP (Analytic Hierarchy Process). Ahn and Kim (2008) used logit, neural networks and CBR to predict which customers would purchase the items promoted by a marketing department, and optimized the number k of nearest neighbors with a genetic algorithm to improve the performance of the integrated model. Hong and Park (2009) noted that an integrated approach combining CBR with logit, neural networks, and Support Vector Machine (SVM) predicted customers' responses to marketing promotions better than the individual data mining models such as logit, neural networks, and SVM. This paper presents an approach to predicting customers' responses to marketing promotions with Case Based Reasoning. The proposed model was developed by applying a different weight to each feature. We fitted a logit model to a database including the promotion and purchasing data of bath soap. The coefficients were then used to assign the feature weights in CBR. We empirically compared the performance of the proposed weighted CBR based model with neural networks and a pure CBR based model and found that the proposed weighted CBR based model outperformed the pure CBR model. Imbalanced data is a common problem when building data mining models that classify real data, as in bankruptcy prediction, intrusion detection, fraud detection, churn management, and response modeling. Imbalanced data means that the number of instances in one class is remarkably small or large compared to the number of instances in other classes. A classification model such as a response model has great difficulty recognizing patterns in such data through learning, because the model tends to ignore the minority class while classifying the majority class correctly. Sampling is one of the most representative approaches to resolving the problem caused by imbalanced data distribution. Sampling methods can be categorized into under-sampling and over-sampling. However, CBR is not sensitive to data distribution because, unlike machine learning algorithms, it does not learn from data. In this study, we investigated the robustness of our proposed model while changing the ratio of response customers to nonresponse customers for the promotion program, because response customers for a suggested promotion are always a small fraction compared with nonresponse customers in the real world. We simulated the proposed model 100 times to validate its robustness with different ratios of response customers to nonresponse customers under the imbalanced data distribution. Finally, we found that our proposed CBR based model outperformed the compared models on the imbalanced data sets. Our study is expected to improve the performance of response models for promotion programs with CBR under imbalanced data distribution in the real world.
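
The feature-weighting idea in the abstract above can be sketched as a weighted nearest-neighbour retrieval. This is only an illustration under stated assumptions, not the authors' procedure: the abstract does not say how the logit coefficients are mapped to weights, so the normalised absolute values used in the hypothetical `weights_from_coefficients` helper are an assumption, as are the function names, the coefficient values, the majority vote, and the toy case base.

```python
import math


def weights_from_coefficients(coefficients):
    """Assumption: use normalised absolute logit coefficients as feature weights."""
    magnitudes = [abs(c) for c in coefficients]
    total = sum(magnitudes)
    return [m / total for m in magnitudes]


def weighted_distance(a, b, weights):
    """Weighted Euclidean distance between two feature vectors."""
    return math.sqrt(sum(w * (x - y) ** 2 for x, y, w in zip(a, b, weights)))


def predict_response(case_base, query, weights, k=3):
    """Retrieve the k most similar past cases and take a majority vote (1 = responded)."""
    ranked = sorted(case_base, key=lambda case: weighted_distance(case[0], query, weights))
    neighbours = ranked[:k]
    votes = sum(label for _, label in neighbours)
    return 1 if votes * 2 > len(neighbours) else 0


# Hypothetical case base: (feature vector, response label).
case_base = [
    ((0.90, 0.1), 1),
    ((0.80, 0.9), 1),
    ((0.10, 0.2), 0),
    ((0.20, 0.8), 0),
    ((0.15, 0.5), 0),
]

# Hypothetical logit coefficients for the two features.
weights = weights_from_coefficients([2.0, -0.5])

print(predict_response(case_base, (0.85, 0.5), weights))
print(predict_response(case_base, (0.10, 0.5), weights))
```

With these weights the first feature dominates the distance, so cases that agree on it are retrieved first even when they differ on the second feature; setting all weights equal would recover an unweighted ("pure") nearest-neighbour retrieval.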

Distribution of ATP in the Deep-Sea Sediment in the KODOS 97-2 Area, Northeast Equatorial Pacific Ocean (북동적도 태평양 KODOS 97-2 해역 심해저 퇴적물 내의 ATP 분포양상)

  • Hyun, Jung-Ho;Kim, Kyeong-Hong;Chi, Sang-Bum;Moon, Jai-Woon
    • The Sea:JOURNAL OF THE KOREAN SOCIETY OF OCEANOGRAPHY / v.3 no.3 / pp.142-148 / 1998
  • Environmental baseline information is necessary in order to assess the potential environmental impact of future manganese-nodule mining on the deep-seabed ecosystem. Total ATP (T-ATP), dissolved ATP (D-ATP) and particulate ATP (P-ATP) were measured to estimate total microbial biomass and to elucidate their vertical distribution patterns in the seabed of the KODOS (Korea Deep Ocean Study) area, northeast equatorial Pacific Ocean. Within the upper 6 cm of sediment, the concentrations of T-ATP, D-ATP and P-ATP ranged from 4.4 to 40.6, from 0.6 to 16.1, and from 3.0 to 29.2 ng/g dry sediment, respectively. Approximately 84% of T-ATP, 81% of D-ATP, and 74% of P-ATP were present within the topmost 2 cm of sediment, and the distributions of ATP were well correlated with water content in the sediment. These results indicate that the distribution of total microbial biomass was largely determined by the supply of organic matter from the surface water column. Fine-scale vertical variations of ATP were detected within the 1-cm thick veneer of the sediment samples collected by a multiple corer, while no apparent vertical changes were observed in the box-cored samples. It is evident that the box-core samples were disturbed extensively during sampling, which suggests that the multiple corer is a more appropriate sampling gear for measuring the fine-scale vertical distribution pattern of ATP within a thin sediment veneer. Overall, the results suggest that the concentrations of ATP, given their clear changes in vertical distribution pattern within 6 cm depth of sediment, are a suitable environmental baseline parameter for evaluating the variations of benthic microbial biomass that are likely to be caused by deep-seabed mining operations.

A Study on the Prediction of Residual Probability of Fine Dust in Complex Urban Area (복잡한 도심에서의 유입된 미세먼지 잔류 가능성 예보 연구)

  • Park, Sung Ju;Seo, You Jin;Kim, Dong Wook;Choi, Hyun Jeong
    • Journal of the Korean earth science society / v.41 no.2 / pp.111-128 / 2020
  • This study presents the possibility that fine dust mass concentration is intensified by complex urban structure, using a data mining technique and clustering analysis. The data mining technique showed no significant correlation between fine dust concentration and regional-use public urban data over Seoul. However, clustering analysis based on nationwide-use public data showed that building heights (floors) have a strong correlation, particularly with PM10. The modeling analyses using the single canopy model and the micro-atmospheric modeling program (ENVI-Met. 4) showed that controlled atmospheric convection in urban areas led to a congested flow pattern depending on the distribution and height of the buildings. The complex structure of urban buildings controls convective activity, resulting in stagnant conditions and an increase in fine dust near the surface. Consequently, the residual effect of changes in the thermal environment caused by the shape and structure of urban buildings must be considered in the fine dust distribution. It is notable that this atmospheric congestion may be identified as having important implications for providing information about the residual probability of fine dust mass concentration in complex urban areas.