• Title/Summary/Keyword: Activity Pattern Mining

Research on Methods for Processing Nonstandard Korean Words on Social Network Services (소셜네트워크서비스에 활용할 비표준어 한글 처리 방법 연구)

  • Lee, Jong-Hwa;Le, Hoanh Su;Lee, Hyun-Kyu
    • Journal of Korea Society of Industrial Information Systems / v.21 no.3 / pp.35-46 / 2016
  • Social network services (SNS), which help users build relationship networks and freely share particular interests or activities by posting comments, photos, videos, and the like on online communities such as blogs, have been widely adopted and have developed into a social phenomenon. Several studies have explored patterns and valuable information in social network data via text mining techniques such as opinion mining and semantic analysis. Keyword-based approaches have been applied to improve the efficiency of text mining, but most researchers have pointed to limitations arising from the rules of Korean orthography. This research aims to construct a database of non-standard Korean words that are difficult to handle in data mining, such as abbreviations, slang, unusual expressions, and emoticons, in order to overcome the limitations of keyword-based text mining techniques. Based on a study of subjective opinions about specific topics on blogs, this research extracted non-standard words that were found to be useful in the text mining process.

A Study on Autonomic Analysis for Servicing Intelligent Gas Safety Management Based on RFID/USN (RFID/USN 기반 지능형 가스안전관리 서비스를 위한 자율적 분석 연구)

  • Oh, Jeong-Seok;Choi, Kyung-Seok;Kwon, Jeong-Rock;Yoon, Ki-Bong
    • Journal of the Korean Society of Safety / v.23 no.6 / pp.51-56 / 2008
  • As RFID/USN technology is adopted as the latest industry trend, the information analysis paradigm is shifting toward an intelligent service environment. Intelligent services include autonomic operation, which selects an activity by itself according to the status of industrial facilities. Furthermore, IT-based information analysis frequently uses data mining to detect meaningful information and derive new patterns. This paper proposes context-aware self-classification by applying data mining to gas facilities in order to provide intelligent gas safety management. We modify a data mining algorithm to fit the domain of gas safety, construct a context-aware model using the proposed algorithm, and demonstrate our method. Since the accuracy of our model improved to over 90%, our approach can be applied to intelligent gas safety management in RFID/USN environments.

The Analysis of Individual Learning Status on Web-Based Instruction (웹기반 교육에서 학습자별 학습현황 분석에 관한 연구)

  • Shin, Ji-Yeun;Jeong, Ok-Ran;Cho, Dong-Sub
    • The Journal of Korean Association of Computer Education / v.6 no.2 / pp.107-120 / 2003
  • In web-based instruction, evaluating the learning process means evaluating each student's learning activity, which requires data on learning time, pattern, participation, and environment for specific learning content. The purpose of this paper is to reflect the results of analyzing individual students' learning status in achievement evaluation, using the most suitable web log mining techniques to address the problem of evaluating the learning process, an open issue in web-based instruction. The contents and results of this study are as follows. First, the conformity items for learning status analysis are determined and web log data preprocessing is performed. Second, on the basis of the web log data, I construct a student database and analyze learning status using data mining techniques.

A MapReduce-Based Workflow BIG-Log Clustering Technique (맵리듀스기반 워크플로우 빅-로그 클러스터링 기법)

  • Jin, Min-Hyuck;Kim, Kwanghoon Pio
    • Journal of Internet Computing and Services / v.20 no.1 / pp.87-96 / 2019
  • In this paper, we propose a MapReduce-supported clustering technique for collecting and classifying distributed workflow enactment event logs as a preprocessing tool. We refer to distributed workflow enactment event logs as Workflow BIG-Logs because they satisfy, and are well suited to, the 5V properties of big data: Volume, Velocity, Variety, Veracity, and Value. The clustering technique developed in this paper is intentionally devised for the preprocessing phase of a specific workflow process mining and analysis algorithm that operates on Workflow BIG-Logs. In other words, it uses the MapReduce framework as a Workflow BIG-Logs processing platform, supports the IEEE XES standard data format, and is ultimately dedicated to the preprocessing phase of the $\rho$-Algorithm, a typical workflow process mining algorithm based on structured information control nets. More precisely, Workflow BIG-Logs clustering patterns can be classified into two types, activity-based and performer-based, and we attempt to implement an activity-based clustering pattern algorithm on the MapReduce framework. Finally, we attempt to verify the proposed clustering technique through an experimental study on the workflow enactment event log dataset released by the BPI Challenges.
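
The activity-based clustering pattern described in this abstract can be sketched as a map step that keys each event by its activity and a reduce step that groups events sharing a key. This is a minimal single-process sketch, not the authors' implementation: the event fields (`case`, `activity`, `performer`) and the function names are hypothetical, and a real deployment would presumably read IEEE XES logs and run on a MapReduce platform.

```python
from collections import defaultdict


def map_event(event):
    """Map step: emit an (activity, event) pair so events are keyed by activity."""
    return (event["activity"], event)


def reduce_by_key(pairs):
    """Reduce step: collect all events that share the same key into one cluster."""
    clusters = defaultdict(list)
    for key, event in pairs:
        clusters[key].append(event)
    return dict(clusters)


# Hypothetical workflow enactment events (field names are assumptions).
events = [
    {"case": "c1", "activity": "Submit", "performer": "alice"},
    {"case": "c1", "activity": "Approve", "performer": "bob"},
    {"case": "c2", "activity": "Submit", "performer": "carol"},
]

clusters = reduce_by_key(map(map_event, events))
```

Under the same assumptions, keying on `performer` instead of `activity` in the map step would give a performer-based grouping.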

Usefulness of Data Mining in Criminal Investigation (데이터 마이닝의 범죄수사 적용 가능성)

  • Kim, Joon-Woo;Sohn, Joong-Kweon;Lee, Sang-Han
    • Journal of forensic and investigative science / v.1 no.2 / pp.5-19 / 2006
  • Data mining is an information extraction activity that discovers hidden facts contained in databases. Using a combination of machine learning, statistical analysis, modeling techniques, and database technology, data mining finds patterns and subtle relationships in data and infers rules that allow the prediction of future results. Typical applications include market segmentation, customer profiling, fraud detection, evaluation of retail promotions, and credit risk analysis. Law enforcement agencies deal with massive amounts of data when investigating crimes, and the volume is increasing as computerized data processing develops. We now face the new challenge of discovering knowledge in that data. Data mining can be applied in criminal investigation to find offenders by analyzing complex relational data structures and free text drawn from criminal records or statements. This study aimed to evaluate the possible applications of data mining and its limitations in practical criminal investigation. Clustering of criminal cases becomes possible for habitual crimes such as fraud and burglary when data mining is used to identify crime patterns. Neural network modelling, one of the tools of data mining, can be applied to comparing a suspect's photograph or handwriting with that of a convict, or to criminal profiling. A case study of a practical insurance fraud showed that data mining was useful for organized crimes such as gang activity, terrorism, and money laundering. However, the products of data mining in criminal investigation should be evaluated cautiously because data mining offers only clues, not conclusions. Legal regulation is needed to control abuse by law enforcement agencies and to protect personal privacy and human rights.

A Study on the Developement of Soil Geochemical Exploration Method for Metal Ore Deposits Affected by Agricultural Activity (농경작업 영향지역의 금속광상에 대한 토양 지구화학 탐사법 개발 연구)

  • Kim, Oak-Bae;Lee, Moo-Sung
    • Economic and Environmental Geology / v.25 no.2 / pp.145-151 / 1992
  • To determine the optimum depth for soil geochemical exploration in areas affected by agricultural activities and metal mine waste disposal, soil samples were collected from the B layer of residual soil, from seven vertical layers down to 250 cm in the rice field, and from three layers down to 90 cm in the ordinary field. They were analyzed for Au, As, Cu, Pb, and Zn by AAS, graphite furnace AAS, and ICP. To investigate the proper depth for soil sampling in the contaminated area, the data were treated statistically using correlation coefficients, factor analysis, and trend analysis. It is concluded that the soil geochemical exploration method can be applied in farmland and slightly contaminated areas. The optimum depth of soil sampling is 60 cm in the ordinary field and 150 to 200 cm in the rice field. Soil sampling in areas of large-scale mine waste disposal is not recommended. A geochemical map plotted with factor scores as input data shows a clearer pattern than a map of an indicator element such as As or Au. Second- or third-degree trend surface analysis is effective for inferring the continuity of a vein in areas where the outcrop is not visible.

A Study on the Prediction of Residual Probability of Fine Dust in Complex Urban Area (복잡한 도심에서의 유입된 미세먼지 잔류 가능성 예보 연구)

  • Park, Sung Ju;Seo, You Jin;Kim, Dong Wook;Choi, Hyun Jeong
    • Journal of the Korean earth science society / v.41 no.2 / pp.111-128 / 2020
  • This study presents the possibility that fine dust mass concentration intensifies due to complex urban structure, using a data mining technique and clustering analysis. The data mining technique showed no significant correlation between fine dust concentration and regional-use public urban data over Seoul. However, clustering analysis based on nationwide-use public data showed that building height (number of floors) has a strong correlation with PM10 in particular. Modeling analyses using the single canopy model and the micro-atmospheric modeling program (ENVI-met 4) showed that controlled atmospheric convection in urban areas led to a congested flow pattern that depends on the distribution and height of buildings. The complex structure of urban buildings controls convective activity, resulting in stagnant conditions and an increase in fine dust near the surface. Consequently, the residual effect arising from changes in the thermal environment caused by the shape and structure of urban buildings must be considered in the fine dust distribution. It is notable that atmospheric congestion may be misidentified as an important implication for providing information about the residual probability of fine dust mass concentration in complex urban areas.

An Expert System for the Estimation of the Growth Curve Parameters of New Markets (신규시장 성장모형의 모수 추정을 위한 전문가 시스템)

  • Lee, Dongwon;Jung, Yeojin;Jung, Jaekwon;Park, Dohyung
    • Journal of Intelligence and Information Systems / v.21 no.4 / pp.17-35 / 2015
  • Demand forecasting is the activity of estimating the quantity of a product or service that consumers will purchase over a certain period of time. Developing precise forecasting models is considered important because corporations can make strategic decisions about new markets based on the future demand the models estimate. Many studies have developed market growth curve models, such as the Bass, Logistic, and Gompertz models, which estimate future demand when a market is in its early stage. Among these, the Bass model, which explains demand in terms of two types of adopters, innovators and imitators, has been widely used in forecasting. Such models require sufficient demand observations to ensure reliable results. At the beginning of a new market, however, observations are too few for the models to estimate the market's future demand precisely. For this reason, demand inferred from the most adjacent markets is often used as a reference in such cases. Reference markets can be those whose products are developed with technologies in the same category. A market's demand may be expected to follow a pattern similar to that of a reference market when the adoption pattern of a product in the market is determined mainly by the technology related to the product. However, such processes may not always produce satisfactory results because the similarity between markets is judged by intuition and/or experience. There are two major drawbacks that human experts cannot effectively handle in this approach. One is the abundance of candidate reference markets to consider, and the other is the difficulty of calculating the similarity between markets. First, there can be too many markets to consider when selecting reference markets. In most cases, markets in the same category of an industrial hierarchy can be reference markets because they are usually based on similar technologies.
However, markets can be classified into different categories even if they are based on the same generic technologies. Therefore, markets in other categories also need to be considered as potential candidates. Next, even domain experts cannot consistently calculate the similarity between markets using their own qualitative standards. This inconsistency means that adjacent reference markets may be missed, which may lead to imprecise estimation of future demand. Even when no reference markets are missing, the new market's parameters can hardly be estimated from the reference markets without quantitative standards. For this reason, this study proposes a case-based expert system that helps experts overcome these drawbacks in discovering reference markets. First, this study proposes using the Euclidean distance measure to calculate the similarity between markets. Based on their similarities, markets are grouped into clusters. Then, missing markets with the characteristics of the cluster are searched for. Potential candidate reference markets are extracted and recommended to users. After these steps are iterated, the final reference markets are determined according to the user's selection among the candidates. Finally, the new market's parameters are estimated from the reference markets. Two techniques are used in the model for this procedure. One is the clustering technique from data mining, and the other is the content-based filtering technique from recommender systems. The proposed system, implemented with these techniques, can determine the most adjacent markets based on whether a user accepts candidate markets. Experiments involving five ICT experts were conducted to validate the usefulness of the system. In the experiments, the experts were given a list of 16 ICT markets whose parameters were to be estimated. For each market, the experts estimated the parameters of its growth curve models first by intuition and then with the system.
The comparison of the experimental results shows that the estimated parameters are closer when the experts use the system than when they guess them without it.
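
The similarity step in this abstract, Euclidean distance between markets followed by estimation from the closest reference markets, can be illustrated with a small sketch. It is a simplification under stated assumptions, not the authors' system: it replaces the clustering and content-based filtering stages with a plain nearest-neighbour lookup, the feature vectors and the Bass-style `(p, q)` parameter values are made up, and averaging the reference parameters is an assumed estimation rule.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_markets(target, features, k):
    """Return the names of the k markets closest to the target vector."""
    ranked = sorted(features, key=lambda name: euclidean(target, features[name]))
    return ranked[:k]


def estimate_parameters(references, params):
    """Average each parameter over the reference markets (an assumed rule)."""
    count = len(references)
    return tuple(sum(params[m][i] for m in references) / count for i in range(2))


# Hypothetical feature vectors and (p, q) parameters for known markets.
features = {"market_a": (1.0, 2.0), "market_b": (1.5, 2.5), "market_c": (8.0, 9.0)}
params = {"market_a": (0.01, 0.3), "market_b": (0.03, 0.5), "market_c": (0.10, 0.9)}

new_market = (1.2, 2.1)  # hypothetical features of the new market
references = nearest_markets(new_market, features, k=2)
p, q = estimate_parameters(references, params)
```

In the system the abstract describes, the candidate list would be shown to an expert, whose acceptance or rejection decides the final reference markets; the sketch skips that interaction.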

Development of Yóukè Mining System with Yóukè's Travel Demand and Insight Based on Web Search Traffic Information (웹검색 트래픽 정보를 활용한 유커 인바운드 여행 수요 예측 모형 및 유커마이닝 시스템 개발)

  • Choi, Youji;Park, Do-Hyung
    • Journal of Intelligence and Information Systems / v.23 no.3 / pp.155-175 / 2017
  • As social data come into the spotlight, mainstream web search engines provide data indicating how many people searched for a specific keyword: web search traffic data. Web search traffic information is an aggregate of the crowds that search for a specific keyword. In various areas, web search traffic can be used as a useful variable representing the attention of ordinary users to specific interests. Many studies use web search traffic data to nowcast or forecast social phenomena, such as epidemic prediction, consumer pattern analysis, product life cycles, and financial investment modeling. Web search traffic data have also begun to be applied to predicting inbound tourism. Proper demand prediction is needed because tourism is a high value-added industry that increases employment and foreign exchange earnings. Among inbound tourists, the number of Chinese tourists, known as Youke, is growing continuously; Youke have been the largest inbound tourist group in Korean tourism for many years, and tourism revenue per Youke is also the highest. Research into proper demand prediction approaches for Youke is therefore important in both the public and private sectors. Accurate tourism demand prediction is important for efficient decision making with limited resources. This study suggests an improved model that reflects the latest social issues by capturing the attention of groups of individuals. Traveling abroad is generally a high-involvement activity, so potential tourists are likely to search deeply for information about their trip. Web search traffic data present tourists' attention during trip preparation in an instantaneous and dynamic way. This study therefore attempted to select the keywords that potential Chinese tourists are likely to search for on the internet. Baidu, China's largest web search engine with a share of over 80%, provides users with access to web search traffic data.
Qualitative interviews with potential tourists helped us understand information search behavior before a trip and identify the keywords for this study. The selected web search traffic keywords are categorized into three levels according to how directly they relate to "Korean Tourism". Classifying the categories helps to find out which keywords can explain Youke inbound demand, from the closest category to the farthest. Web search traffic data for each keyword were gathered by a web crawler developed to crawl web search data from Baidu Index. Using the automatically gathered variable data, a linear model was designed by multiple regression analysis, which suits operational decision and policy making because the effects of the variables are easy to explain. After the linear regression models were built, a model composed of traditional variables was compared, by significance and R squared, with a model that adds web search traffic variables to the traditional model. After the performance of the models was compared, the final model was composed. The final regression model offers improved explanatory power and the advantages of real-time immediacy and convenience over the traditional model. Furthermore, this study demonstrates a system that is intuitively visualized for general use: the Youke Mining solution has several functions supporting tourism decision making, including the embedded final regression model. The Youke Mining solution has an algorithm based on data science and a well-designed, simple interface. Finally, this research has three significant implications, in theoretical, practical, and policy terms. Theoretically, the Youke Mining system and the model in this research are a first step toward Youke inbound prediction using an interactive and instant variable: web search traffic information, which represents tourists' attention while they prepare their trips. Baidu accounts for more than 80% of the web search engine market.
Practically, Baidu data can represent, in real time, the attention of potential tourists who are preparing their own tours. Finally, in policy terms, the Chinese tourist demand prediction model based on web search traffic can be used in tourism decision making for efficient resource management and for optimizing opportunities for successful policy.
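
The model comparison in this abstract, a regression on traditional variables versus the same regression with a web search traffic variable added, judged by R squared, can be sketched as follows. Everything here is an illustrative assumption: the eight observations are invented, the variable names are hypothetical, and the normal equations are solved in plain Python only to keep the sketch self-contained (a statistics library would normally be used, and it would also report significance, which this sketch omits).

```python
def solve(A, b):
    """Solve the linear system A x = b by Gauss-Jordan elimination with pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [a - factor * c for a, c in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def r_squared(rows, y):
    """Fit ordinary least squares with an intercept and return R squared."""
    X = [[1.0] + list(r) for r in rows]
    k = len(X[0])
    XtX = [[sum(x[i] * x[j] for x in X) for j in range(k)] for i in range(k)]
    Xty = [sum(x[i] * yi for x, yi in zip(X, y)) for i in range(k)]
    beta = solve(XtX, Xty)
    fitted = [sum(b * v for b, v in zip(beta, x)) for x in X]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


# Invented data: one traditional variable, one search traffic index, arrivals.
traditional = [1.0, 1.2, 1.1, 1.4, 1.3, 1.6, 1.5, 1.8]
search = [3.0, 2.0, 4.0, 3.5, 5.0, 2.5, 4.5, 6.0]
arrivals = [21.2, 19.7, 23.6, 24.1, 26.3, 23.2, 26.4, 31.1]

r2_base = r_squared([(t,) for t in traditional], arrivals)
r2_aug = r_squared(list(zip(traditional, search)), arrivals)
```

Because the augmented model nests the baseline, its R squared can never be lower on the same data, so in practice the size of the gain and the significance of the added coefficient are what matter.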

The Geochemistry of Copper-bearing Hydrothermal Vein Deposits in Goseong Mining District (Samsan Area), Gyeongsang Basin, Korea (경상분지내 삼산지역 열수동광상에 관한 지화학적 연구)

  • Choi, Sang Hoon;So, Chil Sup;Kweon, Soon Hag;Choi, Kwang Jun
    • Economic and Environmental Geology / v.27 no.2 / pp.147-160 / 1994
  • Copper-bearing hydrothermal vein mineralization of the Samsan area was deposited in two stages (I and II) of quartz-calcite-sulfide veins that fill fissures in Cretaceous volcanic and sedimentary rocks of the Gyeongsang basin. The major ore minerals, chalcopyrite and sphalerite, together with pyrite, galena, hematite, and minor sulfosalts, occur with epidote and chlorite as gangue minerals in stage I quartz veins. Chlorite geothermometry, fluid inclusion, and stable isotope data indicate that copper ore was deposited mainly at temperatures between $330^{\circ}C$ and $280^{\circ}C$ from fluids with salinities between 12 and 3 equiv. wt % NaCl. Evidence of fluid boiling indicates a range of pressures from ${\leq}100$ to 200 bars. Within ore stage I there was an apparent decrease in ${\delta}^{34}S$ values of $H_{2}S$ with paragenetic time, from 8.0 to 2.3 per mil. This pattern was likely achieved through progressive increases in the activity of oxygen accompanying boiling and mixing. In the early part of the first stage, high temperature, high salinity fluids gave way to progressively cooler and more dilute fluids in the later part of the first stage and in the second stage. There is a systematic decrease in calculated ${\delta}^{18}O_{water}$ values with decreasing temperature in the Samsan hydrothermal system, from values of -86 per mil for the early portion of stage I through -5.9 per mil for the late portion of stage I to -6.3 per mil for stage II. The ${\delta}D$ values of fluid inclusion waters also decrease with paragenetic time, from -76 per mil to -86 per mil. These trends, combined with mineral paragenesis and fluid inclusion data, are interpreted to indicate progressively cooler, more oxidizing meteoric water inundation of an early exchanged meteoric hydrothermal system.
