• Title/Summary/Keyword: data-mining method

A Study on Factors of Internet Overdependence for Adults Using the Decision Tree Analysis Model (성인층의 인터넷 과의존 영향요인: 의사결정나무분석을 활용하여)

  • Seo, Hyung-Jun;Shin, Ji-Woong
    • Informatization Policy / v.25 no.2 / pp.20-45 / 2018
  • This study identifies the factors behind Internet overdependence in adults using decision tree analysis, a data mining method, applied to the National Information Society Agency's raw data from its 2016 survey on Internet overdependence. The decision tree analysis identified a total of 16 nodes of Internet overdependence risk groups. The main predictor variables, in descending order of impact on the risk groups, were: amount of time spent per smart media use on weekdays; amount of time spent per smart media use on weekends; experience of purchasing cash items; percentage of smart media use for leisure; negative personality; percentage of smart media use for information search and utilization; and awareness of the good functions of the Internet. Users in the highest-risk node used smart media for more than 5 minutes per use and less than 5~10 minutes on weekdays, had purchased cash items, and had a lower level of awareness of the good functions of the Internet. The analysis led to the following recommendations. First, even short-time use has a higher chance of causing Internet overdependence, so guidelines need to be developed from research on usage behavior rather than usage time. Second, self-regulation is required because factors that affect overindulgence in games, such as cash items, increase Internet overdependence. Third, using the Internet for leisure carries a higher risk of overdependence, so other means of leisure should be recommended.

Visualization of Flow in a Transonic Centrifugal Compressor

  • Hayami Hiroshi
    • Proceedings of the Korean Society of Visualization Conference (한국가시화정보학회:학술대회논문집) / 2002.11a / pp.1-6 / 2002
  • What does the flow in a rotating impeller look like? About 35 years have passed since an experimentalist, rotating with the impeller of a huge centrifugal blower, made flow measurements using a hot-wire anemometer (Fowler, 1968). Optical measurement methods have great advantages over intrusive methods, especially for flow measurement in a rotating impeller. One is the optical flow visualization (FV) technique (Senoo, et al., 1968) and the other is the application of laser velocimetry (LV) (Hah and Krain, 1990). Particle image velocimetry (PIV) combines major features of both FV and LV and is very attractive because it allows simultaneous, multi-point measurements (Hayami and Aramaki, 1999). A high-pressure-ratio transonic centrifugal compressor with a low-solidity cascade diffuser was tested in a closed loop with HFC134a gas at 18,000 rpm (Hayami, 2000). Two image-processing measurement techniques were applied to visualize the flow in the compressor. One is a velocity field measurement at the inducer of the impeller using PIV, and the other is a pressure field measurement on the side wall of the cascade diffuser using pressure-sensitive paint (PSP). PIV was successfully applied to visualize the unsteady behavior of a shock wave based on instantaneous velocity field measurements (Hayami, et al., 2002b), as well as a phase-averaged velocity vector field with a shock wave over one blade pitch (Hayami, et al., 2002a, b). A violent change in pressure during a surge condition was successfully visualized using PSP, although some problems remain to be overcome (Hayami, et al., 2002c). Both the PIV and PSP results are discussed in comparison with those of laser-2-focus (L2F) velocimetry and of semiconductor pressure sensors. Experimental fluid dynamics (EFD) is still advancing in both hardware and software.
On the other hand, computational fluid dynamics (CFD) is very attractive for understanding the details of a flow. A secondary flow on the side wall of the cascade diffuser was visualized based on either steady or unsteady CFD calculations (Bonaiuti, et al., 2002). EFD and CFD methods will be combined into a hybrid method in which each complements the other. Image-processing measurement techniques and CFD calculations both produce huge amounts of data, so data mining techniques will become more important for understanding flow mechanisms in both EFD and CFD.

Construction of Mine Geospatial Information by Total Station and 3D Laser Scanner (토털스테이션과 3D 레이저 스캐너에 의한 광산공간정보 구축)

  • Park, Joon-Kyu;Lee, Keun-Wang
    • Journal of the Korea Academia-Industrial cooperation Society / v.20 no.3 / pp.520-525 / 2019
  • Mines are important infrastructure for securing resources, but safety problems can arise during operation. Mining processes have recently become very complicated because of their large scale and mechanization. Accurate geospatial information on a mine is therefore necessary for systematic and safe operation. Constructing geospatial information with a conventional total station has the disadvantage of requiring a lot of work time, because each target must be collimated and measured. In this study, mine data were acquired with a total station and a 3D laser scanner, and the mine spatial information was constructed using a shape-based registration method. By applying the total station's reference point survey results to static scanner data covering part of the area, it was possible to construct accurate results effectively for the wide area acquired by the mobile scanner. The accuracy of the constructed geospatial information was also evaluated, showing a mean deviation of 0.083 m. The point cloud products constructed in this research can contribute to more efficient mine management by enabling quantitative analysis, such as visualization of mine shape, distance, area, and slope, and by automating the creation of cross-section drawings.

A Classification Model for Illegal Debt Collection Using Rule and Machine Learning Based Methods

  • Kim, Tae-Ho;Lim, Jong-In
    • Journal of the Korea Society of Computer and Information / v.26 no.4 / pp.93-103 / 2021
  • Despite financial authorities' efforts to directly manage and supervise collection agents and to apply debt-collection guidelines, illegal and unfair debt collection persists. To prevent such activities effectively, we need a method that strengthens the monitoring of illegal collection activities even with little manpower, using technologies such as machine learning on unstructured data. In this study, we propose a classification model for illegal debt collection that combines machine learning, such as a Support Vector Machine (SVM), with a rule-based technique; it obtains the collection transcripts of loan companies and converts them into text data to identify illegal activities. The study also compares identification accuracy across machine learning algorithms. The results show that combining rule-based detection rules with machine learning for classification yields higher accuracy than the classification model of the previous study, which applied only machine learning. This study is the first attempt to classify illegalities by combining rule-based illegal detection rules with machine learning. If further research is conducted to improve the model's completeness, it will contribute greatly to preventing consumer damage from illegal debt collection activities.

The Effect of Changes in Airbnb Host's Marketing Strategy on Listing Performance in the COVID-19 Pandemic (COVID-19 팬데믹에서 Airbnb 호스트의 마케팅 전략의 변화가 공유성과에 미치는 영향)

  • Kim, So Yeong;Sim, Ji Hwan;Chung, Yeo Jin
    • Journal of Intelligence and Information Systems / v.27 no.3 / pp.1-27 / 2021
  • The entire tourism industry is being hit hard by the global COVID-19 pandemic. Accommodation sharing services such as Airbnb, which have recently expanded with the spread of the sharing economy, are particularly affected by the pandemic because transactions are based on trust and communication between consumer and supplier. As the pandemic changes individuals' perceptions and behavior regarding travel, strategies for the recovery of the tourism industry have been discussed. However, since most studies present macro strategies from the perspective of traditional lodging providers and the government, there is a significant lack of discussion of differentiated pandemic response strategies that consider the peculiarity of the sharing economy, which centers on peer-to-peer transactions. This study discusses marketing strategy for individual Airbnb hosts during COVID-19. We empirically analyze the effect of changes in the listing descriptions posted by Airbnb hosts on listing performance after the outbreak of COVID-19. We extract nine aspects described in the listing descriptions using the Attention-Based Aspect Extraction model, a deep learning-based aspect extraction method. We model the effect of aspect changes on listing performance after the outbreak by observing the frequency with which each aspect appears in the text. In addition, we compare those effects across Airbnb listing types. On this basis, the study presents an idea for a pandemic crisis response strategy that individual providers of accommodation sharing services can adopt depending on the listing type.

The Analysis of Changes in East Coast Tourism using Topic Modeling (토핑 모델링을 활용한 동해안 관광의 변화 분석)

  • Jeong, Eun-Hee
    • The Journal of Korea Institute of Information, Electronics, and Communication Technology / v.13 no.6 / pp.489-495 / 2020
  • The amount of data generated by various IT devices is increasing in a hyper-connected society where the Fourth Industrial Revolution is progressing, and new value can be created by analyzing that data. This paper collected a total of 1,526 articles published from 2017 to 2019 in central magazines, economic magazines, regional associations, and major broadcasting companies, using the keyword "(East Coast Tourism or East Coast Travel) and Gangwon-do" through Bigkinds. Topic modeling was performed on the collected 1,526 articles using the LDA algorithm implemented in the R language. Keywords were extracted for each year from 2017 to 2019, and the high-frequency keywords for each year were classified and compared. The optimal number of topics was set to 8 using log likelihood and perplexity, and 8 topics were then inferred using the Gibbs sampling method. The inferred topics were Gangneung and Beach, Goseong and Mt. Geumgang, KTX and Donghae-Bukbu line, weekend sea tour, Sokcho and Unification Observatory, Yangyang and Surfing, experience tour, and transportation network infrastructure. Changes in the articles on East Coast tourism were analyzed using the proportions of the eight inferred topics. As a result, in 2018 compared to 2017, the proportions of Unification Observatory and Mt. Geumgang showed no significant change, the proportions of KTX and experience tour increased, and the proportions of the other topics decreased. In 2019, the proportions of KTX and experience tour decreased, while the proportions of the other topics showed no significant change.

A Study on the Analysis of Related Information through the Establishment of the National Core Technology Network: Focused on Display Technology (국가핵심기술 관계망 구축을 통한 연관정보 분석연구: 디스플레이 기술을 중심으로)

  • Pak, Se Hee;Yoon, Won Seok;Chang, Hang Bae
    • The Journal of Society for e-Business Studies / v.26 no.2 / pp.123-141 / 2021
  • As the economic structure becomes more dependent on technology, the importance of National Core Technology is increasing. However, it is difficult to determine the scope of the technology to be protected, because the scope of related technologies is abstract and information disclosure is limited by the nature of National Core Technology. To solve this problem, we propose the most appropriate literature type and analysis method for distinguishing important technologies related to National Core Technology. We conducted a pilot test applying TF-IDF and LDA topic modeling, two text mining techniques for big data analysis, to four types of literature (news, papers, reports, patents) collected with National Core Technology keywords in the display industry. As a result, applying LDA topic modeling to patent data produced results that were highly relevant to National Core Technology. Important technologies related to the upstream and downstream industries of displays, including OLEDs and microLEDs, were identified, and the results were visualized as networks to clarify the scope of important technologies associated with National Core Technology. Through this study, we clarified the ambiguous scope of technology association and overcame the limited information disclosure that characterizes National Core Technology.

Suggestion of Urban Regeneration Type Recommendation System Based on Local Characteristics Using Text Mining (텍스트 마이닝을 활용한 지역 특성 기반 도시재생 유형 추천 시스템 제안)

  • Kim, Ikjun;Lee, Junho;Kim, Hyomin;Kang, Juyoung
    • Journal of Intelligence and Information Systems / v.26 no.3 / pp.149-169 / 2020
  • The "Urban Regeneration New Deal Project", one of the government's major national projects, aims to develop underdeveloped areas by investing 50 trillion won in 100 locations in the first year and 500 over the next four years. This project is drawing keen attention from the media and local governments. However, the project model fails to reflect the original characteristics of each area, because it divides project areas into only five categories: "Our Neighborhood Restoration, Housing Maintenance Support Type, General Neighborhood Type, Central Urban Type, and Economic Base Type." The keywords for successful urban regeneration in Korea, "resident participation," "regional specialization," "ministerial cooperation" and "public-private cooperation", indicate that when local governments propose urban regeneration projects to the government, it is most important to understand the characteristics of the city accurately and to push ahead with the projects in a way that suits those characteristics, with the help of local residents and private companies. In addition, considering gentrification, one of the side effects of urban regeneration projects, it is important to select and implement urban regeneration types suited to the characteristics of the area. To supplement the limitations of the "Urban Regeneration New Deal Project" methodology, this study proposes a system that recommends urban regeneration types suitable for urban regeneration sites using various machine learning algorithms, referring to the urban regeneration types of the "2025 Seoul Metropolitan Government Urban Regeneration Strategy Plan", which was promoted based on regional characteristics. There are four types of urban regeneration in Seoul: "Low-use Low-Level Development, Abandonment, Deteriorated Housing, and Specialization of Historical and Cultural Resources" (Shon and Park, 2017).
To identify regional characteristics, approximately 100,000 pieces of text data were collected for 22 regions where projects had been carried out across the four types of urban regeneration. Using the collected data, we derived key keywords for each region according to the type of urban regeneration and conducted topic modeling to explore whether the types differed. As a result, a number of topics related to real estate and the economy appeared in old residential areas, while in declining and underdeveloped areas, topics appeared that reflect the characteristics of areas where industrial activity was once active. In the historical and cultural resource areas, which contain traces of the past, many keywords related to the government appeared, so political topics and cultural topics resulting from various events could be confirmed. Finally, in low-use and under-developed areas, many topics on real estate and accessibility emerged, which suggests that accessibility is good; these areas mainly had the characteristics of regions where development is planned or likely. Furthermore, a model was implemented that proposes urban regeneration types tailored to regional characteristics for regions other than Seoul. Machine learning was used to implement the model, and training data and test data were randomly extracted at an 8:2 ratio. To compare performance across models, the input variables were set in two ways, Count Vector and TF-IDF Vector, and five classifiers were applied: SVM (Support Vector Machine), Decision Tree, Random Forest, Logistic Regression, and Gradient Boosting. Performance was thus compared for a total of 10 models. The model with the highest performance was Gradient Boosting with TF-IDF Vector input data, with an accuracy of 97%.
Therefore, the recommendation system proposed in this study is expected to recommend urban regeneration types based on the regional characteristics of new business sites in the process of carrying out urban regeneration projects.
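
The two input representations and the 8:2 split described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names, the whitespace tokenizer, the fixed seed, and the unsmoothed `log(N / df)` weighting are all assumptions made for illustration, and library implementations often use a smoothed variant.

```python
import math
import random
from collections import Counter


def split_train_test(items, train_ratio=0.8, seed=0):
    """Shuffle a copy of `items` and cut it at `train_ratio` (8:2 by default)."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seed is an illustrative assumption
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]


def count_vectors(docs):
    """Return the sorted vocabulary and one raw term-count row per document."""
    vocab = sorted({token for doc in docs for token in doc.split()})
    rows = []
    for doc in docs:
        counts = Counter(doc.split())
        rows.append([counts[token] for token in vocab])
    return vocab, rows


def tfidf_vectors(docs):
    """Weight each raw count by log(N / document frequency), unsmoothed."""
    vocab, rows = count_vectors(docs)
    n_docs = len(docs)
    idf = [
        math.log(n_docs / sum(1 for row in rows if row[j] > 0))
        for j in range(len(vocab))
    ]
    return vocab, [[row[j] * idf[j] for j in range(len(vocab))] for row in rows]
```

With this weighting, a term that occurs in every document gets weight zero, while a term concentrated in one region's texts is emphasized. That is the general motivation for comparing TF-IDF input against raw counts.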

EEG Classification for depression patients using decision tree and possibilistic support vector machines (뇌파의 의사 결정 트리 분석과 가능성 기반 서포트 벡터 머신 분석을 통한 우울증 환자의 분류)

  • Sim, Woo-Hyeon;Lee, Gi-Yeong;Chae, Jeong-Ho;Jeong, Jae-Seung;Lee, Do-Heon
    • Bioinformatics and Biosystems / v.1 no.2 / pp.134-138 / 2006
  • Depression is the most common and widespread mood disorder. About 20% of the population might suffer a major, incapacitating episode of depression during their lifetime. This disorder can be classified into two types: major depressive disorder and bipolar disorder. Since pharmaceutical treatments differ according to the type of depressive disorder, correct and fast classification is critical for depression patients. Yet classical statistical methods, such as the Minnesota Multiphasic Personality Inventory (MMPI), are difficult to apply to depression patients, because the patients have difficulty concentrating. We used an electroencephalogram (EEG) analysis method for the classification of depression. We extracted the nonlinearity of information flows between channels and estimated the approximate entropy (ApEn) of the EEG at each channel. Using these attributes, we applied two data mining classification methods: decision tree and possibilistic support vector machines (PSVM). On a group of 30 patients with major depressive disorder and 24 patients with bipolar disorder, the decision tree showed 85.19% accuracy and PSVM showed 77.78% accuracy in classifying depression.
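
As a rough illustration of the approximate entropy feature mentioned in this abstract, the sketch below follows the standard ApEn(m, r) definition with Chebyshev distance and self-matches included. The parameter defaults (`m=2`, absolute tolerance `r=0.2`) and the function name are assumptions; the abstract does not state the settings the authors used.

```python
import math


def approximate_entropy(series, m=2, r=0.2):
    """ApEn(m, r) of a 1-D sequence; r is an absolute tolerance (assumed)."""
    n = len(series)

    def phi(dim):
        # All overlapping templates of length `dim`.
        templates = [series[i:i + dim] for i in range(n - dim + 1)]
        total = 0.0
        for a in templates:
            # Count templates within Chebyshev distance r of `a`.
            # The template matches itself, so `close` is never zero.
            close = sum(
                1
                for b in templates
                if max(abs(x - y) for x, y in zip(a, b)) <= r
            )
            total += math.log(close / len(templates))
        return total / len(templates)

    return phi(m) - phi(m + 1)
```

A perfectly regular signal gives a value at or near zero, while an irregular signal gives a larger value. In practice `r` is often chosen relative to the signal's standard deviation rather than as a fixed number.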

MBR-Safe Transform: Lower-Dimensional Transformation of High-Dimensional MBRs in Similar Sequence Matching (MBR-Safe 변환 : 유사 시퀀스 매칭에서 고차원 MBR의 저차원 변환)

  • Moon, Yang-Sae
    • Journal of KIISE:Databases / v.33 no.7 / pp.693-707 / 2006
  • To improve performance using a multidimensional index in similar sequence matching, we transform a high-dimensional sequence to a low-dimensional sequence, and then construct a low-dimensional MBR that contains multiple transformed sequences. In this paper we propose a formal method that transforms a high-dimensional MBR itself to a low-dimensional MBR, and show that this method significantly reduces the number of lower-dimensional transformations. To achieve this goal, we first formally define the new notion of MBR-safe. We say that a transform is MBR-safe if the low-dimensional MBR to which it transforms a high-dimensional MBR contains the low-dimensional image of every high-dimensional sequence inside that MBR. We then propose two MBR-safe transforms based on DFT and DCT, the most representative lower-dimensional transformations. For this, we prove that the traditional DFT and DCT are not MBR-safe, and define new transforms, called mbrDFT and mbrDCT, by extending DFT and DCT, respectively. We also formally prove that mbrDFT and mbrDCT are MBR-safe. Moreover, we show that mbrDFT (or mbrDCT) is optimal among the DFT-based (or DCT-based) MBR-safe transforms that directly convert a high-dimensional MBR itself into a low-dimensional MBR. Analytical and experimental results show that the proposed mbrDFT and mbrDCT drastically reduce the number of lower-dimensional transformations and significantly improve performance compared with the naïve transforms. These results indicate that our MBR-safe transforms provide a useful framework for a variety of applications that require the lower-dimensional transformation of high-dimensional MBRs.
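
The MBR-safe property defined in this abstract can be checked numerically. The sketch below is an assumption-laden illustration in Python, not the paper's mbrDCT: it uses a truncated orthonormal DCT-II and shows that transforming only the two corners of an MBR can lose a sequence, whereas a sign-aware interval bound for a linear transform always contains it. All function names are hypothetical, and the abstract does not say whether this construction coincides with the authors' definition.

```python
import math


def dct_matrix(n, k):
    """First k rows of the orthonormal DCT-II matrix for length-n sequences."""
    rows = []
    for u in range(k):
        scale = math.sqrt(1 / n) if u == 0 else math.sqrt(2 / n)
        rows.append(
            [scale * math.cos(math.pi * (2 * j + 1) * u / (2 * n)) for j in range(n)]
        )
    return rows


def transform(matrix, seq):
    """Apply the linear transform to one sequence."""
    return [sum(a * x for a, x in zip(row, seq)) for row in matrix]


def corner_mbr(matrix, low, high):
    """Transform only the low and high corners (not MBR-safe in general)."""
    a, b = transform(matrix, low), transform(matrix, high)
    return [min(p, q) for p, q in zip(a, b)], [max(p, q) for p, q in zip(a, b)]


def safe_mbr(matrix, low, high):
    """Sign-aware bound: pair negative coefficients with the opposite corner."""
    lo, hi = [], []
    for row in matrix:
        lo.append(sum(a * (l if a >= 0 else h) for a, l, h in zip(row, low, high)))
        hi.append(sum(a * (h if a >= 0 else l) for a, l, h in zip(row, low, high)))
    return lo, hi


def contains(mbr, point, eps=1e-12):
    """True if `point` lies inside the box, up to a small float tolerance."""
    lo, hi = mbr
    return all(l - eps <= p <= h + eps for l, p, h in zip(lo, point, hi))
```

For example, with `low = [0, 0, 0, 0]`, `high = [1, 1, 1, 1]`, and the sequence `[1, 0.5, 0.2, 0]`, the corner-based box collapses to nearly zero width in the second coefficient and misses the transformed sequence, while the sign-aware box contains it.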