• Title/Abstract/Keywords: Large Data Set

Search results: 1,063 items (processing time: 0.025 seconds)

A Case Study: Unsupervised Approach for Tourist Profile Analysis by K-means Clustering in Turkey

  • Yildirim, Mustafa Eren;Kaya, Murat;Ince, Ibrahim Furkan
    • 인터넷정보학회논문지 / Vol. 23, No. 1 / pp.11-17 / 2022
  • Data mining is the task of extracting useful information from large volumes of data. It can also be described as using computer algorithms to search large data warehouses for correlations that provide clues about the future. It has been used in the tourism field for marketing, analysis, and business improvement purposes. This study aims to analyze the tourist profile in Turkey through data mining methods. Turkey was selected because it welcomes millions of tourists every year and can serve as a role model for other tourist destinations. In this study, an anonymized, large-scale data set was used in compliance with the law on the protection of personal data. The data set was obtained from a leading tourism company that is still active in Turkey. By applying the k-means clustering algorithm to this data, key profile parameters were obtained and people were clustered into groups according to their characteristics. According to the outcomes, the distinguishing characteristics fall under three main headings: the age of the tourists, the frequency of their vacations, and the period between the reservation and the vacation itself. The results show that these three parameters are the most important and characteristic ones for a tourist's profile. Finally, using these profiles to plan future investments, events, and campaign packages can make tourism companies more competitive and improve quality of service. For both businesses and tourists, it is advantageous to prepare individual events and offers for the three major groups of tourists.
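The clustering step described in this abstract can be illustrated with a minimal k-means sketch in plain Python. Everything concrete here is an assumption made for illustration, not the authors' pipeline: the nine `tourists` rows are invented (age, vacations per year, days between booking and trip), the min-max scaling is one common way to put the three features on a comparable scale, and the deterministic farthest-point initialization is chosen only so that the example is reproducible.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def min_max_scale(rows):
    """Rescale every column to [0, 1] so no feature dominates the distance."""
    cols = list(zip(*rows))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    return [
        tuple((v - l) / (h - l) if h > l else 0.0 for v, l, h in zip(r, lo, hi))
        for r in rows
    ]

def farthest_first(points, k):
    """Deterministic initialization: start at the first point, then repeatedly
    add the point farthest from all centers chosen so far (an assumption)."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    return centers

def kmeans(points, k, iters=100):
    """Lloyd's algorithm: alternate assignment and mean-update steps."""
    centers = farthest_first(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest center.
        labels = [min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                new_centers.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centers.append(centers[c])  # keep the center of an empty cluster
        if new_centers == centers:
            break
        centers = new_centers
    return centers, labels

# Hypothetical records: (age, vacations per year, days between booking and trip).
tourists = [
    (24, 1, 5), (27, 1, 7), (22, 2, 3),
    (41, 3, 60), (45, 4, 75), (39, 3, 55),
    (66, 1, 150), (70, 2, 180), (63, 1, 140),
]
centers, labels = kmeans(min_max_scale(tourists), k=3)
for row, label in zip(tourists, labels):
    print(label, row)
```

On this toy data the three invented groups end up in three separate clusters; with real booking data the number of clusters and the scaling would need to be chosen and validated rather than assumed.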

A Prediction Model for Complex Diseases using Set Association & Artificial Neural Network

  • 최현주;김승현;위규범
    • 정보처리학회논문지B / Vol. 15B, No. 4 / pp.323-330 / 2008
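  • Complex diseases are caused by the interaction of many genes, and because multiple genes are involved, traditional analysis methods are of limited use. Recently, new analysis methods based on machine learning have been proposed. Neural networks are well suited to finding patterns in such complex data and classifying them. However, when a large amount of data is given as input, training takes a long time and patterns become harder to find. In this study, we present a model that combines a neural network with set association, a statistical method for finding a small number of important disease-associated SNPs in a large amount of SNP data. When the model was applied to asthma-related SNP data to predict the onset of asthma, it ran faster and predicted more accurately than a neural network alone. We expect the model to be effective for predicting other complex diseases as well.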

SURFACE RECONSTRUCTION FROM SCATTERED POINT DATA ON OCTREE

  • Park, Chang-Soo;Min, Cho-Hon;Kang, Myung-Joo
    • Journal of the Korean Society for Industrial and Applied Mathematics / Vol. 16, No. 1 / pp.31-49 / 2012
  • In this paper, we propose a very efficient method for reconstructing a high-resolution surface from a set of unorganized points. Our method is based on the level set method on an adaptive octree. We start from the surface reconstruction model proposed in [20], which introduced a fast and efficient approach that differs from previous level set methods. Most existing methods [21, 22] evolve an initial surface in time toward the point cloud. In contrast, [20] treats surface reconstruction as an elliptic problem in a narrow band containing the point cloud, which makes the method fast because the time step is not limited by the finite speed of propagation. However, that model was implemented only on a uniform grid and therefore requires a large amount of memory. The algorithm essentially solves a large linear system whose size equals the number of grid points in the narrow band. Moreover, it is not easy to make the band narrow enough, since the choice of band width depends on the distribution of the point data. Consequently, on a uniform grid it is almost impossible to generate a high-resolution surface, because the memory requirement increases geometrically. We resolve this by adapting the octree data structure [12, 11] to our problem and by introducing a new redistancing algorithm that differs from the existing one [19].

Environmental Factors and Catch Fluctuation of Set Net Grounds in the Coastal Waters of Yosu - 4. Water Temperature and Salinity and Fluctuation of Catch -

  • 김동수;노홍길
    • 수산해양기술연구 / Vol. 32, No. 2 / pp.125-131 / 1996
  • To investigate the relation between environmental properties and the catch fluctuation of set net fishing grounds in the coastal waters of Yosu, oceanographic observations of the fishing grounds were carried out by the training ship of Yosu Fisheries University from January 1990 to September 1992, and the data obtained were compared with catch data from the joint market of the Yosu fisheries cooperative society from 1984 to 1993. The results are summarized as follows: 1. Water temperature in the fishing grounds ranged from 7.0 to 27°C and salinity from 26.6 to 33.2‰; water temperature increased from March to August and decreased from September to February of the following year. 2. Salinity in the fishing grounds was relatively high, without significant changes, from November to June of the following year. From July, however, salinity decreased and remained low until September, and then increased. Salinity in the fishing grounds was governed mainly by precipitation; its variation was large at the north entrance of the set net fishing grounds, which is strongly influenced by land water from the Somjin River, but small offshore of the fishing grounds. 3. The fishes caught by the set nets, in order of catch, were: Spanish mackerel > Horse mackerel > Sardine > Anchovy > Hair tail. Catches of Anchovy and Sardine were high in April to May and those of Hair tail in June to July, whereas Spanish mackerel and Horse mackerel were caught throughout the fishing period. Spanish mackerel was caught most in September and least in April, and their means were largest in August and smallest in June. 4. The optimum water temperature for set net fishing ranged from 13.5 to 25°C, and within this range catches increased with increasing temperature. The optimum salinity for fishing ranged from 25.0 to 32.0‰.

Stability evaluation model for loess deposits based on PCA-PNN

  • Li, Guangkun;Su, Maoxin;Xue, Yiguo;Song, Qian;Qiu, Daohong;Fu, Kang;Wang, Peng
    • Geomechanics and Engineering / Vol. 27, No. 6 / pp.551-560 / 2021
  • Because of their low strength and high compressibility, tunnels in loess deposits are prone to large deformations and collapse. An accurate stability evaluation of loess deposits is therefore of considerable significance for deformation control and safety during tunnel construction. Thirty-seven groups of representative data from real loess deposit cases were used to establish a stability evaluation model for a tunnel project in Yan'an, China. Five physical and mechanical indices (water content, cohesion, internal friction angle, elastic modulus, and Poisson's ratio) were selected as the index system for the stability level of loess. The data set was randomly divided into a training set (80%) and a test set (20%). First, principal component analysis (PCA) was used to convert the five indices into three linearly independent principal components X1, X2, and X3. Then, the principal components were used as input vectors for a probabilistic neural network (PNN) to map the nonlinear relationship between the index system and the stability level of loess. Furthermore, leave-one-out cross-validation was applied to the training set to find a suitable smoothing factor. Finally, the established model, with the selected smoothing factor of 0.04, was applied to the test set, and a prediction accuracy of 100% was obtained. This intelligent classification method is easy to apply and has wide potential for evaluating loess deposits.
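The PNN and leave-one-out steps described in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' model: the PCA step is omitted, the six two-dimensional training points stand in for principal-component scores and are invented, the class names `stable` and `unstable` are hypothetical, and the candidate smoothing factors are arbitrary (0.04 is included only because the abstract mentions it).

```python
import math

def pnn_predict(x, train_X, train_y, sigma):
    """Probabilistic neural network: one Gaussian kernel per training sample,
    averaged per class (summation layer), then the largest average wins."""
    sums, counts = {}, {}
    for xi, yi in zip(train_X, train_y):
        d2 = sum((a - b) ** 2 for a, b in zip(x, xi))
        sums[yi] = sums.get(yi, 0.0) + math.exp(-d2 / (2.0 * sigma ** 2))
        counts[yi] = counts.get(yi, 0) + 1
    return max(sums, key=lambda c: sums[c] / counts[c])

def loo_accuracy(X, y, sigma):
    """Leave-one-out cross-validation: hold out each sample once and
    classify it with a PNN built from all the other samples."""
    hits = 0
    for i in range(len(X)):
        rest_X = X[:i] + X[i + 1:]
        rest_y = y[:i] + y[i + 1:]
        hits += pnn_predict(X[i], rest_X, rest_y, sigma) == y[i]
    return hits / len(X)

def pick_sigma(X, y, candidates):
    """Return the candidate smoothing factor with the best LOO accuracy
    (the first one wins when several candidates tie)."""
    return max(candidates, key=lambda s: loo_accuracy(X, y, s))

# Hypothetical training data standing in for principal-component scores.
train_X = [(0.10, 0.20), (0.20, 0.10), (0.15, 0.25),
           (0.90, 0.80), (0.80, 0.90), (0.85, 0.75)]
train_y = ["stable", "stable", "stable", "unstable", "unstable", "unstable"]

candidates = [0.01, 0.04, 0.1, 0.5]  # arbitrary grid of smoothing factors
best_sigma = pick_sigma(train_X, train_y, candidates)
prediction = pnn_predict((0.2, 0.2), train_X, train_y, best_sigma)
print(best_sigma, loo_accuracy(train_X, train_y, best_sigma), prediction)
```

On such cleanly separated toy data every candidate classifies the held-out samples correctly, so the grid search is uninformative here; with real, overlapping classes the leave-one-out accuracy would typically vary with the smoothing factor, which is what makes the search worthwhile.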

Characteristics and Application of Large-area Multi-temporal Remote Sensing Data

  • 성정창
    • 대한원격탐사학회지 / Vol. 16, No. 1 / pp.1-11 / 2000
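  • Unlike analysis based on spectral bands, time-series analysis has often been used to study the dynamic characteristics of ecosystems. However, few studies have proposed solutions to the problems of processing time-series data or to the problems of large-area data covering continents or the entire globe. This study reviews the characteristics of large-area time-series data analysis and then points out two difficulties: regional differences in vegetation growth patterns and the difficulty of securing reference data. As solutions, it proposes latitudinal image partitioning and the use of invariant pixels. As a case study, image classification was carried out for part of Asia using AVHRR data from 1982 to 1993. Invariant pixels made it possible to extend reference data from one date to other dates, so that a sufficient amount of reference data could be secured for those dates, and latitudinal image partitioning allowed regional differences in vegetation growth patterns to be taken into account. The case study, based on fuzzy image classification, also showed decreasing forest and increasing cropland in densely populated areas, and the opposite pattern in sparsely populated areas.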

Wavelet-Monte Carlo Simulation for Virtual Fabric Imaging

  • Kim, Joo-Yong
    • 감성과학 / Vol. 7, No. 3 / pp.1-6 / 2004
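  • A method was developed for synthesizing a large amount of data with a similar structure from a limited spun-yarn thickness signal. Because the simulated yarn thickness signal retains information on both the frequency and the location of features such as neps, much as the original data does, the fabric images simulated from it do not differ greatly in appearance from the original. The method, developed by combining Monte Carlo simulation with the wavelet transform, preserves the elements that characterize yarn appearance, such as neps, thick places, and thin places, and thereby provides a very effective way to simulate virtual fabric images.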

Far-infrared Study of Supernova Remnants in the Large Magellanic Cloud

  • 김예솔;구본철;석지연
    • 천문학회보 / Vol. 38, No. 1 / p.53 / 2013
  • We present preliminary results of a far-infrared (FIR) study of the supernova remnants (SNRs) in the Large Magellanic Cloud (LMC) using the Herschel HERITAGE (HERschel Inventory of The Agents of Galaxy Evolution) data set. HERITAGE provides FIR data covering the entire LMC at 100, 160, 250, 350, and 500 μm. To confirm FIR emission associated with SNRs, we refer to Magellanic Cloud Emission-Line Survey (MCELS) H-alpha and SII data, Spitzer Surveying the Agents of a Galaxy's Evolution (SAGE) Multiband Imaging Photometer (MIPS) 24 μm and 70 μm data, the Chandra Supernova Remnants Catalog, and the ATCA 4.8 GHz continuum images of Dickel et al. (2005). Among the 47 SNRs in the LMC, 7 show associated FIR emission. We present multi-wavelength views of 5 SNRs: DEM L249, N49, N63A, N132D, and the SNR in N4. N49 and N132D show a morphological correlation between FIR and X-ray emission, suggesting that the FIR emission comes from dust grains collisionally heated by X-ray-emitting plasma. The FIR emission of N63A resembles the H-alpha emission, which implies that FIR line radiation could be dominant. The FIR images of the remaining two objects, DEM L249 and the SNR in N4, show no correlation with the images in other wavebands.

Genome-Wide Association Studies of the Korea Association REsource (KARE) Consortium

  • Hong, Kyung-Won;Kim, Hyung-Lae;Oh, Berm-Seok
    • Genomics & Informatics / Vol. 8, No. 3 / pp.101-102 / 2010
  • During the last decade, large community cohorts have been established by the Korea National Institutes of Health (KNIH), and an enormous amount of epidemiological and clinical data has been accumulated. Using this information and the samples in the cohorts, KNIH set out to conduct a large-scale genome-wide association study (GWAS) in 2007, and the Korea Association REsource (KARE) consortium was launched to analyze the data and identify the underlying genetic risk factors of diseases and diverse health indexes, such as blood pressure, obesity, bone density, and blood biochemical traits. The consortium consisted of 6 research divisions, formed by 25 principal investigators in 19 organizations, including 18 universities, 2 institutes, and 1 company. Each division focused on one of the following subjects: the identification of genetic factors, the statistical analysis of gene-gene interactions, the genetic epidemiology of gene-environment interactions, copy number variation, the bioinformatics related to a GWAS, and a GWAS of nutrigenomics. In this special issue, the study results of the KARE consortium are presented in 9 articles. We hope that this special issue will encourage the genomics community to share data and will encourage scientists, including clinicians, to analyze the valuable Korean data of KARE.

Numerical simulation of set-up around shaft of XCC pile in clay

  • Liu, Fei;Yi, Jiangtao;Cheng, Po;Yao, Kai
    • Geomechanics and Engineering / Vol. 21, No. 5 / pp.489-501 / 2020
  • This paper presents a coupled effective stress analysis of the installation and consolidation processes of an X-section-in-place concrete (XCC) pile, using the dual-stage Eulerian-Lagrangian (DSEL) technique together with the modified Cam-clay model. The numerical model is verified against centrifuge data and field test results. The main objective of this study is to investigate the effect of the XCC pile's cross-sectional shape on radial total stress, excess pore pressure, and time-dependent strength. Differences between the XCC pile and a circular pile in the penetration mechanism and in set-up effects on pile shaft resistance are discussed. Particular attention is given to the time-dependent strength around the XCC pile shaft. The results show that soil strength improved more significantly close to the flat side than close to the concave side. Additionally, the computed ultimate shaft resistance of the XCC pile, including set-up effects, is 1.45 times that of the circular pile. The present findings are likely to help incorporate set-up effects into XCC pile design practice.