• Title/Summary/Keyword: Data Clustering

Analyzing the Factors of Gentrification After Gradual Everyday Recovery

  • Yoon-Ah Song;Jeongeun Song;ZoonKy Lee
    • Journal of the Korea Society of Computer and Information
    • /
    • v.28 no.8
    • /
    • pp.175-186
    • /
    • 2023
  • In this paper, we aim to build a gentrification analysis model and examine its characteristics, focusing on the point at which rents rose sharply alongside the recovery of commercial districts after the gradual resumption of daily life. Recently, in Korea, the influence of social distancing measures after the pandemic has led to the formation of small-scale commercial districts, known as 'hot places', rather than large-scale ones. These hot places have gained popularity by leveraging various media and social networking services to attract customers effectively. As a result, with an increase in the floating population, commercial districts have become active, leading to a rapid surge in rents. However, for small business owners, coping with the sudden rise in rent even with increased sales can lead to gentrification, where they might be forced to leave the area. Therefore, in this study, we seek to analyze the periods before and after by identifying points where rents rise sharply as commercial districts experience revitalization. Firstly, we collect text data to explore topics related to gentrification, utilizing LDA topic modeling. Based on this, we gather data at the commercial district level and build a gentrification analysis model to examine its characteristics. We hope that the analysis of gentrification through this model during a time when commercial districts are being revitalized after facing challenges due to the pandemic can contribute to policies supporting small businesses.

Analysis of public library book loan demand according to weather conditions using machine learning (머신러닝을 활용한 기상조건에 따른 공공도서관 도서대출 수요분석)

  • Oh, Min-Ki;Kim, Keun-Wook;Shin, Se-Young;Lee, Jin-Myeong;Jang, Won-Jun
    • Journal of Digital Convergence
    • /
    • v.20 no.3
    • /
    • pp.41-52
    • /
    • 2022
  • Although domestic public libraries achieved quantitative growth under the 1st and 2nd comprehensive library development plans, qualitative shortcomings remained, and various studies have sought to address them. Most preceding studies are limited to social and economic factors and to statistical analysis. Therefore, this study applied a spatiotemporal approach to quantify the decrease in public library loan demand caused by rainfall and heatwaves, clustered areas into those where book loan demand is strongly affected by weather changes and those where it is not, and then combined factors inside and outside public libraries to analyze how loan demand changes with the weather. The analysis showed that the weather-related decrease differed from library to library and depended on each library's characteristics and spatial location. In addition, when the temperature exceeded 35℃, the decrease in book loan demand grew significantly. The internal factors derived were the number of seats, the number of books, and area. Among external factors, the public library access ramp, cafe, reading room, floating population in their teens, and floating population of women in their 30s/40s emerged as important variables. These results are expected to contribute to policies that promote public library use in consideration of the weather in a specific season; the limitations of the study are also discussed.

Analyzing the Relationship between Environmental Consciousness and Railway Choice Behavior (환경의식과 철도이용행동의 관련성 분석)

  • Lee, Jae-Boong;Kim, Hyun;Oh, Seung Hwoon
    • KSCE Journal of Civil and Environmental Engineering Research
    • /
    • v.30 no.6D
    • /
    • pp.697-705
    • /
    • 2010
  • The purpose of this research is to clarify the relationship between environmental consciousness and railway usage behavior. We position this research as a basic survey for promoting railway use under the Low Carbon Green Growth policy in Korea. We perform a descriptive analysis using data from a 2008 survey of the actual condition of railway use in Daegu and describe this relationship. In addition, we suggest policy ideas that could promote railway use. When respondents were clustered by environmental consciousness, railway choice behavior ranked in the order of cooperative behavior type, middle type, and non-cooperative behavior type. This suggests that environmental consciousness affects transportation choice behavior. In particular, railway improvement alone is not enough to promote railway use; it is advisable to carry out improvements in a way that encourages the public to move from the current stage of environmental consciousness to cooperative behavior. Moreover, we estimated Binary Probit (BP) models using SP data on time and transportation expense conditions relative to passenger car and bus. As a result, the modified likelihood ratios of the two BP models were favorable. Mode transfer from passenger car to railway occurred when social environmental consciousness was higher and selfish environmental consciousness was lower, as the t-statistic for selfish environmental consciousness was significant at the 95% confidence level. That is, environmental consciousness affects the intention to use railway.
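As an illustration of the Binary Probit model mentioned in this abstract, the following is a minimal sketch of how a probit model turns a linear utility into a choice probability. The coefficients, intercept, and feature names are hypothetical and are not taken from the paper; only the functional form (the standard normal CDF applied to a linear index) is standard.

```python
import math

def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def probit_probability(intercept, coeffs, features):
    """P(choose railway) = Phi(intercept + sum(coeff_i * feature_i))."""
    index = intercept + sum(c * x for c, x in zip(coeffs, features))
    return normal_cdf(index)

# Hypothetical coefficients: [social consciousness score, selfish consciousness score]
example_coeffs = [0.8, -0.5]
example_intercept = -0.2
p_social = probit_probability(example_intercept, example_coeffs, [1.0, 0.0])
p_selfish = probit_probability(example_intercept, example_coeffs, [0.0, 1.0])
```

With these made-up signs, a respondent scoring high on social consciousness gets a higher railway probability than one scoring high on selfish consciousness, which mirrors the direction of the effect the abstract reports.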

Analysis of Research Trends Related to drug Repositioning Based on Machine Learning (머신러닝 기반의 신약 재창출 관련 연구 동향 분석)

  • So Yeon Yoo;Gyoo Gun Lim
    • Information Systems Review
    • /
    • v.24 no.1
    • /
    • pp.21-37
    • /
    • 2022
  • Drug repositioning, one of the methods of developing new drugs, is a useful way to discover new indications by allowing drugs that have already been approved for use in people to be used for other purposes. Recently, with the development of machine learning technology, cases of analyzing vast amounts of biological information and using it to develop new drugs are increasing. Applying machine learning technology to drug repositioning will help find effective treatments quickly. Currently, the world is having a difficult time due to COVID-19, a new severe acute respiratory disease caused by a coronavirus. Drug repositioning that repurposes drugs that have already been clinically approved could offer an alternative route to therapeutics for COVID-19 patients. This study intends to examine research trends in the field of drug repositioning using machine learning techniques. In PubMed, a total of 4,821 papers were collected with the keyword 'Drug Repositioning' using the web scraping technique. After data preprocessing, frequency analysis, LDA-based topic modeling, random forest classification analysis, and prediction performance evaluation were performed on 4,419 papers. Associated words were analyzed based on the Word2vec model; after dimensionality reduction with PCA, K-Means clustering was applied to generate labels, and the structure of the literature was then visualized using the t-SNE algorithm. Hierarchical clustering was applied to the LDA results and visualized as a heat map. This study identified the research topics related to drug repositioning and presented a method to derive and visualize meaningful topics from a large amount of literature using machine learning algorithms. It is expected to serve as basic data for establishing research or development strategies in the field of drug repositioning in the future.
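The K-Means step in the pipeline described above can be sketched as follows. This is a minimal pure-Python version of Lloyd's algorithm, not the authors' implementation: the input points are hypothetical 2-D vectors standing in for PCA-reduced Word2vec embeddings, and the initial centers are simply the first k points, which is a simplification.

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, max_iters=100):
    """Lloyd's algorithm. points: list of equal-length tuples. Returns (labels, centers)."""
    centers = list(points[:k])  # assumption: naive initialization from the first k points
    labels = [0] * len(points)
    for _ in range(max_iters):
        # Assignment step: each point goes to its nearest center.
        labels = [min(range(k), key=lambda c: squared_distance(p, centers[c]))
                  for p in points]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centers.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centers.append(centers[c])  # keep an empty cluster's center unchanged
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers

# Hypothetical reduced embeddings forming two well-separated groups.
example_points = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2), (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]
example_labels, example_centers = kmeans(example_points, k=2)
```

The resulting labels could then be used to color a 2-D projection such as the t-SNE plot the abstract mentions; in practice a library implementation with better initialization would normally be preferred.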

Analysis of Policy Trends in Convergence Research and Development Using Unstructured Text Data (비정형 텍스트 데이터를 활용한 융합연구개발의 정책 동향 분석)

  • Jiye Rhee;JaeEun Shin
    • Knowledge Management Research
    • /
    • v.25 no.2
    • /
    • pp.177-191
    • /
    • 2024
  • This study aims to analyze policy changes over time by conducting a textual analysis of the basic plan for activating convergence research and development. By examining the basic plan for convergence research development, this study looks into changes in convergence research policies and suggests future directions, thereby exploring strategic approaches that can contribute to the advancement of science and technology and societal development in our country. In particular, it sought to understand the policy changes proposed by the basic plan by identifying the relevance and trends of topics over time. Various analytical methods such as TF-IDF analysis, topic modeling (LDA), and network (CONCOR) analysis were used to identify the key topics of each period and grasp the trends in policy changes. The analysis revealed clustering of topics by period and changes in topics, providing directions for the convergence research ecosystem and addressing pressing issues. The results of this study are expected to provide important insights to various stakeholders such as governments, businesses, academia, and research institutions, offering new insights into the changes in policies proposed by previous basic plans from a macroscopic perspective.
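To make the TF-IDF step named in this abstract concrete, here is a minimal sketch. The weighting variant is an assumption (relative term frequency multiplied by the natural log of N/df); the abstract does not say which variant was used, and the tokenized documents are invented for illustration.

```python
import math
from collections import Counter

def tf_idf(docs):
    """docs: list of non-empty token lists. Returns one {term: weight} dict per document."""
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        term_counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in term_counts.items()
        })
    return weights

# Hypothetical tokenized plan documents, one per period.
example_docs = [
    ["convergence", "research", "policy"],
    ["convergence", "ecosystem"],
    ["convergence", "research", "talent"],
]
example_weights = tf_idf(example_docs)
```

A term that appears in every period, such as "convergence" here, receives a weight of zero, while a term specific to one period is weighted up, which is how period-specific key topics can be surfaced.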

Improving the Performance of Radiologists Using Artificial Intelligence-Based Detection Support Software for Mammography: A Multi-Reader Study

  • Jeong Hoon Lee;Ki Hwan Kim;Eun Hye Lee;Jong Seok Ahn;Jung Kyu Ryu;Young Mi Park;Gi Won Shin;Young Joong Kim;Hye Young Choi
    • Korean Journal of Radiology
    • /
    • v.23 no.5
    • /
    • pp.505-516
    • /
    • 2022
  • Objective: To evaluate whether artificial intelligence (AI) for detecting breast cancer on mammography can improve the performance and time efficiency of radiologists reading mammograms. Materials and Methods: A commercial deep learning-based software for mammography was validated using external data collected from 200 patients, 100 each with and without breast cancer (40 with benign lesions and 60 without lesions) from one hospital. Ten readers, including five breast specialist radiologists (BSRs) and five general radiologists (GRs), assessed all mammography images using a seven-point scale to rate the likelihood of malignancy in two sessions, with and without the aid of the AI-based software, and the reading time was automatically recorded using a web-based reporting system. Two reading sessions were conducted with a two-month washout period in between. Differences in the area under the receiver operating characteristic curve (AUROC), sensitivity, specificity, and reading time between reading with and without AI were analyzed, accounting for data clustering by readers when indicated. Results: The AUROC of the AI alone, BSR (average across five readers), and GR (average across five readers) groups was 0.915 (95% confidence interval, 0.876-0.954), 0.813 (0.756-0.870), and 0.684 (0.616-0.752), respectively. With AI assistance, the AUROC significantly increased to 0.884 (0.840-0.928) and 0.833 (0.779-0.887) in the BSR and GR groups, respectively (p = 0.007 and p < 0.001, respectively). Sensitivity was improved by AI assistance in both groups (74.6% vs. 88.6% in BSR, p < 0.001; 52.1% vs. 79.4% in GR, p < 0.001), but the specificity did not differ significantly (66.6% vs. 66.4% in BSR, p = 0.238; 70.8% vs. 70.0% in GR, p = 0.689). The average reading time pooled across readers was significantly decreased by AI assistance for BSRs (82.73 vs. 73.04 seconds, p < 0.001) but increased in GRs (35.44 vs. 42.52 seconds, p < 0.001). 
Conclusion: AI-based software improved the performance of radiologists regardless of their experience and affected the reading time.
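The metrics reported in this abstract can be illustrated with a short sketch. It computes AUROC as the probability that a randomly chosen positive case is scored above a randomly chosen negative case (ties count as one half), plus sensitivity and specificity at a cutoff. The scores, labels, and cutoff are hypothetical seven-point ratings, and unlike the study, this sketch does not account for clustering of data by reader.

```python
def auroc(scores, labels):
    """Pairwise (Mann-Whitney) estimate of the area under the ROC curve."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))

def sensitivity_specificity(scores, labels, cutoff):
    """Treat scores >= cutoff as a positive call."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= cutoff)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < cutoff)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < cutoff)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= cutoff)
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical ratings on a seven-point scale; 1 = cancer, 0 = no cancer.
example_scores = [7, 6, 4, 3, 5, 2, 1, 4]
example_labels = [1, 1, 1, 1, 0, 0, 0, 0]
example_auc = auroc(example_scores, example_labels)
example_sens, example_spec = sensitivity_specificity(example_scores, example_labels, cutoff=4)
```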

Discrimination of the drinking water taste by potentiometric electronic tongue and multivariate analysis (전자혀 및 다변량 분석법을 활용한 먹는물의 구별 방법)

  • Eunju Kim;Tae-Mun Hwang;Jae-Wuk Koo;Jaeyong Song;Hongkyeong Park;Sookhyun Nam
    • Journal of Korean Society of Water and Wastewater
    • /
    • v.37 no.6
    • /
    • pp.425-435
    • /
    • 2023
  • Organoleptic parameters such as color, odor, and flavor influence consumer perception of drinking water quality. This study aims to evaluate the taste of selected bottled and tap water samples using an electronic tongue (E-tongue) instead of a sensory test. The mineral components of bottled and tap water are related to the overall preference for water taste. In contrast to the sensory test, the potentiometric E-tongue method presented in this study distinguishes taste by measuring the mineral components in water, and the data obtained can be statistically analyzed. Eleven bottled water products from various brands and one tap water from I city in Korea were evaluated. The E-tongue data were statistically analyzed using multivariate statistical tools such as hierarchical clustering analysis (HCA), principal component analysis (PCA), and partial least squares discriminant analysis (PLS-DA). The results show that the E-tongue method can clearly discriminate the taste of drinking waters that differ in water quality, based on the ion-related water quality parameters. The water quality parameters that affect taste discrimination were found to be total dissolved solids (TDS), sodium (Na+), calcium (Ca2+), magnesium (Mg2+), sulfate (SO42-), chloride (Cl-), potassium (K+) and pH. The distance calculation of HCA was used to quantify the differences between the 12 types of drinking water. The proposed E-tongue method is a practical tool for quantitatively evaluating differences between samples in water quality items related to ionic components, and it can be helpful in the quality control of drinking water.
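The idea of using HCA distances to quantify differences between water samples can be sketched as below. The abstract does not state the distance metric or linkage rule, so Euclidean distance and single linkage are assumptions here, and the sample names and two-feature profiles are hypothetical (they stand in for standardized ion-related parameters).

```python
import math
from itertools import combinations

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cluster_distance(samples, cluster_a, cluster_b):
    """Single linkage: smallest distance between any two members of the clusters."""
    return min(euclidean(samples[i], samples[j]) for i in cluster_a for j in cluster_b)

def single_linkage(samples):
    """samples: {name: feature tuple}. Returns merges as (members_a, members_b, distance)."""
    clusters = [frozenset([name]) for name in samples]
    merges = []
    while len(clusters) > 1:
        # Find and merge the closest pair of clusters.
        a, b = min(combinations(clusters, 2),
                   key=lambda pair: cluster_distance(samples, pair[0], pair[1]))
        merges.append((sorted(a), sorted(b), cluster_distance(samples, a, b)))
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return merges

# Hypothetical standardized profiles, e.g. (TDS, Ca2+).
example_samples = {"tap": (1.0, 1.0), "bottled_A": (1.2, 0.9), "bottled_B": (4.0, 3.5)}
example_merges = single_linkage(example_samples)
```

Each merge distance is the height at which two groups join in the dendrogram, so a large final merge distance indicates a sample whose ionic profile differs markedly from the rest.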

A Study on Korean Local Governments' Operation of Participatory Budgeting System : Classification by Support Vector Machine Technique (한국 지방자치단체의 주민참여예산제도 운영에 관한 연구 - Support Vector Machine 기법을 이용한 유형 구분)

  • Junhyun Han;Jaemin Ryou;Jayon Bae;Chunghyeok Im
    • The Journal of the Convergence on Culture Technology
    • /
    • v.10 no.3
    • /
    • pp.461-466
    • /
    • 2024
  • Korean local governments operate the participatory budgeting system autonomously. This study classifies these entities into clusters. Among diverse machine learning methodologies (Neural Network, Rule Induction (CN2), KNN, Decision Tree, Random Forest, Gradient Boosting, SVM, Naïve Bayes), the Support Vector Machine technique emerged as the most efficacious in the analysis of 2022 data on Korean municipalities. The first cluster, C1, is characterized by minimal committee activity but a substantial allocation of participatory budgeting; another cluster, C3, comprises cities that exhibit a passive stance. The majority of cities fall into the final cluster, C2, which is noted for its proactive engagement. Overall, most Korean local governments operate the participatory budgeting system in good shape. Only a small number of cities are less active in this system. We anticipate that analyzing time-series data from the past decade in follow-up studies will further enhance the reliability of classifying local government types regarding participatory budgeting.

A Review of Multivariate Analysis Studies Applied for Plant Morphology in Korea (국내 식물 형태 연구에 사용된 다변량분석 논문에 대한 재고)

  • Chang, Kae Sun;Oh, Hana;Kim, Hui;Lee, Heung Soo;Chang, Chin-Sung
    • Journal of Korean Society of Forest Science
    • /
    • v.98 no.3
    • /
    • pp.215-224
    • /
    • 2009
  • This paper reviewed the role of traditional morphometrics in plant morphological studies, using 54 studies published from 1997 to 2008 in three major journals and others in Korea, such as Journal of Korean Forestry Society, Korean Journal of Plant Taxonomy, Korean Journal of Breeding, Korean Journal of Apiculture, Journal of Life Science, and Korean Journal of Plant Resources. The two most commonly used techniques of data analysis, cluster analysis (CA) and principal components analysis (PCA), were discussed along with other statistical tests. A common problem with PCA lies in the underlying assumptions of the method, such as random sampling and multivariate normal distribution of data. The procedure was intended mainly for continuous data and was not efficient for data that were not well summarized by variances or covariances. Likewise, CA was most appropriate for categorical rather than continuous data. Also, CA produced clusters whether or not natural groupings existed, and the results depended on both the similarity measure chosen and the algorithm used for clustering. Additional problems with PCA and CA arose when qualitative and quantitative data were combined with a limited number of variables and/or too few samples. Some of these problems may be avoided if a sufficient number of variables (more than 20) and samples (at least 40-50) are considered for morphometric analyses, but we do not think that these methods are almighty tools for data analysts. Instead, we believe that reasonable applications, combined with a focus on the objectives and limitations of each procedure, would be a step forward.

Price Volatility, Seasonality and Day-of-the Week Effect for Aquacultural Fishes in Korean Fishery Markets (수산물 시장에서의 양식 어류 가격변동성.계절성.요일효과에 관한 연구 - 노량진수산시장의 넙치와 조피볼락을 중심으로 -)

  • Ko, Bong-Hyun
    • The Journal of Fisheries Business Administration
    • /
    • v.40 no.2
    • /
    • pp.49-70
    • /
    • 2009
  • This study applies the GARCH model (Bollerslev, 1986) to analyze the structural characteristics of price volatility in the domestic aquacultural fish market of Korea. As a case study, flatfish and rock-fish are analyzed as major species that account for a relatively high share of production volume among fish captured in Korea. The analysis uses daily market data (from Jan 1, 2000 to June 30, 2008) published by the Noryangjin Fisheries Wholesale Market, located in Seoul, Korea. As an initial empirical step, this study performs a normality test on the trading volume and price volatility of flatfish and rock-fish. The normality test adopted is the Jarque-Bera test. As a result, first, the null hypothesis that "an empirical distribution follows normal distribution" was rejected for both fishes. The distributions of their daily market data were not only positively biased in terms of kurtosis and skewness, but also leptokurtic with a long right tail. Secondly, serial correlations were found in the data on market trading volume and price volatility of the two species over very long periods. Thirdly, the results of the unit root test and ARCH-LM test showed that all time series were stationary and exhibited ARCH effects. These statistical characteristics provide reasonable grounds for using the GARCH model to estimate the conditional variances that capture price volatility in empirical analysis. From the empirical data analysis above, this study drew the following conclusions. First of all, from an empirical analysis of the potential effects of seasonality and the day of the week on price volatility of aquacultural fish, Monday effects were found in both species, and Thursday and Friday effects were also found in flatfish.
This indicates that Monday expands price volatility in the aquacultural fish market and has a larger effect on the price volatility of fish than other days of the week, since more new information accumulates over the weekend. Secondly, the empirical analysis led to a common conclusion that there was very high price volatility for flatfish and rock-fish. The persistency parameter($\lambda$), an index of the possibility that current volatility will persist at a similar level in the future, was higher than 0.8, that is, close to 1, in both flatfish and rock-fish, which indicates volatility clustering. Also, this study estimated and compared models, including one that assumed a normal distribution, in order to determine the fit of the respective models. As a result, the GARCH(1, 1)-t model, in which the error term was assumed to follow a t-distribution, fit better than the model assuming a normal distribution, owing to the fat-tailed distribution described in the results of the basic statistical analysis. In conclusion, this study is significant in that it is the first in Korea to investigate the price volatility of Korean aquacultural fishery products, although official statistical data were partially limited. Therefore, it is expected that the results of this study will be useful as reference material for making and assessing governmental policies. The results are also expected to help producers build fishery business plans and take timely measures against potential price fluctuations of fishery products in the market. Hence, it is advisable that further studies on price volatility in fishery markets extend to a wider variety of species and issues in the near future.
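The volatility clustering described in this abstract follows from the GARCH(1,1) recursion, which can be sketched as below. The parameter values and residuals are hypothetical, and persistence is taken here as alpha + beta, the usual GARCH(1,1) measure; the paper's $\lambda$ may be defined differently.

```python
def garch11_variance(residuals, omega, alpha, beta):
    """Conditional variances sigma2[t] = omega + alpha * eps[t-1]**2 + beta * sigma2[t-1].

    The recursion is started from the unconditional variance omega / (1 - alpha - beta).
    """
    if alpha + beta >= 1:
        raise ValueError("alpha + beta must be below 1 for a finite unconditional variance")
    sigma2 = [omega / (1 - alpha - beta)]
    for eps in residuals[:-1]:
        sigma2.append(omega + alpha * eps ** 2 + beta * sigma2[-1])
    return sigma2

# Hypothetical parameters with persistence alpha + beta = 0.9.
example_residuals = [2.0, 0.0, 0.0]
example_variances = garch11_variance(example_residuals, omega=0.1, alpha=0.1, beta=0.8)
```

With persistence close to 1, the jump in variance caused by the first large residual decays only slowly over the following periods, which is the behavior the abstract refers to as volatility clustering.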
