• Title/Summary/Keyword: Term Clustering

A study on the establishment and development of the Daesoon Thought Thesaurus (대순사상 시소러스의 구축에 관한 연구)

  • Lee, Sang-Bok;Jang, In-Ho
    • Journal of the Daesoon Academy of Sciences, v.19, pp.21-45, 2005
  • The purpose of this study is to examine the establishment and development of the Daesoon Thought Thesaurus. Specifically, it examines the matters to be considered at the thesaurus planning stage. Following the thesaurus construction process, it then presents methods and standards for each step: identification of the indexing policy, establishment of the thesaurus system, collection of vocabulary, selection of preferred terms, clustering of terms, establishment of term relationships, overall adjustment, thesaurus testing, expert review and proofreading, and maintenance and updating. Because religious information is unique and differs markedly from information in other fields, it is most important to construct a thesaurus suited to the system only after carefully understanding the concepts behind religious terms.
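
Several of the steps listed above, such as selecting preferred terms and establishing term relationships, amount to building a small graph of terms. The following is a minimal Python sketch. It assumes the conventional thesaurus relationships: USE/UF (non-preferred to preferred term), BT/NT (broader/narrower) and RT (related). The class and any example terms are hypothetical and do not describe the actual Daesoon Thought Thesaurus.

```python
from collections import defaultdict


class Thesaurus:
    """Hypothetical minimal thesaurus: preferred terms plus USE/UF, BT/NT, RT links."""

    def __init__(self):
        self.use = {}                      # non-preferred term -> preferred term (USE)
        self.broader = defaultdict(set)    # term -> broader terms (BT)
        self.narrower = defaultdict(set)   # term -> narrower terms (NT)
        self.related = defaultdict(set)    # term -> related terms (RT), symmetric

    def add_synonym(self, non_preferred, preferred):
        self.use[non_preferred] = preferred

    def preferred(self, term):
        # Resolve an entry term to its preferred form (unchanged if already preferred).
        return self.use.get(term, term)

    def used_for(self, term):
        # UF is the reciprocal of USE.
        p = self.preferred(term)
        return {entry for entry, pref in self.use.items() if pref == p}

    def add_hierarchy(self, broader, narrower):
        b, n = self.preferred(broader), self.preferred(narrower)
        self.narrower[b].add(n)
        self.broader[n].add(b)

    def add_related(self, a, b):
        a, b = self.preferred(a), self.preferred(b)
        self.related[a].add(b)
        self.related[b].add(a)

    def all_narrower(self, term):
        # Transitive closure of NT, useful when checking the hierarchy during adjustment.
        seen, stack = set(), [self.preferred(term)]
        while stack:
            for child in self.narrower[stack.pop()]:
                if child not in seen:
                    seen.add(child)
                    stack.append(child)
        return seen
```

Resolving every term through `preferred()` before linking keeps relationships attached only to preferred terms, which is the usual thesaurus convention.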

Feature-selection algorithm based on genetic algorithms using unstructured data for attack mail identification (공격 메일 식별을 위한 비정형 데이터를 사용한 유전자 알고리즘 기반의 특징선택 알고리즘)

  • Hong, Sung-Sam;Kim, Dong-Wook;Han, Myung-Mook
    • Journal of Internet Computing and Services, v.20 no.1, pp.1-10, 2019
  • Because big-data text mining extracts many features and data points, clustering and classification can incur high computational complexity and yield unreliable analysis results. In particular, the term-document matrix obtained through text mining represents term-document features but is sparse. We designed an advanced genetic algorithm (GA) to extract features in text mining for a detection model. Term frequency-inverse document frequency (TF-IDF) is used to reflect document-term relationships during feature extraction, and a predetermined number of features is selected through an iterative process. We also used a sparsity score to improve the performance of the detection model: if a spam mail data set is highly sparse, the detection model performs poorly and it is difficult to search for an optimal detection model. In addition, we find a low-sparsity model that also has a high TF-IDF score by using s(F) in the numerator of the fitness function. We verified the proposed algorithm by applying it to text classification and found that it achieves higher speed and accuracy in attack mail classification.
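
The idea of GA-based feature selection driven by TF-IDF and sparsity can be sketched in plain Python. This is a minimal sketch under stated assumptions. It uses one common TF-IDF variant (relative term frequency times log(N/df)). The abstract says only that s(F) sits in the numerator of the fitness function, so the fitness here (mean TF-IDF of the selected columns times 1 minus their sparsity) is a hypothetical stand-in, not the authors' formula. The operators (elitism, union-sampling crossover, swap mutation) and all function names are also illustrative.

```python
import math
import random


def tfidf_matrix(docs):
    """Return (vocab, rows): TF-IDF weights for tokenized documents."""
    vocab = sorted({t for d in docs for t in d})
    n = len(docs)
    df = {t: sum(1 for d in docs if t in d) for t in vocab}
    rows = [[(d.count(t) / len(d)) * math.log(n / df[t]) if d else 0.0 for t in vocab]
            for d in docs]
    return vocab, rows


def fitness(rows, subset):
    # Hypothetical fitness: mean TF-IDF of selected columns, scaled by (1 - sparsity),
    # so dense, high-weight feature sets score best.
    cells = [rows[i][j] for i in range(len(rows)) for j in subset]
    sparsity = sum(1 for v in cells if v == 0.0) / len(cells)
    return (sum(cells) / len(cells)) * (1.0 - sparsity)


def ga_select(rows, k, pop_size=20, generations=50, mutation_rate=0.2, seed=0):
    """Select k feature (column) indices with a simple elitist genetic algorithm."""
    rng = random.Random(seed)
    n_feat = len(rows[0])
    pop = [rng.sample(range(n_feat), k) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda s: fitness(rows, s), reverse=True)
        elite = pop[: pop_size // 2]          # keep the better half unchanged
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            # Crossover: draw k distinct features from the union of both parents.
            child = rng.sample(sorted(set(a) | set(b)), k)
            if rng.random() < mutation_rate:
                # Mutation: swap one selected feature for an unselected one.
                outside = [f for f in range(n_feat) if f not in child]
                if outside:
                    child[rng.randrange(k)] = rng.choice(outside)
            children.append(child)
        pop = elite + children
    return max(pop, key=lambda s: fitness(rows, s))
```

Because the elite half survives each generation unchanged, the best fitness in the population never decreases. The fixed subset size mirrors the "predetermined number of features" in the abstract.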

A study on solar radiation prediction using medium-range weather forecasts (중기예보를 이용한 태양광 일사량 예측 연구)

  • Sujin Park;Hyojeoung Kim;Sahm Kim
    • The Korean Journal of Applied Statistics, v.36 no.1, pp.49-62, 2023
  • Solar energy, whose share is increasing rapidly, continues to attract development and investment. As the Green New Deal renewable energy policy is implemented and home solar panel installations increase, the supply of solar energy in Korea is gradually expanding, and research on accurately predicting power generation demand is actively under way. Solar radiation prediction is important because solar radiation is the factor that most influences power generation demand prediction. This study differs most from previous studies in that it attempts to predict solar radiation using medium-term forecast weather data, which earlier studies did not use. In this paper, we combined multiple linear regression, KNN, random forest, and SVR models with the K-means clustering technique to predict hourly solar radiation by calculating a probability density function for each cluster. Before using the medium-term forecast data, we compared model predictions using mean absolute error (MAE) and root mean squared error (RMSE) as indicators. Data from March 1, 2017 to February 28, 2022 were converted into daily data matching the medium-term forecast data format. Comparing predictive performance, the best method predicted daily solar radiation with random forest, grouped dates with similar climate factors, and calculated the probability density function of solar radiation for each cluster. When this methodology was fitted to the medium-term forecast data, the prediction error was found to increase by date. This appears to be due to prediction errors in the medium-term forecast weather data.
Future studies should add exogenous variables available among the weather factors in the medium-term forecast data, such as precipitation, or apply time series clustering techniques.
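
The two evaluation metrics and the "density per cluster" step can be sketched in plain Python. The abstract names neither the density estimator nor its bandwidth, so this sketch assumes a Gaussian kernel density estimate (KDE) with a fixed, hypothetical bandwidth. It also assumes the cluster labels come from a K-means step run beforehand. The function names are illustrative.

```python
import math


def mae(y_true, y_pred):
    # Mean absolute error.
    return sum(abs(a - b) for a, b in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    # Root mean squared error.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true))


def gaussian_kde(samples, bandwidth):
    """Return a density function estimated from 1-D samples with a Gaussian kernel."""
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2 * math.pi))

    def pdf(x):
        return norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)

    return pdf


def densities_by_cluster(radiation, labels, bandwidth=0.5):
    """Fit one KDE of solar radiation per cluster label (labels assumed from K-means)."""
    groups = {}
    for value, label in zip(radiation, labels):
        groups.setdefault(label, []).append(value)
    return {label: gaussian_kde(vals, bandwidth) for label, vals in groups.items()}
```

RMSE squares each error before averaging, so it penalizes occasional large misses more heavily than MAE does. Reporting both, as the study does, shows whether the error comes from a few bad days or is spread evenly.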

A Theoretical Study on Indexing Methods using the Metadata for the Automatic Construction of a Thesaurus Browser (시소러스 브라우저 자동구현을 위한 Metadata를 이용한 색인어 처리방안에 대한 연구)

  • Seo, Whee
    • Journal of Korean Library and Information Science Society, v.35 no.4, pp.451-467, 2004
  • This paper presents theoretical analyses of three topics: automatic indexing, which is vital to constructing a thesaurus browser; clustering algorithms for building hierarchical relations among terms; and methods for automatically constructing a thesaurus browser. Methods for automatically selecting index terms from web documents are studied by surveying methods for analyzing and processing metadata, which in web documents plays the bibliographic role that it plays in traditional paper documents. Because most web documents contain no metadata, the study also suggests adding metadata to web documents with an automatic metadata editor.
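
Selecting index terms from metadata in web documents can be sketched with Python's standard `html.parser` module. This minimal sketch assumes that candidate terms come from the `<title>` element and from comma-separated `<meta name="keywords">` and `<meta name="DC.subject">` (Dublin Core) fields. The choice of fields and the helper names are assumptions, not the paper's procedure.

```python
from html.parser import HTMLParser


class MetaIndexer(HTMLParser):
    """Collect <meta name=... content=...> pairs and the document <title>."""

    def __init__(self):
        super().__init__()
        self.meta = {}
        self.title = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta":
            name = (attrs.get("name") or "").lower()
            if name and attrs.get("content"):
                self.meta[name] = attrs["content"]
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def index_terms(html, fields=("keywords", "dc.subject")):
    """Return (title, terms): deduplicated, lowercased terms from the chosen meta fields."""
    parser = MetaIndexer()
    parser.feed(html)
    terms = []
    for field in fields:
        for term in parser.meta.get(field, "").split(","):
            term = term.strip().lower()
            if term and term not in terms:
                terms.append(term)
    return parser.title.strip(), terms
```

When a page has no such metadata, `terms` comes back empty. This is the situation the paper addresses by proposing an automatic metadata editor.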

An Analysis of Indications of Meridians in DongUiBoGam Using Data Mining (데이터마이닝을 이용한 동의보감에서 경락의 주치특성 분석)

  • Chae, Younbyoung;Ryu, Yeonhee;Jung, Won-Mo
    • Korean Journal of Acupuncture, v.36 no.4, pp.292-299, 2019
  • Objectives: DongUiBoGam is one of the representative medical texts of Korea. Using text mining methods, we analyzed the characteristics of the indications of each meridian in WaeHyeong, the second chapter of DongUiBoGam, which addresses external body elements. We also visualized the relationships between the meridians and the disease sites. Methods: Using the term frequency-inverse document frequency (TF-IDF) method, we quantified the indications of each meridian based on the occurrence frequencies of 14 meridians and 14 disease sites. The spatial patterns of each meridian's indications were visualized on a human body template according to the TF-IDF values. Using hierarchical clustering, twelve meridians were grouped into four clusters based on their TF-IDF distributions. Results: The TF-IDF values of each meridian showed different constellation patterns across disease sites. The spatial patterns of each meridian's indications were similar to the route of the corresponding meridian. Conclusions: This study identified spatial patterns linking meridians and disease sites. These findings suggest that the constellations of meridian indications are primarily associated with the lines of the meridian system. We strongly believe that these findings will further the current understanding of the indications of acupoints and meridians.
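
Grouping meridians by their TF-IDF distributions is an instance of agglomerative (hierarchical) clustering. The abstract does not state the linkage criterion or distance measure. This minimal Python sketch therefore assumes average linkage with cosine distance, applied to TF-IDF vectors (one row per meridian, one column per disease site) that have already been computed. The function names and any input vectors are hypothetical.

```python
import math


def cosine_distance(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0 or nv == 0:
        return 1.0
    return 1.0 - dot / (nu * nv)


def average_linkage(vectors, n_clusters):
    """Agglomerative clustering with average linkage; returns lists of row indices."""
    clusters = [[i] for i in range(len(vectors))]
    dist = [[cosine_distance(a, b) for b in vectors] for a in vectors]

    def linkage(c1, c2):
        # Mean pairwise distance between members of the two clusters.
        return sum(dist[i][j] for i in c1 for j in c2) / (len(c1) * len(c2))

    while len(clusters) > n_clusters:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = linkage(clusters[x], clusters[y])
                if best is None or d < best[0]:
                    best = (d, x, y)
        _, x, y = best
        clusters[x] = clusters[x] + clusters[y]   # merge the closest pair
        del clusters[y]                           # y > x, so index x stays valid
    return clusters
```

Cosine distance compares the shape of each meridian's profile over disease sites rather than its overall magnitude. That choice seems reasonable for TF-IDF rows, but it is an assumption.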

Optimization of Long-term Generator Maintenance Scheduling considering Network Congestion and Equivalent Operating Hours (송전제약과 등가운전시간을 고려한 장기 예방정비계획 최적화에 관한 연구)

  • Shin, Hansol;Kim, Hyoungtae;Lee, Sungwoo;Kim, Wook
    • The Transactions of The Korean Institute of Electrical Engineers, v.66 no.2, pp.305-314, 2017
  • Most existing research on system-wide optimization of generator maintenance scheduling does not consider equivalent operating hours (EOHs), mainly because the EOHs of combined-cycle gas turbines (CCGTs) are difficult to calculate in a large-scale system. To estimate EOHs, not only the operating hours but also the number of start-ups and shutdowns during the planning period must be estimated, which requires the mathematical model to incorporate both an economic dispatch model and a unit commitment model. The resulting model is inherently a large-scale mixed-integer nonlinear programming problem whose computation time grows exponentially, becoming intractable as the system size grows. To make the problem tractable, this paper proposes an EOH calculation based on demand grouping with the K-means clustering algorithm. Network congestion is also considered to improve the accuracy of the EOH calculation. The proposed method is applied to the actual Korean electricity market and compared with existing methods.
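
The demand-grouping step can be illustrated with a minimal one-dimensional K-means (Lloyd's algorithm) sketch in Python. It assumes hourly system demand is the only clustering feature and uses a deterministic quantile initialization. The abstract does not describe how the groups feed the EOH calculation or the network constraints. Treating each group as one representative demand level weighted by its hour count is therefore an assumption, and the function names are hypothetical.

```python
def _assign(values, centroids):
    # Put each demand value into the group of its nearest centroid.
    groups = [[] for _ in centroids]
    for v in values:
        nearest = min(range(len(centroids)), key=lambda c: abs(v - centroids[c]))
        groups[nearest].append(v)
    return groups


def kmeans_1d(values, k, iterations=100):
    """Lloyd's K-means on scalar demand values.

    Returns (centroids, counts): each centroid is a representative demand level
    and each count is how many hours fall into that group.
    """
    ordered = sorted(values)
    # Deterministic initialization: spread initial centroids over the quantiles.
    centroids = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    for _ in range(iterations):
        groups = _assign(values, centroids)
        new = [sum(g) / len(g) if g else centroids[i] for i, g in enumerate(groups)]
        if new == centroids:      # converged: assignments no longer move centroids
            break
        centroids = new
    counts = [len(g) for g in _assign(values, centroids)]
    return centroids, counts
```

Replacing thousands of hourly demand snapshots with a handful of weighted levels is what shrinks the dispatch and commitment subproblem. The counts always sum to the number of input hours.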

Topic-based Multi-document Summarization Using Non-negative Matrix Factorization and K-means (비음수 행렬 분해와 K-means를 이용한 주제기반의 다중문서요약)

  • Park, Sun;Lee, Ju-Hong
    • Journal of KIISE:Software and Applications, v.35 no.4, pp.255-264, 2008
  • This paper proposes a novel method using K-means and non-negative matrix factorization (NMF) for topic-based multi-document summarization. NMF decomposes a weighted term-by-sentence matrix into two sparse non-negative matrices: a semantic feature matrix and a semantic variable matrix. The resulting semantic features are intuitively comprehensible. Weighting the similarity between the topic and the semantic features prevents meaningless sentences that merely resemble the topic from being selected. K-means clustering removes noisy sentences so that biased semantics of the documents are not reflected in the summaries. In addition, the coherence of the summaries is enhanced by arranging the selected sentences in order of their rank. Experimental results show that the proposed method outperforms other methods.
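
The NMF part of this approach can be sketched in plain Python. This minimal sketch assumes the standard Lee-Seung multiplicative updates for squared Frobenius error and a plain cosine similarity between a topic term vector and each semantic feature (column of W). The paper's weighted similarity and its K-means noise-removal step are not reproduced. `summarize` and the other helpers are hypothetical.

```python
import math
import random


def matmul(X, Y):
    # Plain nested-list matrix product.
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]


def transpose(X):
    return [list(col) for col in zip(*X)]


def nmf(A, r, iterations=300, seed=0, eps=1e-9):
    """Factor A (terms x sentences) into W (terms x r) and H (r x sentences)."""
    rng = random.Random(seed)
    m, n = len(A), len(A[0])
    W = [[rng.random() + eps for _ in range(r)] for _ in range(m)]
    H = [[rng.random() + eps for _ in range(n)] for _ in range(r)]
    for _ in range(iterations):
        # Multiplicative updates keep every entry non-negative.
        Wt = transpose(W)
        num, den = matmul(Wt, A), matmul(matmul(Wt, W), H)
        H = [[H[i][j] * num[i][j] / (den[i][j] + eps) for j in range(n)] for i in range(r)]
        Ht = transpose(H)
        num, den = matmul(A, Ht), matmul(W, matmul(H, Ht))
        W = [[W[i][j] * num[i][j] / (den[i][j] + eps) for j in range(r)] for i in range(m)]
    return W, H


def frobenius_error(A, W, H):
    WH = matmul(W, H)
    return sum((a - b) ** 2 for ra, rb in zip(A, WH) for a, b in zip(ra, rb))


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def summarize(A, topic, r, count):
    """Return `count` sentence indices in rank order for a topic term vector."""
    W, H = nmf(A, r)
    features = transpose(W)                 # each column of W: a semantic feature
    best = max(range(r), key=lambda k: cosine(features[k], topic))
    weights = H[best]                       # each sentence's weight on that feature
    ranked = sorted(range(len(weights)), key=lambda j: weights[j], reverse=True)
    return ranked[:count]
```

Matching the topic against semantic features rather than raw sentences is the key design choice here. A sentence is chosen because it expresses the feature closest to the topic, not merely because it shares words with it.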

Atomistic simulations of defect accumulation and evolution in heavily irradiated titanium for nuclear-powered spacecraft

  • Hai Huang;Xiaoting Yuan;Longjingrui Ma;Jiwei Lin;Guopeng Zhang;Bin Cai
    • Nuclear Engineering and Technology, v.55 no.6, pp.2298-2304, 2023
  • Titanium alloys are expected to become candidate materials for nuclear-powered spacecraft because of their excellent overall performance. Nevertheless, the atomistic mechanisms of defect accumulation and evolution in these materials under long-term irradiation remain poorly understood. Here we investigate heavy irradiation damage in α-titanium at doses as high as 4.0 canonical displacements per atom (cDPA) using atomistic simulations of Frenkel pair accumulation. The results show that the surviving defect content increases sharply up to 0.04 cDPA and then decreases slowly before stabilizing, correlating strongly with the system energy. Under the current simulation conditions, the defect clustering fraction may not depend directly on the irradiation dose. Interstitials are more likely than vacancies to form clusters, which may further lead to the formation of 1/3<1̄21̄0> interstitial-type dislocation loops extended along the (101̄0) plane. This study provides important insight into the irradiation damage behavior of titanium.
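
A "defect clustering fraction" is often computed by linking defects that lie within a cutoff distance and counting defects in connected groups of two or more. The abstract does not state the paper's criterion. This minimal Python sketch therefore assumes that cutoff-based definition, with union-find for the connected components and an optional orthogonal periodic box (minimum-image convention). The cutoff value and function name are hypothetical. Running it separately on vacancy and interstitial positions would allow the two to be compared.

```python
import math


def clustered_fraction(positions, cutoff, box=None):
    """Fraction of point defects belonging to clusters of two or more.

    Defects closer than `cutoff` are linked; clusters are connected components.
    If `box` (edge lengths of an orthogonal periodic cell) is given, distances
    use the minimum-image convention.
    """
    n = len(positions)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path halving
            i = parent[i]
        return i

    def distance(p, q):
        d2 = 0.0
        for axis, (a, b) in enumerate(zip(p, q)):
            d = b - a
            if box is not None:
                d -= box[axis] * round(d / box[axis])   # nearest periodic image
            d2 += d * d
        return math.sqrt(d2)

    for i in range(n):
        for j in range(i + 1, n):
            if distance(positions[i], positions[j]) < cutoff:
                parent[find(i)] = find(j)

    sizes = {}
    for i in range(n):
        root = find(i)
        sizes[root] = sizes.get(root, 0) + 1
    clustered = sum(s for s in sizes.values() if s >= 2)
    return clustered / n if n else 0.0
```

The pairwise loop is O(n²), which is fine for illustration. Production analyses of large cells usually use neighbor lists or cell lists instead.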

Indoor Environment Drone Detection through DBSCAN and Deep Learning

  • Ha Tran Thi;Hien Pham The;Yun-Seok Mun;Ic-Pyo Hong
    • Journal of IKEEE, v.27 no.4, pp.439-449, 2023
  • With the increasing use of drones and the growing demand for indoor surveillance, a robust application for detecting and tracking both drones and humans in indoor spaces has become necessary. This study presents an application that uses FMCW radar to detect human and drone motion from point clouds. First, the DBSCAN (Density-Based Spatial Clustering of Applications with Noise) algorithm groups the points into distinct clusters, each representing an object in the tracking area. The algorithm is particularly effective at clustering drone point clouds, achieving an accuracy of up to 92.8%. The clusters are then classified as humans or drones by a deep learning model. Three models were applied: a deep neural network (DNN), a residual network (ResNet), and long short-term memory (LSTM). ResNet achieved the highest accuracy, reaching 98.62% for drone clusters and 96.75% for human clusters.
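
The DBSCAN step can be sketched in plain Python as a minimal, unoptimized implementation of the standard algorithm. Points with at least `min_pts` neighbors within `eps` (counting themselves) are core points. Clusters grow outward from core points, and anything unreachable is labeled noise (-1). The `eps` and `min_pts` values used with it are hypothetical. Real FMCW radar points would be 3-D coordinates, possibly with extra attributes such as Doppler, which this sketch does not model.

```python
import math


def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or -1 for noise."""
    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(points)

    def neighbors(i):
        # Brute-force range query (includes the point itself).
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = 0
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE          # may later be relabeled as a border point
            continue
        labels[i] = cluster            # i is a core point: start a new cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster    # border point reached from a core point
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            j_neighbors = neighbors(j)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)   # j is also core: keep expanding
        cluster += 1
    return labels
```

Unlike K-means, DBSCAN does not need the number of objects in advance and discards sparse radar returns as noise. This makes it a natural fit for an unknown number of humans and drones in the scene.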

Mitochondrial DNA-based investigation of dead rorqual (Cetacea: Balaenopteridae) from the west coast of India

  • Shantanu Kundu;Manokaran Kamalakannan;Dhriti Banerjee;Flandrianto Sih Palimirmo;Arif Wibowo;Hyun-Woo Kim
    • Fisheries and Aquatic Sciences, v.27 no.1, pp.48-55, 2024
  • The study assessed the utility of mitochondrial DNA for identifying a dead rorqual found off the west coast of India. Both the COI and Cytb genes showed 99-100% similarity to the GenBank sequence of Balaenoptera musculus in a global BLAST search, confirming their affiliation with this species. Compared with other Balaenopteridae species, interspecies genetic distances ranged from 6.75% to 9.80% for COI and from 7.37% to 10.96% for Cytb. Bayesian phylogenies constructed from both the COI and Cytb genes showed clear, separate clustering of all Balaenopteridae species, reaffirming their distinctiveness, while the generated sequences clustered cohesively within the B. musculus clade. Beyond species confirmation, this study provides valuable insight into the presence of live and dead B. musculus individuals in Indian marine ecosystems. This information can help guide long-term conservation efforts to safeguard Important Marine Mammal Areas (IMMAs) in India.
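
Pairwise genetic distances such as those reported above can be sketched in plain Python for two aligned sequences. The abstract does not name the substitution model. This sketch therefore shows both the uncorrected p-distance and the Kimura two-parameter (K2P) distance, which is widely used for COI barcoding. It skips gaps and ambiguous bases and assumes the sequences are already aligned. Multiply the results by 100 to obtain percentages. The function names are illustrative.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def _comparable_sites(seq1, seq2):
    # Keep aligned positions where both sequences have an unambiguous base.
    return [(a, b) for a, b in zip(seq1.upper(), seq2.upper())
            if a in "ACGT" and b in "ACGT"]


def p_distance(seq1, seq2):
    """Uncorrected proportion of differing sites."""
    sites = _comparable_sites(seq1, seq2)
    return sum(1 for a, b in sites if a != b) / len(sites)


def k2p_distance(seq1, seq2):
    """Kimura 2-parameter distance: -1/2 ln(1 - 2P - Q) - 1/4 ln(1 - 2Q).

    P = proportion of transitions (A<->G, C<->T), Q = proportion of transversions.
    Undefined (math domain error) when the sequences are too divergent.
    """
    sites = _comparable_sites(seq1, seq2)
    diffs = [(a, b) for a, b in sites if a != b]
    transitions = sum(1 for a, b in diffs
                      if {a, b} <= PURINES or {a, b} <= PYRIMIDINES)
    P = transitions / len(sites)
    Q = (len(diffs) - transitions) / len(sites)
    return -0.5 * math.log(1 - 2 * P - Q) - 0.25 * math.log(1 - 2 * Q)
```

K2P weights transitions and transversions separately, so it is always at least as large as the p-distance for the same pair. The gap grows as divergence increases.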