• Title/Summary/Keyword: document clustering (문헌 군집화)

Designing a FRBR Work Grouping Algorithm of Bibliographic Records using a Role Term Dictionary of Authors (저자역할용어사전 구축 및 저작군집화에 관한 연구)

  • Yun, Jaehyuk;Do, Seulki;Oh, Sam G.
    • Journal of the Korean Society for Information Management / v.37 no.2 / pp.197-223 / 2020
  • The purpose of this study is to analyze the issues that arise when grouping KORMARC records using the FRBR WORK concept and to suggest a new method. Previous studies did not sufficiently address the criteria or processes for identifying the representative authors of records and their derivatives. Our study therefore focused on devising a method of identifying the representative author when a work has multiple contributors. The study developed a method of identifying representative authors using an author role dictionary constructed by extracting role terms from the statement of responsibility field (245). We also designed a way to group records as a work by calculating similarity measures of authors and titles. The accuracy of WORK grouping was highest when blank spaces, parentheses, and controlling processes were removed from titles and the measured similarity rates of authors and titles were higher than 80 percent. This was an experimental study in which we developed an author role dictionary that can be used to select a representative author and measured the similarity of authors and titles in order to achieve effective WORK grouping of KORMARC records. Future work will attempt to improve the title similarity measure, incorporate FRBR Group 1 entities such as expression, manifestation, and item data into the algorithm, and improve the algorithm by utilizing other forms of MARC data that are widely used in Korea.
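
The grouping step described in this abstract can be sketched roughly as follows. This is only an illustration, not the authors' algorithm: the abstract does not name the similarity measure, so `difflib.SequenceMatcher` is swapped in as an assumption, and the record fields (`title`, `author`) and the function names are hypothetical. Only the title normalization (removing blank spaces and parentheses) and the 80 percent threshold come from the abstract.

```python
import re
from difflib import SequenceMatcher


def normalize_title(title: str) -> str:
    """Strip whitespace and parenthesis characters, then lowercase."""
    return re.sub(r"[\s()]", "", title).lower()


def similarity(a: str, b: str) -> float:
    """Assumed similarity measure: difflib's ratio in the range [0, 1]."""
    return SequenceMatcher(None, a, b).ratio()


def same_work(rec_a: dict, rec_b: dict, threshold: float = 0.8) -> bool:
    """Two records belong together if author AND title similarity exceed the threshold."""
    title_sim = similarity(normalize_title(rec_a["title"]),
                           normalize_title(rec_b["title"]))
    author_sim = similarity(rec_a["author"], rec_b["author"])
    return title_sim > threshold and author_sim > threshold


def group_works(records: list, threshold: float = 0.8) -> list:
    """Greedy grouping: compare each record with the first member of each group."""
    groups = []
    for rec in records:
        for group in groups:
            if same_work(group[0], rec, threshold):
                group.append(rec)
                break
        else:
            # No existing group matched, so this record starts a new one.
            groups.append([rec])
    return groups
```

The greedy comparison against the first member of each group keeps the sketch short; a fuller implementation would first choose the representative author for each record, which is the part the role-term dictionary addresses.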

A Research for Clustering of Conflict in Public Construction Project (군집분석을 통한 공공 건설사업 갈등 유형화 연구)

  • Lee, Jiseop;Kim, Doyun;Lee, Changjun;Lee, Jeonghun;Han, Seungheon
    • Korean Journal of Construction Engineering and Management / v.19 no.2 / pp.61-72 / 2018
  • Conflicts in public construction projects lead to increased social costs as well as higher construction costs and schedule delays. It is therefore important to evaluate conflicts in construction projects and to find appropriate solutions based on previous cases. In this research, conflict factors and criteria for evaluating conflict are derived, and 30 cases are evaluated by 11 conflict experts. Using k-means clustering, the cases are grouped into three clusters. The cases were analyzed according to the characteristics of each cluster and labeled as 'NIMBY and harmful facility conflict cluster', 'environmental and pollution conflict cluster', and 'PIMFY and small conflicts'. In the future, when a conflict occurs in a public construction project, it can be evaluated using this clustering and its characteristics can be identified. As a result, conflicts can be mitigated more quickly and effectively by looking up previous cases in the matching cluster that suggest suitable resolutions.
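
As a rough illustration of the k-means step mentioned above, the sketch below implements plain Lloyd's algorithm in pure Python. The seeding rule (the first `k` points become the initial centroids) and the function name `kmeans` are assumptions made for this example; the abstract does not describe the study's initialization, features, or software.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm; the first k points seed the centroids (an assumption)."""
    centroids = [tuple(p) for p in points[:k]]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # Assignments are stable, so the algorithm has converged.
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members)
                                     for col in zip(*members))
    return labels, centroids
```

In the setting of the abstract, each point would be one conflict case scored on the evaluation criteria, and `k` would be 3.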

A Study of Computational Literature Analysis based Classification for a Pairwise Comparison by Contents Similarity in a section of Tokkijeon, 'Fish Tribe Conference' (컴퓨터 문헌 분석 기반의 토끼전 '어족회의' 대목 내용 유사도에 따른 이본 계통 분류 연구)

  • Kim, Dong-Keon;Jeong, Hwa-Young
    • The Journal of the Korea Contents Association / v.22 no.5 / pp.15-25 / 2022
  • This study aims to identify the families and lineages of the variant versions (ibon) of the "Fish Tribe Conference" section of Tokkijeon by using computational literature analysis techniques. First, we encode the content type of each paragraph in each variant to build a corpus, and on this basis we use the Hamming distance to calculate the distance matrix between variants. We visualize the clustering pattern of the variants by applying multidimensional scaling and hierarchical clustering, and explore the characteristics of the lineages of the "Fish Tribe Conference" section in comparison with the existing cluster analysis of all paragraphs of Tokkijeon. As a result, unlike the cluster analysis of all paragraphs of Tokkijeon, which yields six categories, the "Fish Tribe Conference" section has five categories, with some variants changing group membership. The contributions of this study are that the relative distances between variants were measured and a systematic classification was carried out in an objective, empirical way by calculation, and that the characteristics of the lineages of the "Fish Tribe Conference" section were revealed in comparison with the analysis of the entire Tokkijeon.
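
The distance computation named in this abstract can be illustrated with a short sketch. It assumes that each variant has been encoded as an equal-length sequence of content-type codes, one per paragraph; the function names and the dictionary layout are hypothetical.

```python
def hamming(a, b):
    """Number of positions at which two equal-length code sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


def distance_matrix(codes):
    """Pairwise Hamming distances; rows and columns follow the dict's key order."""
    names = list(codes)
    return [[hamming(codes[r], codes[c]) for c in names] for r in names]
```

The resulting symmetric matrix is the kind of input that multidimensional scaling and hierarchical clustering operate on.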

Analysis of the abstracts of research articles in food related to climate change using a text-mining algorithm (텍스트 마이닝 기법을 활용한 기후변화관련 식품분야 논문초록 분석)

  • Bae, Kyu Yong;Park, Ju-Hyun;Kim, Jeong Seon;Lee, Yung-Seop
    • Journal of the Korean Data and Information Science Society / v.24 no.6 / pp.1429-1437 / 2013
  • Research articles on food related to climate change were analyzed with a text-mining algorithm, one of the unstructured data analysis tools used in big data analysis, with a focus on the frequencies of terms appearing in the abstracts. As a first step, a term-document matrix was built. A hierarchical clustering algorithm, based on dissimilarities among the selected terms and on expertise in the field, was then applied to classify the documents under consideration into a few labeled groups. Through this research, we were able to identify important topics in the field of food related to climate change and their trends over past years. The results are expected to be useful for future research aimed at systematic responses and adaptation to climate change.
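
The first step named above, building a term-document matrix, might look like the sketch below. The tokenizer (lowercased runs of ASCII letters) and the function name are assumptions for illustration; the study's actual preprocessing and term selection are not described in the abstract.

```python
import re
from collections import Counter


def term_document_matrix(docs):
    """Return (terms, matrix), where matrix[i][j] counts terms[i] in docs[j]."""
    # Assumed tokenizer: lowercase the text and keep runs of ASCII letters.
    counts = [Counter(re.findall(r"[a-z]+", doc.lower())) for doc in docs]
    terms = sorted(set().union(*counts))
    matrix = [[c[term] for c in counts] for term in terms]
    return terms, matrix
```

Each row is a term's frequency profile across abstracts, so dissimilarities among terms can be computed row by row before clustering.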

A Comparative Study on Clustering Methods for Grouping Related Tags (연관 태그의 군집화를 위한 클러스터링 기법 비교 연구)

  • Han, Seung-Hee
    • Journal of the Korean Society for Library and Information Science / v.43 no.3 / pp.399-416 / 2009
  • This study examined methods for clustering related tags in order to improve search and exploration in the tag space. The experiments were performed on 10 Delicious tags and the strongly related tags extracted from 300 documents for each tag, and hierarchical and non-hierarchical clustering methods were applied based on tag co-occurrences. Cluster relevance was measured to evaluate the experimental results. The results showed that Ward's method with the cosine coefficient, which is known to perform well in term clustering, performed best and showed a consistent clustering tendency. Furthermore, the analysis indicated that cluster membership among related tags reflects users' tagging purposes or interests and can disambiguate word senses. Tag clusters would therefore be helpful for improving search and exploration in the tag space.
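
As a small illustration of the cosine coefficient mentioned above, the sketch below compares two tags through their co-occurrence profiles. It assumes each tag is represented as a vector of co-occurrence counts over the same ordered set of tags; Ward's method itself is not implemented here.

```python
import math


def cosine(u, v):
    """Cosine coefficient of two equal-length count vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0
```

Because a clustering method needs dissimilarities, a common choice is to use `1 - cosine(u, v)` as the distance between two tags.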

Towards Next Generation Multimedia Information Retrieval by Analyzing User-centered Image Access and Use (이용자 중심의 이미지 접근과 이용 분석을 통한 차세대 멀티미디어 검색 패러다임 요소에 관한 연구)

  • Chung, EunKyung
    • Journal of the Korean Society for Library and Information Science / v.51 no.4 / pp.121-138 / 2017
  • As users seek multimedia to meet a wide variety of information needs, information environments for multimedia have developed dramatically. In particular, as searching for multimedia through emotional access points has become popular, the need for indexing in terms of abstract concepts, including emotions, has grown. This study aims to analyze index terms extracted from Getty Image Bank. Five basic emotion terms (sadness, love, horror, happiness, and anger) were used when collecting the index terms. A total of 22,675 index terms were used for this study. The data comprise three sets: entire emotion, positive emotion, and negative emotion. For these three data sets, co-word occurrence matrices were created and visualized as weighted networks with PNNC clusters. The entire emotion network shows three clusters and 20 sub-clusters, while the positive emotion network and the negative emotion network each show 10 clusters. The results point to three elements for the next generation of multimedia retrieval: (1) analysis of index terms for the emotions shown by people in images, (2) the relationship between connotative and denotative terms and the possibility of inferring connotative terms from denotative terms using that relationship, and (3) the significance of a thesaurus of connotative terms for expanding related terms or synonyms to provide better access points.
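
The co-word occurrence matrices mentioned above can be sketched as pair counts over the index terms assigned to each image. The input layout (one list of index terms per image) and the function name are assumptions for illustration; the PNNC clustering step is not covered.

```python
from collections import Counter
from itertools import combinations


def coword_counts(records):
    """Count how many records contain each unordered pair of distinct terms."""
    pairs = Counter()
    for terms in records:
        # Sorting makes (a, b) and (b, a) map to the same key; set() drops repeats.
        for a, b in combinations(sorted(set(terms)), 2):
            pairs[(a, b)] += 1
    return pairs
```

The pair counts are the edge weights of a co-word network, with terms as nodes.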

Automatic Clustering of Same-Name Authors Using Full-text of Articles (논문 원문을 이용한 동명 저자 자동 군집화)

  • Kang, In-Su;Jung, Han-Min;Lee, Seung-Woo;Kim, Pyung;Goo, Hee-Kwan;Lee, Mi-Kyung;Goo, Nam-Ang;Sung, Won-Kyung
    • Proceedings of the Korea Contents Association Conference / 2006.11a / pp.652-656 / 2006
  • Bibliographic information retrieval systems require bibliographic data such as authors, organizations, and sources of publication to be uniquely identified using keys. In particular, when authors are represented simply by their names, users bear the burden of manually distinguishing between different authors of the same name. Previous approaches to resolving the problem of same-name authors rely on bibliographic data such as co-author information and article titles. However, these methods cannot handle single-author articles or articles whose titles share no common terms. To complement the previous methods, this study introduces a classification-based approach that uses the similarity between the full texts of articles. Experiments using recent domestic proceedings showed that the proposed method has the potential to supplement the previous metadata-based approaches.

A Study on the Intellectual Structure of Library and Information Science in Korea by Author Bibliographic Coupling Analysis (저자서지결합분석에 의한 문헌정보학의 지적구조 분석에 관한 연구)

  • Park, Ji Yeon;Jeong, Dong Youl
    • Journal of the Korean Society for Information Management / v.30 no.4 / pp.31-59 / 2013
  • The purpose of this study was to examine the intellectual structure of domestic LIS in the 1990s and 2000s using author bibliographic coupling analysis (ABCA). First, cluster analysis and multidimensional scaling analysis were performed to examine core subject areas and to map authors in two-dimensional space. Second, network analysis was used to visualize intellectual relationships among subject areas and to reveal the top subject areas by global centrality. Third, the intellectual structures of the 1990s and 2000s were compared to identify changes in the intellectual structure over time.
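
As a minimal illustration of author bibliographic coupling, the sketch below counts the references that two authors' bodies of work have in common. It assumes one simple counting scheme (the size of the intersection of the two authors' cited-reference sets); the abstract does not state which counting variant the study used, and the names here are hypothetical.

```python
from itertools import combinations


def coupling_strengths(author_refs):
    """Map each author pair to the number of references both authors cite."""
    return {
        (a, b): len(author_refs[a] & author_refs[b])
        for a, b in combinations(sorted(author_refs), 2)
    }
```

Here `author_refs` maps an author name to the set of reference identifiers cited across that author's papers. The resulting pairwise strengths form the matrix that cluster analysis, multidimensional scaling, and network analysis are then applied to.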

Analysis of Research Trends Related to drug Repositioning Based on Machine Learning (머신러닝 기반의 신약 재창출 관련 연구 동향 분석)

  • So Yeon Yoo;Gyoo Gun Lim
    • Information Systems Review / v.24 no.1 / pp.21-37 / 2022
  • Drug repositioning, one method of developing new drugs, is a useful way to discover new indications by allowing drugs that have already been approved for use in people to be used for other purposes. Recently, with the development of machine learning technology, cases of analyzing vast amounts of biological information and using it to develop new drugs have been increasing. Applying machine learning technology to drug repositioning will help find effective treatments quickly. The world is currently struggling with COVID-19, a new disease caused by a severe acute respiratory syndrome coronavirus. Drug repositioning, which repurposes drugs that have already been clinically approved, could offer alternative therapeutics for treating COVID-19 patients. This study examines research trends in the field of drug repositioning using machine learning techniques. A total of 4,821 papers were collected from PubMed with the keyword 'Drug Repositioning' using a web scraping technique. After data preprocessing, frequency analysis, LDA-based topic modeling, random forest classification analysis, and prediction performance evaluation were performed on 4,419 papers. Associated words were analyzed based on the Word2vec model; after dimensionality reduction with PCA, K-Means clustering was applied to generate labels, and the structure of the literature was then visualized using the t-SNE algorithm. Hierarchical clustering was applied to the LDA results and visualized as a heat map. This study identified research topics related to drug repositioning and presented a method for deriving and visualizing meaningful topics from a large body of literature using machine learning algorithms. The results are expected to serve as basic data for establishing research or development strategies in the field of drug repositioning.

Cluster Analysis Study based on Content Types of <Heungbu-jeon> versions (<흥부전> 이본의 내용 유형에 따른 군집 분석 연구)

  • Woonho Choi;Dong Gun Kim
    • Journal of Platform Technology / v.11 no.5 / pp.23-36 / 2023
  • This study aims to analyze the similarities and dissimilarities of various versions of <Heungbu-jeon> at both the micro and macro levels using content analysis techniques and the Hamming distance metric. The 28 versions of <Heungbu-jeon> were segmented into 341 content units, and the content type of each unit was encoded. The dissimilarities between content types were compared across all versions, unit by unit. The (dis-)similarities based on the content types of the 28 versions were aggregated and transformed into a distance matrix. The matrix was interpreted by multidimensional scaling, resulting in two-dimensional coordinates. Visualizing the results of the multidimensional scaling analysis confirmed that the versions of <Heungbu-jeon> can be broadly divided into two groups. Hierarchical clustering and phylogenetic analysis were applied to the same distance matrix to analyze the clusters of the 28 versions. The results showed that, based on the micro-level analysis of (dis-)similarities, there are five clusters within the two major clusters. This study demonstrated the usefulness of applying digital humanities methods to encode the content of classical literary versions and to analyze the data using clustering techniques based on the (dis-)similarity of literary content.
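
The hierarchical clustering step over a precomputed distance matrix can be sketched as below. Average linkage is an assumption made for this example, since the abstract does not name the linkage criterion, and the function name `agglomerate` is hypothetical.

```python
def agglomerate(dist, n_clusters):
    """Average-linkage agglomerative clustering over a symmetric distance matrix.

    Returns a list of clusters, each a list of row indices into dist.
    """
    clusters = [[i] for i in range(len(dist))]

    def linkage(a, b):
        # Average of all pairwise distances between members of a and b.
        return sum(dist[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > max(n_clusters, 1):
        # Find the closest pair of clusters (ties resolved by first occurrence).
        i, j = min(
            ((x, y) for x in range(len(clusters))
             for y in range(x + 1, len(clusters))),
            key=lambda xy: linkage(clusters[xy[0]], clusters[xy[1]]),
        )
        # j is always greater than i, so deleting j leaves index i valid.
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters
```

For the study's data, `dist` would be the 28 by 28 matrix of aggregated Hamming distances between versions, and stopping at 2 or 5 clusters would correspond to the macro-level and micro-level groupings the abstract reports.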
