• Title/Summary/Keyword: 동시단어분석 (co-word analysis)

Search Result 186, Processing Time 0.031 seconds

Verbal Collocation Extraction from Sejong Tagged Corpus (세종 말뭉치로부터 용언연어 추출)

  • Lee, Jeong-Tae;Cheon, Min-Ah;Kim, Jae-Hoon
    • Annual Conference on Human and Language Technology / 2015.10a / pp.121-123 / 2015
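  • A collocation is an expression composed of two or more words whose meaning cannot be inferred from the meanings of its individual words. When analyzing or translating a collocation, it is therefore far more effective to treat the collocation itself, rather than its individual words, as a single unit of analysis. To this end, this paper presents a method for extracting verbal collocations from the Sejong corpus using statistical techniques and evaluates its performance. Collocations are extracted using collocation patterns and statistical information. For evaluation, a collocation dictionary and subjective expert judgments were used together.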

An Analysis of the Intellectual Structure of Assistive Technology Journal Using Co-Word Analysis (동시출현단어 분석을 이용한 보조공학 저널의 지적구조 분석)

  • Yang, Hyunkieu
    • Journal of Rehabilitation Welfare Engineering & Assistive Technology / v.11 no.1 / pp.15-20 / 2017
  • This study presents the intellectual structure of the Assistive Technology Journal using co-word analysis of keywords. Articles from the journal were collected from the Web of Science citation database; 255 articles published from 2003 to 2015 were selected for analysis, and 1,359 author keywords were extracted from them. To analyze the journal's intellectual structure, cluster analysis was conducted and five clusters were identified. The five clusters are then presented on a multidimensional scaling map. The results are expected to help in exploring future directions for research on assistive technology.
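
The first step of a co-word analysis like the one described above can be sketched as counting how often two keywords are assigned to the same article. This is a minimal illustration, not the study's procedure: the function name `cooccurrence_counts` and the toy keyword lists are hypothetical, and the clustering and multidimensional scaling steps are left out.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(keyword_lists):
    """Count how often each unordered keyword pair appears in the same article."""
    pair_counts = Counter()
    for keywords in keyword_lists:
        # Deduplicate and sort so every pair has one canonical ordering.
        unique = sorted(set(keywords))
        pair_counts.update(combinations(unique, 2))
    return pair_counts


# Hypothetical toy data, not taken from the study.
articles = [
    ["wheelchair", "mobility", "usability"],
    ["wheelchair", "mobility"],
    ["usability", "communication aid"],
]
counts = cooccurrence_counts(articles)
print(counts[("mobility", "wheelchair")])
```

The resulting pair counts form the co-occurrence matrix that a clustering or scaling method would take as input.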

An Informetric Study on Academic Activities and Environmental Movements in Solving Global Environmental Problems (지구적 환경문제 해결을 위한 학술활동과 환경운동 경향 연구)

  • Park, Jae-Shin;Chung, Young-Mee
    • Journal of the Korean Society for Information Management / v.27 no.3 / pp.83-102 / 2010
  • This study aims to understand and compare the characteristics of two major approaches to solving global environmental problems, an academic approach comprising scholarly activities in the environmental sciences and a practical approach of environmental movements led by NGOs, by employing informetric analysis methods. The knowledge structure of the environmental sciences is depicted through co-citation networks of the subject categories assigned to journals cited in the discipline over the 10-year period from 2000 to 2009. Furthermore, the major interests of environmental NGOs are identified on the basis of external link data collected from the NGOs' web sites. Co-word analyses are also performed on the texts of journal papers in the environmental sciences and of news articles provided by NGO sites. These analyses identify the dominant subject areas of the environmental sciences and of environmental movements, demonstrating similarities and differences between the two approaches.

Korean Morphological Analyzer and POS Tagger Just Using Finite-State Transducers (유한상태변환기만을 이용한 한국어 형태소 분석 및 품사 태깅)

  • Park, Won-Byeong;Kim, Jae-Hoon
    • Proceedings of the Korea Information Processing Society Conference / 2006.11a / pp.165-168 / 2006
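  • This paper proposes a Korean morphological analysis and part-of-speech (POS) tagging system that uses only finite-state transducers. Existing Korean morphological analysis systems have mainly been rule-based, and Korean POS tagging systems have mainly been based on hidden Markov models. Finite-state transducers have been used for Korean morphological analysis before, but that approach requires the rules for building the transducers to be written by hand, and the dictionary must be constructed according to those rules. In this paper, all transduction rules required by the finite-state transducers are extracted automatically from a POS-tagged corpus. In this way, four kinds of transducers are built automatically: a grapheme-separation transducer, a word-separation transducer, a word-formation transducer, and a POS-determination transducer. The resulting transducers are combined by the composition operation into a single finite-state transducer that performs Korean morphological analysis and POS tagging at the same time. Because this method uses only a single finite-state transducer, its time complexity is linear, and the morphological analyzer and POS tagging system could be developed in a very short time.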

Exploring the Research Topic Networks in the Technology Management Field Using Association Rule-based Co-word Analysis (연관규칙 기반 동시출현단어 분석을 활용한 기술경영 연구 주제 네트워크 분석)

  • Jeon, Ikjin;Lee, Hakyeon
    • Journal of Technology Innovation / v.24 no.4 / pp.101-126 / 2016
  • This paper identifies core research topics and their relationships by deriving research topic networks in the technology management field using co-word analysis. In contrast to the conventional approach, in which undirected networks are constructed from normalized co-occurrence frequency, this study analyzes directed networks of keywords by applying the confidence index of association rule mining to pairs of keywords. Author keywords from 2,456 articles published in nine international journals of technology management in 2011-2014 are extracted and categorized into three types: THEME, METHOD, and FIELD. One-mode networks for each type of keyword are constructed to identify core research keywords and their interrelationships within each type. Two-mode networks composed of two different types of keywords, THEME-METHOD and THEME-FIELD, are then derived to explore which methods or fields are frequently employed or studied for each theme. The findings are expected to serve as a useful reference for researchers in technology management who want to grasp research trends and set future research directions.
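
The confidence index mentioned above can be illustrated with a short sketch. In association rule mining, confidence(A → B) is the number of articles containing both A and B divided by the number containing A, which makes the keyword network directed because confidence(A → B) and confidence(B → A) usually differ. The function name `confidence_edges`, the `min_confidence` threshold, and the toy data are assumptions made for illustration; the abstract does not state the paper's threshold or preprocessing.

```python
from collections import Counter
from itertools import permutations


def confidence_edges(keyword_lists, min_confidence):
    """Return directed edges (a, b) -> confidence(a -> b) at or above a threshold."""
    occurrences = Counter()   # number of articles containing each keyword
    pair_counts = Counter()   # number of articles containing each ordered pair
    for keywords in keyword_lists:
        unique = set(keywords)
        occurrences.update(unique)
        pair_counts.update(permutations(sorted(unique), 2))

    edges = {}
    for (a, b), n_ab in pair_counts.items():
        confidence = n_ab / occurrences[a]  # share of a's articles that also list b
        if confidence >= min_confidence:
            edges[(a, b)] = confidence
    return edges


# Hypothetical toy data, not taken from the paper.
articles = [
    ["patent", "innovation"],
    ["patent", "innovation"],
    ["innovation", "R&D"],
    ["innovation"],
]
edges = confidence_edges(articles, min_confidence=0.6)
print(edges)
```

In this toy data every article about "patent" also lists "innovation", so the edge patent → innovation is kept, while the reverse edge falls below the threshold and is dropped.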

An Analysis of Related Movie Information Using The Co-Word Method (동시출현단어분석을 이용한 연관영화정보 분석 연구)

  • Choi, Sanghee
    • Journal of the Korean Society for Information Management / v.31 no.4 / pp.161-178 / 2014
  • Many information services now allow users to collaborate in producing and using information. Sharing information is also important for users who have similar tastes or interests. As various channels have become available for users to share their experiences and knowledge, user data have accumulated within these information services. This study collected movie lists made by users of the IMDB service. Co-word analysis and ego-centered network analysis were applied to discover relevant information for users who chose a specific movie. Three movie features (title, director, and genre) were used to present related movie information. Movie title is an effective feature for presenting related movies across aspects such as theme or characters, and the popularity of directors affects the identification of related directors. Genre is not useful for finding related movies because of the complexity of a movie's topic.

Affinity and Variety between Words in the Framework of Hypernetwork (하이퍼네트워크에서 본 단어간 긴밀성과 다양성)

  • Kim, Joon-Shik;Park, Chan-Hoon;Lee, Eun-Seok;Zhang, Byoung-Tak
    • Journal of KIISE: Computer Systems and Theory / v.35 no.4 / pp.166-171 / 2008
  • We studied the variety and affinity between successive words in text documents. A number of groups were defined by the frequency of a following word in the whole text (corpus). In previous studies, Zipf's power law was explained by the Chinese restaurant process, and hub nodes were sought by examining the edge number profile in a scale-free network. We observed both a power law and a hub profile at the same time by studying the conditional frequency and degeneracy of a group. A symmetry between the affinity and the variety between words was found during the data analysis, and this phenomenon can be explained from the viewpoint of "exploitation and exploration." We also remark on a small symmetry-breaking phenomenon in the TIPSTER data.

Analyzing Self-Introduction Letter of Freshmen at Korea National College of Agricultural and Fisheries by Using Semantic Network Analysis : Based on TF-IDF Analysis (언어네트워크분석을 활용한 한국농수산대학 신입생 자기소개서 분석 - TF-IDF 분석을 기초로 -)

  • Joo, J.S.;Lee, S.Y.;Kim, J.S.;Kim, S.H.;Park, N.B.
    • Journal of Practical Agriculture & Fisheries Research / v.23 no.1 / pp.89-104 / 2021
  • Semantic network analysis (SNA), based on TF-IDF weights that evaluate the importance of key words, was conducted on the self-introduction letters of freshmen at Korea National College of Agriculture and Fisheries (KNCAF) in 2020. The top three words by TF-IDF weight were agriculture, mathematics, and study (Q. 1); clubs, plants, and friends (Q. 2); friends, clubs, and opinions (Q. 3); and mushrooms, insects, and fathers (Q. 4). In the relationships between words, the words with high betweenness centrality were reason, high school, and attending (Q. 1); garbage, high school, and school (Q. 2); importance, misunderstanding, and completion (Q. 3); and processing, feed, and farmhouse (Q. 4). The words with high degree centrality were high school, inquiry, and grades (Q. 1); garbage, cleanup, and class time (Q. 2); opinion, meetings, and volunteer activities (Q. 3); and processing, space, and practice (Q. 4). The word pairs with a high co-occurrence frequency, that is, high correlation, were 'certification - acquisition', 'problem - solution', 'science - life', and 'misunderstanding - concession'. In the cluster analysis, the number of clusters determined from the height of the cluster dendrogram was 2 (Q. 1), 4 (Q. 2 and Q. 4), and 5 (Q. 3). Cohesion within clusters was high, and heterogeneity between clusters was clearly visible.
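
The TF-IDF weighting that the analysis above starts from can be sketched as follows. The abstract does not say which TF-IDF variant was used, so this sketch assumes one common form: relative term frequency multiplied by log(N / document frequency). The function name `tf_idf` and the toy token lists are hypothetical.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per document, using tf * log(N / df)."""
    n_docs = len(documents)
    doc_freq = Counter()
    for tokens in documents:
        doc_freq.update(set(tokens))  # count each term once per document

    weights = []
    for tokens in documents:
        term_freq = Counter(tokens)
        weights.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in term_freq.items()
        })
    return weights


# Hypothetical toy data, not taken from the study.
letters = [
    ["agriculture", "study", "agriculture"],
    ["study", "clubs"],
    ["study", "friends"],
]
weights = tf_idf(letters)
print(max(weights[0], key=weights[0].get))
```

A word that appears in every document, such as "study" here, gets a weight of zero under this variant, which is why TF-IDF surfaces words that are distinctive to a document and not merely frequent.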

Analysis of Similar Research Areas in Government R&D Programs through Network Analysis (네트워크 분석을 통한 정부 R&D 사업 유사연구영역 분석)

  • Jeong, Jae-Ung;Han, Yu-Ri;Gang, In-Je;Choe, San;Jeong, Jae-Yeon;Park, Hyeon-U;Jeon, Seung-Pyo
    • Proceedings of the Korea Technology Innovation Society Conference / 2017.05a / pp.559-570 / 2017
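  • Korea has steadily increased government-led national R&D investment, with the goal of fostering future growth engines. As a result, the ratio of R&D expenditure to GDP has recently reached one of the highest levels in the world. Along with this quantitative expansion of the R&D budget, the efficient use of that budget is emerging as an increasingly important policy issue in science and technology. Efficient execution of the R&D budget requires reviewing R&D programs for similarity and duplication, but most such reviews have relied on the intuitive judgment of experts. Judgments that rely solely on expert intuition, however, can sometimes produce unclear or incorrect results. This study therefore proposes a data-driven methodology for systematically reviewing similarity and duplication among government R&D programs through network analysis, and explores its potential to complement intuition-based expert review. First, the study focuses on visualizing the overall structure and shape of the similar areas of government R&D programs, as well as the overall shape of the similar areas of the R&D programs of the 25 government-funded research institutes under the National Research Council of Science and Technology, in order to identify similar areas and provide decision-making information that supports intuitive judgment and choice. To this end, co-word analysis was performed on project keywords using 2015 NTIS data. Through this analysis, the detailed shape of similar research areas across the 25 institutes was presented, and phenomena in domestic science and technology policy and in science and technology studies were visualized. The results confirm that two forms coexist within the R&D programs of government-funded research institutes: a Mode 1 form, in which each institute's own domain is clearly visible, and a Mode 2 form, which follows socioeconomic context, need, and promise, is multidisciplinary and application-oriented, and in which diverse performing organizations carry out projects concurrently.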

Correlation Analysis of the Arirangs Based on the Informatics Algorithms (정보 알고리즘 기반 아리랑의 계통도 및 상관관계 분석)

  • Kim, Hak Yong
    • The Journal of the Korea Contents Association / v.14 no.4 / pp.407-417 / 2014
  • Arirang is the most famous Korean folk song and was inscribed by UNESCO (United Nations Educational, Scientific and Cultural Organization) as intangible cultural heritage in 2012. Most arirangs are composed of text and refrain parts. The genealogy of the arirangs was classified by refrain pattern using a multiple sequence alignment algorithm. There are two different refrain patterns, slow and fast melodies. Of 106 arirangs, 38 contain fast melodies and 68 contain slow melodies. From a bipartite arirang network composed of arirangs, text words, and their relationships, 73 arirangs and 104 of their key words were extracted. The correlation among the arirangs was analyzed from the selected arirangs and key words using a pairwise comparison matrix. Correlation among the arirangs was also analyzed by stepwise removal of single-degree nodes from the bipartite arirang network. This study analyzed the genealogy of the arirangs and the correlations among them using informatics algorithms and network techniques, and is intended as a stepping stone for the popularization and globalization of the arirangs.
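
The stepwise removal of single-degree nodes mentioned above can be sketched on a small bipartite edge list. This is only one plausible reading of the abstract: it assumes that removal repeats until no node with exactly one edge remains. The function name `prune_single_degree` and the song and word labels are hypothetical.

```python
def prune_single_degree(edges):
    """Repeatedly drop nodes that have exactly one edge; return surviving edges."""
    edges = set(edges)
    while True:
        degree = {}
        for song, word in edges:
            # Tag each node with its side so a song and a word never collide.
            degree[("song", song)] = degree.get(("song", song), 0) + 1
            degree[("word", word)] = degree.get(("word", word), 0) + 1
        kept = {
            (song, word)
            for song, word in edges
            if degree[("song", song)] > 1 and degree[("word", word)] > 1
        }
        if kept == edges:  # nothing was removed in this round, so we are done
            return kept
        edges = kept


# Hypothetical toy network of (arirang, key word) pairs, not the study's data.
network = {
    ("A", "love"), ("A", "hill"),
    ("B", "love"), ("B", "hill"),
    ("C", "love"), ("C", "river"),
}
core = prune_single_degree(network)
print(sorted(core))
```

In this toy network the word "river" is removed first, which leaves song "C" with a single edge, so "C" is removed in the next round and only the densely connected core of songs "A" and "B" remains.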