• Title/Summary/Keyword: Zipf's power law

Search Result 7, Processing Time 0.023 seconds

Affinity and Variety between Words in the Framework of Hypernetwork (하이퍼네트워크에서 본 단어간 긴밀성과 다양성)

  • Kim, Joon-Shik;Park, Chan-Hoon;Lee, Eun-Seok;Zhang, Byoung-Tak
    • Journal of KIISE:Computer Systems and Theory
    • /
    • v.35 no.4
    • /
    • pp.166-171
    • /
    • 2008
  • We studied the variety and affinity between the successive words in the text document A number of groups were defined by the frequency of a following word in the whole text (corpus). In the previous studies, the Zipf's power law was explained by Chinese restaurant process and hub node was searched after by examining the edge number profile in scale free network. We have observed both a power law and a hub profile at the same time by studying the conditional frequency and degeneracy of a group. A symmetry between the affinity and the variety between words were found during the data analysis. And this phenomenon can be explained within a viewpoint of "exploitation and exploration." We also remark on a small symmetry breaking phenomenon in TIPSTER data.

A Study on the Behaviors of Complex System Revealed in the Sizes of Public Libraries in Korea (우리나라 공공도서관의 규모에 나타나는 복잡계 현상에 관한 연구)

  • Lee, Soo-Sang
    • Journal of Korean Library and Information Science Society
    • /
    • v.44 no.4
    • /
    • pp.399-419
    • /
    • 2013
  • This paper conducted the empirical analysis of the behaviors revealed in the eight size distributions of the public libraries in Korea. As a result, the behaviors of complex system appeared in all eight size factors. This means that the sizes of public libraries in Korea were highly polarized. Especially, the zipf's law were found in the size factors such as gross area, number of staffs, volume of books, total budget. And the highly uneven distributions were occurred in the size factors such as membership, number of users, number of borrowers, number of borrowed books. This research outcomes show that a new policy of public libraries is needed to resolve the polarization revealed in the sizes of public libraries in Korea.

Indian Buffet Process Inspired Component Analysis for fMRI Data (fMRI 데이터에 적용한 인디언 뷔페 프로세스 닮은 성분 분석법)

  • Kim, Joon-Shik;Kim, Eun-Sol;Lim, Byoung-Kwon;Lee, Chung-Yeon;Zhang, Byoung-Tak
    • Proceedings of the Korean Information Science Society Conference
    • /
    • 2011.06c
    • /
    • pp.191-194
    • /
    • 2011
  • 문서를 이루는 단어들의 빈도수가 지수법칙(power law)를 따른다는 지프의 법칩(Zipf's law)이 있다. 이러한 단어분포를 고려하여 문서의 토픽을 찾아내는 기계학습법이 디리쉴레 프로세스(Dirichlet process) 이다. 이를 발전시켜서 데이터의 잠재 요인(latent factor)들을 베이즈 확률모델에 기반한 샘플링 바탕으로 찾는 방법이 인디언 뷔페 과정(Indian buffet process) 이다. 우리는 25가지의 특징(feature)들에 대한 점수(rating)들이 볼드(blood oxygen dependent level) 신호와 함께 주어지는 PBAIC 2007 데이터에 주성분 분석법(principal component analysis)를 적용했다. PBAIC 2007 데이터는 비디오 게임을 수행하며 기능적뇌영상(functional magnetic resonance imaging, fMRI) 촬영을 하여 얻어진 공개데이터이다. 우리의 연구에서는 주성분 분석법을 이용하여 10개의 독립 성분(independent component)들을 찾았다. 그리고 1.75초 마다 촬영된 BOLD 신호와 10개의 고유벡터(eigenvector)들간의 내적을 취하여 가중치(weight)를 구하였다. 성분들의 가중치를 낮은 순서로 정렬함으로써 각 시간마다 주도적으로 영향을 미치는 성분들을 알아낼 수 있었다.

A Study of Themes and Trends in Research of Global Maritime Economics through Keyword Network Analysis (키워드 네트워크 분석을 통한 세계 해운경제의 연구 주제와 동향에 대한 연구)

  • Jhang, Se-Eun;Lee, Su-Ho
    • Journal of Korea Port Economic Association
    • /
    • v.32 no.1
    • /
    • pp.79-95
    • /
    • 2016
  • This study identifies themes and trends in maritime economics and logistics by examining 303 papers published in international journals from 2000 to 2014 using keyword network analysis. Network analysis can be used because the collected data follow Zipf's law and the power law. Utilizing the degree centrality and betweenness centrality, we find the important keywords in each five year period and determine the importance of shared keywords. To further explain keyword centralities, we invented a Delta-C algorithm to show the trends of keywords over time. We found that degree centrality is useful for identifying important research themes in each period because it is mainly concerned with the number of connections. On the other hands, betweenness centrality is useful to determine the unique themes that emerge in each of the specific periods.

Analysis of Keywords in national river occupancy permits by region using text mining and network theory (텍스트 마이닝과 네트워크 이론을 활용한 권역별 국가하천 점용허가 키워드 분석)

  • Seong Yun Jeong
    • Smart Media Journal
    • /
    • v.12 no.11
    • /
    • pp.185-197
    • /
    • 2023
  • This study was conducted using text mining and network theory to extract useful information for application for occupancy and performance of permit tasks contained in the permit contents from the permit register, which is used only for the simple purpose of recording occupancy permit information. Based on text mining, we analyzed and compared the frequency of vocabulary occurrence and topic modeling in five regions, including Seoul, Gyeonggi, Gyeongsang, Jeolla, Chungcheong, and Gangwon, as well as normalization processes such as stopword removal and morpheme analysis. By applying four types of centrality algorithms, including stage, proximity, mediation, and eigenvector, which are widely used in network theory, we looked at keywords that are in a central position or act as an intermediary in the network. Through a comprehensive analysis of vocabulary appearance frequency, topic modeling, and network centrality, it was found that the 'installation' keyword was the most influential in all regions. This is believed to be the result of the Ministry of Environment's permit management office issuing many permits for constructing facilities or installing structures. In addition, it was found that keywords related to road facilities, flood control facilities, underground facilities, power/communication facilities, sports/park facilities, etc. were at a central position or played a role as an intermediary in topic modeling and networks. Most of the keywords appeared to have a Zipf's law statistical distribution with low frequency of occurrence and low distribution ratio.

Research Trends in Global Cruise Industry Using Keyword Network Analysis (키워드 네트워크 분석을 활용한 세계 크루즈산업 연구동향)

  • Jhang, Se-Eun;Lee, Su-Ho
    • Journal of Navigation and Port Research
    • /
    • v.38 no.6
    • /
    • pp.607-614
    • /
    • 2014
  • This article aims to explore and discuss research trends in global cruise industry using keyword network analysis. We visualize keyword networks in each of four groups of 1982-1999, 2000-2004, 2005-2009, 2010-2014 based on the top 20 keyword nodes' degree centrality and betweenness centrality which are selected among four centrality measurements, comparing them with frequency order. The article shows that keyword frequency collected from 240 articles published in international journals is subject to Zipf's law and nodes degree distribution also exhibits power law. We try to find out research trends in global cruise industry to change some important keywords diachronically, visualizing several networks focusing on the top two keywords, cruise and tourism, belonging to all the four year groups, with high degree and betweenness centrality values. Interestingly enough, a new node, China, connecting the top most keywords, appears in the most recent period of 2010-2014 when China has emerged as one of the rapid development countries in global cruise industry. Therefore keyword network analysis used in this article will be useful to understand research trends in global cruise industry because of increase and decrease of numbers of network types in different year groups and the visual connection between important nodes in giant components.