• Title/Summary/Keyword: Zipf's Law

A Study of Zipfian Phenomena in Hangul Literature (한글 문헌에 있어서 Zipfian 현상에 관한 연구)

  • 신강현;이두영
    • Journal of the Korean Society for Information Management
    • /
    • v.5 no.2
    • /
    • pp.53-98
    • /
    • 1988
  • The purpose of this study is to investigate the Zipfian distribution in Hangul literature. The results show that the formulas derived from the Hangul literature are in accordance with the generalized Zipf's first law. The results also show that the formulas derived from the Hangul literature are not in accordance with Zipf's second law or the generalized Zipf's second law.
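
For orientation, Zipf's first law is usually written as a rank-frequency relation. The notation below is an assumption made for illustration ($r$ for rank, $f(r)$ for the frequency of the word at rank $r$, and $C$, $\alpha$, $\beta$ for fitted constants); the abstract does not say which generalized form the authors fitted.

```latex
% Zipf's first law: rank times frequency is roughly constant
r \cdot f(r) = C

% A common generalization: a free exponent \alpha
f(r) = \frac{C}{r^{\alpha}}

% Zipf-Mandelbrot form: an additional rank offset \beta
f(r) = \frac{C}{(r + \beta)^{\alpha}}
```

Setting $\alpha = 1$ and $\beta = 0$ recovers the original law.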

Affinity and Variety between Words in the Framework of Hypernetwork (하이퍼네트워크에서 본 단어간 긴밀성과 다양성)

  • Kim, Joon-Shik;Park, Chan-Hoon;Lee, Eun-Seok;Zhang, Byoung-Tak
    • Journal of KIISE:Computer Systems and Theory
    • /
    • v.35 no.4
    • /
    • pp.166-171
    • /
    • 2008
  • We studied the variety and affinity between successive words in text documents. A number of groups were defined by the frequency of a following word in the whole text (corpus). In previous studies, Zipf's power law was explained by the Chinese restaurant process, and hub nodes were sought by examining the edge number profile in scale-free networks. We observed both a power law and a hub profile at the same time by studying the conditional frequency and degeneracy of a group. A symmetry between the affinity and the variety between words was found during the data analysis, and this phenomenon can be explained from the viewpoint of "exploitation and exploration." We also remark on a small symmetry-breaking phenomenon in the TIPSTER data.

On Regularity of Daily Distribution of Queries in Search Engine (검색엔진에서 일간질의 어분포의 정상성에 관한 연구)

  • Park, Sang-Gue;Lee, Chan-Kyu;Yoon, Kyung-Hyun;Kim, Seong-Hee;Lee, Jun-Ho
    • Journal of the Korean Society for Information Management
    • /
    • v.24 no.4
    • /
    • pp.255-265
    • /
    • 2007
  • In this paper, we analyze the regularity of daily patterns in the distribution of queries submitted to an internet search engine. We propose the Pareto distribution and Zipf's law for characterizing the query distribution and apply them to daily queries on the search engine over two weeks. We find some evidence that the Pareto and Zipf laws can be used to evaluate the regularity of daily query distribution patterns. These results can help in understanding social interests and trends through query distribution patterns.
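
One way to check a day's queries against Zipf's law is to fit the exponent of the rank-frequency curve on a log-log scale. The sketch below is a minimal illustration, not the authors' procedure: the function name `zipf_exponent` and the toy query log are hypothetical, and ordinary least squares on log-log data is only one of several possible estimators.

```python
import math
from collections import Counter


def zipf_exponent(counts):
    """Estimate s in f(r) ~ C / r**s by least squares on log(rank) vs log(frequency)."""
    # Sort nonzero frequencies from most to least common; rank 1 is the top query.
    freqs = sorted((c for c in counts if c > 0), reverse=True)
    if len(freqs) < 2:
        raise ValueError("need at least two nonzero counts")
    xs = [math.log(r) for r in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # The fitted slope is -s, so negate it.
    return -(sxy / sxx)


# Hypothetical one-day query log whose frequencies follow 60 / rank exactly.
day_queries = ["weather"] * 60 + ["news"] * 30 + ["maps"] * 20 + ["lottery"] * 15
s = zipf_exponent(Counter(day_queries).values())
```

Running the same fit on each day of a two-week log and comparing the estimated exponents would give a rough picture of how stable the daily distribution is.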

Operation Policy for Enhancing Availability of a Web Server against DoS Attacks (서비스 거부 공격에 대응한 웹서버 가용성 향상을 위한 운용 정책 방안)

  • Baik, Nam-Kyun;Jung, Sou-Hwan
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.33 no.8B
    • /
    • pp.735-744
    • /
    • 2008
  • This paper proposes a 'secure node' that is robust against network-based DoS attacks. The secure node selectively accepts new sessions based on Zipf's law while a link is in the overload state. The scheme calculates a threshold value for the overload state and provides a dynamic service mechanism that enhances the availability of a web server. Simulation results show that the proposed scheme improves completion/connection ratios.

A Study on the Behaviors of Complex System Revealed in the Sizes of Public Libraries in Korea (우리나라 공공도서관의 규모에 나타나는 복잡계 현상에 관한 연구)

  • Lee, Soo-Sang
    • Journal of Korean Library and Information Science Society
    • /
    • v.44 no.4
    • /
    • pp.399-419
    • /
    • 2013
  • This paper empirically analyzes the behavior of eight size distributions of public libraries in Korea. Complex-system behavior appeared in all eight size factors, which means that the sizes of public libraries in Korea are highly polarized. In particular, Zipf's law was found in size factors such as gross area, number of staff, number of volumes, and total budget, while highly uneven distributions occurred in size factors such as membership, number of users, number of borrowers, and number of borrowed books. These results suggest that a new public library policy is needed to resolve the polarization in the sizes of public libraries in Korea.

Building Hybrid Stop-Words Technique with Normalization for Pre-Processing Arabic Text

  • Atwan, Jaffar
    • International Journal of Computer Science & Network Security
    • /
    • v.22 no.7
    • /
    • pp.65-74
    • /
    • 2022
  • In natural language processing, commonly used words such as prepositions are referred to as stop-words; they have no inherent meaning and are therefore ignored in indexing and retrieval tasks. Removing stop-words from Arabic text significantly reduces the size of a corpus, which improves the effectiveness and performance of Arabic-language processing systems. This study investigated the effectiveness of applying stop-word list elimination with normalization as a preprocessing step. The idea was to merge a statistical method with a linguistic method to attain the best efficacy, and to compare the effects of this two-pronged approach in reducing corpus size for Arabic natural language processing systems. Three stop-word lists were considered: an Arabic Text Lookup Stop-list, a Frequency-based Stop-list using Zipf's law, and a Combined Stop-list. An experiment was conducted using a selected file from the Arabic Newswire data set. In the experiment, the size of the corpus was compared after removing the words contained in each list. The results showed that the best reduction in size was achieved by using the Combined Stop-list with normalization, with a word count reduction of 452930 and a compression rate of 30%.
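
The idea of combining a frequency-based stop-list with a lookup list can be sketched as follows. This is an illustrative assumption rather than the paper's implementation: the function names, the `top_k` cutoff, the toy English tokens, and the one-word lookup list are all hypothetical, and the normalization step is omitted.

```python
from collections import Counter


def frequency_stoplist(tokens, top_k):
    """Treat the top_k most frequent tokens (the head of the Zipf curve) as stop-words."""
    return {word for word, _ in Counter(tokens).most_common(top_k)}


def remove_stopwords(tokens, stoplist):
    """Drop every token that appears in the stop-list, keeping the original order."""
    return [t for t in tokens if t not in stoplist]


# Toy corpus (English used only for readability; the paper works on Arabic text).
tokens = "the cat sat on the mat and the dog sat on the rug".split()

lookup_list = {"and"}                           # hypothetical hand-curated list
freq_list = frequency_stoplist(tokens, top_k=3)  # statistical list
combined = lookup_list | freq_list               # union of both lists

filtered = remove_stopwords(tokens, combined)
reduction = len(tokens) - len(filtered)          # word-count reduction
```

In this toy example the combined list removes more tokens than either list alone, which mirrors the abstract's comparison of corpus size after applying each list.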

Method for Designing Adaptive UI Based on User's Context in the Environment Including Mobile Device and Public Display Device (모바일 장치와 공용 디스플레이 장치를 포함하는 환경에서 사용자의 특성에 기반한 Adaptive UI 설계 방안)

  • Kang, Seung-Soo;Ko, Hyun;Youn, Hee Yong
    • Journal of Information Technology Services
    • /
    • v.11 no.4
    • /
    • pp.181-194
    • /
    • 2012
  • One of the most significant recent changes in ubiquitous computing environments is the omnipresence of public digital displays that provide information. An important issue for public digital displays is to provide public information as well as information adapted to each user. This research proposes a scheme that ensures fast response by selecting the user's preferred functions. The preferred functions are selected statistically using the Zipf distribution, in a system comprising mobile devices and digital displays based on NFC (Near Field Communication). The scheme is validated with the CPM-GOMS model, which indicates that user response can be improved.

Values and Future Research Issues In Bibliometrics (도서관/정보학적 측에서 본 계량서지학의 가치와 중요성 및 연구방향 제시)

  • Jeong Dong Youl
    • Journal of the Korean Society for Library and Information Science
    • /
    • v.19
    • /
    • pp.243-261
    • /
    • 1990
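  • Twenty years after bibliometrics was first applied to library and information science, this paper reviews its contributions to theory and practice, at a time when advances in computers and other information technologies have made the value and importance of bibliometrics more widely recognized. The study aims to clarify the concept of bibliometrics and analyze its characteristics, identifying its various theoretical foundations, strengths, and weaknesses in order to provide a basis for setting future research directions. By analyzing recent research trends in methods for analyzing the structure of literature, such as cluster analysis, co-citation analysis, citation context analysis, and multidimensional scaling, it presents practical applications of bibliometrics to library practice and information systems. In addition, by examining the developmental stages, interrelationships, and application areas of the three major laws of bibliometrics (Lotka's law, Bradford's law, and Zipf's law), it sets out research directions for overall library management and theoretical information science.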

Genetic Algorithm for Lewdness Web Site Detection (유전 알고리즘을 이용한 음란사이트 식별)

  • 한수경
    • Proceedings of the Korean Information Science Society Conference
    • /
    • 2004.10a
    • /
    • pp.211-213
    • /
    • 2004
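  • Today the internet is a daily necessity that, alongside food, clothing, and shelter, provides a wide range of information useful for life. If food, clothing, and shelter sustain people's physical health, the internet supports the quality of their mental life. However, pornographic sites are open without screening to adolescents who are not yet mentally mature, and such adolescents can easily be exposed to them. This paper studies the problem of assigning weights to words with a genetic algorithm in order to correctly determine whether a web document is pornographic or not. In experiments, the weights assigned in this way distinguished pornographic from non-pornographic documents with an average recognition rate of 93.84%. The words to which weights are assigned for this judgment were selected based on Zipf's law.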
