• Title/Summary/Keyword: Library Big Data

Implementation of public data contents using Big data Visualization technology - Map visualization technique (빅 데이터 가시화 기술을 적용한 공공데이터 콘텐츠 구현 - Map가시화 기법)

  • Bak, Seon-Hui;Kim, Jong Ho;You, Hyun-Bea
    • Journal of Digital Contents Society / v.18 no.7 / pp.1427-1434 / 2017
  • With the acceleration of the Fourth Industrial Revolution, the volume of data around us has increased rapidly. It has therefore become more important to grasp the nature and meaning of data through analysis, and to apply that understanding flexibly to value judgments, than simply to collect data. As a result, visualization technology is attracting attention in many fields. By presenting data as graphs, charts, and similar forms, visualization makes analysis results easier to understand, so that users can make immediate judgments and quick decisions. Interest is particularly high in visualization of public data, which is highly useful to users. In this paper, from among the various software tools capable of visualization, we used R libraries and RStudio to visualize public data on the installation sites of bicycle storage facilities.

Methodological Problems in Information Retrieval Research (정보검색 연구의 방법론에 관한 고찰)

  • 이명희
    • Journal of the Korean BIBLIA Society for Library and Information Science / v.7 no.1 / pp.231-246 / 1994
  • A major problem for information retrieval research in the past three decades has been methodology, even though some progress has been made in obtaining useful results from methodologically sound experiments. Within a methodology, potential problems include artificial data generated by the researcher, small sample size, and the interpretation of findings. Critics have pointed out that there is room for improving the methodology of information retrieval research: using existing data, having a big enough sample size, including large numbers of search queries, introducing more control over variables, utilizing more appropriate performance measures, conducting tests carefully, and evaluating findings properly. Relevance judgments depend entirely on the perception of the user and on the situation of the moment. In an experiment, the best judge of relevance is a user with a well-defined information need. Normally, more than two categories for relevance judgments are desirable because there are degrees of relevance. In experimental design, careful control of variables is needed for internal validity. When no single database exists for comparison, existing operational databases should be used cautiously. Careful control for variations in search queries, inter-searcher consistency, intra-searcher consistency, and search strategies is necessary. Parametric statistics, which require rigid assumptions, are not appropriate in information retrieval research; non-parametric statistics, which require few assumptions, are necessary. In particular, the sign test and the Wilcoxon test are good alternatives.
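
The sign test mentioned in this abstract can be illustrated with a minimal sketch. The function name `sign_test` and the per-query scores below are hypothetical and made up for illustration; they are not taken from the paper. The sketch computes an exact two-sided p-value from the binomial distribution and drops tied pairs, which is one common convention.

```python
from math import comb

def sign_test(a, b):
    """Exact two-sided sign test for paired scores a[i], b[i]."""
    # Keep only the sign of each paired difference; tied pairs are dropped.
    diffs = [x - y for x, y in zip(a, b) if x != y]
    n = len(diffs)
    pos = sum(d > 0 for d in diffs)
    neg = n - pos
    k = min(pos, neg)
    # Under H0 each sign is + or - with probability 0.5 (Binomial(n, 0.5)).
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return pos, neg, min(1.0, 2 * tail)

# Hypothetical per-query effectiveness scores for two retrieval systems.
system_a = [0.42, 0.55, 0.31, 0.60, 0.48, 0.52, 0.39, 0.70]
system_b = [0.35, 0.50, 0.33, 0.41, 0.40, 0.45, 0.30, 0.58]

pos, neg, p_value = sign_test(system_a, system_b)
print(pos, neg, p_value)
```

Because only the direction of each difference is used, the test makes no assumption about the distribution of the scores, which is the property the abstract argues for.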

The Effects of the Bestseller Ranks on Public Library Circulation: Based on Panel Data Analysis (베스트셀러 순위가 공공도서관 대출에 미치는 영향 분석: 패널자료 분석을 중심으로)

  • Lee, Jongwook;Kang, Woojin;Park, Jungkyu
    • Journal of the Korean Society for Information Management / v.38 no.4 / pp.1-23 / 2021
  • The purpose of this study is to analyze the effects of bestseller ranks on book circulation in public libraries. To this end, weekly data sets on library circulation and bestseller lists for 179 books, covering January 1, 2018 to December 29, 2019, were constructed from data collected from BigData MarketC and YES24. Three methods for analyzing panel data, namely linear regression, fixed-effect, and random-effect models, were compared, and the fixed-effect model turned out to be better than the other methods. A visual inspection showed that the average ranks of bestsellers were associated with their public library circulation. The fixed-effect analysis also showed that a one-rank decline on the bestseller list decreases a book's average circulation by 0.108, while the size of the effect varied with the subject of the book. The study empirically demonstrated the impact of a bestseller list on people's book circulation behavior, suggesting that public libraries need to consider the sociocultural context as well as bestseller lists to predict library user needs and to formulate collection development policy.
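
As a rough illustration of what a fixed-effect panel model estimates, the following sketch implements the within (demeaning) estimator for a single regressor. It is not the authors' code: the function name `within_estimator` and the toy data are assumptions, and the toy data are constructed so that every one-step rank decline lowers circulation by exactly 0.5 regardless of each book's baseline.

```python
from collections import defaultdict

def within_estimator(rows):
    """rows: iterable of (entity, x, y). Returns the fixed-effect slope of y on x."""
    groups = defaultdict(list)
    for entity, x, y in rows:
        groups[entity].append((x, y))
    sxy = sxx = 0.0
    for obs in groups.values():
        # Subtracting each entity's own means removes its fixed effect.
        mean_x = sum(x for x, _ in obs) / len(obs)
        mean_y = sum(y for _, y in obs) / len(obs)
        for x, y in obs:
            sxy += (x - mean_x) * (y - mean_y)
            sxx += (x - mean_x) ** 2
    return sxy / sxx

# Hypothetical weekly panel: (book, bestseller rank, circulation).
panel = [
    ("book_a", 1, 19.5), ("book_a", 2, 19.0), ("book_a", 3, 18.5),
    ("book_b", 4, 8.0), ("book_b", 6, 7.0), ("book_b", 8, 6.0),
]

beta = within_estimator(panel)
print(beta)
```

A pooled regression on the same toy data would mix the within-book effect with the difference in baselines between books, which is the kind of bias that the fixed-effect model is meant to remove.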

A Study on Status and Necessity of the Curriculum for the Department of Libraries and Information Sciences in Korea (문헌정보학 교과과정에 대한 현황조사 및 인식조사 연구)

  • Hong, Hyun-Jin;Noh, Younghee;Kim, Dongseok
    • Journal of the Korean Society for Library and Information Science / v.55 no.1 / pp.5-36 / 2021
  • This study attempted to present a direction for developing the Library & Information Science curriculum by investigating and analyzing the current status of the Library & Information Science curriculum in Korea and perceptions of the necessity of each major subject. To this end, the curricula of Departments of Library and Information Science nationwide were thoroughly investigated. Based on the subjects identified, a questionnaire survey was conducted among all professors of Departments of Library and Information Science on their degree of agreement regarding required and elective subjects. As a result, first, the total number of courses offered in Departments of Library and Information Science has recently decreased. It was confirmed that the proportion of required and basic subjects decreased, while the proportion of elective subjects increased. Second, it was found that the importance and weight of informatics are constantly increasing, and there is a high demand for new subjects such as big data, programming, and data analysis. Third, the proportion of library management among all subjects is decreasing, but the necessity of its detailed subjects is highly recognized. Fourth, it was confirmed that the proportion of bibliography was gradually decreasing. Fifth, although records management was not a required major subject, its weight increased as an elective subject, while the necessity of language subjects was barely recognized.

Research on Overseas Trends and Emerging Topics in Field of Library and Information Science (문헌정보학분야 해외 연구 동향 및 유망 주제 분석 연구)

  • Bon Jin Koo;Durk Hyun Chang
    • Journal of the Korean Society for Library and Information Science / v.57 no.3 / pp.71-96 / 2023
  • This study aimed to investigate key research areas in the field of Library and Information Science (LIS) by analyzing trends and identifying emerging topics. For this purpose, 40,897 author keywords were gathered from 11,252 papers published over the past 30 years (1993-2022) in five journals. Keyword analysis, Principal Component Analysis (PCA), and correlation analysis were then conducted, using variables such as the number of articles, number of authors, ratio of co-authored papers, and citation counts. The findings suggest that two topics are likely to develop into promising research areas in LIS: machine learning/algorithm and research impact. Furthermore, future research is expected to focus on topics such as social media and big data, natural language processing, research trends, and research assessment, which are likely to emerge as prominent areas of study.

Text Document Classification Scheme using TF-IDF and Naïve Bayes Classifier (TF-IDF와 Naïve Bayes 분류기를 활용한 문서 분류 기법)

  • Yoo, Jong-Yeol;Hyun, Sang-Hyun;Yang, Dong-Min
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2015.10a / pp.242-245 / 2015
  • Recently, with the large-scale spread of data in the digital economy, the era of big data has arrived. In this environment, the leakage of unstructured text data, such as technical documents, confidential documents, and documents containing false information, is causing serious problems. To prevent this, there is a growing need for techniques that sort and process documents consisting of unstructured text data. In this paper, we propose a novel text classification scheme that learns from training data sets and classifies unstructured text data into two categories, True and False. For the performance evaluation, we implement the proposed scheme using the Naïve Bayes document classifier and TF-IDF modules of a Python library, and compare it with an existing document classifier.
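
The combination of TF-IDF weighting and a Naïve Bayes classifier described in this abstract can be sketched with the Python standard library alone. This is not the authors' implementation: the paper does not name the library it used, and all function names and the four training documents below are hypothetical. The sketch assumes a multinomial Naïve Bayes model that treats TF-IDF weights as fractional term counts, with add-one (Laplace) smoothing and a smoothed IDF.

```python
import math
from collections import Counter, defaultdict

def fit_idf(docs):
    """Smoothed inverse document frequency for each term in tokenized docs."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}

def vectorize(doc, idf):
    """TF-IDF weights for one tokenized doc; terms unseen in training are skipped."""
    counts = Counter(t for t in doc if t in idf)
    return {t: (c / len(doc)) * idf[t] for t, c in counts.items()}

def train_nb(vectors, labels, vocab):
    """Multinomial Naive Bayes with TF-IDF weights used as fractional counts."""
    classes = sorted(set(labels))
    prior = {c: math.log(labels.count(c) / len(labels)) for c in classes}
    totals = {c: defaultdict(float) for c in classes}
    for vec, label in zip(vectors, labels):
        for term, weight in vec.items():
            totals[label][term] += weight
    loglik = {}
    for c in classes:
        denom = sum(totals[c].values()) + len(vocab)  # add-one smoothing
        loglik[c] = {t: math.log((totals[c][t] + 1) / denom) for t in vocab}
    return prior, loglik

def predict(vec, model):
    prior, loglik = model
    def score(c):
        return prior[c] + sum(w * loglik[c][t] for t, w in vec.items())
    return max(prior, key=score)

# Hypothetical training set with the two categories named in the abstract.
train = [
    ("official report confirms budget figures", "True"),
    ("agency publishes verified statistics report", "True"),
    ("shocking secret miracle cure revealed", "False"),
    ("miracle rumor claims secret plot", "False"),
]
docs = [text.split() for text, _ in train]
labels = [label for _, label in train]
idf = fit_idf(docs)
model = train_nb([vectorize(d, idf) for d in docs], labels, set(idf))

pred_a = predict(vectorize("verified report figures".split(), idf), model)
pred_b = predict(vectorize("secret miracle rumor".split(), idf), model)
print(pred_a, pred_b)
```

Whitespace tokenization keeps the sketch short; a real pipeline would also need language-appropriate tokenization and a held-out test set for the comparison the abstract mentions.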

Data Science Degree and Curriculum in Korea and its Implications for the Information Field (국내 데이터사이언스 학위 및 교과 운영 현황과 문헌정보학과로의 함의)

  • Park, Hyoungjoo;Lee, Heejin
    • Journal of Korean Library and Information Science Society / v.53 no.3 / pp.431-454 / 2022
  • This study examined data science degree programs and courses offered by universities, and those offered by Library and Information Science (LIS) degree programs, to understand their implications for LIS programs in Korea. The status of data science degrees at 439 schools was assessed using the list released by the Korea Educational Development Institute in 2022. Specifically, this study analyzed universities, colleges, majors, sub-majors, interdisciplinary majors, convergence majors, micro-degrees, nanodegrees, tracks, modules, and industry-university cooperative programs within the data science field. It examined 1,148 courses offered by data science degree programs and 1,325 courses offered by LIS degree programs. Data science degrees in Korea offer introductory, technical, practical, applied, and in-depth courses related to data science. Although LIS programs in Korea do not always offer data science courses, those that do cover topics such as introduction to data science, databases, data visualization, data curation, metadata, big data, and information technology. The researchers hope the findings of this study will be useful as a starting point for the development and revision of the LIS curriculum on data science in Korea.

A Case Study on HathiTrust as a Sustainable Cooperative Model of Digital Repositories (디지털 리포지터리의 지속가능한 협력 모델로서 하티트러스트 사례 연구)

  • Lee, You-Kyoung;Sung, Yunah;Jung, Young-Mi
    • Journal of Korean Library and Information Science Society / v.47 no.4 / pp.443-464 / 2016
  • A great number of institutions around the world have been building digital repositories to communicate scholarly information. Meanwhile, digital repositories have been struggling with how to preserve the increasing volume of digital content for the long term and how to build a sustainable information environment. The HathiTrust partnership was established to meet the need for a sustainable collaborative model of digital repositories in research libraries, mainly in North America, and has expanded globally by signing with other libraries around the world. This paper deals with the establishment, operation and policy, construction status, and user services of the HathiTrust. The results presented in this paper include the benefits and potential opportunities of the HathiTrust for a participating member. Partnership in HathiTrust would allow each member institution to achieve more cost-effective operations, shared management and long-term preservation of digital content, easier copyright management, and increased accessibility. In the future, it is expected to provide shared storage of printed materials and to facilitate a big data research center.

Twitter Crawling System

  • Ganiev, Saydiolim;Nasridinov, Aziz;Byun, Jeong-Yong
    • Journal of Multimedia Information System / v.2 no.3 / pp.287-294 / 2015
  • We live in an information age in which the Internet touches all aspects of our lives. It provides many services, each of which benefits people in different ways. Electronic mail (E-mail), File Transfer Protocol (FTP), voice/video communication, and search engines are prominent examples of Internet services. Among them, Social Network Services (SNS) have continuously gained popularity over the past years. The most popular SNSs, such as Facebook, Weibo, and Twitter, generate millions of data items every minute. Twitter is an SNS that allows its users to post short instant messages. Its 100 million users posted 340 million tweets per day (2012)[1]. Such large amounts of data often contain much noisy data, that is, uninteresting and unclassifiable data. However, researchers can take advantage of this huge volume of information to analyze it and extract meaningful and interesting features. SNS data, including tweets, are collected by crawlers, and Twitter crawlers have recently emerged as an effective tool for collecting Twitter data. In this project, we develop a Twitter Crawler system that enables us to extract Twitter data. We implemented the system in Java with MySQL, using Twitter4J, a Java library for communicating with the Twitter API. The application first connects to the Twitter API, then retrieves tweets, and stores them in a database. We also develop crawling strategies to extract tweets efficiently in terms of time and amount.

Information-providing Application Based on Web Crawling (웹 크롤링을 통한 개인 맞춤형 정보제공 애플리케이션)

  • Ju-Hyeon Kim;Jeong-Eun Choi;U-Gyeong Shin;Min-Jun Piao;Tae-Kook Kim
    • Journal of Internet of Things and Convergence / v.10 no.1 / pp.21-27 / 2024
  • This paper presents the implementation of a personalized real-time information-providing application that uses filtering and web crawling technologies. The application uses the Jsoup library to crawl web pages for keywords set by the user. The crawled data is stored in a MySQL database and presented to the user through an application implemented with Flutter. In addition, mobile push notifications are provided using Firebase Cloud Messaging (FCM). Through these methods, users can obtain the desired information quickly and efficiently. The approach is also expected to be applicable to the Internet of Things (IoT), where big data is generated, allowing users to receive only the information they need.
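
The crawl, filter, and store steps described in this abstract can be sketched as follows. The paper uses Jsoup (Java) and MySQL; this sketch substitutes Python's standard `html.parser` and an in-memory `sqlite3` database so that it is self-contained, and it parses a static HTML string instead of fetching a live page. The class `LinkTextParser`, the function `crawl_and_store`, the table layout, and the sample page are all hypothetical.

```python
import sqlite3
from html.parser import HTMLParser

class LinkTextParser(HTMLParser):
    """Collect (href, text) pairs for every <a> element that has an href."""
    def __init__(self):
        super().__init__()
        self.links, self._href, self._text = [], None, []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href, self._text = dict(attrs).get("href"), []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None

def crawl_and_store(html, keywords, conn):
    """Keep links whose text contains any keyword and store them in the database."""
    parser = LinkTextParser()
    parser.feed(html)
    hits = [(href, text) for href, text in parser.links
            if any(k.lower() in text.lower() for k in keywords)]
    conn.execute("CREATE TABLE IF NOT EXISTS items (url TEXT PRIMARY KEY, title TEXT)")
    # INSERT OR IGNORE avoids duplicate rows when the same page is crawled again.
    conn.executemany("INSERT OR IGNORE INTO items VALUES (?, ?)", hits)
    conn.commit()
    return hits

# Hypothetical notice-board page and user-set keywords.
page = """<ul>
<li><a href="/n/1">Scholarship application opens</a></li>
<li><a href="/n/2">Cafeteria menu</a></li>
<li><a href="/n/3">Library big data workshop</a></li>
</ul>"""
conn = sqlite3.connect(":memory:")
hits = crawl_and_store(page, ["scholarship", "big data"], conn)
crawl_and_store(page, ["scholarship", "big data"], conn)  # re-crawl adds nothing new
rows = conn.execute("SELECT url, title FROM items ORDER BY url").fetchall()
print(rows)
```

Using the URL as the primary key means repeated crawls only add unseen items, which is one simple way to decide when a push notification would be warranted.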