• Title/Summary/Keyword: Title Classification


A Study on Applicability of Machine Learning for Book Classification of Public Libraries: Focusing on Social Science and Arts (공공도서관 도서 분류를 위한 머신러닝 적용 가능성 연구 - 사회과학과 예술분야를 중심으로 -)

  • Kwak, Chul Wan
    • Journal of the Korean BIBLIA Society for library and Information Science
    • /
    • v.32 no.1
    • /
    • pp.133-150
    • /
    • 2021
  • The purpose of this study is to assess the applicability of machine learning to the classification of books in public libraries, using titles as input. Data analysis was performed with Python's scikit-learn library in a Jupyter Notebook on the Anaconda platform. The KoNLPy analyzer and its Okt class were used for Hangul morpheme analysis. The units of analysis were 2,000 title fields and KDC classification class numbers (300 and 600) extracted from the KORMARC records of public libraries. Analysis of the data with six machine learning models showed that machine learning can be applied to book classification. Among the models used, the neural network model had the highest title classification accuracy. The study suggests the need to improve the accuracy of title classification and the need for research on book titles, title tokenization, and stop words.

Automatic Title Detection by Spatial Feature and Projection Profile for Document Images (공간 정보와 투영 프로파일을 이용한 문서 영상에서의 타이틀 영역 추출)

  • Park, Hyo-Jin;Kim, Bo-Ram;Kim, Wook-Hyun
    • Journal of the Institute of Convergence Signal Processing
    • /
    • v.11 no.3
    • /
    • pp.209-214
    • /
    • 2010
  • This paper proposes an algorithm for segmentation and title detection in document images. The automated title detection method consists of two phases: segmentation and title area detection. In the first phase, the document image is extracted and segmented; the binary map is segmented by a combination of morphological operations and CCA (connected component algorithm). This phase yields the segmented regions that serve as title candidates in the second phase. Candidate title areas are detected using geometric information, and the title region is then extracted by removing non-title regions. After a classification step that removes non-text regions, projection is performed to detect the title region. Because the title usually uses the largest font in a document, horizontal projection is performed within the text areas. In summary, the paper proposes a method of segmentation and title detection for various forms of document images using geometric features and projection profile analysis. The proposed system is expected to have various applications, such as document title recognition, multimedia data searching, and real-time image processing.
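The horizontal projection step described in this abstract can be illustrated with a minimal sketch. It assumes a binary image given as rows of 0/1 pixels and treats the tallest run of non-empty rows as the title candidate (the largest-font heuristic); the function names and the toy image are hypothetical and not taken from the paper, and the morphological and connected-component stages are omitted.

```python
from typing import List, Optional, Tuple


def horizontal_projection(binary: List[List[int]]) -> List[int]:
    """Count foreground (1) pixels in each row of a binary image."""
    return [sum(row) for row in binary]


def text_lines(profile: List[int]) -> List[Tuple[int, int]]:
    """Return (start_row, end_row_exclusive) for each run of non-empty rows."""
    lines = []
    start = None
    for y, count in enumerate(profile):
        if count > 0 and start is None:
            start = y                      # a text line begins
        elif count == 0 and start is not None:
            lines.append((start, y))       # a text line ends
            start = None
    if start is not None:                  # line touching the bottom edge
        lines.append((start, len(profile)))
    return lines


def tallest_line(binary: List[List[int]]) -> Optional[Tuple[int, int]]:
    """Pick the tallest text line as the title candidate (assumed heuristic)."""
    lines = text_lines(horizontal_projection(binary))
    if not lines:
        return None
    return max(lines, key=lambda span: span[1] - span[0])


# Toy page: one 3-row "title" line followed by two 1-row "body" lines.
page = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 1, 0],
    [0, 1, 1, 1, 1, 0],
    [0, 1, 1, 1, 1, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 1, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 1, 0, 1, 1, 0],
    [0, 0, 0, 0, 0, 0],
]
title_rows = tallest_line(page)
```

On a real scan, the profile would be computed only inside regions already classified as text, as the abstract describes.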

A Study on the Design of a Knowledge Base for Automatic Book Classification (도서분류자동화를 위한 지식베이스의 설계에 관한 연구)

  • 이경호
    • Journal of Korean Library and Information Science Society
    • /
    • v.18
    • /
    • pp.139-192
    • /
    • 1991
  • Though the computer has become deeply entrenched as the major tool in information processing (library work), automatic book classification techniques are still experimental and have not yet been tested against the criterion of usefulness. The purpose of this study is to design a knowledge base for automatic book classification that can be put to use in library operation, and to present a methodology for applying automatic classification in the library. Since the existing enumerative classification schemes are manual systems and cannot be applied to automatic classification, the principle of faceted classification based on concept analysis is introduced and studied. The results of this study are summarized as follows: 1. The design of the knowledge base was confined to the fields of agriculture and medicine. 2. When a title is entered from the computer keyboard, it is searched in the knowledge base and then classified by the principle of automatic classification. 3. Program flowcharts were designed as the basis of the classification procedures for automatic subject recognition and classification. 4. 283 books in agriculture and 196 books in medicine were drawn at random from Taegu University Library and Young-Nal Medical Center Library, respectively. 5. The automatic classification experiment was performed on 143 books in agriculture and 166 books in medicine, excluding books on other subjects. 6. It was shown that automatic book classification is possible through the design of a knowledge base. In addition, the expected benefits of designing a knowledge base for automatic book classification are as follows: 1. Prompt and accurate classification is possible. 2. Whichever library classifies a given title, the program assigns it the same classification number. 3. Users can retrieve, through the computer, the classification codes of the books they want to search for. 4. Since the concept coordination method is employed, representing a multi-subject concept is made simple. 5. By performing automatic book classification, automation of the total system can be achieved. 6. Efficient international information transfer will be advanced, since all institutions maintain unified classification numbers.
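The title-lookup procedure summarized in this abstract can be sketched roughly as follows. This is an illustrative assumption only: the term list, the facet codes `AGR` and `MED`, and the `+` notation for coordinating concepts are hypothetical and do not come from the study's knowledge base.

```python
from typing import Dict, Optional

# Hypothetical miniature knowledge base: concept term -> facet code.
KNOWLEDGE_BASE: Dict[str, str] = {
    "rice": "AGR",
    "soil": "AGR",
    "crop": "AGR",
    "surgery": "MED",
    "nursing": "MED",
    "disease": "MED",
}


def classify_title(title: str, kb: Dict[str, str] = KNOWLEDGE_BASE) -> Optional[str]:
    """Look up each title word in the knowledge base and coordinate the codes.

    Returns None when no word of the title is found in the knowledge base.
    """
    codes = []
    for word in title.lower().split():
        code = kb.get(word.strip(".,:;()"))
        if code and code not in codes:   # keep each concept once, in title order
            codes.append(code)
    return "+".join(codes) if codes else None


single = classify_title("Soil Management for Rice")
multi = classify_title("Crop Disease Control")
unknown = classify_title("Modern Poetry")
```

Because the lookup is deterministic, the same title receives the same code wherever it is classified, which is the consistency benefit the abstract lists.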


Organization and use of theses collections in university libraries (학위논문의 정리와 이용)

  • 최달현;변우열
    • Journal of Korean Library and Information Science Society
    • /
    • v.12
    • /
    • pp.161-198
    • /
    • 1985
  • This paper is a study of the organization and use of theses collections in university libraries of Korea. A questionnaire consisting of 31 questions on 6 items was sent to 44 university libraries, of which 40 responded. The results of the study can be summarized as follows: 1. Figures concerning registration of theses can be tabulated as follows. 2. In differentiating oriental and occidental theses, 20 libraries (50%) rely on the text language. 3. Thirty-four libraries (85%) classify the theses, and 27 (80%) of them use the same tables as the book classification schedules. As for classification level, 17 libraries (48.6%) classify them by section numbers, whereas 13 (37.1%) classify them by sub-sections. 4. Catalog or index cards for theses are made in 35 libraries (87.5%), of which 20 use the second level of bibliographic description. 5. Roman alphabets in a title are transcribed as such in 27 libraries (67.5%). 6. Most respondents prepare author, title, and classified catalog cards for users. The research reveals that only 8 libraries assign subject headings to the theses. 7. Twenty-three libraries (63.9%) keep theses catalogs separate from their book catalogs. 8. The bibliographic elements in an entry reported as most helpful to users are author, title, date, and notes. In general, theses collections differ from book materials in many respects. Therefore it is desirable to process the former differently from the latter. Firstly, it would be more convenient to register theses in a register separate from the book register. Secondly, minute classification of theses would be necessary for their users. Thirdly, text language is the common basis for discriminating oriental materials from occidental ones. Fourthly, a simple catalog would be sufficient for using a theses collection, since the most helpful elements in an entry are limited to author, title, date, and notes. Fifthly, it is strongly recommended to transcribe all roman alphabets in titles into Korean alphabets. Sixthly, the research revealed that libraries need to develop subject heading work, which is far behind other library work.


Academic Conference Categorization According to Subjects Using Topical Information Extraction from Conference Websites (학회 웹사이트의 토픽 정보추출을 이용한 주제에 따른 학회 자동분류 기법)

  • Lee, Sue Kyoung;Kim, Kwanho
    • The Journal of Society for e-Business Studies
    • /
    • v.22 no.2
    • /
    • pp.61-77
    • /
    • 2017
  • Recently, the amount of academic conference information on the Internet has rapidly increased. Automatic classification of this information according to research subjects enables researchers to find related academic conferences efficiently. Information provided by most conference listing services is limited to title, date, location, and website URL. Among these features, however, the only one containing topical words is the title, which causes an information insufficiency problem. Therefore, we propose methods that aim to resolve this problem by utilizing web contents. Specifically, the proposed methods extract the main contents from an HTML document collected by using a website URL. Based on the similarity between the title of a conference and its main contents, topical keywords are selected to reinforce the important keywords among the main contents. Experiments conducted on a real-world dataset showed that the use of additional information extracted from the conference websites is successful in improving conference classification performance. We plan to further improve the accuracy of conference classification by considering the structure of websites.
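One way to picture the title-to-content similarity step is the sketch below. It is an assumption-laden illustration, not the authors' method: it uses bag-of-words cosine similarity, treats each sentence of the main contents as a unit, and keeps the words of the most title-like sentences as topical keywords. The function names, the `top_sentences` parameter, and the sample text are hypothetical; stop-word removal and HTML extraction are omitted.

```python
import math
import re
from collections import Counter
from typing import List


def tokenize(text: str) -> List[str]:
    """Lowercase and split into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def topical_keywords(title: str, sentences: List[str], top_sentences: int = 1) -> Counter:
    """Collect words from the sentences most similar to the conference title."""
    title_vec = Counter(tokenize(title))
    ranked = sorted(
        sentences,
        key=lambda s: cosine(title_vec, Counter(tokenize(s))),
        reverse=True,
    )
    keywords: Counter = Counter()
    for sentence in ranked[:top_sentences]:
        keywords.update(tokenize(sentence))
    return keywords


title = "International Conference on Machine Learning"
main_contents = [
    "Topics include machine learning and neural networks",
    "Registration fees are payable by credit card",
]
keywords = topical_keywords(title, main_contents)
```

Here the topical sentence contributes words such as "neural" that the title alone lacks, while the registration sentence is ignored.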

Automatic Subject Classification of Korean Journals

  • Choi, Seon-Heui;Kim, Byung-Kyu
    • International Journal of Contents
    • /
    • v.10 no.1
    • /
    • pp.43-46
    • /
    • 2014
  • Subject classification of journals is important because it can be utilized for the improvement of scholarly information services and for analysis by research area. Classification by experts in a subject area requires a great deal of time and expense. On the other hand, simple classification with basic information, such as the journal title, has limitations. To solve this problem, this paper suggests the automatic classification of Korean journals using information on the SCI journals cited by Korean journals, and presents an analysis of the classification result. In particular, this study adopted the WoS subject categories for classification to provide a basis for comparison between the Korean citation database and the global citation database (KSCI vs. SCI).

A Study on the Validity of Changing the Job Title of Medical Technologist (임상병리사 명칭 변경을 위한 타당성 연구)

  • Koo, Bon-Kyeong;Kim, Won Shik;Park, Sun Gu;Park, Jong O;Yoon, Seong Min
    • Korean Journal of Clinical Laboratory Science
    • /
    • v.53 no.1
    • /
    • pp.105-121
    • /
    • 2021
  • To investigate and accommodate opinions on the revision of the official occupational title of the medical technologist, the Korean Association of Medical Technologists (KAMT) requested 22,638 people registered as its regular members to participate in an online survey and select their two preferred options from the alternative job titles presented. Survey responses were collected from 3,999 people (17.66%). To examine job title preferences among the KAMT members, each respondent was asked to choose two terms from the choice set. As a result, 6,958 responses were obtained, and out of the total responses, 5,555 (79.83%) indicated a choice for a job title that included the word 'analyst' as the preferred alternative. The survey results showed that "Diagnostic Laboratory Analyst" was the most preferred alternative selected by the largest proportion of respondents (2,417 responses, 34.73%), followed by "Clinical Laboratory Analyst" (1,710 responses, 24.57%), "Biomedical Pathology Technologist" (758 responses, 10.89%), "Biomedical Analyst" (730 responses, 10.49%), "Biomedical Laboratory Analyst" (730 responses, 10.03%), and "Clinical Laboratory Scientist" (646 responses, 9.26%). Therefore, based on the responses of the surveyed members, results of consultation and literature review, the Standard Classification of Occupations (SCO), and the current status of the job titles used in major countries, it is suggested that the occupational title of medical technologists should be changed by adopting "Diagnostic Laboratory Analyst", "Biomedical Laboratory Analyst", or "Biomedical Analyst" as their new official job title.

A Three-Step Preprocessing Algorithm for Enhanced Classification of E-Mail Recommendation System (이메일 추천 시스템의 분류 향상을 위한 3단계 전처리 알고리즘)

  • Jeong Ok-Ran;Cho Dong-Sub
    • The Transactions of the Korean Institute of Electrical Engineers D
    • /
    • v.54 no.4
    • /
    • pp.251-258
    • /
    • 2005
  • Automatic document classification may differ significantly according to the characteristics of the documents being classified, as well as the classifier's performance. This research identifies the characteristics of e-mail documents and applies a three-step preprocessing algorithm that can minimize their atypical characteristics. In the first stage, an uncertainty-based sampling algorithm using Mean Absolute Deviation (MAD) addresses the question of selecting training documents for rule generation at the time of classification. In the second stage, a method of assigning weighted values by attribute is applied to increase the discriminating capability of the terms that appear in the title, at the level of e-mail document characteristics. In the third and last stage, the accuracy of classification for each category is increased by using the Dynamic Threshold of the Naive Bayesian Presumptive Algorithm. We also implemented an E-Mail Recommendation System using the three-step preprocessing algorithm, which enables users to classify mail directly and optimally by recommending the applicable category when a mail arrives.

Document Classification Model Using Web Documents for Balancing Training Corpus Size per Category

  • Park, So-Young;Chang, Juno;Kihl, Taesuk
    • Journal of information and communication convergence engineering
    • /
    • v.11 no.4
    • /
    • pp.268-273
    • /
    • 2013
  • In this paper, we propose a document classification model using Web documents as a part of the training corpus in order to resolve the imbalance of the training corpus size per category. For the purpose of retrieving the Web documents closely related to each category, the proposed document classification model calculates the matching score between word features and each category, and generates a Web search query by combining the higher-ranked word features and the category title. Then, the proposed document classification model sends each combined query to the open application programming interface of the Web search engine, and receives the snippet results retrieved from the Web search engine. Finally, the proposed document classification model adds these snippet results as Web documents to the training corpus. Experimental results show that the method that considers the balance of the training corpus size per category exhibits better performance in some categories with small training sets.
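The query-generation step in this abstract can be sketched as follows. The abstract does not define the matching score, so this sketch assumes a simple one: the share of a word's occurrences that fall in a given category. The function names, the `top_k` parameter, and the toy corpus are hypothetical, and the call to a Web search API is left out.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

Document = Tuple[str, List[str]]  # (category title, tokens)


def matching_scores(docs: List[Document]) -> Dict[str, Dict[str, float]]:
    """Assumed score(word, category) = count in category / count in whole corpus."""
    per_category: Dict[str, Counter] = defaultdict(Counter)
    total: Counter = Counter()
    for category, tokens in docs:
        per_category[category].update(tokens)
        total.update(tokens)
    return {
        category: {word: count / total[word] for word, count in counts.items()}
        for category, counts in per_category.items()
    }


def build_queries(docs: List[Document], top_k: int = 2) -> Dict[str, str]:
    """Combine each category title with its top_k highest-scoring word features."""
    queries = {}
    for category, word_scores in matching_scores(docs).items():
        # Sort by descending score, breaking ties alphabetically for determinism.
        ranked = sorted(word_scores, key=lambda w: (-word_scores[w], w))
        queries[category] = " ".join([category] + ranked[:top_k])
    return queries


corpus = [
    ("sports", ["match", "goal", "team"]),
    ("sports", ["team", "coach", "match"]),
    ("finance", ["bank", "team", "loan"]),
]
queries = build_queries(corpus)
```

Each resulting query string would then be sent to a search engine, and the returned snippets added to the training data of the corresponding category.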

Principles of the Automatic Book-Classification (도서분류자동화 원리유도에 관한 연구)

  • 심의순;이경호
    • Journal of Korean Library and Information Science Society
    • /
    • v.11
    • /
    • pp.175-209
    • /
    • 1984
  • The purpose of this study is to build a general principle for automatic book-classification that can be put to use in library operation, and to present a methodology of automatic classification for the library. Since the enumerative classification schemes, which exist as manual systems, cannot be applied to the automation of classification, the principles of the Colon Classification by S.R. Ranganathan are introduced and studied. The results of the study can be summarized as follows: (1) Automatic book-classification can be performed by the principles of faceted classification. (2) This study presents a general principle and an application principle for automatic book-classification. (3) File design for the automatic book-classification of a general classification is different from that of a special classification. (4) The methodology is to classify the literature by inputting the title into a terminal. In addition, the expected benefits of automatic book-classification are as follows: (1) Prompt and accurate classification is possible. (2) Whichever library classifies a book, it can have the same classification number. (3) Users can retrieve the classification code of a book they want to search for through a dialogue with the computer. (4) Since the concept coordination method is employed, even representing a multi-subject concept is made simple. (5) By performing automatic book-classification, the automation of library operation can be completed.
