• Title/Summary/Keyword: Words classification

Consideration of Sri Lanka Stupa Type (스리랑카 불탑 형식에 대한 고찰)

  • Her, Jihye;Cheon, Deuk Youm
    • Journal of architectural history / v.24 no.6 / pp.57-66 / 2015
  • Because Sri Lankan stupas were directly influenced by Indian stupas, understanding them is important for tracing the history of Buddhist art, as they show how the early Buddhist stupa varied. Owing to invasions and disasters, every stupa in Sri Lanka collapsed into a shapeless mound, and restoration work then changed their shape dramatically from the initial form to the existing one. Since it is hard to determine what the initial stupas looked like, the Sanchi Stupa is used as an example of the initial shape for comparison. Sri Lankan stupas have a square foundation and three basal rings that support the main dome. Entrances are on all four sides, and no railing or torana (gate) has ever been found at a Sri Lankan stupa. Sri Lankan stupas have been classified by dome shape into six to eight types according to the "Vijayanta Potha", an ancient Buddhist text, and several researchers have described these types inconsistently. Because the terms are unfamiliar and there is an unreasonable gap between the initial and the existing Sri Lankan stupas, a new classification is needed. Existing Sri Lankan stupas can be classified into four types: (1) Bell type, (2) Pot type, (3) Mound type, and (4) Bubble type. This proposal aims to let further studies describe the types with easier, shorter words and to make the classification practical, since the current classification includes three stupa types for which no example exists. Stupa classification needs to be restricted to existing Sri Lankan stupas, because the current classification has been used for hundreds of years without adjustment. The Bell type is mainly located in Anuradhapura, the Pot and Mound types are found only in limited areas, and the Bubble type is found in most areas of Sri Lanka.

A Comparative Study on Requirements Analysis Techniques using Natural Language Processing and Machine Learning

  • Cho, Byung-Sun;Lee, Seok-Won
    • Journal of the Korea Society of Computer and Information / v.25 no.7 / pp.27-37 / 2020
  • In this paper, we propose a data-driven methodology that uses natural language processing and machine learning to classify requirements into functional and non-functional requirements. Analysis of the classification results showed that models trained with data preprocessing and classification algorithms based on the characteristics and information of existing requirements, using TF and IDF term weights, outperformed models that applied stemming and stop-word removal. This observation also shows that term weights computed without stemming and stop-word removal improved the results. Furthermore, we investigate an optimized method for classifying software requirements into functional and non-functional requirements.
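A minimal sketch of TF-IDF-based requirement classification, using only the Python standard library. The abstract does not name the classifier, so a nearest-centroid rule is assumed here; the training sentences and the FR/NFR labels are hypothetical. Tokenization deliberately skips stemming and stop-word removal, the configuration the abstract reports as performing better.

```python
import math
from collections import Counter

PUNCT = ".,;:()"

def tokenize(text):
    # Lowercase and split on whitespace. No stemming and no stop-word removal.
    tokens = (t.strip(PUNCT).lower() for t in text.split())
    return [t for t in tokens if t]

def fit_idf(docs):
    # Smoothed inverse document frequency: log((1 + N) / (1 + df)) + 1.
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))
    n = len(docs)
    return {term: math.log((1 + n) / (1 + count)) + 1 for term, count in df.items()}

def tfidf(text, idf):
    # Raw term frequency times IDF, L2-normalised; terms unseen in training are ignored.
    tf = Counter(t for t in tokenize(text) if t in idf)
    vec = {t: c * idf[t] for t, c in tf.items()}
    norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
    return {t: v / norm for t, v in vec.items()}

def train_centroids(docs, labels, idf):
    # Mean TF-IDF vector of each class.
    sums, counts = {}, Counter(labels)
    for doc, label in zip(docs, labels):
        acc = sums.setdefault(label, Counter())
        for t, v in tfidf(doc, idf).items():
            acc[t] += v
    return {lab: {t: v / counts[lab] for t, v in acc.items()} for lab, acc in sums.items()}

def classify(text, centroids, idf):
    # Assign the class whose centroid has the largest dot product with the text.
    vec = tfidf(text, idf)
    def score(lab):
        return sum(v * centroids[lab].get(t, 0.0) for t, v in vec.items())
    return max(centroids, key=score)

# Hypothetical training requirements: FR = functional, NFR = non-functional.
train = [
    ("The system shall allow users to create an account", "FR"),
    ("The system shall let the administrator delete records", "FR"),
    ("The system shall respond within two seconds", "NFR"),
    ("The system shall be available 99.9 percent of the time", "NFR"),
]
docs, labels = zip(*train)
idf = fit_idf(docs)
centroids = train_centroids(docs, labels, idf)
print(classify("The system shall allow users to delete an account", centroids, idf))
print(classify("The system shall respond within one second", centroids, idf))
```

The nearest-centroid rule is used only because it is short; any classifier that accepts TF-IDF vectors could take its place.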

Feature Expansion based on LDA Word Distribution for Performance Improvement of Informal Document Classification (비격식 문서 분류 성능 개선을 위한 LDA 단어 분포 기반의 자질 확장)

  • Lee, Hokyung;Yang, Seon;Ko, Youngjoong
    • Journal of KIISE / v.43 no.9 / pp.1008-1014 / 2016
  • Data from Twitter, Facebook, and customer reviews belong to the informal document group, whereas newspaper articles, which go through a grammar-correction step, belong to the formal document group. Finding consistent rules or patterns is more difficult in informal documents than in formal documents, so additional approaches are needed to improve informal document analysis. In this study, we classified Twitter data, a representative type of informal document, into ten categories. To improve performance, we revised and expanded the features based on the LDA (Latent Dirichlet Allocation) word distribution. Using the top-ranked LDA words, the other words were separated or bundled, and the feature set was expanded repeatedly in this way. Finally, we performed document classification with the expanded features. Experimental results indicate that the proposed method improved the micro-averaged F1-score by 7.11 percentage points compared with the results before the feature expansion step.
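The idea of using top-ranked LDA words to enrich sparse features can be sketched as follows. This is a simplified illustration, not the authors' procedure: it assumes the topic-word rankings have already been produced by a trained LDA model (the `TOPIC_TOP_WORDS` table is hypothetical and hard-coded), and it adds one pseudo-feature per topic whose top words occur in a tweet.

```python
from collections import Counter

# Hypothetical top-ranked words per topic, as a trained LDA model might rank them.
TOPIC_TOP_WORDS = {
    0: ["game", "score", "team", "win"],
    1: ["phone", "battery", "screen", "app"],
}

def expand_features(tokens, topic_top_words, top_n=3):
    """Return bag-of-words counts plus one pseudo-feature per matching topic.

    A topic feature counts how many tokens are among that topic's top_n words,
    so short tweets that share no exact word can still share a topic feature.
    """
    features = Counter(tokens)
    for topic, words in topic_top_words.items():
        top = set(words[:top_n])
        hits = sum(1 for t in tokens if t in top)
        if hits:
            features[f"topic_{topic}"] = hits
    return features

tweet_a = ["great", "team", "win", "tonight"]
tweet_b = ["what", "a", "score"]
print(expand_features(tweet_a, TOPIC_TOP_WORDS))
print(expand_features(tweet_b, TOPIC_TOP_WORDS))
```

The two tweets share no word, yet both receive the `topic_0` feature, which is the kind of overlap feature expansion is meant to create for short, informal text.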

Similar Patent Search Service System using Latent Dirichlet Allocation (잠재 의미 분석을 적용한 유사 특허 검색 서비스 시스템)

  • Lim, HyunKeun;Kim, Jaeyoon;Jung, Hoekyung
    • Journal of the Korea Institute of Information and Communication Engineering / v.22 no.8 / pp.1049-1054 / 2018
  • Keyword searching was used in the past to find similar patents; more recently, automated classification by machine learning has been used. Keyword searching analyzes data that have been structured through data refinement. Its accuracy is high for short text, but for long text consisting of many words, such as a whole document, it cannot analyze the meaning contained in the sentences. At the semantic-analysis level, automatic classification is used to classify sentences composed of several words through unstructured data analysis. There have been attempts to find similar documents by combining the two methods. However, because the two methods analyze data in different ways, an algorithm that uses unstructured and structured data simultaneously runs into problems. In this paper, we study a method that extracts the keywords implied in a document and uses LDA (Latent Dirichlet Allocation) to classify documents efficiently without human intervention and to find similar patents.
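Once each patent is represented by a topic distribution, finding similar patents reduces to a nearest-neighbour search. The sketch below assumes the topic proportions have already been inferred (the patent IDs and vectors are hypothetical) and ranks candidates by cosine similarity; the paper may use a different similarity measure.

```python
import math

def cosine(u, v):
    # Cosine similarity of two equal-length vectors; 0.0 if either is all zeros.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def most_similar(query, corpus, k=2):
    """Rank patents by cosine similarity of their topic-proportion vectors."""
    scored = [(pid, cosine(query, vec)) for pid, vec in corpus.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Hypothetical topic proportions (three topics) for already-indexed patents.
corpus = {
    "P-001": [0.80, 0.15, 0.05],
    "P-002": [0.10, 0.85, 0.05],
    "P-003": [0.60, 0.30, 0.10],
}
query = [0.75, 0.20, 0.05]
print(most_similar(query, corpus, k=3))
```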

An Automatic Classification System of Korean Documents Using Weight for Keywords of Document and Word Cluster (문서의 주제어별 가중치 부여와 단어 군집을 이용한 한국어 문서 자동 분류 시스템)

  • Hur, Jun-Hui;Choi, Jun-Hyeog;Lee, Jung-Hyun;Kim, Joong-Bae;Rim, Kee-Wook
    • The KIPS Transactions:PartB / v.8B no.5 / pp.447-454 / 2001
  • Automatic document classification assigns unlabeled documents to existing classes. It can be applied to classifying newsgroup articles and web documents, and to presenting more precise information retrieval results by learning from users. In this paper, we use a weighted Bayesian classifier that weights the keywords of a document to improve classification accuracy. If the system cannot classify a document properly because the document has too few words to serve as features, it uses relevant word clusters to supplement the document's features. The clusters are built by automatic word clustering over the corpus. As a result, the proposed system outperformed existing classification systems in classification accuracy on Korean documents.
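A minimal sketch of the two ideas in this abstract: a multinomial naive Bayes classifier whose keyword log-likelihoods are multiplied by a weight, and a fallback that appends words from a word cluster when a document is too short. The weighting scheme, the `min_words` threshold, the clusters, and the training data are all hypothetical assumptions; the paper's exact weighting is not given here.

```python
import math
from collections import Counter, defaultdict

def train_nb(docs, labels):
    """Collect per-class word counts for multinomial naive Bayes."""
    word_counts = defaultdict(Counter)
    class_counts = Counter(labels)
    vocab = set()
    for tokens, label in zip(docs, labels):
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return word_counts, class_counts, vocab

def supplement(tokens, clusters, min_words=3):
    """If a document is too short, append words from the clusters of its words."""
    if len(tokens) >= min_words:
        return list(tokens)
    extra = [w for t in tokens for w in clusters.get(t, []) if w not in tokens]
    return list(tokens) + extra

def predict(tokens, model, keyword_weights):
    word_counts, class_counts, vocab = model
    total_docs = sum(class_counts.values())
    best, best_score = None, -math.inf
    for label in class_counts:
        total = sum(word_counts[label].values())
        score = math.log(class_counts[label] / total_docs)  # class prior
        for t in tokens:
            if t not in vocab:
                continue
            p = (word_counts[label][t] + 1) / (total + len(vocab))  # add-one smoothing
            # Hypothetical weighting: a keyword's log-likelihood is scaled by its weight.
            score += keyword_weights.get(t, 1.0) * math.log(p)
        if score > best_score:
            best, best_score = label, score
    return best

docs = [
    ["stock", "market", "price", "investor"],
    ["market", "price", "rate"],
    ["soccer", "goal", "match", "player"],
    ["match", "player", "coach"],
]
labels = ["economy", "economy", "sports", "sports"]
model = train_nb(docs, labels)
keyword_weights = {"goal": 2.0}             # hypothetical keyword weight
clusters = {"striker": ["goal", "player"]}  # hypothetical word cluster

query = supplement(["striker"], clusters)   # too short, so cluster words are added
print(query, predict(query, model, keyword_weights))
```

Without the supplement, "striker" is out of vocabulary and only the equal class priors remain, so the decision would be arbitrary; the cluster words give the classifier evidence to work with.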

A study of the Four Category Classification System of Hong Sok-chu (홍석주의 사부분류법에 관한 연구)

  • Lee Sang-Yong
    • Journal of the Korean Society for Library and Information Science / v.30 no.2 / pp.149-165 / 1996
  • Hong-sii Tokso-rok (홍씨독서록 or Hong's Annotated Bibliography of Korean and Chinese Books) is the only work in the history of Korean bibliography that provides introductory notes to each class, that is, a description of the origin, transition, and characteristics of the subject field at the beginning of each class. This paper aims to examine the outline of these introductory descriptions, to analyze the Four Category Classification System (사부분류법) devised by Hong Sok-chu, and to explain how its classes are set and ordered. The paper identifies several characteristics of the ideas behind Hong's classification system, discovered by analyzing the content of each class introduction. The characteristics are as follows. First, classes are organized and arranged from substantial problems to nonsubstantial ones. In other words, the further a class is from the substantial problem of Confucianism, the further its position in the order is from that problem. Within the same hierarchy, the order of classes is set by how close each class is to the substantial problem, and this principle is strictly applied throughout Hong's classification system. Second, Hong developed the classification system on the basis of democratic thought. In other words, when he set the priority of classes, he used democratism as a guideline. The organization of the classes belonging to the categories of history (Sa-bu, 사부) and philosophy (Cha-bu, 자부) shows the application of this principle. In conclusion, this paper finds that Hong did not arrange the class order randomly but set it according to objective reasons and logic.

A Study on Applicability of Machine Learning for Book Classification of Public Libraries: Focusing on Social Science and Arts (공공도서관 도서 분류를 위한 머신러닝 적용 가능성 연구 - 사회과학과 예술분야를 중심으로 -)

  • Kwak, Chul Wan
    • Journal of the Korean BIBLIA Society for library and Information Science / v.32 no.1 / pp.133-150 / 2021
  • The purpose of this study is to assess whether machine learning can be applied to classifying public library books by their titles. Data analysis was performed with Python's scikit-learn library in Jupyter Notebook on the Anaconda platform. The KoNLPy analyzer and its Okt class were used for Korean morphological analysis. The units of analysis were 2,000 title fields and KDC class numbers (300 and 600) extracted from the KORMARC records of public libraries. Analyzing the data with six machine learning models showed that machine learning could be applied to book classification. Among the models used, the neural network model classified titles with the highest accuracy. The study suggests the need to improve the accuracy of title classification and for further research on book titles, title tokenization, and stop words.
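The study used scikit-learn and KoNLPy's Okt class; the dependency-free sketch below only illustrates the shape of such a title-classification pipeline. Character bigrams stand in for Okt morphemes, and a perceptron (a single-layer neural network) stands in for the study's neural network model. The titles are hypothetical, and the labels +1 and -1 encode KDC classes 300 (social sciences) and 600 (arts).

```python
from collections import defaultdict

def char_bigrams(title):
    """Character bigrams of a title with spaces removed (a stand-in for Okt morphemes)."""
    s = title.replace(" ", "")
    return [s[i:i + 2] for i in range(len(s) - 1)]

class Perceptron:
    """Binary perceptron: a single-layer neural network with a step activation."""

    def __init__(self):
        self.w = defaultdict(float)
        self.b = 0.0

    def score(self, feats):
        return self.b + sum(self.w[f] for f in feats)

    def predict(self, title):
        return 1 if self.score(char_bigrams(title)) > 0 else -1

    def fit(self, titles, ys, epochs=20):
        for _ in range(epochs):
            errors = 0
            for title, y in zip(titles, ys):
                feats = char_bigrams(title)
                if y * self.score(feats) <= 0:  # misclassified or on the boundary
                    for f in feats:
                        self.w[f] += y
                    self.b += y
                    errors += 1
            if errors == 0:  # every training title is classified correctly
                break
        return self

titles = [
    "사회 복지 정책의 이해",  # hypothetical KDC 300 title
    "서양 미술사",            # hypothetical KDC 600 title
    "경제학 원론",            # 300
    "음악 이론의 기초",       # 600
    "교육 사회학 입문",       # 300
    "사진 촬영 입문",         # 600
]
ys = [1, -1, 1, -1, 1, -1]  # +1 = KDC 300, -1 = KDC 600
model = Perceptron().fit(titles, ys)
print(model.predict("사회 정책"), model.predict("미술 이론"))
```

Character bigrams avoid a morphological analyzer at the cost of noisier features, which echoes the abstract's point that title tokenization deserves further research.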

Factor-analysis based questionnaire categorization method for reliability improvement of evaluation of working conditions in construction enterprises

  • Lin, Jeng-Wen;Shen, Pu Fun
    • Structural Engineering and Mechanics / v.51 no.6 / pp.973-988 / 2014
  • This paper presents a factor-analysis-based questionnaire categorization method that improves the reliability of the evaluation of working conditions in Taiwanese and Chinese construction enterprises, for structural engineering applications, without compromising the completeness of the questionnaire. The approach draws on AI applications and expert systems in structural engineering. Questions with a similar response pattern are grouped into, or categorized as, one factor. Questions that form a single factor usually have higher reliability than the entire questionnaire, especially when the questionnaire is complex and inconsistent. Classifying questions by the meanings of the words used in them and by their response scores can therefore increase reliability. The classification criterion required that at least 90% of the questions in the same group satisfy the proposed classification rule; the lowest proportion actually obtained was 92%. The results show that the question classification method could raise the reliability of the questionnaires to at least 0.7. Compared with the question-deletion method in SPSS, 75% of the remaining questions matched those obtained by applying the classification method.
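Questionnaire reliability of this kind is commonly measured with Cronbach's alpha; the abstract does not name the coefficient, so its use here is an assumption. The sketch computes alpha = k/(k-1) · (1 - Σ item variances / variance of total scores) for a group of k questions, and shows that leaving out a question with an inconsistent response pattern raises it. The scores are hypothetical.

```python
from statistics import pvariance

def cronbach_alpha(items):
    """Cronbach's alpha for a group of questions.

    items: one list of scores per question, all the same length
    (one score per respondent).
    """
    k = len(items)
    if k < 2:
        raise ValueError("alpha needs at least two questions")
    totals = [sum(scores) for scores in zip(*items)]  # each respondent's total score
    item_var = sum(pvariance(scores) for scores in items)
    return k / (k - 1) * (1 - item_var / pvariance(totals))

# Hypothetical scores from five respondents.
q1 = [1, 2, 3, 4, 5]
q2 = [2, 2, 3, 5, 5]  # responds like q1
q3 = [5, 1, 4, 2, 3]  # inconsistent response pattern

print(round(cronbach_alpha([q1, q2]), 3))      # the consistent group
print(round(cronbach_alpha([q1, q2, q3]), 3))  # the whole set
```

Using population variance throughout is harmless here: the n versus n-1 factor cancels in the variance ratio.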

CREATING MULTIPLE CLASSIFIERS FOR THE CLASSIFICATION OF HYPERSPECTRAL DATA;FEATURE SELECTION OR FEATURE EXTRACTION

  • Maghsoudi, Yasser;Rahimzadegan, Majid;Zoej, M.J.Valadan
    • Proceedings of the KSRS Conference / 2007.10a / pp.6-10 / 2007
  • Classification of hyperspectral images is challenging. A very high-dimensional input space requires an exponentially large amount of data to represent the classes in that space adequately and reliably. In other words, to obtain statistically reliable classification results, the number of necessary training samples increases exponentially with the number of spectral bands. In many situations, however, acquiring a large number of training samples for such high-dimensional datasets is not easy. This problem can be overcome by using multiple classifiers. In this paper, we compare the effectiveness of two approaches for creating multiple classifiers: feature selection and feature extraction. Both methods generate multiple feature subsets by running a feature selection or feature extraction algorithm several times, each time to discriminate one of the classes from the rest. A maximum likelihood classifier is applied to each of the resulting feature subsets, and a combination scheme then combines the outputs of the individual classifiers. Experimental results show the effectiveness of the feature extraction algorithm for generating multiple classifiers.
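The one-class-versus-rest feature selection and classifier combination described above can be sketched as follows. Several simplifications are assumptions, not the paper's choices: bands are ranked by a Fisher score, each maximum likelihood classifier uses a diagonal Gaussian model with equal priors, and the combination scheme averages the members' posterior probabilities. The four-band spectra are hypothetical.

```python
import math
from statistics import mean, pvariance

def fisher_score(xs_in, xs_out):
    """Separability of one band between a class and all other classes."""
    return (mean(xs_in) - mean(xs_out)) ** 2 / (pvariance(xs_in) + pvariance(xs_out) + 1e-9)

def select_bands(X, y, target, k):
    """Pick the k bands that best separate `target` from the rest."""
    scores = []
    for b in range(len(X[0])):
        inside = [x[b] for x, lab in zip(X, y) if lab == target]
        outside = [x[b] for x, lab in zip(X, y) if lab != target]
        scores.append((fisher_score(inside, outside), b))
    return [b for _, b in sorted(scores, reverse=True)[:k]]

def fit_gaussian(X, y, bands):
    """Per-class (mean, variance) on the chosen bands: a diagonal-covariance model."""
    params = {}
    for c in sorted(set(y)):
        rows = [x for x, lab in zip(X, y) if lab == c]
        params[c] = [(mean(r[b] for r in rows), pvariance([r[b] for r in rows]) + 1e-6)
                     for b in bands]
    return params

def posteriors(x, bands, params):
    """Gaussian log-likelihoods turned into probabilities (equal priors)."""
    logl = {c: sum(-0.5 * math.log(2 * math.pi * v) - (x[b] - m) ** 2 / (2 * v)
                   for b, (m, v) in zip(bands, stats))
            for c, stats in params.items()}
    top = max(logl.values())  # subtract the max for numerical stability
    exp = {c: math.exp(l - top) for c, l in logl.items()}
    z = sum(exp.values())
    return {c: e / z for c, e in exp.items()}

def train_ensemble(X, y, k=2):
    # One feature subset and one classifier per class (one-vs-rest selection).
    ensemble = []
    for c in sorted(set(y)):
        bands = select_bands(X, y, c, k)
        ensemble.append((bands, fit_gaussian(X, y, bands)))
    return ensemble

def predict(x, ensemble):
    # Assumed combination scheme: average the members' posterior probabilities.
    total = {}
    for bands, params in ensemble:
        for c, p in posteriors(x, bands, params).items():
            total[c] = total.get(c, 0.0) + p / len(ensemble)
    return max(total, key=total.get)

X = [
    [0.10, 0.05, 0.02, 0.01], [0.12, 0.06, 0.03, 0.02], [0.11, 0.04, 0.02, 0.01],
    [0.40, 0.30, 0.35, 0.30], [0.42, 0.32, 0.33, 0.31], [0.38, 0.31, 0.34, 0.29],
    [0.05, 0.10, 0.08, 0.50], [0.06, 0.12, 0.07, 0.52], [0.04, 0.11, 0.09, 0.48],
]
y = ["water"] * 3 + ["soil"] * 3 + ["vegetation"] * 3
ensemble = train_ensemble(X, y, k=2)
print(predict([0.41, 0.31, 0.34, 0.30], ensemble))
```

Selecting a small band subset per class keeps each Gaussian model low-dimensional, which is the point of the approach when training samples are scarce.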

A Study on the Classification of Variables Affecting Smartphone Addiction in Decision Tree Environment Using Python Program

  • Kim, Seung-Jae
    • International journal of advanced smart convergence / v.11 no.4 / pp.68-80 / 2022
  • Since the advent of AI, development of technology to implement complete and sophisticated AI functions has continued. Efforts to develop technologies for complete automation mainly use machine learning and deep learning techniques. These techniques include supervised, unsupervised, and reinforcement learning as internal technical elements, and they in turn rely on big-data analysis as the cornerstone of decision-making. Established decisions are then improved through repeated revision and renewal of decision-making criteria. In other words, big-data analysis, which enables data classification and recognition, is important enough to be called a key technical element of AI functions, and it therefore requires sophisticated analysis. In this study, among the various tools that can analyze big data, we use a Python program to identify which variables can affect smartphone addiction in a decision tree environment. With the Python program, we check whether data classification by a decision tree performs the same as with other tools, and whether it can lend reliability to decisions about the addictiveness of smartphone use. The results of this study indicate that big-data analysis can be performed without problems using any of the various statistical tools, such as Python or R.
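The core step of a decision tree, choosing the variable that best splits the data, can be sketched with the Gini impurity criterion. This standard-library example is not the author's program; the survey variables (`daily_hours`, `age`, `sns_use`) and the responses are hypothetical.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(rows, labels, variables):
    """Return (variable, threshold, impurity decrease) of the best binary split."""
    parent = gini(labels)
    best = (None, None, 0.0)
    for var in variables:
        for threshold in sorted({row[var] for row in rows}):
            left = [lab for row, lab in zip(rows, labels) if row[var] <= threshold]
            right = [lab for row, lab in zip(rows, labels) if row[var] > threshold]
            if not left or not right:
                continue  # a split must send samples both ways
            weighted = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            gain = parent - weighted
            if gain > best[2]:
                best = (var, threshold, gain)
    return best

# Hypothetical survey responses.
rows = [
    {"daily_hours": 1, "age": 25, "sns_use": 0},
    {"daily_hours": 2, "age": 40, "sns_use": 1},
    {"daily_hours": 3, "age": 19, "sns_use": 1},
    {"daily_hours": 6, "age": 22, "sns_use": 1},
    {"daily_hours": 7, "age": 35, "sns_use": 0},
    {"daily_hours": 8, "age": 17, "sns_use": 1},
]
labels = ["normal", "normal", "normal", "addicted", "addicted", "addicted"]
print(best_split(rows, labels, ["daily_hours", "age", "sns_use"]))
```

A full tree applies `best_split` recursively to each side until the nodes are pure or a depth limit is reached; the variable chosen at the root is the one with the strongest single effect on the label.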