• Title/Summary/Keyword: Text data


Utilizing Unlabeled Documents in Automatic Classification with Inter-document Similarities (문헌간 유사도를 이용한 자동분류에서 미분류 문헌의 활용에 관한 연구)

  • Kim, Pan-Jun;Lee, Jae-Yun
    • Journal of the Korean Society for information Management
    • /
    • v.24 no.1 s.63
    • /
    • pp.251-271
    • /
    • 2007
  • This paper studies the problem of classifying documents using both labeled and unlabeled training data, with a focus on document similarity features. Using unlabeled data is practically important because in many information systems obtaining training labels is expensive, while large quantities of unlabeled documents are readily available. A general semi-supervised learning algorithm has two steps. First, it trains a classifier on the available labeled documents and uses it to classify the unlabeled documents. Then, it trains a new classifier on all the training documents, whether they were labeled manually or automatically. We propose two types of semi-supervised learning algorithm that use document similarity features. The first is one-step semi-supervised learning, which uses unlabeled documents only to generate document similarity features. The second is two-step semi-supervised learning, which uses unlabeled documents as training examples as well as for similarity features. Experimental results obtained with support vector machines and a naive Bayes classifier show that a small set of labeled documents combined with a large set of unlabeled documents yields better performance than supervised learning on labeled data alone. Considering the efficiency of a classifier system, the one-step semi-supervised learning algorithm proposed in this study could be a good solution for improving classification performance with unlabeled documents.
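
The general two-step procedure described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it uses a toy multinomial naive Bayes classifier with add-one smoothing, omits the document similarity features the paper studies, and all function names (`train_nb`, `predict_nb`, `self_train`) and sample documents are hypothetical.

```python
import math
from collections import Counter, defaultdict


def train_nb(docs, labels):
    """Fit a toy multinomial naive Bayes model on tokenized documents."""
    word_counts = defaultdict(Counter)
    class_counts = Counter(labels)
    vocab = set()
    for tokens, label in zip(docs, labels):
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return word_counts, class_counts, vocab


def predict_nb(model, tokens):
    """Return the most probable class for one tokenized document."""
    word_counts, class_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        score = math.log(n_docs / total_docs)
        # Add-one smoothing: every vocabulary word gets one extra count.
        denom = sum(word_counts[label].values()) + len(vocab)
        for tok in tokens:
            if tok in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[label][tok] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def self_train(labeled_docs, labels, unlabeled_docs):
    """Two-step semi-supervised learning on labeled plus unlabeled documents."""
    # Step 1: train on labeled documents, then label the unlabeled ones.
    first_model = train_nb(labeled_docs, labels)
    pseudo_labels = [predict_nb(first_model, d) for d in unlabeled_docs]
    # Step 2: retrain on manually and automatically labeled documents together.
    final_model = train_nb(labeled_docs + unlabeled_docs, labels + pseudo_labels)
    return final_model, pseudo_labels


# Hypothetical toy data for illustration only.
labeled = [["goal", "match", "team"], ["stock", "market", "bank"]]
gold = ["sport", "finance"]
unlabeled = [["team", "goal", "coach"], ["bank", "loan", "market"]]
model, pseudo = self_train(labeled, gold, unlabeled)
```

In this toy run, the second-step model has learned words such as "coach" and "loan" that occur only in the unlabeled documents, which is the benefit the two-step scheme aims for.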

Development of the Artwork using Music Visualization based on Sentiment Analysis of Lyrics (가사 텍스트의 감성분석에 기반 한 음악 시각화 콘텐츠 개발)

  • Kim, Hye-Ran
    • The Journal of the Korea Contents Association
    • /
    • v.20 no.10
    • /
    • pp.89-99
    • /
    • 2020
  • In this study, we tried to produce moving-image works through sentiment analysis of music. First, the Google natural language API was used for sentiment analysis of lyrics, and the results were then applied to image visualization rules. In prior engineering research, text-based sentiment analysis has been conducted to understand users' emotions and attitudes by analyzing their comments and reviews on social media. In this study, the data were instead used as material for creating artworks, so that they could serve aesthetic expression. From the machine's point of view, emotions are replaced with numbers, so there is a limit to normalization and standardization. We therefore tried to overcome these limitations by linking the results of sentiment analysis of lyrics data with the rules of formative elements in the visual arts. This study aims to transform existing traditional art forms such as literature, music, painting, and dance into a new form of art based on the viewpoint of the machine, reflecting the current era in which artificial intelligence even attempts to create artworks, which are advanced mental products of human beings. In addition, the work is expected to be expanded into an educational platform that facilitates creative activities, psychological analysis, and communication for people with developmental disabilities who have difficulty expressing emotions.

Analysis of Consumer Awareness of Cycling Wear Using Web Mining (웹마이닝을 활용한 사이클웨어 소비자 인식 분석)

  • Kim, Chungjeong;Yi, Eunjou
    • Journal of the Korea Academia-Industrial cooperation Society
    • /
    • v.19 no.5
    • /
    • pp.640-649
    • /
    • 2018
  • This study analyzed consumer awareness of cycling wear using web mining, one of the big data analysis methods. For this, the texts of postings and comments related to cycling wear from 2006 to 2017 in the Naver cafe 'people who commute by bicycle' were collected and analyzed using R packages. A total of 15,321 documents were used for data analysis. Keywords related to cycling wear were extracted using a Korean morphological analyzer (KoNLP) and converted to a TDM (Term Document Matrix) and a co-occurrence matrix to calculate keyword frequencies. The most frequent keyword for cycling wear was 'tights', accompanied by the opinion that wearers feel embarrassed because they are too tight. When purchasing cycling wear, consumers appeared to consider 'price', 'size', and 'brand'. Since 2016, 'low price' and 'cost effectiveness' have become more frequent than before, which indicates that consumers tend to prefer practical products. Moreover, the findings showed that it is necessary to improve not only design and wearability but also material functionality, such as sweat absorbance and quick drying, and the function of the pad. These results were similar to those of previous questionnaire-based studies. Web mining is therefore expected to serve as an objective indicator that can be reflected in product development through real-time analysis of consumers' opinions and requirements.
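
The conversion of extracted keywords into a term-document matrix and a co-occurrence matrix can be sketched as below. The study itself used R packages and KoNLP; this sketch swaps in plain Python, assumes the documents are already tokenized into keywords, and uses hypothetical function names and invented sample documents.

```python
from collections import Counter
from itertools import combinations


def term_document_matrix(docs):
    """Map each term to its per-document counts (rows: terms, columns: documents)."""
    vocab = sorted({tok for doc in docs for tok in doc})
    counts = [Counter(doc) for doc in docs]
    return {term: [c[term] for c in counts] for term in vocab}


def term_frequencies(tdm):
    """Total frequency of each term, obtained by summing its row."""
    return {term: sum(row) for term, row in tdm.items()}


def cooccurrence(docs):
    """Count, for each unordered term pair, the documents that contain both."""
    pairs = Counter()
    for doc in docs:
        for a, b in combinations(sorted(set(doc)), 2):
            pairs[(a, b)] += 1
    return pairs


# Invented keyword lists standing in for morphologically analyzed postings.
docs = [
    ["tights", "price", "size"],
    ["tights", "brand", "price"],
    ["pad", "tights"],
]
tdm = term_document_matrix(docs)
freq = term_frequencies(tdm)
pairs = cooccurrence(docs)
```

Pairs are stored with their two terms in alphabetical order, so a lookup such as `pairs[("price", "tights")]` does not depend on the order in which the words appeared in a posting.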

Mining Intellectual History Using Unstructured Data Analytics to Classify Thoughts for Digital Humanities (디지털 인문학에서 비정형 데이터 분석을 이용한 사조 분류 방법)

  • Seo, Hansol;Kwon, Ohbyung
    • Journal of Intelligence and Information Systems
    • /
    • v.24 no.1
    • /
    • pp.141-166
    • /
    • 2018
  • Information technology improves the efficiency of humanities research. In humanities research, information technology can be used to analyze a given topic or document automatically, facilitate connections to other ideas, and increase our understanding of intellectual history. We suggest a method to identify and automatically analyze the relationships between arguments contained in unstructured data collected from humanities writings such as books, papers, and articles. Our method, which we call history mining, reveals relationships of influence between arguments and the philosophers who present them. We utilize several classification algorithms, including a deep learning method. To verify the performance of the proposed methodology, we selected philosophers associated with empiricism and rationalism from a sample of philosophers and collected their related writings or articles accessible on the internet. The performance of the classification algorithms was measured by Recall, Precision, F-Score, and Elapsed Time. DNN, Random Forest, and Ensemble showed better performance than the other algorithms. Using the selected classification algorithm, we classified the writings of specific philosophers as rationalist or empiricist and generated a history map that takes each philosopher's years of activity into account.
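
The Recall, Precision, and F-Score measures named in this abstract can be computed as in the sketch below. It assumes the balanced F1 variant of the F-Score, which the abstract does not specify, and the function name and sample labels are hypothetical.

```python
def precision_recall_f1(y_true, y_pred, positive):
    """Compute precision, recall, and F1 for one class treated as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F1 is the harmonic mean of precision and recall (an assumed choice here).
    return precision, recall, 2 * precision * recall / (precision + recall)


# Invented labels: two rationalist and two empiricist writings.
truth = ["rationalism", "rationalism", "empiricism", "empiricism"]
predicted = ["rationalism", "empiricism", "empiricism", "empiricism"]
p, r, f1 = precision_recall_f1(truth, predicted, positive="rationalism")
```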

A Study on the Research Trends in Fintech using Topic Modeling (토픽 모델링을 이용한 핀테크 기술 동향 분석)

  • Kim, TaeKyung;Choi, HoeRyeon;Lee, HongChul
    • Journal of the Korea Academia-Industrial cooperation Society
    • /
    • v.17 no.11
    • /
    • pp.670-681
    • /
    • 2016
  • Recently, driven by Internet and mobile environments, the Fintech industry, which fuses finance and IT, has been growing rapidly, and Fintech services built on simplicity and convenience have been leading the conversion of financial services into online and mobile services. However, despite the rapid growth of the Fintech industry, few studies have classified Fintech technologies into detailed technologies, analyzed the technology development trends of major market countries, and supported technology planning. In this respect, using Fintech technological data in the form of unstructured data, the present study extracts and defines detailed Fintech technologies through the topic modeling technique. Thereafter, hot and cold topics among the derived detailed Fintech technologies are identified to determine the trend of Fintech technologies. In addition, the trends of technology development in the USA, South Korea, and China, which are major market countries for Fintech industrial technologies, are analyzed. Finally, through network analysis of the detailed Fintech technologies, linkages between the technologies are examined. The trends of Fintech industrial technologies identified in the present study are expected to be useful for establishing policies in the area of the Fintech industry and for Fintech-related enterprises establishing technology strategies.
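
One common way to separate hot from cold topics is to look at the trend in each topic's yearly share of documents. The abstract does not say which criterion the authors used, so the sketch below is an assumption: it fits an ordinary least-squares slope per topic and labels rising topics hot and falling topics cold, without the significance test a real study would likely add. The function names and the yearly shares are invented.

```python
def ols_slope(xs, ys):
    """Least-squares slope of ys regressed on xs."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den


def classify_topic(years, shares, tol=1e-9):
    """Label a topic by the direction of its yearly share trend."""
    slope = ols_slope(years, shares)
    if slope > tol:
        return "hot"
    if slope < -tol:
        return "cold"
    return "neutral"


# Invented yearly topic shares, e.g. taken from a fitted topic model.
years = [2012, 2013, 2014, 2015]
topic_shares = {
    "mobile payment": [0.10, 0.20, 0.30, 0.40],
    "card terminal": [0.40, 0.30, 0.20, 0.10],
}
trends = {name: classify_topic(years, s) for name, s in topic_shares.items()}
```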

A Case Study on Application of the NABI Program to Realize the 'Practice Centered Mechanism of Manifesting Character' ('실천 중심 인성 발현 메커니즘' 구현을 위한 NABI 프로그램의 적용 사례)

  • Park, Dahye;Park, Jongseok
    • Journal of The Korean Association For Science Education
    • /
    • v.36 no.6
    • /
    • pp.947-957
    • /
    • 2016
  • Today, in line with the growing emphasis on character education, science educators have tried to implement character education in the field of science education. This research therefore aims to confirm whether the 'Practice Centered Mechanism of Manifesting Character', developed on the basis of character education theory, can be realized through application of the NABI Program, which is based on Nature-Study. For this, the NABI Program was applied to 24 third-grade students of an elementary school for a period of ten months. Qualitative data such as students' reports, journals, and video recordings of the classes were collected. These data were classified into 'value, judgment, action', the steps of the 'Practice Centered Mechanism of Manifesting Character', and a text was written by interpreting the data. The results were as follows. First, students formed 'value' by making a connection with objects, and various values (individual, interpersonal, social, ecological, and spiritual) were formed according to the type of object with which the students felt a connection. Second, at the 'judgment' step, students need to judge problems of the real world. Third, at the 'action' step, students practice moral behavior in relation to the sympathy or feeling they had for the objects with which they made a connection. In conclusion, the 'Practice Centered Mechanism of Manifesting Character' can be realized through application of the NABI Program. The NABI Program can be a concrete way to implement character education in the field of science education.

Public Perception and Usage Pattern of Science Museum by Social Media Big Data Analysis (소셜 빅데이터 분석을 통해 알아본 대중의 과학관에 대한 인식 및 사용 행태)

  • Yun, Eunjeong;Park, Yunebae
    • Journal of The Korean Association For Science Education
    • /
    • v.37 no.6
    • /
    • pp.1005-1014
    • /
    • 2017
  • Focusing on the role of the science museum as an institution for improving the scientific literacy of the public, this study used social media big data analysis to investigate public perception of and behavior toward science museums and to gauge how much science museums affect the public. For this purpose, we extracted texts containing 'science museum' from Naver blogs and Twitter, analyzed them using network, frequency, co-occurrence, and semantic analysis, and compared them with results from English-speaking countries. Blog posts about science museums came mainly from parents of young children, while on Twitter many posts came from students who had visited as a group. The Korean public therefore used science museums mainly as a space for children's experiences, and in that case the programs and exhibitions of science museums were perceived positively. On the other hand, students who visited as a group showed some negative emotions. In the comparison with foreign cases in terms of the functions of the third-generation science museum, such as communication between the science museum and the public and public participation in science, the Korean public hardly mentioned scientific content, communication-related words such as 'argue', or curators and staff after visiting a science museum. Whereas many verbs related to meaningful activities, such as 'learn', 'participate', 'listen', 'read', 'ask', and 'think', appeared in English, only a small number of such verbs, including 'ask' and 'think', appeared in Korean. Science museums therefore need to improve the impression they leave, their communication with the public, and the impact and variety of activities that engage visitors after a visit.

The Study on the Common Definition of Knowledge and its Development Relation -Focused on the General Information Systems, Knowledge Management, DSS and EIS- (지식의 공통적 정의와 발전적 연관 관계에 관한 연구 -일반적 정보시스템과 지식경영, DSS, EIS를 중심으로-)

  • Roh, Jeong-Ran
    • Journal of the Korean Society for Library and Information Science
    • /
    • v.38 no.2
    • /
    • pp.239-259
    • /
    • 2004
  • The purpose of this study is to review, from quantitative and non-quantitative perspectives, the established research practices and managerial methods concerning knowledge, which have been studied independently in conventional information systems (libraries) and managerial information systems (MIS, DSS and EIS). These information systems have developed according to their own purposes since the 1950s, and corporate environments have recently become integrated due to the rapid creation and expansion of information. To make fast decisions in this situation, it is therefore appropriate to treat the two kinds of system, libraries and managerial information systems, within the same category. In other words, not only the quantitative data that are the main sources of DSS or EIS, but also qualitative data such as text documents, video and audio data, which have been managed in libraries and information centers and are not extracted from the former, can be used as new knowledge sources. In addition, DSS/EIS can provide an excellent infrastructure for Knowledge Management (KM), while libraries and information centers manage a comprehensive range of explicit and tacit knowledge and can be a facilitator or main driver of KM.

The Analysis of Inquiry Scopes in High School General Science Textbook Based on the 6th Curriculum - Emphasizing the Analysis of Inquiry Experiment - (제 6차 교육과정에 따른 고등학교 공통과학 교과서의 탐구영역 분석 - 탐구 실험을 중심으로 -)

  • Park, Won-Hyuck;Kim, Eun-A
    • Journal of The Korean Association For Science Education
    • /
    • v.19 no.4
    • /
    • pp.528-541
    • /
    • 1999
  • In order to obtain data for developing an ideal science curriculum, four General Science textbooks based on the 6th curriculum were analyzed. In particular, inquiry activities were analyzed with the Scientific Inquiry Evaluation Inventory (SIEI). The results are as follows: 1) The average number of inquiry activities in the four textbooks is 115.5, and the number varies widely between textbooks: textbook A contains 94 inquiry activities, textbook B 147, textbook C 100, and textbook D 121. 2) As for the number of inquiry activities by scope across the four textbooks, observation accounts for 22, experiment 117, interpreting data 196, investigation 64, discussion 51, classification 4, and prediction 8. Conceptual inquiry activities are thus about 2.3 times as numerous as inquiry experiments. 3) According to the analysis of each inquiry task by SIEI, textbook A has 268, textbook B 328, textbook C 207, and textbook D 304. 4) In the analysis of the structure of inquiry activities, the evaluation of the competition and cooperation scale shows more emphasis on common tasks with no pooled results (87.1%). The discussion scale mostly consists of activities that require no discussion among students (83.5%). The evaluation of the openness scale shows more emphasis on activities in which problems, procedures, and answers are presented (58.3%). In the evaluation of the inquiry scope scale, most activities demonstrate or verify the contents of the text (66.9%). 5) As for the analysis of inquiry activities as a whole, the inquiry pyramid in the four General Science textbooks is of type I, which emphasizes low-level inquiry activities such as gathering and organizing data. The inquiry index across the four textbooks averages 47.8, which is a very high level (above 35).


A Study on Development of GenBank-based Prototype System for Linking Heterogeneous Content (GenBank를 활용한 이종의 콘텐트 연계 프로토타입 시스템 개발 연구)

  • Ahn, Bu-Young;Shin, Young-Ju;Kim, Dea-Hwan
    • Journal of Information Management
    • /
    • v.40 no.4
    • /
    • pp.109-133
    • /
    • 2009
  • Among sources of biological information, GenBank, provided by the National Center for Biotechnology Information (NCBI) of the United States, is a representative database of genetic information and is the most widely used by researchers around the world. The Korea Institute of Science and Technology Information (KISTI) visits NCBI on a regular basis and downloads the latest version of GenBank to reorganize the information gathered there into a database. This database is provided to Korean researchers in science and technology through the Bio-KRISTAL search engine, developed by KISTI. This study aims to design a service model that links information on papers, patents, biodiversity, and other contents of NDSL, an integrated service for scientific and technological information run by KISTI, with GenBank's reference and organism fields, and to develop a prototype system. For this purpose, this paper explores the possibility of a linkage and convergence service between heterogeneous content by: (a) collecting GenBank data from NCBI's FTP site; (b) dividing GenBank text files into basic and reference genetic information and restructuring them into a database; (c) extracting article and patent information from the GenBank reference fields to generate new tables; and (d) leveraging data mapping technology to implement a prototype system in which GenBank and NDSL data are interlinked and provided.
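
Step (c), extracting article and patent information from the reference fields of a GenBank text file, might look like the sketch below. It is not the prototype system described in the abstract. It relies on the GenBank flat-file layout in which a keyword occupies the first 12 columns and continuation lines are indented by 12 spaces, and it assumes that patent references can be recognized by a JOURNAL value starting with "Patent:". The function name and the sample record are invented for illustration.

```python
REFERENCE_SUBKEYS = {"AUTHORS", "CONSRTM", "TITLE", "JOURNAL", "PUBMED", "REMARK"}


def parse_references(flatfile_text):
    """Collect the REFERENCE blocks of one GenBank flat-file record as dicts."""
    references = []
    current = None  # reference block being filled, if any
    field = None    # last keyword seen, target of continuation lines
    for line in flatfile_text.splitlines():
        key, value = line[:12].strip(), line[12:].strip()
        if key == "REFERENCE":
            current = {"REFERENCE": value}
            references.append(current)
            field = "REFERENCE"
        elif current is not None and key in REFERENCE_SUBKEYS:
            current[key] = value
            field = key
        elif current is not None and not key:
            if value:  # continuation line: append to the previous field
                current[field] += " " + value
        else:
            # Any other keyword ends the reference section.
            current, field = None, None
    return references


# Invented record; every value below is a placeholder, not real GenBank data.
SAMPLE = "\n".join([
    "LOCUS       EXAMPLE01    60 bp    DNA     linear   SYN 01-JAN-2000",
    "SOURCE      synthetic construct",
    "  ORGANISM  synthetic construct",
    "REFERENCE   1  (bases 1 to 60)",
    "  AUTHORS   Doe,J. and Roe,R.",
    "  TITLE     A made-up title that is long enough to wrap onto",
    "            a second line",
    "  JOURNAL   Example Journal 1 (1), 1-10 (2000)",
    "REFERENCE   2  (bases 1 to 60)",
    "  AUTHORS   Doe,J.",
    "  JOURNAL   Patent: XX 0000000-A 1 01-JAN-2000;",
    "FEATURES             Location/Qualifiers",
    "     source          1..60",
])

refs = parse_references(SAMPLE)
# Assumed convention: patent references carry a JOURNAL starting with "Patent:".
patents = [r for r in refs if r.get("JOURNAL", "").startswith("Patent:")]
articles = [r for r in refs if r not in patents]
```

Splitting references into `articles` and `patents` mirrors the separate tables the abstract mentions; mapping them onto NDSL records would need identifiers or title matching that this sketch does not attempt.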