• Title/Summary/Keyword: Text data

A Multidimensional Analysis Framework for XML Warehouses (XML 웨어하우스에 대한 다차원 분석 프레임워크)

  • Park, Byung-Kwon;Lee, Jong-Hak
    • Asia Pacific Journal of Information Systems / v.15 no.4 / pp.153-164 / 2005
  • Large numbers of XML documents are now available on the Internet, so we need to analyze them multidimensionally, in the same way as relational data. In this paper, we propose a new framework for multidimensional analysis of XML documents, which we call XML-OLAP. We base XML-OLAP on XML warehouses in which all fact and dimension data are stored as XML documents, and we build XML cubes from these warehouses. We also propose a new OLAP language for XML cubes, which we call XML-MDX. XML-MDX statements target XML cubes and use XQuery expressions to designate the measure, axis, and slicer. They incorporate text mining operations for aggregating text data. We apply XML-OLAP to a United States patent XML warehouse to demonstrate multidimensional analysis of XML documents.

Analytical Review of Data Formats and Technological Standards for Multimedia Information (멀티미디어 정보 관련 기술과 표준안에 대한 고찰)

  • 유사라
    • Journal of the Korean Society for Information Management / v.13 no.2 / pp.39-71 / 1996
  • This paper reviews and summarizes the multimedia information formats and technical standards that have been applied, so that library and information specialists can understand them systematically. It identifies the most fundamental differences between digital and traditional libraries by describing the general characteristics of the hypermedia information environment. Given the advantages that multimedia technology brings to digital library development, it reviews the core, publicly available technological standards for text and non-text data, including multimedia data. Finally, it describes some important professional perspectives and roles for library practitioners and researchers engaged in digital library innovation.

Automatic conversion of machining data by the recognition of press mold (프레스 금형의 특징형상 인식에 의한 가공데이터 자동변환)

  • 최홍태;반갑수;이석희
    • Proceedings of the Korean Operations and Management Science Society Conference / 1994.04a / pp.703-712 / 1994
  • This paper presents the automatic conversion of orthographic views of a press mold into machining data using feature recognition rules. The system comprises six modules: view separation, function support, dimension text recognition, feature recognition, dimension text checking, and feature processing. With minimal user intervention, it recognizes basic features such as holes, slots, pockets, and clamping parts, and automatically converts the CAD drawing details of a press mold into machining data using a 2D CAD system rather than an expensive 3D modeler. The system was developed on an IBM PC in the AutoCAD R12, AutoLISP, and MetaWare High C environment. Applied to many sample drawings, it proved to be a good interface between CAD and CAM.

Text filtering by Boosting Linear Perceptrons

  • O, Jang-Min;Zhang, Byoung-Tak
    • Journal of the Korean Institute of Intelligent Systems / v.10 no.4 / pp.374-378 / 2000
  • In information retrieval, a lack of positive examples is a main cause of poor performance. In this case, most learning algorithms may fail to capture the characteristics of the data, which leads to low recall. To solve the problem of unbalanced data, we propose a boosting method that uses linear perceptrons as weak learners. The perceptrons are trained on local data sets. The proposed algorithm is applied to a text filtering problem in which only a small proportion of positive examples is available. In an experiment on the category crude of the Reuters-21578 document set, the boosting method achieved a recall of 80.8%, a 37.2% improvement over multilayer perceptrons with comparable precision.
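
The idea of boosting linear perceptrons can be illustrated with a minimal sketch. It assumes a standard AdaBoost loop in which each weak learner is a perceptron whose updates are scaled by the current example weights; this is a substitute for the "local data sets" mentioned in the abstract, whose construction is not described here. All function names and the toy data are hypothetical.

```python
import math

def train_perceptron(X, y, w, epochs=20):
    """Weighted perceptron: each mistake updates by the example's boosting weight."""
    dim, n = len(X[0]), len(X)
    theta = [0.0] * (dim + 1)  # last entry is the bias term
    for _ in range(epochs):
        for xi, yi, wi in zip(X, y, w):
            score = sum(t * v for t, v in zip(theta, xi)) + theta[-1]
            if yi * score <= 0:  # misclassified (or on the boundary)
                step = wi * n    # equals 1.0 under uniform weights
                for j in range(dim):
                    theta[j] += step * yi * xi[j]
                theta[-1] += step * yi
    return theta

def predict_one(theta, x):
    score = sum(t * v for t, v in zip(theta, x)) + theta[-1]
    return 1 if score > 0 else -1

def adaboost(X, y, rounds=10):
    """AdaBoost over weighted perceptrons; labels must be +1 or -1."""
    n = len(X)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        theta = train_perceptron(X, y, w)
        preds = [predict_one(theta, x) for x in X]
        err = sum(wi for wi, p, yi in zip(w, preds, y) if p != yi)
        if err == 0:       # perfect weak learner: keep it and stop
            ensemble.append((1.0, theta))
            break
        if err >= 0.5:     # no better than chance: stop boosting
            break
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, theta))
        # Raise the weight of misclassified examples, then renormalize.
        w = [wi * math.exp(-alpha * yi * p) for wi, p, yi in zip(w, preds, y)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble

def ensemble_predict(ensemble, x):
    score = sum(alpha * predict_one(theta, x) for alpha, theta in ensemble)
    return 1 if score > 0 else -1

# Toy unbalanced data: a single positive example among negatives.
X = [(0, 0), (1, 0), (0, 1), (1, 1), (3, 3)]
y = [-1, -1, -1, -1, 1]
model = adaboost(X, y)
```

On real filtering data the weak learners would rarely be perfect, so the reweighting step is what shifts attention toward the scarce positive examples.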

Continuous Audits Using Decision Support Systems

  • Mohammadi, Shaban
    • The Journal of Industrial Distribution & Business / v.6 no.3 / pp.5-8 / 2015
  • Purpose - This article examines how the use of existing and future decision-support systems will change the auditing process. Research design, data, and methodology - An information system is a special decision-support system that combines information obtained from various sources and communicates among them to help in assessing complex financial decisions. This paper analyzes techniques such as data and text mining as components of decision-support systems to be used in the auditing process. Results - We present views about how existing decision-support systems will lead to a change in audits. Auditors, who currently collect significant data manually, will in the future move towards management through complex decision-support systems. Conclusions - Although some internal audit functions are integrated into systems of continuous monitoring, the use of such systems remains limited. Thus, instead of multiple decision-support systems, a unified decision-support system can be deployed that includes sensors integrated within a company in different contexts (e.g., production, sales, and accounting) and continually monitors violations of controls, unusual patterns, and unusual transactions.

Integrated Patient Information Management System (환자 정보 통합 관리 시스템의 개발)

  • Jung, Sug-Hee;Park, Seung-Hun;Woo, Eung-Je
    • Proceedings of the KOSOMBE Conference / v.1996 no.11 / pp.45-47 / 1996
  • We developed an information management system that manages various types of medical information such as text, image, sound, and laboratory data. We also developed a multimedia description system in which medical doctors can describe their findings and interpretations with text and speech. The descriptions include references to the data items stored in the information management system. Communication between the description system and the information management system is carried out using the OLE/COM mechanism. The information management system was implemented using Microsoft Open Database Connectivity (ODBC).

100 Article Paper Text Minning Data Analysis and Visualization in Web Environment (웹 환경에서 100 논문에 대한 텍스트 마이닝, 데이터 분석과 시각화)

  • Li, Xiaomeng;Li, Jiapei;Lee, HyunChang;Shin, SeongYoon
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2017.10a / pp.157-158 / 2017
  • Big data from articles can be analyzed and text-mined using the Python language. Python is a programming language that is easy to use. We study and use Python to create a Web environment in which the results of the analysis are shown directly in the browser. This paper uses 100 articles from Altmetric, which tracks a range of sources. Big data must be collected and analyzed with an effective method. Once the results are available, Python wordcloud is used to produce an image that highlights the most frequent words.
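
The word-frequency step behind such a word cloud can be sketched with the Python standard library alone. This is a minimal illustration, not the authors' pipeline: the stopword list, the function name `word_frequencies`, and the sample titles are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list for the illustration.
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "for", "on"}

def word_frequencies(texts, top_n=10):
    """Count lowercase alphabetic tokens across texts, skipping stopwords."""
    counts = Counter()
    for text in texts:
        tokens = re.findall(r"[a-z]+", text.lower())
        counts.update(t for t in tokens if t not in STOPWORDS)
    return counts.most_common(top_n)

# Made-up sample titles standing in for the collected articles.
sample_titles = [
    "Text mining of article data",
    "Visualization of text data in a web environment",
    "Python for text analysis",
]
top_words = word_frequencies(sample_titles, top_n=3)
```

A frequency table like this is the usual input to a word-cloud renderer, which sizes each word according to its count.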

An Improved Text Classification Method for Sentiment Classification

  • Wang, Guangxing;Shin, Seong Yoon
    • Journal of Information and Communication Convergence Engineering / v.17 no.1 / pp.41-48 / 2019
  • In recent years, sentiment analysis research has become popular, and its results have been applied with remarkable success in practice, for example in Amazon's book recommendation system and the North American movie box office evaluation system. Analyzing big data based on user preferences and evaluations and recommending hot-selling books and hot-rated movies to users in a targeted manner greatly improve book sales and attendance rate in movies [1, 2]. However, traditional machine learning-based sentiment analysis methods such as the Classification and Regression Tree (CART), Support Vector Machine (SVM), and k-nearest neighbor classification (kNN) have performed poorly in terms of accuracy. In this paper, an improved kNN classification method is proposed. Accuracy is improved through the improved method and data normalization. Subsequently, the three classification algorithms and the improved algorithm were compared based on experimental data. Experiments show that the improved method performs best in the kNN classification method, with an accuracy rate of 11.5% and a precision rate of 20.3%.
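
The role of normalization in kNN classification can be shown with a short sketch. It assumes plain kNN with min-max scaling and majority voting; the abstract does not specify the improved method, so this is not the authors' algorithm. The two features (positive-word count and review length) and all data values are invented for the illustration.

```python
import math
from collections import Counter

def min_max_fit(rows):
    """Return the (min, max) range of each feature column."""
    return [(min(col), max(col)) for col in zip(*rows)]

def min_max_apply(row, ranges):
    """Scale each feature to [0, 1] using the fitted ranges."""
    return [(v - lo) / (hi - lo) if hi > lo else 0.0
            for v, (lo, hi) in zip(row, ranges)]

def knn_predict(train_X, train_y, query, k=3):
    """Classify query by majority vote among its k nearest scaled neighbors."""
    ranges = min_max_fit(train_X)
    scaled = [min_max_apply(r, ranges) for r in train_X]
    q = min_max_apply(query, ranges)
    dists = sorted((math.dist(r, q), label) for r, label in zip(scaled, train_y))
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]

# Hypothetical features: [positive-word count, review length in characters].
train_X = [[8, 1200], [7, 300], [9, 800], [1, 1100], [2, 250], [0, 900]]
train_y = ["pos", "pos", "pos", "neg", "neg", "neg"]
label = knn_predict(train_X, train_y, [6, 1000])
```

Scaling matters here because the review-length feature spans hundreds of units while the word count spans fewer than ten, so unscaled Euclidean distance would be driven almost entirely by length.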

Analysis of Laughter Therapy Trend Using Text Network Analysis and Topic Modeling

  • LEE, Do-Young
    • Journal of Wellbeing Management and Applied Psychology / v.5 no.4 / pp.33-37 / 2022
  • Purpose: This study aims to understand the trends and central concepts of domestic research on laughter therapy. The analysis used a total of 72 theses from 2007 to 2021, retrieved with the keyword 'laughter therapy'. Research design, data and methodology: This study built and analyzed a keyword co-occurrence network, analyzed the types of research through topic modeling, and examined the visualized word cloud and sociogram. The keyword data, cleaned through preprocessing, were analyzed by centrality analysis and topic modeling after conversion to a 1-mode matrix using the NetMiner (version 4.4) program. Results: The keywords that appeared most often over the 14 years were laughter therapy, depression, the elderly, and stress. The five topics identified in the thesis data from 2007 to 2021 were therapy, cognitive behavior, quality of life, stress, and the elderly. Conclusions: This study identified the flow and trends of domestic laughter therapy research topics over the 14 years. Research on laughter therapy should continue and should reflect how those trends change over time.
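
The keyword co-occurrence network and centrality analysis described above can be sketched in a few lines. The study itself used NetMiner; the code below is only an assumed standard-library equivalent of the general technique, with hypothetical function names and a made-up keyword list per thesis.

```python
from collections import Counter
from itertools import combinations

def cooccurrence(docs):
    """Count how often each unordered keyword pair appears in the same document."""
    edges = Counter()
    for keywords in docs:
        for a, b in combinations(sorted(set(keywords)), 2):
            edges[(a, b)] += 1
    return edges

def degree_centrality(edges):
    """Fraction of the other keywords that each keyword co-occurs with."""
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    n = len(neighbors)
    if n < 2:
        return {}
    return {k: len(v) / (n - 1) for k, v in neighbors.items()}

# Made-up keyword sets, one per thesis, for illustration only.
docs = [
    ["laughter therapy", "depression", "elderly"],
    ["laughter therapy", "stress"],
    ["laughter therapy", "depression", "stress"],
]
edges = cooccurrence(docs)
centrality = degree_centrality(edges)
```

The pair counts in `edges` correspond to the cells of a 1-mode (keyword by keyword) matrix, and the centrality scores indicate which keywords sit at the center of the network.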

Bigdata Analysis on Keyword by Generations through Text Mining: Focused on Board of Nate Pann in 10s, 20s, 30s (텍스트 마이닝을 활용한 세대별 키워드 빅데이터 분석: 네이트판 10대·20대·30대 게시판을 중심으로)

  • Jeong, Baek;Bae, Sungwon;Hwangbo, Yujeong
    • Proceedings of the Korean Society of Computer Information Conference / 2022.07a / pp.513-516 / 2022
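  • In this paper, we use text mining techniques to derive keywords for understanding the MZ generation. As the MZ generation makes up a growing share of the population, many studies are attempting to analyze it. To understand the MZ generation, this study collected big data by crawling the age-group boards of Nate Pann. Using text mining techniques, we derived keywords for people in their teens, twenties, and thirties, respectively. The keywords derived in this paper can be regarded as important for understanding the MZ generation. In future work, we plan to perform additional crawling and conduct a cross-generational study comparing the MZ generation with older generations.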
