• Title/Summary/Keyword: text file

A Study on Educational Data Mining for Public Data Portal through Topic Modeling Method with Latent Dirichlet Allocation (LDA기반 토픽모델링을 활용한 공공데이터 기반의 교육용 데이터마이닝 연구)

  • Seungki Shin
    • Journal of The Korean Association of Information Education / v.26 no.5 / pp.439-448 / 2022
  • This study searches for education-related datasets provided by public data portals and examines, through classification with topic modeling methods, what types of data they contain. From the public data portal, 3,072 file datasets in the education field were collected based on the portal's classification system. Text mining analysis was performed using the LDA-based topic modeling method, with stopword processing and data pre-processing applied to each dataset. The datasets pre-classified as education by the portal mostly provided program information and student-support notifications. In contrast, the datasets collected by searching for "education" generally described educational programs and support information for the disabled, parents, the elderly, and children from the perspective of lifelong education. The results of the analysis suggest that providing sufficient educational information through the public data portal would better support students' data science-based decision-making and problem-solving skills.

The Acoustic Analysis of Korean Read Speech - with respect to the prosodic phrasing - (한국어 낭독체 문장의 음향분석 -바람과 햇님의 운율구 생성을 중심으로-)

  • Sung Chuljae
    • Proceedings of the KSPS conference / 1996.02a / pp.157-172 / 1996
  • This study aims to suggest a theoretical methodology for analyzing prosodic patterns in Korean read speech. Engineering work related to phonetic study has focused on the importance of prosodic phrasing, which may play a major role in analyzing a phonetic database. Before establishing the prosodic phrase as the prosodic unit, we should describe the features of the boundary signal in a target sentence. With this in mind, the general characteristics of read speech and ToBI (Tones and Break Indices), a prosodic labelling system currently in wide use, are presented as a first step. The concrete analysis was carried out on the Korean version of the fable 'The North Wind and the Sun', in which about 25 prosodic units were identified perceptually by 5 subjects. Having established the kinds of information that can be used to decide a boundary position systematically, we can proceed to the next step, namely the acoustic analysis of the prosodic unit. To improve the naturalness of synthetic speech, the most important task is first to detect the boundary signals in the speech file and then to re-establish them in the raw text.

Resolving the Ambiguities in Word Sense by using Automatic Keyword Network in Information Retrieval (정보검색에서의 어의 중의성 해소를 위한 자동 키워드망의 이용)

  • Kim, Jung-Sae;Jang, Duk-Sung
    • The Transactions of the Korea Information Processing Society / v.7 no.12 / pp.3855-3865 / 2000
  • Automatic indexing is an essential part of a text retrieval system. However, automatic indexing alone cannot reliably rank the appropriate texts at the top, and it is even harder to keep inappropriate texts that contain homonyms out of the top ranks. In this paper, we propose a two-level retrieval system to enhance retrieval efficiency, in which an Automatic Keyword Network (AKN) is used in the second-level process. The first-level search is carried out with an inverted index file generated by automatic indexing. The second-level search exploits the AKN, which is based on the degree of association between terms. We developed several formulas for rearranging the rank of texts in the second-level search and evaluated their effect on resolving word sense ambiguities.
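
The two-level idea in this abstract can be illustrated with a minimal sketch. It is not the authors' AKN or any of their ranking formulas, which the abstract does not give. It assumes that association between two terms is simply the number of documents in which they co-occur, and that the user supplies context terms to disambiguate the query; the function names `build_inverted_index`, `build_association`, and `two_level_search` are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def build_inverted_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def build_association(docs):
    """Assumed association measure: number of documents where two terms co-occur."""
    assoc = defaultdict(int)
    for text in docs.values():
        terms = sorted(set(text.lower().split()))
        for a, b in combinations(terms, 2):
            assoc[(a, b)] += 1
    return assoc


def association(assoc, a, b):
    """Look up the symmetric association count for a pair of terms."""
    key = (a, b) if a < b else (b, a)
    return assoc.get(key, 0)


def two_level_search(query, context_terms, docs, index, assoc):
    """First level: inverted-index lookup. Second level: re-rank by association."""
    candidates = index.get(query.lower(), set())

    def score(doc_id):
        doc_terms = set(docs[doc_id].lower().split())
        return sum(
            association(assoc, c, t)
            for c in context_terms
            for t in doc_terms
            if c != t
        )

    # Higher association first; ties broken by document id for determinism.
    return sorted(candidates, key=lambda d: (-score(d), d))


docs = {
    1: "bank river water",
    2: "bank money loan",
    3: "money loan interest",
}
index = build_inverted_index(docs)
assoc = build_association(docs)
ranking_money = two_level_search("bank", ["money"], docs, index, assoc)
ranking_river = two_level_search("bank", ["river"], docs, index, assoc)
```

In this toy collection the homonym "bank" matches documents 1 and 2 at the first level, and the context term decides which of them is ranked first at the second level.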

A study of analysis and improvement of security vulnerability in Bluetooth for data transfer (블루투스 환경에서 데이터 전송 시 보안 취약점 분석 및 개선 방안 관련 연구)

  • Baek, Jong-Kyung;Park, Jae-Pyo
    • Journal of the Korea Academia-Industrial cooperation Society / v.12 no.6 / pp.2801-2806 / 2011
  • During data transmission over Bluetooth networks, data awaiting encryption, or plain text passing between the application layer and the device layer, can be captured in the manner of a key-logger by hooking the major functions of a Windows kernel driver. In this paper, we introduce an improved protection module that provides encrypted data transmission by modifying the data transmission driver of the Bluetooth device layer, and we also suggest a self-protecting scheme that prevents data exposure by various hacking tools. We implement the protection module to verify that it guarantees confidentiality. Because the module provides data encryption with minimal latency, it is expected to be widely used in Bluetooth data transmission.

Benchmarking of BioPerl, Perl, BioJava, Java, BioPython, and Python for Primitive Bioinformatics Tasks and Choosing a Suitable Language

  • Ryu, Tae-Wan
    • International Journal of Contents / v.5 no.2 / pp.6-15 / 2009
  • Recently many different programming languages have emerged for the development of bioinformatics applications. In addition to the traditional languages, languages from open source projects such as BioPerl, BioPython, and BioJava have become popular because they provide special tools for biological data processing and are easy to use. However, it is not well-studied which of these programming languages will be most suitable for a given bioinformatics task and which factors should be considered in choosing a language for a project. Like many other application projects, bioinformatics projects also require various types of tasks. Accordingly, it will be a challenge to characterize all the aspects of a project in order to choose a language. However, most projects require some common and primitive tasks such as file I/O, text processing, and basic computation for counting, translation, statistics, etc. This paper presents the benchmarking results of six popular languages, Perl, BioPerl, Python, BioPython, Java, and BioJava, for several common and simple bioinformatics tasks. The experimental results of each language are compared through quantitative evaluation metrics such as execution time, memory usage, and size of the source code. Other qualitative factors, including writeability, readability, portability, scalability, and maintainability, that affect the success of a project are also discussed. The results of this research can be useful for developers in choosing an appropriate language for the development of bioinformatics applications.

Clustering of Web Document Exploiting with the Co-link in Hypertext (동시링크를 이용한 웹 문서 클러스터링 실험)

  • 김영기;이원희;권혁철
    • Journal of Korean Library and Information Science Society / v.34 no.2 / pp.233-253 / 2003
  • Knowledge organization is the way we humans understand the world. Two types of information organization mechanisms are studied in information retrieval: classification and clustering. Classification organizes entities by pigeonholing them into predefined categories, whereas clustering organizes information by grouping similar or related entities together. Internet information resource systems extract keywords from the words that appear in a web document and build an inverted file. Term clustering based on grouping related terms, however, did not prove very successful and was mostly abandoned for documents written in different languages or for doorway pages composed only of anchor text. This study examines the informetric analysis and clustering potential of web documents based on the co-link topology of web pages.
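
As a rough illustration of co-link based grouping, the sketch below counts how many source pages link to both members of a page pair and then merges pairs whose count reaches a threshold. The abstract does not describe the study's actual clustering procedure, so the threshold-based single-link grouping, the `min_count` parameter, and the function names are all assumptions made for this example.

```python
from collections import defaultdict
from itertools import combinations


def colink_counts(outlinks):
    """Count, for each pair of target pages, how many sources link to both."""
    counts = defaultdict(int)
    for targets in outlinks.values():
        for a, b in combinations(sorted(set(targets)), 2):
            counts[(a, b)] += 1
    return counts


def cluster_by_colink(outlinks, min_count=2):
    """Group pages whose co-link count is at least min_count (assumed rule)."""
    counts = colink_counts(outlinks)
    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Register every linked page so that isolated pages form their own cluster.
    for targets in outlinks.values():
        for t in targets:
            find(t)

    # Merge pages that are co-linked often enough.
    for (a, b), n in counts.items():
        if n >= min_count:
            parent[find(a)] = find(b)

    groups = defaultdict(set)
    for page in list(parent):
        groups[find(page)].add(page)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical link graph: each key is a source page, each value its outgoing links.
outlinks = {
    "h1": ["a", "b"],
    "h2": ["a", "b", "c"],
    "h3": ["c", "d"],
    "h4": ["c", "d"],
}
counts = colink_counts(outlinks)
clusters = cluster_by_colink(outlinks, min_count=2)
```

Because the grouping uses only link structure, it does not depend on the language of the page text, which is the situation the abstract singles out as difficult for term clustering.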

Design and Implementation of Image Converter Engine based on Wireless Internet (무선 인터넷 기반의 이미지 변환 엔진의 설계 및 구현)

  • 최병철;박영삼;정영지
    • Journal of the Korea Institute of Information and Communication Engineering / v.6 no.8 / pp.1194-1199 / 2002
  • Serving existing wired HTML content in a wireless environment poses a significant problem: the content must be converted to WML, HDML, or mHTML. Automatic markup language conversion engines have been proposed to solve this problem. However, most of the conversion engines studied so far do not handle text and image information sufficiently. This paper presents a mechanism that automatically converts images on the wired Internet, in real time, into image file formats supported by wireless terminals, together with an algorithm that optimizes image output according to the screen size of the wireless terminal.

Development of a Bridge Disaster Management System Using GIS (GIS를 이용한 교량재해관리시스템 개발)

  • Ahn, Ki-Won;Yoo, Hwan-Hee;Choi, Yun-Soo;Shin, Sok-Hyo
    • Journal of Korean Society for Geospatial Information Science / v.7 no.2 s.14 / pp.69-80 / 1999
  • The purpose of this study is to develop a Bridge Disaster Management System for bridge safety control using Geographic Information Systems (GIS). The constructed database includes several graphic layers, such as the basemap, roads, and bridge locations, and has related text attributes for 32 bridges and their facilities in Chinju City. A personal computer based Bridge Disaster Management System with several functions for bridge safety analysis was developed in Visual Basic 5.0. The developed GIS-based system supports fast and efficient data searching; file management; searching and management of bridge characteristics; viewing of bridge-related maps; searching and management of traffic survey, bridge inspection, and repair work results; and evaluation of bridge safety grades.

Trends of Web-based OPAC Search Behavior via Transaction Log Analysis (트랜잭션 로그 분석을 통한 웹기반 온라인목록의 검색행태 추이 분석)

  • Lee, Sung-Sook
    • Journal of the Korean BIBLIA Society for library and Information Science / v.23 no.2 / pp.209-233 / 2012
  • To examine the overall information seeking behavior of Web-based OPAC users, this study analyzed seven years of transaction log files. Information seeking behavior was studied from two perspectives: search strategy and search failure. For search strategy, the analysis covered search type, search options, Boolean operators, length of search text, number of words used, number of Web-based OPAC uses, and number of uses by time of day and by day of the week. For search failure, the analysis covered the overall search failure ratio and the failure ratios by search option and by Boolean operator. The results of this study are expected to be used to improve OPAC systems and services in the future.
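
One of the measures named here, the search failure ratio by search option, can be sketched as follows. The abstract does not define the log format or what counts as a failure, so this example assumes each log record is a `(search_option, hit_count)` pair and treats a search with zero hits as a failure; the function name and the sample records are hypothetical.

```python
from collections import Counter


def failure_ratio_by_option(records):
    """Return {search_option: failed_searches / total_searches}.

    Assumption: a search "fails" when it returns zero hits.
    """
    total = Counter()
    failed = Counter()
    for option, hits in records:
        total[option] += 1
        if hits == 0:
            failed[option] += 1
    return {option: failed[option] / total[option] for option in total}


# Hypothetical log records: (search option, number of hits returned).
records = [
    ("title", 0),
    ("title", 5),
    ("author", 0),
    ("author", 0),
    ("keyword", 3),
]
ratios = failure_ratio_by_option(records)
```

The same counting pattern would apply to failure ratios by Boolean operator, with the operator taking the place of the search option in each record.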

ManBIF: a Program for Mining and Managing Biobank Impact Factor Data

  • Yu, Ki-Jin;Nam, Jung-Min;Her, Yun;Chu, Min-Seock;Seo, Hyung-Seok;Kim, Jun-Woo;Jeon, Jae-Pil;Park, Hye-Kyung;Park, Kie-Jung
    • Genomics & Informatics / v.9 no.1 / pp.37-38 / 2011
  • Biobank Impact Factor (BIF), which is a very effective criterion to evaluate the activity of biobanks, can be estimated by the citation information of biobanks from scientific papers. We have developed a program, ManBIF, to investigate the citation information from PDF files in the literature. The program manages a dictionary for expressions to represent biobanks and their resources, mines the citation information by converting PDF files to text files and searching with a dictionary, and produces a statistical report file. It can be used as an important tool by biobanks.