• Title/Summary/Keyword: Text Construction

Developing a Test Collection for Korean Text Categorization (한국어 문서분류 테스트컬렉션 개발)

  • Ra, Dong-Yul;Kim, Yunsik;Shin, Hyun-Joo;Lee, Kyu-Hee;Kim, Tae-Kyu;Kang, Hyun-Kyu;Choe, Ho-Seop;Yoon, Hwa-Mook
    • Proceedings of the Korea Contents Association Conference / 2007.11a / pp.435-439 / 2007
  • Document categorization systems are important in the internet age, in which a huge number of documents are created and need to be processed. For this reason, a lot of research has been done in this field. Supervised learning is widely used to develop such systems, and this approach requires a test collection as a prerequisite. For English, several test collections are available, which greatly help system development and research. For Korean, however, no public test collection has been reported or made available. To improve this situation, we are constructing a Korean test collection. This paper describes the approaches being used and the current state of the collection.

Automatic Construction of Class Hierarchies and Named Entity Dictionaries using Korean Wikipedia (한국어 위키피디아를 이용한 분류체계 생성과 개체명 사전 자동 구축)

  • Bae, Sang-Joon;Ko, Young-Joong
    • Journal of KIISE:Computing Practices and Letters / v.16 no.4 / pp.492-496 / 2010
  • Wikipedia, an open encyclopedia, contains immense human knowledge written by thousands of volunteer editors, and its reliability is also high. In this paper, we propose to automatically construct a Korean named entity dictionary using several features of Wikipedia. First, we generate class hierarchies using the class information from each Wikipedia article. Second, the title of each article is mapped to our class hierarchies, and we calculate the entropy value of the root node of each class hierarchy. Finally, we construct a high-performance named entity dictionary by removing the class hierarchies whose entropy value exceeds a threshold. Our experiments achieved an overall F1-measure of 81.12% (precision: 83.94%, recall: 78.48%).
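
The entropy-based filtering step described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes that each class hierarchy is represented by the list of named entity labels of the article titles mapped under its root, and that entropy is Shannon entropy over that label distribution. The names `root_entropy` and `filter_hierarchies`, the sample data, and the threshold value are all hypothetical.

```python
import math
from collections import Counter


def root_entropy(labels):
    """Shannon entropy (in bits) of the label distribution under one root node."""
    counts = Counter(labels)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def filter_hierarchies(hierarchies, threshold):
    """Keep only hierarchies whose root entropy does not exceed the threshold."""
    return {
        root: labels
        for root, labels in hierarchies.items()
        if root_entropy(labels) <= threshold
    }


# Hypothetical data: root class -> entity labels of the titles mapped under it.
hierarchies = {
    "Cities": ["LOCATION", "LOCATION", "LOCATION", "LOCATION"],
    "1990s": ["PERSON", "LOCATION", "ORGANIZATION", "PERSON"],
}

# The mixed "1990s" hierarchy has entropy 1.5 bits and is dropped at threshold 1.0.
kept = filter_hierarchies(hierarchies, threshold=1.0)
```

A low root entropy means the titles under a hierarchy mostly share one entity type, so the hierarchy is a reasonably clean source of dictionary entries; a high value signals a mixed hierarchy that would add noise.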

A Spelling Error Correction Model in Korean Using a Correction Dictionary and a Newspaper Corpus (교정사전과 신문기사 말뭉치를 이용한 한국어 철자 오류 교정 모델)

  • Lee, Se-Hee;Kim, Hark-Soo
    • The KIPS Transactions:PartB / v.16B no.5 / pp.427-434 / 2009
  • With the rapid evolution of the Internet and mobile environments, text containing spelling errors such as newly coined words and abbreviated words is widely used. These spelling errors make it difficult to develop NLP (natural language processing) applications because they decrease the readability of texts. To resolve this problem, we propose a spelling error correction model using a spelling error correction dictionary and a newspaper corpus. The proposed model has the advantage that the cost of data construction is not high, because it uses a newspaper corpus, which is easy to obtain, as a training corpus. In addition, the proposed model does not require additional external modules such as a morphological analyzer or a word-spacing error correction system, because it uses a simple string matching method based on a correction dictionary. In experiments with a newspaper corpus and a short message corpus collected from real mobile phones, the proposed model showed good performance on various evaluation measures (a miscorrection rate of 7.3%, an F1-measure of 97.3%, and a false positive rate of 1.1%).
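
As a rough illustration of dictionary-based string matching without a morphological analyzer, the sketch below replaces known error strings in a single pass. It is only an assumption about how such a lookup might work: the abstract does not describe the matching procedure in detail, and the part of the model trained on the newspaper corpus is not reproduced here. The function name `build_corrector` and the sample dictionary are hypothetical.

```python
import re


def build_corrector(correction_dict):
    """Return a function that rewrites every known error string in one pass."""
    if not correction_dict:
        # Nothing to correct: return the text unchanged.
        return lambda text: text
    # Longer error strings come first so the alternation prefers the longest match.
    keys = sorted(correction_dict, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(k) for k in keys))
    return lambda text: pattern.sub(lambda m: correction_dict[m.group(0)], text)


# Hypothetical correction dictionary: error string -> corrected string.
corrector = build_corrector({
    "gr8": "great",
    "thx": "thanks",
    "thx u": "thank you",
})

corrected = corrector("thx u, that was gr8")
```

Matching in a single regular-expression pass, rather than calling `str.replace` once per entry, avoids re-correcting text that an earlier replacement has already produced.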

Study on Curriculum Model Construction of Media Education (미디어 교육 교육과정 모델 구성에 관한 연구)

  • Kim, Yang-Eun
    • Korean journal of communication and information / v.37 / pp.73-99 / 2007
  • The purpose of this study is to present a model for constructing a media education curriculum. The concept and goals of media education have changed historically under various media education paradigms, and this has in turn changed the content of media education. Curriculum models of media education also vary by country. The study therefore analyzed the media education curriculum models applied in the USA, the United Kingdom, Germany, Australia, and Canada. On this basis, a model for constructing a media education curriculum is presented. The curriculum model is intended to help media education be carried out systematically in Korea. According to the results, the model is composed of goals and content. The goals of media education comprise media knowledge, media appreciation, media analysis, and media production. The content of media education comprises Media, Text, and User.

Images of Law and Reality in TV Legal Series: Focusing on <True Story Theater: Crime and Punishment> (TV 법정 프로그램에 나타난 법 이미지와 현실구성: <실화극장-죄와 벌>을 중심으로)

  • Lee, Hee-Eun
    • Korean journal of communication and information / v.50 / pp.121-142 / 2010
  • Can law be combined with television entertainment programs? This paper explores the ways in which legal systems and legal culture are reflected in, and in turn reflect, television legal series. TV legal series, such as legal dramas and infotainment shows, provide platforms for audiences, who otherwise have few opportunities in real life, to engage with the legal systems of their societies. Adopting loosely dramatized reality formats, these legal series not only entertain and inform audiences but also educate citizens. This paper combines analysis of theoretical debates on law and television with analysis of a TV text. The results show that <실화극장-죄와 벌> (True Story Theater: Crime and Punishment), a dramatized enactment based on true stories and criminal cases, may play an important ideological role, in which fictionalized drama masks hard realities and authoritative legal systems. In doing so, TV legal shows act not as mere symbolic representations but as powerful institutions that construct the image of law and reality.

The Effects of a Writing Program and the Type of Picture Book Used on the Early Stages of Writing and Creative Writing in Young Children (쓰기지도 프로그램과 프로그램에서 사용된 그림책 유형의 차이가 유아의 기초쓰기와 창의적 쓰기에 미치는 영향)

  • Cho, Kyung Seon;Hyun, Eun Ja
    • Korean Journal of Child Studies / v.33 no.5 / pp.91-115 / 2012
  • The purpose of this study was to investigate the effects of a writing program and the type of picture book used on the early stages of writing and creative writing in young children. The stages of writing among young children were divided into an early stage of writing for pre-schoolers and creative writing for spontaneous expression and problem solving. The subjects comprised 36 five-year-old children from a child daycare center in Seoul. Among the KISE-BAAT and Creative tests, the writing test and creative writing test were used. The early stage of writing and creative writing were both analyzed by means of ANCOVA and t-tests. First, on the subscales of the early stage of writing (ability to mark, use vocabulary, create sentences, and construct text), the experimental group scored higher than the comparison group. Second, in terms of the type of picture book, the group using informational picture books showed greater effects on the early stage of writing than the group using narrative picture books. Third, the writing program itself had a positive effect on creative writing. On the subscales of creative writing (fluency, flexibility, novelty), the group using informational picture books made greater progress in fluency and novelty than the group using narrative picture books.

Semantic Video Retrieval Based On User Preference (사용자 선호도를 고려한 의미기반 비디오 검색)

  • Jung, Min-Young;Park, Sung-Han
    • Journal of the Institute of Electronics Engineers of Korea CI / v.46 no.4 / pp.127-133 / 2009
  • To ensure access to rapidly growing video collections, video indexing is becoming more and more essential. A video database should be built for fast searching and for accurately extracting the features of video information, which has more complex characteristics. Moreover, a video indexing structure should support efficient retrieval of content of interest so as to reflect user preferences. In this paper, we propose a semantic video retrieval method based on user preference. Previous methods do not consider user preferences. Furthermore, conventional methods return results by simple text matching against the user's query, which does not support semantic search. To overcome these limitations, we develop a method for user preference analysis and present a method of video ontology construction for semantic retrieval. The simulation results show that the proposed algorithm performs better than previous methods in terms of semantic video retrieval based on user preferences.

Efficient Dynamic Index Structure for SSD (SPM) (SSD에 적합한 동적 색인 저장 구조 : SPM)

  • Jin, Du-Seok;Kim, Jin-Suk;You, Beom-Jong;Jung, Hoe-Kyung
    • The Journal of the Korea Contents Association / v.10 no.2 / pp.54-62 / 2010
  • Inverted index structures have become the most efficient data structure for high-performance indexing of large text collections. For online index maintenance in particular, in-place and merge-based index structures are the two main competing strategies for index construction in dynamic search environments. In both strategies, contiguity of posting information is the mainstay of the design for online index maintenance and query time. With the emergence of new storage devices (SSD, SCRAM), by contrast, contiguity of posting information need not be considered in the design of index structures, because such devices offer low access latency and high I/O throughput. However, the SSD (Solid State Drive) is not well suited to traditional inverted structures because of its poor random write throughput in practical systems. In this paper, we propose a new efficient online index structure (SPM) for SSDs that significantly reduces query time and improves index maintenance performance.
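
For readers unfamiliar with the terms, the sketch below shows a toy in-memory inverted index and a merge-based update, in which postings for newly arrived documents are built separately and then merged into the main index. It is not the SPM structure proposed in the paper, whose layout the abstract does not describe, and it ignores on-disk contiguity entirely. The function names and sample documents are hypothetical.

```python
from collections import defaultdict


def build_index(docs):
    """Map each term to a sorted posting list of the document IDs containing it."""
    index = defaultdict(list)
    for doc_id in sorted(docs):
        for term in set(docs[doc_id].lower().split()):
            index[term].append(doc_id)
    return dict(index)


def merge_indexes(main, delta):
    """Merge-based update: combine the posting lists of both indexes term by term."""
    merged = {}
    for term in main.keys() | delta.keys():
        merged[term] = sorted(set(main.get(term, [])) | set(delta.get(term, [])))
    return merged


def search(index, query):
    """Return IDs of documents that contain every term of the query."""
    postings = [set(index.get(term, [])) for term in query.lower().split()]
    return sorted(set.intersection(*postings)) if postings else []


# Hypothetical documents: an existing collection and one newly arrived document.
main = build_index({1: "solid state drive", 2: "inverted index"})
delta = build_index({3: "index on solid state drive"})
index = merge_indexes(main, delta)
hits = search(index, "solid drive")
```

An in-place strategy would instead append new postings to the existing lists directly, which on disk requires free space next to each list or relocation of the list when that space runs out.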

Rule-based Speech Recognition Error Correction for Mobile Environment (모바일 환경을 고려한 규칙기반 음성인식 오류교정)

  • Kim, Jin-Hyung;Park, So-Young
    • Journal of the Korea Society of Computer and Information / v.17 no.10 / pp.25-33 / 2012
  • In this paper, we propose a rule-based model to correct errors in speech recognition results in the mobile device environment. The proposed model takes into account the limited resources of mobile devices, such as processing time and memory, as follows. To minimize error correction processing time, the proposed model removes processing steps such as morphological analysis and the composition and decomposition of syllables. The proposed model also uses a longest-match rule selection method to generate one error correction candidate per point at which an error is assumed to occur. To reduce memory usage, the proposed model uses neither an Eojeol dictionary nor a morphological analyzer, and stores a combined rule list without any classification. To ease modification and maintenance of the proposed model, the error correction rules are automatically extracted from a training corpus. Experimental results show that the proposed model improves precision by 5.27% and recall by 5.60% at the Eojeol level for speech recognition results.
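
Longest-match rule selection can be illustrated with the sketch below, which scans the recognized text from left to right and, at each position, applies only the longest rule whose left-hand side matches there, so that each position yields a single correction candidate. This is a hedged illustration under simple assumptions (a flat dictionary of string rewrite rules, character-level matching); the function name `apply_rules` and the sample rules are hypothetical, and the paper's automatic rule extraction is not shown.

```python
def apply_rules(text, rules):
    """Rewrite text left to right, preferring the longest matching rule at each position."""
    max_len = max((len(key) for key in rules), default=0)
    out = []
    i = 0
    while i < len(text):
        # Try the longest possible span first, then progressively shorter ones.
        for length in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if candidate in rules:
                out.append(rules[candidate])
                i += length
                break
        else:
            # No rule matches here: copy one character and move on.
            out.append(text[i])
            i += 1
    return "".join(out)


# Hypothetical flat rule list: misrecognized string -> corrected string.
rules = {
    "wreck a": "wreck the",
    "wreck a nice beach": "recognize speech",
}

fixed = apply_rules("how to wreck a nice beach", rules)
```

Keeping all rules in one unclassified dictionary mirrors the abstract's point about storing a combined rule list; the trade-off is that rule lookup relies purely on surface strings.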

Similarity Analysis of Hospitalization using Crowding Distance

  • Jung, Yong Gyu;Choi, Young Jin;Cha, Byeong Heon
    • International journal of advanced smart convergence / v.5 no.2 / pp.53-58 / 2016
  • With the growing use of big data and data mining, it is useful to understand how such techniques can reveal relationships in the healthcare field. This study uses hierarchical methods of data analysis to explore similarities in hospitalization across several New York State counties. The study measured the crowding distance of data on age-specific hospitalization periods. Crowding distance is defined as the longest distance, or least similarity, between cities. Clinton is expected to have the greatest distance, while Albany and the other cities are closer because they are connected by the shortest distance at each step. Similarities were stronger across hospital stays categorized by age. Hierarchical clustering, combined with the measurement of crowding distance, can be applied to predict the similarity of hospitalization data across the 10 cities. To enhance the performance of hierarchical clustering, congestion distances can be compared after crowding distance is first applied by converting text to an attribute vector. The similarity between two objects depends on the measurement method used in clustering, but it is distinct from distance: the smaller the distance value, the more similar the two objects are to one another. Applying this technique, we found that the crowding distance decreased consistently as the similarity between the data increased, which improved experimental performance. Furthermore, the similarity of hospitalization periods by city is expected to be useful when building hospital wards in cities, for predicting the required size of hospital facilities from the experimental results, and for efficiently managing patients in similar areas.