• Title/Summary/Keyword: Text Title

Study on the Improvement of Extraction Performance for Domain Knowledge based Wrapper Generation (도메인 지식 기반 랩퍼 생성의 추출 성능 향상에 관한 연구)

  • Jeong Chang-Hoo;Choi Yun-Soo;Seo Jeong-Hyeon;Yoon Hwa-Mook
    • Journal of Internet Computing and Services / v.7 no.4 / pp.67-77 / 2006
  • Wrappers play an important role in extracting specified information from various sources. The wrapper rules by which information is extracted are often created from domain-specific knowledge. Domain-specific knowledge helps in recognizing the meaning of text that represents various entities and values and in detecting their formats. However, such domain knowledge becomes powerless when value-representing data are not labeled with appropriate textual descriptions, or when there is nothing but a hyperlink where certain text labels or values are expected. To alleviate these problems, we propose a probabilistic method for recognizing the entity type, i.e., generating wrapper rules, when there is no label associated with value-representing text. In addition, we have devised a method for using the information reachable by following hyperlinks when textual data are not immediately available on the target web page. Our experimental work shows that the proposed methods help increase the precision of the resulting wrapper, particularly in extracting title information, the most important entity on a web page. The proposed methods can be useful in building a more efficient and accurate information extraction system for various sources of information without user intervention.

Spin in Randomised Clinical Trial Reports of Interventions for Obesity (비만 중재 관련 무작위배정 비교임상연구 보고의 spin 연구)

  • Lee, Sle;Won, Jiyoon;Kim, Seoyeon;Park, Su Jeong;Lee, Hyangsook
    • Korean Journal of Acupuncture / v.34 no.4 / pp.251-264 / 2017
  • Objectives: To identify the prevalence and types of spin in randomised controlled trials (RCTs) of obesity with statistically non-significant results for primary outcomes, in order to provide directions for adequate reporting. Methods: Spin is a specific reporting strategy that could lead readers to misinterpret the results of RCTs. RCTs on obesity with statistically non-significant primary outcomes published from July 2015 to June 2016 were retrieved from PubMed. All included RCTs were classified into 3 intervention categories. The identification and classification of spin in the included articles were performed by two independent researchers. Results: Among 46 RCTs with statistically non-significant primary outcomes, 32 studies were assessed as having at least one instance of spin in the title, abstract or main text. Of these, 9 articles were on complementary and alternative medicine, 7 on western medicine and 16 on dietary supplements and exercise. The frequency of spin was similar across the types of interventions. The most common type of spin was 'focusing on statistical significance within-group comparison' in the results sections of the abstract and main text, and 'focusing only on treatment effectiveness with no consideration of statistical significance' in the conclusion sections of the abstract and main text. Studies in which random sequence generation was appropriately done were less likely to have spin. Conclusions: As a majority of obesity RCTs have spin, researchers should pay more attention to adequately interpreting and reporting statistically non-significant results.

Trends in FTA Research of Domestic and International Journal using Paper Abstract Data (초록데이터를 활용한 국내외 FTA 연구동향: 2000-2020)

  • Hee-Young Yoon;Il-Youp Kwak
    • Korea Trade Review / v.45 no.5 / pp.37-53 / 2020
  • This study aims to draw implications for research development by comparing domestic and international studies on the subject of FTAs. To this end, among the papers written between 2000 and July 23, 2020, those whose titles matched the search term FTA (Free Trade Agreement) were selected as research data. For domestic research, 1,944 papers from the Korean Citation Index (KCI) were selected, and for international research, 970 papers from the Web of Science and SCOPUS; research trends were then analyzed through keywords and abstracts. Frequency analysis and word embedding (Word2vec) were used to analyze the data, and the results were visualized using t-SNE and Scattertext. The results of the analysis are as follows. First, 16 of the top 30 keywords were shared between domestic and international research. In domestic research, many studies analyze the outcomes or expected effects of FTAs with countries that have concluded or discussed FTAs with Korea, whereas international research covers a diverse range of study subjects. Second, in the word embedding analysis, t-SNE was used to visually represent the research connections among the top 60 keywords. Finally, Scattertext was used to show visually which keywords were frequently used in studies from 2000 to 2010 and from 2011 to 2020. This study is the first to draw implications for academic development through abstract and keyword analysis by applying various text mining approaches to FTA-related research papers. Further in-depth research is needed, including collecting a variety of FTA-related text data and comparing and analyzing FTA studies in different countries.
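
The frequency analysis and top-keyword comparison described in this abstract can be illustrated with a minimal sketch. The function names, the lowercasing step, and the toy keyword lists below are assumptions made for illustration; they are not taken from the paper, which does not describe its preprocessing.

```python
from collections import Counter


def top_keywords(keyword_lists, n):
    """Return the n most frequent keywords across per-paper keyword lists."""
    counts = Counter(kw.lower() for keywords in keyword_lists for kw in keywords)
    return [kw for kw, _ in counts.most_common(n)]


def shared_keywords(corpus_a, corpus_b, n):
    """Return the keywords that appear in the top n of both corpora."""
    return set(top_keywords(corpus_a, n)) & set(top_keywords(corpus_b, n))


# Hypothetical toy data standing in for the KCI and Web of Science/SCOPUS sets.
domestic = [
    ["FTA", "Korea", "tariff"],
    ["FTA", "Korea", "agriculture"],
    ["FTA", "tariff"],
]
international = [
    ["FTA", "tariff", "trade creation"],
    ["FTA", "rules of origin"],
    ["FTA", "tariff"],
]

shared = shared_keywords(domestic, international, n=3)
print(sorted(shared))
```

With the real data, `n=30` would correspond to the top-30 comparison reported in the abstract.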

Technology Clustering Using Textual Information of Reference Titles in Scientific Paper (과학기술 논문의 참고문헌 텍스트 정보를 활용한 기술의 군집화)

  • Park, Inchae;Kim, Songhee;Yoon, Byungun
    • Journal of Korean Society of Industrial and Systems Engineering / v.43 no.2 / pp.25-32 / 2020
  • Data on patents and scientific papers are considered a useful information source for analyzing technological information and have been widely utilized. Technology big data is analyzed in various ways to identify the latest technological trends and predict promising future technologies. Clustering is one way to discover new features by creating groups from technology big data. Patents include refined bibliographic information such as patent classification codes, whereas scientific papers do not have bibliographic information suitable for clustering. This research proposes a new approach for clustering scientific paper data by utilizing the reference titles in each paper. In this approach, the reference titles are treated as textual information, because each reference contains the title of a paper, which represents that paper's core content. We collected scientific paper data, extracted the titles of the references, and conducted clustering by measuring text-based similarity. The results from the proposed approach are compared with the results from two existing methodologies: one that utilizes textual information from titles and abstracts, and one that is citation-based. The approach suggested in this paper shows a statistically significant difference compared to the existing approaches and better clustering performance. The proposed approach can be considered a useful method for clustering scientific papers.
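
The idea of clustering papers by the text of their reference titles can be sketched as follows. The abstract does not say which similarity measure or clustering algorithm the authors used, so this sketch assumes Jaccard similarity over word sets and a simple threshold-based grouping (papers are merged whenever their similarity reaches the threshold). The paper IDs, titles, and threshold are hypothetical.

```python
import re
from itertools import combinations


def tokens(reference_titles):
    """Collect the set of lowercase words found in a paper's reference titles."""
    words = set()
    for title in reference_titles:
        words.update(re.findall(r"[a-z]+", title.lower()))
    return words


def jaccard(a, b):
    """Jaccard similarity of two word sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster(papers, threshold):
    """Group paper IDs whose reference-title similarity reaches the threshold.

    papers maps a paper ID to its list of reference titles. Grouping is
    single-link: a chain of similar pairs ends up in one cluster.
    """
    bags = {pid: tokens(refs) for pid, refs in papers.items()}
    parent = {pid: pid for pid in papers}

    def find(x):
        # Follow parent links to the cluster representative, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(papers, 2):
        if jaccard(bags[a], bags[b]) >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for pid in papers:
        groups.setdefault(find(pid), set()).add(pid)
    return list(groups.values())


# Hypothetical papers with invented reference titles.
papers = {
    "P1": ["Patent clustering with text mining", "Text similarity for patent analysis"],
    "P2": ["Text mining for patent analysis"],
    "P3": ["Deep learning for image recognition"],
}
clusters = cluster(papers, threshold=0.3)
print(sorted(sorted(group) for group in clusters))
```

A production version would likely remove stop words and use weighted vectors (for example TF-IDF) rather than raw word sets; the word-set form is used here only to keep the sketch dependency-free.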

Components for Picturebook Peritext Analysis (그림책 페리텍스트 분석을 위한 구성 요소)

  • A Reum Nam;Sang Lim Kim
    • The Journal of the Convergence on Culture Technology / v.9 no.2 / pp.181-188 / 2023
  • Academic interest in the educational value of picturebooks for children and in the narrative importance of peritexts has increased. This study was conducted to present the components for analyzing the picturebook peritext. To this end, the peritext components used in 11 previous studies that analyzed the peritext of picturebooks were comprehensively reviewed. The components used in previous studies fell broadly into four categories, which were named according to the characteristics of the components within each category: 'basic information', 'physical elements', 'positional elements', and 'content elements.' The first category, 'basic information,' includes the title, authors' names, publication information, award information, dedication/acknowledgment, and laudatory comments. The second category, 'physical elements,' includes the format, book binding, and quality of material. The third category, 'positional elements,' includes the cover (front cover, back cover, spine), endpaper, title page, copyright page, dust jacket and belly band. The fourth category, 'content elements,' includes text, illustration, typography, layout and page shape. The results of this study are expected to stimulate research on the analysis and utilization of various picturebooks.

A Study on Jo Bok-seong's Insect-related Books Published in 1948: Focused on Story of Insects and About Insects (1948년에 출간된 조복성의 곤충 관련 저작에 관한 연구 - 『곤충이야기』와 『곤충기』를 중심으로 -)

  • Jin, Na-Young
    • Journal of the Korean Society for Library and Information Science / v.53 no.2 / pp.267-294 / 2019
  • This study analyzed the form and contents of Story of Insects (Gonchung Iyagi) and About Insects (Gonchung-gi), writings of the biologist Jo Bok-seong published in 1948, to examine the characteristics of the two books and compare them. Story of Insects was organized as front cover, title page, foreword, table of contents, main text, copyright clause, advertisement, and back cover, in A5 format. Its contents were divided into nine groups according to the characteristics of 65 insect species, which it describes. About Insects, in contrast, was organized as cover, title page, foreword, table of contents, main text, copyright clause, publication message of Eulyoo Mungo, advertisement, and back cover, in A6 format. Its contents were divided into 11 groups of the author's own devising according to the characteristics of 56 insect species, which it describes. The publishers differed, with About Insects issued by Eulyoo Publishing Co. and Story of Insects by the Association of Joseon Children's Culture (abbreviated as Ahyeop), a sister company of Eulyoo Publishing Co., but they shared the same basis.

A study on content strategy for long-term exposure of YouTube's 'Trending' (유튜브 '인기급상승' 장기 노출을 위한 콘텐츠 전략에 관한 연구)

  • Lee, Min-Young;Byun, Guk-Do;Choi, Sang-Hyun
    • Journal of the Korea Convergence Society / v.13 no.4 / pp.359-372 / 2022
  • This study aimed to derive a YouTube content strategy for staying on Trending for a long time by comparing the features of 20 channels with short- and long-term exposure, using 'YouTube Trending' data from 2021. First, through Pearson's correlation analysis, we found that various factors, such as 'the number of title or tag letters', are related to long-term exposure, and we set these as indices for comparing features. The comparison suggests that the following are effective: 1) a 'video title' of about 40-45 letters without excessive special characters, 2) a 'video length' within 10 minutes, and 3) a 'video description' of 2-3 sentences that adds SNS information or includes 3 key tags. It would also be more effective to set key tag pairs such as (먹방, mukbang) and (역대급, 레전드), derived through text mining. Such pairs can help the channel spread globally, bringing various advantages, and can serve as an indicator for evaluating the channel's global reach.
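
The Pearson correlation analysis mentioned in this abstract can be illustrated with a short sketch that computes the coefficient between a video feature and its exposure duration. The variable names and all numbers below are invented for illustration and do not come from the study's data.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return covariance / (spread_x * spread_y)


# Hypothetical observations: title length in letters and days on Trending.
title_lengths = [18, 25, 33, 41, 44, 52]
days_on_trending = [1, 2, 2, 4, 5, 3]

r = pearson(title_lengths, days_on_trending)
print(f"r = {r:.3f}")
```

Pearson's coefficient only captures linear association, so a feature with a preferred middle range (such as the 40-45 letter title length reported above) may show a weaker coefficient than its practical importance suggests.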

A study on the improving and constructing the content for the Sijo database in the Period of Modern Enlightenment (계몽기·근대시조 DB의 개선 및 콘텐츠화 방안 연구)

  • Chang, Chung-Soo
    • Sijohaknonchong / v.44 / pp.105-138 / 2016
  • Recently, the database "XML Digital collection of Sijo Texts in the Period of Modern Enlightenment" has been made available, with search functions, through the Korean Research Memory (http://www.krm.or.kr), laying the foundation for building content from Sijo texts of the Period of Modern Enlightenment. In this paper, I review the characteristics and problems of the Digital collection of Sijo Texts in the Period of Modern Enlightenment, look for improvements, and try to find a way to turn it into content. The primary significance of this database is that it integrates, and lets users survey at a glance, the vast body of Sijo from the Period of Modern Enlightenment, amounting to 12,500 pieces. In addition, it is the first Sijo database to provide a variety of search features by source literature, name of poet, title of work, original text, period, and so on. However, the database is limited as a means of verifying the overall aspects of Sijo in the Period of Modern Enlightenment. Titles and original texts written in archaic words or Chinese characters cannot be searched, because standardized modern-language text is not provided. Sijo works released after 1945, including individual pieces, are also missing from the database. Extracting data by poet is inconvenient, because poets are recorded in various ways, such as by real name or nom de plume. To solve these problems and improve the utilization of the database, I propose providing standardized modern-language text, assigning index terms for content, and providing information on the form of each work. Furthermore, if a Sijo database for the Period of Modern Enlightenment with the character of a Sijo Culture Information System could be built, it could be connected with academic and educational content. As specific plans, I suggest the following: learning support materials on modern history and on perceptions of the national territory in the modern age; source materials for studying indigenous animals and plants and for creating commercial characters; and use as a Sijo learning tool, such as a Sijo game.

The Android-based Bluetooth Device Application Design and Implementation (안드로이드 기반의 블루투스 디바이스 응용 설계 및 구현)

  • Cho, Hyo-Sung;Lee, Hyuk-Joon
    • The Journal of The Korea Institute of Intelligent Transport Systems / v.11 no.1 / pp.72-85 / 2012
  • Today, although most Bluetooth hands-free devices within a vehicle provide telephone service functions such as voice communication, caller ID display and SMS message display, they do not provide a function that displays Internet-based text data. Because demand for using Internet services within a vehicle has been increasing recently, a scheme is needed that displays Internet-based text data in addition to the existing hands-free functions. The proposed Bluetooth device application includes advanced functions such as SNS message arrival notification and message display. We chose Android as the implementation platform, considering that most SNS applications run on Android and that the platform is easily embedded into small embedded devices. The smartphone or tablet PC connected to the proposed Bluetooth device is an Android-based device, and we designed an Android app to implement the required functions on these devices. When the audio-text gateway app receives SNS text data, it extracts the title and sender information from the message header as text data and sends them via an ACL (Asynchronous Connection-Less) link to the Bluetooth device, which shows the data on its screen. Android-based Bluetooth devices cannot play voice through a speaker, because the Bluetooth hands-free or headset profile ported to the Android platform normally includes only the audio gateway function. The proposed Bluetooth device application therefore applies a streaming scheme that sends data via an ACL link instead of sending them via an SCO (Synchronous Connection-Oriented) link.

Comparison and Analysis of Web Accessibility for the Korea, USA, and Japan's Broadcast Web Sites (한·미·일 지상파 방송사의 웹 접근성 비교·분석)

  • Park, Seong-Je;Kim, Yung-Keun;Kim, Jong-Weon
    • Journal of Korea Society of Industrial Information Systems / v.19 no.4 / pp.105-117 / 2014
  • Acquiring information through broadcast media is essential to modern life, and each broadcaster has extended its service to the internet as digital technology has developed. Against this background, this study compares and analyzes web accessibility evaluation results for the web sites of leading broadcasters in Korea, the USA, and Japan. According to the results, there was no significant difference in the overall level of accessibility among the web sites of the three countries, but the Korean web sites showed somewhat insufficient compliance rates for items such as alternative text, skip navigation for repeated regions, and titles. In addition, accessibility errors were identified in the brightness contrast of text content, the execution of functions the user did not intend, the declaration of the default language, and the provision of labels. Korean broadcasters should therefore urgently correct these errors and problems to achieve effective web accessibility.
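
Three of the accessibility items named in this abstract (alternative text, page title, and default language) lend themselves to simple automated checks. The sketch below is a hypothetical illustration using Python's standard `html.parser`; it is not the evaluation tool or the criteria used in the study, and the class name, function name, and sample markup are assumptions.

```python
from html.parser import HTMLParser


class AccessibilityChecker(HTMLParser):
    """Collect evidence for three basic checks while parsing a page."""

    def __init__(self):
        super().__init__()
        self.has_lang = False
        self.title_text = ""
        self.images_without_alt = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "html" and attributes.get("lang"):
            self.has_lang = True
        elif tag == "title":
            self._in_title = True
        elif tag == "img" and "alt" not in attributes:
            # An empty alt="" is allowed for decorative images, so only a
            # missing attribute is reported.
            self.images_without_alt.append(attributes.get("src", "(no src)"))

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title_text += data


def check_accessibility(html):
    """Return a list of human-readable issues found in the HTML string."""
    checker = AccessibilityChecker()
    checker.feed(html)
    checker.close()
    issues = []
    if not checker.has_lang:
        issues.append("html element has no lang attribute")
    if not checker.title_text.strip():
        issues.append("page has no title text")
    for src in checker.images_without_alt:
        issues.append(f"img without alt attribute: {src}")
    return issues


# Hypothetical page: has a title, lacks a lang attribute, one image lacks alt.
sample = (
    "<html><head><title>News</title></head><body>"
    '<img src="a.png"><img src="b.png" alt="Logo">'
    "</body></html>"
)
issues = check_accessibility(sample)
for issue in issues:
    print(issue)
```

Checks such as contrast ratio, skip navigation, and unintended function execution need rendering or manual review and are outside what a parser-only sketch like this can cover.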