• Title/Summary/Keyword: Web Documents


Collaboration Networks and Document Networks in Informetrics Research from 2001 to 2011: Finding Influential Nations, Institutions, Documents (계량정보학분야의 협력연구 네트워크 및 문헌네트워크 분석 : 국가, 기관, 문헌단위 분석)

  • Lee, Jae Yun;Choi, Sanghee
    • Journal of the Korean Society for Information Management / v.30 no.1 / pp.179-191 / 2013
  • Since information scientists began trying to quantify significant research trends in scientific publications, '-metrics' research areas such as bibliometrics, scientometrics, informetrics, webometrics, and citation analysis have been identified as crucial areas of information science. To illustrate the dynamic research activities in these areas, this study investigated the major contributors to '-metrics' research over the last decade at three levels: nations, institutions, and documents. The '-metrics' literature for this study was obtained from the Science Citation Index for the years 2001-2011. The analysis used the Pathfinder network, the PNNC algorithm, PageRank, and several indicators based on the h-index. In terms of international collaboration, the USA and England were identified as the major countries. At the institutional level, Katholieke Universiteit Leuven and the University of Amsterdam in Europe, and Indiana University and the Office of Naval Research in the USA, have led collaborative research in informetrics. At the document level, Hirsch's h-index paper and Ingwersen's web impact factor paper were identified as the most influential works by two methods: PageRank and the single paper h-index.
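
The abstract names the h-index without defining it, so a minimal sketch of the standard definition may help: the h-index is the largest h such that at least h items have h or more citations each. The function name `h_index` is illustrative, and the study's own indicator variants are not specified in the abstract.

```python
def h_index(citations):
    """Return the largest h such that at least h entries are >= h."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank      # the top `rank` items all have >= rank citations
        else:
            break         # counts only get smaller from here on
    return h
```

Applied to the citation counts of the papers that cite one document, the same function would give a single paper h-index as that indicator is usually defined. Whether the study computed it exactly this way is an assumption.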

Sensitivity Identification Method for New Words of Social Media based on Naive Bayes Classification (나이브 베이즈 기반 소셜 미디어 상의 신조어 감성 판별 기법)

  • Kim, Jeong In;Park, Sang Jin;Kim, Hyoung Ju;Choi, Jun Ho;Kim, Han Il;Kim, Pan Koo
    • Smart Media Journal / v.9 no.1 / pp.51-59 / 2020
  • From the era of PC communication through the growth of the internet, new words have continually been coined on social media, and with the spread of smartphones these coinages have become part of social media culture. As social networking sites and smartphones have become widespread, the volume of data generated in real time has grown. New words have advantages: they keep sentences short, which helps with messengers that limit message length and reduces the amount of data. However, new words have no dictionary meaning, which limits and degrades algorithms such as those used in data mining. Therefore, this paper collects data through web crawling, extracts the new words contained in the text, and builds a sentiment classification in order to determine the opinion expressed in a document. The experiment proceeds in three stages. First, new words collected from social media are used to learn positive and negative sentiment. Next, to derive and verify sentiment values using standard-language documents, TF-IDF is used to score the sentiment of nouns; as with the new words, the classified sentiment values are applied to verify that sentiment is classified correctly in standard-language documents. Finally, the sentiment values of the newly coined words and of the standard language are combined to perform a comparative analysis of the method.
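
To make the classification step concrete, here is a minimal sketch of a multinomial Naive Bayes classifier with add-one (Laplace) smoothing over tokenized documents. It is a generic textbook formulation, not the authors' implementation. The function names `train` and `classify`, the labels, and the toy tokens are all hypothetical, and the TF-IDF scoring stage is not shown.

```python
import math
from collections import Counter, defaultdict

def train(docs):
    """docs: iterable of (tokens, label) pairs. Returns the counts the classifier needs."""
    label_counts = Counter()             # number of documents per label
    token_counts = defaultdict(Counter)  # token frequencies per label
    vocab = set()
    for tokens, label in docs:
        label_counts[label] += 1
        token_counts[label].update(tokens)
        vocab.update(tokens)
    return label_counts, token_counts, vocab

def classify(tokens, model):
    """Return the label with the highest log posterior under add-one smoothing."""
    label_counts, token_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in label_counts:
        score = math.log(label_counts[label] / total_docs)       # log prior
        denom = sum(token_counts[label].values()) + len(vocab)   # smoothed denominator
        for tok in tokens:
            score += math.log((token_counts[label][tok] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label
```

A newly coined word that appears only in positively labeled training text pulls a document toward the positive class even though it has no dictionary entry, which is the property the abstract relies on.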

Design and Implementation of an HTML Converter Supporting Frame for the Wireless Internet (무선 인터넷을 위한 프레임 지원 HTML 변환기의 설계 및 구현)

  • Han, Jin-Seop;Park, Byung-Joon
    • Journal of the Institute of Electronics Engineers of Korea CI / v.42 no.6 / pp.1-10 / 2005
  • This paper describes the implementation of an HTML converter for wireless internet access in a Wireless Application Protocol (WAP) environment. The implemented HTML converter consists of the contents conversion module, the conversion rule set, the WML file generation module, and the frame contents reformatting module. Plain text contents are converted to WML contents through one-to-one mapping in the contents conversion module, with reference to the conversion rule set. For frame contents, the first frameset source is parsed and request messages are reconstructed from all the file names; the converter reconnects to the web server once per file to receive each document and appends it to the first document. After reformatting in the frame contents reformatting module, frame contents are converted to WML table contents. For image map contents, the image-map-related tags are parsed, and the names of the linked HTML documents are extracted, replaced with WML contents data, and linked to those contents. The proposed conversion method for frame contents provides a more convenient and interactive interface for users than existing converters. Conversion of image maps is a feature not currently supported by other converters.

The Design and Implementation of the System for Processing Well-Formed XML Document on the Client-side (클라이언트 상의 Well-Formed XML 문서 처리 시스템의 설계 및 구현)

  • Song, Jong-Chul;Moon, Byung-Joo;Hong, Gi-Chai;Cheong, Hyun-Soo;Kim, Gyu-Tae;Lee, Soo-Youn
    • The Transactions of the Korea Information Processing Society / v.7 no.10 / pp.3236-3246 / 2000
  • XML is a meta-language like SGML and can be construed as a simplified, Internet version of SGML, used in conjunction with XLL, XPointer, and XSL. The W3C also defined DTD-less well-formed XML documents so that XML documents can be used on the Web. However, no system has been offered that provides browsing, linking, and DTD generation facilities and efficiently processes DTD-less well-formed XML documents. This paper studies the design and implementation of a system that processes DTD-less well-formed XML documents on the client side. The system consists of a Well-Formed XML viewer that displays well-formed XML documents, an XLL Processor that processes XLL, and an Auto DTD generator that automatically constructs DTDs from multiple documents of the same class. The study focuses on automatic DTD generation during hyperlink navigation and on an implementation of extended links based on XLL and XPointer. ID and XPointer location addresses are used as the addressing modes in the links. The implemented system validates the extended link facilities, extracts DTDs from well-formed XML documents of the same class that share the same root element, and constructs a generalized DTD.
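
The idea of constructing a generalized DTD from several DTD-less documents of the same class can be sketched as follows. This is only an illustration under simplifying assumptions: it ignores attributes, child order, and cardinality, and declares every observed child as an optional, repeatable alternative. The function name `infer_dtd` is hypothetical, and the paper's actual generation algorithm is not described in the abstract.

```python
import xml.etree.ElementTree as ET

def infer_dtd(xml_texts):
    """Merge the element structure of several well-formed documents into ELEMENT declarations."""
    children = {}     # element name -> child names in first-seen order
    has_text = set()  # elements observed with character data
    for text in xml_texts:
        root = ET.fromstring(text)
        for elem in root.iter():
            kids = children.setdefault(elem.tag, [])
            if elem.text and elem.text.strip():
                has_text.add(elem.tag)
            for child in elem:
                if child.tag not in kids:
                    kids.append(child.tag)
                # text following a child belongs to the parent (mixed content)
                if child.tail and child.tail.strip():
                    has_text.add(elem.tag)
    lines = []
    for name, kids in children.items():
        if kids and name in has_text:
            model = "(#PCDATA | " + " | ".join(kids) + ")*"
        elif kids:
            model = "(" + " | ".join(kids) + ")*"
        elif name in has_text:
            model = "(#PCDATA)"
        else:
            model = "EMPTY"
        lines.append(f"<!ELEMENT {name} {model}>")
    return lines
```

Because the content models are merged across all input documents, each new document visited can only widen the inferred DTD. That fits the abstract's description of generating DTDs during navigation, though the correspondence is an assumption.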


A Knowledge-based Wrapper Learning Agent for Semi-Structured Information Sources (준구조화된 정보소스에 대한 지식기반의 Wrapper 학습 에이전트)

  • Seo, Hee-Kyoung;Yang, Jae-Young;Choi, Joong-Min
    • Journal of KIISE:Software and Applications / v.29 no.1_2 / pp.42-52 / 2002
  • Information extraction (IE) is the process of recognizing and fetching particular information fragments from a document. In previous work, most IE systems relied on manually written extraction rules, called wrappers. Although manual wrapper generation may achieve more accurate extraction, it has problems with flexibility, extensibility, and efficiency. Other studies that generate wrappers automatically have had difficulty acquiring and representing useful domain knowledge and coping with the structural heterogeneity among different information sources; as a result, real-world information sources with complex document structures could not be analyzed correctly. To resolve these problems, this paper presents an agent-based information extraction system named XTROS that exploits domain knowledge to learn from documents in a semi-structured information source. The system automatically generates a wrapper for each information source and performs information extraction and information integration by applying this wrapper to the corresponding source. In XTROS, both the domain knowledge and the wrapper are represented as XML-type documents. The wrapper generation algorithm first recognizes the meaning of each logical line of a sample document by using the domain knowledge, and then finds the most frequent pattern in the sequence of semantic representations of the logical lines. The location and structure of this pattern, represented as an XML document, become the wrapper. By testing XTROS on several real-estate information sites, we claim that it creates correct wrappers for most Web sources and consequently facilitates effective information extraction and integration for heterogeneous and complex information sources.

Message Interoperability in e-Logistics System (e-Logistics시스템의 메시지 상호운용성)

  • Seo Sungbo;Lee Young Joon;Hwang Jaegak;Ryu Keun Ho
    • Journal of KIISE:Computing Practices and Letters / v.11 no.5 / pp.436-450 / 2005
  • Existing B2B and B2C computer systems and applications that executed business transactions used a client-server architecture consisting of heterogeneous hardware and software, including personal computers and mainframes. With the boom in electronic business, the integration and compatibility of exchanged data, applications, and hardware have emerged as a pressing issue. This paper designs and implements a message transport system and a document transformation system to solve the interoperability problem of integrated logistics systems in electronic business. The message transport system integrates ebMS 2.0, the standard business message exchange format of ebXML (the international standard electronic commerce framework), with JMS of J2EE to ensure reliable messaging. The document transformation system converts non-standard XML documents into standard XML documents and provides web services after integration with the message system. Using the suggested business scenario and various test data, our message-oriented system proved to be interoperable and stable. We participated in the ebXML messaging interoperability test organized by the ebXML Asia Committee ITG in order to evaluate and certify the suitability of the message system.

A Parsing Method for an Incomplete XML (불완전 XML을 위한 파싱 방법)

  • Cho, Kyung-Ryong;Cho, Sung-Eon;Park, Jang-Woo
    • Journal of the Korea Institute of Information and Communication Engineering / v.12 no.12 / pp.2153-2158 / 2008
  • XML is one of the standard web languages. Its syntax consists of tags, which describe the content and structure of an XML document. In XML documents, a missing markup tag is a common cause of incomplete input. Editors usually recognize incomplete input as a syntax error: they highlight the lines where the errors occurred and execute the appropriate error handling routines, but parsing does not continue. In this paper, we propose a method that recognizes incomplete input strings and keeps the parsing phases going. To recognize the parts that are grammatically missing from incomplete input and generate them, we use an expanded parsing table, which includes additional parsing actions for the newly generated input symbols. With this information, incomplete input is completed and the parsing steps finish successfully. Users can therefore be assured of always producing correct XML documents, even when the input is incomplete, and need not worry about input faults.
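
The effect of completing incomplete input can be illustrated with a much simpler technique than the paper's expanded parsing table: a stack of open elements that supplies any end tag found to be missing. The sketch below is a swapped-in, stack-based repair written for illustration only. The function name `close_missing_tags` and the simplified tag pattern are assumptions, and it does not handle comments, CDATA sections, or processing instructions.

```python
import re

# Simplified tag pattern: optional "/", a name, optional attributes, optional trailing "/".
TAG = re.compile(r"<(/?)([A-Za-z_][\w.\-]*)[^<>]*?(/?)>")

def close_missing_tags(text):
    """Insert end tags that are missing so that every opened element is closed."""
    out, stack, pos = [], [], 0
    for m in TAG.finditer(text):
        out.append(text[pos:m.start()])   # copy character data between tags
        pos = m.end()
        closing, name, self_closing = m.group(1), m.group(2), m.group(3)
        if self_closing:
            out.append(m.group(0))        # <x/> needs no end tag
        elif not closing:
            stack.append(name)            # remember the open element
            out.append(m.group(0))
        elif name in stack:
            while stack[-1] != name:      # close inner elements left open
                out.append(f"</{stack.pop()}>")
            stack.pop()
            out.append(m.group(0))
        # an end tag with no matching start tag is dropped
    out.append(text[pos:])
    while stack:                          # close whatever is still open at end of input
        out.append(f"</{stack.pop()}>")
    return "".join(out)
```

A table-driven parser, as proposed in the paper, can make the same kind of repair from within the grammar, so it can also account for content rules that this sketch knows nothing about.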

HunMinJeomUm: Text Extraction and Braille Conversion System for the Learning of the Blind (시각장애인의 학습을 위한 텍스트 추출 및 점자 변환 시스템)

  • Kim, Chae-Ri;Kim, Ji-An;Kim, Yong-Min;Lee, Ye-Ji;Kong, Ki-Sok
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.21 no.5 / pp.53-60 / 2021
  • The number of visually impaired and blind people is increasing, but braille textbooks for them are in short supply, which infringes on their right to education regardless of their will to learn. To guarantee this right, this paper develops a learning system, HunMinJeomUm, that helps them access textbooks, documents, and photographs that are not available in braille, without the assistance of others. In our system, a smartphone app and web pages are designed to improve accessibility for the blind, and a braille kit is built using Arduino and braille modules. The system supports the following functions. First, users select the documents or pictures they want, and the system extracts the text using OCR. Second, the extracted text is converted into voice and braille. Third, a membership registration function is provided so that users can view the extracted text. Experiments have confirmed that our system generates braille and audio outputs successfully and provides high OCR recognition rates. The study has also found that even completely blind users can easily access the smartphone app.

A Study on the Development of Korean Defense Standards through Text Mining-Based Trend Analysis of United States Defense Standards (텍스트 마이닝 기반의 미국 국방 표준 동향 분석을 통한 한국 국방 표준의 발전 방안 연구)

  • Chae, Soohwan;Shim, Bohyun;Yeom, Seulki;Hong, Seongdon
    • Journal of the Korea Academia-Industrial cooperation Society / v.22 no.3 / pp.651-660 / 2021
  • This study examined the trend of standards established in the United States in order to find points that can be applied to Korean defense standards. The titles of various United States defense standard documents registered on the web were selected for this research. A word cloud was created after analyzing the frequency of words appearing in the titles using text mining, and the trend of words appearing in MIL-STD titles by era was obtained. The study identified words that appear often because of the format of the documents themselves, words that appear regularly throughout every era, words that were used frequently in the past but are not used much at present, and words that received little attention in the past but appear recurrently at present. In addition, the characteristics of each document type were derived from the word clouds produced for various defense documents. In conclusion, Korean defense standards also require consideration of the safe and efficient management, transport, and load design of hazardous materials. Furthermore, the quality of defense standards can be expected to improve if a defense standard document system focused on efficient management can be established.
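
The frequency-by-era analysis described above can be sketched with a counter per decade. This is a minimal illustration, not the study's pipeline. The function name `word_counts_by_decade`, the choice of decades as "eras", the record layout of `(year, title)` pairs, and the stopword handling are all assumptions.

```python
import re
from collections import Counter, defaultdict

def word_counts_by_decade(records, stopwords=frozenset()):
    """records: iterable of (year, title). Returns {decade: Counter of title words}."""
    counts = defaultdict(Counter)
    for year, title in records:
        decade = year // 10 * 10                       # e.g. 1968 -> 1960
        words = re.findall(r"[a-z]+", title.lower())   # crude tokenizer: runs of ASCII letters
        counts[decade].update(w for w in words if w not in stopwords)
    return counts
```

Comparing `counts[d].most_common(n)` across decades would surface the four word groups the abstract lists. Words induced by the document format itself, such as "standard" or "method", are natural candidates for the stopword set.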

A Study of the Transition Process in Presidential Electronic Records Transfer and Improvement Measures : Focused on the Electronic Records of the 19th President Moon Jae-in's Administration (대통령 전자기록물의 이관방식 변천과 개선방안 연구 19대 문재인 정부 대통령 전자기록물을 중심으로 )

  • Yun, Jeonghun
    • The Korean Journal of Archival Studies / no.75 / pp.41-89 / 2023
  • Since the enactment of the Act on the Management of Presidential Archives in 2007, the cases of electronic records transfer in the 16th President Roh Moo-hyun's administration have played the role of an advance guard in managing public records and served as a test bed for new electronic records management. When transferring the electronic records of the 19th President Moon Jae-in's administration, the electronic records transfer method of President Roh's administration was inherited, while several innovative attempts were made. For instance, the Presidential Archives have for the first time converted the electronic documents from institutions advising the President into a long-term preservation package and transferred them online. In addition, considering the characteristics of the data, the administrative information dataset of the Presidential record creation institutions was transferred to the SIARD standard. Furthermore, the Presidential Archives had websites transferred in the form of OVF as a pilot test and collected social media directly through the API. Thus this study investigated the transition process of the presidential electronic records transfers from the 16th President Roh Moo-hyun's administration to the 19th President Moon Jae-in's. In addition, major achievements and issues were analyzed centering on the transfer method by type of electronic records during President Moon Jae-in's administration, and future improvement plans were presented.