• Title/Summary/Keyword: Web Document

Representation of ambiguous word in Latent Semantic Analysis (LSA모형에서 다의어 의미의 표상)

  • 이태헌;김청택
    • Korean Journal of Cognitive Science / v.15 no.2 / pp.23-31 / 2004
  • Latent Semantic Analysis (LSA; Landauer & Dumais, 1997) is a technique for representing the meanings of words using co-occurrence information about words appearing in the same context, which is usually a sentence or a document. In LSA, a word is represented as a point in a multidimensional space where each axis represents a context, and a word's meaning is determined by its frequency in each context. The space is reduced by singular value decomposition (SVD). The present study extends LSA to the representation of ambiguous words. The proposed LSA applies a rotation of axes in the document space, which makes it possible to interpret the meaning of un. A simulation study was conducted to illustrate the performance of LSA in representing ambiguous words. In the simulation, the texts containing an ambiguous word were first extracted and LSA with rotation was performed. By comparing the loading matrix, we categorized the texts according to meaning. The first meaning of an ambiguous word was represented by LSA with the matrix excluding the vectors for the other meaning. The other meanings were represented in the same way. The simulation showed that this way of representing an ambiguous word can identify the meanings of the word. This result suggests that LSA with axis rotation can be applied to the representation of ambiguous words. We discuss how the use of rotation makes it possible to represent multiple meanings of ambiguous words, and how this technique can be applied to web searching.

A Study on Spam Document Classification Method using Characteristics of Keyword Repetition (단어 반복 특징을 이용한 스팸 문서 분류 방법에 관한 연구)

  • Lee, Seong-Jin;Baik, Jong-Bum;Han, Chung-Seok;Lee, Soo-Won
    • The KIPS Transactions:PartB / v.18B no.5 / pp.315-324 / 2011
  • In the Web environment, a flood of spam causes serious social problems such as leaks of personal information, monetary loss from phishing, and distribution of harmful content. Moreover, the types and techniques of spam distribution that must be controlled change from day to day. Learning-based spam classification using the Bag-of-Words model has been the most widely used method to date. However, this method is vulnerable to the filter-evasion techniques that recent spam commonly employs, because it classifies spam documents using only the keyword occurrence information obtained while training the classification model. In this paper, we propose a spam document detection method that uses the characteristic word repetition found in spam documents as a countermeasure to these evasion techniques. Most recent spam documents tend to repeat the key phrases they are designed to spread, and this tendency can be used as a measure for classifying spam documents. We define six variables that represent characteristics of word repetition and use them as the feature set for constructing a classification model. The effectiveness of the proposed method is evaluated in an experiment with blog posts and e-mail data. The experimental results show that the proposed method outperforms other approaches.
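
The idea of turning word repetition into classifier features can be sketched as follows. The abstract does not list the six variables, so the two features below (`repeat_ratio` and `max_term_share`) are hypothetical stand-ins chosen only for illustration, and the sample texts are invented.

```python
from collections import Counter

def repetition_features(text):
    """Compute two illustrative (hypothetical) word-repetition features."""
    tokens = text.lower().split()
    if not tokens:
        return {"repeat_ratio": 0.0, "max_term_share": 0.0}
    counts = Counter(tokens)
    # Number of tokens that belong to a term occurring more than once.
    repeated = sum(c for c in counts.values() if c > 1)
    return {
        # Share of tokens belonging to a repeated term.
        "repeat_ratio": repeated / len(tokens),
        # Share of the document taken up by its most frequent term.
        "max_term_share": max(counts.values()) / len(tokens),
    }

# Invented example texts, not data from the paper.
spam = repetition_features("cheap pills cheap pills buy cheap pills now")
ham = repetition_features("the meeting moved to friday afternoon")
```

Feature dictionaries like these could be fed to any standard classifier alongside, or instead of, Bag-of-Words counts.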

A Parsing Method for an Incomplete XML (불완전 XML을 위한 파싱 방법)

  • Cho, Kyung-Ryong;Cho, Sung-Eon;Park, Jang-Woo
    • Journal of the Korea Institute of Information and Communication Engineering / v.12 no.12 / pp.2153-2158 / 2008
  • XML is one of the standard web languages. Its syntax is built from tags, which describe the content and structure of an XML document. In XML documents, missing markup tags are a common cause of incomplete input. Editors usually treat incomplete input as a syntax error: when they find one, they highlight the lines where the error occurred and execute the appropriate error-handling routines, but parsing does not continue. In this paper, we propose a method to recognize incomplete input strings and keep parsing going. To recognize the parts that are grammatically missing from incomplete input and generate them, we use an expanded parsing table, which includes additional parsing actions for the newly generated input symbols. With this information, incomplete input is completed and the parsing steps run to completion. Users can therefore be assured that they always produce correct XML documents, even if the input is incomplete, and need not worry about input faults.
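
The effect of completing missing tags can be illustrated with a much simpler stand-in. The sketch below is not the expanded parsing table described in the abstract; it is a plain stack-based repair, and the function name `close_missing_tags` is hypothetical. It ignores comments, CDATA, and processing instructions.

```python
import re

# Matches start, end, and self-closing tags; group 2 is the element name.
TAG = re.compile(r"<(/?)([A-Za-z_][\w.-]*)[^<>]*?(/?)>")

def close_missing_tags(xml):
    """Insert end tags that are missing so every opened element is closed."""
    out, stack, pos = [], [], 0
    for m in TAG.finditer(xml):
        out.append(xml[pos:m.start()])  # text between tags is copied as-is
        pos = m.end()
        closing, name, self_closing = m.group(1), m.group(2), m.group(3)
        if self_closing:
            out.append(m.group(0))
        elif not closing:
            stack.append(name)
            out.append(m.group(0))
        elif name in stack:
            # Close any elements left open inside the one being closed.
            while stack[-1] != name:
                out.append(f"</{stack.pop()}>")
            stack.pop()
            out.append(m.group(0))
        # An end tag with no matching start tag is dropped.
    out.append(xml[pos:])
    # Close whatever is still open at the end of the input.
    while stack:
        out.append(f"</{stack.pop()}>")
    return "".join(out)

repaired = close_missing_tags("<a><b>text</a>")
```

A table-driven parser can make finer-grained decisions than this stack heuristic, since it knows which symbols the grammar expects at each state.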

A Study on the Current Situation and Trend Analysis of The Elderly Healthcare Applications Using Big Data Analysis (텍스트마이닝을 활용한 노인 헬스케어 앱 사용 추이 및 동향 분석)

  • Byun, Hyun;Jeon, Sang-Wan;YI, Eun-Surk
    • Journal of the Korea Convergence Society / v.13 no.5 / pp.313-325 / 2022
  • The purpose of this study is to examine changes in the elderly healthcare app market through text mining analysis and to present basic data for promoting elderly healthcare apps. Data were collected from Naver, Daum, blogs, the web, and cafes. As research methods, text mining, TF-IDF (term frequency-inverse document frequency), sentiment analysis, and semantic network analysis were conducted using Textom and Ucinet6, which are big data analysis programs. As a result, six categories were derived: resolving the healthcare app information gap, convergence healthcare technology, diffusion media, elderly healthcare app industry, social background, and content. In conclusion, for elderly healthcare apps to be accepted and used by the elderly, they must have a good diffusion infrastructure, and their effectiveness must be maximized through the active introduction of convergence technology and the development of content that the elderly can use easily.
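
For readers unfamiliar with TF-IDF, the weighting can be sketched as below. This is a minimal illustration assuming the common form tf × log(N / df); the exact variant used by Textom is not stated in the abstract, and the sample posts are invented.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document."""
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        weights.append({
            # Relative term frequency times inverse document frequency.
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights

# Invented toy posts, not data from the study.
posts = [
    "elderly healthcare app helps elderly users".split(),
    "healthcare app market grows".split(),
    "convergence technology for healthcare".split(),
]
scores = tf_idf(posts)
```

A term that appears in every document, such as "healthcare" here, receives a weight of zero, while a term concentrated in one document is weighted more heavily.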

A Design and Implementation of XML DTDs for Integrated Medical Information System (통합의료정보 시스템을 위한 XML DTD 설계 및 구현)

  • 안철범;나연묵
    • Journal of the Institute of Electronics Engineers of Korea CI / v.40 no.6 / pp.106-117 / 2003
  • Advanced medical information systems usually consist of loosely coupled, independent systems such as HIS/RIS and PACS. To ease information exchange between these systems and between hospitals, and to support new types of medical service such as teleradiology, it has become essential to integrate separate medical information and allow it to be exchanged and retrieved over the Internet. This thesis proposes an integrated medical information system using XML. We analyzed the HL7 and DICOM standard formats and designed an integrated XML DTD. We extracted information from HL7 messages and DICOM files and generated XML document instances and XSL stylesheets based on the proposed XML DTD. We implemented a web interface for the integrated medical information system, which supports data sharing, information exchange, and retrieval across the two standard formats. The proposed XML-based system will help solve the problems of current medical information systems by integrating separate medical information and by allowing data exchange and sharing over the Internet. It is also more robust than web-based medical information systems developed in HTML, because XML provides more flexibility and extensibility than HTML.

A Smart Mobile Mail System Based on MPEG21-DIDL for Any Mobile Device (모든 모바일 단말기에 서비스 가능한 MPEG21-DIDL 기반의 스마트 모바일 메일 시스템)

  • Zhao, Mei-Hua;Seo, Chang-Wo;Lim, Young-Hwan
    • Journal of Internet Computing and Services / v.11 no.3 / pp.1-13 / 2010
  • As the computing power of mobile devices improves rapidly, many kinds of web services, such as e-mail, are becoming available on mobile devices. Mobile mail service began early, but it is mostly limited to specific mobile devices such as smartphones, so users have to purchase a particular phone to benefit from it. In this thesis, we developed a new kind of mobile mail system, the Smart Mobile Mail System, based on MPEG21-DIDL markup, which solves this problem. The Mobile Gate Server can convert DIDL into the other markup types that mobile devices display. By transforming PC web mail content, including attached documents, into DIDL markup through the Mobile Gate Server, the mobile mail service can be made available on all kinds of mobile devices. The Smart Mobile Mail System also provides a real-time notification service for new e-mail using Callback URL SMS. When a new e-mail arrives, the mail system sends a Callback URL SMS to the user, who can then check the e-mail directly through it in real time.

A Design and Implementation of Event Processor for Playing SMIL 2.0 Documents (SMIL 2.0 문서 재생을 위한 이벤트 처리기의 설계 및 구현)

  • 김혜은;채진석;이재원;김성동;이종우
    • Journal of Korea Multimedia Society / v.7 no.2 / pp.251-263 / 2004
  • The Synchronized Multimedia Integration Language (SMIL), recommended by the World Wide Web Consortium (W3C) in 1998, is an XML-based declarative language for synchronizing and presenting multimedia documents. SMIL can create new multimedia data by integrating various types of separately existing multimedia objects such as text, video, graphics, and audio. It supports the synchronization of multimedia data, which is limited in current HTML-based Web technology. For SMIL to become widely used, a multimedia server guaranteeing Quality of Service (QoS), an authoring tool, and a player need to be developed. Developing a SMIL authoring tool and player requires technologies to read and analyze a SMIL document and to play various types of synchronized media objects along a timeline. In this paper, we describe the design and implementation of an event processor that supports the SMIL 2.0 timing model. We also develop a SMIL 2.0 player using the proposed event processor. This will facilitate the playback of SMIL content and thereby contribute to the spread of SMIL technology. The event processor can be reused in the various language profiles defined in the SMIL standard. The player is expected to be used with other standards that integrate SMIL, such as XHTML+SMIL and SMIL Animation.

Deep Learning Research Trends Analysis with Ego Centered Topic Citation Analysis (자아 중심 주제 인용분석을 활용한 딥러닝 연구동향 분석)

  • Lee, Jae Yun
    • Journal of the Korean Society for Information Management / v.34 no.4 / pp.7-32 / 2017
  • Recently, deep learning has been spreading rapidly as an innovative machine learning technique in various domains. This study explored the research trends of deep learning through a modified ego-centered topic citation analysis. To do so, a few seed documents were selected from the documents retrieved from Web of Science with the keyword 'deep learning', and related documents were obtained through citation relations. The papers citing the seed documents were set as ego documents reflecting current research in the field of deep learning. Preliminary studies cited frequently in the ego documents were set as the citation identity documents that represent the specific themes in the field. For the ego documents, which are the result of current research activities, quantitative analysis methods including co-authorship network analysis were applied to identify major countries and research institutes. For the citation identity documents, co-citation analysis was conducted, and key literature and key research themes were identified by investigating the citation image keywords, which are the major keywords of the documents citing the citation identity document clusters. Finally, we proposed and measured the citation growth index, which reflects the growth trend of citation influence on a specific topic, and showed the changes in the leading research themes in the field of deep learning.

The Effect of C Language Output Method to the Performance of CGI Gateway in the UNIX Systems (유닉스 시스템에서 C 언어 출력 방법이 CGI 게이트웨이 성능에 미치는 영향)

  • Lee Hyung-Bong;Jeong Yeon-Chul;Kweon Ki-Hyeon
    • The KIPS Transactions:PartC / v.12C no.1 s.97 / pp.147-156 / 2005
  • CGI is a standard interface between a web server and a gateway, devised so that the gateway's standard output can take the place of a static web document in a UNIX environment. It is therefore common for a CGI gateway to use the standard I/O statements provided by its programming language. However, the standard I/O mechanism is a buffering strategy designed to be transparent to the operating system and optimized for generic cases. This means that it may be useful to apply further optimization to the standard I/O environment in a CGI gateway. In this paper, we introduce the standard output method and the file output method as two areas of output optimization for CGI gateways written in C on UNIX/LINUX systems, and apply the proposed methods in each area to Debian LINUX, IBM AIX, SUN Solaris, and Digital UNIX. We then analyze their effect on execution time. The results differed from one operating system to another. Compared with the normal situation, the best case in the standard output area showed about 10% improvement, while the worst case showed 60% degradation in the file output area, where some performance improvement had been expected.

Improving Business Usability of XBRL Based on Semantic Web Approach (시맨틱 웹을 이용한 XBRL의 비즈니스 활용성 개선)

  • Jeon, Pyo-Jin;Lee, Myung-Jin;Kim, Woo-Ju;Hong, June-S.
    • The Journal of Society for e-Business Studies / v.15 no.3 / pp.1-23 / 2010
  • Exchanging and managing an organization's financial information is crucially important because of the complexity and diversity that its implicit information introduces. With the development of information technology, various approaches to managing an organization's financial data have appeared. XBRL (Extensible Business Reporting Language) is one such technology. XBRL is a business reporting language for defining and exchanging financial information, such as an organization's financial statements. It is an international standard that enables the exchange of information between information providers and consumers by adding tags that carry information about the circumstances of the data. However, XBRL cannot describe semantics, because it is fundamentally based on XML (Extensible Markup Language), whose purpose is to express and structure data. This paper therefore aims to add semantic information to XBRL through semantic technology. Its objective is to ontologize the knowledge described in XBRL so that this knowledge can be shared, reused, discovered, and used for inference. To achieve this objective, the paper proposes a methodology for ontologizing the taxonomy and instance documents of XBRL. It also shows how the proposed methodology could be applied in business practice by pointing out the advantages of the knowledge described in XBRL.