• Title/Summary/Keyword: Web Document Analysis


Representation of ambiguous word in Latent Semantic Analysis (LSA모형에서 다의어 의미의 표상)

  • 이태헌;김청택
    • Korean Journal of Cognitive Science / v.15 no.2 / pp.23-31 / 2004
  • Latent Semantic Analysis (LSA; Landauer & Dumais, 1997) is a technique for representing the meanings of words using co-occurrence information about words that appear in the same context, which is usually a sentence or a document. In LSA, a word is represented as a point in a multidimensional space where each axis represents a context, and a word's meaning is determined by its frequency in each context. The space is reduced by singular value decomposition (SVD). The present study extends LSA to the representation of ambiguous words. The proposed LSA applies a rotation of axes in the document space, which makes it possible to interpret the meaning of un. A simulation study was conducted to illustrate the performance of LSA in representing ambiguous words. In the simulation, the texts containing an ambiguous word were first extracted, and LSA with rotation was performed. By comparing the loading matrices, we categorized the texts according to meaning. The first meaning of an ambiguous word was represented by LSA with the matrix excluding the vectors for the other meanings. The other meanings were represented in the same way. The simulation showed that this way of representing an ambiguous word can identify the meanings of the word. This result suggests that LSA with axis rotation can be applied to the representation of ambiguous words. We discuss how rotation makes it possible to represent the multiple meanings of ambiguous words and how this technique can be applied to web searching.
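
The core LSA step described in this abstract (a term-document matrix reduced by SVD) can be sketched as follows. This is a minimal illustration, not the authors' implementation: it uses plain power iteration on the document Gram matrix instead of a library SVD routine, it does not reproduce the axis-rotation step the paper proposes, and the function names and the sample documents are assumptions made for the example.

```python
import math

def term_document_matrix(docs):
    """Build a term-by-document count matrix from whitespace-tokenized texts."""
    vocab = sorted({w for d in docs for w in d.split()})
    index = {w: i for i, w in enumerate(vocab)}
    matrix = [[0.0] * len(docs) for _ in vocab]
    for j, d in enumerate(docs):
        for w in d.split():
            matrix[index[w]][j] += 1.0
    return vocab, matrix

def top_eigenpairs(sym, k, iters=500):
    """Top-k eigenpairs of a symmetric PSD matrix via power iteration and deflation."""
    n = len(sym)
    work = [row[:] for row in sym]
    pairs = []
    for _ in range(k):
        start = [float(i + 1) for i in range(n)]  # fixed start keeps the result deterministic
        norm = math.sqrt(sum(x * x for x in start))
        v = [x / norm for x in start]
        for _ in range(iters):
            w = [sum(work[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        # Rayleigh quotient gives the eigenvalue for the converged unit vector.
        lam = sum(v[i] * sum(work[i][j] * v[j] for j in range(n)) for i in range(n))
        pairs.append((lam, v))
        # Deflate so the next round finds the next-largest eigenpair.
        for i in range(n):
            for j in range(n):
                work[i][j] -= lam * v[i] * v[j]
    return pairs

def lsa_document_vectors(docs, k=2):
    """Return k-dimensional document coordinates (singular value times right singular vector)."""
    _, a = term_document_matrix(docs)
    n = len(docs)
    gram = [[sum(row[i] * row[j] for row in a) for j in range(n)] for i in range(n)]
    pairs = top_eigenpairs(gram, k)
    return [[math.sqrt(max(lam, 0.0)) * v[j] for lam, v in pairs] for j in range(n)]

def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(u, v)) / (nu * nv)

docs = [
    "river bank water",
    "bank water flow river",
    "bank money loan",
    "money loan bank interest",
]
vectors = lsa_document_vectors(docs, k=2)
```

In this toy corpus, the two "river" texts and the two "money" texts should end up closer to each other in the reduced space than to texts using the other sense of "bank", which is the kind of grouping by meaning the abstract describes.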

Analysis of online parenting community posts on expanded newborn screening for metabolic disorders using topic modeling: a quantitative content analysis (토픽 모델링을 활용한 광범위 선천성 대사이상 신생아 선별검사 관련 온라인 육아 커뮤니티 게시 글 분석: 계량적 내용분석 연구)

  • Myeong Seon Lee;Hyun-Sook Chung;Jin Sun Kim
    • Women's Health Nursing / v.29 no.1 / pp.20-31 / 2023
  • Purpose: As more newborns have received expanded newborn screening (NBS) for metabolic disorders, the overall number of false-positive results has increased. The purpose of this study was to explore the psychological impacts experienced by mothers in relation to the NBS process. Methods: An online parenting community in Korea was selected, and questions regarding NBS were collected using web crawling for the period from October 2018 to August 2021. In total, 634 posts were analyzed. The collected unstructured text data were preprocessed, and keyword analysis, topic modeling, and visualization were performed. Results: Of 1,057 words extracted from posts, the top keyword based on term frequency-inverse document frequency (TF-IDF) values was "hypothyroidism," followed by "discharge," "close examination," "thyroid-stimulating hormone levels," and "jaundice." The top keyword based on simple frequency of appearance was "XXX hospital," followed by "close examination," "discharge," "breastfeeding," "hypothyroidism," and "professor." LDA topic modeling classified posts related to inborn errors of metabolism (IEMs) into four main themes: "confirmatory tests of IEMs," "mother and newborn with thyroid function problems," "retests of IEMs," and "feeding related to IEMs." Mothers experienced substantial frustration, stress, and anxiety when they received positive NBS results. Conclusion: The online parenting community played an important role in helping mothers of newborns acquire and share information and obtain psychological support related to NBS. Nurses can use this study's findings to develop timely and evidence-based information for parents whose children receive positive NBS results, to reduce the negative psychological impact.
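
The TF-IDF keyword ranking mentioned in the Results can be illustrated with a short sketch. The abstract does not state the exact weighting variant or tokenizer used, so the formula below (relative term frequency times log(N / document frequency), summed over posts) and the names `tfidf_scores` and `top_keywords` are assumptions for illustration only.

```python
import math
from collections import Counter

def tfidf_scores(posts):
    """Sum each term's TF-IDF weight over all posts (whitespace tokenization)."""
    tokenized = [p.lower().split() for p in posts]
    n = len(tokenized)
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))  # document frequency: count each term once per post
    totals = Counter()
    for tokens in tokenized:
        counts = Counter(tokens)
        for term, c in counts.items():
            tf = c / len(tokens)
            idf = math.log(n / df[term])  # terms found in every post get idf = 0
            totals[term] += tf * idf
    return totals

def top_keywords(posts, k=5):
    """Return the k highest-scoring terms, breaking ties alphabetically."""
    scores = tfidf_scores(posts)
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]
```

Under this weighting, a word that appears in every post scores zero however frequent it is, which is one reason a TF-IDF ranking can differ from a simple frequency ranking, as the two keyword lists in the abstract do.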

Optimal supervised LSA method using selective feature dimension reduction (선택적 자질 차원 축소를 이용한 최적의 지도적 LSA 방법)

  • Kim, Jung-Ho;Kim, Myung-Kyu;Cha, Myung-Hoon;In, Joo-Ho;Chae, Soo-Hoan
    • Science of Emotion and Sensibility / v.13 no.1 / pp.47-60 / 2010
  • Most classification research has used kNN (k-Nearest Neighbor) and SVM (Support Vector Machine), which are known as learning-based models, and the Bayesian classifier and NNA (Neural Network Algorithm), which are known as statistics-based methods. However, these approaches face space and time limitations when classifying the very large number of web pages on today's internet. Moreover, most classification studies use a unigram feature representation, which does not capture the real meaning of words well. Korean web page classification poses additional problems because many Korean words have multiple meanings (polysemy). For these reasons, LSA (Latent Semantic Analysis) has been proposed for classification in such environments (large data sets and polysemous words). LSA uses SVD (Singular Value Decomposition), which decomposes the original term-document matrix into three matrices and reduces their dimensions. This makes it possible to create a new low-dimensional semantic space for representing vectors, which makes classification efficient and allows the latent meaning of words or documents (or web pages) to be analyzed. Although LSA is well suited to classification, it has some drawbacks. When SVD reduces the dimensions of the matrix and creates the new semantic space, it considers which dimensions represent the vectors well, not which dimensions discriminate between them well. This is why LSA does not improve classification performance as much as expected. In this paper, we propose a new LSA that selects the optimal dimensions for both discriminating and representing vectors, minimizing these drawbacks and improving performance. The proposed method shows better and more stable performance than other LSA methods in low-dimensional spaces. In addition, we obtain further improvement in classification by creating and selecting features through stopword reduction and statistical weighting of specific values.

Design and Implementation of Web-based Problem Management System for CT Radiological Technologist Education (CT 전문방사선사 교육을 위한 웹기반 문항관리 시스템의 설계 및 구현)

  • Shin Yong-Won;Koo Bong-Oh;Shim Choon-Bo
    • The Journal of the Korea Contents Association / v.5 no.1 / pp.27-35 / 2005
  • Recently, despite the rapid progress of information technology in the medical and health fields, problem sets for medical and educational content related to radiological technologists are still developed and managed manually and offline using document editors. In this study, a web-based problem management system is designed and implemented. The system can efficiently manage and present various kinds of problem sets for integrated education and personal licensing without time and space limitations, in order to improve the efficiency of supplementary training and help CT radiological technologists obtain the professional license. The proposed system is composed of an administration module and a user module. The former supports functions such as problem creation, problem categorization, user management, and adjustment of leveled assessment. The latter provides functions such as examination application, problem retrieval, personal score retrieval, and interpretation viewing. In addition, our system is expected to be useful and practical because it provides problem interpretation and analysis of score results after the examination is taken. It can improve learning ability and information exchange among those preparing for the CT professional radiological technologist licensing examination.

Design and Implementation of XML-based Cyber Counseling System Supporting Counseling Analysis Information (상담 분석 정보를 지원하는 XML 기반 사이버 상담 시스템의 설계 및 구현)

  • Choi, Sook-Young;Back, Hyon-Ki
    • Journal of The Korean Association of Information Education / v.7 no.3 / pp.341-352 / 2003
  • While most research on cyber counseling so far has addressed counseling methods and the effects of teenagers' use of cyber counseling, there have been no efforts to store counseling content, analyze it using the features and technologies of the web, and use it effectively to guide students. Therefore, we propose a cyber counseling system that provides counseling information so that teachers can grasp students' various interests and problems, and thus helps them guide students. For this, we used XML. Because an XML document can systematically create structured information and represent structure in meaningful information units better than existing file-based information, it can be used effectively to manage, search, and store documents. Thus, we implement a cyber counseling system using XML that can effectively manage and represent the various analysis information of counseling.

Query Processing Model Using Two-level Fuzzy Knowledge Base (2단계 퍼지 지식베이스를 이용한 질의 처리 모델)

  • Lee, Ki-Young;Kim, Young-Un
    • Journal of the Korea Society of Computer and Information / v.10 no.4 s.36 / pp.1-16 / 2005
  • When Web-based specialized retrieval systems for scientific fields severely restrict how users can express their information requests, the process of information content analysis and the process of information acquisition become inconsistent. Accordingly, this study proposes a re-ranking retrieval model that reflects the content-based similarity between the user's query terms and index words by grasping the document knowledge structure. To accomplish this, the former process constructs a thesaurus and a similarity relation matrix to provide the subject analysis mechanism, and the latter proposes an algorithm that establishes a search model, such as query expansion, to analyze the user's demands. The proposed algorithm, which uses the information structure of the retrieval system, can therefore serve as a content-based retrieval mechanism that establishes a 2-step search model, preserving recall while improving accuracy, which was a weak point of the previous fuzzy retrieval model.
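
The general idea of expanding a query through a similarity relation and then re-ranking documents can be sketched as below. The abstract does not give the paper's actual algorithm, so everything here is a hypothetical simplification: the similarity relation is a plain dictionary of degrees in [0, 1], the threshold value is arbitrary, and `expand_query` and `rerank` are illustrative names.

```python
def expand_query(query_terms, similarity, threshold=0.5):
    """Add thesaurus terms whose similarity degree to a query term meets the threshold.

    Original query terms get weight 1.0; an added term keeps the highest
    degree with which it is related to any query term.
    """
    weights = {t: 1.0 for t in query_terms}
    for t in query_terms:
        for related, degree in similarity.get(t, {}).items():
            if degree >= threshold:
                weights[related] = max(weights.get(related, 0.0), degree)
    return weights

def rerank(documents, weights):
    """Order document ids by the summed weights of their matching index terms."""
    scored = []
    for doc_id, index_terms in documents.items():
        score = sum(weights.get(t, 0.0) for t in set(index_terms))
        scored.append((score, doc_id))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for score, doc_id in scored if score > 0]
```

Expansion lets a document indexed only under a related term still be retrieved, which supports recall, while the weighted re-ranking places documents that match the original query terms first.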

Customer Satisfaction Improvement by Combining the Blue Print and Reliability Technique: Education Service Case Study (Blue Print와 신뢰성 기법을 혼합한 고객만족도 향상에 관한 연구: 교육서비스 사례)

  • Baek, Chun-Joo;Koo, Il-Seob;Lim, Ik-Sung;Kwon, Hong-Kyu
    • Journal of Applied Reliability / v.12 no.1 / pp.13-24 / 2012
  • This paper applied the Blue Print and FMEA (Failure Mode and Effect Analysis) to an education service in order to raise satisfaction with the service. First, the Blue Print is deployed to devise strategies for overcoming fail points and waiting points. Next, to analyze the failure factors and alternative strategies, the Blue Print of the education service is applied to FMEA. The results are as follows. First, information documents sent by web-mail or e-mail were omitted. Second, companies that were not targets of the education were selected, target companies were omitted, and attendees were dissatisfied with the training content. Third, counseling was delayed in telephone replies, course names and attendee lists were recorded incorrectly in HRD (Human Resource Development), and attendance checks were omitted. Except for attendee dissatisfaction, all of these occur in the service delivery process, whereas attendee dissatisfaction concerns the education content. Both are factors that influence education service quality. The strategies for removing the failure modes are training and manual development for service and work; thorough management and checking of information systems such as ERP (Enterprise Resource Planning), HRD, the education institution list DB (database), and the on-line application system; and development of education programs that offer the best education reflecting user needs and the continuously changing environment.

ASSESSMENT OF CFD CODES USED IN NUCLEAR REACTOR SAFETY SIMULATIONS

  • Smith, Brian L.
    • Nuclear Engineering and Technology / v.42 no.4 / pp.339-364 / 2010
  • Following a joint OECD/NEA-IAEA-sponsored meeting to define the current role and future perspectives of the application of Computational Fluid Dynamics (CFD) to nuclear reactor safety problems, three Writing Groups were created, under the auspices of the NEA working group WGAMA, to produce state-of-the-art reports on different aspects of the subject. The work of the second group, WG2, was to document the existing assessment databases for CFD simulation in the context of Nuclear Reactor Safety (NRS) analysis, to gain a measure of the degree of quality and trust in CFD as a numerical analysis tool, and to take initiatives to extend the existing databases. The group worked over the period of 2003-2007 and produced a final state-of-the-art report. The present paper summarises the material gathered during the study, illustrating the points with a few highlights. A total of 22 safety issues were identified for which the application of CFD was considered to potentially bring real benefits in terms of better understanding and increased safety. A list of the existing databases was drawn up and synthesised, both from the nuclear area and from other parallel, non-nuclear, industrial activities. The gaps in the technology base were also identified and discussed. In order to initiate new ways of bringing experimentalists and numerical analysts together, an international workshop -- CFD4NRS (the first in a series) -- was organised, a new blind benchmark activity was set up based on turbulent mixing in T-junctions, and a Wiki-type web portal was created to offer online access to the material put together by the group giving the reader the opportunity to update and extend the contents to keep the information source topical and dynamic.

Media-based Analysis of Gasoline Inventory with Korean Text Summarization (한국어 문서 요약 기법을 활용한 휘발유 재고량에 대한 미디어 분석)

  • Sungyeon Yoon;Minseo Park
    • The Journal of the Convergence on Culture Technology / v.9 no.5 / pp.509-515 / 2023
  • Despite the continued development of alternative energies, fuel consumption is increasing. In particular, the price of gasoline fluctuates greatly with international oil prices. Gas stations adjust their gasoline inventory to respond to gasoline price fluctuations. In this study, news datasets are used to analyze gasoline consumption patterns through fluctuations in gasoline inventory. First, news datasets are collected by web crawling. Second, the news datasets are summarized using KoBART, which summarizes Korean text. Finally, the data are preprocessed and the fluctuation factors are derived using an N-gram language model and TF-IDF. Through this study, it is possible to analyze and predict gasoline consumption patterns.
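
The N-gram step in the final stage of this pipeline can be illustrated by counting word sequences across summarized texts. This sketch assumes whitespace tokenization and plain counts; the abstract does not describe the paper's tokenizer or language-model details, and `ngrams` and `ngram_counts` are illustrative names.

```python
from collections import Counter

def ngrams(tokens, n):
    """Return all contiguous n-token sequences as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def ngram_counts(sentences, n=2):
    """Count n-grams over a collection of whitespace-tokenized sentences."""
    counts = Counter()
    for sentence in sentences:
        counts.update(ngrams(sentence.lower().split(), n))
    return counts
```

Frequent n-grams across the summaries can then be treated as candidate phrases for the fluctuation factors before any weighting such as TF-IDF is applied.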

Implementation of Reporting Tool Supporting OLAP and Data Mining Analysis Using XMLA (XMLA를 사용한 OLAP과 데이타 마이닝 분석이 가능한 리포팅 툴의 구현)

  • Choe, Jee-Woong;Kim, Myung-Ho
    • Journal of KIISE:Computing Practices and Letters / v.15 no.3 / pp.154-166 / 2009
  • Database query and reporting tools, OLAP tools, and data mining tools are typical front-end tools in a Business Intelligence (BI) environment, which supports gathering, consolidating, and analyzing data produced by business operations and gives enterprise users access to the results. Traditional reporting tools have the advantage of creating sophisticated dynamic reports that include SQL query result sets, look like documents produced by word processors, and can be published to the Web, but their data sources are limited to RDBMSs. OLAP tools and data mining tools, on the other hand, each provide powerful information analysis functions in their own way, but their built-in visualization components for analysis results are limited to tables and a few charts. This paper therefore presents a system that integrates the three typical front-end tools so that they complement one another in a BI environment. Traditional reporting tools only have a query editor for generating SQL statements to fetch data from an RDBMS. The reporting tool presented in this paper, however, can also extract data from OLAP and data mining servers because editors for OLAP and data mining query requests are added to it. Traditional systems produce all documents on the server side. This structure lets reporting tools avoid repeating the document generation process when many clients access the same dynamic document. However, because this system is aimed at a small number of users who generate documents for data analysis, the tool generates documents on the client side. The tool therefore has a processing mechanism for handling large amounts of data despite the limited memory capacity of the report viewer on the client side. The reporting tool also has a data structure for integrating data from the three kinds of data sources into one document. Finally, most traditional front-end tools for BI depend on the data source architecture of a specific vendor. To overcome this problem, the system uses XMLA, a web-service-based protocol, to access data sources for OLAP and data mining services from various vendors.