Automatic Quality Evaluation with Completeness and Succinctness for Text Summarization (완전성과 간결성을 고려한 텍스트 요약 품질의 자동 평가 기법)

  • Ko, Eunjung;Kim, Namgyu
    • Journal of Intelligence and Information Systems, v.24 no.2, pp.125-148, 2018
  • Recently, as the demand for big data analysis has increased, cases of analyzing unstructured data and using the results have also been increasing. Among the various types of unstructured data, text is used as a means of communicating information in almost all fields. In addition, many analysts are interested in text because the amount of data is very large and it is relatively easy to collect compared to other unstructured and structured data. Among the various text analysis applications, document classification, which classifies documents into predetermined categories; topic modeling, which extracts major topics from a large number of documents; sentiment analysis or opinion mining, which identifies emotions or opinions contained in texts; and text summarization, which summarizes the main contents of one or several documents, have been actively studied. In particular, text summarization is actively applied in business through news summary services, privacy policy summary services, etc. In addition, much research has been done in academia on both the extraction approach, which selectively provides the main elements of the document, and the abstraction approach, which extracts elements of the document and composes new sentences by combining them. However, techniques for evaluating the quality of automatically summarized documents have not made as much progress as techniques for automatic text summarization. Most existing studies dealing with the quality evaluation of summarization manually summarized documents, used the results as reference documents, and measured the similarity between the automatic summary and the reference document. Specifically, automatic summarization is performed on the full text through various techniques, and the result is compared with the reference document, which is an ideal summary, to measure the quality of the automatic summarization.
Reference documents are provided in two major ways. The most common way is manual summarization, in which a person creates an ideal summary by hand. Since this method requires human intervention in preparing the summary, it takes a lot of time and cost, and it has the limitation that the evaluation result may differ depending on the summarizer's subjective judgment. Therefore, to overcome these limitations, attempts have been made to measure the quality of summary documents without human intervention. As a representative attempt, a method has recently been devised that reduces the size of the full text and measures the similarity between the reduced full text and the automatic summary. In this method, the more of the frequent terms in the full text appear in the summary, the better the quality of the summary is judged to be. However, since summarization essentially means condensing a lot of content while minimizing content omissions, it is unreasonable to say that a "good summary" based only on frequency always means a "good summary" in the essential sense. To overcome the limitations of these previous studies, this study proposes an automatic quality evaluation method for text summarization based on the essential meaning of summarization. Specifically, succinctness is defined as an element indicating how little content is duplicated among the sentences of the summary, and completeness is defined as an element indicating how little of the content is not included in the summary. In this paper, we propose a method for automatic quality evaluation of text summarization based on the concepts of succinctness and completeness.
To evaluate the practical applicability of the proposed methodology, 29,671 sentences were extracted from TripAdvisor's hotel reviews, the reviews were summarized for each hotel, and the quality of the summaries was evaluated according to the proposed methodology. The paper also provides a way to integrate completeness and succinctness, which are in a trade-off relationship, into an F-score, and proposes a method for performing optimal summarization by changing the threshold of sentence similarity.
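
The idea of combining completeness and succinctness into an F-score can be illustrated with a minimal sketch. This is not the authors' implementation: the Jaccard token overlap used as sentence similarity, the exact scoring rules, and the function names `jaccard`, `completeness`, `succinctness`, and `f_score` are all assumptions made for illustration. The abstract only states that the two measures trade off and that a sentence-similarity threshold is varied.

```python
def jaccard(a: str, b: str) -> float:
    """Token-set overlap between two sentences (assumed similarity measure)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def completeness(source: list[str], summary: list[str], threshold: float) -> float:
    """Share of source sentences matched by at least one summary sentence."""
    if not source:
        return 0.0
    covered = sum(
        1 for s in source if any(jaccard(s, t) >= threshold for t in summary)
    )
    return covered / len(source)


def succinctness(summary: list[str], threshold: float) -> float:
    """Share of summary sentences that do not repeat an earlier summary sentence."""
    if not summary:
        return 0.0
    unique = sum(
        1
        for i, t in enumerate(summary)
        if all(jaccard(t, earlier) < threshold for earlier in summary[:i])
    )
    return unique / len(summary)


def f_score(c: float, s: float) -> float:
    """Harmonic mean of completeness and succinctness."""
    return 0.0 if c + s == 0 else 2 * c * s / (c + s)


source = [
    "the room was clean",
    "the room was very clean",
    "breakfast was great",
    "staff were rude",
]
summary = ["the room was clean", "breakfast was great", "the room was clean"]

c = completeness(source, summary, threshold=0.5)
s = succinctness(summary, threshold=0.5)
print(c, s, f_score(c, s))
```

In this toy example the repeated summary sentence lowers succinctness, while the uncovered "staff were rude" sentence lowers completeness. Re-running with different `threshold` values shows how the two scores move against each other.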

Investigations on Techniques and Applications of Text Analytics (텍스트 분석 기술 및 활용 동향)

  • Kim, Namgyu;Lee, Donghoon;Choi, Hochang;Wong, William Xiu Shun
    • The Journal of Korean Institute of Communications and Information Sciences, v.42 no.2, pp.471-492, 2017
  • Demand for and interest in big data analytics are increasing rapidly. The concepts around big data include not only existing structured data, but also various kinds of unstructured data such as text, images, videos, and logs. Among the various types of unstructured data, text data have gained particular attention because text is the most representative means of describing and delivering information. Text analysis is generally performed in the following order: document collection, parsing and filtering, structuring, frequency analysis, and similarity analysis. The results of the analysis can be displayed through word clouds, word networks, topic modeling, document classification, and semantic analysis. Notably, there is an increasing demand to identify trending topics from the rapidly growing volume of text data generated through various social media. Thus, research on and applications of topic modeling have been actively carried out in various fields, since topic modeling is able to extract the core topics from a huge amount of unstructured text documents and provide a document group for each topic. In this paper, we review the major techniques and research trends of text analysis. Further, we introduce some application cases that solve problems in various fields by using topic modeling.
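
The middle steps of the pipeline described above (parsing and filtering, structuring, frequency analysis, similarity analysis) can be sketched with the Python standard library alone. The stopword list, the tokenization rule, and the function names are illustrative assumptions, not taken from the paper.

```python
import math
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption; real lists are much longer).
STOPWORDS = {"the", "a", "an", "of", "and", "is", "in", "to"}


def tokenize(text: str) -> list[str]:
    """Parsing and filtering: lowercase, keep alphabetic tokens, drop stopwords."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def term_frequencies(text: str) -> Counter:
    """Structuring and frequency analysis: a bag-of-words term count."""
    return Counter(tokenize(text))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Similarity analysis: cosine of the angle between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


doc1 = term_frequencies("Topic modeling extracts topics from text data")
doc2 = term_frequencies("Text data is the most representative unstructured data")
print(doc1.most_common(3))
print(cosine_similarity(doc1, doc2))
```

Raw counts are used here for brevity; a weighting such as TF-IDF would be a natural substitute in the structuring step.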

An Implementation of the Report View Generator using Program Performance Log Information (프로그램 성능 평가 로그 정보를 이용한 레포트 뷰 생성기 구현)

  • Cho Yong-Yoon;Yoo Chae-Woo
    • Journal of the Korea Society of Computer and Information, v.10 no.3 s.35, pp.35-44, 2005
  • A software developer can use a performance evaluation tool to increase development speed and improve the quality of software. However, the evaluation results that most performance evaluation tools offer are complicated strings. Therefore, a developer cannot intuitively understand the meaning of the results and must spend much time and effort analyzing them. In this paper, we propose a report view generator that transforms text-based performance evaluation results for software into various graphic-based views. The proposed generator consists of a screen generator that creates a structured XML document from the text-based performance evaluation results and a log analyzer that produces various report views from the created XML evaluation document. Because the XML evaluation result document can express the result information structured according to the performance evaluation items for software resources, it offers flexibility in presenting and integrating the result information for those items. Through the suggested report view generator, developers can intuitively understand and analyze the performance evaluation results of embedded software, and they can easily and quickly improve software quality and development efficiency.
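
The core transformation described above, from text-based performance results to a structured XML document that a view can then query, might look roughly like the following sketch. The `key=value` log format, the element names `evaluation` and `item`, and the helper names are hypothetical; the abstract does not specify the actual log or XML schema.

```python
import xml.etree.ElementTree as ET


def log_to_xml(log_text: str) -> ET.Element:
    """Convert lines of 'key=value' pairs (a hypothetical log format) to XML."""
    root = ET.Element("evaluation")
    for line in log_text.strip().splitlines():
        fields = dict(pair.split("=", 1) for pair in line.split())
        # Each log line becomes one <item>, identified by its 'name' field.
        item = ET.SubElement(root, "item", name=fields.pop("name"))
        for key, value in fields.items():
            ET.SubElement(item, key).text = value
    return root


def metric_by_item(root: ET.Element, metric: str) -> dict[str, float]:
    """Collect one metric per item, e.g. as input for a bar chart view."""
    return {
        item.get("name"): float(item.findtext(metric))
        for item in root.findall("item")
    }


log = """
name=parse time_ms=12.5 calls=3
name=render time_ms=40.0 calls=1
"""

root = log_to_xml(log)
print(ET.tostring(root, encoding="unicode"))
print(metric_by_item(root, "time_ms"))
```

Keeping the results in XML means a new view only needs a new query over the same document, which is the flexibility the abstract attributes to the structured format.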


The Pap-Smear Test Experience of Women in Turkey: A Qualitative Study

  • Arabaci, Zeynep;Ozsoy, Suheyla
    • Asian Pacific Journal of Cancer Prevention, v.13 no.11, pp.5687-5690, 2012
  • Objective: The study was planned to examine the attitudes of women who have a Pap smear test for the early diagnosis of cervical cancer, the factors affecting their decisions, and their feelings and experiences during this period. Materials and Methods: A phenomenological method was used. Data were collected between March 2012 and April 2012 from 17 women using standard and purposive sampling. Detailed interviews with the women were held in their homes and recorded. The data collection tool consisted of two parts: an information form with 17 questions identifying the women's sociodemographic characteristics and cervical cancer risk factors, and a semi-structured interview form with 15 alternative questions prepared in light of the literature and the Pap smear test. The collected data were put into written form. Content analysis was carried out by loading the documents into the NVIVO 8 Statistical Programme. Results: The study comprised themes such as cervical risk factors, the decision to take the Pap smear test, taking the Pap smear test, knowledge about the Pap smear test, relieving factors during the Pap smear test, obstructive factors during the Pap smear test, gynecological examination, and the feelings of women during and after the Pap smear test while waiting for the results. Conclusions: As women perceive gynaecological examinations differently from other examinations, they have different feelings at each stage of the Pap smear test. Medical staff should advise women more clearly on the nature and advantages of the Pap smear test.

XML-OGL : UML-based Graphical Language for Querying XML Documents (XML-OGL : XML 문서 질의를 위한 UML 기반 그래픽 언어)

  • Ha, Yan;Kim, Ki-Han
    • The KIPS Transactions:PartD, v.10D no.3, pp.399-406, 2003
  • The spread of XML as a standard for semi-structured documents on the Web opens up challenging opportunities for Web query languages. UML, in turn, is a graphical language for representing the results of object-oriented analysis and design. In this paper, we introduce a UML-based graphical query language for XML documents. The use of a visual formalism for representing the syntax and semantics of queries enables an intuitive expression of queries, even when they are rather complex. In addition, it fits a series of processes for storing XML documents in and retrieving them from an OODBMS, through a uniform visualization that represents both the content of XML documents (and of their DTDs) and the syntax and semantics of queries.

A study of the existing problems of digital libraries and their future environment (현존하는 디지털도서관의 문제점과 미래환경에 관한 연구)

  • 박일종
    • Journal of Korean Library and Information Science Society, v.27, pp.391-421, 1997
  • Information scientists need to answer not whether future libraries will be digital libraries, but how they should be structured and how they can serve users effectively today. 'The library with walls' or 'the library as place' will need to exist in the future, but the 'digital library without walls' or 'virtual library' will need to be studied continuously. Through a qualitative document study, this study has tried to reveal the existing problems of digital libraries and their future environment after considering the ambiguous concepts of various types of electronic libraries, their efforts toward library automation, and the changes in information retrieval circumstances during the last 30 to 40 years. As a result, the major findings and suggestions are presented. The library of the future will be a part of local and national cooperative systems, be filled with the intelligent use of old and new technologies, and be able to support both a place with extensive collections and convenient, easy, and free access to remote intellectual resources. Also, the information storage and retrieval (ISAR) of the future library system would readily provide users with a data retrieval system of any type that can be used by anybody rather than only by an expert or a specialist, so-called 'A&E retrieval', in the coming 21st century. It is highly possible that the future society will change into an information marketplace in which data are recognized as intangible assets.


Medicinal plants traditionally used for the management of female reproductive health dysfunction in Tana River County, Kenya

  • Kaingu, Catherine Kaluwa;Oduma, Jemimah Achieng;Mbaria, James Mucunu;Kiama, Stephen Gitah
    • CELLMED, v.3 no.2, pp.17.1-17.10, 2013
  • Reproductive dysfunction is a major health concern amongst the inhabitants of Tana River County. An ethnobotanical study was conducted in the Garsen, Itsowe and Ngao subdivisions of Tana River County to document the utilization of medicinal plants for the management of female reproductive ailments. The target population was practicing herbalists from the Pokomo, Ormo and Giryama communities in the study area. Structured questionnaires and focus group discussions were used to collect data. Forty-eight plant species distributed in 40 genera and 29 families were documented as being important for the management of pregnancy-related complications, menstrual disorders, infertility and fibroids, and as contraceptives. Fourteen species were most frequently cited by the herbalists. Fifty-two percent of the plant species were probably being mentioned for the first time as being useful in reproductive health management. In conclusion, Tana River has a pool of TMPs with a wealth of indigenous knowledge that needs to be exploited. The plants used to treat dysmenorrhea, for example, may be important analgesic agents, while those with anti-fertility properties may contain steroidal phytochemical compounds. Such species therefore need further investigation to establish their efficacy and mechanism of action.

Collaborative Conflict Handling Model for Courseware Co-Authoring (코스웨어 공동 저작을 위한 협력적 충돌 해결 모델)

  • 안치돈;윤경섭
    • Journal of the Korea Computer Industry Society, v.4 no.4, pp.599-606, 2003
  • As collaborative computing technology becomes widespread, interactions among concurrent users are becoming very important, especially in the education domain. However, existing co-authoring mechanisms and CSCW technologies have limitations in conflict handling. In general, co-authoring tools resolve conflicts by reflecting the views of one or a few users. These solutions have the advantage of resolving conflicts quickly, but the other authors' views cannot be sufficiently reflected. In this paper, we define courseware as a network structure of instructional units, and we propose a co-authoring model for courseware that uses a collaborative conflict-handling mechanism. Using this mechanism, instructors can reflect their own viewpoints in a single courseware. Because contents created in this form have multiple structures, they can provide proper materials according to students' requirements. Therefore, the model can provide students with individualized learning facilities in online educational systems. The proposed model is useful for courseware based on structured document formats such as XML.


Traditional medicines for common dermatological disorders in Mauritius

  • Mahomoodally, Mohamad Fawzi;Hossain, Ziad Dil
    • CELLMED, v.3 no.4, pp.31.1-31.8, 2013
  • This study was designed to document primary information on common complementary and alternative medicines (CAM) used to treat and/or manage common dermatological disorders in Mauritius, a tropical multicultural island in the Indian Ocean. Data from 355 key informants were collected via a semi-structured questionnaire. Pearson correlation and Chi-squared tests were performed to delineate any association. Quantitative indexes including the Importance Value (IV) and fidelity value were calculated. The results tend to indicate that cultural reasons were behind the use of CAM among Mauritians and that traditional knowledge was mainly acquired either from parents/relatives or from self-experience. Among the medicinal plants mentioned, Azadirachta indica (IV = 0.78) and Paederia tomentosa (IV = 0.70) were found to be the most used plants. Calendula officinalis (IV = 0.15), Centella asiatica (IV = 0.22) and Agauria salicifolia (IV = 0.11) were also recorded as being used for common dermatological disorders, though greatly under-utilised. Animal products were mentioned by 38.0% of respondents, and cow ghee was found to be commonly used in the management of measles (IV = 0.88). Spiritual healing was found to be used mainly for measles and warts. Given the plethora of novel information documented in the present survey, it can be suggested that the Mauritian population still relies to a great extent on CAM, which needs to be preserved and used sustainably. Nonetheless, further investigation is required to probe the possible active constituents that could be the basis of an evidence-based investigation to discover new drugs.

Jointly Image Topic and Emotion Detection using Multi-Modal Hierarchical Latent Dirichlet Allocation

  • Ding, Wanying;Zhu, Junhuan;Guo, Lifan;Hu, Xiaohua;Luo, Jiebo;Wang, Haohong
    • Journal of Multimedia Information System, v.1 no.1, pp.55-67, 2014
  • Image topic and emotion analysis is an important component of online image retrieval, which has become very popular in the rapidly growing social media community. However, due to the gaps between images and texts, there is very limited work in the literature on detecting an image's topics and emotions in a unified framework, although topics and emotions are two levels of semantics that often work together to comprehensively describe an image. In this work, a unified model, the Joint Topic/Emotion Multi-Modal Hierarchical Latent Dirichlet Allocation (JTE-MMHLDA) model, is proposed. It extends the previous LDA, mmLDA, and JST models to capture topic and emotion information at the same time from heterogeneous data. Specifically, a two-level graphical structured model is built to share topics and emotions across the whole document collection. The experimental results on a Flickr dataset indicate that the proposed model efficiently discovers images' topics and emotions, and significantly outperforms the text-only system by 4.4% and the vision-only system by 18.1% in topic detection, and outperforms the text-only system by 7.1% and the vision-only system by 39.7% in emotion detection.
