• Title/Abstract/Keywords: summary


An application of datamining approach to CQI using the discharge summary

  • 선미옥;채영문;이해종;이선희;강성홍;호승희
    • Korea Intelligent Information Systems Society: Conference Proceedings
    • /
    • Korea Intelligent Information Systems Society 2000 Fall Regular Conference: Intelligent Technology and CRM
    • /
    • pp.289-299
    • /
    • 2000
  • This study applies a data mining approach to CQI (Continuous Quality Improvement) using the discharge summary. First, we found a process variation in the hospital infection rate using the SPC (Statistical Process Control) technique. Second, the importance of factors influencing hospital infection was inferred through decision tree analysis, a classification method in data mining. The most important factor was surgery, followed by comorbidity and length of operation. Comorbidity was further divided into age and principal diagnosis, and length of operation was further divided into age and chief complaint. The decision tree analysis generated 24 rules of hospital infection. Of these, 9 rules with predictive power greater than 50% were suggested as guidelines for hospital infection control. The optimum range of the target group for hospital infection control was identified through the information gain summary. Association rule analysis, another data mining method, was performed to analyze the relationship between principal diagnosis and comorbidity. The confidence score, which measures the degree of association, was highest between urinary tract infection and causal bacillus, followed by the score between postoperative wound disruption and postoperative wound infection. This study demonstrated how a data mining approach could be used to provide information to support prospective surveillance of hospital infection. The data mining technique can also be applied to various areas for CQI using other hospital databases.
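
The confidence score mentioned in the abstract above is conventionally defined as support(A and B) / support(A) for a rule A -> B. The following is a minimal sketch of that standard definition, not the study's own implementation; the toy discharge records and the diagnosis labels are hypothetical.

```python
from typing import FrozenSet, List


def support(records: List[FrozenSet[str]], items: FrozenSet[str]) -> float:
    """Fraction of records that contain every code in `items`."""
    hits = sum(1 for record in records if items <= record)
    return hits / len(records)


def confidence(records: List[FrozenSet[str]],
               antecedent: FrozenSet[str],
               consequent: FrozenSet[str]) -> float:
    """Confidence of the rule antecedent -> consequent."""
    base = support(records, antecedent)
    if base == 0:
        return 0.0  # rule never fires, so confidence is reported as 0
    return support(records, antecedent | consequent) / base


# Hypothetical discharge records: each is a set of diagnosis labels.
records = [
    frozenset({"wound_disruption", "wound_infection"}),
    frozenset({"wound_disruption", "wound_infection"}),
    frozenset({"wound_disruption"}),
    frozenset({"uti"}),
]

rule_conf = confidence(records,
                       frozenset({"wound_disruption"}),
                       frozenset({"wound_infection"}))
print(f"confidence = {rule_conf:.3f}")
```

In this toy data, two of the three records with wound disruption also show wound infection, so the rule's confidence is two thirds.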


Hadoop Based Wavelet Histogram for Big Data in Cloud

  • Kim, Jeong-Joon
    • Journal of Information Processing Systems
    • /
    • Vol. 13, No. 4
    • /
    • pp.668-676
    • /
    • 2017
  • The importance of big data has recently been emphasized with the spread of smartphones and web/SNS services. As a result, MapReduce, which can process big data efficiently, is receiving worldwide attention because of its excellent scalability and stability. Because big data is large in volume, generated quickly, and varied in its properties, it is more efficient to process summary information about the data than the data itself. The wavelet histogram, a typical technique for generating data summaries, can generate optimal summary information that does not cause loss of information from the original data. Systems that apply wavelet histogram generation on top of MapReduce have therefore been actively studied. However, existing research has the disadvantage that generation is slow because the wavelet histogram is generated through one or more MapReduce jobs, and the error of the data restored from the wavelet histogram is likely to become large. In contrast, the MapReduce-based wavelet histogram generation system developed in this paper generates the wavelet histogram through a single MapReduce job, so the generation speed can be greatly increased. In addition, because the wavelet histogram is generated according to an error boundary specified by the user, the error of the data restored from the wavelet histogram can be adjusted. Finally, we verified the efficiency of the system through performance evaluation.
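
To make the idea of a wavelet summary with a user-controlled error concrete, here is a minimal single-machine sketch of an unnormalized Haar transform with coefficient thresholding. It is only an illustration: it does not use MapReduce, and the simple `threshold` rule is an assumption standing in for the paper's error-boundary mechanism, which the abstract does not describe.

```python
from typing import List


def haar_decompose(values: List[float]) -> List[float]:
    """Unnormalized Haar transform: [overall average, details coarse-to-fine]."""
    n = len(values)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    coeffs: List[float] = []
    current = list(values)
    while len(current) > 1:
        pairs = range(0, len(current), 2)
        averages = [(current[i] + current[i + 1]) / 2 for i in pairs]
        details = [(current[i] - current[i + 1]) / 2 for i in pairs]
        coeffs = details + coeffs  # coarser levels end up in front
        current = averages
    return current + coeffs


def haar_reconstruct(coeffs: List[float]) -> List[float]:
    """Invert haar_decompose."""
    current = coeffs[:1]
    pos = 1
    while pos < len(coeffs):
        details = coeffs[pos:pos + len(current)]
        pos += len(current)
        expanded: List[float] = []
        for average, detail in zip(current, details):
            expanded.extend([average + detail, average - detail])
        current = expanded
    return current


def drop_small(coeffs: List[float], threshold: float) -> List[float]:
    """Zero every detail coefficient whose magnitude is below `threshold`."""
    return [c if i == 0 or abs(c) >= threshold else 0.0
            for i, c in enumerate(coeffs)]


histogram = [2.0, 2.0, 0.0, 2.0, 3.0, 5.0, 4.0, 4.0]  # hypothetical bucket counts
coeffs = haar_decompose(histogram)
restored = haar_reconstruct(drop_small(coeffs, threshold=0.75))
max_error = max(abs(a - b) for a, b in zip(histogram, restored))
print(coeffs, restored, max_error)
```

Raising `threshold` keeps fewer coefficients and increases the reconstruction error, which is the trade-off a user-specified error bound is meant to control.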

An Efficient Machine Learning-based Text Summarization in the Malayalam Language

  • P Haroon, Rosna;Gafur M, Abdul;Nisha U, Barakkath
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • Vol. 16, No. 6
    • /
    • pp.1778-1799
    • /
    • 2022
  • Automatic text summarization is a procedure that condenses a large body of content into a shorter text that retains the significant information. Malayalam is one of the more difficult languages spoken in parts of India, most commonly in Kerala and Lakshadweep. Natural language processing work on Malayalam is relatively scarce because of the complexity of the language and the scarcity of available resources. This paper proposes an approach to text summarization for Malayalam documents that trains a model based on the Support Vector Machine classification algorithm. Different features of the text are taken into account in training so that the system can output the most important content of the input text. The classifier assigns sentences to separate classes (most important, important, average, and least significant), and based on this classification the system creates a summary of the input document. The user can select a compression ratio, and the system outputs the corresponding fraction of the document as the summary. Model performance is measured using Malayalam documents from different genres as well as documents from the same domain. The model is evaluated with the content evaluation measures precision, recall, F score, and relative utility. The precision and recall values obtained show that the model is trustworthy and more relevant than the other summarizers compared.
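
The class-based selection with a user-chosen compression ratio described above could look roughly like the sketch below. It assumes the classifier has already assigned each sentence an importance class (0 = most important, 3 = least significant); the function name, the class encoding, and the tie-breaking rule are hypothetical and not taken from the paper.

```python
import math
from typing import List


def select_summary(sentences: List[str], classes: List[int], ratio: float) -> List[str]:
    """Keep roughly `ratio` of the sentences, preferring lower class numbers."""
    if not 0 < ratio <= 1:
        raise ValueError("ratio must be in (0, 1]")
    if len(sentences) != len(classes):
        raise ValueError("one class label is required per sentence")
    if not sentences:
        return []
    keep = max(1, math.ceil(ratio * len(sentences)))
    # Rank by importance class; ties are broken by position in the document.
    ranked = sorted(range(len(sentences)), key=lambda i: (classes[i], i))
    chosen = sorted(ranked[:keep])  # restore original document order
    return [sentences[i] for i in chosen]


# Hypothetical classifier output for a four-sentence document.
sentences = ["s1", "s2", "s3", "s4"]
classes = [2, 0, 3, 1]
print(select_summary(sentences, classes, ratio=0.5))
```

Restoring document order after ranking keeps the extracted sentences readable as a passage rather than as a ranked list.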

Text Summarization on Large-scale Vietnamese Datasets

  • Ti-Hon, Nguyen;Thanh-Nghi, Do
    • Journal of information and communication convergence engineering
    • /
    • Vol. 20, No. 4
    • /
    • pp.309-316
    • /
    • 2022
  • This investigation addresses automatic text summarization on large-scale Vietnamese datasets. Vietnamese articles were collected from newspaper websites, and plain text was extracted to build a dataset of 1,101,101 documents. Next, a new single-document extractive text summarization model was proposed and evaluated on this dataset. In this model, the k-means algorithm clusters the sentences of the input document using different text representations, such as BoW (bag-of-words), TF-IDF (term frequency - inverse document frequency), Word2Vec (word-to-vector), GloVe, and FastText. The summarization algorithm then uses the trained k-means model to rank the candidate sentences and builds a summary from the highest-ranked sentences. In the empirical results, the model achieved F1-scores of 51.91% ROUGE-1, 18.77% ROUGE-2, and 29.72% ROUGE-L, compared to 52.33% ROUGE-1, 16.17% ROUGE-2, and 33.09% ROUGE-L for a competitive abstractive model. The advantage of the proposed model is that it performs well with a time complexity of O(n,k,p) = O(n(k+2/p)) + O(n log₂ n) + O(np) + O(nk²) + O(k).
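
As a rough illustration of clustering-based extractive summarization, the sketch below clusters bag-of-words sentence vectors with a small hand-written k-means and keeps the sentence closest to each centroid. The ranking rule (nearest to centroid), the deterministic initialization, and all function names are assumptions made for this example; the abstract does not specify how the paper ranks sentences.

```python
from collections import Counter
from typing import List


def bow_vectors(sentences: List[str]) -> List[List[float]]:
    """Bag-of-words count vectors over a shared, sorted vocabulary."""
    vocab = sorted({w for s in sentences for w in s.lower().split()})
    vectors = []
    for sentence in sentences:
        counts = Counter(sentence.lower().split())
        vectors.append([float(counts[w]) for w in vocab])
    return vectors


def sq_dist(a: List[float], b: List[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors: List[List[float]], k: int, iterations: int = 10) -> List[List[float]]:
    """Plain k-means; centroids start at the first k vectors (deterministic)."""
    centroids = [list(v) for v in vectors[:k]]
    for _ in range(iterations):
        clusters: List[List[List[float]]] = [[] for _ in range(k)]
        for v in vectors:
            nearest = min(range(k), key=lambda c: sq_dist(v, centroids[c]))
            clusters[nearest].append(v)
        for c, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def summarize(sentences: List[str], k: int) -> List[str]:
    """Pick the sentence nearest each centroid, in document order."""
    if k < 1 or k > len(sentences):
        raise ValueError("k must be between 1 and the number of sentences")
    vectors = bow_vectors(sentences)
    picked = set()
    for centroid in kmeans(vectors, k):
        best = min(range(len(sentences)), key=lambda i: sq_dist(vectors[i], centroid))
        picked.add(best)
    return [sentences[i] for i in sorted(picked)]


doc = ["cat sat", "cat sat down", "dog ran", "dog ran fast"]
print(summarize(doc, k=2))
```

Swapping `bow_vectors` for TF-IDF or embedding vectors would change only the representation step, which mirrors the comparison of text representations described in the abstract.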

Multi-Sized cumulative Summary Structure Driven Light Weight in Frequent Closed Itemset Mining to Increase High Utility

  • Siva S;Shilpa Chaudhari
    • Journal of information and communication convergence engineering
    • /
    • Vol. 21, No. 2
    • /
    • pp.117-129
    • /
    • 2023
  • High-utility itemset mining (HUIM) has emerged as a key data-mining paradigm for object-of-interest identification and recommendation systems that serve as frequent itemset identification tools, product or service recommendation systems, etc. Recently, it has gained widespread attention owing to its increasing role in business intelligence, top-N recommendation, and other enterprise solutions. Despite this increasing significance, most available solutions, including frequent itemset mining, HUIM, and high average- and fast high-utility itemset mining, cannot provide swift and accurate predictions and are therefore limited in coping with real-time enterprise demands. Moreover, complex computations and high memory exhaustion limit their scalability as enterprise solutions. To address these limitations, this study proposes a model that extracts high-utility frequent closed itemsets based on an improved cumulative summary list structure (CSLFC-HUIM) to reduce the set of candidate items in the search space. It also employs the lift score as the minimum threshold, called the cumulative utility threshold, to prune the search space to an optimal set of itemsets in a nested-list structure, which improves computational time, cost, and memory consumption. Simulations over different datasets revealed that the proposed CSLFC-HUIM model outperforms other existing methods, such as closed- and frequent closed-HUIM variants, in terms of execution time and memory consumption, making it suitable for different mined items and allied business intelligence goals.
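
Two quantities named in the abstract above can be sketched with their textbook definitions: the utility of an itemset (quantity times unit profit, summed over the transactions containing it) and the lift of a rule, support(A and B) / (support(A) x support(B)). This is not the CSLFC-HUIM algorithm itself; the transactions, profits, and the pruning threshold value are hypothetical.

```python
from typing import Dict, List, Set

Transaction = Dict[str, int]  # item -> purchased quantity


def itemset_utility(transactions: List[Transaction],
                    profits: Dict[str, float],
                    itemset: Set[str]) -> float:
    """Total utility of `itemset` over all transactions that contain it."""
    total = 0.0
    for t in transactions:
        if itemset <= t.keys():
            total += sum(t[item] * profits[item] for item in itemset)
    return total


def support(transactions: List[Transaction], itemset: Set[str]) -> float:
    return sum(1 for t in transactions if itemset <= t.keys()) / len(transactions)


def lift(transactions: List[Transaction], a: Set[str], b: Set[str]) -> float:
    """Lift of a -> b; values above 1 indicate positive association."""
    sa, sb = support(transactions, a), support(transactions, b)
    if sa == 0 or sb == 0:
        return 0.0
    return support(transactions, a | b) / (sa * sb)


# Hypothetical data for illustration only.
transactions = [
    {"bread": 2, "butter": 1},
    {"bread": 1, "butter": 2},
    {"bread": 3},
    {"milk": 1},
]
profits = {"bread": 1.0, "butter": 3.0, "milk": 2.0}

pair_utility = itemset_utility(transactions, profits, {"bread", "butter"})
pair_lift = lift(transactions, {"bread"}, {"butter"})
keep_candidate = pair_lift >= 1.0  # assumed pruning threshold
print(pair_utility, round(pair_lift, 3), keep_candidate)
```

A lift-based threshold of this kind discards candidate itemsets whose items co-occur no more often than chance, which is one plausible reading of how the search space is pruned.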

Validity Verification of a Korean Version of Recovery Scale (Client Assessment Summary) for Alcoholics

  • 이영선;김수연
    • Journal of the Korea Academia-Industrial cooperation Society
    • /
    • Vol. 17, No. 11
    • /
    • pp.386-394
    • /
    • 2016
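  • This study aims to verify the validity of the Korean version of the CAS (Client Assessment Summary), an instrument that measures the recovery of alcoholics residing in therapeutic communities, and to assess whether it is acceptable as a recovery scale for alcoholics in general. Data from 205 abstinent alcoholics were analyzed. The analysis covered the content validity and reliability of the CAS scale, construct validity through factor analysis, and criterion validity through its relationships with another recovery scale (ARS), length of abstinence, abstinence self-efficacy, insight, and motivation to change. In the factor analysis conducted for construct validity after content validity was verified, the CAS scale comprised 12 items in 4 factors, with a total explained variance of 76.26%, communalities of 0.6 or higher, and a KMO value of 0.92, confirming construct validity. The internal consistency coefficient was .92, confirming reliability, and criterion validity was confirmed through correlations with the ARS, length of abstinence, abstinence self-efficacy, insight, and motivation to change. This verification process confirmed that the CAS scale is valid for use not only in therapeutic communities but also with alcoholics in general. We expect the scale to be used academically and clinically to evaluate the recovery of alcoholics and ultimately to contribute to their recovery.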
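
An internal consistency coefficient of this kind is commonly computed as Cronbach's alpha; the abstract does not name the statistic, so that choice is an assumption here. The sketch below uses the standard formula alpha = k/(k-1) x (1 - sum of item variances / variance of total scores) on hypothetical responses, not the study's data.

```python
from statistics import variance
from typing import List


def cronbach_alpha(responses: List[List[float]]) -> float:
    """Cronbach's alpha; `responses` holds one row of item scores per respondent."""
    n_items = len(responses[0])
    if n_items < 2 or len(responses) < 2:
        raise ValueError("need at least two items and two respondents")
    item_columns = list(zip(*responses))
    item_variance_sum = sum(variance(col) for col in item_columns)
    total_variance = variance([sum(row) for row in responses])
    if total_variance == 0:
        raise ValueError("total scores have zero variance")
    return n_items / (n_items - 1) * (1 - item_variance_sum / total_variance)


# Hypothetical 3-item scale answered identically across items by 4 respondents.
responses = [
    [1, 1, 1],
    [2, 2, 2],
    [3, 3, 3],
    [4, 4, 4],
]
alpha = cronbach_alpha(responses)
print(f"alpha = {alpha:.2f}")
```

Perfectly consistent items, as in this toy data, yield an alpha of 1; real scales such as the one reported above fall below that.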

Characteristics of Scientific Method for the 8th Grade Students' Inquiry Reports

  • 신미영;최승언
    • Journal of the Korean Earth Science Society
    • /
    • Vol. 29, No. 4
    • /
    • pp.341-351
    • /
    • 2008
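  • The purpose of this study is to investigate the characteristics of the scientific method presented in 8th grade students' inquiry reports. Drawing on a literature review and considering the nature of science, we developed an analytical framework called 'Analysis of Scientific Method and Information Sources' and used it to analyze students' 'method design', 'data analysis', and 'information sources'. We then compared the results with question level to examine whether the 'scientific method' is affected by question level. We also analyzed responses to a questionnaire administered to identify the difficulties students experience when designing a 'scientific method' during inquiry activities. The results were as follows. First, 'method design' comprises consultation and activity, where activity refers to experiments, correlational studies, and observation. Students most often designed by 'consultation'. When an activity was designed, most students designed an 'experiment'. Second, 'data analysis' includes summaries, tables, charts, and graphs, and students most often analyzed their data in 'summary' form. 'Summary' was divided into 'simple summary' and 'relationship statement'. Third, 'information sources' include computers, libraries, and expert consultation, and most students obtained information from 'computers'. Fourth, students' 'method design' and 'summary' were found to be affected by question level. Fifth, some students reported that 'method design' was difficult because the information was insufficient or inaccurate and the technical terms in it were hard to understand.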