Study on the Difference in Intake Rate by Kidney in Accordance with whether the Bladder is Shielded and Injection method in 99mTc-DMSA Renal Scan for Infants (소아 99mTc-DMSA renal scan에서 방광차폐유무와 방사성동위원소 주입방법에 따른 콩팥섭취율 차이에 관한 연구)

  • Park, Jeong Kyun;Cha, Jae Hoon;Kim, Kwang Hyun;An, Jong Ki;Hong, Da Young;Seong, Hyo Jin
    • The Korean Journal of Nuclear Medicine Technology / v.20 no.2 / pp.27-31 / 2016
  • Purpose: A 99mTc-DMSA renal scan compares left and right kidney function by imaging the renal parenchyma through the cortex and computing the relative uptake of the radiotracer by each kidney. Because the kidneys and the bladder are not far apart in an infant's body, the bladder is included in the imaging field. This study was conducted on the presumption that bladder counts influence the kidney counts measured during the scan. Given that only a trace amount of radioisotope (RI) is injected in pediatric examinations, the injection method was studied concurrently. Materials and Methods: In 34 infants aged 1 to 12 months undergoing a 99mTc-DMSA renal scan, a posterior image was acquired at the scheduled time after injection of an identical 0.5 mCi dose of DMSA, and an additional image was acquired with the bladder shielded by a circular lead plate. For comparison, identically sized ROIs (55.2 mm × 70.0 mm) were drawn and the percentage (Lt. kidney counts + Rt. kidney counts)/total counts was calculated. The RI was injected by one of three methods: a 3-way stopcock, a heparin cap, or direct injection into the patient. For the 3-way stopcock and heparin cap, an additional 2 cc of saline was flushed, and the count changes under each method were compared. Results: Images without bladder shielding showed a kidney uptake rate of 70.9±3.18%, while images with the bladder shielded showed 79.4±5.19%, a difference of approximately 6.5~8.5%. By injection method, the 3-way stopcock showed 68.9±2.80% before shielding and 78.1±5.14% after; the heparin cap showed 71.3±5.14% before and 79.8±3.26% after; and direct injection showed 75.1±4.30% before and 82.1±2.35% after. Kidney uptake rates thus ranked, from highest to lowest: direct injection, heparin cap, and 3-way stopcock. Conclusion: Because a far smaller quantity of radiopharmaceutical is injected in infants than in adults, removing the bladder's radiation by shielding it yielded higher measured kidney uptake rates than leaving it unshielded. Although securing a blood vessel can be difficult, direct injection appears the most helpful for acquiring better images, since it showed a higher kidney uptake rate than the other methods.
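
As a rough illustration of the uptake calculation described above, the sketch below computes the (Lt. + Rt. kidney counts)/total counts percentage from ROI counts; all count values are hypothetical, not taken from the study.

```python
# Hypothetical sketch of the relative-uptake calculation: ROI counts for each
# kidney divided by the total image counts, expressed as a percentage.

def relative_kidney_uptake(lt_counts: float, rt_counts: float, total_counts: float) -> float:
    """Return (Lt. + Rt. kidney counts) / total counts as a percentage."""
    return 100.0 * (lt_counts + rt_counts) / total_counts

# Illustrative counts: shielding the bladder removes its counts from the total.
unshielded = relative_kidney_uptake(lt_counts=42_000, rt_counts=38_000, total_counts=113_000)
shielded = relative_kidney_uptake(lt_counts=42_000, rt_counts=38_000, total_counts=101_000)
print(f"uptake before shielding: {unshielded:.1f}%")
print(f"uptake after shielding:  {shielded:.1f}%")
```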

A Study on the Determination of Scan Speed in Whole Body Bone Scan Applying Oncoflash (Oncoflash를 적용한 전신 뼈 영상 검사의 스캔 속도 결정에 관한 연구)

  • Yang, Gwang-Gil;Jung, Woo-Young
    • The Korean Journal of Nuclear Medicine Technology / v.13 no.3 / pp.56-60 / 2009
  • Purpose: In nuclear medicine, various studies and program-development efforts aim to reduce scan time. Oncoflash is one such program for whole-body bone scans, designed to maintain image quality while shortening the scan. When such applications are used clinically, both image quality and the reduction in scan time must be considered, so the purpose of this study was to determine criteria for a proper scan speed. Materials and Methods: The subjects were patients who underwent whole-body bone scans at the Department of Nuclear Medicine, Asan Medical Center, Seoul, from July 1 to 10, 2008. Whole-body bone images obtained at a scan speed of 30 cm/min were classified by total counts into under 800 K and over 800 K, 900 K, 1,000 K, 1,500 K, and 2,000 K. Image quality was assessed qualitatively, and the percentage of images with total counts of 1,000 K or less was calculated. To compare resolution by total counts with and without Oncoflash, FWHM was analyzed before and after applying Oncoflash using images of a 99mTc flood source and a 4-quadrant bar phantom; considering the counts of a whole-body bone scan, doses of 2~5 mCi were used. From August 7 to 26, 2008, Patient Positioning Monitor (PPM) counts were measured in 152 patients over a region including the head and the part of the chest where the whole-body scan starts, and their correlation with the total counts obtained at 30 cm/min was analyzed (patients imaged more than six hours after isotope administration or given low doses were excluded). Results: Among 329 patients scanned at 30 cm/min, 17.6% (n=58) had whole-body bone images whose geometric mean of total counts was under 1,000 K. Qualitative analysis of the image groups by whole-body counts rated the images under 1,000 K as having coarse grain and increased noise. The FWHM analysis showed that for PPM counts under 3.6 K, FWHM values were higher after applying Oncoflash than before, whereas for PPM counts of 3.6 K or more they were not. The mean total counts for PPM counts of 2.5~3.0 K, 3.1~3.5 K, 3.6~4.0 K, 4.1~4.5 K, 4.6~5.0 K, 5.1~6.0 K, 6.1~7.0 K, and over 7.1 K were 965±173 K, 1,084±154 K, 1,242±186 K, 1,359±170 K, 1,405±184 K, 1,640±376 K, 1,771±324 K, and 1,972±385 K, respectively, and PPM counts correlated strongly with the total counts obtained at 30 cm/min (r=.775, p<.01). Conclusions: For PPM counts over 3.6 K, the image quality obtained at 30 cm/min with Oncoflash applied was similar to that obtained at 15 cm/min; with total counts over 1,000 K, scan time can be expected to be reduced without loss of image quality. With total counts under 1,000 K, however, image quality degraded even with Oncoflash applied, so re-imaging at 15 cm/min is recommended.
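
The resulting decision rule can be sketched as below; the 3.6 K PPM cutoff comes from the FWHM comparison above, while the function name and example values are illustrative.

```python
# Sketch of the scan-speed decision rule suggested by the study's results.
# The 3.6 K PPM cutoff is from the abstract; everything else is illustrative.

PPM_CUTOFF_KCOUNTS = 3.6  # below this, Oncoflash degraded FWHM at 30 cm/min

def choose_scan_speed(ppm_kcounts: float) -> str:
    """Recommend a whole-body scan speed from the pre-scan PPM count (kcounts)."""
    if ppm_kcounts >= PPM_CUTOFF_KCOUNTS:
        return "30 cm/min + Oncoflash"   # expected total counts around 1,000 K or more
    return "15 cm/min (conventional)"    # quality loss expected even with Oncoflash

for ppm in (2.8, 3.6, 5.2):
    print(f"PPM {ppm:.1f} K -> {choose_scan_speed(ppm)}")
```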

SNR and ADC Changes at Increasing b Values among Patients with Lumbar Vertebral Compression Fracture on 1.5T MR Diffusion Weighted Images (1.5T MR 기기를 이용한 확산강조영상에서 b Value의 증가에 따른 요추압박골절 환자의 신호대 잡음비와 현성 확산 계수의 변화)

  • Cho, Jae-Hwan;Park, Cheol-Soo;Lee, Sun-Yeob;Kim, Bo-Hui
    • Progress in Medical Physics / v.21 no.1 / pp.52-59 / 2010
  • Purpose: To examine, in patients with lumbar vertebral compression fracture, how the signal-to-noise ratio (SNR) and apparent diffusion coefficient (ADC) at the fracture site vary with the b value on diffusion-weighted images from a 1.5 T MR scanner. Diffusion-weighted MR images of 30 patients with compression fractures due to chronic osteoporosis who underwent vertebral MRI from January 2008 to November 2009 were obtained on a 1.5 T MR scanner with the b value increased from 400, 600, 800, and 1,000 to 1,200 s/mm². On the diffusion-weighted images at each b value, SNR was assessed at three sites: the compression-fracture site of the lumbar vertebral body (L1 to L5) and the discs immediately above and below it; on the ADC maps at each b value, SNR and ADC were assessed at the same three sites. For the quantitative analysis, the diffusion-weighted and ADC map images at the base b value of 400 s/mm² were compared with the corresponding images at each higher b value. For the qualitative analysis, the change in signal intensity at the fracture site with increasing b value was examined on both image types. The quantitative analysis found that, as the b value increased, SNR decreased at all three sites on both the diffusion-weighted and ADC map images relative to the base b value, and the ADC values on the ADC maps likewise decreased at all three sites. The qualitative analysis found that as the b value rose above 400 s/mm² the signal intensity gradually decreased at all sites, that above 1,000 s/mm² severe image noise appeared at all three sites, and that signal intensity was higher at the fracture site than at the discs. These findings show that with increasing b value both SNR and ADC gradually decreased at the lumbar compression-fracture site and the adjacent discs, suggesting that multi-b-value diffusion-weighted MRI could be applied more widely to the assessment of various vertebral pathologies.
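
For context, ADC values in such studies derive from the standard mono-exponential diffusion model S(b) = S0·exp(-b·ADC); a minimal two-point estimate, using the study's base b value of 400 s/mm² and hypothetical signal values, might look like this:

```python
import math

# Two-point ADC estimate from the mono-exponential DWI model
# S(b) = S0 * exp(-b * ADC), so ADC = ln(S_low / S_high) / (b_high - b_low).
# The ROI signal values below are hypothetical.

def adc_two_point(s_low: float, s_high: float, b_low: float, b_high: float) -> float:
    """ADC in mm^2/s from signals measured at two b values (s/mm^2)."""
    return math.log(s_low / s_high) / (b_high - b_low)

s_400, s_1000 = 820.0, 510.0             # hypothetical ROI signal means
adc = adc_two_point(s_400, s_1000, 400.0, 1000.0)
print(f"ADC ≈ {adc:.2e} mm^2/s")         # ~7.9e-04 mm^2/s for these numbers
```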

1H MR Spectroscopy of the Normal Human Brains: Comparison between Signa and Echospeed 1.5 T System (정상 뇌의 수소 자기공명분광 소견: 1.5 T Signa와 Echospeed 자기공명영상기기에서의 비교)

  • Kang Young Hye;Lee Yoon Mi;Park Sun Won;Suh Chang Hae;Lim Myung Kwan
    • Investigative Magnetic Resonance Imaging / v.8 no.2 / pp.79-85 / 2004
  • Purpose: To evaluate the usefulness and reproducibility of 1H MRS on different 1.5 T MR machines with different coils by comparing the SNR, scan time, and spectral patterns in different brain regions of normal volunteers. Materials and Methods: Localized 1H MR spectroscopy (1H MRS) was performed in 10 normal volunteers (age 20-45 years), with spectral parameters adjusted by the automated prescan routine (PROBE package). In each volunteer, MRS was performed three times: on a conventional system (Signa Horizon) with a 1-channel coil and on an upgraded system (Echospeed plus with EXCITE) with both 1-channel and 8-channel coils. Across these three machine-coil combinations, the SNRs of spectra from both a phantom and the volunteers and the (pre)scan times were compared. Two regions of the brain (basal ganglia and deep white matter) were examined, and relative metabolite ratios (NAA/Cr, Cho/Cr, and mI/Cr) were measured in all volunteers. All spectra used a STEAM localization sequence with three-pulse CHESS H2O suppression and the following acquisition parameters: TR=3.0/2.0 s, TE=30 ms, TM=13.7 ms, SW=2500 Hz, SI=2048 pts, AVG=64/128, and NEX=2/8 (Signa/Echospeed). Results: The SNR was over 30% higher on the Echospeed machine, and prescan and scan times were almost the same across machines and coils. Reliable spectra were obtained on both MRS systems, with no significant differences in spectral patterns or relative metabolite ratios in the two brain regions (p>0.05). Conclusion: Both the conventional and the new MRI system are highly reliable and reproducible for 1H MR spectroscopic examination of the human brain, and there are no significant differences in 1H MRS applications between the two systems.
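
A minimal sketch of the metabolite-ratio comparison above, with hypothetical fitted peak areas (arbitrary units) standing in for real spectra:

```python
# Metabolite ratios normalized to creatine, as compared between the two
# scanner configurations. All peak areas below are hypothetical.

signa = {"NAA": 8.2, "Cho": 1.7, "mI": 1.1, "Cr": 2.0}        # 1-channel coil
echospeed = {"NAA": 10.9, "Cho": 2.3, "mI": 1.4, "Cr": 2.6}   # higher SNR overall

def ratios_to_cr(peaks: dict) -> dict:
    """Return NAA/Cr, Cho/Cr and mI/Cr ratios from fitted peak areas."""
    return {f"{m}/Cr": round(peaks[m] / peaks["Cr"], 2) for m in ("NAA", "Cho", "mI")}

print("Signa:    ", ratios_to_cr(signa))
print("Echospeed:", ratios_to_cr(echospeed))  # similar ratios despite higher SNR
```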

Development of Predictive Models for Rights Issues Using Financial Analysis Indices and Decision Tree Technique (경영분석지표와 의사결정나무기법을 이용한 유상증자 예측모형 개발)

  • Kim, Myeong-Kyun;Cho, Yoonho
    • Journal of Intelligence and Information Systems / v.18 no.4 / pp.59-77 / 2012
  • This study focuses on predicting which firms will increase capital by issuing new stocks in the near future. Many stakeholders, including banks, credit rating agencies, and investors, perform a variety of analyses of firms' growth, profitability, stability, activity, productivity, and so on, and regularly report the firms' financial analysis indices. In this paper, we develop predictive models for rights issues using these financial analysis indices and data mining techniques. We build the predictive models from two analytical perspectives. The first is the analysis period: we divide the period into before and after the IMF financial crisis and examine whether the two periods differ. The second is the prediction horizon: to predict when firms will increase capital by issuing new stocks, the prediction time is categorized as one, two, or three years later. In total, therefore, six prediction models are developed and analyzed. We employ the decision tree technique to build the prediction models for rights issues. The decision tree is the most widely used prediction method; it builds trees that label or categorize cases into a set of known classes. In contrast to neural networks, logistic regression, and SVM, decision tree techniques are well suited to high-dimensional applications and have strong explanatory power. Well-known decision tree induction algorithms include CHAID, CART, QUEST, and C5.0. Among them we use the C5.0 algorithm, the most recently developed, which yields better performance than the others. We obtained the rights-issue and financial-analysis data from TS2000 of the Korea Listed Companies Association. Each financial-analysis record consists of 89 variables, including 9 growth indices, 30 profitability indices, 23 stability indices, 6 activity indices, and 8 productivity indices. For model building and testing, we used 10,925 financial-analysis records covering 658 listed firms. PASW Modeler 13 was used to build C5.0 decision trees for the six prediction models. Eighty-four variables from the financial-analysis data were selected as input variables for each model, and the rights-issue status (issued or not issued) was defined as the output variable. To develop the prediction models using the C5.0 node (node options: Output type = Rule set, Use boosting = false, Cross-validate = false, Mode = Simple, Favor = Generality), we used 60% of the data for model building and 40% for testing. The experimental results show that the prediction accuracies for the period after the IMF financial crisis (68.78% to 71.41%) are about 10 percentage points higher than those for the period before it (59.04% to 60.43%), indicating that since the crisis the reliability of financial analysis indices has increased and firms' intentions regarding rights issues have become more evident. The experiments also show that stability-related indices have a major impact on conducting a rights issue in short-term prediction, whereas long-term prediction of a rights issue is affected by indices of profitability, stability, activity, and productivity. All the prediction models include the industry code as a significant variable, meaning that companies in different industries show different patterns of rights issues. We conclude that stakeholders should consider stability-related indices for short-term prediction and a broader range of financial analysis indices for long-term prediction. The current study has several limitations. First, we need to compare differences in accuracy using other data mining techniques such as neural networks, logistic regression, and SVM. Second, we should develop and evaluate new prediction models that include variables which research on capital-structure theory has identified as relevant to rights issues.
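
As a rough sketch of the modeling setup, the example below uses scikit-learn's CART-style DecisionTreeClassifier as a stand-in for C5.0 (which the study ran in PASW Modeler and which has no standard Python implementation); the data are synthetic, with 84 input features and a 60/40 build/test split as in the study.

```python
# Decision-tree classification of rights-issue status (issued or not) from
# financial-analysis indices. Synthetic data; CART stands in for C5.0.

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 84))            # synthetic stand-in for the 84 indices
y = rng.integers(0, 2, size=1000)          # rights issue: issued (1) or not (0)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.6, random_state=0)  # 60% build / 40% test, as in the study

tree = DecisionTreeClassifier(max_depth=5, random_state=0).fit(X_train, y_train)
print(f"test accuracy: {accuracy_score(y_test, tree.predict(X_test)):.3f}")
```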

The effects of temperature on the development of the oriental tobacco budmoth, Heliothis assulta Guenée, and the control effects of Thuricide HP® (고추담배나방의 생태 및 방제에 관한 연구 -온도가 담배나방의 생육에 미치는 영향 및 Thuricide HP®의 방제 효과-)

  • Chung C. S.;Hyun J. S.
    • Korean journal of applied entomology / v.19 no.1 s.42 / pp.57-65 / 1980
  • The oriental tobacco budmoth, Heliothis assulta Guenée, was reared at various temperatures (20°C, 25°C, and 30°C), and the control effects of Thuricide HP® were examined. The results obtained were as follows: 1. The adult longevity of the oriental tobacco budmoth was 11.35 days: 3.00 days for the preovipositional period, 4.75 days for the ovipositional period, and 3.50 days for the postovipositional period. 2. The total number of eggs laid per female was 307 at 20°C, 413 at 25°C, and 189 at 30°C; the number of eggs per female per day was 64.05 on average. 3. The average egg periods were 7.71 days at 20°C, 4.12 days at 25°C, and 3.58 days at 30°C, and the hatchabilities were 71.25%, 78.49%, and 81.05% at the respective incubation temperatures. 4. The larval developmental periods were 43.51 days at 20°C, 21.79 days at 25°C, and 18.05 days at 30°C, and the mortalities were 80.70%, 95.93%, and 87.01% at the respective temperatures. 5. The pupal developmental periods were 24.22 days at 20°C, 12.36 days at 25°C, and 11.50 days at 30°C, and the mortalities at the respective temperatures were 18.18%, 42.11%, and 40.00%. 6. The calculated threshold temperatures for development were 11.61°C for the eggs, 11.96°C for the larvae, and 10.06°C for the pupae. The estimated total effective temperatures were 60.41 day-degrees for the eggs, 319.35 day-degrees for the larvae, and 222.66 day-degrees for the pupae; the overall total effective temperature would range from 640 to 660 day-degrees if the reproductive period of the adult is included. 7. The relationship between the overall developmental period and the rearing temperature could be expressed as Y = -4.272X + 155.39 (r = 0.9105), where Y is the number of days required to complete the life cycle and X is the rearing temperature. 8. The control effects of Thuricide HP® were 73.43% for spray and 58.22% for bait applications.
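
The threshold temperatures and thermal constants in result 6 define a standard degree-day model of development; a minimal sketch using the abstract's figures (the 25°C example temperature is illustrative):

```python
# Degree-day (effective temperature) model: each stage requires a fixed
# thermal constant K (day-degrees) above its developmental threshold T0,
# so days to complete a stage ≈ K / (T - T0). Values are from the abstract.

STAGES = {  # stage: (threshold T0 in °C, thermal constant K in day-degrees)
    "egg":   (11.61, 60.41),
    "larva": (11.96, 319.35),
    "pupa":  (10.06, 222.66),
}

def days_to_complete(stage: str, temp_c: float) -> float:
    """Predicted days to complete a stage at a constant rearing temperature."""
    t0, k = STAGES[stage]
    if temp_c <= t0:
        raise ValueError("no development below the threshold temperature")
    return k / (temp_c - t0)

for stage in STAGES:
    print(f"{stage}: {days_to_complete(stage, 25.0):.1f} days at 25°C")
```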

A Hybrid Recommender System based on Collaborative Filtering with Selective Use of Overall and Multicriteria Ratings (종합 평점과 다기준 평점을 선택적으로 활용하는 협업필터링 기반 하이브리드 추천 시스템)

  • Ku, Min Jung;Ahn, Hyunchul
    • Journal of Intelligence and Information Systems / v.24 no.2 / pp.85-109 / 2018
  • A recommender system recommends the items a customer is expected to purchase in the future based on his or her previous purchase behavior, and it has served as a tool for realizing one-to-one personalization for e-commerce companies. Traditional recommender systems, especially those based on collaborative filtering (CF), the most popular recommendation algorithm in both academia and industry, generate the recommendation list using a single criterion: the 'overall rating'. This has critical limitations for understanding customers' preferences in detail. Recently, to mitigate these limitations, some leading e-commerce companies have begun to collect feedback from their customers in the form of 'multicriteria ratings'. Multicriteria ratings enable companies to understand their customers' preferences from multiple dimensions, and they are easy to handle and analyze because they are quantitative. However, recommendation using multicriteria ratings also has a limitation: it may omit detailed information on a user's preference, because in most cases it considers only three to five predetermined criteria. Against this background, this study proposes a novel hybrid recommender system that selectively uses the results from traditional CF and from CF with multicriteria ratings. Our proposed system is based on the premise that some people have a holistic preference scheme, whereas others have a composite preference scheme; it is therefore designed to apply traditional CF with overall ratings to users with holistic preferences and CF with multicriteria ratings to users with composite preferences. To validate the usefulness of the proposed system, we applied it to a real-world dataset on POI (point-of-interest) recommendation. Personalized POI recommendation is attracting more attention as location-based services such as Yelp and Foursquare grow in popularity. The dataset was collected from university students via a Web-based online survey system, through which we gathered overall ratings as well as ratings on each criterion for 48 POIs located near K university in Seoul, South Korea. The criteria were 'food or taste', 'price', and 'service or mood'. As a result, we obtained 2,878 valid ratings from 112 users. Among the 48 items, 38 (80%) were used as the training dataset and the remaining 10 (20%) as the validation dataset. To examine the effectiveness of the proposed system (the hybrid selective model), we compared its performance against two comparison models: traditional CF and CF with multicriteria ratings. Performance was evaluated with two metrics: average MAE (mean absolute error) and precision-in-top-N, the percentage of truly high overall ratings among the N items the model predicted to be most relevant for each user. The experimental system was developed using Microsoft Visual Basic for Applications (VBA). The results showed that our proposed system (avg. MAE = 0.584) outperformed both traditional CF (avg. MAE = 0.591) and multicriteria CF (avg. MAE = 0.608). We also found that multicriteria CF performed worse than traditional CF on our dataset, contradicting most previous studies. This result supports the premise of our study that people have two different types of preference schemes, holistic and composite. Besides MAE, the proposed system outperformed all comparison models in precision-in-top-3, precision-in-top-5, and precision-in-top-7. Paired-samples t-tests showed that, in terms of average MAE, our proposed system outperformed traditional CF at the 10% statistical significance level and multicriteria CF at the 1% level. The proposed system sheds light on how to understand and utilize users' preference schemes in the recommender systems domain.
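
A minimal sketch of the selective-hybrid idea, with stub predictions standing in for the two CF variants: each user is routed to whichever predictor (overall-rating CF or multicriteria CF) fits that user's held-out ratings better. All values are hypothetical; the paper's actual routing criterion may differ.

```python
import numpy as np

# Route each user to the CF variant with the lower held-out MAE, on the
# premise that some users rate holistically and others compositely.

def mae(pred: np.ndarray, true: np.ndarray) -> float:
    return float(np.mean(np.abs(pred - true)))

def route_users(pred_overall: dict, pred_multi: dict, held_out: dict) -> dict:
    """Map each user id to the better-fitting CF variant and its predictions."""
    routed = {}
    for user, true in held_out.items():
        use_overall = mae(pred_overall[user], true) <= mae(pred_multi[user], true)
        routed[user] = ("overall CF", pred_overall[user]) if use_overall \
                       else ("multicriteria CF", pred_multi[user])
    return routed

# Hypothetical held-out ratings and predictions for two users.
held_out     = {"u1": np.array([4.0, 3.0]), "u2": np.array([5.0, 2.0])}
pred_overall = {"u1": np.array([3.9, 3.1]), "u2": np.array([3.9, 3.2])}
pred_multi   = {"u1": np.array([3.0, 4.1]), "u2": np.array([4.8, 2.2])}

for user, (variant, pred) in route_users(pred_overall, pred_multi, held_out).items():
    print(user, "->", variant, pred)   # u1 -> overall CF, u2 -> multicriteria CF
```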

Customer Behavior Prediction of Binary Classification Model Using Unstructured Information and Convolution Neural Network: The Case of Online Storefront (비정형 정보와 CNN 기법을 활용한 이진 분류 모델의 고객 행태 예측: 전자상거래 사례를 중심으로)

  • Kim, Seungsoo;Kim, Jongwoo
    • Journal of Intelligence and Information Systems / v.24 no.2 / pp.221-241 / 2018
  • Deep learning has been getting attention recently. The deep learning technique applied in the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) and in AlphaGo is the convolutional neural network (CNN). A CNN is characterized by dividing the input image into small sections, recognizing their partial features, and combining them to recognize the whole. Deep learning technologies are expected to bring many changes to our lives, but until now their applications have been limited to image recognition and natural language processing; using deep learning techniques for business problems is still at an early research stage. If their performance is proven, they can be applied to traditional business problems such as marketing response prediction, fraud transaction detection, bankruptcy prediction, and so on. It is therefore a meaningful experiment to assess the possibility of solving business problems with deep learning, using the case of online shopping companies, which hold big data, can identify customer behavior relatively easily, and stand to gain high value from it. In online shopping in particular, the competitive environment is changing rapidly and becoming more intense, so analyzing customer behavior to maximize profit is increasingly important. In this study, we propose a 'CNN model of heterogeneous information integration' as a way to improve the prediction of customer behavior in online shopping enterprises. The model learns from both structured and unstructured information through a convolutional neural network feeding a multi-layer perceptron; to optimize performance, we design and evaluate three architectural components (heterogeneous information integration, unstructured-information vector conversion, and multi-layer perceptron design) and confirm the proposed model based on the results. The target variables for predicting customer behavior are defined as six binary classification problems: re-purchaser, churn, frequent shopper, frequent refund shopper, high-amount shopper, and high-discount shopper. To verify the usefulness of the proposed model, we conducted experiments using actual data from a specific online shopping company in Korea: its transactions, customer profiles, and VOC (voice of customer) data. The extraction criteria covered 47,947 customers who registered at least one VOC in January 2011 (one month); we used their customer profiles, 19 months of transaction data from September 2010 to March 2012, and the VOCs posted during that month. The experiment proceeded in two stages. In the first stage, we evaluated the three architectural components that affect the performance of the proposed model and selected optimal parameters; in the second, we evaluated the performance of the proposed model. The results show that the proposed model, which combines structured and unstructured information, is superior to NBC (naïve Bayes classification), SVM (support vector machine), and ANN (artificial neural network). It is thus significant that the use of unstructured information contributes to predicting customer behavior, and that CNNs can be applied to business problems as well as to image recognition and natural language processing. The experiments confirm that a CNN is effective at understanding and interpreting contextual meaning in textual VOC data, and this empirical study on actual e-commerce data shows that very meaningful information for predicting customer behavior can be extracted from VOCs written in free text directly by customers. Finally, the various experiments provide useful information for future research on parameter selection and model performance.
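
A minimal PyTorch sketch of the heterogeneous-integration idea (not the paper's exact architecture): a 1-D convolution branch reads tokenized VOC text, a dense branch takes the structured features directly, and their concatenation feeds a multi-layer perceptron producing one binary logit, e.g. for churn. All sizes are illustrative.

```python
import torch
import torch.nn as nn

class HybridCNN(nn.Module):
    """Text CNN branch + structured features -> MLP -> binary logit."""
    def __init__(self, vocab_size=5000, emb_dim=64, n_structured=30):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.conv = nn.Conv1d(emb_dim, 32, kernel_size=3, padding=1)
        self.pool = nn.AdaptiveMaxPool1d(1)            # one feature per filter
        self.mlp = nn.Sequential(
            nn.Linear(32 + n_structured, 64), nn.ReLU(),
            nn.Linear(64, 1))                          # logit for one binary target

    def forward(self, token_ids, structured):
        x = self.emb(token_ids).transpose(1, 2)        # (batch, emb_dim, seq_len)
        x = self.pool(torch.relu(self.conv(x))).squeeze(-1)   # (batch, 32)
        return self.mlp(torch.cat([x, structured], dim=1))

model = HybridCNN()
logits = model(torch.randint(0, 5000, (8, 120)),       # 8 VOC texts of 120 tokens
               torch.randn(8, 30))                     # 8 structured feature rows
print(logits.shape)                                    # torch.Size([8, 1])
```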

Automatic Quality Evaluation with Completeness and Succinctness for Text Summarization (완전성과 간결성을 고려한 텍스트 요약 품질의 자동 평가 기법)

  • Ko, Eunjung;Kim, Namgyu
    • Journal of Intelligence and Information Systems / v.24 no.2 / pp.125-148 / 2018
  • Recently, as demand for big data analysis increases, cases of analyzing unstructured data and using the results are also increasing. Among the various types of unstructured data, text is used as a means of communicating information in almost all fields, and it interests many analysts because the amount of data is very large and it is relatively easy to collect compared with other unstructured and structured data. Among the various text analysis applications, active research topics include document classification, which assigns documents to predetermined categories; topic modeling, which extracts major topics from a large number of documents; sentiment analysis or opinion mining, which identifies the emotions or opinions contained in texts; and text summarization, which condenses the main content of one or several documents. Text summarization in particular is actively applied in business through news summary services, privacy policy summary services, etc. In academia, much research has followed either the extraction approach, which selectively presents the main elements of a document, or the abstraction approach, which extracts the elements of a document and composes new sentences by combining them. However, techniques for evaluating the quality of automatically summarized documents have not progressed nearly as far as automatic summarization itself. Most existing studies of summarization quality evaluation manually summarized documents, used these as reference documents, and measured the similarity between the automatic summary and the reference. Specifically, an automatic summary is produced from the full text by some technique, and its quality is measured by comparison with the reference document, taken to be an ideal summary. Reference documents are provided in two main ways; the most common is manual summarization, in which a person writes the ideal summary by hand. Because this requires human intervention, it is time-consuming and costly, and the evaluation result may differ depending on who writes the summary. To overcome these limitations, attempts have been made to measure summary quality without human intervention. A representative recent attempt reduces the size of the full text and measures the similarity between the reduced text and the automatic summary: the more often the frequent terms of the full text appear in the summary, the better the summary is judged to be. However, since summarization essentially means condensing a large amount of content while minimizing omissions, a summary judged 'good' by frequency alone is not necessarily a good summary in this essential sense. To overcome the limitations of these previous evaluation studies, this study proposes an automatic quality evaluation method for text summarization based on the essential meaning of summarization. Specifically, succinctness is defined as an element indicating how little content is duplicated among the sentences of the summary, and completeness as an element indicating how little of the source content is missing from the summary. Based on these two concepts, we propose a method for the automatic quality evaluation of text summaries. To evaluate the practical applicability of the proposed methodology, we extracted 29,671 sentences from TripAdvisor hotel reviews, summarized the reviews for each hotel, and evaluated the quality of the summaries according to the proposed methodology. We also provide a way to integrate completeness and succinctness, which stand in a trade-off relationship, into an F-score, and propose a method to perform optimal summarization by varying the sentence-similarity threshold.
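
A minimal sketch of the two proposed measures, using token-set Jaccard similarity as a stand-in for the paper's sentence-similarity function; the 0.5 threshold and the example sentences are assumptions.

```python
from itertools import combinations

# Completeness: fraction of source sentences covered by some summary sentence.
# Succinctness: 1 minus the fraction of summary-sentence pairs that duplicate
# each other. Both are then combined into an F-score.

def jaccard(a: str, b: str) -> float:
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb)

def completeness(source, summary, thr=0.5):
    covered = sum(any(jaccard(s, t) >= thr for t in summary) for s in source)
    return covered / len(source)

def succinctness(summary, thr=0.5):
    pairs = list(combinations(summary, 2))
    dup = sum(jaccard(a, b) >= thr for a, b in pairs)
    return 1.0 - dup / len(pairs) if pairs else 1.0

def f_score(c, s):
    return 2 * c * s / (c + s) if c + s else 0.0

source = ["the room was clean", "staff were friendly", "breakfast was cold"]
summary = ["the room was clean", "staff were friendly"]
c, s = completeness(source, summary), succinctness(summary)
print(f"completeness={c:.2f} succinctness={s:.2f} F={f_score(c, s):.2f}")
```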

Performance Analysis of Frequent Pattern Mining with Multiple Minimum Supports (다중 최소 임계치 기반 빈발 패턴 마이닝의 성능분석)

  • Ryang, Heungmo;Yun, Unil
    • Journal of Internet Computing and Services / v.14 no.6 / pp.1-8 / 2013
  • Data mining techniques are used to find important and meaningful information in huge databases, and pattern mining is one of the significant data mining techniques: a method of discovering useful patterns in such databases. Frequent pattern mining, one branch of pattern mining, extracts the patterns whose frequencies exceed a minimum support threshold; these are called frequent patterns. Traditional frequent pattern mining applies a single minimum support threshold to the whole database. This single-support model implicitly supposes that all items in the database have the same nature; in real-world applications, however, each item can have its own characteristics, so a pattern mining technique that reflects those characteristics is required. In the frequent pattern mining framework, where the natures of items are not considered, the single minimum support threshold must be set very low to mine patterns containing rare items, which yields too many patterns including meaningless ones; conversely, with too high a threshold, patterns containing rare items cannot be mined at all. This dilemma is called the rare item problem. To solve it, early studies proposed approximate approaches that split the data into several groups according to item frequencies or that group related rare items; being approximate, however, these methods cannot find all frequent patterns, including rare frequent patterns. Hence, a pattern mining model with multiple minimum supports was proposed. In this model, each item has its own minimum support threshold, called the MIS (minimum item support), calculated from the item's frequency in the database. By applying the MIS, the multiple-minimum-supports model finds all rare frequent patterns without generating meaningless patterns or losing significant ones. Meanwhile, candidate patterns are extracted during the mining process, and in the single-minimum-support model only the one global threshold is compared with their frequencies; the characteristics of the items that make up a candidate pattern are therefore not reflected, and the rare item problem arises. To address this in the multiple-minimum-supports model, the minimum MIS value among the items of a candidate pattern is used as that pattern's support threshold, so that its characteristics are considered. To mine frequent patterns, including rare ones, efficiently under this concept, tree-based algorithms of the multiple-minimum-supports model sort items in the tree in MIS-descending order, in contrast to single-minimum-support algorithms, which order items by descending frequency. In this paper, we study the characteristics of frequent pattern mining based on multiple minimum supports and evaluate its performance against a general frequent pattern mining algorithm in terms of runtime, memory usage, and scalability. Experimental results show that the multiple-minimum-supports algorithm outperforms the single-minimum-support one while demanding more memory for the MIS information, and that both algorithms show good scalability.
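
A minimal sketch of the multiple-minimum-supports check described above: a candidate pattern is compared against the minimum MIS among its items rather than a single global threshold. The transactions, MIS values, and item names are illustrative, not derived from any MIS formula in the paper.

```python
# Frequency check under multiple minimum supports: a pattern is frequent iff
# its support >= min(MIS of its items), so rare items get lower thresholds.

transactions = [
    {"bread", "milk"}, {"bread", "caviar"}, {"milk", "caviar"},
    {"bread", "milk"}, {"bread", "milk", "caviar"},
]
MIS = {"bread": 3, "milk": 3, "caviar": 2}   # rare item gets a lower threshold

def support(pattern: set) -> int:
    """Number of transactions containing every item of the pattern."""
    return sum(pattern <= t for t in transactions)

def is_frequent(pattern: set) -> bool:
    return support(pattern) >= min(MIS[i] for i in pattern)

print(is_frequent({"bread", "milk"}))            # True:  support 3 >= min(3, 3) = 3
print(is_frequent({"bread", "caviar"}))          # True:  support 2 >= min(3, 2) = 2
print(is_frequent({"bread", "milk", "caviar"}))  # False: support 1 <  min(3, 3, 2) = 2
```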