• Title/Summary/Keyword: document analysis


A Dynamic Recommendation System Using User Log Analysis and Document Similarity in Clusters (사용자 로그 분석과 클러스터 내의 문서 유사도를 이용한 동적 추천 시스템)

  • 김진수;김태용;최준혁;임기욱;이정현
    • Journal of KIISE: Software and Applications / v.31 no.5 / pp.586-594 / 2004
  • Because web documents are created and disappear rapidly, users need a recommendation system that helps them browse web documents conveniently and accurately. One largely untapped source of knowledge about large data collections is the cumulative experience of individuals who have found useful information in the collection. Recommendation systems attempt to extract such useful information by capturing and mining one or more measures of the usefulness of the data. Existing information filtering systems have the shortcoming that they require a user profile. Collaborative filtering systems have the shortcoming that users must first rate each web document, and in high-quantity, low-quality environments users may cover only a tiny percentage of the available documents. Dynamic recommendation systems based on user browsing patterns may also present users with unrelated web documents. This paper classifies web documents by their similarity within each web document type and extracts a database of users' sequential browsing patterns from the session information in the web server log file. When a user accesses a web document, the proposed dynamic recommendation system recommends the top-N set of associated web documents that are highly similar to the current document, together with a set that reflects sequential patterns, using the extracted information and the users' session information.
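The similarity-based half of the recommendation described above can be illustrated with a minimal sketch. It assumes cosine similarity over raw term counts and a tiny hypothetical corpus; the abstract does not state which similarity measure or document representation the authors used, and the function names `cosine` and `top_n_similar` are illustrative only.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_n_similar(current: str, docs: dict, n: int = 2) -> list:
    """Return the ids of the n documents most similar to `current`."""
    cur_vec = Counter(docs[current].split())
    scored = [
        (cosine(cur_vec, Counter(text.split())), doc_id)
        for doc_id, text in docs.items()
        if doc_id != current
    ]
    # Highest similarity first; ties broken by document id for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in scored[:n]]


# Hypothetical corpus (not from the paper).
docs = {
    "a": "web log analysis session",
    "b": "web log session pattern",
    "c": "cooking recipe pasta",
    "d": "web document cluster",
}
recommended = top_n_similar("a", docs, n=2)
```

A fuller system would combine this ranking with the sequential browsing patterns mined from the server log, which this sketch does not attempt.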

Auto Detection System of Personal Information based on Images and Document Analysis (이미지와 문서 분석을 통한 개인 정보 자동 검색 시스템)

  • Cho, Jeong-Hyun;Ahn, Cheol-Woong
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.15 no.5 / pp.183-192 / 2015
  • This paper proposes a Personal Information Auto Detection (PIAD) system to prevent leakage of personal information in document and image files, for use by mobile service providers. The proposed system automatically detects images and documents that contain personal information and shows the results to the user. PIAD is divided into a selection step, for fast and accurate image retrieval, and an analysis step, which is composed of SURF, erosion and dilation, and the FindContours algorithm. Using the selection and analysis steps, the proposed PIAD system achieved more than 98% accuracy, detecting 267 of 272 images.

Analysis of Factors Influencing Journal Articles' Citations (KSLA 연구논문 - 논문 인용의 영향요인 분석)

  • Yu, Jae-Bok;Kim, Jae-Ho
    • KSLA Bulletin / s.2 / pp.16-27 / 2010
  • Recently, the evaluation of research papers has been greatly emphasized, and citation counts have been accepted as a very useful indicator. In this study, we performed correlation analyses between paper citation counts and 11 explanatory variables covering morphological and conceptual factors, using a test dataset of papers from 11 journals in library and information science. The correlation results show that only document similarity shares 5% or more of its variance (r²) with paper citation counts, and that citation counts tend to be higher as the document similarity value increases.
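The 5% criterion mentioned above can be made concrete with a small sketch: compute Pearson's r between an explanatory variable and citation counts, square it, and compare against 0.05. The data below are hypothetical, and the abstract does not say which correlation coefficient the authors used; Pearson's r is assumed here.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return cov / (sd_x * sd_y)


# Hypothetical values: document similarity scores and citation counts.
similarity = [0.1, 0.2, 0.3, 0.4, 0.5]
citations = [1, 3, 2, 5, 4]

r = pearson_r(similarity, citations)
r_squared = r ** 2                  # share of variance in common
meets_threshold = r_squared >= 0.05  # the 5% criterion from the abstract
```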


Quality Analysis of the Request for Proposals of Public Information Systems Project : System Operational Concept (공공정보화사업 제안요청서 품질분석 : 시스템 운영 개념을 중심으로)

  • Park, Sanghwi;Kim, Byungcho
    • Journal of Information Technology Services / v.18 no.2 / pp.37-54 / 2019
  • The purpose of this study is to present an evaluation model that measures how clearly stakeholder requirements are specified in public sector software projects in the Republic of Korea. We used the evaluation model to assess the quality of requests for proposals (RFPs). The study also examines the impact of the level of stakeholder requirements on the level of system requirements. To do this, we analyzed existing research models and standards related to business requirements and stakeholder requirements, and constructed an evaluation model for the system operational concept document defined in ISO/IEC/IEEE 29148. The system operational concept document organizes the requirements of stakeholders in the organization and shares the intention of the organization. The evaluation model proposed in this study focuses on evaluating whether the contents related to the system operational concept are faithfully written in the request for proposal. The evaluation consisted of three items: 'organization status', 'desired changes', and 'operational constraints'. The sample consisted of 217 RFPs extracted from the national procurement system. The analysis showed that the evaluation model was valid and that internal consistency was maintained. The level of the system operational concept was very low, and it was also found to affect the quality of system requirements. Writing stakeholders' requirements clearly is more important than writing the functional requirements.

An Analysis of the Duration of the Construction Document Phase of Large Public Building Projects Delivered by the Total Solution Service from 2009 to 2014 (공공기관 발주 대형건축공사 실시설계 기간에 대한 분석 및 개선방안 -2009년부터 2014년까지 공공발주 맞춤형 서비스 프로젝트를 중심으로 -)

  • Jung, Wooyoung;Lee, Taewon;Rhee, Pung-wook;Lee, Ghang
    • Korean Journal of Construction Engineering and Management / v.16 no.4 / pp.89-97 / 2015
  • This paper analyzes the duration of the construction document (CD) phase of 42 large public building projects delivered by the total solution service of Public Procurement Services in Korea from 2009 to 2014. The quality of construction documents significantly affects the quality of construction and facility management. Thus, securing appropriate time for the CD phase during project planning is important for the quality of a project. Currently, the duration of the CD phase is planned based on the construction costs of a project, following a notice of the Ministry of Land, Infrastructure and Transport. However, our analysis showed that the correlation between the actual duration of the CD phase and construction costs is very weak. The actual CD phase takes 1.33-1.79 times longer than the planned duration. The practitioners who were interviewed were already aware that the correlation between the duration of the CD phase and the construction costs is weak. They identified the complexity of the project, the extent of the design changes, project type, client characteristics, and other factors as more influential on the CD phase than the construction costs. To improve the quality of CDs, a new guideline for determining an adequate CD phase duration should be studied and developed.

Identifying the Research Fronts in Korean Library and Information Science by Document Co-citation Analysis (문헌동시인용 분석을 통한 한국 문헌정보학의 연구 전선 파악)

  • Lee, Jae Yun
    • Journal of the Korean Society for Information Management / v.32 no.4 / pp.77-106 / 2015
  • By document co-citation analysis with Korean Citation Index (KCI) data, this study accurately identified the research fronts and hot topics in Korean library and information science (LIS) from 2004 to 2013. 159 core papers in the LIS domain and their citations were scraped manually from the Korean Citation Index web site. In the cluster analysis and network analysis, the 159 core papers were grouped into 27 clusters with multiple papers and 8 singleton clusters. Among the 27 clusters with multiple papers, the 'LIS education' cluster was the largest with 16 core papers, and the 'citation analysis & intellectual structure analysis' cluster had the strongest citation impact according to the ehs-index. Closer observation of the citations to the core papers in each research front showed that 67.5% of the citations were made by LIS research papers and 32.5% by non-LIS research papers. Considering the share of citations and the citation impact growth index, 'local documentation', 'citation analysis & intellectual structure analysis', and 'research trends analysis' were identified as the most rapidly emerging research fronts in Korean library and information science. The analytical methods used in this study have great potential for discovering the characteristics of research fronts in Korean interdisciplinary research domains.
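The counting step that underlies document co-citation analysis can be sketched as follows: two documents are co-cited once for every citing paper whose reference list contains both. The citing papers and reference lists below are hypothetical, and the clustering, network analysis, and ehs-index steps from the abstract are not reproduced.

```python
from collections import Counter
from itertools import combinations


def cocitation_counts(reference_lists: dict) -> Counter:
    """Count, for each pair of cited documents, how many citing papers cite both."""
    counts = Counter()
    for refs in reference_lists.values():
        # De-duplicate and sort so each pair has one canonical ordering.
        for pair in combinations(sorted(set(refs)), 2):
            counts[pair] += 1
    return counts


# Hypothetical citing papers mapped to the core papers they cite.
reference_lists = {
    "p1": ["A", "B", "C"],
    "p2": ["A", "B"],
    "p3": ["B", "C"],
}
counts = cocitation_counts(reference_lists)
```

The resulting pair counts form a symmetric co-citation matrix, which is the usual input to the clustering step.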

Reliability Analysis and Utilization of BIM-based Highway Construction Output Volume (BIM기반 고속도로 공사 물량산출 신뢰성 검토 및 활용)

  • Jung, Guk-Young;Woo, Jeong-Won;Kang, Kyeong-Don;Shin, Jae-Choul
    • Journal of KIBIM / v.3 no.3 / pp.9-18 / 2013
  • BIM was first introduced in building engineering, but applying it to civil engineering, with its irregularly shaped structures, is recognized as promising a relatively high gain in construction productivity. In this paper, we developed quantity calculation algorithms, applied them to earthwork, bridge, tunnel, retaining wall, and culvert construction, and implemented quantity calculation based on 3D BIM modeling. For structure work, errors ranged from -6.28% to 5.17%. For earthwork, rock-type quantity calculation showed errors ranging from -14.36% to 13.07%, which exposes problems in the existing 2D-CAD-based quantity calculation, indicates where it can be improved, and shows the benefit and applicability of the BIM method in civil engineering. The routine method for earthwork quantities has a negligible error, comparable to that of structure work. However, the significant error in rock-type quantities indicates that the reliability of 2D-based volume calculation may be problematic. When earthwork quantities are estimated with 3D BIM, the proposed method is more reliable than the routine method. Considering the benefits of integrating information across the design, construction, and maintenance stages, the study confirmed the feasibility and effectiveness of introducing BIM design in civil engineering. In addition, as an initial step toward information integration, a quantity document automation program has been developed to promote the use of BIM. Development is now under way on programs for automatic code number entry, linkage with manual volume calculation, and quantity document automation, and step-by-step procedures and methods are presented.

A study on Improvement and Analysis of Records Management Status for Disaster Safety Archives in Online Environment (재난안전정보 아카이브 구축을 위한 온라인 기록정보 현황분석 및 개선방안 연구)

  • Han, Hui-Jeong;Park, Tae-Yeon;Oh, Hyo-Jung;Kim, Yong
    • Journal of Korean Library and Information Science Society / v.48 no.2 / pp.187-213 / 2017
  • Preemptive response to and prevention of disasters require systematically collecting, preserving, managing, and using records and information resources related to disaster safety. This study therefore analyzed the types and status of text-based document archives among the records and information resources published and produced online by disaster safety-related institutions. A detailed analysis of the disaster types covered in the document archives was also conducted. Based on field interviews, actual users' requirements for disaster safety information archives were gathered. The ultimate goal of this study is to establish a basis for building disaster safety information archives by deriving improved management strategies for disaster safety records resources.

Analysis of the National Police Agency business trends using text mining (텍스트 마이닝 기법을 이용한 경찰청 업무 트렌드 분석)

  • Sun, Hyunseok;Lim, Changwon
    • The Korean Journal of Applied Statistics / v.32 no.2 / pp.301-317 / 2019
  • There has been significant research on how to discover various insights from text data using statistical techniques. In this study we analyzed text data produced by the Korean National Police Agency to identify trends in its work by year and to compare work characteristics among local authorities by identifying distinctive keywords in the documents produced by each local authority. Preprocessing suited to the characteristics of each dataset was conducted, and the frequency of words in each document was calculated in order to draw meaningful conclusions. The simple term frequency in a document does not describe the characteristics of keywords well; therefore, the frequency of each term was recalculated using term frequency-inverse document frequency weights. L2 norm normalization was used to compare word frequencies. The analysis can be used as basic data for future policies to improve police work and as a method to improve the efficiency of the police service; it also helps identify a demand for improvements in indoor work.
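The weighting and normalization steps named above can be sketched in a few lines. This is a minimal illustration that assumes the plain idf variant log(N / df) and whitespace tokenization; the abstract does not specify which TF-IDF variant or tokenizer the authors used, and the sample documents are hypothetical.

```python
import math
from collections import Counter


def tfidf_l2(docs: list) -> list:
    """Return one {term: weight} dict per document, TF-IDF weighted and L2-normalized."""
    tokenized = [doc.split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        weights = {t: count * math.log(n_docs / df[t]) for t, count in tf.items()}
        norm = math.sqrt(sum(w * w for w in weights.values()))
        # Divide by the L2 norm so documents of different lengths are comparable.
        vectors.append({t: (w / norm if norm else 0.0) for t, w in weights.items()})
    return vectors


# Hypothetical documents standing in for records from three local authorities.
docs = ["patrol traffic traffic", "patrol theft", "patrol fraud"]
vectors = tfidf_l2(docs)
```

In this toy corpus "patrol" appears in every document, so its weight drops to zero, while the term unique to each document carries all of that document's weight. This is the effect the abstract relies on to surface distinctive keywords.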

Automatic Quality Evaluation with Completeness and Succinctness for Text Summarization (완전성과 간결성을 고려한 텍스트 요약 품질의 자동 평가 기법)

  • Ko, Eunjung;Kim, Namgyu
    • Journal of Intelligence and Information Systems / v.24 no.2 / pp.125-148 / 2018
  • Recently, as the demand for big data analysis increases, cases of analyzing unstructured data and using the results are also increasing. Among the various types of unstructured data, text is used as a means of communicating information in almost all fields. In addition, many analysts are interested in text because the amount of data is very large and it is relatively easy to collect compared with other unstructured and structured data. Among the various text analysis applications, the following have been actively studied: document classification, which classifies documents into predetermined categories; topic modeling, which extracts major topics from a large number of documents; sentiment analysis or opinion mining, which identifies emotions or opinions contained in texts; and text summarization, which summarizes the main contents of one or several documents. In particular, text summarization is actively applied in business through news summary services, privacy policy summary services, etc. In addition, much research has been done in academia on the extraction approach, which selectively provides the main elements of the document, and the abstraction approach, which extracts elements of the document and composes new sentences by combining them. However, techniques for evaluating the quality of automatically summarized documents have not made as much progress as techniques for automatic text summarization. Most existing studies on summarization quality evaluation produced manual summaries of documents, used them as reference documents, and measured the similarity between the automatic summary and the reference document. Specifically, automatic summarization is performed on the full text through various techniques, and the result is compared with a reference document, which is an ideal summary, to measure the quality of the automatic summarization. 
Reference documents are provided in two major ways; the most common is manual summarization, in which a person creates an ideal summary by hand. Since this method requires human intervention in preparing the summary, writing the summary takes a lot of time and cost, and the evaluation result may differ depending on who writes the summary. To overcome these limitations, attempts have been made to measure the quality of summary documents without human intervention. As a representative attempt, a method has recently been devised that reduces the size of the full text and measures the similarity between the reduced full text and the automatic summary. In this method, the more the frequent terms of the full text appear in the summary, the better the quality of the summary is judged to be. However, since summarization essentially means condensing a large amount of content while minimizing content omissions, it is unreasonable to say that a "good summary" based only on frequency is always a "good summary" in the essential sense. To overcome the limitations of this previous work on summarization evaluation, this study proposes an automatic quality evaluation method for text summarization based on the essential meaning of summarization. Specifically, succinctness is defined as an element indicating how little content is duplicated among the sentences of the summary, and completeness is defined as an element indicating how little of the content is missing from the summary. In this paper, we propose a method for automatic quality evaluation of text summarization based on the concepts of succinctness and completeness. 
To evaluate the practical applicability of the proposed methodology, 29,671 sentences were extracted from TripAdvisor's hotel reviews, the reviews were summarized for each hotel, and the results of experiments evaluating the quality of the summaries according to the proposed methodology are presented. The paper also provides a way to integrate completeness and succinctness, which are in a trade-off relationship, into an F-score, and proposes a method to perform optimal summarization by changing the sentence similarity threshold.
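The idea of folding two competing scores into one F-score and then tuning a threshold can be sketched as follows. This assumes the F-score is the harmonic mean of completeness and succinctness, each already scaled to [0, 1]; the abstract does not give the exact formula or how the two scores are computed, and all values and function names below are hypothetical.

```python
def f_score(completeness: float, succinctness: float) -> float:
    """Harmonic mean of two scores in [0, 1]; 0.0 when both are zero."""
    total = completeness + succinctness
    if total == 0:
        return 0.0
    return 2 * completeness * succinctness / total


def best_threshold(results: list) -> float:
    """Pick the similarity threshold whose (completeness, succinctness) pair maximizes the F-score.

    `results` holds (threshold, completeness, succinctness) tuples.
    """
    return max(results, key=lambda row: f_score(row[1], row[2]))[0]


# Hypothetical measurements at three sentence-similarity thresholds.
results = [
    (0.3, 0.9, 0.4),  # keeps most content but with much duplication
    (0.5, 0.8, 0.6),
    (0.7, 0.5, 0.9),  # very concise but omits content
]
chosen = best_threshold(results)
```

The harmonic mean penalizes summaries that score well on only one of the two criteria, which is why a middle threshold wins in this toy data.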