• Title/Summary/Keyword: Document-Classification


Review of International Cases for Managing Input Data in Safety Assessment for High-Level Radioactive Waste Deep Disposal Facilities (고준위방사성폐기물 심층처분시설 안전성평가 입력자료 관리를 위한 해외사례 분석)

  • Mi Kyung Kang;Hana Park;Sunju Park;Hae Sik Jeong;Woon Sang Yoon;Jeonghwan Lee
    • Economic and Environmental Geology
    • /
    • v.56 no.6
    • /
    • pp.887-897
    • /
    • 2023
  • Leading waste disposal countries, such as Sweden, Switzerland, and the United Kingdom, conduct safety assessments across all stages of High-Level Radioactive Waste Deep Geological Disposal Facilities, from planning and site selection to construction, operation, closure, and post-closure management. Because safety assessments are repeated at each stage and generate vast amounts of diverse data over extended periods, it is essential to construct a database for safety assessment and establish a data management system. In this study, the safety assessment data management systems of leading countries were analyzed and categorized into 1) input and reference data for safety assessments, 2) guidelines for data management, 3) organizational structures for data management, and 4) computer systems for data management. While each country differed in specific aspects, commonalities included the classification of safety assessment input data based on disposal system components, the establishment of organizations to supply, use, and manage this data, and the implementation of quality management systems guided by instructions and manuals. These cases highlight the importance of data management systems and document management systems for securing the safety and enhancing the reliability of High-Level Radioactive Waste Disposal Facilities. To achieve this, it is necessary to classify input data so that it can be used flexibly and effectively, to ensure the consistency and traceability of input data, and to establish a quality management system for input data and document management.

Investigations on Techniques and Applications of Text Analytics (텍스트 분석 기술 및 활용 동향)

  • Kim, Namgyu;Lee, Donghoon;Choi, Hochang;Wong, William Xiu Shun
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.42 no.2
    • /
    • pp.471-492
    • /
    • 2017
  • The demand for and interest in big data analytics are increasing rapidly. The concept of big data covers not only existing structured data but also various kinds of unstructured data such as text, images, videos, and logs. Among the various types of unstructured data, text data have gained particular attention because text is the most representative means of describing and delivering information. Text analysis is generally performed in the following order: document collection, parsing and filtering, structuring, frequency analysis, and similarity analysis. The results of the analysis can be presented through word clouds, word networks, topic modeling, document classification, and semantic analysis. Notably, there is an increasing demand to identify trending topics in the rapidly growing volume of text generated through various social media. Research on and applications of topic modeling have therefore been actively carried out in various fields, since topic modeling can extract the core topics from a huge number of unstructured text documents and provide a document group for each topic. In this paper, we review the major techniques and research trends of text analysis. We also introduce application cases that solve problems in various fields by using topic modeling.
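The analysis order listed in this abstract (parsing and filtering, structuring, frequency analysis, similarity analysis) can be illustrated with a minimal Python sketch. It is not taken from the paper: the stop-word list, the function names, and the three sample sentences are hypothetical, and cosine similarity over raw term counts is only one of several possible similarity measures.

```python
import math
import re
from collections import Counter

# Hypothetical stop-word list, for illustration only.
STOP_WORDS = {"the", "a", "an", "of", "and", "in", "to", "is", "are"}

def parse_and_filter(text):
    """Parsing and filtering: lowercase, tokenize, drop stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]

def structure(docs):
    """Structuring: one sparse term-frequency vector per document."""
    return [Counter(parse_and_filter(d)) for d in docs]

def cosine_similarity(u, v):
    """Similarity analysis: cosine between two term-frequency vectors."""
    dot = sum(u[t] * v[t] for t in u if t in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

# Document collection: three made-up sentences stand in for a corpus.
docs = [
    "Topic modeling extracts the core topics of documents.",
    "Topic modeling groups documents by topics.",
    "Doppler waveforms describe blood flow in an artery.",
]
vectors = structure(docs)

# Frequency analysis: term counts summed over the whole corpus.
corpus_freq = sum(vectors, Counter())
print(corpus_freq.most_common(3))

# The first two documents share vocabulary; the third does not.
print(cosine_similarity(vectors[0], vectors[1]))
print(cosine_similarity(vectors[0], vectors[2]))
```

Word clouds and word networks would be built from `corpus_freq` and from term co-occurrence counts respectively; topic modeling requires a dedicated model and is outside the scope of this sketch.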

A Term Cluster Query Expansion Model Based on Classification Information of Retrieval Documents (검색 문서의 분류 정보에 기반한 용어 클러스터 질의 확장 모델)

  • Kang, Hyun-Su;Kang, Hyun-Kyu;Park, Se-Young;Lee, Yong-Seok
    • Annual Conference on Human and Language Technology
    • /
    • 1999.10e
    • /
    • pp.7-12
    • /
    • 1999
  • Information retrieval systems rank relevant documents by the similarity between the keywords of a user's query and the documents, and present the ranked documents to the user. Because queries used in Internet search are generally short, efforts to construct more useful queries have continued to this day. Since the information contained in keywords is limited, user relevance feedback is widely used as a complement. This paper proposes a Term Cluster Query Expansion Model based on the classification information of retrieved documents. The model avoids frequent user involvement, the biggest drawback of conventional relevance feedback, while inviting the user participation that system-based relevance feedback excludes. The method uses a classifier to assign classification information to each of the top n documents returned by the retrieval system, and then groups the documents into as many groups as there are classification labels (m). A relevance feedback algorithm is used to generate a term cluster from each of the m groups. These clusters, rather than documents, are presented to the user as feedback material. In the experiments, using the Rocchio method as the relevance feedback algorithm gave better performance than the initial query, but did not reach the improvements reported in other studies. The reasons are classifier errors and the existence of documents that are inherently difficult to assign to a single domain. Nevertheless, the model can improve retrieval effectiveness by responding flexibly, independently of the system, even when users differ in their fields of interest or search tendencies.

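The term-cluster idea in the preceding abstract can be sketched in Python under stated assumptions. This is not the authors' implementation: the class labels stand in for the output of a classifier, `term_clusters` and its top-k cut-off are hypothetical, and `rocchio` is the textbook Rocchio update with commonly used default weights (alpha, beta, gamma), which the abstract does not specify.

```python
from collections import Counter, defaultdict

def centroid(vectors):
    """Average a list of sparse term-weight vectors."""
    total = Counter()
    for vec in vectors:
        total.update(vec)
    return {t: w / len(vectors) for t, w in total.items()}

def term_clusters(docs, labels, k=3):
    """Group retrieved documents by class label and keep the k heaviest
    centroid terms of each group as that group's term cluster."""
    groups = defaultdict(list)
    for vec, label in zip(docs, labels):
        groups[label].append(vec)
    clusters = {}
    for label, vecs in groups.items():
        cen = centroid(vecs)
        top = sorted(cen, key=lambda t: (-cen[t], t))[:k]
        clusters[label] = {t: cen[t] for t in top}
    return clusters

def rocchio(query, relevant, nonrelevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Textbook Rocchio update; terms with non-positive weight are dropped."""
    new_query = {t: alpha * w for t, w in query.items()}
    if relevant:
        for t, w in centroid(relevant).items():
            new_query[t] = new_query.get(t, 0.0) + beta * w
    if nonrelevant:
        for t, w in centroid(nonrelevant).items():
            new_query[t] = new_query.get(t, 0.0) - gamma * w
    return {t: w for t, w in new_query.items() if w > 0}

# Top-n retrieved documents for the query "jaguar" (made-up term counts),
# with labels that a classifier is assumed to have assigned.
docs = [
    Counter(jaguar=2, car=3, engine=1),
    Counter(jaguar=1, car=2, speed=2),
    Counter(jaguar=2, cat=3, jungle=2),
]
labels = ["autos", "autos", "animals"]

clusters = term_clusters(docs, labels)

# The user picks the "autos" cluster instead of judging single documents.
expanded = rocchio({"jaguar": 1.0}, [clusters["autos"]], [clusters["animals"]])
print(expanded)
```

Presenting one cluster per class means the user makes at most m choices, rather than judging each of the n retrieved documents.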

Study of Spectral Doppler Waveform Interpretation and Nomenclature in Peripheral Artery (말초 동맥 분광 도플러 파형 해석 및 명명법에 대한 고찰)

  • Ji, Myeong-Hoon;Seoung, Youl-Hun
    • Journal of the Korean Society of Radiology
    • /
    • v.16 no.5
    • /
    • pp.649-660
    • /
    • 2022
  • In 1959, Satomura used spectral Doppler ultrasound to express the velocity of red blood cells over time, and Kato defined a zero baseline that made it possible to determine the direction of blood flow. This became the basis for the widely used classification into triphasic, biphasic, and monophasic waveforms. However, this classification has limitations: in the clinical environment, users are confused about what the terms mean and when to use them. As a result, a joint committee of the American Society for Vascular Medicine (SVM) and the Society for Vascular Ultrasound (SVU) issued a consensus document on Doppler waveform analysis. This study reviews that consensus and suggests nomenclature and modifiers that can be used in domestic vascular ultrasound practice. Because the terms triphasic and biphasic are ambiguous to interpret, the joint committee recommended describing both as multiphasic waveforms. It was also agreed that the hybrid-type waveform, a monophasic, high-resistance waveform that has always been difficult to interpret in the clinical environment, should be named an intermediate resistive waveform. In addition, to improve communication between the interpreter and the sonographer, waveform descriptions were divided into a main descriptor and a modifier, and the committee recommended unifying the various synonyms into a single nomenclature. This literature review is expected to provide accurate arterial spectral Doppler waveform interpretation and an agreed-upon nomenclature to radiologists performing vascular ultrasound examinations in clinical practice, and to serve as basic data that can contribute to the improvement of public health.

A Design of Coding System for Record Management of Nuclear Power Plant (영광원자력발전소(靈光原子力發電所) 자료관리 코딩시스템의 신설계(新設計))

  • Shin, Seon Woo
    • Journal of the Korean Society for information Management
    • /
    • v.2 no.2
    • /
    • pp.115-149
    • /
    • 1985
  • The classification systems generally used in libraries are designed for external information such as books and periodicals, so they are difficult to apply to the internal information of a specific organization, such as documents and records. The information center of an enterprise or other organization therefore needs to establish a Record Management System (RMS) and to develop and use a dedicated classification system for internal information, called a coding system. The documents and the components of a nuclear power plant are each controlled by a coding (numbering) system, but the two systems differ greatly from each other, which makes coordination of work and cross-referencing between components and documents difficult. In this paper, a unified Record Management Coding System is developed for Nuclear Power Plant units 7 & 8 in Korea on the basis of the existing document and component coding systems. The expected effects are easy cross-referencing between components and documents, effective coordination of work, and consistency of the central file.


A Concept of Multi-Layered Database for the Maintenance and Management of Bridges (교량의 유지관리를 위한 멀티레이어 데이터베이스 개념)

  • Kim, Bong-Geun;Yi, Jin-Hoon;Lee, Sang-Ho
    • Journal of the Computational Structural Engineering Institute of Korea
    • /
    • v.20 no.3
    • /
    • pp.393-404
    • /
    • 2007
  • This study proposes the concept of a multi-layered database for the integrated operation of bridge information. The multi-layered database is a logically integrated database composed of standardized information layers. The standardized information layers represent data sets that can be unified, and they are defined by standardized information models. A classification system of bridge components was used as the basis of the multi-layered database, and a code system based on that classification system was employed as the key integrator for manipulating distributed data located on different information layers. In addition, a data level indicating the priority of each information layer was defined to support strategic planning of the construction of the multi-layered database. As a proof of concept, a prototype multi-layered database was built for object-oriented 3-D shape information and structural calculation documents. A consistency check of semantically identical data in the two different information layers was demonstrated. The proposed concept is expected to assure the integrity and consistency of information in bridge information management.

A Research on Enhancement of Text Categorization Performance by using Okapi BM25 Word Weight Method (Okapi BM25 단어 가중치법 적용을 통한 문서 범주화의 성능 향상)

  • Lee, Yong-Hun;Lee, Sang-Bum
    • Journal of the Korea Academia-Industrial cooperation Society
    • /
    • v.11 no.12
    • /
    • pp.5089-5096
    • /
    • 2010
  • Text categorization, which classifies documents according to given criteria, is an important feature of information retrieval systems. The general approach classifies the target documents by extracting important index words and assigning weights to them. The weighting algorithm is therefore critical, since the performance and correctness of text categorization depend entirely on it. This paper introduces an enhanced text categorization method based on an improved word weighting technique. Okapi BM25 has proven effective in several information retrieval engines. We applied Okapi BM25 and showed that it performs well in categorization. It is compared with several other word weighting methods: TF-IDF, TF-ICF, and TF-ISF. The target documents used for this experiment are from Reuters-21578, and the SVM and KNN algorithms are used. The modified Okapi BM25 shows the best performance.
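As an illustration of the weighting step, the following Python sketch computes a standard Okapi BM25 weight for every term of every document. It is not the paper's modified BM25, whose details the abstract does not give. The function name, the toy documents, and the parameter values k1 = 1.2 and b = 0.75 are assumptions, and the IDF uses the smoothed non-negative variant log((N - df + 0.5) / (df + 0.5) + 1).

```python
import math
from collections import Counter

def bm25_weights(docs, k1=1.2, b=0.75):
    """Return one {term: BM25 weight} dict per tokenized document."""
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: in how many documents each term occurs.
    df = Counter(t for d in docs for t in set(d))
    weighted = []
    for d in docs:
        tf = Counter(d)
        # Length normalization: long documents are penalized via b.
        norm = k1 * (1 - b + b * len(d) / avg_len)
        weights = {}
        for t, f in tf.items():
            idf = math.log((n_docs - df[t] + 0.5) / (df[t] + 0.5) + 1)
            weights[t] = idf * f * (k1 + 1) / (f + norm)
        weighted.append(weights)
    return weighted

# Toy tokenized documents (hypothetical, not from Reuters-21578).
docs = [
    "oil prices rise as oil supply falls".split(),
    "wheat prices fall".split(),
    "oil exports rise".split(),
]
weights = bm25_weights(docs)
for doc_weights in weights:
    print(sorted(doc_weights.items(), key=lambda kv: -kv[1])[:3])
```

The resulting per-document weight dictionaries could serve as feature vectors for a classifier such as SVM or KNN. Unlike plain TF-IDF, the term-frequency component saturates as the count grows and is normalized by document length.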

Optical Character Recognition for Hindi Language Using a Neural-network Approach

  • Yadav, Divakar;Sanchez-Cuadrado, Sonia;Morato, Jorge
    • Journal of Information Processing Systems
    • /
    • v.9 no.1
    • /
    • pp.117-140
    • /
    • 2013
  • Hindi is the most widely spoken language in India, with more than 300 million speakers. Because the characters of texts written in Hindi are not separated as they are in English, Optical Character Recognition (OCR) systems developed for the Hindi language have a very poor recognition rate. In this paper we propose an OCR for printed Hindi text in Devanagari script, using an Artificial Neural Network (ANN), which improves its efficiency. One of the major reasons for the poor recognition rate is error in character segmentation. The presence of touching characters in the scanned documents further complicates the segmentation process, creating a major problem when designing an effective character segmentation technique. Preprocessing, character segmentation, feature extraction, and finally classification and recognition are the major steps followed by a general OCR. The preprocessing tasks considered in the paper are conversion of grayscale images to binary images, image rectification, and segmentation of the document's textual contents into paragraphs, lines, words, and then basic symbols. The basic symbols, obtained as the fundamental unit from the segmentation process, are recognized by the neural classifier. In this work, three feature extraction techniques (histogram of projection based on mean distance, histogram of projection based on pixel value, and vertical zero crossing) have been used to improve the rate of recognition. These feature extraction techniques are powerful enough to extract features of even distorted characters or symbols. For development of the neural classifier, a back-propagation neural network with two hidden layers is used. The classifier is trained and tested on printed Hindi texts. A correct recognition rate of approximately 90% is achieved.

A Study on the Musical Theme Clustering for Searching Note Sequences (음렬 탐색을 위한 주제소절 자동분류에 관한 연구)

  • 심지영;김태수
    • Journal of the Korean Society for information Management
    • /
    • v.19 no.3
    • /
    • pp.5-30
    • /
    • 2002
  • In this paper, classification features are selected with a focus on musical content, namely note sequence patterns. Similarity between note sequences is measured and clusters of similar note sequences are constructed, so that a CBMR system can make searching easier for users by showing similar note sequences alongside the search results. The experimental document was A Dictionary of Musical Themes, an index of theme bars focused on classical music, obtained as kern-type files. Humdrum Toolkit version 1.0 was used as the tool for processing note sequences. Hierarchical clustering was performed in stages on four types of similarity matrices, distinguished by whether the note sequences were segmented and by where the starting point was. To evaluate the results, the WACS standard was used where a manual classification existed; for note sequences starting from any point in the sequence, the distribution of common feature patterns within the resulting clusters was used. According to the results, clustering with segmented features, regardless of the starting point, performed distinctly better than clustering with non-segmented features.

A Feasibility Study on Adopting Individual Information Cognitive Processing as Criteria of Categorization on Apple iTunes Store

  • Zhang, Chao;Wan, Lili
    • The Journal of Information Systems
    • /
    • v.27 no.2
    • /
    • pp.1-28
    • /
    • 2018
  • Purpose: More than 7.6 million mobile apps have been approved on the Apple iTunes Store and Google Play. To manage these apps, Apple Inc. established twenty-four primary categories and Google Play thirty-three. However, both categorizations have shown more and more problems in managing and classifying numerous apps, such as miscategorized apps, cross-attribution problems, and the lack of a categorization keyword index. The purpose of this study is to introduce individual information cognitive processing as the classification criterion for updating the current categorization on the Apple iTunes Store, and to observe the effectiveness of the new criterion in a classification process on the Apple iTunes Store. Design/Methodology/Approach: A research approach with four stages was performed, and a series of mixed methods was developed to assess the feasibility of adopting individual information cognitive processing as a categorization criterion. Keyword lists were extracted using machine-learning techniques with Term Frequency-Inverse Document Frequency (TF-IDF) and Singular Value Decomposition (SVD). Individual information cognitive processing was developed from prior research results on the categorization of car apps. A further keyword extraction process was then performed on the extracted keyword lists. Findings: Using TF-IDF and SVD, keyword lists were extracted from more than five thousand apps. Furthermore, we developed individual information cognitive processing that included a categorization teaching process and a learning process. The top three keywords for each category were extracted. Comparison of the extracted results with prior studies showed significant inter-rater reliability between the two methods, which indicates that individual information cognitive processing is reliable as a criterion of categorization on the Apple iTunes Store.
The updating suggestions for the Apple iTunes Store are discussed in this paper. The results may help app store hosts improve the current categorizations on app stores and increase the efficiency of the app discovery and location process for both app developers and users.