• Title/Summary/Keyword: document classification

Search Result 451, Processing Time 0.028 seconds

A Suggestion of the Direction of Construction Disaster Document Management through Text Data Classification Model based on Deep Learning (딥러닝 기반 분류 모델의 성능 분석을 통한 건설 재해사례 텍스트 데이터의 효율적 관리방향 제안)

  • Kim, Hayoung;Jang, YeEun;Kang, HyunBin;Son, JeongWook;Yi, June-Seong
    • Korean Journal of Construction Engineering and Management
    • /
    • v.22 no.5
    • /
    • pp.73-85
    • /
    • 2021
  • This study proposes an efficient management direction for Korean construction accident cases through a deep learning-based text data classification model. A deep learning model was developed that classifies construction accidents into five categories: fall, electric shock, flying object, collapse, and narrowness, which are the representative accident types defined by KOSHA. In initial model tests, the classification accuracy for fall disasters was relatively high, while other types tended to be misclassified as fall disasters. Analysis of these results indicated that 1) specific accident-causing behavior, 2) similar sentence structure, and 3) complex accidents corresponding to multiple types affected the results. Two accuracy improvement experiments were then conducted: 1) reclassification and 2) elimination. As a result, classification performance improved by 185.7% when complex accidents were eliminated. This resolved the multicollinearity of complex accidents, which include the contents of multiple accident types. In conclusion, this study suggests the need to manage complex accidents independently while preparing a system to describe the circumstances of future accidents in detail.

A Study on the Correlation Lee Jae Ma's Four Types of Essential Physical Constitution and From index - Concerning Male and Female 3rd Year High School Student in Some Urban and Rural Areas - (사상체질류형(四象體質類型)과 체격(體格) 및 신체형태지수(身體形態指數)와의 비교연구(比較硏究) - 도시(都市)와 농촌(農村)의 일부지역(一部地域) 남녀고등학교(男女高等學校) 3학년(學年) 학생(學生)을 대상(對象)으로 -)

  • Lee, Moon-Ho;Hong, Sun-Yong
    • Journal of Sasang Constitutional Medicine
    • /
    • v.2 no.1
    • /
    • pp.71-85
    • /
    • 1990
  • A total of 673 third-year students of boys' and girls' high schools in Taegu city and in Kuni-gun, Youngyang-gun, and Euisung-gun in Kyongbuk province were selected and investigated as the subjects of this study on the correlation between Lee Jae Ma's Four Types of Essential Physical Constitution and Physical Form Index. The results were as follows. First, as for Height, the findings were not consistent with the statement that "persons of Shaoyin (minor Yin) Type are short and small -- while persons of Taiyin (major Yin) Type are tall and big," cited in the classification of the four constitutions in the document "Dong-Eu-Su-Se-Bo-Won". Comparison for persons of Shaoyang (minor Yang) Type proved unsuitable because the documents concerning Lee Jae Ma's four types of essential physical constitution lack data on Height. Second, as for Sitting Height, a correlation was found between the findings of this study and the statement in the above document describing the external physical characteristics of Shaoyin-Type persons that "The upper part and the lower part of the body are well balanced", but none was found for Relative Sitting Height. Third, as for Chest-Girth and Relative Chest-Girth plus Weight and Relative Weight, the statement that "Persons of Taiyin (major Yin) Type have the largest physique of the four types of persons in the characteristics of external physical features, and that they also tend to have continental (wide-chest or large-scaled) character and strong nerve, that they are stoutly-built and fat." proved to correlate with the findings of this study. Fourth, in point of Chest-Girth and Relative Chest-Girth, the findings correlated with the phrase "Chests are well developed upward -- and sturdy and solid." describing the external physical features of Shaoyang (minor Yang)-Type persons, and with the phrase "Chests are narrow" in the case of Shaoyin (minor Yin)-Type persons. Fifth, as for Weight and Relative Weight, a correlation was found between the findings and the statement that "Shaoyin-Type persons have comparatively less flesh" as a sign of the external physical characteristics of Shaoyin-Type persons. These findings show that there are some correlations between the external physique of Lee Jae Ma's four types of essential constitution and the Physical Form Indexes. In clinical classification, however, it is desirable that this approach be consulted only after careful consideration based on Lee Jae Ma's theory, and it seems imperative to continue studying the objectification of Lee's theory.


A Deep Learning-based Depression Trend Analysis of Korean on Social Media (딥러닝 기반 소셜미디어 한글 텍스트 우울 경향 분석)

  • Park, Seojeong;Lee, Soobin;Kim, Woo Jung;Song, Min
    • Journal of the Korean Society for Information Management
    • /
    • v.39 no.1
    • /
    • pp.91-117
    • /
    • 2022
  • The number of depressed patients in Korea and around the world is rapidly increasing every year. However, most mentally ill patients are not aware that they are suffering from the disease, so adequate treatment is not being provided. If depressive symptoms are neglected, they can lead to suicide, anxiety, and other psychological problems. Therefore, early detection and treatment of depression are very important in improving mental health. To address this problem, this study presents a deep learning-based depression tendency model using Korean social media text. After collecting data from Naver Knowledge iN, Naver Blog, Hidoc, and Twitter, DSM-5 major depressive disorder diagnostic criteria were used to classify and annotate classes according to the number of depressive symptoms. Afterwards, TF-IDF analysis and co-occurrence word analysis were performed to examine the characteristics of each class of the constructed corpus. In addition, word embedding, dictionary-based sentiment analysis, and LDA topic modeling were performed to generate a depression tendency classification model using various text features. Through this, the embedded text, sentiment score, and topic number for each document were calculated and used as text features. As a result, the highest accuracy of 83.28% was achieved when depression tendency was classified with the KorBERT algorithm by combining both the sentiment score and the topic of the document with the embedded text. This study establishes a classification model for Korean depression trends with improved performance using various text features. It is significant in that it can detect potentially depressed patients early among Korean online community users, enabling rapid treatment and prevention and thereby helping to promote the mental health of Korean society.

Review of International Cases for Managing Input Data in Safety Assessment for High-Level Radioactive Waste Deep Disposal Facilities (고준위방사성폐기물 심층처분시설 안전성평가 입력자료 관리를 위한 해외사례 분석)

  • Mi Kyung Kang;Hana Park;Sunju Park;Hae Sik Jeong;Woon Sang Yoon;Jeonghwan Lee
    • Economic and Environmental Geology
    • /
    • v.56 no.6
    • /
    • pp.887-897
    • /
    • 2023
  • Leading waste disposal countries, such as Sweden, Switzerland, and the United Kingdom, conduct safety assessments across all stages of High-Level Radioactive Waste Deep Geological Disposal Facilities, from planning and site selection to construction, operation, closure, and post-closure management. As safety assessments are repeatedly performed at each stage, generating vast amounts of diverse data over extended periods, it is essential to construct a database for safety assessment and establish a data management system. In this study, the safety assessment data management systems of leading countries were analyzed and categorized into 1) input and reference data for safety assessments, 2) guidelines for data management, 3) organizational structures for data management, and 4) computer systems for data management. While the countries differed in specific aspects, commonalities included the classification of safety assessment input data based on disposal system components, the establishment of organizations to supply, use, and manage this data, and the implementation of quality management systems guided by instructions and manuals. These cases highlight the importance of data management systems and document management systems for securing the safety and enhancing the reliability of High-Level Radioactive Waste Disposal Facilities. To achieve this, it is necessary to classify input data so that it can be used flexibly and effectively, to ensure the consistency and traceability of input data, and to establish a quality management system for input data and document management.

Investigations on Techniques and Applications of Text Analytics (텍스트 분석 기술 및 활용 동향)

  • Kim, Namgyu;Lee, Donghoon;Choi, Hochang;Wong, William Xiu Shun
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.42 no.2
    • /
    • pp.471-492
    • /
    • 2017
  • The demand for and interest in big data analytics are increasing rapidly. The concept of big data includes not only existing structured data, but also various kinds of unstructured data such as text, images, videos, and logs. Among the various types of unstructured data, text data have gained particular attention because text is the most representative means of describing and delivering information. Text analysis is generally performed in the following order: document collection, parsing and filtering, structuring, frequency analysis, and similarity analysis. The results of the analysis can be presented through word clouds, word networks, topic modeling, document classification, and semantic analysis. Notably, there is an increasing demand to identify trending topics in the rapidly growing text data generated through various social media. Thus, research on and applications of topic modeling have been actively carried out in various fields, since topic modeling can extract the core topics from a huge number of unstructured text documents and provide a document group for each topic. In this paper, we review the major techniques and research trends of text analysis. We also introduce some application cases that solve problems in various fields by using topic modeling.
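The analysis order listed in this abstract (parsing and filtering, structuring, frequency analysis, similarity analysis) can be illustrated with a small sketch. This is not the authors' implementation: the stopword list, the sample documents, and the helper names `parse_and_filter`, `structure`, and `cosine` are assumptions chosen for illustration, and cosine similarity over raw term counts is only one of several possible similarity measures.

```python
import math
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption, not from the paper).
STOPWORDS = {"the", "a", "of", "and", "is", "in", "to"}

def parse_and_filter(text):
    """Parsing and filtering: lowercase, tokenize, drop stopwords."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]

def structure(docs):
    """Structuring: turn each document into a term-frequency vector."""
    return [Counter(parse_and_filter(d)) for d in docs]

def cosine(u, v):
    """Similarity analysis: cosine similarity of two term-frequency vectors."""
    dot = sum(u[t] * v[t] for t in u if t in v)
    norm = math.sqrt(sum(x * x for x in u.values())) * \
        math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0

# Document collection: hypothetical sample documents.
docs = [
    "The price of oil rose as oil exports fell.",
    "Oil exports fell and the price of grain rose.",
    "Doppler ultrasound shows the direction of blood flow.",
]
vectors = structure(docs)

# Frequency analysis: most common terms across the whole collection.
total = sum(vectors, Counter())
print(total.most_common(3))

# Similarity analysis: compare the first document with the other two.
print(cosine(vectors[0], vectors[1]), cosine(vectors[0], vectors[2]))
```

The term-frequency vectors produced by `structure` are the kind of input that downstream steps such as topic modeling or document classification would typically consume.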

A Term Cluster Query Expansion Model Based on Classification Information of Retrieval Documents (검색 문서의 분류 정보에 기반한 용어 클러스터 질의 확장 모델)

  • Kang, Hyun-Su;Kang, Hyun-Kyu;Park, Se-Young;Lee, Yong-Seok
    • Annual Conference on Human and Language Technology
    • /
    • 1999.10e
    • /
    • pp.7-12
    • /
    • 1999
  • An information retrieval system ranks relevant documents by the similarity between the keywords of the user's query and the documents, and presents them to the user. However, because queries used in Internet search are generally short, efforts to construct more useful queries have continued to this day. Because the information contained in keywords is limited, relevance feedback from the user is widely used to compensate. This paper proposes a Term Cluster Query Expansion Model based on the classification information of retrieved documents, which avoids frequent user involvement, the biggest drawback of conventional relevance feedback, while bringing back the user participation that system-based relevance feedback excludes. The method uses a classifier to assign classification information to each of the top n documents returned by the retrieval system, and then groups the documents into m groups, where m is the number of distinct classification labels. A relevance feedback algorithm then generates a term cluster from each of the m groups. These clusters, rather than the documents, are presented to the user as feedback material. In experiments, using the Rocchio method among the relevance feedback algorithms performed better than the initial query, but did not reach the performance gains reported in other studies. The reasons are classifier errors and the existence of documents that, by their nature, are hard to assign to a single category. Nevertheless, the model can improve retrieval effectiveness while adapting flexibly, without being tied to a particular system, even when users' fields of interest or search tendencies differ.
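The Rocchio method named in this abstract is a standard relevance feedback formula: the expanded query is alpha times the original query, plus beta times the mean of the relevant document vectors, minus gamma times the mean of the non-relevant ones. The sketch below shows that textbook formula only. The parameter defaults, the `term_cluster` helper, and the choice to treat the documents outside a group as non-relevant are assumptions for illustration, not details confirmed by the paper.

```python
from collections import defaultdict

def rocchio(query, relevant, nonrelevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Expand a query with Rocchio relevance feedback.

    All vectors are {term: weight} dicts.
    """
    new_q = defaultdict(float)
    for term, w in query.items():
        new_q[term] += alpha * w
    for doc in relevant:
        for term, w in doc.items():
            new_q[term] += beta * w / len(relevant)
    for doc in nonrelevant:
        for term, w in doc.items():
            new_q[term] -= gamma * w / len(nonrelevant)
    # Terms whose weight is not positive are conventionally dropped.
    return {t: w for t, w in new_q.items() if w > 0}

def term_cluster(query, group, others, k=3):
    """Hypothetical helper: top-k expanded terms for one document group.

    Treating the other groups as non-relevant is an assumption.
    """
    expanded = rocchio(query, group, others)
    return sorted(expanded, key=expanded.get, reverse=True)[:k]

# Hypothetical example: one group about cars, one document about animals.
query = {"jaguar": 1.0}
cars = [{"jaguar": 0.5, "car": 1.0}, {"car": 0.8, "engine": 0.6}]
animals = [{"jaguar": 0.4, "cat": 1.0}]
print(term_cluster(query, cars, animals, k=2))
```

In the model described above, one such term list would be built for each of the m groups and shown to the user in place of the documents themselves.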


Study of Spectral Doppler Waveform Interpretation and Nomenclature in Peripheral Artery (말초 동맥 분광 도플러 파형 해석 및 명명법에 대한 고찰)

  • Ji, Myeong-Hoon;Seoung, Youl-Hun
    • Journal of the Korean Society of Radiology
    • /
    • v.16 no.5
    • /
    • pp.649-660
    • /
    • 2022
  • In 1959, Satomura used spectral Doppler ultrasound to express the velocity of red blood cells over time, and Kato defined a zero baseline that made it possible to determine the direction of blood flow. This became the basis for the widely used classification into Triphasic, Biphasic, and Monophasic waveforms. However, this classification has limitations: its meaning and timing of use confuse users in a clinical environment. As a result, a joint committee of the American Society for Vascular Medicine (SVM) and the Society for Vascular Ultrasound (SVU) issued a consensus document on Doppler waveform analysis. This study reviews this consensus and suggests nomenclature and modifiers that can be used in the domestic vascular ultrasound clinical field. The joint committee formed by SVM and SVU recommended that, because of their ambiguous interpretation, the triphasic and biphasic waveforms be described together as multiphasic waveforms rather than used separately. In addition, it was agreed to name the hybrid-type waveform, which is a monophasic and high-resistance waveform and has always posed a problem of interpretation in a clinical environment, an intermediate resistive waveform. Furthermore, to increase communication efficiency between the interpreter and the sonographer, waveform descriptions were divided into a main descriptor and modifiers, and a single nomenclature unifying various synonyms was recommended. It is expected that this literature review will provide accurate arterial spectral Doppler waveform interpretation and an agreed-upon nomenclature to radiologists performing vascular ultrasound examinations in clinical practice, and will be used as basic data that can contribute to the improvement of public health.

A Design of Coding System for Record Management of Nuclear Power Plant (영광원자력발전소(靈光原子力發電所) 자료관리 코딩시스템의 신설계(新設計))

  • Shin, Seon Woo
    • Journal of the Korean Society for Information Management
    • /
    • v.2 no.2
    • /
    • pp.115-149
    • /
    • 1985
  • The classification systems generally used in libraries are designed for external information such as books and periodicals, so it is difficult to apply them to the internal information of a specific organization, such as documents and records. Therefore, the information centers of an enterprise or specific organization need to establish a Record Management System (RMS) and to develop and use a specific classification system for internal information, called a coding system. Documents and components of nuclear power plants are each controlled by a coding (numbering) system, but the two systems differ greatly from each other, which makes the coordination of work and cross-referencing between components and documents difficult. In this paper, a unified Record Management Coding System is developed for Nuclear Power Plant units 7 & 8 in Korea on the basis of the existing document and component coding systems. The expected effects are easy cross-referencing between components and documents, effective coordination of work, and consistency of the central file.


A Concept of Multi-Layered Database for the Maintenance and Management of Bridges (교량의 유지관리를 위한 멀티레이어 데이터베이스 개념)

  • Kim, Bong-Geun;Yi, Jin-Hoon;Lee, Sang-Ho
    • Journal of the Computational Structural Engineering Institute of Korea
    • /
    • v.20 no.3
    • /
    • pp.393-404
    • /
    • 2007
  • This study proposes the concept of a multi-layered database for the integrated operation of bridge information. The multi-layered database is a logically integrated database composed of standardized information layers. The standardized information layers represent the data sets that can be unified, and they are defined by standardized information models. A classification system of bridge components was used as the basis of the multi-layered database, and a code system based on the classification system was employed as a key integrator to manipulate the distributed data located on the different information layers. In addition, a data level indicating the priorities of information layers was defined to support strategic planning of the multi-layered database construction. As a proof of concept, a prototype of the multi-layered database for object-oriented 3-D shape information and structural calculation documents was built. A consistency check of semantically identical data in the two different information layers was demonstrated. It is expected that the proposed concept can assure the integrity and consistency of information in bridge information management.

A Research on Enhancement of Text Categorization Performance by using Okapi BM25 Word Weight Method (Okapi BM25 단어 가중치법 적용을 통한 문서 범주화의 성능 향상)

  • Lee, Yong-Hun;Lee, Sang-Bum
    • Journal of the Korea Academia-Industrial cooperation Society
    • /
    • v.11 no.12
    • /
    • pp.5089-5096
    • /
    • 2010
  • Text categorization, which classifies documents according to given criteria, is one of the important features of an information retrieval system. The general approach classifies target documents by extracting important index words and assigning weights to them. The effectiveness of the weighting algorithm is therefore crucial, since the performance and correctness of text categorization depend heavily on it. In this paper, an enhanced method for text categorization based on an improved word weighting technique is introduced. Okapi BM25 has proved its effectiveness in several information retrieval engines. We applied Okapi BM25 to categorization and showed that it performs well. It is compared with various other word weighting methods: TF-IDF, TF-ICF, and TF-ISF. The target documents used for this experiment are from Reuters-21578, and the SVM and KNN algorithms are used as classifiers. Overall, the modified Okapi BM25 shows the best performance.
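To make the word weighting idea concrete, the sketch below computes Okapi BM25 term weights for a small tokenized collection. It uses the commonly cited textbook form of BM25 with a non-negative IDF variant. The abstract does not specify how the paper's modified BM25 differs, so the function name `bm25_weights`, the defaults `k1=1.2` and `b=0.75`, and the sample documents are all assumptions.

```python
import math
from collections import Counter

def bm25_weights(docs, k1=1.2, b=0.75):
    """Return one {term: weight} dict per tokenized document."""
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents containing each term.
    df = Counter(term for d in docs for term in set(d))
    weights = []
    for d in docs:
        tf = Counter(d)
        # Length normalization: longer-than-average documents are penalized.
        norm = k1 * (1 - b + b * len(d) / avg_len)
        w = {}
        for term, f in tf.items():
            # Non-negative IDF variant; rarer terms get larger values.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Saturating term-frequency component.
            w[term] = idf * f * (k1 + 1) / (f + norm)
        weights.append(w)
    return weights

# Hypothetical tokenized documents.
docs = [
    ["oil", "price", "oil"],
    ["grain", "price"],
    ["grain", "export"],
]
for w in bm25_weights(docs):
    print(w)
```

Weight dictionaries like these could serve as feature vectors for a classifier such as SVM or KNN, in the same role that TF-IDF weights play in the comparison described above.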