• Title/Summary/Keyword: Text data


Maritime Safety Tribunal Ruling Analysis using SentenceBERT (SentenceBERT 모델을 활용한 해양안전심판 재결서 분석 방법에 대한 연구)

  • Bori Yoon;SeKil Park;Hyerim Bae;Sunghyun Sim
    • Journal of the Korean Society of Marine Environment & Safety / v.29 no.7 / pp.843-856 / 2023
  • The global surge in maritime traffic has resulted in an increased number of ship collisions, leading to significant economic, environmental, physical, and human damage. The causes of these maritime accidents are multifaceted, often arising from a combination of crew judgment errors, negligence, complexity of navigation routes, weather conditions, and technical deficiencies in the vessels. Given the intricate nuances and contextual information inherent in each incident, a methodology capable of deeply understanding the semantics and context of sentences is imperative. Accordingly, this study utilized the SentenceBERT model to analyze maritime safety tribunal decisions over the last 20 years in the Busan Sea area, which encapsulated data on ship collision incidents. The analysis revealed important keywords potentially responsible for these incidents. Cluster analysis based on the frequency of specific keyword appearances was conducted and visualized. This information can serve as foundational data for the preemptive identification of accident causes and the development of strategies for collision prevention and response.

An Analysis Model for Journal Evaluation in Special Libraries (전문도서관에서의 학술지 평가를 위한 경제성 분석에 관한 연구)

  • Jung, Hye-Kyung;Jung, Eun-Joo
    • Journal of the Korean Society for information Management / v.23 no.1 s.59 / pp.121-138 / 2006
  • This study derives an economic analysis model for journal evaluation and conducts a case analysis based on the model. Total costs include subscription costs and administrative fees (such as binding, ordering, and claiming). The model quantifies qualitative benefits to users as a utility that incorporates the usage data also evaluated in existing economic analysis models. The model is built on the usage statistics of web-based electronic journals, which have become important resources for research. Rankings are assigned according to how far each type of use contributes to the goals of the parent institution. In the case study of the KDI School Library, the highest ranking was assigned to journals that patrons cited in their outputs. Journals used for background information, i.e., full-text downloading or browsing, were assigned rankings of 2 and 1, respectively. According to the analysis, the top 20 journals provided 75% of the entire library utility, and user behavior differed among cohorts. We expect the model to make it possible for librarians to measure the value of journals. It can provide a basic tool for journal selection, particularly in special libraries with custom needs.
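
The weighting idea in this abstract can be illustrated with a small sketch. It is not the authors' model: the weights of 2 (full-text download) and 1 (browsing) come from the abstract, while the weight of 3 for citations, the cost-per-utility ratio, the `Journal` class, and all sample numbers are assumptions made for illustration.

```python
from dataclasses import dataclass

# Weights per usage type. 2 and 1 are stated in the abstract; 3 for
# citations is an assumed value for the "highest ranking".
WEIGHTS = {"citation": 3, "download": 2, "browse": 1}

@dataclass
class Journal:
    title: str
    cost: float   # subscription cost plus administrative fees
    usage: dict   # usage counts keyed by usage type

def utility(journal):
    """Weighted sum of usage counts."""
    return sum(WEIGHTS[kind] * n for kind, n in journal.usage.items())

def rank_by_cost_per_utility(journals):
    """Return (title, utility, cost per unit of utility), cheapest first."""
    rows = []
    for j in journals:
        u = utility(j)
        rows.append((j.title, u, j.cost / u if u else float("inf")))
    return sorted(rows, key=lambda row: row[2])

def top_share(journals, n):
    """Fraction of total utility provided by the n highest-utility journals."""
    utils = sorted((utility(j) for j in journals), reverse=True)
    total = sum(utils)
    return sum(utils[:n]) / total if total else 0.0

# Hypothetical sample data.
journals = [
    Journal("A", 1200.0, {"citation": 10, "download": 40, "browse": 100}),
    Journal("B", 800.0, {"citation": 0, "download": 5, "browse": 20}),
    Journal("C", 300.0, {"citation": 2, "download": 10, "browse": 30}),
]

for title, u, cpu in rank_by_cost_per_utility(journals):
    print(f"{title}: utility={u}, cost per utility={cpu:.2f}")
print(f"share of utility from top journal: {top_share(journals, 1):.1%}")
```

A function like `top_share` is one way to check a concentration claim such as "the top 20 journals provided 75% of the entire library utility" against local usage data.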

Analysis method of patent document to Forecast Patent Registration (특허 등록 예측을 위한 특허 문서 분석 방법)

  • Koo, Jung-Min;Park, Sang-Sung;Shin, Young-Geun;Jung, Won-Kyo;Jang, Dong-Sik
    • Journal of the Korea Academia-Industrial cooperation Society / v.11 no.4 / pp.1458-1467 / 2010
  • Recently, imitation and infringement of intellectual property rights have been recognized as impediments to a nation's industrial growth. To prevent the large losses caused by these impediments, many researchers are studying the protection and efficient management of intellectual property in various ways. In particular, predicting patent registration is an important part of protecting and asserting intellectual property rights. In this study, we propose a patent document analysis method that uses text mining to predict whether a patent will be registered or rejected. First, the proposed method builds a database from the word frequencies of rejected patent documents. Comparing this database with other patent documents then yields a similarity value between each patent document and the database. We used k-means, a partitioning clustering algorithm, to select the criterion value for patent rejection. As a result, we found that patents similar to rejected patents have a strong possibility of rejection. For the experimental data, we used U.S. patent documents on Bluetooth technology, solar battery technology, and display technology.
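
The pipeline described in this abstract (word-frequency database of rejected patents, similarity to that database, k-means to pick a rejection criterion) can be sketched roughly as follows. The abstract does not state which similarity measure or k-means setup was used, so cosine similarity, a one-dimensional k-means with k=2, the midpoint threshold, and the toy documents are all assumptions.

```python
import math
import re
from collections import Counter

def word_freq(text):
    """Lower-cased word counts for one document."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two word-count vectors (assumed measure)."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def two_means_threshold(values, iters=50):
    """1-D k-means with k=2; returns the midpoint between the two centroids."""
    lo, hi = min(values), max(values)
    for _ in range(iters):
        low = [v for v in values if abs(v - lo) <= abs(v - hi)]
        high = [v for v in values if abs(v - lo) > abs(v - hi)]
        new_lo = sum(low) / len(low)
        new_hi = sum(high) / len(high) if high else hi
        if (new_lo, new_hi) == (lo, hi):
            break
        lo, hi = new_lo, new_hi
    return (lo + hi) / 2

# Toy data (hypothetical, not from the paper).
rejected = [
    "wireless device pairing method using bluetooth signal",
    "bluetooth signal pairing for wireless headset device",
]
profile = Counter()
for doc in rejected:
    profile.update(word_freq(doc))  # the "database" of rejected-patent words

candidates = {
    "p1": "bluetooth pairing method for wireless device",
    "p2": "wireless headset using bluetooth signal",
    "p3": "solar battery cell with layered electrode",
    "p4": "display panel using organic light emitting layer",
}
sims = {name: cosine(word_freq(text), profile) for name, text in candidates.items()}
threshold = two_means_threshold(list(sims.values()))
at_risk = [name for name, s in sims.items() if s > threshold]
print(at_risk)
```

Using the midpoint between the two cluster centroids as the criterion value is only one possible reading of "select criteria value of patent rejection"; real patent text would also need stop-word removal and stemming.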

Semantic Dependency Link Topic Model for Biomedical Acronym Disambiguation (의미적 의존 링크 토픽 모델을 이용한 생물학 약어 중의성 해소)

  • Kim, Seonho;Yoon, Juntae;Seo, Jungyun
    • Journal of KIISE / v.41 no.9 / pp.652-665 / 2014
  • Many important terms in biomedical text are expressed as abbreviations or acronyms. We propose a semantic link topic model, based on the concepts of topic and dependency link, to disambiguate biomedical abbreviations and to cluster long-form variants of abbreviations that refer to the same sense. The model is a generative model inspired by the latent Dirichlet allocation (LDA) topic model, in which each document is viewed as a mixture of topics and each topic is characterized by a distribution over words. Thus, the words of a document are generated from the hidden topic structure of the document, and the topic structure is inferred from the observable word sequences of document collections. In this study, we allow two distinct word generation processes in order to incorporate semantic dependencies between words, particularly between expansions (long forms) of abbreviations and the words that co-occur with them in a sentence. Besides topic information, the semantic dependency between words is defined as a link, and a new random parameter for link presence is assigned to each word. As a result, the most probable expansions of the abbreviations in a given abstract are decided by the word-topic distribution, document-topic distribution, and word-link distribution estimated from the document collection through the semantic dependency link topic model. Abstracts retrieved from the MEDLINE Entrez interface by queries relating to 22 abbreviations and their 186 expansions were used as the data set. The link topic model predicted the expansions of abbreviations with an accuracy of 98.30%.
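
For readers unfamiliar with the LDA view of a document as a mixture of topics, the sketch below samples one document from the plain LDA generative process. It deliberately omits the link variable and second word-generation path that the abstract adds, since their details are not given here; the vocabulary, topic distributions, and `alpha` values are made up for illustration.

```python
import random

def dirichlet(alpha, rng):
    """Draw from a Dirichlet distribution using normalized gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]

def generate_document(topics, vocab, alpha, length, rng):
    """Sample a document under plain LDA.

    topics: one word distribution per topic, aligned with vocab.
    Returns the document-topic mixture and a list of (topic, word) pairs.
    """
    theta = dirichlet(alpha, rng)  # document-topic distribution
    words = []
    for _ in range(length):
        z = rng.choices(range(len(topics)), weights=theta)[0]  # pick a topic
        w = rng.choices(vocab, weights=topics[z])[0]           # pick a word
        words.append((z, w))
    return theta, words

# Hypothetical two-topic setup.
vocab = ["cell", "protein", "gene", "patient", "dose", "trial"]
topics = [
    [0.4, 0.3, 0.3, 0.0, 0.0, 0.0],  # a "molecular biology" topic
    [0.0, 0.0, 0.0, 0.4, 0.3, 0.3],  # a "clinical" topic
]
rng = random.Random(0)
theta, words = generate_document(topics, vocab, alpha=[0.5, 0.5], length=8, rng=rng)
print(theta)
print(words)
```

Inference runs this process in reverse: given only the observed words, it estimates the topic mixture per document and the word distribution per topic.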

Study on Improving the System for the Revitalization and Efficient Management of the Local Commercial Area (지역상권 활성화 및 효율적 관리를 위한 제도 개선방안 연구)

  • Kim, Seung-Hee;Kim, Young-Ki
    • Journal of Distribution Science / v.11 no.5 / pp.55-62 / 2013
  • Purpose - This study aims to determine the problems and limitations of the Commercial Area Activation System, which was created by a special law for promoting traditional markets and shopping districts in order to revitalize and efficiently manage the central commercial areas of different regions. We also suggest options for its improvement. Research design, data, and methodology - We look into the problems of the commercial area activation business, which is being promoted as a demonstration project, from the aspects of legal text and guidelines. Results - The current commercial area activation system has several problems. First, the establishment of a comprehensive basic plan for commercial area activation is not a requirement. Second, the benefit principle should be established to prevent moral hazard among merchants, who play important roles as main actors in the commercial area activation business. Third, the current special law constrains the commercial management organization, whose status under the civil law limits its ability to find a profitable business model. Fourth, to efficiently, constructing a system that links the other central government businesses and is needed. into a regional development budget or a budget for funding small businesses that the central government can control, which is effective. Further, we offer some suggestions for medium- and long-term policies. First, an integrated coordination mechanism at the central office level should be installed while setting the basic policy to revitalize the Based on this policy, local governments need a system that exclusively based on the after establishing a comprehensive plan for urban regeneration and getting approval from the integration organization. Second, a system is needed that makes the problems of business promotion visible by monitoring the procedure for supporting projects and regularly assessing business achievements. 
Third, a plan is needed for resolving conflicts between the various interested parties when the commercial area activation system is adopted to carry out a total redevelopment of commercial areas where small shops are densely located. Market maintenance projects have been conducted as a means of recovering economically depressed traditional markets and reviving the local economy, but they are mostly conducted in the form of reconstruction or redevelopment and represent the interests of landowners and merchants. Thus, they are likely to lead to the gradual disappearance of traditional markets. Conclusions - This study looks primarily into the problems that appear in the legal text and guidelines of the commercial area activation business, which has been running as a demonstration project since 2011, and suggests some solutions and directions for improvement.


An Analysis on the Health Education Content Suggested in the 7th Curriculum of Elementary School Education (제7차 초등학교 교육과정에 제시된 보건교육 내용 분석)

  • Kim, Gha-Ok;Park, Young-Soo
    • The Journal of Korean Society for School & Community Health Education / v.2 no.2 / pp.39-55 / 2001
  • The purpose of this study was to provide the fundamental data needed to compose systematic public health text content by analyzing each text, domain, and item of teaching content presented in the textbooks and teachers' guides of the 7th elementary school education curriculum. The research issues were as follows. 1. The health education content suggested in the 7th physical education curriculum was analyzed and examined. 2. The health teaching content of each textbook in the 7th elementary school curriculum was analyzed and examined. To address these research issues, the health education content suggested in the 7th elementary school education curriculum was analyzed in terms of the physical, spiritual, and social domains along with ten topics: (1) Proper living habits, (2) Health and nutrition, (3) Sex education, (4) Prevention of diseases of the sense organs, (5) Cleanliness of food, (6) Oral hygiene, (7) Individual health and public health, (8) Safety in living, (9) Abuse and usage of medication, and (10) Environmental pollution. The outcomes were as follows. First, compared with the 6th elementary school education curriculum, the health content suggested in the 7th elementary school education curriculum decreased. Second, although the structure of each grade's teaching content in the health domain of physical education followed the relevant systems, the content was concentrated in a few subjects such as safety in living, nutrition, proper living habits, sport, and health in sport. Third, the health education content was organized into 4 units for the 3rd and 4th grades: physical growth and development, prevention of diseases, safe living, and leisure living (leisure, spiritual health, etc.). For the 5th and 6th grades, it was organized into 3 units: understanding the human body, prevention of disease, and leisure and safe living. 
Fourth, in the health domain of physical education, a strong point was constructed within the physical, spiritual, and social areas of elementary school physical education. Fifth, 43 public health education content items were directly related to health education and 25 were indirectly related. Sixth, each grade's unit structure for public health content weighted the physical and social domains heavily in every grade, whereas the unit structure for the spiritual domain was weak. By grade, the physical domain was stressed in grades 4, 5, and 6, while the social domain was stressed in grades 1, 5, and 6.


Descriptive Characteristics of the Label Texts Related to Earth Science: Toward Educationally Meaningful Communication (교육적으로 유의미한 의사소통을 위한 지구과학 관련 전시 라벨의 서술 특징)

  • Kim, Chan-Jong;Park, Eun-Ji;Yoon, Sae-Yeol;Lee, Sun-Kyung
    • Journal of the Korean earth science society / v.33 no.1 / pp.94-109 / 2012
  • The purpose of this study is to analyse the descriptive characteristics of label texts related to Earth Science at a science museum and a natural history museum in Korea. The data were collected from the Korean National Science Museum and the Seodaemun Natural History Museum. The analysis framework was adapted from Systemic Functional Linguistics. The results show that the labels 1) consist mostly of declarative sentences, 2) contain an appropriate amount of scientific information, and 3) present mainly 'facts'. Moreover, 4) all of the text genres are 'logical expositions'. At the Korean National Science Museum in particular, the labels contain 5) a higher proportion of scientific words among all the terms used and 6) subjects that are omitted or are long nominalizations in more than half of the cases. These results imply that the labels may lead to one-way rather than two-way communication about the culture of science. This study presents the descriptive characteristics of the label texts in order to make educationally meaningful communication possible by building an open structure between visitors' own everyday culture and the culture of science.

The Effect of Using Analogies in High School Earth Science Classes (고등학교 10학년 과학 '지구의 변동' 단원에서 비유물 활용의 효과)

  • Kim, Sang-Dal;Kim, Jong-Hee;Lee, Ji-Eun
    • Journal of the Korean earth science society / v.24 no.5 / pp.393-401 / 2003
  • The purpose of this study is to investigate the effect of using analogies in high school earth science classes. Following the TWA model, three types of teaching strategies were developed: text developer-generated, teacher-generated, and student-generated analogies. The model described in this paper began with a task analysis of high school science textbooks for grade 10 to identify how the textbook authors used analogies to explain plate tectonics concepts. A sample of 210 first-year high school students took part in the study. After 7 classes, the students' perceptions were investigated with questionnaires. The results are as follows: 1. Many plate tectonics analogies are used in high school science textbooks (25 in total). Teachers and authors construct effective analogies to help students build on their relevant knowledge by applying it to new knowledge acquired from textbooks. 2. Analysis of the data indicates that instruction using student-generated analogies was more effective than the other types. However, for conveying complicated concepts (e.g., transform faults), teacher-generated instruction was effective. Teachers need to be aware of the weaknesses of analogies in order to select the most appropriate ones. 3. Making analogies, like using them, follows systematic steps. Analogies should be used after considering students' preconceptions, teachers' perceptions, and text authors' intentions so that they serve as powerful instructional tools.

A proper folder recommendation technique using frequent itemsets for efficient e-mail classification (효과적인 이메일 분류를 위한 빈발 항목집합 기반 최적 이메일 폴더 추천 기법)

  • Moon, Jong-Pil;Lee, Won-Suk;Chang, Joong-Hyuk
    • Journal of the Korea Society of Computer and Information / v.16 no.2 / pp.33-46 / 2011
  • Since e-mail has become an important means of communication and information sharing, much effort has gone into classifying e-mails efficiently by their contents. E-mails vary in length and style, and the words used in them are usually irregular. In addition, the criteria for e-mail classification are subjective. As a result, it is quite difficult to adapt conventional text classification techniques to e-mail classification efficiently. The classification feature of a commercial e-mail program uses a simple text filtering technique in the e-mail client. Previous studies on automatic e-mail classification have used the probability-based Naive Bayesian technique to improve classification accuracy, and most of them deal with e-mails in English. This paper proposes a personalized recommendation technique for e-mails in Korean that uses frequent-pattern data mining. The proposed technique consists of two phases: pre-processing the e-mails in an e-mail folder and generating a profile for the folder. The generated profile is used to classify an e-mail into the most appropriate folder according to the user's subjective criteria. An e-mail classification system that adopts the proposed technique is also implemented.
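
The two phases named in this abstract (pre-processing the e-mails of a folder, then generating a folder profile) might look roughly like the sketch below. It assumes the pre-processing step has already reduced each e-mail to a list of word tokens, mines itemsets of size 1 and 2 by brute force, and scores a new e-mail by the supports of the profile itemsets it contains. The scoring rule, the support threshold, and the sample folders are illustrative assumptions, not the paper's definitions.

```python
from collections import Counter
from itertools import combinations

def frequent_itemsets(docs, min_support, max_size=2):
    """Return {itemset: support} for word sets found in at least a
    min_support fraction of docs (brute force, for tiny examples only)."""
    counts = Counter()
    for words in docs:
        uniq = sorted(set(words))
        for size in range(1, max_size + 1):
            for combo in combinations(uniq, size):
                counts[frozenset(combo)] += 1
    n = len(docs)
    return {s: c / n for s, c in counts.items() if c / n >= min_support}

def build_profiles(folders, min_support=0.5):
    """One frequent-itemset profile per folder."""
    return {name: frequent_itemsets(docs, min_support)
            for name, docs in folders.items()}

def recommend(profiles, words):
    """Score each folder by the matched itemsets, weighting larger sets more
    (an assumed scoring rule), and return the best folder with all scores."""
    words = set(words)
    scores = {
        name: sum(sup * len(s) for s, sup in profile.items() if s <= words)
        for name, profile in profiles.items()
    }
    return max(scores, key=scores.get), scores

# Hypothetical, already tokenized e-mails per folder.
folders = {
    "billing": [
        ["invoice", "payment", "due"],
        ["payment", "receipt", "invoice"],
        ["invoice", "overdue", "payment"],
    ],
    "meetings": [
        ["meeting", "agenda", "room"],
        ["agenda", "meeting", "minutes"],
        ["meeting", "schedule", "room"],
    ],
}
profiles = build_profiles(folders)
best, scores = recommend(profiles, ["payment", "invoice", "reminder"])
print(best, scores)
```

Because each profile is built only from the user's own folders, the recommendation reflects that user's subjective filing criteria; for Korean text the tokens would come from a morphological analyzer rather than whitespace splitting.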

Analysis of Inquiry Tasks in Earth Unit of the 10th Grade Science Textbooks (10학년 과학 교과서 지구 단원의 탐구 과제 분석)

  • Kim, Jeong-Yul;Kim, Myung-Suk;Park, Ye-Ri
    • Journal of the Korean earth science society / v.26 no.6 / pp.501-510 / 2005
  • An analysis was done on the “inquiry sections” of the Earth Science chapters of 10th grade science textbooks. The inquiry sections were classified into different types, and the frequencies of basic process skills, integrated process skills, and inquiry activities in each section were measured to find out whether they sufficiently satisfy the requirements of the 7th National Curriculum. Eleven science textbooks used in high schools were selected for this study. The number of inquiry tasks averaged 24.0. The types of inquiry sections and the elements of basic and integrated process skills differed in every textbook. The number of inquiry activities also differed, and the analysis identified more than were presented. They were not integrated activities but were presented as scientific process skills. The basic process skills and integrated process skills presented in the textbooks were 16% and 77.2%, respectively. However, the distribution of the two kinds of process skills was analyzed to be 45.6% and 55.4%, respectively. Among the process skills, the frequencies of inferring (49.5%) and data interpretation (68.7%) were the highest; however, other process skills, including recognizing problems, formulating hypotheses, and generalization, were not presented in any of the textbooks. Because the 7th National Curriculum lacks definitions of science process skills and inquiry activities, each textbook defined these terms differently. This suggests that the meaning of inquiry, science process skills, and inquiry activities should be operationally defined in the national curriculum and that criteria for constructing inquiry activities are required.