• Title/Summary/Keyword: Frequency based Text Analysis

A Methodology for Automatic Multi-Categorization of Single-Categorized Documents (단일 카테고리 문서의 다중 카테고리 자동확장 방법론)

  • Hong, Jin-Sung;Kim, Namgyu;Lee, Sangwon
    • Journal of Intelligence and Information Systems / v.20 no.3 / pp.77-92 / 2014
  • Recently, numerous documents containing unstructured data and text have been created due to the rapid increase in the use of social media and the Internet. Each document is usually assigned a specific category for the convenience of users. In the past, categorization was performed manually. However, manual categorization cannot guarantee accuracy, and it requires a large amount of time and cost. Many studies have addressed the automatic creation of categories to overcome the limitations of manual categorization. Unfortunately, most of these methods cannot be applied to complex documents with multiple topics because they assume that one document can be assigned to only one category. To overcome this limitation, some studies have attempted to categorize each document into multiple categories. However, these are also limited in that their learning process requires training on a multi-categorized document set. These methods therefore cannot be applied to multi-categorization of most documents unless multi-categorized training sets are provided. To remove the requirement of a multi-categorized training set imposed by traditional multi-categorization algorithms, we propose a new methodology that can extend the category of a single-categorized document to multiple categories by analyzing relationships among categories, topics, and documents. First, we find the relationship between documents and topics by using the result of topic analysis for single-categorized documents. Second, we construct a correspondence table between topics and categories by investigating the relationship between them. Finally, we calculate the matching scores of each document for multiple categories. 
The results imply that a document can be classified into a certain category if and only if the matching score is higher than the predefined threshold. For example, we can classify a certain document into three categories that have larger matching scores than the predefined threshold. The main contribution of our study is that our methodology can improve the applicability of traditional multi-category classifiers by generating multi-categorized documents from single-categorized documents. Additionally, we propose a module for verifying the accuracy of the proposed methodology. For performance evaluation, we performed intensive experiments with news articles. News articles are clearly categorized by theme, and they contain less vulgar language and slang than other ordinary text documents. We collected news articles from July 2012 to June 2013. The number of articles varies greatly across categories. This is because readers have different levels of interest in each category. Additionally, the variation is also attributed to differences in the frequency of events in each category. To minimize the distortion of the result caused by the number of articles in different categories, we extracted 3,000 articles equally from each of the eight categories. Therefore, the total number of articles used in our experiments was 24,000. The eight categories were "IT Science," "Economy," "Society," "Life and Culture," "World," "Sports," "Entertainment," and "Politics." Using the news articles that we collected, we calculated the document/category correspondence scores from the topic/category and document/topic correspondence scores. The document/category correspondence score indicates the degree of correspondence of each document to a certain category. As a result, we could present two additional categories for each of the 23,089 documents. 
Precision, recall, and F-score were 0.605, 0.629, and 0.617, respectively, when only the top 1 predicted category was evaluated, whereas they were 0.838, 0.290, and 0.431 when the top 1-3 predicted categories were considered. Notably, precision, recall, and F-score varied widely across the eight categories.
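
The three steps described in the abstract above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes the document/category score is the sum over topics of the document/topic weight times the topic/category weight (the abstract does not give the exact formula), and every name, weight, and the 0.2 threshold below is hypothetical. The F-score helper uses the standard harmonic mean of precision and recall.

```python
from typing import Dict, List

# Hypothetical topic-analysis output for one document: topic -> weight.
DOC_TOPICS: Dict[str, float] = {"t1": 0.6, "t2": 0.3, "t3": 0.1}

# Hypothetical topic/category correspondence table: topic -> {category: weight}.
TOPIC_CATEGORIES: Dict[str, Dict[str, float]] = {
    "t1": {"Economy": 0.8, "Politics": 0.2},
    "t2": {"IT Science": 0.9, "Economy": 0.1},
    "t3": {"Sports": 1.0},
}


def document_category_scores(
    doc_topics: Dict[str, float],
    topic_categories: Dict[str, Dict[str, float]],
) -> Dict[str, float]:
    """Combine document/topic and topic/category weights (assumed: sum of products)."""
    scores: Dict[str, float] = {}
    for topic, weight in doc_topics.items():
        for category, strength in topic_categories.get(topic, {}).items():
            scores[category] = scores.get(category, 0.0) + weight * strength
    return scores


def assign_categories(scores: Dict[str, float], threshold: float) -> List[str]:
    """Return every category whose matching score exceeds the threshold, best first."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [category for category, score in ranked if score > threshold]


def f_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (F1)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


scores = document_category_scores(DOC_TOPICS, TOPIC_CATEGORIES)
print(assign_categories(scores, threshold=0.2))
print(round(f_score(0.605, 0.629), 3), round(f_score(0.838, 0.290), 3))
```

With the hypothetical weights above, the document is assigned to "Economy" and "IT Science". Applying `f_score` to the precision and recall values reported in the abstract gives 0.617 and 0.431, which matches the reported F-scores.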

Analyzing Perceptions of Unused Facilities in Rural Areas Using Big Data Techniques - Focusing on the Utilization of Closed Schools as a Youth Start-up Space - (빅데이터 분석 기법을 활용한 농촌지역 유휴공간 인식 분석 - 청년창업 공간으로써 폐교 활용성을 중심으로 -)

  • Jee Yoon Do;Suyeon Kim
    • Journal of Environmental Impact Assessment / v.32 no.6 / pp.556-576 / 2023
  • This study sought ways to utilize idle spaces in rural areas as a response to rural extinction. Based on the keywords "startup," "youth start-up," "youth start-up+rural," and "start-up+rural," the study sought to identify perceptions of idle facilities in rural areas through the keywords "idle facilities" and "closed schools." The study presented basic data for setting policy direction and planning by reviewing frequency analysis, major keyword analysis, network analysis, sentiment analysis, and domestic and foreign cases. As a result of the analysis, first, it was found that idle facilities and closed schools act as important factors in regional regeneration. Second, in the case of youth start-ups in rural areas, it was found that not only agricultural education but also housing problems should be addressed together. Third, in the case of young people, it was confirmed that digital utilization for agriculture needs to be established, as they actively start businesses using digital technology. Finally, in order to attract young people and revitalize the region by drawing on best practices at home and abroad, policy measures that can serve as platforms for culture and education as well as start-ups should be presented in connection with local residents. These results are significant in that they present implications for youth start-ups in rural areas by reviewing perceptions of start-ups that bring in young people as one alternative for the use of idle facilities and regional regeneration. If additional solutions are presented through field surveys, the results can be used to set policy goals that fit reality.

Big Data Analysis of Social Media on Gangwon-do Tourism (강원도 관광에 대한 소셜 미디어 빅데이터 분석)

  • JIN, TIANCHENG;Jeong, Eun-Hee
    • The Journal of Korea Institute of Information, Electronics, and Communication Technology / v.14 no.3 / pp.193-200 / 2021
  • Recently, posts and opinions on tourist attractions have been actively shared on social media. These social big data provide meaningful information for identifying the objective images of tourist destinations as perceived by consumers. Therefore, an in-depth understanding of the tourist image is possible by analyzing these big data on tourism. This study analyzes destination images of Gangwon-do using big data from social media. It seeks to understand these destination images through semantic network analysis and then provides suggestions on how to enhance the image to secure differentiated competitiveness as a tourist destination. According to the frequency analysis results, Sokcho, Gangneung, and Yangyang were the most frequently mentioned destinations in Gangwon-do, in that order, and the purposes of travel were restaurant tours, gourmet food, family trips, vacations, and experiences. In particular, it was found that tourists preferred day trips, weekends, and experiences. Four suggestions were made based on the results. First, it is necessary to develop various types of hotels, accommodation facilities, and experience-oriented tour packages. Second, it is necessary to develop day-trip packages that utilize proximity to the Seoul metropolitan area. Third, it is necessary to promote traditional restaurants and local food. Finally, it is necessary to develop tour packages suitable for healing and family travel. Through this research, the destination image of Gangwon-do was identified and a tourism marketing strategy was presented to improve competitiveness. The study also provides a theoretical basis for the use of tourism consumers' big data in the tourism business.

Digital Transformation: Using D.N.A.(Data, Network, AI) Keywords Generalized DMR Analysis (디지털 전환: D.N.A.(Data, Network, AI) 키워드를 활용한 토픽 모델링)

  • An, Sehwan;Ko, Kangwook;Kim, Youngmin
    • Knowledge Management Research / v.23 no.3 / pp.129-152 / 2022
  • As a key infrastructure for digital transformation, the spread of the data, network, and artificial intelligence (D.N.A.) fields and the emergence of promising industries are laying the groundwork for active digital innovation throughout the economy. In this study, text mining was applied to derive major topics, using as input variables the abstract, publication year, and research field of studies indexed in SCIE, SSCI, and A&HCI in the WoS database. First, main keywords were identified through TF and TF-IDF analysis based on word appearance frequency, and then topic modeling was performed using g-DMR. Because this topic model can utilize various types of variables as meta information, it was possible to explore meaning beyond simply deriving topics. According to the analysis results, topics such as business intelligence, manufacturing production systems, service value creation, telemedicine, and digital education were identified as major research topics in digital transformation. To summarize the results of topic modeling, 1) research on business intelligence has been actively conducted in all areas since COVID-19, and 2) issues such as intelligent manufacturing solutions and metaverses have emerged in the manufacturing field, confirming that the topic of production systems is receiving attention once again. Finally, 3) although the topics can be viewed separately in terms of technology and service, it was found that interpreting them separately is undesirable because a number of studies comprehensively deal with various services applied by combining the relevant technologies.

Maritime Safety Tribunal Ruling Analysis using SentenceBERT (SentenceBERT 모델을 활용한 해양안전심판 재결서 분석 방법에 대한 연구)

  • Bori Yoon;SeKil Park;Hyerim Bae;Sunghyun Sim
    • Journal of the Korean Society of Marine Environment & Safety / v.29 no.7 / pp.843-856 / 2023
  • The global surge in maritime traffic has resulted in an increased number of ship collisions, leading to significant economic, environmental, physical, and human damage. The causes of these maritime accidents are multifaceted, often arising from a combination of crew judgment errors, negligence, complexity of navigation routes, weather conditions, and technical deficiencies in the vessels. Given the intricate nuances and contextual information inherent in each incident, a methodology capable of deeply understanding the semantics and context of sentences is imperative. Accordingly, this study utilized the SentenceBERT model to analyze maritime safety tribunal decisions over the last 20 years in the Busan Sea area, which encapsulated data on ship collision incidents. The analysis revealed important keywords potentially responsible for these incidents. Cluster analysis based on the frequency of specific keyword appearances was conducted and visualized. This information can serve as foundational data for the preemptive identification of accident causes and the development of strategies for collision prevention and response.

A Study on Industry-specific Sustainability Strategy: Analyzing ESG Reports and News Articles (산업별 지속가능경영 전략 고찰: ESG 보고서와 뉴스 기사를 중심으로)

  • WonHee Kim;YoungOk Kwon
    • Journal of Intelligence and Information Systems / v.29 no.3 / pp.287-316 / 2023
  • As the global energy crisis and the COVID-19 pandemic have emerged as social issues, there is a growing demand for companies to move away from profit-centric business models and embrace sustainable management that balances environmental, social, and governance (ESG) factors. ESG activities of companies vary across industries, and industry-specific weights are applied in ESG evaluations. Therefore, it is important to develop strategic management approaches that reflect the characteristics of each industry and the importance of each ESG factor. Additionally, as the focus on ESG disclosures strengthens, specific guidelines are needed to identify and report on the sustainable management activities of domestic companies. To understand corporate sustainability strategies, analyzing ESG reports and news articles by industry can help identify strategic characteristics in specific industries. However, each company has its own unique strategies and report structures, making it difficult to grasp detailed trends or action items. In our study, we analyzed ESG reports (2019-2021) and news articles (2019-2022) of six companies in the 'Finance,' 'Manufacturing,' and 'IT' sectors to examine the sustainability strategies of leading domestic ESG companies. Text mining techniques such as keyword frequency analysis and topic modeling were applied to identify industry-specific, ESG element-specific management strategies and issues. The analysis revealed that in the 'Finance' sector, customer-centric management strategies and efforts to promote an inclusive culture within and outside the company were prominent. Strategies addressing climate change, such as carbon neutrality and expanding green finance, were also emphasized. In the 'Manufacturing' sector, the focus was on creating sustainable communities through occupational health and safety issues, sustainable supply chain management, low-carbon technology development, and eco-friendly investments to achieve carbon neutrality. 
In the 'IT' sector, there was a tendency to focus on technological innovation and digital responsibility to enhance social value through technology. Furthermore, the key issues identified in the ESG factors were as follows: under the 'Environmental' element, issues such as greenhouse gas and carbon emission management, industry-specific eco-friendly activities, and green partnerships were identified. Under the 'Social' element, key issues included social contribution activities through stakeholder engagement, supporting the growth and coexistence of members and partner companies, and enhancing customer value through stable service provision. Under the 'Governance' element, key issues were identified as strengthening board independence through the appointment of outside directors, risk management and communication for sustainable growth, and establishing transparent governance structures. The exploration of the relationship between ESG disclosures in reports and ESG issues in news articles revealed that the sustainability strategies disclosed in reports were aligned with the issues related to ESG disclosed in news articles. However, there was a tendency to strengthen ESG activities for prevention and improvement after negative media coverage that could have a negative impact on corporate image. Additionally, environmental issues were mentioned more frequently in news articles compared to ESG reports, with environmental-related keywords being emphasized in the 'Finance' sector in the reports. Thus, ESG reports and news articles shared some similarities in content due to the sharing of information sources. However, the impact of media coverage influenced the emphasis on specific sustainability strategies, and the extent of mentioning environmental issues varied across documents. Based on our study, the following contributions were derived. 
From a practical perspective, companies need to consider their characteristics and establish sustainability strategies that align with their capabilities and situations. From an academic perspective, unlike previous studies on ESG strategies, we present a subdivided methodology through analysis considering the industry-specific characteristics of companies.

Export Control System based on Case Based Reasoning: Design and Evaluation (사례 기반 지능형 수출통제 시스템 : 설계와 평가)

  • Hong, Woneui;Kim, Uihyun;Cho, Sinhee;Kim, Sansung;Yi, Mun Yong;Shin, Donghoon
    • Journal of Intelligence and Information Systems / v.20 no.3 / pp.109-131 / 2014
  • As the demand for nuclear power plant equipment continues to grow worldwide, the importance of handling nuclear strategic materials is also increasing. While the number of cases submitted for the export of nuclear-power commodities and technology is increasing dramatically, preadjudication (or, more simply, prescreening) of strategic materials has so far been done by experts with long experience and extensive field knowledge. However, there is a severe shortage of experts in this domain, and it takes a long time to develop an expert. Because human experts must manually evaluate all the documents submitted for export permission, the current practice of nuclear material export is neither time-efficient nor cost-effective. To alleviate the problem of relying only on costly human experts, our research proposes a new system designed to help field experts make their decisions more effectively and efficiently. The proposed system is built upon case-based reasoning, which in essence extracts key features from existing cases, compares them with the features of a new case, and derives a solution for the new case by referencing similar cases and their solutions. Our research proposes a framework for a case-based reasoning system, designs a case-based reasoning system for the control of nuclear material exports, and evaluates the performance of alternative keyword extraction methods (fully automatic, fully manual, and semi-automatic). A keyword extraction method is an essential component of the case-based reasoning system because it is used to extract the key features of the cases. The fully automatic method was conducted using TF-IDF, a widely used de facto standard method for representative keyword extraction in text mining. 
TF (Term Frequency) is based on the frequency count of the term within a document, showing how important the term is within that document, while IDF (Inverse Document Frequency) is based on the infrequency of the term within a document set, showing how uniquely the term represents the document. The results show that the semi-automatic approach, which is based on the collaboration of machine and human, is the most effective solution regardless of whether the human is a field expert or a student majoring in nuclear engineering. Moreover, we propose a new approach to computing nuclear document similarity along with a new framework for document analysis. The proposed algorithm for nuclear document similarity considers both document-to-document similarity ($\alpha$) and document-to-nuclear system similarity ($\beta$) in order to derive the final score ($\gamma$) for deciding whether the presented case involves strategic material. The final score ($\gamma$) represents the document similarity between the past cases and the new case. The score is derived not only from conventional TF-IDF but also from a nuclear system similarity score, which takes the context of the nuclear system domain into account. Finally, the system retrieves the top-3 documents stored in the case base that are considered most similar to the new case and presents them together with their degree of credibility. With this final score and the credibility score, it becomes easier for a user to see which documents in the case base are more worth looking up, so that the user can make a proper decision at relatively lower cost. The system was evaluated by developing a prototype and testing it with field data. The system workflows and outcomes have been verified by field experts. 
This research is expected to contribute to the growth of the knowledge service industry by proposing a new system that can effectively reduce the burden of relying on costly human experts for the export control of nuclear materials and that can be considered a meaningful example of a knowledge service application.
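
The TF-IDF weighting and top-k case retrieval described in the abstract above can be sketched as follows. This is a hedged illustration only: the tokenized cases are invented, TF is taken as relative term frequency and IDF as log(N/df) (one common variant among several), and cosine similarity stands in for the document-to-document similarity ($\alpha$). The abstract does not specify how $\alpha$ and the nuclear system similarity ($\beta$) are combined into $\gamma$, so that step is not reproduced here.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical, already-tokenized cases; the last one plays the role of the new case.
CASES: List[List[str]] = [
    ["reactor", "coolant", "pump"],
    ["reactor", "coolant", "valve"],
    ["textile", "loom"],
    ["reactor", "coolant", "pump", "seal"],
]


def tf_idf_vectors(docs: List[List[str]]) -> List[Dict[str, float]]:
    """Weight each term by relative term frequency times log(N / document frequency)."""
    n_docs = len(docs)
    doc_freq: Counter = Counter()
    for tokens in docs:
        doc_freq.update(set(tokens))  # count each term once per document
    vectors = []
    for tokens in docs:
        counts = Counter(tokens)
        total = len(tokens)
        vectors.append(
            {t: (c / total) * math.log(n_docs / doc_freq[t]) for t, c in counts.items()}
        )
    return vectors


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k_similar(
    new_index: int, vectors: List[Dict[str, float]], k: int = 3
) -> List[Tuple[float, int]]:
    """Return (similarity, case index) pairs for the k cases most similar to the new case."""
    scored = [
        (cosine(vectors[new_index], vec), i)
        for i, vec in enumerate(vectors)
        if i != new_index
    ]
    return sorted(scored, reverse=True)[:k]


vectors = tf_idf_vectors(CASES)
for similarity, index in top_k_similar(3, vectors, k=3):
    print(index, round(similarity, 3))
```

In this toy case base, the case sharing the rarer term "pump" with the new case ranks first, and the unrelated textile case scores zero, which is the behavior the IDF factor is meant to produce.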

The sexual awareness and sexual behaviour of high school students (고등학생의 성 의식과 성 행동에 관한 연구)

  • Oh, Hyun-Mee;Park, Young-Soo
    • The Journal of Korean Society for School & Community Health Education / v.2 no.2 / pp.89-99 / 2001
  • This paper is based on the assumption that the sexual awareness and sexual behaviour of high school students differ between academic high schools and vocational high schools. The main purpose of this paper is to provide basic information for establishing the direction of a realistic and efficient education that eventually leads to desirable sex ethics. To do this, a comparative study was carried out to identify the differences between academic high school students and vocational high school students in their actual sexual awareness and behaviour. I posed the following two questions in order to achieve the goal of this study. First, what is the difference in sexual awareness between academic high school students and vocational high school students? Second, what is the difference in sexual behaviour between academic high school students and vocational high school students? The subjects of the survey were 595 high school students in Kyunggi Province, and I prepared the questionnaires with reference to previous studies. The SPSS program was used to obtain frequencies and percentages from the survey results, and then, by applying the t-test, the $\chi^2$ test, and correlation analysis, the following results were obtained. First, regarding sexual awareness, there was not much difference between academic high school students and vocational high school students in their idea of keeping virginity before marriage. Analysis of the correlation between students and their parents and friends in terms of the will to keep virginity found a relationship in both academic and vocational high schools. When they have a sexual problem, students turn to friends for advice, and both groups showed the same result. However, regarding the experience of sex education, the comparative analysis indicated a meaningful difference. 
Second, as for sexual behaviour, there was a difference between academic high school students and vocational high school students in the experience of dating the other sex as well as in the degree of physical touch. However, not much difference was shown between the two groups in controlling sexual desire. The comparative analysis of sexual experience between the two groups showed a meaningful difference. In terms of the partner in their sexual experience, the majority of students in both groups chose a friend as their first answer, and there was little difference. From these results I conclude that most students in both groups are linked with friends and parents in keeping their virginity. Furthermore, there is a meaningful difference between the two groups in the experience of sex education. With regard to the sexual behaviour of high school students, a meaningful difference is shown between the two groups in dating the other sex, physical touch, and sexual experience. Consequently, there is a meaningful difference between academic and vocational high schools in some variables of the sexual awareness and sexual behaviour of high school students.

Analysis of Multi-Cultural Education Contents of 'Human Development and Family' Area of Middle School Technology-Home Economics Textbooks in Accordance with the 2015 Revised Curriculum (2015 개정 교육과정 중학교 기술·가정 교과서 '인간발달과 가족' 영역에 반영된 다문화교육 내용 분석)

  • Lee, Seong Min;Yu, Nan Sook
    • Journal of Korean Home Economics Education Association / v.31 no.2 / pp.79-94 / 2019
  • The purpose of this study was to examine how multi-cultural education contents are reflected in the 'Human Development and Family' area of middle school Technology-Home Economics textbooks, based on the multi-cultural education criteria developed by Paek (2014). The sample of the study consisted of the 'Human Development and Family' contents in 'Technology-Home Economics 1·2' textbooks from five publishers (10 volumes) written in accordance with the 2015 revised curriculum. The results can be summarized as follows. First, all four components of the multi-cultural education criteria appeared in the textbooks from the five publishers. Cooperation appeared with the highest frequency, followed by Identity, Diversity, and Equality. Second, the analyses of text, images, and learning activities in the textbooks confirmed that Home Economics as a subject contains multi-cultural education contents that allow students to establish self and identity, respect others, accept diverse family patterns, understand family changes from multiple viewpoints, and form a family community culture from a gender equality perspective without prejudice. Ultimately, it can be concluded that Home Economics is a subject that enables learners to learn how individuals and families can cooperate and care for each other in order to coexist harmoniously.

A Study on Marine Accident Ontology Development and Data Management: Based on a Situation Report Analysis of Southwest Coast Marine Accidents in Korea (해양사고 온톨로지 구축 및 데이터 관리방안 연구: 서해남부해역 선박사고 상황보고서 분석을 중심으로)

  • Lee, Young Jai;Kang, Seong Kyung;Gu, Ja-Yeong
    • Journal of the Korean Society of Marine Environment & Safety / v.25 no.4 / pp.423-432 / 2019
  • Along with an increase in marine activities every year, the frequency of marine accidents is on the rise. Accordingly, various research activities and policies for marine safety are being implemented. Despite these efforts, the number of accidents is increasing every year, bringing their effectiveness into question. Preliminary studies relying on annual statistical reports provide precautionary measures for items that stand out significantly, through the comparison of statistical provision items. Since the 2000s, large-scale marine accidents have repeatedly occurred, and case studies have examined the "accident response." Likewise, annual statistics or accident cases are used as core data in policy formulation for domestic maritime safety. However, they are just a summary of post-accident results. In this study, limitations of current marine research and policy are evaluated through a literature review of case studies and analyses of marine accidents. In addition, the ontology of the marine accident information classification system will be revised to improve the currently limited usage of the information through an attribute analysis of boating accident status reports and text mining. These attributes consist of the reporter, the report method, the rescue organization, corrective measures, vulnerability of response, payloads, cause of oil spill, damage pattern, and the result of an accident response. These can be used consistently in the future as classified standard terms to collect and utilize information more efficiently. Moreover, the research proposes a data collection and quality assurance method for the practical use of the ontology. A clear understanding of the problems presently faced in marine safety will allow "sufficient quality information" to be leveraged for conducting various research and realizing effective policies.