• Title/Summary/Keyword: 텍스트 연구 (text research)

Search Result 3,495, Processing Time 0.03 seconds

Analysis of the Study Trend of Glass Ceiling by Period Using Text Mining (텍스트 마이닝을 이용한 시대별 유리천장 연구동향 분석)

  • Kim, Young-Man;Lee, Jin Gu
    • The Journal of the Korea Contents Association / v.21 no.8 / pp.376-387 / 2021
  • This study analyzes research trends related to the 'glass ceiling' phenomenon using big data analysis methods and suggests their social implications. To analyze these trends, a historical event that broke the 'glass ceiling' was set as the key issue, and keywords were collected for three periods defined by Park's term: before, during, and after her term. Frequency analysis showed that research in the first period centered on 'public servants', which emerged as the main keyword, while 'women's work-family compatibility' was the main keyword group in the second period. In the third period, keywords for women's occupational groups became more diverse. Applying the CONCOR technique to group the main research topics confirmed the following main issues: differentiating factors, customary gender-discrimination culture, the jobs targeted for study, work-family balance, the glass ceiling and organizational performance adjustment factors, the public sector, organizational performance, and the private sector. As social implications, the study suggests that, beyond work-family compatibility support systems, research is needed on improving systems to resolve glass-ceiling factors and on expanding the target jobs to address real-life issues. It also suggests that future research should examine the 'glass ceiling' as the general public perceives it through social media and news articles.
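CONCOR (CONvergence of iterated CORrelations) groups keywords in the following way:

1. Correlate the columns of a matrix.
2. Correlate the columns of the resulting correlation matrix, and repeat until every entry is close to +1 or -1.
3. Split the nodes into two blocks by the sign of their correlation.

The code below is a minimal pure-Python sketch of a single CONCOR split. It assumes a keyword co-occurrence matrix as input. The function names and the toy matrix are hypothetical, and the abstract does not say which tool or settings the study used. In practice the split is applied again to each block to obtain finer groups.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def correlate_columns(matrix):
    """Return the column-by-column Pearson correlation matrix."""
    cols = list(zip(*matrix))
    return [[pearson(ci, cj) for cj in cols] for ci in cols]


def concor_split(matrix, max_iter=50, tol=1e-9):
    """One CONCOR split: iterate correlations until entries are about +/-1, then split by sign."""
    corr = correlate_columns(matrix)
    for _ in range(max_iter):
        if all(abs(abs(v) - 1.0) < tol for row in corr for v in row):
            break
        corr = correlate_columns(corr)
    # Nodes positively correlated with node 0 share its block; the rest form the other block.
    block_a = [j for j, v in enumerate(corr[0]) if v > 0]
    block_b = [j for j, v in enumerate(corr[0]) if v <= 0]
    return block_a, block_b


# Hypothetical co-occurrence matrix: keywords 0-2 co-occur with each other, as do 3-5.
cooc = [
    [1, 1, 1, 0, 0, 0],
    [1, 1, 1, 0, 0, 0],
    [1, 1, 1, 0, 0, 0],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 1],
]
print(concor_split(cooc))  # → ([0, 1, 2], [3, 4, 5])
```

With real co-occurrence counts the first correlation matrix is not yet exactly ±1, and several iterations are needed before the signs settle.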

Construction of Event Networks from Large News Data Using Text Mining Techniques (텍스트 마이닝 기법을 적용한 뉴스 데이터에서의 사건 네트워크 구축)

  • Lee, Minchul;Kim, Hea-Jin
    • Journal of Intelligence and Information Systems / v.24 no.1 / pp.183-203 / 2018
  • News articles are the most suitable medium for examining events occurring at home and abroad. In particular, advances in information and communication technology have produced many kinds of online news media, and news about events in society has therefore increased greatly. Automatically summarizing key events from massive amounts of news data would help users view many events at a glance. In addition, building and providing an event network based on the relevance between events would greatly help readers understand current events. In this study, we propose a method for extracting event networks from large news text data. To this end, we first collected Korean political and social articles from March 2016 to March 2017. We kept only meaningful words through preprocessing and merged synonyms using NPMI and Word2Vec. Latent Dirichlet allocation (LDA) topic modeling was used to calculate the topic distribution by date, and events were detected by finding the peaks of each topic's distribution. A total of 32 topics were extracted, and the point at which each event occurred was inferred from the point at which the corresponding topic distribution surged. As a result, a total of 85 events were detected, of which a final 16 were filtered using the Gaussian smoothing technique and presented. To construct the event network, we calculated relevance scores between the detected events: the relevance between two events was given by the cosine coefficient between co-occurring events, and related events were connected. Finally, we built the event network by treating each event as a vertex and the relevance score between two events as the weight of the edge connecting their vertices. 
The event network constructed with our method arranged the major political and social events in Korea over the past year in chronological order. At the same time, it showed which events are related to one another. Our approach differs from existing event detection methods in that LDA topic modeling makes it easy to analyze large amounts of data and to identify relationships between events that were difficult to detect with existing methods. In text preprocessing we applied various text mining techniques, including Word2Vec, to improve the accuracy of extracting proper nouns and compound nouns, which have been difficult to handle in existing Korean text analysis. The event detection and network construction techniques in this study have the following advantages in practical application. First, LDA topic modeling, an unsupervised learning method, can easily analyze topics, topic words, and their distributions in huge amounts of data. Moreover, the dates of the collected news articles allow the distribution of each topic to be expressed as a time series. Second, the relevance scores are calculated from topic co-occurrence, which is difficult to capture in existing event detection, and the resulting event network presents the connections between events in a summarized form. This is supported by the fact that the relevance-based event network proposed in this study was in practice arranged in order of occurrence. The event network also makes it possible to identify which event served as the starting point of a series of events. This study has two limitations. First, LDA topic modeling produces different results depending on the initial parameters and the number of topics. Second, the names of topics and events in the results must be assigned by the researcher's subjective judgment. 
In addition, because each topic is assumed to be exclusive and independent, the method does not take into account the relevance between topics. Subsequent studies need to calculate the relevance between events not covered in this study and between events that belong to the same topic.
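The event-detection and network steps described in this abstract can be sketched as follows:

1. Smooth a topic's daily share of articles.
2. Take local surges as candidate events.
3. Link events by cosine similarity.

This is a minimal pure-Python illustration, not the authors' implementation. The following are all hypothetical choices:

- the kernel width and radius;
- the peak threshold;
- representing each event as a vector of co-occurrence counts;
- the edge cutoff `min_score`.

```python
import math


def gaussian_kernel(sigma, radius):
    """Normalized Gaussian weights for offsets -radius..radius."""
    weights = [math.exp(-(i * i) / (2 * sigma * sigma)) for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def smooth(series, sigma=1.0, radius=2):
    """Gaussian-smooth a daily topic-share series, renormalizing at the edges."""
    kernel = gaussian_kernel(sigma, radius)
    out = []
    for t in range(len(series)):
        acc = norm = 0.0
        for k, w in enumerate(kernel):
            idx = t + k - radius
            if 0 <= idx < len(series):
                acc += w * series[idx]
                norm += w
        out.append(acc / norm)
    return out


def find_peaks(series, threshold):
    """Indices of interior local maxima at or above threshold (candidate event dates)."""
    return [
        t for t in range(1, len(series) - 1)
        if series[t] > series[t - 1] and series[t] >= series[t + 1] and series[t] >= threshold
    ]


def cosine(u, v):
    """Cosine similarity of two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def build_event_network(event_vectors, min_score):
    """Weighted edges {(event_a, event_b): relevance} for pairs at or above min_score."""
    names = sorted(event_vectors)
    edges = {}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            score = cosine(event_vectors[a], event_vectors[b])
            if score >= min_score:
                edges[(a, b)] = score
    return edges


# Hypothetical daily share of one topic and hypothetical event co-occurrence vectors.
daily_share = [0.01, 0.02, 0.01, 0.20, 0.05, 0.02, 0.01, 0.15, 0.03]
print(find_peaks(smooth(daily_share), threshold=0.05))  # → [3, 7]

events = {"A": [1, 1, 0, 0], "B": [1, 1, 1, 0], "C": [0, 0, 0, 1]}
print(build_event_network(events, min_score=0.5))
```

In this sketch, events become vertices and the cosine scores become edge weights, as the abstract describes. Raising `min_score` prunes weaker links.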

Topic Modeling of News Article about International Construction Market Using Latent Dirichlet Allocation (Latent Dirichlet Allocation 기법을 활용한 해외건설시장 뉴스기사의 토픽 모델링(Topic Modeling))

  • Moon, Seonghyeon;Chung, Sehwan;Chi, Seokho
    • KSCE Journal of Civil and Environmental Engineering Research / v.38 no.4 / pp.595-599 / 2018
  • A sufficient understanding of overseas construction market conditions is crucial for profitability in international construction projects. Many researchers have regarded news articles as a good data source for assessing market conditions, since they contain market information such as political, economic, and social issues. Because text data is unstructured and large, various text-mining techniques have been studied to reduce the manpower, time, and cost needed to summarize it. However, extracting the needed information from news articles is difficult because the data covers many different topics. This research aims to overcome these problems and to contribute to summarizing market status by performing topic modeling with Latent Dirichlet Allocation (LDA). Assuming that the corpus contains 10 topics, the extracted topics included projects for user convenience (topic-2), private support to address poverty in Africa (topic-4), and others. Grouping the news articles by topic can improve the extraction of useful information and the summarization of market status.
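The following minimal collapsed Gibbs sampler shows, for readers unfamiliar with LDA, how a fixed number of topics (10 in this study) is fitted to tokenized articles. It is a pure-Python teaching sketch under assumed hyperparameters (`alpha`, `beta`, iteration count). The abstract does not say which LDA implementation or settings the authors used, and the toy corpus is hypothetical.

```python
import random


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Collapsed Gibbs sampling for LDA.

    docs: list of token lists. Returns (doc_topic, topic_word, vocab), where
    doc_topic[d][k] and topic_word[k][v] are smoothed probability estimates.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    n_vocab, n_topics = len(vocab), num_topics
    ndk = [[0] * n_topics for _ in docs]            # topic counts per document
    nkw = [[0] * n_vocab for _ in range(n_topics)]  # word counts per topic
    nk = [0] * n_topics                             # total tokens per topic
    z = []                                          # topic assignment of every token

    # Random initial topic assignment.
    for d, doc in enumerate(docs):
        z.append([])
        for w in doc:
            k = rng.randrange(n_topics)
            z[d].append(k)
            ndk[d][k] += 1
            nkw[k][word_id[w]] += 1
            nk[k] += 1

    topics = range(n_topics)
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = word_id[w], z[d][i]
                # Remove the token's current assignment from the counts.
                ndk[d][k] -= 1
                nkw[k][v] -= 1
                nk[k] -= 1
                # Full conditional p(z = t | all other assignments), up to a constant.
                weights = [(ndk[d][t] + alpha) * (nkw[t][v] + beta) / (nk[t] + n_vocab * beta)
                           for t in topics]
                k = rng.choices(topics, weights=weights)[0]
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][v] += 1
                nk[k] += 1

    doc_topic = [[(ndk[d][k] + alpha) / (len(doc) + n_topics * alpha) for k in topics]
                 for d, doc in enumerate(docs)]
    topic_word = [[(nkw[k][v] + beta) / (nk[k] + n_vocab * beta) for v in range(n_vocab)]
                  for k in topics]
    return doc_topic, topic_word, vocab


def top_words(topic_word, vocab, n=3):
    """The n most probable words of each topic."""
    return [[vocab[v] for v in sorted(range(len(vocab)), key=lambda v: -row[v])[:n]]
            for row in topic_word]


articles = [
    ["bridge", "tender", "contract", "bridge"],
    ["loan", "aid", "poverty", "africa"],
    ["tender", "contract", "highway"],
    ["aid", "africa", "poverty", "loan"],
]
doc_topic, topic_word, vocab = lda_gibbs(articles, num_topics=2, iterations=100)
print(top_words(topic_word, vocab))
```

Topic labels such as "user convenience projects" are not produced by the sampler. As with the topics reported above, a person must name each topic from its top words.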

Automatic Text Categorization Using Passage-based Weight Function and Passage Type (문단 단위 가중치 함수와 문단 타입을 이용한 문서 범주화)

  • Joo, Won-Kyun;Kim, Jin-Suk;Choi, Ki-Seok
    • The KIPS Transactions:PartB / v.12B no.6 s.102 / pp.703-714 / 2005
  • Research in text categorization has been confined to whole-document-level classification, probably due to a lack of full-text test collections. However, the large quantities of full-length documents available today have renewed interest in text classification. A document is usually written in an organized structure to present its main topic(s). This structure can be expressed as a sequence of sub-topic text blocks, or passages. To reflect the sub-topic structure of a document, we propose a new passage-level (passage-based) text categorization model. The model segments a test document into several passages, assigns categories to each passage, and merges the passage categories into document categories. Compared with traditional document-level categorization, this model requires two additional steps: passage splitting and category merging. We evaluated the proposed model on four subsets of the Reuters text categorization test collection and on a full-text test collection whose documents range from tens to hundreds of kilobytes. The evaluation focused on the effectiveness of various passage types and the importance of passage location in category merging. Our results show that simple windows performed best for all test collections in these experiments. We also found that passages contribute to the main topic(s) to different degrees, depending on their location in the test document.
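The passage-based model adds two steps to ordinary categorization: splitting a document into passages, and merging per-passage categories into document categories. The code below is a minimal sketch of both steps under the following assumptions:

- Passages are fixed-size, overlapping token windows (the "simple windows" passage type).
- Merging is a weighted sum of per-passage category scores.
- The window size, step, and position weights are hypothetical.
- Any per-passage classifier could supply the scores.

```python
from collections import defaultdict


def window_passages(tokens, size, step):
    """Split a token list into fixed-size, possibly overlapping windows."""
    if len(tokens) <= size:
        return [tokens]
    passages = [tokens[i:i + size] for i in range(0, len(tokens) - size + 1, step)]
    if (len(tokens) - size) % step != 0:
        # Add one final window so the tail of the document is covered.
        passages.append(tokens[-size:])
    return passages


def merge_categories(passage_scores, position_weights=None):
    """Combine per-passage category scores into a single document category.

    passage_scores: one {category: score} dict per passage, in document order.
    position_weights: optional per-passage weights (e.g. to favor early passages).
    """
    if position_weights is None:
        position_weights = [1.0] * len(passage_scores)
    totals = defaultdict(float)
    for weight, scores in zip(position_weights, passage_scores):
        for category, score in scores.items():
            totals[category] += weight * score
    return max(totals, key=totals.get)


tokens = "w0 w1 w2 w3 w4 w5 w6 w7 w8".split()
print(window_passages(tokens, size=4, step=2))

scores = [{"econ": 0.9, "sports": 0.1}, {"econ": 0.2, "sports": 0.8}, {"econ": 0.3, "sports": 0.7}]
print(merge_categories(scores))                                    # → sports
print(merge_categories(scores, position_weights=[3.0, 1.0, 1.0]))  # → econ
```

The last two calls show why passage location matters in merging. Weighting the first passage more heavily flips the document category.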

The Expression of the Comic Image in <The Emperor and the Assassin> from the Perspective of Narrative (서사적 관점에서 본 만화 <형가자진왕>의 도상 표현)

  • Jo, Jeong-Rae
    • The Journal of the Korea Contents Association / v.14 no.2 / pp.84-93 / 2014
  • Comics use words to convey content and meaning, while the comic image conveys content through a narrative function, representing how language is combined with the text. This paper compares and analyzes comics from Japan and South Korea in terms of word and image expression, narrative techniques, and modes of communication to study the characteristics of image narrative. The comic image of Jing Ke serves as a flow of narrative that moves beyond the current panel to resonate with readers. Go U-yeong's and Sumeragi Natsuki's comics set up a virtual narrative time and space through line, surface, space, and shade. In doing so, they reproduce unhistorical facts and realize the significance of narrative through the artist's imagination. Sumeragi Natsuki's comics use historical facts to present an exquisite narrative like still-life paintings. She focuses on describing the objective facts of history to seek a sensitive comic image beyond reality. The image narration of Go U-yeong's comics is a clash between his historical narrative, rendered in subjective romantic images, and the narrative flow that readers hold in their own awareness. He therefore tries to keep a balance between them. The instant image in his comics is not a reproduction of a real historical moment but an image of reality reconstructed through his own pursuit of narrative.

A Study on Questionnaire Improvement using Text Mining (텍스트 마이닝 기법을 활용한 설문 문항 개선에 관한 연구)

  • Paek, Yun-Ji;Jung, Chang-Hyun
    • Journal of the Korean Society of Marine Environment & Safety / v.26 no.2 / pp.121-128 / 2020
  • The Marine Safety Culture Index (MSCI) was developed in 2018 to objectively assess the public's safety culture level and to use the results as data for spreading marine safety culture. The method for calculating the safety culture index should include issues that may affect safety culture and should consist of appropriate attributes for estimating the current status. In addition, continuous verification and supplementation are required to reflect social and economic changes. In this study, we analyzed 915 marine safety proposals to determine whether the questionnaire designed by marine experts reflects the public's interests and needs. Text mining was employed to analyze the unstructured data of the proposals, followed by network analysis and topic modeling. The proposals centered on attributes such as education, public relations, safety rules, awareness, skilled workers, and systems. Eighteen questions were modified and supplemented to reflect the proposals, and the reliability of the revised questions was analyzed. Compared with the previous year, the questionnaire's internal consistency improved, reaching a high value of 0.895. The improved questionnaire reflects the requirements of both marine experts and the public. The marine safety culture index derived with it is expected to contribute to establishing policies for spreading marine safety culture.
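The abstract reports an internal consistency of 0.895 without naming the statistic. The sketch below assumes Cronbach's alpha, which is the usual measure of questionnaire reliability. It computes alpha = k/(k-1) × (1 - Σ item variances / variance of total scores) from a respondents-by-items score table. The sample answers are hypothetical.

```python
def sample_variance(values):
    """Unbiased sample variance (n - 1 in the denominator)."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** 2 for x in values) / (len(values) - 1)


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    k = len(responses[0])
    if k < 2:
        raise ValueError("need at least two items")
    item_vars = [sample_variance([r[i] for r in responses]) for i in range(k)]
    total_var = sample_variance([sum(r) for r in responses])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical answers: 3 respondents x 2 items.
answers = [[1, 2], [2, 1], [3, 3]]
print(round(cronbach_alpha(answers), 3))  # → 0.667
```

Items that move together raise the variance of the total score relative to the item variances, pushing alpha toward 1.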

Citizen Sentiment Analysis of the Social Disaster by Using Opinion Mining (오피니언 마이닝 기법을 이용한 사회적 재난의 시민 감성도 분석)

  • Seo, Min Song;Yoo, Hwan Hee
    • Journal of Korean Society for Geospatial Information Science / v.25 no.1 / pp.37-46 / 2017
  • Recently, disasters caused by social factors have occurred frequently in Korea. Because it is difficult to predict what crisis could happen, citizens' concern is rising. In this study, we developed a program that acquires tweet data on social disasters such as 'nonspecific motive crimes' and the 'Oxy' products, using Tweepy, a Python library. After natural language processing, these data were used to evaluate citizens' psychological trauma and anxiety through text clustering analysis and opinion mining analysis in R Studio. In the 'Oxy' case, the Sewol ferry accident and Oxy's continued sale of its products had the highest similarity. In the 'nonspecific motive crimes' case, the highest similarity was among the government's measures for coping with unexpected incidents such as the screen door 'incident', the Sewol ferry accident, and the misogyny-motivated 'nonspecific motive crime' in Busan. In addition, the average citizen sentiment score for nonspecific motive crimes was 11.61 percentage points more negative than that for the Oxy case. The findings are therefore expected to be used to predict citizens' mental health and to help prevent future accidents.
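Lexicon-based opinion mining is one common way to score the sentiment of tweets, and it can be sketched as below. The authors worked in R Studio and do not give their lexicon or scoring formula, so the following are all hypothetical:

- the tiny lexicon;
- the per-tweet score (mean polarity of matched words);
- the case data.

The last step assumes scores on a -1 to 1 scale, so that a gap of 0.1 between two average scores reads as 10 percentage points.

```python
# Hypothetical polarity lexicon: +1 positive, -1 negative.
LEXICON = {"safe": 1.0, "relief": 1.0, "sad": -1.0, "angry": -1.0, "fear": -1.0}


def tweet_score(tokens, lexicon=LEXICON):
    """Mean polarity of the tokens found in the lexicon (0.0 if none match)."""
    hits = [lexicon[t] for t in tokens if t in lexicon]
    return sum(hits) / len(hits) if hits else 0.0


def average_score(tweets, lexicon=LEXICON):
    """Average sentiment index over a collection of tokenized tweets."""
    return sum(tweet_score(t, lexicon) for t in tweets) / len(tweets)


case_a = [["sad", "angry"], ["safe"]]
case_b = [["fear"], ["sad", "relief"]]
gap_pp = (average_score(case_b) - average_score(case_a)) * 100
print(gap_pp)  # → -50.0
```

A negative gap means the second case was scored as more negative than the first.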

Research on text mining based malware analysis technology using string information (문자열 정보를 활용한 텍스트 마이닝 기반 악성코드 분석 기술 연구)

  • Ha, Ji-hee;Lee, Tae-jin
    • Journal of Internet Computing and Services / v.21 no.1 / pp.45-55 / 2020
  • With advances in information and communication technology, the number of new and variant malicious codes is increasing rapidly every year. Various types of malware are also spreading with the development of the Internet of Things and cloud computing. In this paper, we propose a malware analysis method based on string information. The method can be used regardless of the operating system environment and captures library call information related to malicious behavior. Attackers can easily create malware from existing code or with automated authoring tools, and the resulting malware behaves similarly to existing malware. Most strings that can be extracted from malicious code carry information closely related to malicious behavior. The data features are therefore weighted with a text mining based method so that these strings serve as effective features for malware analysis. Using the processed data, models are built with various machine learning algorithms to run experiments on malware detection and malware group classification. The method was compared and verified using files from both the Windows and Linux operating systems. Malware detection accuracy is about 93.5%, and group classification accuracy is about 90%. The proposed technique is relatively simple, fast, and operating system independent. It is also widely applicable because a single model suffices: there is no need to build a separate model for each group when classifying malware groups. In addition, since the string information is extracted through static analysis, it can be processed faster than analysis methods that execute the code directly.
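The string-based pipeline can be sketched in two steps. First, pull printable strings out of a binary, similar in spirit to the Unix `strings` utility. Second, weight the strings with TF-IDF so that strings common to every sample count less. This is a minimal sketch under the following assumptions:

- the 4-character minimum;
- ASCII-only extraction;
- the plain log(N/df) IDF.

These are illustrative choices. The abstract specifies neither the exact weighting scheme nor the machine learning models used downstream.

```python
import math
import re
from collections import Counter


def extract_strings(data, min_len=4):
    """Return runs of printable ASCII bytes at least min_len long, decoded to str."""
    pattern = rb"[\x20-\x7e]{%d,}" % min_len
    return [m.decode("ascii") for m in re.findall(pattern, data)]


def tfidf(documents):
    """TF-IDF weights per document: tf = count / length, idf = log(N / df)."""
    n = len(documents)
    df = Counter(term for doc in documents for term in set(doc))
    weights = []
    for doc in documents:
        counts = Counter(doc)
        length = len(doc)
        weights.append({t: (c / length) * math.log(n / df[t]) for t, c in counts.items()})
    return weights


sample = b"\x00\x01hello\x00ab\x00LoadLibraryA\xff"
print(extract_strings(sample))  # → ['hello', 'LoadLibraryA']

# Hypothetical string lists from two samples.
docs = [["LoadLibraryA", "GetProcAddress", "cmd.exe"], ["LoadLibraryA", "GetProcAddress"]]
w = tfidf(docs)
print(w[0])
```

Strings present in every sample, such as `LoadLibraryA` here, get zero weight. Distinguishing strings like `cmd.exe` keep a positive weight and become the features fed to a classifier.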

Study on the textuality of Haedongyeongeon[해동영언] in Mansebo[만세보] ("만세보(萬歲報)" 소재(所載) <해동영언(海東永言)>의 텍스트성 연구)

  • Lee, Sang-Won
    • Sijohaknonchong / v.25 / pp.211-237 / 2006
  • Mansebo[만세보] contains a total of 111 old shijos under the title Haedongyeongeon[해동영언]. This paper regards Haedongyeongeon[해동영언] as an early-20th-century shijo text and surveys its literary characteristics and its significance in relation to anthology compilation. Haedongyeongeon can be seen both as a newspaper serial and as a short anthology. The basic pattern of the serial is 'title, musical designation, author, text, and a brief review'. Of these, the review most clearly shows the character of the serial. It is written in Chinese characters followed by Korean script indicating their pronunciation, which was presumably designed to attract more readers to the newspaper. On the other hand, when seen as a collection of works printed in installments, Haedongyeongeon[해동영언] clearly shows an intention to compile an anthology. This is particularly evident in its overall classification of works and its arrangement of works by author, so it may well be defined as a short anthology. This anthology pursues formal perfection somewhat excessively and is characterized by a strong intent to be read as popular literature. It can therefore be said to manifest the general characteristics of 20th-century anthologies. The planner of the serial, that is, the compiler of the anthology, is thought to be one of the core figures of Mansebo[만세보], namely O Sechang[오세창], Lee Injik[이인직], Choi Yeongnyeon[최영년], or Shin Gwanghui[신광희]. Considering all circumstances, Choi Yeongnyeon[최영년] is the most likely. Finally, it is not yet known which anthology served as the source of Haedongyeongeon[해동영언], so any judgment on that question has been deferred.

Bio-Sensing Convergence Big Data Computing Architecture (바이오센싱 융합 빅데이터 컴퓨팅 아키텍처)

  • Ko, Myung-Sook;Lee, Tae-Gyu
    • KIPS Transactions on Software and Data Engineering / v.7 no.2 / pp.43-50 / 2018
  • Biometric information computing is strongly influencing both computing systems and big data systems. It is based on bio-information systems that combine bio-signal sensors with bio-information processing. Unlike conventional data formats such as text, images, and videos, biometric information combines several formats. Text-based values give meaning to a bio-signal, important event moments are stored as images, and complex video-format data is constructed for prediction and analysis through time-series analysis. Individual biometric application services may request such complex data separately as text, image, or video, depending on the characteristics of the data they require. They may also request several formats simultaneously, depending on the situation. Previous bio-information processing systems rely on conventional computing components, computing structures, and data processing methods. As a result, they are inefficient in data processing performance, transmission capability, storage efficiency, and system safety. In this study, we propose an improved biosensing-converged big data computing architecture to build a platform that effectively supports biometric information processing. The proposed architecture supports efficient data storage and transmission, computing performance, and system stability. It can also lay the foundation for implementing systems and optimizing biometric information services for future biometric information computing.