• Title/Summary/Keyword: 형태소

Search Result 811, Processing Time 0.024 seconds

Analysis of Keywords in national river occupancy permits by region using text mining and network theory (텍스트 마이닝과 네트워크 이론을 활용한 권역별 국가하천 점용허가 키워드 분석)

  • Seong Yun Jeong
    • Smart Media Journal
    • /
    • v.12 no.11
    • /
    • pp.185-197
    • /
    • 2023
  • This study was conducted using text mining and network theory to extract useful information for application for occupancy and performance of permit tasks contained in the permit contents from the permit register, which is used only for the simple purpose of recording occupancy permit information. Based on text mining, we analyzed and compared the frequency of vocabulary occurrence and topic modeling in five regions, including Seoul, Gyeonggi, Gyeongsang, Jeolla, Chungcheong, and Gangwon, as well as normalization processes such as stopword removal and morpheme analysis. By applying four types of centrality algorithms, including stage, proximity, mediation, and eigenvector, which are widely used in network theory, we looked at keywords that are in a central position or act as an intermediary in the network. Through a comprehensive analysis of vocabulary appearance frequency, topic modeling, and network centrality, it was found that the 'installation' keyword was the most influential in all regions. This is believed to be the result of the Ministry of Environment's permit management office issuing many permits for constructing facilities or installing structures. In addition, it was found that keywords related to road facilities, flood control facilities, underground facilities, power/communication facilities, sports/park facilities, etc. were at a central position or played a role as an intermediary in topic modeling and networks. Most of the keywords appeared to have a Zipf's law statistical distribution with low frequency of occurrence and low distribution ratio.

What Concerns Does ChatGPT Raise for Us?: An Analysis Centered on CTM (Correlated Topic Modeling) of YouTube Video News Comments (ChatGPT는 우리에게 어떤 우려를 초래하는가?: 유튜브 영상 뉴스 댓글의 CTM(Correlated Topic Modeling) 분석을 중심으로)

  • Song, Minho;Lee, Soobum
    • Informatization Policy
    • /
    • v.31 no.1
    • /
    • pp.3-31
    • /
    • 2024
  • This study aimed to examine public concerns in South Korea considering the country's unique context, triggered by the advent of generative artificial intelligence such as ChatGPT. To achieve this, comments from 102 YouTube video news related to ethical issues were collected using a Python scraper, and morphological analysis and preprocessing were carried out using Textom on 15,735 comments. These comments were then analyzed using a Correlated Topic Model (CTM). The analysis identified six primary topics within the comments: "Legal and Ethical Considerations"; "Intellectual Property and Technology"; "Technological Advancement and the Future of Humanity"; "Potential of AI in Information Processing"; "Emotional Intelligence and Ethical Regulations in AI"; and "Human Imitation."Structuring these topics based on a correlation coefficient value of over 10% revealed 3 main categories: "Legal and Ethical Considerations"; "Issues Related to Data Generation by ChatGPT (Intellectual Property and Technology, Potential of AI in Information Processing, and Human Imitation)"; and "Fear for the Future of Humanity (Technological Advancement and the Future of Humanity, Emotional Intelligence, and Ethical Regulations in AI)."The study confirmed the coexistence of various concerns along with the growing interest in generative AI like ChatGPT, including worries specific to the historical and social context of South Korea. These findings suggest the need for national-level efforts to ensure data fairness.

Design and Implementation of Game Server using the Efficient Load Balancing Technology based on CPU Utilization (게임서버의 CPU 사용율 기반 효율적인 부하균등화 기술의 설계 및 구현)

  • Myung, Won-Shig;Han, Jun-Tak
    • Journal of Korea Game Society
    • /
    • v.4 no.4
    • /
    • pp.11-18
    • /
    • 2004
  • The on-line games in the past were played by only two persons exchanging data based on one-to-one connections, whereas recent ones (e.g. MMORPG: Massively Multi-player Online Role-playings Game) enable tens of thousands of people to be connected simultaneously. Specifically, Korea has established an excellent network infrastructure that can't be found anywhere in the world. Almost every household has a high-speed Internet access. What made this possible was, in part, high density of population that has accelerated the formation of good Internet infrastructure. However, this rapid increase in the use of on-line games may lead to surging traffics exceeding the limited Internet communication capacity so that the connection to the games is unstable or the server fails. expanding the servers though this measure is very costly could solve this problem. To deal with this problem, the present study proposes the load distribution technology that connects in the form of local clustering the game servers divided by their contents used in each on-line game reduces the loads of specific servers using the load balancer, and enhances performance of sewer for their efficient operation. In this paper, a cluster system is proposed where each Game server in the system has different contents service and loads are distributed efficiently using the game server resource information such as CPU utilization. Game sewers having different contents are mutually connected and managed with a network file system to maintain information consistency required to support resource information updates, deletions, and additions. Simulation studies show that our method performs better than other traditional methods. In terms of response time, our method shows shorter latency than RR (Round Robin) and LC (Least Connection) by about 12%, 10% respectively.

  • PDF

Measuring the Economic Impact of Item Descriptions on Sales Performance (온라인 상품 판매 성과에 영향을 미치는 상품 소개글 효과 측정 기법)

  • Lee, Dongwon;Park, Sung-Hyuk;Moon, Songchun
    • Journal of Intelligence and Information Systems
    • /
    • v.18 no.4
    • /
    • pp.1-17
    • /
    • 2012
  • Personalized smart devices such as smartphones and smart pads are widely used. Unlike traditional feature phones, theses smart devices allow users to choose a variety of functions, which support not only daily experiences but also business operations. Actually, there exist a huge number of applications accessible by smart device users in online and mobile application markets. Users can choose apps that fit their own tastes and needs, which is impossible for conventional phone users. With the increase in app demand, the tastes and needs of app users are becoming more diverse. To meet these requirements, numerous apps with diverse functions are being released on the market, which leads to fierce competition. Unlike offline markets, online markets have a limitation in that purchasing decisions should be made without experiencing the items. Therefore, online customers rely more on item-related information that can be seen on the item page in which online markets commonly provide details about each item. Customers can feel confident about the quality of an item through the online information and decide whether to purchase it. The same is true of online app markets. To win the sales competition against other apps that perform similar functions, app developers need to focus on writing app descriptions to attract the attention of customers. If we can measure the effect of app descriptions on sales without regard to the app's price and quality, app descriptions that facilitate the sale of apps can be identified. This study intends to provide such a quantitative result for app developers who want to promote the sales of their apps. For this purpose, we collected app details including the descriptions written in Korean from one of the largest app markets in Korea, and then extracted keywords from the descriptions. Next, the impact of the keywords on sales performance was measured through our econometric model. Through this analysis, we were able to analyze the impact of each keyword itself, apart from that of the design or quality. The keywords, comprised of the attribute and evaluation of each app, are extracted by a morpheme analyzer. Our model with the keywords as its input variables was established to analyze their impact on sales performance. A regression analysis was conducted for each category in which apps are included. This analysis was required because we found the keywords, which are emphasized in app descriptions, different category-by-category. The analysis conducted not only for free apps but also for paid apps showed which keywords have more impact on sales performance for each type of app. In the analysis of paid apps in the education category, keywords such as 'search+easy' and 'words+abundant' showed higher effectiveness. In the same category, free apps whose keywords emphasize the quality of apps showed higher sales performance. One interesting fact is that keywords describing not only the app but also the need for the app have asignificant impact. Language learning apps, regardless of whether they are sold free or paid, showed higher sales performance by including the keywords 'foreign language study+important'. This result shows that motivation for the purchase affected sales. While item reviews are widely researched in online markets, item descriptions are not very actively studied. In the case of the mobile app markets, newly introduced apps may not have many item reviews because of the low quantity sold. In such cases, item descriptions can be regarded more important when customers make a decision about purchasing items. This study is the first trial to quantitatively analyze the relationship between an item description and its impact on sales performance. The results show that our research framework successfully provides a list of the most effective sales key terms with the estimates of their effectiveness. Although this study is performed for a specified type of item (i.e., mobile apps), our model can be applied to almost all of the items traded in online markets.

STANDARDIZATION OF WORD/NONWORD READING TEST AND LETTER-SYMBOL DISCRIMINATION TASK FOR THE DIAGNOSIS OF DEVELOPMENTAL READING DISABILITY (발달성 읽기 장애 진단을 위한 단어/비단어 읽기 검사와 글자기호감별검사의 표준화 연구)

  • Cho, Soo-Churl;Lee, Jung-Bun;Chungh, Dong-Seon;Shin, Sung-Woong
    • Journal of the Korean Academy of Child and Adolescent Psychiatry
    • /
    • v.14 no.1
    • /
    • pp.81-94
    • /
    • 2003
  • Objectives:Developmental reading disorder is a condition which manifests significant developmenttal delay in reading ability or persistent errors. About 3-7% of school-age children have this condition. The purpose of the present study was to validate the diagnostic values of Word/Nonword Reading Test and Letter-Symbol Discrimination Task for the purpose of overcoming the caveats of Basic Learning Skills Test. Methods:Sixty-three reading-disordered patients(mean age 10.48 years old) and sex, age-matched 77 normal children(mean age 10.33 years old) were selected by clinical evaluation and DSM-IV criteria. Reading I and II of Basic Learning Skills Test, Word/Nonword Reading Test, and Letter-Symbol Discrimination Task were carried out to them. Word/Nonword Reading Test:One hundred usual highfrequency words and one hundred meaningless nonwords were presented to the subjects within 1.2 and 2.4 seconds, respectively. Through these results, automatized phonological processing ability and conscious letter-sound matching ability were estimated. Letter-Symbol Discrimination Task:mirror image letters which reading-disordered patients are apt to confuse were used. Reliability, concurrent validity, construct validity, and discriminant validity tests were conducted. Results:Word/Nonword Reading Test:the reliability(alpha) was 0.96, and concurrent validity with Basic Learning Skills test was 0.94. The patients with developmental reading disorders differed significantly from normal children in Word/Nonword Reading Test performances. Through discriminant analysis, 83.0% of original cases were correctly classified by this test. Letter-Symbol Discrimination Task:the reliability(alpha) was 0.86, and concurrent validity with Basic Learning Skills test was 0.86. There were significant differences in scores between the patients and normal children. Factor analysis revealed that this test were composed of saccadic mirror image processing, global accuracy, mirror image processing deficit, static image processing, global vigilance deficit, and inattention-impulsivity factors. By discriminant analysis, 87.3% of the patients and normal children were correctly classified. Conclusion:The patients with developmental reading disorders had deficits in automatized visuallexical route, morpheme-phoneme conversion mechanism, and visual information processing. These deficits were reliably and validly evaluated by Word/Nonword Reading Test and Letter-Symbol Discrimination Task.

  • PDF

Bankruptcy Prediction Modeling Using Qualitative Information Based on Big Data Analytics (빅데이터 기반의 정성 정보를 활용한 부도 예측 모형 구축)

  • Jo, Nam-ok;Shin, Kyung-shik
    • Journal of Intelligence and Information Systems
    • /
    • v.22 no.2
    • /
    • pp.33-56
    • /
    • 2016
  • Many researchers have focused on developing bankruptcy prediction models using modeling techniques, such as statistical methods including multiple discriminant analysis (MDA) and logit analysis or artificial intelligence techniques containing artificial neural networks (ANN), decision trees, and support vector machines (SVM), to secure enhanced performance. Most of the bankruptcy prediction models in academic studies have used financial ratios as main input variables. The bankruptcy of firms is associated with firm's financial states and the external economic situation. However, the inclusion of qualitative information, such as the economic atmosphere, has not been actively discussed despite the fact that exploiting only financial ratios has some drawbacks. Accounting information, such as financial ratios, is based on past data, and it is usually determined one year before bankruptcy. Thus, a time lag exists between the point of closing financial statements and the point of credit evaluation. In addition, financial ratios do not contain environmental factors, such as external economic situations. Therefore, using only financial ratios may be insufficient in constructing a bankruptcy prediction model, because they essentially reflect past corporate internal accounting information while neglecting recent information. Thus, qualitative information must be added to the conventional bankruptcy prediction model to supplement accounting information. Due to the lack of an analytic mechanism for obtaining and processing qualitative information from various information sources, previous studies have only used qualitative information. However, recently, big data analytics, such as text mining techniques, have been drawing much attention in academia and industry, with an increasing amount of unstructured text data available on the web. A few previous studies have sought to adopt big data analytics in business prediction modeling. Nevertheless, the use of qualitative information on the web for business prediction modeling is still deemed to be in the primary stage, restricted to limited applications, such as stock prediction and movie revenue prediction applications. Thus, it is necessary to apply big data analytics techniques, such as text mining, to various business prediction problems, including credit risk evaluation. Analytic methods are required for processing qualitative information represented in unstructured text form due to the complexity of managing and processing unstructured text data. This study proposes a bankruptcy prediction model for Korean small- and medium-sized construction firms using both quantitative information, such as financial ratios, and qualitative information acquired from economic news articles. The performance of the proposed method depends on how well information types are transformed from qualitative into quantitative information that is suitable for incorporating into the bankruptcy prediction model. We employ big data analytics techniques, especially text mining, as a mechanism for processing qualitative information. The sentiment index is provided at the industry level by extracting from a large amount of text data to quantify the external economic atmosphere represented in the media. The proposed method involves keyword-based sentiment analysis using a domain-specific sentiment lexicon to extract sentiment from economic news articles. The generated sentiment lexicon is designed to represent sentiment for the construction business by considering the relationship between the occurring term and the actual situation with respect to the economic condition of the industry rather than the inherent semantics of the term. The experimental results proved that incorporating qualitative information based on big data analytics into the traditional bankruptcy prediction model based on accounting information is effective for enhancing the predictive performance. The sentiment variable extracted from economic news articles had an impact on corporate bankruptcy. In particular, a negative sentiment variable improved the accuracy of corporate bankruptcy prediction because the corporate bankruptcy of construction firms is sensitive to poor economic conditions. The bankruptcy prediction model using qualitative information based on big data analytics contributes to the field, in that it reflects not only relatively recent information but also environmental factors, such as external economic conditions.

The Analysis of the Visitors' Experiences in Yeonnam-dong before and after the Gyeongui Line Park Project - A Text Mining Approach - (경의선숲길 조성 전후의 연남동 방문자의 경험 분석 - 블로그 텍스트 분석을 중심으로 -)

  • Kim, Sae-Ryung;Choi, Yunwon;Yoon, Heeyeun
    • Journal of the Korean Institute of Landscape Architecture
    • /
    • v.47 no.4
    • /
    • pp.33-49
    • /
    • 2019
  • The purpose of this study was to investigate the changes in the experiences of visitors of Yeonnam-dong during the period covering the development of a linear park, the Gyeongui Line Park. This study used a text mining technique to analyze Naver Blog postings of those who visited Yeonnam-dong from June 2013 to May 2017, divided into four periods -from June 2013 to May 2014, from June 2014 to May 2015, from June 2015 to May 2016 and from June 2016 to May 2017. The keywords used were 'Yeonnam-dong', 'Gyeongui Line' and 'Yeontral Park' and the data was further refined and resampled. A semantic network analysis was conducted on the basis of the co-occurrences of words. The results of the study were as follows. During the entire period, the main experience of visitors to Yeonnam-dong was 'food culture' consistently, but the activities related to 'market', 'browsing', and 'buy' increased. Also, activities such as 'walk', 'play' and 'rest' in the park newly appeared after the construction of the park. Moreover, more diverse opinions about the Yeonnam-dong were expressed on the blog, and Yeonnam-dong began to be recognized as a place where a variety of activities can be enjoyed. Lastly, when the visitors wrote about the theme 'food culture', the scope of the keywords expanded from simple ones, such as 'eat', 'photograph' and 'chatting' to 'market', 'browsing', and 'walk'. The sub-themes that appeared with the park also expanded to various topics with the emergence of the Gyeongui Line Book Street. This study analyzed the change of experiences of visitors objectively with text mining, a quantitative methodology. Due to the nature of text mining, however, the subjective opinions inevitably have been involved in the process of refining. Also, further research is required to assess the direct relationship between these changes and park construction.

Analysis of Research Trends of 'Word of Mouth (WoM)' through Main Path and Word Co-occurrence Network (주경로 분석과 연관어 네트워크 분석을 통한 '구전(WoM)' 관련 연구동향 분석)

  • Shin, Hyunbo;Kim, Hea-Jin
    • Journal of Intelligence and Information Systems
    • /
    • v.25 no.3
    • /
    • pp.179-200
    • /
    • 2019
  • Word-of-mouth (WoM) is defined by consumer activities that share information concerning consumption. WoM activities have long been recognized as important in corporate marketing processes and have received much attention, especially in the marketing field. Recently, according to the development of the Internet, the way in which people exchange information in online news and online communities has been expanded, and WoM is diversified in terms of word of mouth, score, rating, and liking. Social media makes online users easy access to information and online WoM is considered a key source of information. Although various studies on WoM have been preceded by this phenomenon, there is no meta-analysis study that comprehensively analyzes them. This study proposed a method to extract major researches by applying text mining techniques and to grasp the main issues of researches in order to find the trend of WoM research using scholarly big data. To this end, a total of 4389 documents were collected by the keyword 'Word-of-mouth' from 1941 to 2018 in Scopus (www.scopus.com), a citation database, and the data were refined through preprocessing such as English morphological analysis, stopwords removal, and noun extraction. To carry out this study, we adopted main path analysis (MPA) and word co-occurrence network analysis. MPA detects key researches and is used to track the development trajectory of academic field, and presents the research trend from a macro perspective. For this, we constructed a citation network based on the collected data. The node means a document and the link means a citation relation in citation network. We then detected the key-route main path by applying SPC (Search Path Count) weights. As a result, the main path composed of 30 documents extracted from a citation network. The main path was able to confirm the change of the academic area which was developing along with the change of the times reflecting the industrial change such as various industrial groups. The results of MPA revealed that WoM research was distinguished by five periods: (1) establishment of aspects and critical elements of WoM, (2) relationship analysis between WoM variables, (3) beginning of researches of online WoM, (4) relationship analysis between WoM and purchase, and (5) broadening of topics. It was found that changes within the industry was reflected in the results such as online development and social media. Very recent studies showed that the topics and approaches related WoM were being diversified to circumstantial changes. However, the results showed that even though WoM was used in diverse fields, the main stream of the researches of WoM from the start to the end, was related to marketing and figuring out the influential factors that proliferate WoM. By applying word co-occurrence network analysis, the research trend is presented from a microscopic point of view. Word co-occurrence network was constructed to analyze the relationship between keywords and social network analysis (SNA) was utilized. We divided the data into three periods to investigate the periodic changes and trends in discussion of WoM. SNA showed that Period 1 (1941~2008) consisted of clusters regarding relationship, source, and consumers. Period 2 (2009~2013) contained clusters of satisfaction, community, social networks, review, and internet. Clusters of period 3 (2014~2018) involved satisfaction, medium, review, and interview. The periodic changes of clusters showed transition from offline to online WoM. Media of WoM have become an important factor in spreading the words. This study conducted a quantitative meta-analysis based on scholarly big data regarding WoM. The main contribution of this study is that it provides a micro perspective on the research trend of WoM as well as the macro perspective. The limitation of this study is that the citation network constructed in this study is a network based on the direct citation relation of the collected documents for MPA.

An Analysis of the Internal Marketing Impact on the Market Capitalization Fluctuation Rate based on the Online Company Reviews from Jobplanet (직원을 위한 내부마케팅이 기업의 시가 총액 변동률에 미치는 영향 분석: 잡플래닛 기업 리뷰를 중심으로)

  • Kichul Choi;Sang-Yong Tom Lee
    • Information Systems Review
    • /
    • v.20 no.2
    • /
    • pp.39-62
    • /
    • 2018
  • Thanks to the growth of computing power and the recent development of data analytics, researchers have started to work on the data produced by users through the Internet or social media. This study is in line with these recent research trends and attempts to adopt data analytical techniques. We focus on the impact of "internal marketing" factors on firm performance, which is typically studied through survey methodologies. We looked into the job review platform Jobplanet (www.jobplanet.co.kr), which is a website where employees and former employees anonymously review companies and their management. With web crawling processes, we collected over 40K data points and performed morphological analysis to classify employees' reviews for internal marketing data. We then implemented econometric analysis to see the relationship between internal marketing and market capitalization. Contrary to the findings of extant survey studies, internal marketing is positively related to a firm's market capitalization only within a limited area. In most of the areas, the relationships are negative. Particularly, female-friendly environment and human resource development (HRD) are the areas exhibiting positive relations with market capitalization in the manufacturing industry. In the service industry, most of the areas, such as employ welfare and work-life balance, are negatively related with market capitalization. When firm size is small (or the history is short), female-friendly environment positively affect firm performance. On the contrary, when firm size is big (or the history is long), most of the internal marketing factors are either negative or insignificant. We explain the theoretical contributions and managerial implications with these results.

A Study on Differences of Contents and Tones of Arguments among Newspapers Using Text Mining Analysis (텍스트 마이닝을 활용한 신문사에 따른 내용 및 논조 차이점 분석)

  • Kam, Miah;Song, Min
    • Journal of Intelligence and Information Systems
    • /
    • v.18 no.3
    • /
    • pp.53-77
    • /
    • 2012
  • This study analyses the difference of contents and tones of arguments among three Korean major newspapers, the Kyunghyang Shinmoon, the HanKyoreh, and the Dong-A Ilbo. It is commonly accepted that newspapers in Korea explicitly deliver their own tone of arguments when they talk about some sensitive issues and topics. It could be controversial if readers of newspapers read the news without being aware of the type of tones of arguments because the contents and the tones of arguments can affect readers easily. Thus it is very desirable to have a new tool that can inform the readers of what tone of argument a newspaper has. This study presents the results of clustering and classification techniques as part of text mining analysis. We focus on six main subjects such as Culture, Politics, International, Editorial-opinion, Eco-business and National issues in newspapers, and attempt to identify differences and similarities among the newspapers. The basic unit of text mining analysis is a paragraph of news articles. This study uses a keyword-network analysis tool and visualizes relationships among keywords to make it easier to see the differences. Newspaper articles were gathered from KINDS, the Korean integrated news database system. KINDS preserves news articles of the Kyunghyang Shinmun, the HanKyoreh and the Dong-A Ilbo and these are open to the public. This study used these three Korean major newspapers from KINDS. About 3,030 articles from 2008 to 2012 were used. International, national issues and politics sections were gathered with some specific issues. The International section was collected with the keyword of 'Nuclear weapon of North Korea.' The National issues section was collected with the keyword of '4-major-river.' The Politics section was collected with the keyword of 'Tonghap-Jinbo Dang.' All of the articles from April 2012 to May 2012 of Eco-business, Culture and Editorial-opinion sections were also collected. All of the collected data were handled and edited into paragraphs. We got rid of stop-words using the Lucene Korean Module. We calculated keyword co-occurrence counts from the paired co-occurrence list of keywords in a paragraph. We made a co-occurrence matrix from the list. Once the co-occurrence matrix was built, we used the Cosine coefficient matrix as input for PFNet(Pathfinder Network). In order to analyze these three newspapers and find out the significant keywords in each paper, we analyzed the list of 10 highest frequency keywords and keyword-networks of 20 highest ranking frequency keywords to closely examine the relationships and show the detailed network map among keywords. We used NodeXL software to visualize the PFNet. After drawing all the networks, we compared the results with the classification results. Classification was firstly handled to identify how the tone of argument of a newspaper is different from others. Then, to analyze tones of arguments, all the paragraphs were divided into two types of tones, Positive tone and Negative tone. To identify and classify all of the tones of paragraphs and articles we had collected, supervised learning technique was used. The Na$\ddot{i}$ve Bayesian classifier algorithm provided in the MALLET package was used to classify all the paragraphs in articles. After classification, Precision, Recall and F-value were used to evaluate the results of classification. Based on the results of this study, three subjects such as Culture, Eco-business and Politics showed some differences in contents and tones of arguments among these three newspapers. In addition, for the National issues, tones of arguments on 4-major-rivers project were different from each other. It seems three newspapers have their own specific tone of argument in those sections. And keyword-networks showed different shapes with each other in the same period in the same section. It means that frequently appeared keywords in articles are different and their contents are comprised with different keywords. And the Positive-Negative classification showed the possibility of classifying newspapers' tones of arguments compared to others. These results indicate that the approach in this study is promising to be extended as a new tool to identify the different tones of arguments of newspapers.