• Title/Summary/Keyword: 비정형데이터 (unstructured data)


Comparative Exploration of Gyeongin Ara Waterway Recognition Before and After COVID-19 Outbreak Using Unstructured Big Data (비정형 빅데이터를 활용한 코로나19 발병 전후 경인 아라뱃길 인식 비교 탐색)

  • Han Jangheon
    • Journal of Korea Society of Digital Industry and Information Management
    • /
    • v.20 no.1
    • /
    • pp.17-29
    • /
    • 2024
  • The Gyeongin Ara Waterway is a regional development project designed to transport cargo by sea and to develop the surrounding waterfront area for tourism and leisure. It is being used as a demonstration site for urban air mobility (UAM), which has recently been attracting attention, and various efforts are being made at the local level to strengthen cultural and tourism functions and to revitalize local food. This study examined tourism consumers' perceptions of, and trends regarding, the Gyeongin Ara Waterway before and after the outbreak of COVID-19. The research method was semantic network analysis based on social network analysis. As a result of the study, first, before the outbreak of COVID-19, keywords such as bicycle, Han River, riding, Gimpo, Seoul, hotel, cruise ship, Korea Water Resources Corporation, emotion, West Sea, weekend, and travel appeared most frequently. After the outbreak, keywords such as cafe, discovery, women, Gimpo, restaurant, bakery, observatory, La Mer, and cruise ship appeared most frequently. Second, the degree centrality analysis showed that before the outbreak there was heightened interest in tourist accommodations such as Marina Bay and hotels, whereas after the outbreak interest was high in food venues such as specific bakeries and cafes like La Mer. Third, the CONCOR analysis produced five keyword clusters before the outbreak of COVID-19, increasing to eight clusters afterward.
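As a sketch of the degree-centrality step described above: degree centrality counts how many distinct keywords co-occur with a given keyword, normalized by the number of other nodes in the network. The keyword lists below are hypothetical stand-ins for the study's social-media posts, not its actual data.

```python
from collections import Counter
from itertools import combinations

# Hypothetical keyword lists per post; the actual corpus is not public.
posts = [
    ["bicycle", "Han River", "riding", "weekend"],
    ["cafe", "bakery", "La Mer", "Gimpo"],
    ["cafe", "observatory", "Gimpo"],
    ["bicycle", "Han River", "travel"],
]

# Build a co-occurrence network: one edge per keyword pair within a post.
edges = Counter()
for post in posts:
    for a, b in combinations(sorted(set(post)), 2):
        edges[(a, b)] += 1

# Degree centrality: number of distinct neighbors, normalized by (n - 1).
neighbors = {}
for (a, b), _ in edges.items():
    neighbors.setdefault(a, set()).add(b)
    neighbors.setdefault(b, set()).add(a)
n = len(neighbors)
centrality = {k: len(v) / (n - 1) for k, v in neighbors.items()}

for word, c in sorted(centrality.items(), key=lambda x: -x[1])[:3]:
    print(word, round(c, 3))
```

In a real analysis the post lists would come from morphological analysis of the collected blog and SNS text, and a library such as NetworkX would compute the same measure at scale.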

Direct Reconstruction of Displaced Subdivision Mesh from Unorganized 3D Points (연결정보가 없는 3차원 점으로부터 차이분할메쉬 직접 복원)

  • Jung, Won-Ki;Kim, Chang-Heon
    • Journal of KIISE:Computer Systems and Theory
    • /
    • v.29 no.6
    • /
    • pp.307-317
    • /
    • 2002
  • In this paper we propose a new mesh reconstruction scheme that produces a displaced subdivision surface directly from unorganized points. The displaced subdivision surface is a mesh representation that defines a detailed mesh with a displacement map over a smooth domain surface; however, the original displaced subdivision surface algorithm requires an explicit polygonal mesh as input, since it is not a mesh reconstruction algorithm but a mesh conversion (remeshing) algorithm. The main idea of our approach is to sample surface detail from unorganized points without any topological information. To do this, we predict a virtual triangular face from the unorganized points for each sampling ray cast from a parametric domain surface. Reconstructing a displaced subdivision surface directly from unorganized points is valuable because the output has several important properties: the mesh representation is compact, since most vertices can be represented by a single scalar value; the underlying structure is piecewise regular, so it can easily be transformed into a multiresolution mesh; and smoothness is automatically preserved after mesh deformation. We avoid time-consuming global energy optimization by employing input-data-dependent mesh smoothing, so we obtain a good-quality displaced subdivision surface quickly.
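A minimal sketch of the per-ray sampling idea, under the simplifying assumption that the paper's "virtual triangular face" can be stood in for by a least-squares plane fit to the points nearest each sampling ray; the scalar displacement is then the signed ray/plane intersection distance. The data and neighborhood size are illustrative, not the paper's.

```python
import numpy as np

def displacement_along_ray(points, origin, direction, k=8):
    """Estimate the scalar displacement for one sampling ray by fitting a
    local plane (a stand-in for a 'virtual triangular face') to the k
    points nearest the ray origin, then intersecting the ray with it."""
    direction = direction / np.linalg.norm(direction)
    # k nearest neighbors of the ray origin
    d2 = np.sum((points - origin) ** 2, axis=1)
    nbrs = points[np.argsort(d2)[:k]]
    # Least-squares plane through the neighbors: the normal is the
    # smallest singular vector of the centered neighborhood.
    centroid = nbrs.mean(axis=0)
    _, _, vt = np.linalg.svd(nbrs - centroid)
    normal = vt[-1]
    # Ray/plane intersection: origin + t * direction lies on the plane.
    denom = normal @ direction
    if abs(denom) < 1e-12:
        return None  # ray parallel to the local surface
    return (normal @ (centroid - origin)) / denom  # signed displacement

# Toy example: noisy samples of the plane z = 0.5, ray from z = 0 upward.
rng = np.random.default_rng(0)
pts = np.column_stack([rng.uniform(-1, 1, 200),
                       rng.uniform(-1, 1, 200),
                       0.5 + rng.normal(0, 0.01, 200)])
t = displacement_along_ray(pts, np.array([0.0, 0.0, 0.0]),
                           np.array([0.0, 0.0, 1.0]))
print(round(t, 2))
```

The recovered displacement is close to the true offset of 0.5. The paper's actual predictor builds a triangle rather than a plane and casts rays from a subdivision domain surface, but the structure of the computation is the same.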

Issue tracking and voting rate prediction for 19th Korean president election candidates (댓글 분석을 통한 19대 한국 대선 후보 이슈 파악 및 득표율 예측)

  • Seo, Dae-Ho;Kim, Ji-Ho;Kim, Chang-Ki
    • Journal of Intelligence and Information Systems
    • /
    • v.24 no.3
    • /
    • pp.199-219
    • /
    • 2018
  • With the everyday use of the Internet and the spread of various smart devices, users can now communicate in real time, and existing communication styles have changed. As the Internet shifted who produces information, data became massive, giving rise to what is called big data. Big data is seen as a new opportunity to understand social issues. In particular, text mining explores patterns in unstructured text data to find meaningful information. Since text data exists in many places, such as newspapers, books, and the web, it is diverse and plentiful, making it well suited to understanding social reality. In recent years there have been increasing attempts to analyze text from web sources such as SNS and blogs where the public communicates freely. This is recognized as a useful way to grasp public opinion immediately, so it can be used for research on political, social, and cultural issues. Text mining has received much attention as a way to investigate the public's views of candidates and to predict voting rates in place of polls, both because many people question the credibility of surveys and because people tend to refuse to respond, or to conceal their real intentions, when polled. This study collected comments from the largest Internet portal site in Korea and conducted research on the 19th Korean presidential election in 2017. We collected 226,447 comments from April 29, 2017 to May 7, 2017, a window that includes the period just before election day during which public opinion polls are prohibited. We analyzed frequencies, associated emotional words, topic emotions, and candidate voting rates. Through frequency analysis, we identified the words that were the most important issues each day. In particular, after each presidential debate, the candidate who became an issue appeared at the top of the frequency analysis.
By analyzing associated emotional words, we identified the issues most relevant to each candidate. Topic emotion analysis was used to identify each candidate's topics and to express the public's emotions about them. Finally, we estimated the voting rate by combining comment volume with sentiment scores. In doing so, we explored the issues for each candidate and predicted the voting rate. The analysis showed that news comments are an effective tool for tracking the issues surrounding presidential candidates and for predicting voting rates. In particular, this study produced daily issues and a quantitative index of sentiment, predicted each candidate's voting rate, and precisely matched the ranking of the top five candidates. Each candidate can thus objectively grasp public opinion and reflect it in election strategy: positive issues can be used more actively, negative issues can be corrected, and candidates should be aware that a moral problem can severely damage their reputation. Voters can objectively examine the issues and public opinion about each candidate and make more informed decisions when voting. If voters refer to results like these before voting, they can see public opinion drawn from big data and choose a candidate from a more objective perspective. If candidates campaign with reference to big data analysis, the public, recognizing that their wants are being reflected, will be more active on the web, and expressing political views in various online venues can contribute to political participation.
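The final estimation step, combining comment volume with sentiment score, can be sketched as follows. The per-candidate figures are hypothetical, and weighting volume by mean sentiment and normalizing is one plausible combination rule, not necessarily the authors' exact formula.

```python
# Hypothetical per-candidate comment volumes and mean sentiment scores
# (scaled to [0, 1]); the paper's actual figures are not reproduced here.
candidates = {
    "A": {"volume": 90000, "sentiment": 0.62},
    "B": {"volume": 60000, "sentiment": 0.48},
    "C": {"volume": 45000, "sentiment": 0.55},
}

# One simple way to combine the two signals: weight each candidate's
# comment volume by mean sentiment, then normalize to shares summing to 100%.
weighted = {c: v["volume"] * v["sentiment"] for c, v in candidates.items()}
total = sum(weighted.values())
shares = {c: 100 * w / total for c, w in weighted.items()}

for c, s in sorted(shares.items(), key=lambda x: -x[1]):
    print(c, f"{s:.1f}%")
```

The point of such a rule is that raw comment volume alone overweights controversial candidates; multiplying by sentiment discounts attention that is largely negative.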

Financial Fraud Detection using Text Mining Analysis against Municipal Cybercriminality (지자체 사이버 공간 안전을 위한 금융사기 탐지 텍스트 마이닝 방법)

  • Choi, Sukjae;Lee, Jungwon;Kwon, Ohbyung
    • Journal of Intelligence and Information Systems
    • /
    • v.23 no.3
    • /
    • pp.119-138
    • /
    • 2017
  • Recently, SNS has become an important channel for marketing as well as personal communication. However, cybercrime has evolved along with information and communication technology, and illegal advertising is distributed on SNS in large quantities. As a result, personal information is leaked and even monetary damage occurs more frequently. In this study, we propose a method to analyze which sentences and documents posted to SNS are related to financial fraud. First, as a conceptual framework, we developed a matrix of the conceptual characteristics of cybercriminality on SNS and emergency management. We also suggested an emergency management process consisting of pre-cybercriminality steps (e.g., risk identification) and post-cybercriminality steps; in this paper we focus on risk identification. The main process consists of data collection, preprocessing, and analysis. First, we selected the two words 'daechul (loan)' and 'sachae (private loan)' as seed words and collected data containing them from SNS such as Twitter. Two researchers then judged whether each collected item was related to cybercriminality, particularly financial fraud. From the items judged relevant, we selected keywords, focusing on nouns and symbols. With the selected keywords, we searched and collected data from web sources such as Twitter, news, and blogs, gathering more than 820,000 articles. The collected articles were refined through preprocessing and turned into learning data. The preprocessing process is divided into a morphological analysis step, a stop-word removal step, and a valid part-of-speech selection step. In the morphological analysis step, a complex sentence is decomposed into morpheme units to enable mechanical analysis. In the stop-word removal step, non-lexical elements such as numbers, punctuation marks, and double spaces are removed from the text.
In the valid part-of-speech selection step, only nouns and symbols are considered: nouns refer to things and so express the intent of a message better than other parts of speech, and the more illegal a text is, the more frequently symbols appear in it. Each selected item was labeled 'legal' or 'illegal'; this labeling is necessary to turn the selected data into learning data. The processed data was then converted into a corpus and a Document-Term Matrix. Finally, the 'legal' and 'illegal' files were mixed and randomly divided into a learning set and a test set; we used 70% for learning and 30% for testing. SVM was used as the discrimination algorithm. Since SVM requires gamma and cost values as its main parameters, we set gamma to 0.5 and cost to 10 based on the optimal value function; the cost is set higher than in typical settings. To show the feasibility of the proposed idea, we compared the proposed method with MLE (Maximum Likelihood Estimation), Term Frequency, and a collective-intelligence method, using overall accuracy as the metric. The overall accuracy of the proposed method was 92.41% for illegal loan advertisements and 77.75% for illegal door-to-door sales, clearly superior to Term Frequency, MLE, and the other baselines. The result therefore suggests that the proposed method is valid and practically usable. In this paper we propose a framework for managing crises caused by anomalies in unstructured data sources such as SNS. We hope this study contributes to academia by identifying what to consider when applying SVM-like discrimination algorithms to text analysis, and to practitioners in the fields of brand management and opinion mining.
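The modeling pipeline above (Document-Term Matrix, 70/30 split, RBF SVM with gamma = 0.5 and cost = 10) can be sketched with scikit-learn. The toy corpus below is hypothetical; only the split ratio and the two SVM parameters come from the paper.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Toy labeled sentences standing in for the SNS corpus (hypothetical).
texts = [
    "fast loan no credit check call now !!", "cheap private loan today $$$",
    "loan approved instantly wire money !!", "easy cash loan no questions !!!",
    "bank branch opening hours changed", "city library loan period extended",
    "quarterly interest rate report published", "new savings account options",
] * 10
labels = ([1] * 4 + [0] * 4) * 10  # 1 = illegal ad, 0 = legitimate

# Document-Term Matrix over whitespace tokens; symbols are kept as
# tokens, since the paper notes illegal ads use them heavily.
dtm = CountVectorizer(token_pattern=r"[^\s]+").fit_transform(texts)

# 70/30 split and an RBF SVM with the paper's gamma = 0.5, cost = 10.
X_tr, X_te, y_tr, y_te = train_test_split(
    dtm, labels, test_size=0.3, random_state=42, stratify=labels)
clf = SVC(kernel="rbf", gamma=0.5, C=10).fit(X_tr, y_tr)
print(f"accuracy: {clf.score(X_te, y_te):.2f}")
```

On real data the DTM would be built from the morphologically analyzed, stop-word-filtered corpus described above rather than from raw whitespace tokens.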

An Analysis of the Internal Marketing Impact on the Market Capitalization Fluctuation Rate based on the Online Company Reviews from Jobplanet (직원을 위한 내부마케팅이 기업의 시가 총액 변동률에 미치는 영향 분석: 잡플래닛 기업 리뷰를 중심으로)

  • Kichul Choi;Sang-Yong Tom Lee
    • Information Systems Review
    • /
    • v.20 no.2
    • /
    • pp.39-62
    • /
    • 2018
  • Thanks to the growth of computing power and recent developments in data analytics, researchers have started to work on data produced by users through the Internet and social media. This study is in line with these recent research trends and adopts data-analytic techniques. We focus on the impact of "internal marketing" factors on firm performance, which is typically studied through survey methodologies. We examined the job review platform Jobplanet (www.jobplanet.co.kr), a website where current and former employees anonymously review companies and their management. Using web crawling, we collected over 40K data points and performed morphological analysis to classify employees' reviews into internal marketing categories. We then carried out an econometric analysis of the relationship between internal marketing and market capitalization. Contrary to the findings of extant survey studies, internal marketing is positively related to a firm's market capitalization only in a limited set of areas; in most areas the relationship is negative. In particular, a female-friendly environment and human resource development (HRD) are the areas exhibiting positive relations with market capitalization in the manufacturing industry. In the service industry, most areas, such as employee welfare and work-life balance, are negatively related to market capitalization. When firm size is small (or the firm's history is short), a female-friendly environment positively affects firm performance. On the contrary, when firm size is big (or the history is long), most internal marketing factors are either negative or insignificant. We explain the theoretical contributions and managerial implications of these results.
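The econometric step can be sketched as an OLS regression of market-cap change on review-derived internal-marketing scores. The data below are simulated and the variable names are illustrative; the paper's actual specification (controls, panel structure, industry splits) is richer than this.

```python
import numpy as np

# Simulated firm-level data: review-derived internal-marketing scores
# (e.g. work-life balance, HRD on a 1-5 scale) and market-cap change (%).
rng = np.random.default_rng(1)
n = 200
wlb = rng.normal(3.0, 0.5, n)        # work-life balance score
hrd = rng.normal(3.2, 0.5, n)        # human resource development score
mktcap = 1.5 * hrd - 0.8 * wlb + rng.normal(0, 1.0, n)

# OLS with an intercept: solve min ||X beta - y|| by least squares.
X = np.column_stack([np.ones(n), wlb, hrd])
beta, *_ = np.linalg.lstsq(X, mktcap, rcond=None)
print("intercept, wlb, hrd coefficients:", np.round(beta, 2))
```

With the simulated coefficients of -0.8 and 1.5, the recovered estimates land near those values, mirroring the paper's finding that different internal-marketing areas can carry opposite signs in the same regression.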

Development of Topic Trend Analysis Model for Industrial Intelligence using Public Data (텍스트마이닝을 활용한 공개데이터 기반 기업 및 산업 토픽추이분석 모델 제안)

  • Park, Sunyoung;Lee, Gene Moo;Kim, You-Eil;Seo, Jinny
    • Journal of Technology Innovation
    • /
    • v.26 no.4
    • /
    • pp.199-232
    • /
    • 2018
  • There is a growing need to understand the business management environment through big data analysis at the industry and firm levels. Research using company disclosure information, which comprehensively covers a company's business performance and future plans, is attracting attention; however, there has been limited work on developing applicable analytical models for such corporate disclosure data because of its unstructured nature. This study proposes a text-mining-based analytical model for industry- and firm-level analyses using publicly available company disclosure data. Specifically, we apply the LDA topic model and the word2vec word-embedding model to U.S. SEC data from publicly listed firms and analyze the trends of business topics at the industry and firm levels. Using LDA topic modeling on SEC EDGAR 10-K documents, management topics across industries are identified. To compare topic-trend patterns across industries, the software and hardware industries are compared over the last 20 years. Changes of management topics at the firm level are also observed by comparing two companies in the software industry. These topic trends provide a lens for identifying declining and growing management topics at the industry and firm levels. By mapping companies and products (or services) in a reduced-dimensional space, obtained by applying the word2vec word-embedding model and principal component analysis to firm-level 10-K documents in the software industry, companies and products (services) with similar management topics are identified, along with how they have changed over the decades. By proposing a methodology for building analysis models from public management data at the industry and firm levels, this study lays practical groundwork for identifying changes in management topics.
However, further research is needed to build a more fine-grained analytical model relating technology-management strategy to management performance across various patterns of management topics, such as frequent changes of topics or their momentum. More studies are also needed to develop a competitive-context analysis model for product (service) portfolios across firms.
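Once LDA has assigned a dominant topic to each 10-K document, the topic-trend analysis reduces to tracking each topic's share of filings per year. The sketch below uses hypothetical topic labels in place of fitted LDA topics.

```python
from collections import Counter

# Hypothetical (year, dominant topic) pairs for 10-K documents after
# LDA inference; real labels would come from the fitted topic model.
docs = [
    (2000, "hardware"), (2000, "hardware"), (2000, "software"),
    (2010, "hardware"), (2010, "software"), (2010, "software"),
    (2020, "software"), (2020, "cloud"),   (2020, "software"),
]

# Topic share per year: the fraction of that year's filings each
# topic dominates. Rising shares mark growing management subjects.
by_year = {}
for year, topic in docs:
    by_year.setdefault(year, Counter())[topic] += 1

for year in sorted(by_year):
    total = sum(by_year[year].values())
    shares = {t: round(c / total, 2) for t, c in by_year[year].items()}
    print(year, shares)
```

In the paper's setting the same share curves, computed per industry or per firm, are what reveal declining topics (here, "hardware") and growing ones (here, "cloud").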

The Analysis of Information Security Awareness Using A Text Mining Approach (텍스트 마이닝을 이용한 정보보호인식 분석 및 강화 방안 모색)

  • Lee, Tae-Heon;Youn, Young-Ju;Kim, Hee-Woong
    • Informatization Policy
    • /
    • v.23 no.4
    • /
    • pp.76-94
    • /
    • 2016
  • Recently in Korea, the importance of information security awareness has been receiving growing attention. Attacks such as social engineering and ransomware are hard to prevent because they cannot be solved by information security technology alone. Also, the profitability of the information security industry has been decreasing for years, so many companies are trying to find a new growth engine and an entry into foreign markets. The main purpose of this paper is to draw out information security issues and analyze them; the study then identifies the issues and suggests how to improve the situation in Korea. For this, topic modeling was used to find the information security issues of each country, and sentiment analysis scores were used to compare them. The study explores and explains the critical issues and how to improve the situation, based on the identified issues of the Korean information security industry. It also demonstrates how text mining can be applied in the context of information security awareness. From a pragmatic perspective, the study has implications for information security enterprises, and it is expected to provide a new and realistic method for analyzing domestic and foreign issues by analyzing real data collected through the Twitter API.

Design and Implementation of An I/O System for Irregular Application under Parallel System Environments (병렬 시스템 환경하에서 비정형 응용 프로그램을 위한 입출력 시스템의 설계 및 구현)

  • No, Jae-Chun;Park, Seong-Sun;Gwon, O-Yeong
    • Journal of KIISE:Computer Systems and Theory
    • /
    • v.26 no.11
    • /
    • pp.1318-1332
    • /
    • 1999
  • In this paper we present the design, implementation, and evaluation of a runtime system based on collective I/O techniques for irregular applications. We present two designs, namely "Collective I/O" and "Pipelined Collective I/O". In the first scheme, all processors participate in the I/O simultaneously, which makes scheduling of I/O requests simpler but creates the possibility of contention at the I/O nodes. In the second approach, processors are grouped into several groups, so that only one group performs I/O at a time while the next group performs communication to rearrange data; this entire process is pipelined to reduce I/O-node contention dynamically. In other words, the design provides support for dynamic contention management. We then present a software caching method using collective I/O to reduce I/O cost by reusing data already present in the memory of other nodes. Finally, chunking and on-line compression mechanisms are included in both models. We demonstrate that these techniques yield significantly higher I/O performance than has been possible so far. The performance results are presented on an Intel Paragon and on the ASCI/Red teraflops machine, where application-level I/O bandwidth of up to 55% of the peak is observed.
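The pipelined collective I/O schedule described above can be sketched as a simple timeline: while one group writes, the next group rearranges its data, so only one group touches the I/O nodes at a time. The timing model below is illustrative only, not the paper's measurements.

```python
def pipelined_schedule(n_groups, io_time=2, comm_time=1):
    """Return (events, makespan) for a pipelined collective I/O round.

    Each event is (start_step, activity, group). Group g's I/O overlaps
    with group g+1's data-rearrangement communication, so at most one
    group performs I/O at any time (limiting I/O-node contention)."""
    events = []
    step = 0
    # Group 0 must rearrange (communicate) before anything is written.
    events.append((step, "comm", 0))
    step += comm_time
    for g in range(n_groups):
        events.append((step, "io", g))            # group g performs I/O ...
        if g + 1 < n_groups:                      # ... overlapped with the
            events.append((step, "comm", g + 1))  # next group's communication
        step += io_time
    return events, step

events, makespan = pipelined_schedule(3)
for e in events:
    print(e)
print("makespan:", makespan)
```

With three groups, the pipelined makespan is 1 + 3 × 2 = 7 steps, versus 3 × (1 + 2) = 9 for fully sequential communicate-then-write rounds, which is the benefit of overlapping the two phases.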

Two-dimensional Velocity Measurements of Uvêrsbreen Glacier in Svalbard Using TerraSAR-X Offset Tracking Approach (TerraSAR-X 위성레이더 오프셋 트래킹 기법을 활용한 스발바르 Uvêrsbreen 빙하의 2차원 속도)

  • Baek, Won-Kyung;Jung, Hyung-Sup;Chae, Sung-Ho;Lee, Won-Jin
    • Korean Journal of Remote Sensing
    • /
    • v.34 no.3
    • /
    • pp.495-506
    • /
    • 2018
  • Global interest in climate change and sea-level rise has led to active research on glacier velocities. In such studies, in-situ measurements yield the most accurate data but are limited in acquiring periodic or long-term observations. Offset tracking using SAR is actively used as an alternative to in-situ measurement. Offset tracking is less accurate than other observational techniques, but its accuracy has been improved by recent studies. Recent work in the Uvêrsbreen glacier area has shown that the glacier's altitude is decreasing at a rate of 1.5 m/year. Glacier displacement velocities in this region are heavily influenced by climate change and can be important for monitoring and forecasting long-term climate change; however, there are few concrete examples of research in this area. In this study, we applied an improved offset tracking method to observe the two-dimensional velocity of the Uvêrsbreen glacier. As a result, we confirmed that the glacier moves at a maximum rate of 133.7 m/year, with measurement precisions of 5.4 and 3.3 m/year in the azimuth and line-of-sight directions, respectively. These results can be used to study long-term changes in glacier elevation and the environmental impacts of climate change.
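The core arithmetic of offset tracking, converting cross-correlation pixel offsets in the azimuth and range directions into a two-dimensional velocity, can be sketched as follows. The offsets, pixel spacings, and repeat interval below are illustrative values, not the study's measurements.

```python
import math

# Illustrative numbers only: offsets measured by cross-correlation
# (in pixels), assumed TerraSAR-X pixel spacings, and an 11-day repeat.
az_offset_px, rg_offset_px = 1.8, 1.1   # azimuth / ground-range offsets
az_spacing_m, rg_spacing_m = 2.0, 1.9   # pixel spacing (assumed)
dt_days = 11.0                          # repeat-pass interval

# Convert each offset to a velocity component in m/year.
v_az = az_offset_px * az_spacing_m / dt_days * 365.25
v_rg = rg_offset_px * rg_spacing_m / dt_days * 365.25

# Two-dimensional (horizontal) speed from the two components.
speed = math.hypot(v_az, v_rg)
print(f"v_az={v_az:.1f} m/yr, v_rg={v_rg:.1f} m/yr, speed={speed:.1f} m/yr")
```

The measurement precision the paper reports (5.4 and 3.3 m/year in azimuth and line-of-sight) enters as uncertainty on the two offset components before they are combined.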

A Study on the Research Trends in Fintech using Topic Modeling (토픽 모델링을 이용한 핀테크 기술 동향 분석)

  • Kim, TaeKyung;Choi, HoeRyeon;Lee, HongChul
    • Journal of the Korea Academia-Industrial cooperation Society
    • /
    • v.17 no.11
    • /
    • pp.670-681
    • /
    • 2016
  • Recently, based on Internet and mobile environments, the Fintech industry, which fuses finance and IT, has been growing rapidly, and Fintech services armed with simplicity and convenience have been leading the conversion of financial services to online and mobile channels. Despite this rapid growth, however, few studies have classified Fintech into detailed technologies, analyzed the technology development trends of major market countries, or supported technology planning. In this respect, using unstructured Fintech technological data, the present study extracts and defines detailed Fintech technologies through topic modeling. Hot and cold topics among the derived technologies are then identified to determine the trend of Fintech technologies. In addition, the technology development trends of the USA, South Korea, and China, the major market countries for Fintech industrial technologies, are analyzed. Finally, through network analyses of the detailed Fintech technologies, linkages between the technologies are examined. The Fintech technology trends identified in this study are expected to be useful for policy-making in the Fintech industry and for Fintech-related enterprises' technology strategies.
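The hot/cold topic step can be sketched by fitting a least-squares slope to each topic's yearly share: a positive slope marks a hot (rising) topic and a negative slope a cold (declining) one. The topic names and shares below are hypothetical, not the paper's results.

```python
def slope(ys):
    """Least-squares slope of ys against time indices 0..n-1."""
    n = len(ys)
    xs = range(n)
    xbar, ybar = (n - 1) / 2, sum(ys) / n
    num = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    den = sum((x - xbar) ** 2 for x in xs)
    return num / den

# Hypothetical yearly topic shares from a fitted topic model.
topic_shares = {
    "mobile payment": [0.05, 0.09, 0.14, 0.18],
    "branch banking": [0.20, 0.16, 0.12, 0.09],
    "blockchain":     [0.02, 0.04, 0.10, 0.15],
}

for topic, ys in topic_shares.items():
    label = "hot" if slope(ys) > 0 else "cold"
    print(f"{topic}: slope={slope(ys):+.3f} ({label})")
```

In practice the slope's statistical significance is usually checked as well, so that a topic with a flat, noisy share is not misclassified as hot or cold.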