• Title/Summary/Keyword: big data mining

Search Result 679, Processing Time 0.028 seconds

A Study of The GPGPU Performance (범용 그래픽 처리장치 (GPGPU)의 성능에 대한 연구)

  • Lee, Jongbok
    • The Journal of the Institute of Internet, Broadcasting and Communication
    • /
    • v.18 no.6
    • /
    • pp.201-206
    • /
    • 2018
  • As the artificial intelligence and big data technology has been developed recently, the importance of GPGPU, which is a general purpose graphics processing unit, is emphasized. In addition, by the demand for mining equipment to obtain bit coins, which is a block chain application technology, the price of GPGPU has increased sharply with scarcity. If a GPGPU can be precisely simulated, it is possible to conduct experiments on various GPGPU types and analyze performance without purchasing expensive ones. In this paper, we investigate the configuration of a GPGPU simulator and measure the performance of various benchmark programs using GPGPU-Sim.

How do People Understand and Express "Smart City?": Analysis of Transition in Smart-city Keywords through Semantic Network Analysis of SNS Big Data between 2011 and 2020

  • Kim, Seong-A;Kim, Heungsoon
    • Architectural research
    • /
    • v.24 no.2
    • /
    • pp.41-52
    • /
    • 2022
  • The purpose of this study is to grasp the understanding of smart cities and to review whether the common perception of smart cities, as people understand it, is changing over time. This study analyzes keywords related to smart cities used in social network services (SNSs) in 2011, 2016, and 2020 respectively through semantic network analysis. Smart city discussions appearing on SNS in 2011 mainly focused on technology, and the results of 2016 were generally similar to those of 2011. We can also find policy or business-oriented characteristics in emerging countries in 2020. We highlight that all the results of 2011, 2016, and 2020 have some correlation with each other through QAP(Quadratic Assignment Procedure) correlation analysis, and among them, the correlation between 2011 and 2016 is analyzed the most. The results of the frequency analysis, centrality analysis, and CONCOR(CONvergence of interaction CORrelation) analysis support these results. The results of this study help establish policies that reflect the needs and opinions of citizens in planning smart cities by identifying trends and paradigm transitions expressed by people in SNS. Furthermore, it is expected to help emerging countries by enhancing the understanding of the essence and trend of smart cities and to contribute by suggesting the direction of more sustainable technology development in future smart city policies for leading countries.

Development of a mobile application focusing on developmental support care for Korean infants born prematurely: a methodological study

  • Park, Ji Hyeon;Cho, Haeryun
    • Child Health Nursing Research
    • /
    • v.28 no.2
    • /
    • pp.112-123
    • /
    • 2022
  • Purpose: This study aimed to develop and evaluate a mobile application focusing on developmental support care for infants born prematurely. Methods: An application was developed using the analysis, design, development, implementation, and evaluation model. In the analysis phase, previous research was evaluated through big data text-mining and a literature review. In the design phase, the preliminary content of the application was designed, and the content validity and comprehension were verified. A hybrid application was developed and used by eight experts and ten users, who evaluated the layout of the mobile application and their satisfaction with it. Results: The content of the designed application comprised a diary, customized information, developmental play, and community. The mean scores for layout were 3.73±0.47 and 3.43±0.68 out of 4 points among the experts and users, respectively. Users' mean satisfaction score was 3.70±0.70 out of 5 points. Conclusion: The information provided by the mobile application was evaluated as consistent and systematic. The application was also found to be satisfactory by infants' parents. The mobile application developed through this study is expected to be effective in supporting the development of children born prematurely.

An Analysis of Flood Vulnerability by Administrative Region through Big Data Analysis (빅데이터 분석을 통한 행정구역별 홍수 취약성 분석)

  • Yu, Yeong UK;Seong, Yeon Jeong;Park, Tae Gyeong;Jung, Young Hun
    • Proceedings of the Korea Water Resources Association Conference
    • /
    • 2021.06a
    • /
    • pp.193-193
    • /
    • 2021
  • 전 세계적으로 기후변화가 지속되면서 그에 따른 자연재난의 강도와 발생 빈도가 증가하고 있다. 자연재난의 발생 유형 중 집중호우와 태풍으로 인한 수문학적 재난이 대부분을 차지하고 있으며, 홍수피해는 지역적 수문학적 특성에 따라 피해의 규모와 범위가 달라지는 경향을 보인다. 이러한 이질적인 피해를 관리하기 위해서는 많은 홍수피해 정보를 수집하는 것이 필연적이다. 정보화 시대인 요즘 방대한 양의 데이터가 발생하면서 '빅데이터', '머신러닝', '인공지능'과 같은 말들이 다양한 분야에서 주목을 받고 있다. 홍수피해 정보에 대해서도 과거 국가에서 발간하는 정보외에 인터넷에는 뉴스기사나 SNS 등 미디어를 통하여 수많은 정보들이 생성되고 있다. 이러한 방대한 규모의 데이터는 미래 경쟁력의 우위를 좌우하는 중요한 자원이 될 것이며, 홍수대비책으로 활용될 소중한 정보가 될 수 있다. 본 연구는 인터넷기반으로 한 홍수피해 현상 조사를 통해 홍수피해 규모에 따라 발생하는 홍수피해 현상을 파악하고자 하였다. 이를 위해 과거에 발생한 홍수피해 사례를 조사하여 강우량, 홍수피해 현상 등 홍수피해 관련 정보를 조사하였다. 홍수피해 현상은 뉴스기사나 보고서 등 미디어 정보를 활용하여 수집하였으며, 수집된 비정형 형태의 텍스트 데이터를 '텍스트 마이닝(Text Mining)' 기법을 이용하여 데이터를 정형화 및 주요 홍수피해 현상 키워드를 추출하여 데이터를 수치화하여 표현하였다.

  • PDF

Financial Fraud Detection using Text Mining Analysis against Municipal Cybercriminality (지자체 사이버 공간 안전을 위한 금융사기 탐지 텍스트 마이닝 방법)

  • Choi, Sukjae;Lee, Jungwon;Kwon, Ohbyung
    • Journal of Intelligence and Information Systems
    • /
    • v.23 no.3
    • /
    • pp.119-138
    • /
    • 2017
  • Recently, SNS has become an important channel for marketing as well as personal communication. However, cybercrime has also evolved with the development of information and communication technology, and illegal advertising is distributed to SNS in large quantity. As a result, personal information is lost and even monetary damages occur more frequently. In this study, we propose a method to analyze which sentences and documents, which have been sent to the SNS, are related to financial fraud. First of all, as a conceptual framework, we developed a matrix of conceptual characteristics of cybercriminality on SNS and emergency management. We also suggested emergency management process which consists of Pre-Cybercriminality (e.g. risk identification) and Post-Cybercriminality steps. Among those we focused on risk identification in this paper. The main process consists of data collection, preprocessing and analysis. First, we selected two words 'daechul(loan)' and 'sachae(private loan)' as seed words and collected data with this word from SNS such as twitter. The collected data are given to the two researchers to decide whether they are related to the cybercriminality, particularly financial fraud, or not. Then we selected some of them as keywords if the vocabularies are related to the nominals and symbols. With the selected keywords, we searched and collected data from web materials such as twitter, news, blog, and more than 820,000 articles collected. The collected articles were refined through preprocessing and made into learning data. The preprocessing process is divided into performing morphological analysis step, removing stop words step, and selecting valid part-of-speech step. In the morphological analysis step, a complex sentence is transformed into some morpheme units to enable mechanical analysis. In the removing stop words step, non-lexical elements such as numbers, punctuation marks, and double spaces are removed from the text. In the step of selecting valid part-of-speech, only two kinds of nouns and symbols are considered. Since nouns could refer to things, the intent of message is expressed better than the other part-of-speech. Moreover, the more illegal the text is, the more frequently symbols are used. The selected data is given 'legal' or 'illegal'. To make the selected data as learning data through the preprocessing process, it is necessary to classify whether each data is legitimate or not. The processed data is then converted into Corpus type and Document-Term Matrix. Finally, the two types of 'legal' and 'illegal' files were mixed and randomly divided into learning data set and test data set. In this study, we set the learning data as 70% and the test data as 30%. SVM was used as the discrimination algorithm. Since SVM requires gamma and cost values as the main parameters, we set gamma as 0.5 and cost as 10, based on the optimal value function. The cost is set higher than general cases. To show the feasibility of the idea proposed in this paper, we compared the proposed method with MLE (Maximum Likelihood Estimation), Term Frequency, and Collective Intelligence method. Overall accuracy and was used as the metric. As a result, the overall accuracy of the proposed method was 92.41% of illegal loan advertisement and 77.75% of illegal visit sales, which is apparently superior to that of the Term Frequency, MLE, etc. Hence, the result suggests that the proposed method is valid and usable practically. In this paper, we propose a framework for crisis management caused by abnormalities of unstructured data sources such as SNS. We hope this study will contribute to the academia by identifying what to consider when applying the SVM-like discrimination algorithm to text analysis. Moreover, the study will also contribute to the practitioners in the field of brand management and opinion mining.

Analysis of Waterpark Status and Recognition Using Big Data Analysis (빅데이터 분석을 활용한 워터파크 현황 및 인식 분석)

  • Kim, Jae-Hwan;Lee, Jae-Moon
    • Journal of Digital Convergence
    • /
    • v.15 no.10
    • /
    • pp.525-535
    • /
    • 2017
  • The purpose of this study aims to examine consumer perception and current status of water park. The Naver and Daum were used for data collection channels and the keyword 'water park' was used for data retrieval. The data analysis period was limited to the study period from January 1, 2015 to December 31, 2016 for a total of two years. First, as a result of the frequency analysis, hidden cameras, Lotte water park, arrests, suspects, gimhae were in top 5 in 2015, Lotte water park, swimming, summer, opening, admission ticket were in top 5 in 2016. Second, as a result of the connection degree central analysis, hidden camera, arrest, suspect, female, shower room were in top 5 in 2015, swimming, Lotte water park, summer and One Mount, admission ticket were in top 5 in 2016. Third, as a result of the N-GRAM network graph, the water park/hidden camera, the hidden camera/hidden camera, the suspect/arrest, the Gimhae/Lotte water park, water park/suspect were in top 5 in 2015, and One Mount/water park, Gimhae/Lotte water park, water park/admission ticket, water park/water park, water park/opening were in top 5 in 2016. Fourth, as a result of the CONCOR analysis, three groups in 2015 and two groups in 2016 were formed.

Maritime Safety Tribunal Ruling Analysis using SentenceBERT (SentenceBERT 모델을 활용한 해양안전심판 재결서 분석 방법에 대한 연구)

  • Bori Yoon;SeKil Park;Hyerim Bae;Sunghyun Sim
    • Journal of the Korean Society of Marine Environment & Safety
    • /
    • v.29 no.7
    • /
    • pp.843-856
    • /
    • 2023
  • The global surge in maritime traffic has resulted in an increased number of ship collisions, leading to significant economic, environmental, physical, and human damage. The causes of these maritime accidents are multifaceted, often arising from a combination of crew judgment errors, negligence, complexity of navigation routes, weather conditions, and technical deficiencies in the vessels. Given the intricate nuances and contextual information inherent in each incident, a methodology capable of deeply understanding the semantics and context of sentences is imperative. Accordingly, this study utilized the SentenceBERT model to analyze maritime safety tribunal decisions over the last 20 years in the Busan Sea area, which encapsulated data on ship collision incidents. The analysis revealed important keywords potentially responsible for these incidents. Cluster analysis based on the frequency of specific keyword appearances was conducted and visualized. This information can serve as foundational data for the preemptive identification of accident causes and the development of strategies for collision prevention and response.

The development of symmetrically and attributably pure confidence in association rule mining (연관성 규칙에서 활용 가능한 대칭적 기여 순수 신뢰도의 개발)

  • Park, Hee Chang
    • Journal of the Korean Data and Information Science Society
    • /
    • v.25 no.3
    • /
    • pp.601-609
    • /
    • 2014
  • The most widely used data mining technique for big data analysis is to generate meaningful association rules. This method has been used to find the relationship between set of items based on the association criteria such as support, confidence, lift, etc. Among them, confidence is the most frequently used, but it has the drawback that we can not know the direction of association by it. The attributably pure confidence was developed to compensate for this drawback, but the value was changed by the position of two item sets. In this paper, we propose four symmetrically and attributably pure confidence measures to compensate the shortcomings of confidence and the attributably pure confidence. And then we prove three conditions of interestingness measure by Piatetsky-Shapiro, and comparative studies with confidence, attributably pure confidence, and four symmetrically and attributably pure confidence measures are shown by numerical examples. The results show that the symmetrically and attributably pure confidence measures are better than confidence and the attributably pure confidence. Also the measure NSAPis found to be the best among these four symmetrically and attributably pure confidence measures.

Research Trends in Record Management Using Unstructured Text Data Analysis (비정형 텍스트 데이터 분석을 활용한 기록관리 분야 연구동향)

  • Deokyong Hong;Junseok Heo
    • Journal of Korean Society of Archives and Records Management
    • /
    • v.23 no.4
    • /
    • pp.73-89
    • /
    • 2023
  • This study aims to analyze the frequency of keywords used in Korean abstracts, which are unstructured text data in the domestic record management research field, using text mining techniques to identify domestic record management research trends through distance analysis between keywords. To this end, 1,157 keywords of 77,578 journals were visualized by extracting 1,157 articles from 7 journal types (28 types) searched by major category (complex study) and middle category (literature informatics) from the institutional statistics (registered site, candidate site) of the Korean Citation Index (KCI). Analysis of t-Distributed Stochastic Neighbor Embedding (t-SNE) and Scattertext using Word2vec was performed. As a result of the analysis, first, it was confirmed that keywords such as "record management" (889 times), "analysis" (888 times), "archive" (742 times), "record" (562 times), and "utilization" (449 times) were treated as significant topics by researchers. Second, Word2vec analysis generated vector representations between keywords, and similarity distances were investigated and visualized using t-SNE and Scattertext. In the visualization results, the research area for record management was divided into two groups, with keywords such as "archiving," "national record management," "standardization," "official documents," and "record management systems" occurring frequently in the first group (past). On the other hand, keywords such as "community," "data," "record information service," "online," and "digital archives" in the second group (current) were garnering substantial focus.

Analysis of the Landscape Characteristics of Island Tourist Site Using Big Data - Based on Bakji and Banwol-do, Shinan-gun - (빅데이터를 활용한 섬 관광지의 경관 특성 분석 - 신안군 박지·반월도를 대상으로 -)

  • Do, Jee-Yoon;Suh, Joo-Hwan
    • Journal of the Korean Institute of Landscape Architecture
    • /
    • v.49 no.2
    • /
    • pp.61-73
    • /
    • 2021
  • This study aimed to identify the landscape perception and landscape characteristics of users by utilizing SNS data generated by their experiences. Therefore, how to recognize the main places and scenery appearing on the island, and what are the characteristics of the main scenery were analyzed using online text data and photo data. Text data are text mining and network structural analysis, while photographic data are landscape identification models and color analysis. As a result of the study, First, as a result of frequency analysis of Bakji·Banwol-do topics, we were able to derive keywords for local landscapes such as 'Purple Bridge', 'Doori Village', and location, behavior, and landscape images by analyzing them simultaneously. Second, the network structure analysis showed that the connection between key and undrawn keywords could be more specifically analyzed, indicating that creating landscapes using colors is affecting regional activation. Third, after analyzing the landscape identification model, it was found that artificial elements would be excluded to create preferred landscapes using the main targets of "Purple Bridge" and "Doori Village", and that it would be effective to set a view point of the sea and sky. Fourth, Bakji·Banwol-do were the first islands to be created under the theme of color, and the colors used in artificial facilities were similar to the surrounding environment, and were harmonized with contrasting lighting and saturation values. This study used online data uploaded directly by visitors in the landscape field to identify users' perceptions and objects of the landscape. Furthermore, the use of both text and photographic data to identify landscape recognition and characteristics is significant in that they can specifically identify which landscape and resources they prefer and perceive. In addition, the use of quantitative big data analysis and qualitative landscape identification models in identifying visitors' perceptions of local landscapes will help them understand the landscape more specifically through discussions based on results.