• Title/Summary/Keyword: in-database analytics

Design and implementation of a music recommendation model through social media analytics (소셜 미디어 분석을 통한 음악 추천 모델의 설계 및 구현)

  • Chung, Kyoung-Rock;Park, Koo-Rack;Park, Sang-Hyock
    • Journal of Convergence for Information Technology / v.11 no.9 / pp.214-220 / 2021
  • With the rapid spread of smartphones, listening to music everywhere, like background music to daily life, has become common, so a music database that can make recommendations according to individual circumstances and conditions is needed. This paper proposes a music recommendation model based on social media. Because hashtags capture emotions, situations, time of day, weather, and so on, a social media-based database can reflect the collective intelligence of many people. We use web crawling to collect and categorize the hashtags that accompany music-title hashtags in posts, so that real listeners' opinions about music enter the database. The resulting music database classifies music differently from collaborative filtering, the approach most existing music platforms use.
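The hashtag categorization step described in this abstract can be illustrated with a minimal sketch. It assumes posts have already been crawled into plain strings; the function name `build_tag_index` and the sample posts are hypothetical and are not taken from the paper.

```python
import re
from collections import Counter, defaultdict

# Matches "#tag"; \w is Unicode-aware in Python 3, so Korean hashtags match too.
HASHTAG = re.compile(r"#(\w+)")

def build_tag_index(posts, song_titles):
    """Map each song-title hashtag to a Counter of the other hashtags seen with it."""
    titles = {t.lower() for t in song_titles}
    index = defaultdict(Counter)
    for post in posts:
        tags = {t.lower() for t in HASHTAG.findall(post)}
        # Every song named in the post is credited with the post's non-title tags.
        for song in tags & titles:
            index[song].update(tags - titles)
    return index

# Hypothetical sample data for illustration only.
posts = [
    "Rainy evening walk #dynamite #rain #calm",
    "Morning run playlist #dynamite #workout",
    "#dynamite again on a rainy day #rain",
]
index = build_tag_index(posts, ["dynamite"])
print(index["dynamite"].most_common())
```

Counting co-occurring tags per song gives a simple context profile (weather, mood, activity) that a recommender could query; the paper's actual categorization scheme may differ.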

Impact of Open Access Models on Citation Metrics

  • Razumova, Irina K.;Kuznetsov, Alexander
    • Journal of Information Science Theory and Practice / v.7 no.2 / pp.23-31 / 2019
  • We report results of selection-bias-free approaches to the analysis of the impact of open access (OA) models on citation metrics. We studied reference groups of Gold and Green OA articles and the group of non-OA (Paywall) articles with the new functionality of the Web of Science Core Collection database, the InCites platform of Clarivate Analytics, and the Dimensions database of Digital Science. For each reference group we obtained the percentage of cited articles and the citation impact, and their dependence on the depth of the citation period. Different research fields were analyzed in two schemas of the InCites platform. We report higher values and growth rates of the citation metrics (citation impact and %Cited) in the OA reference groups than in the Paywall group. The Green OA articles demonstrate the highest values of citation metrics among all the OA models. The dependence of citation impact on citation period follows a linear law with R² values close to 0.9-1.0. The overall annual growth rates k of citation impact of the Green OA, Gold OA, and Paywall articles equal, respectively, 3.6, 2.4, and 1.4 in Dimensions and 4.6, 3.6, and 2.3 in the Web of Science Core Collection. We suppose that earlier results reported for articles in pure OA journals vs. articles in Paywall journals were affected by the high citation impact of the Green and Hybrid OA articles that could not be elucidated in the Paywall journals at that time.
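The linear dependence of citation impact on citation period, with growth rate k and an R² value, can be illustrated with an ordinary least-squares fit. This is a sketch under assumptions: the data points below are invented for illustration, and the abstract does not say which fitting procedure the authors used.

```python
def linear_fit(xs, ys):
    """Least-squares fit y = k*x + b; returns (k, b, r_squared)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    k = sxy / sxx                      # slope: growth per year of citation period
    b = mean_y - k * mean_x            # intercept
    ss_res = sum((y - (k * x + b)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return k, b, 1 - ss_res / ss_tot   # R^2 = 1 - SS_res / SS_tot

# Hypothetical citation impact by depth of citation period (years).
years = [1, 2, 3, 4, 5]
impact = [2.1, 4.0, 6.2, 7.9, 10.1]
k, b, r2 = linear_fit(years, impact)
print(f"k = {k:.2f}, R^2 = {r2:.3f}")
```

Comparing the slope k across reference groups is what lets growth rates be ranked, as the abstract does for the Green OA, Gold OA, and Paywall groups.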

Practical Text Mining for Trend Analysis: Ontology to visualization in Aerospace Technology

  • Kim, Yoosin;Ju, Yeonjin;Hong, SeongGwan;Jeong, Seung Ryul
    • KSII Transactions on Internet and Information Systems (TIIS) / v.11 no.8 / pp.4133-4145 / 2017
  • Advances in science and technology lead to a better life but also demand more investment. The government has therefore funded promising future technologies, and for several decades has devoted substantial resources to science and technology R&D projects. However, the performance of these public investments remains unclear in many ways, so planning and evaluation of new investment should rest on data-driven decisions backed by factual evidence. To that end, the government has sought evidence on science and technology trends and issues, and has accumulated a large database of research papers, patents, project reports, and R&D information. Today the database supports activities such as policy planning, budget allocation, and investment evaluation for science and technology, but the quality of the information falls short of expectations because of the limitations of text mining in extracting information from unstructured data such as reports and papers. To address this problem, this study proposes a practical text mining methodology for science and technology trend analysis, applies it to aerospace technology, and conducts ontology development, topic analysis, network analysis, and visualization of the results.

Trends Analysis on Research Articles of the Sharing Economy through a Meta Study Based on Big Data Analytics (빅데이터 분석 기반의 메타스터디를 통해 본 공유경제에 대한 학술연구 동향 분석)

  • Kim, Ki-youn
    • Journal of Internet Computing and Services / v.21 no.4 / pp.97-107 / 2020
  • This study conducts a comprehensive meta-study, from the perspective of content analysis, to explore trends in Korean academic research on the sharing economy using big data analytics. A comprehensive meta-analysis methodology can examine the entire body of research results historically and as a whole to illuminate the tendencies or properties of the overall research trend. Academic research related to the sharing economy first appeared in 2008, the year in which Professor Lawrence Lessig introduced the concept of the sharing economy to the world, but research began in earnest in 2013. In particular, between 2006 and 2008, research improved dramatically. To grasp the overall flow of domestic academic research trends, 8 years of papers from 2013 to the present were selected for analysis, focusing on titles, keywords, and abstracts from a database of electronic journals. Big data analysis was performed in the order of cleaning, analysis, and visualization of the collected data to derive research trends and insights by year and type of literature. We used Python 3.7 and the Textom analysis tool for data preprocessing, text mining, and frequency analysis for keyword extraction, and UCINET6/NetDraw and Textom for N-gram charts, centrality and social network analysis, and CONCOR clustering visualization. The keywords, clustered into 8 groups, were used to derive a typology of each research trend. The outcomes of this study will provide useful theoretical insights and guidelines for future studies.

Interoperability between NoSQL and RDBMS via Auto-mapping Scheme in Distributed Parallel Processing Environment (분산병렬처리 환경에서 오토매핑 기법을 통한 NoSQL과 RDBMS와의 연동)

  • Kim, Hee Sung;Lee, Bong Hwan
    • Journal of the Korea Institute of Information and Communication Engineering / v.21 no.11 / pp.2067-2075 / 2017
  • Big data processing has recently become an emerging issue. As huge amounts of data are generated, data processing capability is becoming more important. For processing big data, both the Hadoop distributed file system and NoSQL data stores built for unstructured data processing are receiving a lot of attention. However, using NoSQL still involves problems and inconvenience. For low-volume data, MapReduce on NoSQL normally consumes unnecessary processing time and requires considerably more data retrieval time than an RDBMS. To address this NoSQL problem, this paper proposes an interworking scheme between NoSQL and the conventional RDBMS. The developed auto-mapping scheme chooses an appropriate database (NoSQL or RDBMS) depending on the amount of data, which results in faster search times. The experimental results for a specific data set show that the database interworking scheme reduces data searching time by up to 35%.
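The idea of choosing a database by data volume can be sketched as a small dispatcher. This is a hypothetical illustration only: the class name `AutoMapper`, the row-count criterion, and the threshold value are assumptions, since the abstract does not describe how the scheme measures data volume or where it switches.

```python
from dataclasses import dataclass
from typing import Callable, Dict

@dataclass
class AutoMapper:
    # Assumed cut-over point; the paper's actual criterion is not given in the abstract.
    threshold_rows: int = 100_000

    def choose(self, estimated_rows: int) -> str:
        """Pick 'rdbms' for small data and 'nosql' at or above the threshold."""
        return "nosql" if estimated_rows >= self.threshold_rows else "rdbms"

    def run(self, estimated_rows: int, backends: Dict[str, Callable[[], object]]):
        """Dispatch the query to whichever backend was chosen."""
        return backends[self.choose(estimated_rows)]()

# Stand-in query functions; real ones would call an RDBMS driver or a NoSQL client.
backends = {
    "rdbms": lambda: "result from RDBMS",
    "nosql": lambda: "result from NoSQL",
}
mapper = AutoMapper(threshold_rows=100_000)
print(mapper.run(500, backends))
print(mapper.run(5_000_000, backends))
```

Keeping the decision in one `choose` method means the threshold can be tuned from measurements without touching the query code.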

Perspectives on Clinical Informatics: Integrating Large-Scale Clinical, Genomic, and Health Information for Clinical Care

  • Choi, In Young;Kim, Tae-Min;Kim, Myung Shin;Mun, Seong K.;Chung, Yeun-Jun
    • Genomics & Informatics / v.11 no.4 / pp.186-190 / 2013
  • The advances in electronic medical records (EMRs) and bioinformatics (BI) represent two significant trends in healthcare. The widespread adoption of EMR systems and the completion of the Human Genome Project developed the technologies for data acquisition, analysis, and visualization in two different domains. The massive amount of data from both clinical and biology domains is expected to provide personalized, preventive, and predictive healthcare services in the near future. The integrated use of EMR and BI data needs to consider four key informatics areas: data modeling, analytics, standardization, and privacy. Bioclinical data warehouses integrating heterogeneous patient-related clinical or omics data should be considered. The representative standardization effort by the Clinical Bioinformatics Ontology (CBO) aims to provide uniquely identified concepts to include molecular pathology terminologies. Since individual genome data are easily used to predict current and future health status, different safeguards to ensure confidentiality should be considered. In this paper, we focused on the informatics aspects of integrating the EMR community and BI community by identifying opportunities, challenges, and approaches to provide the best possible care service for our patients and the population.

Ontology based Integrated Construction Information Management for Modernized Traditional Housing (Hanok)

  • Lee, Heewoo;Lee, Yunsub;Jin, Zhenhui;Gebremichael, Dagem Derese;Jung, Youngsoo
    • International conference on construction engineering and project management / 2022.06a / pp.162-169 / 2022
  • In an attempt to disseminate modernized Korean traditional housing (Hanok), a ten-year research project was initiated in 2010 by the Korean Government to reduce the construction cost, improve the facility performance, and automate the Hanok construction industry. To meet these objectives, various research areas, including public policies, planning methods, design standards, new building materials, construction standards, maintenance procedures, advanced project management tools, and integrated IT applications have been developed. In addition, comprehensive technologies developed were applied to the ten pilot Hanok buildings to validate the real-world performance as part of the research project. To further facilitate the digital transformation of the Hanok industry by using the research results, it is required to disseminate the developed technologies in an automated and standardized manner. In particular, it is crucial to systematize and manage the interoperability of various technical data and accumulated historical data for different business functions, especially within the highly fragmented industry. In this context, this paper proposes an ontology-based Hanok information dissemination platform to enable industry-wide automated knowledge and information sharing. The system architecture, standardized historical database, and advanced analytics based on ontology web language (OWL) for the Hanok industrialization platform are introduced.

The Correlation between Social Media and the Behaviors of the Supreme Court in Korea (소셜미디어와 대법원 판결의 상관 관계에 대한 분석)

  • Heo, Junhong;Seo, Yeeun;Lee, Seoyeong;Lee, Sang-Yong Tom
    • Knowledge Management Research / v.22 no.3 / pp.31-53 / 2021
  • As a communication channel for individuals, social media affects areas such as business, the economy, politics, and society. One of the less-studied areas is the law. This study therefore collected information from social media and analyzed its impact on legal decisions, specifically Supreme Court decisions in Korea. The study was conducted by compiling information from Internet news articles and public responses. We found that the stronger the negative reaction from the public, the shorter the trial duration until the Supreme Court made its final decision. However, we found no significant relationship between social media reactions and either dismissal of appeal or annulment. Our study contributes to information systems and knowledge management research in that social analytics is applied to legal decisions instead of a conventional qualitative methodology. It is also meaningful to practitioners because the big data analytics business can be applied to the field of law by creating a new database for emerging legal technology. Finally, lawmakers can consider better ways to standardize the legal decision process to minimize adverse effects from social media.

Analysis of Adverse Drug Reaction Reports using Text Mining (텍스트마이닝을 이용한 약물유해반응 보고자료 분석)

  • Kim, Hyon Hee;Rhew, Kiyon
    • Korean Journal of Clinical Pharmacy / v.27 no.4 / pp.221-227 / 2017
  • Background: As the personalized healthcare industry has attracted much attention, big data analysis of healthcare data has become essential. Because much healthcare data, such as product labeling, biomedical literature, and social media data, is unstructured, extracting meaningful information from unstructured text is becoming important. In particular, text mining of adverse drug reaction (ADR) reports can provide signal information to predict and detect adverse drug reactions. There has been no study on text analysis of expert opinion in the Korea Adverse Event Reporting System (KAERS) databases in Korea. Methods: Expert opinion text from the KAERS database provided by the Korea Institute of Drug Safety & Risk Management (KIDS-KD) was analyzed. To understand the whole text, word frequency analysis was performed, and to find important keywords in the text, TF-IDF weight analysis was performed. Keywords related to the important keywords were also identified by calculating correlation coefficients. Results: Among a total of 90,522 reports, 120 insulin ADR reports and 858 tramadol ADR reports were analyzed. ADRs such as dizziness, headache, vomiting, dyspepsia, and shock ranked highest, in that order, in the insulin data, while ADR symptoms such as vomiting, "어지러움" (the Korean term for dizziness), dizziness, dyspepsia, and constipation ranked highest, in that order, in the tramadol data as the most frequently used keywords. Conclusion: Using text mining of the expert opinion in KIDS-KD, frequently mentioned ADRs and medications are easily identified. Text mining in ADR research can play an important role in detecting signal information and predicting ADRs.
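The TF-IDF weighting mentioned in the Methods can be sketched as follows. This uses the textbook form (term frequency times the log of inverse document frequency) on invented tokens; the abstract does not say which TF-IDF variant or tooling the authors used, so this is an assumption for illustration.

```python
import math
from collections import Counter

def tf_idf(docs):
    """docs: list of token lists. Returns one {term: weight} dict per document."""
    n = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        weights.append(
            {t: (c / total) * math.log(n / df[t]) for t, c in counts.items()}
        )
    return weights

# Hypothetical tokenized report texts, not data from KAERS.
reports = [
    ["dizziness", "vomiting", "vomiting"],
    ["dizziness", "headache"],
    ["dizziness", "constipation", "vomiting"],
]
for w in tf_idf(reports):
    print(sorted(w.items(), key=lambda kv: -kv[1]))
```

A term that appears in every report gets weight zero, which is why TF-IDF surfaces distinctive keywords that raw frequency counts would bury.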

Women's Employment in Industries and Risk of Preeclampsia and Gestational Diabetes: A National Population Study of Republic of Korea

  • Jeong-Won Oh;Seyoung Kim;Jung-won Yoon;Taemi Kim;Myoung-Hee Kim;Jia Ryu;Seung-Ah Choe
    • Safety and Health at Work / v.14 no.3 / pp.272-278 / 2023
  • Background: Some working conditions may place higher physical or psychological demands on pregnant women, leading to increased risks of pregnancy complications. Objectives: We assessed the association of women's employment status and industrial classification with obstetric complications. Methods: We conducted a national population study using the National Health Information Service database of the Republic of Korea. Our analysis encompassed 1,316,310 women who experienced first-order live births in 2010-2019. We collected data on the employment status and industrial classification of women, as well as their diagnoses of preeclampsia (PE) and gestational diabetes mellitus (GDM), classified as A1 (well controlled by diet) or A2 (requiring medication). We calculated adjusted odds ratios (aORs) of complications by employment status and by industrial classification, adjusting for individual risk factors. Results: Most women (64.7%) were employed during pregnancy. Manufacturing (16.4%) and health and social work (16.2%) were the most prevalent industries. Health and social work exhibited a higher risk of PE (aOR = 1.11, 95% confidence interval [CI]: 1.03-1.21), while the manufacturing industry demonstrated a higher risk of class A2 GDM (1.20, 95% CI: 1.03-1.41) than financial intermediation. When analyzing both classes of GDM, women who worked in public administration and defense/social security showed a higher risk of class A1 GDM (1.04, 95% CI: 1.01-1.07). When comparing high-risk industries with nonemployment, health and social work showed a comparable risk of PE (1.02, 95% CI: 0.97-1.07). Conclusion: Employment was associated with overall lower risks of obstetric complications. Health and social service work can counteract the healthy worker effect in relation to PE. This highlights the importance of further elucidating specific occupational risk factors within the high-risk industries.
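To show how an odds ratio and its 95% confidence interval are read, here is a minimal sketch for a single 2x2 table using the standard Wald interval on the log scale. It computes an unadjusted odds ratio from invented counts; the study reports adjusted odds ratios from a model with covariates, which this sketch does not reproduce.

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Unadjusted odds ratio with a Wald confidence interval.

    a: exposed with outcome      b: exposed without outcome
    c: unexposed with outcome    d: unexposed without outcome
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) for a 2x2 table.
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se)
    upper = math.exp(math.log(odds_ratio) + z * se)
    return odds_ratio, lower, upper

# Hypothetical counts for illustration only.
or_, lower, upper = odds_ratio_ci(20, 80, 10, 90)
print(f"OR = {or_:.2f}, 95% CI: {lower:.2f}-{upper:.2f}")
```

An interval that excludes 1 indicates an association at the 5% level, while one that spans 1, as in the comparison with nonemployment reported above, indicates a comparable risk.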