• Title/Summary/Keyword: Machine Learning


Construction of Test Collection for Evaluation of Scientific Relation Extraction System (과학기술분야 용어 간 관계추출 시스템의 평가를 위한 테스트컬렉션 구축)

  • Choi, Yun-Soo;Choi, Sung-Pil;Jeong, Chang-Hoo;Yoon, Hwa-Mook;You, Beom-Jong
    • Proceedings of the Korea Contents Association Conference / 2009.05a / pp.754-758 / 2009
  • Extracting information from large-scale documents is useful not only for information retrieval but also for question answering and summarization. Although relation extraction is a very important area, it is difficult to develop and evaluate a machine-learning-based system without a test collection. This study shows how to build a test collection (KREC2008) for relation extraction systems. We extracted technology terms from journal abstracts and selected several relation candidates between them using WordNet. Judges who were well trained in the evaluation process assigned a relation from the candidates. This process enables even non-experts to build a test collection easily. KREC2008 is open to the public for researchers and developers and will be used for the development and evaluation of relation extraction systems.


The 4th industrial revolution and Korean university's role change (4차산업혁명과 한국대학의 역할 변화)

  • Park, Sang-Kyu
    • Journal of Convergence for Information Technology / v.8 no.1 / pp.235-242 / 2018
  • Interest in the 4th Industrial Revolution has increased markedly in newspapers, industry, government, and academia. In particular, AI, whose effects many people can now feel directly, has already surpassed human ability even in creative areas. That is, many people are starting to feel that the effects of the revolution are right in front of them. Several issues arise in this trend: machine deep learning, human identity, changes in the job environment, and concerns about social change. Recently, many studies of the 4th Industrial Revolution have been conducted in fields such as AI (artificial intelligence), CRISPR, big data, and driverless cars. Since positive and pessimistic effects coexist and many preventive measures have recently been suggested, these opinions are compared and analyzed here to find better solutions. Several educational, political, scientific, social, and ethical effects and solutions are studied and suggested in this study. A clear implication of the study is that the social, industrial, political, and educational environment of the world we will live in from now on is changing faster than ever. If a society (nation or government) reforms its social systems according to those changes, it will grasp the chance for development or take-off; otherwise, it will consume its resources ineffectively and lose the competition as a whole society. However, the method of that reform is not apparent in many respects while the revolution is still in progress, and it should be defined in both industrial and scientific terms. The person or nation that defines it will have the advantage of leading the future of that business or society.

A Study on the Effects of Online Word-of-Mouth on Game Consumers Based on Sentimental Analysis (감성분석 기반의 게임 소비자 온라인 구전효과 연구)

  • Jung, Keun-Woong;Kim, Jong Uk
    • Journal of Digital Convergence / v.16 no.3 / pp.145-156 / 2018
  • Unlike in the past, when distributors sold games through retail stores, digital content is now sold mainly through online distribution channels. This study analyzes the effects of eWOM (electronic word of mouth) on the sales volume of games sold on Steam, an online digital content distribution channel. Recently, data mining techniques based on big data have been widely studied. In this study, a sentiment index for eWOM is derived by sentiment analysis, a text mining technique that can analyze the sentiment of each review. The sentiment analysis uses Naive Bayes and SVM classifiers, and the sentiment index is calculated with the SVM classifier, which showed higher accuracy. Regression analysis is then performed on the dependent variable, sales variation, using the sentiment index, the number of reviews of each game (the size of eWOM), and the user score of each game (the rating of eWOM) as independent variables. The regression analysis revealed that the size of eWOM and its sentiment index significantly influence sales variation. This study suggests which factors of eWOM affect sales volume when Korean game companies enter overseas markets through Steam.
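The review-level sentiment classification step described above can be sketched with a minimal multinomial Naive Bayes classifier (one of the two classifiers the study compares). The toy reviews and labels below are hypothetical, not from the paper's Steam data:

```python
import math
from collections import Counter, defaultdict

class NaiveBayesSentiment:
    """Minimal multinomial Naive Bayes with Laplace smoothing,
    illustrating the review-sentiment step; not the paper's exact model."""
    def __init__(self):
        self.class_counts = Counter()
        self.word_counts = defaultdict(Counter)
        self.vocab = set()

    def fit(self, docs, labels):
        for doc, label in zip(docs, labels):
            self.class_counts[label] += 1
            for word in doc.lower().split():
                self.word_counts[label][word] += 1
                self.vocab.add(word)

    def predict(self, doc):
        best, best_lp = None, float("-inf")
        total_docs = sum(self.class_counts.values())
        for label in self.class_counts:
            lp = math.log(self.class_counts[label] / total_docs)
            # Laplace (+1) smoothing over the shared vocabulary
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for w in doc.lower().split():
                lp += math.log((self.word_counts[label][w] + 1) / denom)
            if lp > best_lp:
                best, best_lp = label, lp
        return best

reviews = ["great fun addictive gameplay", "boring broken refund",
           "great graphics and fun story", "broken servers boring grind"]
labels = ["pos", "neg", "pos", "neg"]
clf = NaiveBayesSentiment()
clf.fit(reviews, labels)
print(clf.predict("fun gameplay great"))  # pos
```

A production pipeline would add tokenization, stop-word handling, and an SVM comparison as in the study; the structure of the scoring step stays the same.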

A Method for Correcting Air-Pressure Data Collected by Mini-AWS (소형 자동기상관측장비(Mini-AWS) 기압자료 보정 기법)

  • Ha, Ji-Hun;Kim, Yong-Hyuk;Im, Hyo-Hyuc;Choi, Deokwhan;Lee, Yong Hee
    • Journal of the Korean Institute of Intelligent Systems / v.26 no.3 / pp.182-189 / 2016
  • For high forecast accuracy with numerical weather prediction models, large and dense weather observation data are needed. The Korea Meteorological Administration (KMA) maintains Automatic Weather Stations (AWSs) to collect weather observation data, but their installation and maintenance costs are high. A Mini-AWS is a very compact automatic weather station that can measure and record temperature, humidity, and pressure. In contrast to an AWS, a Mini-AWS has low installation and maintenance costs and few space constraints, so it is easier to install wherever observation data are wanted. However, data observed by Mini-AWSs cannot be used directly, because they can be affected by the surroundings. In this paper, we suggest a correction method that allows pressure data observed by a Mini-AWS to be used as weather observation data. We preprocessed the Mini-AWS pressure data and then corrected them with machine learning methods, with the aim of matching the pressure data of the nearest AWS. Our experimental results showed that the corrected pressure data meet the regulation, and our correction method using SVR performed very well.
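The paper corrects Mini-AWS pressure toward the nearest AWS with SVR; as a dependency-free stand-in, the sketch below fits a linear calibration y = a·x + b by ordinary least squares on paired readings, which illustrates the same calibration idea in its simplest form. The hPa readings are hypothetical:

```python
# Fit a linear map from Mini-AWS pressure to nearest-AWS pressure
# (least-squares stand-in for the paper's SVR correction).
def fit_linear_calibration(mini_aws, aws):
    n = len(mini_aws)
    mx = sum(mini_aws) / n
    my = sum(aws) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(mini_aws, aws))
    sxx = sum((x - mx) ** 2 for x in mini_aws)
    a = sxy / sxx           # slope
    b = my - a * mx         # intercept
    return a, b

mini = [1009.8, 1011.2, 1012.6, 1014.0]   # hypothetical Mini-AWS readings (hPa)
aws  = [1010.3, 1011.7, 1013.1, 1014.5]   # nearest-AWS reference readings (hPa)
a, b = fit_linear_calibration(mini, aws)
corrected = [a * x + b for x in mini]
```

SVR replaces the straight line with a kernelized fit that tolerates outliers within an epsilon tube, which is why it handles site-specific disturbances better than this linear sketch.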

Study on High-speed Cyber Penetration Attack Analysis Technology based on Static Feature Base Applicable to Endpoints (Endpoint에 적용 가능한 정적 feature 기반 고속의 사이버 침투공격 분석기술 연구)

  • Hwang, Jun-ho;Hwang, Seon-bin;Kim, Su-jeong;Lee, Tae-jin
    • Journal of Internet Computing and Services / v.19 no.5 / pp.21-31 / 2018
  • Cyber penetration attacks can damage not only cyberspace but also entire infrastructures such as electricity, gas, water, and nuclear power, causing enormous damage to people's lives. Cyberspace has already been defined as the fifth battlefield, so strategic responses are very important. Most recent cyber attacks are caused by malicious code, and since more than 1.6 million samples appear per day, automated analysis technology that can cope with large volumes of malicious code is essential. However, static analysis has difficulty dealing with encryption, obfuscation, and packing of malicious code, while dynamic analysis is limited not only by its performance requirements but also by virtual-environment evasion techniques. In this paper, we propose a machine-learning-based malicious code analysis technique that improves on the detection performance of existing analysis technologies while maintaining the light, high-speed analysis applicable to commercial endpoints. On 71,000 normal files and malicious code samples in a commercial environment, the proposed method achieved 99.13% accuracy, 99.26% precision, and 99.09% recall, and analyzed more than five files per second in a PC environment. It can operate independently in the endpoint environment, is expected to complement existing antivirus technology and static and dynamic analysis technologies in operation, and may serve as a core element of EDR technology and malware variant analysis.
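The abstract does not enumerate the static features used. As a hedged illustration of the kind of lightweight static features such endpoint classifiers commonly consume, the sketch below computes a normalized byte histogram and Shannon entropy from raw file bytes (packed or encrypted payloads tend toward entropy near 8 bits/byte):

```python
import math
from collections import Counter

def static_features(data: bytes):
    """Byte histogram (256 normalized bins) plus Shannon entropy.
    Illustrative features only; not the paper's exact feature set."""
    counts = Counter(data)
    n = len(data) or 1
    hist = [counts.get(b, 0) / n for b in range(256)]
    entropy = -sum(p * math.log2(p) for p in hist if p > 0)
    return hist, entropy

# A constant buffer has zero entropy; a uniform byte spread reaches 8 bits.
_, low  = static_features(b"\x00" * 16)
_, high = static_features(bytes(range(256)))
print(low, high)  # 0.0 8.0
```

Feature vectors like these are cheap enough to extract at endpoint speed and feed directly into a conventional classifier, which matches the "light and high-speed" constraint the paper emphasizes.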

The study of Defense Artificial Intelligence and Block-chain Convergence (국방분야 인공지능과 블록체인 융합방안 연구)

  • Kim, Seyong;Kwon, Hyukjin;Choi, Minwoo
    • Journal of Internet Computing and Services / v.21 no.2 / pp.81-90 / 2020
  • The purpose of this study is to examine how block-chain technology can be applied to prevent data forgery and alteration in defense applications of AI (artificial intelligence). AI is a technology that makes predictions from big data by clustering or classifying it with various machine learning methodologies, and military powers including the U.S. have reached the completion stage of the technology. If the data underlying an AI system is forged or altered, then even a perfect processing pipeline can become the biggest enemy risk factor, and such falsification can be carried out all too easily through hacking. Unexpected attacks could occur if the data used by weaponized AI were hacked and manipulated by North Korea. Therefore, technology that prevents data from being falsified and altered is essential for the use of AI. Applying block-chain, which stores encrypted data in a distributed manner so that the data cannot be damaged unless more than half of the connected computers agree, even if a single computer is hacked, is expected to solve the data forgery problem.
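The tamper-evidence property the study relies on can be sketched minimally: each block stores the hash of the previous one, so altering any record breaks every later link. This is illustrative only; a real deployment adds the distributed consensus across nodes that the abstract describes, which this single-process sketch omits:

```python
import hashlib
import json

def make_block(prev_hash: str, record: dict) -> dict:
    """Create a block whose hash covers both its record and its parent."""
    payload = json.dumps({"prev": prev_hash, "record": record}, sort_keys=True)
    return {"prev": prev_hash, "record": record,
            "hash": hashlib.sha256(payload.encode()).hexdigest()}

def verify_chain(chain) -> bool:
    """Recompute every hash and check each parent link."""
    for i, block in enumerate(chain):
        payload = json.dumps({"prev": block["prev"], "record": block["record"]},
                             sort_keys=True)
        if hashlib.sha256(payload.encode()).hexdigest() != block["hash"]:
            return False
        if i > 0 and block["prev"] != chain[i - 1]["hash"]:
            return False
    return True

chain = [make_block("0" * 64, {"sensor": "A", "value": 1})]
chain.append(make_block(chain[-1]["hash"], {"sensor": "B", "value": 2}))
print(verify_chain(chain))          # True: the chain is intact
chain[0]["record"]["value"] = 99    # simulate forgery of stored data
print(verify_chain(chain))          # False: the tampering is detected
```

In the defense scenario above, verification would run independently on many nodes, so an attacker compromising a single machine cannot rewrite the shared history.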

Clustering of Smart Meter Big Data Based on KNIME Analytic Platform (KNIME 분석 플랫폼 기반 스마트 미터 빅 데이터 클러스터링)

  • Kim, Yong-Gil;Moon, Kyung-Il
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.20 no.2 / pp.13-20 / 2020
  • One of the major issues surrounding big data is the availability of massive time-based or telemetry data. The appearance of low-cost capture and storage devices has now made it possible to obtain very detailed time-series data for further analysis. These data can be used to gain more knowledge about the underlying system or to predict future events with higher accuracy. In particular, for the many households and businesses with smart meter records, it is very important to define custom-tailored contract offers and to predict future electricity usage so as to protect electricity companies from power shortages or surpluses. Identifying a few groups with common electricity-usage behavior is required to make the creation of customized contract offers worthwhile. This study presents big data transformation and clustering techniques for understanding electricity usage patterns, using open smart meter data and KNIME, an open source platform for data analytics that provides a user-friendly graphical workbench for the entire analysis process. While the big data components are not open source, they are available for a trial if required. After importing, cleaning, and transforming the smart meter big data, each meter's data can be interpreted in terms of electricity usage behavior through dynamic time warping.
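Dynamic time warping, the distance used above to compare meter curves, aligns two usage profiles that are similar in shape but shifted in time, which a plain pointwise distance would penalize. A minimal O(n·m) sketch over hypothetical daily load profiles:

```python
def dtw_distance(a, b):
    """Classic dynamic-programming DTW with absolute-difference cost."""
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # stretch a
                                 cost[i][j - 1],      # stretch b
                                 cost[i - 1][j - 1])  # match step
    return cost[n][m]

# Hypothetical normalized load profiles: the same morning peak, one hour apart.
morning_peak = [0.2, 0.9, 1.0, 0.4, 0.2, 0.2]
shifted_peak = [0.2, 0.2, 0.9, 1.0, 0.4, 0.2]
print(dtw_distance(morning_peak, shifted_peak))  # 0.0 — the shift is absorbed by warping
```

Clustering then uses such pairwise DTW distances so that meters with the same daily shape group together regardless of when their peaks occur.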

Current status and future plans of KMTNet microlensing experiments

  • Chung, Sun-Ju;Gould, Andrew;Jung, Youn Kil;Hwang, Kyu-Ha;Ryu, Yoon-Hyun;Shin, In-Gu;Yee, Jennifer C.;Zhu, Wei;Han, Cheongho;Cha, Sang-Mok;Kim, Dong-Jin;Kim, Hyun-Woo;Kim, Seung-Lee;Lee, Chung-Uk;Lee, Yongseok
    • The Bulletin of The Korean Astronomical Society / v.43 no.1 / pp.41.1-41.1 / 2018
  • We introduce the current status and future plans of the Korea Microlensing Telescope Network (KMTNet) microlensing experiments, including the observational strategy, pipeline, event-finder, and collaborations with Spitzer. The KMTNet experiments were initiated in 2015. Since 2016, KMTNet has observed 27 fields: 6 main fields and 21 subfields. In 2017, we finished the DIA photometry for all 2016 and 2017 data, making real-time DIA photometry possible from 2018. The DIA photometric data are used to find events with the KMTNet event-finder, which has been improved relative to the previous version that found 857 events in the 4 main fields of 2015. We applied the improved version to all 2016 data and found 2597 events, of which 265 lie in the KMTNet-K2C9 overlapping fields. To increase the detection efficiency of the event-finder, we are working on filtering out false events with a machine-learning method. In 2018, we plan to measure the event detection efficiency of KMTNet by injecting fake events into the pipeline near the image level. Thanks to high-cadence observations, KMTNet has found many interesting events, including exoplanets and brown dwarfs, that were not found by other groups. The masses of these exoplanets and brown dwarfs are measured through collaborations with Spitzer and other groups. In particular, KMTNet has cooperated closely with Spitzer since 2015 and observes the Spitzer fields, which allowed us to measure microlens parallaxes for many events. The automated KMTNet PySIS pipeline was developed before the 2017 Spitzer season and played a very important role in selecting Spitzer targets. For the 2018 Spitzer season, we will improve the PySIS pipeline to obtain better photometric results.


The Implementable Functions of the CoreNet of a Multi-Valued Single Neuron Network (단층 코어넷 다단입력 인공신경망회로의 함수에 관한 구현가능 연구)

  • Park, Jong Joon
    • Journal of IKEEE / v.18 no.4 / pp.593-602 / 2014
  • One of the purposes of an artificial neural network (ANNet) is to implement the largest possible number of functions with the smallest number of nodes and layers. This paper presents a CoreNet, which has a multi-leveled input value and a multi-leveled output value with a 2-layered ANNet, the basic structure of an ANNet. I suggest an equation for the capacity of a CoreNet with a p-leveled input and a q-leveled output: $a_{p,q}={\frac{1}{2}}p(p-1)q^2-{\frac{1}{2}}(p-2)(3p-1)q+(p-1)(p-2)$. I applied this CoreNet to the simulation model 1(5)-1(6), which has 5 input levels and 6 output levels with no hidden layers. The simulation of this model gives a maximum of 219 convergences for the number of implementable functions using the cot(${\sqrt{x}}$) input-leveling method. I also show that 27 further functions, which diverged in the simulation, are implementable by calculating the weight values (w, ${\theta}$) with multi-threshold lines in the weight space. Therefore 246 functions are implementable in the 1(5)-1(6) model, which coincides with the value from the above equation, $a_{5,6}(=246)$. I also show a numbering method for the implementable functions in the weight space.
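The capacity equation quoted in the abstract is easy to check numerically; evaluating it at p = 5, q = 6 should reproduce the 246 (= 219 + 27) implementable functions of the 1(5)-1(6) model:

```python
from fractions import Fraction

def corenet_capacity(p: int, q: int) -> int:
    """a_{p,q} = (1/2)p(p-1)q^2 - (1/2)(p-2)(3p-1)q + (p-1)(p-2),
    evaluated exactly with rational arithmetic."""
    a = (Fraction(1, 2) * p * (p - 1) * q * q
         - Fraction(1, 2) * (p - 2) * (3 * p - 1) * q
         + (p - 1) * (p - 2))
    return int(a)

print(corenet_capacity(5, 6))  # 246, matching the abstract's a_{5,6}
```

The exact rational arithmetic avoids any floating-point doubt about the half-integer coefficients.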

Implementation of a Spam Message Filtering System using Sentence Similarity Measurements (문장유사도 측정 기법을 통한 스팸 필터링 시스템 구현)

  • Ou, SooBin;Lee, Jongwoo
    • KIISE Transactions on Computing Practices / v.23 no.1 / pp.57-64 / 2017
  • Short message service (SMS) is one of the most important communication methods for mobile phone users. However, illegal advertising spam messages exploit people because SMS can be used without friend registration. Recently, spam message filtering systems that use machine learning have been developed, but they have disadvantages such as requiring heavy computation. In this paper, we implemented a spam message filtering system that uses the set-based POI search algorithm and sentence similarity, without servers. This algorithm can judge whether an input message is spam using only its letter composition, without any server-side computation, so spam messages can be filtered even when they have been intentionally modified. We added a specific preprocessing option aimed at enabling spam filtering. Based on the experimental results, our spam message filtering system shows better performance than the original set-based POI search algorithm. We evaluated the proposed system through extensive simulation; according to the results, the proposed system filters text messages with high accuracy, even messages that the three major telecom companies cannot filter.
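The filter above judges spam from letter composition alone. As a dependency-free stand-in for the paper's set-based matching, the sketch below scores character-bigram Jaccard similarity against known spam sentences, so a lightly modified spam message still matches; the threshold and example messages are hypothetical:

```python
def bigrams(text: str) -> set:
    """Character-bigram set of a message, ignoring spaces and case."""
    t = text.replace(" ", "").lower()
    return {t[i:i + 2] for i in range(len(t) - 1)}

def jaccard(a: set, b: set) -> float:
    return len(a & b) / len(a | b) if a | b else 0.0

def is_spam(message: str, known_spam, threshold: float = 0.5) -> bool:
    """Flag a message whose letter composition is close to any known spam."""
    return any(jaccard(bigrams(message), bigrams(s)) >= threshold
               for s in known_spam)

known_spam = ["free loan call now 010-0000-0000"]
print(is_spam("FREE l0an call now 010-0000-0000", known_spam))  # True despite the l->0 swap
print(is_spam("see you at lunch tomorrow", known_spam))         # False
```

Because the bigram sets change only slightly under character substitutions, this kind of set overlap is robust to the intentional modifications the paper targets, while needing no server round-trip.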