• Title/Summary/Keyword: 전문가 시스템

Search Result 2,693, Processing Time 0.03 seconds

Effects of Web-based STEAM Program Using 3D Data: Focused on the Geology Units in Earth Science I Textbook (3차원 데이터 활용 웹기반 STEAM 프로그램의 효과 : 지구과학I의 '지질 단원'을 중심으로)

  • Ho Yeon Kim;Ki Rak Park;Hyoungbum Kim
    • Journal of the Korean Society of Earth Science Education
    • /
    • v.16 no.2
    • /
    • pp.247-260
    • /
    • 2023
  • In this study, when applying the 'geological structure' content element of high school earth science I developed according to the 2015 curriculum to the STEAM program using a web-based expert system using 3D data of Google Earth and drones, the creative problem-solving ability of high school students, attitudes toward STEAM, and the results of this study are as follows. First, after applying the STEAM program, high school students' creative problem-solving ability showed meaningful results at the p<.001 level. Second, STEAM attitudes showed a significant value at the p<.001 level, confirming that they had a positive impact on high school students' attitudes towards STEAM. It was judged that web-based class activities using Google Earth and drones were useful for integrated thinking such as learners' sense of efficacy and value recognition for usefulness of knowledge. High school students' satisfaction with the STEAM program was 3.251, showing a slightly high average. It was confirmed that web-based class activities such as drones and Google Earth had a positive impact on learners' class satisfaction. However, it was interpreted that the lack of time for class activities limited the ability of the learners to increase their interest in class. The proposal of this research is as follows. First of all, in consideration of the production of presentation materials and practical training in the STEAM program, activities such as block time and advance instruction for class understanding before class are necessary. Secondly, in order to revitalize STEAM education in the high school curriculum, we judge that research on the development of various integrated education programs that can be applied to the high school grade system is necessary.

Analysis of Global Success Factors of K-pop Music (K-pop 음악의 글로벌 성공 요인 분석)

  • Lee, Kate Seung-Yeon;Chang, Min-Ho
    • Journal of Korea Entertainment Industry Association
    • /
    • v.13 no.4
    • /
    • pp.1-15
    • /
    • 2019
  • Psy's Gangnam style in 2012 showed K-pop's potential for global growth and BTS proved it by reaching three consecutive Billboard No.1. The success in the global music market brings tremendous economical and cultural power. This study is conducted for the continuous growth of K-pop music in the global music market by analyzing the musical factor of K-pop's global success. The top 20 most-viewed K-pop MV on Youtube is chosen as a research subject because Youtube is a worldwide platform that reflects global popularity. For the process of K-pop music creation, the role of the composer is expanded and many overseas producers participate in music creation. All 20 songs are created by the collective creation system and there is a consecutive collaboration between the main producers and certain artists. The top 20 most viewed K-pop songs have the musical characteristics of transnational genre convergence, hook songs, sophisticated sounds, frequent use of English lyrics, a reflection of the latest global trends, rhythm optimized for dance and clear concept. It makes the K-pop song easily remembered and familiar to overseas listeners. K-pop's healthy and fresh theme brings emotional empathy and reflects Korean sentiments. K-pop's global success is not a coincidence, but a result of continuous efforts to advance overseas. Some critics criticize K-pop's musical style is similar and it shows K-pop's limitation but K-pop progressed its musical evolution. By keeping the merits of K-pop's success factors and complementing its weak points, K-pop will continue its popularity and increase influence in the global music market.

Natural Monument Cretaceous Stromatolite at the Daegu Catholic University, Gyeongsan: Occurrences, Natural Heritage Values, and Plan for Preservation and Utilization (천연기념물 경산 대구가톨릭대학교 백악기 스트로마톨라이트: 산상, 자연유산적 가치 및 보존·활용 방안)

  • KONG Dal-Yong;LEE Seong-Joo
    • Korean Journal of Heritage: History & Science
    • /
    • v.56 no.3
    • /
    • pp.214-232
    • /
    • 2023
  • Stromatolite at the Daegu Catholic University, Gyeongsan was designated as a natural monument in December 2009 because it was very excellent in terms of rarity, accessibility, preservation and scale. From the time of designation, the necessity of confirming the lateral extension of the stromatolite beds with the excavation of the surrounding area, and preparing a preservation plan was raised. Accordingly, the Cultural Heritage Administration conducted an investigation of the scale, production pattern, and weathering state of stromatolites with an excavation from April to December 2022, and based on this, suggested natural heritage values and conservation and use plans. The excavation was carried out in a 1,186m2 area surrounding the exposed hemispherical stromatolite (approximately 30m2). Stromatolites are continuously distributed over the entire excavation area, and hemispherical stromatolites predominate in the eastern region, and the distribution and size of hemispherical domes tend to decrease toward the west. These characteristics are interpreted as a result of long-term growth in large-scale lakes, where stratiform or small columnar domes continued to grow and connect with each other, finally forming large domes. Consequently, large and small domes were distributed on the bedding plane in clusters like coral reefs. The growth of plants and lichens, as well as small-scale faults and joints developed on the stromatolite bedding surface, is the main cause of accelerated weathering. However, preservation treatment with chemicals as with dinosaur footprints or dinosaur egg fossil sites is not suitable due to the characteristics of stromatolites, and preservation with the installation of closed protection facilities should be considered. This excavation confirmed that the distribution, size and value of stromatolites are much larger and higher than at the time of designation as a natural monument. Therefore, additional excavation of areas by experts that could not be excavated due to the discovery of buried cultural properties (stone chamber tombs) and reexamination of the expansion designation of natural monuments are required.

Design of Client-Server Model For Effective Processing and Utilization of Bigdata (빅데이터의 효과적인 처리 및 활용을 위한 클라이언트-서버 모델 설계)

  • Park, Dae Seo;Kim, Hwa Jong
    • Journal of Intelligence and Information Systems
    • /
    • v.22 no.4
    • /
    • pp.109-122
    • /
    • 2016
  • Recently, big data analysis has developed into a field of interest to individuals and non-experts as well as companies and professionals. Accordingly, it is utilized for marketing and social problem solving by analyzing the data currently opened or collected directly. In Korea, various companies and individuals are challenging big data analysis, but it is difficult from the initial stage of analysis due to limitation of big data disclosure and collection difficulties. Nowadays, the system improvement for big data activation and big data disclosure services are variously carried out in Korea and abroad, and services for opening public data such as domestic government 3.0 (data.go.kr) are mainly implemented. In addition to the efforts made by the government, services that share data held by corporations or individuals are running, but it is difficult to find useful data because of the lack of shared data. In addition, big data traffic problems can occur because it is necessary to download and examine the entire data in order to grasp the attributes and simple information about the shared data. Therefore, We need for a new system for big data processing and utilization. First, big data pre-analysis technology is needed as a way to solve big data sharing problem. Pre-analysis is a concept proposed in this paper in order to solve the problem of sharing big data, and it means to provide users with the results generated by pre-analyzing the data in advance. Through preliminary analysis, it is possible to improve the usability of big data by providing information that can grasp the properties and characteristics of big data when the data user searches for big data. In addition, by sharing the summary data or sample data generated through the pre-analysis, it is possible to solve the security problem that may occur when the original data is disclosed, thereby enabling the big data sharing between the data provider and the data user. Second, it is necessary to quickly generate appropriate preprocessing results according to the level of disclosure or network status of raw data and to provide the results to users through big data distribution processing using spark. Third, in order to solve the problem of big traffic, the system monitors the traffic of the network in real time. When preprocessing the data requested by the user, preprocessing to a size available in the current network and transmitting it to the user is required so that no big traffic occurs. In this paper, we present various data sizes according to the level of disclosure through pre - analysis. This method is expected to show a low traffic volume when compared with the conventional method of sharing only raw data in a large number of systems. In this paper, we describe how to solve problems that occur when big data is released and used, and to help facilitate sharing and analysis. The client-server model uses SPARK for fast analysis and processing of user requests. Server Agent and a Client Agent, each of which is deployed on the Server and Client side. The Server Agent is a necessary agent for the data provider and performs preliminary analysis of big data to generate Data Descriptor with information of Sample Data, Summary Data, and Raw Data. In addition, it performs fast and efficient big data preprocessing through big data distribution processing and continuously monitors network traffic. The Client Agent is an agent placed on the data user side. It can search the big data through the Data Descriptor which is the result of the pre-analysis and can quickly search the data. The desired data can be requested from the server to download the big data according to the level of disclosure. It separates the Server Agent and the client agent when the data provider publishes the data for data to be used by the user. In particular, we focus on the Big Data Sharing, Distributed Big Data Processing, Big Traffic problem, and construct the detailed module of the client - server model and present the design method of each module. The system designed on the basis of the proposed model, the user who acquires the data analyzes the data in the desired direction or preprocesses the new data. By analyzing the newly processed data through the server agent, the data user changes its role as the data provider. The data provider can also obtain useful statistical information from the Data Descriptor of the data it discloses and become a data user to perform new analysis using the sample data. In this way, raw data is processed and processed big data is utilized by the user, thereby forming a natural shared environment. The role of data provider and data user is not distinguished, and provides an ideal shared service that enables everyone to be a provider and a user. The client-server model solves the problem of sharing big data and provides a free sharing environment to securely big data disclosure and provides an ideal shared service to easily find big data.

A Study on the Application of Outlier Analysis for Fraud Detection: Focused on Transactions of Auction Exception Agricultural Products (부정 탐지를 위한 이상치 분석 활용방안 연구 : 농수산 상장예외품목 거래를 대상으로)

  • Kim, Dongsung;Kim, Kitae;Kim, Jongwoo;Park, Steve
    • Journal of Intelligence and Information Systems
    • /
    • v.20 no.3
    • /
    • pp.93-108
    • /
    • 2014
  • To support business decision making, interests and efforts to analyze and use transaction data in different perspectives are increasing. Such efforts are not only limited to customer management or marketing, but also used for monitoring and detecting fraud transactions. Fraud transactions are evolving into various patterns by taking advantage of information technology. To reflect the evolution of fraud transactions, there are many efforts on fraud detection methods and advanced application systems in order to improve the accuracy and ease of fraud detection. As a case of fraud detection, this study aims to provide effective fraud detection methods for auction exception agricultural products in the largest Korean agricultural wholesale market. Auction exception products policy exists to complement auction-based trades in agricultural wholesale market. That is, most trades on agricultural products are performed by auction; however, specific products are assigned as auction exception products when total volumes of products are relatively small, the number of wholesalers is small, or there are difficulties for wholesalers to purchase the products. However, auction exception products policy makes several problems on fairness and transparency of transaction, which requires help of fraud detection. In this study, to generate fraud detection rules, real huge agricultural products trade transaction data from 2008 to 2010 in the market are analyzed, which increase more than 1 million transactions and 1 billion US dollar in transaction volume. Agricultural transaction data has unique characteristics such as frequent changes in supply volumes and turbulent time-dependent changes in price. Since this was the first trial to identify fraud transactions in this domain, there was no training data set for supervised learning. So, fraud detection rules are generated using outlier detection approach. We assume that outlier transactions have more possibility of fraud transactions than normal transactions. The outlier transactions are identified to compare daily average unit price, weekly average unit price, and quarterly average unit price of product items. Also quarterly averages unit price of product items of the specific wholesalers are used to identify outlier transactions. The reliability of generated fraud detection rules are confirmed by domain experts. To determine whether a transaction is fraudulent or not, normal distribution and normalized Z-value concept are applied. That is, a unit price of a transaction is transformed to Z-value to calculate the occurrence probability when we approximate the distribution of unit prices to normal distribution. The modified Z-value of the unit price in the transaction is used rather than using the original Z-value of it. The reason is that in the case of auction exception agricultural products, Z-values are influenced by outlier fraud transactions themselves because the number of wholesalers is small. The modified Z-values are called Self-Eliminated Z-scores because they are calculated excluding the unit price of the specific transaction which is subject to check whether it is fraud transaction or not. To show the usefulness of the proposed approach, a prototype of fraud transaction detection system is developed using Delphi. The system consists of five main menus and related submenus. First functionalities of the system is to import transaction databases. Next important functions are to set up fraud detection parameters. By changing fraud detection parameters, system users can control the number of potential fraud transactions. Execution functions provide fraud detection results which are found based on fraud detection parameters. The potential fraud transactions can be viewed on screen or exported as files. The study is an initial trial to identify fraud transactions in Auction Exception Agricultural Products. There are still many remained research topics of the issue. First, the scope of analysis data was limited due to the availability of data. It is necessary to include more data on transactions, wholesalers, and producers to detect fraud transactions more accurately. Next, we need to extend the scope of fraud transaction detection to fishery products. Also there are many possibilities to apply different data mining techniques for fraud detection. For example, time series approach is a potential technique to apply the problem. Even though outlier transactions are detected based on unit prices of transactions, however it is possible to derive fraud detection rules based on transaction volumes.

Efficient Topic Modeling by Mapping Global and Local Topics (전역 토픽의 지역 매핑을 통한 효율적 토픽 모델링 방안)

  • Choi, Hochang;Kim, Namgyu
    • Journal of Intelligence and Information Systems
    • /
    • v.23 no.3
    • /
    • pp.69-94
    • /
    • 2017
  • Recently, increase of demand for big data analysis has been driving the vigorous development of related technologies and tools. In addition, development of IT and increased penetration rate of smart devices are producing a large amount of data. According to this phenomenon, data analysis technology is rapidly becoming popular. Also, attempts to acquire insights through data analysis have been continuously increasing. It means that the big data analysis will be more important in various industries for the foreseeable future. Big data analysis is generally performed by a small number of experts and delivered to each demander of analysis. However, increase of interest about big data analysis arouses activation of computer programming education and development of many programs for data analysis. Accordingly, the entry barriers of big data analysis are gradually lowering and data analysis technology being spread out. As the result, big data analysis is expected to be performed by demanders of analysis themselves. Along with this, interest about various unstructured data is continually increasing. Especially, a lot of attention is focused on using text data. Emergence of new platforms and techniques using the web bring about mass production of text data and active attempt to analyze text data. Furthermore, result of text analysis has been utilized in various fields. Text mining is a concept that embraces various theories and techniques for text analysis. Many text mining techniques are utilized in this field for various research purposes, topic modeling is one of the most widely used and studied. Topic modeling is a technique that extracts the major issues from a lot of documents, identifies the documents that correspond to each issue and provides identified documents as a cluster. It is evaluated as a very useful technique in that reflect the semantic elements of the document. Traditional topic modeling is based on the distribution of key terms across the entire document. Thus, it is essential to analyze the entire document at once to identify topic of each document. This condition causes a long time in analysis process when topic modeling is applied to a lot of documents. In addition, it has a scalability problem that is an exponential increase in the processing time with the increase of analysis objects. This problem is particularly noticeable when the documents are distributed across multiple systems or regions. To overcome these problems, divide and conquer approach can be applied to topic modeling. It means dividing a large number of documents into sub-units and deriving topics through repetition of topic modeling to each unit. This method can be used for topic modeling on a large number of documents with limited system resources, and can improve processing speed of topic modeling. It also can significantly reduce analysis time and cost through ability to analyze documents in each location or place without combining analysis object documents. However, despite many advantages, this method has two major problems. First, the relationship between local topics derived from each unit and global topics derived from entire document is unclear. It means that in each document, local topics can be identified, but global topics cannot be identified. Second, a method for measuring the accuracy of the proposed methodology should be established. That is to say, assuming that global topic is ideal answer, the difference in a local topic on a global topic needs to be measured. By those difficulties, the study in this method is not performed sufficiently, compare with other studies dealing with topic modeling. In this paper, we propose a topic modeling approach to solve the above two problems. First of all, we divide the entire document cluster(Global set) into sub-clusters(Local set), and generate the reduced entire document cluster(RGS, Reduced global set) that consist of delegated documents extracted from each local set. We try to solve the first problem by mapping RGS topics and local topics. Along with this, we verify the accuracy of the proposed methodology by detecting documents, whether to be discerned as the same topic at result of global and local set. Using 24,000 news articles, we conduct experiments to evaluate practical applicability of the proposed methodology. In addition, through additional experiment, we confirmed that the proposed methodology can provide similar results to the entire topic modeling. We also proposed a reasonable method for comparing the result of both methods.

Public Sentiment Analysis of Korean Top-10 Companies: Big Data Approach Using Multi-categorical Sentiment Lexicon (국내 주요 10대 기업에 대한 국민 감성 분석: 다범주 감성사전을 활용한 빅 데이터 접근법)

  • Kim, Seo In;Kim, Dong Sung;Kim, Jong Woo
    • Journal of Intelligence and Information Systems
    • /
    • v.22 no.3
    • /
    • pp.45-69
    • /
    • 2016
  • Recently, sentiment analysis using open Internet data is actively performed for various purposes. As online Internet communication channels become popular, companies try to capture public sentiment of them from online open information sources. This research is conducted for the purpose of analyzing pulbic sentiment of Korean Top-10 companies using a multi-categorical sentiment lexicon. Whereas existing researches related to public sentiment measurement based on big data approach classify sentiment into dimensions, this research classifies public sentiment into multiple categories. Dimensional sentiment structure has been commonly applied in sentiment analysis of various applications, because it is academically proven, and has a clear advantage of capturing degree of sentiment and interrelation of each dimension. However, the dimensional structure is not effective when measuring public sentiment because human sentiment is too complex to be divided into few dimensions. In addition, special training is needed for ordinary people to express their feeling into dimensional structure. People do not divide their sentiment into dimensions, nor do they need psychological training when they feel. People would not express their feeling in the way of dimensional structure like positive/negative or active/passive; rather they express theirs in the way of categorical sentiment like sadness, rage, happiness and so on. That is, categorial approach of sentiment analysis is more natural than dimensional approach. Accordingly, this research suggests multi-categorical sentiment structure as an alternative way to measure social sentiment from the point of the public. Multi-categorical sentiment structure classifies sentiments following the way that ordinary people do although there are possibility to contain some subjectiveness. In this research, nine categories: 'Sadness', 'Anger', 'Happiness', 'Disgust', 'Surprise', 'Fear', 'Interest', 'Boredom' and 'Pain' are used as multi-categorical sentiment structure. To capture public sentiment of Korean Top-10 companies, Internet news data of the companies are collected over the past 25 months from a representative Korean portal site. Based on the sentiment words extracted from previous researches, we have created a sentiment lexicon, and analyzed the frequency of the words coming up within the news data. The frequency of each sentiment category was calculated as a ratio out of the total sentiment words to make ranks of distributions. Sentiment comparison among top-4 companies, which are 'Samsung', 'Hyundai', 'SK', and 'LG', were separately visualized. As a next step, the research tested hypothesis to prove the usefulness of the multi-categorical sentiment lexicon. It tested how effective categorial sentiment can be used as relative comparison index in cross sectional and time series analysis. To test the effectiveness of the sentiment lexicon as cross sectional comparison index, pair-wise t-test and Duncan test were conducted. Two pairs of companies, 'Samsung' and 'Hanjin', 'SK' and 'Hanjin' were chosen to compare whether each categorical sentiment is significantly different in pair-wise t-test. Since category 'Sadness' has the largest vocabularies, it is chosen to figure out whether the subgroups of the companies are significantly different in Duncan test. It is proved that five sentiment categories of Samsung and Hanjin and four sentiment categories of SK and Hanjin are different significantly. In category 'Sadness', it has been figured out that there were six subgroups that are significantly different. To test the effectiveness of the sentiment lexicon as time series comparison index, 'nut rage' incident of Hanjin is selected as an example case. Term frequency of sentiment words of the month when the incident happened and term frequency of the one month before the event are compared. Sentiment categories was redivided into positive/negative sentiment, and it is tried to figure out whether the event actually has some negative impact on public sentiment of the company. The difference in each category was visualized, moreover the variation of word list of sentiment 'Rage' was shown to be more concrete. As a result, there was huge before-and-after difference of sentiment that ordinary people feel to the company. Both hypotheses have turned out to be statistically significant, and therefore sentiment analysis in business area using multi-categorical sentiment lexicons has persuasive power. This research implies that categorical sentiment analysis can be used as an alternative method to supplement dimensional sentiment analysis when figuring out public sentiment in business environment.

Term Mapping Methodology between Everyday Words and Legal Terms for Law Information Search System (법령정보 검색을 위한 생활용어와 법률용어 간의 대응관계 탐색 방법론)

  • Kim, Ji Hyun;Lee, Jong-Seo;Lee, Myungjin;Kim, Wooju;Hong, June Seok
    • Journal of Intelligence and Information Systems
    • /
    • v.18 no.3
    • /
    • pp.137-152
    • /
    • 2012
  • In the generation of Web 2.0, as many users start to make lots of web contents called user created contents by themselves, the World Wide Web is overflowing by countless information. Therefore, it becomes the key to find out meaningful information among lots of resources. Nowadays, the information retrieval is the most important thing throughout the whole field and several types of search services are developed and widely used in various fields to retrieve information that user really wants. Especially, the legal information search is one of the indispensable services in order to provide people with their convenience through searching the law necessary to their present situation as a channel getting knowledge about it. The Office of Legislation in Korea provides the Korean Law Information portal service to search the law information such as legislation, administrative rule, and judicial precedent from 2009, so people can conveniently find information related to the law. However, this service has limitation because the recent technology for search engine basically returns documents depending on whether the query is included in it or not as a search result. Therefore, it is really difficult to retrieve information related the law for general users who are not familiar with legal terms in the search engine using simple matching of keywords in spite of those kinds of efforts of the Office of Legislation in Korea, because there is a huge divergence between everyday words and legal terms which are especially from Chinese words. Generally, people try to access the law information using everyday words, so they have a difficulty to get the result that they exactly want. In this paper, we propose a term mapping methodology between everyday words and legal terms for general users who don't have sufficient background about legal terms, and we develop a search service that can provide the search results of law information from everyday words. This will be able to search the law information accurately without the knowledge of legal terminology. In other words, our research goal is to make a law information search system that general users are able to retrieval the law information with everyday words. First, this paper takes advantage of tags of internet blogs using the concept for collective intelligence to find out the term mapping relationship between everyday words and legal terms. In order to achieve our goal, we collect tags related to an everyday word from web blog posts. Generally, people add a non-hierarchical keyword or term like a synonym, especially called tag, in order to describe, classify, and manage their posts when they make any post in the internet blog. Second, the collected tags are clustered through the cluster analysis method, K-means. Then, we find a mapping relationship between an everyday word and a legal term using our estimation measure to select the fittest one that can match with an everyday word. Selected legal terms are given the definite relationship, and the relations between everyday words and legal terms are described using SKOS that is an ontology to describe the knowledge related to thesauri, classification schemes, taxonomies, and subject-heading. Thus, based on proposed mapping and searching methodologies, our legal information search system finds out a legal term mapped with user query and retrieves law information using a matched legal term, if users try to retrieve law information using an everyday word. Therefore, from our research, users can get exact results even if they do not have the knowledge related to legal terms. As a result of our research, we expect that general users who don't have professional legal background can conveniently and efficiently retrieve the legal information using everyday words.

Development of Intelligent Job Classification System based on Job Posting on Job Sites (구인구직사이트의 구인정보 기반 지능형 직무분류체계의 구축)

  • Lee, Jung Seung
    • Journal of Intelligence and Information Systems
    • /
    • v.25 no.4
    • /
    • pp.123-139
    • /
    • 2019
  • The job classification system of major job sites differs from site to site and is different from the job classification system of the 'SQF(Sectoral Qualifications Framework)' proposed by the SW field. Therefore, a new job classification system is needed for SW companies, SW job seekers, and job sites to understand. The purpose of this study is to establish a standard job classification system that reflects market demand by analyzing SQF based on job offer information of major job sites and the NCS(National Competency Standards). For this purpose, the association analysis between occupations of major job sites is conducted and the association rule between SQF and occupation is conducted to derive the association rule between occupations. Using this association rule, we proposed an intelligent job classification system based on data mapping the job classification system of major job sites and SQF and job classification system. First, major job sites are selected to obtain information on the job classification system of the SW market. Then We identify ways to collect job information from each site and collect data through open API. Focusing on the relationship between the data, filtering only the job information posted on each job site at the same time, other job information is deleted. Next, we will map the job classification system between job sites using the association rules derived from the association analysis. We will complete the mapping between these market segments, discuss with the experts, further map the SQF, and finally propose a new job classification system. As a result, more than 30,000 job listings were collected in XML format using open API in 'WORKNET,' 'JOBKOREA,' and 'saramin', which are the main job sites in Korea. After filtering out about 900 job postings simultaneously posted on multiple job sites, 800 association rules were derived by applying the Apriori algorithm, which is a frequent pattern mining. Based on 800 related rules, the job classification system of WORKNET, JOBKOREA, and saramin and the SQF job classification system were mapped and classified into 1st and 4th stages. In the new job taxonomy, the first primary class, IT consulting, computer system, network, and security related job system, consisted of three secondary classifications, five tertiary classifications, and five fourth classifications. The second primary classification, the database and the job system related to system operation, consisted of three secondary classifications, three tertiary classifications, and four fourth classifications. The third primary category, Web Planning, Web Programming, Web Design, and Game, was composed of four secondary classifications, nine tertiary classifications, and two fourth classifications. The last primary classification, job systems related to ICT management, computer and communication engineering technology, consisted of three secondary classifications and six tertiary classifications. In particular, the new job classification system has a relatively flexible stage of classification, unlike other existing classification systems. WORKNET divides jobs into third categories, JOBKOREA divides jobs into second categories, and the subdivided jobs into keywords. saramin divided the job into the second classification, and the subdivided the job into keyword form. The newly proposed standard job classification system accepts some keyword-based jobs, and treats some product names as jobs. In the classification system, not only are jobs suspended in the second classification, but there are also jobs that are subdivided into the fourth classification. This reflected the idea that not all jobs could be broken down into the same steps. We also proposed a combination of rules and experts' opinions from market data collected and conducted associative analysis. Therefore, the newly proposed job classification system can be regarded as a data-based intelligent job classification system that reflects the market demand, unlike the existing job classification system. This study is meaningful in that it suggests a new job classification system that reflects market demand by attempting mapping between occupations based on data through the association analysis between occupations rather than intuition of some experts. However, this study has a limitation in that it cannot fully reflect the market demand that changes over time because the data collection point is temporary. As market demands change over time, including seasonal factors and major corporate public recruitment timings, continuous data monitoring and repeated experiments are needed to achieve more accurate matching. The results of this study can be used to suggest the direction of improvement of SQF in the SW industry in the future, and it is expected to be transferred to other industries with the experience of success in the SW industry.

A Study on Knowledge Entity Extraction Method for Individual Stocks Based on Neural Tensor Network (뉴럴 텐서 네트워크 기반 주식 개별종목 지식개체명 추출 방법에 관한 연구)

  • Yang, Yunseok;Lee, Hyun Jun;Oh, Kyong Joo
    • Journal of Intelligence and Information Systems
    • /
    • v.25 no.2
    • /
    • pp.25-38
    • /
    • 2019
  • Selecting high-quality information that meets the interests and needs of users among the overflowing contents is becoming more important as the generation continues. In the flood of information, efforts to reflect the intention of the user in the search result better are being tried, rather than recognizing the information request as a simple string. Also, large IT companies such as Google and Microsoft focus on developing knowledge-based technologies including search engines which provide users with satisfaction and convenience. Especially, the finance is one of the fields expected to have the usefulness and potential of text data analysis because it's constantly generating new information, and the earlier the information is, the more valuable it is. Automatic knowledge extraction can be effective in areas where information flow is vast, such as financial sector, and new information continues to emerge. However, there are several practical difficulties faced by automatic knowledge extraction. First, there are difficulties in making corpus from different fields with same algorithm, and it is difficult to extract good quality triple. Second, it becomes more difficult to produce labeled text data by people if the extent and scope of knowledge increases and patterns are constantly updated. Third, performance evaluation is difficult due to the characteristics of unsupervised learning. Finally, problem definition for automatic knowledge extraction is not easy because of ambiguous conceptual characteristics of knowledge. So, in order to overcome limits described above and improve the semantic performance of stock-related information searching, this study attempts to extract the knowledge entity by using neural tensor network and evaluate the performance of them. Different from other references, the purpose of this study is to extract knowledge entity which is related to individual stock items. Various but relatively simple data processing methods are applied in the presented model to solve the problems of previous researches and to enhance the effectiveness of the model. From these processes, this study has the following three significances. First, A practical and simple automatic knowledge extraction method that can be applied. Second, the possibility of performance evaluation is presented through simple problem definition. Finally, the expressiveness of the knowledge increased by generating input data on a sentence basis without complex morphological analysis. The results of the empirical analysis and objective performance evaluation method are also presented. The empirical study to confirm the usefulness of the presented model, experts' reports about individual 30 stocks which are top 30 items based on frequency of publication from May 30, 2017 to May 21, 2018 are used. the total number of reports are 5,600, and 3,074 reports, which accounts about 55% of the total, is designated as a training set, and other 45% of reports are designated as a testing set. Before constructing the model, all reports of a training set are classified by stocks, and their entities are extracted using named entity recognition tool which is the KKMA. for each stocks, top 100 entities based on appearance frequency are selected, and become vectorized using one-hot encoding. After that, by using neural tensor network, the same number of score functions as stocks are trained. Thus, if a new entity from a testing set appears, we can try to calculate the score by putting it into every single score function, and the stock of the function with the highest score is predicted as the related item with the entity. To evaluate presented models, we confirm prediction power and determining whether the score functions are well constructed by calculating hit ratio for all reports of testing set. As a result of the empirical study, the presented model shows 69.3% hit accuracy for testing set which consists of 2,526 reports. this hit ratio is meaningfully high despite of some constraints for conducting research. Looking at the prediction performance of the model for each stocks, only 3 stocks, which are LG ELECTRONICS, KiaMtr, and Mando, show extremely low performance than average. this result maybe due to the interference effect with other similar items and generation of new knowledge. In this paper, we propose a methodology to find out key entities or their combinations which are necessary to search related information in accordance with the user's investment intention. Graph data is generated by using only the named entity recognition tool and applied to the neural tensor network without learning corpus or word vectors for the field. From the empirical test, we confirm the effectiveness of the presented model as described above. However, there also exist some limits and things to complement. Representatively, the phenomenon that the model performance is especially bad for only some stocks shows the need for further researches. Finally, through the empirical study, we confirmed that the learning method presented in this study can be used for the purpose of matching the new text information semantically with the related stocks.