• Title/Summary/Keyword: Web based data system


Benchmarking the Regional Patients Using DEA : Focused on A Oriental Medicine Hospital (자료포락분석방법을 이용한 내원환자의 지역별 벤치마킹분석 : 일개 한방병원을 중심으로)

  • Moon, Kyeong-Jun;Lee, Kwang-Soo;Kwon, Hyuk-Jun
    • The Journal of Society for e-Business Studies
    • /
    • v.19 no.3
    • /
    • pp.91-105
    • /
    • 2014
  • This study aimed to benchmark the number of patients who visited an oriental medicine hospital from its surrounding regions using a data envelopment analysis (DEA) model, and to analyze the relationships between regional characteristics and the efficiency scores obtained from DEA. Study data were collected from one oriental medicine hospital operated in a metropolitan city in Korea. Patient locations were identified at the level of the smallest administrative district, the Dong, and the number of patients per Dong was calculated from the patient addresses in the hospital information system. Socio-demographic variables for each Dong were obtained from the Statistics Korea websites. DEA was used to benchmark the number of patients across Dongs and to compute efficiency scores, and a Tobit regression model was applied to analyze the relationship between the efficiency scores and the regional variables. Six Dongs were identified as efficient by DEA. In the Tobit analysis, the number of medical aid recipients and the total population of each Dong were significant in explaining the differences in efficiency scores. The study introduced the application of the DEA model to benchmarking patient numbers across regions; it can be applied to identify the regions in which a hospital needs to improve its performance.
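
The abstract does not give the exact DEA formulation; as a hedged sketch, the following solves the standard input-oriented CCR DEA model as a linear program with scipy, treating each Dong as a decision-making unit (DMU). The inputs, outputs, and toy data are hypothetical, and the paper may well use a different orientation or returns-to-scale assumption.

```python
import numpy as np
from scipy.optimize import linprog

def dea_ccr_input(X, Y):
    """Input-oriented CCR efficiency for each DMU.
    X: (n_dmus, n_inputs), Y: (n_dmus, n_outputs)."""
    n = X.shape[0]
    scores = []
    for o in range(n):
        # Decision variables: [theta, lambda_1, ..., lambda_n]; minimize theta.
        c = np.r_[1.0, np.zeros(n)]
        A_ub, b_ub = [], []
        for i in range(X.shape[1]):  # sum_j lambda_j x_ij <= theta * x_io
            A_ub.append(np.r_[-X[o, i], X[:, i]])
            b_ub.append(0.0)
        for r in range(Y.shape[1]):  # sum_j lambda_j y_rj >= y_ro
            A_ub.append(np.r_[0.0, -Y[:, r]])
            b_ub.append(-Y[o, r])
        res = linprog(c, A_ub=np.array(A_ub), b_ub=np.array(b_ub),
                      bounds=[(None, None)] + [(0.0, None)] * n)
        scores.append(res.x[0])
    return np.array(scores)

# Hypothetical Dong-level data: inputs = [population, medical aid recipients],
# output = [number of patients visiting the hospital].
X = np.array([[5000., 120.], [8000., 300.], [3000., 80.], [10000., 150.]])
Y = np.array([[90.], [110.], [70.], [200.]])
print(dea_ccr_input(X, Y))  # a score of 1.0 marks an efficient Dong
```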

A System for Automatic Classification of Traditional Culture Texts (전통문화 콘텐츠 표준체계를 활용한 자동 텍스트 분류 시스템)

  • Hur, YunA;Lee, DongYub;Kim, Kuekyeng;Yu, Wonhee;Lim, HeuiSeok
    • Journal of the Korea Convergence Society
    • /
    • v.8 no.12
    • /
    • pp.39-47
    • /
    • 2017
  • The Internet has increased the number of digital web documents related to the history and traditions of Korean culture. However, users who search for creators or materials related to traditional culture often cannot find the information they want, and the results returned are insufficient. Document classification is required to make this information accessible. In the past, documents were classified manually, which required a great deal of time and money. Therefore, this paper develops an automatic text classification model for traditional cultural contents based on data from the Korean information culture field, which is organized by a systematic classification scheme for traditional cultural contents. The study applied a TF-IDF model, a Bag-of-Words model, and a combined TF-IDF/Bag-of-Words model to extract word frequencies from the 'Korea Traditional Culture' data, and built the automatic text classification model using the Support Vector Machine classification algorithm.
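
As a hedged sketch of the described pipeline (word-frequency features fed to an SVM), the following combines Bag-of-Words and TF-IDF features with a linear SVM using scikit-learn; the documents and category labels are hypothetical placeholders for the 'Korea Traditional Culture' data.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.pipeline import FeatureUnion, make_pipeline
from sklearn.svm import LinearSVC

# Hypothetical stand-ins for categorized traditional-culture documents.
docs = [
    "traditional pottery kiln firing technique",
    "folk mask dance performance heritage",
    "celadon glaze ceramic craftsmanship",
    "village ritual mask play tradition",
]
labels = ["craft", "performance", "craft", "performance"]

# Combined model: concatenate Bag-of-Words counts with TF-IDF weights.
features = FeatureUnion([("bow", CountVectorizer()),
                         ("tfidf", TfidfVectorizer())])
clf = make_pipeline(features, LinearSVC()).fit(docs, labels)
print(clf.predict(["mask dance ritual"]))  # expected: ['performance']
```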

A Study on Detecting Fake Reviews Using Machine Learning: Focusing on User Behavior Analysis (머신러닝을 활용한 가짜리뷰 탐지 연구: 사용자 행동 분석을 중심으로)

  • Lee, Min Cheol;Yoon, Hyun Shik
    • Knowledge Management Research
    • /
    • v.21 no.3
    • /
    • pp.177-195
    • /
    • 2020
  • Growing social awareness of fake reviews has led researchers to suggest ways to cope with them, either by analyzing the contents of fake reviews or by detecting them through their structural characteristics. This research collected data from blog posts on Naver and used a machine learning model on variables extracted from blogs and blog posts to detect the habitual patterns writers exhibit unconsciously, with the aim of using this technique to predict fake reviews. The data analysis showed a very strong relationship between the total number of posts registered on the writer's blog and the date the post in question was registered. Random Forest was found to be the most suitable model for detecting advertising reviews. If a review is predicted to be an advertising one by the model proposed in this research, it is very likely a fake review, and one that violates the investigation guidelines on markings and advertising regarding recommendation and guarantee in the Law of Marking and Advertising. Instead of analyzing the morphemes in the contents of posts, this research adopts behavior analysis of the writer and collects the characteristic data of blogs and blog posts with an automated system rather than by manual work to discern whether a given post is advertising; this approach is expected to improve the efficiency and effectiveness of fake review detection.
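
As a hedged sketch of the behavior-based approach, the following trains a Random Forest on writer-level features (assuming scikit-learn); the feature names and toy data are hypothetical stand-ins for the blog and post attributes the study collects automatically.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Columns (hypothetical): [total_posts_on_blog, blog_age_days,
#                          posts_per_day, images_per_post]
X = np.array([
    [1200, 2000, 0.60, 3], [40, 30, 1.30, 12], [800, 1500, 0.50, 2],
    [25, 20, 1.25, 15], [950, 1800, 0.53, 4], [60, 45, 1.33, 10],
], dtype=float)
y = np.array([0, 1, 0, 1, 0, 1])  # 1 = advertising (suspected fake) review

model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
print(model.feature_importances_)          # which behaviors matter most
print(model.predict([[50, 35, 1.4, 11]]))  # classify a new writer's post
```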

A Study on the Fat and Fatty Acid Intake of College Women Evaluated through Internet Nutritional Assessment System (인터넷 상의 영양평가프로그램을 이용한 일부 여대생의 지방 및 지방산 섭취에 관한 연구)

  • Yu, Choon-Hie
    • Journal of Nutrition and Health
    • /
    • v.40 no.1
    • /
    • pp.78-88
    • /
    • 2007
  • The purpose of this study was to investigate the dietary fat and individual fatty acid intake patterns of 174 college women living in Seoul and Gyong-gi province through an internet nutritional assessment system. Each subject entered her own food intake for three days, two weekdays and one weekend day, directly into the web program, and all of the collected data were used for statistical analysis. The mean daily caloric intake of the subjects was 1,500.9 kcal, which was 71.5% of the Estimated Energy Requirement (EER). Dietary fat contributed 27.6% of total caloric intake, slightly above the recommended limit of 25%. Daily cholesterol intake was 310.0 mg, which was also somewhat high. Mean daily N6 and N3 fatty acid intakes were 6.1 g and 0.9 g, respectively, corresponding to 3.63% and 0.53% of energy. The N3 fatty acid intake fell within the Acceptable Macronutrient Distribution Range (AMDR) of 0.5~1.0%, but the N6 fatty acid intake was somewhat lower than the AMDR of 4~8%. The N6/N3 ratio of 8.5/1, however, was within the desirable range of 4~10/1. Among overall dietary fatty acid intakes, oleic acid was the most abundant, followed by linoleic and palmitic acid. Among polyunsaturated fatty acids, linoleic acid was overwhelmingly dominant, accounting for 97.4% of total N6 fatty acid intake. In contrast, three fatty acids, linolenic acid (67.3%), DHA (21.1%), and EPA (10.0%), together supplied 98.4% of total N3 fatty acid intake. The mean P/M/S ratio was 0.9/1.1/1.0. The subjects' intake of fat, fatty acids, and cholesterol came from diverse food groups including meats, fats and oils, milk and milk products, eggs, fish, and soybean products. Nevertheless, the subjects tended to show an unfavorable fat and fatty acid intake pattern in terms of both quantity and quality. Based on these results, it is important to continuously monitor the dietary fat intake pattern of the general population, and an internet program such as the one used in this study would be valuable, especially for assessing dietary patterns in the younger generation.
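
As a quick consistency check (an editorial addition, assuming the standard conversion factor of 9 kcal per gram of fat), the reported energy percentages follow from:

```latex
\%\,\text{energy from fatty acid} \;=\; \frac{\text{intake (g)} \times 9\ \text{kcal/g}}{\text{total energy intake (kcal)}} \times 100
```

With the reported means, N6: 6.1 × 9 / 1,500.9 × 100 ≈ 3.7% and N3: 0.9 × 9 / 1,500.9 × 100 ≈ 0.54%, consistent with the stated 3.63% and 0.53% up to rounding of the means.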

Realization of a Web-based Distribution System for the Monitoring of Business Press Releases and News Gathering Robots (기업 보도자료 모니터링을 위한 웹기반 배포시스템 및 기사 수집로봇 구현)

  • Shin, Myeong-Sook;Oh, Jung-Jin;Lee, Joon
    • Journal of the Korea Society of Computer and Information
    • /
    • v.18 no.12
    • /
    • pp.103-111
    • /
    • 2013
  • At present, a wide variety of Korean news stories have become important online content, and their importance to the press continues to grow. Diverse news from businesses is provided to the public as press releases through newspapers or broadcasting media. To turn such news into a press release, enterprises visit reporters or use e-mail, fax, or couriers to deliver the information. However, these methods have problems with time, human resources, expense, and file damage. They also make it cumbersome for enterprises to check what has been released, and for the press to make frequent contact with enterprises about interviews and the content to be released. Therefore, this study realized a distribution system that enterprises can use to distribute material to the press and to easily check what has been released, while the press can submit interview requests in a simple way, together with a news-gathering robot that collects articles about the enterprises involved from online news or portal sites.
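
As a hedged illustration of the news-gathering robot, here is a minimal crawler sketch using the third-party requests and BeautifulSoup libraries. The portal URL, query parameter, and CSS selector are hypothetical; a real robot would match the target site's actual markup and respect its robots.txt and crawl-rate limits.

```python
import requests
from bs4 import BeautifulSoup  # third-party: beautifulsoup4

def gather_articles(search_url: str, company: str) -> list[dict]:
    """Fetch a portal search page and scrape article titles and links."""
    html = requests.get(search_url, params={"query": company}, timeout=10).text
    soup = BeautifulSoup(html, "html.parser")
    return [{"title": a.get_text(strip=True), "url": a.get("href")}
            for a in soup.select("a.news_title")]  # hypothetical selector

# Usage: collect portal articles mentioning the enterprise by name.
for art in gather_articles("https://example.com/news/search", "ACME Corp"):
    print(art["title"], "->", art["url"])
```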

A MDA-based Approach to Developing UI Architecture for Mobile Telephony Software (MDA기반 이동 단말 시스템 소프트웨어 개발 기법)

  • Lee Joon-Sang;Chae Heung-Seok
    • The KIPS Transactions:PartD
    • /
    • v.13D no.3 s.106
    • /
    • pp.383-390
    • /
    • 2006
  • Product-line engineering is a long-standing goal of software engineering research. Unfortunately, the current underlying technologies do not yet seem mature enough to make it viable in industry. Based on our experience working on mobile telephony systems over three years, we are now developing an approach to product-line engineering for mobile telephony system software. In this paper, we share those experiences together with our research motivation and idea, and propose an approach to building and maintaining telephony application logic from the perspective of scenes. As a Domain-Specific Language (DSL), the Menu Navigation Viewpoint (MNV) DSL is designed to deal with the problem domain of telephony applications. The functional requirements on how a set of telephony application logics is configured can vary widely depending on the manufacturer, product concept, service carrier, and so on. However, there is a commonality: all currently used telephony application logic can be described from the user's point of view with a set of functional features combinatorially synthesized from typical telephony services (i.e. voice/video telephony, CBS/SMS/MMS, address book, data connection, camera/multimedia, web browsing, etc.) and their possible connectivity. An MNV DSL description acts as a backbone software architecture on which the other types of telephony application logic are placed and aligned to work together globally.
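
The abstract does not show the MNV DSL's syntax, so the following is an illustrative sketch only: it models the core idea, describing telephony application logic as scenes (menu states) with user-driven transitions, as plain Python data structures. All scene and event names are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Scene:
    name: str
    transitions: dict = field(default_factory=dict)  # event -> next scene

# A tiny menu-navigation description spanning typical telephony services.
scenes = {
    "main_menu": Scene("main_menu", {"select_messages": "message_menu",
                                     "select_camera": "camera_view"}),
    "message_menu": Scene("message_menu", {"compose": "sms_editor",
                                           "back": "main_menu"}),
    "sms_editor": Scene("sms_editor", {"send": "message_menu"}),
    "camera_view": Scene("camera_view", {"back": "main_menu"}),
}

def navigate(current: str, event: str) -> str:
    """Follow one user event; unknown events keep the current scene."""
    return scenes[current].transitions.get(event, current)

print(navigate("main_menu", "select_messages"))  # -> message_menu
```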

A User Profile-based Filtering Method for Information Search in Smart TV Environment (스마트 TV 환경에서 정보 검색을 위한 사용자 프로파일 기반 필터링 방법)

  • Sean, Visal;Oh, Kyeong-Jin;Jo, Geun-Sik
    • Journal of Intelligence and Information Systems
    • /
    • v.18 no.3
    • /
    • pp.97-117
    • /
    • 2012
  • Nowadays, Internet users tend to do a variety of things at the same time, such as web browsing, social networking, and multimedia consumption. While watching a video, once a user becomes interested in a product, the user has to search for information to learn more about it. With a conventional approach, the user has to search separately with search engines like Bing or Google, which can be inconvenient and time-consuming. For this reason, a video annotation platform has been developed to provide users with more convenient and more interactive ways of engaging with video content. In the future smart TV environment, users will be able to follow annotated information, for example, a link to a vendor selling the product of interest. It is even better to enable users to search for information by discussing it directly with friends. Users can effectively get useful, relevant information about a product from friends who share common interests or may have experienced it before, which is more reliable than results from search engines. Social networking services provide an appropriate environment for people to share products: they can show new things to their friends and share personal experiences with any specific product, while also absorbing the most relevant information about a product of interest through comments or discussion among friends. However, within a very large friend graph, determining the most appropriate persons to ask about a specific product is still a limitation of the existing conventional approach. When users want to share or discuss a product, they simply share it to all friends as a new feed; a newly posted article is thus blindly spread to all friends without considering their background interests or knowledge. The number of responses can then be huge, and users cannot easily pick out the relevant, useful ones, since their friends come from various fields of interest and knowledge. To overcome this limitation, we propose a method to filter a user's friends for information search, which leverages semantic video annotation and social networking services. Our method filters the friends and identifies who can give the user useful information about a specific product. By examining existing Facebook information about users and their social graph, we construct a user profile of product interest. With the user's permission and authentication, the user's activities are enriched with domain-specific ontologies such as GoodRelations and the BestBuy data source. We further assume the object in the video is already annotated using Linked Data, so the detailed information about the product the user wants to ask about is retrieved via its product URI. Our system calculates the similarities between this product information and the friends' profiles in order to identify the friends most suitable for seeking information about the product, and ranks the user's friends by a score indicating how likely each is to give useful information about the product of interest. We conducted an experiment with a group of respondents to verify and evaluate the system. First, a user-profile accuracy evaluation demonstrates how correctly the constructed profile of product interest represents the user's interests. Then the filtering method is evaluated by inspecting the ranked results against human judgment.
The results show that our method filters effectively and efficiently. The system fulfills user needs by helping the user select appropriate friends from whom to seek useful information about a specific product of interest and, as a result, helps inform and support purchase decisions.
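
The paper's similarity computation over ontology-enriched Facebook profiles is not spelled out in the abstract; as a minimal sketch of the ranking step, the following scores hypothetical keyword-based friend profiles against a product description by TF-IDF cosine similarity (assuming scikit-learn).

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical friend interest profiles and product description.
friend_profiles = {
    "Alice": "digital camera lens photography tripod",
    "Bob": "soccer running shoes fitness",
    "Carol": "mirrorless camera sensor photography",
}
product = "compact mirrorless camera with interchangeable lens"

vec = TfidfVectorizer()
matrix = vec.fit_transform(list(friend_profiles.values()) + [product])
scores = cosine_similarity(matrix[-1], matrix[:-1]).ravel()

# Rank friends by how likely they can give useful product information.
for name, score in sorted(zip(friend_profiles, scores),
                          key=lambda p: p[1], reverse=True):
    print(f"{name}: {score:.2f}")
```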

Financial Fraud Detection using Text Mining Analysis against Municipal Cybercriminality (지자체 사이버 공간 안전을 위한 금융사기 탐지 텍스트 마이닝 방법)

  • Choi, Sukjae;Lee, Jungwon;Kwon, Ohbyung
    • Journal of Intelligence and Information Systems
    • /
    • v.23 no.3
    • /
    • pp.119-138
    • /
    • 2017
  • Recently, SNS has become an important channel for marketing as well as personal communication. However, cybercrime has evolved along with information and communication technology, and illegal advertising is distributed on SNS in large quantities. As a result, personal information is leaked and even monetary damage occurs more and more frequently. In this study, we propose a method to analyze which sentences and documents posted to SNS are related to financial fraud. First, as a conceptual framework, we developed a matrix of the conceptual characteristics of cybercriminality on SNS and emergency management. We also suggested an emergency management process consisting of pre-cybercriminality steps (e.g., risk identification) and post-cybercriminality steps; of these, this paper focuses on risk identification. The main process consists of data collection, preprocessing, and analysis. First, we selected the two words 'daechul' (loan) and 'sachae' (private loan) as seed words and collected data containing these words from SNS such as Twitter. The collected data were given to two researchers to decide whether they were related to cybercriminality, particularly financial fraud. We then selected keywords from them, restricted to nouns and symbols. With the selected keywords, we searched and collected data from web sources such as Twitter, news, and blogs, gathering more than 820,000 articles. The collected articles were refined through preprocessing and turned into learning data. Preprocessing is divided into three steps: morphological analysis, stop-word removal, and selection of valid parts of speech. In the morphological analysis step, a complex sentence is decomposed into morpheme units to enable mechanical analysis. In the stop-word removal step, non-lexical elements such as numbers, punctuation marks, and double spaces are removed from the text. In the part-of-speech selection step, only nouns and symbols are retained: nouns refer to things and thus express the intent of a message better than other parts of speech, and the more illegal a text is, the more frequently symbols are used. Each selected item is labeled 'legal' or 'illegal'; this labeling is necessary to turn the selected data into learning data through the preprocessing process. The processed data are then converted into a corpus and a Document-Term Matrix. Finally, the 'legal' and 'illegal' files were mixed and randomly divided into a learning data set and a test data set; we set the learning data at 70% and the test data at 30%. SVM was used as the discrimination algorithm. Since SVM requires gamma and cost values as its main parameters, we set gamma to 0.5 and cost to 10 based on the optimal value function; the cost is set higher than in typical cases. To show the feasibility of the proposed idea, we compared the proposed method with MLE (Maximum Likelihood Estimation), Term Frequency, and a Collective Intelligence method, using overall accuracy as the metric. The overall accuracy of the proposed method was 92.41% for illegal loan advertisements and 77.75% for illegal door-to-door sales, clearly superior to Term Frequency, MLE, and the other baselines. Hence, the result suggests that the proposed method is valid and practically usable.
In this paper, we propose a framework for crisis management caused by anomalies in unstructured data sources such as SNS. We hope this study will contribute to academia by identifying what to consider when applying an SVM-like discrimination algorithm to text analysis, and to practitioners in the fields of brand management and opinion mining.
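
As a hedged sketch of the discrimination step with the parameters stated in the abstract (gamma = 0.5, cost C = 10), the following uses scikit-learn on a Document-Term Matrix; the study works on preprocessed Korean nouns and symbols, for which the toy token strings below are hypothetical stand-ins.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.svm import SVC

texts = ["daechul fast approval $$ call now", "community meetup photos today",
         "sachae same-day cash !! contact me", "weekend hiking weather report",
         "daechul no credit check $$$", "soup recipe for dinner tonight"]
labels = [1, 0, 1, 0, 1, 0]  # 1 = illegal advertisement, 0 = legal

vec = CountVectorizer(token_pattern=r"\S+")  # keep symbols such as $$ and !!
dtm = vec.fit_transform(texts)               # Document-Term Matrix

# Parameters follow the abstract: gamma = 0.5 and cost (C) = 10.
model = SVC(kernel="rbf", gamma=0.5, C=10).fit(dtm, labels)
print(model.predict(vec.transform(["sachae quick cash $$ no check"])))
```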

Korean Word Sense Disambiguation using Dictionary and Corpus (사전과 말뭉치를 이용한 한국어 단어 중의성 해소)

  • Jeong, Hanjo;Park, Byeonghwa
    • Journal of Intelligence and Information Systems
    • /
    • v.21 no.1
    • /
    • pp.1-13
    • /
    • 2015
  • As opinion mining in big data applications has been highlighted, much research on unstructured data has been conducted. Social media on the Internet generate unstructured or semi-structured data every second, often written in the natural languages we use in daily life. Many words in human languages have multiple meanings or senses, which makes it very difficult for computers to extract useful information from these datasets. Traditional web search engines are usually based on keyword search, which can yield incorrect results far from users' intentions. Although much progress has been made over the years in enhancing search engine performance to provide users with appropriate results, there is still much to improve. Word sense disambiguation plays a very important role in natural language processing and is considered one of the most difficult problems in this area. Major approaches to word sense disambiguation can be classified as knowledge-based, supervised corpus-based, and unsupervised corpus-based. This paper presents a method that automatically generates a corpus for word sense disambiguation by taking advantage of the examples in existing dictionaries, avoiding expensive sense-tagging processes. It tests the effectiveness of the method with a Naïve Bayes model, a supervised learning algorithm, using the Korean standard unabridged dictionary and the Sejong Corpus. The Korean standard unabridged dictionary has approximately 57,000 sentences; the Sejong Corpus has about 790,000 sentences tagged with both part-of-speech and sense. For the experiment, the Korean standard unabridged dictionary and the Sejong Corpus were evaluated both combined and as separate entities, using cross-validation. Only nouns, the target subjects in word sense disambiguation, were selected: 93,522 word senses among 265,655 nouns, plus 56,914 sentences from related proverbs and examples, were combined into the corpus. The Sejong Corpus was easily merged with the Korean standard unabridged dictionary because it is tagged with the sense indices defined by that dictionary. Sense vectors were formed after the merged corpus was created, and the terms used in creating the sense vectors were added to the named-entity dictionary of a Korean morphological analyzer. Using the extended named-entity dictionary, term vectors were extracted from the input sentences. Given an extracted term vector and the sense vector model built during preprocessing, the sense-tagged terms were determined by vector-space-model-based word sense disambiguation. The experiment shows that better precision and recall are obtained with the merged corpus, demonstrating the effectiveness of merging the examples in the Korean standard unabridged dictionary with the Sejong Corpus. This suggests the method can practically enhance the performance of internet search engines and help capture the meaning of sentences more accurately in natural language processing tasks related to search engines, opinion mining, and text mining. The Naïve Bayes classifier used in this study is a supervised learning algorithm based on Bayes' theorem, under the assumption that all senses are independent.
Even though this independence assumption is not realistic and ignores correlations between attributes, the Naïve Bayes classifier is widely used because of its simplicity, and in practice it is known to be very effective in many applications such as text classification and medical diagnosis. However, further research is needed to consider all possible combinations, and/or partial combinations, of the senses in a sentence. The effectiveness of word sense disambiguation may also be improved if rhetorical structures or morphological dependencies between words are analyzed through syntactic analysis.
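
The abstract's pipeline (dictionary examples as a sense-tagged corpus, sense vectors, vector-space matching) can be sketched as follows, assuming scikit-learn; the ambiguous English word "bank" and its example sentences are hypothetical stand-ins for the Korean dictionary entries.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Dictionary example sentences act as a sense-tagged corpus.
sense_examples = {
    "bank#finance": ["deposit money at the bank", "the bank issued a loan"],
    "bank#river": ["fishing on the river bank", "the bank eroded after rain"],
}
vec = TfidfVectorizer()
corpus = [" ".join(exs) for exs in sense_examples.values()]
sense_vectors = vec.fit_transform(corpus)  # one sense vector per sense

def disambiguate(sentence: str) -> str:
    """Assign the sense whose vector is nearest to the input term vector."""
    sims = cosine_similarity(vec.transform([sentence]), sense_vectors).ravel()
    return list(sense_examples)[sims.argmax()]

print(disambiguate("she opened a savings account at the bank"))  # bank#finance
```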

Latent topics-based product reputation mining (잠재 토픽 기반의 제품 평판 마이닝)

  • Park, Sang-Min;On, Byung-Won
    • Journal of Intelligence and Information Systems
    • /
    • v.23 no.2
    • /
    • pp.39-70
    • /
    • 2017
  • Data-driven analytics techniques have recently been applied to public surveys. Instead of simply gathering survey results or expert opinions to research the preference for a recently launched product, enterprises need a way to collect and analyze various types of online data and accurately figure out customer preferences. In existing data-based survey methods, the sentiment lexicon for a particular domain is first constructed by domain experts, who judge the positive, neutral, or negative meanings of the words frequently used in the collected text documents. To research the preference for a particular product, the existing approach (1) collects review posts related to the product from several product review web sites; (2) extracts sentences (or phrases) from the collection after preprocessing steps such as stemming and stop-word removal; (3) classifies the polarity (positive or negative) of each sentence (or phrase) based on the sentiment lexicon; and (4) estimates the positive and negative ratios of the product by dividing the numbers of positive and negative sentences (or phrases) by the total number of sentences (or phrases) in the collection. Furthermore, the existing approach automatically finds important sentences (or phrases) carrying positive or negative meaning for or against the product. As a motivating example, given a product like the Sonata made by Hyundai Motors, customers often want a summary note of the positive points in the 'car design' aspect as well as the negative points in the same aspect. They also want more useful information on other aspects such as 'car quality', 'car performance', and 'car service'. Such information will enable customers to make good choices when purchasing brand-new vehicles, and automobile makers will be able to figure out the preferences and positive/negative points for new models on the market; in the near future, the weak points of the models can then be improved based on the sentiment analysis. For this, the existing approach computes the sentiment score of each sentence (or phrase) and selects the top-k sentences (or phrases) with the highest positive and negative scores. However, the existing approach has several shortcomings and is of limited use in real applications. Its main disadvantages are as follows. (1) The main aspects of a product (e.g., the design, quality, performance, and service of a Hyundai Sonata) are not considered; without aspects, the summary reported to customers and car makers contains only the product's overall positive and negative ratios and the top-k sentences (or phrases) with the highest sentiment scores over the entire corpus. This is not enough, and the main aspects of the target product need to be considered in the sentiment analysis. (2) Since the same word has different meanings across domains, a sentiment lexicon proper to each domain must be constructed, and an efficient way to construct it per domain is required, because lexicon construction is labor-intensive and time-consuming.
To address these problems, this article proposes a novel product reputation mining algorithm that (1) extracts the topics hidden in the review documents written by customers; (2) mines the main aspects based on the extracted topics; (3) measures the positive and negative ratios of the product using those aspects; and (4) presents a digest in which a few important positive and negative sentences are listed per aspect. Unlike the existing approach, using hidden topics lets experts construct the sentiment lexicon easily and quickly, and by reinforcing topic semantics we can improve the accuracy of product reputation mining well beyond that of the existing approach. In the experiments, we collected a large set of review documents for domestic vehicles such as the K5, SM5, and Avante; measured the positive and negative ratios of the three cars; showed top-k positive and negative summaries per aspect; and conducted statistical analysis. The experimental results clearly show the effectiveness of the proposed method compared with the existing method.
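
The abstract names hidden-topic extraction but not a specific topic model; a common choice is Latent Dirichlet Allocation (LDA), sketched below with scikit-learn on hypothetical English review snippets standing in for the Korean car-review corpus. In the paper's pipeline, the extracted topics would then be mapped to aspects (design, quality, performance, service) before per-aspect sentiment ratios are computed.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

reviews = [
    "sleek exterior design and stylish headlights",
    "engine performance is powerful on the highway",
    "service center staff were slow and unhelpful",
    "beautiful design but the paint quality is poor",
    "fuel efficiency and acceleration impress on long drives",
    "quick repair service and friendly dealership",
]
vec = CountVectorizer(stop_words="english")
dtm = vec.fit_transform(reviews)

lda = LatentDirichletAllocation(n_components=3, random_state=0).fit(dtm)
terms = vec.get_feature_names_out()
for k, weights in enumerate(lda.components_):
    top = [terms[i] for i in weights.argsort()[-4:][::-1]]
    print(f"topic {k}: {top}")  # candidate aspect keywords per topic
```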