http://dx.doi.org/10.36498/kbigdt.2022.7.2.1

Overseas Address Data Quality Verification Technique using Artificial Intelligence Reflecting the Characteristics of Administrative System  

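Jin-Sil Kim (Department of Big Data, Graduate School, Chungbuk National University)
Kyung-Hee Lee (Department of Management Information Systems, Chungbuk National University)
Wan-Sup Cho (Department of Big Data, Graduate School, Chungbuk National University)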
Publication Information
The Journal of Bigdata / v.7, no.2, 2022, pp. 1-9
Abstract
In the global era, the importance of imported food safety management is increasing. Address information of overseas food companies is key to this management and must be verified so that authorities can respond promptly and follow up when a food risk occurs. However, because each country's address system is different, a single verification system cannot verify addresses from all countries. The purpose of address verification may also differ depending on the field of application. In this paper, we address the problem of classifying a given overseas food business address into the administrative district level of its country. When imported food causes harm, the administrative district level must be identified from the address of the company concerned so that the food distribution route can be traced or an import ban imposed. However, in some countries the administrative district level name is omitted from the address, and the same place name is used at several administrative district levels, so accurately classifying the administrative district level from the address is not easy. We propose a deep learning-based administrative district level classification model suited to this case and evaluate it on actual address data of overseas food companies. Specifically, the model is trained using the label powerset method for multi-label classification. To verify the proposed method, accuracy was measured on the addresses of overseas manufacturing companies in Ecuador and Vietnam registered with the Ministry of Food and Drug Safety; accuracy improved by 28.1% and 13%, respectively, compared with the existing classification model.
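The label powerset idea mentioned in the abstract can be illustrated with a minimal sketch: every distinct combination of labels is mapped to a single class, so a multi-label problem becomes an ordinary multi-class one. The level names (`province`, `district`), the toy label sets, and the helper names below are hypothetical and chosen only for illustration; this is not the authors' implementation, and the deep learning classifier itself is omitted.

```python
from typing import Dict, Iterable, List, Set, Tuple

Combo = Tuple[str, ...]


def fit_label_powerset(label_sets: Iterable[Set[str]]) -> Dict[Combo, int]:
    """Assign one class id to each distinct label combination, in order of first appearance."""
    combo_to_class: Dict[Combo, int] = {}
    for labels in label_sets:
        key = tuple(sorted(labels))  # canonical form, so {a, b} and {b, a} match
        if key not in combo_to_class:
            combo_to_class[key] = len(combo_to_class)
    return combo_to_class


def transform(label_sets: Iterable[Set[str]], combo_to_class: Dict[Combo, int]) -> List[int]:
    """Replace each label set by its single powerset class id."""
    return [combo_to_class[tuple(sorted(labels))] for labels in label_sets]


def inverse_transform(class_ids: Iterable[int], combo_to_class: Dict[Combo, int]) -> List[Combo]:
    """Map predicted class ids back to the label combinations they stand for."""
    class_to_combo = {cid: combo for combo, cid in combo_to_class.items()}
    return [class_to_combo[cid] for cid in class_ids]


# Hypothetical toy targets: a place name may belong to more than one level.
toy_label_sets = [
    {"province"},
    {"province", "district"},
    {"district"},
    {"province", "district"},
]

mapping = fit_label_powerset(toy_label_sets)
class_ids = transform(toy_label_sets, mapping)
recovered = inverse_transform(class_ids, mapping)
```

A single-output classifier would then be trained on `class_ids`, and its predictions converted back to label combinations with `inverse_transform`. One known trade-off of this transformation is that the model can only predict combinations that appeared in the training data.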
Keywords
Multi-label Classification; Text Classification; Deep Learning; RNN; LSTM