Crack detection is a key task in structural health monitoring. However, manual visual inspection is labor-intensive, time-consuming, and expensive. Image-based detection offers a more efficient alternative, but its accuracy depends on image quality. For engineering structures, especially bridges, varying lighting conditions and differing surface characteristics of structural components pose a major challenge to traditional crack detection methods. This paper proposes a novel crack detection method based on convolutional neural networks, developed in three stages. First, MobileNetV3 performs automated crack classification. Second, an improved DeepLabV3+ network semantically segments the images classified as containing cracks. Finally, the method is verified on real crack images. For comparison, several conventional deep learning networks are also trained. The improved DeepLabV3+ integrates MobileNetV3 as its feature extraction backbone and incorporates the convolutional block attention module (CBAM), achieving a mean intersection over union (mIoU) of 87.79% and a mean pixel accuracy of 93.87% on public and real data sets. Compared with traditional models such as VGG16, the proposed method shortens training time by more than 80% while maintaining high detection accuracy. In addition, its small parameter count and moderate model size make it particularly suitable for deployment on mobile detection devices.
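The two segmentation scores reported above are not defined in the abstract. As a rough illustration only, the sketch below assumes the standard confusion-matrix definitions of mean intersection over union and mean pixel accuracy, with class 0 as background and class 1 as crack. The function names and the toy labels are hypothetical and are not taken from the authors' evaluation code.

```python
# Minimal sketch of the two segmentation metrics, assuming the usual
# confusion-matrix definitions. All names here are illustrative.

def confusion_matrix(y_true, y_pred, num_classes):
    """Count pixels: rows are ground-truth classes, columns are predictions."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def mean_iou(cm):
    """Average of TP / (TP + FP + FN) over classes that appear at all."""
    n = len(cm)
    ious = []
    for c in range(n):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                        # class-c pixels that were missed
        fp = sum(cm[r][c] for r in range(n)) - tp   # pixels wrongly labeled c
        denom = tp + fp + fn
        if denom > 0:
            ious.append(tp / denom)
    return sum(ious) / len(ious) if ious else 0.0


def mean_pixel_accuracy(cm):
    """Average of per-class recall, TP / (TP + FN), over classes present."""
    accs = []
    for c, row in enumerate(cm):
        total = sum(row)
        if total > 0:
            accs.append(row[c] / total)
    return sum(accs) / len(accs) if accs else 0.0


# Toy example: flattened 8-pixel masks (0 = background, 1 = crack).
y_true = [0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 0, 1, 1, 1, 1, 0]
cm = confusion_matrix(y_true, y_pred, num_classes=2)
print(cm, mean_iou(cm), mean_pixel_accuracy(cm))
```

In this toy case each class has three correct pixels, one false positive, and one false negative, so the per-class IoU is 3/5 and the per-class pixel accuracy is 3/4. Averaging per class rather than over all pixels keeps the thin crack class from being swamped by the much larger background class.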