Deep learning-based methods, especially convolutional neural networks (CNNs), have proven effective for image classification. In this paper, a vision transformer is used to classify surface defects of processed bamboo, offering a faster and more accurate alternative to inefficient manual inspection. First, we replace the GELU activation function in the encoder with Mish, but the resulting classification performance is unsatisfactory. Then, to obtain better classification results, we keep the original activation function and introduce DropBlock, which yields higher classification accuracy than dropout. Finally, a comparison with the results obtained after transfer learning confirms that replacing dropout with DropBlock improves classification accuracy. Results on the bamboo chip datasets show that the accuracy of this method is 2% higher than that of the original transformer network, with or without transfer learning.
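To make the two modifications mentioned above concrete, the following is a minimal, standard-library Python sketch of the GELU and Mish activations and of DropBlock on a single 2D feature map. It is an illustration under stated assumptions, not the implementation used in this work: the function names (`gelu`, `mish`, `drop_block_2d`), the list-of-lists feature map, and the choice to apply DropBlock on a 2D grid (for example, transformer tokens reshaped to their patch layout) are all hypothetical. The drop rate `gamma` follows the commonly cited DropBlock formulation, in which block positions are sampled so that each block lies fully inside the map.

```python
import math
import random


def gelu(x):
    """Exact GELU: x * Phi(x), where Phi is the standard normal CDF."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def mish(x):
    """Mish: x * tanh(softplus(x)), using a numerically stable softplus."""
    softplus = max(x, 0.0) + math.log1p(math.exp(-abs(x)))
    return x * math.tanh(softplus)


def drop_block_2d(feature, block_size, drop_prob, rng=None):
    """Apply DropBlock to one H x W feature map given as a list of lists.

    Unlike dropout, which zeroes independent units, DropBlock zeroes
    contiguous block_size x block_size regions. Assumes block_size <= H, W.
    """
    rng = rng or random.Random()
    h, w = len(feature), len(feature[0])
    if drop_prob <= 0.0:
        return [row[:] for row in feature]

    # Block corners are sampled only where the whole block fits in the map.
    valid_h = h - block_size + 1
    valid_w = w - block_size + 1
    # Per-position rate chosen so that roughly drop_prob of units are dropped.
    gamma = drop_prob * (h * w) / (block_size ** 2) / (valid_h * valid_w)

    mask = [[1.0] * w for _ in range(h)]
    for i in range(valid_h):
        for j in range(valid_w):
            if rng.random() < gamma:
                for di in range(block_size):
                    for dj in range(block_size):
                        mask[i + di][j + dj] = 0.0

    # Rescale kept units so the expected activation magnitude is preserved.
    kept = sum(sum(row) for row in mask)
    scale = (h * w) / kept if kept > 0 else 0.0
    return [[feature[i][j] * mask[i][j] * scale for j in range(w)]
            for i in range(h)]
```

With `block_size=1` the sketch reduces to ordinary dropout on the grid, which is one way to see the two regularizers as points on the same spectrum. DropBlock would be applied during training only; at inference the feature map is passed through unchanged.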