
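Preface: PaddleOCR has arguably become the mainstream choice for OCR recently and is very well regarded, especially since the release of its lightweight models, which can recognize nearly 80 languages. It was also the most efficient of the three OCR tools compared: for the same image, PaddleOCR needs only about 2 seconds. Multi-scenario, multi-language use cases still call for further model training. PaddleOCR's biggest advantages are its thorough documentation and its support for training your own models; judging by most articles online, many users have already started training their own models on this platform, and its range of applications is very broad.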

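I. Introduction

1. What is OCR?

2. PaddleOCR

2.1 PP-OCR overview and features

2.2 Features

3. Model training

3.1 Text detection

3.2 Text recognition

3.3 Text direction classification

II. Installation and usage

1. Installation

2. Recognizing text in images with Python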

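I. Introduction

1. What is OCR?

Optical Character Recognition (OCR) is the technology of analyzing and recognizing image files that contain text in order to obtain the text and layout information: it detects the text in an image and recognizes its content.

What are its application scenarios?

OCR shows up everywhere in daily life, for example ID card recognition for entering information during the pandemic, license plate recognition, and autonomous driving. Machine learning plays an increasingly important role in our lives and is no longer something mysterious.

What does the OCR technical pipeline look like?

An OCR system runs as follows: input -> image preprocessing -> text detection -> text recognition -> output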

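2. PaddleOCR

PaddleOCR supports a variety of cutting-edge OCR algorithms and, on that basis, provides the industrial models/solutions PP-OCR and PP-Structure, covering the whole workflow of data production, model training, compression, inference, and deployment.

PaddleOCR consists of three parts: text detection, text recognition, and a direction classifier. Text detection offers three backbone models, MobileNetV3, ResNet18_vd, and ResNet50_vd; MobileNetV3 is the most commonly used because it is small overall and suits mobile devices. Text recognition has only one pre-trained model, based on MobileNetV3. The direction classifier uses the default model.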

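2.1 PP-OCR overview and features

PP-OCR is a self-developed, practical, ultra-lightweight OCR system. Building on re-implemented academic algorithms, it is slimmed down and optimized to balance accuracy and speed.

The PP-OCRv2 system pipeline is as follows:

[Figure: PP-OCRv2 system pipeline]

PP-OCR

PP-OCR is a two-stage OCR system in which the text detection algorithm is DB and the text recognition algorithm is CRNN. In addition, a text direction classifier is placed between the detection and recognition modules to handle text in different orientations.

PP-OCR applies 19 effective strategies across 8 aspects, including backbone selection and adjustment, prediction head design, data augmentation, learning rate schedule, regularization parameter selection, use of pre-trained models, and automatic model pruning and quantization, to optimize and slim down the model of each module (shown in the green boxes of the figure above). The end result is an ultra-lightweight Chinese and English OCR model with an overall size of 3.5M and a 2.8M English and digit OCR model.

PP-OCRv2

Building on PP-OCR, PP-OCRv2 is further optimized in five aspects. The detection model adopts the CML (Collaborative Mutual Learning) knowledge distillation strategy and the CopyPaste data augmentation strategy. The recognition model adopts the lightweight LCNet backbone, the U-DML knowledge distillation strategy, and an enhanced CTC loss (shown in the red boxes of the figure above), which further improve inference speed and prediction quality.

PP-OCRv3

PP-OCRv3 upgrades the detection and recognition models of PP-OCRv2 in 9 aspects:

The PP-OCRv3 detector upgrades the CML (Collaborative Mutual Learning) text detection distillation strategy proposed in PP-OCRv2, further improving both the teacher model and the student model. For the teacher model, it introduces LK-PAN, a PAN module with a large receptive field, and adopts the DML distillation strategy; for the student model, it introduces RSE-FPN, an FPN module with a residual attention mechanism.
The PP-OCRv3 recognizer is optimized on the basis of the text recognition algorithm SVTR. By introducing a transformer structure, SVTR no longer uses an RNN and can mine the context of text line images more effectively, which improves recognition. PP-OCRv3 uses the lightweight text recognition network SVTR_LCNet, attention-guided training of CTC, the TextConAug data augmentation strategy, a better pre-trained model obtained with self-supervised TextRotNet, UDML (Unified Deep Mutual Learning), and UIM (Unlabeled Images Mining) to speed up the model and improve its results.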

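2.2 Features

Ultra-lightweight PP-OCRv3 series: detection (3.6M) + direction classifier (1.4M) + recognition (12M) = 17.0M
Ultra-lightweight PP-OCRv2 series: detection (3.1M) + direction classifier (1.4M) + recognition (8.5M) = 13.0M
Ultra-lightweight PP-OCR mobile series: detection (3.0M) + direction classifier (1.4M) + recognition (5.0M) = 9.4M
General PP-OCR server series: detection (47.1M) + direction classifier (1.4M) + recognition (94.9M) = 143.4M
Supports Chinese, English, and digit recognition, vertical text recognition, and long text recognition
Supports multilingual recognition: about 80 languages, including Korean, Japanese, German, and French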
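
As a quick sanity check on the sizes listed above, the following sketch adds up the three components of each series. The dictionary keys and the helper name total_size_mb are made up for this illustration; only the numbers come from the list.

```python
# Component sizes in MB, copied from the feature list above.
MODEL_SIZES_MB = {
    "PP-OCRv3": {"detection": 3.6, "direction_classifier": 1.4, "recognition": 12.0},
    "PP-OCRv2": {"detection": 3.1, "direction_classifier": 1.4, "recognition": 8.5},
    "PP-OCR mobile": {"detection": 3.0, "direction_classifier": 1.4, "recognition": 5.0},
    "PP-OCR server": {"detection": 47.1, "direction_classifier": 1.4, "recognition": 94.9},
}


def total_size_mb(components):
    """Sum the component sizes and round to one decimal place."""
    return round(sum(components.values()), 1)


if __name__ == "__main__":
    for series, components in MODEL_SIZES_MB.items():
        print(f"{series}: {total_size_mb(components)} MB")
```

The totals agree with the figures quoted for each series, which also confirms that the direction classifier is a small, fixed share of every pipeline.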

3. Model training

3.1 Text detection

1) Data and weights preparation

1.1) Data preparation

To prepare the dataset, see ocr_datasets.

1.2) Download the pre-trained model

First download the pre-trained model. PaddleOCR's detection model currently supports 3 backbones: MobileNetV3, ResNet18_vd, and ResNet50_vd. You can swap in other backbones from PaddleClas as needed. The download links for the corresponding backbone pre-trained weights are listed at https://github.com/PaddlePaddle/PaddleClas/blob/release%2F2.0/README_cn.md#resnet%E5%8F%8A%E5%85%B6vd%E7%B3%BB%E5%88%97.


cd PaddleOCR/
# Download the pre-trained model of MobileNetV3
wget -P ./pretrain_models/ https://paddleocr.bj.bcebos.com/pretrained/MobileNetV3_large_x0_5_pretrained.pdparams
# or, download the pre-trained model of ResNet18_vd
wget -P ./pretrain_models/ https://paddleocr.bj.bcebos.com/pretrained/ResNet18_vd_pretrained.pdparams
# or, download the pre-trained model of ResNet50_vd
wget -P ./pretrain_models/ https://paddleocr.bj.bcebos.com/pretrained/ResNet50_vd_ssld_pretrained.pdparams

2) Training

2.1) Start training

If you installed the CPU version, set the use_gpu parameter to false in the configuration.


python3 tools/train.py -c configs/det/det_mv3_db.yml  \
         -o Global.pretrained_model=./pretrain_models/MobileNetV3_large_x0_5_pretrained

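In the command above, -c selects the configs/det/det_mv3_db.yml configuration file for training. For a detailed explanation of the configuration file, see config.

You can also use -o to change training parameters without modifying the yml file. For example, to set the training learning rate to 0.0001: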


# single GPU training
python3 tools/train.py -c configs/det/det_mv3_db.yml -o   \
         Global.pretrained_model=./pretrain_models/MobileNetV3_large_x0_5_pretrained  \
         Optimizer.base_lr=0.0001
# multi-GPU training
# Set the GPU ID used by the '--gpus' parameter.
python3 -m paddle.distributed.launch --gpus '0,1,2,3'  tools/train.py -c configs/det/det_mv3_db.yml -o Global.pretrained_model=./pretrain_models/MobileNetV3_large_x0_5_pretrained
# multi-Node, multi-GPU training
# Set the IPs of your nodes used by the '--ips' parameter. Set the GPU ID used by the '--gpus' parameter.
python3 -m paddle.distributed.launch --ips='xx.xx.xx.xx,xx.xx.xx.xx' --gpus '0,1,2,3' tools/train.py -c configs/det/det_mv3_db.yml \
     -o Global.pretrained_model=./pretrain_models/MobileNetV3_large_x0_5_pretrained

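Note: for multi-node, multi-GPU training, replace the ips value in the command above with the addresses of your own machines, and make sure the machines can ping each other. The command must also be launched separately on each machine when training starts. Use ifconfig to look up a machine's IP address.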
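
To make the -o mechanism more concrete, here is a minimal sketch of how dotted Key.sub=value overrides could be merged into a nested configuration dictionary. It is not PaddleOCR's actual parser: apply_overrides is a hypothetical helper, and the value parsing with ast.literal_eval is an assumption chosen for the illustration.

```python
import ast


def apply_overrides(config, overrides):
    """Apply 'Section.key=value' strings to a nested dict, in place."""
    for item in overrides:
        dotted_key, raw_value = item.split("=", 1)
        *parents, leaf = dotted_key.split(".")
        node = config
        for name in parents:
            # Create intermediate sections that do not exist yet.
            node = node.setdefault(name, {})
        try:
            # Numbers, True/False, and lists become Python values.
            value = ast.literal_eval(raw_value)
        except (ValueError, SyntaxError):
            # Paths and other plain strings are kept as-is.
            value = raw_value
        node[leaf] = value
    return config


config = {"Global": {"pretrained_model": None}, "Optimizer": {"base_lr": 0.001}}
apply_overrides(config, [
    "Global.pretrained_model=./pretrain_models/MobileNetV3_large_x0_5_pretrained",
    "Optimizer.base_lr=0.0001",
])
print(config["Optimizer"]["base_lr"])
```

The point of the design is that the yml file stays untouched: every experiment can be described by the base config plus a short list of overrides on the command line.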

To further speed up training, you can use automatic mixed precision training. For single-GPU training, the command is:


python3 tools/train.py -c configs/det/det_mv3_db.yml \
     -o Global.pretrained_model=./pretrain_models/MobileNetV3_large_x0_5_pretrained \
     Global.use_amp=True Global.scale_loss=1024.0 Global.use_dynamic_loss_scaling=True

2.2) Load a trained model and continue training

To load a trained model and resume training, set the Global.checkpoints parameter to the path of the model to load.

For example:

python3 tools/train.py -c configs/det/det_mv3_db.yml -o Global.checkpoints=./your/trained/model

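Note: Global.checkpoints has higher priority than Global.pretrained_model. When both parameters are specified, the model given by Global.checkpoints is loaded first. If the path given by Global.checkpoints is wrong, the model given by Global.pretrained_model is loaded instead.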
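
The priority rule in the note above can be sketched as follows. This is only an illustration of the described behavior, not PaddleOCR's loading code: select_weights, the .pdparams suffix check, and the injectable exists argument are assumptions made for the example.

```python
import os


def select_weights(checkpoints=None, pretrained_model=None,
                   suffix=".pdparams", exists=os.path.exists):
    """Return (source, path) for the weights that would be loaded.

    Global.checkpoints wins when its file exists; otherwise the sketch
    falls back to Global.pretrained_model. Returns (None, None) if
    neither path points at an existing parameter file.
    """
    candidates = (("checkpoints", checkpoints),
                  ("pretrained_model", pretrained_model))
    for source, path in candidates:
        if path and exists(path + suffix):
            return source, path
    return None, None


if __name__ == "__main__":
    print(select_weights("./your/trained/model",
                         "./pretrain_models/MobileNetV3_large_x0_5_pretrained"))
```

Passing exists as a parameter keeps the rule easy to check without touching the file system.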

2.3) Training with a new backbone

The network part builds the model. PaddleOCR divides the network into four parts, all under ppocr/modeling. Data entering the network passes through these four parts in order (transforms->backbones->necks->heads).


├── architectures # Code for building network
├── transforms    # Image Transformation Module
├── backbones     # Feature extraction module
├── necks         # Feature enhancement module
└── heads         # Output module

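If the backbone you want to switch to is already implemented in PaddleOCR, you can simply modify the parameters in the Backbone section of the yml configuration file.

To use a new backbone instead, follow these steps:

Create a new file under the ppocr/modeling/backbones folder, for example my_backbone.py.
Add code to my_backbone.py; sample code follows:


import paddle
import paddle.nn as nn
import paddle.nn.functional as F
class MyBackbone(nn.Layer):
    def __init__(self, in_channels=3, **kwargs):
        super(MyBackbone, self).__init__()
        # your init code; this single convolution is only a placeholder
        self.conv = nn.Conv2D(in_channels, 64, kernel_size=3, padding=1)
        # later modules (necks/heads) read the backbone's output channels
        self.out_channels = 64
    def forward(self, inputs):
        # your network forward
        y = self.conv(inputs)
        return y

Import the new module in ppocr/modeling/backbones/__init__.py.

Once the four parts of the network are in place, you only need to set them in the configuration file to use them, for example: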


  Backbone:
    name: MyBackbone
    args1: args1

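Note: more details about replacing the backbone and other modules can be found in doc.

2.4) Mixed precision training

To further speed up training, you can use Auto Mixed Precision Training. Taking a single machine with a single GPU as an example, the command is: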


python3 tools/train.py -c configs/det/det_mv3_db.yml \
     -o Global.pretrained_model=./pretrain_models/MobileNetV3_large_x0_5_pretrained \
     Global.use_amp=True Global.scale_loss=1024.0 Global.use_dynamic_loss_scaling=True
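
The scale_loss and use_dynamic_loss_scaling options in the command above relate to loss scaling, the usual companion of mixed precision training. The sketch below shows the general idea of dynamic loss scaling in plain Python; it is not Paddle's implementation, and the class name and the growth_factor, backoff_factor, and growth_interval defaults are illustrative assumptions.

```python
import math


class DynamicLossScaler:
    """Toy dynamic loss scaler; all defaults are illustrative assumptions."""

    def __init__(self, init_scale=1024.0, growth_factor=2.0,
                 backoff_factor=0.5, growth_interval=1000):
        self.scale = init_scale
        self.growth_factor = growth_factor
        self.backoff_factor = backoff_factor
        self.growth_interval = growth_interval
        self._good_steps = 0

    def update(self, grads):
        """Inspect scaled gradients; return True if the step may be applied."""
        overflow = any(math.isinf(g) or math.isnan(g) for g in grads)
        if overflow:
            # Overflow: shrink the scale and skip this optimizer step.
            self.scale *= self.backoff_factor
            self._good_steps = 0
            return False
        self._good_steps += 1
        if self._good_steps >= self.growth_interval:
            # A long run of stable steps: try a larger scale again.
            self.scale *= self.growth_factor
            self._good_steps = 0
        return True


if __name__ == "__main__":
    scaler = DynamicLossScaler(init_scale=1024.0, growth_interval=2)
    for grads in ([0.1, 0.2], [0.3, 0.1], [float("inf"), 0.2]):
        print(scaler.update(grads), scaler.scale)
```

The loss is multiplied by the current scale before the backward pass so that small half-precision gradients do not underflow, and the gradients are divided by the same scale before the weights are updated.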

2.5) Distributed training

For multi-machine, multi-GPU training, use the --ips parameter to set the IP addresses of the machines and the --gpus parameter to set the GPU IDs:


python3 -m paddle.distributed.launch --ips='xx.xx.xx.xx,xx.xx.xx.xx' --gpus '0,1,2,3' tools/train.py -c configs/det/det_mv3_db.yml \
     -o Global.pretrained_model=./pretrain_models/MobileNetV3_large_x0_5_pretrained

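Note: for multi-machine, multi-GPU training, replace the ips value in the command above with the addresses of your own machines; the machines must be able to ping each other. Training also has to be started separately on each machine. Use ifconfig to look up a machine's IP address.

2.6) Knowledge distillation training

PaddleOCR supports knowledge distillation for the text detection training process. See the documentation for details.

2.7) Training on other platforms (Windows/macOS/Linux DCU)

Windows GPU/CPU: Windows differs slightly from Linux. It supports only single-GPU training and inference; specify the GPU for training with set CUDA_VISIBLE_DEVICES=0. On Windows, DataLoader supports only single-process mode, so num_workers must be set to 0.
macOS: GPU mode is not supported, so use_gpu must be set to False in the configuration file; the remaining training, evaluation, and prediction commands are exactly the same as on Linux GPU.
Linux DCU: running on DCU devices requires setting the environment variable export HIP_VISIBLE_DEVICES=0,1,2,3; the remaining training, evaluation, and prediction commands are exactly the same as on Linux GPU.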

3) Evaluation and testing

3.1) Evaluation

PaddleOCR computes three metrics to evaluate OCR detection performance: Precision, Recall, and Hmean (F-Score).
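
For reference, the three metrics relate to each other as in the sketch below. The function name and the count-based inputs are assumptions for illustration; how PaddleOCR matches detected boxes to ground truth boxes is not covered here.

```python
def detection_metrics(num_matched, num_detected, num_ground_truth):
    """Compute precision, recall, and Hmean from box counts.

    num_matched: detected boxes that match a ground truth box
    num_detected: all boxes returned by the detector
    num_ground_truth: all annotated boxes
    """
    precision = num_matched / num_detected if num_detected else 0.0
    recall = num_matched / num_ground_truth if num_ground_truth else 0.0
    total = precision + recall
    # Hmean is the harmonic mean of precision and recall (the F-score).
    hmean = 2 * precision * recall / total if total else 0.0
    return precision, recall, hmean


if __name__ == "__main__":
    print(detection_metrics(num_matched=6, num_detected=8, num_ground_truth=10))
```

Because Hmean is a harmonic mean, it stays low whenever either precision or recall is low, which is why it is commonly used as the single summary number for detection.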

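Run the following command to compute the evaluation metrics. The results are saved to the test result file specified by save_res_path in the configuration file configs/det/det_mv3_db.yml.

During evaluation, the post-processing parameters are set to box_thresh=0.6 and unclip_ratio=1.5. If you train with a different dataset or a different model, tune these two parameters for better results.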
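
To give a feel for what these two parameters do, the sketch below follows the expansion rule from the DB paper, where a shrunk text region is grown outward by a distance D = A * r / L (A is the polygon area, L its perimeter, r the unclip ratio), and boxes with a low mean score are dropped. The function names are hypothetical and the real post-processing involves more steps than shown.

```python
import math


def polygon_area(points):
    """Shoelace formula for a simple polygon given as [(x, y), ...]."""
    n = len(points)
    twice_area = 0.0
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0


def polygon_perimeter(points):
    """Sum of the edge lengths of the polygon."""
    n = len(points)
    return sum(
        math.hypot(points[(i + 1) % n][0] - points[i][0],
                   points[(i + 1) % n][1] - points[i][1])
        for i in range(n)
    )


def unclip_distance(points, unclip_ratio=1.5):
    """Offset distance D = A * r / L used to expand a detected region."""
    return polygon_area(points) * unclip_ratio / polygon_perimeter(points)


def keep_box(mean_score, box_thresh=0.6):
    """Keep a candidate box only if its mean score reaches box_thresh."""
    return mean_score >= box_thresh


if __name__ == "__main__":
    box = [(0, 0), (100, 0), (100, 20), (0, 20)]
    print(unclip_distance(box, unclip_ratio=1.5))
    print(keep_box(0.55), keep_box(0.75))
```

A larger unclip_ratio produces looser boxes around the text, while a larger box_thresh discards more low-confidence regions.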

During training, model parameters are saved under the Global.save_model_dir directory by default. When evaluating, set Global.checkpoints to point to the saved parameter file.

python3 tools/eval.py -c configs/det/det_mv3_db.yml  -o Global.checkpoints='{path/to/weights}/best_accuracy' PostProcess.box_thresh=0.6 PostProcess.unclip_ratio=1.5

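Note: box_thresh and unclip_ratio are parameters required by DB post-processing; they do not need to be set when evaluating EAST and SAST models.

3.2) Testing

Test the detection result on a single image: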

python3 tools/infer_det.py -c configs/det/det_mv3_db.yml -o Global.infer_img='./doc/imgs_en/img_10.jpg' Global.pretrained_model='./output/det_db/best_accuracy'

When testing the DB model, adjust the post-processing thresholds:

python3 tools/infer_det.py -c configs/det/det_mv3_db.yml -o Global.infer_img='./doc/imgs_en/img_10.jpg' Global.pretrained_model='./output/det_db/best_accuracy'  PostProcess.box_thresh=0.6 PostProcess.unclip_ratio=2.0

Test the detection results on all images in a folder:

python3 tools/infer_det.py -c configs/det/det_mv3_db.yml -o Global.infer_img='./doc/imgs_en/' Global.pretrained_model='./output/det_db/best_accuracy'

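4) Inference

An inference model (a model saved with paddle.jit.save) is generally the frozen model saved after training is complete, and is mostly used for prediction in deployment.

The model saved during training is a checkpoints model, which stores the model parameters and is mostly used to resume training.

Compared with a checkpoints model, an inference model additionally stores the model's structure. Because both the structure and the parameters are frozen into the inference model file, it is easier to deploy and well suited to integration with real systems.

First, convert the trained DB model into an inference model: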

python3 tools/export_model.py -c configs/det/det_mv3_db.yml -o Global.pretrained_model='./output/det_db/best_accuracy' Global.save_inference_dir='./output/det_db_inference/'

Predict with the detection inference model:

python3 tools/infer/predict_det.py --det_algorithm='DB' --det_model_dir='./output/det_db_inference/' --image_dir='./doc/imgs/' --use_gpu=True

For other detection algorithms, such as EAST, change the det_algorithm parameter to EAST (the default is the DB algorithm):

python3 tools/infer/predict_det.py --det_algorithm='EAST' --det_model_dir='./output/det_db_inference/' --image_dir='./doc/imgs/' --use_gpu=True

3.2 Text recognition

1) Data preparation

1.1) Dataset preparation

To prepare the dataset, see ocr_datasets.

PaddleOCR provides label files for training on the icdar2015 dataset, which can be downloaded as follows:


# Training set label
wget -P ./train_data/ic15_data  https://paddleocr.bj.bcebos.com/dataset/rec_gt_train.txt
# Test Set Label
wget -P ./train_data/ic15_data  https://paddleocr.bj.bcebos.com/dataset/rec_gt_test.txt

PaddleOCR also provides a data format conversion script that converts labels from the official ICDAR website into the data format PaddleOCR supports. The conversion tool is at ppocr/utils/gen_label.py; taking the training set as an example:


# convert the official gt to rec_gt_label.txt
python gen_label.py --mode='rec' --input_path='{path/of/origin/label}' --output_label='rec_gt_label.txt'

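The data format is as follows: (a) is the original image and (b) is the Ground Truth text file corresponding to each image:

Multi-language datasets

Multi-language models are trained the same way as Chinese models. The training dataset consists of 1,000,000 (100w) synthetic samples. A small set of fonts and test data can be downloaded in either of the following two ways:

Baidu Netdisk
Google Drive

1.2) Dictionary

Finally, a dictionary ({word_dict_name}.txt) must be provided so that, during model training, every character that appears can be mapped to a dictionary index.

The dictionary therefore needs to contain all the characters you want recognized correctly. {word_dict_name}.txt must be written in the following format and saved with utf-8 encoding: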


l
d
a
d
r
n

In word_dict.txt, each line holds a single character, which maps characters to numeric indices; for example, "and" is mapped to [2 5 1].
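
The mapping in that example can be reproduced with a few lines of Python. This is a minimal sketch with hypothetical helper names: it uses the index of the first occurrence of each character (the sample dictionary lists "d" twice), and it ignores any extra indices a real label encoder may reserve, for example for a CTC blank.

```python
def build_char_index(dict_lines):
    """Map each character to the index of its first occurrence."""
    char_to_index = {}
    for index, line in enumerate(dict_lines):
        char = line.rstrip("\n")
        # setdefault keeps the first index when a character repeats.
        char_to_index.setdefault(char, index)
    return char_to_index


def encode(text, char_to_index):
    """Convert a label string into a list of dictionary indices."""
    return [char_to_index[ch] for ch in text]


# The six dictionary lines shown above, one character per line.
dict_lines = ["l", "d", "a", "d", "r", "n"]
char_to_index = build_char_index(dict_lines)
print(encode("and", char_to_index))  # [2, 5, 1]
```

A character that is missing from the dictionary raises a KeyError here, which mirrors the requirement that the dictionary contain every character you expect to recognize.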

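PaddleOCR has built-in dictionaries that can be used as needed.

ppocr/utils/ppocr_keys_v1.txt is a Chinese dictionary with 6623 characters.

ppocr/utils/ic15_dict.txt is an English dictionary with 63 characters.

ppocr/utils/dict/french_dict.txt is a French dictionary with 118 characters.

ppocr/utils/dict/japan_dict.txt is a Japanese dictionary with 4399 characters.

ppocr/utils/dict/korean_dict.txt is a Korean dictionary with 3636 characters.

ppocr/utils/dict/german_dict.txt is a German dictionary with 131 characters.

ppocr/utils/en_dict.txt is an English dictionary with 96 characters.

The multi-language models are currently still at the demo stage; the models will continue to be optimized and more languages added. Dictionaries and fonts for other languages are very welcome: if you like, you can submit dictionary files to dict, and we will thank you in the repo.

To use a custom dict file, modify the character_dict_path field in configs/rec/rec_icdar15_train.yml.

Custom dictionary

If you need a custom dict file, add a character_dict_path field to configs/rec/rec_icdar15_train.yml that points to your dictionary path, and set character_type to ch.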

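1.3) Add the space category

To support recognition of the space character, set the use_space_char field in the yml file to True.

1.4) Data augmentation

PaddleOCR provides a variety of data augmentation methods. All augmentation methods are enabled by default.

The default perturbations are: cvtColor, blur, jitter, Gaussian noise, random crop, perspective, color inversion, and TIA augmentation.

During training, each perturbation is selected with a probability of 40%. For the implementation, see rec_img_aug.py.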
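
The "each perturbation with a fixed probability" scheme described above can be sketched as follows. This is not the code in rec_img_aug.py: apply_augmentations and the two toy perturbations, which act on a flat list of pixel values, are stand-ins invented for the illustration.

```python
import random


def apply_augmentations(sample, augmentations, prob=0.4, rng=None):
    """Apply each augmentation independently with probability `prob`.

    Returns the augmented sample and the names of the augmentations used.
    """
    rng = rng or random.Random()
    applied = []
    for name, fn in augmentations:
        if rng.random() < prob:
            sample = fn(sample)
            applied.append(name)
    return sample, applied


# Toy perturbations on a list of 0-255 pixel values (illustrative only).
AUGMENTATIONS = [
    ("color_inversion", lambda pixels: [255 - v for v in pixels]),
    ("jitter", lambda pixels: [min(255, v + 1) for v in pixels]),
]

if __name__ == "__main__":
    pixels = [0, 10, 200]
    print(apply_augmentations(pixels, AUGMENTATIONS, prob=0.4,
                              rng=random.Random(0)))
```

Because every perturbation is drawn independently, a single training image may receive several perturbations at once or none at all.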

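2) Training

PaddleOCR provides training, evaluation, and prediction scripts. This section uses the CRNN recognition model as an example.

2.1) Start training

First download the pre-trained model. You can download a trained model and fine-tune it on the icdar2015 data: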


cd PaddleOCR/
# Download the pre-trained model of en_PP-OCRv3
wget -P ./pretrain_models/ https://paddleocr.bj.bcebos.com/PP-OCRv3/english/en_PP-OCRv3_rec_train.tar
# Decompress model parameters
cd pretrain_models
tar -xf en_PP-OCRv3_rec_train.tar && rm -rf en_PP-OCRv3_rec_train.tar

Start training:


# GPU training Support single card and multi-card training
# Training icdar15 English data and The training log will be automatically saved as train.log under '{save_model_dir}'
#specify the single card training(Long training time, not recommended)
python3 tools/train.py -c configs/rec/PP-OCRv3/en_PP-OCRv3_rec.yml -o Global.pretrained_model=en_PP-OCRv3_rec_train/best_accuracy
#specify the card number through --gpus
python3 -m paddle.distributed.launch --gpus '0,1,2,3'  tools/train.py -c configs/rec/PP-OCRv3/en_PP-OCRv3_rec.yml -o Global.pretrained_model=en_PP-OCRv3_rec_train/best_accuracy

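PaddleOCR supports alternating training and evaluation. You can modify eval_batch_step in configs/rec/rec_icdar15_train.yml to set the evaluation frequency. By default, evaluation runs every 500 iterations, and the model with the best acc is saved under output/rec_CRNN/best_accuracy during evaluation.

If the evaluation set is large, testing will be time-consuming. Consider evaluating less often, or evaluating after training.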
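
The alternating train and evaluate loop can be pictured with the sketch below. It treats eval_batch_step as a single interval and takes a caller-supplied evaluate function; both are simplifying assumptions, and no real training or checkpoint saving happens here.

```python
def train_with_periodic_eval(num_iters, eval_batch_step, evaluate):
    """Run `num_iters` steps, evaluating every `eval_batch_step` steps.

    Returns (best_iter, best_acc) for the best evaluation seen.
    """
    best_acc, best_iter = float("-inf"), None
    for it in range(1, num_iters + 1):
        # ... one training step would run here ...
        if it % eval_batch_step == 0:
            acc = evaluate(it)
            if acc > best_acc:
                # A real trainer would save the best_accuracy weights here.
                best_acc, best_iter = acc, it
    return best_iter, best_acc


if __name__ == "__main__":
    # Made-up accuracies keyed by iteration, for demonstration only.
    fake_acc = {500: 0.70, 1000: 0.90, 1500: 0.80}
    print(train_with_periodic_eval(1500, 500, fake_acc.__getitem__))
```

A larger interval means fewer evaluation passes, which is the trade-off suggested above for large evaluation sets.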

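Tip: you can use the -c parameter to select any of the model configurations under configs/rec/ for training. The recognition algorithms supported are listed in rec_algorithm.

For training on Chinese data, ch_PP-OCRv3_rec_distillation.yml is recommended. To try other algorithms on a Chinese dataset, modify the configuration file as described below.

Take ch_PP-OCRv3_rec_distillation.yml as an example: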


Global:
  ...
  # Add a custom dictionary, such as modify the dictionary, please point the path to the new dictionary
  character_dict_path: ppocr/utils/ppocr_keys_v1.txt
  # Modify character type
  ...
  # Whether to recognize spaces
  use_space_char: True
Optimizer:
  ...
  # Add learning rate decay strategy
  lr:
    name: Cosine
    learning_rate: 0.001
  ...
...
Train:
  dataset:
    # Type of dataset，we support LMDBDataSet and SimpleDataSet
    name: SimpleDataSet
    # Path of dataset
    data_dir: ./train_data/
    # Path of train list
    label_file_list: ['./train_data/train_list.txt']
    transforms:
      ...
      - RecResizeImg:
          # Modify image_shape to fit long text
          image_shape: [3, 48, 320]
      ...
  loader:
    ...
    # Train batch_size for Single card
    batch_size_per_card: 256
    ...
Eval:
  dataset:
    # Type of dataset，we support LMDBDataSet and SimpleDataSet
    name: SimpleDataSet
    # Path of dataset
    data_dir: ./train_data
    # Path of eval list
    label_file_list: ['./train_data/val_list.txt']
    transforms:
      ...
      - RecResizeImg:
          # Modify image_shape to fit long text
          image_shape: [3, 48, 320]
      ...
  loader:
    # Eval batch_size for Single card
    batch_size_per_card: 256
    ...

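Note that the configuration file used for prediction/evaluation must be consistent with the one used for training.

2.2) Load a trained model and continue training

To load a trained model and resume training, set the Global.checkpoints parameter to the path of the model to load.

For example: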

python3 tools/train.py -c configs/rec/rec_icdar15_train.yml -o Global.checkpoints=./your/trained/model

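2.3) Training with a new backbone

If the backbone you want to switch to is already implemented in PaddleOCR, you can simply modify the parameters in the Backbone section of the yml configuration file.

To use a new backbone instead, follow these steps:

Create a new file under the ppocr/modeling/backbones folder, for example my_backbone.py.
Add code to my_backbone.py; sample code follows:


import paddle
import paddle.nn as nn
import paddle.nn.functional as F
class MyBackbone(nn.Layer):
    def __init__(self, in_channels=3, **kwargs):
        super(MyBackbone, self).__init__()
        # your init code; this single convolution is only a placeholder
        self.conv = nn.Conv2D(in_channels, 64, kernel_size=3, padding=1)
        # later modules (necks/heads) read the backbone's output channels
        self.out_channels = 64
    def forward(self, inputs):
        # your network forward
        y = self.conv(inputs)
        return y

Import the new module in ppocr/modeling/backbones/__init__.py.

Once the four parts of the network are in place, you only need to set them in the configuration file to use them, for example: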


  Backbone:
    name: MyBackbone
    args1: args1

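Note: more details about replacing the backbone and other modules can be found in doc.

2.4) Mixed precision training

To further speed up training, you can use Auto Mixed Precision Training. Taking a single machine with a single GPU as an example, the command is: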


python3 tools/train.py -c configs/rec/rec_icdar15_train.yml \
     -o Global.pretrained_model=./pretrain_models/rec_mv3_none_bilstm_ctc_v2.0_train \
     Global.use_amp=True Global.scale_loss=1024.0 Global.use_dynamic_loss_scaling=True

2.5) Distributed training

For multi-machine, multi-GPU training, use the --ips parameter to set the IP addresses of the machines and the --gpus parameter to set the GPU IDs:


python3 -m paddle.distributed.launch --ips='xx.xx.xx.xx,xx.xx.xx.xx' --gpus '0,1,2,3' tools/train.py -c configs/rec/rec_icdar15_train.yml \
     -o Global.pretrained_model=./pretrain_models/rec_mv3_none_bilstm_ctc_v2.0_train

2.6) Knowledge distillation training

PaddleOCR supports knowledge distillation for the text recognition training process. See the documentation for details.

2.7) Multi-language training

PaddleOCR currently supports the following multi-language algorithms:

Configuration file	Algorithm	Backbone	Transform	Sequence	Prediction	Language
rec_chinese_cht_lite_train.yml	CRNN	Mobilenet_v3 small 0.5	None	BiLSTM	CTC	Traditional Chinese
rec_en_lite_train.yml	CRNN	Mobilenet_v3 small 0.5	None	BiLSTM	CTC	English (case sensitive)
rec_french_lite_train.yml	CRNN	Mobilenet_v3 small 0.5	None	BiLSTM	CTC	French
rec_ger_lite_train.yml	CRNN	Mobilenet_v3 small 0.5	None	BiLSTM	CTC	German
rec_japan_lite_train.yml	CRNN	Mobilenet_v3 small 0.5	None	BiLSTM	CTC	Japanese
rec_korean_lite_train.yml	CRNN	Mobilenet_v3 small 0.5	None	BiLSTM	CTC	Korean
rec_latin_lite_train.yml	CRNN	Mobilenet_v3 small 0.5	None	BiLSTM	CTC	Latin
rec_arabic_lite_train.yml	CRNN	Mobilenet_v3 small 0.5	None	BiLSTM	CTC	Arabic
rec_cyrillic_lite_train.yml	CRNN	Mobilenet_v3 small 0.5	None	BiLSTM	CTC	Cyrillic
rec_devanagari_lite_train.yml	CRNN	Mobilenet_v3 small 0.5	None	BiLSTM	CTC	Devanagari
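
All of the configurations in this table use a CTC prediction head. As a rough illustration of how a CTC output is turned into text, the sketch below implements greedy CTC decoding: repeated indices are collapsed and the blank index is dropped. The function name, the choice of index 0 as the blank, and the tiny character set are assumptions for the example, not PaddleOCR's decoder.

```python
def ctc_greedy_decode(frame_indices, charset, blank=0):
    """Collapse repeated indices, drop blanks, and map indices to characters.

    frame_indices: best class index for each time step
    charset: characters for indices 1..len(charset); index 0 is the blank
    """
    decoded = []
    previous = blank
    for index in frame_indices:
        # Emit a character only when the index changes and is not blank.
        if index != previous and index != blank:
            decoded.append(charset[index - 1])
        previous = index
    return "".join(decoded)


if __name__ == "__main__":
    charset = ["j", "o", "i", "n", "t"]
    frames = [1, 1, 0, 2, 3, 3, 0, 4, 5, 5]
    print(ctc_greedy_decode(frames, charset))  # joint
```

The blank is what allows doubled letters: the same index on both sides of a blank is decoded as two characters, while back-to-back repeats collapse into one.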

For more supported languages, see: Multi-language models

To fine-tune on top of an existing model, modify the configuration file as described below.

Take rec_french_lite_train as an example:


Global:
  ...
  # Add a custom dictionary, such as modify the dictionary, please point the path to the new dictionary
  character_dict_path: ./ppocr/utils/dict/french_dict.txt
  ...
  # Whether to recognize spaces
  use_space_char: True
...
Train:
  dataset:
    # Type of dataset，we support LMDBDataSet and SimpleDataSet
    name: SimpleDataSet
    # Path of dataset
    data_dir: ./train_data/
    # Path of train list
    label_file_list: ['./train_data/french_train.txt']
    ...
Eval:
  dataset:
    # Type of dataset，we support LMDBDataSet and SimpleDataSet
    name: SimpleDataSet
    # Path of dataset
    data_dir: ./train_data
    # Path of eval list
    label_file_list: ['./train_data/french_val.txt']
    ...

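2.8) Training on other platforms (Windows/macOS/Linux DCU)

Windows GPU/CPU: Windows differs slightly from Linux. It supports only single-GPU training and inference; specify the GPU for training with set CUDA_VISIBLE_DEVICES=0. On Windows, DataLoader supports only single-process mode, so num_workers must be set to 0.
macOS: GPU mode is not supported, so use_gpu must be set to False in the configuration file; the remaining training, evaluation, and prediction commands are exactly the same as on Linux GPU.
Linux DCU: running on DCU devices requires setting the environment variable export HIP_VISIBLE_DEVICES=0,1,2,3; the remaining training, evaluation, and prediction commands are exactly the same as on Linux GPU.

3) Evaluation and testing

3.1) Evaluation

During training, model parameters are saved under the Global.save_model_dir directory by default. When evaluating, set Global.checkpoints to point to the saved parameter file. You can set the evaluation dataset by modifying the Eval.dataset.label_file_list field in configs/rec/PP-OCRv3/en_PP-OCRv3_rec.yml.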


# GPU evaluation, Global.checkpoints is the weight to be tested
python3 -m paddle.distributed.launch --gpus '0' tools/eval.py -c configs/rec/PP-OCRv3/en_PP-OCRv3_rec.yml -o Global.checkpoints={path/to/weights}/best_accuracy

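3.2) Testing

A model trained with PaddleOCR can produce predictions quickly with the script below.

The image to predict is set through infer_img, and the trained weights are specified with -o (the commands in this section pass Global.pretrained_model).

Depending on the save_model_dir and save_epoch_step fields set in the configuration file, the following parameters are saved: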


output/rec/
├── best_accuracy.pdopt  
├── best_accuracy.pdparams  
├── best_accuracy.states  
├── config.yml  
├── iter_epoch_3.pdopt  
├── iter_epoch_3.pdparams  
├── iter_epoch_3.states  
├── latest.pdopt  
├── latest.pdparams  
├── latest.states  
└── train.log

Here, best_accuracy.* is the best model on the evaluation set; iter_epoch_x.* are the models saved at intervals of save_epoch_step; latest.* is the model from the last epoch.


# Predict English results
python3 tools/infer_rec.py -c configs/rec/PP-OCRv3/en_PP-OCRv3_rec.yml -o Global.pretrained_model={path/to/weights}/best_accuracy  Global.infer_img=doc/imgs_words/en/word_1.png

Input image:

Prediction result for the input image:


infer_img: doc/imgs_words/en/word_1.png
        result: ('joint', 0.9998967)

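The configuration file used for prediction must be consistent with the one used for training. For example, if you trained the Chinese model with python3 tools/train.py -c configs/rec/ch_ppocr_v2.0/rec_chinese_lite_train_v2.0.yml, you can predict with the Chinese model using the following command: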


# Predict Chinese results
python3 tools/infer_rec.py -c configs/rec/ch_ppocr_v2.0/rec_chinese_lite_train_v2.0.yml -o Global.pretrained_model={path/to/weights}/best_accuracy Global.infer_img=doc/imgs_words/ch/word_1.jpg

Input image:

Prediction result for the input image:


infer_img: doc/imgs_words/ch/word_1.jpg
        result: ('韓國小館', 0.997218)

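4) Inference

An inference model (a model saved with paddle.jit.save) is generally the frozen model saved after training is complete, and is mostly used for prediction in deployment.

The model saved during training is a checkpoints model, which stores the model parameters and is mostly used to resume training.

A recognition model is converted into an inference model the same way as a detection model: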


# -c Set the training algorithm yml configuration file
# -o Set optional parameters
# Global.pretrained_model parameter Set the training model address to be converted without adding the file suffix .pdmodel, .pdopt or .pdparams.
# Global.save_inference_dir Set the address where the converted model will be saved.
python3 tools/export_model.py -c configs/rec/PP-OCRv3/en_PP-OCRv3_rec.yml -o Global.pretrained_model=en_PP-OCRv3_rec_train/best_accuracy  Global.save_inference_dir=./inference/en_PP-OCRv3_rec/

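If your model was trained on your own dataset with a different dictionary file, make sure to change character_dict_path in the configuration file to your dictionary file path.

After a successful conversion, the model directory contains three files: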


inference/en_PP-OCRv3_rec/
    ├── inference.pdiparams         # The parameter file of recognition inference model
    ├── inference.pdiparams.info    # The parameter information of recognition inference model, which can be ignored
    └── inference.pdmodel           # The program file of recognition model

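Text recognition model inference with a custom character dictionary

If you modified the text dictionary during training, specify the dictionary path with --rec_char_dict_path when predicting with the inference model: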
```
python3 tools/infer/predict_rec.py --image_dir='./doc/imgs_words_en/word_336.png' --rec_model_dir='./your inference model' --rec_image_shape='3, 32, 100' --rec_char_dict_path='your text dict path'
```

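3.3 Text direction classification

1) Method overview

Angle classification is used when the image is not at 0 degrees. In that case, the text lines detected in the image need to be corrected. In the PaddleOCR system, the text line images obtained from text detection are passed to the recognition model after an affine transformation. At that point only 0-degree versus 180-degree classification of the text is needed, so the built-in PaddleOCR text angle classifier supports only 0 and 180 degrees. To support more angles, you can modify the algorithm yourself.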
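
The correction step that follows a 0/180 classification can be sketched in plain Python. Here an image is just a list of pixel rows, and correct_orientation with its threshold of 0.9 is a hypothetical helper; how PaddleOCR applies the classifier output internally is not shown.

```python
def rotate_180(image):
    """Rotate a 2-D image (a list of rows) by 180 degrees."""
    return [row[::-1] for row in image[::-1]]


def correct_orientation(image, label, score, threshold=0.9):
    """Flip the image upright when the classifier confidently says '180'."""
    if label == "180" and score >= threshold:
        return rotate_180(image)
    return image


if __name__ == "__main__":
    upside_down = [[1, 2], [3, 4]]
    print(correct_orientation(upside_down, "180", 0.99))  # [[4, 3], [2, 1]]
```

Rotating by 180 degrees only reverses the row order and the pixel order within each row, which is why a two-class classifier is enough after the affine transformation has already straightened the text line.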

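Examples of 0-degree and 180-degree samples:

2) Data preparation

Organize the dataset as follows:

The default storage path for training data is PaddleOCR/train_data/cls. If you already have a dataset on disk, just create a soft link to the dataset directory:

ln -sf <path/to/dataset> <path/to/paddle_ocr>/train_data/cls/dataset

Refer to the following to organize your data.

Training set

First put the training images in the same folder (train_images) and use a txt file (cls_gt_train.txt) to store the image paths and labels.

Note: by default, the image path and the image label are separated with \t; using any other separator will cause training errors.

0 and 180 indicate that the image angle is 0 degrees and 180 degrees, respectively.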


' Image file name           Image annotation '
train/word_001.jpg   0
train/word_002.jpg   180
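
Since a wrong separator leads to training errors, it can help to validate the label file before training. The sketch below parses one line of a label file in the tab-separated layout described above; parse_cls_label_line is a hypothetical helper written for this illustration.

```python
def parse_cls_label_line(line):
    """Split '<image path>\\t<label>' and validate the angle label."""
    parts = line.rstrip("\n").split("\t")
    if len(parts) != 2:
        raise ValueError(f"expected '<image path>\\t<label>', got {line!r}")
    image_path, label = parts
    if label not in {"0", "180"}:
        raise ValueError(f"label must be 0 or 180, got {label!r}")
    return image_path, int(label)


if __name__ == "__main__":
    print(parse_cls_label_line("train/word_001.jpg\t0\n"))
    print(parse_cls_label_line("train/word_002.jpg\t180\n"))
```

Running every line of cls_gt_train.txt through a check like this surfaces space-separated or mislabeled entries early.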

The final training set should have the following file structure:


|-train_data
    |-cls
        |- cls_gt_train.txt
        |- train
            |- word_001.jpg
            |- word_002.jpg
            |- word_003.jpg
            | ...

Test set

Similar to the training set, the test set also needs a folder containing all the images (test) and a cls_gt_test.txt. The structure of the test set is as follows:


|-train_data
    |-cls
        |- cls_gt_test.txt
        |- test
            |- word_001.jpg
            |- word_002.jpg
            |- word_003.jpg
            | ...

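3) Training

Write the prepared txt file and image folder paths into the Train/Eval.dataset.label_file_list and Train/Eval.dataset.data_dir fields of the configuration file. The absolute path of an image is formed from the Train/Eval.dataset.data_dir field and the image name recorded in the txt file.

PaddleOCR provides training, evaluation, and prediction scripts.

Start training: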


# Set PYTHONPATH path
export PYTHONPATH=$PYTHONPATH:.
# GPU training Support single card and multi-card training, specify the card number through --gpus.
# Start training, the following command has been written into the train.sh file, just modify the configuration file path in the file
python3 -m paddle.distributed.launch --gpus '0,1,2,3,4,5,6,7'  tools/train.py -c configs/cls/cls_mv3.yml

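Data augmentation

PaddleOCR provides a variety of data augmentation methods. To add perturbations during training, uncomment the RecAug and RandAugment fields under Train.dataset.transforms in the configuration file.

The default perturbations are: cvtColor, blur, jitter, Gaussian noise, random crop, perspective, color inversion, and RandAugment.

Except for RandAugment, each perturbation is selected with a probability of 50% during training. For the implementation, see rec_img_aug.py and randaugment.py.

Training

PaddleOCR supports alternating training and evaluation. You can modify eval_batch_step in configs/cls/cls_mv3.yml to set the evaluation frequency. By default, evaluation runs every 1000 iterations. The following is saved during training: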


├── best_accuracy.pdopt # Optimizer parameters for the best model
├── best_accuracy.pdparams # Parameters of the best model
├── best_accuracy.states # Metric info and epochs of the best model
├── config.yml # Configuration file for this experiment
├── latest.pdopt # Optimizer parameters for the latest model
├── latest.pdparams # Parameters of the latest model
├── latest.states # Metric info and epochs of the latest model
└── train.log # Training log

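If the evaluation set is large, testing will be time-consuming. Consider evaluating less often, or evaluating after training.

Note that the configuration file used for prediction/evaluation must be consistent with the one used for training.

4) Evaluation

You can set the evaluation dataset by modifying the Eval.dataset.label_file_list field in configs/cls/cls_mv3.yml.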


export CUDA_VISIBLE_DEVICES=0
# GPU evaluation, Global.checkpoints is the weight to be tested
python3 tools/eval.py -c configs/cls/cls_mv3.yml -o Global.checkpoints={path/to/weights}/best_accuracy

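5) Prediction

Prediction with the training engine

A model trained with PaddleOCR can produce predictions quickly with the script below.

Use Global.infer_img to specify the path of the image or folder to predict; the weights are specified with -o (the command below passes Global.pretrained_model):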


# Predict English results
python3 tools/infer_cls.py -c configs/cls/cls_mv3.yml -o Global.pretrained_model={path/to/weights}/best_accuracy Global.load_static_weights=false Global.infer_img=doc/imgs_words_en/word_10.png

Input image:

Prediction result for the input image:


infer_img: doc/imgs_words_en/word_10.png
     result: ('0', 0.9999995)

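II. Installation and usage

1. Installation

My environment is ubuntu18.04, python 3.7, and pip 22.1.2. Python must be at least version 3, and a recent pip is recommended; otherwise the installation produces many errors. The command for upgrading pip is included below: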


sudo apt install python3.7 python3.7-dev  # install the Python environment and its dependencies first
sudo apt install python3-pip  # install pip3
sudo pip3 install --upgrade pip  # upgrade pip
pip3 install -i https://mirror.baidu.com/pypi/simple cmake  # CMake is an open-source, cross-platform family of tools for building, testing, and packaging software
pip3 install -i https://mirror.baidu.com/pypi/simple paddlepaddle  # install this first; if the system has GPU hardware, paddlepaddle-gpu can be installed instead
pip3 install -i https://mirror.baidu.com/pypi/simple paddleocr==2.4  # version 2.4 is used here; the official site recommends 2.0.1 or later

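You may also hit many errors like the one below during installation, caused by outdated versions in the current environment. Installing the affected Python package separately is enough. For example, if the package is scikit-learn, lowering its version slightly fixes it; run:

pip3 install -i https://mirror.baidu.com/pypi/simple scikit-learn==1.0  # the version number follows "=="; to list the available versions, enter a nonexistent version such as 10000

Installing PaddleOCR may run into many other problems; most of them can be resolved by searching Baidu.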

2. Recognizing text in images with Python

Image to recognize:

Code:


#!/usr/bin/env python
# coding=utf-8
from PIL import Image
from paddleocr import PaddleOCR, draw_ocr
# Paddleocr supports Chinese, English, French, German, Korean and Japanese.
# You can set the parameter `lang` as `ch`, `en`, `fr`, `german`, `korean`, `japan`
# to switch the language model in order.
ocr = PaddleOCR(use_angle_cls=True, lang='ch') # need to run only once to download and load model into memory
img_path = './file/aa.png'
result = ocr.ocr(img_path, cls=True)
# with paddleocr 2.4 (installed above), each entry is [box, (text, score)];
# later releases may nest the results per image
for line in result:
    print(line)
# draw result
image = Image.open(img_path).convert('RGB')
boxes = [line[0] for line in result]
txts = [line[1][0] for line in result]
scores = [line[1][1] for line in result]
im_show = draw_ocr(image, boxes, txts, scores, font_path='./fonts/simfang.ttf')
im_show = Image.fromarray(im_show)
im_show.save('result.jpg')

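Recognition result and rendered result image:

Drawing the result image requires the related libraries to be installed and a Chinese ttf font file; the FangSong font is used here.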
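
Once the result list is available, a common next step is to keep only the confident lines. The sketch below assumes the [box, (text, score)] layout that the script above indexes with line[0], line[1][0], and line[1][1]; the sample data is made up for illustration and filter_by_score is a hypothetical helper.

```python
def filter_by_score(result, min_score=0.8):
    """Keep (text, score) pairs whose confidence is at least min_score."""
    return [(text, score) for _box, (text, score) in result if score >= min_score]


# Made-up sample in the [box, (text, score)] layout, not real OCR output.
sample_result = [
    [[[10, 10], [120, 10], [120, 40], [10, 40]], ("PaddleOCR", 0.98)],
    [[[10, 50], [90, 50], [90, 80], [10, 80]], ("blurry", 0.42)],
]

if __name__ == "__main__":
    for text, score in filter_by_score(sample_result):
        print(f"{text}\t{score:.2f}")
```

The threshold of 0.8 is arbitrary; a suitable value depends on how noisy the input images are.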
