PTQ (Post-Training Quantization) Source Code Reading, Part 1

I've recently been working on model quantization, so I studied how PTQ works and how it is implemented. Many articles already explain the theory behind PTQ well; if time permits, I'll write a separate post summarizing it. This article focuses on the code.

Before going through the code, let's look at how PTQ is used:

```python
# load model (load_model, model_path, reader and quant_model_path are placeholders)
model = load_model(model_path)
model.eval()

# register the quant hook in each layer's forward_post_hooks
ptq = PTQ()
model = ptq.quantize(model)

# calibration: run sample data through the model
for key, batch in reader:
    model(batch)

# compute quant params
ptq.convert(model)

# save quant model
jit.save(model, quant_model_path)
```

First, let's see how activation quantization information is collected.

ImperativePTQ

```python
class ImperativePTQ(object):
    """
    Static post training quantization.
    """

    def __init__(self, quant_config=ptq_config.default_ptq_config):
        """
        Constructor.

        Args:
            quant_config(PTQConfig): the config of post training quantization.
                The config has weight_quantizer and activation_quantizer.
                In default, the weight_quantizer is PerChannelAbsmaxQuantizer
                and the activation_quantizer is KLQuantizer.
        """
        super().__init__()

        assert isinstance(quant_config, ptq_config.PTQConfig)

        self._quant_config = quant_config
```

ImperativePTQ is the class that implements PTQ. Its input parameter, quant_config, specifies how weights and activations are quantized. By default, the activation quantizer is KLQuantizer and the weight quantizer is PerChannelAbsmaxQuantizer.

```python
class PTQConfig(object):
    """
    The PTQ config shows how to quantize the inputs and outputs.
    """

    def __init__(self, activation_quantizer, weight_quantizer):
        """
        Constructor.

        Args:
            activation_quantizer(BaseQuantizer): The activation quantizer.
                It should be the instance of BaseQuantizer.
            weight_quantizer(BaseQuantizer): The weight quantizer.
                It should be the instance of BaseQuantizer.
        """
        super().__init__()
        assert isinstance(activation_quantizer, tuple(SUPPORT_ACT_QUANTIZERS))
        assert isinstance(weight_quantizer, tuple(SUPPORT_WT_QUANTIZERS))

        self.in_act_quantizer = copy.deepcopy(activation_quantizer)
        self.out_act_quantizer = copy.deepcopy(activation_quantizer)
        self.wt_quantizer = copy.deepcopy(weight_quantizer)

        self.quant_hook_handle = None

        # In order to wrap simulated layers, use in_act_quantizer
        # to calculate the input thresholds for conv2d, linear and etc.
        self.enable_in_act_quantizer = False


default_ptq_config = PTQConfig(KLQuantizer(), PerChannelAbsmaxQuantizer())
```

quant_hook_handle holds the handle of the forward post hook registered on the layer.
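enable_in_act_quantizer controls whether in_act_quantizer is used to compute quantization parameters for the input activations. By default, activations are quantized with KLQuantizer and weights with PerChannelAbsmaxQuantizer.

_is_skip_layer and _is_quant_layer

A model is usually built by stacking layers, with framework-provided layers such as nn.Conv2D and nn.Linear serving as the basic building blocks. During quantization we need to know which layers should be quantized and which should not. The two static methods _is_skip_layer and _is_quant_layer answer this: a layer is skipped when the user has set its skip_quant attribute to True, and it counts as a quantized layer once a _quant_config has been attached to it (which quantize() does below).

```python
@staticmethod
def _is_skip_layer(layer):
    return hasattr(layer, "skip_quant") and layer.skip_quant == True

@staticmethod
def _is_quant_layer(layer):
    return hasattr(layer, "_quant_config")
```

is_leaf_layer

```python
def is_leaf_layer(layer):
    """
    Whether the layer is leaf layer.
    """
    return isinstance(layer, paddle.nn.Layer) and len(layer.sublayers()) == 0
```

A layer is a leaf node when its sublayers() list is empty.

quantize

```python
def quantize(self, model, inplace=False, fuse=False, fuse_list=None):
    """
    Add quant config and hook to the target layer.

    Args:
        model(paddle.nn.Layer): The model to be quantized.
        inplace(bool): Whether apply quantization to the input model.
            Default: False.
        fuse(bool): Whether to fuse layers.
            Default: False.
        fuse_list(list): The layers' names to be fused. For example,
            "fuse_list = [["conv1", "bn1"], ["conv2", "bn2"]]".
            A TypeError would be raised if "fuse" was set as
            True but "fuse_list" was None.
            Default: None.
    Return:
        quantized_model(paddle.nn.Layer): The quantized model.
    """
    assert isinstance(model, paddle.nn.Layer), \
        "The model must be the instance of paddle.nn.Layer."

    if not inplace:
        model = copy.deepcopy(model)
    if fuse:
        model.eval()
        model = fuse_utils.fuse_layers(model, fuse_list)
```

This is the entry point for quantizing a model. model is the model instance; inplace specifies whether to modify the input model directly (otherwise a deep copy is processed); fuse and fuse_list let the user choose whether, and which, layers to fuse. The method returns the processed model, which is now instrumented to collect activation information from each layer. The rest of quantize() looks like this:

```python
# Remainder of quantize(): instrument every eligible layer.
for name, layer in model.named_sublayers():
    if (PTQRegistry.is_supported_layer(layer)
            and utils.is_leaf_layer(layer)
            and not self._is_skip_layer(layer)):

        # Add quant config
        quant_config = copy.deepcopy(self._quant_config)
        if PTQRegistry.is_simulated_quant_layer(layer):
            quant_config.enable_in_act_quantizer = True
        layer._quant_config = quant_config

        # register hook
        hook = ptq_hooks.quant_forward_post_hook
        quant_hook_handle = layer.register_forward_post_hook(hook)
        quant_config.quant_hook_handle = quant_hook_handle
        layer._forward_post_hooks.move_to_end(
            quant_hook_handle._hook_id, last=False)

return model
```

The loop walks over every sublayer and checks three things: (1) whether the layer supports quantization, (2) whether it is a leaf layer, and (3) whether it should be skipped. PTQRegistry is a dictionary-based registry; its implementation will be covered later.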
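The three checks in that loop can be illustrated with a small pure-Python sketch that runs without Paddle. MockLayer, the Conv2D/Linear/ReLU/Block classes and SUPPORTED_TYPES are hypothetical stand-ins for paddle.nn.Layer, the real layer classes and PTQRegistry; only the filtering logic (supported, leaf, not skipped) mirrors the loop.

```python
from collections import OrderedDict


class MockLayer:
    """Hypothetical stand-in for paddle.nn.Layer that only tracks child layers."""

    def __init__(self, **children):
        self._children = OrderedDict(children)

    def named_sublayers(self, prefix=""):
        # Depth-first walk yielding (dotted_name, layer) for every descendant.
        for name, child in self._children.items():
            full_name = f"{prefix}.{name}" if prefix else name
            yield full_name, child
            yield from child.named_sublayers(full_name)

    def sublayers(self):
        return [layer for _, layer in self.named_sublayers()]


# Hypothetical layer types: Block is a container, the others are leaves.
class Conv2D(MockLayer): pass
class Linear(MockLayer): pass
class ReLU(MockLayer): pass
class Block(MockLayer): pass

# Hypothetical registry playing the role of PTQRegistry.is_supported_layer.
SUPPORTED_TYPES = (Conv2D, Linear, ReLU)


def is_leaf_layer(layer):
    return len(layer.sublayers()) == 0


def is_skip_layer(layer):
    return hasattr(layer, "skip_quant") and layer.skip_quant is True


def select_quant_layers(model):
    # Same three conditions as the quantize() loop: supported, leaf, not skipped.
    return [
        name
        for name, layer in model.named_sublayers()
        if isinstance(layer, SUPPORTED_TYPES)
        and is_leaf_layer(layer)
        and not is_skip_layer(layer)
    ]


head = Linear()
head.skip_quant = True  # the user opts this layer out of quantization
model = MockLayer(block=Block(conv=Conv2D(), act=ReLU()), head=head)
print(select_quant_layers(model))  # ['block.conv', 'block.act']
```

The container Block is dropped because it is neither a supported type nor a leaf, and head is dropped because skip_quant is set, so only the two leaf layers inside the block would receive a quant config and hook.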
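If all three conditions hold, the layer is instrumented for quantization: (1) a deep copy of the quant config is stored on the layer as _quant_config, so each layer accumulates its own statistics; (2) if the layer is a simulated-quantization layer (one whose inputs and weights are quantized), enable_in_act_quantizer is set to True; (3) ptq_hooks.quant_forward_post_hook is registered through register_forward_post_hook, its handle is saved in quant_config.quant_hook_handle, and move_to_end(..., last=False) moves the hook to the front of the layer's hook dictionary so it runs before any other forward post hooks.

Here is the implementation of quant_forward_post_hook:

```python
def quant_forward_post_hook(layer, inputs, outputs):
    """
    The forward_post_hook for PTQ.
    """
    assert hasattr(layer, '_quant_config'), \
        "The layer should have _quant_config attr"

    qc = layer._quant_config
    if qc.enable_in_act_quantizer:
        qc.in_act_quantizer.sample_data(layer, inputs)
    qc.out_act_quantizer.sample_data(layer, (outputs, ))
```

After the forward pass completes, qc.out_act_quantizer samples the layer's output activations. Whether the input activations are sampled as well depends on qc.enable_in_act_quantizer. As shown above, qc.enable_in_act_quantizer is True only when PTQRegistry.is_simulated_quant_layer(layer) returns True, which is currently the case only for nn.Conv2D and nn.Linear.

The implementations of KLQuantizer and PerChannelAbsmaxQuantizer will be discussed later. At this point every layer has been processed and the model object is returned. Calibration data is then run through the model to collect the activation distributions.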
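To see the hook mechanics end to end without Paddle, here is a minimal pure-Python sketch. MockScale, HookHandle, AbsmaxObserver and QuantConfig are hypothetical stand-ins, not Paddle classes. AbsmaxObserver just tracks the running max of |x| and derives a symmetric 8-bit scale; it is a deliberate simplification, not the histogram-based search that KLQuantizer performs. The sketch registers a hook with the same logic as quant_forward_post_hook, moves it to the front with OrderedDict.move_to_end(..., last=False), and runs two calibration batches.

```python
import itertools
from collections import OrderedDict

_next_hook_id = itertools.count()


class HookHandle:
    """Hypothetical analogue of the handle returned by register_forward_post_hook."""

    def __init__(self, hook_id):
        self.hook_id = hook_id


class MockScale:
    """Hypothetical layer computing y = w * x elementwise on a list of floats."""

    def __init__(self, w):
        self.w = w
        self.forward_post_hooks = OrderedDict()  # hook_id -> hook, run in order

    def register_forward_post_hook(self, hook):
        handle = HookHandle(next(_next_hook_id))
        self.forward_post_hooks[handle.hook_id] = hook
        return handle

    def __call__(self, x):
        outputs = [self.w * v for v in x]
        for hook in self.forward_post_hooks.values():
            hook(self, (x,), outputs)
        return outputs


class AbsmaxObserver:
    """Hypothetical activation quantizer: tracks the running max of |x|."""

    def __init__(self):
        self.abs_max = 0.0

    def sample_data(self, layer, tensors):
        for t in tensors:
            self.abs_max = max(self.abs_max, max(abs(v) for v in t))

    def scale(self, bits=8):
        # Symmetric scale: [-abs_max, abs_max] maps onto [-(2^(bits-1)-1), 2^(bits-1)-1].
        return self.abs_max / (2 ** (bits - 1) - 1)


class QuantConfig:
    """Hypothetical, trimmed-down PTQConfig with one observer per direction."""

    def __init__(self, enable_in_act_quantizer=False):
        self.in_act_quantizer = AbsmaxObserver()
        self.out_act_quantizer = AbsmaxObserver()
        self.enable_in_act_quantizer = enable_in_act_quantizer
        self.quant_hook_handle = None


def quant_forward_post_hook(layer, inputs, outputs):
    # Same logic as the PTQ hook: outputs always, inputs only when enabled.
    qc = layer.quant_config
    if qc.enable_in_act_quantizer:
        qc.in_act_quantizer.sample_data(layer, inputs)
    qc.out_act_quantizer.sample_data(layer, (outputs,))


layer = MockScale(w=2.0)
layer.register_forward_post_hook(lambda *args: None)  # a pre-existing hook
layer.quant_config = QuantConfig(enable_in_act_quantizer=True)
handle = layer.register_forward_post_hook(quant_forward_post_hook)
layer.quant_config.quant_hook_handle = handle
# Move the PTQ hook to the front so it runs before the other post hooks.
layer.forward_post_hooks.move_to_end(handle.hook_id, last=False)

for batch in ([0.5, -1.0], [3.0, 0.25]):  # calibration batches
    layer(batch)

print(layer.quant_config.in_act_quantizer.abs_max)   # 3.0
print(layer.quant_config.out_act_quantizer.abs_max)  # 6.0
```

Because the statistics live in per-layer observer objects that the hook only updates, calibration is just ordinary forward passes; turning the collected statistics into quantization parameters happens in a separate step afterwards, which is why the usage example calls convert after the calibration loop.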