Preface

Hi there, everyone! 魔王 here.

What you need for this project: the wkhtmltopdf software, plus the source code below.

Third-party libraries:
requests: pip install requests
parsel: pip install parsel
pdfkit: pip install pdfkit

Development environment:
Version: Python 3.8
Editor: PyCharm

How to install a module: press Win+R, type cmd to open a command prompt, then run pip install <module name>. If the output turns red with errors, the network connection has probably timed out. Switch to a package mirror hosted in China and try again.

Scraping workflow

1. Work out where the data you want can be obtained.
Capture and analyze the network traffic with the browser's developer tools. The analysis shows that to get the data we want, we only need to request the URL shown in the address bar.

2. Implementation steps

Collect the URLs of all articles:
1. Send a request to the article list page.
2. Get the data: the page's HTML source as text.
3. Parse the data and extract the article URLs.

Get the content of each article:
1. Send a request to each article URL.
2. Get the data: the page's HTML source.
3. Parse the data and extract the article content.
4. Save the data: first save it as an HTML file, then convert the HTML file to PDF.

Code

```python
import os
import re  # regular expressions
import subprocess

import parsel  # data parsing module
import requests  # data request module

# Path to wkhtmltopdf.exe on the author's machine; change it to your own install location
WKHTMLTOPDF = r'C:\1Softwareinstallation\wkhtmltopdf\bin\wkhtmltopdf.exe'

# Request headers, mainly used to disguise the Python program so the server doesn't identify it as a script
headers = {
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.4896.88 Safari/537.36'
}

os.makedirs('pdf_1', exist_ok=True)  # output folder for the PDFs

for page in range(4, 6):  # list pages 4 and 5
    url = f'https://blog.csdn.net/fei347795790/article/list/{page}'  # the URL to request
    # Send the request with the get method of the requests module
    response = requests.get(url=url, headers=headers)
    # print(response.text)
    # <Response [200]> is the response object; 200 means the request succeeded
    selector = parsel.Selector(response.text)
    # <Selector xpath=None data='<html lang="zh-CN"><head><meta ...'> is the returned object
    # css is one of the parsing methods: it extracts data by tag and attribute
    # a::attr(href) gets the href attribute of each <a> tag
    href = selector.css('#articleMeList-blog .article-list h4 a::attr(href)').getall()
    print(href)
    for index in href:
        try:
            print(index)
            html_data = requests.get(url=index, headers=headers).text
            selector_1 = parsel.Selector(html_data)
            title = selector_1.css('#articleContentId::text').get().strip()
            # Replace characters that are not allowed in Windows file names
            title = re.sub(r'[\\/:*?"<>|]', '_', title)
            # wkhtmltopdf downloads the article page itself and renders it to PDF;
            # passing a list instead of one shell string keeps titles with spaces intact
            subprocess.run([WKHTMLTOPDF, index, os.path.join('pdf_1', f'{title}.pdf')])
        except Exception as e:
            print(e)
```

Bonus: posting a comment and liking an article with requests. Both requests are sent with the cookie of a logged-in CSDN account, so paste the cookie from your own browser (copied from the developer tools) into the headers.

```python
import requests

url = 'https://blog.csdn.net/phoenix/web/v1/comment/submit'  # endpoint for posting a comment
like_url = 'https://blog.csdn.net/phoenix/web/v1/article/like'  # endpoint for liking an article
headers = {
    # Cookie of a logged-in CSDN account, copied from the browser's developer tools
    'cookie': 'paste your own cookie here',
    'origin': 'https://blog.csdn.net',
    'referer': 'https://blog.csdn.net/fei347795790/article/details/110070943',
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.4896.88 Safari/537.36',
    'x-requested-with': 'XMLHttpRequest',
}
# Form data for the comment
data = {
    'commentId': '',
    'content': 'Teacher Ziyou is so handsome',
    'articleId': '124196275',
}
# Form data for the like
like_data = {
    'articleId': '110070943',
}
comment_response = requests.post(url=url, data=data, headers=headers)
like_response = requests.post(url=like_url, data=like_data, headers=headers)
print(comment_response, like_response)
```

Closing words

That's all for this article! If you have more suggestions or questions, leave a comment or send me a private message. Let's keep working hard together!

If you liked it, follow me, or like, bookmark, and comment on my article!
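The first script requests list pages 4 and 5 via `range(4, 6)`, because `range()` stops before its end value. Also, requests has no default timeout, so a stalled connection can hang the loop indefinitely. A small sketch that makes the page range inclusive and adds a timeout and a status check might look like this. The helper names `list_page_urls` and `fetch` are hypothetical, and reusing one `requests.Session` is a choice of this sketch, not part of the original code:

```python
from typing import List

import requests

HEADERS = {
    "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
                  "(KHTML, like Gecko) Chrome/100.0.4896.88 Safari/537.36"
}


def list_page_urls(user: str, first: int, last: int) -> List[str]:
    """URLs of the article list pages first..last, both ends included."""
    # range() excludes its stop value, hence last + 1
    return [f"https://blog.csdn.net/{user}/article/list/{page}"
            for page in range(first, last + 1)]


def fetch(session: requests.Session, url: str, timeout: float = 10.0) -> str:
    """GET a page and return its HTML source as text."""
    response = session.get(url, headers=HEADERS, timeout=timeout)
    # Turn 4xx/5xx answers into an exception instead of parsing an error page
    response.raise_for_status()
    return response.text
```

Creating one `requests.Session()` and calling `fetch(session, u)` for each `u` in `list_page_urls("fei347795790", 4, 5)` requests the same two pages as the original loop, while reusing the underlying connection.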
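Building the wkhtmltopdf command as a single shell string breaks as soon as an article title contains a space or a character Windows forbids in file names, such as `?` or `|`. Here is a sketch that cleans the title and passes an argument list to `subprocess.run` instead. The install path, the `pdf_1` default folder and the helper names are assumptions for illustration:

```python
import re
import subprocess
from pathlib import Path
from typing import List, Optional

# Assumed install location; point this at your own wkhtmltopdf.exe
WKHTMLTOPDF = r"C:\Program Files\wkhtmltopdf\bin\wkhtmltopdf.exe"

# Characters Windows forbids in file names, plus stray line breaks from ::text
_ILLEGAL = re.compile(r'[\\/:*?"<>|\r\n]+')


def safe_filename(title: Optional[str], fallback: str = "untitled") -> str:
    """Trim the title, replace each run of forbidden characters with '_'."""
    cleaned = _ILLEGAL.sub("_", (title or "").strip())
    return cleaned or fallback


def build_command(url: str, title: Optional[str], out_dir: str = "pdf_1",
                  exe: str = WKHTMLTOPDF) -> List[str]:
    """Argument list for: wkhtmltopdf <url> <out_dir>/<title>.pdf (no shell)."""
    out_path = Path(out_dir) / f"{safe_filename(title)}.pdf"
    return [exe, url, str(out_path)]


def convert(url: str, title: Optional[str], out_dir: str = "pdf_1",
            exe: str = WKHTMLTOPDF) -> subprocess.CompletedProcess:
    """Run wkhtmltopdf; check .returncode on the result to see if it succeeded."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    return subprocess.run(build_command(url, title, out_dir, exe))
```

Because each element of the list reaches wkhtmltopdf as one argument, a title like `Python 3.8: what's new?` becomes the single file name `Python 3.8_ what's new_.pdf` instead of being split at the spaces.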
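The workflow says to save each article as an HTML file first and then convert that file to PDF, and pdfkit is listed among the libraries, but the first script hands the article URL straight to wkhtmltopdf. A sketch of the save-then-convert route with pdfkit follows. It assumes wkhtmltopdf is already installed and that `name` has already been cleaned for use as a file name; `save_html` and `html_to_pdf` are hypothetical helper names:

```python
from pathlib import Path


def save_html(html: str, name: str, out_dir: str = "html") -> Path:
    """Write html to <out_dir>/<name>.html as UTF-8 and return the file path."""
    folder = Path(out_dir)
    folder.mkdir(parents=True, exist_ok=True)
    path = folder / f"{name}.html"
    # Explicit UTF-8 so Chinese text is not mangled by the Windows default encoding
    path.write_text(html, encoding="utf-8")
    return path


def html_to_pdf(html_path: Path, pdf_path: Path, wkhtmltopdf_exe: str) -> None:
    """Convert a saved HTML file to PDF with pdfkit, which drives wkhtmltopdf."""
    import pdfkit  # imported here so save_html works even without pdfkit installed

    # Tell pdfkit where wkhtmltopdf.exe lives, since it may not be on PATH
    config = pdfkit.configuration(wkhtmltopdf=wkhtmltopdf_exe)
    # The "encoding" option is forwarded to wkhtmltopdf as --encoding UTF-8
    pdfkit.from_file(str(html_path), str(pdf_path), configuration=config,
                     options={"encoding": "UTF-8"})
```

In the inner loop you would call `save_html(html_data, title)` and then pass the returned path to `html_to_pdf` together with your own wkhtmltopdf.exe path, in place of the direct call on the URL.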