Scraping review information from a shopping site with Python

The client supplied the target URLs as a TXT file. Code implementation:

    # Import libraries
    from bs4 import BeautifulSoup
    import json
    import random
    import requests
    import re
    import os
    import sys
    import pandas as pd
    from time import sleep

    # Impersonate a browser and fetch the page with requests
    def gethtml(url):
        headers1 = {
            'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
            'Accept-Encoding': 'gzip, deflate, br',
            'Accept-Language': 'zh-CN,zh;q=0.9',
            'Cache-Control': 'max-age=0',
            'Connection': 'keep-alive',
            # The original post hard-coded the author's full logged-in session
            # cookie here; paste in the cookie string from your own browser.
            'Cookie': '<your browser cookie string>',
            'sec-ch-ua': '"Chromium";v="104", " Not A;Brand";v="99", "Google Chrome";v="104"',
            'sec-ch-ua-mobile': '?0',
            'sec-ch-ua-platform': '"Windows"',
            'Sec-Fetch-Dest': 'document',
            'Sec-Fetch-Mode': 'navigate',
            'Sec-Fetch-Site': 'none',
            'Sec-Fetch-User': '?1',
            'Upgrade-Insecure-Requests': '1',
            'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.0.0 Safari/537.36',
        }
        result = requests.get(url, headers=headers1)
        if result.status_code == 200:
            # The response is successful; return the page as a string
            print('Fetched page:', url, 'OK')
            return result.text
        return ''

    # Create an empty DataFrame as the Excel template
    df = pd.DataFrame({'url': [], 'bookname': [], 'author': [], 'allgrade': [],
                       'grade': [], 'name': [], 'title': [], 'review': []})

    # Read the URL file
    f = open('url.txt', 'r', encoding='utf-8')
    flst = f.readlines()
    row = 0

    # Walk through the URLs
    for u in flst:
        print(u)
        url = re.findall('https:.*', u)
        if len(url) > 0:
            print('Fetching page:', url[0])
            # Fetch the page content
            html = gethtml(url[0])
            bsSoup = BeautifulSoup(html, 'html.parser')
            # Grab the rating
            award = bsSoup.find('p', class_='a-fixed-left-grid-col aok-align-center a-col-right')
            grade = award.text
            print(grade)
            # Grab the book title
            award1 = bsSoup.find('p', class_='a-fixed-left-grid-col product-info a-col-right')
            bookname = award1.text
            print(bookname)
            # Grab the author
            award3 = bsSoup.find('p', class_='a-row product-byline')
            author = award3.text
            print(author)
            # Grab the global rating
            award4 = bsSoup.find('p', class_='a-row a-spacing-medium averageStarRatingNumerical')
            allgrade = award4.text
            print(allgrade)
            # Grab the review blocks
            award2 = bsSoup.find('p', class_='a-section a-spacing-none review-views celwidget')
            pinglunlst = award2.find_all('p', class_='a-section celwidget')
            for i in pinglunlst:
                print(i.find('p', class_='a-profile-content').text)
                print(i.find('p', class_='a-row a-spacing-small review-data').text)
                print(i.find('a', class_='a-size-base a-link-normal review-title a-color-base review-title-content a-text-bold').text)
                df.loc[row, 'url'] = url[0]
                df.loc[row, 'author'] = author
                df.loc[row, 'grade'] = grade.split(',')[0]
                df.loc[row, 'allgrade'] = allgrade
                df.loc[row, 'bookname'] = bookname
                df.loc[row, 'name'] = i.find('p', class_='a-profile-content').text
                try:
                    df.loc[row, 'title'] = i.find('a', class_='a-size-base a-link-normal review-title a-color-base review-title-content a-text-bold').text.replace('\n', '')
                except:
                    # Some reviews render the title as a span instead of a link
                    df.loc[row, 'title'] = i.find('span', class_='a-size-base review-title a-color-base review-title-content a-text-bold').text.replace('\n', '')
                df.loc[row, 'review'] = i.find('p', class_='a-row a-spacing-small review-data').text.replace('\n', '')
                row += 1
        sleep(2)

    df.to_excel('book.xlsx')

Overall the collection is fairly simple: use the browser's developer tools to locate the tag and class that hold each piece of information on the page, extract them with BeautifulSoup, and save the final result as an Excel file with pandas.

If you have any questions or other requests, feel free to message me directly. Thanks!
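The URL file is parsed with a simple regex, `re.findall('https:.*', line)`, which grabs everything from `https:` to the end of the line. A minimal, self-contained sketch of just that step (the sample lines below are made up for illustration; the real input comes from the client's `url.txt`):

```python
import re

# Hypothetical sample lines standing in for the client's url.txt
lines = [
    "book one  https://www.example.com/dp/123\n",
    "no link on this line\n",
    "https://www.example.com/dp/456\n",
]

urls = []
for u in lines:
    # Same pattern as the post: everything from 'https:' to end of line
    # ('.' does not match the trailing newline)
    m = re.findall('https:.*', u)
    if len(m) > 0:
        urls.append(m[0].strip())

print(urls)
```

Lines without a link simply yield an empty match list and are skipped, which is why the script checks `len(url) > 0` before fetching.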
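The core extraction pattern, `find(tag, class_=...)` on a container and `find_all` over the repeated review blocks, can be exercised offline against a small inline snippet. The HTML and class names below are simplified stand-ins for the site's real markup (which you would read off the developer tools, as the post describes), not the actual page structure:

```python
from bs4 import BeautifulSoup

# Simplified stand-in for the review page markup
html = """
<div class="review-views">
  <div class="celwidget">
    <span class="a-profile-content">Alice</span>
    <a class="review-title">Great book</a>
    <span class="review-data">Loved every chapter.</span>
  </div>
  <div class="celwidget">
    <span class="a-profile-content">Bob</span>
    <a class="review-title">Not bad</a>
    <span class="review-data">Decent read.</span>
  </div>
</div>
"""

soup = BeautifulSoup(html, 'html.parser')
rows = []
# class_ matches any element whose class list contains the given name
for block in soup.find_all('div', class_='celwidget'):
    rows.append({
        'name': block.find('span', class_='a-profile-content').text,
        'title': block.find('a', class_='review-title').text,
        'review': block.find('span', class_='review-data').text,
    })

print(rows)
```

Each dict here corresponds to one row the script writes into the DataFrame with `df.loc[row, ...]` before exporting to Excel.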