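This article uses Maoyan Movies (猫眼电影) to run a data analysis of the Spring Festival blockbuster Full River Red (【满江红】). The review data is fetched through the site's dynamic JSON interface. Parsing the static HTML instead would require extra font decoding, which online tutorials already cover thoroughly; interested readers are welcome to look those up or discuss with me.

[Figure: Full River Red film still]

I. Interface Analysis

1. Target site: Maoyan H5

[Figure: interface list]

2. Scroll through the reviews, or tap a review to open its sub-page and scroll there, to capture the relevant interfaces. (The browser's F12 tools only capture the sub-comment interface; capturing the interface for the full review list requires a packet-capture tool or capturing traffic from a phone.)

[Figure: interface details]

3. Review interface (obfuscated)

    aHR0cHM6Ly9tLm1hb3lhbi5jb20vYXBvbGxvL2Fwb2xsb2FwaS9tbWRiL3JlcGxpZXMvY29tbWVudC8xMTY3MTI5MDg5Lmpzb24X3ZfPXllcyZvZmZzZXQ9NDA

II. Response Analysis

1. From the sub-comment interface we can identify the relevant fields (nickname, gender, rating, review text, like count, user level, and so on). Here `cmts` holds the replies, `ocm` is the parent review they belong to, and `total` is the number of replies:

    {
      "cmts": [
        {
          "approve": 0,
          "assistAwardInfo": {"avatar": "", "celebrityId": 0, "celebrityName": "", "rank": 0, "title": ""},
          "avatarurl": "https://img.meituan.net/maoyanuser/e6f7600fa2980a929accb602fde5abaa2776.jpg",
          "channelId": 70001,
          "content": "在电影院看真的很有氛围!背景音乐也很加分",
          "deleted": false,
          "id": 1171602285,
          "ipLocName": "福建",
          "nickName": "腿小菇",
          "time": "2023-02-27 10:24",
          "userId": 1322748722,
          "userLevel": 3,
          "vipInfo": "",
          "vipType": 0
        }
      ],
      "ocm": {
        "approve": 8657,
        "approved": false,
        "assistAwardInfo": {"avatar": "", "celebrityId": 0, "celebrityName": "", "rank": 0, "title": ""},
        "authInfo": "",
        "avatarurl": "https://img.meituan.net/avatar/66fb6e3ef190201864c732a03b5d9be924014.jpg",
        "content": "刚看完满江红,真的好看,这是我看过最值的一部电影,反转反转再反转,真的是永远想不到下一步是什么,而且还很搞笑,搞笑又宏伟,真的描述不出来这个电影的好,都给我去看!满江红!入股不亏!!!!",
        "id": 1167129089,
        "ipLocName": "辽宁",
        "isMajor": false,
        "juryLevel": 0,
        "majorType": 0,
        "mvid": 1462626,
        "nick": "Gpc126688235",
        "nickName": "Gpc126688235",
        "oppose": 0,
        "pro": false,
        "reply": 680,
        "score": 5,
        "spoiler": 0,
        "supportComment": true,
        "supportLike": true,
        "sureViewed": 1,
        "tagList": {"fixed": [{"id": 1, "name": "购票好评"}, {"id": 4, "name": "购票"}, {"id": 6, "name": "优质评价"}]},
        "time": "2023-01-22 12:19",
        "userId": 3164097169,
        "userLevel": 2,
        "videoDuration": 0,
        "vipInfo": "",
        "vipType": 0
      },
      "total": 60
    }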
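To make the field list concrete, here is a minimal sketch of pulling those fields out of the sub-comment response. The sample is a trimmed copy of the response above, and the helper names `flatten_comment` and `extract_replies` are hypothetical, introduced only for illustration. Note that the reply entries in `cmts` carry no `score` or `gender` key in this sample, so the sketch uses `.get()` and returns `None` for them.

```python
import json

# Trimmed copy of the sub-comment response; most fields are omitted for brevity.
SAMPLE = json.loads("""
{
  "cmts": [
    {"id": 1171602285, "nickName": "腿小菇", "ipLocName": "福建",
     "content": "在电影院看真的很有氛围!背景音乐也很加分",
     "userId": 1322748722, "userLevel": 3}
  ],
  "ocm": {"id": 1167129089, "nickName": "Gpc126688235", "score": 5,
          "approve": 8657, "reply": 680, "userId": 3164097169, "userLevel": 2},
  "total": 60
}
""")

# Fields of interest; any key missing from a comment becomes None.
FIELDS = ("id", "nickName", "score", "gender", "content", "userId", "userLevel")


def flatten_comment(comment):
    """Keep only the fields of interest from one comment object."""
    return {name: comment.get(name) for name in FIELDS}


def extract_replies(payload):
    """Flatten every reply listed under 'cmts'."""
    return [flatten_comment(c) for c in payload.get("cmts", [])]


rows = extract_replies(SAMPLE)
parent = flatten_comment(SAMPLE["ocm"])
print(rows[0]["nickName"], rows[0]["score"])
print(parent["nickName"], parent["score"], SAMPLE["total"])
```

Using `.get()` rather than indexing keeps the extraction from failing when a field is present on the parent review but absent on its replies.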
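2. Example response from the full comment interface:

    {
      "data": {
        "hotIds": [1167280609, 1167187803],
        "total": 16521,
        "comments": [
          {
            "avatarUrl": "https://img.meituan.net/maoyanuser/80cdf9a184d40eb9ecc0e5d170f3e45d11928.png",
            "buyTicket": false,
            "channelId": 3,
            "content": "还行吧,没有看开心",
            "delete": false,
            "follow": false,
            "gender": 1,
            "id": 1171756165,
            "imageUrls": [],
            "ipLocName": "山东",
            "likedByCurrentUser": false,
            "major": false,
            "movie": {"id": 0, "sc": 0},
            "movieId": 1462626,
            "nick": "淘嘉豪",
            "replyCount": 0,
            "score": 9,
            "showApprove": false,
            "showVote": false,
            "spoiler": false,
            "startTime": 1677923460000,
            "tagList": [{"id": 1, "name": "购票好评"}, {"id": 4, "name": "购票"}],
            "time": 1677923460000,
            "ugcType": 11,
            "upCount": 0,
            "userId": 71317227,
            "userLevel": 2,
            "vipType": 0
          }
        ],
        "t2total": 0,
        "myComment": {}
      },
      "paging": {},
      "ts": 1677956823197
    }

III. Data Parsing

1. Build the request headers and simulate the data request

    import requests
    from pprint import pprint


    def get_film_data(offset=0, filename="film"):
        # Obfuscated by the author. Substitute the real endpoint, with
        # `offset` as its paging parameter, before running.
        url = "aHR0cHM6Ly9tLm1hb3lhbi5jb20vYXBvbGxvL2Fwb2xsb2FwaS9tbWRiL3JlcGxpZXMvY29tbWVudC8xMTY3MTI5MDg5Lmpzb24X3ZfPXllcyZvZmZzZXQ9NDA"
        headers = {
            "User-Agent": "Mozilla/5.0 (iPhone; CPU iPhone OS 11_0 like Mac OS X) "
                          "AppleWebKit/604.1.38 (KHTML, like Gecko) "
                          "Version/11.0 Mobile/15A372 Safari/604.1"
        }
        # Session cookies; replace the values with those from your own session.
        cookies = {
            "uuid_n_v": "v1",
            "iuuid": "942C12B0DF4311E9ADA9C1C3B540BA45F066B2B3028841B8A0BC3544E4C0AD17",
            "ci": "12CE58C97E4BAAC",
            "_lxsdk_cuid": "16d6c9b401ec80c6c86354bd8a95b12321110020016d6c9b401ec8",
            "webp": "true",
            "_lxsdk": "942C12B0DF4311E9ADA9C1C3B540BA45F066B2B3028841B8A0BC3544E4C0AD17",
        }
        # Send the request and parse the JSON response
        response = requests.get(url, headers=headers, cookies=cookies).json()
        # Total number of comments
        total = response["total"]
        print(total)
        # List of comment objects
        cmts = response["cmts"]
        pprint(cmts)
        for comment in cmts:
            # comment["id"] and comment["time"] are also available but are not stored.
            # score and gender are absent from the reply entries in the sample
            # response above, so read them with .get() instead of indexing.
            data = {
                "nickName": comment["nickName"],    # user nickname
                "gender": comment.get("gender"),    # user gender
                "score": comment.get("score"),      # user rating
                "content": comment["content"],      # comment text
                "userId": comment["userId"],        # user id
                "userLevel": comment["userLevel"],  # user level
            }
            save_data_csv(data, filename)
        return total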
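The request function above fetches one page and returns the total, so collecting every comment means stepping the offset until the total is reached. The following is a minimal sketch of that loop, not the author's code. `crawl_all` and `fake_fetch` are hypothetical names, and `page_size=10` is an assumption: check the actual number of comments per response in your own capture.

```python
import time


def crawl_all(fetch_page, page_size, pause=0.0):
    """Call fetch_page(offset) once per page and return the offsets requested.

    fetch_page must return the total number of comments, as the request
    function in this article does.
    """
    total = fetch_page(0)  # the first page also tells us how many comments exist
    requested = [0]
    for offset in range(page_size, total, page_size):
        time.sleep(pause)  # pause between requests to avoid hammering the site
        fetch_page(offset)
        requested.append(offset)
    return requested


def fake_fetch(offset):
    """Stand-in for the real request: pretends the interface reports 60 comments."""
    return 60


offsets = crawl_all(fake_fetch, page_size=10)
print(offsets)
```

With the real function, the call would look like `crawl_all(lambda off: get_film_data(off, "film"), page_size=10, pause=1)`, provided the request URL actually interpolates the offset.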
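2. Data storage (CSV is used here for demonstration)

    import csv


    def save_data_csv(data, filename):
        title = ["nickName", "gender", "score", "content", "userId", "userLevel"]
        with open(filename, "a", encoding="utf-8-sig", newline="") as fp:
            # Create the writer object
            writer = csv.writer(fp)
            # Write the header only while the file is still empty, so that
            # calling this function in a loop does not repeat the header.
            with open(filename, "r", encoding="utf-8-sig", newline="") as fp_read:
                is_empty = not list(csv.reader(fp_read))
            if is_empty:
                writer.writerow(title)
            writer.writerow([data[i] for i in title])
        print("Saved")

[Figure: saved review results]

IV. Data Visualization

1. Word segmentation of the reviews

    import jieba
    import matplotlib.pyplot as plt
    import numpy as np
    import pandas as pd
    from PIL import Image
    from wordcloud import WordCloud


    def wordcloud_analysis(filename):
        # The file was written as utf-8-sig, so read it the same way to drop the BOM
        df = pd.read_csv(filename, encoding="utf-8-sig")
        content = df["content"].to_string()
        # Segment with jieba in accurate mode to get a list of words
        words = jieba.lcut(content)
        # Join the words with spaces into a single string
        words = " ".join(words)
        # Read the image that defines the shape of the word cloud
        mask_pic = np.array(Image.open("1.jpg"))
        words_cloud = WordCloud(
            background_color="white",  # background colour of the image
            width=800, height=600,     # image size in pixels (defaults: 400 x 200)
            font_path="msyh.ttf",      # font file; a CJK-capable font is needed for Chinese text
            max_words=200,             # maximum number of words (default 200)
            max_font_size=80,          # largest font size (default None: chosen from the image height)
            min_font_size=4,           # smallest font size; value missing in the source, 4 is the library default
            font_step=1,               # step between font sizes (default 1)
            random_state=30,           # seed for the random generator, so layout and colours are reproducible
            mask=mask_pic,             # shape of the cloud (default None: rectangular)
        ).generate(words)              # build the cloud from the space-joined jieba output
        # Save the word cloud as an image
        words_cloud.to_file("comment.png")
        # Display the word cloud with matplotlib, without axes
        plt.imshow(words_cloud, interpolation="bilinear")
        plt.axis("off")
        plt.show()

[Figure: word cloud of the segmented reviews]

2. Audience gender and rating share (only part of the data was collected, so the result does not represent the final real-world picture; please don't read too much into it)

    import matplotlib.pyplot as plt
    import pandas as pd
    import seaborn as sns


    def gender_pie_analysis(filename):
        df = pd.read_csv(filename, encoding="utf-8-sig")
        print(df)
        # Count the audience by gender
        gender = df["gender"].value_counts()
        print(gender)
        # Create the figure and axes
        fig, ax = plt.subplots(figsize=(6, 6), dpi=100)
        size = 0.5
        ax.pie(
            gender,
            # Hard-coded in the order value_counts() returned for the author's
            # data (sorted by frequency); check it against the printed counts.
            labels=["Female", "Male", "Unknown"],
            startangle=90,
            autopct="%.1f%%",
            colors=sns.color_palette("husl", len(gender)),
            radius=1,                                    # pie radius (default 1)
            pctdistance=0.75,                            # position of the percentage labels
            wedgeprops=dict(width=size, edgecolor="w"),  # width of the donut ring
            textprops=dict(fontsize=10),                 # label font size
        )
        ax.set_title("Full River Red: audience gender share", fontsize=15)
        plt.show()

[Figure: gender share]

[Figure: rating share]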
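The article shows a rating-share chart but not the code behind it. As a rough sketch of the counting step only, and not the author's implementation, the percentages can be computed from the saved CSV with the standard library. `score_share` is a hypothetical helper, and the rows below are made-up illustration data, not real reviews.

```python
import csv
import io
from collections import Counter


def score_share(rows):
    """Return {score: percentage of rated rows}, skipping rows with no score."""
    counts = Counter(
        row["score"] for row in rows if row.get("score") not in (None, "")
    )
    rated = sum(counts.values())
    if rated == 0:
        return {}
    return {score: round(100 * n / rated, 1) for score, n in sorted(counts.items())}


# Made-up rows in the same column layout that save_data_csv writes.
sample_csv = io.StringIO(
    "nickName,gender,score,content,userId,userLevel\n"
    "a,1,5,x,1,2\n"
    "b,2,5,y,2,3\n"
    "c,,4,z,3,1\n"
    "d,1,,w,4,2\n"
)
shares = score_share(csv.DictReader(sample_csv))
print(shares)
```

The resulting mapping can be passed to the same `ax.pie` call used for the gender chart, with the keys as labels and the values as wedge sizes.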
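3. User level analysis

    import matplotlib.pyplot as plt
    import pandas as pd


    def userlevel_bar_analysis(filename):
        df = pd.read_csv(filename, encoding="utf-8-sig")
        print(df)
        # Number of commenters at each user level, ordered by level
        userLevel = df["userLevel"].value_counts().sort_index()
        print(userLevel)
        x = userLevel.index
        y = userLevel
        fig, ax = plt.subplots()
        plt.bar(x, y, color="#DE85B5")
        # Bar chart title
        plt.title("Number of commenting users by user level")
        plt.grid(True, axis="y", alpha=1)
        # Print each count above its bar
        for i, j in zip(x, y):
            plt.text(i, j, "%d" % j, horizontalalignment="center")
        # Hide the right and top borders
        ax.spines["right"].set_visible(False)
        ax.spines["top"].set_visible(False)
        plt.show()

[Figure: user level distribution]

This article only analyses the data from the ratings angle. The analysis could be extended to other angles, such as film genre, the year's top films, or box office.

This article comes from my Zhihu account: 梓羽Python
Article link: https://zhuanlan.zhihu.com/p/611295606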