In [1]: from pandas import Series, DataFrame
   ...: import pandas as pd
   ...: import numpy as np
   ...: import matplotlib.pyplot as plt
   ...: import matplotlib as mpl
   ...: import seaborn as sns  # import the seaborn library under the alias sns
   ...: # IPython magic: embeds plots inline, so the plt.show() step can be omitted
   ...: %matplotlib inline

In [2]: pd.set_option('mode.chained_assignment', None)  # turn off the chained-assignment warning

1. Download the official sample datasets from GitHub: https://github.com/mwaskom/seaborn-data
2. Find the local data directory used by load_dataset(). The get_data_home() function returns that directory:

   sns.utils.get_data_home()

   It returns a path of the following form:

   <your drive>:\Users\<your user name>\seaborn-data

   For example: 'C:\Users\user1\seaborn-data'
3. Unzip the downloaded folder and copy its contents into the data directory.

In [3]: tips = sns.load_dataset('tips')  # load_dataset('tips') first looks for tips.csv in the local data directory
   ...: tips.head()
Out[3]:
   total_bill   tip     sex smoker  day    time  size
0       16.99  1.01  Female     No  Sun  Dinner     2
1       10.34  1.66    Male     No  Sun  Dinner     3
2       21.01  3.50    Male     No  Sun  Dinner     3
3       23.68  3.31    Male     No  Sun  Dinner     2
4       24.59  3.61  Female     No  Sun  Dinner     4

Grouping data

Grouping with groupby

In [4]: grouped = tips['tip'].groupby(tips['sex'])
   ...: grouped  # the result is a GroupBy object that only holds intermediate data
Out[4]: <pandas.core.groupby.SeriesGroupBy object at 0x000000000BCF8160>

In [5]: grouped.mean()  # calling mean on the GroupBy object returns the actual result
Out[5]:
sex
Male      3.089618
Female    2.833448
Name: tip, dtype: float64

In [7]: # several grouping keys at once: the mean tip for each (day, time) pair
   ...: datemean = tips['tip'].groupby([tips['day'], tips['time']]).mean()
   ...: datemean
Out[7]:
day   time
Thur  Lunch     2.767705
      Dinner    3.000000
Fri   Lunch     2.382857
      Dinner    2.940000
Sat   Dinner    2.993103
Sun   Dinner    3.255132
Name: tip, dtype: float64

In [8]: datemean.plot(kind='barh')  # barh draws a horizontal bar chart
Out[8]: <matplotlib.axes._subplots.AxesSubplot at 0x7bff1d0>

In [9]: tips.dtypes
Out[9]:
total_bill     float64
tip            float64
sex           category
smoker        category
day           category
time          category
size             int64
dtype: object
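The grouping calls above can also be tried without downloading the dataset. The following is a minimal sketch on a hand-made table: the frame `mini` and its values are invented for illustration and are not part of the tips data; only the column names are borrowed from it.

```python
import pandas as pd

# Hypothetical miniature of the tips table (values invented for illustration).
mini = pd.DataFrame({
    "tip": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "sex": ["Female", "Male", "Male", "Female", "Male", "Female"],
    "day": ["Sun", "Sun", "Sat", "Sat", "Sat", "Sun"],
    "time": ["Dinner", "Dinner", "Lunch", "Dinner", "Lunch", "Dinner"],
})

# Split the tip column by a key Series; nothing is computed yet.
grouped = mini["tip"].groupby(mini["sex"])

# Apply an aggregation to every group and combine the results.
by_sex = grouped.mean()

# A list of keys yields a result with a two-level (day, time) index.
by_day_time = mini["tip"].groupby([mini["day"], mini["time"]]).mean()

print(by_sex)
print(by_day_time)
```

Each entry of `by_day_time` can be looked up with a tuple key such as `by_day_time[("Sun", "Dinner")]`.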
In [14]: for name, group in tips.groupby(tips['sex']):
    ...:     print(name)
    ...:     print(group)
Male
     total_bill   tip   sex smoker  day    time  size
1         10.34  1.66  Male     No  Sun  Dinner     3
2         21.01  3.50  Male     No  Sun  Dinner     3
3         23.68  3.31  Male     No  Sun  Dinner     2
5         25.29  4.71  Male     No  Sun  Dinner     4
6          8.77  2.00  Male     No  Sun  Dinner     2
..          ...   ...   ...    ...  ...     ...   ...
236       12.60  1.00  Male    Yes  Sat  Dinner     2
237       32.83  1.17  Male    Yes  Sat  Dinner     2
239       29.03  5.92  Male     No  Sat  Dinner     3
241       22.67  2.00  Male    Yes  Sat  Dinner     2
242       17.82  1.75  Male     No  Sat  Dinner     2

[157 rows x 7 columns]
Female
     total_bill   tip     sex smoker   day    time  size
0         16.99  1.01  Female     No   Sun  Dinner     2
4         24.59  3.61  Female     No   Sun  Dinner     4
11        35.26  5.00  Female     No   Sun  Dinner     4
14        14.83  3.02  Female     No   Sun  Dinner     2
16        10.33  1.67  Female     No   Sun  Dinner     3
..          ...   ...     ...    ...   ...     ...   ...
226       10.09  2.00  Female    Yes   Fri   Lunch     2
229       22.12  2.88  Female    Yes   Sat  Dinner     2
238       35.83  4.67  Female     No   Sat  Dinner     3
240       27.18  2.00  Female    Yes   Sat  Dinner     2
243       18.78  3.00  Female     No  Thur  Dinner     2

[87 rows x 7 columns]

In [15]: tips.groupby(tips['sex']).size()  # size returns the number of rows in each group
Out[15]:
sex
Male      157
Female     87
dtype: int64

In [16]: tips.groupby(tips['sex']).count()
Out[16]:
        total_bill  tip  smoker  day  time  size
sex
Male           157  157     157  157   157   157
Female          87   87      87   87    87    87

Grouping by column name

In [19]: # numeric_only=True is required on pandas 2.x; older versions dropped non-numeric columns automatically
    ...: smokermean = tips.groupby('smoker').mean(numeric_only=True)
    ...: smokermean
Out[19]:
        total_bill       tip      size
smoker
Yes      20.756344  3.008710  2.408602
No       19.188278  2.991854  2.668874

In [21]: smokermean['tip'].plot(kind='bar')
Out[21]: <matplotlib.axes._subplots.AxesSubplot at 0x811ef98>

In [24]: sizemean1 = tips['tip'].groupby(tips['size']).mean()
    ...: sizemean1
Out[24]:
size
1    1.437500
2    2.582308
3    3.393158
4    4.135405
5    4.028000
6    5.225000
Name: tip, dtype: float64

In [25]: sizemean2 = tips.groupby('size')['tip'].mean()  # syntactic sugar for the call above
    ...: sizemean2
Out[25]:
size
1    1.437500
2    2.582308
3    3.393158
4    4.135405
5    4.028000
6    5.225000
Name: tip, dtype: float64

In [27]: sizemean2.plot()
Out[27]: <matplotlib.axes._subplots.AxesSubplot at 0xbe269e8>

In [29]: df = DataFrame(np.arange(16).reshape(4, 4))
    ...: df
Out[29]:
    0   1   2   3
0   0   1   2   3
1   4   5   6   7
2   8   9  10  11
3  12  13  14  15

Grouping by list or tuple

In [30]: list1 = ['a', 'b', 'a', 'b']

In [32]: df.groupby(list1).sum()
Out[32]:
    0   1   2   3
a   8  10  12  14
b  16  18  20  22

Grouping by dict

The frames in this part and the next are filled with random numbers, so the values differ on every run. The minus signs of the negative entries are missing from the tables reproduced below; only the magnitudes are shown.

In [33]: df = DataFrame(np.random.normal(size=(6, 6)), index=['a', 'b', 'c', 'A', 'B', 'C'])
    ...: df
Out[33]:
          0         1         2         3         4         5
a  0.031512  0.896280  0.000981  0.558886  1.574150  0.030435
b  0.774907  0.020968  0.575220  0.566894  1.326251  0.775521
c  1.437972  0.699240  1.064924  0.235661  1.841803  1.238480
A  1.756554  0.652186  1.149668  0.192652  2.202044  0.366539
B  0.575227  0.299196  0.120483  2.665255  0.432872  1.627597
C  0.481407  0.983928  1.270371  1.581129  1.568339  2.122324

In [34]: dict1 = {'a': 'one', 'A': 'one', 'b': 'two', 'B': 'two', 'c': 'three', 'C': 'three'}

In [35]: df.groupby(dict1).sum()
Out[35]:
              0         1         2         3         4         5
one    1.725042  0.244095  1.148687  0.751538  0.627894  0.396974
three  1.919380  1.683169  0.205448  1.345468  0.273464  0.883844
two    0.199680  0.320164  0.454738  3.232148  1.759122  2.403117

Grouping by function

In [37]: df = DataFrame(np.random.randn(4, 4))
    ...: df
Out[37]:
          0         1         2         3
0  0.803694  1.242886  0.393840  1.137829
1  1.048137  0.931402  0.262153  0.609839
2  0.135432  0.739250  1.685265  1.562063
3  0.863777  0.687589  1.901485  0.224359

In [38]: def jug(x):
    ...:     if x > 0:  # the comparison operator is missing in the source; "> 0" is assumed
    ...:         return 'a'
    ...:     else:
    ...:         return 'b'

In [41]: df[3].groupby(df[3].map(jug)).sum()
Out[41]:
3
a    2.171902
b    1.362188
Name: 3, dtype: float64
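The three kinds of grouping key shown above (a list, a dict and a function) can be compared side by side on deterministic data. This is a sketch only: the frame `letters`, the Series `values` and the helper `sign_label` are hypothetical names introduced here, and the sign test `x > 0` is an assumption, since the comparison operator in `jug` did not survive in the text.

```python
import numpy as np
import pandas as pd

# List key: one label per row, in row order.
df = pd.DataFrame(np.arange(16).reshape(4, 4))
by_list = df.groupby(["a", "b", "a", "b"]).sum()

# Dict key: index labels are translated through the mapping before grouping.
letters = pd.DataFrame(
    np.arange(12).reshape(6, 2),
    index=["a", "b", "c", "A", "B", "C"],
)
mapping = {"a": "one", "A": "one", "b": "two", "B": "two", "c": "three", "C": "three"}
by_dict = letters.groupby(mapping).sum()


def sign_label(x):
    """Hypothetical helper: label a number by its sign (assumed rule: x > 0)."""
    return "positive" if x > 0 else "non-positive"


# Function key: map the values to labels, then group by those labels.
values = pd.Series([-1.5, 2.0, 0.5, -0.25])
by_func = values.groupby(values.map(sign_label)).sum()

print(by_list)
print(by_dict)
print(by_func)
```

Using fixed numbers instead of `np.random` makes the sums easy to check by hand, for example row `one` of `by_dict` is the sum of rows `a` and `A`.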
In [42]: # with a hierarchical index you can group by level: pass the level number or name to the level parameter
    ...: df = DataFrame(np.arange(16).reshape(4, 4),
    ...:                index=[['one', 'one', 'two', 'two'], ['a', 'b', 'a', 'b']],
    ...:                columns=[['apple', 'apple', 'orange', 'orange'], ['red', 'green', 'red', 'green']])
    ...: df
Out[42]:
      apple       orange
        red green    red green
one a     0     1      2     3
    b     4     5      6     7
two a     8     9     10    11
    b    12    13     14    15

In [43]: df.groupby(level=1).sum()
Out[43]:
  apple       orange
    red green    red green
a     8    10     12    14
b    16    18     20    22

In [44]: df.groupby(level=1, axis=1).sum()  # group along the columns (axis=1)
Out[44]:
       green  red
one a      4    2
    b     12   10
two a     20   18
    b     28   26

Aggregation

Aggregation functions

In [47]: maxtip = tips.groupby('sex')['tip'].max()  # group by sex and take the largest tip
    ...: maxtip
Out[47]:
sex
Male      10.0
Female     6.5
Name: tip, dtype: float64

In [48]: maxtip.plot(kind='bar')
Out[48]: <matplotlib.axes._subplots.AxesSubplot at 0xcb046a0>

In [50]: df = DataFrame(np.arange(16).reshape(4, 4))
    ...: df
Out[50]:
    0   1   2   3
0   0   1   2   3
1   4   5   6   7
2   8   9  10  11
3  12  13  14  15

In [53]: list1 = ['a', 'b', 'a', 'b']
    ...: df.groupby(list1).quantile(0.5)  # quantile computes the given quantile of each group
Out[53]:
0.5    0    1     2     3
a    4.0  5.0   6.0   7.0
b    8.0  9.0  10.0  11.0

In [4]: def getrange(x):
   ...:     return x.max() - x.min()

In [5]: # agg is typically called after groupby() to aggregate each group; it accepts
   ...: # sum, min, max and other aggregation functions, including your own
   ...: tipsrange = tips.groupby('sex')['tip'].agg(getrange)
   ...: tipsrange
Out[5]:
sex
Male      9.0
Female    5.5
Name: tip, dtype: float64

In [6]: tipsrange.plot(kind='bar')
Out[6]: <matplotlib.axes._subplots.AxesSubplot at 0xb9cef60>

Applying multiple functions

In [13]: # passing a list of functions to agg applies all of them to one column
    ...: tips.groupby(['sex', 'smoker'])['tip'].agg(['mean', 'std', getrange])
Out[13]:
                   mean       std  getrange
sex    smoker
Male   Yes     3.051167  1.500120      9.00
       No      3.113402  1.489559      7.75
Female Yes     2.931515  1.219916      5.50
       No      2.773519  1.128425      4.20

In [15]: # to override the default column names, pass (name, function) tuples
    ...: tips.groupby(['sex', 'smoker'])['tip'].agg([('tipmean', 'mean'), ('Range', getrange)])
Out[15]:
                tipmean  Range
sex    smoker
Male   Yes     3.051167   9.00
       No      3.113402   7.75
Female Yes     2.931515   5.50
       No      2.773519   4.20

In [16]: # applying several functions to several columns produces hierarchical columns;
    ...: # the columns are selected with a list (double brackets), as current pandas requires
    ...: tips.groupby(['day', 'time'])[['total_bill', 'tip']].agg([('tipmean', 'mean'), ('Range', getrange)])
Out[16]:
            total_bill              tip
               tipmean  Range   tipmean Range
day  time
Thur Lunch   17.664754  35.60  2.767705  5.45
     Dinner  18.780000   0.00  3.000000  0.00
Fri  Lunch   12.845714   7.69  2.382857  1.90
     Dinner  19.663333  34.42  2.940000  3.73
Sat  Dinner  20.441379  47.74  2.993103  9.00
Sun  Dinner  21.410000  40.92  3.255132  5.49

In [17]: # to apply a different function to each column, pass a dict that maps column to function
    ...: tips.groupby(['day', 'time'])[['total_bill', 'tip']].agg({'total_bill': 'sum', 'tip': 'mean'})
Out[17]:
             total_bill       tip
day  time
Thur Lunch      1077.55  2.767705
     Dinner       18.78  3.000000
Fri  Lunch        89.92  2.382857
     Dinner      235.96  2.940000
Sat  Dinner     1778.40  2.993103
Sun  Dinner     1627.16  3.255132

In [18]: tips.groupby(['day', 'time'])[['total_bill', 'tip']].agg({'total_bill': ['sum', 'mean'], 'tip': 'mean'})
Out[18]:
            total_bill                  tip
                   sum       mean      mean
day  time
Thur Lunch     1077.55  17.664754  2.767705
     Dinner      18.78  18.780000  3.000000
Fri  Lunch       89.92  12.845714  2.382857
     Dinner     235.96  19.663333  2.940000
Sat  Dinner    1778.40  20.441379  2.993103
Sun  Dinner    1627.16  21.410000  3.255132

In [23]: # pass as_index=False if the group keys should not become the index of the result
    ...: noindex = tips.groupby(['sex', 'smoker'], as_index=False)['tip'].mean()
    ...: noindex
Out[23]:
      sex smoker       tip
0    Male    Yes  3.051167
1    Male     No  3.113402
2  Female    Yes  2.931515
3  Female     No  2.773519

In [24]: tips
Out[24]:
     total_bill   tip     sex smoker   day    time  size
0         16.99  1.01  Female     No   Sun  Dinner     2
1         10.34  1.66    Male     No   Sun  Dinner     3
2         21.01  3.50    Male     No   Sun  Dinner     3
3         23.68  3.31    Male     No   Sun  Dinner     2
4         24.59  3.61  Female     No   Sun  Dinner     4
..          ...   ...     ...    ...   ...     ...   ...
239       29.03  5.92    Male     No   Sat  Dinner     3
240       27.18  2.00  Female    Yes   Sat  Dinner     2
241       22.67  2.00    Male    Yes   Sat  Dinner     2
242       17.82  1.75    Male     No   Sat  Dinner     2
243       18.78  3.00  Female     No  Thur  Dinner     2

244 rows × 7 columns

Group-wise operations

The transform method

In [28]: df = DataFrame(tips.groupby('sex')['tip'].mean())
    ...: df
Out[28]:
             tip
sex
Male    3.089618
Female  2.833448

In [29]: # aggregate first, then merge the result back onto the original rows
    ...: newtips = pd.merge(tips, df, left_on='sex', right_index=True)
    ...: newtips.head()
Out[29]:
    total_bill  tip_x     sex smoker  day    time  size     tip_y
0        16.99   1.01  Female     No  Sun  Dinner     2  2.833448
4        24.59   3.61  Female     No  Sun  Dinner     4  2.833448
11       35.26   5.00  Female     No  Sun  Dinner     4  2.833448
14       14.83   3.02  Female     No  Sun  Dinner     2  2.833448
16       10.33   1.67  Female     No  Sun  Dinner     3  2.833448

In [32]: tips.groupby('sex')['tip'].transform('mean')  # transform broadcasts the group result back to every row
Out[32]:
0      2.833448
1      3.089618
2      3.089618
3      3.089618
4      2.833448
         ...
239    3.089618
240    2.833448
241    3.089618
242    3.089618
243    2.833448
Name: tip, Length: 244, dtype: float64

The apply method

In [10]: def top(x, n=5):
    ...:     # sort by tip in descending order and keep the last n rows, i.e. the n smallest tips
    ...:     return x.sort_values(by='tip', ascending=False)[-n:]

In [11]: tips.groupby('sex').apply(top)
Out[11]:
            total_bill   tip     sex smoker  day    time  size
sex
Male   43         9.68  1.32    Male     No  Sun  Dinner     2
       235       10.07  1.25    Male     No  Sat  Dinner     2
       75        10.51  1.25    Male     No  Sat  Dinner     2
       237       32.83  1.17    Male    Yes  Sat  Dinner     2
       236       12.60  1.00    Male    Yes  Sat  Dinner     2
Female 215       12.90  1.10  Female    Yes  Sat  Dinner     2
       0         16.99  1.01  Female     No  Sun  Dinner     2
       111        7.25  1.00  Female     No  Sat  Dinner     1
       67         3.07  1.00  Female    Yes  Sat  Dinner     1
       92         5.75  1.00  Female    Yes  Fri  Dinner     2

In [12]: # pass group_keys=False if the group keys should not be added to the index of the result
    ...: tips.groupby('sex', group_keys=False).apply(top)
Out[12]:
     total_bill   tip     sex smoker  day    time  size
43         9.68  1.32    Male     No  Sun  Dinner     2
235       10.07  1.25    Male     No  Sat  Dinner     2
75        10.51  1.25    Male     No  Sat  Dinner     2
237       32.83  1.17    Male    Yes  Sat  Dinner     2
236       12.60  1.00    Male    Yes  Sat  Dinner     2
215       12.90  1.10  Female    Yes  Sat  Dinner     2
0         16.99  1.01  Female     No  Sun  Dinner     2
111        7.25  1.00  Female     No  Sat  Dinner     1
67         3.07  1.00  Female    Yes  Sat  Dinner     1
92         5.75  1.00  Female    Yes  Fri  Dinner     2

In [18]: data = {
    ...:     'name': ['张三', '李四', 'peter', '王五', '小明', '小红'],
    ...:     'sex': ['female', 'female', 'male', 'male', 'male', 'female'],
    ...:     'math': [67, 72, np.nan, 82, 90, np.nan],
    ...: }
    ...: df = DataFrame(data)
    ...: df['math'] = df['math']
    ...: df
Out[18]:
   math   name     sex
0  67.0     张三  female
1  72.0     李四  female
2   NaN  peter    male
3  82.0     王五    male
4  90.0     小明    male
5   NaN     小红  female

In [19]: df.fillna(df['math'].mean())  # fill the missing values with the overall mean
Out[19]:
    math   name     sex
0  67.00     张三  female
1  72.00     李四  female
2  77.75  peter    male
3  82.00     王五    male
4  90.00     小明    male
5  77.75     小红  female

In [20]: # anonymous (lambda) function: after grouping, fill each group's gaps with that group's mean;
    ...: # numeric_only=True keeps the string columns out of the mean, as pandas 2.x requires
    ...: f = lambda x: x.fillna(x.mean(numeric_only=True))
    ...: df.groupby('sex').apply(f)
Out[20]:
          math   name     sex
sex
female 0  67.0     张三  female
       1  72.0     李四  female
       5  69.5     小红  female
male   2  86.0  peter    male
       3  82.0     王五    male
       4  90.0     小明    male

Pivot tables

Pivot table

In [25]: tips.pivot_table?  # show the help text for pivot_table

In [22]: # values is the column to aggregate, index supplies the rows and columns supplies
    ...: # the columns; the mean is computed by default
    ...: tips.pivot_table(values='tip', index='sex', columns='smoker')
Out[22]:
smoker       Yes        No
sex
Male    3.051167  3.113402
Female  2.931515  2.773519

In [23]: tips.pivot_table(values='tip', index='sex', columns='smoker', aggfunc='sum')  # aggfunc selects the aggregation
Out[23]:
smoker     Yes      No
sex
Male    183.07  302.00
Female   96.74  149.77

In [24]: tips.pivot_table(values='tip', index='sex', columns='smoker', aggfunc='sum', margins=True)  # margins adds subtotals
Out[24]:
smoker     Yes      No     All
sex
Male    183.07  302.00  485.07
Female   96.74  149.77  246.51
All     279.81  451.77  731.58

Cross-tabulation

A cross-tabulation is a special kind of pivot table that counts how often each combination of groups occurs.

In [33]: crosstable = pd.crosstab(index=tips['day'], columns=tips['size'])
    ...: crosstable
Out[33]:
size  1   2   3   4  5  6
day
Thur  1  48   4   5  1  3
Fri   1  16   1   1  0  0
Sat   2  53  18  13  1  0
Sun   0  39  15  18  3  1

In [36]: # dividing each row by its row total with div makes every row sum to 1 (relative frequencies)
    ...: df = crosstable.div(crosstable.sum(1), axis=0)
    ...: df
Out[36]:
size         1         2         3         4         5         6
day
Thur  0.016129  0.774194  0.064516  0.080645  0.016129  0.048387
Fri   0.052632  0.842105  0.052632  0.052632  0.000000  0.000000
Sat   0.022989  0.609195  0.206897  0.149425  0.011494  0.000000
Sun   0.000000  0.513158  0.197368  0.236842  0.039474  0.013158

In [37]: df.plot(kind='bar', stacked=True)  # stacked=True turns the bar chart into a stacked bar chart
Out[37]: <matplotlib.axes._subplots.AxesSubplot at 0xb9a6080>
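The row normalisation and the subtotals described above can be checked on a small table. The sketch below uses an invented frame `mini` (its values are not taken from the tips data) and compares the manual `div` approach with the `normalize='index'` option of `pd.crosstab`, which is an alternative the text does not use.

```python
import pandas as pd

# Hypothetical miniature table (values invented for illustration).
mini = pd.DataFrame({
    "day": ["Thur", "Thur", "Fri", "Fri", "Fri", "Sat"],
    "size": [2, 3, 2, 2, 4, 2],
    "tip": [2.0, 3.0, 1.5, 2.5, 4.0, 3.0],
})

# Counts of each (day, size) combination.
counts = pd.crosstab(index=mini["day"], columns=mini["size"])

# Manual normalisation: divide every row by its row total.
freq_div = counts.div(counts.sum(axis=1), axis=0)

# Built-in alternative: let crosstab normalise over each row.
freq_norm = pd.crosstab(index=mini["day"], columns=mini["size"], normalize="index")

# Pivot table of summed tips with row and column subtotals ("All").
totals = mini.pivot_table(
    values="tip", index="day", columns="size", aggfunc="sum", margins=True
)

print(freq_div)
print(totals)
```

Both normalisations give the same frequencies, so the choice between them is mostly a matter of whether the raw counts are also needed.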