
Data Aggregation and Grouping in Python

In [1]:
from pandas import Series, DataFrame
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib as mpl
import seaborn as sns    # import the seaborn library under the alias sns
# IPython magic: render plots inline, so the plt.show() step can be omitted
%matplotlib inline

In [2]:
pd.set_option("mode.chained_assignment", None)    # turn off the chained-assignment warning

1. Download the data from GitHub; this is the official example dataset repository: https://github.com/mwaskom/seaborn-data/
2. Find the local directory that load_dataset() reads from. get_data_home() returns that directory:
sns.utils.get_data_home()
It returns a path of the following form:
<your drive>:\Users\<your username>\seaborn-data, for example 'C:\Users\user1\seaborn-data'
3. Unzip the downloaded archive and copy its contents into that directory.

In [3]:
tips = sns.load_dataset("tips")   # load_dataset("tips") first looks for tips.csv in the local data directory
tips.head()
Out[3]:
   total_bill   tip     sex smoker  day    time  size
0       16.99  1.01  Female     No  Sun  Dinner     2
1       10.34  1.66    Male     No  Sun  Dinner     3
2       21.01  3.50    Male     No  Sun  Dinner     3
3       23.68  3.31    Male     No  Sun  Dinner     2
4       24.59  3.61  Female     No  Sun  Dinner     4

Data grouping

Grouping with groupby

In [4]:
grouped = tips["tip"].groupby(tips["sex"])
grouped  # grouped is a GroupBy object: it holds intermediate data, nothing is computed yet

Out[4]:

In [5]:
grouped.mean()  # calling mean on the GroupBy object returns the per-group result

Out[5]:
sex
Male      3.089618
Female    2.833448
Name: tip, dtype: float64

In [7]:
# several grouping keys: mean tip by day and time
date_mean = tips["tip"].groupby([tips["day"], tips["time"]]).mean()
date_mean

Out[7]:
day   time
Thur  Lunch     2.767705
      Dinner    3.000000
Fri   Lunch     2.382857
      Dinner    2.940000
Sat   Dinner    2.993103
Sun   Dinner    3.255132
Name: tip, dtype: float64

In [8]:
date_mean.plot(kind="barh")  # barh draws a horizontal bar chart

Out[8]:

In [9]:
tips.dtypes

Out[9]:
total_bill     float64
tip            float64
sex           category
smoker        category
day           category
time          category
size             int64
dtype: object

In [14]:
for name, group in tips.groupby(tips["sex"]):
    print(name)
    print(group)

Male
     total_bill   tip   sex smoker   day    time  size
1         10.34  1.66  Male     No   Sun  Dinner     3
2         21.01  3.50  Male     No   Sun  Dinner     3
3         23.68  3.31  Male     No   Sun  Dinner     2
5         25.29  4.71  Male     No   Sun  Dinner     4
6          8.77  2.00  Male     No   Sun  Dinner     2
7         26.88  3.12  Male     No   Sun  Dinner     4
8         15.04  1.96  Male     No   Sun  Dinner     2
9         14.78  3.23  Male     No   Sun  Dinner     2
10        10.27  1.71  Male     No   Sun  Dinner     2
12        15.42  1.57  Male     No   Sun  Dinner     2
13        18.43  3.00  Male     No   Sun  Dinner     4
15        21.58  3.92  Male     No   Sun  Dinner     2
17        16.29  3.71  Male     No   Sun  Dinner     3
19        20.65  3.35  Male     No   Sat  Dinner     3
20        17.92  4.08  Male     No   Sat  Dinner     2
23        39.42  7.58  Male     No   Sat  Dinner     4
24        19.82  3.18  Male     No   Sat  Dinner     2
25        17.81  2.34  Male     No   Sat  Dinner     4
26        13.37  2.00  Male     No   Sat  Dinner     2
27        12.69  2.00  Male     No   Sat  Dinner     2
28        21.70  4.30  Male     No   Sat  Dinner     2
30         9.55  1.45  Male     No   Sat  Dinner     2
31        18.35  2.50  Male     No   Sat  Dinner     4
34        17.78  3.27  Male     No   Sat  Dinner     2
35        24.06  3.60  Male     No   Sat  Dinner     3
36        16.31  2.00  Male     No   Sat  Dinner     3
38        18.69  2.31  Male     No   Sat  Dinner     3
39        31.27  5.00  Male     No   Sat  Dinner     3
40        16.04  2.24  Male     No   Sat  Dinner     3
41        17.46  2.54  Male     No   Sun  Dinner     2
..          ...   ...   ...    ...   ...     ...   ...
195        7.56  1.44  Male     No  Thur   Lunch     2
196       10.34  2.00  Male    Yes  Thur   Lunch     2
199       13.51  2.00  Male    Yes  Thur   Lunch     2
200       18.71  4.00  Male    Yes  Thur   Lunch     3
204       20.53  4.00  Male    Yes  Thur   Lunch     4
206       26.59  3.41  Male    Yes   Sat  Dinner     3
207       38.73  3.00  Male    Yes   Sat  Dinner     4
208       24.27  2.03  Male    Yes   Sat  Dinner     2
210       30.06  2.00  Male    Yes   Sat  Dinner     3
211       25.89  5.16  Male    Yes   Sat  Dinner     4
212       48.33  9.00  Male     No   Sat  Dinner     4
216       28.15  3.00  Male    Yes   Sat  Dinner     5
217       11.59  1.50  Male    Yes   Sat  Dinner     2
218        7.74  1.44  Male    Yes   Sat  Dinner     2
220       12.16  2.20  Male    Yes   Fri   Lunch     2
222        8.58  1.92  Male    Yes   Fri   Lunch     1
224       13.42  1.58  Male    Yes   Fri   Lunch     2
227       20.45  3.00  Male     No   Sat  Dinner     4
228       13.28  2.72  Male     No   Sat  Dinner     2
230       24.01  2.00  Male    Yes   Sat  Dinner     4
231       15.69  3.00  Male    Yes   Sat  Dinner     3
232       11.61  3.39  Male     No   Sat  Dinner     2
233       10.77  1.47  Male     No   Sat  Dinner     2
234       15.53  3.00  Male    Yes   Sat  Dinner     2
235       10.07  1.25  Male     No   Sat  Dinner     2
236       12.60  1.00  Male    Yes   Sat  Dinner     2
237       32.83  1.17  Male    Yes   Sat  Dinner     2
239       29.03  5.92  Male     No   Sat  Dinner     3
241       22.67  2.00  Male    Yes   Sat  Dinner     2
242       17.82  1.75  Male     No   Sat  Dinner     2

[157 rows x 7 columns]
Female
     total_bill   tip     sex smoker   day    time  size
0         16.99  1.01  Female     No   Sun  Dinner     2
4         24.59  3.61  Female     No   Sun  Dinner     4
11        35.26  5.00  Female     No   Sun  Dinner     4
14        14.83  3.02  Female     No   Sun  Dinner     2
16        10.33  1.67  Female     No   Sun  Dinner     3
18        16.97  3.50  Female     No   Sun  Dinner     3
21        20.29  2.75  Female     No   Sat  Dinner     2
22        15.77  2.23  Female     No   Sat  Dinner     2
29        19.65  3.00  Female     No   Sat  Dinner     2
32        15.06  3.00  Female     No   Sat  Dinner     2
33        20.69  2.45  Female     No   Sat  Dinner     4
37        16.93  3.07  Female     No   Sat  Dinner     3
51        10.29  2.60  Female     No   Sun  Dinner     2
52        34.81  5.20  Female     No   Sun  Dinner     4
57        26.41  1.50  Female     No   Sat  Dinner     2
66        16.45  2.47  Female     No   Sat  Dinner     2
67         3.07  1.00  Female    Yes   Sat  Dinner     1
71        17.07  3.00  Female     No   Sat  Dinner     3
72        26.86  3.14  Female    Yes   Sat  Dinner     2
73        25.28  5.00  Female    Yes   Sat  Dinner     2
74        14.73  2.20  Female     No   Sat  Dinner     2
82        10.07  1.83  Female     No  Thur   Lunch     1
85        34.83  5.17  Female     No  Thur   Lunch     4
92         5.75  1.00  Female    Yes   Fri  Dinner     2
93        16.32  4.30  Female    Yes   Fri  Dinner     2
94        22.75  3.25  Female     No   Fri  Dinner     2
100       11.35  2.50  Female    Yes   Fri  Dinner     2
101       15.38  3.00  Female    Yes   Fri  Dinner     2
102       44.30  2.50  Female    Yes   Sat  Dinner     3
103       22.42  3.48  Female    Yes   Sat  Dinner     2
..          ...   ...     ...    ...   ...     ...   ...
155       29.85  5.14  Female     No   Sun  Dinner     5
157       25.00  3.75  Female     No   Sun  Dinner     4
158       13.39  2.61  Female     No   Sun  Dinner     2
162       16.21  2.00  Female     No   Sun  Dinner     3
164       17.51  3.00  Female    Yes   Sun  Dinner     2
168       10.59  1.61  Female    Yes   Sat  Dinner     2
169       10.63  2.00  Female    Yes   Sat  Dinner     2
178        9.60  4.00  Female    Yes   Sun  Dinner     2
186       20.90  3.50  Female    Yes   Sun  Dinner     3
188       18.15  3.50  Female    Yes   Sun  Dinner     3
191       19.81  4.19  Female    Yes  Thur   Lunch     2
197       43.11  5.00  Female    Yes  Thur   Lunch     4
198       13.00  2.00  Female    Yes  Thur   Lunch     2
201       12.74  2.01  Female    Yes  Thur   Lunch     2
202       13.00  2.00  Female    Yes  Thur   Lunch     2
203       16.40  2.50  Female    Yes  Thur   Lunch     2
205       16.47  3.23  Female    Yes  Thur   Lunch     3
209       12.76  2.23  Female    Yes   Sat  Dinner     2
213       13.27  2.50  Female    Yes   Sat  Dinner     2
214       28.17  6.50  Female    Yes   Sat  Dinner     3
215       12.90  1.10  Female    Yes   Sat  Dinner     2
219       30.14  3.09  Female    Yes   Sat  Dinner     4
221       13.42  3.48  Female    Yes   Fri   Lunch     2
223       15.98  3.00  Female     No   Fri   Lunch     3
225       16.27  2.50  Female    Yes   Fri   Lunch     2
226       10.09  2.00  Female    Yes   Fri   Lunch     2
229       22.12  2.88  Female    Yes   Sat  Dinner     2
238       35.83  4.67  Female     No   Sat  Dinner     3
240       27.18  2.00  Female    Yes   Sat  Dinner     2
243       18.78  3.00  Female     No  Thur  Dinner     2

[87 rows x 7 columns]

In [15]:
tips.groupby(tips["sex"]).size()  # size returns the number of rows in each group

Out[15]:
sex
Male      157
Female     87
dtype: int64

In [16]:
tips.groupby(tips["sex"]).count()

Out[16]:
        total_bill  tip  smoker  day  time  size
sex
Male           157  157     157  157   157   157
Female          87   87      87   87    87    87

Grouping by column name

In [19]:
# numeric_only=True restricts the mean to numeric columns; recent pandas raises on the category columns otherwise
smoker_mean = tips.groupby("smoker").mean(numeric_only=True)
smoker_mean

Out[19]:
        total_bill       tip      size
smoker
Yes      20.756344  3.008710  2.408602
No       19.188278  2.991854  2.668874

In [21]:
smoker_mean["tip"].plot(kind="bar")

Out[21]:

In [24]:
size_mean1 = tips["tip"].groupby(tips["size"]).mean()
size_mean1

Out[24]:
size
1    1.437500
2    2.582308
3    3.393158
4    4.135405
5    4.028000
6    5.225000
Name: tip, dtype: float64

In [25]:
size_mean2 = tips.groupby("size")["tip"].mean()  # syntactic sugar for the form above
size_mean2

Out[25]:
size
1    1.437500
2    2.582308
3    3.393158
4    4.135405
5    4.028000
6    5.225000
Name: tip, dtype: float64

In [27]:
size_mean2.plot()

Out[27]:

In [29]:
df = DataFrame(np.arange(16).reshape(4, 4))
df

Out[29]:
    0   1   2   3
0   0   1   2   3
1   4   5   6   7
2   8   9  10  11
3  12  13  14  15

Grouping by a list or tuple

In [30]:
list1 = ["a", "b", "a", "b"]

In [32]:
df.groupby(list1).sum()

Out[32]:
    0   1   2   3
a   8  10  12  14
b  16  18  20  22

Grouping by a dict

In [33]:
df = DataFrame(np.random.normal(size=(6, 6)), index=["a", "b", "c", "A", "B", "C"])
df

Out[33]:
          0         1         2         3         4         5
a  0.031512 -0.896280 -0.000981  0.558886 -1.574150  0.030435
b  0.774907  0.020968  0.575220 -0.566894  1.326251  0.775521
c  1.437972 -0.699240 -1.064924  0.235661  1.841803  1.238480
A -1.756554  0.652186  1.149668  0.192652  2.202044  0.366539
B -0.575227  0.299196 -0.120483 -2.665255  0.432872  1.627597
C  0.481407 -0.983928  1.270371 -1.581129 -1.568339 -2.122324

In [34]:
dict1 = {
    "a": "one", "A": "one",
    "b": "two", "B": "two",
    "c": "three", "C": "three"
}

In [35]:
df.groupby(dict1).sum()

Out[35]:
              0         1         2         3         4         5
one   -1.725042 -0.244095  1.148687  0.751538  0.627894  0.396974
three  1.919380 -1.683169  0.205448 -1.345468  0.273464 -0.883844
two    0.199680  0.320164  0.454738 -3.232148  1.759122  2.403117

Grouping by a function

In [37]:
df = DataFrame(np.random.randn(4, 4))
df

Out[37]:
          0         1         2         3
0  0.803694 -1.242886  0.393840 -1.137829
1  1.048137 -0.931402 -0.262153  0.609839
2  0.135432  0.739250 -1.685265  1.562063
3 -0.863777 -0.687589  1.901485 -0.224359

In [38]:
def jug(x):
    if x >= 0:
        return "a"
    else:
        return "b"

In [41]:
df[3].groupby(df[3].map(jug)).sum()

Out[41]:
3
a    2.171902
b   -1.362188
Name: 3, dtype: float64

In [42]:
# With a hierarchical index you can group by level: pass the level number or name via the level parameter
df = DataFrame(np.arange(16).reshape(4, 4),
               index=[["one", "one", "two", "two"], ["a", "b", "a", "b"]],
               columns=[["apple", "apple", "orange", "orange"], ["red", "green", "red", "green"]])
df

Out[42]:
      apple       orange
        red green    red green
one a     0     1      2     3
    b     4     5      6     7
two a     8     9     10    11
    b    12    13     14    15
In [43]:
df.groupby(level=1).sum()

Out[43]:
  apple       orange
    red green    red green
a     8    10     12    14
b    16    18     20    22
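The level parameter accepts a level name as well as a number. A minimal sketch, assuming the index levels are given the hypothetical names "outer" and "inner" (the frame above leaves them unnamed):

```python
import numpy as np
import pandas as pd

# Same row labels as the frame above, plus hypothetical level names
# ("outer", "inner") so that a level can be referenced by name.
row_index = pd.MultiIndex.from_arrays(
    [["one", "one", "two", "two"], ["a", "b", "a", "b"]],
    names=["outer", "inner"],
)
frame = pd.DataFrame(np.arange(16).reshape(4, 4), index=row_index)

by_number = frame.groupby(level=1).sum()        # level given by position
by_name = frame.groupby(level="inner").sum()    # same level given by name
print(by_name)
```

Both calls group on the second index level, so they should produce the same table; naming the levels mainly makes the intent easier to read.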
In [44]:
# group along the columns (axis=1)
# groupby(axis=1) is deprecated since pandas 2.1; df.T.groupby(level=1).sum().T gives the same result
df.groupby(level=1, axis=1).sum()

Out[44]:
       green  red
one a      4    2
    b     12   10
two a     20   18
    b     28   26

Aggregation

Aggregation functions

In [47]:
max_tip = tips.groupby("sex")["tip"].max()  # group by sex and take the largest tip
max_tip

Out[47]:
sex
Male      10.0
Female     6.5
Name: tip, dtype: float64

In [48]:
max_tip.plot(kind="bar")

Out[48]:

In [50]:
df = DataFrame(np.arange(16).reshape(4, 4))
df

Out[50]:
    0   1   2   3
0   0   1   2   3
1   4   5   6   7
2   8   9  10  11
3  12  13  14  15
In [53]:
list1 = ["a", "b", "a", "b"]
df.groupby(list1).quantile(0.5)  # quantile computes the requested quantile (0.5 is the median)

Out[53]:
0.5    0    1     2     3
a    4.0  5.0   6.0   7.0
b    8.0  9.0  10.0  11.0

In [4]:
def get_range(x):
    return x.max() - x.min()

In [5]:
# agg is typically called after groupby() to aggregate the data with sum, min, max, or other aggregation functions
tips_range = tips.groupby("sex")["tip"].agg(get_range)
tips_range

Out[5]:
sex
Male      9.0
Female    5.5
Name: tip, dtype: float64

In [6]:
tips_range.plot(kind="bar")

Out[6]:

Applying multiple functions

In [13]:
# passing a list of functions to agg applies all of them to the one column
tips.groupby(["sex", "smoker"])["tip"].agg(["mean", "std", get_range])

Out[13]:
                   mean       std  get_range
sex    smoker
Male   Yes     3.051167  1.500120       9.00
       No      3.113402  1.489559       7.75
Female Yes     2.931515  1.219916       5.50
       No      2.773519  1.128425       4.20

In [15]:
# to override the default column names, pass (name, function) tuples: the name first, then the aggregation function
tips.groupby(["sex", "smoker"])["tip"].agg([("tip_mean", "mean"), ("Range", get_range)])

Out[15]:
               tip_mean  Range
sex    smoker
Male   Yes     3.051167   9.00
       No      3.113402   7.75
Female Yes     2.931515   5.50
       No      2.773519   4.20

In [16]:
# applying several aggregation functions to several columns produces hierarchical columns
# select several columns with a list (double brackets); the tuple form ["total_bill","tip"] was removed in pandas 2.0
tips.groupby(["day", "time"])[["total_bill", "tip"]].agg([("tip_mean", "mean"), ("Range", get_range)])

Out[16]:
             total_bill          tip
               tip_mean  Range  tip_mean  Range
day  time
Thur Lunch    17.664754  35.60  2.767705   5.45
     Dinner   18.780000   0.00  3.000000   0.00
Fri  Lunch    12.845714   7.69  2.382857   1.90
     Dinner   19.663333  34.42  2.940000   3.73
Sat  Dinner   20.441379  47.74  2.993103   9.00
Sun  Dinner   21.410000  40.92  3.255132   5.49

In [17]:
# to apply different functions to different columns, pass a dict that maps each column to its function
tips.groupby(["day", "time"])[["total_bill", "tip"]].agg({"total_bill": "sum", "tip": "mean"})

Out[17]:
             total_bill       tip
day  time
Thur Lunch      1077.55  2.767705
     Dinner       18.78  3.000000
Fri  Lunch        89.92  2.382857
     Dinner      235.96  2.940000
Sat  Dinner     1778.40  2.993103
Sun  Dinner     1627.16  3.255132

In [18]:
tips.groupby(["day", "time"])[["total_bill", "tip"]].agg({"total_bill": ["sum", "mean"], "tip": "mean"})

Out[18]:
            total_bill                  tip
                   sum       mean      mean
day  time
Thur Lunch     1077.55  17.664754  2.767705
     Dinner      18.78  18.780000  3.000000
Fri  Lunch       89.92  12.845714  2.382857
     Dinner     235.96  19.663333  2.940000
Sat  Dinner    1778.40  20.441379  2.993103
Sun  Dinner    1627.16  21.410000  3.255132

In [23]:
# as_index=False returns the grouping keys as ordinary columns instead of the index
no_index = tips.groupby(["sex", "smoker"], as_index=False)["tip"].mean()
no_index

Out[23]:
      sex smoker       tip
0    Male    Yes  3.051167
1    Male     No  3.113402
2  Female    Yes  2.931515
3  Female     No  2.773519

In [24]:
tips

Out[24]:
     total_bill   tip     sex smoker   day    time  size
0         16.99  1.01  Female     No   Sun  Dinner     2
1         10.34  1.66    Male     No   Sun  Dinner     3
2         21.01  3.50    Male     No   Sun  Dinner     3
3         23.68  3.31    Male     No   Sun  Dinner     2
4         24.59  3.61  Female     No   Sun  Dinner     4
5         25.29  4.71    Male     No   Sun  Dinner     4
6          8.77  2.00    Male     No   Sun  Dinner     2
7         26.88  3.12    Male     No   Sun  Dinner     4
8         15.04  1.96    Male     No   Sun  Dinner     2
9         14.78  3.23    Male     No   Sun  Dinner     2
10        10.27  1.71    Male     No   Sun  Dinner     2
11        35.26  5.00  Female     No   Sun  Dinner     4
12        15.42  1.57    Male     No   Sun  Dinner     2
13        18.43  3.00    Male     No   Sun  Dinner     4
14        14.83  3.02  Female     No   Sun  Dinner     2
15        21.58  3.92    Male     No   Sun  Dinner     2
16        10.33  1.67  Female     No   Sun  Dinner     3
17        16.29  3.71    Male     No   Sun  Dinner     3
18        16.97  3.50  Female     No   Sun  Dinner     3
19        20.65  3.35    Male     No   Sat  Dinner     3
20        17.92  4.08    Male     No   Sat  Dinner     2
21        20.29  2.75  Female     No   Sat  Dinner     2
22        15.77  2.23  Female     No   Sat  Dinner     2
23        39.42  7.58    Male     No   Sat  Dinner     4
24        19.82  3.18    Male     No   Sat  Dinner     2
25        17.81  2.34    Male     No   Sat  Dinner     4
26        13.37  2.00    Male     No   Sat  Dinner     2
27        12.69  2.00    Male     No   Sat  Dinner     2
28        21.70  4.30    Male     No   Sat  Dinner     2
29        19.65  3.00  Female     No   Sat  Dinner     2
..          ...   ...     ...    ...   ...     ...   ...
214       28.17  6.50  Female    Yes   Sat  Dinner     3
215       12.90  1.10  Female    Yes   Sat  Dinner     2
216       28.15  3.00    Male    Yes   Sat  Dinner     5
217       11.59  1.50    Male    Yes   Sat  Dinner     2
218        7.74  1.44    Male    Yes   Sat  Dinner     2
219       30.14  3.09  Female    Yes   Sat  Dinner     4
220       12.16  2.20    Male    Yes   Fri   Lunch     2
221       13.42  3.48  Female    Yes   Fri   Lunch     2
222        8.58  1.92    Male    Yes   Fri   Lunch     1
223       15.98  3.00  Female     No   Fri   Lunch     3
224       13.42  1.58    Male    Yes   Fri   Lunch     2
225       16.27  2.50  Female    Yes   Fri   Lunch     2
226       10.09  2.00  Female    Yes   Fri   Lunch     2
227       20.45  3.00    Male     No   Sat  Dinner     4
228       13.28  2.72    Male     No   Sat  Dinner     2
229       22.12  2.88  Female    Yes   Sat  Dinner     2
230       24.01  2.00    Male    Yes   Sat  Dinner     4
231       15.69  3.00    Male    Yes   Sat  Dinner     3
232       11.61  3.39    Male     No   Sat  Dinner     2
233       10.77  1.47    Male     No   Sat  Dinner     2
234       15.53  3.00    Male    Yes   Sat  Dinner     2
235       10.07  1.25    Male     No   Sat  Dinner     2
236       12.60  1.00    Male    Yes   Sat  Dinner     2
237       32.83  1.17    Male    Yes   Sat  Dinner     2
238       35.83  4.67  Female     No   Sat  Dinner     3
239       29.03  5.92    Male     No   Sat  Dinner     3
240       27.18  2.00  Female    Yes   Sat  Dinner     2
241       22.67  2.00    Male    Yes   Sat  Dinner     2
242       17.82  1.75    Male     No   Sat  Dinner     2
243       18.78  3.00  Female     No  Thur  Dinner     2

244 rows × 7 columns

Group-wise operations

The transform method

In [28]:
df = DataFrame(tips.groupby("sex")["tip"].mean())
df

Out[28]:
             tip
sex
Male    3.089618
Female  2.833448

In [29]:
# aggregate first, then merge the result back onto the original rows
new_tips = pd.merge(tips, df, left_on="sex", right_index=True)
new_tips.head()

Out[29]:
    total_bill  tip_x     sex smoker  day    time  size     tip_y
0        16.99   1.01  Female     No  Sun  Dinner     2  2.833448
4        24.59   3.61  Female     No  Sun  Dinner     4  2.833448
11       35.26   5.00  Female     No  Sun  Dinner     4  2.833448
14       14.83   3.02  Female     No  Sun  Dinner     2  2.833448
16       10.33   1.67  Female     No  Sun  Dinner     3  2.833448

In [32]:
tips.groupby("sex")["tip"].transform("mean")  # transform broadcasts the group result back to every row

Out[32]:
0      2.833448
1      3.089618
2      3.089618
3      3.089618
4      2.833448
5      3.089618
6      3.089618
7      3.089618
8      3.089618
9      3.089618
10     3.089618
11     2.833448
12     3.089618
13     3.089618
14     2.833448
15     3.089618
16     2.833448
17     3.089618
18     2.833448
19     3.089618
20     3.089618
21     2.833448
22     2.833448
23     3.089618
24     3.089618
25     3.089618
26     3.089618
27     3.089618
28     3.089618
29     2.833448
         ...
214    2.833448
215    2.833448
216    3.089618
217    3.089618
218    3.089618
219    2.833448
220    3.089618
221    2.833448
222    3.089618
223    2.833448
224    3.089618
225    2.833448
226    2.833448
227    3.089618
228    3.089618
229    2.833448
230    3.089618
231    3.089618
232    3.089618
233    3.089618
234    3.089618
235    3.089618
236    3.089618
237    3.089618
238    2.833448
239    3.089618
240    2.833448
241    3.089618
242    3.089618
243    2.833448
Name: tip, Length: 244, dtype: float64

The apply method

In [10]:
def top(x, n=5):
    # rows are sorted by tip in descending order, so the last n rows are the n smallest tips
    return x.sort_values(by="tip", ascending=False)[-n:]

In [11]:
tips.groupby("sex").apply(top)

Out[11]:
            total_bill   tip     sex smoker  day    time  size
sex
Male   43         9.68  1.32    Male     No  Sun  Dinner     2
       235       10.07  1.25    Male     No  Sat  Dinner     2
       75        10.51  1.25    Male     No  Sat  Dinner     2
       237       32.83  1.17    Male    Yes  Sat  Dinner     2
       236       12.60  1.00    Male    Yes  Sat  Dinner     2
Female 215       12.90  1.10  Female    Yes  Sat  Dinner     2
       0         16.99  1.01  Female     No  Sun  Dinner     2
       111        7.25  1.00  Female     No  Sat  Dinner     1
       67         3.07  1.00  Female    Yes  Sat  Dinner     1
       92         5.75  1.00  Female    Yes  Fri  Dinner     2
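Because top sorts in descending order and then slices [-n:], it returns the rows with the smallest tips in each group. If the largest tips per group are wanted instead, one possible sketch is shown below. It uses a small made-up frame (sample) rather than the tips data, and the helper name largest_tips is hypothetical:

```python
import pandas as pd

# Small made-up stand-in for the tips data (values are illustrative only).
sample = pd.DataFrame({
    "sex": ["Male", "Male", "Male", "Female", "Female", "Female"],
    "tip": [1.0, 5.0, 3.0, 2.0, 4.0, 6.0],
})

def largest_tips(frame, n=2, key="sex"):
    # Sort once in descending order, then keep the first n rows of each group.
    ordered = frame.sort_values(by="tip", ascending=False)
    return ordered.groupby(key).head(n)

top_rows = largest_tips(sample)
print(top_rows)

# For a single column, SeriesGroupBy.nlargest gives the same values directly.
top_values = sample.groupby("sex")["tip"].nlargest(2)
print(top_values)
```

The first form keeps whole rows, and the second returns only the tip values, indexed by group and original row label.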
In [12]:
# group_keys=False keeps the grouping key out of the index of the result
tips.groupby("sex", group_keys=False).apply(top)

Out[12]:
     total_bill   tip     sex smoker  day    time  size
43         9.68  1.32    Male     No  Sun  Dinner     2
235       10.07  1.25    Male     No  Sat  Dinner     2
75        10.51  1.25    Male     No  Sat  Dinner     2
237       32.83  1.17    Male    Yes  Sat  Dinner     2
236       12.60  1.00    Male    Yes  Sat  Dinner     2
215       12.90  1.10  Female    Yes  Sat  Dinner     2
0         16.99  1.01  Female     No  Sun  Dinner     2
111        7.25  1.00  Female     No  Sat  Dinner     1
67         3.07  1.00  Female    Yes  Sat  Dinner     1
92         5.75  1.00  Female    Yes  Fri  Dinner     2
In [18]:
data = {
    "name": ["张三", "李四", "peter", "王五", "小明", "小红"],
    "sex": ["female", "female", "male", "male", "male", "female"],
    "math": [67, 72, np.nan, 82, 90, np.nan]
}
df = DataFrame(data)
df

Out[18]:
   math   name     sex
0  67.0     张三  female
1  72.0     李四  female
2   NaN  peter    male
3  82.0     王五    male
4  90.0     小明    male
5   NaN     小红  female

In [19]:
df.fillna(df["math"].mean())  # fill missing values with the overall mean

Out[19]:
    math   name     sex
0  67.00     张三  female
1  72.00     李四  female
2  77.75  peter    male
3  82.00     王五    male
4  90.00     小明    male
5  77.75     小红  female

In [20]:
# anonymous function: after grouping, fill each group's missing values with that group's mean
# numeric_only=True skips the string columns, which recent pandas cannot average
f = lambda x: x.fillna(x.mean(numeric_only=True))
df.groupby("sex").apply(f)

Out[20]:
          math   name     sex
sex
female 0  67.0     张三  female
       1  72.0     李四  female
       5  69.5     小红  female
male   2  86.0  peter    male
       3  82.0     王五    male
       4  90.0     小明    male

Pivot tables

Pivot table

In [25]:
tips.pivot_table?  # show the help text for pivot_table

In [22]:
# values is the data to aggregate, index gives the rows, columns gives the columns
# the default aggregation is the mean
tips.pivot_table(values="tip", index="sex", columns="smoker")

Out[22]:
smoker       Yes        No
sex
Male    3.051167  3.113402
Female  2.931515  2.773519

In [23]:
tips.pivot_table(values="tip", index="sex", columns="smoker", aggfunc="sum")  # aggfunc selects the aggregation

Out[23]:
smoker     Yes      No
sex
Male    183.07  302.00
Female   96.74  149.77

In [24]:
# margins=True adds subtotals (the All row and column)
tips.pivot_table(values="tip", index="sex", columns="smoker", aggfunc="sum", margins=True)

Out[24]:
smoker     Yes      No     All
sex
Male    183.07  302.00  485.07
Female   96.74  149.77  246.51
All     279.81  451.77  731.58

Cross-tabulation

A crosstab is a special pivot table used to compute group frequencies.

In [33]:
cross_table = pd.crosstab(index=tips["day"], columns=tips["size"])
cross_table

Out[33]:
size  1   2   3   4  5  6
day
Thur  1  48   4   5  1  3
Fri   1  16   1   1  0  0
Sat   2  53  18  13  1  0
Sun   0  39  15  18  3  1
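The counts that pd.crosstab returns can also be reproduced with groupby, which shows what the crosstab computes. A minimal sketch on a small made-up frame (sample is illustrative, not the tips data):

```python
import pandas as pd

# Small made-up frame with the same two columns used in the crosstab above.
sample = pd.DataFrame({
    "day": ["Thur", "Thur", "Fri", "Fri", "Fri"],
    "size": [2, 3, 2, 2, 4],
})

# Frequency table via crosstab.
counts_crosstab = pd.crosstab(index=sample["day"], columns=sample["size"])

# The same table built by hand: count the rows per (day, size) pair,
# then pivot the inner key into columns, filling absent pairs with 0.
counts_groupby = sample.groupby(["day", "size"]).size().unstack(fill_value=0)

print(counts_crosstab)
print(counts_groupby)
```

Both tables should hold the same counts; crosstab is the shorter spelling when only frequencies are needed.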
In [36]:
# div divides each row by its row total, so every row sums to 1 (row-wise frequencies)
df = cross_table.div(cross_table.sum(1), axis=0)
df

Out[36]:
size         1         2         3         4         5         6
day
Thur  0.016129  0.774194  0.064516  0.080645  0.016129  0.048387
Fri   0.052632  0.842105  0.052632  0.052632  0.000000  0.000000
Sat   0.022989  0.609195  0.206897  0.149425  0.011494  0.000000
Sun   0.000000  0.513158  0.197368  0.236842  0.039474  0.013158

In [37]:
df.plot(kind="bar", stacked=True)  # stacked=True turns the bar chart into a stacked bar chart

Out[37]:
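Row-wise frequencies can also be requested from pd.crosstab directly through its normalize parameter. A minimal sketch on a small made-up frame (sample is illustrative, not the tips data), comparing the two routes:

```python
import pandas as pd

# Small made-up frame (illustrative values only).
sample = pd.DataFrame({
    "day": ["Thur", "Thur", "Fri", "Fri", "Fri"],
    "size": [2, 3, 2, 2, 4],
})

counts = pd.crosstab(sample["day"], sample["size"])

# Manual route: divide every row by its row total.
manual = counts.div(counts.sum(axis=1), axis=0)

# Built-in route: normalize="index" normalizes over each row.
builtin = pd.crosstab(sample["day"], sample["size"], normalize="index")

print(manual)
print(builtin)
```

With normalize="index" each row sums to 1, the same as dividing by the row totals; normalize="columns" and normalize="all" normalize over columns or over the whole table.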
