Quick start with pandas
Creating objects
Create a Series from a list with pd.Series():
import pandas as pd
import numpy as np
s = pd.Series([1, 3, 5, np.nan, 6, 8])  # the index defaults to integers starting at 0
print(s)
Output:
0    1.0
1    3.0
2    5.0
3    NaN
4    6.0
5    8.0
dtype: float64
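The float64 dtype in the output above comes from the np.nan entry: NaN is a floating-point value, so the integers are stored as floats. The following is a minimal sketch of that behavior and of passing explicit labels instead of the default integer index. The names s_default and s_labeled and the labels "a" to "f" are hypothetical and chosen only for illustration.

```python
import numpy as np
import pandas as pd

# Same data as in the text; the missing value forces a float64 dtype.
s_default = pd.Series([1, 3, 5, np.nan, 6, 8])
print(s_default.dtype)

# Hypothetical labels, used here only to show a non-default index.
s_labeled = pd.Series([1, 3, 5, np.nan, 6, 8], index=list("abcdef"))
print(s_labeled["c"])          # look up a value by its label
print(s_labeled.isna().sum())  # count the missing values
```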
Create a time-series index with pd.date_range():
import pandas as pd
dates = pd.date_range("20220313", periods=6)  # six consecutive days, usable as row labels
print(dates)
Output:
DatetimeIndex(['2022-03-13', '2022-03-14', '2022-03-15', '2022-03-16',
               '2022-03-17', '2022-03-18'],
              dtype='datetime64[ns]', freq='D')
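The same range can be described in more than one way. As a rough sketch, the example below builds the six days once from a start date plus a period count and once from a start and an end date, then checks that the two results match. The freq="2D" call is an extra illustration that is not in the original text, and the variable names are hypothetical.

```python
import pandas as pd

# Six consecutive days, given as a start date and a number of periods (as in the text).
by_periods = pd.date_range("2022-03-13", periods=6)

# The same range, given as a start date and an end date instead.
by_end = pd.date_range(start="2022-03-13", end="2022-03-18")
print(by_periods.equals(by_end))

# A different frequency: one timestamp every two days (illustrative choice).
every_other_day = pd.date_range("2022-03-13", periods=3, freq="2D")
print(every_other_day)
```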
Create a DataFrame with pd.DataFrame():
import pandas as pd
import numpy as np
# Three arguments: a 2-D array of values, dates as row labels, and A, B, C, D as column labels.
# The values are random, so every run prints different numbers.
df = pd.DataFrame(np.random.randn(6, 4), index=pd.date_range("20220313", periods=6), columns=list("ABCD"))
print(df)
Output:
                   A         B         C         D
2022-03-13  0.091190 -0.004746 -0.239654 -0.884793
2022-03-14 -0.600911 -0.993399  1.748447 -1.762878
2022-03-15  1.002787 -1.243758  1.828056 -1.330431
2022-03-16  1.615258  0.074866  0.539540 -0.704944
2022-03-17 -1.337127  0.842423  0.496022  0.106319
2022-03-18 -1.612054 -1.595551 -0.797913  0.197910
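Because np.random.randn draws new numbers on every call, the table above cannot be reproduced exactly. One possible way to get repeatable values is a seeded NumPy generator, sketched below. The seed 0 and the names rng, df_seeded, and df_again are arbitrary choices for this illustration, not part of the original example.

```python
import numpy as np
import pandas as pd

# A seeded generator yields the same "random" values on every run; the seed is arbitrary.
rng = np.random.default_rng(0)
df_seeded = pd.DataFrame(
    rng.standard_normal((6, 4)),
    index=pd.date_range("20220313", periods=6),
    columns=list("ABCD"),
)

# Rebuilding with the same seed gives an identical DataFrame.
df_again = pd.DataFrame(
    np.random.default_rng(0).standard_normal((6, 4)),
    index=pd.date_range("20220313", periods=6),
    columns=list("ABCD"),
)
print(df_seeded.shape)
print(df_seeded.equals(df_again))
```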
Create a DataFrame from a dictionary (dict):
import pandas as pd
import numpy as np
# Each key becomes a column; scalar values are repeated to match the four rows.
df = pd.DataFrame({
    "A": 1.0,
    "B": pd.Timestamp("20130102"),
    "C": pd.Series(1, index=list(range(4)), dtype="float32"),
    "D": np.array([3] * 4, dtype="int32"),
    "E": pd.Categorical(["test", "trail", "test", "trail"]),
    "F": "foo"
})
print(df)
Output:
     A          B    C  D      E    F
0  1.0 2013-01-02  1.0  3   test  foo
1  1.0 2013-01-02  1.0  3  trail  foo
2  1.0 2013-01-02  1.0  3   test  foo
3  1.0 2013-01-02  1.0  3  trail  foo
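In the dictionary example, columns A, B, and F are given as single values yet appear on all four rows, because the length is taken from the array-like columns. A smaller sketch of the same behavior follows. The column names "x", "y", "z" and the variable names are hypothetical.

```python
import pandas as pd

# "x" is a scalar and "y" has three elements, so the scalar is repeated on every row.
df_small = pd.DataFrame({"x": 1.0, "y": [10, 20, 30]})
print(df_small)

# When every value is a scalar, an explicit index is needed to fix the number of rows.
df_scalars = pd.DataFrame({"x": 1.0, "z": "foo"}, index=range(2))
print(df_scalars)
```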
Check the data type of each column with df.dtypes:
import pandas as pd
import numpy as np
df = pd.DataFrame({
    "A": 1.0,
    "B": pd.Timestamp("20130102"),
    "C": pd.Series(1, index=list(range(4)), dtype="float32"),
    "D": np.array([3] * 4, dtype="int32"),
    "E": pd.Categorical(["test", "trail", "test", "trail"]),
    "F": "foo"
})
# Print the data type of every column
print(df.dtypes)
Output:
A           float64
B    datetime64[ns]
C           float32
D             int32
E          category
F            object
dtype: object
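The result of df.dtypes is itself a Series indexed by column name, which is why the listing ends with its own "dtype: object" line. The sketch below, under the assumption that only columns C and D are needed, looks up a single entry and converts a column with astype. The names df_types and d_as_int64 are hypothetical.

```python
import numpy as np
import pandas as pd

df_types = pd.DataFrame({
    "C": pd.Series(1, index=list(range(4)), dtype="float32"),
    "D": np.array([3] * 4, dtype="int32"),
})

# df.dtypes is a Series keyed by column name, so a single entry can be looked up.
print(df_types.dtypes["D"])

# astype returns a converted copy; the original column keeps its dtype.
d_as_int64 = df_types["D"].astype("int64")
print(d_as_int64.dtype)
print(df_types["D"].dtype)
```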
Viewing data
View the first five rows with df.head():
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randn(8, 6), index=pd.date_range("20220319", periods=8), columns=list("ABCDEF"))
# Print the first five rows
print(df.head())
Output:
                   A         B         C         D         E         F
2022-03-19  2.078983 -1.381038  0.829121 -0.140437 -2.208191 -1.693364
2022-03-20  1.894550  0.081164 -0.031612  0.034612 -0.100628  0.134954
2022-03-21  0.464486 -0.119990 -0.825946  1.712146 -0.909462 -0.378363
2022-03-22 -0.160688 -0.535295  1.126277 -0.523524  0.673785 -1.012944
2022-03-23  1.419877  0.994878 -1.384770 -0.242714  0.705876 -0.960672
View the last five rows with df.tail():
import pandas as pd
import numpy as np
# The data is regenerated randomly, so the values differ from any earlier run.
df = pd.DataFrame(np.random.randn(8, 6), index=pd.date_range("20220319", periods=8), columns=list("ABCDEF"))
# Print the last five rows
print(df.tail())
Output:
                   A         B         C         D         E         F
2022-03-22 -1.932294  1.902191  0.826973 -0.857617 -0.534688 -1.663434
2022-03-23  0.534020  0.264641 -0.308872 -1.244114 -0.428450  1.123675
2022-03-24 -0.862361 -0.169321 -0.094426 -0.636117  1.344678 -0.639293
2022-03-25  0.204781 -0.117539 -0.268759  0.147306  1.026043 -0.024996
2022-03-26  0.276475  0.254426 -0.786022 -2.055175 -0.020681  1.618434
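Both head() and tail() return five rows by default and accept an optional row count. As a small sketch, the example below uses the deterministic values 0 to 47 instead of random numbers so the selected rows are easy to recognize. The name df_demo and the row counts 3 and 2 are illustrative choices.

```python
import numpy as np
import pandas as pd

# Deterministic values 0..47, filled row by row, so each row is easy to identify.
df_demo = pd.DataFrame(
    np.arange(48).reshape(8, 6),
    index=pd.date_range("20220319", periods=8),
    columns=list("ABCDEF"),
)

print(df_demo.head(3))  # first three rows
print(df_demo.tail(2))  # last two rows
```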
View the row index with df.index:
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randn(8, 6), index=pd.date_range("20220319", periods=8), columns=list("ABCDEF"))
# Print the row index
print(df.index)
Output:
DatetimeIndex(['2022-03-19', '2022-03-20', '2022-03-21', '2022-03-22',
               '2022-03-23', '2022-03-24', '2022-03-25', '2022-03-26'],
              dtype='datetime64[ns]', freq='D')
View the column labels with df.columns:
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randn(8, 6), index=pd.date_range("20220319", periods=8), columns=list("ABCDEF"))
# Print the column index
print(df.columns)
Output:
Index(['A', 'B', 'C', 'D', 'E', 'F'], dtype='object')
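Both df.index and df.columns are pandas Index objects rather than plain lists. When ordinary Python values are more convenient, they can be extracted as in the sketch below. The frame is filled with zeros only to keep the example deterministic, and the names df_axes, column_names, and first_day are hypothetical.

```python
import numpy as np
import pandas as pd

df_axes = pd.DataFrame(
    np.zeros((8, 6)),
    index=pd.date_range("20220319", periods=8),
    columns=list("ABCDEF"),
)

column_names = df_axes.columns.tolist()  # plain Python list of column labels
first_day = df_axes.index[0]             # a single pandas Timestamp
print(column_names)
print(first_day, len(df_axes.index))
```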
View all values as a NumPy array with df.values:
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randn(8, 6), index=pd.date_range("20220319", periods=8), columns=list("ABCDEF"))
# Print the underlying values (row and column labels are not included)
print(df.values)
Output:
[[ 0.89753168 -0.4647323 -0.66857853 -0.60079733 1.11180881 -0.72215369]