Reading and Storing Text Files in Python
Reading CSV Files
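In [1]:
    import csv

    # Create a CSV file with Python's built-in csv module
    fp = open('H:/python数据分析/数据/ch4ex1.csv', 'w', newline='')  # create the CSV file
    writer = csv.writer(fp)
    writer.writerow(('id', 'name', 'grade'))  # write rows to the CSV file
    writer.writerow(('1', 'lucky', '87'))
    writer.writerow(('2', 'peter', '92'))
    writer.writerow(('3', 'lili', '85'))
    fp.close()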
In [2]:
    # View the data with !type. The type command exists only on Windows;
    # on UNIX systems use !cat instead.
    !type H:\python数据分析\数据\ch4ex1.csv

    id,name,grade
    1,lucky,87
    2,peter,92
    3,lili,85
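The write-then-inspect steps above can also be sketched without a fixed drive path. The following is a minimal sketch, not the original author's code: the temporary directory, the `rows` list, and the `read_back` name are assumptions introduced for illustration. It uses a `with` block so the file is closed even if a write fails.

```python
import csv
import os
import tempfile

rows = [
    ("id", "name", "grade"),
    (1, "lucky", 87),
    (2, "peter", 92),
    (3, "lili", 85),
]

# Hypothetical location: a temporary directory instead of the H: drive path.
path = os.path.join(tempfile.mkdtemp(), "ex1.csv")

# newline="" lets the csv module control line endings itself.
with open(path, "w", newline="", encoding="utf-8") as fp:
    csv.writer(fp).writerows(rows)

# Read the file back; csv.reader returns every field as a string.
with open(path, newline="", encoding="utf-8") as fp:
    read_back = list(csv.reader(fp))

print(read_back)
```

Because `csv.reader` returns strings, numeric columns need an explicit conversion if the file is read this way; `pandas.read_csv` infers the types instead.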
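In [3]:
    import pandas as pd

    # Read the CSV file with read_csv.
    # If the file path contains Chinese characters, wrap it in open();
    # otherwise some pandas versions on Windows raise an error.
    df = pd.read_csv(open('H:/python数据分析/数据/ch4ex1.csv'))
    df
Out[3]:
       id   name  grade
    0   1  lucky     87
    1   2  peter     92
    2   3   lili     85
In [4]:
    # read_table can read a CSV file too, as long as the delimiter is specified
    df = pd.read_table(open('H:/python数据分析/数据/ch4ex1.csv'), sep=',')
    df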
Out[4]:
       id   name  grade
    0   1  lucky     87
    1   2  peter     92
    2   3   lili     85
In [5]:
    # By default the row index of the resulting DataFrame counts up from 0.
    # The index_col parameter makes the id column the row index instead.
    df = pd.read_csv(open('H:/python数据分析/数据/ch4ex1.csv'), index_col='id')
    df
Out[5]:
         name  grade
    id
    1   lucky     87
    2   peter     92
    3    lili     85
In [6]:
    import csv

    # Create a CSV file with Python's built-in csv module
    fp = open('H:/python数据分析/数据/ch4ex2.csv', 'w', newline='')
    writer = csv.writer(fp)
    writer.writerow(('school', 'id', 'name', 'grade'))  # write the data
    writer.writerow(('a', '1', 'lucky', '87'))
    writer.writerow(('a', '2', 'peter', '92'))
    writer.writerow(('a', '3', 'lili', '85'))
    writer.writerow(('b', '1', 'coco', '78'))
    writer.writerow(('b', '2', 'kevin', '87'))
    writer.writerow(('b', '3', 'heven', '96'))
    fp.close()
In [7]:
    # View the data
    !type H:\python数据分析\数据\ch4ex2.csv

    school,id,name,grade
    a,1,lucky,87
    a,2,peter,92
    a,3,lili,85
    b,1,coco,78
    b,2,kevin,87
    b,3,heven,96
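As a self-contained sketch of what `index_col` does with this data, the example below reads the same rows from memory. `io.StringIO` stands in for the file, and the names `data`, `flat`, and `nested` are assumptions made for illustration. A single column name gives a flat index, while a list gives a hierarchical one that can be selected level by level.

```python
import io
import pandas as pd

# Same content as ex2.csv, held in memory (an assumption for portability).
data = """school,id,name,grade
a,1,lucky,87
a,2,peter,92
a,3,lili,85
b,1,coco,78
b,2,kevin,87
b,3,heven,96
"""

# A single column name gives a flat row index.
flat = pd.read_csv(io.StringIO(data), index_col="school")

# A list of column names builds a two-level (hierarchical) index.
nested = pd.read_csv(io.StringIO(data), index_col=["school", "id"])

print(list(nested.index.names))
print(nested.loc["a"])                 # all rows for school "a"
print(nested.loc[("b", 3), "name"])    # one cell, selected by both levels
```

Selecting with the outer level alone returns a smaller DataFrame indexed by the inner level, which is the main practical benefit of the hierarchical form.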
In [8]:
    # Hierarchical index: pass a list of column numbers or column names
    df = pd.read_csv(open('H:/python数据分析/数据/ch4ex2.csv'), index_col=[0, 'id'])
    df
Out[8]:
                name  grade
    school id
    a      1   lucky     87
           2   peter     92
           3    lili     85
    b      1    coco     78
           2   kevin     87
           3   heven     96
In [9]:
    import csv

    # Create a CSV file with Python's built-in csv module (no header row this time)
    fp = open('H:/python数据分析/数据/ch4ex3.csv', 'w', newline='')
    writer = csv.writer(fp)
    writer.writerow(('1', 'lucky', '87'))
    writer.writerow(('2', 'peter', '92'))
    writer.writerow(('3', 'lili', '85'))
    fp.close()
In [10]:
    # View the data
    !type H:\python数据分析\数据\ch4ex3.csv

    1,lucky,87
    2,peter,92
    3,lili,85
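For a file like this one with no header row, `read_csv` can be called in a few different ways. The sketch below compares three of them on in-memory data; `io.StringIO` and the variable names are assumptions introduced for illustration and are not part of the original notebook.

```python
import io
import pandas as pd

# Same content as ex3.csv: three data lines and no header line.
data = "1,lucky,87\n2,peter,92\n3,lili,85\n"

# Default: the first line is consumed as the header, so one data row is lost.
default = pd.read_csv(io.StringIO(data))

# header=None: every line is data; columns are labelled 0, 1, 2.
no_header = pd.read_csv(io.StringIO(data), header=None)

# names=[...]: every line is data; columns get the supplied labels.
named = pd.read_csv(io.StringIO(data), names=["id", "name", "grade"])

print(default.shape, no_header.shape, named.shape)
print(list(default.columns), list(no_header.columns), list(named.columns))
```

Comparing the three shapes makes the difference visible: only the default call returns two rows instead of three.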
In [12]:
    # Read with the defaults: the first line is taken as the header row
    df = pd.read_csv(open('H:/python数据分析/数据/ch4ex3.csv'))
    df
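Out[12]:
       1  lucky  87
    0  2  peter  92
    1  3   lili  85
In [13]:
    # Use the header parameter to assign default column labels.
    # By default read_csv treats the first line as the header, even when that
    # line looks exactly like the data lines, as it does here.
    # header=None tells pandas that the file has no header row: the first line
    # is kept as data and the columns are labelled 0, 1, 2, ...
    df = pd.read_csv(open('H:/python数据分析/数据/ch4ex3.csv'), header=None)
    df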
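Out[13]:
       0      1   2
    0  1  lucky  87
    1  2  peter  92
    2  3   lili  85
In [14]:
    # Use the names parameter to supply the column names.
    # When names is given and header is not set explicitly, header behaves as
    # None, so the first line is read as data.
    df = pd.read_csv(open('H:/python数据分析/数据/ch4ex3.csv'), names=['id', 'name', 'grade'])
    df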
Out[14]:
       id   name  grade
    0   1  lucky     87
    1   2  peter     92
    2   3   lili     85
In [15]:
    import csv

    # Create a CSV file with Python's built-in csv module and write the data
    fp = open('H:/python数据分析/数据/ch4ex4.csv', 'w', newline='')
    writer = csv.writer(fp)
    writer.writerow(['This is grade'])
    writer.writerow(('id', 'name', 'grade'))
    writer.writerow(('1', 'lucky', '87'))
    writer.writerow(('2', 'peter', '92'))
    writer.writerow(('3', 'lili', '85'))
    writer.writerow(['time'])
    fp.close()
In [16]:
    # View the data
    !type H:\python数据分析\数据\ch4ex4.csv

    This is grade
    id,name,grade
    1,lucky,87
    2,peter,92
    3,lili,85
    time
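In [17]:
    # Use the skiprows parameter to skip lines.
    # A list skips exactly those line numbers (counted from 0).
    # An integer such as skiprows=2 skips the first two lines, so reading
    # starts at the third line whether or not the file has a header.
    df = pd.read_csv(open('H:/python数据分析/数据/ch4ex4.csv'), skiprows=[0, 5])
    df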
Out[17]:
       id   name  grade
    0   1  lucky     87
    1   2  peter     92
    2   3   lili     85
In [19]:
    # Use the nrows parameter to read only the first rows of a file
    df = pd.read_csv(open('H:/python数据分析/数据/titanic.csv'), nrows=10)
    df
Out[19]:
       PassengerId  Survived  Pclass  Name                                                Sex      Age  SibSp  Parch  Ticket               Fare  Cabin  Embarked
    0            1         0       3  Braund, Mr. Owen Harris                             male    22.0      1      0  A/5 21171          7.2500  NaN    S
    1            2         1       1  Cumings, Mrs. John Bradley (Florence Briggs Th...   female  38.0      1      0  PC 17599          71.2833  C85    C
    2            3         1       3  Heikkinen, Miss. Laina                              female  26.0      0      0  STON/O2. 3101282   7.9250  NaN    S
    3            4         1       1  Futrelle, Mrs. Jacques Heath (Lily May Peel)        female  35.0      1      0  113803            53.1000  C123   S
    4            5         0       3  Allen, Mr. William Henry                            male    35.0      0      0  373450             8.0500  NaN    S
    5            6         0       3  Moran, Mr. James                                    male     NaN      0      0  330877             8.4583  NaN    Q
    6            7         0       1  McCarthy, Mr. Timothy J                             male    54.0      0      0  17463             51.8625  E46    S
    7            8         0       3  Palsson, Master. Gosta Leonard                      male     2.0      3      1  349909            21.0750  NaN    S
    8            9         1       3  Johnson, Mrs. Oscar W (Elisabeth Vilhelmina Berg)   female  27.0      0      2  347742            11.1333  NaN    S
    9           10         1       2  Nasser, Mrs. Nicholas (Adele Achem)                 female  14.0      1      0  237736            30.0708  NaN    C
In [20]:
    # Use the usecols parameter to select a subset of columns
    df = pd.read_csv(open('H:/python数据分析/数据/titanic.csv'), nrows=10, usecols=['Survived', 'Sex'])
    df
Out[20]:
       Survived     Sex
    0         0    male
    1         1  female
    2         1  female
    3         1  female
    4         0    male
    5         0    male
    6         0    male
    7         0    male
    8         1  female
    9         1  female
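The row and column selection parameters covered so far (`skiprows`, `nrows`, `usecols`) can be tried together on a small in-memory file. This is a minimal sketch under stated assumptions: `io.StringIO` replaces the file on disk, the data mirrors ex4.csv, and the variable names are hypothetical.

```python
import io
import pandas as pd

# Same layout as ex4.csv: a title line, a header, three rows, a trailing note.
data = """This is grade
id,name,grade
1,lucky,87
2,peter,92
3,lili,85
time
"""

# A list skips exactly those 0-based line numbers (the title and the note).
by_list = pd.read_csv(io.StringIO(data), skiprows=[0, 5])

# An integer skips that many lines from the top; nrows then stops
# before the trailing note is reached.
by_count = pd.read_csv(io.StringIO(data), skiprows=1, nrows=3)

# usecols keeps only the named columns.
two_cols = pd.read_csv(io.StringIO(data), skiprows=[0, 5], usecols=["id", "grade"])

print(by_list.equals(by_count))
print(list(two_cols.columns))
```

The list form is handy when unwanted lines are scattered through the file; the integer form combined with `nrows` is simpler when they sit only at the top and bottom.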
In [21]:
    # Very large files need to be read in chunks. First, info() shows that
    # the Titanic survivor data contains 891 rows.
    df = pd.read_csv(open('H:/python数据分析/数据/titanic.csv'))
    df.info()

    <class 'pandas.core.frame.DataFrame'>
    RangeIndex: 891 entries, 0 to 890
    Data columns (total 12 columns):
    PassengerId    891 non-null int64
    Survived       891 non-null int64
    Pclass         891 non-null int64
    Name           891 non-null object
    Sex            891 non-null object
    Age            714 non-null float64
    SibSp          891 non-null int64
    Parch          891 non-null int64
    Ticket         891 non-null object
    Fare           891 non-null float64
    Cabin          204 non-null object
    Embarked       889 non-null object
    dtypes: float64(2), int64(5), object(5)
    memory usage: 83.6 KB
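In [22]:
    # The chunksize parameter reads the file piece by piece. It sets the number
    # of rows per read and returns an iterator that yields DataFrames of that
    # many rows. Each read only needs memory for one such DataFrame, which
    # keeps memory use under control for large files.
    chunker = pd.read_csv(open('H:/python数据分析/数据/titanic.csv'), chunksize=100)
    chunker
Out[22]:
    <pandas.io.parsers.TextFileReader at 0x96c3cf8>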
In [23]:
    df = pd.read_csv(open('H:/python数据分析/数据/titanic.csv'))
    df['Sex'].value_counts()
Out[23]:
    male      577
    female    314
    Name: Sex, dtype: int64
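In [24]:
    from pandas import Series
    import pandas as pd

    chunker = pd.read_csv(open('H:/python数据分析/数据/titanic.csv'), chunksize=100)
    sex = Series([], dtype='float64')  # explicit dtype avoids the empty-Series warning in newer pandas
    for i in chunker:  # chunker is an iterable TextFileReader; iterate over it to count the Sex column
        sex = sex.add(i['Sex'].value_counts(), fill_value=0)
    sex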
Out[24]:
    male      577.0
    female    314.0
    dtype: float64
[Table: read_csv / read_table parameters]
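The chunked counting in In [24] can be reproduced on a tiny data set to confirm that it matches a whole-file count. This is a sketch, assuming a seven-row in-memory stand-in for titanic.csv; the data, `chunksize=3`, and the variable names are hypothetical.

```python
import io
import pandas as pd

# Hypothetical stand-in for titanic.csv: one Sex column with seven rows.
data = "Sex\nmale\nfemale\nmale\nmale\nfemale\nmale\nmale\n"

counts = pd.Series(dtype="float64")
for chunk in pd.read_csv(io.StringIO(data), chunksize=3):
    # Each chunk is an ordinary DataFrame with at most 3 rows.
    # fill_value=0 covers a category that is missing from one side.
    counts = counts.add(chunk["Sex"].value_counts(), fill_value=0)

counts = counts.astype(int)

# Whole-file count for comparison.
full = pd.read_csv(io.StringIO(data))["Sex"].value_counts()

print(counts)
print(full)
```

The running total starts as floats because `add` with `fill_value` works on float data, which is why the original output shows `577.0`; converting with `astype(int)` at the end restores integer counts.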
Reading TXT Files
In [25]:
    fp = open('H:/python数据分析/数据/ch4ex6.txt', 'a')  # create the TXT file
    fp.write('id?name?grade\n')  # write the data, one line per record
    fp.write('1?lucky?87\n')
    fp.write('2?peter?92\n')
    fp.write('3?lili?85\n')
    fp.close()
In [26]:
    # View the data
    !type H:\python数据分析\数据\ch4ex6.txt

    id?name?grade
    1?lucky?87
    2?peter?92
    3?lili?85
In [27]:
    import pandas as pd

    # Read the TXT file; the sep parameter of read_table specifies the delimiter
    df = pd.read_table(open('H:/python数据分析/数据/ch4ex6.txt'), sep='?')
    df
Out[27]:
       id   name  grade
    0   1  lucky     87
    1   2  peter     92
    2   3   lili     85
In [28]:
    # View a TXT file whose fields are separated by spaces
    !type H:\python数据分析\数据\ch4ex7.txt

    id name grade
    1 lucky 87
    2 peter 92
    3 lili 85
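Both delimiter styles in this section, a single literal character and runs of whitespace, can be sketched on in-memory text. The strings below and the use of `io.StringIO` are assumptions made for illustration; the uneven spacing in the second string is deliberate. `read_csv` is used here, and `read_table` accepts the same `sep` argument.

```python
import io
import pandas as pd

question_marks = "id?name?grade\n1?lucky?87\n2?peter?92\n3?lili?85\n"
spaces = "id name   grade\n1  lucky 87\n2 peter    92\n3   lili 85\n"

# A single-character sep is treated as a literal delimiter.
df_q = pd.read_csv(io.StringIO(question_marks), sep="?")

# A regular expression: one or more whitespace characters between fields.
df_s = pd.read_csv(io.StringIO(spaces), sep=r"\s+")

print(df_q)
print(df_s)
```

Writing the pattern as a raw string (`r"\s+"`) avoids Python's own backslash handling, which newer Python versions warn about for unrecognized escapes such as `\s`.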
In [29]:
    # Use a regular expression to handle the spaces when reading the data
    df = pd.read_table(open('H:/python数据分析/数据/ch4ex7.txt'), sep=r'\s+')
    df
Out[29]:
       id   name  grade
    0   1  lucky     87
    1   2  peter     92
    2   3   lili     85
Storing Text Files
In [30]:
    import pandas as pd

    df = pd.read_csv(open('H:/python数据分析/数据/ch4ex1.csv'))
    df
Out[30]:
       id   name  grade
    0   1  lucky     87
    1   2  peter     92
    2   3   lili     85
In [31]:
    # The DataFrame to_csv method stores the data in a comma-separated CSV file
    df.to_csv('H:/python数据分析/数据/out1.csv')
    !type H:\python数据分析\数据\out1.csv

    ,id,name,grade
    0,1,lucky,87
    1,2,peter,92
    2,3,lili,85
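In [32]:
    # The sep parameter sets the delimiter used in the output file.
    # By default both the row index and the column labels are written.
    df.to_csv('H:/python数据分析/数据/out2.csv', sep='?')
    !type H:\python数据分析\数据\out2.csv

    ?id?name?grade
    0?1?lucky?87
    1?2?peter?92
    2?3?lili?85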
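In [33]:
    # The index and header parameters control whether the row index and the
    # column labels are written; index=False leaves out the row index.
    df.to_csv('H:/python数据分析/数据/out3.csv', index=False)
    !type H:\python数据分析\数据\out3.csv

    id,name,grade
    1,lucky,87
    2,peter,92
    3,lili,85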
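The effect of `sep`, `index`, and `header` can be inspected without writing any file, because `to_csv` returns the CSV text when no path is given. The sketch below rebuilds the small grade table by hand; the DataFrame construction and the variable names are assumptions for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "id": [1, 2, 3],
    "name": ["lucky", "peter", "lili"],
    "grade": [87, 92, 85],
})

# With no path argument, to_csv returns the CSV text as a string.
with_index = df.to_csv()                      # row index and column labels
no_index = df.to_csv(index=False)             # column labels only
no_labels = df.to_csv(index=False, header=False, sep="?")  # data only

print(with_index)
print(no_index)
print(no_labels)
```

Writing with `index=False` is usually what makes a file round-trip cleanly through `read_csv`, since otherwise the saved index comes back as an extra unnamed column.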