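Python 3.5 pandas: missing-value handling and hierarchical indexing, explained with examples
This article uses worked examples to cover missing-value handling and hierarchical indexing with the pandas module in Python 3.5. It is shared here for reference; the details follow.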
1. Handling missing values in pandas
    import numpy as np
    import pandas as pd
    from pandas import Series, DataFrame

    df3 = DataFrame([
        ["tom", np.nan, 456.67, "m"],
        ["merry", 34, 345.56, np.nan],
        [np.nan, np.nan, np.nan, np.nan],
        ["john", 23, np.nan, "m"],
        ["joe", 18, 385.12, "f"],
    ], columns=["name", "age", "salary", "gender"])
    print(df3)
    print("=======Check for NaN values=======")
    print(df3.isnull())
    print("=======Check for non-NaN values=======")
    print(df3.notnull())
    print("=======Drop rows containing NaN=======")
    print(df3.dropna())
    print("=======Drop rows that are all NaN=======")
    print(df3.dropna(how="all"))
    # Set the value at row 2, column 0 (by position).
    # The original used df3.ix[2, 0]; .ix was removed in pandas 1.0.
    df3.iloc[2, 0] = "gerry"
    print(df3)
    print("=======Drop columns containing NaN=======")
    print(df3.dropna(axis=1))

Output:

        name   age  salary gender
    0    tom   NaN  456.67      m
    1  merry  34.0  345.56    NaN
    2    NaN   NaN     NaN    NaN
    3   john  23.0     NaN      m
    4    joe  18.0  385.12      f
    =======Check for NaN values=======
        name    age  salary  gender
    0  False   True   False   False
    1  False  False   False    True
    2   True   True    True    True
    3  False  False    True   False
    4  False  False   False   False
    =======Check for non-NaN values=======
        name    age  salary  gender
    0   True  False    True    True
    1   True   True    True   False
    2  False  False   False   False
    3   True   True   False    True
    4   True   True    True    True
    =======Drop rows containing NaN=======
      name   age  salary gender
    4  joe  18.0  385.12      f
    =======Drop rows that are all NaN=======
        name   age  salary gender
    0    tom   NaN  456.67      m
    1  merry  34.0  345.56    NaN
    3   john  23.0     NaN      m
    4    joe  18.0  385.12      f
        name   age  salary gender
    0    tom   NaN  456.67      m
    1  merry  34.0  345.56    NaN
    2  gerry   NaN     NaN    NaN
    3   john  23.0     NaN      m
    4    joe  18.0  385.12      f
    =======Drop columns containing NaN=======
        name
    0    tom
    1  merry
    2  gerry
    3   john
    4    joe
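Besides `how`, `dropna` also accepts `thresh` and `subset`, which the example above does not use. The following is a minimal sketch on the same sample data; the variable names `at_least_three` and `age_known` are illustrative and not part of the original article.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [
        ["tom", np.nan, 456.67, "m"],
        ["merry", 34, 345.56, np.nan],
        [np.nan, np.nan, np.nan, np.nan],
        ["john", 23, np.nan, "m"],
        ["joe", 18, 385.12, "f"],
    ],
    columns=["name", "age", "salary", "gender"],
)

# Keep only rows that have at least 3 non-missing values.
at_least_three = df.dropna(thresh=3)

# Drop a row only when "age" is missing; other columns are ignored.
age_known = df.dropna(subset=["age"])

print(at_least_three)
print(age_known)
```

`thresh` is handy when a few gaps per row are acceptable, while `subset` limits the check to the columns that matter for the analysis.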
    import numpy as np
    import pandas as pd
    from pandas import Series, DataFrame

    df4 = DataFrame(np.random.randn(7, 3))
    print(df4)
    # The original used .ix, which was removed in pandas 1.0; .loc slices by label
    # and includes the end label.
    df4.loc[:4, 1] = np.nan  # rows 0 through 4 of column 1
    df4.loc[:2, 2] = np.nan  # rows 0 through 2 of column 2
    print(df4)
    print(df4.fillna(0))  # replace every missing value with 0
    print(df4.fillna({1: 0.5, 2: -1}))  # per-column fill values: 0.5 for column 1, -1 for column 2

Output (the values are random, so they differ on every run):

              0         1         2
    0 -0.737618 -0.530302 -2.716457
    1  0.810339  0.063028 -0.341343
    2  0.070564  0.347308 -0.121137
    3 -0.501875 -1.573071 -0.816077
    4 -2.159196 -0.659185 -0.885185
    5  0.175086 -0.954109 -0.758657
    6  0.395744 -0.875943  0.950323
              0         1         2
    0 -0.737618       NaN       NaN
    1  0.810339       NaN       NaN
    2  0.070564       NaN       NaN
    3 -0.501875       NaN -0.816077
    4 -2.159196       NaN -0.885185
    5  0.175086 -0.954109 -0.758657
    6  0.395744 -0.875943  0.950323
              0         1         2
    0 -0.737618  0.000000  0.000000
    1  0.810339  0.000000  0.000000
    2  0.070564  0.000000  0.000000
    3 -0.501875  0.000000 -0.816077
    4 -2.159196  0.000000 -0.885185
    5  0.175086 -0.954109 -0.758657
    6  0.395744 -0.875943  0.950323
              0         1         2
    0 -0.737618  0.500000 -1.000000
    1  0.810339  0.500000 -1.000000
    2  0.070564  0.500000 -1.000000
    3 -0.501875  0.500000 -0.816077
    4 -2.159196  0.500000 -0.885185
    5  0.175086 -0.954109 -0.758657
    6  0.395744 -0.875943  0.950323
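A constant is not the only possible fill value. As a small sketch that goes beyond the original example, `fillna` also accepts a Series of per-column values such as the column means, and `ffill` carries the last valid observation forward. The frame and its column names `a` and `b` are made up for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [np.nan, 4.0, 8.0]})

# Fill each column's gaps with that column's mean (NaN is skipped by mean()).
mean_filled = df.fillna(df.mean())

# Forward fill: propagate the previous valid value down each column.
# A leading NaN has nothing before it, so it stays missing.
forward_filled = df.ffill()

print(mean_filled)
print(forward_filled)
```

Which strategy is appropriate depends on the data: mean filling suits roughly stationary numeric columns, while forward filling is more common for ordered or time-indexed data.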
2. Common mathematical and statistical methods in pandas
    import numpy as np
    import pandas as pd
    from pandas import Series, DataFrame

    # Common mathematical and statistical methods in pandas
    arr = np.array([
        [98.5, 89.5, 88.5],
        [98.5, 85.5, 88],
        [70, 85, 60],
        [80, 85, 82],
    ])
    df1 = DataFrame(arr, columns=["Chinese", "Math", "English"])
    print(df1)
    print("=======Summary statistics per column=======")
    print(df1.describe())
    print("=======Count of non-NaN values per column (default)=======")
    print(df1.count())
    print("=======Count of non-NaN values per row=======")
    print(df1.count(axis=1))

Output:

       Chinese  Math  English
    0     98.5  89.5     88.5
    1     98.5  85.5     88.0
    2     70.0  85.0     60.0
    3     80.0  85.0     82.0
    =======Summary statistics per column=======
             Chinese       Math    English
    count   4.000000   4.000000   4.000000
    mean   86.750000  86.250000  79.625000
    std    14.168627   2.179449  13.412525
    min    70.000000  85.000000  60.000000
    25%    77.500000  85.000000  76.500000
    50%    89.250000  85.250000  85.000000
    75%    98.500000  86.500000  88.125000
    max    98.500000  89.500000  88.500000
    =======Count of non-NaN values per column (default)=======
    Chinese    4
    Math       4
    English    4
    dtype: int64
    =======Count of non-NaN values per row=======
    0    3
    1    3
    2    3
    3    3
    dtype: int64
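The `axis` argument shown for `count` works the same way for other reductions. The sketch below reuses the score table with English column names (an assumption made for readability) and computes per-column means, per-row totals, and each row's highest-scoring column.

```python
import pandas as pd

scores = pd.DataFrame({
    "chinese": [98.5, 98.5, 70, 80],
    "math": [89.5, 85.5, 85, 85],
    "english": [88.5, 88, 60, 82],
})

col_mean = scores.mean()               # one value per column (axis=0 is the default)
row_total = scores.sum(axis=1)         # one value per row
best_subject = scores.idxmax(axis=1)   # column label of each row's maximum

print(col_mean)
print(row_total)
print(best_subject)
```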
    import numpy as np
    import pandas as pd
    from pandas import Series, DataFrame

    # 2. Correlation coefficients and covariance in pandas
    df2 = DataFrame({
        "gdp": [12, 23, 34, 45, 56],
        "air_temperature": [23, 25, 26, 27, 30],
        "year": ["2001", "2002", "2003", "2004", "2005"],
    })
    print(df2)
    # "year" holds strings. Older pandas dropped such columns silently in corr()/cov();
    # recent versions raise an error instead, so select the numeric columns explicitly.
    numeric = df2[["gdp", "air_temperature"]]
    print("=========Correlation coefficients========")
    print(numeric.corr())
    print("=========Covariance========")
    print(numeric.cov())
    print("=========Correlation coefficient between two variables========")
    print(df2["gdp"].corr(df2["air_temperature"]))
    print("=========Covariance between two variables========")
    print(df2["gdp"].cov(df2["air_temperature"]))

Output:

       gdp  air_temperature  year
    0   12               23  2001
    1   23               25  2002
    2   34               26  2003
    3   45               27  2004
    4   56               30  2005
    =========Correlation coefficients========
                          gdp  air_temperature
    gdp              1.000000         0.977356
    air_temperature  0.977356         1.000000
    =========Covariance========
                       gdp  air_temperature
    gdp              302.5             44.0
    air_temperature   44.0              6.7
    =========Correlation coefficient between two variables========
    0.97735555485
    =========Covariance between two variables========
    44.0
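The two results are related: the Pearson correlation coefficient is the covariance divided by the product of the two standard deviations. The check below is an added illustration using the same numbers, not part of the original article.

```python
import pandas as pd

gdp = pd.Series([12, 23, 34, 45, 56])
air_temperature = pd.Series([23, 25, 26, 27, 30])

# cov() and std() both use the sample (n - 1) normalisation by default,
# so the ratio reproduces the Pearson coefficient that corr() returns.
manual_corr = gdp.cov(air_temperature) / (gdp.std() * air_temperature.std())

print(manual_corr)
print(gdp.corr(air_temperature))
```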
    import numpy as np
    import pandas as pd
    from pandas import Series, DataFrame

    # 3. Unique values, value counts, and membership in pandas
    df3 = DataFrame({
        "order_id": ["1001", "1002", "1003", "1004", "1005"],
        "member_id": ["m01", "m01", "m02", "m01", "m02"],
        "order_amt": [345, 312.2, 123, 250.2, 235],
    })
    print(df3)
    print("=========Array of unique values=========")
    print(df3["member_id"].unique())
    print("=========Frequency of each value=========")
    print(df3["member_id"].value_counts())
    print("=========Membership=========")
    members = df3["member_id"]  # separate name instead of overwriting df3
    mask = members.isin(["m01"])
    print(mask)
    print(members[mask])

Output (the column order of the first table comes from the original run, where pandas sorted the dictionary keys; newer versions keep insertion order):

      member_id  order_amt order_id
    0       m01      345.0     1001
    1       m01      312.2     1002
    2       m02      123.0     1003
    3       m01      250.2     1004
    4       m02      235.0     1005
    =========Array of unique values=========
    ['m01' 'm02']
    =========Frequency of each value=========
    m01    3
    m02    2
    Name: member_id, dtype: int64
    =========Membership=========
    0     True
    1     True
    2    False
    3     True
    4    False
    Name: member_id, dtype: bool
    0    m01
    1    m01
    3    m01
    Name: member_id, dtype: object
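The same boolean mask can filter whole rows of the DataFrame, not just the `member_id` column, and `~` inverts it. The sketch below extends the example along those lines; the names `orders`, `m01_orders`, `other_orders`, and `total_by_member` are illustrative.

```python
import pandas as pd

orders = pd.DataFrame({
    "order_id": ["1001", "1002", "1003", "1004", "1005"],
    "member_id": ["m01", "m01", "m02", "m01", "m02"],
    "order_amt": [345, 312.2, 123, 250.2, 235],
})

mask = orders["member_id"].isin(["m01"])
m01_orders = orders[mask]      # rows whose member_id is in the list
other_orders = orders[~mask]   # all remaining rows

# Total order amount per member, as a complement to value_counts().
total_by_member = orders.groupby("member_id")["order_amt"].sum()

print(m01_orders)
print(other_orders)
print(total_by_member)
```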
3. Hierarchical indexing in pandas
    import numpy as np
    import pandas as pd
    from pandas import Series, DataFrame

    # 3. Hierarchical indexing in pandas
    data = Series(
        [998.4, 6455, 5432, 9765, 5432],
        index=[["2001", "2001", "2001", "2002", "2002"],
               ["apple", "banana", "watermelon", "apple", "watermelon"]],
    )
    print(data)
    df4 = DataFrame({
        "year": [2001, 2001, 2002, 2002, 2003],
        "fruit": ["apple", "banana", "apple", "banana", "apple"],
        "production": [2345, 5632, 3245, 6432, 4532],
        "profits": [245.6, 432.7, 534.1, 354, 467.8],
    })
    print(df4)
    print("=======Hierarchical index=======")
    df4 = df4.set_index(["year", "fruit"])
    print(df4)
    print("=======Select by index=======")
    # The original used df4.ix[2002, "apple"]; .ix was removed in pandas 1.0.
    print(df4.loc[(2002, "apple")])
    print("=======Aggregate by index level=======")
    # The original used sum(level=...), mean(level=...) and min(level=...);
    # that argument was removed in pandas 2.0, and groupby(level=...) is equivalent.
    print(df4.groupby(level="year").sum())
    print(df4.groupby(level="fruit").mean())
    print(df4.groupby(level=["year", "fruit"]).min())

Output (the column order of the second table comes from the original run, where pandas sorted the dictionary keys; newer versions keep insertion order):

    2001  apple          998.4
          banana        6455.0
          watermelon    5432.0
    2002  apple         9765.0
          watermelon    5432.0
    dtype: float64
        fruit  production  profits  year
    0   apple        2345    245.6  2001
    1  banana        5632    432.7  2001
    2   apple        3245    534.1  2002
    3  banana        6432    354.0  2002
    4   apple        4532    467.8  2003
    =======Hierarchical index=======
                 production  profits
    year fruit
    2001 apple         2345    245.6
         banana        5632    432.7
    2002 apple         3245    534.1
         banana        6432    354.0
    2003 apple         4532    467.8
    =======Select by index=======
    production    3245.0
    profits        534.1
    Name: (2002, apple), dtype: float64
    =======Aggregate by index level=======
          production  profits
    year
    2001        7977    678.3
    2002        9677    888.1
    2003        4532    467.8
            production     profits
    fruit
    apple         3374  415.833333
    banana        6032  393.350000
                 production  profits
    year fruit
    2001 apple         2345    245.6
         banana        5632    432.7
    2002 apple         3245    534.1
         banana        6432    354.0
    2003 apple         4532    467.8
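Two further operations are often useful with a hierarchical index, although the original article does not cover them: `xs` selects a cross-section on one level, and `unstack` pivots an index level into columns. A minimal sketch on the same fruit data, with illustrative variable names:

```python
import pandas as pd

df = pd.DataFrame({
    "year": [2001, 2001, 2002, 2002, 2003],
    "fruit": ["apple", "banana", "apple", "banana", "apple"],
    "production": [2345, 5632, 3245, 6432, 4532],
    "profits": [245.6, 432.7, 534.1, 354, 467.8],
}).set_index(["year", "fruit"])

# Cross-section: every row whose "fruit" level equals "apple".
# The selected level is dropped, leaving "year" as the index.
apples = df.xs("apple", level="fruit")

# Pivot the "fruit" level into columns; combinations that do not exist
# (banana in 2003) become NaN.
production_wide = df["production"].unstack("fruit")

print(apples)
print(production_wide)
```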
I hope this article helps with your Python programming.
Original link: https://blog.csdn.net/loveliuzz/article/details/78498121