Notes - Full-Stack Study Notes 123

II. The pandas Module

2.1 One-Dimensional Data: Series

Creating a Series

import numpy as np
from pandas import Series

# Pass a one-dimensional list as the data argument
Series(data=[1,2,3,'四'])
"""
0    1
1    2
2    3
3    四
dtype: object
"""
# The index argument sets explicit (label) indexes
Series(data=[1,2,3,'四'],index=['a','b','c','d'])
"""
a    1
b    2
c    3
d    四
dtype: object
"""

# Pass a one-dimensional NumPy array as the data argument (values are random, so the output varies)
Series(data=np.random.randint(0,100,size=(3,)))
"""
0    99
1    76
2    23
dtype: int64
"""

# Pass a dictionary as the data argument: keys become the index labels
dic = {
    '语文':100,  # Chinese
    '数学':99,  # Math
}
s = Series(data=dic)

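Indexing and Slicing
```
s.iloc[0]  # by position; plain s[0] on a label-indexed Series is deprecated in recent pandas
s.语文  # by label, as an attribute (same as s['语文'])
s[0:2]  # slice by position; s.iloc[0:2] is the explicit form
```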
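
The difference between positional and label-based access can be sketched as follows. The Series mirrors the `s` built from the dictionary above; the variable names are only illustrative.

```python
from pandas import Series

# Same data as the dictionary-based Series above
s = Series(data={'语文': 100, '数学': 99})

first_by_position = s.iloc[0]         # positional access
first_by_label = s.loc['语文']         # label access
first_two = s.iloc[0:2]               # positional slice, end exclusive
by_label_slice = s.loc['语文':'数学']   # label slice, end inclusive

print(first_by_position, first_by_label)
print(len(first_two), len(by_label_slice))
```

Using `iloc` and `loc` explicitly avoids any ambiguity about whether a key is a position or a label.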

Common Attributes

s.shape  # shape: (2,)
s.size  # number of elements: 2
s.index  # index: Index(['语文', '数学'], dtype='object')
s.values  # values: array([100,  99])
s.dtype  # element type: dtype('int64')

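Common Methods

s.head(3)  # first 3 elements
s.tail(3)  # last 3 elements
s.unique()  # array of distinct values (duplicates removed)
s.isnull()  # whether each element is missing (NaN/None)
"""
Example output for a Series indexed a-d whose b and d values are missing:
a    False
b     True
c    False
d     True
dtype: bool
"""
s.notnull()  # whether each element is not missing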
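
A common use of these two methods is building a boolean mask to filter out missing values. A minimal sketch, assuming a made-up Series named `scores` with two missing entries:

```python
import numpy as np
from pandas import Series

# Hypothetical Series with two missing values (NaN and None)
scores = Series(data=[1.0, np.nan, 3.0, None], index=['a', 'b', 'c', 'd'])

missing_mask = scores.isnull()      # True where the value is missing
present = scores[scores.notnull()]  # keep only the non-missing elements

print(missing_mask.tolist())
print(present.index.tolist())
```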

s1 = Series(data=[1,2,3],index=['a','b','c'])
s2 = Series(data=[1,2,3],index=['a','d','c'])
s = s1 + s2  # elements with matching index labels are added; labels present in only one Series give NaN
s = s1.add(s2)  # the add method performs the same addition
"""
a    2.0
b    NaN
c    6.0
d    NaN
dtype: float64
"""

2.2 Two-Dimensional Data: DataFrame

Creating a DataFrame

from pandas import DataFrame

# Pass a two-dimensional list as the data argument; shorter rows are padded with NaN
DataFrame(data=[[1,2,3],[4,5,6,7]])
"""
	0	1	2	3
0	1	2	3	NaN
1	4	5	6	7.0
"""
# Pass a two-dimensional NumPy array as the data argument
DataFrame(data=np.random.randint(0,100,size=(4,6)))

# Dictionary: keys become the column labels and each value list fills that column; the index argument sets the row labels
dic2 = {
    '小明':[100,99], '小红':[90,80], '小强':[88,70],
}
df = DataFrame(data=dic2,index=['语文','数学'])
"""
	小明	小红	小强
语文	100	90	88
数学	99	80	70
"""
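
To check which way the dictionary is laid out, the labels can be inspected directly. The sketch below rebuilds the same `df`; `by_student` is a name introduced here for illustration.

```python
from pandas import DataFrame

dic2 = {'小明': [100, 99], '小红': [90, 80], '小强': [88, 70]}
df = DataFrame(data=dic2, index=['语文', '数学'])

print(df.columns.tolist())  # the dictionary keys
print(df.index.tolist())    # the labels passed through index

# Transposing swaps rows and columns, giving one row per student
by_student = df.T
print(by_student.shape)
```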

Common Attributes

df.shape  # (2, 3)
df.size  # 6
df.index  # row index: Index(['语文', '数学'], dtype='object')
df.columns  # column index: Index(['小明', '小红', '小强'], dtype='object')
df.values  # values
"""
array([[100,  90,  88],
       [ 99,  80,  70]])
"""

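Indexing and Slicing

# Implicit (positional) indexing: iloc
df.iloc[0]  # a single row
df.iloc[[0, 1]]  # multiple rows (positions must exist; df has only 2 rows)
df.iloc[1, 2]  # a single element: df.iloc[row, column]

# Explicit (label) indexing with [] selects columns only
df['小明']  # a single column
df[['小明', '小红']]  # multiple columns

# Explicit (label) indexing: loc (a single label or a list of labels selects rows)
df.loc['语文']  # a single row
df.loc[['语文', '数学']]  # multiple rows

df.loc['数学', '小明']  # a single element: df.loc[row, column]


# Slicing
df[0:2]  # first two rows
df.iloc[:, 0:2]  # first two columns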
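
The same cell or block can usually be reached through either `iloc` or `loc`. A minimal sketch on the `df` from above (rebuilt here so the block runs on its own), which also shows that `loc` can select columns when given a row selector first:

```python
from pandas import DataFrame

df = DataFrame(
    data={'小明': [100, 99], '小红': [90, 80], '小强': [88, 70]},
    index=['语文', '数学'],
)

by_position = df.iloc[1, 2]                # row 1, column 2
by_label = df.loc['数学', '小强']            # same cell, addressed by labels
two_columns = df.loc[:, ['小明', '小红']]    # all rows, two columns by label
first_two_columns = df.iloc[:, 0:2]        # all rows, first two columns by position

print(by_position, by_label)
print(two_columns.equals(first_two_columns))
```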

Arithmetic

# As with Series, elements whose row and column labels both match are combined; everything else becomes NaN
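
A small sketch of this alignment, using two made-up frames `df1` and `df2` that share only row `a` and column `x`:

```python
from pandas import DataFrame

# Two hypothetical frames that overlap only at row 'a', column 'x'
df1 = DataFrame(data=[[1, 2], [3, 4]], index=['a', 'b'], columns=['x', 'y'])
df2 = DataFrame(data=[[10, 20], [30, 40]], index=['a', 'c'], columns=['x', 'z'])

# The result covers the union of row and column labels
total = df1 + df2
print(total)
```

Only the cell at (`a`, `x`) exists in both frames, so it is the only one that holds a number in the result.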

Converting to Datetime

import pandas

# Assumes df has a 'time' column holding date strings
df['time'] = pandas.to_datetime(df['time'])
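
The `df` built earlier has no `time` column, so the sketch below assumes a small made-up frame to show what the conversion changes:

```python
import pandas
from pandas import DataFrame

# Hypothetical frame with a 'time' column stored as strings
df = DataFrame(data={'time': ['2020-01-01', '2020-02-15'], 'value': [1, 2]})
print(df['time'].dtype)  # string data before conversion

df['time'] = pandas.to_datetime(df['time'])
print(df['time'].dtype)  # a datetime64 dtype after conversion

# Datetime columns expose their components through the .dt accessor
months = df['time'].dt.month.tolist()
print(months)
```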

Using a Column as the Row Index

# Returns a new DataFrame; the original is unchanged
df.set_index('time')

# Modifies the original DataFrame in place
df.set_index('time',inplace=True)
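
A short sketch of the non-inplace form, again assuming a made-up frame with a `time` column; `indexed` and `value_on_feb_15` are illustrative names:

```python
import pandas
from pandas import DataFrame

# Hypothetical frame with a 'time' column
df = DataFrame(data={'time': ['2020-01-01', '2020-02-15'], 'value': [1, 2]})
df['time'] = pandas.to_datetime(df['time'])

indexed = df.set_index('time')   # returns a new DataFrame
print(df.columns.tolist())       # the original still has the 'time' column
print(indexed.columns.tolist())  # 'time' has moved into the index

# Rows can now be looked up by their timestamp label
value_on_feb_15 = indexed.loc[pandas.Timestamp('2020-02-15'), 'value']
print(value_on_feb_15)
```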
