Pandas Basics in Python (A Detailed Tutorial)


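Contents

1. Environment Setup
2. Series and DataFrames
2.1 Creating a Series
2.2 Getting the values
2.3 Getting the index
2.4 Selecting by index label
2.5 Assigning by index label
2.6 Building a Series from a dictionary
2.7 Counting value occurrences
2.8 DataFrames
2.9 Adding a new column to a DataFrame
2.10 Getting the column names
2.11 Selecting data by column name
2.12 Selecting a single row
2.13 Selecting multiple rows
2.14 Selecting specific rows and columns
2.15 Selecting rows and columns where sex is "男" (male)
2.16 Selecting rows by position
2.17 Selecting columns by position
2.18 Selecting data at specific positions
2.19 Converting a boolean index
2.20 Conditions
2.21 Reassigning values
3. Aggregation and Grouped Operations
3.1 Getting the dataset
3.2 Reading the dataset
3.3 Mean of each column
3.4 Minimum of each column
3.5 Maximum of each column
3.6 Number of samples in each column
3.7 Row-wise calculations
3.8 Grouped means
3.9 Grouped skewness
3.10 Aggregation
3.10.1 Before grouping
3.10.2 After grouping
4. Data Visualization
4.1 Installing matplotlib
4.2 Checking the matplotlib installation
4.3 Box plot
4.4 Scatter plot
4.5 Hexbin plot
4.6 Line plot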

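Pandas is the core Python library for data processing and analysis. It provides fast, flexible, and expressive data structures, chiefly the one-dimensional Series and the two-dimensional DataFrame. It can import data from CSV, Excel, SQL, and other sources, and supports data cleaning, merging, reshaping, grouped statistics, and time series analysis. Pandas also integrates easily with other Python data analysis libraries, which makes it a powerful tool for data work in finance, statistics, the social sciences, and engineering.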

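1. Environment Setup

Run the following command on the command line:
pip show pandas
If pip reports that the package was not found, pandas is not installed.
To install pandas, use pip, Python's package manager. In a command-line interface (a terminal, Command Prompt, or Anaconda Prompt, depending on your operating system and how Python was installed), enter:
pip install pandas
When the installation succeeds, pip prints a line beginning with `Successfully installed`.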
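
The installation can also be checked from inside Python. The snippet below is a minimal sketch that assumes pandas is already installed in the interpreter you run it with; the variable name `installed_version` is only illustrative.

```python
import pandas as pd

# pd.__version__ holds the version string of the pandas package that was imported.
installed_version = pd.__version__
print("pandas version:", installed_version)
```

If the import fails with ModuleNotFoundError, pip probably installed pandas into a different Python environment than the one running the script.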

2. Series and DataFrames

2.1 Creating a Series

A Series can store any data type, such as integers, floats, strings, and Python objects, and every element has an index label.
import pandas as pd

A = pd.Series(data=[1, 2, 3, 4, 5], index=["A", "B", "C", "D", "E"], name="A1")
print(A)

2.2 Getting the values

import pandas as pd

A = pd.Series(data=[1, 2, 3, 4, 5], index=["A", "B", "C", "D", "E"], name="A1")
print(A)
print("Values:", A.values)  # the underlying NumPy array

2.3 Getting the index

import pandas as pd

A = pd.Series(data=[1, 2, 3, 4, 5], index=["A", "B", "C", "D", "E"], name="A1")
print(A)
print("Index:", A.index)  # the index labels

2.4 Selecting by index label

import pandas as pd

A = pd.Series(data=[1, 2, 3, 4, 5], index=["A", "B", "C", "D", "E"], name="A1")
print(A)
print(A[["A", "C"]])  # a list of labels returns a sub-Series
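
Besides selecting with a list of labels, a Series can be read by label or by position. The following is a small sketch of both on the same Series as above; the variable names (`s`, `by_label`, `fallback`, and so on) are illustrative and not part of the original tutorial.

```python
import pandas as pd

s = pd.Series(data=[1, 2, 3, 4, 5], index=["A", "B", "C", "D", "E"], name="A1")

by_label = s.loc["B"]      # label-based: the element whose index label is "B"
by_position = s.iloc[1]    # position-based: the second element
has_label = "Z" in s       # membership tests look at the index labels, not the values
fallback = s.get("Z", 0)   # .get returns the default instead of raising KeyError

print(by_label, by_position, has_label, fallback)
```

Using `.loc` and `.iloc` explicitly avoids ambiguity when the index labels are themselves integers.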

2.5 Assigning by index label

import pandas as pd

A = pd.Series(data=[1, 2, 3, 4, 5], index=["A", "B", "C", "D", "E"], name="A1")
print(A)
A[["A", "C"]] = [11, 12]  # overwrite the values stored under labels "A" and "C"
print(A)

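2.6 Building a Series from a dictionary

import pandas as pd

# The dictionary keys become the index labels.
A = pd.Series({"A": 1, "B": 2, "C": 3, "D": 4})
print(A)

2.7 Counting value occurrences

import pandas as pd

A = pd.Series({"A": 1, "B": 2, "C": 3, "D": 4, "E": 2, "F": 3})
print(A.value_counts())  # how many times each distinct value occurs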
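
`value_counts` can also report relative frequencies through its `normalize` parameter. A minimal sketch on the same data, with illustrative variable names:

```python
import pandas as pd

s = pd.Series({"A": 1, "B": 2, "C": 3, "D": 4, "E": 2, "F": 3})

counts = s.value_counts()                # absolute counts per distinct value
shares = s.value_counts(normalize=True)  # relative frequencies that sum to 1

print(counts)
print(shares)
```

In both results the distinct values of the Series become the index, so `counts.loc[2]` is the number of times the value 2 occurs.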

2.8 DataFrames

import pandas as pd

# Each dictionary key becomes a column; "男" means male and "女" means female.
A = {
    "name": ["小米", "小华", "小魅", "小破", "小领"],
    "age": ["20", "18", "16", "23", "19"],
    "sex": ["男", "男", "女", "男", "女"],
}
B = pd.DataFrame(A)
print(B)

2.9 Adding a new column to a DataFrame

import pandas as pd

A = {
    "name": ["小米", "小华", "小魅", "小破", "小领"],
    "age": ["20", "18", "16", "23", "19"],
    "sex": ["男", "男", "女", "男", "女"],
}
B = pd.DataFrame(A)
print(B)
# Assigning to a new column name appends the column ("high" holds heights).
B["high"] = ["180", "183", "160", "178", "158"]
print(B)

2.10 Getting the column names

import pandas as pd

A = {
    "name": ["小米", "小华", "小魅", "小破", "小领"],
    "age": ["20", "18", "16", "23", "19"],
    "sex": ["男", "男", "女", "男", "女"],
    "high": ["180", "183", "160", "178", "158"],
}
B = pd.DataFrame(A)
print(B)
print("Column names:", B.columns)

2.11 Selecting data by column name

import pandas as pd

A = {
    "name": ["小米", "小华", "小魅", "小破", "小领"],
    "age": ["20", "18", "16", "23", "19"],
    "sex": ["男", "男", "女", "男", "女"],
    "high": ["180", "183", "160", "178", "158"],
}
B = pd.DataFrame(A)
print(B)
print(B[["name", "sex"]])  # a list of column names returns a DataFrame

2.12 Selecting a single row

import pandas as pd

A = {
    "name": ["小米", "小华", "小魅", "小破", "小领"],
    "age": ["20", "18", "16", "23", "19"],
    "sex": ["男", "男", "女", "男", "女"],
    "high": ["180", "183", "160", "178", "158"],
}
B = pd.DataFrame(A)
print(B.loc[2])  # the row whose index label is 2

2.13 Selecting multiple rows

import pandas as pd

A = {
    "name": ["小米", "小华", "小魅", "小破", "小领"],
    "age": ["20", "18", "16", "23", "19"],
    "sex": ["男", "男", "女", "男", "女"],
    "high": ["180", "183", "160", "178", "158"],
}
B = pd.DataFrame(A)
print(B.loc[2:4])  # loc slices by label and includes the end label 4

2.14 Selecting specific rows and columns

import pandas as pd

A = {
    "name": ["小米", "小华", "小魅", "小破", "小领"],
    "age": ["20", "18", "16", "23", "19"],
    "sex": ["男", "男", "女", "男", "女"],
    "high": ["180", "183", "160", "178", "158"],
}
B = pd.DataFrame(A)
print(B.loc[2:4, ["name", "high"]])  # rows 2 to 4, columns "name" and "high"

2.15 Selecting rows and columns where sex is "男" (male)

import pandas as pd

A = {
    "name": ["小米", "小华", "小魅", "小破", "小领"],
    "age": ["20", "18", "16", "23", "19"],
    "sex": ["男", "男", "女", "男", "女"],
    "high": ["180", "183", "160", "178", "158"],
}
B = pd.DataFrame(A)
# The boolean condition picks the rows; the list picks the columns.
print(B.loc[B.sex == "男", ["name", "sex"]])

2.16 Selecting rows by position

import pandas as pd

A = {
    "name": ["小米", "小华", "小魅", "小破", "小领"],
    "age": ["20", "18", "16", "23", "19"],
    "sex": ["男", "男", "女", "男", "女"],
    "high": ["180", "183", "160", "178", "158"],
}
B = pd.DataFrame(A)
print(B.iloc[0:2])  # iloc slices by position and excludes the end position 2
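
The slices in 2.13 and 2.16 look alike but follow different rules: `loc` slices by label and includes the end, while `iloc` slices by position and excludes it. The sketch below uses a small made-up table (`df`, with placeholder names "a" to "e") to show the difference; none of these names come from the original tutorial.

```python
import pandas as pd

df = pd.DataFrame({"name": ["a", "b", "c", "d", "e"], "age": [20, 18, 16, 23, 19]})

by_label = df.loc[0:2]      # labels 0, 1 and 2: the end label is included
by_position = df.iloc[0:2]  # positions 0 and 1: the end position is excluded
print(len(by_label), len(by_position))

# With a non-default index, labels and positions no longer coincide.
shifted = df.copy()
shifted.index = [10, 11, 12, 13, 14]
print(shifted.loc[10:11, "name"].tolist())  # selected by label
print(shifted.iloc[0:2]["name"].tolist())   # selected by position
```

With the default 0, 1, 2, ... index the two accessors are easy to confuse, so it helps to decide up front whether a selection is meant by label or by position.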

2.17 Selecting columns by position

import pandas as pd

A = {
    "name": ["小米", "小华", "小魅", "小破", "小领"],
    "age": ["20", "18", "16", "23", "19"],
    "sex": ["男", "男", "女", "男", "女"],
    "high": ["180", "183", "160", "178", "158"],
}
B = pd.DataFrame(A)
print(B.iloc[:, 0:2])  # all rows, first two columns

2.18 Selecting data at specific positions

import pandas as pd

A = {
    "name": ["小米", "小华", "小魅", "小破", "小领"],
    "age": ["20", "18", "16", "23", "19"],
    "sex": ["男", "男", "女", "男", "女"],
    "high": ["180", "183", "160", "178", "158"],
}
B = pd.DataFrame(A)
print(B.iloc[0:2, 0:2])  # first two rows, first two columns

2.19 Converting a boolean index

import numpy as np
import pandas as pd

A = {
    "name": ["小米", "小华", "小魅", "小破", "小领"],
    "age": ["20", "18", "16", "23", "19"],
    "sex": ["男", "男", "女", "男", "女"],
    "high": ["180", "183", "160", "178", "158"],
}
B = pd.DataFrame(A)
# iloc does not take a boolean Series, so convert the mask to a list first
print(B.iloc[list(B.sex == "男"), 0:3])
# or convert it to a NumPy array
print(B.iloc[np.array(B.sex == "男"), 0:3])
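
The conversion is only needed for `iloc`; `loc` accepts the boolean Series directly. The following sketch compares the two routes on a reduced version of the table. The names `mask`, `via_loc`, and `via_iloc` are illustrative, and the exact exception type raised by `iloc` is an assumption about current pandas versions, which is why two types are caught.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["小米", "小华", "小魅", "小破", "小领"],
    "sex": ["男", "男", "女", "男", "女"],
})
mask = df["sex"] == "男"  # boolean Series aligned with df's index

via_loc = df.loc[mask, ["name", "sex"]]    # loc takes the Series as-is
via_iloc = df.iloc[mask.to_numpy(), 0:2]   # iloc needs a plain array or list

try:
    df.iloc[mask]
    iloc_accepts_series = True
except (ValueError, NotImplementedError):
    iloc_accepts_series = False

print(via_loc.equals(via_iloc), iloc_accepts_series)
```

When the columns are selected by name anyway, `loc` with the mask is usually the shorter form.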

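2.20 Conditions

import pandas as pd

A = {
    "name": ["小米", "小华", "小魅", "小破", "小领"],
    "age": ["20", "18", "16", "23", "19"],
    "sex": ["男", "男", "女", "男", "女"],
    "high": ["180", "183", "160", "178", "158"],
}
B = pd.DataFrame(A)
# The ages are stored as strings; convert them to integers so that the
# comparison is numeric rather than character by character.
print(list(B["age"].astype(int) >= 18))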
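
Comparing numbers that are stored as strings works character by character, which gives the expected answer only while all values have the same number of digits. The sketch below uses a made-up Series that includes a one-digit age to show where the two comparisons diverge; `pd.to_numeric` is one way to convert, and the variable names are illustrative.

```python
import pandas as pd

ages = pd.Series(["20", "18", "16", "23", "9"])

# String comparison is lexicographic: "9" >= "18" is True because "9" > "1".
as_text = (ages >= "18").tolist()

# After conversion the comparison is numeric.
as_number = (pd.to_numeric(ages) >= 18).tolist()

print(as_text)
print(as_number)
```

The two results differ only in the last element, which is exactly the value with a different number of digits.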

2.21 Reassigning values

import pandas as pd

A = {
    "name": ["小米", "小华", "小魅", "小破", "小领"],
    "age": ["20", "18", "16", "23", "19"],
    "sex": ["男", "男", "女", "男", "女"],
    "high": ["180", "183", "160", "178", "158"],
}
B = pd.DataFrame(A)
# Assigning to an existing column replaces all of its values.
B["high"] = ["179", "186", "168", "183", "160"]
print(B)

  

3. Aggregation and Grouped Operations

3.1 Getting the dataset

iris.csv (Iris dataset) resource, ZEEKLOG library: https://download.ZEEKLOG.net/download/Z0412_J0103/90215255

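3.2 Reading the dataset

The Iris dataset, also known as Anderson's Iris Data Set, is one of the best-known classic datasets in data science and machine learning.

It can be obtained in several ways, for example as a built-in dataset in Scikit-learn or from the UCI Machine Learning Repository. Once obtained, it can be loaded, preprocessed, and used for model training in Python or another language.

Its simple structure and wide range of uses have made it a standard first example for machine learning beginners, who can pick up the basic concepts and skills step by step by working with it.
import pandas as pd

# Adjust the path to wherever iris.csv was saved.
iris = pd.read_csv("D:/iris.csv")
print(iris.head())  # first five rows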
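
To try `read_csv` without the downloaded file, the same call can read from an in-memory text buffer. The sketch below assumes the column layout that the rest of this tutorial's code relies on (`Id`, four measurement columns ending in `Cm`, and `Species`); the three rows are an illustrative sample, not the full dataset.

```python
import io

import pandas as pd

# Illustrative sample in the assumed column layout of iris.csv.
csv_text = """Id,SepalLengthCm,SepalWidthCm,PetalLengthCm,PetalWidthCm,Species
1,5.1,3.5,1.4,0.2,setosa
2,7.0,3.2,4.7,1.4,versicolor
3,6.3,3.3,6.0,2.5,virginica
"""

# read_csv accepts any file-like object as well as a path.
sample = pd.read_csv(io.StringIO(csv_text))
print(sample.shape)
print(sample.dtypes)
```

Checking `shape` and `dtypes` right after loading is a quick way to confirm that the header row and the numeric columns were parsed as intended.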

3.3 Mean of each column

import numpy as np
import pandas as pd

iris = pd.read_csv("D:/iris.csv")
# Columns 1 to 4 are the four measurements; axis=0 applies the function to each column.
print(iris.iloc[:, 1:5].apply(func=np.mean, axis=0))
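
The column-wise reductions in this part of the tutorial also exist as DataFrame methods. The sketch below runs them on a small made-up table (`df`) with one missing value, to show that `count` skips missing values while `len` does not; all names here are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"x": [1.0, 2.0, 3.0], "y": [10.0, np.nan, 60.0]})

col_mean = df.mean()     # per-column mean, missing values skipped by default
col_min = df.min()       # per-column minimum
col_max = df.max()       # per-column maximum
col_count = df.count()   # per-column number of non-missing values
summary = df.describe()  # count, mean, std, min, quartiles and max in one table

print(summary)
print(len(df), col_count["y"])
```

`describe` is often the quickest first look, since it reports several of these statistics at once.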

3.4 Minimum of each column

import numpy as np
import pandas as pd

iris = pd.read_csv("D:/iris.csv")
# Named col_min so that the built-in min() is not shadowed.
col_min = iris.iloc[:, 1:5].apply(func=np.min, axis=0)
print(col_min)

3.5 Maximum of each column

import numpy as np
import pandas as pd

iris = pd.read_csv("D:/iris.csv")
# Named col_max so that the built-in max() is not shadowed.
col_max = iris.iloc[:, 1:5].apply(func=np.max, axis=0)
print(col_max)

3.6 Number of samples in each column

import numpy as np
import pandas as pd

iris = pd.read_csv("D:/iris.csv")
# np.size returns the number of elements in each column.
size = iris.iloc[:, 1:5].apply(func=np.size, axis=0)
print(size)

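3.7 Row-wise calculations

Only the first five rows are shown. For row-wise calculations, axis=0 in the code must be changed to axis=1.
import pandas as pd

iris = pd.read_csv("D:/iris.csv")
# String names select the pandas implementations; "std" and "var" are the
# sample statistics (ddof=1).
data = iris.iloc[0:5, 1:5].apply(func=["min", "max", "mean", "std", "var"], axis=1)
print(data)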
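
Standard deviation and variance depend on the `ddof` setting: pandas defaults to the sample statistic (`ddof=1`), while NumPy's `np.std` defaults to the population statistic (`ddof=0`). A minimal sketch of the difference on four illustrative numbers, with illustrative variable names:

```python
import numpy as np
import pandas as pd

row = pd.Series([5.1, 3.5, 1.4, 0.2])

sample_std = row.std()                   # pandas default: ddof=1 (divide by n - 1)
population_std = np.std(row.to_numpy())  # NumPy default: ddof=0 (divide by n)
matched = row.std(ddof=0)                # pandas with NumPy's convention

print(sample_std, population_std, matched)
```

For n values the two differ by a factor of sqrt(n / (n - 1)), which is noticeable on short rows like these.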

3.8 Grouped means

import pandas as pd

iris = pd.read_csv("D:/iris.csv")
# Drop the Id column, then average every measurement within each species.
res = iris.drop("Id", axis=1).groupby(by="Species").mean()
print(res)

3.9 Grouped skewness

import pandas as pd

iris = pd.read_csv("D:/iris.csv")
# Skewness of every measurement within each species.
res = iris.drop("Id", axis=1).groupby(by="Species").skew()
print(res)
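
Skewness measures how asymmetric a distribution is; a positive value indicates a longer right tail. The sketch below recomputes it by hand with the bias-adjusted Fisher-Pearson formula, which is the estimator pandas' `skew` is understood to use, and compares the result with `Series.skew` on a few illustrative numbers. The names `g1` and `G1` follow common statistical notation and are not taken from the tutorial.

```python
import numpy as np
import pandas as pd

values = pd.Series([1.0, 2.0, 2.0, 3.0, 9.0])
n = len(values)

dev = values - values.mean()
m2 = (dev ** 2).mean()  # second central moment
m3 = (dev ** 3).mean()  # third central moment

g1 = m3 / m2 ** 1.5                        # uncorrected skewness
G1 = np.sqrt(n * (n - 1)) / (n - 2) * g1   # small-sample correction

print(G1, values.skew())
```

The single large value 9.0 pulls the right tail out, so the result is positive.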

3.10 Aggregation

3.10.1 Before grouping

import pandas as pd

iris = pd.read_csv("D:/iris.csv")
# The dictionary maps each column to the statistics computed for it.
res = iris.drop("Id", axis=1).agg({
    "SepalLengthCm": ["min", "max", "mean"],
    "SepalWidthCm": ["min", "max", "mean"],
    "PetalLengthCm": ["min", "max", "mean"],
})
print(res)

3.10.2 After grouping

import pandas as pd

iris = pd.read_csv("D:/iris.csv")
# Group by species; a column that is aggregated cannot also be the grouping key.
res = (
    iris.drop("Id", axis=1)
    .groupby(by="Species")
    .agg({
        "SepalLengthCm": ["min", "max", "mean"],
        "SepalWidthCm": ["min"],
        "PetalLengthCm": ["skew"],
    })
)
print(res)
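
A dictionary of lists produces two-level column names. When flat, self-describing names are preferred, pandas also supports named aggregation, where each keyword argument is an output column built from a `(column, function)` pair. The sketch below uses a small made-up table with placeholder species "a" and "b"; the output names such as `length_min` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "Species": ["a", "a", "b", "b"],
    "SepalLengthCm": [5.0, 5.2, 6.0, 6.4],
    "SepalWidthCm": [3.4, 3.6, 2.8, 3.0],
})

# Each keyword names an output column: (input column, aggregation function).
res = df.groupby("Species").agg(
    length_min=("SepalLengthCm", "min"),
    length_mean=("SepalLengthCm", "mean"),
    width_max=("SepalWidthCm", "max"),
)
print(res)
```

The result has one row per group and exactly the columns named in the call, in the order given.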

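4. Data Visualization

Matplotlib is a widely used Python plotting library that provides a MATLAB-like plotting framework. It produces high-quality charts for data visualization, scientific research, education, and publishing.

4.1 Installing matplotlib

pip install matplotlib
When the installation succeeds, pip prints a line beginning with `Successfully installed`.

4.2 Checking the matplotlib installation

pip show matplotlib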

4.3 Box plot

import pandas as pd
from matplotlib import pyplot as plt

iris = pd.read_csv("D:/iris.csv")
# One box plot per measurement, with the boxes grouped by species.
iris.iloc[:, 1:6].boxplot(
    column=["SepalLengthCm", "SepalWidthCm", "PetalLengthCm", "PetalWidthCm"],
    by="Species",
    figsize=(10, 10),
)
plt.show()

4.4 Scatter plot

import pandas as pd
from matplotlib import pyplot as plt

iris = pd.read_csv("D:/iris.csv")
# Map each species to a colour. The keys must match the values in the
# Species column exactly; values without a matching key become NaN.
color = iris.Species.map({"setosa": "blue", "versicolor": "green", "virginica": "red"})
iris.plot(kind="scatter", x="SepalLengthCm", y="SepalWidthCm", s=30, c=color, figsize=(10, 10))
plt.show()
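
Because `map` turns unmatched values into NaN, it is worth checking the colour mapping before plotting. The sketch below uses a made-up `species` Series in which one label carries an `Iris-` prefix; that some copies of the CSV use such prefixed labels is an assumption, not something stated for the file used here. The names `palette`, `unmapped`, and `cleaned` are illustrative.

```python
import pandas as pd

species = pd.Series(["setosa", "versicolor", "Iris-virginica"])
palette = {"setosa": "blue", "versicolor": "green", "virginica": "red"}

colors = species.map(palette)
# Labels that found no colour; an empty list means the mapping is complete.
unmapped = species[colors.isna()].unique().tolist()
print(unmapped)

# One possible repair: strip the assumed prefix, then map again.
cleaned = species.str.replace("Iris-", "", regex=False).map(palette)
print(cleaned.tolist())
```

Running this kind of check first gives a clearer message than the error matplotlib raises when it receives NaN as a colour.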

4.5 Hexbin plot

import pandas as pd
from matplotlib import pyplot as plt

iris = pd.read_csv("D:/iris.csv")
# gridsize sets the number of hexagons along the x axis.
iris.plot(kind="hexbin", x="SepalLengthCm", y="SepalWidthCm", gridsize=15, figsize=(10, 7), sharex=False)
plt.show()

4.6 Line plot

import pandas as pd
from matplotlib import pyplot as plt

iris = pd.read_csv("D:/iris.csv")
# Id goes on the x axis; each of the four measurements is drawn as its own line.
iris.iloc[:, 0:5].plot(kind="line", x="Id", figsize=(12, 8))
plt.show()

Previous article: Getting Started with NumPy in Python (A Detailed Tutorial), ZEEKLOG blog: https://blog.ZEEKLOG.net/Z0412_J0103/article/details/144840505
Next article: Using Matplotlib in Python (A Detailed Tutorial), ZEEKLOG blog: https://blog.ZEEKLOG.net/Z0412_J0103/article/details/144900714
