Python Data Visualization (3): Drawing a Funnel Chart

Course source: Bilibili uploader 【蚂蚁学python】
Course link: 【数据可视化】Python数据图表可视化入门到实战
Course materials link: 【链接】

Drawing an e-commerce website conversion funnel with Python

Funnel chart: suited to analyzing business flows with many steps. By comparing the volume at each stage of the funnel, you can see at a glance where users drop off.

Example: the data comes from the Kaggle dataset "E-commerce website Funnel analysis",
available at https://www.kaggle.com/aerodinamicc/ecommerce-website-funnel-analysis

The site is simple. Besides a user table, there are four tables of page-visit data:

  1. home_page_table.csv, home page visit data
  2. search_page_table.csv, search page visit data
  3. payment_page_table.csv, payment page visit data
  4. payment_confirmation_table.csv, payment confirmation page visit data
  5. user_table.csv, user information

Goal: plot the conversion funnel and check whether the step-to-step drop-off looks normal.

1. Read the data

import pandas as pd

df_home_page = pd.read_csv("../DATA_POOL/PY_DATA/ant-learn-visualization-master/datas/ecommerce-website-funnel-analysis/home_page_table.csv")
df_search_page = pd.read_csv("../DATA_POOL/PY_DATA/ant-learn-visualization-master/datas/ecommerce-website-funnel-analysis/search_page_table.csv")
df_payment_page = pd.read_csv("../DATA_POOL/PY_DATA/ant-learn-visualization-master/datas/ecommerce-website-funnel-analysis/payment_page_table.csv")
df_payment_confirmation_page = pd.read_csv("../DATA_POOL/PY_DATA/ant-learn-visualization-master/datas/ecommerce-website-funnel-analysis/payment_confirmation_table.csv")
df_user_table = pd.read_csv("../DATA_POOL/PY_DATA/ant-learn-visualization-master/datas/ecommerce-website-funnel-analysis/user_table.csv")
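
Before going further, a quick sanity check (my own addition, not from the course) confirms the tables loaded and that each funnel step has fewer rows than the last:

# Sketch: print the row count of each table; the counts should shrink
# monotonically as users drop out of the funnel.
for name, df in [
    ("home_page", df_home_page),
    ("search_page", df_search_page),
    ("payment_page", df_payment_page),
    ("payment_confirmation", df_payment_confirmation_page),
]:
    print(name, len(df))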

Inspect the data

df_home_page.head(3)
user_id page
0 313593 home_page
1 468315 home_page
2 264005 home_page
df_search_page.head(3)
user_id page
0 15866 search_page
1 347058 search_page
2 577020 search_page
df_payment_page.head(3)
user_id page
0 253019 payment_page
1 310478 payment_page
2 304081 payment_page
df_payment_confirmation_page.head(3)
user_id page
0 123100 payment_confirmation_page
1 704999 payment_confirmation_page
2 407188 payment_confirmation_page
df_user_table.head(3)
user_id date device sex
0 450007 2015-02-28 Desktop Female
1 756838 2015-01-13 Desktop Male
2 568983 2015-04-09 Desktop Male
df_user_table["device"].value_counts()
device
Desktop    60200
Mobile     30200
Name: count, dtype: int64
df_user_table["sex"].value_counts()
sex
Male      45325
Female    45075
Name: count, dtype: int64

2. Join the five tables into one

df_merge = df_user_table

for df_inter in [df_home_page, df_search_page, df_payment_page, df_payment_confirmation_page]:
    df_merge = pd.merge(
        left=df_merge,
        right=df_inter,
        left_on="user_id",
        right_on="user_id",
        how="left",
        suffixes=('', '_page')  # keep the original column name; add a '_page' suffix to the incoming duplicate
    )

Note that during the merge above, several tables share the same column name ("page"), which can cause an error; the suffixes argument is there to avoid it.

df_merge
user_id date device sex page page_page page_page page_page
0 450007 2015-02-28 Desktop Female home_page NaN NaN NaN
1 756838 2015-01-13 Desktop Male home_page NaN NaN NaN
2 568983 2015-04-09 Desktop Male home_page search_page NaN NaN
3 190794 2015-02-18 Desktop Female home_page search_page NaN NaN
4 537909 2015-01-15 Desktop Male home_page NaN NaN NaN
... ... ... ... ... ... ... ... ...
90395 307667 2015-03-30 Desktop Female home_page NaN NaN NaN
90396 642989 2015-02-08 Desktop Female home_page search_page NaN NaN
90397 659645 2015-04-13 Desktop Male home_page search_page NaN NaN
90398 359779 2015-03-23 Desktop Male home_page NaN NaN NaN
90399 438929 2015-03-26 Mobile Female home_page NaN NaN NaN

90400 rows × 8 columns

To avoid duplicate column names, rename the columns:

df_merge.columns = [
    "user_id", "date", "device", "sex",
    "home_page", "search_page", "payment_page", "confirmation_page"
]

df_merge
user_id date device sex home_page search_page payment_page confirmation_page
0 450007 2015-02-28 Desktop Female home_page NaN NaN NaN
1 756838 2015-01-13 Desktop Male home_page NaN NaN NaN
2 568983 2015-04-09 Desktop Male home_page search_page NaN NaN
3 190794 2015-02-18 Desktop Female home_page search_page NaN NaN
4 537909 2015-01-15 Desktop Male home_page NaN NaN NaN
... ... ... ... ... ... ... ... ...
90395 307667 2015-03-30 Desktop Female home_page NaN NaN NaN
90396 642989 2015-02-08 Desktop Female home_page search_page NaN NaN
90397 659645 2015-04-13 Desktop Male home_page search_page NaN NaN
90398 359779 2015-03-23 Desktop Male home_page NaN NaN NaN
90399 438929 2015-03-26 Mobile Female home_page NaN NaN NaN

90400 rows × 8 columns
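
As an alternative (a minimal sketch of my own, not from the course), you can sidestep the duplicate-name problem entirely by renaming each table's "page" column to the step name before merging:

# Sketch: rename each page table's "page" column to the step name,
# then left-join onto the user table; no suffixes or post-hoc rename needed.
page_tables = {
    "home_page": df_home_page,
    "search_page": df_search_page,
    "payment_page": df_payment_page,
    "confirmation_page": df_payment_confirmation_page,
}
df_alt = df_user_table
for name, df_step in page_tables.items():
    df_alt = df_alt.merge(df_step.rename(columns={"page": name}), on="user_id", how="left")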

3. Count the users at each page

The goal is to feed data into the funnel chart; pyecharts expects a list of (key, value) pairs: [(key1, value1), (key2, value2), ...]

datas = []
for column in ["home_page", "search_page", "payment_page", "confirmation_page"]:
    user_count = df_merge[column].dropna().size
    datas.append((column, user_count))
print(datas)
[('home_page', 90400), ('search_page', 45200), ('payment_page', 6030), ('confirmation_page', 452)]
# Normalize to percentages of the first step for easier comparison
max_count = datas[0][1]  # i.e. max_count = 90400
datas_norm = [
    (x, round(y * 100 / max_count, 2))
    for x, y in datas
]
datas_norm
[('home_page', 100.0),
 ('search_page', 50.0),
 ('payment_page', 6.67),
 ('confirmation_page', 0.5)]
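
The normalized values compare every step against the home page. A complementary view (a small sketch of my own, computed from the counts above) is the step-to-step conversion rate, which makes the weakest transition stand out:

# Sketch: conversion rate of each step relative to the previous step
step_rates = [
    (datas[i + 1][0], round(datas[i + 1][1] * 100 / datas[i][1], 2))
    for i in range(len(datas) - 1)
]
step_rates
[('search_page', 50.0), ('payment_page', 13.34), ('confirmation_page', 7.5)]

Here the search-to-payment step (13.34%) and the payment-to-confirmation step (7.5%) are the big leaks.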

4. Draw the funnel chart

from pyecharts import options as opts
from pyecharts.charts import Funnel

funnel = Funnel()
funnel.add("User share", datas_norm)
<pyecharts.charts.basic_charts.funnel.Funnel at 0x198a3fcaff0>
from IPython.display import HTML

# As before, read the rendered HTML file contents;
# funnel.render() returns the output file path as a string
with open(funnel.render(), 'r', encoding='utf-8') as file:
    html_content = file.read()

# Render the HTML directly in JupyterLab
HTML(html_content)
[Rendered output: the "Awesome-pyecharts" HTML page showing the conversion funnel chart]
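
To label each stage with its percentage and give the chart a title, you can pass label and title options (a hedged sketch; the option names and the "{b}: {c}%" formatter follow standard pyecharts/ECharts usage, not the course code):

# Sketch: funnel with inside labels and a title; render_notebook() may
# display the chart inline in classic Jupyter Notebook without the
# manual HTML round-trip used above.
funnel2 = (
    Funnel()
    .add(
        "User share",
        datas_norm,
        label_opts=opts.LabelOpts(formatter="{b}: {c}%", position="inside"),
    )
    .set_global_opts(title_opts=opts.TitleOpts(title="E-commerce conversion funnel"))
)
funnel2.render_notebook()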
