Course source: Bilibili uploader 蚂蚁学python
Course link: 【数据可视化】Python数据图表可视化入门到实战
Course materials link: 【链接】
Drawing an E-commerce Conversion Funnel Chart in Python
Funnel charts suit the analysis of multi-stage business processes: by comparing the numbers at each stage of the funnel, you can see at a glance where users drop off.
Example: the data comes from the Kaggle dataset "E-commerce website Funnel analysis".
地址为:https://www.kaggle.com/aerodinamicc/ecommerce-website-funnel-analysis
The site is simple; there are four page tables plus a user table:
- home_page_table.csv: home page visits
- search_page_table.csv: search page visits
- payment_page_table.csv: payment page visits
- payment_confirmation_table.csv: payment-confirmation (successful payment) page visits
- user_table.csv: user information
Goal: draw the conversion funnel and check whether conversion looks normal.
1. Read the data
import pandas as pd
df_home_page = pd.read_csv("../DATA_POOL/PY_DATA/ant-learn-visualization-master/datas/ecommerce-website-funnel-analysis/home_page_table.csv")
df_search_page = pd.read_csv("../DATA_POOL/PY_DATA/ant-learn-visualization-master/datas/ecommerce-website-funnel-analysis/search_page_table.csv")
df_payment_page = pd.read_csv("../DATA_POOL/PY_DATA/ant-learn-visualization-master/datas/ecommerce-website-funnel-analysis/payment_page_table.csv")
df_payment_confirmation_page = pd.read_csv("../DATA_POOL/PY_DATA/ant-learn-visualization-master/datas/ecommerce-website-funnel-analysis/payment_confirmation_table.csv")
df_user_table = pd.read_csv("../DATA_POOL/PY_DATA/ant-learn-visualization-master/datas/ecommerce-website-funnel-analysis/user_table.csv")
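The five read_csv calls repeat the same long directory path. A small sketch of a DRY alternative (the helper name read_table is hypothetical; it assumes the same directory layout as the calls above):

```python
import os

import pandas as pd

# Build the repeated directory prefix once (same layout as the calls above)
DATA_DIR = ("../DATA_POOL/PY_DATA/ant-learn-visualization-master/datas/"
            "ecommerce-website-funnel-analysis")

def read_table(name):
    """Read one of the funnel CSVs by its base name (hypothetical helper)."""
    return pd.read_csv(os.path.join(DATA_DIR, name + ".csv"))

# Usage (assumes the files exist at DATA_DIR):
# df_home_page = read_table("home_page_table")
# df_user_table = read_table("user_table")
```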
Inspect the data:
df_home_page.head(3)
 | user_id | page
---|---|---
0 | 313593 | home_page
1 | 468315 | home_page
2 | 264005 | home_page
df_search_page.head(3)
 | user_id | page
---|---|---
0 | 15866 | search_page
1 | 347058 | search_page
2 | 577020 | search_page
df_payment_page.head(3)
 | user_id | page
---|---|---
0 | 253019 | payment_page
1 | 310478 | payment_page
2 | 304081 | payment_page
df_payment_confirmation_page.head(3)
 | user_id | page
---|---|---
0 | 123100 | payment_confirmation_page
1 | 704999 | payment_confirmation_page
2 | 407188 | payment_confirmation_page
df_user_table.head(3)
 | user_id | date | device | sex
---|---|---|---|---
0 | 450007 | 2015-02-28 | Desktop | Female
1 | 756838 | 2015-01-13 | Desktop | Male
2 | 568983 | 2015-04-09 | Desktop | Male
df_user_table["device"].value_counts()
device
Desktop 60200
Mobile 30200
Name: count, dtype: int64
df_user_table["sex"].value_counts()
sex
Male 45325
Female 45075
Name: count, dtype: int64
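Before merging, it is also worth confirming that user_id is unique in the user table; otherwise a left merge would silently duplicate rows. A minimal sketch with a toy stand-in frame (the rows are hypothetical, the schema matches df_user_table):

```python
import pandas as pd

# Toy stand-in for df_user_table (hypothetical rows, same columns)
df_users = pd.DataFrame({
    "user_id": [450007, 756838, 568983],
    "date": ["2015-02-28", "2015-01-13", "2015-04-09"],
    "device": ["Desktop", "Desktop", "Mobile"],
    "sex": ["Female", "Male", "Male"],
})

# A duplicated user_id would multiply rows in every subsequent left merge
assert df_users["user_id"].is_unique

# value_counts gives the per-category totals shown above
print(df_users["device"].value_counts().to_dict())  # {'Desktop': 2, 'Mobile': 1}
```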
2. Merge the five tables into one
df_merge = df_user_table
for df_inter in [df_home_page, df_search_page, df_payment_page, df_payment_confirmation_page]:
    df_merge = pd.merge(
        left=df_merge,
        right=df_inter,
        on="user_id",
        how="left",
        suffixes=("", "_page"),  # keep the left-hand column names; suffix colliding right-hand columns with "_page"
    )
Note: the page tables all share the column name page, so every merge after the first produces a name collision. suffixes=('', '_page') keeps the left-hand names unchanged and only suffixes the newly merged column, instead of letting pandas mangle the existing columns with its default '_x'/'_y' suffixes.
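The effect of suffixes=('', '_page') can be seen on two toy tables that both carry a page column (values made up for illustration):

```python
import pandas as pd

left = pd.DataFrame({"user_id": [1, 2], "page": ["home_page", "home_page"]})
right = pd.DataFrame({"user_id": [1], "page": ["search_page"]})

merged = pd.merge(left, right, on="user_id", how="left", suffixes=("", "_page"))

# The left-hand "page" keeps its name; the colliding right-hand one becomes "page_page"
print(list(merged.columns))  # ['user_id', 'page', 'page_page']

# Users absent from the right table get NaN -- this is how the funnel
# drop-off shows up after the merge
print(merged["page_page"].isna().tolist())  # [False, True]
```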
df_merge
 | user_id | date | device | sex | page | page_page | page_page | page_page
---|---|---|---|---|---|---|---|---
0 | 450007 | 2015-02-28 | Desktop | Female | home_page | NaN | NaN | NaN
1 | 756838 | 2015-01-13 | Desktop | Male | home_page | NaN | NaN | NaN
2 | 568983 | 2015-04-09 | Desktop | Male | home_page | search_page | NaN | NaN
3 | 190794 | 2015-02-18 | Desktop | Female | home_page | search_page | NaN | NaN
4 | 537909 | 2015-01-15 | Desktop | Male | home_page | NaN | NaN | NaN
... | ... | ... | ... | ... | ... | ... | ... | ...
90395 | 307667 | 2015-03-30 | Desktop | Female | home_page | NaN | NaN | NaN
90396 | 642989 | 2015-02-08 | Desktop | Female | home_page | search_page | NaN | NaN
90397 | 659645 | 2015-04-13 | Desktop | Male | home_page | search_page | NaN | NaN
90398 | 359779 | 2015-03-23 | Desktop | Male | home_page | NaN | NaN | NaN
90399 | 438929 | 2015-03-26 | Mobile | Female | home_page | NaN | NaN | NaN
90400 rows × 8 columns
The merged result still contains duplicate column names, so rename the columns:
df_merge.columns = [
"user_id","data","device","sex",
"home_page","search_page","payment_page","comfirmation_pages"
]
df_merge
 | user_id | data | device | sex | home_page | search_page | payment_page | comfirmation_pages
---|---|---|---|---|---|---|---|---
0 | 450007 | 2015-02-28 | Desktop | Female | home_page | NaN | NaN | NaN
1 | 756838 | 2015-01-13 | Desktop | Male | home_page | NaN | NaN | NaN
2 | 568983 | 2015-04-09 | Desktop | Male | home_page | search_page | NaN | NaN
3 | 190794 | 2015-02-18 | Desktop | Female | home_page | search_page | NaN | NaN
4 | 537909 | 2015-01-15 | Desktop | Male | home_page | NaN | NaN | NaN
... | ... | ... | ... | ... | ... | ... | ... | ...
90395 | 307667 | 2015-03-30 | Desktop | Female | home_page | NaN | NaN | NaN
90396 | 642989 | 2015-02-08 | Desktop | Female | home_page | search_page | NaN | NaN
90397 | 659645 | 2015-04-13 | Desktop | Male | home_page | search_page | NaN | NaN
90398 | 359779 | 2015-03-23 | Desktop | Male | home_page | NaN | NaN | NaN
90399 | 438929 | 2015-03-26 | Mobile | Female | home_page | NaN | NaN | NaN
90400 rows × 8 columns
3. Count the users at each page
The goal is to supply the funnel chart's data; pyecharts expects a list of pairs: [(key1, value1), (key2, value2), ...]
datas = []
for column in ["home_page","search_page","payment_page","comfirmation_pages"]:
    user_count = df_merge[column].dropna().size
    datas.append((column, user_count))
print(datas)
[('home_page', 90400), ('search_page', 45200), ('payment_page', 6030), ('comfirmation_pages', 452)]
# Normalize against the first stage so the stages are easier to compare
max_count = datas[0][1]  # i.e. max_count = 90400
datas_norm = [
    (x, round(y * 100 / max_count, 2))
    for x, y in datas
]
datas_norm
[('home_page', 100.0),
('search_page', 50.0),
('payment_page', 6.67),
('comfirmation_pages', 0.5)]
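Normalizing against the home page shows each stage's share of all visitors; the step-to-step conversion rates (computed from the same counts printed above) make the weakest link explicit:

```python
# Counts from the print above
datas = [("home_page", 90400), ("search_page", 45200),
         ("payment_page", 6030), ("comfirmation_pages", 452)]

# Conversion from each stage to the next
for (prev_name, prev_count), (name, count) in zip(datas, datas[1:]):
    print(f"{prev_name} -> {name}: {count * 100 / prev_count:.2f}%")
# home_page -> search_page: 50.00%
# search_page -> payment_page: 13.34%
# payment_page -> comfirmation_pages: 7.50%
```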
4. Draw the funnel chart
from pyecharts import options as opts
from pyecharts.charts import Funnel
funnel = Funnel()
funnel.add("用户比例", datas_norm)
<pyecharts.charts.basic_charts.funnel.Funnel at 0x198a3fcaff0>
from IPython.display import HTML
# As above, read back the rendered HTML file;
# funnel.render() returns the output file path as a string
with open(funnel.render(), 'r', encoding='utf-8') as file:
    html_content = file.read()
# Render the HTML directly in JupyterLab
HTML(html_content)
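A few optional refinements, assuming pyecharts is installed: render_notebook() displays the chart inline without the manual file read, and LabelOpts/TitleOpts are standard pyecharts options for percent labels and a title. The try/except is only so the sketch degrades gracefully when pyecharts is absent:

```python
try:
    from pyecharts import options as opts
    from pyecharts.charts import Funnel

    datas_norm = [("home_page", 100.0), ("search_page", 50.0),
                  ("payment_page", 6.67), ("comfirmation_pages", 0.5)]

    funnel = (
        Funnel()
        .add(
            "用户比例",
            datas_norm,
            label_opts=opts.LabelOpts(formatter="{b}: {c}%"),  # stage name + percent
        )
        .set_global_opts(title_opts=opts.TitleOpts(title="E-commerce conversion funnel"))
    )
    # funnel.render_notebook()  # inline rendering inside Jupyter
    have_pyecharts = True
except ImportError:
    have_pyecharts = False
```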