Data analysis: identifying intersection features across multiple groups

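Introduction

Sometimes we need to find out which features are shared between several groups, that is, which features fall into which intersection of the groups. This post shows how to do that in R: first draw the intersections as an UpSet plot, then label every feature with the combination of groups it belongs to.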
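To make "intersection feature" concrete before turning to the movie data, here is a minimal base-R sketch. The group names and feature names are invented for illustration and do not come from the dataset used later.

```r
# Toy feature sets; the group names and members are invented for illustration.
groups <- list(
  A = c("f1", "f2", "f3"),
  B = c("f2", "f3", "f4"),
  C = c("f3", "f4", "f5")
)

# Features shared by every group: fold intersect() over the list.
shared_all <- Reduce(intersect, groups)

# Features shared by A and B but absent from C.
shared_ab_only <- setdiff(intersect(groups$A, groups$B), groups$C)

print(shared_all)
print(shared_ab_only)
```

This works well for one or two intersections of interest. Once every combination of groups matters, labelling each feature with its group combination, as done later in this post, scales better.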

Loading the R packages

library(UpSetR)
library(tidyverse)

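Drawing the UpSet plot

# Example data shipped with UpSetR: one row per movie, one 0/1 column per genre
movies <- read.csv(system.file("extdata", "movies.csv", package = "UpSetR"), 
                   header = TRUE, sep = ";", stringsAsFactors = FALSE)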
movies_list <- list(
  Action = movies %>%
    dplyr::filter(Action == 1) %>%
    dplyr::pull(Name),
  Adventure = movies %>%
    dplyr::filter(Adventure == 1) %>%
    dplyr::pull(Name),
  Children = movies %>%
    dplyr::filter(Children == 1) %>%
    dplyr::pull(Name),
  Comedy = movies %>%
    dplyr::filter(Comedy == 1) %>%
    dplyr::pull(Name),
  Crime = movies %>%
    dplyr::filter(Crime == 1) %>%
    dplyr::pull(Name),
  Documentary = movies %>%
    dplyr::filter(Documentary == 1) %>%
    dplyr::pull(Name)  
)

# `sets` names the six genres explicitly, so `nsets` is set to match
movies_pl <- UpSetR::upset(
  data = UpSetR::fromList(movies_list),
  nsets = 6, 
  sets = c("Action", "Adventure", "Children", 
           "Comedy", "Crime", "Documentary"),
  order.by = "freq",
  main.bar.color = "gray10",
  sets.bar.color = "gray",
  matrix.color = "gray10",
  mainbar.y.label = "No. of movies",
  sets.x.label = "No. of movies")

movies_pl
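The plot is drawn from a membership table rather than from the named list itself. As far as I understand, UpSetR::fromList() turns the list into a 0/1 data frame with one row per unique element and one column per set. The sketch below rebuilds that kind of table in base R on invented toy sets; it illustrates the idea and is not the package's own implementation.

```r
# Toy sets; names and members are invented for illustration.
sets <- list(
  A = c("f1", "f2", "f3"),
  B = c("f2", "f3"),
  C = c("f3", "f4")
)

# One row per unique element, one 0/1 column per set.
elements <- unique(unlist(sets))
membership <- sapply(sets, function(s) as.integer(elements %in% s))
rownames(membership) <- elements

# Number of sets each element belongs to.
n_sets <- rowSums(membership)

print(membership)
print(n_sets)
```

Each distinct row pattern in such a table corresponds to one bar of the UpSet plot.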


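Determining the intersection of each feature

  • De-duplicated features: df_uniq_movie

  • Group label of each feature, one row per group and feature pair: df_group_movie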


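df_uniq_movie <- data.frame(feature = unique(unlist(movies_list)))
df_group_movie <- lapply(movies_list, function(x){
  data.frame(feature = x)
}) %>% 
  dplyr::bind_rows(.id = "Sequence") %>% 
  # drop repeated group/feature pairs so that a group is not listed twice in a label
  dplyr::distinct()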

  • Label each feature with its intersection, i.e. the groups it belongs to, joined by "|"

df_int_movie <- lapply(df_uniq_movie$feature, function(x){
  intersection <- df_group_movie %>% 
    dplyr::filter(feature == x) %>% 
    dplyr::arrange(Sequence) %>% 
    dplyr::pull(Sequence) %>% 
    paste0(collapse = "|")
  # build the dataframe
  return(data.frame(feature = x, int = intersection))
}) %>% 
  dplyr::bind_rows()

head(df_int_movie)
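Once every feature carries an intersection label, the features of any given intersection can be selected by filtering on that label. The sketch below shows this on an invented toy table. It also builds the labels in a single grouped pass with base R's tapply(), an alternative to the per-feature lapply() loop above that may be faster on large inputs. The column names mirror those used in this post; the data are made up.

```r
# Toy long table: one row per (group, feature) pair; the values are made up.
df_group <- data.frame(
  Sequence = c("A", "A", "A", "B", "B", "C"),
  feature  = c("f1", "f2", "f3", "f2", "f3", "f3"),
  stringsAsFactors = FALSE
)

# Collapse the sorted, de-duplicated group names of each feature into one label.
labels <- tapply(df_group$Sequence, df_group$feature,
                 function(g) paste(sort(unique(g)), collapse = "|"))
df_int <- data.frame(feature = names(labels), int = as.vector(labels),
                     stringsAsFactors = FALSE)

# Keep only the features present in every group.
all_groups <- paste(sort(unique(df_group$Sequence)), collapse = "|")
shared <- df_int$feature[df_int$int == all_groups]

# Count how many features fall into each intersection.
int_counts <- table(df_int$int)

print(df_int)
print(shared)
```

On the movie data, the same filter applied to df_int_movie, for example int == "Action|Adventure", would return the movies that belong to exactly those two genres among the six considered.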
