How to Extract a Random Sample of Rows from an R DataFrame with Nested Conditions

Last modified: 2022-05-13 01:55:43.360000 | Author: Mango


In this article, we will learn how to extract a random sample of rows from a DataFrame in the R programming language using nested conditions.

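Method 1: Using sample()

We will use the sample() function for this task. sample() draws a random sample from its first argument, which can be a vector or a single positive integer n (in which case it samples from 1:n). By default it samples without replacement.

The other function we will use is which(), which lets us express the condition for selecting rows. which() returns the indices of the elements for which the condition is TRUE; elements where the condition evaluates to NA are skipped.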
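
As a small illustration of how the two functions fit together, the sketch below applies them to a standalone vector. The vector year and the seed value are assumptions made for this example only; they are not part of the article's data frame.

```r
# A small vector, with one NA to show how which() treats missing values.
year <- c(10, 51, 19, NA, 99)

# which() returns the positions where the condition is TRUE.
# The comparison NA > 15 is NA, so position 4 is dropped.
idx <- which(year > 15)
idx
# [1] 2 3 5

# set.seed() makes the random draw repeatable between runs.
set.seed(42)

# Draw 2 of those positions without replacement (the default).
picked <- sample(idx, 2)
year[picked]
```

The same pattern carries over to a data frame: the sampled positions are used as row indices instead of vector indices.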

Data frame used:

     name year length education
1 Welcome   10     40       yes
2      to   51     NA       yes
3   Geeks   19     NA        no
4     for  126    100        no
5   Geeks   99     95       yes

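To implement this approach, first create the data frame, then pass the row indices selected by the condition to sample() and use the result to subset the data frame. The implementations below illustrate this using the data frame above.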

Example 1:

R
# Build the example data frame
df <- data.frame(
  name      = c("Welcome", "to", "Geeks", "for", "Geeks"),
  year      = c(10, 51, 19, 126, 99),
  length    = c(40, NA, NA, 100, 95),
  education = c("yes", "yes", "no", "no", "yes")
)
df

# Randomly pick 2 of the rows where year is greater than 5
print("2 samples")
df[sample(which(df$year > 5), 2), ]


Output:

     name year length education
1 Welcome   10     40       yes
2      to   51     NA       yes
3   Geeks   19     NA        no
4     for  126    100        no
5   Geeks   99     95       yes
[1] "2 samples"
     name year length education
1 Welcome   10     40       yes
2      to   51     NA       yes

Example 2:

R
# Build the example data frame
df <- data.frame(
  name      = c("Welcome", "to", "Geeks", "for", "Geeks"),
  year      = c(10, 51, 19, 126, 99),
  length    = c(40, NA, NA, 100, 95),
  education = c("yes", "yes", "no", "no", "yes")
)
df

# Randomly pick 3 of the rows where education is not "no"
print("3 samples")
df[sample(which(df$education != "no"), 3), ]

Output:

     name year length education
1 Welcome   10     40       yes
2      to   51     NA       yes
3   Geeks   19     NA        no
4     for  126    100        no
5   Geeks   99     95       yes
[1] "3 samples"
     name year length education
5   Geeks   99     95       yes
1 Welcome   10     40       yes
2      to   51     NA       yes
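
The two examples above each use a single condition. Nested conditions can be expressed by combining comparisons with & (and), | (or) and parentheses inside which(). The following is a minimal sketch under that assumption; the particular condition and the helper sample_rows() are hypothetical and introduced only for illustration. The helper also guards against a known quirk: sample(idx, n) treats a length-one idx such as 7 as the range 1:7.

```r
df <- data.frame(
  name      = c("Welcome", "to", "Geeks", "for", "Geeks"),
  year      = c(10, 51, 19, 126, 99),
  length    = c(40, NA, NA, 100, 95),
  education = c("yes", "yes", "no", "no", "yes")
)

# Nested condition: year above 15 AND (education is "yes" OR length >= 100).
# Rows 2, 4 and 5 satisfy it. Row 3 evaluates to NA and is dropped by which().
idx <- which(df$year > 15 & (df$education == "yes" | df$length >= 100))

# Hypothetical helper (not part of base R): sample row positions safely.
# Sampling positions within idx, rather than idx itself, avoids the
# sample(7, 1) behaviour of drawing from 1:7 when only one row matches.
sample_rows <- function(idx, n) {
  n <- min(n, length(idx))            # never ask for more rows than matched
  idx[sample.int(length(idx), n)]
}

set.seed(1)
df[sample_rows(idx, 2), ]
```

Capping n at the number of matching rows is one possible design choice; depending on the use case, raising an error when too few rows match may be preferable.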

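Method 2: Using the sample_n() function

The sample_n() function from the dplyr package draws a random sample of rows from a data frame.

Alongside sample_n(), we also use filter(). The filter() function keeps the rows that satisfy a filtering expression and drops the rest.

We load the dplyr package because it provides both filter() and sample_n(). We pass the example data frame df and the condition to filter(), then use sample_n() to extract n rows from those that satisfy the condition.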

Example 1:

R
library(dplyr)

# Build the example data frame
df <- data.frame(
  name      = c("Welcome", "to", "Geeks", "for", "Geeks"),
  year      = c(10, 51, 19, 126, 99),
  length    = c(40, NA, NA, 100, 95),
  education = c("yes", "yes", "no", "no", "yes")
)
df

# Keep rows whose name is not "to", then randomly pick 2 of them
print("2 samples")
df %>% filter(name != "to") %>% sample_n(2)

Output:

     name year length education
1 Welcome   10     40       yes
2      to   51     NA       yes
3   Geeks   19     NA        no
4     for  126    100        no
5   Geeks   99     95       yes
[1] "2 samples"
     name year length education
1 Welcome   10     40       yes
2   Geeks   99     95       yes

Example 2:

R
library(dplyr)

# Build the example data frame
df <- data.frame(
  name      = c("Welcome", "to", "Geeks", "for", "Geeks"),
  year      = c(10, 51, 19, 126, 99),
  length    = c(40, NA, NA, 100, 95),
  education = c("yes", "yes", "no", "no", "yes")
)
df

# Keep rows where year is greater than 20, then randomly pick 2 of them
print("2 samples")
df %>% filter(year > 20) %>% sample_n(2)

Output:

     name year length education
1 Welcome   10     40       yes
2      to   51     NA       yes
3   Geeks   19     NA        no
4     for  126    100        no
5   Geeks   99     95       yes
[1] "2 samples"
  name year length education
1  for  126    100        no
2   to   51     NA       yes
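
Both dplyr examples filter on a single condition. As a sketch of how a nested condition might look with the same functions, the block below combines conditions inside filter(). The specific condition, the intermediate name matched and the seed are assumptions chosen for illustration, and the code requires the dplyr package to be installed.

```r
library(dplyr)

df <- data.frame(
  name      = c("Welcome", "to", "Geeks", "for", "Geeks"),
  year      = c(10, 51, 19, 126, 99),
  length    = c(40, NA, NA, 100, 95),
  education = c("yes", "yes", "no", "no", "yes")
)

set.seed(1)  # makes the random draw repeatable

# Inside filter(), columns are referenced by name (no df$ prefix needed).
# Comma-separated conditions are combined with AND; | and parentheses nest them.
# Rows where the condition evaluates to NA are dropped.
matched <- df %>%
  filter(year > 15, education == "yes" | length >= 100)

# sample_n() raises an error if size exceeds the number of rows, so cap it.
matched %>% sample_n(min(2, nrow(matched)))
```

In dplyr 1.0.0 and later, sample_n() is marked as superseded by slice_sample(), so slice_sample(n = 2) may be the better choice in new code.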