How to Extract a Random Sample of Rows from an R DataFrame with Nested Conditions

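In this article, we will learn how to extract a random sample of rows from a DataFrame in the R programming language. Only rows that satisfy nested conditions are eligible to be sampled.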

Method 1: Using sample()

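We will use the sample() function for this task. In R, sample() draws a random sample from the object passed to it. That object can be either of two things:
- A vector, in which case elements of the vector are drawn.
- A single positive integer n, in which case numbers are drawn from 1:n.

By default, sampling is done without replacement.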
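A small sketch of the two kinds of argument described above. The vector x is just the year column of the example DataFrame, used here for illustration:

```r
x <- c(10, 51, 19, 126, 99)

# Vector argument: draw 2 distinct elements of x (no replacement by default)
picked <- sample(x, 2)

# Single positive integer n: draw from 1:n, here 2 distinct numbers from 1 to 5
positions <- sample(5, 2)

print(picked)
print(positions)
```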

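The other function we will use is which(), which lets us state the condition that the sampled rows must meet. which() takes a logical vector and returns the indices of its TRUE elements. Here, those indices are the row numbers that satisfy the condition.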
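One detail matters for this DataFrame: the length column contains NA values. which() returns only the positions that are TRUE, so NA comparisons are skipped. Plain logical indexing, by contrast, keeps them as NA entries. A short sketch using the example's year and length values (stored in the illustrative variables year and len):

```r
year <- c(10, 51, 19, 126, 99)
len <- c(40, NA, NA, 100, 95)

big_years <- which(year > 20)   # positions 2, 4, 5
long_rows <- which(len > 50)    # positions 4, 5; the NA comparisons are skipped
kept <- len[len > 50]           # logical indexing keeps NA: NA NA 100 95

print(big_years)
print(long_rows)
print(kept)
```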

Syntax: df[sample(which(conditions), n), ]

Parameters:

df: DataFrame
n: number of rows to sample
conditions: a logical condition on the columns; only rows for which it is TRUE can be sampled, e.g. df$year > 5
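Inside which(), you can combine conditions with & (and) and | (or), and nest them with parentheses. A minimal sketch that combines two conditions on the same five-row DataFrame used throughout this article:

```r
df <- data.frame(name = c("Welcome", "to", "Geeks", "for", "Geeks"),
                 year = c(10, 51, 19, 126, 99),
                 length = c(40, NA, NA, 100, 95),
                 education = c("yes", "yes", "no", "no", "yes"))

# Combined condition: year above 20 AND education equal to "yes" (rows 2 and 5)
idx <- which(df$year > 20 & df$education == "yes")

# Sample 2 of the matching rows (here that is both of them, in random order)
res <- df[sample(idx, 2), ]
print(res)
```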

DataFrame in use:

	name	year	length	education
1	Welcome	10	40	yes
2	to	51	NA	yes
3	Geeks	19	NA	no
4	for	126	100	no
5	Geeks	99	95	yes

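To implement this approach, first create the DataFrame. Then pass the row indices that which() returns for the condition to sample(), and use the sampled indices to select rows of the DataFrame. The examples below use the DataFrame shown above. Because the rows are drawn at random, your output will usually differ from the output shown.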

Example 1:

R

df <- data.frame(name = c("Welcome", "to", "Geeks", "for", "Geeks"),
                 year = c(10, 51, 19, 126, 99),
                 length = c(40, NA, NA, 100, 95),
                 education = c("yes", "yes", "no", "no", "yes"))
df

# Randomly pick 2 of the rows whose year is greater than 5
print("2 samples")
df[sample(which(df$year > 5), 2), ]

Output:

     name year length education
1 Welcome   10     40       yes
2      to   51     NA       yes
3   Geeks   19     NA        no
4     for  126    100        no
5   Geeks   99     95       yes
[1] "2 samples"
     name year length education
1 Welcome   10     40       yes
2      to   51     NA       yes
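To make a draw like this repeatable, for example in a report or a test, one option is to fix the random seed with set.seed() before sampling. A minimal sketch follows; the seed value 42 is arbitrary:

```r
df <- data.frame(name = c("Welcome", "to", "Geeks", "for", "Geeks"),
                 year = c(10, 51, 19, 126, 99),
                 length = c(40, NA, NA, 100, 95),
                 education = c("yes", "yes", "no", "no", "yes"))

# Same seed before each draw, so both calls pick the same rows in the same order
set.seed(42)
first <- df[sample(which(df$year > 5), 2), ]
set.seed(42)
second <- df[sample(which(df$year > 5), 2), ]

print(identical(first, second))   # TRUE
```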

Example 2:

R

df <- data.frame(name = c("Welcome", "to", "Geeks", "for", "Geeks"),
                 year = c(10, 51, 19, 126, 99),
                 length = c(40, NA, NA, 100, 95),
                 education = c("yes", "yes", "no", "no", "yes"))
df

# Randomly pick 3 of the rows whose education is not "no"
print("3 samples")
df[sample(which(df$education != "no"), 3), ]

Output:

     name year length education
1 Welcome   10     40       yes
2      to   51     NA       yes
3   Geeks   19     NA        no
4     for  126    100        no
5   Geeks   99     95       yes
[1] "3 samples"
     name year length education
5   Geeks   99     95       yes
1 Welcome   10     40       yes
2      to   51     NA       yes
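The sample(which(...), n) pattern has one pitfall. sample() treats a single positive integer k as the vector 1:k. So when only one row matches, sample(which(...), n) samples from rows 1 to k and can return rows that fail the condition. A minimal sketch of a safer variant follows. It uses a hypothetical helper, sample_rows_where(), which is not part of base R or dplyr. The helper samples positions within the index vector using sample.int():

```r
df <- data.frame(name = c("Welcome", "to", "Geeks", "for", "Geeks"),
                 year = c(10, 51, 19, 126, 99),
                 length = c(40, NA, NA, 100, 95),
                 education = c("yes", "yes", "no", "no", "yes"))

# Hypothetical helper (not part of base R or dplyr):
# sample n rows of df for which cond is TRUE, even if only one row matches
sample_rows_where <- function(df, cond, n) {
  idx <- which(cond)   # row numbers where cond is TRUE (NA is skipped)
  if (n > length(idx)) stop("n is larger than the number of matching rows")
  # Pick random positions inside idx, never random numbers from 1:idx
  df[idx[sample.int(length(idx), n)], , drop = FALSE]
}

# Only row 4 ("for", year 126) has year > 100.
# df[sample(which(df$year > 100), 1), ] would evaluate sample(4, 1),
# i.e. pick any of rows 1 to 4.
one <- sample_rows_where(df, df$year > 100, 1)
print(one)
```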

Method 2: Using the sample_n() function

The sample_n() function from the dplyr package takes a random sample of rows from a data frame.

Syntax: sample_n(x, n)

Parameters:

x: Data Frame
n: size/number of items to select
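sample_n() still works, but since dplyr 1.0.0 it has been marked as superseded in favour of slice_sample(). A minimal sketch comparing the two, assuming dplyr 1.0.0 or later is installed:

```r
library(dplyr)

df <- data.frame(name = c("Welcome", "to", "Geeks", "for", "Geeks"),
                 year = c(10, 51, 19, 126, 99),
                 length = c(40, NA, NA, 100, 95),
                 education = c("yes", "yes", "no", "no", "yes"))

# Both calls draw 2 random rows without replacement
old_style <- sample_n(df, 2)
# slice_sample() is the currently recommended verb; n must be named
new_style <- slice_sample(df, n = 2)

print(new_style)
```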

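Along with sample_n(), we also use the filter() function. filter(), also from dplyr, keeps the rows (cases) for which the filtering expression is TRUE and drops the rest. Rows where the expression evaluates to NA are dropped as well.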

Syntax: filter(x, expr)

Parameters:

x: Object to be filtered
expr: logical expression(s) that decide which rows to keep
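filter() accepts nested conditions built from &, | and parentheses. It also combines several comma-separated conditions with AND. A small sketch of both forms on the example DataFrame, followed by sampling from the filtered rows. The particular conditions are chosen only for illustration:

```r
library(dplyr)

df <- data.frame(name = c("Welcome", "to", "Geeks", "for", "Geeks"),
                 year = c(10, 51, 19, 126, 99),
                 length = c(40, NA, NA, 100, 95),
                 education = c("yes", "yes", "no", "no", "yes"))

# Nested condition: (year > 50 AND education is "no") OR length is missing
filtered <- filter(df, (year > 50 & education == "no") | is.na(length))
print(filtered)   # rows "to", "Geeks" (year 19) and "for"

# Comma-separated conditions are combined with AND
both <- filter(df, year > 20, education == "yes")
print(both)       # rows "to" and "Geeks" (year 99)

# Draw 2 random rows from the rows that passed the nested condition
picked <- sample_n(filtered, 2)
print(picked)
```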

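We load the dplyr package because it provides both filter() and sample_n(). We pass the example DataFrame df and the nested condition as arguments to filter(). We then pipe the result with %>% to sample_n(), which draws n random rows from the rows that satisfy the condition.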

Syntax: filter(df, condition) %>% sample_n(., n)

Parameters:

df: Dataframe Object
condition: the (possibly nested) condition that rows must satisfy, e.g. name != "to"
n: Number of samples
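The %>% pipe in the syntax above is shorthand for passing the filtered DataFrame as the first argument of sample_n(). A small sketch of three equivalent spellings, each drawing 2 random rows whose name is not "to":

```r
library(dplyr)

df <- data.frame(name = c("Welcome", "to", "Geeks", "for", "Geeks"),
                 year = c(10, 51, 19, 126, 99),
                 length = c(40, NA, NA, 100, 95),
                 education = c("yes", "yes", "no", "no", "yes"))

# magrittr pipe with the explicit "." placeholder, as in the syntax above
piped_dot <- filter(df, name != "to") %>% sample_n(., 2)
# Same thing: %>% passes the left-hand side as the first argument by default
piped <- filter(df, name != "to") %>% sample_n(2)
# Same thing without a pipe, as nested function calls
nested <- sample_n(filter(df, name != "to"), 2)

print(piped)
```

With R 4.1 or later, the base pipe |> can replace %>% in the second form. However, |> does not understand the "." placeholder used in the first form.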

Example 1:

R

library(dplyr)

df <- data.frame(name = c("Welcome", "to", "Geeks", "for", "Geeks"),
                 year = c(10, 51, 19, 126, 99),
                 length = c(40, NA, NA, 100, 95),
                 education = c("yes", "yes", "no", "no", "yes"))
df

# Keep the rows whose name is not "to", then randomly pick 2 of them
print("2 samples")
filter(df, name != "to") %>% sample_n(2)

Output:

     name year length education
1 Welcome   10     40       yes
2      to   51     NA       yes
3   Geeks   19     NA        no
4     for  126    100        no
5   Geeks   99     95       yes
[1] "2 samples"
     name year length education
1 Welcome   10     40       yes
2   Geeks   99     95       yes

Example 2:

R

library(dplyr)

df <- data.frame(name = c("Welcome", "to", "Geeks", "for", "Geeks"),
                 year = c(10, 51, 19, 126, 99),
                 length = c(40, NA, NA, 100, 95),
                 education = c("yes", "yes", "no", "no", "yes"))
df

# Keep the rows whose year is greater than 20, then randomly pick 2 of them
print("2 samples")
filter(df, year > 20) %>% sample_n(2)

Output:

     name year length education
1 Welcome   10     40       yes
2      to   51     NA       yes
3   Geeks   19     NA        no
4     for  126    100        no
5   Geeks   99     95       yes
[1] "2 samples"
  name year length education
1  for  126    100        no
2   to   51     NA       yes