How to Convert a DataFrame Column from Character to Numeric in R?

Last modified: 2022-05-13 01:54:51.398000             Author: Mango

This article shows how to convert a DataFrame column from character to numeric in the R programming language.

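Every column of a data frame has a class, which indicates the data type of the elements in that column. Converting a column to another type therefore means converting each of its elements to that type. For a conversion to numeric, every element of the column must be interpretable as a number.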

Note: The sapply() method can be used to retrieve the data type of each column as a vector, for example sapply(data_frame, class).
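As a minimal sketch of that check (the data frame `df` and its column names are made up for illustration), `sapply()` combined with `class` returns a named character vector with one entry per column:

```r
# Hypothetical data frame used only for illustration
df <- data.frame(
  id    = c("1", "2", "3"),  # digits stored as character strings
  score = c(9.5, 7.2, 8.8),  # numeric (double) values
  stringsAsFactors = FALSE
)

# One class name per column, returned as a named character vector
col_classes <- sapply(df, class)
print(col_classes)
```

Here `id` is reported as character even though it contains only digits, which is the situation the rest of the article deals with.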

Method 1: Using the transform() method

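A character column, whether it holds single characters or longer strings, can be converted to numeric only when its values represent numbers. Otherwise the data is lost: R coerces every value that cannot be parsed to a missing (NA) value and issues a warning.

The example below illustrates this data loss. NA values are inserted in place of the characters because the characters cannot be converted to numbers directly.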
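Before looking at a whole data frame, the behaviour can be seen on plain vectors. The following is a small sketch with made-up vectors (`digits` and `words` are illustrative names); it uses `suppressWarnings()` only to keep the console quiet, and then counts how many values were lost:

```r
# Strings that look like numbers convert cleanly
digits <- c("6", "7", "8", "9")
print(as.numeric(digits))

# Strings that are not numbers become NA; R also emits the warning
# "NAs introduced by coercion", which is silenced here on purpose
words <- c("b", "c", "d", "e")
converted <- suppressWarnings(as.numeric(words))

# Count how many values were lost in the conversion
lost <- sum(is.na(converted) & !is.na(words))
print(lost)
```

Counting the newly introduced NA values like this is one simple way to notice data loss before overwriting a column.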

R
# declare a data frame whose columns
# have different data types
data_frame <- data.frame(
               col1 = as.character(6 : 9),
               col2 = factor(4 : 7),
               col3 = letters[2 : 5],
               col4 = 97 : 100, stringsAsFactors = FALSE)
  
print("Original DataFrame")
print (data_frame)
  
# indicating the data type of 
# each variable
sapply(data_frame, class)
  
# converting character type 
# column to numeric
data_frame_col1 <- transform(data_frame,
                             col1 = as.numeric(col1))
print("Modified col1 DataFrame")
print (data_frame_col1)
  
# indicating the data type of 
# each variable
sapply(data_frame_col1, class)
  
# converting character type column
# to numeric
data_frame_col3 <- transform(data_frame, 
                             col3 = as.numeric(col3))
print("Modified col3 DataFrame")
print (data_frame_col3)
  
# indicating the data type of each
# variable
sapply(data_frame_col3, class)


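Explanation: sapply() reports the class of col3 as character; the column holds single-letter strings. When transform() applies as.numeric() to it, these values are replaced by missing (NA) values, because letters cannot be converted to numbers directly. The conversion therefore loses the data.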

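Strings can still be mapped to numbers by first converting them to a factor with as.factor() and then converting the factor with as.numeric(). In R versions before 4.0.0 the default is stringsAsFactors = TRUE, so omitting stringsAsFactors = FALSE already stores the strings as factors; from R 4.0.0 on the default is FALSE. Even with this approach, the information about the actual strings is lost: each string is simply assigned the position of its value in the sorted order of the distinct column values, so the resulting numbers are ambiguous and do not represent the original data.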
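The same mechanism explains a related pitfall with factors whose labels are digits. The sketch below uses made-up values: calling `as.numeric()` directly on the factor returns the internal level codes, while converting through `as.character()` first recovers the labelled values.

```r
# A factor whose labels are digits (values chosen for illustration)
f <- factor(c("10", "20", "10", "30"))

# as.numeric() on a factor returns the internal level codes
codes <- as.numeric(f)

# Going through as.character() first recovers the labelled values
values <- as.numeric(as.character(f))

print(codes)
print(values)
```

This is why the second method in this article wraps the column in `as.character()` before calling `as.numeric()`.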

R

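# declare a data frame whose columns
# have different data types;
# stringsAsFactors = TRUE is the default before R 4.0.0
# and is set explicitly to reproduce the output below
data_frame <- data.frame(
               col1 = as.character(6 : 9),
               col2 = factor(4 : 7),
               col3 = c("Geeks", "For", "Geeks", "Gooks"),
               col4 = 97 : 100, stringsAsFactors = TRUE)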
print("Original DataFrame")
print (data_frame)
  
# indicating the data type of
# each variable
sapply(data_frame, class)
  
# converting character type column 
# to numeric
data_frame_col3 <- transform(data_frame,
                             col3 = as.numeric(as.factor(col3)))
print("Modified col3 DataFrame")
print (data_frame_col3)
  
# indicating the data type of each
# variable
sapply(data_frame_col3, class)

Output:

[1] "Original DataFrame"
  col1 col2  col3 col4
1    6    4 Geeks   97
2    7    5   For   98
3    8    6 Geeks   99
4    9    7 Gooks  100
   col1      col2      col3      col4
"factor"  "factor"  "factor" "integer"
[1] "Modified col3 DataFrame"
  col1 col2 col3 col4
1    6    4    2   97
2    7    5    1   98
3    8    6    2   99
4    9    7    3  100
   col1      col2      col3      col4
"factor"  "factor" "numeric" "integer"

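Explanation: The first and third strings in col3 are identical, so they are assigned the same number. The distinct values are sorted in ascending order and numbered accordingly: "For" comes first in lexicographic order and is assigned 1, "Geeks" (both occurrences) is mapped to 2, and "Gooks" is assigned 3. The class of col3 thus changes to numeric.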
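The mapping described above can be inspected directly. This is a minimal sketch on a standalone vector (the names `strings`, `lv`, `codes`, and `restored` are illustrative): `levels()` shows the sorted distinct values, and the integer codes can be mapped back to strings only as long as those levels are kept.

```r
strings <- c("Geeks", "For", "Geeks", "Gooks")
f <- factor(strings)

# Sorted distinct values; the position in this vector is the code
lv <- levels(f)
print(lv)

# Integer code of each element
codes <- as.integer(f)
print(codes)

# The codes alone are meaningless; with the levels they can be decoded
restored <- lv[codes]
print(restored)
```

Once only the numeric column is stored, as in the transformed data frame, the levels are gone and the original strings cannot be recovered.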

Method 2: Using the apply() method

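The apply() method in R applies a function to several columns at once. With its second argument (MARGIN) set to 2, the function is applied column by column. The function can be user-defined or built in, depending on the need.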
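One detail worth knowing is that `apply()` converts a data frame to a matrix before calling the function and returns a matrix. The sketch below, on a made-up data frame `df`, shows this and, as an alternative that is not part of the original article, uses `lapply()`, which treats the data frame as a list of columns:

```r
# Hypothetical data frame with numbers stored as strings
df <- data.frame(
  a = c("1", "2"),
  b = c("3.5", "4.5"),
  stringsAsFactors = FALSE
)

# apply() with MARGIN = 2 visits each column and returns a matrix
m <- apply(df, 2, as.numeric)
print(is.matrix(m))

# lapply() works on the data frame as a list of columns instead,
# so each column is converted and assigned back individually
df[c("a", "b")] <- lapply(df[c("a", "b")], as.numeric)
print(sapply(df, class))
```

Both variants give numeric columns here. The `lapply()` form avoids the intermediate matrix, which can matter when the selected columns have mixed types.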

Example:

R

# declare a data frame whose columns
# all hold strings;
# stringsAsFactors = TRUE is the default before R 4.0.0
# and is set explicitly to reproduce the output below
data_frame <- data.frame(
               col1 = as.character(6:9),
               col2 = as.character(4:7),
               col3 = c("Geeks","For","Geeks","Gooks"),
               col4 = letters[1:4], stringsAsFactors = TRUE)
  
print("Original DataFrame")
print (data_frame)
  
# indicating the data type of each
# variable
sapply(data_frame, class)
  
# defining the vector of columns to 
# convert to numeric
vec <- c(1,2)
  
# apply the conversion on columns
data_frame[ , vec] <- apply(data_frame[ , vec, drop = FALSE], 2,
                    function(x) as.numeric(as.character(x)))
print("Modified DataFrame")
print (data_frame)
  
# indicating the data type of each variable
sapply(data_frame, class)

Output:

[1] "Original DataFrame"
  col1 col2  col3 col4
1    6    4 Geeks    a
2    7    5   For    b
3    8    6 Geeks    c
4    9    7 Gooks    d
   col1     col2     col3     col4
"factor" "factor" "factor" "factor"
[1] "Modified DataFrame"
  col1 col2  col3 col4
1    6    4 Geeks    a
2    7    5   For    b
3    8    6 Geeks    c
4    9    7 Gooks    d
    col1      col2      col3      col4
"numeric" "numeric"  "factor"  "factor" 

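Explanation: col1 and col2 are converted to numeric. However, this approach only suits purely numeric data that is stored as character. Applied to col3 and col4, it would not raise an error but would replace their values with NA and emit the warning "NAs introduced by coercion".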
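To avoid picking the column indices by hand, the check can be automated. The following is only a sketch under stated assumptions: `is_numeric_like()` is a hypothetical helper, not a base R function, and it treats a column as convertible when every non-missing value parses as a number.

```r
# Hypothetical helper: TRUE when every non-missing value parses as a number
is_numeric_like <- function(x) {
  x <- as.character(x)
  parsed <- suppressWarnings(as.numeric(x))
  all(!is.na(parsed) | is.na(x))
}

data_frame <- data.frame(
  col1 = as.character(6:9),
  col2 = as.character(4:7),
  col3 = c("Geeks", "For", "Geeks", "Gooks"),
  col4 = letters[1:4],
  stringsAsFactors = FALSE
)

# Find the columns that can be converted without introducing NA
safe <- names(data_frame)[sapply(data_frame, is_numeric_like)]
print(safe)

# Convert only those columns; the others keep their original type
data_frame[safe] <- lapply(data_frame[safe],
                           function(x) as.numeric(as.character(x)))
print(sapply(data_frame, class))
```

With this data, only col1 and col2 are selected, so col3 and col4 are left untouched instead of being turned into NA.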