Selecting Rows That Contain Specific Text with Pandas

Last modified: 2023-12-03 15:06:49.567000, Author: Mango

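In data analysis we often need to select the rows of a dataset whose text contains a particular string. Pandas provides several string methods for this; the most common ones are introduced below.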

1. Using the str.contains() method

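str.contains() checks whether each string in a Series, such as a DataFrame column, contains a given pattern, and returns a boolean Series. Indexing the DataFrame with that boolean Series keeps only the matching rows. By default the pattern is interpreted as a regular expression. The basic usage is: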

```python
# Keep the rows of df whose column_name value contains 'text'
df[df['column_name'].str.contains('text')]
```

Here df is the DataFrame, column_name is the column to search, and text is the text to match.
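To make the mechanism visible, the following minimal sketch (using a small, hypothetical one-column DataFrame) prints the intermediate boolean mask that str.contains() produces before it is used to index the DataFrame.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie', 'David']})

# str.contains() returns a boolean Series aligned with df's index
mask = df['Name'].str.contains('li')
print(mask.tolist())  # [True, False, True, False]

# Indexing df with the mask keeps only the rows where the mask is True
print(df[mask]['Name'].tolist())  # ['Alice', 'Charlie']
```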

For example, consider the following DataFrame df:

```python
import pandas as pd

# Sample data used in the examples below
df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie', 'David'],
                   'Age': [25, 32, 18, 47],
                   'Gender': ['F', 'M', 'M', 'M']})
```

To select the rows whose Name contains the text "ob":

```python
df[df['Name'].str.contains('ob')]
```

Output:

|   | Name | Age | Gender |
|---|------|-----|--------|
| 1 | Bob  | 32  | M      |
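str.contains() also takes a few keyword arguments that matter in practice. The sketch below uses hypothetical data containing a missing value and a name with a literal dot: case=False makes the match case-insensitive, na=False treats missing values as non-matches (otherwise they produce NaN in the mask), and regex=False treats the pattern as plain text instead of a regular expression.

```python
import pandas as pd

# Hypothetical data: includes a missing value and a name with a literal '.'
df = pd.DataFrame({'Name': ['Alice', 'BOB', None, 'J.R.', 'Jar']})

# case=False: case-insensitive match, so 'BOB' matches 'ob'
# na=False: missing values count as "no match", giving a clean boolean mask
ci = df[df['Name'].str.contains('ob', case=False, na=False)]
print(ci['Name'].tolist())  # ['BOB']

# regex=False: match '.' literally; with the default regex=True,
# '.' would match any character and select every non-missing name
literal = df[df['Name'].str.contains('.', regex=False, na=False)]
print(literal['Name'].tolist())  # ['J.R.']
```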

2. Using the str.startswith() method

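To select only the rows whose value starts with a given text, use str.startswith(). Unlike str.contains(), it matches a literal prefix and does not accept regular expressions. The basic usage is: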

```python
# Keep the rows of df whose column_name value starts with 'text'
df[df['column_name'].str.startswith('text')]
```

Here df is the DataFrame, column_name is the column to search, and text is the prefix to match.

Using the same df as above, we can select the rows whose Name starts with "A":

```python
df[df['Name'].str.startswith('A')]
```

Output:

|   | Name  | Age | Gender |
|---|-------|-----|--------|
| 0 | Alice | 25  | F      |
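str.startswith() has no case parameter, so a case-insensitive prefix match needs a workaround. A minimal sketch, using a small hypothetical DataFrame, shows two options: lowercase the column with str.lower() before calling str.startswith(), or use str.contains() with an anchored regular expression and case=False.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'adam', 'Bob', 'Charlie']})

# Option 1: normalise case first, then compare against a lowercase prefix
mask = df['Name'].str.lower().str.startswith('a')
print(df[mask]['Name'].tolist())  # ['Alice', 'adam']

# Option 2: '^' anchors the regex at the start of the string
mask2 = df['Name'].str.contains(r'^a', case=False)
print(df[mask2]['Name'].tolist())  # ['Alice', 'adam']
```

The first option keeps the literal-prefix semantics of str.startswith(); the second is convenient when the prefix is itself a pattern.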

3. Using the str.endswith() method

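To select only the rows whose value ends with a given text, use str.endswith(). Like str.startswith(), it matches literal text rather than a regular expression. The basic usage is: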

```python
# Keep the rows of df whose column_name value ends with 'text'
df[df['column_name'].str.endswith('text')]
```

Here df is the DataFrame, column_name is the column to search, and text is the suffix to match.

Using the same df as above, we can select the rows whose Name ends with "e":

```python
df[df['Name'].str.endswith('e')]
```

Output:

|   | Name    | Age | Gender |
|---|---------|-----|--------|
| 0 | Alice   | 25  | F      |
| 2 | Charlie | 18  | M      |
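Because all three methods return boolean Series, their results can be combined with the standard pandas operators | (or), & (and) and ~ (not). The sketch below reuses the sample data from this article; note the parentheses around the numeric comparison, which are required because & binds more tightly than >.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie', 'David'],
                   'Age': [25, 32, 18, 47],
                   'Gender': ['F', 'M', 'M', 'M']})

# Rows whose Name starts with "A" OR ends with "e"
either = df[df['Name'].str.startswith('A') | df['Name'].str.endswith('e')]
print(either['Name'].tolist())  # ['Alice', 'Charlie']

# Rows whose Age is over 20 AND whose Name ends with "e"
older_e = df[(df['Age'] > 20) & df['Name'].str.endswith('e')]
print(older_e['Name'].tolist())  # ['Alice']

# Rows whose Name does NOT end with "e"
neither = df[~df['Name'].str.endswith('e')]
print(neither['Name'].tolist())  # ['Bob', 'David']
```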

Summary

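Selecting rows that contain specific text is straightforward in Pandas: build a boolean mask with str.contains(), str.startswith(), or str.endswith() and use it to index the DataFrame. These methods are accessed through the .str accessor, which works only on columns of string values. For case-insensitive matching, str.contains() accepts case=False; str.startswith() and str.endswith() have no case parameter, so lowercase the column with str.lower() before calling them.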
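The restriction to string columns can be worked around by converting the column first. A minimal sketch, reusing the Name and Age columns from the sample data: calling .str on the integer Age column raises AttributeError, while converting it with astype(str) makes the text methods available.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie', 'David'],
                   'Age': [25, 32, 18, 47]})

# The .str accessor only works on string values; on an integer
# column it raises AttributeError
try:
    df['Age'].str.contains('2')
except AttributeError as err:
    print('AttributeError:', err)

# Converting to str first makes the text methods available
rows = df[df['Age'].astype(str).str.contains('2')]
print(rows['Name'].tolist())  # ['Alice', 'Bob']
```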