Converting an HTML Table to Text with Python

Last modified: 2023-12-03 15:19:09.199000. Author: Mango

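HTML tables are one of the most common and useful markup structures in web development. In data analysis and data processing, however, we sometimes need a table as plain text rather than markup, and Python handles that conversion well.

Python Code for Converting an HTML Table to Text

The following Python example converts an HTML table to text: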

from bs4 import BeautifulSoup

html_table = '''
<table>
    <tr>
        <th>姓名</th>
        <th>年龄</th>
        <th>性别</th>
    </tr>
    <tr>
        <td>张三</td>
        <td>25</td>
        <td>男</td>
    </tr>
    <tr>
        <td>李四</td>
        <td>30</td>
        <td>女</td>
    </tr>
</table>
'''

soup = BeautifulSoup(html_table, 'html.parser')

# Collect each <tr> as a list of cell strings. Header (<th>) and data (<td>)
# cells are read from the same row, so the header row does not also
# produce an empty data row.
table_data = []
for table_row in soup.select('tr'):
    row_data = [cell.text.strip() for cell in table_row.find_all(['th', 'td'])]
    if row_data:  # skip rows that contain no cells
        table_data.append(row_data)

# Render the rows as pipe-delimited text, with a separator line after the
# first row, which is treated as the header.
text_lines = []
for index, table_row in enumerate(table_data):
    text_lines.append('| ' + ' | '.join(table_row) + ' |')
    if index == 0:
        text_lines.append('|' + '|'.join(['------'] * len(table_row)) + '|')
text_table = '\n'.join(text_lines)

print(text_table)
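
For reuse, the same steps can be wrapped in a function. The following is a minimal sketch rather than a definitive implementation: the name html_table_to_text is hypothetical, only the first table in the input is converted, the first row is assumed to be the header, and nested tables are not handled.

```python
from bs4 import BeautifulSoup


def html_table_to_text(html):
    """Return the first <table> in `html` as pipe-delimited text ('' if none)."""
    soup = BeautifulSoup(html, 'html.parser')
    table = soup.find('table')
    if table is None:
        return ''
    rows = []
    for tr in table.find_all('tr'):
        cells = [cell.get_text(strip=True) for cell in tr.find_all(['th', 'td'])]
        if cells:  # skip rows without any cells
            rows.append(cells)
    lines = []
    for index, row in enumerate(rows):
        lines.append('| ' + ' | '.join(row) + ' |')
        if index == 0:
            # Separator after the first row, which is assumed to be the header.
            lines.append('|' + '|'.join(['------'] * len(row)) + '|')
    return '\n'.join(lines)


sample = '<table><tr><th>Name</th><th>Age</th></tr><tr><td>Ann</td><td>25</td></tr></table>'
print(html_table_to_text(sample))
```

Returning an empty string when no table is found keeps the caller free of exception handling; raising an error would be an equally reasonable choice.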

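Code Walkthrough

Import BeautifulSoup, a Python library for parsing HTML. It is provided by the third-party beautifulsoup4 package and makes it easy to navigate HTML markup.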
```
from bs4 import BeautifulSoup
```

Define the HTML table.

html_table = '''
<table>
    <tr>
        <th>姓名</th>
        <th>年龄</th>
        <th>性别</th>
    </tr>
    <tr>
        <td>张三</td>
        <td>25</td>
        <td>男</td>
    </tr>
    <tr>
        <td>李四</td>
        <td>30</td>
        <td>女</td>
    </tr>
</table>
'''

Parse the HTML table with BeautifulSoup.

soup = BeautifulSoup(html_table, 'html.parser')
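
The parsing step relies on a third-party package. Where only the standard library is available, roughly the same row extraction can be sketched with html.parser.HTMLParser. This is a swapped-in technique, not the approach used in this article: the class name TableParser is hypothetical, and the sketch assumes well-formed markup without nested tables.

```python
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collects the cell text of every <tr> into a list of rows."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == 'tr':
            self._row = []
        elif tag in ('th', 'td') and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ('th', 'td') and self._cell is not None:
            self._row.append(''.join(self._cell).strip())
            self._cell = None
        elif tag == 'tr' and self._row is not None:
            if self._row:  # skip rows without any cells
                self.rows.append(self._row)
            self._row = None
            self._cell = None


parser = TableParser()
parser.feed('<table><tr><th>a</th><th>b</th></tr><tr><td>1</td><td>2</td></tr></table>')
print(parser.rows)
```

BeautifulSoup is generally more forgiving of broken markup, so this sketch is best reserved for simple, predictable tables.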

Walk through the table's rows and cells and collect the data into a two-dimensional list (table_data).

# Collect each <tr> as a list of cell strings. Header (<th>) and data (<td>)
# cells are read from the same row, so the header row does not also
# produce an empty data row.
table_data = []
for table_row in soup.select('tr'):
    row_data = [cell.text.strip() for cell in table_row.find_all(['th', 'td'])]
    if row_data:  # skip rows that contain no cells
        table_data.append(row_data)

Convert the two-dimensional list to text.

# Render the rows as pipe-delimited text, with a separator line after the
# first row, which is treated as the header.
text_lines = []
for index, table_row in enumerate(table_data):
    text_lines.append('| ' + ' | '.join(table_row) + ' |')
    if index == 0:
        text_lines.append('|' + '|'.join(['------'] * len(table_row)) + '|')
text_table = '\n'.join(text_lines)
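
Joining cells with a pipe only works cleanly when the cell text itself contains no pipes or line breaks. A minimal sketch of a cleanup step follows. The helper name clean_cell is hypothetical, and escaping a pipe as \| follows the Markdown table convention, which is an assumption about how the text will be consumed.

```python
def clean_cell(text):
    """Make a cell value safe for a one-line, pipe-delimited row."""
    # Collapse runs of whitespace (including newlines) into single spaces.
    collapsed = ' '.join(text.split())
    # Escape literal pipes so they are not read as column boundaries.
    return collapsed.replace('|', '\\|')


row = ['a | b', 'line one\nline two', '  padded  ']
print('| ' + ' | '.join(clean_cell(cell) for cell in row) + ' |')
```

Applying such a helper to every cell before joining keeps each table row on a single line of output.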

Print the converted text.
```
print(text_table)
```

Result

Running the Python code above produces the following output:

| 姓名 | 年龄 | 性别 |
|------|------|------|
| 张三 | 25 | 男 |
| 李四 | 30 | 女 |
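
The columns in this output are not padded, so they only line up when cells happen to have similar lengths. If aligned columns matter, one option is to pad each cell to the width of its column. The sketch below assumes a non-empty list of rows, and the name format_aligned is hypothetical. It measures width with len(), which counts characters; CJK characters usually occupy two terminal columns, so text like the sample above may still look uneven.

```python
def format_aligned(rows):
    """Render rows (lists of strings) with every column padded to equal width."""
    column_count = max(len(row) for row in rows)
    # Pad short rows with empty cells so every row has the same length.
    padded = [row + [''] * (column_count - len(row)) for row in rows]
    widths = [max(len(row[i]) for row in padded) for i in range(column_count)]
    lines = []
    for index, row in enumerate(padded):
        cells = [cell.ljust(width) for cell, width in zip(row, widths)]
        lines.append('| ' + ' | '.join(cells) + ' |')
        if index == 0:
            # Separator after the first row, sized to match each column.
            lines.append('|' + '|'.join('-' * (width + 2) for width in widths) + '|')
    return '\n'.join(lines)


print(format_aligned([['Name', 'Age'], ['Alexander', '7'], ['Bo']]))
```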

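Conclusion

Python offers convenient tools for converting HTML tables to text, a task that comes up often in data analysis and data processing. As the code in this article shows, Python and the BeautifulSoup library make the conversion straightforward: parse the markup, collect the cells row by row, and join them into lines of text.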