GATE | GATE IT 2005 | Question 6

Last modified: 2023-12-03 15:42:21.620000    Author: Mango

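This problem asks for a program that counts word frequencies. Given a passage of English text, the program should print the 10 most frequent words along with their counts. It must ignore case and exclude common English words (such as a, an, the, and).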

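Approach
  1. Extract the English words from the text, ignoring case.
  2. Remove common English words (stopwords).
  3. Count how often each remaining word occurs, sort by count, and take the top 10.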
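The three steps do not depend on any particular library. As a minimal sketch, here is the same pipeline written with a plain dict and `sorted` instead of `collections.Counter`. The function name `top_words`, its parameters, and the `STOPWORDS` constant are hypothetical, chosen only for illustration:

```python
import re

# Hypothetical default stopword set, copied from the main example.
STOPWORDS = frozenset({"a", "an", "the", "and", "or", "in", "on", "at", "to", "of", "for", "with", "by"})


def top_words(text, k=10, stopwords=STOPWORDS):
    """Return the k most frequent non-stopword words as (word, count) pairs."""
    # Step 1: lowercase the text and extract runs of letters.
    words = re.findall(r"\b[a-z]+\b", text.lower())
    # Step 2: drop stopwords.
    kept = [w for w in words if w not in stopwords]
    # Step 3: count with a plain dict (which keeps insertion order), then sort by count.
    counts = {}
    for w in kept:
        counts[w] = counts.get(w, 0) + 1
    # sorted() is stable, so words with equal counts keep their first-seen order.
    ranked = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    return ranked[:k]


print(top_words("The cat and the dog saw a cat.", k=3))
# prints [('cat', 2), ('dog', 1), ('saw', 1)]
```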
Implementation

```python
import re
from collections import Counter

# Common English words to exclude (a set gives fast membership tests)
stopwords = {'a', 'an', 'the', 'and', 'or', 'in', 'on', 'at', 'to', 'of', 'for', 'with', 'by'}

# Sample article
article = """
Python is an interpreted high-level general-purpose programming language.
Python's design philosophy emphasizes code readability with its notable use of significant indentation.
Its language constructs and object-oriented approach aim to help programmers write clear, logical code for small and large-scale projects.
"""

# Lowercase the whole text, then extract runs of letters as words
# (apostrophes and hyphens act as separators, so "Python's" gives "python" and "s")
words = re.findall(r'\b[a-z]+\b', article.lower())

# Drop stopwords, count occurrences, and keep the 10 most frequent words
word_count = Counter(w for w in words if w not in stopwords)
top10 = word_count.most_common(10)

# Print the results
for word, count in top10:
    print(f"{word}: {count}")
```
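The pattern `\b[a-zA-Z]+\b` treats every non-letter as a separator, so in the sample text `Python's` yields the tokens `python` and `s`, and `high-level` yields `high` and `level`. If you would rather keep possessives and hyphenated compounds as single tokens, one option is a pattern that allows internal apostrophes and hyphens. This is an assumption about what counts as a "word", not something the problem specifies:

```python
import re

text = "Python's design is high-level and object-oriented."

# Original pattern: every non-letter splits a token.
simple = re.findall(r"\b[a-zA-Z]+\b", text.lower())

# Alternative pattern (assumed word definition): runs of letters,
# optionally joined by single internal apostrophes or hyphens.
compound = re.findall(r"[a-z]+(?:['-][a-z]+)*", text.lower())

print(simple)
# prints ['python', 's', 'design', 'is', 'high', 'level', 'and', 'object', 'oriented']
print(compound)
# prints ["python's", 'design', 'is', 'high-level', 'and', 'object-oriented']
```

The trade-off is that `python's` and `python` are then counted as different words, so which pattern fits depends on how you want possessives to be tallied.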
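Output

Running the script prints the following. Four words appear twice (`its` is counted because it is not in the stopword list); the remaining six slots go to words that appear once, listed in the order they first occur.

```
python: 2
language: 2
code: 2
its: 2
is: 1
interpreted: 1
high: 1
level: 1
general: 1
purpose: 1
```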
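Most words in the sample occur only once, so which of them make the top 10 depends on tie-breaking. `Counter.most_common` orders elements with equal counts in the order they were first encountered. If you want a ranking that does not depend on the order of the input, one option is to sort by descending count and then alphabetically. The word list below is made-up sample data for illustration:

```python
from collections import Counter

counts = Counter(["python", "language", "code", "python", "code", "zeta", "alpha"])

# most_common: highest count first; equal counts keep first-seen order.
by_first_seen = counts.most_common(3)

# Alternative: highest count first, ties broken alphabetically.
by_alpha = sorted(counts.items(), key=lambda item: (-item[1], item[0]))[:3]

print(by_first_seen)
# prints [('python', 2), ('code', 2), ('language', 1)]
print(by_alpha)
# prints [('code', 2), ('python', 2), ('alpha', 1)]
```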