Last modified: 2023-12-03 15:04:08.690000 | Author: Mango

Python Sorted Word Frequency Count

Introduction

This tutorial shows how to count the frequency of words in a text file, sort the counts in descending order, and display the top 'n' words in a neat format using Python.

Getting Started

First, we need a text file to work with. Any text file will do; this tutorial uses a sample file named "sample.txt".

Contents of sample.txt:
This is a sample text file that contains some words.
We will use this file to count the frequency of words.
We will also sort them in descending order and display the top 'n' words.

To count the frequency of words in the file, we will use the collections module from Python's standard library. Its Counter class counts how many times each element occurs in a list or any other iterable of hashable items.

from collections import Counter

# Reading the text file
with open('sample.txt', 'r', encoding='utf-8') as file:
    text = file.read()

# Counting the frequency of words using Counter
word_counts = Counter(text.split())

print(word_counts)

Output:

Counter({'words.': 3, 'file': 2, 'We': 2, 'will': 2, 'the': 2, 'This': 1, 'is': 1, 'a': 1, 'sample': 1, 'text': 1, 'that': 1, 'contains': 1, 'some': 1, 'use': 1, 'this': 1, 'to': 1, 'count': 1, 'frequency': 1, 'of': 1, 'also': 1, 'sort': 1, 'them': 1, 'in': 1, 'descending': 1, 'order': 1, 'and': 1, 'display': 1, 'top': 1, "'n'": 1})

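As we can see, this gives us a Counter, a dict subclass whose keys are the whitespace-separated words in the file and whose values are the number of times each one appears. Because split() separates on whitespace only, punctuation stays attached to a word (for example 'words.'), and differently capitalized forms of a word are counted as different keys.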
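If attached punctuation and capitalization should not produce separate keys, the text can be normalized before counting. The following is a minimal sketch, not part of the original tutorial: the helper name count_normalized_words and the regular expression are illustrative assumptions, and the pattern only handles ASCII letters and digits.

```python
import re
from collections import Counter

def count_normalized_words(text):
    """Count words case-insensitively, ignoring surrounding punctuation.

    Hypothetical helper for illustration; assumes ASCII text.
    """
    # Lowercase first so 'We' and 'we' share one key, then keep runs of
    # letters, digits, and apostrophes as tokens.
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    # Strip leading/trailing apostrophes so a quoted 'n' is counted as n.
    words = [token.strip("'") for token in tokens]
    return Counter(word for word in words if word)

sample = "This is a test. We test this: the 'n' words."
counts = count_normalized_words(sample)
print(counts["this"])   # 2
print(counts["test"])   # 2
print(counts["words"])  # 1
```

Whether to normalize depends on the task: for a quick look at a file the plain split() is often enough, while frequency comparisons usually benefit from folding case and punctuation.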

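Note that a Counter keeps its keys in the order they were first seen; only its printed representation lists the most frequent words first. To get a list sorted by count in descending order, we can use the most_common() method of the Counter class.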

# Sorting the output in descending order
sorted_word_counts = word_counts.most_common()

print(sorted_word_counts)

Output:

[('words.', 3), ('file', 2), ('We', 2), ('will', 2), ('the', 2), ('This', 1), ('is', 1), ('a', 1), ('sample', 1), ('text', 1), ('that', 1), ('contains', 1), ('some', 1), ('use', 1), ('this', 1), ('to', 1), ('count', 1), ('frequency', 1), ('of', 1), ('also', 1), ('sort', 1), ('them', 1), ('in', 1), ('descending', 1), ('order', 1), ('and', 1), ('display', 1), ('top', 1), ("'n'", 1)]

Now, we have a sorted list of tuples, where each tuple contains a word and its frequency.
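Words with equal counts come back from most_common() in the order they were first encountered. If a fixed alphabetical order among ties is preferred, one possible approach is to sort the items with a compound key. The sketch below uses a short made-up sentence rather than sample.txt so that it runs on its own.

```python
from collections import Counter

# Illustrative input, not the tutorial's sample file.
word_counts = Counter("the cat and the dog and a bird".split())

# Sort by count descending, then alphabetically, so ties have a fixed order.
ranked = sorted(word_counts.items(), key=lambda item: (-item[1], item[0]))
print(ranked[:3])  # [('and', 2), ('the', 2), ('a', 1)]

# most_common(n) returns only the n highest counts, ties in first-seen order.
print(word_counts.most_common(2))  # [('the', 2), ('and', 2)]
```

Passing n to most_common() also avoids building the full sorted list when only the top few entries are needed.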

Next, we will display the top 'n' words in a neat format. For this, we can use string formatting.

n = 5  # Top 'n' words to display

# Displaying the top 'n' words in a neat format.
# Slicing with [:n] is safe even when the text has fewer than n distinct words.
for rank, (word, count) in enumerate(sorted_word_counts[:n], start=1):
    print(f"{rank}. {word}: {count}")

Output:

1. words.: 3
2. file: 2
3. We: 2
4. will: 2
5. the: 2

Conclusion

In this tutorial, we learned how to count the frequency of words in a text file, sort them in descending order, and display the top 'n' words in a neat format using Python. This technique can be useful in various natural language processing (NLP) applications like sentiment analysis, topic modelling, and more.
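The three steps summarized above can also be combined into small reusable functions. This is a sketch under stated assumptions rather than a definitive implementation: top_words and format_report are hypothetical names, the input is a string instead of a file so the example is self-contained, and the column widths are an arbitrary choice.

```python
from collections import Counter

def top_words(text, n=5):
    """Return the n most frequent whitespace-separated words in text."""
    return Counter(text.split()).most_common(n)

def format_report(pairs):
    """Format (word, count) pairs as numbered lines with aligned columns."""
    if not pairs:
        return ""
    # Pad every word to the length of the longest one so counts line up.
    width = max(len(word) for word, _ in pairs)
    return "\n".join(
        f"{rank:>2}. {word:<{width}}  {count:>3}"
        for rank, (word, count) in enumerate(pairs, start=1)
    )

text = "to be or not to be that is the question"
print(format_report(top_words(text, n=3)))
```

To use it with a file, read the contents as in the tutorial and pass the resulting string to top_words().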