Python - Get the number of characters, words, spaces, and lines in a file

Last modified: 2022-05-13 01:55:42.641000 | Author: Mango


Prerequisite: File handling in Python

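Given a text file fname, the task is to count the total number of characters, words, spaces, and lines in the file. In this article, "characters" means every character that is neither a space nor a newline.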
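To make the four counts concrete, here is a minimal sketch on an in-memory string. The sample text is a made-up example, not the article's File1.txt, and the one-liners are only there to pin down the definitions.

```python
# Hypothetical sample text (two lines, five words)
sample = "Hello world\nPython is fun\n"

# a line is whatever is terminated by a newline
num_lines = len(sample.splitlines())

# a word is a run of non-whitespace characters
num_words = len(sample.split())

# only the plain space character counts as a space
num_spaces = sample.count(' ')

# a character is anything that is neither a space nor a newline
num_charc = sum(1 for c in sample if c not in (' ', '\n'))

print(num_lines, num_words, num_spaces, num_charc)  # prints: 2 5 3 21
```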

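Python provides several built-in functions and modules for working with files. Let's discuss different ways to count the total number of characters, words, spaces, and lines in a file using Python.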

[Image: the sample text file]

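Method #1: Naive approach
In this approach, the idea is to solve the task with our own logic. The file is scanned character by character, and the totals for characters, words, spaces, and lines are computed without relying on Python's built-in string helpers such as split() or count().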

Below is the implementation of the above approach.

# Python implementation to compute
# number of characters, words, spaces
# and lines in a file
  
# Function to count number 
# of characters, words, spaces 
# and lines in a file
def counter(fname):
  
    # variable to store total word count
    num_words = 0
      
    # variable to store total line count
    num_lines = 0
      
    # variable to store total character count
    num_charc = 0
      
    # variable to store total space count
    num_spaces = 0
      
    # opening the file with a with
    # statement so that it is closed
    # automatically when the block ends
    with open(fname, 'r') as f:
          
        # loop to iterate file
        # line by line
        for line in f:
              
            # incrementing value of 
            # num_lines with each 
            # iteration of loop to
            # store total line count 
            num_lines += 1
              
            # flag telling whether the next
            # visible character starts a new
            # word; it is True at the start
            # of every line
            new_word = True
              
            # loop to iterate every
            # line letter by letter
            for letter in line:
                  
                # a space is counted and also
                # ends the current word
                if letter == ' ':
                    num_spaces += 1
                    new_word = True
                      
                # the trailing newline is neither
                # a space nor a character, so
                # only other letters are handled
                elif letter != '\n':
                      
                    # incrementing character
                    # count by 1
                    num_charc += 1
                      
                    # the first character after
                    # a gap starts a new word
                    if new_word:
                        num_words += 1
                        new_word = False
                          
                          
    # printing total word count 
    print("Number of words in text file: ", num_words)
      
    # printing total line count
    print("Number of lines in text file: ", num_lines)
      
    # printing total character count
    print('Number of characters in text file: ', num_charc)
      
    # printing total space count
    print('Number of spaces in text file: ', num_spaces)
      
# Driver Code: 
if __name__ == '__main__': 
    fname = 'File1.txt'
    try: 
        counter(fname) 
    # catch only the missing-file case so
    # that other errors are not hidden
    except FileNotFoundError: 
        print('File not found')

Output:

Number of words in text file:  25
Number of lines in text file:  4
Number of characters in text file:  91
Number of spaces in text file:  21
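The flag-based word counting at the heart of Method #1 can be isolated and checked on a plain string. The sketch below is an illustration only: count_words_naive is a hypothetical helper name, and it assumes that both spaces and newlines separate words, so a trailing space before a line break does not produce an extra word.

```python
def count_words_naive(text):
    """Count words with a boolean flag instead of split()."""
    num_words = 0
    # True while the next visible character would start a new word
    new_word = True
    for ch in text:
        if ch == ' ' or ch == '\n':
            # a separator ends the current word, if any
            new_word = True
        elif new_word:
            # first visible character after a separator
            num_words += 1
            new_word = False
    return num_words


# repeated spaces and a space before the newline add no words
print(count_words_naive("one two  three \nfour"))  # prints: 4
```

Returning the count instead of printing it makes the logic easy to test on small strings before pointing it at a file.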


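Method #2: Using some built-in functions and the os module
In this approach, the idea is to use os.linesep, a string constant (not a method) of the os module that holds the line separator of the current platform: '\n' on POSIX systems and '\r\n' on Windows. When a file is opened in text mode, Python translates the line endings to the '\n' character while reading. After that, the strip() and split() string methods are used to perform the task.
See the Python documentation for more on strip() and split().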
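The newline translation described above can be observed directly. The following sketch writes a temporary file with Windows-style '\r\n' endings in binary mode and reads it back in text mode; the file contents and variable names are made up for the demonstration.

```python
import os
import tempfile

# write raw bytes so that the '\r\n' endings reach the disk unchanged
with tempfile.NamedTemporaryFile('wb', suffix='.txt', delete=False) as tmp:
    tmp.write(b"first line\r\nsecond line\r\n")
    path = tmp.name

try:
    # default text mode: line endings are translated to '\n' on read
    with open(path, 'r') as f:
        lines = f.readlines()
finally:
    os.remove(path)

# strip(os.linesep) removes any leading or trailing characters that
# appear in os.linesep, which covers the translated '\n'
stripped = [line.strip(os.linesep) for line in lines]

print(lines)     # ['first line\n', 'second line\n']
print(stripped)  # ['first line', 'second line']
```

Because the lines already end in '\n' after translation, line.rstrip('\n') would work equally well and does not depend on the platform value of os.linesep.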

Below is the implementation of the above approach.

# Python implementation to compute
# number of characters, words, spaces
# and lines in a file
  
# importing os module
import os
  
# Function to count number 
# of characters, words, spaces 
# and lines in a file
def counter(fname):
      
    # variable to store total word count
    num_words = 0
      
    # variable to store total line count
    num_lines = 0
      
    # variable to store total character count
    num_charc = 0
      
    # variable to store total space count
    num_spaces = 0
      
    # opening the file with a with
    # statement so that it is closed
    # automatically when the block ends
    with open(fname, 'r') as f:
          
        # loop to iterate file
        # line by line
        for line in f:
              
            # removing the line separator
            # from the end of the line and
            # storing the result again in
            # line for further operations
            line = line.strip(os.linesep)
              
            # splitting the line on
            # whitespace to make a list of
            # all the words present
            # in that line and storing
            # that list in the
            # wordslist variable
            wordslist = line.split()
              
            # incrementing value of 
            # num_lines with each 
            # iteration of loop to
            # store total line count
            num_lines = num_lines + 1
              
            # incrementing value of 
            # num_words by the 
            # number of items in the
            # list wordslist
            num_words = num_words + len(wordslist)
              
            # incrementing value of 
            # num_charc by 1 for every
            # character c in the stripped
            # line that is not a space
            num_charc = num_charc + sum(1 for c in line 
                          if c != ' ')
              
            # incrementing value of 
            # num_spaces by the number
            # of spaces in the stripped line
            num_spaces = num_spaces + line.count(' ')
      
    # printing total word count
    print("Number of words in text file: ", num_words)
      
    # printing total line count
    print("Number of lines in text file: ", num_lines)
      
    # printing total character count
    print("Number of characters in text file: ", num_charc)
      
    # printing total space count
    print("Number of spaces in text file: ", num_spaces)
  
# Driver Code: 
if __name__ == '__main__': 
    fname = 'File1.txt'
    try: 
        counter(fname) 
    # catch only the missing-file case so
    # that other errors are not hidden
    except FileNotFoundError: 
        print('File not found')

Output:

Number of words in text file:  25
Number of lines in text file:  4
Number of characters in text file:  91
Number of spaces in text file:  21
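Both methods print their results, which makes them hard to reuse or verify. As a possible variation (not part of the original article), the sketch below returns the four counts in a dictionary and checks them against a small temporary file. The name count_file, the dictionary keys, and the sample text are all assumptions made for this example.

```python
import os
import tempfile


def count_file(fname):
    """Return word, line, character, and space counts for a text file."""
    counts = {'words': 0, 'lines': 0, 'characters': 0, 'spaces': 0}
    with open(fname, 'r') as f:
        for line in f:
            # drop the trailing newline so it is not counted
            line = line.rstrip('\n')
            spaces = line.count(' ')
            counts['lines'] += 1
            counts['words'] += len(line.split())
            counts['spaces'] += spaces
            # everything that is not a space counts as a character
            counts['characters'] += len(line) - spaces
    return counts


# build a small throwaway file to try the function on
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write("Hello world\nPython is fun\n")
    demo_path = tmp.name

try:
    result = count_file(demo_path)
finally:
    os.remove(demo_path)

print(result)
# {'words': 5, 'lines': 2, 'characters': 21, 'spaces': 3}
```

A caller that wants the article's behaviour for a missing file can wrap the call in try/except FileNotFoundError, as the driver code does.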