How to split a string in C/C++, Python and Java?

Last modified: 2021-05-26 00:16:17    Author: Mango

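Splitting a string by a delimiter is a very common task. For example, given a comma-separated list of items read from a file, we want the individual items in an array.
Almost every programming language provides a function that splits a string by a given delimiter.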

In C:

// Splits str[] at any of the characters in delims and
// returns the next token. Call it in a loop to get all
// tokens: pass str on the first call and NULL afterwards.
// It returns NULL when there are no more tokens.
// Note that strtok() modifies str[] in place.
char *strtok(char str[], const char *delims);
C
// A C/C++ program for splitting a string
// using strtok()
#include <stdio.h>
#include <string.h>
 
int main()
{
    char str[] = "Geeks-for-Geeks";
 
    // Returns first token
    char *token = strtok(str, "-");
   
    // Keep printing tokens until strtok() returns NULL,
    // i.e. until no tokens remain in str[].
    while (token != NULL)
    {
        printf("%s\n", token);
        token = strtok(NULL, "-");
    }
 
    return 0;
}


Output:

Geeks
for
Geeks

In C++:

Note:  The main disadvantage of strtok() is that it only works on C-style strings.
       Therefore a C++ string must first be copied explicitly into a char array.
       Many programmers are unaware that C++ offers two additional approaches that are
       more elegant and work directly with C++ strings.
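
To make the conversion mentioned in the note concrete, here is a minimal sketch of using strtok() on a C++ string. The function name split_with_strtok is hypothetical and chosen only for illustration. It copies the string into a mutable buffer first, because strtok() overwrites delimiters with '\0' in its input.

```cpp
#include <cstring>
#include <string>
#include <vector>

// Hypothetical helper (not part of any library): splits s at any of the
// characters in delims by running strtok() on a private copy of s.
std::vector<std::string> split_with_strtok(const std::string& s, const char* delims)
{
    // strtok() writes '\0' bytes into its input, so work on a mutable,
    // null-terminated copy instead of the original string.
    std::vector<char> buffer(s.begin(), s.end());
    buffer.push_back('\0');

    std::vector<std::string> tokens;
    for (char* token = std::strtok(buffer.data(), delims);
         token != nullptr;
         token = std::strtok(nullptr, delims)) {
        tokens.emplace_back(token);
    }
    return tokens;
}
```

Calling split_with_strtok("Geeks-for-Geeks", "-") should yield the three tokens Geeks, for and Geeks. Note that strtok() treats a run of consecutive delimiters as a single separator, so empty tokens are never produced, and it keeps hidden state between calls, so it is not safe to interleave two tokenizations.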

Method 1: Using the C++ stringstream API

Prerequisite: stringstream API

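A stringstream object can be initialized with a string object, and extracting from it automatically tokenizes the string on whitespace characters. Just like the cin stream, a stringstream lets you read the string as a stream of words.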

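Some of the most commonly used functions of stringstream:
clear(): resets the stream's error state flags (it does not erase the contents)
str(): returns the stream's contents as a C++ string object, or replaces them when given a string
operator <<: inserts a string object (or other formatted data) into the stream
operator >>: extracts the next whitespace-delimited word from the stream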

The code below demonstrates this.

C++

#include <iostream>
#include <sstream>
#include <string>
using namespace std;
 
// A quick way to split strings separated via spaces.
void simple_tokenizer(string s)
{
    stringstream ss(s);
    string word;
    while (ss >> word) {
        cout << word << endl;
    }
}
 
int main(int argc, char const* argv[])
{
    string a = "How do you do!";
    // Works only for whitespace-separated C++ strings.
    simple_tokenizer(a);
    cout << endl;
    return 0;
}
Output:

How
do
you
do!
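
The member functions listed earlier matter most when one stream is reused for several inputs. The following is a minimal sketch of that situation; drain_words and tokenize_two are hypothetical names introduced only for this example. It shows that after a stream has been read to its end, clear() is needed to reset the error flags before str() can usefully load new contents.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: appends every whitespace-separated word
// remaining in ss to out.
void drain_words(std::stringstream& ss, std::vector<std::string>& out)
{
    std::string word;
    while (ss >> word) {   // operator>> skips whitespace and reads one word
        out.push_back(word);
    }
}

// Hypothetical example: tokenizes two strings with a single stream object.
std::vector<std::string> tokenize_two(const std::string& first,
                                      const std::string& second)
{
    std::vector<std::string> words;
    std::stringstream ss;

    ss << first;            // operator<< inserts text into the stream
    drain_words(ss, words); // reading to the end sets eofbit and failbit

    ss.clear();             // reset the error flags; the contents are untouched
    ss.str(second);         // replace the contents with the second string
    drain_words(ss, words);

    return words;
}
```

With the inputs "How do" and "you do!", tokenize_two should return the four words How, do, you and do!. Without the call to clear(), the second round of extraction would fail immediately because the stream would still be in a failed state.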

Method 2: Using the C++ find() and substr() APIs

Prerequisites: the find() and substr() functions

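This method is more robust: it can parse a string using any delimiter, not just spaces (although the default behaviour is to split on a space). The logic is easy to follow from the code below.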

C++

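#include <iostream>
#include <string>
using namespace std;
 
// Prints each token of s, split on the delimiter string del.
void tokenize(const string& s, const string& del = " ")
{
    size_t start = 0;
    size_t end = s.find(del);
    // find() returns string::npos when del is not found.
    while (end != string::npos) {
        cout << s.substr(start, end - start) << endl;
        start = end + del.size();
        end = s.find(del, start);
    }
    // Print the last token: everything after the final delimiter.
    cout << s.substr(start);
}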
int main(int argc, char const* argv[])
{
    // Takes C++ string with any separator
    string a = "Hi$%do$%you$%do$%!";
    tokenize(a, "$%");
    cout << endl;
 
    return 0;
}
Output:

Hi
do
you
do
!
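
In practice the tokens are usually collected rather than printed. The following is a minimal sketch of the same find()/substr() loop that returns a vector; the name split_by and the handling of an empty delimiter are assumptions made for this example, not part of the original code.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: splits s on the delimiter string del and returns
// the tokens. Adjacent delimiters produce empty tokens.
std::vector<std::string> split_by(const std::string& s, const std::string& del = " ")
{
    std::vector<std::string> tokens;

    // Assumption for this sketch: an empty delimiter means "do not split".
    // Without this guard, find() would match at every position and the
    // loop below would never advance.
    if (del.empty()) {
        tokens.push_back(s);
        return tokens;
    }

    std::string::size_type start = 0;
    std::string::size_type end = s.find(del);
    while (end != std::string::npos) {
        tokens.push_back(s.substr(start, end - start));
        start = end + del.size();
        end = s.find(del, start);
    }
    tokens.push_back(s.substr(start)); // the part after the last delimiter
    return tokens;
}
```

For the input "Hi$%do$%you$%do$%!" with the delimiter "$%", split_by should return Hi, do, you, do and !. Unlike strtok() and the stringstream approach, this version keeps empty tokens, so "a,,b" split on "," gives three tokens with an empty string in the middle.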

In Java:
In Java, split() is a method of the String class.

// regexp is the delimiting regular expression;
// limit caps the number of strings returned
// (the last one holds the rest of the input)
public String[] split(String regexp, int limit);

// split() can also be called without a limit
public String[] split(String regexp);

Java

// A Java program for splitting a string
// using split()
public class Test
{
    public static void main(String[] args)
    {
        String str = "Geeks-for-Geeks";
 
        // Split the string into at most two strings
        for (String val : str.split("-", 2))
            System.out.println(val);
 
        System.out.println();
   
        // Split str into all possible tokens
        for (String val : str.split("-"))
            System.out.println(val);
    }
}

Output:

Geeks
for-Geeks

Geeks
for
Geeks

In Python:
Python's split() method breaks the given string at the specified separator and returns a list of strings.

# sep is the delimiter string (a plain string, not a regular expression);
# if it is omitted, the string is split on runs of whitespace.
# maxsplit is the maximum number of splits to make (-1 means no limit).
str.split(sep=None, maxsplit=-1)

Python

line = "Geek1 \nGeek2 \nGeek3"
print(line.split())
print(line.split(' ', 1))

Output:

['Geek1', 'Geek2', 'Geek3']
['Geek1', '\nGeek2 \nGeek3']