Passing NULL-Terminated Strings to C Libraries

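Suppose you are writing an extension module that needs to pass a NULL-terminated string to a C library. Let's see how to do that with Python's Unicode string implementation. Many C library functions operate on NULL-terminated strings declared as type char *.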

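The code below defines the C function used to illustrate and test the problem. The C function (Code #1) simply prints the hexadecimal representation of each character so that the passed string can be debugged easily.

Code #1: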

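#include <stdio.h>

/* Print each byte of a NULL-terminated string in hex, for debugging. */
void print_chars(char *s)
{
    while (*s)
    {
        printf("%2x ", (unsigned char) *s);
        s++;
    }
    printf("\n");
}

int main(void)
{
    print_chars("Hello");
    return 0;
}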

Output:

48 65 6c 6c 6f
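
For experimenting without compiling anything, the behaviour of print_chars() can be approximated in pure Python. This is only a rough model: `hex_until_null` is a hypothetical helper name, not part of any library, and it returns the text instead of printing it (so it lacks the trailing space the C version emits).

```python
def hex_until_null(data: bytes) -> str:
    """Model of print_chars(): hex-format each byte up to the first NULL byte."""
    parts = []
    for byte in data:
        if byte == 0:
            # A C loop of the form `while (*s)` stops here.
            break
        parts.append(f"{byte:2x}")
    return " ".join(parts)


print(hex_until_null(b"Hello"))
```

Because the loop stops at the first zero byte, anything after an embedded NULL byte is never seen, which is the reason the conversions discussed in this article care about embedded NULL bytes.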

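There are a few options for calling such a C function from Python. The first is to restrict it to operating on bytes only, by using the "y" conversion code to PyArg_ParseTuple(), as shown in the code below.

Code #2: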

static PyObject * py_print_chars(PyObject * self, PyObject * args)
{
    char * s;
    if (! PyArg_ParseTuple(args, "y", &s))
    {
        return NULL;
    }
    print_chars(s);
    Py_RETURN_NONE;
}

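Let's see how the resulting function behaves, and how bytes with embedded NULL bytes and Unicode strings are rejected. The exact exception type and message depend on the Python version.

Code #3: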

# Plain bytes are accepted
print_chars(b'Hello World')

# Bytes with an embedded NULL byte are rejected
print_chars(b'Hello\x00World')

# Unicode strings are rejected
print_chars('Hello World')

Output:

48 65 6c 6c 6f 20 57 6f 72 6c 64

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: must be bytes without null bytes, not bytes

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'str' does not support the buffer interface
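
The rejection of embedded NULL bytes makes sense once you see what a char * consumer would receive. The sketch below uses the standard ctypes module purely as an illustration (it is not how the extension module itself works): a C buffer holds all the bytes, but its NULL-terminated view stops at the first zero byte.

```python
import ctypes

# A 12-byte C buffer: 11 bytes of data plus the terminator ctypes appends.
buf = ctypes.create_string_buffer(b"Hello\x00World")

whole = buf.raw      # every byte in the buffer, embedded NULL byte included
c_view = buf.value   # the NULL-terminated view: stops at the first NULL byte

print(len(whole), whole)
print(len(c_view), c_view)
```

A C function handed this buffer as a char * would silently process only "Hello", so refusing such input up front avoids quietly truncated data.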

To pass Unicode strings instead, use the "s" format code to PyArg_ParseTuple(), as shown below.

Code #4:

static PyObject *py_print_chars(PyObject *self, PyObject *args)
{
    char *s;
    if (!PyArg_ParseTuple(args, "s", &s))
    {
        return NULL;
    }
    print_chars(s);
    Py_RETURN_NONE;
}

With the code above (Code #4), every string is automatically converted to a NULL-terminated UTF-8 encoding, as the following code shows.

Code #5:

# Plain ASCII text
print_chars('Hello World')

# Non-ASCII text is encoded as UTF-8
print_chars('Spicy Jalape\u00f1o')

# Strings with an embedded NULL character are rejected
print_chars('Hello\x00World')

# Bytes are rejected
print_chars(b'Hello World')

Output:

48 65 6c 6c 6f 20 57 6f 72 6c 64

53 70 69 63 79 20 4a 61 6c 61 70 65 c3 b1 6f

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: must be str without null characters, not str

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: must be str, not bytes
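
The bytes printed for the second string can be cross-checked from pure Python by encoding it to UTF-8 by hand. This is only a sanity check on the output above, not part of the extension module; the variable names are illustrative.

```python
text = "Spicy Jalape\u00f1o"
encoded = text.encode("utf-8")

# 14 characters become 15 bytes because the n-with-tilde needs two bytes (c3 b1).
print(len(text), len(encoded))
hex_dump = " ".join(f"{byte:02x}" for byte in encoded)
print(hex_dump)
```

The single non-ASCII character accounts for the extra byte, which matches the c3 b1 pair in the C function's output.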

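If you are working with a PyObject * and cannot use PyArg_ParseTuple(), the code below shows how to check for and extract a suitable char * reference from both bytes and string objects.

Code #6: Conversion from bytes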

// Some Python object
PyObject *obj;

// Conversion from bytes
{
    char *s;
    s = PyBytes_AsString(obj);
    if (!s)
    {
        /* TypeError already raised */
        return NULL;
    }
    print_chars(s);
}

Code #7: Conversion from a string to UTF-8 bytes

{
    PyObject *bytes;
    char *s;

    if (!PyUnicode_Check(obj))
    {
        PyErr_SetString(PyExc_TypeError, "Expected string");
        return NULL;
    }

    bytes = PyUnicode_AsUTF8String(obj);
    if (!bytes)
    {
        /* Encoding failed; exception already set */
        return NULL;
    }
    s = PyBytes_AsString(bytes);
    print_chars(s);
    Py_DECREF(bytes);
}

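Both conversions guarantee NULL-terminated data, but neither checks for NULL bytes embedded elsewhere inside the string. If that matters, you need to check for it yourself.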
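
One way to think about such a check: the object knows its true length, while a char * consumer only sees the bytes up to the first NULL byte, so the two lengths disagree exactly when a NULL byte is embedded. Below is a minimal Python sketch of that comparison, with `has_embedded_null` as a hypothetical helper name. In a C extension, the same comparison could presumably be made between the length reported by a call such as PyBytes_AsStringAndSize() and the result of strlen().

```python
def has_embedded_null(data: bytes) -> bool:
    """Return True if data contains a NULL byte before its end."""
    # Length a C string reader would see: everything before the first NULL byte.
    c_length = len(data.split(b"\x00", 1)[0])
    return c_length != len(data)


for sample in (b"Hello World", b"Hello\x00World"):
    print(sample, has_embedded_null(sample))
```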

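Note: Using the "s" format code in PyArg_ParseTuple() carries a hidden memory overhead that is easy to overlook. When code uses this conversion, a UTF-8 string is created and permanently attached to the original string object. If the original string contains non-ASCII characters, this increases the size of the string until it is garbage collected.

Code #8: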

import sys

s = 'Spicy Jalape\u00f1o'
print("Size :", sys.getsizeof(s))

# Pass the string to the extension function
print_chars(s)

# The size has increased
print("Size :", sys.getsizeof(s))

Output:

Size : 87

53 70 69 63 79 20 4a 61 6c 61 70 65 c3 b1 6f

Size : 103
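
The 16-byte growth in this output is consistent with the attached copy being the UTF-8 encoding plus one terminating NULL byte. A quick check of the arithmetic follows; the figures 87 and 103 are taken from the output above, and absolute sizes depend on the Python version and platform.

```python
s = 'Spicy Jalape\u00f1o'

utf8_bytes = len(s.encode('utf-8'))   # 15: the non-ASCII character takes two bytes
cached = utf8_bytes + 1               # plus the terminating NULL byte

print(cached)       # size of the attached UTF-8 copy under this assumption
print(103 - 87)     # growth reported in the output above
```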