
fuzzing-python skill


/tasks/setup-fuzzing-py/environment/skills/fuzzing-python

This skill helps you create and integrate Python fuzzing targets using Atheris and LibFuzzer for Python libraries.

npx playbooks add skill benchflow-ai/skillsbench --skill fuzzing-python


---
name: fuzzing-python
description: "Create fuzz drivers for Python libraries using Atheris and libFuzzer. Use this skill when an agent needs to write fuzz drivers (fuzz targets) for Python projects and libraries."
license: Apache License 2.0. https://github.com/google/atheris/blob/775b08fb1a781142540995e8a2817c48ffae343f/LICENSE
---

# Python Fuzzing Skill

## Setting up fuzzing for a Python project

The fuzz testing tool used here for Python projects is Atheris.
Atheris is a coverage-guided Python fuzzing engine.
It supports fuzzing of Python code, but also native extensions written for CPython.
Atheris is based off of libFuzzer.
When fuzzing native code,
Atheris can be used in combination with Address Sanitizer or Undefined Behavior Sanitizer to catch extra bugs.

You can install prebuilt versions of Atheris with pip:

```sh
pip3 install atheris
```

These wheels come with a built-in libFuzzer,
which is fine for fuzzing Python code.
If you plan to fuzz native extensions,
you may need to build from source to ensure the libFuzzer version in Atheris matches your Clang version.

## Using Atheris

### Example

```python
#!/usr/bin/python3

import atheris

with atheris.instrument_imports():
  import some_library
  import sys

def TestOneInput(data):
  some_library.parse(data)

atheris.Setup(sys.argv, TestOneInput)
atheris.Fuzz()
```

When fuzzing Python, Atheris will report a failure if the Python code under test throws an uncaught exception.
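
Because any uncaught exception counts as a failure, a fuzz target usually catches the exceptions a library is expected to raise for invalid input and lets everything else propagate. The following is a minimal sketch of that pattern. It uses the standard library's `json` module as a stand-in target, and the choice of which exceptions count as "expected" is an assumption made for illustration, not something Atheris prescribes.

```python
import json
import sys


def TestOneInput(data: bytes) -> None:
    """Feed one fuzz input to json.loads, ignoring expected rejections."""
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError:
        return  # Not valid UTF-8: uninteresting, not a bug.

    try:
        json.loads(text)
    except (json.JSONDecodeError, RecursionError):
        return  # Malformed or very deeply nested input is rejected as expected.
    # Any other exception escapes and would be reported by Atheris as a failure.


def main() -> None:
    # Imported here so TestOneInput can be unit-tested without Atheris installed.
    import atheris

    # json was imported before Atheris, so instrument already-loaded code.
    atheris.instrument_all()
    atheris.Setup(sys.argv, TestOneInput)
    atheris.Fuzz()
```

Calling `main()` at the bottom of the script starts the fuzzer. Keeping the list of caught exceptions narrow matters: a blanket `except Exception` would hide exactly the failures the fuzzer is meant to surface.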

### Python coverage

Atheris collects Python coverage information by instrumenting bytecode.
There are 3 options for adding this instrumentation to the bytecode:

- You can instrument the libraries you import:

  ```python
  with atheris.instrument_imports():
    import foo
    from bar import baz
  ```

  This will cause instrumentation to be added to `foo` and `bar`, as well as
  any libraries they import.

- Or, you can instrument individual functions:

  ```python
  @atheris.instrument_func
  def my_function(foo, bar):
    print("instrumented")
  ```

- Or finally, you can instrument everything:

  ```python
  atheris.instrument_all()
  ```

  Put this right before `atheris.Setup()`. This will find every Python function
  currently loaded in the interpreter, and instrument it.
  This might take a while.

Atheris can additionally instrument regular expression checks, e.g. `re.search`.
To enable this feature, add `atheris.enabled_hooks.add("RegEx")` to your script
before your code calls `re.compile`.
Internally this will import the `re` module and instrument the necessary functions.
This is currently an experimental feature.

Similarly, Atheris can instrument str methods; currently only `str.startswith`
and `str.endswith` are supported. To enable this feature, add
`atheris.enabled_hooks.add("str")`. This is currently an experimental feature.

#### Why am I getting "No interesting inputs were found"?

You might see this error:

```
ERROR: no interesting inputs were found. Is the code instrumented for coverage? Exiting.
```

You'll get this error if the first 2 calls to `TestOneInput` didn't produce any
coverage events. Even if you have instrumented some Python code,
this can happen if the instrumentation isn't reached in those first 2 calls.
(For example, because you have a nontrivial `TestOneInput`). You can resolve
this by adding an `atheris.instrument_func` decorator to `TestOneInput`,
using `atheris.instrument_all()`, or moving your `TestOneInput` function into an
instrumented module.

### Visualizing Python code coverage

Examining which lines are executed is helpful for understanding the
effectiveness of your fuzzer. Atheris is compatible with
[`coverage.py`](https://coverage.readthedocs.io/): you can run your fuzzer using
the `coverage.py` module as you would for any other Python program. Here's an
example:

```bash
python3 -m coverage run your_fuzzer.py -atheris_runs=10000  # Times to run
python3 -m coverage html
(cd htmlcov && python3 -m http.server 8000)
```

Coverage reports are only generated when your fuzzer exits gracefully. This
happens if:

- you specify `-atheris_runs=<number>`, and that many runs have elapsed.
- your fuzzer exits by Python exception.
- your fuzzer exits by `sys.exit()`.

No coverage report will be generated if your fuzzer exits due to a
crash in native code, or due to libFuzzer's `-runs` flag (use `-atheris_runs`).
If your fuzzer exits via other methods, such as SIGINT (Ctrl+C), Atheris will
attempt to generate a report but may be unable to (depending on your code).
For consistent reports, we recommend always using
`-atheris_runs=<number>`.

If you'd like to examine coverage when running with your corpus, you can do
that with the following command:

```bash
python3 -m coverage run your_fuzzer.py corpus_dir/* -atheris_runs=$(( 1 + $(ls corpus_dir | wc -l) ))
```

This will cause Atheris to run on each file in `corpus_dir`, then exit.
Note: Atheris uses an empty input as the first input even if there is no empty file in `corpus_dir`,
which is why the run count is one more than the number of corpus files.
Importantly, if you leave off the `-atheris_runs=$(( 1 + $(ls corpus_dir | wc -l) ))`, no
coverage report will be generated.
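
The run-count arithmetic above (one run per corpus file plus one for the initial empty input) can also be computed in Python when the replay is launched from a script. This is a small sketch; the helper name `coverage_replay_command` is hypothetical, and unlike `ls | wc -l` it counts only regular files in the corpus directory.

```python
import tempfile
from pathlib import Path
from typing import List


def coverage_replay_command(fuzzer: str, corpus_dir: str) -> List[str]:
    """Build the argv for replaying a corpus once under coverage.py."""
    corpus_files = sorted(str(p) for p in Path(corpus_dir).iterdir() if p.is_file())
    # One run per corpus file, plus one for the empty input Atheris tries first.
    runs = 1 + len(corpus_files)
    return ["python3", "-m", "coverage", "run", fuzzer,
            *corpus_files, f"-atheris_runs={runs}"]


if __name__ == "__main__":
    # Demonstrate with a throwaway corpus of three seed files.
    with tempfile.TemporaryDirectory() as corpus:
        for name in ("a", "b", "c"):
            Path(corpus, name).write_bytes(b"seed")
        print(coverage_replay_command("your_fuzzer.py", corpus)[-1])  # -atheris_runs=4
```

The resulting list can be passed to `subprocess.run` as-is, which avoids shell globbing and quoting issues with unusual corpus file names.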

Using coverage.py will significantly slow down your fuzzer, so only use it for
visualizing coverage; don't use it all the time.

### Fuzzing Native Extensions

In order for fuzzing native extensions to be effective, your native extensions
must be instrumented. See [Native Extension Fuzzing](https://github.com/google/atheris/blob/master/native_extension_fuzzing.md)
for instructions.

### Structure-aware Fuzzing

Atheris is based on a coverage-guided mutation-based fuzzer (LibFuzzer). This
has the advantage of not requiring any grammar definition for generating inputs,
making its setup easier. The disadvantage is that it will be harder for the
fuzzer to generate inputs for code that parses complex data types. Often the
inputs will be rejected early, resulting in low coverage.

Atheris supports custom mutators
[(as offered by LibFuzzer)](https://github.com/google/fuzzing/blob/master/docs/structure-aware-fuzzing.md)
to produce grammar-aware inputs.

Example (Atheris-equivalent of the
[example in the LibFuzzer docs](https://github.com/google/fuzzing/blob/master/docs/structure-aware-fuzzing.md#example-compression)):

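```python
import sys
import zlib

import atheris


@atheris.instrument_func
def TestOneInput(data):
  # Inputs that are not valid zlib streams are rejected immediately.
  try:
    decompressed = zlib.decompress(data)
  except zlib.error:
    return

  if len(decompressed) < 2:
    return

  # The "bug": only reachable when the decompressed payload is exactly 'FU'.
  try:
    if decompressed.decode() == 'FU':
      raise RuntimeError('Boom')
  except UnicodeDecodeError:
    pass
```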

To reach the `RuntimeError` crash, the fuzzer needs to be able to produce inputs
that are valid compressed data and satisfy the checks after decompression.
It is very unlikely that Atheris will be able to produce such inputs: mutations
on the input data will most probably result in invalid data that will fail at
decompression-time.

To overcome this issue, you can define a custom mutator function (equivalent to
`LLVMFuzzerCustomMutator`).
This example produces valid compressed data. To enable Atheris to make use of
it, pass the custom mutator function to the invocation of `atheris.Setup`.

```python
def CustomMutator(data, max_size, seed):
  try:
    decompressed = zlib.decompress(data)
  except zlib.error:
    decompressed = b'Hi'
  else:
    decompressed = atheris.Mutate(decompressed, len(decompressed))
  return zlib.compress(decompressed)

atheris.Setup(sys.argv, TestOneInput, custom_mutator=CustomMutator)
atheris.Fuzz()
```

As seen in the example, the custom mutator may request Atheris to mutate data
using `atheris.Mutate()` (this is equivalent to `LLVMFuzzerMutate`).

You can experiment with [custom_mutator_example.py](example_fuzzers/custom_mutator_example.py)
and see that without the mutator Atheris would not be able to find the crash,
while with the mutator this is achieved in a matter of seconds.

```shell
$ python3 example_fuzzers/custom_mutator_example.py --no_mutator
[...]
#2      INITED cov: 2 ft: 2 corp: 1/1b exec/s: 0 rss: 37Mb
#524288 pulse  cov: 2 ft: 2 corp: 1/1b lim: 4096 exec/s: 262144 rss: 37Mb
#1048576        pulse  cov: 2 ft: 2 corp: 1/1b lim: 4096 exec/s: 349525 rss: 37Mb
#2097152        pulse  cov: 2 ft: 2 corp: 1/1b lim: 4096 exec/s: 299593 rss: 37Mb
#4194304        pulse  cov: 2 ft: 2 corp: 1/1b lim: 4096 exec/s: 279620 rss: 37Mb
[...]

$ python3 example_fuzzers/custom_mutator_example.py
[...]
INFO: found LLVMFuzzerCustomMutator (0x7f9c989fb0d0). Disabling -len_control by default.
[...]
#2      INITED cov: 2 ft: 2 corp: 1/1b exec/s: 0 rss: 37Mb
#3      NEW    cov: 4 ft: 4 corp: 2/11b lim: 4096 exec/s: 0 rss: 37Mb L: 10/10 MS: 1 Custom-
#12     NEW    cov: 5 ft: 5 corp: 3/21b lim: 4096 exec/s: 0 rss: 37Mb L: 10/10 MS: 7 Custom-CrossOver-Custom-CrossOver-Custom-ChangeBit-Custom-
 === Uncaught Python exception: ===
RuntimeError: Boom
Traceback (most recent call last):
  File "example_fuzzers/custom_mutator_example.py", line 62, in TestOneInput
    raise RuntimeError('Boom')
[...]
```

Custom crossover functions (equivalent to `LLVMFuzzerCustomCrossOver`) are also
supported. You can pass the custom crossover function to the invocation of
`atheris.Setup`. See its usage in [custom_crossover_fuzz_test.py](src/custom_crossover_fuzz_test.py).
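
For the zlib example, a crossover function could follow the same decompress, combine, recompress idea as the custom mutator. The sketch below is illustrative only: the four-argument signature and the `custom_crossover=` keyword are assumed to mirror `LLVMFuzzerCustomCrossOver`, so check the linked test file for the exact form. The helper `_inflate` is a hypothetical name.

```python
import random
import zlib


def _inflate(data: bytes) -> bytes:
    """Decompress data, treating invalid streams as an empty payload."""
    try:
        return zlib.decompress(data)
    except zlib.error:
        return b""


def CustomCrossOver(data1: bytes, data2: bytes, max_out_size: int, seed: int) -> bytes:
    """Splice a prefix of one payload onto a suffix of the other."""
    rng = random.Random(seed)  # Derive all randomness from the given seed.
    first, second = _inflate(data1), _inflate(data2)
    head = first[: rng.randint(0, len(first))]
    tail = second[rng.randint(0, len(second)):]
    combined = zlib.compress(head + tail)
    # Fall back to the first input if the result would exceed the size limit.
    return combined if len(combined) <= max_out_size else data1


# Assumed wiring, alongside the mutator shown earlier:
# atheris.Setup(sys.argv, TestOneInput, custom_crossover=CustomCrossOver)
```

Seeding a local `random.Random` from `seed` keeps the crossover reproducible for a given pair of inputs, which helps when replaying a crash.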

#### Structure-aware Fuzzing with Protocol Buffers

[libprotobuf-mutator](https://github.com/google/libprotobuf-mutator) has
bindings to use it together with Atheris to perform structure-aware fuzzing
using protocol buffers.

See the documentation for
[atheris_libprotobuf_mutator](contrib/libprotobuf_mutator/README.md).

## Integration with OSS-Fuzz

Atheris is fully supported by [OSS-Fuzz](https://github.com/google/oss-fuzz), Google's continuous fuzzing service for open source projects. For integrating with OSS-Fuzz, please see [https://google.github.io/oss-fuzz/getting-started/new-project-guide/python-lang](https://google.github.io/oss-fuzz/getting-started/new-project-guide/python-lang).

## API

The `atheris` module provides three key functions: `instrument_imports()`, `Setup()` and `Fuzz()`.

In your source file, import all libraries you wish to fuzz inside a `with atheris.instrument_imports():`-block, like this:

```python
# library_a will not get instrumented
import library_a

with atheris.instrument_imports():
    # library_b will get instrumented
    import library_b
```

Generally, it's best to import `atheris` first and then import all other libraries inside of a `with atheris.instrument_imports()` block.

Next, define a fuzzer entry point function and pass it to `atheris.Setup()` along with the fuzzer's arguments (typically `sys.argv`). Finally, call `atheris.Fuzz()` to start fuzzing. You must call `atheris.Setup()` before `atheris.Fuzz()`.

#### `instrument_imports(include=[], exclude=[])`

- `include`: A list of fully-qualified module names that shall be instrumented.
- `exclude`: A list of fully-qualified module names that shall NOT be instrumented.

This should be used together with a `with`-statement. All modules imported in
said statement will be instrumented. However, because Python imports all modules
only once, this cannot be used to instrument any previously imported module,
including modules required by Atheris. To add coverage to those modules, use
`instrument_all()` instead.

A full list of unsupported modules can be retrieved as follows:

```python
import sys
import atheris
print(sys.modules.keys())
```

#### `instrument_func(func)`

- `func`: The function to instrument.

This will instrument the specified Python function and then return `func`. This
is typically used as a decorator, but it can also be called directly to
instrument an existing function. Note that `func` is instrumented in-place, so
this will affect all call points of the function.

This cannot be called on a bound method - call it on the unbound version.

#### `instrument_all()`

This will scan over all objects in the interpreter and call `instrument_func` on
every Python function. This works even on core Python interpreter functions,
something which `instrument_imports` cannot do.

This function is experimental.

#### `Setup(args, test_one_input, internal_libfuzzer=None)`

- `args`: A list of strings: the process arguments to pass to the fuzzer, typically `sys.argv`. This argument list may be modified in-place, to remove arguments consumed by the fuzzer.
  See [the LibFuzzer docs](https://llvm.org/docs/LibFuzzer.html#options) for a list of such options.
- `test_one_input`: your fuzzer's entry point. Must take a single `bytes` argument. This will be repeatedly invoked with a single bytes container.
- `internal_libfuzzer`: Indicates whether libFuzzer will be provided by Atheris or by an external library (see [native_extension_fuzzing.md](./native_extension_fuzzing.md)). If unspecified, Atheris will determine this
  automatically. If fuzzing pure Python, leave this unspecified or set it to `True`.

#### `Fuzz()`

This starts the fuzzer. You must have called `Setup()` before calling this function. This function does not return.

In many cases `Setup()` and `Fuzz()` could be combined into a single function, but they are
separated because you may want the fuzzer to consume the command-line arguments it handles
before passing any remaining arguments to another setup function.
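
A related situation is a fuzzer script that defines flags of its own, such as the `--no_mutator` switch used by the custom mutator example. One possible approach, sketched here with the standard library's `argparse` rather than any Atheris feature, is to peel off the script's own flags first and hand everything else to `atheris.Setup()`. The helper name `split_args` is hypothetical.

```python
import argparse
from typing import List, Tuple


def split_args(argv: List[str]) -> Tuple[argparse.Namespace, List[str]]:
    """Separate this script's own flags from the arguments meant for the fuzzer."""
    parser = argparse.ArgumentParser(add_help=False, allow_abbrev=False)
    parser.add_argument("--no_mutator", action="store_true")
    # parse_known_args leaves unrecognized arguments (libFuzzer flags, corpus paths) alone.
    own, remaining = parser.parse_known_args(argv[1:])
    return own, [argv[0], *remaining]


own, fuzzer_argv = split_args(
    ["fuzz.py", "--no_mutator", "-atheris_runs=100", "corpus_dir"])
print(own.no_mutator)  # True
print(fuzzer_argv)     # ['fuzz.py', '-atheris_runs=100', 'corpus_dir']
```

In a real script, `fuzzer_argv` would then be passed as the `args` parameter of `atheris.Setup()` in place of `sys.argv`.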

#### `FuzzedDataProvider`

Often, a `bytes` object is not convenient input to your code being fuzzed. Similar to libFuzzer, we provide a FuzzedDataProvider to translate these bytes into other input forms.

You can construct the FuzzedDataProvider with:

```python
fdp = atheris.FuzzedDataProvider(input_bytes)
```

The FuzzedDataProvider then supports the following functions:

```python
def ConsumeBytes(count: int)
```

Consume `count` bytes.

```python
def ConsumeUnicode(count: int)
```

Consume unicode characters. The result might contain surrogate pair characters, which according to the specification are invalid in this situation. However, many core software tools (e.g. Windows file paths) support them, so other software often needs to as well.

```python
def ConsumeUnicodeNoSurrogates(count: int)
```

Consume unicode characters, but never generate surrogate pair characters.

```python
def ConsumeString(count: int)
```

Alias for `ConsumeBytes` in Python 2, or `ConsumeUnicode` in Python 3.

```python
def ConsumeInt(bytes: int)
```

Consume a signed integer of the specified size (when written in two's complement notation).

```python
def ConsumeUInt(bytes: int)
```

Consume an unsigned integer of the specified size.
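
To make "of the specified size" concrete, the value ranges implied by a byte count can be worked out with plain Python. This sketch only illustrates two's complement and unsigned ranges; it does not model how Atheris selects bytes from the input, and the helper names are hypothetical.

```python
from typing import Tuple


def signed_range(num_bytes: int) -> Tuple[int, int]:
    """Range of a two's complement integer stored in num_bytes bytes."""
    bits = 8 * num_bytes
    return -(1 << (bits - 1)), (1 << (bits - 1)) - 1


def unsigned_range(num_bytes: int) -> Tuple[int, int]:
    """Range of an unsigned integer stored in num_bytes bytes."""
    return 0, (1 << (8 * num_bytes)) - 1


print(signed_range(1))    # (-128, 127)
print(unsigned_range(2))  # (0, 65535)

# The same byte interpreted as signed and as unsigned:
print(int.from_bytes(b"\xff", "little", signed=True))   # -1
print(int.from_bytes(b"\xff", "little", signed=False))  # 255
```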

```python
def ConsumeIntInRange(min: int, max: int)
```

Consume an integer in the range [`min`, `max`].

```python
def ConsumeIntList(count: int, bytes: int)
```

Consume a list of `count` integers, each `bytes` bytes in size.

```python
def ConsumeIntListInRange(count: int, min: int, max: int)
```

Consume a list of `count` integers in the range [`min`, `max`].

```python
def ConsumeFloat()
```

Consume an arbitrary floating-point value. Might produce weird values like `NaN` and `Inf`.

```python
def ConsumeRegularFloat()
```

Consume an arbitrary numeric floating-point value; never produces a special type like `NaN` or `Inf`.

```python
def ConsumeProbability()
```

Consume a floating-point value in the range [0, 1].

```python
def ConsumeFloatInRange(min: float, max: float)
```

Consume a floating-point value in the range [`min`, `max`].

```python
def ConsumeFloatList(count: int)
```

Consume a list of `count` arbitrary floating-point values. Might produce weird values like `NaN` and `Inf`.

```python
def ConsumeRegularFloatList(count: int)
```

Consume a list of `count` arbitrary numeric floating-point values; never produces special types like `NaN` or `Inf`.

```python
def ConsumeProbabilityList(count: int)
```

Consume a list of `count` floats in the range [0, 1].

```python
def ConsumeFloatListInRange(count: int, min: float, max: float)
```

Consume a list of `count` floats in the range [`min`, `max`].

```python
def PickValueInList(l: list)
```

Given a list, pick a random value from it.

```python
def ConsumeBool()
```

Consume either `True` or `False`.
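
Putting several of these calls together, a fuzz target can carve one input into multiple typed arguments. The sketch below uses the standard library's `textwrap.wrap` as a stand-in target; the helper name `wrap_from_provider` and the chosen ranges are assumptions for illustration. Atheris is imported optionally only so the helper can be exercised with a stand-in provider where Atheris is not installed.

```python
import textwrap
from typing import List

try:
    import atheris
except ImportError:  # Lets the helper below be unit-tested without Atheris.
    atheris = None


def wrap_from_provider(fdp) -> List[str]:
    """Derive every argument of textwrap.wrap from the fuzz input."""
    width = fdp.ConsumeIntInRange(1, 120)
    break_long_words = fdp.ConsumeBool()
    # Consume the scalar arguments first, then use a bounded amount of text.
    text = fdp.ConsumeUnicodeNoSurrogates(fdp.ConsumeIntInRange(0, 256))
    return textwrap.wrap(text, width=width, break_long_words=break_long_words)


def TestOneInput(data: bytes) -> None:
    wrap_from_provider(atheris.FuzzedDataProvider(data))


# Register TestOneInput with atheris.Setup(sys.argv, TestOneInput) and then
# call atheris.Fuzz(), as in the basic example.
```

Separating the argument-building logic from `TestOneInput` keeps the entry point tiny and makes the target easy to replay with a hand-written provider when debugging a crash.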

## Important considerations for fuzz targets

Some important things to remember about fuzz targets:

- The fuzzing engine will execute the fuzz target many times with different inputs in the same process.
- It must tolerate any kind of input (empty, huge, malformed, etc).
- It must not exit() on any input.
- It may use threads but ideally all threads should be joined at the end of the function.
- It must be as deterministic as possible. Non-determinism (e.g. random decisions not based on the input bytes) will make fuzzing inefficient.
- It must be fast. Try avoiding cubic or greater complexity, logging, or excessive memory consumption.
- Ideally, it should not modify any global state (although that’s not strict).
- Usually, the narrower the target the better. E.g. if your target can parse several data formats, split it into several targets, one per format.
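
As one illustration of the determinism point, a target that needs pseudo-random choices can derive them from the input bytes instead of from the clock or the operating system. This is a minimal sketch with a hypothetical helper, `input_seeded_order`; the sort at the end merely stands in for the real code under test.

```python
import random
from typing import List


def input_seeded_order(data: bytes, size: int) -> List[int]:
    """Return a permutation of range(size) that depends only on the input."""
    rng = random.Random(data)  # Seeded by the fuzz input, not by time or os.urandom.
    order = list(range(size))
    rng.shuffle(order)
    return order


def TestOneInput(data: bytes) -> None:
    order = input_seeded_order(data, 8)
    # Stand-in for the code under test: sorting must restore the original order.
    assert sorted(order) == list(range(8))
```

With this approach, replaying a saved crash input takes the same path every time, which is what makes a reported failure reproducible.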

Overview

This skill helps create LibFuzzer-compatible fuzz drivers for Python libraries using Atheris. It guides setting up instrumentation, writing TestOneInput entry points, and integrating custom mutators or coverage tools to find crashes in pure Python and native extensions. The content focuses on practical patterns and common pitfalls for producing effective fuzz targets.

How this skill works

The skill explains how to import and instrument target modules with atheris.instrument_imports(), instrument individual functions, or instrument all loaded code. It shows how to register a bytes-based TestOneInput with atheris.Setup() and start fuzzing with atheris.Fuzz(), how to use FuzzedDataProvider to convert bytes into structured inputs, and how to add custom mutators/crossovers for structure-aware fuzzing.

When to use it

When you need a coverage-guided fuzzer for Python code or CPython native extensions.
When you want to build a libFuzzer-style fuzz target that integrates with Atheris.
When inputs require grammar-aware mutations (use custom mutator or protobuf mutator).
When preparing a project for continuous fuzzing (e.g., OSS-Fuzz) or CI.
When you want to visualize Python coverage with coverage.py while fuzzing.

Best practices

Import atheris first and import target modules inside with atheris.instrument_imports() to get bytecode coverage.
Keep TestOneInput small and deterministic; use atheris.instrument_func on it if coverage initialization fails.
Use FuzzedDataProvider to parse bytes into structured values rather than manual slicing.
Provide custom mutator/crossover for parsers or compressed formats to reach deeper states.
Build Atheris from source when fuzzing native extensions to match your clang/libFuzzer versions.
Run with -atheris_runs=<N> to produce reliable coverage reports via coverage.py.

Example use cases

Fuzzing a JSON/YAML parsing library by converting bytes into strings via FuzzedDataProvider.
Testing a Python wrapper around a C extension, built with ASAN/UBSAN enabled.
Adding a custom mutator that decompresses input, mutates payload, and recompresses to exercise parser internals.
Integrating Atheris fuzz targets into OSS-Fuzz for continuous automated testing.
Instrumenting regex and str hooks to improve fuzzer effectiveness on string-heavy code.

FAQ

Why does Atheris report "no interesting inputs were found"?

This happens if the first two TestOneInput calls produce no coverage events. Instrument the entry function (atheris.instrument_func), use instrument_all(), or ensure TestOneInput calls into instrumented code early.

How do I fuzz native C extensions?

Instrument the native extension for coverage, build with matching clang/libFuzzer if needed, and follow Atheris native extension fuzzing guidance. Consider building Atheris from source to match toolchain versions.