pyquist.audio

The Audio class — a thin, validated wrapper around a 2D numpy.ndarray of float32 samples.

Raw audio sample manipulation via Numpy-backed containers.

Everything in this module centers on Audio, a thin wrapper around a 2D float32 numpy array shaped (num_samples, num_channels) plus a sample_rate in Hz. By convention, sample values in [-1.0, 1.0] are digital full-scale; values outside that range are valid in memory but clip on playback or when written to most file formats.

Construct one from a numpy array, or load existing audio from disk or the web:

import numpy as np
import pyquist as pq

sr = 44100
t = np.arange(sr) / sr
tone = pq.Audio(0.5 * np.sin(2 * np.pi * 440 * t), sample_rate=sr)  # 1s of A4

riff = pq.Audio.from_file("guitar.wav")
drums = pq.Audio.from_url("https://example.com/drums.mp3")

Audio behaves like a numpy array where it can: it supports len(), indexing, slicing, and elementwise arithmetic (+, -, *, /, and their in-place variants). Indexing, slicing, and arithmetic all return Audio:

mix = riff + drums[: len(riff)]  # sum the overlapping region
mix *= 0.5  # halve the amplitude in place

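On top of that it offers music-specific helpers that return Audio, so calls can be chained. Most return a new object; normalize() and clip() modify self by default (pass in_place=False to get a copy instead):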

clip = mix.as_mono().segment(offset=1.0, duration=3.0).resample(8000)
clip.normalize(peak_dbfs=-1.0)
clip.write("clip.wav")

See Audio.zeros() for an empty destination buffer and Audio.concatenate() to join buffers end to end. To turn musical events into Audio, see pyquist.score.

class pyquist.audio.Audio(samples, sample_rate=None)[source]

Bases: object

A wrapper around a 2D float32 numpy array of audio samples.

The two primary attributes are samples (a float32 array shaped (num_samples, num_channels)) and sample_rate (Hz, or None for buffers without a defined rate). By convention, sample values in [-1.0, 1.0] correspond to digital full-scale amplitude; values outside this range are valid in memory but will clip when sent to playback or written to most file formats.

Example

>>> import numpy as np
>>> import pyquist as pq
>>> sr = 44100
>>> t = np.arange(sr) / sr
>>> audio = pq.Audio(np.sin(2 * np.pi * 440 * t), sample_rate=sr)
>>> pq.play(audio)

Wraps an existing numpy array as Audio.

Parameters:
  • samples (ndarray) – A numpy array of samples. Accepted as 0-D, 1-D, or 2-D (see the samples setter for shape normalization). Must be float32 or float64 (the latter is auto-converted).

  • sample_rate (Optional[int]) – Optional sample rate in Hz; None for unspecified (e.g. when used as a real-time block buffer).
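
The accepted shapes and dtypes can be illustrated with a numpy-only sketch. It does not call pyquist; normalize_shape is a hypothetical helper, and the specific rules it applies (a scalar becomes one sample of one channel, a 1-D array becomes a mono column) are assumptions, since the actual normalization lives in the samples setter.

```python
import numpy as np

def normalize_shape(samples):
    """Hypothetical helper: coerce input to a 2-D float32 array."""
    arr = np.asarray(samples)
    if arr.dtype not in (np.float32, np.float64):
        raise TypeError(f"expected float32 or float64, got {arr.dtype}")
    # float64 is converted to float32; float32 input is not copied.
    arr = arr.astype(np.float32, copy=False)
    if arr.ndim == 0:
        arr = arr.reshape(1, 1)      # assumption: scalar -> (1, 1)
    elif arr.ndim == 1:
        arr = arr[:, np.newaxis]     # assumption: 1-D -> (num_samples, 1)
    elif arr.ndim != 2:
        raise ValueError(f"expected at most 2 dimensions, got {arr.ndim}")
    return arr

mono = normalize_shape(np.zeros(5))            # float64, 1-D input
print(mono.shape, mono.dtype)
```

Under these assumptions, a 1-D float64 sine wave like the one in the example above would end up stored as a float32 column of shape (num_samples, 1).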

classmethod zeros(num_samples, num_channels, sample_rate=None)[source]

Creates a silent (zero-filled) Audio of the given shape.

Useful as a destination buffer that you fill in via audio.samples or via in-place arithmetic.

Parameters:
  • num_samples (int) – Number of samples per channel. Must be >= 0.

  • num_channels (int) – Number of channels (1 for mono, 2 for stereo). Must be >= 0.

  • sample_rate (Optional[int]) – Optional sample rate in Hz.

Return type:

Audio
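
The destination-buffer pattern can be sketched on a plain numpy array of the same layout. This is an illustration only and does not call pyquist; the variable names are made up, and with a real Audio the same slicing would presumably be applied to audio.samples.

```python
import numpy as np

sr = 8000
# One second of stereo silence, laid out as (num_samples, num_channels).
buffer = np.zeros((sr, 2), dtype=np.float32)

# Half a second of a 440 Hz tone at a quarter of full scale.
t = np.arange(sr // 2) / sr
tone = (0.25 * np.sin(2 * np.pi * 440 * t)).astype(np.float32)

# Mix the tone into the buffer starting at 0.25 s, louder on the left.
start = sr // 4
buffer[start:start + len(tone), 0] += tone
buffer[start:start + len(tone), 1] += 0.5 * tone
```

Because the additions are in place, the buffer keeps its shape and dtype; only the targeted region changes.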

classmethod from_file(file)[source]

Loads an Audio from a file on disk or a file-like object.

Decoding is delegated to soundfile (libsndfile), which supports WAV, FLAC, OGG, MP3, and most other common formats. The file’s native sample rate is preserved, and channels remain in their original order. Use resample() to change the rate after loading.

Raises FileNotFoundError (with the offending path) when file is a path that doesn’t exist — clearer than libsndfile’s generic "System error" message.

Parameters:

file (Union[str, Path, IO])

Return type:

Audio

classmethod from_url(url)[source]

Downloads an audio file from a URL and loads it as Audio.

The full response is buffered in memory before decoding.

Parameters:

url (str)

Return type:

Audio

classmethod concatenate(audios)[source]

Joins a sequence of Audio end-to-end along the sample axis.

All inputs must have the same num_channels and the same sample_rate; otherwise ValueError is raised. The list must be non-empty.

Parameters:

audios (list[Audio]) – A non-empty list of Audio to join in order.

Return type:

Audio
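
The checks described above can be sketched on raw arrays. This is a numpy-only illustration, not the library's implementation; concatenate_samples and its (blocks, rates) signature are hypothetical, and raising ValueError for an empty list is an assumption.

```python
import numpy as np

def concatenate_samples(blocks, rates):
    """Hypothetical helper: join (num_samples, num_channels) arrays in order."""
    if not blocks:
        raise ValueError("need at least one block")  # assumed error type
    if len({b.shape[1] for b in blocks}) != 1:
        raise ValueError("all blocks must have the same number of channels")
    if len(set(rates)) != 1:
        raise ValueError("all blocks must have the same sample rate")
    # Axis 0 is the sample axis, so the blocks are joined end to end in time.
    return np.concatenate(blocks, axis=0), rates[0]

a = np.zeros((100, 2), dtype=np.float32)
b = np.ones((50, 2), dtype=np.float32)
joined, rate = concatenate_samples([a, b], [44100, 44100])
print(joined.shape, rate)
```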

property samples: ndarray

The underlying (num_samples, num_channels) float32 array.

Returned by reference: in-place mutations (audio.samples[0] = 0, audio.samples *= 0.5) modify the audio directly. Reassigning the attribute (audio.samples = new_array) re-runs validation.

property sample_rate: int | None

The sample rate in Hz, or None if unspecified.

property num_samples: int

Number of samples per channel (samples.shape[0]).

property num_channels: int

Number of channels (samples.shape[1]); 1 for mono, 2 for stereo.

property shape: tuple

Shape of the underlying array: (num_samples, num_channels).

property duration: float

Duration of the audio in seconds. Requires sample_rate to be set.

property peak_amplitude: float

Peak absolute sample value across all samples and channels.

This is a linear amplitude (not decibels): 1.0 corresponds to digital full scale. Empty audio returns 0.0. Use pyquist.helper.amplitude_to_db() to convert to dBFS.
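
For reference, the usual conversion from linear amplitude to dBFS is 20 * log10(amplitude). The sketch below uses only the standard library; amplitude_to_db_sketch is a hypothetical stand-in, not pyquist.helper.amplitude_to_db(), and returning negative infinity for silence is an assumption.

```python
import math

def amplitude_to_db_sketch(amplitude):
    """Hypothetical stand-in: linear amplitude -> dB relative to full scale."""
    if amplitude <= 0.0:
        return float("-inf")  # assumption: silence maps to -inf dBFS
    return 20.0 * math.log10(amplitude)

print(amplitude_to_db_sketch(1.0))   # full scale
print(amplitude_to_db_sketch(0.5))   # about 6 dB below full scale
```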

clear()[source]

Fills the audio with silence (zeros) in place.

Shape, dtype, and sample_rate are unchanged.

Return type:

None

segment(*, offset=None, duration=None)[source]

Returns a new Audio containing a time-slice of this one.

Both offset and duration are in seconds and require sample_rate to be set. Out-of-range values are clamped: a negative offset is treated as zero, and a duration that runs past the end is truncated. With both arguments None this is a no-op that returns self.

Parameters:
  • offset (Optional[float]) – Start time in seconds. Defaults to the beginning.

  • duration (Optional[float]) – Length in seconds. Defaults to the rest of the audio.

Return type:

Audio

Returns:

An Audio carrying the same sample_rate as self. When both arguments are None, this is self itself rather than a copy.
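
The clamping rules can be sketched as index arithmetic. This is a pure-Python illustration, not the library's code; segment_bounds is a hypothetical helper, and both the rounding of seconds to sample indices and the choice to measure duration from the clamped start are assumptions.

```python
def segment_bounds(num_samples, sample_rate, offset=None, duration=None):
    """Hypothetical helper: return (start, stop) sample indices for a time slice."""
    # Assumption: seconds are converted to indices by rounding to the nearest sample.
    start = 0 if offset is None else int(round(offset * sample_rate))
    start = min(max(start, 0), num_samples)        # negative offset -> 0
    if duration is None:
        stop = num_samples                         # rest of the audio
    else:
        stop = start + int(round(duration * sample_rate))
        stop = min(max(stop, start), num_samples)  # truncate past the end
    return start, stop

# A 5-second clip at 44.1 kHz, sliced as in segment(offset=1.0, duration=3.0).
print(segment_bounds(5 * 44100, 44100, offset=1.0, duration=3.0))
```

Slicing a (num_samples, num_channels) array with samples[start:stop] then gives the time slice with all channels intact.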

normalize(*, peak_dbfs=0.0, in_place=True)[source]

Scales the audio so its peak amplitude matches peak_dbfs.

peak_dbfs is measured in decibels relative to digital full scale (dBFS). 0.0 means full-scale (peak = 1.0); -6.0 means roughly half full-scale (peak ≈ 0.501); positive values exceed full scale and will clip on playback. Silent audio (all zeros) is returned unchanged.

Parameters:
  • peak_dbfs (float) – Target peak level in dBFS. Defaults to 0.0.

  • in_place (bool) – If True (default), modifies and returns self. If False, returns a new Audio and leaves the original untouched.

Return type:

Audio
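
The scaling described above amounts to multiplying by target / peak, where target = 10 ** (peak_dbfs / 20). The following numpy-only sketch shows that arithmetic; normalize_peak is a hypothetical helper that always returns a new array, so it does not model the in_place option.

```python
import numpy as np

def normalize_peak(samples, peak_dbfs=0.0):
    """Hypothetical helper: scale so the peak absolute value hits peak_dbfs."""
    peak = float(np.max(np.abs(samples))) if samples.size else 0.0
    if peak == 0.0:
        return samples.copy()              # silence has no peak to scale
    target = 10.0 ** (peak_dbfs / 20.0)    # dBFS -> linear amplitude
    return (samples * (target / peak)).astype(np.float32)

x = np.array([[0.1], [-0.2], [0.05]], dtype=np.float32)
y = normalize_peak(x, peak_dbfs=-6.0)
print(float(np.max(np.abs(y))))            # close to 0.501
```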

clip(*, peak_amplitude=1.0, in_place=True)[source]

Symmetrically clamps every sample to [-peak_amplitude, +peak_amplitude].

This is a hard clip — samples beyond the threshold are truncated, not scaled. To rescale instead, use normalize().

Parameters:
  • peak_amplitude (float) – Symmetric clip threshold in linear amplitude. Defaults to 1.0 (digital full scale).

  • in_place (bool) – If True (default), modifies and returns self. If False, returns a new Audio and leaves the original untouched.

Return type:

Audio
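
The difference between hard clipping and rescaling is easy to see on a few samples. This numpy-only sketch does not call pyquist; it uses np.clip for the clamp and a plain division by the peak for the rescale.

```python
import numpy as np

x = np.array([[-1.5], [0.25], [2.0]], dtype=np.float32)

# Hard clip: out-of-range samples are truncated, in-range samples are untouched.
clipped = np.clip(x, -1.0, 1.0)

# Rescale: every sample is multiplied by the same gain, so relative levels survive.
scaled = x / np.max(np.abs(x))

print(clipped.ravel())
print(scaled.ravel())
```

Clipping leaves the 0.25 sample alone but flattens both peaks to the threshold, while rescaling shrinks all three samples by the same factor.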

as_mono()[source]

Returns a mono (1-channel) version of the audio.

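Multi-channel audio is mixed down by averaging across channels (mean, not sum). The averaged signal can never exceed the loudest input channel's peak, so a mixdown of in-range audio cannot clip, and its level stays comparable to the original. If the audio is already mono, returns self (no copy).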

Return type:

Audio
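
A numpy-only sketch of the mixdown, comparing the mean with a plain sum. It operates on a raw (num_samples, num_channels) array and does not call pyquist.

```python
import numpy as np

stereo = np.array(
    [[1.0, 0.5],
     [-1.0, -0.5],
     [0.8, -0.8]],
    dtype=np.float32,
)

# keepdims=True keeps the result 2-D, shaped (num_samples, 1).
mono_mean = stereo.mean(axis=1, keepdims=True)
mono_sum = stereo.sum(axis=1, keepdims=True)

print(mono_mean.ravel())  # stays within [-1, 1]
print(mono_sum.ravel())   # first two samples exceed full scale
```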

as_stereo()[source]

Returns a stereo (2-channel) version of the audio.

Mono audio is duplicated across both channels (the same signal in L and R). Stereo audio is returned as self (no copy). Audio with 3 or more channels raises ValueError — this method does not try to guess a downmix.

Return type:

Audio
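
The three cases can be sketched on a raw array as follows. to_stereo is a hypothetical helper, not the library's code; it raises ValueError for any channel count other than 1 or 2, which matches the description for 3 or more channels and is an assumption for a 0-channel array.

```python
import numpy as np

def to_stereo(samples):
    """Hypothetical helper: (num_samples, C) -> (num_samples, 2)."""
    channels = samples.shape[1]
    if channels == 2:
        return samples                         # already stereo, no copy
    if channels == 1:
        return np.repeat(samples, 2, axis=1)   # same signal in L and R
    raise ValueError(f"cannot convert {channels} channels to stereo")

mono = np.array([[0.1], [0.2], [0.3]], dtype=np.float32)
wide = to_stereo(mono)
print(wide.shape)
```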

resample(new_sample_rate, **kwargs)[source]

Returns a new Audio resampled to new_sample_rate.

Resampling is performed by soxr using a bandlimited sinc filter; extra keyword arguments (e.g. quality='VHQ') are forwarded to soxr.resample(). The number of channels is preserved; the number of samples scales by new_sample_rate / self.sample_rate.

Raises ValueError if self.sample_rate is None or new_sample_rate is non-positive.

Parameters:

new_sample_rate (int)

Return type:

Audio
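
The expected output length follows directly from the ratio of the two rates. The sketch below is pure Python and does not call soxr; expected_num_samples is a hypothetical helper, and rounding to the nearest integer is an assumption, so the actual resampler output may differ by a sample.

```python
def expected_num_samples(num_samples, old_rate, new_rate):
    """Hypothetical helper: approximate length after resampling."""
    if old_rate is None:
        raise ValueError("source sample rate is not set")
    if new_rate <= 0:
        raise ValueError("new sample rate must be positive")
    # The sample count scales by new_rate / old_rate; duration is unchanged.
    return round(num_samples * new_rate / old_rate)

# Three seconds at 44.1 kHz resampled to 8 kHz is still three seconds.
print(expected_num_samples(3 * 44100, 44100, 8000))
```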

write(file, **kwargs)[source]

Writes the audio to a file via soundfile.

The output format is inferred from the file extension (.wav, .flac, .ogg, …). Extra keyword arguments are forwarded to soundfile.write() (e.g. subtype='PCM_24'). Samples outside [-1.0, 1.0] will clip in fixed-point formats; consider calling clip() or normalize() first.

Raises ValueError if self.sample_rate is None.

Parameters:

file (Union[str, IO])

Return type:

None
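
To see why out-of-range samples clip in fixed-point formats, consider one common float-to-16-bit conversion. This numpy-only sketch is an illustration under assumptions: to_pcm16 is a hypothetical helper, and the scale factor of 32767 is one convention among several, not necessarily what soundfile or libsndfile does internally.

```python
import numpy as np

def to_pcm16(samples):
    """Hypothetical helper: float samples -> 16-bit signed integers."""
    # A 16-bit integer cannot represent anything beyond full scale,
    # so values outside [-1.0, 1.0] are clamped before scaling.
    clipped = np.clip(samples, -1.0, 1.0)
    return np.round(clipped * 32767.0).astype(np.int16)

x = np.array([[1.7], [-2.0], [0.25]], dtype=np.float32)
print(to_pcm16(x).ravel())
```

The first two samples land on the extremes of the integer range, which is the distortion that calling clip() or normalize() beforehand lets you control.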