coolpuppy Python API

While coolpup.py was designed with the CLI in mind, its classes and functions can also be used directly in Python code to perform pileups.

coolpuppy.coolpup module

class coolpuppy.coolpup.CoordCreator(features, resolution, *, features_format='auto', flank=100000, rescale_flank=None, chroms='all', minshift=100000, maxshift=1000000, nshifts=10, mindist='auto', maxdist=None, local=False, subset=0, trans=False, seed=None)

Bases: object

__init__(features, resolution, *, features_format='auto', flank=100000, rescale_flank=None, chroms='all', minshift=100000, maxshift=1000000, nshifts=10, mindist='auto', maxdist=None, local=False, subset=0, trans=False, seed=None)

Generator of coordinate pairs for pileups.

Parameters:
  • features (DataFrame) – A bed- or bedpe-style file with coordinates.

  • resolution (int) – Data resolution (bin size), in bp.

  • features_format (str, optional) –

    Format of the features. Options:

    bed: chrom, start, end

    bedpe: chrom1, start1, end1, chrom2, start2, end2

    auto (default): determined from the columns in the DataFrame

  • flank (int, optional) – Padding around the central bin, in bp. For example, with 5000 bp resolution and 100000 flank, final pileup is 205000×205000 bp. The default is 100000.

  • rescale_flank (float, optional) – Fraction of ROI size added on each end when extracting snippets, if rescale. The default is None. If specified, overrides flank.

  • chroms (str or list, optional) – Which chromosomes to use for pileups. Has to be in a list even for a single chromosome, e.g. ['chr1']. The default is “all”.

  • minshift (int, optional) – Minimal shift applied when generating random controls, in bp. The default is 10 ** 5.

  • maxshift (int, optional) – Maximal shift applied when generating random controls, in bp. The default is 10 ** 6.

  • nshifts (int, optional) – How many shifts to generate per region of interest. Does not take chromosome boundaries into account. The default is 10.

  • mindist (int or str, optional) – Shortest interactions to consider. Uses midpoints of regions of interest. “auto” selects it to avoid the two shortest diagonals of the matrix, i.e. 2 * flank + 2 * resolution. The default is “auto”.

  • maxdist (int, optional) – Longest interactions to consider. The default is None.

  • local (bool, optional) – Whether to generate local coordinates, i.e. on-diagonal. The default is False.

  • subset (int, optional) – What subset of the coordinate files to use. 0 or negative to use all. The default is 0.

  • seed (int, optional) – Seed for np.random to make it reproducible. The default is None.

  • trans (bool, optional) – Whether to generate inter-chromosomal (trans) pileups. The default is False.

Return type:

Object that generates coordinates for pileups required for PileUpper.
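The flank and mindist arithmetic quoted in the parameter descriptions above can be checked with a few lines of plain Python. This is only a worked illustration of those formulas; the helper names are hypothetical and are not part of coolpuppy.

```python
def snippet_size_bp(flank, resolution):
    """Side of the pileup in bp: flank on each side plus the central bin."""
    return 2 * flank + resolution


def snippet_size_bins(flank, resolution):
    """Side of the pileup in bins, assuming flank is a multiple of resolution."""
    return snippet_size_bp(flank, resolution) // resolution


def auto_mindist(flank, resolution):
    """The "auto" rule quoted above: 2 * flank + 2 * resolution."""
    return 2 * flank + 2 * resolution


print(snippet_size_bp(100_000, 5_000))    # 205000
print(snippet_size_bins(100_000, 5_000))  # 41
print(auto_mindist(100_000, 5_000))       # 210000
```

With the default flank of 100000 and 5000 bp bins, each snippet is therefore 41×41 bins, and "auto" excludes pairs of features whose midpoints are closer than 210000 bp.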

bedpe2bed(df, ends=True, how='center')
empty_stream(*args, **kwargs)
filter_func_all(intervals)
filter_func_chrom(chrom)
filter_func_region(region)
filter_func_trans_pairs(region1, region2)
get_combinations(filter_func1, filter_func2=None, intervals=None, control=False, groupby=[], modify_2Dintervals_func=None)
get_intervals_stream(filter_func1, filter_func2=None, intervals=None, control=False, groupby=[], modify_2Dintervals_func=None)
process()
class coolpuppy.coolpup.PileUpper(clr, CC, *, view_df=None, clr_weight_name='weight', expected=False, expected_value_col='balanced.avg', ooe=True, control=False, coverage_norm=False, rescale=False, rescale_size=99, flip_negative_strand=False, ignore_diags=2, store_stripes=False, nproc=1)

Bases: object

__init__(clr, CC, *, view_df=None, clr_weight_name='weight', expected=False, expected_value_col='balanced.avg', ooe=True, control=False, coverage_norm=False, rescale=False, rescale_size=99, flip_negative_strand=False, ignore_diags=2, store_stripes=False, nproc=1)

Creates pileups

Parameters:
  • clr (cool) – Cool file with Hi-C data.

  • CC (CoordCreator) – CoordCreator object with correct settings.

  • clr_weight_name (bool or str, optional) – Whether to use balanced data, and which column to use as weights. The default is “weight”. Provide False to use raw data.

  • expected (DataFrame, optional) – If using expected, pandas DataFrame with by-distance expected. The default is False.

  • expected_value_col (str, optional) – Which column in the expected_df contains values to use for normalization. The default is “balanced.avg”.

  • view_df (DataFrame) – A dataframe with region coordinates used in expected (see bioframe documentation for details). Can be omitted if no expected is provided, or expected is for whole chromosomes.

  • ooe (bool, optional) – Whether to normalize each snip by expected value. If False, all snips are accumulated, all expected values are accumulated, and then the former divided by the latter - like with randomly shifted controls. Only has effect when expected is provided. The default is True.

  • control (bool, optional) – Whether to use randomly shifted controls. The default is False.

  • coverage_norm (bool or str, optional) – Whether to normalize the final pileup by accumulated coverage as an alternative to balancing. Useful for single-cell Hi-C data. Can be either boolean, or string: “cis” or “total” to use “cov_cis_raw” or “cov_tot_raw” columns in the cooler bin table, respectively. If True, will attempt to use “cov_tot_raw” if available, otherwise will compute and store coverage in the cooler with default column names, and use “cov_tot_raw”. Alternatively, if a different string is provided, will attempt to use a column with that name in the cooler bin table, and will raise a ValueError if it does not exist. Only allowed when clr_weight_name is False. The default is False.

  • rescale (bool, optional) – Whether to rescale the pileups. The default is False.

  • rescale_size (int, optional) – Final shape of rescaled pileups. E.g. if 99, pileups will be squares of 99×99 pixels. The default is 99.

  • flip_negative_strand (bool, optional) – Flip snippets so the positive strand always points to bottom-right. Requires strands to be annotated for each feature (or two strands for bedpe format features). The default is False.

  • ignore_diags (int, optional) – How many diagonals to ignore to avoid short-distance artefacts. The default is 2.

  • store_stripes (bool, optional) – Whether to store horizontal and vertical stripes and coordinates in the output. The default is False.

  • nproc (int, optional) – Number of processes to use. The default is 1.

Return type:

Object that generates pileups.
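The difference between the two settings of ooe described above comes down to a mean of ratios versus a ratio of sums. The sketch below uses single numbers as stand-ins for 2D snippets and assumes that per-snip normalized values are simply averaged; it illustrates the arithmetic only and is not the library's implementation.

```python
# Scalar stand-ins for 2D snippets: one observed and one expected value each.
observed = [2.0, 8.0]
expected = [1.0, 2.0]

# ooe=True (as described above): normalize each snip, then average.
per_snip = sum(o / e for o, e in zip(observed, expected)) / len(observed)

# ooe=False: accumulate observed and expected separately, then divide the totals.
accumulated = sum(observed) / sum(expected)

print(per_snip)     # 3.0
print(accumulated)  # 3.3333333333333335
```

The two results differ whenever the expected values vary between snippets, which is why the choice matters for features at very different genomic separations.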

accumulate_stream(snip_stream, postprocess_func=None, extra_funcs=None)
Parameters:
  • snip_stream (generator) –

    Generator of pd.Series, each one containing at least:

    a snippet as a 2D array in [‘data’], [‘cov_start’] and [‘cov_end’] as 1D arrays (can be all 0)

    And any other annotations

  • postprocess_func (function, optional) – Any additional postprocessing of each snip needed, in one function. Can be used to modify the data in a non-standard way, or create groups when it can’t be done before snipping, or to assign each snippet to multiple groups. Example: lib.puputils.group_by_region.

  • extra_funcs (dict, optional) – Any additional functions to be applied every time a snip is added to a pileup or two pileups are summed up - see _add_snip and sum_pups.

Returns:

outdict – Dictionary of accumulated snips (each as a Series) for each group. Always includes “all”

Return type:

dict

get_data(region1, region2=None)

Get sparse data for a region

Parameters:
  • region1 (tuple or str) – Region for which to load the data. Either tuple of (chr, start, end), or string with region name.

  • region2 (tuple or str, optional) – Second region; if given, data are loaded for interactions between it and the first region. Either tuple of (chr, start, end), or string with region name. The default is None.

Returns:

data – Sparse csr matrix for the corresponding region.

Return type:

csr

get_expected_trans(region1, region2)
make_outmap()

Generate zero-filled array of the right shape

Returns:

outmap – Array of zeros of correct shape.

Return type:

array

pileup_region(region1, region2=None, groupby=[], modify_2Dintervals_func=None, postprocess_func=None, extra_sum_funcs=None)
Parameters:
  • region1 (str) – Region name.

  • region2 (str, optional) – Region name.

  • groupby (list of str, optional) – Which attributes of each snip to use to assign it to a group.

  • modify_2Dintervals_func (function, optional) – A function to apply to a dataframe of genomic intervals used for pileups. If possible, much preferable to postprocess_func for better speed. A good example is the bin_distance_intervals function.

  • postprocess_func (function, optional) – Additional function to apply to each snippet before grouping. A good example is the lib.puputils.bin_distance function, but using bin_distance_intervals as modify_2Dintervals_func is much faster.

  • extra_sum_funcs (dict, optional) – Any additional functions to be applied every time a snip is added to a pileup or two pileups are summed up - see _add_snip and sum_pups.

Returns:

pileup – accumulated snips as a dict

Return type:

dict

pileupsByDistanceWithControl(nproc=None, distance_edges='default', groupby=[], ignore_group_order=False)

Performs by-distance pileups across all chromosomes and applies the required normalization. Simple wrapper around pileupsWithControl.

Parameters:
  • nproc (int, optional) – How many cores to use. Sends a whole chromosome per process. The default is None, which uses the same number as nproc set at creation of the object.

  • distance_edges (list/array of int) – How to group snips by distance (based on their centres). Default uses separations [0, 50_000, 100_000, 200_000, …]

  • groupby (list of str, optional) – Which attributes of each snip to use to assign it to a group.

Returns:

pileup_df – Normalized pileups in a pandas DataFrame, with columns data and num. data contains the normalized pileups, and num - how many snippets were combined (the regions of interest, not control regions). Each distance band is a row, annotated in column distance_band

Return type:

pd.DataFrame

pileupsByStrandByDistanceWithControl(nproc=None, distance_edges='default', groupby=[], ignore_group_order=False)

Performs by-strand by-distance pileups across all chromosomes and applies the required normalization. Simple wrapper around pileupsWithControl. Assumes the features in CoordCreator have a “strand” column.

Parameters:
  • nproc (int, optional) – How many cores to use. Sends a whole chromosome per process. The default is None, which uses the same number as nproc set at creation of the object.

  • distance_edges (list/array of int) – How to group snips by distance (based on their centres). Default uses separations [0, 50_000, 100_000, 200_000, …]

  • groupby (list of str, optional) – Which attributes of each snip to use to assign it to a group.

Returns:

pileup_df – Normalized pileups in a pandas DataFrame, with columns data and num. data contains the normalized pileups, and num - how many snippets were combined (the regions of interest, not control regions). Each distance band is a row, annotated in columns separation

Return type:

pd.DataFrame

pileupsByStrandWithControl(nproc=None, groupby=[], ignore_group_order=False)

Performs by-strand pileups across all chromosomes and applies the required normalization. Simple wrapper around pileupsWithControl. Assumes the features in CoordCreator have a “strand” column.

Parameters:
  • nproc (int, optional) – How many cores to use. Sends a whole chromosome per process. The default is None, which uses the same number as nproc set at creation of the object.

  • groupby (list of str, optional) – Which attributes of each snip to use to assign it to a group.

Returns:

pileup_df – Normalized pileups in a pandas DataFrame, with columns data and num. data contains the normalized pileups, and num - how many snippets were combined (the regions of interest, not control regions). Each combination of strands is a row.

Return type:

pd.DataFrame

pileupsByWindowWithControl(nproc=None)

Performs by-window (i.e. for each region) pileups across all chromosomes and applies the required normalization. Simple wrapper around pileupsWithControl.

Parameters:

nproc (int, optional) – How many cores to use. Sends a whole chromosome per process. The default is None, which uses the same number as nproc set at creation of the object.

Returns:

pileup_df – Normalized pileups in a pandas DataFrame, with columns data and num. data contains the normalized pileups, and num - how many snippets were combined (the regions of interest, not control regions). Each window is a row (coordinates are recorded in columns [‘chrom’, ‘start’, ‘end’]), plus an additional row is created with all data (with “all” in the “chrom” column and -1 in start and end).

Return type:

pd.DataFrame

pileupsWithControl(nproc=None, groupby=[], ignore_group_order=False, modify_2Dintervals_func=None, postprocess_func=None, extra_sum_funcs=None)

Performs pileups across all chromosomes and applies the required normalization.

Parameters:
  • nproc (int, optional) – How many cores to use. Sends a whole chromosome per process. The default is None, which uses the same number as nproc set at creation of the object.

  • groupby (list of str, optional) – Which attributes of each snip to use to assign it to a group.

  • ignore_group_order (bool or str or list, optional) – When using groupby, reorder so that e.g. group1-group2 and group2-group1 will be combined into one and flipped to the correct orientation. If using multiple paired groupings (e.g. group1-group2 and category1-category2), need to specify which grouping should be prioritised, e.g. “group” or [“group1”, “group2”]. For flip_negative_strand, +- and -+ strands will be combined.

  • modify_2Dintervals_func (function, optional) – Function to apply to the DataFrames of coordinates before fetching snippets based on them. Preferable to using the postprocess_func, since at the earlier stage it can be vectorized and much more efficient.

  • postprocess_func (function, optional) – Additional function to apply to each snippet before grouping. A good example is the lib.puputils.bin_distance function.

  • extra_sum_funcs (dict, optional) – Any additional functions to be applied every time a snip is added to a pileup or two pileups are summed up - see _add_snip and sum_pups.

Returns:

pileup_df – Normalized pileups in a pandas DataFrame, with columns data and num. data contains the normalized pileups, and num - how many snippets were combined (the regions of interest, not control regions). Each condition from groupby is a row, plus an additional row all is created with all data.

Return type:

pd.DataFrame
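As a rough sketch of how the two classes fit together, the function below builds a CoordCreator, passes it to a PileUpper, and runs pileupsWithControl with randomly shifted controls. The function name and the cool_path and bed_path arguments are hypothetical, the BED file is assumed to be tab-separated without a header, and it is assumed that constructing CoordCreator is sufficient, without calling process() explicitly. Treat it as a starting point under those assumptions.

```python
def run_pileup(cool_path, bed_path, flank=100_000, nproc=1):
    """Pile up pairwise combinations of the features in a BED file.

    cool_path and bed_path are hypothetical inputs supplied by the caller.
    """
    # Imported lazily so this function can be defined without the dependencies.
    import cooler
    import pandas as pd
    from coolpuppy import coolpup

    clr = cooler.Cooler(cool_path)

    # Assumption: a headerless, tab-separated BED file; only the first
    # three columns are used.
    features = pd.read_csv(
        bed_path,
        sep="\t",
        header=None,
        usecols=[0, 1, 2],
        names=["chrom", "start", "end"],
    )

    # Coordinates first: resolution is taken from the cooler itself.
    cc = coolpup.CoordCreator(
        features,
        clr.binsize,
        features_format="bed",
        flank=flank,
        nshifts=10,
        seed=0,
    )

    # Then the pileup object, normalizing by randomly shifted controls.
    pu = coolpup.PileUpper(clr, cc, control=True, nproc=nproc)
    return pu.pileupsWithControl()
```

A call such as run_pileup("sample.cool", "features.bed") would return the pileup DataFrame described above; both file names are placeholders.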

coolpuppy.coolpup.assign_groups(intervals, groupby=[])

Assign groups to rows based on a list of columns

Parameters:
  • intervals (pd.DataFrame) – Dataframe containing intervals with any annotations.

  • groupby (list, optional) – List of columns to use to assign a group. The default is [].

Returns:

intervals – Adds a “group” column with the annotation based on groupby. If groupby is empty, assigns “all” to all rows.

Return type:

pd.DataFrame

coolpuppy.coolpup.bin_distance_intervals(intervals, band_edges='default')
Parameters:
  • intervals (pd.DataFrame) – Dataframe containing intervals with any annotations. Has to have a ‘distance’ column

  • band_edges (list or array-like, or "default", optional) – Edges of distance bands used to split the intervals into groups. Default is np.append([0], 50000 * 2 ** np.arange(30))

Returns:

snip – The same dataframe with added [‘distance_band’] annotation.

Return type:

pd.DataFrame

coolpuppy.coolpup.expand(intervals, flank, resolution, rescale_flank=None)
coolpuppy.coolpup.expand2D(intervals, flank, resolution, rescale_flank=None)
coolpuppy.coolpup.flip_mark_intervals_func(intervals, flipby, flip_negative_strand, extra_func=None)
coolpuppy.coolpup.flip_snip_func(snip, groupby, ignore_group_order, extra_func=None)
coolpuppy.coolpup.pileup(clr, features, features_format='bed', view_df=None, expected_df=None, expected_value_col='balanced.avg', clr_weight_name='weight', flank=100000, minshift=100000, maxshift=1000000, nshifts=0, ooe=True, mindist='auto', maxdist=None, min_diag=2, subset=0, by_window=False, by_strand=False, by_distance=False, groupby=[], ignore_group_order=False, flip_negative_strand=False, local=False, coverage_norm=False, trans=False, rescale=False, rescale_flank=1, rescale_size=99, store_stripes=False, nproc=1, seed=None)

Create pileups

Parameters:
  • clr (cool) – Cool file with Hi-C data.

  • features (DataFrame) – A bed- or bedpe-style file with coordinates.

  • features_format (str, optional) –

    Format of the features. Options:

    bed (default): chrom, start, end

    bedpe: chrom1, start1, end1, chrom2, start2, end2

    auto: determined from the columns in the DataFrame

  • view_df (DataFrame) – A dataframe with region coordinates used in expected (see bioframe documentation for details). Can be omitted if no expected is provided, or expected is for whole chromosomes.

  • expected_df (DataFrame, optional) – If using expected, pandas DataFrame with by-distance expected. The default is None.

  • expected_value_col (str, optional) – Which column in the expected_df contains values to use for normalization. The default is “balanced.avg”.

  • clr_weight_name (bool or str, optional) – Whether to use balanced data, and which column to use as weights. The default is “weight”. Provide False to use raw data.

  • flank (int, optional) – Padding around the central bin, in bp. For example, with 5000 bp resolution and 100000 flank, final pileup is 205000×205000 bp. The default is 100000.

  • minshift (int, optional) – Minimal shift applied when generating random controls, in bp. The default is 10 ** 5.

  • maxshift (int, optional) – Maximal shift applied when generating random controls, in bp. The default is 10 ** 6.

  • nshifts (int, optional) – How many shifts to generate per region of interest. Does not take chromosome boundaries into account. The default is 0.

  • ooe (bool, optional) – Whether to normalize each snip by expected value. If False, all snips are accumulated, all expected values are accumulated, and then the former divided by the latter - like with randomly shifted controls. Only has effect when expected is provided. The default is True.

  • mindist (int or str, optional) – Shortest interactions to consider. Uses midpoints of regions of interest. “auto” selects it to avoid the two shortest diagonals of the matrix, i.e. 2 * flank + 2 * resolution. The default is “auto”.

  • maxdist (int, optional) – Longest interactions to consider. The default is None.

  • min_diag (int, optional) – How many diagonals to ignore to avoid short-distance artefacts. The default is 2.

  • subset (int, optional) – What subset of the coordinate files to use. 0 or negative to use all. The default is 0.

  • by_window (bool, optional) – Whether to create a separate pileup for each feature by accumulating all of its interactions with other features. Produces as many pileups as there are features. The default is False.

  • by_strand (bool, optional) – Whether to create a separate pileup for each combination of “strand1”, “strand2” in features. If features_format==’bed’, first creates pairwise combinations of features, and the original features need to have a column “strand”. If features_format==’bedpe’, they need to have “strand1” and “strand2” columns. The default is False.

  • by_distance (bool or list, optional) –

    Whether to create a separate pileup for different distance separations. If features_format==’bed’, internally creates pairwise combinations of features. If True, splits all separations using edges defined like this:

    band_edges = np.append([0], 50000 * 2 ** np.arange(30))

    Alternatively, a list of integer values can be given with custom distance edges. The default is False.

  • groupby (list of str, optional) – Additional columns of features to use for groupby. If features_format==’bed’, each column should be specified twice with suffixes “1” and “2”, i.e. if features have a column “group”, specify [“group1”, “group2”]. The default is [].

  • ignore_group_order (bool or str or list, optional) – When using groupby, reorder so that e.g. group1-group2 and group2-group1 will be combined into one and flipped to the correct orientation. If using multiple paired groupings (e.g. group1-group2 and category1-category2), need to specify which grouping should be prioritised, e.g. “group” or [“group1”, “group2”]. For flip_negative_strand, +- and -+ strands will be combined.

  • flip_negative_strand (bool, optional) – Flip snippets so the positive strand always points to bottom-right. Requires strands to be annotated for each feature (or two strands for bedpe format features). The default is False.

  • local (bool, optional) – Whether to generate local coordinates, i.e. on-diagonal. The default is False.

  • coverage_norm (bool or str, optional) – Whether to normalize the final pileup by accumulated coverage as an alternative to balancing. Useful for single-cell Hi-C data. Can be either boolean, or string: “cis” or “total” to use “cov_cis_raw” or “cov_tot_raw” columns in the cooler bin table, respectively. If True, will attempt to use “cov_tot_raw” if available, otherwise will compute and store coverage in the cooler with default column names, and use “cov_tot_raw”. Alternatively, if a different string is provided, will attempt to use a column with that name in the cooler bin table, and will raise a ValueError if it does not exist. Only allowed when clr_weight_name is False. The default is False.

  • trans (bool, optional) – Whether to generate inter-chromosomal (trans) pileups. The default is False.

  • rescale (bool, optional) – Whether to rescale the pileups. The default is False.

  • rescale_flank (float, optional) – Fraction of ROI size added on each end when extracting snippets, if rescale is True, in which case it overrides flank. The default is 1.

  • rescale_size (int, optional) – Final shape of rescaled pileups. E.g. if 99, pileups will be squares of 99×99 pixels. The default is 99.

  • store_stripes (bool, optional) – Whether to store horizontal and vertical stripes and coordinates in the output. The default is False.

  • nproc (int, optional) – Number of processes to use. The default is 1.

  • seed (int, optional) – Seed for np.random to make it reproducible. The default is None.

Returns:

pileup_df – pandas DataFrame containing the pileups and their grouping information, if any, along with all possible annotations from the arguments of this function.
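For the common case, the pileup function wraps the whole workflow in one call. The sketch below runs it with shifted controls and stores the result using save_pileup_df from coolpuppy.lib.io. The function name and all three path arguments are hypothetical, the BED file is assumed to be headerless and tab-separated, and it is assumed that the returned DataFrame can be passed to save_pileup_df unchanged.

```python
def pileup_and_save(cool_path, bed_path, out_path):
    """Run coolpup.pileup on a BED file and store the result.

    All three paths are hypothetical inputs supplied by the caller.
    """
    # Imported lazily so this function can be defined without the dependencies.
    import cooler
    import pandas as pd
    from coolpuppy import coolpup
    from coolpuppy.lib.io import save_pileup_df

    clr = cooler.Cooler(cool_path)
    features = pd.read_csv(
        bed_path,
        sep="\t",
        header=None,
        usecols=[0, 1, 2],
        names=["chrom", "start", "end"],
    )

    # nshifts is set explicitly so that shifted controls are generated.
    pups = coolpup.pileup(
        clr,
        features,
        features_format="bed",
        flank=100_000,
        nshifts=10,
        seed=0,
        nproc=1,
    )

    # Assumption: the DataFrame returned by pileup can be saved as is.
    save_pileup_df(out_path, pups)
    return pups
```

Setting seed makes the randomly shifted controls reproducible between runs.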

coolpuppy.lib.io module

coolpuppy.lib.io.is_gz_file(filepath)
coolpuppy.lib.io.load_array_with_header(filename)

Load array from files generated using save_array_with_header. They are simple txt files with an optional header in the first lines, commented using “# ”. If uncommented, the header is in YAML.

Parameters:

filename (string) – File to load from.

Returns:

data – Dictionary with information from the header. Access the associated data in an array using data[‘data’].

Return type:

dict
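To make the file layout described above concrete, the sketch below separates “# ”-commented header lines from the numeric body of such a text file using only the standard library. It is not the library's loader: the header keys in the example are invented, whitespace-separated numbers are an assumption, and a real YAML parser would be needed to turn the header lines into a dictionary in general.

```python
def split_header(text, comment="# "):
    """Separate leading comment lines from the numeric body of a text file."""
    header_lines, body_lines = [], []
    in_header = True
    for line in text.splitlines():
        if in_header and line.startswith(comment):
            # Strip the comment prefix to recover the raw header line.
            header_lines.append(line[len(comment):])
        else:
            in_header = False
            if line.strip():
                body_lines.append(line)
    return header_lines, body_lines


# Invented example content: two header lines followed by a 2x2 array.
example = "# flank: 100000\n# resolution: 5000\n1.0 2.0\n3.0 4.0\n"
header, body = split_header(example)
array = [[float(x) for x in row.split()] for row in body]

print(header)  # ['flank: 100000', 'resolution: 5000']
print(array)   # [[1.0, 2.0], [3.0, 4.0]]
```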

coolpuppy.lib.io.load_pileup_df(filename, quaich=False, skipstripes=False)

Loads a dataframe saved using save_pileup_df

Parameters:
  • filename (str) – File to load from.

  • quaich (bool, optional) – Whether to assume standard quaich file naming to extract sample name and bedname. The default is False.

Returns:

annotation – Pileups are in the “data” column, all metadata in other columns

Return type:

pd.DataFrame

coolpuppy.lib.io.load_pileup_df_list(files, quaich=False, nice_metadata=True, skipstripes=False)
Parameters:
  • files (iterable) – Files to read pileups from.

  • quaich (bool, optional) – Whether to assume standard quaich file naming to extract sample name and bedname. The default is False.

  • nice_metadata (bool, optional) – Whether to add nicer metadata for direct plotting. The default is True. Adds a “norm” column (“expected”, “shifts” or “none”).

Returns:

pups – Combined dataframe with all pileups and annotations from all files.

Return type:

pd.DataFrame

coolpuppy.lib.io.save_array_with_header(array, header, filename)

Save a numpy array with a YAML header generated from a dictionary

Parameters:
  • array (np.array) – Array to save.

  • header (dict) – Dictionary to save into the header.

  • filename (string) – Name of file to save array and metadata into.

coolpuppy.lib.io.save_pileup_df(filename, df, metadata=None, mode='w', compression='lzf')

Saves a dataframe with metadata into a binary HDF5 file.

Parameters:
  • filename (str) – File to save to.

  • df (pd.DataFrame) – DataFrame to save into binary hdf5 file.

  • metadata (dict, optional) – Dictionary with metadata.

  • mode (str, optional) – Mode for the first time access to the output file: ‘w’ to overwrite if file exists, or ‘a’ to fail if output file already exists

  • compression (str, optional) – Compression to use for saving, e.g. ‘gzip’. Defaults to ‘lzf’

Return type:

None.

Notes

Replaces None in metadata values with False, since HDF5 doesn’t support None.

coolpuppy.lib.io.sniff_for_header(file, sep='\t', comment='#')

Warning: reads the entire file into a StringIO buffer!

coolpuppy.lib.numutils module

coolpuppy.lib.numutils.corner_cv(amap, i=4)

Get coefficient of variation for upper left and lower right corners of a pileup to estimate how noisy it is

Parameters:
  • amap (2D array) – Pileup.

  • i (int, optional) – How many bins to use from each upper left and lower right corner: final corner shape is i^2. The default is 4.

Returns:

CV – Coefficient of variation for the corner pixels.

Return type:

float
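A minimal pure-Python sketch of the corner statistic described above, using nested lists in place of a NumPy array. It assumes the population standard deviation and does no NaN handling; the real function may differ on both points, and the name corner_cv_sketch is hypothetical.

```python
from statistics import mean, pstdev


def corner_cv_sketch(amap, i=4):
    """CV (std / mean) of the i x i upper-left and lower-right corners."""
    upper_left = [v for row in amap[:i] for v in row[:i]]
    lower_right = [v for row in amap[-i:] for v in row[-i:]]
    corners = upper_left + lower_right
    return pstdev(corners) / mean(corners)


amap = [
    [1.0, 0.0, 0.0],
    [0.0, 9.0, 0.0],
    [0.0, 0.0, 3.0],
]
print(corner_cv_sketch(amap, i=1))  # 0.5
```

Here the two corner pixels are 1.0 and 3.0, giving a mean of 2 and a population standard deviation of 1.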

coolpuppy.lib.numutils.get_domain_score(amap, flank=1)

Divide sum of values in a square from the central part of a matrix by the upper and right rectangles corresponding to interactions of the central region with its surroundings.

Parameters:
  • amap (2D array) – Pileup.

  • flank (int) – Relative padding used, i.e. if 1 the central third is used, if 2 the central fifth is used. The default is 1.

Returns:

score – Domain score.

Return type:

float

coolpuppy.lib.numutils.get_enrichment(amap, n)

Get values from the center of a pileup for a square with side n

Parameters:
  • amap (2D array) – Pileup.

  • n (int) – Side of the central square to use.

Returns:

enrichment – Mean of the pixels in the central square.

Return type:

float
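The central-square mean can be sketched in plain Python as below. The sketch assumes a square map with an odd number of bins and an odd n, and ignores NaN handling; central_mean is a hypothetical name, not the library function.

```python
def central_mean(amap, n):
    """Mean of the n x n square centred on the middle pixel of a square map."""
    c = len(amap) // 2
    lo, hi = c - n // 2, c + n // 2 + 1
    values = [v for row in amap[lo:hi] for v in row[lo:hi]]
    return sum(values) / len(values)


# 5x5 identity matrix as a toy pileup.
amap = [[float(r == c) for c in range(5)] for r in range(5)]
print(central_mean(amap, 1))  # 1.0
print(central_mean(amap, 3))  # 0.3333333333333333
```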

coolpuppy.lib.numutils.get_insulation_strength(amap, ignore_central=0, ignore_diags=2)

Divide values in upper left and lower right corners over upper right and lower left, ignoring the central bins.

Parameters:
  • amap (2D array) – Pileup.

  • ignore_central (int, optional) – How many central bins to ignore. Has to be odd or 0. The default is 0.

Returns:

Insulation strength.

Return type:

float

coolpuppy.lib.numutils.get_local_enrichment(amap, flank=1)

Get values for a square from the central part of a pileup, ignoring padding

Parameters:
  • amap (2D array) – Pileup.

  • flank (int) – Relative padding used, i.e. if 1 the central third is used, if 2 the central fifth is used. The default is 1.

Returns:

enrichment – Mean of the pixels in the central square.

Return type:

float
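The relative padding can be illustrated by splitting the map side into 2 * flank + 1 equal parts and keeping the middle one, which gives the central third for flank=1 and the central fifth for flank=2. The sketch follows that description literally; it is an assumption that the library slices the map the same way, and central_block is a hypothetical helper.

```python
def central_block(amap, flank=1):
    """Return the middle part of a map split into (2 * flank + 1) equal parts."""
    parts = 2 * flank + 1
    if len(amap) % parts:
        raise ValueError("map side must be divisible by 2 * flank + 1")
    c = len(amap) // parts
    lo, hi = flank * c, (flank + 1) * c
    return [row[lo:hi] for row in amap[lo:hi]]


# 6x6 toy map whose entries encode their own row and column.
amap = [[10 * r + c for c in range(6)] for r in range(6)]
block = central_block(amap, flank=1)
values = [v for row in block for v in row]

print(block)                      # [[22, 23], [32, 33]]
print(sum(values) / len(values))  # 27.5
```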

coolpuppy.lib.numutils.norm_cis(amap, i=3)

Normalize the pileup by mean of pixels from upper left and lower right corners

Parameters:
  • amap (2D array) – Pileup.

  • i (int, optional) – How many bins to use from each upper left and lower right corner: final corner shape is i^2. 0 will not normalize. The default is 3.

Returns:

amap – Normalized pileup.

Return type:

2D array

coolpuppy.lib.puputils module

coolpuppy.lib.puputils.accumulate_values(dict1, dict2, key)

Useful as an extra_sum_func

coolpuppy.lib.puputils.bin_distance(snip, band_edges='default')
Parameters:
  • snip (pd.Series) – Series containing any annotations. Has to have [‘distance’]

  • band_edges (list or array-like, or "default", optional) – Edges of distance bands used to assign the distance band. Default is np.append([0], 50000 * 2 ** np.arange(30))

Returns:

snip – The same snip with added [‘distance_band’] annotation.

Return type:

pd.Series
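The default band edges and the assignment of a distance to a band can be sketched with the standard library alone. Two points are assumptions: a distance equal to an edge is placed in the upper band, and the band is reported as a (lower, upper) tuple. The format of the library's distance_band annotation is not specified in this reference.

```python
from bisect import bisect_right

# Pure-Python equivalent of np.append([0], 50000 * 2 ** np.arange(30)).
DEFAULT_EDGES = [0] + [50_000 * 2**k for k in range(30)]


def distance_band(distance, band_edges=DEFAULT_EDGES):
    """Return the (lower, upper) edges of the band containing distance."""
    i = bisect_right(band_edges, distance)
    if i == 0 or i == len(band_edges):
        raise ValueError("distance falls outside the band edges")
    return band_edges[i - 1], band_edges[i]


print(DEFAULT_EDGES[:4])       # [0, 50000, 100000, 200000]
print(distance_band(120_000))  # (100000, 200000)
print(distance_band(50_000))   # (50000, 100000)
```

Custom edges can be passed as band_edges, for example distance_band(120_000, [0, 100_000, 1_000_000]).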

coolpuppy.lib.puputils.divide_pups(pup1, pup2)

Divide two pups and get the resulting pup. Requires that the pups have identical shapes, resolutions, flanks, etc. If pups contain stripes, these will only be divided if stripes have identical coordinates.

coolpuppy.lib.puputils.get_score(pup, center=3, ignore_central=3)

Calculate a reasonable score for any kind of pileup. For non-local (off-diagonal) pileups, calculates the average signal in the central pixels (based on ‘center’). For local non-rescaled pileups, calculates insulation strength, ignoring the central bins (based on ‘ignore_central’). For local rescaled pileups, calculates enrichment in the central rescaled area relative to the two neighbouring areas on the sides.

Parameters:
  • pup (pd.Series or dict) – Series or dict with pileup in ‘data’ and annotations in other keys. Will correctly calculate enrichment score with annotations in ‘local’ (bool), ‘rescale’ (bool) and ‘rescale_flank’ (float)

  • center (int, optional) – Passed to ‘get_enrichment’ to calculate the average strength of central pixels. The default is 3.

  • ignore_central (int, optional) – How many central bins to ignore for calculation of insulation in local pileups. The default is 3.

Returns:

Score.

Return type:

float

coolpuppy.lib.puputils.group_by_region(snip)
coolpuppy.lib.puputils.norm_coverage(snip)

Normalize a pileup by coverage arrays

Parameters:
  • loop (2D array) – Pileup.

  • cov_start (1D array) – Accumulated coverage of the left side of the pileup.

  • cov_end (1D array) – Accumulated coverage of the bottom side of the pileup.

Returns:

loop – Normalized pileup.

Return type:

2D array

coolpuppy.lib.puputils.sum_pups(pup1, pup2, extra_funcs={})

Preserves data, stripes, cov_start, cov_end, n, num and coordinates. Assumes n=1 if not present, and calculates num if not present. If store_stripes is set to False, stripes and coordinates will be empty.

extra_funcs allows passing arbitrary functions to accumulate extra information from the two pups.

coolpuppy.plotpup module

coolpuppy.plotpup.add_heatmap(data, flank, rescale, rescale_flank, n, max_coordinates, height=1, aspect='auto', color=None, cmap='coolwarm', norm=LogNorm(), plot_ticks=False, stripe=False, font_scale=1)

Adds the array contained in data.values[0] to the current axes as a heatmap of stripes

coolpuppy.plotpup.add_score(score, height=1, color=None, font_scale=1)

Adds the value contained in score.values[0] to the current axes as a label in top left corner

coolpuppy.plotpup.add_stripe_lineplot(data, resolution, flank, rescale, rescale_flank, height=1, aspect='auto', color=None, cmap='coolwarm', scale='log', norm=LogNorm(), plot_ticks=False, stripe=False, font_scale=1, colnames=None)

Adds the array contained in data.values[0] to the current axes as a heatmap of stripes and an average lineplot on top. Only works with one condition at a time.

coolpuppy.plotpup.auto_rows_cols(n)

Automatically determines number of rows and cols for n pileups

Parameters:

n (int) – Number of pileups.

Returns:

  • rows (int) – How many rows to use.

  • cols (int) – How many columns to use.

coolpuppy.plotpup.get_min_max(pups, vmin=None, vmax=None, sym=True, scale='log')

Automatically determine minimal and maximal colour intensity for pileups

Parameters:
  • pups (np.array) – Numpy array of numpy arrays containing pileups.

  • vmin (float, optional) – Force certain minimal colour. The default is None.

  • vmax (float, optional) – Force certain maximal colour. The default is None.

  • sym (bool, optional) – Whether the output should be symmetrical around 0. The default is True.

Returns:

  • vmin (float) – Selected minimal colour.

  • vmax (float) – Selected maximal colour.

coolpuppy.plotpup.plot(pupsdf, cols=None, rows=None, score='score', center=3, ignore_central=3, col_order=None, row_order=None, vmin=None, vmax=None, sym=True, norm_corners=0, cmap='coolwarm', cmap_emptypixel=(0.98, 0.98, 0.98), scale='log', height=1, aspect=1, font='DejaVu Sans', font_scale=1, plot_ticks=False, colnames=None, rownames=None, **kwargs)
coolpuppy.plotpup.plot_stripes(pupsdf, cols=None, rows=None, col_order=None, row_order=None, vmin=None, vmax=None, sym=True, cmap='coolwarm', cmap_emptypixel=(0.98, 0.98, 0.98), scale='log', height=1, aspect='auto', stripe='corner_stripe', stripe_sort='sum', out_sorted_bedpe=None, font='DejaVu Sans', font_scale=1, plot_ticks=False, colnames=None, rownames=None, lineplot=False, **kwargs)
coolpuppy.plotpup.sort_separation(sep_string_series, sep='Mb')