coolpuppy Python API¶
While coolpup.py was designed with the CLI in mind, its classes and functions can also be used directly in Python code to perform pileups.
coolpuppy.coolpup module¶
- class coolpuppy.coolpup.CoordCreator(features, resolution, *, features_format='auto', flank=100000, rescale_flank=None, chroms='all', minshift=100000, maxshift=1000000, nshifts=10, mindist='auto', maxdist=None, local=False, subset=0, trans=False, seed=None)¶
Bases:
object
- __init__(features, resolution, *, features_format='auto', flank=100000, rescale_flank=None, chroms='all', minshift=100000, maxshift=1000000, nshifts=10, mindist='auto', maxdist=None, local=False, subset=0, trans=False, seed=None)¶
Generator of coordinate pairs for pileups.
- Parameters
features (DataFrame) – A bed- or bedpe-style file with coordinates.
resolution (int) – Data resolution.
features_format (str, optional) –
- Format of the features. Options:
bed: chrom, start, end
bedpe: chrom1, start1, end1, chrom2, start2, end2
auto (default): determined from the columns in the DataFrame
flank (int, optional) – Padding around the central bin, in bp. For example, with 5000 bp resolution and 100000 flank, final pileup is 205000×205000 bp. The default is 100000.
rescale_flank (float, optional) – Fraction of ROI size added on each end when extracting snippets; used when rescaling. The default is None. If specified, overrides flank.
chroms (str or list, optional) – Which chromosomes to use for pileups. Has to be in a list even for a single chromosome, e.g. [‘chr1’]. The default is “all”.
minshift (int, optional) – Minimal shift applied when generating random controls, in bp. The default is 10 ** 5.
maxshift (int, optional) – Maximal shift applied when generating random controls, in bp. The default is 10 ** 6.
nshifts (int, optional) – How many shifts to generate per region of interest. Does not take chromosome boundaries into account. The default is 10.
mindist (int or “auto”, optional) – Shortest interactions to consider. Uses midpoints of regions of interest. “auto” selects it to avoid the two shortest diagonals of the matrix, i.e. 2 * flank + 2 * resolution. The default is “auto”.
maxdist (int, optional) – Longest interactions to consider. The default is None.
local (bool, optional) – Whether to generate local coordinates, i.e. on-diagonal. The default is False.
subset (int, optional) – What subset of the coordinate files to use. 0 or negative to use all. The default is 0.
seed (int, optional) – Seed for np.random to make it reproducible. The default is None.
trans (bool, optional) – Whether to generate inter-chromosomal (trans) pileups. The default is False.
- Return type
Object that generates coordinates for pileups required for PileUpper.
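The arithmetic behind flank and the “auto” value of mindist can be checked with a few lines of plain Python. This is only a sketch of the formulas quoted in the parameter descriptions above; snippet_geometry is a hypothetical helper, not part of coolpuppy, and the bin count assumes flank is a multiple of resolution.

```python
def snippet_geometry(flank, resolution):
    """Return (span_bp, n_bins, auto_mindist) for one pileup window.

    Hypothetical helper for illustration only; not a coolpuppy function.
    """
    # Central bin plus the flank on each side, in bp.
    span_bp = 2 * flank + resolution
    # Same window expressed in bins (assumes flank is a multiple of resolution).
    n_bins = 2 * (flank // resolution) + 1
    # Formula quoted for mindist="auto" in the parameter description.
    auto_mindist = 2 * flank + 2 * resolution
    return span_bp, n_bins, auto_mindist


span_bp, n_bins, auto_mindist = snippet_geometry(flank=100_000, resolution=5_000)
print(span_bp, n_bins, auto_mindist)
```

With 5000 bp resolution and a 100000 bp flank this reproduces the 205000×205000 bp window mentioned for flank.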
- bedpe2bed(df, ends=True, how='center')¶
- empty_stream(*args, **kwargs)¶
- filter_func_all(intervals)¶
- filter_func_chrom(chrom)¶
- filter_func_region(region)¶
- filter_func_trans_pairs(region1, region2)¶
- get_combinations(filter_func1, filter_func2=None, intervals=None, control=False, groupby=[], modify_2Dintervals_func=None)¶
- get_intervals_stream(filter_func1, filter_func2=None, intervals=None, control=False, groupby=[], modify_2Dintervals_func=None)¶
- process()¶
- class coolpuppy.coolpup.PileUpper(clr, CC, *, view_df=None, clr_weight_name='weight', expected=False, expected_value_col='balanced.avg', ooe=True, control=False, coverage_norm=False, rescale=False, rescale_size=99, flip_negative_strand=False, ignore_diags=2, store_stripes=False, nproc=1)¶
Bases:
object
- __init__(clr, CC, *, view_df=None, clr_weight_name='weight', expected=False, expected_value_col='balanced.avg', ooe=True, control=False, coverage_norm=False, rescale=False, rescale_size=99, flip_negative_strand=False, ignore_diags=2, store_stripes=False, nproc=1)¶
Creates pileups
- Parameters
clr (cool) – Cool file with Hi-C data.
CC (CoordCreator) – CoordCreator object with correct settings.
clr_weight_name (bool or str, optional) – Whether to use balanced data, and which column to use as weights. The default is “weight”. Provide False to use raw data.
expected (DataFrame, optional) – If using expected, pandas DataFrame with by-distance expected. The default is False.
view_df (DataFrame) – A dataframe with region coordinates used in expected (see bioframe documentation for details). Can be omitted if no expected is provided, or expected is for whole chromosomes.
ooe (bool, optional) – Whether to normalize each snip by expected value. If False, all snips are accumulated, all expected values are accumulated, and then the former divided by the latter - like with randomly shifted controls. Only has effect when expected is provided.
control (bool, optional) – Whether to use randomly shifted controls. The default is False.
coverage_norm (bool or str, optional) – Whether to normalize the final pileup by accumulated coverage as an alternative to balancing. Useful for single-cell Hi-C data. Can be either boolean, or string: “cis” or “total” to use “cov_cis_raw” or “cov_tot_raw” columns in the cooler bin table, respectively. If True, will attempt to use “cov_tot_raw” if available, otherwise will compute and store coverage in the cooler with default column names, and use “cov_tot_raw”. Alternatively, if a different string is provided, will attempt to use a column with that name in the cooler bin table, and will raise a ValueError if it does not exist. Only allowed when clr_weight_name is False. The default is False.
rescale (bool, optional) – Whether to rescale the pileups. The default is False.
rescale_size (int, optional) – Final shape of rescaled pileups. E.g. if 99, pileups will be squares of 99×99 pixels. The default is 99.
flip_negative_strand (bool, optional) – Flip snippets so the positive strand always points to bottom-right. Requires strands to be annotated for each feature (or two strands for bedpe format features).
ignore_diags (int, optional) – How many diagonals to ignore to avoid short-distance artefacts. The default is 2.
store_stripes (bool, optional) – Whether to store horizontal and vertical stripes and coordinates in the output. The default is False.
nproc (int, optional) – Number of processes to use. The default is 1.
- Return type
Object that generates pileups.
- accumulate_stream(snip_stream, postprocess_func=None, extra_funcs=None)¶
- Parameters
snip_stream (generator) –
- Generator of pd.Series, each one containing at least:
a snippet as a 2D array in [‘data’], [‘cov_start’] and [‘cov_end’] as 1D arrays (can be all 0)
And any other annotations
postprocess_func (function, optional) – Any additional postprocessing of each snip needed, in one function. Can be used to modify the data in a non-standard way, to create groups when it can’t be done before snipping, or to assign each snippet to multiple groups. Example: lib.puputils.group_by_region.
extra_funcs (dict, optional) – Any additional functions to be applied every time a snip is added to a pileup or two pileups are summed up - see _add_snip and sum_pups.
- Returns
outdict – Dictionary of accumulated snips (each as a Series) for each group. Always includes “all”
- Return type
dict
- get_data(region1, region2=None)¶
Get sparse data for a region
- Parameters
region1 (tuple or str) – Region for which to load the data. Either tuple of (chr, start, end), or string with region name.
region2 (tuple or str, optional) – Second region; if given, the data between it and the first region are loaded. Either tuple of (chr, start, end), or string with region name. The default is None.
- Returns
data – Sparse csr matrix for the corresponding region.
- Return type
csr
- get_expected_trans(region1, region2)¶
- make_outmap()¶
Generate zero-filled array of the right shape
- Returns
outmap – Array of zeros of correct shape.
- Return type
array
- pileup_region(region1, region2=None, groupby=[], modify_2Dintervals_func=None, postprocess_func=None, extra_sum_funcs=None)¶
- Parameters
region1 (str) – Region name.
region2 (str, optional) – Region name.
groupby (list of str, optional) – Which attributes of each snip to assign a group to it
modify_2Dintervals_func (function, optional) – A function to apply to a dataframe of genomic intervals used for pileups. If possible, much preferable to postprocess_func for better speed. A good example is the bin_distance_intervals function.
postprocess_func (function, optional) – Additional function to apply to each snippet before grouping. A good example is the lib.puputils.bin_distance function, but using bin_distance_intervals as modify_2Dintervals_func is much faster.
extra_sum_funcs (dict, optional) – Any additional functions to be applied every time a snip is added to a pileup or two pileups are summed up - see _add_snip and sum_pups.
- Returns
pileup – accumulated snips as a dict
- Return type
dict
- pileupsByDistanceWithControl(nproc=None, distance_edges='default', groupby=[])¶
Performs by-distance pileups across all chromosomes and applies the required normalization. A simple wrapper around pileupsWithControl.
- Parameters
nproc (int, optional) – How many cores to use. Sends a whole chromosome per process. The default is None, which uses the same number as nproc set at creation of the object.
distance_edges (list/array of int) – How to group snips by distance (based on their centres). Default uses separations [0, 50_000, 100_000, 200_000, …]
groupby (list of str, optional) – Which attributes of each snip to assign a group to it
- Returns
pileup_df – Normalized pileups in a pandas DataFrame, with columns data and num. data contains the normalized pileups, and num - how many snippets were combined (the regions of interest, not control regions). Each distance band is a row, annotated in column distance_band
- Return type
pd.DataFrame
- pileupsByStrandByDistanceWithControl(nproc=None, distance_edges='default', groupby=[])¶
Performs by-strand by-distance pileups across all chromosomes and applies the required normalization. A simple wrapper around pileupsWithControl. Assumes the features passed to CoordCreator have a “strand” column.
- Parameters
nproc (int, optional) – How many cores to use. Sends a whole chromosome per process. The default is None, which uses the same number as nproc set at creation of the object.
distance_edges (list/array of int) – How to group snips by distance (based on their centres). Default uses separations [0, 50_000, 100_000, 200_000, …]
groupby (list of str, optional) – Which attributes of each snip to assign a group to it
- Returns
pileup_df – Normalized pileups in a pandas DataFrame, with columns data and num. data contains the normalized pileups, and num - how many snippets were combined (the regions of interest, not control regions). Each distance band is a row, annotated in columns separation
- Return type
pd.DataFrame
- pileupsByStrandWithControl(nproc=None, groupby=[])¶
Performs by-strand pileups across all chromosomes and applies the required normalization. A simple wrapper around pileupsWithControl. Assumes the features passed to CoordCreator have a “strand” column.
- Parameters
nproc (int, optional) – How many cores to use. Sends a whole chromosome per process. The default is None, which uses the same number as nproc set at creation of the object.
groupby (list of str, optional) – Which attributes of each snip to assign a group to it
- Returns
pileup_df – Normalized pileups in a pandas DataFrame, with columns data and num. data contains the normalized pileups, and num - how many snippets were combined (the regions of interest, not control regions). Each distance band is a row, annotated in columns separation
- Return type
pd.DataFrame
- pileupsByWindowWithControl(nproc=None)¶
Performs by-window (i.e. for each region) pileups across all chromosomes and applies the required normalization. A simple wrapper around pileupsWithControl.
- Parameters
nproc (int, optional) – How many cores to use. Sends a whole chromosome per process. The default is None, which uses the same number as nproc set at creation of the object.
- Returns
pileup_df – Normalized pileups in a pandas DataFrame, with columns data and num. data contains the normalized pileups, and num - how many snippets were combined (the regions of interest, not control regions). Each window is a row (coordinates are recorded in columns [‘chrom’, ‘start’, ‘end’]), plus an additional row is created with all data (with “all” in the “chrom” column and -1 in start and end).
- Return type
pd.DataFrame
- pileupsWithControl(nproc=None, groupby=[], modify_2Dintervals_func=None, postprocess_func=None, extra_sum_funcs=None)¶
Performs pileups across all chromosomes and applies the required normalization.
- Parameters
nproc (int, optional) – How many cores to use. Sends a whole chromosome per process. The default is None, which uses the same number as nproc set at creation of the object.
groupby (list of str, optional) – Which attributes of each snip to assign a group to it
modify_2Dintervals_func (function, optional) – Function to apply to the DataFrames of coordinates before fetching snippets based on them. Preferable to using the postprocess_func, since at the earlier stage it can be vectorized and much more efficient.
postprocess_func (function, optional) – Additional function to apply to each snippet before grouping. Good example is the lib.puputils.bin_distance function.
extra_sum_funcs (dict, optional) – Any additional functions to be applied every time a snip is added to a pileup or two pileups are summed up - see _add_snip and sum_pups.
- Returns
pileup_df – Normalized pileups in a pandas DataFrame, with columns data and num. data contains the normalized pileups, and num - how many snippets were combined (the regions of interest, not control regions). Each condition from groupby is a row, plus an additional row all is created with all data.
- Return type
pd.DataFrame
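The two classes above are meant to be combined: a CoordCreator describes which coordinates to pile up, and a PileUpper extracts and accumulates the snippets. The following is a minimal sketch of that workflow using only the constructor arguments listed in this section. pileup_two_step is a hypothetical helper name, clr is assumed to be an already opened cooler, features is assumed to be a DataFrame with chrom, start and end columns, and it is an assumption that no further setup call is needed between creating the two objects.

```python
def pileup_two_step(clr, features, resolution, flank=100_000, nproc=1):
    """Sketch of the CoordCreator -> PileUpper workflow (hypothetical helper)."""
    # Imported lazily so that defining this helper does not require coolpuppy.
    from coolpuppy import coolpup

    # Describe the coordinate pairs: bed-style features, with random shifts
    # generated for the control (nshifts) and a fixed seed for reproducibility.
    cc = coolpup.CoordCreator(
        features,
        resolution,
        features_format="bed",
        flank=flank,
        nshifts=10,
        seed=0,
    )
    # Normalize to randomly shifted controls instead of an expected table.
    pu = coolpup.PileUpper(clr, cc, control=True, nproc=nproc)
    # Returns a DataFrame with one row per group plus an "all" row.
    return pu.pileupsWithControl()
```

If a by-distance expected table is available, it would be passed as expected (together with view_df where needed) instead of setting control=True.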
- coolpuppy.coolpup.assign_groups(intervals, groupby=[])¶
Assign groups to rows based on a list of columns
- Parameters
intervals (pd.DataFrame) – Dataframe containing intervals with any annotations.
groupby (list, optional) – List of columns to use to assign a group. The default is [].
- Returns
intervals – Adds a “group” column with the annotation based on groupby. If groupby is empty, assigns “all” to all rows.
- Return type
pd.DataFrame
- coolpuppy.coolpup.bin_distance_intervals(intervals, band_edges='default')¶
- Parameters
intervals (pd.DataFrame) – Dataframe containing intervals with any annotations. Has to have a ‘distance’ column
band_edges (list or array-like, or "default", optional) – Edges of distance bands used to split the intervals into groups. Default is np.append([0], 50000 * 2 ** np.arange(30))
- Returns
snip – The same dataframe with added [‘distance_band’] annotation.
- Return type
pd.DataFrame
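The default band edges quoted above double with every band. The sketch below builds them with NumPy and assigns distances to bands with np.searchsorted. It illustrates the idea rather than coolpuppy's own code: assign_band is a hypothetical helper, and treating each band as closed on the left and open on the right is an assumption.

```python
import numpy as np

# Default edges as quoted in the band_edges description: 0, 50 kb, 100 kb, 200 kb, ...
band_edges = np.append([0], 50_000 * 2 ** np.arange(30, dtype=np.int64))


def assign_band(distances, edges=band_edges):
    """Return a (left_edge, right_edge) tuple for each distance (hypothetical helper)."""
    distances = np.asarray(distances)
    # Index of the last edge that is <= distance, i.e. bands are [left, right).
    idx = np.searchsorted(edges, distances, side="right") - 1
    # Keep out-of-range distances inside the first/last band.
    idx = np.clip(idx, 0, len(edges) - 2)
    return [(int(edges[i]), int(edges[i + 1])) for i in idx]


print(band_edges[:5].tolist())
print(assign_band([0, 75_000, 250_000]))
```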
- coolpuppy.coolpup.expand(intervals, flank, resolution, rescale_flank=None)¶
- coolpuppy.coolpup.expand2D(intervals, flank, resolution, rescale_flank=None)¶
- coolpuppy.coolpup.pileup(clr, features, features_format='bed', view_df=None, expected_df=None, expected_value_col='balanced.avg', clr_weight_name='weight', flank=100000, minshift=100000, maxshift=1000000, nshifts=0, ooe=True, mindist='auto', maxdist=None, min_diag=2, subset=0, by_window=False, by_strand=False, by_distance=False, groupby=[], flip_negative_strand=False, local=False, coverage_norm=False, trans=False, rescale=False, rescale_flank=1, rescale_size=99, store_stripes=False, nproc=1, seed=None)¶
Create pileups
- Parameters
clr (cool) – Cool file with Hi-C data.
features (DataFrame) – A bed- or bedpe-style file with coordinates.
features_format (str, optional) –
- Format of the features. Options:
bed (default): chrom, start, end
bedpe: chrom1, start1, end1, chrom2, start2, end2
auto: determined from the columns in the DataFrame
view_df (DataFrame) – A dataframe with region coordinates used in expected (see bioframe documentation for details). Can be omitted if no expected is provided, or expected is for whole chromosomes.
expected_df (DataFrame, optional) – If using expected, pandas DataFrame with by-distance expected. The default is None.
expected_value_col (str, optional) – Which column in the expected_df contains values to use for normalization
clr_weight_name (bool or str, optional) – Whether to use balanced data, and which column to use as weights. The default is “weight”. Provide False to use raw data.
flank (int, optional) – Padding around the central bin, in bp. For example, with 5000 bp resolution and 100000 flank, final pileup is 205000×205000 bp. The default is 100000.
minshift (int, optional) – Minimal shift applied when generating random controls, in bp. The default is 10 ** 5.
maxshift (int, optional) – Maximal shift applied when generating random controls, in bp. The default is 10 ** 6.
nshifts (int, optional) – How many shifts to generate per region of interest. Does not take chromosome boundaries into account. The default is 0.
ooe (bool, optional) – Whether to normalize each snip by expected value. If False, all snips are accumulated, all expected values are accumulated, and then the former divided by the latter - like with randomly shifted controls. Only has effect when expected is provided. Default is True.
mindist (int or “auto”, optional) – Shortest interactions to consider. Uses midpoints of regions of interest. “auto” selects it to avoid the two shortest diagonals of the matrix, i.e. 2 * flank + 2 * resolution. The default is “auto”.
maxdist (int, optional) – Longest interactions to consider. The default is None.
min_diag (int, optional) – How many diagonals to ignore to avoid short-distance artefacts. The default is 2.
subset (int, optional) – What subset of the coordinate files to use. 0 or negative to use all. The default is 0.
by_window (bool, optional) – Whether to create a separate pileup for each feature by accumulating all of its interactions with other features. Produces as many pileups as there are features. The default is False.
by_strand (bool, optional) – Whether to create a separate pileup for each combination of “strand1”, “strand2” in features. If features_format==’bed’, first creates pairwise combinations of features, and the original features need to have a column “strand”. If features_format==’bedpe’, they need to have “strand1” and “strand2” columns. The default is False.
by_distance (bool or list, optional) –
Whether to create a separate pileup for different distance separations. If features_format==’bed’, internally creates pairwise combinations of features. If True, splits all separations using edges defined like this:
band_edges = np.append([0], 50000 * 2 ** np.arange(30))
Alternatively, a list of integer values can be given with custom distance edges. The default is False.
groupby (list of str, optional) – Additional columns of features to use for groupby. If features_format==’bed’, each column should be specified twice with suffixes “1” and “2”, i.e. if features have a column “group”, specify [“group1”, “group2”]. The default is [].
flip_negative_strand (bool, optional) – Flip snippets so the positive strand always points to bottom-right. Requires strands to be annotated for each feature (or two strands for bedpe format features)
local (bool, optional) – Whether to generate local coordinates, i.e. on-diagonal. The default is False.
coverage_norm (bool or str, optional) – Whether to normalize the final pileup by accumulated coverage as an alternative to balancing. Useful for single-cell Hi-C data. Can be either boolean, or string: “cis” or “total” to use “cov_cis_raw” or “cov_tot_raw” columns in the cooler bin table, respectively. If True, will attempt to use “cov_tot_raw” if available, otherwise will compute and store coverage in the cooler with default column names, and use “cov_tot_raw”. Alternatively, if a different string is provided, will attempt to use a column with that name in the cooler bin table, and will raise a ValueError if it does not exist. Only allowed when clr_weight_name is False. The default is False.
trans (bool, optional) – Whether to generate inter-chromosomal (trans) pileups. The default is False.
rescale (bool, optional) – Whether to rescale the pileups. The default is False.
rescale_flank (float, optional) – Fraction of ROI size added on each end when extracting snippets; used when rescaling. The default is 1. If specified, overrides flank.
rescale_size (int, optional) – Final shape of rescaled pileups. E.g. if 99, pileups will be squares of 99×99 pixels. The default is 99.
store_stripes (bool, optional) – Whether to store horizontal and vertical stripes and coordinates in the output. The default is False.
nproc (int, optional) – Number of processes to use. The default is 1.
seed (int, optional) – Seed for np.random to make it reproducible. The default is None.
- Returns
pileup_df – pandas DataFrame containing the pileups, their grouping information (if any), and annotations derived from the arguments of this function.
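A call to pileup might look like the following sketch, which uses only keyword arguments from the signature above. run_pileup is a hypothetical wrapper; cool_uri and bed_path are placeholder inputs, and reading the features as a headerless three-column tab-separated file is an assumption about the input data.

```python
def run_pileup(cool_uri, bed_path, nproc=1):
    """Sketch of a one-call pileup with shifted controls (hypothetical wrapper)."""
    # Imported lazily so that defining this wrapper does not require the packages.
    import cooler
    import pandas as pd
    from coolpuppy import coolpup

    # Open the Hi-C matrix at a single resolution.
    clr = cooler.Cooler(cool_uri)
    # Assumed input: tab-separated BED file without a header line.
    features = pd.read_csv(
        bed_path,
        sep="\t",
        header=None,
        usecols=[0, 1, 2],
        names=["chrom", "start", "end"],
    )
    # No expected table is given, so random shifts (nshifts > 0) serve as control.
    return coolpup.pileup(
        clr,
        features,
        features_format="bed",
        flank=100_000,
        nshifts=10,
        nproc=nproc,
        seed=0,
    )
```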
coolpuppy.lib.io module¶
- coolpuppy.lib.io.is_gz_file(filepath)¶
- coolpuppy.lib.io.load_array_with_header(filename)¶
Load array from files generated using save_array_with_header. They are simple txt files with an optional header in the first lines, commented using “# “. If uncommented, the header is in YAML.
- Parameters
filename (string) – File to load from.
- Returns
data – Dictionary with information from the header. Access the associated data in an array using data[‘data’].
- Return type
dict
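To make the described file layout concrete, here is a standard-library sketch of a text file whose first lines are a header commented with “# ”, followed by the array rows. It is not the coolpuppy implementation: write_with_header and read_with_header are hypothetical names, the header is limited to flat key: value pairs, values are kept as strings instead of being parsed as YAML, and tab-separated rows are an assumption.

```python
import io


def write_with_header(rows, header, handle):
    """Write 'key: value' header lines prefixed with '# ', then the rows."""
    for key, value in header.items():
        handle.write(f"# {key}: {value}\n")
    for row in rows:
        handle.write("\t".join(str(v) for v in row) + "\n")


def read_with_header(handle):
    """Read the file back into a dict, with the array stored under 'data'."""
    header, rows = {}, []
    for line in handle:
        line = line.rstrip("\n")
        if line.startswith("# "):
            # Header line: strip the comment prefix and split on the first ': '.
            key, _, value = line[2:].partition(": ")
            header[key] = value
        elif line:
            rows.append([float(v) for v in line.split("\t")])
    header["data"] = rows
    return header


buffer = io.StringIO()
write_with_header([[1.0, 2.0], [3.0, 4.0]], {"resolution": 5000}, buffer)
buffer.seek(0)
loaded = read_with_header(buffer)
print(loaded["resolution"], loaded["data"])
```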
- coolpuppy.lib.io.load_pileup_df(filename, quaich=False, skipstripes=False)¶
Loads a dataframe saved using save_pileup_df
- Parameters
filename (str) – File to load from.
quaich (bool, optional) – Whether to assume standard quaich file naming to extract sample name and bedname. The default is False.
- Returns
annotation – Pileups are in the “data” column, all metadata in other columns
- Return type
pd.DataFrame
- coolpuppy.lib.io.load_pileup_df_list(files, quaich=False, nice_metadata=True, skipstripes=False)¶
- Parameters
files (iterable) – Files to read pileups from.
quaich (bool, optional) – Whether to assume standard quaich file naming to extract sample name and bedname. The default is False.
nice_metadata (bool, optional) – Whether to add nicer metadata for direct plotting. The default is True. Adds a “norm” column (“expected”, “shifts” or “none”).
- Returns
pups – Combined dataframe with all pileups and annotations from all files.
- Return type
pd.DataFrame
- coolpuppy.lib.io.save_array_with_header(array, header, filename)¶
Save a numpy array with a YAML header generated from a dictionary
- Parameters
array (np.array) – Array to save.
header (dict) – Dictionary to save into the header.
filename (string) – Name of file to save array and metadata into.
- coolpuppy.lib.io.save_pileup_df(filename, df, metadata=None, mode='w', compression='lzf')¶
Saves a dataframe with metadata into a binary HDF5 file
- Parameters
filename (str) – File to save to.
df (pd.DataFrame) – DataFrame to save into binary hdf5 file.
metadata (dict, optional) – Dictionary with metadata.
mode (str, optional) – Mode for the first time access to the output file: ‘w’ to overwrite if file exists, or ‘a’ to fail if output file already exists
compression (str, optional) – Compression to use for saving, e.g. ‘gzip’. Defaults to ‘lzf’
- Return type
None.
Notes
Replaces None in metadata values with False, since HDF5 doesn’t support None
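The replacement described in the note can be pictured as a one-line dictionary comprehension. This is a sketch of the idea only; sanitize_metadata is a hypothetical name, and the example keys are arbitrary.

```python
def sanitize_metadata(metadata):
    """Return a copy of metadata with None values replaced by False."""
    return {key: (False if value is None else value) for key, value in metadata.items()}


print(sanitize_metadata({"flank": 100_000, "maxdist": None, "seed": None}))
```

One consequence worth keeping in mind is that, after loading, a value of False cannot be distinguished from an original None.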
- coolpuppy.lib.io.sniff_for_header(file, sep='\t', comment='#')¶
Warning: reads the entire file into a StringIO buffer!
coolpuppy.lib.numutils module¶
- coolpuppy.lib.numutils.corner_cv(amap, i=4)¶
Get coefficient of variation for upper left and lower right corners of a pileup to estimate how noisy it is
- Parameters
amap (2D array) – Pileup.
i (int, optional) – How many bins to use from each upper left and lower right corner: final corner shape is i^2. The default is 4.
- Returns
CV – Coefficient of variation for the corner pixels.
- Return type
float
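The description above can be sketched with NumPy as follows. This is an illustration, not the library source: corner_cv_sketch is a hypothetical name, and the choices to pool both corners, drop non-finite pixels and use the population standard deviation are assumptions.

```python
import numpy as np


def corner_cv_sketch(amap, i=4):
    """Coefficient of variation of the i x i upper-left and lower-right corners."""
    corners = np.concatenate((amap[:i, :i].ravel(), amap[-i:, -i:].ravel()))
    # Ignore NaN or infinite pixels.
    corners = corners[np.isfinite(corners)]
    return np.std(corners) / np.mean(corners)


pileup = np.ones((10, 10))
pileup[-4:, -4:] = 3.0  # make the lower-right corner differ from the upper-left
print(corner_cv_sketch(pileup))
```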
- coolpuppy.lib.numutils.get_domain_score(amap, flank=1)¶
Divide sum of values in a square from the central part of a matrix by the upper and right rectangles corresponding to interactions of the central region with its surroundings.
- Parameters
amap (2D array) – Pileup.
flank (int) – Relative padding used, i.e. if 1 the central third is used, if 2 the central fifth is used. The default is 1.
- Returns
score – Domain score.
- Return type
float
- coolpuppy.lib.numutils.get_enrichment(amap, n)¶
Get values from the center of a pileup for a square with side n
- Parameters
amap (2D array) – Pileup.
n (int) – Side of the central square to use.
- Returns
enrichment – Mean of the pixels in the central square.
- Return type
float
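A minimal NumPy sketch of the central-square mean, assuming a square pileup with an odd number of bins and an odd n. enrichment_sketch is a hypothetical name, and the use of a NaN-aware mean is an assumption.

```python
import numpy as np


def enrichment_sketch(amap, n):
    """Mean of the n x n square centred on the middle pixel (n assumed odd)."""
    c = amap.shape[0] // 2
    half = n // 2
    return np.nanmean(amap[c - half : c + half + 1, c - half : c + half + 1])


pileup = np.arange(25, dtype=float).reshape(5, 5)
print(enrichment_sketch(pileup, 1), enrichment_sketch(pileup, 3))
```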
- coolpuppy.lib.numutils.get_insulation_strength(amap, ignore_central=0, ignore_diags=2)¶
Divide values in upper left and lower right corners over upper right and lower left, ignoring the central bins.
- Parameters
amap (2D array) – Pileup.
ignore_central (int, optional) – How many central bins to ignore. Has to be odd or 0. The default is 0.
- Returns
Insulation strength.
- Return type
float
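The ratio described above can be sketched as below. This simplified version leaves out the ignore_diags masking entirely, and insulation_strength_sketch is a hypothetical name, so it should be read as an illustration of the corner ratio rather than a reproduction of the function.

```python
import numpy as np


def insulation_strength_sketch(amap, ignore_central=0):
    """Mean of upper-left and lower-right corners over upper-right and lower-left."""
    if ignore_central != 0 and ignore_central % 2 != 1:
        raise ValueError("ignore_central has to be odd or 0")
    # Side length of each corner after skipping the central bins.
    i = (amap.shape[0] - ignore_central) // 2
    intra = np.nanmean(np.concatenate((amap[:i, :i].ravel(), amap[-i:, -i:].ravel())))
    inter = np.nanmean(np.concatenate((amap[:i, -i:].ravel(), amap[-i:, :i].ravel())))
    return intra / inter


pileup = np.ones((5, 5))
pileup[:2, :2] = 2.0
pileup[-2:, -2:] = 2.0
print(insulation_strength_sketch(pileup, ignore_central=1))
```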
- coolpuppy.lib.numutils.get_local_enrichment(amap, flank=1)¶
Get values for a square from the central part of a pileup, ignoring padding
- Parameters
amap (2D array) – Pileup.
flank (int) – Relative padding used, i.e. if 1 the central third is used, if 2 the central fifth is used. The default is 1.
- Returns
enrichment – Mean of the pixels in the central square.
- Return type
float
- coolpuppy.lib.numutils.norm_cis(amap, i=3)¶
Normalize the pileup by mean of pixels from upper left and lower right corners
- Parameters
amap (2D array) – Pileup.
i (int, optional) – How many bins to use from each upper left and lower right corner: final corner shape is i^2. 0 will not normalize. The default is 3.
- Returns
amap – Normalized pileup.
- Return type
2D array
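Corner normalization amounts to dividing the whole pileup by the mean of the two corners. A hedged NumPy sketch, with norm_cis_sketch as a hypothetical name and a NaN-aware mean as an assumption:

```python
import numpy as np


def norm_cis_sketch(amap, i=3):
    """Divide amap by the mean of its i x i upper-left and lower-right corners."""
    if i <= 0:
        # 0 means "do not normalize".
        return amap
    corners = np.concatenate((amap[:i, :i].ravel(), amap[-i:, -i:].ravel()))
    return amap / np.nanmean(corners)


pileup = np.full((7, 7), 2.0)
pileup[3, 3] = 8.0  # a central peak on a flat background
print(norm_cis_sketch(pileup)[3, 3])
```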
coolpuppy.lib.puputils module¶
- coolpuppy.lib.puputils.accumulate_values(dict1, dict2, key)¶
Useful as an extra_sum_func
- coolpuppy.lib.puputils.bin_distance(snip, band_edges='default')¶
- Parameters
snip (pd.Series) – Series containing any annotations. Has to have [‘distance’]
band_edges (list or array-like, or "default", optional) – Edges of distance bands used to assign the distance band. Default is np.append([0], 50000 * 2 ** np.arange(30))
- Returns
snip – The same snip with added [‘distance_band’] annotation.
- Return type
pd.Series
- coolpuppy.lib.puputils.divide_pups(pup1, pup2)¶
Divide two pups and get the resulting pup. Requires that the pups have identical shapes, resolutions, flanks, etc. If pups contain stripes, these will only be divided if stripes have identical coordinates.
- coolpuppy.lib.puputils.get_score(pup, center=3, ignore_central=3)¶
Calculate a reasonable score for any kind of pileup. For non-local (off-diagonal) pileups, calculates the average signal in the central pixels (based on ‘center’). For local non-rescaled pileups, calculates insulation strength and ignores the central bins (based on ‘ignore_central’). For local rescaled pileups, calculates enrichment in the central rescaled area relative to the two neighbouring areas on the sides.
- Parameters
pup (pd.Series or dict) – Series or dict with pileup in ‘data’ and annotations in other keys. Will correctly calculate enrichment score with annotations in ‘local’ (bool), ‘rescale’ (bool) and ‘rescale_flank’ (float)
center (int, optional) – Passed to ‘get_enrichment’ to calculate the average strength of central pixels. The default is 3.
ignore_central (int, optional) – How many central bins to ignore for calculation of insulation in local pileups. The default is 3.
- Returns
Score.
- Return type
float
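The choice between the three kinds of score depends only on the ‘local’ and ‘rescale’ annotations. The sketch below shows that decision and returns a label instead of computing anything; choose_score_kind is a hypothetical helper, and treating missing annotations as False is an assumption.

```python
def choose_score_kind(pup):
    """Return which kind of score the description above would apply to pup."""
    if not pup.get("local", False):
        # Off-diagonal pileup: average signal in the central pixels.
        return "central enrichment"
    if pup.get("rescale", False):
        # Local rescaled pileup: central area relative to neighbouring areas.
        return "local enrichment"
    # Local non-rescaled pileup.
    return "insulation strength"


print(choose_score_kind({"local": True, "rescale": False}))
```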
- coolpuppy.lib.puputils.group_by_region(snip)¶
- coolpuppy.lib.puputils.norm_coverage(snip)¶
Normalize a pileup by coverage arrays
- Parameters
loop (2D array) – Pileup.
cov_start (1D array) – Accumulated coverage of the left side of the pileup.
cov_end (1D array) – Accumulated coverage of the bottom side of the pileup.
- Returns
loop – Normalized pileup.
- Return type
2D array
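Coverage normalization can be pictured as dividing the pileup by the outer product of the two coverage vectors, scaled to a mean of 1. The sketch below is one possible reading of the description, not the library code: norm_coverage_sketch is a hypothetical name, the orientation of the outer product is an assumption, and so is setting undefined pixels to 0.

```python
import numpy as np


def norm_coverage_sketch(snip):
    """Return a copy of snip with 'data' divided by the relative expected coverage."""
    coverage = np.outer(snip["cov_start"], snip["cov_end"])
    # Scale so that the average correction factor is 1.
    coverage = coverage / np.nanmean(coverage)
    with np.errstate(divide="ignore", invalid="ignore"):
        data = snip["data"] / coverage
    # Pixels with zero coverage become 0 rather than NaN or inf.
    data[~np.isfinite(data)] = 0
    return {**snip, "data": data}


snip = {
    "data": np.array([[2.0, 2.0], [6.0, 6.0]]),
    "cov_start": np.array([1.0, 3.0]),
    "cov_end": np.array([1.0, 1.0]),
}
print(norm_coverage_sketch(snip)["data"])
```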
- coolpuppy.lib.puputils.sum_pups(pup1, pup2, extra_funcs={})¶
Preserves data, stripes, cov_start, cov_end, n, num and coordinates. Assumes n=1 if not present, and calculates num if not present. If store_stripes is set to False, stripes and coordinates will be empty.
extra_funcs allows passing arbitrary functions that accumulate extra information from the two pups.
coolpuppy.plotpup module¶
- coolpuppy.plotpup.add_heatmap(data, flank, rescale, rescale_flank, n, max_coordinates, height=1, aspect='auto', color=None, cmap='coolwarm', norm=LogNorm(), plot_ticks=False, stripe=False, font_scale=1)¶
Adds the array contained in data.values[0] to the current axes as a heatmap of stripes
- coolpuppy.plotpup.add_score(score, height=1, color=None, font_scale=1)¶
Adds the value contained in score.values[0] to the current axes as a label in top left corner
- coolpuppy.plotpup.add_stripe_lineplot(data, resolution, flank, rescale, rescale_flank, height=1, aspect='auto', color=None, cmap='coolwarm', scale='log', norm=LogNorm(), plot_ticks=False, stripe=False, font_scale=1, colnames=None)¶
Adds the array contained in data.values[0] to the current axes as a heatmap of stripes and an average lineplot on top. Only works with one condition at a time.
- coolpuppy.plotpup.auto_rows_cols(n)¶
Automatically determines number of rows and cols for n pileups
- Parameters
n (int) – Number of pileups.
- Returns
rows (int) – How many rows to use.
cols (int) – How many columns to use.
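One plausible way to pick a near-square grid for n panels is shown below. The exact rule used by auto_rows_cols is not stated here, so this is an assumption for illustration, and grid_for is a hypothetical name.

```python
import math


def grid_for(n):
    """Return (rows, cols) of a near-square grid with at least n cells."""
    rows = math.ceil(math.sqrt(n))
    cols = math.ceil(n / rows)
    return rows, cols


print(grid_for(10))
```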
- coolpuppy.plotpup.get_min_max(pups, vmin=None, vmax=None, sym=True, scale='log')¶
Automatically determine minimal and maximal colour intensity for pileups
- Parameters
pups (np.array) – Numpy array of numpy arrays containing pileups.
vmin (float, optional) – Force certain minimal colour. The default is None.
vmax (float, optional) – Force certain maximal colour. The default is None.
sym (bool, optional) – Whether the output should be symmetrical around 0. The default is True.
- Returns
vmin (float) – Selected minimal colour.
vmax (float) – Selected maximal colour.
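For a log colour scale, “symmetrical around 0” can be read as symmetric in log2 space, i.e. around a ratio of 1. The sketch below follows that reading. It is an assumption about the intended behaviour rather than the library source, min_max_sketch is a hypothetical name, and dropping non-positive and non-finite pixels is also an assumption.

```python
import numpy as np


def min_max_sketch(pups, vmin=None, vmax=None, sym=True):
    """Pick colour limits from a collection of pileups for a log scale."""
    values = np.concatenate([np.asarray(p, dtype=float).ravel() for p in pups])
    # Only positive finite pixels can be shown on a log scale.
    values = values[np.isfinite(values) & (values > 0)]
    if vmin is None:
        vmin = values.min()
    if vmax is None:
        vmax = values.max()
    if sym:
        # Use the larger log2 deviation from 1 in both directions.
        extent = max(abs(np.log2(vmin)), abs(np.log2(vmax)))
        vmin, vmax = 2.0 ** -extent, 2.0 ** extent
    return float(vmin), float(vmax)


print(min_max_sketch([np.array([[0.5, 1.0], [2.0, 4.0]])]))
```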
- coolpuppy.plotpup.plot(pupsdf, cols=None, rows=None, score='score', center=3, ignore_central=3, col_order=None, row_order=None, vmin=None, vmax=None, sym=True, norm_corners=0, cmap='coolwarm', cmap_emptypixel=(0.98, 0.98, 0.98), scale='log', height=1, aspect=1, font='DejaVu Sans', font_scale=1, plot_ticks=False, colnames=None, rownames=None, **kwargs)¶
- coolpuppy.plotpup.plot_stripes(pupsdf, cols=None, rows=None, col_order=None, row_order=None, vmin=None, vmax=None, sym=True, cmap='coolwarm', cmap_emptypixel=(0.98, 0.98, 0.98), scale='log', height=1, aspect='auto', stripe='corner_stripe', stripe_sort='sum', out_sorted_bedpe=None, font='DejaVu Sans', font_scale=1, plot_ticks=False, colnames=None, rownames=None, lineplot=False, **kwargs)¶
- coolpuppy.plotpup.sort_separation(sep_string_series, sep='Mb')¶