hatchet package

Subpackages

Submodules

hatchet.frame module

class hatchet.frame.Frame(attrs=None, **kwargs)[source]

Bases: object

The frame index for a node. The node only stores its frame.

Parameters:

attrs (dict) – dictionary of attributes and values

copy()[source]
get(name, default=None)[source]
property tuple_repr

Make a tuple of attributes and values based on reader.

values(names)[source]

Return a tuple of attribute values from this Frame.
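
A minimal usage sketch of the Frame API above; the attribute names shown are illustrative, not required by Frame:

from hatchet.frame import Frame

# Construct a Frame from an attribute dictionary; keyword arguments are
# also accepted per the signature above.
frame = Frame({"name": "main", "type": "function"})

print(frame.get("name"))               # "main"
print(frame.get("line", default=0))    # 0 (attribute not present)
print(frame.values(["name", "type"]))  # ("main", "function")

frame2 = frame.copy()  # independent Frame with the same attributes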

hatchet.graph module

class hatchet.graph.Graph(roots)[source]

Bases: object

A possibly multi-rooted tree or graph from one input dataset.

copy(old_to_new=None)[source]

Create and return a copy of this graph.

Parameters:

old_to_new (dict, optional) – if provided, this dictionary will be populated with mappings from old node -> new node

enumerate_depth()[source]
enumerate_traverse()[source]
find_merges()[source]

Find nodes that have the same parent and frame.

Find nodes that have the same parent and duplicate frame, and return a mapping from nodes that should be eliminated to nodes they should be merged into.

Returns:

dictionary from nodes to their merge targets

Return type:

(dict)

static from_lists(*roots)[source]

Convenience method to invoke Node.from_lists() on each root value.

is_tree()[source]

Return True if this graph is a tree, False otherwise.
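
A minimal sketch of from_lists() and is_tree(); the nested-list format is the one described under Node.from_lists() below:

from hatchet.graph import Graph

# Build a single-rooted graph from a nested list: "a" has children
# "b" (with child "d") and "c".
g = Graph.from_lists(["a", ["b", "d"], ["c"]])

print(g.is_tree())              # True: no node has more than one parent
print(len(list(g.traverse())))  # 4 nodes visited in preorder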

merge_nodes(merges)[source]

Merge some nodes in a graph into others.

merges is a dictionary keyed by old nodes, with values equal to the nodes that they need to be merged into. Old nodes’ parents and children are connected to the new node.

Parameters:

merges (dict) – dictionary from source nodes -> targets

node_order_traverse(order='pre', attrs=None, visited=None)[source]

Preorder traversal of all roots of this Graph, sorting by “node order” column.

Parameters:

attrs (list or str, optional) – If provided, extract these fields from nodes while traversing and yield them. See traverse() for details.

Only preorder traversal is currently supported.

normalize()[source]
traverse(order='pre', attrs=None, visited=None)[source]

Preorder traversal of all roots of this Graph.

Parameters:

attrs (list or str, optional) – If provided, extract these fields from nodes while traversing and yield them. See Node.traverse() for details.

Only preorder traversal is currently supported.
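
A minimal traversal sketch, reusing the nested-list constructor above; attrs lets you yield frame fields instead of Node objects:

from hatchet.graph import Graph

g = Graph.from_lists(["a", ["b", "d"], ["c"]])

for node in g.traverse():              # preorder over all roots
    print(node.frame.get("name"))

for name in g.traverse(attrs="name"):  # yield only the "name" attribute
    print(name)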

union(other, old_to_new=None)[source]

Create the union of self and other and return it as a new Graph.

This creates a new graph and does not modify self or other. The new Graph has entirely new nodes.

Parameters:
  • other (Graph) – another Graph

  • old_to_new (dict, optional) – if provided, this dictionary will be populated with mappings from old node -> new node

Returns:

new Graph containing all nodes and edges from self and other

Return type:

(Graph)
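
A minimal sketch; old_to_new is optional and, if given, records how nodes of self and other map into the new Graph:

from hatchet.graph import Graph

g1 = Graph.from_lists(["a", ["b"]])
g2 = Graph.from_lists(["a", ["c"]])

old_to_new = {}
g3 = g1.union(g2, old_to_new)    # new Graph; g1 and g2 are unchanged
print(len(list(g3.traverse())))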

hatchet.graph.index_by(attr, objects)[source]

Put objects into lists based on the value of an attribute.

Returns:

dictionary of lists of objects, keyed by attribute value

Return type:

(dict)
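
A minimal sketch, assuming the attribute is looked up as a plain Python attribute on each object:

from hatchet.graph import index_by

class Item:
    def __init__(self, color):
        self.color = color

items = [Item("red"), Item("blue"), Item("red")]
by_color = index_by("color", items)
print(len(by_color["red"]), len(by_color["blue"]))  # 2 1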

hatchet.graphframe module

exception hatchet.graphframe.EmptyFilter[source]

Bases: Exception

Raised when a filter would otherwise return an empty GraphFrame.

class hatchet.graphframe.GraphFrame(graph, dataframe, exc_metrics=None, inc_metrics=None, default_metric='time', metadata={})[source]

Bases: object

An input dataset is read into an object of this type, which includes a graph and a dataframe.

add(other)[source]

Returns the column-wise sum of two graphframes as a new graphframe.

This graphframe is the union of self’s and other’s graphs, and does not modify self or other.

Returns:

new graphframe

Return type:

(GraphFrame)
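
A minimal sketch of GraphFrame arithmetic using from_lists() (documented below), which assigns every node an exclusive time of 1; the same pattern applies to sub(), mul(), and div():

import hatchet as ht

gf1 = ht.GraphFrame.from_lists(["a", ["b", "c"]])
gf2 = ht.GraphFrame.from_lists(["a", ["b", "c"]])

total = gf1.add(gf2)            # column-wise sum over the unioned graph
print(total.dataframe["time"])  # each node's exclusive time is now 2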

copy()[source]

Return a partially shallow copy of the graphframe.

This copies the DataFrame object, but its underlying data consists of references to the original data. The Graph is shared between self and the new GraphFrame.

Parameters:

self (GraphFrame) – Object to make a copy of.

Returns:

Copy of self

  • graph (graph): Reference to self’s graph

  • dataframe (DataFrame): Pandas “non-deep” copy of dataframe

  • exc_metrics (list): Copy of self’s exc_metrics

  • inc_metrics (list): Copy of self’s inc_metrics

  • default_metric (str): N/A

  • metadata (dict): Copy of self’s metadata

Return type:

other (GraphFrame)

deepcopy()[source]

Return a deep copy of the graphframe.

Parameters:

self (GraphFrame) – Object to make a copy of.

Returns:

Copy of self

  • graph (graph): Deep copy of self’s graph

  • dataframe (DataFrame): Pandas “deep” copy with node objects updated to match graph from “node_clone”

  • exc_metrics (list): Copy of self’s exc_metrics

  • inc_metrics (list): Copy of self’s inc_metrics

  • default_metric (str): N/A

  • metadata (dict): Copy of self’s metadata

Return type:

other (GraphFrame)
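
A minimal sketch contrasting copy() and deepcopy():

import hatchet as ht

gf = ht.GraphFrame.from_lists(["a", ["b"]])

shallow = gf.copy()   # shares the Graph; DataFrame data are references
deep = gf.deepcopy()  # fully independent Graph and DataFrame

assert shallow.graph is gf.graph
assert deep.graph is not gf.graph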

div(other)[source]

Returns the column-wise float division of two graphframes as a new graphframe.

This graphframe is the union of self’s and other’s graphs, and does not modify self or other.

Returns:

new graphframe

Return type:

(GraphFrame)

drop_index_levels(function=<function mean>)[source]

Drop all index levels but node.
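
A minimal sketch; on this toy data the index only has a node level, but for profiles read from MPI or threaded runs this collapses the (node, rank, thread) index down to node, aggregating the dropped levels with the supplied function (mean by default):

import numpy as np
import hatchet as ht

gf = ht.GraphFrame.from_lists(["a", ["b", "c"]])
gf.drop_index_levels(function=np.max)  # or omit function to aggregate by mean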

filter(filter_obj, squash=True, update_inc_cols=True, num_procs=2, rec_limit=1000, multi_index_mode='off')[source]

Filter the dataframe using a user-supplied function.

Note: Operates in parallel on user-supplied lambda functions.

Parameters:
  • filter_obj (callable, list, or QueryMatcher) – the filter to apply to the GraphFrame.

  • squash (boolean, optional) – if True, automatically call squash for the user.

  • update_inc_cols (boolean, optional) – if True, update inclusive columns when performing squash.

  • rec_limit (int, optional) – set the Python recursion limit; increase this if running into recursion depth errors (default: 1000).
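
A minimal sketch using a lambda filter; with squash=True (the default) the Graph is rewritten to match the filtered rows:

import hatchet as ht

gf = ht.GraphFrame.from_lists(["a", ["b", "c"]])

# Keep only rows whose exclusive "time" exceeds a threshold.
filtered = gf.filter(lambda row: row["time"] > 0.5)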

static from_caliper(filename_or_stream, query=None)[source]

Read in a Caliper .cali or .json file.

Parameters:
  • filename_or_stream (str or file-like) – name of a Caliper output file in .cali or JSON-split format, or an open file object to read one

  • query (str) – cali-query in CalQL format

static from_caliperreader(filename_or_caliperreader, native=False, string_attributes=[])[source]

Read in a native Caliper cali file using Caliper’s python reader.

Parameters:
  • filename_or_caliperreader (str or CaliperReader) – name of a Caliper output file in .cali format, or a CaliperReader object

  • native (bool) – if True, use native metric names; otherwise use user-readable metric names (default: False)

  • string_attributes (str or list, optional) – Adds existing string attributes from within the caliper file to the dataframe

static from_cprofile(filename)[source]

Read in a pstats/prof file generated using Python’s cProfile.

static from_gprof_dot(filename)[source]

Read in a DOT file generated by gprof2dot.

static from_hdf(filename, **kwargs)[source]
static from_hpctoolkit(dirname)[source]

Read an HPCToolkit database directory into a new GraphFrame.

Parameters:

dirname (str) – parent directory of an HPCToolkit experiment.xml file

Returns:

new GraphFrame containing HPCToolkit profile data

Return type:

(GraphFrame)
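
A minimal sketch; the directory name is hypothetical and should point at the parent directory of experiment.xml:

import hatchet as ht

gf = ht.GraphFrame.from_hpctoolkit("hpctoolkit-database")  # hypothetical path
print(gf.dataframe.head())
print(gf.tree())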

static from_json(json_spec, **kwargs)[source]
static from_lists(*lists)[source]

Make a simple GraphFrame from lists.

This creates a Graph from lists (see Graph.from_lists()) and uses it as the index for a new GraphFrame. Every node in the new graph has exclusive time of 1 and inclusive time is computed automatically.
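
A minimal sketch:

import hatchet as ht

gf = ht.GraphFrame.from_lists(["a", ["b", "d", "e"], ["c"]])
print(gf.dataframe)  # one row per node, exclusive time of 1 each
print(gf.tree())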

static from_literal(graph_dict)[source]

Create a GraphFrame from a list of dictionaries.
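
A minimal sketch of the expected input; the "frame"/"metrics"/"children" key layout follows the literal examples shipped with Hatchet and should be checked against the reader you are using:

import hatchet as ht

gf = ht.GraphFrame.from_literal([
    {
        "frame": {"name": "foo", "type": "function"},
        "metrics": {"time (inc)": 130.0, "time": 0.0},
        "children": [
            {
                "frame": {"name": "bar"},
                "metrics": {"time (inc)": 20.0, "time": 5.0},
                "children": [],
            },
        ],
    }
])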

static from_pyinstrument(filename)[source]

Read in a JSON file generated using Pyinstrument.

static from_spotdb(db_key, list_of_ids=None)[source]

Read multiple graph frames from a SpotDB instance.

Parameters:
  • db_key (str or SpotDB object) –

    locator for a SpotDB instance. This can be a SpotDB object directly, or a locator for a Spot database, which is a string with one of:

    • A directory for .cali files,

    • A .sqlite file name

    • A SQL database URL (e.g., “mysql://hostname/db”)

  • list_of_ids – The list of run IDs to read from the database. If this is None, returns all runs.

Returns:

A list of graphframes, one for each requested run that was found

static from_tau(dirname)[source]

Read in a profile generated using TAU.

static from_timemory(input=None, select=None, **_kwargs)[source]

Read in timemory data.

Links:

https://github.com/NERSC/timemory
https://timemory.readthedocs.io

Parameters:
  • input (str or file-stream or dict or None) –

    Valid argument types are:

    1. Filename for a timemory JSON tree file

    2. Open file stream to one of these files

    3. Dictionary from timemory JSON tree

    Currently, timemory supports two JSON layouts: flat and tree. The former is a 1D-array representation of the hierarchy which represents the hierarchy via indentation schemes in the labels and is not compatible with hatchet. The latter is a hierarchical representation of the data and is the required JSON layout when using hatchet. Timemory JSON tree files typically have the extension “.tree.json”.

    If input is None, this assumes that timemory has been recording data within the application that is using hatchet. In this situation, this method will attempt to import the data directly from timemory.

    At the time of this writing, the direct data import will:

    1. Stop any currently collecting components

    2. Aggregate child thread data of the calling thread

    3. Clear all data on the child threads

    4. Aggregate the data from any MPI and/or UPC++ ranks.

    Thus, if MPI or UPC++ is used, every rank must call this routine. The zeroth rank will have the aggregation and all the other non-zero ranks will only have the rank-specific data.

    Whether or not the per-thread and per-rank data itself is combined is controlled by the collapse_threads and collapse_processes attributes in the timemory.settings submodule.

    In the C++ API, it is possible for only #1 to be applied, and data can be obtained for an individual thread and/or rank without aggregation. This is not currently available in Python; however, it can be made available upon request via a GitHub issue.

  • select (list of str) – A list of strings which match the component enumeration names, e.g. [“cpu_clock”].

  • per_thread (boolean) – Ensures that when applying filters to the graphframe, frames with identical name/file/line/etc. info but from different threads are not combined

  • per_rank (boolean) – Ensures that when applying filters to the graphframe, frames with identical name/file/line/etc. info but from different ranks are not combined

generate_exclusive_columns(inc_metrics=None)[source]

Generates exclusive metrics from available inclusive metrics.

Parameters:

inc_metrics (str, list, optional) – Instead of generating the exclusive time for each inclusive metric, it is possible to specify those metrics manually. Defaults to None.

Currently, this function determines which metrics to generate by looking for one of two things:

  1. An inclusive metric ending in “(inc)” that does not have an exclusive metric with the same name (minus “(inc)”)

  2. An inclusive metric not ending in “(inc)”

The metrics that are generated will have one of two name formats:

  1. If the corresponding inclusive metric’s name ends in “(inc)”, the exclusive metric will have the same name, minus “(inc)”

  2. If the corresponding inclusive metric’s name does not end in “(inc)”, the exclusive metric will have the same name as the inclusive metric, followed by a “(exc)” suffix
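
A minimal sketch of the naming rules above (the column names are illustrative):

import hatchet as ht

gf = ht.GraphFrame.from_lists(["a", ["b", "c"]])

# Rule 1: an inclusive "foo (inc)" with no matching "foo" generates "foo".
# Rule 2: an inclusive "bar" (no "(inc)" suffix) generates "bar (exc)".
# Passing inc_metrics=["foo (inc)"] would restrict generation to that metric.
gf.generate_exclusive_columns()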

groupby_aggregate(groupby_function, agg_function)[source]

Groupby-aggregate dataframe and reindex the Graph.

Reindex the graph to match the groupby-aggregated dataframe.

Update the frame attributes to contain those columns in the dataframe index.

Parameters:
  • self (graphframe) – self’s graphframe

  • groupby_function – groupby function on dataframe

  • agg_function – aggregate function on dataframe

Returns:

new graphframe with reindexed graph and groupby-aggregated dataframe

Return type:

(GraphFrame)

mul(other)[source]

Returns the column-wise float multiplication of two graphframes as a new graphframe.

This graphframe is the union of self’s and other’s graphs, and does not modify self or other.

Returns:

new graphframe

Return type:

(GraphFrame)

show_metric_columns()[source]

Returns a list of dataframe column labels.

squash(update_inc_cols=True)[source]

Rewrite the Graph to include only nodes present in the DataFrame’s rows.

This can be used to simplify the Graph, or to normalize Graph indexes between two GraphFrames.

Parameters:

update_inc_cols (boolean, optional) – if True, update inclusive columns.
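
A minimal sketch: filter with squash=False, then squash explicitly so the Graph only keeps nodes that survived the filter:

import hatchet as ht

gf = ht.GraphFrame.from_lists(["a", ["b", "c"]])

filtered = gf.filter(lambda row: row["time"] > 0.5, squash=False)
squashed = filtered.squash()  # graph now matches the filtered dataframe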

sub(other)[source]

Returns the column-wise difference of two graphframes as a new graphframe.

This graphframe is the union of self’s and other’s graphs, and does not modify self or other.

Returns:

new graphframe

Return type:

(GraphFrame)

subgraph_sum(columns, out_columns=None, function=<function GraphFrame.<lambda>>)[source]

Compute sum of elements in subgraphs.

For each row in the graph, out_columns will contain the element-wise sum of all values in columns for that row’s node and all of its descendants.

This algorithm is worst-case quadratic in the size of the graph, so we try to call subtree_sum if we can. In general, there is not a particularly efficient algorithm known for subgraph sums, so this does about as well as we know how.

Parameters:
  • columns (list of str) – names of columns to sum (default: all columns)

  • out_columns (list of str) – names of columns to store results (default: in place)

  • function (callable) – associative operator used to sum elements; the sum of an all-NA series is NaN (default: sum(min_count=1))

subtree_sum(columns, out_columns=None, function=<function GraphFrame.<lambda>>)[source]

Compute sum of elements in subtrees. Valid only for trees.

For each row in the graph, out_columns will contain the element-wise sum of all values in columns for that row’s node and all of its descendants.

This algorithm will multiply-count nodes with in-degree higher than one – i.e., it is only correct for trees. Prefer using subgraph_sum (which calls subtree_sum if it can), unless you have a good reason not to.

Parameters:
  • columns (list of str) – names of columns to sum (default: all columns)

  • out_columns (list of str) – names of columns to store results (default: in place)

  • function (callable) – associative operator used to sum elements; the sum of an all-NA series is NaN (default: sum(min_count=1))
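
A minimal sketch using subgraph_sum(), which falls back to the cheaper subtree_sum() when the graph is a tree; the output column name here is illustrative:

import hatchet as ht

gf = ht.GraphFrame.from_lists(["a", ["b", "c"]])

gf.subgraph_sum(["time"], out_columns=["time_subtree"])
print(gf.dataframe[["time", "time_subtree"]])  # root's sum covers all descendants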

to_dict()[source]
to_dot(metric=None, name='name', rank=0, thread=0, threshold=0.0)[source]

Write the graph in the graphviz dot format: https://www.graphviz.org/doc/info/lang.html

to_flamegraph(metric=None, name='name', rank=0, thread=0, threshold=0.0)[source]

Write the graph in the folded stack output required by FlameGraph: http://www.brendangregg.com/flamegraphs.html
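
A minimal sketch, assuming to_flamegraph() returns the folded-stack text as a string that flamegraph.pl can consume:

import hatchet as ht

gf = ht.GraphFrame.from_lists(["a", ["b", "c"]])

folded = gf.to_flamegraph(metric="time")
with open("profile.folded", "w") as f:
    f.write(folded)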

to_hdf(filename, key='hatchet_graphframe', **kwargs)[source]
to_json()[source]
to_literal(name='name', rank=0, thread=0, cat_columns=[])[source]

Format this graph as a list of dictionaries for Roundtrip visualizations.

tree(metric_column=None, annotation_column=None, precision=3, name_column='name', expand_name=False, context_column='file', rank=0, thread=0, depth=10000, highlight_name=False, colormap='RdYlGn', invert_colormap=False, colormap_annotations=None, render_header=True, min_value=None, max_value=None)[source]

Visualize the Hatchet graphframe as a tree.

Parameters:
  • metric_column (str, list, optional) – Column(s) to read metrics from. Defaults to None.

  • annotation_column (str, optional) – Column to use as an annotation. Defaults to None.

  • precision (int, optional) – Precision of shown numbers. Defaults to 3.

  • name_column (str, optional) – Column of the node name. Defaults to “name”.

  • expand_name (bool, optional) – Limits the length of the node name. Defaults to False.

  • context_column (str, optional) – Shows the file this function was called in (Available with HPCToolkit). Defaults to “file”.

  • rank (int, optional) – Specifies the rank to take the data from. Defaults to 0.

  • thread (int, optional) – Specifies the thread to take the data from. Defaults to 0.

  • depth (int, optional) – Sets the maximum depth of the tree. Defaults to 10000.

  • highlight_name (bool, optional) – Highlights the names of the nodes. Defaults to False.

  • colormap (str, optional) – Specifies a colormap to use. Defaults to “RdYlGn”.

  • invert_colormap (bool, optional) – Reverses the chosen colormap. Defaults to False.

  • colormap_annotations (str, list, dict, optional) – Either provide the name of a colormap, a list of colors to use, or a dictionary that maps the used annotations to a color. Defaults to None.

  • render_header (bool, optional) – Shows the preamble. Defaults to True.

  • min_value (int, optional) – Overrides the minimum value for the coloring legend. Defaults to None.

  • max_value (int, optional) – Overrides the maximum value for the coloring legend. Defaults to None.

Returns:

String representation of the tree, ready to print

Return type:

str
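
A minimal sketch:

import hatchet as ht

gf = ht.GraphFrame.from_lists(["a", ["b", "d", "e"], ["c"]])
print(gf.tree(metric_column="time", precision=2, depth=5))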

unify(other)[source]

Returns a unified graphframe.

Ensure self and other have the same graph and same node IDs. This may change the node IDs in the dataframe.

Update the graphs in the graphframe if they differ.

update_inclusive_columns()[source]

Update inclusive columns (typically after operations that rewire the graph).

exception hatchet.graphframe.InvalidFilter[source]

Bases: Exception

Raised when an invalid argument is passed to the filter function.

hatchet.graphframe.parallel_apply(filter_function, dataframe, queue)[source]

A function called in parallel, which does a pandas apply on part of a dataframe and returns the results via a multiprocessing queue.

hatchet.node module

exception hatchet.node.MultiplePathError[source]

Bases: Exception

Raised when a node is asked for a single path but has multiple.

class hatchet.node.Node(frame_obj, parent=None, hnid=-1, depth=-1)[source]

Bases: object

A node in the graph. The node only stores its frame.

add_child(node)[source]

Adds a child to this node’s list of children.

add_parent(node)[source]

Adds a parent to this node’s list of parents.

copy()[source]

Copy this node without preserving parents or children.

dag_equal(other, vs=None, vo=None)[source]

Check if DAG rooted at self has the same structure as that rooted at other.

classmethod from_lists(lists)[source]

Construct a hierarchy of nodes from recursive lists.

For example, this will construct a simple tree:

Node.from_lists(
    ["a",
        ["b", "d", "e"],
        ["c", "f", "g"],
    ]
)
     a
    / \
   b   c
 / |   | \
d  e   f  g

And this will construct a simple diamond DAG:

d = Node(Frame(name="d"))
Node.from_lists(
    ["a",
        ["b", d],
        ["c", d]
    ]
)
  a
 / \
b   c
 \ /
  d

In the above examples, the ‘a’ represents a Node with its frame == Frame(name="a").

node_order_traverse(order='pre', attrs=None, visited=None)[source]

Traverse the tree depth-first and yield each node, sorting children by “node order”.

Parameters:
  • order (str) – “pre” or “post” for preorder or postorder (default: pre)

  • attrs (list or str, optional) – if provided, extract these fields from nodes while traversing and yield them

  • visited (dict, optional) – dictionary in which each visited node’s in-degree will be stored

path(attrs=None)[source]

Path to this node from root. Raises if there are multiple paths.

This is useful for trees (where each node only has one path), as it just gets the only element from self.paths. This will fail with a MultiplePathError if there is more than one path to this node.

paths()[source]

List of tuples, one for each path from this node to any root.

Paths are tuples of node objects.
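
A minimal sketch using the diamond DAG from Node.from_lists() above: the shared node has two paths, so path() raises while paths() returns both:

from hatchet.frame import Frame
from hatchet.node import Node, MultiplePathError

d = Node(Frame(name="d"))
root = Node.from_lists(["a", ["b", d], ["c", d]])

print(len(d.paths()))  # 2: (a, b, d) and (a, c, d)
try:
    d.path()
except MultiplePathError:
    print("d has more than one path to a root")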

traverse(order='pre', attrs=None, visited=None)[source]

Traverse the tree depth-first and yield each node.

Parameters:
  • order (str) – “pre” or “post” for preorder or postorder (default: pre)

  • attrs (list or str, optional) – if provided, extract these fields from nodes while traversing and yield them

  • visited (dict, optional) – dictionary in which each visited node’s in-degree will be stored

hatchet.node.node_traversal_order(node)[source]

Deterministic key function for sorting nodes by specified “node order” (which gets assigned to _hatchet_nid) in traversals.

hatchet.node.traversal_order(node)[source]

Deterministic key function for sorting nodes in traversals.

hatchet.version module

Module contents