hatchet package
Subpackages
- hatchet.cython_modules package
- hatchet.external package
- hatchet.query package
- hatchet.readers package
- Submodules
- hatchet.readers.caliper_native_reader module
- hatchet.readers.caliper_reader module
- hatchet.readers.cprofile_reader module
- hatchet.readers.dataframe_reader module
- hatchet.readers.gprof_dot_reader module
- hatchet.readers.hdf5_reader module
- hatchet.readers.hpctoolkit_reader module
- hatchet.readers.json_reader module
- hatchet.readers.literal_reader module
- hatchet.readers.pyinstrument_reader module
- hatchet.readers.spotdb_reader module
- hatchet.readers.tau_reader module
- hatchet.readers.timemory_reader module
- Module contents
- hatchet.util package
- hatchet.vis package
- hatchet.writers package
Submodules
hatchet.frame module
hatchet.graph module
- class hatchet.graph.Graph(roots)[source]
Bases: object
A possibly multi-rooted tree or graph from one input dataset.
- copy(old_to_new=None)[source]
Create and return a copy of this graph.
- Parameters:
old_to_new (dict, optional) – if provided, this dictionary will be populated with mappings from old node -> new node
- find_merges()[source]
Find nodes that have the same parent and frame.
Find nodes that have the same parent and duplicate frame, and return a mapping from nodes that should be eliminated to nodes they should be merged into.
- Returns:
dictionary from nodes to their merge targets
- Return type:
(dict)
- static from_lists(*roots)[source]
Convenience method to invoke Node.from_lists() on each root value.
- merge_nodes(merges)[source]
Merge some nodes in a graph into others.
merges is a dictionary keyed by old nodes, with values equal to the nodes that they need to be merged into. Old nodes’ parents and children are connected to the new node.
- Parameters:
merges (dict) – dictionary from source nodes -> targets
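The reconnection described above (old nodes' parents and children are redirected to the merge target) can be sketched in plain Python. This is a minimal illustration, not hatchet's implementation; the `SimpleNode` class and `merge_nodes` helper here are hypothetical stand-ins:

```python
class SimpleNode:
    """Minimal stand-in for a graph node (hypothetical, for illustration)."""
    def __init__(self, name):
        self.name = name
        self.parents = []
        self.children = []

def merge_nodes(merges):
    """Redirect each old node's parents and children to its merge target."""
    for old, new in merges.items():
        for parent in old.parents:
            parent.children.remove(old)
            if new not in parent.children:
                parent.children.append(new)
            if parent not in new.parents:
                new.parents.append(parent)
        for child in old.children:
            child.parents.remove(old)
            if new not in child.parents:
                child.parents.append(new)
            if child not in new.children:
                new.children.append(child)

# "a" has two children with the same frame ("b"); merge one into the other
a, b1, b2, c = (SimpleNode(n) for n in ("a", "b", "b", "c"))
a.children = [b1, b2]; b1.parents = [a]; b2.parents = [a]
b2.children = [c]; c.parents = [b2]
merge_nodes({b2: b1})
```

After the merge, `a` has a single "b" child, and `c` hangs off the surviving node.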
- node_order_traverse(order='pre', attrs=None, visited=None)[source]
Preorder traversal of all roots of this Graph, sorting by “node order” column.
- Parameters:
attrs (list or str, optional) – If provided, extract these fields from nodes while traversing and yield them. See traverse() for details.
Only preorder traversal is currently supported.
- traverse(order='pre', attrs=None, visited=None)[source]
Preorder traversal of all roots of this Graph.
- Parameters:
attrs (list or str, optional) – If provided, extract these fields from nodes while traversing and yield them. See traverse() for details.
Only preorder traversal is currently supported.
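The traversal above visits every node of a possibly multi-rooted graph exactly once, even when a node is reachable through several parents. A pure-Python sketch of that idea (not hatchet's implementation; nodes here are plain dicts, and `visited` records how many times each node was reached):

```python
def traverse(roots, order="pre", visited=None):
    """Depth-first traversal of a multi-rooted graph; yields each node once.

    `visited`, if given, is filled with the number of times each node was
    reached, loosely mirroring the `visited` argument described above.
    """
    if visited is None:
        visited = {}

    def dfs(node):
        if id(node) in visited:
            visited[id(node)] += 1
            return
        visited[id(node)] = 1
        if order == "pre":
            yield node
        for child in node.get("children", []):
            yield from dfs(child)
        if order == "post":
            yield node

    for root in roots:
        yield from dfs(root)

# diamond DAG: a -> b, a -> c, b -> d, c -> d
d = {"name": "d"}
b = {"name": "b", "children": [d]}
c = {"name": "c", "children": [d]}
a = {"name": "a", "children": [b, c]}
names = [n["name"] for n in traverse([a])]
```

Here `d` is yielded only on the first encounter (through `b`), so the preorder is a, b, d, c.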
hatchet.graphframe module
- exception hatchet.graphframe.EmptyFilter[source]
Bases: Exception
Raised when a filter would otherwise return an empty GraphFrame.
- class hatchet.graphframe.GraphFrame(graph, dataframe, exc_metrics=None, inc_metrics=None, default_metric='time', metadata={})[source]
Bases: object
An input dataset is read into an object of this type, which includes a graph and a dataframe.
- add(other)[source]
Returns the column-wise sum of two graphframes as a new graphframe.
This graphframe is the union of self’s and other’s graphs, and does not modify self or other.
- Returns:
new graphframe
- Return type:
- copy()[source]
Return a partially shallow copy of the graphframe.
This copies the DataFrame object, but the data is comprised of references. The Graph is shared between self and the new GraphFrame.
- Parameters:
self (GraphFrame) – Object to make a copy of.
- Returns:
- Copy of self
graph (Graph): reference to self’s graph
dataframe (DataFrame): pandas “non-deep” copy of dataframe
exc_metrics (list): copy of self’s exc_metrics
inc_metrics (list): copy of self’s inc_metrics
default_metric (str): N/A
metadata (dict): copy of self’s metadata
- Return type:
other (GraphFrame)
- deepcopy()[source]
Return a deep copy of the graphframe.
- Parameters:
self (GraphFrame) – Object to make a copy of.
- Returns:
- Copy of self
graph (Graph): deep copy of self’s graph
dataframe (DataFrame): pandas “deep” copy with node objects updated to match graph from “node_clone”
exc_metrics (list): copy of self’s exc_metrics
inc_metrics (list): copy of self’s inc_metrics
default_metric (str): N/A
metadata (dict): copy of self’s metadata
- Return type:
other (GraphFrame)
- div(other)[source]
Returns the column-wise float division of two graphframes as a new graphframe.
This graphframe is the union of self’s and other’s graphs, and does not modify self or other.
- Returns:
new graphframe
- Return type:
- filter(filter_obj, squash=True, update_inc_cols=True, num_procs=2, rec_limit=1000, multi_index_mode='off')[source]
Filter the dataframe using a user-supplied function.
Note: Operates in parallel on user-supplied lambda functions.
- Parameters:
filter_obj (callable, list, or QueryMatcher) – the filter to apply to the GraphFrame.
squash (boolean, optional) – if True, automatically call squash for the user.
update_inc_cols (boolean, optional) – if True, update inclusive columns when performing squash.
rec_limit – set the Python recursion limit; increase it if running into recursion depth errors (default: 1000).
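The core of the filter step is applying a user-supplied callable to each dataframe row and keeping the matches. A pure-Python sketch of that flow, without hatchet or pandas (the `filter_rows` helper and the row dicts are hypothetical):

```python
# rows stand in for the GraphFrame's dataframe: one dict per node
rows = [
    {"name": "main",  "time": 10.0},
    {"name": "solve", "time": 40.0},
    {"name": "io",    "time": 0.5},
]

def filter_rows(rows, filter_obj):
    """Keep rows for which the user-supplied callable returns True,
    as gf.filter() does with a callable filter_obj (sketch only)."""
    return [row for row in rows if filter_obj(row)]

kept = filter_rows(rows, lambda row: row["time"] > 5.0)
kept_names = [row["name"] for row in kept]
```

With `squash=True`, hatchet would then rewrite the graph to contain only the surviving nodes (see squash() below in this module).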
- static from_caliper(filename_or_stream, query=None)[source]
Read in a Caliper .cali or .json file.
- Parameters:
filename_or_stream (str or file-like) – name of a Caliper output file in .cali or JSON-split format, or an open file object to read one
query (str) – cali-query in CalQL format
- static from_caliperreader(filename_or_caliperreader, native=False, string_attributes=[])[source]
Read in a native Caliper cali file using Caliper’s python reader.
- Parameters:
filename_or_caliperreader (str or CaliperReader) – name of a Caliper output file in .cali format, or a CaliperReader object
native (bool) – if True, use native metric names; otherwise use user-readable metric names (the default)
string_attributes (str or list, optional) – Adds existing string attributes from within the caliper file to the dataframe
- static from_cprofile(filename)[source]
Read in a pstats/prof file generated using python’s cProfile.
- static from_hpctoolkit(dirname)[source]
Read an HPCToolkit database directory into a new GraphFrame.
- Parameters:
dirname (str) – parent directory of an HPCToolkit experiment.xml file
- Returns:
new GraphFrame containing HPCToolkit profile data
- Return type:
- static from_lists(*lists)[source]
Make a simple GraphFrame from lists.
This creates a Graph from lists (see Graph.from_lists()) and uses it as the index for a new GraphFrame. Every node in the new graph has an exclusive time of 1, and inclusive time is computed automatically.
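The inclusive-time computation described above amounts to a bottom-up sum over each node's subtree: with exclusive time 1 everywhere, a node's inclusive time is the size of its subtree. A sketch over the same nested-list shape, without hatchet (the `inclusive_times` helper is hypothetical):

```python
def inclusive_times(node):
    """Return {name: inclusive_time} for a nested-list tree where every
    node has exclusive time 1, as the text above describes."""
    times = {}

    def visit(n):
        if isinstance(n, str):
            times[n] = 1
            return 1
        name, *children = n
        total = 1 + sum(visit(c) for c in children)
        times[name] = total
        return total

    visit(node)
    return times

times = inclusive_times(["a", ["b", "d", "e"], ["c", "f", "g"]])
```

For this seven-node tree, the root "a" gets inclusive time 7 and each leaf gets 1.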
- static from_spotdb(db_key, list_of_ids=None)[source]
Read multiple graph frames from a SpotDB instance
- Parameters:
db_key (str or SpotDB object) –
locator for a SpotDB instance. This can be a SpotDB object directly, or a locator for a spot database, which is a string with one of:
A directory of .cali files,
A .sqlite file name, or
A SQL database URL (e.g., “mysql://hostname/db”)
list_of_ids – The list of run IDs to read from the database. If this is None, returns all runs.
- Returns:
A list of graphframes, one for each requested run that was found
- static from_timemory(input=None, select=None, **_kwargs)[source]
Read in timemory data.
- Parameters:
input (str or file-stream or dict or None) –
Valid argument types are:
Filename for a timemory JSON tree file
Open file stream to one of these files
Dictionary from timemory JSON tree
Currently, timemory supports two JSON layouts: flat and tree. The former is a 1D-array representation of the hierarchy which represents the hierarchy via indentation schemes in the labels and is not compatible with hatchet. The latter is a hierarchical representation of the data and is the required JSON layout when using hatchet. Timemory JSON tree files typically have the extension “.tree.json”.
If input is None, this assumes that timemory has been recording data within the application that is using hatchet. In this situation, this method will attempt to import the data directly from timemory.
At the time of this writing, the direct data import will:
Stop any currently collecting components
Aggregate child thread data of the calling thread
Clear all data on the child threads
Aggregate the data from any MPI and/or UPC++ ranks.
Thus, if MPI or UPC++ is used, every rank must call this routine. The zeroth rank will have the aggregation and all the other non-zero ranks will only have the rank-specific data.
Whether or not the per-thread and per-rank data itself is combined is controlled by the collapse_threads and collapse_processes attributes in the timemory.settings submodule.
In the C++ API, it is possible for only #1 to be applied and data can be obtained for an individual thread and/or rank without aggregation. This is not currently available to Python, however, it can be made available upon request via a GitHub Issue.
select (list of str) – A list of strings which match the component enumeration names, e.g. [“cpu_clock”].
per_thread (boolean) – Ensures that when applying filters to the graphframe, frames with identical name/file/line/etc. info but from different threads are not combined
per_rank (boolean) – Ensures that when applying filters to the graphframe, frames with identical name/file/line/etc. info but from different ranks are not combined
- generate_exclusive_columns(inc_metrics=None)[source]
Generates exclusive metrics from available inclusive metrics.
- Parameters:
inc_metrics (str, list, optional) – Instead of generating the exclusive time for each inclusive metric, it is possible to specify those metrics manually. Defaults to None.
Currently, this function determines which metrics to generate by looking for one of two things:
An inclusive metric ending in “(inc)” that does not have an exclusive metric with the same name (minus “(inc)”)
An inclusive metric not ending in “(inc)”
The metrics that are generated will have one of two name formats:
If the corresponding inclusive metric’s name ends in “(inc)”, the exclusive metric will have the same name, minus “(inc)”
If the corresponding inclusive metric’s name does not end in “(inc)”, the exclusive metric will have the same name as the inclusive metric, followed by a “(exc)” suffix
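The two naming rules above can be captured in a small helper. This is a hypothetical sketch; in particular, the exact suffix spelling `" (inc)"` (with a leading space) is an assumption about how the metric names are formatted:

```python
def exclusive_name(inc_metric):
    """Name of the exclusive metric generated for an inclusive metric,
    following the two rules described above (sketch; the " (inc)"
    suffix spelling is an assumption)."""
    suffix = " (inc)"
    if inc_metric.endswith(suffix):
        return inc_metric[: -len(suffix)]
    return inc_metric + " (exc)"
```

So an inclusive metric named "time (inc)" would yield an exclusive metric "time", while one named "bytes" would yield "bytes (exc)".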
- groupby_aggregate(groupby_function, agg_function)[source]
Groupby-aggregate dataframe and reindex the Graph.
Reindex the graph to match the groupby-aggregated dataframe.
Update the frame attributes to contain those columns in the dataframe index.
- Parameters:
self (graphframe) – self’s graphframe
groupby_function – groupby function on dataframe
agg_function – aggregate function on dataframe
- Returns:
new graphframe with reindexed graph and groupby-aggregated dataframe
- Return type:
- mul(other)[source]
Returns the column-wise float multiplication of two graphframes as a new graphframe.
This graphframe is the union of self’s and other’s graphs, and does not modify self or other.
- Returns:
new graphframe
- Return type:
- squash(update_inc_cols=True)[source]
Rewrite the Graph to include only nodes present in the DataFrame’s rows.
This can be used to simplify the Graph, or to normalize Graph indexes between two GraphFrames.
- Parameters:
update_inc_cols (boolean, optional) – if True, update inclusive columns.
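The rewrite described above keeps only the nodes still present in the dataframe and re-parents each survivor to its nearest surviving ancestor. A pure-Python sketch of that idea on a plain dict tree (not hatchet's implementation; the `squash` helper here is hypothetical):

```python
def squash(root, keep):
    """Rebuild a tree keeping only names in `keep`; each kept node is
    re-parented to its nearest kept ancestor (sketch of the idea only)."""
    new_roots = []

    def visit(node, kept_parent):
        name = node["name"]
        if name in keep:
            new_node = {"name": name, "children": []}
            if kept_parent is None:
                new_roots.append(new_node)
            else:
                kept_parent["children"].append(new_node)
            kept_parent = new_node
        for child in node.get("children", []):
            visit(child, kept_parent)

    visit(root, None)
    return new_roots

tree = {"name": "a", "children": [
    {"name": "b", "children": [{"name": "d", "children": []}]},
    {"name": "c", "children": []},
]}
roots = squash(tree, keep={"a", "d"})
```

Dropping "b" makes "d" a direct child of "a"; dropping "c" removes that branch entirely.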
- sub(other)[source]
Returns the column-wise difference of two graphframes as a new graphframe.
This graphframe is the union of self’s and other’s graphs, and does not modify self or other.
- Returns:
new graphframe
- Return type:
- subgraph_sum(columns, out_columns=None, function=<function GraphFrame.<lambda>>)[source]
Compute sum of elements in subgraphs.
For each row in the graph, out_columns will contain the element-wise sum of all values in columns for that row’s node and all of its descendants.
This algorithm is worst-case quadratic in the size of the graph, so we try to call subtree_sum if we can. In general, there is not a particularly efficient algorithm known for subgraph sums, so this does about as well as we know how.
- Parameters:
columns (list of str) – names of columns to sum (default: all columns)
out_columns (list of str) – names of columns to store results (default: in place)
function (callable) – associative operator used to sum elements, sum of an all-NA series is NaN (default: sum(min_count=1))
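On a general graph, the subgraph sum has to collect each node's full set of distinct descendants, one DFS per node, which is what makes it worst-case quadratic. A sketch of that approach (hypothetical dict-based nodes, not hatchet's implementation):

```python
def descendants(node):
    """All distinct nodes reachable from `node`, including itself."""
    seen = {}
    stack = [node]
    while stack:
        n = stack.pop()
        if id(n) in seen:
            continue
        seen[id(n)] = n
        stack.extend(n.get("children", []))
    return seen.values()

def subgraph_sum(nodes, column):
    """For every node, sum `column` over the node and all its distinct
    descendants; one DFS per node, hence worst-case quadratic (sketch)."""
    return {n["name"]: sum(m[column] for m in descendants(n)) for n in nodes}

# diamond DAG: "d" is reachable from both "b" and "c", but counted once in "a"
d = {"name": "d", "time": 1}
b = {"name": "b", "time": 2, "children": [d]}
c = {"name": "c", "time": 3, "children": [d]}
a = {"name": "a", "time": 4, "children": [b, c]}
totals = subgraph_sum([a, b, c, d], "time")
```

Because each DFS deduplicates, "a" totals 10 here even though "d" sits on two paths.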
- subtree_sum(columns, out_columns=None, function=<function GraphFrame.<lambda>>)[source]
Compute sum of elements in subtrees. Valid only for trees.
For each row in the graph, out_columns will contain the element-wise sum of all values in columns for that row’s node and all of its descendants.
This algorithm will multiply-count nodes with in-degree higher than one – i.e., it is only correct for trees. Prefer using subgraph_sum (which calls subtree_sum if it can), unless you have a good reason not to.
- Parameters:
columns (list of str) – names of columns to sum (default: all columns)
out_columns (list of str) – names of columns to store results (default: in place)
function (callable) – associative operator used to sum elements, sum of an all-NA series is NaN (default: sum(min_count=1))
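A subtree sum can instead accumulate in a single post-order pass, which is linear on a tree but multiply-counts shared nodes on a DAG, matching the caveat above. A sketch (hypothetical dict-based nodes, not hatchet's implementation):

```python
def subtree_sum(node, column, out=None):
    """Post-order accumulation: each node's total is its own value plus
    its children's totals. Linear on a tree; on a DAG, a node with
    in-degree > 1 contributes once per parent (sketch)."""
    if out is None:
        out = {}
    total = node[column] + sum(
        subtree_sum(child, column, out) for child in node.get("children", [])
    )
    out[node["name"]] = total
    return total

# diamond DAG: "d" is counted twice in a's total (once via b, once via c)
d = {"name": "d", "time": 1}
b = {"name": "b", "time": 2, "children": [d]}
c = {"name": "c", "time": 3, "children": [d]}
a = {"name": "a", "time": 4, "children": [b, c]}
out = {}
subtree_sum(a, "time", out)
```

On this diamond, "a" totals 11 rather than the correct subgraph sum of 10, because "d" is reached through both parents.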
- to_dot(metric=None, name='name', rank=0, thread=0, threshold=0.0)[source]
Write the graph in the graphviz dot format: https://www.graphviz.org/doc/info/lang.html
- to_flamegraph(metric=None, name='name', rank=0, thread=0, threshold=0.0)[source]
Write the graph in the folded stack output required by FlameGraph http://www.brendangregg.com/flamegraphs.html
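FlameGraph's folded-stack input is one line per call path: frame names joined by ";", then a space and a value. A sketch of producing that format from (path, value) pairs (not hatchet's writer; the `to_folded_stacks` helper is hypothetical):

```python
def to_folded_stacks(samples):
    """Render (path, value) pairs in FlameGraph's folded-stack format:
    one 'frame;frame;frame value' line per path (sketch)."""
    return "\n".join(
        ";".join(path) + " " + str(value) for path, value in samples
    )

folded = to_folded_stacks([
    (("main",), 10),
    (("main", "solve"), 40),
    (("main", "io"), 5),
])
```

The resulting text can be fed directly to flamegraph.pl.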
- to_literal(name='name', rank=0, thread=0, cat_columns=[])[source]
Format this graph as a list of dictionaries for Roundtrip visualizations.
- tree(metric_column=None, annotation_column=None, precision=3, name_column='name', expand_name=False, context_column='file', rank=0, thread=0, depth=10000, highlight_name=False, colormap='RdYlGn', invert_colormap=False, colormap_annotations=None, render_header=True, min_value=None, max_value=None)[source]
Visualize the Hatchet graphframe as a tree
- Parameters:
metric_column (str, list, optional) – Columns to use the metrics from. Defaults to None.
annotation_column (str, optional) – Column to use as an annotation. Defaults to None.
precision (int, optional) – Precision of shown numbers. Defaults to 3.
name_column (str, optional) – Column of the node name. Defaults to “name”.
expand_name (bool, optional) – Limits the length of the node name. Defaults to False.
context_column (str, optional) – Shows the file this function was called in (Available with HPCToolkit). Defaults to “file”.
rank (int, optional) – Specifies the rank to take the data from. Defaults to 0.
thread (int, optional) – Specifies the thread to take the data from. Defaults to 0.
depth (int, optional) – Sets the maximum depth of the tree. Defaults to 10000.
highlight_name (bool, optional) – Highlights the names of the nodes. Defaults to False.
colormap (str, optional) – Specifies a colormap to use. Defaults to “RdYlGn”.
invert_colormap (bool, optional) – Reverses the chosen colormap. Defaults to False.
colormap_annotations (str, list, dict, optional) – Either provide the name of a colormap, a list of colors to use, or a dictionary mapping the used annotations to colors. Defaults to None.
render_header (bool, optional) – Shows the Preamble. Defaults to True.
min_value (int, optional) – Overwrites the min value for the coloring legend. Defaults to None.
max_value (int, optional) – Overwrites the max value for the coloring legend. Defaults to None.
- Returns:
String representation of the tree, ready to print
- Return type:
str
hatchet.node module
- exception hatchet.node.MultiplePathError[source]
Bases: Exception
Raised when a node is asked for a single path but has multiple.
- class hatchet.node.Node(frame_obj, parent=None, hnid=-1, depth=-1)[source]
Bases: object
A node in the graph. The node only stores its frame.
- dag_equal(other, vs=None, vo=None)[source]
Check if DAG rooted at self has the same structure as that rooted at other.
- classmethod from_lists(lists)[source]
Construct a hierarchy of nodes from recursive lists.
For example, this will construct a simple tree:
Node.from_lists(["a", ["b", "d", "e"], ["c", "f", "g"]])

     a
    / \
   b   c
  / |  | \
 d  e  f  g
And this will construct a simple diamond DAG:
d = Node(Frame(name="d"))
Node.from_lists(["a", ["b", d], ["c", d]])

   a
  / \
 b   c
  \ /
   d
In the above examples, ‘a’ represents a Node with its frame == Frame(name="a").
- node_order_traverse(order='pre', attrs=None, visited=None)[source]
Traverse the tree depth-first and yield each node, sorting children by “node order”.
- Parameters:
order (str) – “pre” or “post” for preorder or postorder (default: pre)
attrs (list or str, optional) – if provided, extract these fields from nodes while traversing and yield them
visited (dict, optional) – dictionary in which each visited node’s in-degree will be stored
- path(attrs=None)[source]
Path to this node from root. Raises if there are multiple paths.
This is useful for trees (where each node only has one path), as it just gets the only element from self.paths. This will fail with a MultiplePathError if there is more than one path to this node.
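The paths()/path() pair can be sketched over nodes with parent links: collect every route back to a root, and raise when a single path is requested but several exist. A hypothetical pure-Python illustration (hatchet's actual paths are tuples of node objects, not names):

```python
class MultiplePathError(Exception):
    """Raised when a single path is requested but several exist (sketch)."""

def paths(node):
    """All root-to-node paths, as tuples of names (sketch)."""
    if not node.get("parents"):
        return [(node["name"],)]
    return [p + (node["name"],)
            for parent in node["parents"] for p in paths(parent)]

def path(node):
    ps = paths(node)
    if len(ps) > 1:
        raise MultiplePathError("node has multiple paths to a root")
    return ps[0]

# diamond: "d" has two parents, so path(d) raises
a = {"name": "a", "parents": []}
b = {"name": "b", "parents": [a]}
c = {"name": "c", "parents": [a]}
d = {"name": "d", "parents": [b, c]}

single = path(b)
try:
    path(d)
    raised = False
except MultiplePathError:
    raised = True
```

`path(b)` succeeds because the tree portion has one route; `path(d)` raises because the diamond gives it two.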
- paths()[source]
List of tuples, one for each path from this node to any root.
Paths are tuples of node objects.
- traverse(order='pre', attrs=None, visited=None)[source]
Traverse the tree depth-first and yield each node.
- Parameters:
order (str) – “pre” or “post” for preorder or postorder (default: pre)
attrs (list or str, optional) – if provided, extract these fields from nodes while traversing and yield them
visited (dict, optional) – dictionary in which each visited node’s in-degree will be stored