dcarte package

Submodules

dcarte.config module

dcarte.config.baseline_config(home: pathlib.Path, root: pathlib.Path, files: list) → dict

baseline_config creates a baseline config dict

Parameters
  • home (Path) – [description]

  • root (Path) – [description]

Returns

the baseline config dict

Return type

dict

dcarte.config.create_config(home: pathlib.Path, root: pathlib.Path, dcarte_home: pathlib.Path)

create_config creates a baseline config file

Parameters
  • home (Path) – [description]

  • root (Path) – [description]

Returns

[description]

Return type

[type]

dcarte.config.get_config() → dict

get_config returns the local config, creating the config file first if it does not exist

Parameters
  • config_file (str, optional) – [description]. Defaults to ‘/dcarte/config.yaml’.

  • root (Path, optional) – [description]. Defaults to Path(‘__file__’).parent.absolute().

  • home (Path, optional) – [description]. Defaults to Path(‘~’).expanduser().

Returns

a dict containing all the configuration information needed for dcarte

Return type

dict
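
The "return or create" behaviour described for get_config can be illustrated with a minimal sketch. It is not dcarte's implementation: the name get_config_sketch, the JSON file format (chosen so the example needs only the standard library, whereas dcarte stores YAML) and the keys written to the file are all assumptions made for illustration.

```python
import json
import tempfile
from pathlib import Path


def get_config_sketch(home: Path, config_file: str = "dcarte/config.json") -> dict:
    """Return the stored config, creating a baseline file on first use."""
    path = home / config_file
    if path.exists():
        # Later calls simply read the existing file back.
        return json.loads(path.read_text())
    # Hypothetical baseline content; the real keys are not documented here.
    config = {"home": str(home), "data_folder": str(home / "dcarte" / "data")}
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(config, indent=2))
    return config


with tempfile.TemporaryDirectory() as tmp:
    first = get_config_sketch(Path(tmp))   # creates the file
    second = get_config_sketch(Path(tmp))  # reads it back
```

Both calls return the same dict, which is the property a caller relies on when asking for the config repeatedly.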

dcarte.config.get_mac() → str

get_mac returns the MAC address of the compute node or computer

Returns

the MAC address of the compute node or computer

Return type

str
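
One way to obtain a MAC address as a string with only the standard library is sketched below. Whether get_mac uses uuid.getnode and which string format it returns are assumptions; get_mac_sketch is a hypothetical name.

```python
import uuid


def get_mac_sketch() -> str:
    """Return this machine's hardware address as six colon-separated hex pairs."""
    # uuid.getnode() yields a 48-bit integer; Python falls back to a random
    # number if no hardware address can be determined.
    node = uuid.getnode()
    raw = f"{node:012x}"
    return ":".join(raw[i:i + 2] for i in range(0, 12, 2))


mac = get_mac_sketch()
print(mac)
```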

dcarte.config.get_token() → str

get_token opens the access-tokens website to create a unique REST token

Returns

a token generated at https://research.minder.care/portal/access-tokens

Return type

str

dcarte.config.update_config()

update_config updates the central config file with data from new_dict

Parameters
  • new_dict (dict) – [description]

  • home (Path, optional) – [description]. Defaults to Path(‘~’).expanduser().

dcarte.domains module

dcarte.domains.domains()

domains prints the current potential local domains as a table to stdout

[extended_summary]

dcarte.load module

dcarte.load.get_defaults(**kwargs)

get_defaults [summary]

[extended_summary]

Returns

[description]

Return type

[type]

dcarte.load.load(dataset: str, domain: str, **kwargs)

load [summary]

[extended_summary]

Parameters
  • dataset (str) – [description]

  • domain (str) – [description]

Raises

Exception – [description]

Returns

[description]

Return type

[type]

dcarte.local module

class dcarte.local.LocalDataset()

Bases: object

LocalDataset [summary]

[extended_summary]

Parameters
  • dataset_name ([str]) – [description]

  • datasets ([list]) – [description]

  • pipeline ([list]) – [description]

  • domain ([str]) – [description]

  • module ([list]) – [description]

  • dependencies ([list]) – [description]

  • since ([list]) – [description]

  • until ([list]) – [description]

  • delay ([list]) – [description]

  • reload ([list]) – [description]

  • reapply ([list]) – [description]

  • update ([list]) – [description]

  • home ([Path]) – [description]

  • compression ([str]) – [description]

  • data_folder ([str]) – [description]

  • data ([pd.DataFrame]) – [description]

Returns

[description]

Return type

[type]

compression: str = 'GZIP'
data: pandas.core.frame.DataFrame
data_folder: str = '/Users/eyalsoreq/dcarte/data'
dataset_name: str
datasets: dict
delay: float = 1
dependencies: list
domain: str
home: pathlib.Path = '/Users/eyalsoreq'
load_dataset()

load_dataset [summary]

[extended_summary]

load_metadata()

load_metadata [summary]

[extended_summary]

Returns

[description]

Return type

[type]

module: str = 'base'
pipeline: list
process_dataset()

process_dataset [summary]

[extended_summary]

reapply: bool = False
reapply_dataset()

reapply_dataset [summary]

[extended_summary]

register_dataset() → None

register_dataset [summary]

[extended_summary]

reload: bool = False
save_dataset() → None

save_dataset [summary]

[extended_summary]

since: str = '2019-04-01'
until: str = '2022-01-17T11:14:24.488976Z'
update: bool = False
update_dataset()

update_dataset [summary]

[extended_summary]

update_metadata()

update_metadata [summary]

[extended_summary]

dcarte.minder module

class dcarte.minder.MinderDataset()

Bases: object

MinderDataset class handles the downloading of datasets from the Minder research platform

[extended_summary]

Parameters
  • dataset_name ([str]) – [description]

  • datasets ([list]) – [description]

  • columns ([list]) – [description]

  • domain ([str]) – [description]

  • dtypes ([list]) – [description]

  • since ([list]) – [description]

  • until ([list]) – [description]

  • delay ([list]) – [description]

  • auth ([list]) – [description]

  • headers ([list]) – [description]

  • server ([list]) – [description]

  • token ([list]) – [description]

  • compression ([list]) – [description]

  • data_folder ([list]) – [description]

  • data ([list]) – [description]

  • request_id ([list]) – [description]

  • reload ([list]) – [description]

  • reapply ([list]) – [description]

  • update ([list]) – [description]

Raises

Exception – [description]

Returns

[description]

Return type

[type]

append_dataset()
columns: list
compression: str = 'GZIP'
data: pandas.core.frame.DataFrame
data_folder: str = '/Users/eyalsoreq/dcarte/data'
dataset_name: str
datasets: list
delay: float = 1
domain: str
download_data()
download_dataset()
dtypes: list
get_output()
headers: dict
home: pathlib.Path = '/Users/eyalsoreq'
load_dataset()
load_metadata()
post_request()
process_request()
reapply: bool = False
reload: bool = False
request_id: str = ''
save_dataset()
server: str = 'https://research.minder.care/api/export'
since: str = '2019-04-01'
until: str = '2022-01-17T11:14:24.552084Z'
update: bool = False
update_dataset()
update_metadata()

dcarte.utils module

class dcarte.utils.BearerAuth(token)

Bases: requests.auth.AuthBase

BearerAuth attaches a token to requests sent with the requests library

Parameters

requests ([type]) – [description]
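
The requests library calls an auth object with the outgoing request and expects the (possibly modified) request back. The sketch below shows that callable pattern without importing requests, so it runs anywhere: BearerAuthSketch and the stand-in request object are hypothetical, and the "Bearer <token>" header format is an assumption about what dcarte sends. The documented class derives from requests.auth.AuthBase instead of being a plain class.

```python
from types import SimpleNamespace


class BearerAuthSketch:
    """Callable that attaches a bearer token to an outgoing request."""

    def __init__(self, token: str):
        self.token = token

    def __call__(self, request):
        # Set the Authorization header and hand the request back to the caller.
        request.headers["Authorization"] = f"Bearer {self.token}"
        return request


# Stand-in for a prepared request: anything with a mutable `headers` mapping.
fake_request = SimpleNamespace(headers={})
signed = BearerAuthSketch("my-token")(fake_request)
print(signed.headers)
```

With the real library an instance would be passed through the auth argument, for example to requests.get.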

dcarte.utils.angles_to_time(day)

angles_to_time [summary]

[extended_summary]

Parameters
  • angles ([type]) – [description]

  • day ([type], optional) – [description]. Defaults to 24*60**2.

Returns

[description]

Return type

[type]
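
The default day=24*60**2 (86400 seconds) suggests that times of day are mapped onto a circle. A minimal sketch of that mapping and its inverse follows, assuming times are seconds since midnight and angles are radians; the function names are hypothetical and the library's actual units may differ.

```python
import math

DAY = 24 * 60 ** 2  # seconds in a day, matching the documented default


def time_to_angle_sketch(seconds: float, day: float = DAY) -> float:
    """Map seconds since midnight onto an angle in radians."""
    return 2 * math.pi * seconds / day


def angle_to_time_sketch(angle: float, day: float = DAY) -> float:
    """Map an angle in radians back to seconds since midnight."""
    # The modulo folds negative or oversized angles into one full turn.
    return (angle % (2 * math.pi)) * day / (2 * math.pi)


six_am = 6 * 3600
angle = time_to_angle_sketch(six_am)       # a quarter turn
round_trip = angle_to_time_sketch(angle)   # back to 06:00
```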

dcarte.utils.between_time(df, factor, start_time, end_time)

dcarte.utils.date2iso()

date2iso converts a date string to ISO format

Parameters
  • date (str) – [description]

  • output_fmt (str, optional) – [description]. Defaults to ‘%Y-%m-%dT%H:%M:%S.%f’.

Returns

[description]

Return type

[type]
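
A minimal sketch of such a conversion, using the documented default output_fmt. The accepted input formats are not documented, so parsing with datetime.fromisoformat is an assumption, and date2iso_sketch is a hypothetical name.

```python
from datetime import datetime


def date2iso_sketch(date: str, output_fmt: str = "%Y-%m-%dT%H:%M:%S.%f") -> str:
    """Parse an ISO-like date string and re-emit it in output_fmt."""
    parsed = datetime.fromisoformat(date)  # accepts e.g. "2022-01-17"
    return parsed.strftime(output_fmt)


print(date2iso_sketch("2022-01-17"))
```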

dcarte.utils.epoch_to_local() → pandas.core.series.Series

epoch_to_local converts epoch timestamps to local time

[extended_summary]

Parameters
  • dt (pd.Series) – [description]

  • tz (str, optional) – [description]. Defaults to ‘Europe/London’.

  • unit (str, optional) – [description]. Defaults to ‘s’.

  • shift (int, optional) – [description]. Defaults to 0.

Returns

[description]

Return type

pd.Series

dcarte.utils.inject_metadata(table: pyarrow.lib.table, meta_content: dict) → pyarrow.lib.Table

inject_metadata replaces the metadata of a pyarrow table (as used for parquet files)

[extended_summary]

Parameters
  • table (pa.table) – [description]

  • meta_content (dict) – [description]

Returns

[description]

Return type

pa.Table

dcarte.utils.isnotebook() → bool

isnotebook checks if the run environment is a jupyter notebook

Returns

True if the run environment is a Jupyter notebook, False otherwise

Return type

bool
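
A commonly used way to make this check is to look at the class of the active IPython shell. The sketch below follows that recipe; whether isnotebook does the same is an assumption, and isnotebook_sketch is a hypothetical name.

```python
def isnotebook_sketch() -> bool:
    """Return True only when running inside a Jupyter (ZMQ) kernel."""
    try:
        from IPython import get_ipython
    except ImportError:
        # IPython is not installed, so this cannot be a notebook.
        return False
    shell = get_ipython()  # None when running under plain Python
    return shell is not None and shell.__class__.__name__ == "ZMQInteractiveShell"


in_notebook = isnotebook_sketch()
print(in_notebook)
```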

dcarte.utils.lagged_df()

lagged_df returns a lagged dataframe for a specific factor

[extended_summary]

Parameters
  • df (pd.DataFrame) – [description]

  • factor (str, optional) – [description]. Defaults to ‘activity’.

  • lags (int, optional) – [description]. Defaults to 7.

Returns

[description]

Return type

[type]

dcarte.utils.load_csv_from_zip(_zip, csv_file)

dcarte.utils.load_yaml(local_file: str) → dict

load_yaml loads a yaml file into a dictionary

Parameters

local_file (str) – [description]

Returns

[description]

Return type

dict

dcarte.utils.load_zip_csv(zip_file: str, csv_file: str) → pandas.core.frame.DataFrame

load_zip_csv returns a specific csv file from a zip file

Parameters
  • zip_file (str) – [description]

  • csv_file (str) – [description]

Returns

[description]

Return type

pd.DataFrame
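
Reading one named csv member out of a zip archive can be sketched with the standard library alone. Unlike load_zip_csv, this hypothetical load_zip_csv_sketch returns a list of row dicts instead of a pandas DataFrame, and the file name and columns in the demo are invented.

```python
import csv
import io
import zipfile


def load_zip_csv_sketch(zip_file, csv_file: str) -> list:
    """Return the rows of one csv member of a zip archive as dicts."""
    with zipfile.ZipFile(zip_file) as archive:
        with archive.open(csv_file) as raw:
            # Members are opened in binary mode, so wrap them for the csv module.
            text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
            return list(csv.DictReader(text))


# Build a small in-memory archive to exercise the function.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("activity.csv", "patient_id,value\np1,3\np2,5\n")
buffer.seek(0)

rows = load_zip_csv_sketch(buffer, "activity.csv")
```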

dcarte.utils.localize_time(timezones)

localize_time [summary]

[extended_summary]

Parameters
  • df (pd.DataFrame) – [description]

  • factors (list) – [description]

  • timezones ([type], optional) – [description]. Defaults to None.

Returns

[description]

Return type

[type]

dcarte.utils.mean_time(kind)

mean_time [summary]

[extended_summary]

Parameters
  • times ([type]) – [description]

  • kind (str, optional) – [description]. Defaults to ‘time’.

Returns

[description]

Return type

[type]
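
Averaging times of day arithmetically breaks around midnight (23:00 and 01:00 would average to noon). The sketch below shows a circular mean, assuming that is what mean_time computes and that times are seconds since midnight; mean_time_sketch is a hypothetical name and the kind parameter is not modelled.

```python
import math

DAY = 24 * 60 ** 2  # seconds in a day


def mean_time_sketch(times, day: float = DAY) -> float:
    """Circular mean of times of day, in seconds since midnight."""
    angles = [2 * math.pi * t / day for t in times]
    # Average the unit vectors, then convert the resulting direction back.
    sin_sum = sum(math.sin(a) for a in angles)
    cos_sum = sum(math.cos(a) for a in angles)
    mean_angle = math.atan2(sin_sum, cos_sum) % (2 * math.pi)
    return mean_angle * day / (2 * math.pi)


morning = mean_time_sketch([6 * 3600, 8 * 3600])         # close to 07:00
around_midnight = mean_time_sketch([23 * 3600, 1 * 3600])
# Distance to midnight measured on the circle, since the result may land
# just before or just after the wrap-around point.
distance_to_midnight = min(around_midnight, DAY - around_midnight)
```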

dcarte.utils.merge_dicts(d1: dict, d2: dict)

merge_dicts merges two dictionaries

Parameters
  • d1 (dict) – [description]

  • d2 (dict) – [description]

Yields

[type] – [description]
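
Because merge_dicts is documented as yielding values, a caller would wrap it in dict(). The sketch below is a well-known recursive generator recipe with that shape; that nested dicts are merged recursively and that d2 wins on conflicts are assumptions, and merge_dicts_sketch is a hypothetical name.

```python
def merge_dicts_sketch(d1: dict, d2: dict):
    """Yield (key, value) pairs of the merge of d1 and d2."""
    for key in set(d1) | set(d2):
        if key in d1 and key in d2:
            if isinstance(d1[key], dict) and isinstance(d2[key], dict):
                # Both sides hold a dict: merge them recursively.
                yield key, dict(merge_dicts_sketch(d1[key], d2[key]))
            else:
                # Conflict on a non-dict value: the second dict takes precedence.
                yield key, d2[key]
        elif key in d1:
            yield key, d1[key]
        else:
            yield key, d2[key]


base = {"a": 1, "b": {"x": 1, "y": 2}}
override = {"b": {"y": 3, "z": 4}, "c": 5}
merged = dict(merge_dicts_sketch(base, override))
```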

dcarte.utils.path_exists(local_file: str)

path_exists checks if a file exists in the local filesystem

[extended_summary]

Parameters

local_file (str) – [description]

Returns

[description]

Return type

[type]

dcarte.utils.process_transition(covariates) → pandas.core.frame.DataFrame

process_transition converts a timeseries DataFrame with datetimes to a transition DataFrame

Parameters
  • df (pd.DataFrame) – [description]

  • groupby (list) – [description]

  • datetime (str) – [description]

  • value (str) – [description]

  • covariates ([type], optional) – [description]. Defaults to None.

Returns

[description]

Return type

[type]

dcarte.utils.read_metadata(filename: str)

read_metadata returns only the metadata from a parquet pyspark file

Parameters

filename (str) – [description]

Returns

[description]

Return type

[type]

dcarte.utils.read_table() → pandas.core.frame.DataFrame

read_table reads a parquet pyspark file

Parameters
  • filename (str) – filename, given as either a relative or an absolute path

  • columns (list, optional) – specific columns to load. Defaults to None.

Returns

[description]

Return type

pd.DataFrame

dcarte.utils.reindex_ts(x: pandas.core.series.Series, freq: str)

reindex_ts reindexes a timeseries at a given frequency

Parameters
  • x (pd.Series) – [description]

  • freq (str) – a frequency string in the pandas time format

Returns

[description]

Return type

[type]

dcarte.utils.seconds_to_time(seconds)

seconds_to_time [summary]

[extended_summary]

Parameters

seconds ([type]) – [description]

Returns

[description]

Return type

[type]
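
The return type of seconds_to_time is not documented. As one plausible reading, the sketch below turns seconds since midnight into a datetime.time, wrapping values longer than a day; both the return type and the wrapping are assumptions, and seconds_to_time_sketch is a hypothetical name.

```python
from datetime import time


def seconds_to_time_sketch(seconds: float) -> time:
    """Convert seconds since midnight to a datetime.time (whole seconds)."""
    total = int(seconds) % (24 * 60 ** 2)  # fold anything past 24h back into one day
    hours, remainder = divmod(total, 3600)
    minutes, secs = divmod(remainder, 60)
    return time(hours, minutes, secs)


print(seconds_to_time_sketch(3661))
```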

dcarte.utils.segment_freq() → pandas.core.frame.DataFrame

segment_freq [summary]

[extended_summary]

Parameters
  • v (pd.Series) – [description]

  • window_length (int, optional) – [description]. Defaults to 59.

  • polyorder (int, optional) – [description]. Defaults to 1.

  • r (int, optional) – [description]. Defaults to 10.

Returns

[description]

Return type

pd.DataFrame

dcarte.utils.segment_summary(shift)

segment_summary [summary]

[extended_summary]

Parameters

vc ([type]) – [description]

dcarte.utils.set_path(local_file: str)

set_path checks if a parent folder exists and if not creates it

Parameters

local_file (str) – [description]
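
The "create the parent folder if it is missing" step can be sketched with pathlib. This is an illustration, not dcarte's code; set_path_sketch and the example path are hypothetical.

```python
import tempfile
from pathlib import Path


def set_path_sketch(local_file: str) -> None:
    """Make sure the folder that will contain local_file exists."""
    # parents=True builds intermediate folders; exist_ok=True makes repeat calls safe.
    Path(local_file).parent.mkdir(parents=True, exist_ok=True)


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "dcarte" / "data" / "raw" / "sleep.parquet"
    set_path_sketch(str(target))
    parent_created = target.parent.is_dir()
    file_created = target.exists()  # only the folder is created, not the file
```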

dcarte.utils.shift_row_to_bottom(df, index_to_shift)

shift_row_to_bottom [summary]

[extended_summary]

Parameters
  • df ([type]) – [description]

  • index_to_shift ([type]) – [description]

Returns

[description]

Return type

[type]

dcarte.utils.shift_row_to_top(df, index_to_shift)

shift_row_to_top [summary]

[extended_summary]

Parameters
  • df ([type]) – [description]

  • index_to_shift ([type]) – [description]

Returns

[description]

Return type

[type]

dcarte.utils.std_time(kind)

std_time [summary]

[extended_summary]

Parameters
  • times ([type]) – [description]

  • kind (str, optional) – [description]. Defaults to ‘time’.

Returns

[description]

Return type

[type]

dcarte.utils.str_to_time(time)

str_to_time [summary]

[extended_summary]

Parameters

time ([type]) – [description]

Returns

[description]

Return type

[type]

dcarte.utils.time_cdf(times: pandas.core.series.Series, name: str) → pandas.core.series.Series

time_cdf returns a CDF of times as a pandas Series

Parameters
  • times (pd.Series) – [description]

  • name (str) – [description]

Returns

[description]

Return type

pd.Series

dcarte.utils.time_to_angles(day)

time_to_angles [summary]

[extended_summary]

Parameters
  • time ([type]) – [description]

  • day ([type], optional) – [description]. Defaults to 24*60**2.

Returns

[description]

Return type

[type]

dcarte.utils.timer()

timer is a wrapper decorator that reports a function's duration

Parameters

desc (str, optional) – description line to print to stdout. Defaults to None.
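
A duration-reporting decorator with an optional description can be sketched as a decorator factory. The printed message format, the fallback to the function name when desc is None, and the name timer_sketch are assumptions.

```python
import functools
import time


def timer_sketch(desc=None):
    """Decorator factory: print how long the wrapped function took."""
    def decorator(func):
        @functools.wraps(func)  # keep the wrapped function's name and docstring
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            result = func(*args, **kwargs)
            elapsed = time.perf_counter() - start
            print(f"{desc or func.__name__} took {elapsed:.3f}s")
            return result
        return wrapper
    return decorator


@timer_sketch("summing")
def total(n: int) -> int:
    return sum(range(n))


answer = total(1000)
```

The wrapper returns the original result unchanged, so decorating a function only adds the printed line.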

dcarte.utils.times_to_angles(day)

times_to_angles [summary]

[extended_summary]

Parameters
  • times ([type]) – [description]

  • day ([type], optional) – [description]. Defaults to 24*60**2.

Returns

[description]

Return type

[type]

dcarte.utils.update_table() → None

update_table updates a parquet pyspark file

Parameters
  • data (pd.DataFrame) – [description]

  • filename (str) – [description]

  • compression (str) – [description]

  • meta_content (dict, optional) – [description]. Defaults to {}.

dcarte.utils.update_yaml(local_file: str, data: dict)

update_yaml updates a yaml file with a dictionary structure

[extended_summary]

Parameters
  • local_file (str) – [description]

  • data (dict) – [description]

dcarte.utils.utc_to_local()

utc_to_local converts a timeseries from utc to a specific timezone

Parameters
  • dt (pd.Series) – [description]

  • tz (str, optional) – [description]. Defaults to ‘Europe/London’.

  • shift (int, optional) – [description]. Defaults to -2.

Returns

[description]

Return type

[type]

dcarte.utils.write_table() → None

write_table writes data into a parquet pyspark file

[extended_summary]

Parameters
  • data (pd.DataFrame) – [description]

  • filename (str) – [description]

  • compression (str) – [description]

  • meta_content (dict, optional) – [description]. Defaults to {}.

dcarte.utils.write_yaml(local_file: str, data: dict)

write_yaml writes a dictionary structure into a yaml file

Parameters
  • local_file (str) – [description]

  • data (dict) – [description]

Module contents

dcarte dataset fusion tools