dcarte package
Submodules
dcarte.config module
- dcarte.config.baseline_config(home: pathlib.Path, root: pathlib.Path, files: list) dict
baseline_config creates a baseline config dict
- Parameters
home (Path) – [description]
root (Path) – [description]
- Returns
[description]
- Return type
dict
- dcarte.config.create_config(home: pathlib.Path, root: pathlib.Path, dcarte_home: pathlib.Path)
create_config creates a baseline config file
- Parameters
home (Path) – [description]
root (Path) – [description]
- Returns
[description]
- Return type
[type]
- dcarte.config.get_config() dict
get_config returns the local config file, creating it first if it does not exist
- Parameters
config_file (str, optional) – [description]. Defaults to ‘/dcarte/config.yaml’.
root (Path, optional) – [description]. Defaults to Path(‘__file__’).parent.absolute().
home (Path, optional) – [description]. Defaults to Path(‘~’).expanduser().
- Returns
a dict containing all the configuration information needed for dcarte
- Return type
[dict]
- dcarte.config.get_mac() str
get_mac returns the MAC address of the compute node or computer
- Returns
[description]
- Return type
str
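As an illustration of how a MAC address can be read in Python, the sketch below uses the standard library's uuid.getnode(). Whether dcarte's get_mac works this way is not stated here; the helper name mac_as_string and the colon-separated output format are assumptions made for this example.

```python
import uuid


def mac_as_string() -> str:
    """Return this machine's hardware address as a colon-separated hex string.

    Hypothetical helper, not part of dcarte.
    """
    # uuid.getnode() returns a 48-bit integer; if no hardware address can be
    # found, Python falls back to a random 48-bit number.
    node = uuid.getnode()
    raw = f"{node:012x}"
    return ":".join(raw[i:i + 2] for i in range(0, 12, 2))
```

Calling mac_as_string() gives a 17-character string made of six two-digit hex groups.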
- dcarte.config.get_token() str
get_token opens the access-tokens website to create a unique REST token
- Returns
a token generated at https://research.minder.care/portal/access-tokens
- Return type
str
- dcarte.config.update_config()
update_config updates the central config file with data from new_dict
- Parameters
new_dict (dict) – [description]
home (Path, optional) – [description]. Defaults to Path(‘~’).expanduser().
dcarte.domains module
- dcarte.domains.domains()
domains prints the current potential local domains as a table to stdout
[extended_summary]
dcarte.load module
- dcarte.load.get_defaults(**kwargs)
get_defaults [summary]
[extended_summary]
- Returns
[description]
- Return type
[type]
- dcarte.load.load(dataset: str, domain: str, **kwargs)
load [summary]
[extended_summary]
- Parameters
dataset (str) – [description]
domain (str) – [description]
- Raises
Exception – [description]
- Returns
[description]
- Return type
[type]
dcarte.local module
- class dcarte.local.LocalDataset()
Bases: object
LocalDataset [summary]
[extended_summary]
- Parameters
dataset_name ([str]) – [description]
datasets ([list]) – [description]
pipeline ([list]) – [description]
domain ([str]) – [description]
module ([list]) – [description]
dependencies ([list]) – [description]
since ([list]) – [description]
until ([list]) – [description]
delay ([list]) – [description]
reload ([list]) – [description]
reapply ([list]) – [description]
update ([list]) – [description]
home ([Path]) – [description]
compression ([str]) – [description]
data_folder ([str]) – [description]
data ([pd.DataFrame]) – [description]
- Returns
[description]
- Return type
[type]
- compression: str = 'GZIP'
- data: pandas.core.frame.DataFrame
- data_folder: str = '/Users/eyalsoreq/dcarte/data'
- dataset_name: str
- datasets: dict
- delay: float = 1
- dependencies: list
- domain: str
- home: pathlib.Path = '/Users/eyalsoreq'
- load_dataset()
load_dataset [summary]
[extended_summary]
- load_metadata()
load_metadata [summary]
[extended_summary]
- Returns
[description]
- Return type
[type]
- module: str = 'base'
- pipeline: list
- process_dataset()
process_dataset [summary]
[extended_summary]
- reapply: bool = False
- reapply_dataset()
reapply_dataset [summary]
[extended_summary]
- register_dataset() None
register_dataset [summary]
[extended_summary]
- reload: bool = False
- save_dataset() None
save_dataset [summary]
[extended_summary]
- since: str = '2019-04-01'
- until: str = '2022-01-17T11:14:24.488976Z'
- update: bool = False
- update_dataset()
update_dataset [summary]
[extended_summary]
- update_metadata()
update_metadata [summary]
[extended_summary]
dcarte.minder module
- class dcarte.minder.MinderDataset()
Bases: object
MinderDataset class handles the downloading of datasets from the Minder research platform
[extended_summary]
- Parameters
dataset_name ([str]) – [description]
datasets ([list]) – [description]
columns ([list]) – [description]
domain ([str]) – [description]
dtypes ([list]) – [description]
since ([list]) – [description]
until ([list]) – [description]
delay ([list]) – [description]
auth ([list]) – [description]
headers ([list]) – [description]
server ([list]) – [description]
token ([list]) – [description]
compression ([list]) – [description]
data_folder ([list]) – [description]
data ([list]) – [description]
request_id ([list]) – [description]
reload ([list]) – [description]
reapply ([list]) – [description]
update ([list]) – [description]
- Raises
Exception – [description]
- Returns
[description]
- Return type
[type]
- append_dataset()
- columns: list
- compression: str = 'GZIP'
- data: pandas.core.frame.DataFrame
- data_folder: str = '/Users/eyalsoreq/dcarte/data'
- dataset_name: str
- datasets: list
- delay: float = 1
- domain: str
- download_data()
- download_dataset()
- dtypes: list
- get_output()
- headers: dict
- home: pathlib.Path = '/Users/eyalsoreq'
- load_dataset()
- load_metadata()
- post_request()
- process_request()
- reapply: bool = False
- reload: bool = False
- request_id: str = ''
- save_dataset()
- server: str = 'https://research.minder.care/api/export'
- since: str = '2019-04-01'
- until: str = '2022-01-17T11:14:24.552084Z'
- update: bool = False
- update_dataset()
- update_metadata()
dcarte.utils module
- class dcarte.utils.BearerAuth(token)
Bases: requests.auth.AuthBase
BearerAuth manages the coupling of a token to the requests framework
- Parameters
requests ([type]) – [description]
- dcarte.utils.angles_to_time(day)
angles_to_time [summary]
[extended_summary]
- Parameters
angles ([type]) – [description]
day ([type], optional) – [description]. Defaults to 24*60**2.
- Returns
[description]
- Return type
[type]
- dcarte.utils.between_time(df, factor, start_time, end_time)
- dcarte.utils.date2iso()
date2iso converts a date string to ISO format
- Parameters
date (str) – [description]
output_fmt (str, optional) – [description]. Defaults to ‘%Y-%m-%dT%H:%M:%S.%f’.
- Returns
[description]
- Return type
[type]
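A minimal sketch of this kind of conversion, using only the standard library. It assumes the input is already something datetime.fromisoformat can parse (for example "2022-01-17"); the input formats that date2iso itself accepts are not documented here, and the name to_iso is hypothetical.

```python
from datetime import datetime


def to_iso(date: str, output_fmt: str = "%Y-%m-%dT%H:%M:%S.%f") -> str:
    """Parse an ISO-like date string and re-emit it using output_fmt.

    Hypothetical helper; the default output_fmt mirrors the documented default.
    """
    parsed = datetime.fromisoformat(date)  # assumption: ISO-like input only
    return parsed.strftime(output_fmt)


print(to_iso("2022-01-17"))  # prints 2022-01-17T00:00:00.000000
```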
- dcarte.utils.epoch_to_local() pandas.core.series.Series
epoch_to_local converts epoch timestamps to local time in the timezone tz
[extended_summary]
- Parameters
dt (pd.Series) – [description]
tz (str, optional) – [description]. Defaults to ‘Europe/London’.
unit (str, optional) – [description]. Defaults to ‘s’.
shift (int, optional) – [description]. Defaults to 0.
- Returns
[description]
- Return type
pd.Series
- dcarte.utils.inject_metadata(table: pyarrow.lib.table, meta_content: dict) pyarrow.lib.Table
inject_metadata replaces metadata in a parquet pyspark file
[extended_summary]
- Parameters
table (pa.table) – [description]
meta_content (dict) – [description]
- Returns
[description]
- Return type
pa.Table
- dcarte.utils.isnotebook() bool
isnotebook checks if the run environment is a jupyter notebook
- Returns
[description]
- Return type
bool
- dcarte.utils.lagged_df()
lagged_df returns a lagged dataframe for a specific factor
[extended_summary]
- Parameters
df (pd.DataFrame) – [description]
factor (str, optional) – [description]. Defaults to ‘activity’.
lags (int, optional) – [description]. Defaults to 7.
- Returns
[description]
- Return type
[type]
- dcarte.utils.load_csv_from_zip(_zip, csv_file)
- dcarte.utils.load_yaml(local_file: str) dict
load_yaml loads a yaml file into a dictionary
- Parameters
local_file (str) – [description]
- Returns
[description]
- Return type
dict
- dcarte.utils.load_zip_csv(zip_file: str, csv_file: str) pandas.core.frame.DataFrame
load_zip_csv returns a specific csv file from a zip file
- Parameters
zip_file (str) – [description]
csv_file (str) – [description]
- Returns
[description]
- Return type
pd.DataFrame
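The idea of pulling one CSV member out of a zip archive can be sketched with the standard library alone. Note the swap: this example returns a list of row dictionaries through csv.DictReader rather than a pandas DataFrame, and the function name read_csv_rows_from_zip is hypothetical. It is not how load_zip_csv is implemented.

```python
import csv
import io
import zipfile


def read_csv_rows_from_zip(zip_file: str, csv_file: str) -> list:
    """Read one CSV member of a zip archive into a list of row dicts.

    Hypothetical stand-in that avoids pandas; values stay as strings.
    """
    with zipfile.ZipFile(zip_file) as archive:
        # archive.open yields a binary stream, so wrap it for text decoding.
        with archive.open(csv_file) as handle:
            text = io.TextIOWrapper(handle, encoding="utf-8", newline="")
            return list(csv.DictReader(text))
```

With pandas available, the same binary stream could be handed to pandas.read_csv to obtain a DataFrame instead.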
- dcarte.utils.localize_time(timezones)
localize_time [summary]
[extended_summary]
- Parameters
df (pd.DataFrame) – [description]
factors (list) – [description]
timezones ([type], optional) – [description]. Defaults to None.
- Returns
[description]
- Return type
[type]
- dcarte.utils.mean_time(kind)
mean_time [summary]
[extended_summary]
- Parameters
times ([type]) – [description]
kind (str, optional) – [description]. Defaults to ‘time’.
- Returns
[description]
- Return type
[type]
- dcarte.utils.merge_dicts(d1: dict, d2: dict)
merge_dicts merges two dictionaries
- Parameters
d1 (dict) – [description]
d2 (dict) – [description]
- Yields
[type] – [description]
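Because the entry lists Yields rather than Returns, the function is presumably a generator. One common recipe with that shape yields (key, value) pairs and recurses into nested dictionaries. The sketch below shows that recipe under the assumption that values from the second dictionary win on conflict; the name merge_pairs is hypothetical and the behaviour of merge_dicts itself may differ.

```python
def merge_pairs(d1: dict, d2: dict):
    """Yield (key, value) pairs of d1 merged with d2.

    Hypothetical helper. Nested dicts are merged recursively; for any other
    conflicting key the value from d2 is used (an assumption).
    """
    for key in set(d1) | set(d2):
        if key in d1 and key in d2:
            if isinstance(d1[key], dict) and isinstance(d2[key], dict):
                yield key, dict(merge_pairs(d1[key], d2[key]))
            else:
                yield key, d2[key]
        elif key in d1:
            yield key, d1[key]
        else:
            yield key, d2[key]


base = {"x": 1, "nested": {"p": 1, "q": 2}}
override = {"y": 2, "nested": {"q": 3}}
merged = dict(merge_pairs(base, override))
```

Wrapping the generator in dict(...) collects the pairs into a new dictionary and leaves both inputs unchanged.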
- dcarte.utils.path_exists(local_file: str)
path_exists checks if a file exists in the local filesystem
[extended_summary]
- Parameters
local_file (str) – [description]
- Returns
[description]
- Return type
[type]
- dcarte.utils.process_transition(covariates) pandas.core.frame.DataFrame
process_transition converts a time-series DataFrame with datetimes to a transition DataFrame
- Parameters
df (pd.DataFrame) – [description]
groupby (list) – [description]
datetime (str) – [description]
value (str) – [description]
covariates ([type], optional) – [description]. Defaults to None.
- Returns
[description]
- Return type
[type]
- dcarte.utils.read_metadata(filename: str)
read_metadata returns only the metadata from a parquet pyspark file
- Parameters
filename (str) – [description]
- Returns
[description]
- Return type
[type]
- dcarte.utils.read_table() pandas.core.frame.DataFrame
read_table reads a parquet pyspark file
- Parameters
filename (str) – filename, given as either a relative or an absolute path
columns (list, optional) – specific columns to load. Defaults to None.
- Returns
[description]
- Return type
pd.DataFrame
- dcarte.utils.reindex_ts(x: pandas.core.series.Series, freq: str)
reindex_ts reindexes a time series to a given frequency
- Parameters
x (pd.Series) – [description]
freq (str) – the target frequency, written as a pandas frequency string
- Returns
[description]
- Return type
[type]
- dcarte.utils.seconds_to_time(seconds)
seconds_to_time [summary]
[extended_summary]
- Parameters
seconds ([type]) – [description]
- Returns
[description]
- Return type
[type]
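As a rough illustration of turning a seconds-since-midnight value into a clock time, the following standard-library sketch wraps values longer than a day and returns a datetime.time. The return type and the wrapping behaviour are assumptions, and seconds_as_time is a hypothetical name rather than dcarte's function.

```python
from datetime import time


def seconds_as_time(seconds: float) -> time:
    """Convert seconds since midnight into a datetime.time.

    Hypothetical helper: fractional seconds are dropped and values of a day
    or more wrap around.
    """
    total = int(seconds) % (24 * 60**2)
    hours, remainder = divmod(total, 3600)
    minutes, secs = divmod(remainder, 60)
    return time(hours, minutes, secs)


print(seconds_as_time(3661))  # prints 01:01:01
```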
- dcarte.utils.segment_freq() pandas.core.frame.DataFrame
segment_freq [summary]
[extended_summary]
- Parameters
v (pd.Series) – [description]
window_length (int, optional) – [description]. Defaults to 59.
polyorder (int, optional) – [description]. Defaults to 1.
r (int, optional) – [description]. Defaults to 10.
- Returns
[description]
- Return type
pd.DataFrame
- dcarte.utils.segment_summary(shift)
segment_summary [summary]
[extended_summary]
- Parameters
vc ([type]) – [description]
- dcarte.utils.set_path(local_file: str)
set_path checks if a parent folder exists and if not creates it
- Parameters
local_file (str) – [description]
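The "create the parent folder if it is missing" behaviour described above can be sketched with pathlib. The helper name ensure_parent and its return value are assumptions for illustration; set_path's own return value is not documented here.

```python
from pathlib import Path


def ensure_parent(local_file: str) -> Path:
    """Create the parent folder of local_file if needed and return it.

    Hypothetical helper; the file itself is never created.
    """
    parent = Path(local_file).expanduser().parent
    # parents=True builds intermediate folders; exist_ok=True makes the call
    # safe to repeat when the folder is already there.
    parent.mkdir(parents=True, exist_ok=True)
    return parent
```

Calling it before writing a file means the later open(..., "w") does not fail on a missing directory.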
- dcarte.utils.shift_row_to_bottom(df, index_to_shift)
shift_row_to_bottom [summary]
[extended_summary]
- Parameters
df ([type]) – [description]
index_to_shift ([type]) – [description]
- Returns
[description]
- Return type
[type]
- dcarte.utils.shift_row_to_top(df, index_to_shift)
shift_row_to_top [summary]
[extended_summary]
- Parameters
df ([type]) – [description]
index_to_shift ([type]) – [description]
- Returns
[description]
- Return type
[type]
- dcarte.utils.std_time(kind)
std_time [summary]
[extended_summary]
- Parameters
times ([type]) – [description]
kind (str, optional) – [description]. Defaults to ‘time’.
- Returns
[description]
- Return type
[type]
- dcarte.utils.str_to_time(time)
str_to_time [summary]
[extended_summary]
- Parameters
time ([type]) – [description]
- Returns
[description]
- Return type
[type]
- dcarte.utils.time_cdf(times: pandas.core.series.Series, name: str) pandas.core.series.Series
time_cdf returns a CDF of times as a pandas Series
- Parameters
times (pd.Series) – [description]
name (str) – [description]
- Returns
[description]
- Return type
pd.Series
- dcarte.utils.time_to_angles(day)
time_to_angles [summary]
[extended_summary]
- Parameters
time ([type]) – [description]
day ([type], optional) – [description]. Defaults to 24*60**2.
- Returns
[description]
- Return type
[type]
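The default day of 24*60**2 is the number of seconds in a day, which suggests that times are mapped onto a circle. The sketch below shows one such mapping, assuming a linear scaling of seconds since midnight onto the interval from 0 to 2π radians. Both helper names are hypothetical, and the units and conventions actually used by time_to_angles and angles_to_time are not confirmed by this page.

```python
import math

DAY_SECONDS = 24 * 60**2  # 86400, the documented default for `day`


def seconds_to_angle(seconds: float, day: float = DAY_SECONDS) -> float:
    """Map seconds since midnight to an angle in radians in [0, 2*pi)."""
    return 2 * math.pi * (seconds % day) / day


def angle_to_seconds(angle: float, day: float = DAY_SECONDS) -> float:
    """Inverse mapping: radians back to seconds since midnight."""
    return (angle % (2 * math.pi)) * day / (2 * math.pi)


noon = seconds_to_angle(12 * 3600)  # half a day, so approximately pi
```

Working in angles makes 23:59 and 00:01 neighbours, which is why circular statistics are often used for times of day.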
- dcarte.utils.timer()
timer is a decorator that reports the duration of a function call
- Parameters
desc (str, optional) – description line to print to stdout. Defaults to None.
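A timing decorator that takes an optional description can be written roughly as follows. This is a generic standard-library sketch, not dcarte's code: the name timed, the output format, and the fallback to the function name when desc is None are all assumptions.

```python
import functools
import time


def timed(desc=None):
    """Decorator factory that prints how long each call took.

    Hypothetical helper; desc labels the printed line.
    """
    def decorator(func):
        @functools.wraps(func)  # keep the wrapped function's name and docstring
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return func(*args, **kwargs)
            finally:
                # Runs even if func raises, so failures are timed as well.
                elapsed = time.perf_counter() - start
                print(f"{desc or func.__name__}: {elapsed:.3f}s")
        return wrapper
    return decorator


@timed("summing")
def total(n):
    return sum(range(n))
```

Calling total(10) returns 45 and prints one line that starts with "summing:" followed by the elapsed time.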
- dcarte.utils.times_to_angles(day)
times_to_angles [summary]
[extended_summary]
- Parameters
times ([type]) – [description]
day ([type], optional) – [description]. Defaults to 24*60**2.
- Returns
[description]
- Return type
[type]
- dcarte.utils.update_table() None
update_table updates a parquet pyspark file
- Parameters
data (pd.DataFrame) – [description]
filename (str) – [description]
compression (str) – [description]
meta_content (dict, optional) – [description]. Defaults to {}.
- dcarte.utils.update_yaml(local_file: str, data: dict)
update_yaml updates a yaml file with a dictionary structure
[extended_summary]
- Parameters
local_file (str) – [description]
data (dict) – [description]
- dcarte.utils.utc_to_local()
utc_to_local converts a timeseries from utc to a specific timezone
- Parameters
dt (pd.Series) – [description]
tz (str, optional) – [description]. Defaults to ‘Europe/London’.
shift (int, optional) – [description]. Defaults to -2.
- Returns
[description]
- Return type
[type]
- dcarte.utils.write_table() None
write_table writes data into a parquet pyspark file
[extended_summary]
- Parameters
data (pd.DataFrame) – [description]
filename (str) – [description]
compression (str) – [description]
meta_content (dict, optional) – [description]. Defaults to {}.
- dcarte.utils.write_yaml(local_file: str, data: dict)
write_yaml writes a dictionary structure into a yaml file
- Parameters
local_file (str) – [description]
data (dict) – [description]
Module contents
dcarte dataset fusion tools