regi0

regi0.match(left: pandas.Series, right: pandas.Series, preprocess: bool = False, fuzzy: bool = False, threshold: float = 0.8) → pandas.Series

Compares the values of two Series element by element to check whether they match.

Parameters

left (Series) – Left Series.
right (Series) – Right Series.
preprocess (bool) – Whether to clean and standardize values before comparing them.
fuzzy (bool) – Whether to compare values using fuzzy logic.
threshold (float) – Similarity threshold used to decide whether two values count as equal. Only has effect when fuzzy is True.

Returns

Series with booleans indicating whether the values match.

Return type

Series
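
The combined effect of preprocess, fuzzy and threshold can be illustrated with a small stand-in. This is a minimal sketch and not regi0's implementation: match_sketch and clean are hypothetical names, the preprocessing is assumed to be whitespace trimming plus case folding, and the similarity score is assumed to be difflib's ratio compared against threshold with >=.

```python
from difflib import SequenceMatcher

import pandas as pd


def clean(values: pd.Series) -> pd.Series:
    # Assumed preprocessing: trim surrounding whitespace and ignore case.
    return values.astype(str).str.strip().str.casefold()


def match_sketch(left, right, preprocess=False, fuzzy=False, threshold=0.8):
    """Hypothetical stand-in for regi0.match, for illustration only."""
    if preprocess:
        left, right = clean(left), clean(right)
    if not fuzzy:
        # Plain element-wise equality.
        return left == right
    # Assumed similarity measure: difflib ratio, a float between 0 and 1.
    scores = [
        SequenceMatcher(None, str(a), str(b)).ratio()
        for a, b in zip(left, right)
    ]
    return pd.Series(scores, index=left.index) >= threshold


left = pd.Series(["Bogotá", "Medellin", "Cali"])
right = pd.Series([" bogotá ", "Medellín", "Cartagena"])

exact = match_sketch(left, right)
cleaned = match_sketch(left, right, preprocess=True)
fuzzy = match_sketch(left, right, preprocess=True, fuzzy=True, threshold=0.8)
```

In this sketch none of the rows match exactly, only the first row matches after cleaning, and the first two rows match once fuzzy comparison is enabled at a threshold of 0.8.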

regi0.read_geographic_table(path: Union[str, pathlib.Path], lon_col: str, lat_col: str, crs: str = 'epsg:4326', drop_empty_coords: bool = False, reset_index: bool = True) → geopandas.GeoDataFrame

Reads tabular data (csv, txt, xls or xlsx) and converts it to a GeoDataFrame.

Parameters

path (str or Path) – Filename with extension. Can be a relative or absolute path.
lon_col (str) – Name of the longitude column.
lat_col (str) – Name of the latitude column.
crs (str) – Coordinate reference system with the corresponding EPSG code. Must be in the form epsg:code.
drop_empty_coords (bool) – Whether to remove rows with missing or incomplete coordinates.
reset_index (bool) – Whether to reset the result’s index after removing rows with missing or incomplete coordinates. Only has effect when drop_empty_coords is True.

Returns

GeoDataFrame with the records.

Return type

GeoDataFrame
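
How drop_empty_coords and reset_index interact can be shown with plain pandas. This is a minimal sketch under the assumption that dropping empty coordinates behaves like dropna on the two coordinate columns. drop_empty_coords_sketch and the sample column names are hypothetical, and the conversion to a GeoDataFrame is left out.

```python
import pandas as pd

records = pd.DataFrame(
    {
        "species": ["a", "b", "c", "d"],
        "lon": [-74.1, None, -75.6, -76.5],
        "lat": [4.6, 6.2, None, 3.4],
    }
)


def drop_empty_coords_sketch(df, lon_col, lat_col, reset_index=True):
    """Hypothetical illustration of drop_empty_coords and reset_index."""
    # Remove rows where either coordinate is missing.
    kept = df.dropna(subset=[lon_col, lat_col])
    if reset_index:
        # Renumber the remaining rows from zero.
        kept = kept.reset_index(drop=True)
    return kept


with_reset = drop_empty_coords_sketch(records, "lon", "lat")
without_reset = drop_empty_coords_sketch(
    records, "lon", "lat", reset_index=False
)
```

Here rows "b" and "c" are removed in both cases. With reset_index=True the remaining rows are labelled 0 and 1, otherwise they keep their original labels 0 and 3.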

regi0.read_table(path: Union[str, pathlib.Path], **kwargs) → pandas.DataFrame

Reads tabular data (csv, txt, xls or xlsx).

Parameters

path (str or Path) – Filename with extension. Can be a relative or absolute path.
**kwargs – pandas read_csv, read_table and read_excel keyword arguments.

Returns

DataFrame with the tabular data.

Return type

DataFrame
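
Since the keyword arguments are those of pandas read_csv, read_table and read_excel, one plausible reading is that the reader is chosen from the file extension. The following is a sketch under that assumption, not regi0's actual code. read_by_extension and the extension-to-reader mapping are hypothetical.

```python
import pathlib
import tempfile

import pandas as pd


def read_by_extension(path, **kwargs):
    """Hypothetical dispatcher; the mapping below is an assumption."""
    path = pathlib.Path(path)
    suffix = path.suffix.lower()
    if suffix == ".csv":
        return pd.read_csv(path, **kwargs)
    if suffix == ".txt":
        # pandas.read_table defaults to tab-separated values.
        return pd.read_table(path, **kwargs)
    if suffix in (".xls", ".xlsx"):
        return pd.read_excel(path, **kwargs)
    raise ValueError(f"Unsupported file extension: {suffix}")


with tempfile.TemporaryDirectory() as tmp:
    csv_path = pathlib.Path(tmp) / "records.csv"
    csv_path.write_text(
        "species;lon;lat\nPuma concolor;-74.1;4.6\n", encoding="utf-8"
    )
    # Extra keyword arguments are forwarded to the pandas reader.
    table = read_by_extension(csv_path, sep=";")
```

The sep=";" argument reaches pandas.read_csv unchanged, which is the kind of pass-through the **kwargs parameter describes.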

regi0.verify(df: pandas.DataFrame, observed_col: str, expected: pandas.Series, flag_name: str, add_suggested: bool = False, suggested_name: Optional[str] = None, add_source: bool = False, source: Optional[pandas.Series] = None, source_name: Optional[str] = None, drop: bool = False, **kwargs) → pandas.DataFrame

Verifies that the values in a specific column from df match some expected values.

Parameters

df (DataFrame) – DataFrame with values.
observed_col (str) – Name of the column in df with the values to verify.
expected (Series) – Series with expected values. Must have the same length as df.
flag_name (str) – Name of the resulting column indicating whether the observed values match the expected values.
add_suggested (bool) – Whether to add a column to the result with suggested values for those rows where the observed values do not match the expected values.
suggested_name (str) – Name of the column for the suggested values. Only has effect when add_suggested is True.
add_source (bool) –
source (Series) –
drop (bool) – Whether to drop the rows where the observed values do not match the expected values.
**kwargs – Keyword arguments accepted by the match function.

Returns

Copy of df with extra columns.

Return type

DataFrame
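
The flag column, the optional suggested column and the drop option can be pictured with a small pandas stand-in. This is a sketch under stated assumptions, not the library's implementation: verify_sketch is a hypothetical name, values are compared with exact equality instead of the match function, the suggested value is assumed to be the expected value on mismatching rows and missing elsewhere, and the source-related parameters are left out.

```python
import pandas as pd


def verify_sketch(
    df,
    observed_col,
    expected,
    flag_name,
    add_suggested=False,
    suggested_name="suggested",
    drop=False,
):
    """Hypothetical stand-in for regi0.verify, for illustration only."""
    result = df.copy()  # the input DataFrame is left untouched
    is_match = result[observed_col].eq(expected)
    result[flag_name] = is_match
    if add_suggested:
        # Assumption: suggest the expected value only where rows mismatch.
        result[suggested_name] = expected.where(~is_match)
    if drop:
        # Keep only the rows whose observed value matched.
        result = result[is_match]
    return result


df = pd.DataFrame({"country": ["Colombia", "Peru", "Brasil"]})
expected = pd.Series(["Colombia", "Ecuador", "Brazil"])

flagged = verify_sketch(
    df,
    "country",
    expected,
    flag_name="country_ok",
    add_suggested=True,
    suggested_name="country_suggested",
)
kept = verify_sketch(df, "country", expected, flag_name="country_ok", drop=True)
```

In this sketch flagged gains the columns country_ok and country_suggested, while kept contains only the first row because the other two do not match.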

regi0.write_table(df: pandas.DataFrame, path: Union[str, pathlib.Path], **kwargs) → None

Writes tabular data (csv, txt, xls or xlsx) to disk.

Parameters

df (DataFrame) – DataFrame to write to disk.
path (str or Path) – Filename with extension. Can be a relative or absolute path.
**kwargs – Keyword arguments for the pandas DataFrame.to_csv and DataFrame.to_excel methods.

Return type

None
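
One way to picture writing by file extension is a small dispatcher over the pandas writers. This is a sketch under the assumption that the extension selects the writer, not regi0's actual code. write_by_extension, the tab separator for txt files and the sample data are hypothetical.

```python
import pathlib
import tempfile

import pandas as pd


def write_by_extension(df, path, **kwargs):
    """Hypothetical dispatcher; the mapping below is an assumption."""
    path = pathlib.Path(path)
    suffix = path.suffix.lower()
    if suffix == ".csv":
        df.to_csv(path, **kwargs)
    elif suffix == ".txt":
        # Assumption: txt files are tab-separated unless sep is given.
        kwargs.setdefault("sep", "\t")
        df.to_csv(path, **kwargs)
    elif suffix in (".xls", ".xlsx"):
        df.to_excel(path, **kwargs)
    else:
        raise ValueError(f"Unsupported file extension: {suffix}")


records = pd.DataFrame(
    {"species": ["Puma concolor"], "lon": [-74.1], "lat": [4.6]}
)

with tempfile.TemporaryDirectory() as tmp:
    out_path = pathlib.Path(tmp) / "records.txt"
    # index=False is forwarded to DataFrame.to_csv.
    write_by_extension(records, out_path, index=False)
    written = out_path.read_text(encoding="utf-8")
```

Passing index=False keeps the row labels out of the file, so the output holds only the header line and one data line.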