regi0

regi0.match(left: pandas.Series, right: pandas.Series, preprocess: bool = False, fuzzy: bool = False, threshold: float = 0.8) → pandas.Series

Compares values between two different Series to check if they match.

Parameters
  • left (Series) – Left Series.

  • right (Series) – Right Series.

  • preprocess (bool) – Whether to clean and standardize values before comparing them.

  • fuzzy (bool) – Whether to compare values using fuzzy logic.

  • threshold (float) – Minimum similarity required to treat two values as equal. Only takes effect when fuzzy is True.

Returns

Series with booleans indicating whether the values match.

Return type

Series
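
The interaction between preprocess, fuzzy and threshold can be illustrated with a minimal sketch. This is not regi0's implementation: sketch_match is a hypothetical stand-in, the cleaning step (trim and lowercase) is an assumption, and the similarity score comes from the standard library's difflib rather than whatever measure regi0 uses.

```python
from difflib import SequenceMatcher

import pandas as pd


def sketch_match(left, right, preprocess=False, fuzzy=False, threshold=0.8):
    """Hypothetical stand-in for regi0.match (not the library's code)."""
    left = left.astype(str)
    right = right.astype(str)
    if preprocess:
        # Assumed cleaning: trim surrounding whitespace and lowercase.
        left = left.str.strip().str.lower()
        right = right.str.strip().str.lower()
    if fuzzy:
        # Assumed similarity measure: difflib ratio in the range [0, 1].
        flags = [
            SequenceMatcher(None, a, b).ratio() >= threshold
            for a, b in zip(left, right)
        ]
    else:
        flags = [a == b for a, b in zip(left, right)]
    return pd.Series(flags, index=left.index)


left = pd.Series(["Bogota", " Cali", "Medellin"])
right = pd.Series(["Bogota", "cali", "Medelin"])

exact = sketch_match(left, right)
cleaned = sketch_match(left, right, preprocess=True)
fuzzy = sketch_match(left, right, fuzzy=True, threshold=0.8)
```

In this sketch, cleaning rescues the pair that differs only in case and whitespace, while fuzzy comparison rescues the pair that differs by one letter.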

regi0.read_geographic_table(path: Union[str, pathlib.Path], lon_col: str, lat_col: str, crs: str = 'epsg:4326', drop_empty_coords: bool = False, reset_index: bool = True) → geopandas.GeoDataFrame

Reads tabular data (csv, txt, xls or xlsx) and converts it to a GeoDataFrame.

Parameters
  • path (str or Path) – Filename with extension. Can be a relative or absolute path.

  • lon_col (str) – Name of the longitude column.

  • lat_col (str) – Name of the latitude column.

  • crs (str) – Coordinate reference system, given as an EPSG code in the form epsg:code (for example, epsg:4326).

  • drop_empty_coords (bool) – Whether to remove rows with missing or incomplete coordinates.

  • reset_index (bool) – Whether to reset the result’s index after removing rows with missing or incomplete coordinates. Only takes effect when drop_empty_coords is True.

Returns

GeoDataFrame with the records.

Return type

GeoDataFrame
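
The effect of drop_empty_coords and reset_index can be sketched with plain pandas. This only illustrates the documented behaviour on an ordinary DataFrame and assumes that "missing or incomplete" means a null value in either coordinate column. The column names and values are made up, and the conversion to a GeoDataFrame is left out.

```python
import pandas as pd

# Made-up records: row 1 lacks a longitude, row 2 lacks a latitude.
records = pd.DataFrame(
    {
        "species": ["a", "b", "c", "d"],
        "lon": [-74.1, None, -75.6, -76.5],
        "lat": [4.6, 3.4, None, 3.4],
    }
)

# Roughly what drop_empty_coords=True implies: keep only rows where
# both coordinates are present.
kept = records.dropna(subset=["lon", "lat"])

# Roughly what reset_index=True implies: renumber the remaining rows.
renumbered = kept.reset_index(drop=True)
```

Without the reset, the surviving rows keep their original labels (0 and 3 here), which matters if later steps rely on positional alignment.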

regi0.read_table(path: Union[str, pathlib.Path], **kwargs) → pandas.DataFrame

Reads tabular data (csv, txt, xls or xlsx).

Parameters
  • path (str or Path) – Filename with extension. Can be a relative or absolute path.

  • **kwargs – pandas read_csv, read_table and read_excel keyword arguments.

Returns

DataFrame with the tabular data.

Return type

DataFrame
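
Because the keyword arguments are forwarded to different pandas readers, it helps to see how a reader might be chosen from the file extension. The sketch below is an assumption about that dispatch, not regi0's code; sketch_read_table is a hypothetical name.

```python
import tempfile
from pathlib import Path

import pandas as pd


def sketch_read_table(path, **kwargs):
    """Hypothetical dispatch on file extension (not regi0's code)."""
    suffix = Path(path).suffix.lower()
    if suffix == ".csv":
        return pd.read_csv(path, **kwargs)
    if suffix == ".txt":
        return pd.read_table(path, **kwargs)
    if suffix in {".xls", ".xlsx"}:
        return pd.read_excel(path, **kwargs)
    raise ValueError(f"Unsupported file extension: {suffix!r}")


with tempfile.TemporaryDirectory() as tmp:
    csv_path = Path(tmp) / "records.csv"
    csv_path.write_text("id;name\n1;a\n2;b\n")
    # sep is a pandas read_csv keyword passed through **kwargs.
    table = sketch_read_table(csv_path, sep=";")
```

Under this assumption, a keyword that only one reader understands (such as sheet_name for read_excel) should be passed only with a matching file type.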

regi0.verify(df: pandas.DataFrame, observed_col: str, expected: pandas.Series, flag_name: str, add_suggested: bool = False, suggested_name: Optional[str] = None, add_source: bool = False, source: Optional[pandas.Series] = None, source_name: Optional[str] = None, drop: bool = False, **kwargs) → pandas.DataFrame

Verifies that the values in a specific column from df match some expected values.

Parameters
  • df (DataFrame) – DataFrame with values.

  • observed_col (str) – Name of the column in df with the values to verify.

  • expected (Series) – Series with expected values. Must have the same length as df.

  • flag_name (str) – Name of the resulting column indicating whether the observed values match the expected values.

  • add_suggested (bool) – Whether to add a column to the result with suggested values for those rows where the observed values do not match the expected values.

  • suggested_name (str) – Name of the column for the suggested values. Only takes effect when add_suggested is True.

  • add_source (bool) –

  • source (Series) –

  • drop (bool) – Whether to drop the rows where the observed values do not match the expected values.

  • **kwargs – Keyword arguments accepted by the match function.

Returns

Copy of df with extra columns.

Return type

DataFrame
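
The flag, suggestion and drop behaviour described above can be sketched with plain pandas. This is an illustration under assumptions, not regi0's implementation: an exact comparison stands in for the match function, rows that already match are assumed to get an empty suggestion, and the column names are made up.

```python
import pandas as pd

df = pd.DataFrame({"country": ["Colombia", "Peru", "Brasil"]})
expected = pd.Series(["Colombia", "Ecuador", "Brazil"])

# Work on a copy so the input DataFrame is left untouched.
result = df.copy()

# Flag column (flag_name): exact equality stands in for match here.
flag = result["country"] == expected
result["country_ok"] = flag

# Suggested column (add_suggested=True): the expected value only where
# the observed value does not match, missing elsewhere (assumed).
result["suggested_country"] = expected.where(~flag)

# drop=True: keep only the rows whose observed value matched.
only_valid = result[flag]
```

The sketch leaves out add_source, source and source_name, since their behaviour is not described in the parameter list.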

regi0.write_table(df: pandas.DataFrame, path: Union[str, pathlib.Path], **kwargs) → None

Writes tabular data (csv, txt, xls or xlsx) to disk.

Parameters
  • df (DataFrame) – DataFrame to write to disk.

  • path (str or Path) – Filename with extension. Can be a relative or absolute path.

  • **kwargs – pandas to_csv and to_excel keyword arguments.

Return type

None