regi0.taxonomic
- regi0.taxonomic.get_canonical_name(names: pandas.Series) pandas.Series
Extracts the canonical name (genus and specific epithet) from each scientific name in a Series. It removes special characters, numbers and Open Nomenclature qualifiers (such as aff. or cf.) and then takes the first two words.
- Parameters
names (Series) – Series with the scientific names.
- Returns
Series with the extracted canonical names.
- Return type
Series
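The cleaning steps described for get_canonical_name can be sketched in plain Python. This is only an illustration of the described behaviour, not the regi0 implementation: canonical_name_sketch and QUALIFIERS are hypothetical names, the qualifier list holds only the two examples given above, and the function works on a single string rather than a Series.

```python
import re

# Hypothetical, partial qualifier list: only the two examples named in the
# description above. The library's actual list is not given in this reference.
QUALIFIERS = {"aff", "cf"}


def canonical_name_sketch(name):
    """Return the first two words of a cleaned scientific name."""
    # Replace punctuation, digits and underscores with spaces.
    cleaned = re.sub(r"[^\w\s]|[\d_]", " ", name)
    # Drop Open Nomenclature qualifiers, then keep genus and epithet.
    words = [w for w in cleaned.split() if w.lower() not in QUALIFIERS]
    return " ".join(words[:2])


names = [
    "Panthera cf. onca (Linnaeus, 1758)",
    "Ateles aff. hybridus 2",
    "Tremarctos ornatus",
]
canonical = [canonical_name_sketch(n) for n in names]
print(canonical)
```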
- regi0.taxonomic.get_checklist_fields(names: Union[list, pandas.Series, str], checklist: Union[str, pathlib.Path, pandas.DataFrame], name_field: str, fields: Union[list, str, tuple], add_supplied_names: bool = False, expand: bool = True) pandas.DataFrame
Retrieves values for one or multiple fields from a checklist given some species names.
- Parameters
names (list, Series or str) – Species names.
checklist (str, Path or DataFrame) – Path to table or DataFrame with checklist information.
name_field (str) – Name of the column in checklist with species names.
fields (list, str or tuple) – Field or fields (columns) to retrieve from checklist.
add_supplied_names (bool) – Whether to add names as an extra column in the result.
expand (bool) – Whether to expand result rows to match names size. If False, the number of rows will correspond to the number of unique names in names.
- Returns
DataFrame with the values retrieved from checklist.
- Return type
DataFrame
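The effect of expand can be illustrated with a small plain-Python sketch. It is not the regi0 implementation: lookup_fields_sketch is a hypothetical helper, the checklist is a list of dicts instead of a DataFrame, the category values are made up, and returning None for unmatched names is an assumption, since this reference does not state what the library returns in that case.

```python
def lookup_fields_sketch(names, checklist, name_field, fields, expand=True):
    """Look up `fields` for each name in a checklist given as a list of dicts."""
    # Index the checklist by name; the first row for a name wins.
    index = {}
    for row in checklist:
        index.setdefault(row[name_field], row)
    # expand=True: one result row per supplied name.
    # expand=False: one result row per unique name, in order of appearance.
    keys = list(names) if expand else list(dict.fromkeys(names))
    # Unmatched names get None for every field (assumption, see lead-in).
    return [{f: index.get(n, {}).get(f) for f in fields} for n in keys]


# Made-up checklist and values, for illustration only.
checklist = [
    {"species": "Panthera onca", "category": "A"},
    {"species": "Tremarctos ornatus", "category": "B"},
]
names = ["Panthera onca", "Tapirus bairdii", "Panthera onca"]

expanded = lookup_fields_sketch(names, checklist, "species", ["category"])
compact = lookup_fields_sketch(names, checklist, "species", ["category"], expand=False)
print(len(expanded), len(compact))
```

With three supplied names, two of them identical, the expanded result has three rows and the compact one has two.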
- regi0.taxonomic.get_checklist_fields_multiple(names: Union[list, pandas.Series, str], filenames: list, name_field: str, fields: Union[list, str], add_supplied_names: bool = False, expand: bool = True, keep_first: bool = True, add_source: bool = False, source_name: str = 'source') pandas.DataFrame
Retrieves values for one or multiple fields from multiple checklists given some species names. If a species name is found in more than one checklist, only the field values from one of them are kept.
- Parameters
names – Series with species names.
filenames – List of checklist file names.
name_field – Name of the column in checklist with species names.
fields – List of fields (columns) to retrieve from checklist.
add_supplied_names – Whether to add names as an extra column in the result.
expand – Whether to expand result rows to match names size. If False, the number of rows will correspond to the number of unique names in names.
keep_first – Whether to keep the first match from a checklist or use the last one.
add_source – Whether to add the name of the checklist from which the values were retrieved.
source_name – Name of the column with the source.
- Returns
DataFrame with the values retrieved from the checklists.
- Return type
DataFrame
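One plausible reading of keep_first and add_source is sketched below in plain Python. It assumes that checklists are visited in the order given and that keep_first=False lets a later checklist overwrite an earlier match; this reference does not confirm either detail. lookup_multiple_sketch, the checklist names and the category values are all hypothetical.

```python
def lookup_multiple_sketch(names, checklists, name_field, field, keep_first=True):
    """Map each matched name to (value, source) across several checklists.

    `checklists` maps a source label to a list of row dicts and is visited
    in insertion order (assumed to mirror the order of `filenames`).
    """
    wanted = set(names)
    result = {}
    for source, rows in checklists.items():
        for row in rows:
            name = row[name_field]
            if name not in wanted:
                continue
            # keep_first=True: ignore matches after the first one.
            if keep_first and name in result:
                continue
            # keep_first=False: a later match overwrites an earlier one.
            result[name] = (row[field], source)
    return result


# Made-up checklists and values, for illustration only.
checklists = {
    "list_a": [{"species": "Panthera onca", "category": "A"}],
    "list_b": [
        {"species": "Panthera onca", "category": "B"},
        {"species": "Tremarctos ornatus", "category": "C"},
    ],
}
names = ["Panthera onca", "Tremarctos ornatus"]

first = lookup_multiple_sketch(names, checklists, "species", "category")
last = lookup_multiple_sketch(names, checklists, "species", "category", keep_first=False)
print(first["Panthera onca"], last["Panthera onca"])
```

The second element of each tuple plays the role of the column that add_source would add under source_name.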
- regi0.taxonomic.is_in_checklist(names: Union[list, pandas.Series, str], checklist: pandas.DataFrame, name_field: str, add_supplied_names: bool = False, expand: bool = True) pandas.DataFrame
Checks whether some species names are found in a given checklist.
- Parameters
names – Series with species names.
checklist – DataFrame with checklist information.
name_field – Name of the column in checklist with species names.
add_supplied_names – Whether to add names as an extra column in the result.
expand – Whether to expand result rows to match names size. If False, the number of rows will correspond to the number of unique names in names.
- Returns
DataFrame with a Boolean column indicating whether each name is present in checklist. If add_supplied_names=True is passed, the result will have an extra column with the supplied names.
- Return type
DataFrame
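The membership check and the add_supplied_names option can be sketched as follows. This is a plain-Python illustration, not the regi0 implementation: in_checklist_sketch is a hypothetical helper, the checklist is a list of dicts, and the result is a list of flags or (name, flag) pairs rather than a DataFrame.

```python
def in_checklist_sketch(names, checklist, name_field,
                        add_supplied_names=False, expand=True):
    """Flag which names appear in a checklist given as a list of dicts."""
    known = {row[name_field] for row in checklist}
    # expand=False collapses the input to unique names, in order of appearance.
    keys = list(names) if expand else list(dict.fromkeys(names))
    if add_supplied_names:
        # Pair each flag with the name it refers to.
        return [(n, n in known) for n in keys]
    return [n in known for n in keys]


checklist = [{"species": "Panthera onca"}]
names = ["Panthera onca", "Tapirus bairdii", "Panthera onca"]

flags = in_checklist_sketch(names, checklist, "species")
pairs = in_checklist_sketch(names, checklist, "species",
                            add_supplied_names=True, expand=False)
print(flags)
print(pairs)
```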
- regi0.taxonomic.is_in_checklist_multiple(names: Union[list, pandas.Series, str], filenames: list, name_field: str, add_supplied_names: bool = False, expand: bool = True, keep_first: bool = True, add_source: bool = False, source_name: str = 'source') Union[pandas.DataFrame, pandas.Series]
Checks whether some species names are found in multiple checklists.
- Parameters
names – Series with species names.
filenames – List of checklist file names.
name_field – Name of the column in checklist with species names.
add_supplied_names – Whether to add names as an extra column in the result.
expand – Whether to expand result rows to match names size. If False, the number of rows will correspond to the number of unique names in names.
keep_first – Whether to keep the first match from a checklist or use the last one.
add_source – Whether to add the name of the checklist in which each name was found.
source_name – Name of the column with the source.
- Returns
DataFrame with a Boolean column indicating whether each name is present in the checklists. If add_supplied_names=True or add_source=True, the result will have extra columns.
- Return type
DataFrame