Extending

Custom Type (from scratch)

Each visions type is a subclass of VisionsBaseType requiring a unique implementation of two methods:

  1. get_relations. Returns the set of relations mapping from another type to the current type.

  2. contains_op. Checks whether a series is of the type visions_type, returns Bool.

Let’s inspect the source code for Ordinal to gather intuition.

visions.types.ordinal.py
from visions.types.type import VisionsBaseType
from visions.types.categorical import Categorical


class Ordinal(VisionsBaseType):
    """**Ordinal** implementation of :class:`visions.types.VisionsBaseType`.
    Examples:
        >>> x = pd.Series([1, 2, 3, 1, 1], dtype='category')
        >>> x in visions.Ordinal
        True
    """

    @staticmethod
    def get_relations():
        return [IdentityRelation(Categorical)]

    @classmethod
    def contains_op(cls, series: pd.Series, *args) -> bool:
        return pdt.is_categorical_dtype(series) and series.cat.ordered

In this example, ordinal declares a single IdentityRelation from categorical. The meaning of IdentityRelation is not particularly important in this case, just know it means the mapping function between categorical and ordinal is the identity function.

We can also see the contains_op requires the sequence to be an ordered categorical machine type representation.

Alternatively you can choose to base a type on an existing type. This is convenient when you only change a single relation.

Note

Your custom type might be helpful for others, in which case you can choose to contribute it to visions. Read more on how to contribute.

Another example

The default typesets in visions consider Boolean and Categorical to be distinct types. In fact, Boolean is a special case of Categorical where the number of categories is 2 and contains the values “True” and “False” (“Man” and “Woman” wouldn’t be binary).

Note that for data analysis, the distinction makes sense. For a boolean we could the true/false ratio for example.

See: examples/data_analysis for an example.

Custom Types (extend a type)

Error

This feature has been removed, please use the declarative API

Another option is to create a new type based on an existing type. This is useful for small changes, such as adding a single relation.

Each type has the method evolve_type for this purpose.

Declarative API

Another option is to create a new type using the declarative API. We can use the create_type function for this purpose.

Add a inference relation from integer to datetime (YYYYMMDD)
 from visions import create_type
 from visions.types.date_time import DateTime
 from visions.types.generic import Generic


 def integer_is_mydatetime(series: pd.Series, state):
     return test_utils.coercion_test(lambda s: func(s.astype(str)))(series)


 def integer_to_mydatetime(series: pd.Series) -> pd.Series:
     return pd.to_datetime(series)


 MyDateTime = create_type(
     "MyDateTime",
     contains=DateTime.contains_op,
     identity=Generic,
     inference=InferenceRelation(
         relationship=integer_is_mydatetime,
         transformer=integer_to_mydatetime,
         related_type=Integer,
     )
 )

 print(MyDateTime)
 # Prints: MyDateTime

Hint

While developing new type relations, you can use this helper function to debug:

for column, type_before, type_after in compare_detect_inference_frame(df, typeset):
     print(f"{column} was {type_before} is {type_after}")

Please read the Type changes section in the functional API documentation for more details.

Custom Typesets (from scratch)

Although visions comes with an array of starter typesets suitable for most standard usage you may quickly find yourself looking to expand upon those types to suit your own domain specific needs. In order to meet those needs there are a number of easy mechanisms to either extend pre-existing typesets or define your own from scratch.

For example, you could define a custom typeset with only time specific types as follows:

Custom time typeset
class CustomSet(VisionTypeset):
    """Typeset that exclusively supports time related types

    Includes support for the following types:

    - DateTime
    - TimeDelta
    - Date
    - Time

    """

    def __init__(self):
        types = [
            DateTime,
            TimeDelta,
            Date,
            Time,
        ]
        super().__init__(types)

or even more simply,

Custom time typeset (simplified)
import visions as v
types = [v.DateTime, v.TimeDelta, v.Date, v.Time]
CustomSet = VisionTypeset(types)

Custom typesets (extend typeset)

Alternatively, typesets support a limited algebra allowing you to define new typesets based on simple manipulations to pre-existing sets.

Custom time typeset
import visions as v
typeset = CompleteSet() - v.Time + v.Date

# Alternatively
typeset = typeset.replace(v.Time, v.Date)

Just like addition and subtraction elsewhere in Python, you can split these operations up in any way imaginable:

import visions as v
rdw_typeset = CompleteSet()
rdw_typeset -= v.Boolean
rdw_typeset += BooleanNL
rdw_typeset -= v.Integer
rdw_typeset += DateTimeIntYYYYMMDD
rdw_typeset -= v.Categorical
rdw_typeset += CategoricalStr