Lecture 17
Classes are the basic component of Python's object-oriented system. We've been using them regularly throughout the course and will now look at how they are defined and used.
When instantiating a class object (e.g. rect()), Python invokes the __init__() method if it is present in the class definition.
class rect:
    """An object representation of a rectangle"""
    # Constructor
    def __init__(self, p1 = (0,0), p2 = (1,1)):
        self.p1 = p1
        self.p2 = p2
    # Methods
    def area(self):
        # abs() keeps the area non-negative whichever opposite corners are given
        return abs((self.p1[0] - self.p2[0]) *
                   (self.p1[1] - self.p2[1]))
    def set_p1(self, p1):
        self.p1 = p1
    def set_p2(self, p2):
        self.p2 = p2
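A quick usage sketch of the class above: calling rect() with or without arguments runs __init__() behind the scenes, with any arguments passed straight through to it. The class is repeated (without the setters) so the snippet runs on its own, and area() wraps the product in abs() so the result is non-negative regardless of which pair of opposite corners is given.

```python
class rect:
    """An object representation of a rectangle"""
    def __init__(self, p1=(0, 0), p2=(1, 1)):
        # __init__ runs automatically every time rect(...) is called
        self.p1 = p1
        self.p2 = p2

    def area(self):
        return abs((self.p1[0] - self.p2[0]) * (self.p1[1] - self.p2[1]))

r1 = rect()                  # uses the default corners (0, 0) and (1, 1)
r2 = rect((0, 0), (2, 3))    # arguments are passed through to __init__
print(r1.area(), r2.area())  # 1 6
```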
We've seen a number of objects (e.g. pandas DataFrames) that allow for method chaining to construct a pipeline of operations. We can achieve the same by having our class methods return the object itself via self.
class rect:
    """An object representation of a rectangle"""
    # Constructor
    def __init__(self, p1 = (0,0), p2 = (1,1)):
        self.p1 = p1
        self.p2 = p2
    # Methods
    def area(self):
        # abs() keeps the area non-negative whichever opposite corners are given
        return abs((self.p1[0] - self.p2[0]) *
                   (self.p1[1] - self.p2[1]))
    def set_p1(self, p1):
        self.p1 = p1
        return self
    def set_p2(self, p2):
        self.p2 = p2
        return self
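Because each setter returns the object it was called on, the calls can be strung together in a single expression. A minimal sketch, repeating the class so it runs standalone:

```python
class rect:
    """An object representation of a rectangle (setters return self)"""
    def __init__(self, p1=(0, 0), p2=(1, 1)):
        self.p1 = p1
        self.p2 = p2

    def area(self):
        return abs((self.p1[0] - self.p2[0]) * (self.p1[1] - self.p2[1]))

    def set_p1(self, p1):
        self.p1 = p1
        return self  # returning the object itself is what enables chaining

    def set_p2(self, p2):
        self.p2 = p2
        return self

# Each setter returns the same rect, so the next call is made on it
a = rect().set_p1((1, 1)).set_p2((4, 3)).area()
print(a)  # 6
```

Note that chaining returns the same object, not a copy, so every step in the chain mutates the original rect.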
All class objects have a default print / string conversion method, but the default behavior is not very useful: it only reports the object's class and memory address.
Both printing and string conversion are handled by the __str__() method, which our class inherits from object by default; we can override it to produce more useful output.
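A minimal sketch of overriding __str__(); the wording of the returned string is an arbitrary choice for illustration. Both print() and str() pick up the new method.

```python
class rect:
    """An object representation of a rectangle"""
    def __init__(self, p1=(0, 0), p2=(1, 1)):
        self.p1 = p1
        self.p2 = p2

    def __str__(self):
        # Used by both print() and str()
        return f"Rectangle with corners {self.p1} and {self.p2}"

r = rect()
print(r)    # Rectangle with corners (0, 0) and (1, 1)
s = str(r)  # the same text that print() shows
```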
Another special method, __repr__(), controls how the object is displayed when it is evaluated on its own (as with rect() above); it returns the class's representation. Where possible, this should be a valid Python expression capable of recreating the object.
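A minimal sketch of a __repr__() that satisfies the "valid Python expression" guideline, checked by feeding the representation back through eval(). The {...!r} format spec is one way to make sure each attribute is itself written as a Python expression.

```python
class rect:
    """An object representation of a rectangle"""
    def __init__(self, p1=(0, 0), p2=(1, 1)):
        self.p1 = p1
        self.p2 = p2

    def __repr__(self):
        # Aim for an expression that recreates the object;
        # !r inserts repr() of each attribute so the text round-trips
        return f"rect({self.p1!r}, {self.p2!r})"

r = rect((0, 0), (2, 3))
print(repr(r))          # rect((0, 0), (2, 3))
copy = eval(repr(r))    # rebuild an equivalent object from its representation
print(copy.p1, copy.p2) # (0, 0) (2, 3)
```

Since this class does not define __str__(), print(r) also falls back to __repr__().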
Part of the object-oriented system is that classes can inherit from other classes, meaning they gain access to all of their parent classes' attributes and methods. We will not go into much depth on this topic beyond showing the basic functionality.
class square(rect):
    def __init__(self, p1=(0,0), l=1):
        assert isinstance(l, (float, int)), \
            "l must be a number"
        p2 = (p1[0]+l, p1[1]+l)
        self.l = l
        super().__init__(p1, p2)
    def set_p1(self, p1):
        self.p1 = p1
        self.p2 = (self.p1[0]+self.l, self.p1[1]+self.l)
        return self
    def set_p2(self, p2):
        raise RuntimeError("Squares take l not p2")
    def set_l(self, l):
        assert isinstance(l, (float, int)), \
            "l must be a number"
        self.l = l
        self.p2 = (self.p1[0]+l, self.p1[1]+l)
        return self
    def __repr__(self):
        return f"square({self.p1}, {self.l})"
square((0, 0), 1)
1
1
4
Error: AssertionError: l must be a number
Error: AssertionError: l must be a number
Error: RuntimeError: Squares take l not p2
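A self-contained sketch of what inheritance buys us, using trimmed-down versions of the two classes above (set_p1() and set_l() are omitted for brevity). square never defines area(), so the call is resolved on the parent rect, while the overridden set_p2() and __repr__() take precedence over rect's versions.

```python
class rect:
    """An object representation of a rectangle"""
    def __init__(self, p1=(0, 0), p2=(1, 1)):
        self.p1 = p1
        self.p2 = p2

    def area(self):
        return abs((self.p1[0] - self.p2[0]) * (self.p1[1] - self.p2[1]))

    def set_p2(self, p2):
        self.p2 = p2
        return self

class square(rect):
    def __init__(self, p1=(0, 0), l=1):
        assert isinstance(l, (float, int)), "l must be a number"
        self.l = l
        # Delegate the shared setup to the parent class
        super().__init__(p1, (p1[0] + l, p1[1] + l))

    def set_p2(self, p2):
        # Overrides rect.set_p2
        raise RuntimeError("Squares take l not p2")

    def __repr__(self):
        return f"square({self.p1}, {self.l})"

s = square()
print(s, s.area())          # square((0, 0), 1) 1   (area() is inherited from rect)
print(isinstance(s, rect))  # True: a square is also a rect
try:
    s.set_p2((5, 5))        # the overridden method runs instead of rect's
except RuntimeError as e:
    print("Error:", e)      # Error: Squares take l not p2
```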
When an object is used in a for loop, Python looks for its __iter__() method, which is expected to return an iterator object (e.g. the result of calling iter() on a list, tuple, etc.).
class rect:
    """An object representation of a rectangle"""
    # Constructor
    def __init__(self, p1 = (0,0), p2 = (1,1)):
        self.p1 = p1
        self.p2 = p2
    # Methods
    def area(self):
        # abs() keeps the area non-negative whichever opposite corners are given
        return abs((self.p1[0] - self.p2[0]) *
                   (self.p1[1] - self.p2[1]))
    def __iter__(self):
        return iter( [
            self.p1,
            (self.p1[0], self.p2[1]),
            self.p2,
            (self.p2[0], self.p1[1])
        ] )
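A short usage sketch: once __iter__() is defined, the rect works in a for loop and with anything else that consumes iterables, such as list(). The vertices come out in the order the list is built: p1, then going around to p2 and back.

```python
class rect:
    """An object representation of a rectangle"""
    def __init__(self, p1=(0, 0), p2=(1, 1)):
        self.p1 = p1
        self.p2 = p2

    def __iter__(self):
        # Delegate to a list iterator over the four vertices
        return iter([self.p1, (self.p1[0], self.p2[1]),
                     self.p2, (self.p2[0], self.p1[1])])

for v in rect((0, 0), (2, 1)):
    print(v)
# (0, 0)
# (0, 1)
# (2, 1)
# (2, 0)

verts = list(rect((0, 0), (2, 1)))  # list() uses __iter__ as well
```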
A class can itself act as an iterator by defining a __next__() method, which is called repeatedly until it raises a StopIteration exception. In this case __iter__() is still needed, but it should simply return self.
class rect:
    def __init__(self, p1 = (0,0), p2 = (1,1)):
        self.p1 = p1
        self.p2 = p2
        self.vertices = [self.p1, (self.p1[0], self.p2[1]),
                         self.p2, (self.p2[0], self.p1[1]) ]
        self.index = 0
    # Methods
    def area(self):
        # abs() keeps the area non-negative whichever opposite corners are given
        return abs((self.p1[0] - self.p2[0]) *
                   (self.p1[1] - self.p2[1]))
    def __iter__(self):
        return self
    def __next__(self):
        if self.index == len(self.vertices):
            self.index = 0
            raise StopIteration
        v = self.vertices[self.index]
        self.index += 1
        return v
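Driving the protocol by hand with iter() and next() shows what a for loop does behind the scenes. This sketch repeats the class (without area()) and also exposes a side effect of this design: because the position lives on the object itself, a second consumer picks up wherever the first one stopped.

```python
class rect:
    def __init__(self, p1=(0, 0), p2=(1, 1)):
        self.p1 = p1
        self.p2 = p2
        self.vertices = [self.p1, (self.p1[0], self.p2[1]),
                         self.p2, (self.p2[0], self.p1[1])]
        self.index = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self.index == len(self.vertices):
            self.index = 0        # reset so the object can be looped over again
            raise StopIteration
        v = self.vertices[self.index]
        self.index += 1
        return v

r = rect()
it = iter(r)
print(it is r)     # True: the object is its own iterator
first = next(it)   # (0, 0)
second = next(it)  # (0, 1)
rest = list(r)     # continues from the shared index: [(1, 1), (1, 0)]
again = list(r)    # StopIteration reset the index, so all four vertices
```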
There is a lot of bookkeeping in the implementation above; we can simplify it significantly by making __iter__() a generator function. A generator is a function that uses yield instead of return, which lets the function preserve its state between next() calls.
class rect:
    """An object representation of a rectangle"""
    # Constructor
    def __init__(self, p1 = (0,0), p2 = (1,1)):
        self.p1 = p1
        self.p2 = p2
    # Methods
    def area(self):
        # abs() keeps the area non-negative whichever opposite corners are given
        return abs((self.p1[0] - self.p2[0]) *
                   (self.p1[1] - self.p2[1]))
    def __iter__(self):
        vertices = [ self.p1, (self.p1[0], self.p2[1]),
                     self.p2, (self.p2[0], self.p1[1]) ]
        for v in vertices:
            yield v
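A sketch of one practical benefit of the generator approach: every call to __iter__() returns a fresh generator object with its own position, so nested loops over the same rect work as expected. The __next__()-based version above would not behave correctly here, since both loops would share a single index. The sketch uses yield from, which is equivalent to the explicit for / yield loop.

```python
class rect:
    """An object representation of a rectangle"""
    def __init__(self, p1=(0, 0), p2=(1, 1)):
        self.p1 = p1
        self.p2 = p2

    def __iter__(self):
        # Each call creates a new, independent generator object
        vertices = [self.p1, (self.p1[0], self.p2[1]),
                    self.p2, (self.p2[0], self.p1[1])]
        yield from vertices

r = rect()
gen = iter(r)
print(type(gen).__name__)                # generator
pairs = [(a, b) for a in r for b in r]   # nested loops over the same object
print(len(pairs))                        # 16
```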
We can examine all of a class's methods and attributes using dir():
array(['__class__', '__delattr__', '__dict__', '__dir__', '__doc__', '__eq__',
'__format__', '__ge__', '__getattribute__', '__gt__', '__hash__',
'__init__', '__init_subclass__', '__iter__', '__le__', '__lt__',
'__module__', '__ne__', '__new__', '__reduce__', '__reduce_ex__',
'__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__',
'__weakref__', 'area'], dtype='<U17')
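Where did p1 and p2 go? They are instance attributes, created by __init__() when an object is constructed, so they only appear when dir() is called on an instance such as rect() rather than on the class itself: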
array(['__class__', '__delattr__', '__dict__', '__dir__', '__doc__', '__eq__',
'__format__', '__ge__', '__getattribute__', '__gt__', '__hash__',
'__init__', '__init_subclass__', '__iter__', '__le__', '__lt__',
'__module__', '__ne__', '__new__', '__reduce__', '__reduce_ex__',
'__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__',
'__weakref__', 'area', 'p1', 'p2'], dtype='<U17')
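The class/instance distinction can be checked directly. A small sketch; vars() (an addition here, not used above) returns only the attributes stored on the instance itself.

```python
class rect:
    """An object representation of a rectangle"""
    def __init__(self, p1=(0, 0), p2=(1, 1)):
        self.p1 = p1  # instance attributes only exist once __init__ has run
        self.p2 = p2

    def area(self):
        return abs((self.p1[0] - self.p2[0]) * (self.p1[1] - self.p2[1]))

print("area" in dir(rect))    # True: methods are defined on the class
print("p1" in dir(rect))      # False: the class has no p1 of its own
print("p1" in dir(rect()))    # True: each instance gets p1 and p2
print(vars(rect()))           # {'p1': (0, 0), 'p2': (1, 1)}
```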
The simplest way to create a new transformer is to use FunctionTransformer() from sklearn's preprocessing submodule, which converts a Python function into a transformer.
{'accept_sparse': False, 'check_inverse': True, 'feature_names_out': None, 'func': <ufunc 'log'>, 'inv_kw_args': None, 'inverse_func': None, 'kw_args': None, 'validate': False}
['__annotations__', '__class__', '__delattr__', '__dict__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__getstate__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__le__', '__lt__', '__module__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__setstate__', '__sizeof__', '__sklearn_is_fitted__', '__str__', '__subclasshook__', '__weakref__', '_check_feature_names', '_check_input', '_check_inverse_transform', '_check_n_features', '_get_param_names', '_get_tags', '_more_tags', '_parameter_constraints', '_repr_html_', '_repr_html_inner', '_repr_mimebundle_', '_sklearn_auto_wrap_output_keys', '_transform', '_validate_data', '_validate_params', 'accept_sparse', 'check_inverse', 'feature_names_in_', 'feature_names_out', 'fit', 'fit_transform', 'func', 'get_feature_names_out', 'get_params', 'inv_kw_args', 'inverse_func', 'inverse_transform', 'kw_args', 'n_features_in_', 'set_output', 'set_params', 'transform', 'validate']
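A minimal sketch, assuming numpy and scikit-learn are installed. The func argument matches the np.log shown in the get_params() output above; supplying np.exp as inverse_func is an addition here, which makes inverse_transform() available.

```python
import numpy as np
from sklearn.preprocessing import FunctionTransformer

# Wrap np.log as a transformer, with np.exp as its inverse
log_tf = FunctionTransformer(np.log, inverse_func=np.exp)

X = np.array([[1.0, 10.0],
              [100.0, 1000.0]])
X_log = log_tf.fit_transform(X)           # applies np.log element-wise
X_back = log_tf.inverse_transform(X_log)  # applies np.exp element-wise

print(np.allclose(X_back, X))             # True
print(log_tf.get_params()["func"])        # <ufunc 'log'>
```

Since the wrapped function is stateless, fit() has nothing to learn here; the transformer is still useful because it can be dropped into a Pipeline or ColumnTransformer like any other sklearn step.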
For a more full-featured transformer, we can write a class that inherits from the BaseEstimator and TransformerMixin classes in sklearn's base submodule.
['__class__' '__delattr__' '__dict__' '__dir__' '__doc__' '__eq__' '__format__'
'__ge__' '__getattribute__' '__getstate__' '__gt__' '__hash__' '__init__'
'__init_subclass__' '__le__' '__lt__' '__module__' '__ne__' '__new__'
'__reduce__' '__reduce_ex__' '__repr__' '__setattr__' '__setstate__'
'__sizeof__' '__str__' '__subclasshook__' '__weakref__' '_check_feature_names'
'_check_n_features' '_get_param_names' '_get_tags' '_more_tags' '_repr_html_'
'_repr_html_inner' '_repr_mimebundle_' '_sklearn_auto_wrap_output_keys'
'_validate_data' '_validate_params' 'b' 'fit' 'fit_transform' 'get_params' 'm'
'set_output' 'set_params' 'transform']
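The class behind the listing above is not shown, so here is a hypothetical sketch in the same spirit: a transformer computing m * X + b, with the parameter names m and b borrowed from the attribute listing. It shows what the base classes provide for free: get_params() / set_params() from BaseEstimator (which reads the __init__ signature) and fit_transform() from TransformerMixin. Listing the mixin before BaseEstimator follows sklearn's usual convention.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin

class LinearTransformer(TransformerMixin, BaseEstimator):
    """Hypothetical transformer computing m * X + b element-wise."""

    def __init__(self, m=1.0, b=0.0):
        # Store constructor arguments unchanged: BaseEstimator.get_params()
        # reads them back using the parameter names in this signature
        self.m = m
        self.b = b

    def fit(self, X, y=None):
        # Nothing to learn, but fit() must still return self
        return self

    def transform(self, X):
        return self.m * np.asarray(X) + self.b

t = LinearTransformer(m=2, b=1)
print(t.get_params())                 # {'b': 1, 'm': 2}  (from BaseEstimator)
out = t.fit_transform([[1, 2], [3, 4]])  # fit().transform(), from TransformerMixin
t.set_params(b=0)                     # also provided by BaseEstimator
```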
We employed a couple of special methods that are worth mentioning in a little more detail.
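_validate_data() & _check_feature_names() are methods inherited from BaseEstimator. Between them they are responsible for setting and checking the n_features_in_ and feature_names_in_ attributes: _check_feature_names() handles feature_names_in_, while _validate_data() handles both. Both are private methods, so they may change between sklearn releases; newer versions provide the function sklearn.utils.validation.validate_data() for this purpose.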
Typically one or both are called during fit() with reset=True, in which case the corresponding attribute is set. Later, in transform(), one or both are called again with reset=False, and the properties of X are checked against the stored attribute values.
These are worth using as they promote an interface consistent with sklearn and also provide convenient error checking with useful warning / error messages.
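To make the reset=True / reset=False pattern concrete without depending on sklearn's private methods, here is a simplified, hypothetical stand-in that only tracks the number of columns. It is not sklearn's implementation; the helper check_n_features and the class Doubler are made up for illustration.

```python
import numpy as np

def check_n_features(est, X, reset):
    """Hypothetical, simplified version of the reset logic described above."""
    n = np.asarray(X).shape[1]
    if reset:
        est.n_features_in_ = n          # fit(): record the training width
    elif n != est.n_features_in_:       # transform(): compare against it
        raise ValueError(
            f"X has {n} features, but {type(est).__name__} "
            f"is expecting {est.n_features_in_} features as input."
        )

class Doubler:
    """Hypothetical transformer that doubles its input."""
    def fit(self, X, y=None):
        check_n_features(self, X, reset=True)
        return self

    def transform(self, X):
        check_n_features(self, X, reset=False)
        return 2 * np.asarray(X)

d = Doubler().fit([[1, 2, 3]])
print(d.n_features_in_)      # 3
try:
    d.transform([[1, 2]])    # wrong number of columns
except ValueError as e:
    print(e)
```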
check_is_fitted()
This is another useful helper function, from sklearn.utils.validation. It is fairly simplistic: it checks for the existence of a specified attribute, and if no attribute is given it checks for any attributes ending in _ that do not begin with __.
Again this is useful for providing a consistent interface and useful error / warning messages.
See also the other check*() functions in sklearn.utils.
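A minimal sketch of check_is_fitted() inside a hypothetical transformer that learns column means. Calling transform() before fit() raises NotFittedError; after fit() the trailing-underscore attribute mean_ exists and the check passes. The class name Centerer is made up for illustration.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.exceptions import NotFittedError
from sklearn.utils.validation import check_is_fitted

class Centerer(TransformerMixin, BaseEstimator):
    """Hypothetical transformer that subtracts the column means seen in fit()."""

    def fit(self, X, y=None):
        # The trailing underscore marks a learned attribute, which is what
        # check_is_fitted() looks for when no attribute names are given
        self.mean_ = np.asarray(X, dtype=float).mean(axis=0)
        return self

    def transform(self, X):
        check_is_fitted(self)  # raises NotFittedError before fit()
        return np.asarray(X, dtype=float) - self.mean_

c = Centerer()
try:
    c.transform([[1.0, 2.0]])
except NotFittedError as e:
    print("NotFittedError:", e)

c.fit([[1.0, 2.0], [3.0, 4.0]])
check_is_fitted(c, "mean_")      # the attribute can also be named explicitly
print(c.transform([[1.0, 2.0]]))  # [[-1. -1.]]
```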
It is also possible to implement your own custom model: sklearn.base provides several Mixin classes that supply the common core interface for each type of estimator.
Class | Description |
---|---|
base.BiclusterMixin | Mixin class for all bicluster estimators |
base.ClassifierMixin | Mixin class for all classifiers |
base.ClusterMixin | Mixin class for all cluster estimators |
base.DensityMixin | Mixin class for all density estimators |
base.RegressorMixin | Mixin class for all regression estimators |
base.TransformerMixin | Mixin class for all transformers |
base.OneToOneFeatureMixin | Provides get_feature_names_out for simple transformers |
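As an illustration of what a mixin contributes, here is a hypothetical baseline regressor that always predicts the training mean of y. We only write fit() and predict(); RegressorMixin supplies score(), which reports R² (0 for a model that predicts the mean of the evaluation targets, as this one does on its training data).

```python
import numpy as np
from sklearn.base import BaseEstimator, RegressorMixin
from sklearn.utils.validation import check_is_fitted

class MeanRegressor(RegressorMixin, BaseEstimator):
    """Hypothetical baseline model that always predicts the training mean of y."""

    def fit(self, X, y):
        self.mean_ = float(np.mean(y))
        return self

    def predict(self, X):
        check_is_fitted(self)
        return np.full(len(X), self.mean_)

X = [[0], [1], [2], [3]]
y = [1.0, 3.0, 5.0, 7.0]
model = MeanRegressor().fit(X, y)
print(model.predict([[10]]))  # [4.]
print(model.score(X, y))      # 0.0, R^2 computed by RegressorMixin.score()
```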
Sta 663 - Spring 2023