Usage

This document describes how to use the methods and classes provided by cyberpandas.

We’ll assume that the following imports have been performed.

In [1]: import ipaddress

In [2]: import pandas as pd

In [3]: from cyberpandas import IPArray, to_ipaddress

Parsing

First, you’ll need some IP address data. Much like pandas.to_datetime() in pandas, cyberpandas provides to_ipaddress() for converting a sequence of address-like values (strings, bytes, or integers) into a specialized array, in this case an IPArray.

From Strings

to_ipaddress() can parse a sequence of strings where each element represents an IP address.

In [4]: to_ipaddress([
   ...:     '192.168.1.1',
   ...:     '2001:0db8:85a3:0000:0000:8a2e:0370:7334',
   ...: ])
   ...: 
Out[4]: IPArray(['192.168.1.1', '2001:db8:85a3::8a2e:370:7334'])
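The IPv6 address in the output above is printed in compressed form, which is why it differs from the input string. The standard library's ipaddress module produces the same text form, so a minimal sketch using it (not cyberpandas itself) shows that both spellings denote one address:

```python
import ipaddress

# The long and the compressed spelling parse to the same address object.
full = ipaddress.ip_address("2001:0db8:85a3:0000:0000:8a2e:0370:7334")
short = ipaddress.ip_address("2001:db8:85a3::8a2e:370:7334")

same_address = full == short   # comparison is by value, not by spelling
compressed = str(full)         # compressed text form, as in the output above
exploded = short.exploded      # fully expanded text form
```

Here compressed is the string shown in the output, and exploded recovers the long spelling used as input.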

You can also parse a container of bytes objects, each holding one address in packed binary form.

In [5]: to_ipaddress([
   ...:     b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\xc0\xa8\x01\x01',
   ...:     b' \x01\r\xb8\x85\xa3\x00\x00\x00\x00\x8a.\x03ps4',
   ...: ])
   ...: 
Out[5]: IPArray(['192.168.1.1', '2001:db8:85a3::8a2e:370:7334'])

If you have a single buffer or bytestring holding many addresses, see From Bytes.

From Integers

An IP address is just an integer (32 bits for IPv4, 128 bits for IPv6), and to_ipaddress() can parse a sequence of them.

In [6]: to_ipaddress([
   ...:    3232235777,
   ...:    42540766452641154071740215577757643572
   ...: ])
   ...: 
Out[6]: IPArray(['192.168.1.1', '2001:db8:85a3::8a2e:370:7334'])

There’s also the IPArray.from_pyints() method that does the same thing.
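To see where the integers in the example come from, here is a small sketch using only the standard library's ipaddress module. It is an illustration of the address-to-integer mapping, not of how cyberpandas parses input internally.

```python
import ipaddress

ipv4 = ipaddress.ip_address("192.168.1.1")
ipv6 = ipaddress.ip_address("2001:db8:85a3::8a2e:370:7334")

# int() gives the numeric value of an address.
ipv4_as_int = int(ipv4)
ipv6_as_int = int(ipv6)

# The IPv4 value is the four octets packed big-endian into 32 bits.
by_hand = (192 << 24) | (168 << 16) | (1 << 8) | 1

# Going the other way: integer -> address object.
round_trip = ipaddress.ip_address(ipv4_as_int)
```

ipv4_as_int and ipv6_as_int are the two integers passed to to_ipaddress() above.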

From Bytes

If you have a correctly structured buffer of bytes or bytestring, you can directly construct an IPArray without any intermediate copies.

In [7]: stream = (b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\xc0\xa8\x01'
   ...:           b'\x01 \x01\r\xb8\x85\xa3\x00\x00\x00\x00\x8a.\x03ps4')
   ...: 

In [8]: IPArray.from_bytes(stream)
Out[8]: IPArray(['192.168.1.1', '2001:db8:85a3::8a2e:370:7334'])

stream is expected to be a sequence of bytes representing IP addresses (note that it’s just a single bytestring that has been split across two lines for readability). Each IP address should be 128 bits, left-padded with 0s for IPv4 addresses. In particular, IPArray.to_bytes() produces such a sequence of bytes.
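If your addresses are not yet in this layout, the following sketch builds such a stream with the standard library alone. The pack_128 helper is hypothetical (it is not part of cyberpandas), and the sketch assumes the fields are big-endian, which matches the example stream above.

```python
import ipaddress

def pack_128(addresses):
    """Hypothetical helper: join addresses as 16-byte big-endian fields.

    IPv4 addresses end up left-padded with twelve zero bytes.
    """
    return b"".join(
        int(ipaddress.ip_address(a)).to_bytes(16, "big") for a in addresses
    )

stream = pack_128(["192.168.1.1", "2001:db8:85a3::8a2e:370:7334"])
```

The resulting stream is 32 bytes long and is byte-for-byte the same as the one passed to IPArray.from_bytes() in the example.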

Pandas Integration

IPArray satisfies pandas’ extension array interface, which means that it can safely be stored inside pandas’ Series and DataFrame.

In [9]: values = to_ipaddress([
   ...:     0,
   ...:     3232235777,
   ...:     42540766452641154071740215577757643572
   ...: ])
   ...: 

In [10]: values
Out[10]: IPArray(['0.0.0.0', '192.168.1.1', '2001:db8:85a3::8a2e:370:7334'])

In [11]: ser = pd.Series(values)

In [12]: ser
Out[12]: 
0                         0.0.0.0
1                     192.168.1.1
2    2001:db8:85a3::8a2e:370:7334
dtype: ip

In [13]: df = pd.DataFrame({"addresses": values})

In [14]: df
Out[14]: 
                      addresses
0                       0.0.0.0
1                   192.168.1.1
2  2001:db8:85a3::8a2e:370:7334

Most pandas methods that make sense for IP addresses should work. The following sections call out points of interest.

Indexing

If your selection returns a scalar, you get back an ipaddress.IPv4Address or ipaddress.IPv6Address.

In [15]: ser[0]
Out[15]: IPv4Address('0.0.0.0')

In [16]: df.loc[2, 'addresses']
Out[16]: IPv6Address('2001:db8:85a3::8a2e:370:7334')
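Because scalars are ordinary ipaddress objects, everything the standard library offers on them is available. A brief sketch, where addr is built directly and stands in for a scalar pulled out of a Series:

```python
import ipaddress

# Stand-in for a scalar selected from a Series of IP addresses.
addr = ipaddress.ip_address("192.168.1.1")

version = addr.version                                   # 4 or 6
is_private = addr.is_private                             # private-range check
in_lan = addr in ipaddress.ip_network("192.168.0.0/16")  # network membership
```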

Missing Data

The address 0 (0.0.0.0) is used to represent missing values.

In [17]: ser.isna()
Out[17]: 
0     True
1    False
2    False
dtype: bool

In [18]: ser.dropna()
Out[18]: 
1                     192.168.1.1
2    2001:db8:85a3::8a2e:370:7334
dtype: ip
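One practical consequence of this convention is that a genuine 0.0.0.0 in your data cannot be told apart from a missing value. The sketch below mirrors the idea with the standard library only. The is_missing helper is hypothetical, and whether cyberpandas also treats the IPv6 unspecified address :: as missing is an assumption here; the sketch only shows that it has the same integer value.

```python
import ipaddress

MISSING = 0  # sentinel value, as stated in the text above

def is_missing(value):
    """Hypothetical helper: True if value has the sentinel's integer value."""
    return int(ipaddress.ip_address(value)) == MISSING

# 0.0.0.0 and :: both have integer value 0; 192.168.1.1 does not.
flags = [is_missing(v) for v in ["0.0.0.0", "192.168.1.1", "::"]]
```

If 0.0.0.0 is meaningful in your dataset, it may be worth tracking that separately before calling dropna().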

IP Accessor

cyberpandas offers an accessor for IP-specific methods.

In [19]: ser.ip.isna
Out[19]: 
0     True
1    False
2    False
dtype: bool

In [20]: df['addresses'].ip.is_ipv6
Out[20]: 
0    False
1    False
2     True
Name: addresses, dtype: bool