PyPI Dependency Network

The Python Package Index (PyPI) is a repository of software for the Python programming language. My colleague Ed David and I mined all the packages on the site, together with the information on their dependencies. Ed used SQLite as our database engine to store all the mined information.
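To give a rough idea of what such a store could look like, here is a minimal sketch using Python's built-in sqlite3 module. The table name, column names, and sample rows are hypothetical and made up for illustration; they are not the schema or data we actually used.

```python
import sqlite3

# Hypothetical schema: one row per (package, dependency) pair.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE dependencies ("
    " package TEXT NOT NULL,"
    " depends_on TEXT NOT NULL,"
    " PRIMARY KEY (package, depends_on))"
)

# Made-up sample rows, for illustration only.
sample_rows = [
    ("pkg_a", "lib_x"),
    ("pkg_b", "lib_x"),
    ("pkg_b", "lib_y"),
    ("pkg_b", "lib_y"),  # duplicate pair, skipped by INSERT OR IGNORE
]
conn.executemany(
    "INSERT OR IGNORE INTO dependencies VALUES (?, ?)", sample_rows
)
conn.commit()

# Read the pairs back as an edge list (package -> dependency).
edges = conn.execute(
    "SELECT package, depends_on FROM dependencies "
    "ORDER BY package, depends_on"
).fetchall()
```

Keeping one row per dependency pair means the table can be read back directly as the edge list of a directed network.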

Using Python's graph-tool package, I then built the dependency network and created a network visualization. Each package or library uses a set of other existing packages or libraries, making the former dependent on the latter. Once the data were available, constructing the network was pretty straightforward. Below is how the resulting network looks.

The Full PyPI Dependency Network

The full PyPI dependency network laid out using the SFDP spring-block algorithm.

I wanted to focus on the giant component of the network. With graph-tool, the code is:

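from graph_tool.all import GraphView, label_largest_component

# g is of type `<class 'graph_tool.Graph'>`
# directed=False ignores edge direction when finding the largest component
gc = GraphView(g, vfilt=label_largest_component(g, directed=False))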

Below is the resulting filtered network.

Giant Component of the Dependency Network

Focusing on the giant component of the dependency network.
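For readers without graph-tool installed, the idea behind this filter can be sketched in plain Python: treat every dependency as an undirected link and keep the largest connected group of nodes. The function name and the sample edges here are hypothetical, and this is only an illustration of the concept, not how graph-tool implements it.

```python
from collections import defaultdict, deque


def largest_weak_component(edges):
    """Return the nodes of the largest component, ignoring edge direction."""
    neighbours = defaultdict(set)
    for src, dst in edges:
        neighbours[src].add(dst)
        neighbours[dst].add(src)

    seen = set()
    best = set()
    for start in neighbours:
        if start in seen:
            continue
        # Breadth-first search collects everything reachable from `start`.
        component = {start}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for other in neighbours[node]:
                if other not in component:
                    component.add(other)
                    queue.append(other)
        seen |= component
        if len(component) > len(best):
            best = component
    return best


# Made-up edges (package -> dependency), for illustration only.
edges = [("pkg_a", "lib_x"), ("pkg_b", "lib_x"), ("pkg_c", "lib_y")]
giant = largest_weak_component(edges)
```

Note that this sketch only sees nodes that appear in at least one edge, so packages with no dependency links at all are left out from the start.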

The node sizes represent the in-degree values of the nodes: the bigger a node, the more packages/libraries depend on it. The two biggest nodes are setuptools and django.
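As a small illustration of what in-degree means here, the sketch below counts how many packages depend on each node, assuming edges point from a package to the package it depends on. The function name and the sample edges are hypothetical and do not come from the mined data.

```python
from collections import Counter


def rank_by_in_degree(edges):
    """Return (node, in_degree) pairs, most depended-upon first."""
    # Each edge is (package, dependency), so the second item receives the link.
    in_degree = Counter(dst for _, dst in edges)
    return in_degree.most_common()


# Made-up edges (package -> dependency), for illustration only.
edges = [
    ("pkg_a", "lib_x"),
    ("pkg_b", "lib_x"),
    ("pkg_c", "lib_x"),
    ("pkg_c", "lib_y"),
]
ranking = rank_by_in_degree(edges)
```

Packages that nothing depends on have an in-degree of zero and simply do not show up in this ranking.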

One comment on "PyPI Dependency Network"

  1. Mohamad
    December 6, 2015

    Can you publish your database?
    I want to find a way to rank the packages (other than the dependency graph); giving access to your dataset would save me a lot of time.


