Just sharing stuff…
The Python Package Index (PyPI) is a repository of software for the Python programming language. What my colleague Ed David and I did was mine all the packages on the site, together with information on their dependencies. Ed used SQLite as our database engine to store all the mined information.
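The post does not describe how the mined data were laid out in SQLite, so the following is only a minimal sketch, assuming a hypothetical two-table schema (`packages` and `dependencies`), of how packages and their dependencies could be stored with Python's built-in `sqlite3` module. The function names `create_schema` and `store_package` and the package names `pkg_a`, `pkg_b`, `pkg_c` are illustrative, not taken from our actual setup.

```python
import sqlite3

def create_schema(conn: sqlite3.Connection) -> None:
    """Create a hypothetical schema: one row per package, one row per dependency pair."""
    conn.executescript(
        """
        CREATE TABLE IF NOT EXISTS packages (
            name TEXT PRIMARY KEY
        );
        CREATE TABLE IF NOT EXISTS dependencies (
            package    TEXT NOT NULL,  -- the dependent package
            depends_on TEXT NOT NULL,  -- a package it requires
            PRIMARY KEY (package, depends_on)
        );
        """
    )

def store_package(conn: sqlite3.Connection, name: str, requirements: list[str]) -> None:
    """Insert a package and the names of the packages it depends on."""
    with conn:  # commits on success, rolls back on error
        conn.execute("INSERT OR IGNORE INTO packages (name) VALUES (?)", (name,))
        conn.executemany(
            "INSERT OR IGNORE INTO dependencies (package, depends_on) VALUES (?, ?)",
            [(name, dep) for dep in requirements],
        )

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_schema(conn)
    store_package(conn, "pkg_a", ["pkg_b", "pkg_c"])  # hypothetical packages
    store_package(conn, "pkg_b", ["pkg_c"])
    for row in conn.execute("SELECT package, depends_on FROM dependencies"):
        print(row)
```

`INSERT OR IGNORE` keeps re-runs of the miner from failing on packages or dependency pairs that were already recorded.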
Using Python's graph-tool package, I then built the dependency network and created a network visualization. Note that each package or library uses a set of other existing packages or libraries, which makes the former dependent on the latter; these dependency relations form the edges of the network. Anyway, once the data were available, constructing the network was pretty straightforward. Below is how the resulting network looks.
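The post does not show the construction step, so here is a plain-Python sketch, under the assumption that dependencies come as `(package, depends_on)` pairs, of turning package names into integer vertex indices and directed edges (dependent package to the package it uses). The function name `build_edge_list` is hypothetical.

```python
def build_edge_list(dependencies):
    """Turn (package, depends_on) pairs into integer-indexed directed edges.

    An edge (u, v) means package u depends on package v.
    Returns (names, edges), where names[i] is the package name of vertex i.
    """
    index = {}   # package name -> vertex index
    names = []   # vertex index -> package name

    def vertex(name):
        # Assign the next free index the first time a name is seen.
        if name not in index:
            index[name] = len(names)
            names.append(name)
        return index[name]

    edges = [(vertex(pkg), vertex(dep)) for pkg, dep in dependencies]
    return names, edges

if __name__ == "__main__":
    names, edges = build_edge_list([("pkg_a", "pkg_b"), ("pkg_a", "pkg_c"), ("pkg_b", "pkg_c")])
    print(names)  # vertex labels
    print(edges)  # directed edges as index pairs
```

With graph-tool installed, `g = Graph(directed=True)` followed by `g.add_edge_list(edges)` should then give a directed graph with one vertex per package, with `names` available for labelling the vertices.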
I wanted to focus on the giant component of the network, that is, its largest connected component when edge directions are ignored. In Python, the code is:
    from graph_tool.all import *

    # g is of type `<class 'graph_tool.Graph'>`
    # Keep only the vertices in the largest component, treating edges as undirected
    gc = GraphView(g, vfilt=label_largest_component(g, directed=False))
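To make clear what the filter keeps, here is a plain-Python sketch of the same idea (not graph-tool's own implementation): ignore edge direction, find connected components by breadth-first search, and mark the vertices that belong to the largest one. The function name `largest_component_mask` is hypothetical, and ties between equally large components are broken by taking the first one found.

```python
from collections import deque

def largest_component_mask(num_vertices, edges):
    """Return one boolean per vertex: True if it lies in the largest component.

    Edge direction is ignored, mirroring directed=False above.
    """
    # Build an undirected adjacency list.
    adj = [[] for _ in range(num_vertices)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)

    component = [-1] * num_vertices  # component id per vertex; -1 means unvisited
    sizes = []                       # sizes[c] = number of vertices in component c
    for start in range(num_vertices):
        if component[start] != -1:
            continue
        cid = len(sizes)
        component[start] = cid
        size = 0
        queue = deque([start])
        while queue:  # breadth-first search from start
            u = queue.popleft()
            size += 1
            for w in adj[u]:
                if component[w] == -1:
                    component[w] = cid
                    queue.append(w)
        sizes.append(size)

    if not sizes:
        return []
    largest = max(range(len(sizes)), key=sizes.__getitem__)  # first largest on ties
    return [c == largest for c in component]

if __name__ == "__main__":
    print(largest_component_mask(6, [(0, 1), (2, 1), (3, 4)]))
```

The returned mask plays the same role as the vertex property map passed as `vfilt` to `GraphView`: vertices marked True are kept, all others are hidden.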
Below is the resulting filtered network.