Just sharing stuff…
Thanks to a good friend and colleague,** last week I learned about the existence of the Metropolitan Manila Development Authority (MMDA) Live Traffic Monitoring System. Below is a video clip of the traffic data generated by the MMDA platform. Each circle represents an MMDA metrobase, or station, where traffic along a specific road segment is assessed every 15 minutes. Traffic flow is classified into only three states: Light, Moderate, or Heavy.
The data cover the period when there was an INC (Iglesia ni Cristo) mass gathering in the Megamall and Shaw Boulevard area, and the visualization actually captured this. In the viz, SM Megamall and Ortigas Avenue are mostly in a “heavy” traffic state, in both the northbound and southbound directions. For southbound traffic, road segments before Shaw Boulevard sustained heavy flows; after it, traffic was moderate to light all the way. For northbound traffic, on the other hand, everything before SM Megamall was jammed, and all bases after it were in either moderate or light condition. This clearly illustrates where the bottleneck was.
Details behind the visualization are discussed in the next section.
The Metropolitan Manila Development Authority Live Traffic Monitoring System gives traffic flow reports at intersections around Metro Manila and is updated every 10 to 15 minutes by the MMDA MetroBase. As I understand it, MMDA officers on post manually key in the status. The data do not directly provide traffic velocity; instead, they indicate whether the traffic flow along each road segment is Light, Moderate, or Heavy (LMH). Aside from the LMH statuses, the platform also gives alerts on whether there is an accident, a road construction, or a rally happening, among others.
When I saw the site, I had the urge to revisit web scraping. I thought this was a good personal project to “refresh” one’s web scraping skills, and data viz skills as well.
For web scraping, I used the Python packages urllib2 and BeautifulSoup. The latter helps find tags in whatever HTML page one is looking at, allowing the extraction of only the necessary information on a page. For page downloading, I used urllib2, a library for fetching and opening URLs. If you want to mine data from a website at a given URL, just execute:
import urllib2

mypage = urllib2.Request(URL)
thepage = urllib2.urlopen(mypage).read()
Then use the BeautifulSoup package to “read” the page.
from bs4 import BeautifulSoup

soup = BeautifulSoup(thepage)
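To make the tag-finding step concrete, here is a small, self-contained sketch. The table markup, tag names, and class names below are made up for illustration only (the real MMDA page’s structure will differ), and the snippet uses Python 3 with bs4’s explicit "html.parser" backend rather than the Python 2 setup above.

```python
from bs4 import BeautifulSoup

# A made-up fragment mimicking what a traffic-status table *might* look like;
# the actual MMDA markup is different, so these tags/classes are assumptions.
html = """
<table>
  <tr class="segment"><td class="tower">Shaw Blvd</td><td class="nb">L</td><td class="sb">H</td></tr>
  <tr class="segment"><td class="tower">SM Megamall</td><td class="nb">M</td><td class="sb">H</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

# find_all() returns every matching tag; from each row, pull only the cells we need.
rows = []
for tr in soup.find_all("tr", class_="segment"):
    rows.append((tr.find("td", class_="tower").text,
                 tr.find("td", class_="nb").text,
                 tr.find("td", class_="sb").text))
```

The point is simply that, once you know which tags carry the data, `find_all` plus `find` lets you discard everything else on the page.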
Since the update is done every fifteen minutes, it is necessary to pause the scraper at every iteration:
import time

def sleep():
    print "Sleeping for 15 minutes"
    time.sleep(60 * 15)
    return
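Putting the download and the pause together, the scraping loop can be sketched as follows. This is Python 3, and the `poll` helper, its parameter names, and the stub fetch function are mine for illustration, not part of the original script.

```python
import time

def poll(fetch, handle, interval_s=15 * 60, max_iters=None):
    """Repeatedly fetch the page, hand the HTML to a parse/store callback,
    then sleep until the next update window. max_iters=None runs forever."""
    done = 0
    while max_iters is None or done < max_iters:
        handle(fetch())
        done += 1
        if max_iters is None or done < max_iters:
            time.sleep(interval_s)

# Dry run with a stub fetcher and no real sleeping:
seen = []
poll(fetch=lambda: "<html></html>", handle=seen.append, interval_s=0, max_iters=2)
```

Separating the fetch and handle callbacks from the loop also makes the scraper easy to test without touching the network.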
All scraped (“streaming”) information is stored in an SQLite database. For each area (k, i.e., EDSA, C5, etc.), a tower (towerName) is assigned (this is probably the intersection) that gives both the northbound (nstat) and southbound (sstat) statuses along a specific road segment.
entry = MMDATraffic()
entry.LINE = k
entry.TOWER_ID = towerID
entry.TOWER_NAME = towerName
entry.NB_STATUS = nstat
entry.SB_STATUS = sstat
entry.UPDATE_INFO = dt
entry.DATETIME = mytime
dbsession.add(entry)
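Ed’s databasing code is not shown here; for context, a minimal stand-in for what the `MMDATraffic` table might look like, using SQLAlchemy’s declarative ORM with an in-memory SQLite database, is sketched below. The column names mirror the attributes assigned above, but the types, table name, and primary key are my guesses (Python 3, SQLAlchemy 1.4+).

```python
from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.orm import declarative_base, sessionmaker

Base = declarative_base()

class MMDATraffic(Base):
    __tablename__ = "mmda_traffic"   # table name is an assumption
    ID = Column(Integer, primary_key=True)
    LINE = Column(String)            # area, e.g. "EDSA", "C5"
    TOWER_ID = Column(String)
    TOWER_NAME = Column(String)      # probably the intersection
    NB_STATUS = Column(String)       # northbound: Light / Moderate / Heavy
    SB_STATUS = Column(String)       # southbound status
    UPDATE_INFO = Column(String)
    DATETIME = Column(String)

# In-memory SQLite for the sketch; the real scraper writes to a file-backed DB.
engine = create_engine("sqlite:///:memory:")
Base.metadata.create_all(engine)
dbsession = sessionmaker(bind=engine)()

entry = MMDATraffic(LINE="EDSA", TOWER_NAME="SM Megamall",
                    NB_STATUS="H", SB_STATUS="M")
dbsession.add(entry)
dbsession.commit()
count = dbsession.query(MMDATraffic).count()
```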
The final code is less than 100 lines! My colleague Ed is currently playing with SVG scripting to dynamically visualize the scraped data. While he is finishing the SVG rendering, I used visualization software to have a peek at the data.
Status: 1 – Light, 2 – Moderate, 3 – Heavy.
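For plotting, the categorical statuses have to be mapped onto those numeric codes. A tiny helper like the one below does it; the assumption that the scraped status fields reduce to the letters L, M, and H is mine (Python 3).

```python
# Numeric encoding used in the viz: 1 = Light, 2 = Moderate, 3 = Heavy.
STATUS_CODE = {"L": 1, "M": 2, "H": 3}

def encode(status):
    # Reduce e.g. "Light" / "l" / " MODERATE " to its first letter, then look
    # it up; returns None for blank or unrecognized fields.
    key = status.strip().upper()[:1] if status else ""
    return STATUS_CODE.get(key)

codes = [encode(s) for s in ("Light", "Moderate", "Heavy", "")]
```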
Thanks to Ed David for letting me use his code on databasing, particularly, creating the sql engine using
** Follow Reina Reyes’s blog for updates on their work on the MMDA data and Manila traffic.