
Turning data into action

By [email protected] - 24th January 2018 - 11:37

In an ideal world, cooperation among departments, such as police, fire and health, should be a holistic characteristic of government. However, as anyone who has worked in civic administration knows, you can lead siloed departments to a common goal, but you can't get them to collaborate. Location intelligence technology allows us to see the assets of an entire area, from the infrastructure of pipes and train lines to dynamic day-to-day operations like garbage pickups and snowploughing. This ‘digital twin’ of the city gives a common context for these different departments to be able to collaborate effortlessly.

And why is it so important for this kind of cooperation to be so easy? Because when large-scale emergencies arise in a city like New York in the US, time cannot be spared for assembling resources to implement a common action plan for operations. When days, let alone hours, can make or break a crisis, having the city's departments functioning like a single operational entity is crucial. So, New York created an online tool that connected each of its vital resources, channelling them into the initiatives where and when they were needed most.

Helping children

Good access to reliable data is the first step in having a holistic data analytics system that serves the community. When examining the issues of New York City, there were two glaring examples where a lack of data produced inefficiency as well as inequality. The first involved the disparity between affluent and lower-income children in public schools. Several studies have suggested that access to pre-kindergarten education provides children with a head start that lasts throughout their education. However, since public schooling in the US begins at kindergarten, children from more affluent families will likely have an educational head start over lower-income children due to their families' ability to pay for pre-kindergarten education. To counteract this dynamic, Mayor Bill de Blasio’s administration made it a key initiative for his first year in office to provide universal, free-of-cost access to pre-kindergarten education.

The City of New York was therefore charged with reaching out to the guardians of all four-year-olds once pre-kindergarten had been made available to all residents, because simply making pre-kindergarten services widely available to all New Yorkers would not have been sufficient. The multipronged effort included teacher recruitment; pre-kindergarten site selection and approval; and, lastly, outreach to parents of four-year-olds for enrolment. The city had not previously targeted the four-year-old demographic, and therein lay the challenge: four-year-olds typically do not have a digital footprint.

So, the city's task was to identify teachers and locations where pre-kindergarten programmes could take place and then identify the parents most likely to enrol their children in these programmes. The city's analytics team worked to identify data that helped locate four-year-olds and their parents who would likely enrol them in pre-kindergarten. The city then obtained commercial data from credit reporting agency Experian, based on consumer transactions. This included names, addresses and phone numbers of people whom Experian believed were parents of eligible four-year-olds. This data was then smartly integrated with internal city data.

When integrating the addresses in these records, the analytics team matched each record to a building identification number (BIN), the unique identifier that the Department of Buildings assigns to each of the approximately one million buildings in the city. The analytics team then developed an enhancement to the city's geocoder that enables any address in the city to be geocoded easily and thus matched to a BIN, even if the address had been typed incorrectly. The open source geocoder was modified so that it could clean, de-duplicate and correct the address records submitted to it and translate them into BINs. The geocoder was attached to ‘Databridge’, the internal city data warehouse of the Mayor's Office of Data Analytics (MODA), which stored hundreds of thousands of geospatially integrated records of city data. This enabled the geocoder to match and qualify addresses based on previously reconciled records in Databridge.

Essentially, if the geocoder could not recognise the address it was fed, the geocoder’s new enhanced capability allowed for it to search the Databridge for occurrences of that address so that it could identify the closest possible match for correct BIN translation. The enhancement dramatically extended the geocoder’s ability to find a match between a messy address and a BIN. This would result in an increase in the number of individuals who would be informed by phone about local pre-kindergarten services, enabling the city to use outreach resources as efficiently as possible.
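The fallback described above can be illustrated with a minimal Python sketch. It is not the city's geocoder: the lookup table, the BIN values and the similarity cutoff are all invented for illustration, and the standard library's `difflib` stands in for whatever matching logic the enhanced geocoder actually used.

```python
import difflib

# Hypothetical stand-in for previously reconciled Databridge records.
# The addresses are illustrative and the BIN values are made up.
RECONCILED = {
    "120 BROADWAY, MANHATTAN": 9000001,
    "350 5 AVENUE, MANHATTAN": 9000002,
    "1 CENTRE STREET, MANHATTAN": 9000003,
}


def normalise(address):
    """Upper-case, drop full stops and collapse repeated whitespace."""
    return " ".join(address.upper().replace(".", "").split())


def match_bin(address, reconciled=RECONCILED, cutoff=0.8):
    """Return the BIN for an address, falling back to the closest known one.

    An exact match on the normalised address is tried first. If that fails,
    the most similar previously reconciled address is used, provided its
    similarity ratio is at least `cutoff` (an assumed threshold).
    """
    key = normalise(address)
    if key in reconciled:
        return reconciled[key]
    close = difflib.get_close_matches(key, list(reconciled), n=1, cutoff=cutoff)
    return reconciled[close[0]] if close else None


# A mistyped address still resolves to the building's BIN.
print(match_bin("120 Brodway, Manhatan"))
```

A stricter cutoff trades fewer wrong matches for more unmatched addresses, which is the balance any such fallback has to strike.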

It was imperative that outreach targeted the individuals and families that actually needed these services, as opposed to a sweeping, random canvassing effort. From there, government employees used a number of campaign outreach techniques dictated by the data they acquired. Outreach teams were sent to the homes of families that were correctly identified as likely to enrol their children in pre-kindergarten. After that, city employees even helped parents through the enrolment process, following up with 1.2 million calls, emails, and texts. Pre-kindergarten enrolment increased 170% with the implementation of this programme.

Helping tenants

Another recent example of data providing the foundation for urban analytics came when the New York City Commission on Human Rights (CCHR) partnered with MODA for help using open data to target landlords who were suspected of perpetrating income discrimination. New York City human rights law prohibits housing providers from discriminating against potential tenants based on income. This is intended to protect poor or homeless New Yorkers with housing vouchers from discrimination based on socioeconomic status.

Unfortunately, landlords often ignore this law for fear of delinquent renters. According to the CCHR, the city agency responsible for enforcing the human rights law, income discrimination is one of the top three housing-related complaints it receives, and it is currently investigating more than 200 cases against landlords and brokers across the city. As part of this initiative, the CCHR has partnered with MODA to help identify where income discrimination occurs in an effort to mitigate the abuses and find a solution.

The data proved very helpful in identifying landlords who were violating the law in vulnerable communities, allowing the CCHR to root out the source of income discrimination in housing more effectively. Through location intelligence and analytical mapping, MODA identified certain behaviours and traits of people in neighbourhoods and buildings. Essentially, MODA built a map of designated locations in NYC with associated neighbourhood traits, using the four datasets listed below, and then linked to it, geospatially and temporally, all the historical violations that the CCHR had previously captured through submitted complaints. This allowed MODA not only to identify the traits of neighbourhoods where violations had been recorded, but also to understand to some extent the frequency of the occurrences and, where possible, their seasonality, and hence to associate patterns and behaviours with the perpetrators of the violations.

The map drew on four primary data sources: NYPD Seven Major Felonies data, Department of Education School Quality Report data on student achievement scores, Department of City Planning data on land use, and federal Housing and Urban Development data on the number of housing vouchers by census tract.

Together, these sources of information painted a picture, on a layered map, of neighbourhoods with low crime, great schools and plentiful housing where, suspiciously, no voucher holders lived. This data informed the CCHR as to which neighbourhoods and buildings should be tested for housing discrimination. So, the CCHR sent actors to pose as housing applicants with and without housing vouchers to see how they were received by landlords and building management companies and then used the results as evidence of possible income discrimination.
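The layering logic can be sketched as a simple filter over census tracts. This is only an illustration: the field names, the sample figures and every threshold below are hypothetical, and the article does not say how MODA actually defined low crime, great schools or plentiful housing.

```python
from dataclasses import dataclass


@dataclass
class Tract:
    """One census tract with a trait from each of the four data layers."""
    tract_id: str
    felonies_per_1000: float   # from the felonies layer
    school_score: float        # from the school quality layer
    residential_units: int     # from the land use layer
    vouchers: int              # from the housing voucher layer


def flag_for_testing(tracts, max_felonies=5.0, min_school_score=80.0,
                     min_units=500):
    """Return IDs of desirable tracts in which no voucher holders live.

    All three thresholds are assumed values chosen for the example.
    """
    return [
        t.tract_id
        for t in tracts
        if t.felonies_per_1000 <= max_felonies
        and t.school_score >= min_school_score
        and t.residential_units >= min_units
        and t.vouchers == 0
    ]


# Invented sample data: only T2 is desirable and has no voucher holders.
sample = [
    Tract("T1", 3.0, 85.0, 900, 12),
    Tract("T2", 2.5, 90.0, 1200, 0),
    Tract("T3", 9.0, 60.0, 700, 0),
]
print(flag_for_testing(sample))
```

A tract flagged this way is not evidence of discrimination in itself; as the article notes, it only tells investigators where testing is worthwhile.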

Helping health

Of course, data is useless unless it can improve operations, but it must first be given context for it to have meaning and purpose. And the most tangible, actionable context that data can be given is a where and a when. If the place and time of an occurrence is known, it has meaning and can be acted upon. And nothing demands faster action than a deadly disease spreading in a crowded modern city.

In the summer of 2015, an outbreak of Legionnaires’ disease left seven people dead and 86 infected in New York's South Bronx. City officials managed to identify the source of the infection – cooling towers filled with the disease-causing bacteria – and quarantined those areas where the towers were thought to be located.

Right away, the city enlisted spatial analysis technology to develop time-enabled video maps using Esri's ArcGIS, placing this event in a where and when context. So, at the start of the outbreak, the Department of Health could see that the first reported cases were very close to the cooling towers, and as time progressed, it saw more and more cases radiate away from the towers. Its suspicions were backed up with further molecular testing. The Median Center tool in ArcGIS was used to analyse the incident points of the different cases and determine the centre of the outbreak. Then, by using the Near function, city officials were able to calculate the distance from each case to the median centre as well as to the cooling towers and work out, on average, how far the cases were from the source of the outbreak. This allowed the Department of Health to quantify how far the contaminant was from where people were living.
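The two calculations can be approximated in plain Python to show what they measure. This sketch does not use ArcGIS or its API. It assumes case locations are already in a projected coordinate system measured in metres, and it estimates the median centre (the point that minimises total straight-line distance to all cases) with a simplified Weiszfeld iteration. The function names and coordinates are hypothetical.

```python
import math


def median_center(points, iterations=200, tol=1e-9):
    """Estimate the point minimising total Euclidean distance to `points`."""
    # Start from the mean centre and refine it iteratively.
    x = sum(p[0] for p in points) / len(points)
    y = sum(p[1] for p in points) / len(points)
    for _ in range(iterations):
        num_x = num_y = denom = 0.0
        for px, py in points:
            d = math.hypot(px - x, py - y)
            if d < tol:
                continue  # simplification: skip a case sitting on the estimate
            weight = 1.0 / d
            num_x += weight * px
            num_y += weight * py
            denom += weight
        if denom == 0.0:
            break  # every point coincides with the estimate
        new_x, new_y = num_x / denom, num_y / denom
        moved = math.hypot(new_x - x, new_y - y)
        x, y = new_x, new_y
        if moved < tol:
            break
    return x, y


def mean_distance(points, origin):
    """Average straight-line distance from each point to `origin`."""
    ox, oy = origin
    return sum(math.hypot(px - ox, py - oy) for px, py in points) / len(points)


# Invented case coordinates in metres, and an invented tower location.
cases = [(0.0, 0.0), (200.0, 0.0), (0.0, 200.0), (200.0, 200.0)]
tower = (100.0, 100.0)
centre = median_center(cases)
print(centre, mean_distance(cases, centre), mean_distance(cases, tower))
```

Unlike the mean centre, the median centre is not pulled far off by a few distant cases, which is useful when looking for the core of an outbreak.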

Once it was understood where the infection originated and how far it had spread, a law was passed that anyone who owned a building with a cooling tower had to register it with the city, have it cleaned and sterilised, and be compliant with the new regulations within a month or face a $25,000 fine. Nevertheless, the city could not wait this long for buildings to get up to code. A month is more than enough time for further infection to spread, in which case New York's Health Department would be liable for a much larger and far more devastating outbreak.

New York City therefore had a challenge to meet that involved both data and location: identify every cooling tower in the city. However, this had never been done before, and there are more than one million buildings in New York City. The Health Department had limited resources to do this type of outreach. So the First Deputy Mayor's Office led a task force consisting of the Department of Emergency Management, the Department of Buildings, the New York Fire Department, the Department of Health, and MODA to form a plan.

MODA staff created a machine-learning algorithm to identify the buildings in New York City that were most likely to have a cooling tower on top of them. By studying the traits and characteristics of the buildings in the South Bronx where the bacterium, Legionella, was found in cooling towers, and by engaging subject matter experts, MODA was able to build a model that searched the more than one million buildings in the city for those with the same physical characteristics and identified which were likely to have cooling towers. This was a true ‘learning’ algorithm because it was designed to get smarter and more successful at identifying buildings with cooling towers as the results from the citywide canvassers came in each night. By integrating those daily results into the algorithm, the team ultimately brought it to an 80% success rate in finding buildings in NYC that had cooling towers.
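The nightly feedback loop can be illustrated with a deliberately simple stand-in. The article does not describe MODA's actual model, so the class below is an assumption made for illustration: it tracks how often each building trait has coincided with a cooling tower in the inspections so far, then ranks uninspected buildings by the average of those rates. The trait names and building IDs are invented.

```python
from collections import defaultdict


class TowerLikelihoodModel:
    """Toy scorer that is refreshed with each night's canvassing results."""

    def __init__(self):
        self.hits = defaultdict(int)  # trait -> inspected buildings with a tower
        self.seen = defaultdict(int)  # trait -> inspected buildings overall

    def update(self, results):
        """Fold in a batch of (traits, has_tower) inspection results."""
        for traits, has_tower in results:
            for trait in traits:
                self.seen[trait] += 1
                self.hits[trait] += int(has_tower)

    def score(self, traits):
        """Mean smoothed hit rate over a building's traits (0.5 if unknown)."""
        rates = [
            (self.hits.get(t, 0) + 1) / (self.seen.get(t, 0) + 2)
            for t in traits
        ]
        return sum(rates) / len(rates) if rates else 0.5

    def rank(self, buildings):
        """Order building IDs from most to least likely to have a tower."""
        return sorted(buildings, key=lambda b: self.score(buildings[b]),
                      reverse=True)


model = TowerLikelihoodModel()
# One night's invented canvassing results.
model.update([
    ({"commercial", "ten_plus_storeys"}, True),
    ({"residential", "low_rise"}, False),
])
candidates = {"A": {"low_rise"}, "B": {"commercial"}}
print(model.rank(candidates))
```

Each call to `update` changes the next day's ranking, which is the sense in which the canvassers' results make the search progressively better targeted.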

The team ultimately realised that buildings under seven storeys wouldn't have cooling towers, so these could be eliminated from the dataset. But in the task ahead, accuracy and efficiency were crucial: if even one cooling tower was missed, it could endanger the entire city by continuing to spread the infection.

During the Legionnaires’ disease outbreak, it was discovered that each department working in concert to solve the problem had its own internal spatial analytics server. The NYPD, the fire department, emergency management, and the housing authority were all using location intelligence for their own operations. Why not combine these systems to facilitate a more efficient and capable response to citywide emergencies such as this? So MODA created a Citywide Intelligence Hub, which integrated cross-departmental enterprise systems specifically for urban analytics and data sharing.

The hub accomplished three things: it translated complex city questions into solution-based analytics challenges; made city data accessible and the quality of that data attainable to the people who needed it; and gave value to operations by creating an informed framework for decision-making, based on analysis rather than troubleshooting.

The problem facing the Department of Health and MODA was that by trying to identify all these cooling towers in New York, they were essentially looking for a needle in a haystack. What enacting a data analytics framework accomplished was not to simply identify the needle but to burn down the haystack. A well-functioning data analytics system adds value to operations so that better-informed decisions can be made rather than just augmenting existing data. This involves a location-based data analytics system as well as feedback from the real world, which enables communication between stakeholders.

In the future, Citywide Intelligence Hub can be used to identify the spatial aspects of other urban emergency incidents, such as a blackout. For instance, it can identify areas with buildings that are most likely to have stuck elevators and show which of those buildings are most likely to have residents who are most vulnerable, such as the elderly or infirm. This gives emergency responders a better idea of how to best allocate resources.

Citywide Intelligence Hub is currently just a proof of concept, but the impetus behind it – integrating a large city's disparate departments to create a collaborative and multifaceted location intelligence system – proved to be a success. Without data analytics, there was a one in 10 chance of identifying the buildings with cooling towers in New York City. Data analytics gave the city an eight in 10 chance of finding the towers, which are favourable odds in any context.

Amen Ra Mashariki is head of urban analytics at Esri (
