Data Clinic: What We Learned from Open Data on Bullying and Harassment in NYC Schools

How much of a problem is school bullying in New York City? The answer depends on who (and how) you ask.

On March 6th, 2018, the Two Sigma Data Clinic hosted “The State of Open Data on School Bullying and Harassment” as part of NYC Open Data Week. The two-hour public event featured a comparative analysis of federal and local datasets, followed by a panel discussion on what open data can reveal—and conceal—about this important school safety issue.

Here’s what we learned.

1. Fewer than a quarter of the city’s 1,700+ public and charter schools reported a single incident of bullying or harassment to the U.S. Department of Education’s Office for Civil Rights.

This figure refers to the 2013-14 school year—the most recent time period available from the biennial Civil Rights Data Collection survey conducted by the federal Office for Civil Rights at the time of analysis.

Number of schools reporting allegations of bullying/harassment
Source: Office for Civil Rights, 2013-14 school year.

The actual percentage of schools experiencing discriminatory bullying or harassment is almost certainly higher, researchers and advocates say. According to Johanna Miller, advocacy director of the New York Civil Liberties Union, it should be 100 percent. “There is no school with zero incidents,” she said. “Any school that says there are zero bullying incidents is lying … or they’re not paying attention.”

A 2015 audit of 10 New York City schools by New York State Comptroller Thomas P. DiNapoli found that those schools failed to report more than 400 violent or disruptive incidents, including incidents of bullying and harassment, to the state education department during the 2010-11 and 2011-12 school years.

More recently, the city has faced intense scrutiny following a fatal November 2017 stabbing at a Bronx high school by a student who said he was a victim of discriminatory bullying.

“Now there’s a sort of ‘crackdown’ on schools to report bullying,” said Amy Zimmer, who covered education for the local news site

2. The annual School Survey by New York City’s own Department of Education paints a considerably more varied picture of school bullying and harassment.

The 2013-14 survey asked students, teachers, and parents questions about school climate—including their perceptions of school bullying and harassment.

At one school, for example, 35 percent of students in grades 6-12 (the subset of students who are given the survey) said that they were bullied or harassed “all of the time” during the 2013-14 school year, yet that school reported zero bullying or harassment allegations to the federal Civil Rights Data Collection. Another school reported 26 allegations of sex-based harassment, but only around 1 percent of its students said they were bullied or harassed “all of the time” or even “most of the time” on the School Survey.

Perceptions of bullying/harassment can vary substantially, regardless of what is reported
% of students who said bullying/harassment based on differences happens “all of the time” in schools that reported 0 allegations and 1+ allegations
Source: Office for Civil Rights and NYC School Survey, 2013-14 school year.

Allegations of bullying or harassment are “anecdotally not as easy for districts and schools to provide in a standardized way” as purely administrative data like the number of students enrolled, said Julia Bloom-Weltman, research director at AEM Corporation, which provides technical assistance to schools submitting data to the federal Office for Civil Rights.

Still, it is clear that perceptions of bullying vary widely across schools, regardless of what is ultimately reported to the Office for Civil Rights.

3. Responses to the NYC School Survey tend to be at odds with the federal civil rights reporting in larger schools.

Outliers notwithstanding, how much do the local and federal surveys agree or disagree? Conceptually, it makes sense to think of schools with “agreement” between the two surveys as schools that either had high perceptions of bullying according to the NYC School Survey and reported one or more incidents of bullying to the federal survey, or had low perceptions of bullying and reported zero incidents. Schools with “disagreement,” on the other hand, are either schools with high perceptions of bullying that reported zero incidents, or schools with low perceptions of bullying that reported one or more incidents.

Using this framework, the Data Clinic classified schools into “agreement” and “disagreement” zones (more on the methodology here). Looking at student perceptions of bullying compared with the federal civil rights reporting, for example, more than a third of schools fell into the “disagreement” zone:

Zones of (dis)agreement
Source: Office for Civil Rights and NYC School Survey, 2013-14 school year.
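To make the framework concrete, here is a minimal Python sketch of the agreement/disagreement classification. It is an illustration only, not the Data Clinic's actual methodology. The field names (`pct_bullied`, `allegations`), the function `classify_zones`, the use of the median as the cutoff between "high" and "low" perceptions, and the four example schools are all assumptions made up for this example.

```python
from statistics import median


def classify_zones(schools, threshold=None):
    """Label each school 'agreement' or 'disagreement'.

    schools: list of dicts with hypothetical keys
      'school_id'   - any identifier
      'pct_bullied' - share of surveyed students reporting frequent bullying
      'allegations' - number of allegations reported to the federal survey
    threshold: cutoff for a "high" perception; defaults to the median
      (an assumption for this sketch, not the published method).
    """
    if threshold is None:
        threshold = median(s["pct_bullied"] for s in schools)

    zones = {}
    for s in schools:
        high_perception = s["pct_bullied"] > threshold
        reported_any = s["allegations"] >= 1
        # The two sources agree when both point the same way:
        # high perception + 1 or more reports, or low perception + zero reports.
        if high_perception == reported_any:
            zones[s["school_id"]] = "agreement"
        else:
            zones[s["school_id"]] = "disagreement"
    return zones


# Made-up schools, for illustration only.
schools = [
    {"school_id": "A", "pct_bullied": 35.0, "allegations": 0},
    {"school_id": "B", "pct_bullied": 1.0, "allegations": 26},
    {"school_id": "C", "pct_bullied": 20.0, "allegations": 3},
    {"school_id": "D", "pct_bullied": 2.0, "allegations": 0},
]

zones = classify_zones(schools)
share_disagree = sum(z == "disagreement" for z in zones.values()) / len(zones)
```

The result depends heavily on where the cutoff is set, so any real analysis would need to justify that choice and test how sensitive the zone shares are to it.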

A preliminary analysis by the Data Clinic suggested that larger schools were more likely to have perceptions of bullying in the NYC School Survey (whether by students, parents, or teachers) that were at odds with reported incidents in the federal data.

4. When comparing different surveys, it’s important to keep in mind what they were—and weren’t—originally intended to do.

Even though they both measure school bullying and harassment, the local and federal surveys are not directly comparable, because they were designed with different purposes in mind.

The federal Office for Civil Rights was established in the 1960s to enforce civil rights legislation. The Civil Rights Data Collection thus asks schools to report allegations of bullying or harassment specifically on the basis of race, sex, and disability (three of the categories protected under various civil rights statutes) as a compliance requirement. This information was not originally intended to be “open” (i.e., accessible by the general public) when it was first collected in 1968. In fact, it was 2006 before the agency made a concerted effort to streamline data gathering, and it wasn’t until 2016 that the data was released online. Even today, files for previous years are only available upon request (and they are mailed on DVD-ROMs).

As a result, some schools may view the Civil Rights Data Collection as an accounting exercise, rather than a tool for real-time accountability and policy-setting, said Johanna Miller.

The NYC School Survey, on the other hand, was an integral part of the NYC Department of Education’s school report cards from the survey’s inception in 2007 up until 2013. But respondents may have historically felt less comfortable answering honestly or even filling out the survey knowing that their responses would be used for accountability purposes, said Meghan McCormick, an associate at the social policy research firm MDRC. It was also the accountability aspect of the survey that prompted departmental resistance to including questions about cyberbullying, which was viewed by officials as “happening outside of schools,” she added.

Both surveys have evolved since the 2013-14 school year. The 2015-16 Civil Rights Data Collection required schools to report allegations of religious and sexual orientation harassment in addition to sex, race, and disability harassment, while the latest iteration of the NYC School Survey is asking students about the specific types of bullying or harassment they experience (though cyberbullying is still not mentioned explicitly).

Still, allegations may differ from self-reported feelings about school bullying and harassment for a variety of reasons, not least because of the different intentions and (dis)incentives at play.

5. There are several databases at the city, state, and federal level that collect information on school bullying and harassment. Even for the data savvy, navigating the open data landscape can be a challenge.

In 2015, the Mayor’s Office of Data Analytics outlined its vision for open data as “an invitation for anyone, anytime, anywhere to engage with New York City.” Indeed, New York’s open data law, which requires city agencies to post their data on the NYC open data portal, has enabled New Yorkers to understand the inner workings of their city in new and important ways.

But just because a dataset is posted on a public web portal does not mean it is easy to use or to link with other open datasets.

Take the data on school bullying and harassment.

A link to the NYC School Survey data is on the NYC open data portal, and the federal Civil Rights Data Collection file is available on

But when the Data Clinic set out to compare the two surveys, the first hurdle we faced was simply figuring out how to merge the two files. The local and federal datasets use different coding schemes to identify schools. Imagine, for instance, trying to match “24Q290” to “362058006223”: these are the ID numbers assigned to the same school by the NYC Department of Education and the federal Office for Civil Rights, respectively. (That school was also labeled differently in the two datasets: “A.C.E. Academy for Scholars at the Geraldine Ferarro Campus” versus “PS 290.”)

It took creating a crosswalk between four different local, state, and federal agencies, each with their own coding schemes, just to join these datasets and begin the analysis.
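For readers who have not built one before, a crosswalk is just a lookup table that translates one agency's school ID into another's. The Python sketch below shows the general idea under simplifying assumptions: it chains two hypothetical ID mappings into one and then joins local and federal records. The two school IDs for the matched school come from the example above. Everything else is invented for illustration, including the intermediate ID `MID-001`, the unmatched ID `99X999`, the field values, and the function names. Our real crosswalk spanned four agencies' coding schemes.

```python
def compose_crosswalks(*hops):
    """Chain ID mappings (dicts) into one mapping from the first scheme to the last."""
    combined = {}
    for start in hops[0]:
        current = start
        for hop in hops:
            current = hop.get(current)
            if current is None:
                break  # the chain is broken for this school; skip it
        if current is not None:
            combined[start] = current
    return combined


def join_on_crosswalk(local_rows, federal_rows, crosswalk):
    """Join records keyed by local ID to records keyed by federal ID.

    Returns (merged, unmatched): merged rows carry both IDs, and
    unmatched lists local IDs with no federal counterpart.
    """
    merged, unmatched = [], []
    for local_id, local_row in local_rows.items():
        federal_id = crosswalk.get(local_id)
        federal_row = federal_rows.get(federal_id)
        if federal_row is None:
            unmatched.append(local_id)
        else:
            merged.append(
                {"local_id": local_id, "federal_id": federal_id,
                 **local_row, **federal_row}
            )
    return merged, unmatched


# Hypothetical two-hop crosswalk; "MID-001" is an invented intermediate ID.
local_to_mid = {"24Q290": "MID-001"}
mid_to_federal = {"MID-001": "362058006223"}
crosswalk = compose_crosswalks(local_to_mid, mid_to_federal)

# Made-up records: the values are placeholders, not real survey results.
local_rows = {"24Q290": {"pct_bullied": 5.0}, "99X999": {"pct_bullied": 7.0}}
federal_rows = {"362058006223": {"allegations": 2}}

merged, unmatched = join_on_crosswalk(local_rows, federal_rows, crosswalk)
```

Keeping the list of unmatched IDs is a deliberate choice: schools that drop out of the join silently can bias any comparison between the two surveys.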

Navigating the complicated landscape of open data on school bullying/harassment in NYC
Datasets are “open” but siloed across federal, state, and local agencies, creating significant barriers to entry
Source: U.S. Department of Education (ED); National Center for Education Statistics (NCES); Office for Civil Rights (OCR); New York City Department of Education (NYCDOE); New York State Department of Education (NYSED).

What’s more, school bullying and harassment is also recorded in two separate New York state databases, VADIR (Violent and Disruptive Incident Reporting) and the database created by the Dignity for All Students Act (DASA) in 2012. Amy Zimmer, who wrote a 2015 article for DNAInfo about New York’s “murky system for tracking bullying in schools,” said she had been unaware of the federal Office for Civil Rights data until she was asked to be a panelist at the Data Clinic’s Open Data Week event. We at the Data Clinic, in turn, were unaware of VADIR and DASA until we came across her DNAInfo article.

Clearly, making data publicly available is only the first step to breaking down the silos in which government information is often contained.

The Research Alliance for NYC Schools has made significant strides in this regard by developing a “School-Level Master File” that allows school records to be linked over time and across a variety of sources. Though that dataset currently only includes information from the NYC Department of Education, Jasmine Soltani, data manager at the Research Alliance, said that the Data Clinic’s work combining local and federal school surveys has inspired her to bring in more data from other government agencies.

Ultimately, where the responsibility for lowering the barriers to entry for government “open” data should fall—whether on individual groups like the Data Clinic, on civic partnerships like the Research Alliance, or on the government agencies themselves—remains an open question.

Learn more about The Two Sigma Data Clinic here.

For the slides from our presentation, click here.

For the data and code that generated the analysis, click here.

