  • Data Science

The State of Open Data on School Bullying

How much of a problem is school bullying in NYC? The answer depends on who you ask. Data Clinic volunteers compared local surveys (where many students say bullying is happening) with federal data (where a majority of schools report zero incidents) to analyze these disparities for the 2013-14 school year.

  • Data Science

The Future of Pandas

An architecture overview for the future of the Python Pandas data analytics library.

  • Data Science

BeakerX (for PyData NYC)

An overview of BeakerX, a collection of kernels and extensions to the Jupyter interactive computing platform.

  • Data Science
  • Markets & Economy

Forecasting Inflation like a Data Scientist

One way allocators can improve their inflation forecasts is to analyze inflation from as many perspectives as possible, just as a data scientist would.

  • Data Science

Introducing Pandas UDFs for PySpark

A Two Sigma researcher introduces the Pandas UDFs feature in the upcoming Apache Spark 2.3 release, which substantially improves the performance and usability of user-defined functions (UDFs) in Python.

  • Data Science

Responsive and Scalable Real-time Data Analytics

Designing a system that can extract immediate insights from large amounts of data in real time requires a special way of thinking. This talk presents a “reactive” approach to designing real-time, responsive, and scalable data applications that can continuously compute analytics on the fly. It also highlights a case study as an example of reactive design in action.