PySAL users currently interact with geographic data by manipulating unlabeled NumPy arrays. This is tedious now that Pandas is readily available to most users, but it is a fait accompli: the core data model for PySAL's API was designed before Pandas existed. Newer projects have attempted to extend Pandas for geographic data, but those packages have difficult-to-install dependencies that make them inaccessible to many end users, and they make design decisions that I think are somewhat suboptimal for PySAL's use case. Therefore, I propose to specify and implement a tabular spatial data model in Python that uses Pandas dataframes directly rather than subclassing them. This follows the SpatiaLite idea that spatial data is data first and spatial second. Tooling should therefore enable spatial operations directly on Pandas dataframes by inspecting column dtype information and dispatching appropriately. The end result of this project would be a simpler data model for PySAL.
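The dispatch idea described above can be illustrated with a minimal sketch. It is not PySAL's API: the `Point` class and the `geometry_columns` and `bounding_box` helpers are hypothetical names invented for this example, and the rule "an object-dtype column whose values are all `Point` instances is a geometry column" is an assumption about how such dispatch might work.

```python
from dataclasses import dataclass

import pandas as pd


@dataclass(frozen=True)
class Point:
    """Hypothetical minimal geometry type, used only for this illustration."""
    x: float
    y: float


def geometry_columns(df):
    """Return the names of columns whose non-null values are all Point objects."""
    names = []
    for name in df.columns:
        col = df[name]
        # Only object-dtype columns can hold arbitrary Python geometry objects.
        if not pd.api.types.is_object_dtype(col):
            continue
        values = col.dropna()
        if len(values) > 0 and all(isinstance(v, Point) for v in values):
            names.append(name)
    return names


def bounding_box(df):
    """Return (minx, miny, maxx, maxy) for the first geometry column of a plain DataFrame."""
    cols = geometry_columns(df)
    if not cols:
        raise TypeError("no geometry column found")
    points = df[cols[0]].dropna()
    xs = [p.x for p in points]
    ys = [p.y for p in points]
    return (min(xs), min(ys), max(xs), max(ys))


# A plain DataFrame, not a subclass: the spatial behavior lives in the functions.
df = pd.DataFrame({
    "name": ["a", "b", "c"],
    "population": [10, 20, 30],
    "geometry": [Point(0.0, 1.0), Point(2.0, -1.0), Point(1.0, 3.0)],
})
```

Calling `bounding_box(df)` operates on an ordinary dataframe. The non-spatial columns are ignored because their contents do not match the geometry type, which is the "data first, then spatial" behavior the proposal aims for.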

Student

ljwolf

Mentors

  • Jay L.
  • Carson Farmer
  • edunham
  • Philip Stephens
  • Serge Rey

2016