Island Beach Cleaned Data

Hi all,

First, here is a rundown of the data that we are working with:

  • 15 years of student-collected data for Island Beach State Park
    • Measurements include:
      • Dune locations.
      • Dune elevation.
      • Shoreface position.
      • Duneline position.
    • How were the datasets stored?
      • Data was collected yearly into Excel spreadsheets, where students were able to do exploratory data analysis.
    • Why did this data format require cleaning?
      • While the Excel layout was certainly easy for a person to read, it was not formatted in a machine-readable way.
    • What steps were taken to clean this data?
      • Total Station Dataset:
        • Each year was split out into its own CSV file.
        • Appropriate column names in the format “year_filename_measurement” were put in place for each measurement.
          • This new semantic naming scheme lets tools such as R organize the data into a dataframe, and makes exploratory data analysis much easier.
        • Files were recombined into a single CSV using the GNU cat command.
      • iPad Dataset:
        • Each year was split out into its own CSV file.
        • As above, appropriate column names were applied in the form “year_location_measurement_srs”.
        • Files were recombined into a single CSV using the GNU cat command.
      • High tide line dataset:
        • As this dataset was already output in a machine-readable form, no cleaning was applied.
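
To make the renaming step above concrete, here is a minimal sketch in Python of how one year's header row could be rewritten into the “year_filename_measurement” form. This is not the script we actually used; the function names (semantic_name, rename_header), the sample columns, and the "profile1" label are all made up for illustration, and lowercasing names and replacing spaces with underscores is an assumption on my part.

```python
import csv
import io


def semantic_name(year, source, measurement):
    """Build a column name in the "year_source_measurement" form."""
    # Assumption: lowercase everything and swap spaces for underscores
    # so the names are easy to type in R or Python.
    parts = [str(year), source, measurement]
    return "_".join(p.strip().lower().replace(" ", "_") for p in parts)


def rename_header(csv_text, year, source):
    """Return csv_text with every header cell rewritten by semantic_name."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return csv_text  # nothing to rename in an empty file
    rows[0] = [semantic_name(year, source, cell) for cell in rows[0]]
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(rows)
    return out.getvalue()


# Hypothetical sample standing in for one year's exported sheet.
sample = "Elevation,Dune Position\n1.2,3.4\n"
renamed = rename_header(sample, 2015, "profile1")
print(renamed)
```

With names like these, every column says which year and file it came from, so nothing has to be inferred from where a value sat in the original spreadsheet.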

 

With the data transformed into a machine-readable form, we searched for a solution that would allow us to share it in an open and collaborative way. Ultimately, we settled on Kaggle. Kaggle allows us to share this data with the rest of the world, and also provides a free platform for us to store any data analysis we do. In addition, Kaggle’s open model allows anyone with an account to work with our data and contribute their own analysis.

You can view our Kaggle page by clicking here.

 

You can find any visualizations created with our data here.