Today I worked on a different course curated by my fantastic friend. We started with a broad overview of the data science process, then delved a bit into obtaining data and using pandas.
I learned:
A majority of data science is collecting and cleaning data. Pandas is a Python library that makes datasets easier to read and manipulate. One can append a new column to a DataFrame with df['new column'] = ['val 1', 'val 2', ...], where the list needs one value per row. When working with larger datasets, it’s very useful to use df.head() and df.tail() to quickly look at the first/last 5 rows of data.
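A quick sketch of those basics, using a small made-up DataFrame (the names, ages and cities are hypothetical, just for illustration):

```python
import pandas as pd

# Hypothetical example data: 7 rows, so head()/tail() have something to trim
df = pd.DataFrame({
    "name": ["Ana", "Ben", "Cy", "Dee", "Eli", "Fay", "Gus"],
    "age": [27, 3, 55, 41, 19, 33, 62],
})

# Append a new column; the list must have exactly one value per row
df["city"] = ["Austin", "Boston", "Chicago", "Denver", "El Paso", "Fresno", "Gary"]

print(df.head())   # first 5 rows by default
print(df.tail(3))  # last 3 rows; pass a number to change how many
```

If the list's length doesn't match the number of rows, pandas raises a ValueError instead of adding the column.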
One can sort the data by a column’s values with the .sort_values(by='name') method.
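A minimal sorting sketch (again with hypothetical data). Note that .sort_values returns a new, sorted DataFrame and leaves the original alone:

```python
import pandas as pd

# Hypothetical example data
df = pd.DataFrame({
    "name": ["Cy", "Ana", "Ben"],
    "age": [55, 27, 3],
})

# Sort rows alphabetically by the "name" column (ascending is the default)
by_name = df.sort_values(by="name")

# Sort by "age", largest first
by_age_desc = df.sort_values(by="age", ascending=False)

print(by_name)
print(by_age_desc)
```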
I’m still confused on:
When using .sort_values, it can’t seem to properly order numbers that have different numbers of digits. For instance, it’ll sort them as [27, 3, 55], which is out of order. But when I put in '03' instead of '3', it’ll sort them properly as [03, 27, 55].
How does one get around this?
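A possible lead for tomorrow, assuming the numbers ended up stored as text (strings) rather than actual numbers: strings are compared character by character, so "27" comes before "3" because "2" < "3". Padding with a zero ("03") happens to work because every value then has the same length. This sketch reproduces that and converts the column with pd.to_numeric:

```python
import pandas as pd

# Assumption: the column holds strings, e.g. typed in quotes or read from a file as text
df = pd.DataFrame({"value": ["27", "3", "55"]})

# Text sorts character by character: "27" < "3" < "55"
as_text = df.sort_values(by="value")["value"].tolist()
print(as_text)  # ['27', '3', '55']

# Convert the column to real numbers, then sort numerically
df["value"] = pd.to_numeric(df["value"])
as_numbers = df.sort_values(by="value")["value"].tolist()
print(as_numbers)  # [3, 27, 55]
```

Another option (pandas 1.1+) that leaves the column as text is passing a key function, e.g. df.sort_values(by="value", key=pd.to_numeric).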
Alas, this shall be a conundrum for tomorrow’s Isaac.
Godspeed future me 🙏🏼