Frequently Asked Questions

Published

May 1, 2020

“What do you mean by data science programming?”

Most likely, you have had 1-2 courses of programming before you have taken DS 250. Unlike traditional computer science courses, DS 250 uses Python in an interactive mode instead of building programs. The data provider usually has some big questions that need answering; However, there are hundreds of little issues and responses along the way. We use programming to facilitate this investigation.

There are similarities with User Experience Designers. In our case, we don’t get to ask users about their experience. We use programming to ask data about its background, and each data set has its own history. We want our analysis to mold to that experience. You can think of data science programming like a first date with your data. You can’t write one long program nieve of the issues and nuances each living data set provides.


“How does DS 250 compare to DS 350?”

The two courses have similarities. You could think of DS 250 as an introduction to data wrangling and visualization. Both classes use real-world data and are built around data science projects. There are some critical differences between the two courses.

  • In this course, we use Python, and DS 350 uses R.
  • We are introducing the principles of data science programming in DS 250.
  • The course is only 2-credits.
  • DS 250 is intended to introduce visualization, wrangling, and modeling.

faq “How does DS 250 prepare me for DS 350 and CSE 450?”

You will be comfortable with interactive programming and have an introduction to the principles of data formats for data science applications. You will be introduced to principles related to machine learning, data wrangling, and data visualization.


“What programming languages do we use in this course?”

The course is done using Python. We focus on the pandas and the lets-plot plotting packages (this plotting package is modeled after ggplot2 in R).


“What are the prerequisites for this course?”

Using the new courses at BYU-I, the prerequisite is CSE 110. However, if you have experience programming from other classes, you most likely are prepared for this course.


“Why Python instead of R?”

The computer science and software engineering programs at BYU-I use Python as their foundational courses. The standard student will have some experience with Python before DS 250. Python is an essential programming language for data scientists, and we already have DS350, which is taught in R.


“What is pandas?”

pandas is the foundational data science package in Python. If you are using tabular data you will be in pandas. Polars is growing in popularity, but there isn’t as much help for it yet.


“Why are we using lets-plot instead of Seaborn or Matplotlib or Altair?”

Matplotlib was the first visualization package to gain a following in Python. Seaborn is built on top of Matplotlib. Seaborn is easy to pick-up and understand, but much less flexible. Many data scientists use both in their work—neither leverage the grammar of graphics as developed by Leland Wilkinson. Altair is built on Vega-Lite, which uses the Vega visualization grammar. It is declarative and actively developed.

Lets-plot has the advantage of being similar to ggplot2 in R and also uses a grammar of graphics. Using this plotting package should help students transition between the two so they can more quickly move between R and Python. It is intuitive and relatively easy to use. There are other plotting packages as well, but none of them are an obvious choice or a dominant choice of the others.


Back to top