Saturday, September 21, 2024

Log 3

 


I read through the classification and regression analysis part. It's fun! There are two different methods of analyzing data: one is classification, the other is regression. The concept is quite hard to understand at first, since it involves some math, and I spent a lot of time on it. But after a while, I found it's actually not that hard to distinguish one from the other.

Classification is when you label the training dataset with A, B, and C, and you get a result that is either A, B, or C. In other words, its results come from a fixed set of values: classification just predicts one of the labels you put in. Examples are what allowed me to understand the concept entirely. Let's say you are filtering email to see whether a message is spam, so you set up the labels A (it's spam), B (it's not spam), and C (unexpected error). Classification helps you decide whether the message is spam, so the result will only be one of these three labels. The examples that already carry these labels are what we call labeled training data. The model uses the labeled training data to classify the thing you want to analyze, and it gives you a result from the labels you already told it about.

Regression is a whole different story. It's the prediction of continuous values: it doesn't give you a result like A, B, or C, something you already know, as in classification. Instead, it gives you a result calculated by the regression formula, which is determined by the training data. The result is no longer one of the fixed training labels; the model now uses a formula learned from the training data to calculate a value, and that value can be anything. The example that helped me understand this is stock and real estate price analysis.
The prices of these two are always changing, so you cannot use classification: a house price will not just jump from price A to price B and then to price C, it's something that changes continuously. Therefore, we need prediction of continuous values. We use training data, for instance the housing prices of the past 10 years, and let the model come up with a formula from that data to calculate housing prices for the next 10 years. It can then give you any value, predicted by its formula based on its data. So the value it gives is continuous and flexible, just like stock and housing prices.

There are two main kinds of variables the analysis focuses on: the dependent variable and the independent variables. The independent variables in housing are things like the location of your house, the transportation nearby, the area, etc. The dependent variable is what we want to predict, which is the price. The relation between the two is that the independent variables are the things that affect the dependent variable. We input this data into our model, and these features are what we use to predict our label, the dependent variable, so we can get an estimate of the result in the future. Another way I would like to explain the relation is to assume we put in different features, such as different areas and locations, and use them to predict the price. Your housing price will definitely vary with these factors, so in my opinion this is a good way to explain how independent variables affect the dependent variable.

These terms confused me a lot, because you can call independent variables features or predictors, and dependent variables labels. It took me some time to finally get why the independent variable can have so many names. Firstly, just like I stated above, the features are the things that have an impact on the dependent variable, which is also the way we predict things.
The price is just the price, but we put in different features, or predictors, to understand how the price changes. In other words, we call them predictors because we get different predictions based on the varying values we put in. Predictors are the data we use to predict things, and the thing we want to predict is simply the dependent variable, also called the result or label.

Regression analysis is actually quite good for analyzing things that are continuous or fluctuating, since it fits the general trend of the data instead of following every extreme value (the name itself comes from the idea of returning toward the average). That gives you a result that's more reliable and credible. It will not simply keep growing after one rocket-like surge; instead, it considers the overall growth and decline in the data to give you a result that's more reasonable. This makes it more worthwhile to apply in practice, since it's more aligned with potential real-world outcomes, which is also why regression is called a predictive model.

These concepts are really interesting. After understanding them, I find the idea is actually simple, but knowing how it can be applied, learning the different terms, and knowing what each one does in the data and in the regression analysis can be challenging.
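
To make the difference concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the toy numbers are made up, the function names are hypothetical, and the "closest average" classifier is just one simple stand-in, not the method from the reading. The classifier can only ever answer with a label it was trained on, while the regression formula can return a value that never appeared in the training data.

```python
# Toy data, invented purely for illustration.

# --- Classification: the answer is one of a fixed set of labels ---
# Feature: number of suspicious words in an email.
# Label: "spam" or "not spam".
emails = [(0, "not spam"), (1, "not spam"), (2, "not spam"),
          (6, "spam"), (7, "spam"), (9, "spam")]

def train_classifier(data):
    """Learn the average feature value for each label."""
    totals = {}
    for x, label in data:
        s, n = totals.get(label, (0.0, 0))
        totals[label] = (s + x, n + 1)
    return {label: s / n for label, (s, n) in totals.items()}

def classify(averages, x):
    """Return the label whose average feature value is closest to x."""
    return min(averages, key=lambda label: abs(averages[label] - x))

# --- Regression: the answer can be any number ---
# Feature (independent variable): floor area in square meters.
# Label (dependent variable): price in thousands.
houses = [(50, 150.0), (70, 210.0), (90, 270.0), (110, 330.0)]

def train_regression(data):
    """Fit price = slope * area + intercept by ordinary least squares."""
    n = len(data)
    mean_x = sum(x for x, _ in data) / n
    mean_y = sum(y for _, y in data) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in data)
    sxx = sum((x - mean_x) ** 2 for x, _ in data)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def predict_price(model, area):
    """Apply the learned formula to a new feature value."""
    slope, intercept = model
    return slope * area + intercept

averages = train_classifier(emails)
model = train_regression(houses)

print(classify(averages, 8))     # always one of the known labels
print(classify(averages, 1))
print(predict_price(model, 85))  # a number that is not in the training data
```

With this toy data, the two emails are classified as "spam" and "not spam", and the predicted price for an 85 square meter house is 255.0, a value that appears nowhere in the training list.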

2 comments:

  1. According to your two reflections, I can nearly conclude that your reading interest is in science. You are a very unconventional language major. I am now curious why you chose language as your major here! Your concepts of regression and classification are correct. Regression analysis is one of my favorites, helping us see how a condition (dependent variable) is affected by other conditions (independent variables). If we are investigating multiple competing variables and seeing which ones have more impact than others, we use multiple regression, or even path analysis or structural equation modeling, to draw a more comprehensive picture of how complex the world may be. Keep at it. I am looking forward to more of your logs!

  2. During my studies, I have learned that classification answers the question of whether something happens (yes or no, or something else), while regression focuses on how something happens.
    Your explanation is quite clear and greatly contributes to the understanding of both classification and regression.
    I would still like you to try to keep an eye on the spelling and structure of your post.
    You could divide the text into at least 3 segments. It can be easier to follow your ideas visually.
