
Hauck-Donner Effect Detected in a Partial Proportional Odds Model

Posted: Mon Dec 30, 2024 6:43 pm
by pandit.sagar22
I tried applying a partial proportional odds model (PPOM) to my ordinal response variable. These are the steps I followed:

Step 1: Starting from 40 predictor variables, I fitted an ordered logit model. I checked the p-value of every coefficient and removed the variables with coefficient p-values > 0.1.

Step 2: Iterated the process from step 1 until all the predictor coefficients had p < 0.1. This condensed the set from 40 predictors down to 15.

Step 3: Ran a Brant test to check whether the parallel lines assumption holds (p > 0.05 desired). Result of step 3: 6 out of the 15 variables violate the assumption.

Step 4: Fitted a partial proportional odds model, relaxing the assumption for the 6 violating variables.
Result of step 4: Many of the coefficients had p-values reported as "NA", and there were warning messages that read "fitted values are too close to 0 or 1". A Hauck-Donner effect was detected for 3 of the variables.
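
In R, the pipeline looked roughly like this (df, y, x1 and x2 are placeholders for my data frame, my ordered outcome and the violating variables):

Code:

library(MASS)   # polr()
library(brant)  # brant()
library(VGAM)   # vglm(), hdeff()

# Steps 1-2: ordered logit on the current candidate set
fit_po <- polr(y ~ ., data = df, Hess = TRUE)
ctab   <- coef(summary(fit_po))
pvals  <- 2 * pnorm(abs(ctab[, "t value"]), lower.tail = FALSE)

# Step 3: Brant test of the parallel lines assumption
brant(fit_po)

# Step 4: partial proportional odds model; the parallel = FALSE ~ ...
# formula relaxes the assumption only for the listed variables
fit_ppo <- vglm(y ~ ., data = df,
                family = cumulative(parallel = FALSE ~ x1 + x2))
summary(fit_ppo)  # this is where the NA p-values and warnings show up
hdeff(fit_ppo)    # VGAM's Hauck-Donner effect diagnostic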

I don't want to use multinomial logistic regression, as that would throw away the ordinality of my outcome. Is there a way out of this? Any leads would be highly appreciated.

Regards,
Sagar

Re: Hauck-Donner Effect Detected in a Partial Proportional Odds Model

Posted: Tue Dec 31, 2024 12:09 pm
by bigben
Hello Sagar,

are you a Natural Intelligence or an LLM? If the former, do you mind telling us what you aim to achieve with this model? In general, selecting variables based on p-values is questionable.

CU,
Bernhard

Re: Hauck-Donner Effect Detected in a Partial Proportional Odds Model

Posted: Tue Dec 31, 2024 5:24 pm
by pandit.sagar22
Hello Bernhard,
I'm not an LLM. I plan to calculate the Odds Ratio for comparing the effect of multiple predictors on an ordinal outcome variable classes.

What would be the correct approach to selecting the variables, apart from p-value considerations? I want to capture the effects of as many variables as possible while staying concise.

Regards,
Sagar

Re: Hauck-Donner Effect Detected in a Partial Proportional Odds Model

Posted: Tue Dec 31, 2024 8:52 pm
by bigben
Hi!
pandit.sagar22 wrote: Tue Dec 31, 2024 5:24 pm: I plan to calculate the Odds Ratio for comparing the effect of multiple predictors on an ordinal outcome variable classes.
"an" indicates an object in the singular whilst "classes" is in the plural so I think, this wrong at the syntax level and therefore not a good starting point in trying to understand the problem. I assume this is some random ordinal regression problem.
What would be the correct approach to selecting the variables, apart from p-value considerations?

What you describe is stepwise variable exclusion triggered by p-values. The problem starts with the stepwise procedure itself; the p-values are only part of it. Stepwise procedures often produce effectively random results, because smallish differences in the data set lead to massively different predictor sets. Stepwise inclusion will often give different results than stepwise exclusion, and there are many other unfavourable properties. Search Google or any search engine for "why is stepwise regression bad", or start with the literature cited in https://doi.org/10.1111/j.1365-2656.2006.01141.x
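
If you want to see that instability for yourself, here is a small self-contained simulation: the outcome is pure noise, so every "selected" predictor is a false positive, and the selected set changes from resample to resample.

Code:

set.seed(1)
n <- 100; p <- 20
x <- as.data.frame(matrix(rnorm(n * p), n, p))
y <- rnorm(n)  # outcome unrelated to every predictor

# stepwise selection on five bootstrap resamples of the same data
for (i in 1:5) {
  idx <- sample(n, replace = TRUE)
  fit <- step(lm(y[idx] ~ ., data = x[idx, ]), trace = 0)
  print(names(coef(fit))[-1])  # the "selected" noise variables
}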

Nobody knows what a p-value means once you have computed many regressions and done multiple testing. p-values are about the other data that could have been, and nobody ever reflects on the other subsets of predictors they might have included, so p-values are not worth much once you have tried many sets of predictors. Multiple testing is a deep well that you do not want to fall into.
I want to capture the effects of as many variables as possible while staying concise.


Unfortunately there is not one optimal answer to the question of feature selection. Personally, I am a Frank Harrell Jr. fanboy, and his answer would probably be: you do not have enough data to deduce the "right" predictor set and its coefficients. That may be very pessimistic, especially as we do not know how many observations you have. However, if you want to dive deeper into that thought, it is worth looking at the following video at 17:00: https://youtu.be/DF1WsYZ94Es?t=1012
So let's assume that you are not looking for the "right" predictor set but for a reasonable one. With 40 candidate predictors there are 2^40 ≈ 10^12 possible predictor sets, so you cannot (reasonably) try them all out.

Now, a much better answer to the feature selection problem than stepwise regression is LASSO regression. It is a principled way of evaluating the data: it looks at all the predictors at once rather than along a path (so it is not path dependent), and the decision of how many predictor candidates to drop is made by cross-validation instead of significance testing. A short explanation of LASSO regression is given by its inventor here: https://www.youtube.com/watch?v=0tfPuddPhEY
A more accessible video, probably to be viewed first, is this: https://www.youtube.com/watch?v=NGf0voTMlcs
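
To make the mechanics concrete: with glmnet the whole workflow is two calls, where x is a numeric predictor matrix and y the response (both placeholders here).

Code:

library(glmnet)

cvfit <- cv.glmnet(x, y, alpha = 1)  # alpha = 1 is the LASSO penalty
coef(cvfit, s = "lambda.1se")        # zero coefficients = dropped predictors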

I have never used the LASSO with an ordinal outcome variable, but there are R packages made just for this purpose: https://cran.r-project.org/web/packages ... mnetcr.pdf and https://www.jstatsoft.org/article/view/v099i06/1440
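
I have not tried them myself, but going by the ordinalNet paper linked above, the call should look roughly as follows; conveniently it even has a "semi-parallel" mode, which is a regularized analogue of your partial proportional odds model (x and y are placeholders again: a numeric matrix and an ordered factor).

Code:

library(ordinalNet)

# semi-parallel cumulative logit model: the penalty decides which
# variables get additional non-parallel (per-threshold) coefficients
fit <- ordinalNet(x, y, family = "cumulative", link = "logit",
                  parallelTerms = TRUE, nonparallelTerms = TRUE)
coef(fit)  # zeros are the dropped terms

# cross-validated tuning of the penalty strength
tuned <- ordinalNetTune(x, y, family = "cumulative", link = "logit")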

A Bayesian way to mimic the LASSO is to use special prior distributions in the regression, but I assume that you do not want to start the journey into feature selection and into Bayesian statistics at the same time (cf. "Laplace priors" and the "Bayesian LASSO" if you do).
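
If you ever do go down that road, a sketch with the brms package might look like this (untested; d, the predictor names and the prior scale of 1 are all placeholders):

Code:

library(brms)

# Laplace (double exponential) priors on the coefficients mimic
# the LASSO penalty in a Bayesian cumulative logit model
fit <- brm(y ~ x1 + x2 + x3, data = d,
           family = cumulative("logit"),
           prior = set_prior("double_exponential(0, 1)", class = "b"))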
It may be argued, however, that there are other ways of discarding variables in a "preliminary" way:
(1) discard variables that are highly correlated with others;
(2) discard variables that bring nothing but "noise" to the system [...];
(3) discard variables that have yielded relatively little contribution in previous studies; and
(4) discard variables by judiciously choosing variables while designing the investigation.
(Carl J. Huberty, The Problems with Stepwise Methods, https://coshima.davidrjfikis.com/EPRS85 ... roblem.pdf, written in 1989 from a pre-R and pre-LASSO perspective.)
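
Huberty's point (1) is easy to mechanize in R, for example with caret's findCorrelation(); x is again a placeholder predictor matrix and the 0.9 cutoff is arbitrary.

Code:

library(caret)

# drop one variable from each pair with absolute correlation > 0.9
drop_idx  <- findCorrelation(cor(x), cutoff = 0.9)
x_reduced <- if (length(drop_idx)) x[, -drop_idx] else x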

Cheers,
Bernhard