Even though a package is successfully installed, it is possible that I still get this ImportError: No module named <package>. This happens because I have several different Python interpreters, and the specific interpreter I am running does not have the package on its search path. So, print this in the very first line of the code:

import sys; print(sys.path)

And see if any of the paths contains the package I am looking for. If not, either 1) add the path where the package is installed, or 2) install the package for the interpreter I want to use.
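A slightly fuller version of that check, printing the running interpreter as well (the commented-out path below is a placeholder, not a real location):

```python
import sys

# Which interpreter is actually running this script?
print(sys.executable)

# Which directories does that interpreter search for packages?
paths = list(sys.path)
for p in paths:
    print(p)

# Option 1: add the directory where the package is installed
# (placeholder path, replace with your own):
# sys.path.append("/path/to/site-packages")

# Option 2 (usually better): install the package into THIS interpreter:
#   python -m pip install <package>
```

Using `python -m pip` (rather than a bare `pip`) guarantees the install goes into the same interpreter that runs the script.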

Linear and logistic regression are usually the first algorithms people learn in predictive modeling. Due to their popularity, many analysts even end up thinking that they are the only forms of regression. Those who are slightly more involved think they are the most important of all forms of regression analysis.

The truth is that there are innumerable forms of regression that can be performed. Each form has its own importance and specific conditions where it is best suited. In this article, I have explained the 7 most commonly used forms of regression in a simple manner. Through this article, I also hope that people develop an idea of the breadth of regression, instead of just applying linear / logistic regression to every problem they come across and hoping the model will just fit.

Table of Contents

What is Regression Analysis?

Why do we use Regression Analysis?

What are the types of Regressions?

Linear Regression

Logistic Regression

Polynomial Regression

Stepwise Regression

Ridge Regression

Lasso Regression

ElasticNet Regression

How to select the right Regression Model?

What is Regression Analysis?

Regression analysis is a form of predictive modelling technique which investigates the relationship between a dependent (target) variable and independent (predictor) variable(s). This technique is used for forecasting, time series modelling and finding causal effect relationships between variables. For example, the relationship between rash driving and the number of road accidents by a driver is best studied through regression.

Regression analysis is an important tool for modelling and analyzing data. Here, we fit a curve / line to the data points in such a manner that the distances of the data points from the curve or line are minimized. I’ll explain this in more detail in the coming sections.

Why do we use Regression Analysis?

As mentioned above, regression analysis estimates the relationship between two or more variables. Let’s understand this with an easy example:

Let’s say, you want to estimate growth in sales of a company based on current economic conditions. You have the recent company data which indicates that the growth in sales is around two and a half times the growth in the economy. Using this insight, we can predict future sales of the company based on current & past information.

There are multiple benefits of using regression analysis. They are as follows:

It indicates the significant relationships between the dependent variable and the independent variables.

It indicates the strength of impact of multiple independent variables on a dependent variable.

Regression analysis also allows us to compare the effects of variables measured on different scales, such as the effect of price changes and the number of promotional activities. These benefits help market researchers / data analysts / data scientists to eliminate and evaluate the best set of variables to be used for building predictive models.

How many types of regression techniques do we have?

There are various kinds of regression techniques available to make predictions. These techniques are mostly driven by three metrics (number of independent variables, type of dependent variables and shape of regression line). We’ll discuss them in detail in the following sections.

If you are feeling creative, you can even cook up new regressions by combining the parameters above in ways people haven’t used before. But before you start that, let us understand the most commonly used regressions:

1. Linear Regression

It is one of the most widely known modeling techniques. Linear regression is usually among the first few topics which people pick while learning predictive modeling. In this technique, the dependent variable is continuous, the independent variable(s) can be continuous or discrete, and the nature of the regression line is linear.

Linear Regression establishes a relationship between dependent variable (Y) and one or more independent variables (X) using a best fit straight line (also known as regression line).

It is represented by the equation Y = a + b*X + e, where a is the intercept, b is the slope of the line and e is the error term. This equation can be used to predict the value of the target variable based on the given predictor variable(s).

The difference between simple linear regression and multiple linear regression is that multiple linear regression has more than one independent variable, whereas simple linear regression has only one. Now, the question is “How do we obtain the best fit line?”.

How to obtain the best fit line (values of a and b)?

This task can be easily accomplished by the Least Squares Method, the most common method used for fitting a regression line. It calculates the best-fit line for the observed data by minimizing the sum of the squares of the vertical deviations from each data point to the line. Because the deviations are squared before being added, positive and negative values do not cancel out.

We can evaluate model performance using the metric R-square. To know more about this metric, you can read: Model Performance metrics Part 1, Part 2.
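To make the least squares formulas for a and b, and the R-square metric, concrete, here is a small sketch on made-up data (plain NumPy, no modelling library) rather than a production implementation:

```python
import numpy as np

# Made-up data: y roughly follows 3 + 2*x plus noise.
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = 3 + 2 * x + rng.normal(scale=1.0, size=x.size)

# Least squares estimates of slope b and intercept a:
#   b = cov(x, y) / var(x),  a = mean(y) - b * mean(x)
b = np.cov(x, y, bias=True)[0, 1] / np.var(x)
a = y.mean() - b * x.mean()

# R-square: the fraction of variance in y explained by the fitted line.
y_hat = a + b * x
ss_res = np.sum((y - y_hat) ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)
r2 = 1 - ss_res / ss_tot
print(a, b, r2)
```

The recovered a and b land close to the true 3 and 2, and R-square is close to 1 because the noise is small relative to the trend.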

Important Points:

There must be a linear relationship between the independent and dependent variables.

Multiple regression can suffer from multicollinearity, autocorrelation and heteroskedasticity.

Linear regression is very sensitive to outliers. They can terribly affect the regression line and, eventually, the forecasted values.

Multicollinearity can increase the variance of the coefficient estimates and make the estimates very sensitive to minor changes in the model. The result is that the coefficient estimates are unstable.

In the case of multiple independent variables, we can go with forward selection, backward elimination or a stepwise approach to select the most significant independent variables.

2. Logistic Regression

Logistic regression is used to find the probability of event = success or event = failure. We should use logistic regression when the dependent variable is binary (0/1, True/False, Yes/No) in nature. Here the value of Y ranges from 0 to 1, and it can be represented by the following equations.

odds = p / (1-p) = probability of event occurrence / probability of event non-occurrence
ln(odds) = ln(p / (1-p))
logit(p) = ln(p / (1-p)) = b0 + b1*X1 + b2*X2 + b3*X3 + ... + bk*Xk

Above, p is the probability of presence of the characteristic of interest. A question that you should ask here is “why have we used log in the equation?”.

Since we are working here with a binomial distribution (for the dependent variable), we need to choose the link function best suited for this distribution, and that is the logit function. In the equation above, the parameters are chosen to maximize the likelihood of observing the sample values, rather than to minimize the sum of squared errors (as in ordinary regression).
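As a rough sketch of how those parameters are found, here is maximum-likelihood fitting by gradient ascent on made-up pass/fail data (the data and learning rate are assumptions; real packages use more robust solvers):

```python
import numpy as np

# Made-up binary data: pass (1) / fail (0) versus hours studied.
x = np.array([0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0])
y = np.array([0,   0,   0,   0,   1,   0,   1,   1,   1,   1])

def sigmoid(z):
    # Inverse of the logit link: maps b0 + b1*x back to a probability.
    return 1 / (1 + np.exp(-z))

# Fit logit(p) = b0 + b1*x by gradient ascent on the log-likelihood.
b0 = b1 = 0.0
lr = 0.01
for _ in range(20000):
    p = sigmoid(b0 + b1 * x)
    b0 += lr * np.sum(y - p)        # gradient w.r.t. the intercept
    b1 += lr * np.sum((y - p) * x)  # gradient w.r.t. the slope

print(b0, b1)
print(sigmoid(b0 + b1 * 0.5), sigmoid(b0 + b1 * 5.0))
```

After fitting, the predicted probability is low at 0.5 hours and high at 5 hours, and the slope b1 is positive, exactly the S-shaped relationship the logit link encodes.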

Important Points:

It is widely used for classification problems

Logistic regression doesn’t require a linear relationship between the dependent and independent variables. It can handle various types of relationships because it applies a non-linear log transformation to the odds.

To avoid overfitting and underfitting, we should include all significant variables. A good way to ensure this is to use a stepwise method to estimate the logistic regression.

It requires large sample sizes because maximum likelihood estimates are less powerful at low sample sizes than ordinary least squares.

The independent variables should not be correlated with each other, i.e. no multicollinearity. However, we do have the option to include interaction effects of categorical variables in the analysis and in the model.

If the dependent variable is ordinal, it is called ordinal logistic regression.

If the dependent variable is multi-class, it is known as multinomial logistic regression.

3. Polynomial Regression

A regression equation is a polynomial regression equation if the power of the independent variable is greater than 1. The equation below represents a polynomial equation:

y = a + b*x^2

In this regression technique, the best fit line is not a straight line. It is rather a curve that fits the data points.

Important Points:

While there might be a temptation to fit a higher-degree polynomial to get lower error, this can result in over-fitting. Always plot the relationship to see the fit, and focus on making sure that the curve matches the nature of the problem.

Especially look out for the curve towards the ends and see whether those shapes and trends make sense. Higher-degree polynomials can end up producing weird results on extrapolation.
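A quick numerical illustration of that warning, on made-up quadratic data: training error keeps falling as the polynomial degree rises, even when the extra degrees are just chasing noise.

```python
import numpy as np

# Made-up data: a quadratic trend plus noise.
rng = np.random.default_rng(1)
x = np.linspace(-3, 3, 30)
y = 1 + 0.5 * x**2 + rng.normal(scale=0.5, size=x.size)

# Training error always shrinks as the degree grows, which is exactly
# why a low training error alone can hide over-fitting.
mses = {}
for degree in (1, 2, 9):
    coeffs = np.polyfit(x, y, degree)
    mses[degree] = np.mean((y - np.polyval(coeffs, x)) ** 2)
    print(degree, round(float(mses[degree]), 3))
```

The degree-9 fit reports the lowest error on these 30 points, yet its curve would swing wildly just outside the observed x range, which is why plotting, not error alone, should guide the choice of degree.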

4. Stepwise Regression

This form of regression is used when we deal with multiple independent variables. In this technique, the independent variables are selected by an automatic process, with no human intervention.

This feat is achieved by observing statistical values like R-square, t-stats and the AIC metric to discern significant variables. Stepwise regression basically fits the regression model by adding/dropping covariates one at a time based on a specified criterion. Some of the most commonly used stepwise regression methods are listed below:

Standard stepwise regression does two things: it adds and removes predictors as needed at each step.

Forward selection starts with the most significant predictor in the model and adds a variable at each step.

Backward elimination starts with all predictors in the model and removes the least significant variable at each step.

The aim of this modeling technique is to maximize prediction power with the minimum number of predictor variables. It is one of the methods for handling high-dimensional data sets.
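The add/drop logic above can be sketched in plain NumPy. The data is made up (only x0 and x2 actually drive y), and AIC = n·ln(RSS/n) + 2k is used as the stopping criterion; statistical packages offer more refined versions of this procedure:

```python
import numpy as np

# Made-up data: only columns 0 and 2 drive y; columns 1 and 3 are noise.
rng = np.random.default_rng(2)
X = rng.normal(size=(100, 4))
y = 2 * X[:, 0] - 3 * X[:, 2] + rng.normal(scale=0.5, size=100)

def rss(X_sub, y):
    """Residual sum of squares of an OLS fit with an intercept."""
    A = np.column_stack([np.ones(len(y)), X_sub])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return np.sum((y - A @ beta) ** 2)

# Forward selection guided by AIC = n*ln(RSS/n) + 2k.
n = len(y)
selected, remaining = [], list(range(X.shape[1]))
best_aic = n * np.log(np.var(y)) + 2   # intercept-only model (k = 1)
while remaining:
    scores = []
    for j in remaining:
        cand = selected + [j]
        k = len(cand) + 1              # +1 for the intercept
        scores.append((n * np.log(rss(X[:, cand], y) / n) + 2 * k, j))
    aic, j = min(scores)
    if aic >= best_aic:
        break                          # no candidate improves the criterion
    best_aic = aic
    selected.append(j)
    remaining.remove(j)

print(sorted(selected))
```

The procedure picks up the two informative columns; the 2k penalty in AIC is what discourages it from also adding the noise columns.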

5. Ridge Regression

Ridge regression is a technique used when the data suffers from multicollinearity (independent variables are highly correlated). With multicollinearity, even though the least squares (OLS) estimates are unbiased, their variances are large, which pushes the estimated values far from the true values. By adding a degree of bias to the regression estimates, ridge regression reduces the standard errors.

Above, we saw the equation for linear regression. Remember? It can be represented as:

y = a + b*x

This equation also has an error term. The complete equation becomes:

y = a + b*x + e, where the error term e is the value needed to correct for the prediction error between the observed and predicted values

=> y = a + b1*x1 + b2*x2 + ... + e, for multiple independent variables.

In a linear equation, prediction error can be decomposed into two components: one due to bias and one due to variance. Prediction error can arise from either or both. Here, we’ll discuss the error caused by variance.

Ridge regression solves the multicollinearity problem through the shrinkage parameter λ (lambda). It minimizes:

sum of squared residuals + λ * sum of β² (beta squared)

In this objective there are two components. The first is the least squares term, and the other is λ times the summation of the squared coefficients β. This penalty is added to the least squares term in order to shrink the parameters towards values with very low variance.

Important Points:

The assumptions of this regression are the same as those of least squares regression, except that normality need not be assumed.

It shrinks the value of coefficients but never to exactly zero, so it performs no feature selection.
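A small numerical sketch of this shrinkage effect on made-up data with two nearly identical predictors; the ridge solution is computed directly from its closed form (X'X + λI)β = X'y:

```python
import numpy as np

# Made-up data: two highly correlated predictors.
rng = np.random.default_rng(3)
x1 = rng.normal(size=200)
x2 = x1 + rng.normal(scale=0.01, size=200)   # almost a copy of x1
X = np.column_stack([x1, x2])
y = 3 * x1 + rng.normal(scale=0.5, size=200)

lam = 1.0  # shrinkage parameter lambda

# Ridge: least squares plus lam * sum(beta^2), solved in closed form.
beta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(2), X.T @ y)

# Plain OLS for comparison: near-collinearity makes it unstable.
beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)

print(beta_ols, beta_ridge)
```

Ridge splits the weight almost evenly between the two correlated copies (roughly 1.5 each, summing to about 3), whereas the individual OLS coefficients are highly sensitive to the tiny differences between x1 and x2, exactly the instability described in the text.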

6. Lasso Regression

Similar to ridge regression, Lasso (Least Absolute Shrinkage and Selection Operator) also penalizes the size of the regression coefficients. In addition, it is capable of reducing variability and improving the accuracy of linear regression models. Lasso differs from ridge regression in that it uses absolute values in the penalty function, instead of squares: it minimizes

sum of squared residuals + λ * sum of |β| (absolute values of the coefficients)

Penalizing (or equivalently, constraining) the sum of the absolute values of the estimates causes some of the parameter estimates to turn out exactly zero. The larger the penalty applied, the further the estimates are shrunk towards zero. This results in variable selection out of the given n variables.

Important Points:

The assumptions of this regression are the same as those of least squares regression, except that normality need not be assumed.

It shrinks coefficients all the way to zero (exactly zero), which certainly helps with feature selection.

If a group of predictors are highly correlated, lasso picks only one of them and shrinks the others to zero.
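A minimal coordinate-descent sketch of lasso on made-up data; the soft-thresholding step is what produces exactly-zero coefficients. This is an illustration, not a substitute for a tuned library implementation:

```python
import numpy as np

def soft_threshold(z, t):
    """Shrinks z toward 0, and returns exactly 0 whenever |z| <= t."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_cd(X, y, lam, n_iter=500):
    """Lasso via coordinate descent: minimize 0.5*||y - Xb||^2 + lam*sum|b|."""
    n, p = X.shape
    beta = np.zeros(p)
    for _ in range(n_iter):
        for j in range(p):
            # Residual with feature j's current contribution removed.
            r = y - X @ beta + X[:, j] * beta[j]
            beta[j] = soft_threshold(X[:, j] @ r, lam) / (X[:, j] @ X[:, j])
    return beta

# Made-up data: y depends on the first feature only.
rng = np.random.default_rng(4)
X = rng.normal(size=(100, 5))
y = 4 * X[:, 0] + rng.normal(scale=0.5, size=100)

beta = lasso_cd(X, y, lam=50.0)
print(np.round(beta, 3))
```

The informative coefficient survives (shrunk somewhat below its true value of 4), while the four irrelevant coefficients are driven to exactly 0.0, the variable-selection behaviour described above.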

7. ElasticNet Regression

ElasticNet is a hybrid of the lasso and ridge regression techniques. It is trained with both L1 and L2 penalties as regularizers. Elastic net is useful when there are multiple correlated features: lasso is likely to pick one of them at random, while elastic net is likely to pick both.

A practical advantage of trading off between lasso and ridge is that it allows elastic net to inherit some of ridge’s stability under rotation.

Important Points:

It encourages a group effect in the case of highly correlated variables.

There are no limitations on the number of selected variables
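To see the "group effect" numerically, here is a coordinate-descent sketch of elastic net on made-up data with two nearly identical copies of the signal. The penalty form lam1·Σ|β| + 0.5·lam2·Σβ² used below is one common parameterization, an assumption of this sketch:

```python
import numpy as np

# Made-up data: two correlated copies of the signal plus one noise column.
rng = np.random.default_rng(5)
x1 = rng.normal(size=200)
x2 = x1 + rng.normal(scale=0.05, size=200)    # near-duplicate of x1
X = np.column_stack([x1, x2, rng.normal(size=200)])
y = 3 * x1 + rng.normal(scale=0.5, size=200)

lam1, lam2 = 50.0, 200.0   # L1 and L2 penalty weights
beta = np.zeros(3)
for _ in range(1000):                          # coordinate descent
    for j in range(3):
        # Residual with feature j's current contribution removed.
        r = y - X @ beta + X[:, j] * beta[j]
        z = X[:, j] @ r
        # Soft-threshold by lam1 (L1 part), shrink by lam2 (L2 part).
        beta[j] = np.sign(z) * max(abs(z) - lam1, 0) / (X[:, j] @ X[:, j] + lam2)

print(np.round(beta, 3))
```

The two correlated columns end up with roughly equal weight (the group effect, inherited from the ridge term), while the pure-noise column is dropped by the lasso term; plain lasso on the same data would tend to keep only one of the two copies.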

How to select the right Regression Model?

Life is usually simple when you know only one or two techniques. One training institute I know of tells its students: if the outcome is continuous, apply linear regression; if it is binary, use logistic regression! However, the more options we have at our disposal, the more difficult it becomes to choose the right one. A similar case happens with regression models.

Among the many types of regression models, it is important to choose the technique best suited to the types of independent and dependent variables, the dimensionality of the data and other essential characteristics of the data. Below are the key factors you should consider to select the right regression model:

Data exploration is an inevitable part of building a predictive model. It should be your first step before selecting a model: identify the relationships and the impact of the variables.

To compare the goodness of fit of different models, we can analyse different metrics like statistical significance of parameters, R-square, adjusted R-square, AIC, BIC and the error term. Another is Mallow’s Cp criterion, which essentially checks for possible bias in your model by comparing it with all possible submodels (or a careful selection of them).

Cross-validation is the best way to evaluate models used for prediction. Here you divide your data set into two groups (train and validate). A simple mean squared difference between the observed and predicted values gives you a measure of prediction accuracy.

If your data set has multiple confounding variables, you should not use an automatic model selection method, because you do not want to put these in a model at the same time.

It will also depend on your objective. A less powerful model may be easier to implement than a highly statistically significant one.

Regression regularization methods (lasso, ridge and ElasticNet) work well in the case of high dimensionality and multicollinearity among the variables in the data set.
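The train/validate idea in the list above can be sketched in a few lines (made-up data; a single hold-out split rather than full k-fold cross-validation):

```python
import numpy as np

# Made-up data: a linear trend plus noise with variance 1.
rng = np.random.default_rng(6)
x = rng.uniform(0, 10, 200)
y = 3 + 2 * x + rng.normal(scale=1.0, size=200)

# Split the data into train (150 points) and validate (50 points).
idx = rng.permutation(200)
train, valid = idx[:150], idx[150:]

# Fit a line on the training data only (polyfit returns slope first).
b, a = np.polyfit(x[train], y[train], 1)

# Mean squared difference between observed and predicted on held-out data.
mse = np.mean((y[valid] - (a + b * x[valid])) ** 2)
print(round(float(mse), 3))
```

Because the model is scored on points it never saw, the hold-out MSE lands near the true noise variance (about 1 here); a model that over-fit the training data would score noticeably worse.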

End Note

By now, I hope you have an overview of regression. These techniques should be applied with the conditions of the data in mind. One of the best tricks for finding out which technique to use is to check the family of your variables, i.e. discrete or continuous.

In this article, I discussed 7 types of regression and some key facts associated with each technique. If you are new to this industry, I’d advise you to learn these techniques and later implement them in your models.


Before starting to learn how to use the Hadoop ecosystem, it is important to understand why we need it in addition to traditional relational database systems. So, let’s understand the limitations of RDBMS.

First, let’s look at the landscape of databases existing today, such as SQL Server, Oracle, or MySQL. As companies embark on big data projects nowadays, they run into the limitations of current relational database systems. Such limitations are:

Scalability

The new world of big data involves huge datasets of terabytes or petabytes. Building databases at such scale with existing RDBMS is very difficult, complex, and expensive.

Speed

Existing RDBMS are not necessarily built to deal with data at that scale or speed (if real-time data intake is needed).

Other

Queryability

Sophisticated processing like machine learning

However, the Hadoop ecosystem was not developed to replace those existing relational database systems, but to solve a different set of data problems that the existing RDBMS could not solve!

Database choices

So, now let’s look at what we can choose as our database.

Filesystems

Plain files: Before the Hadoop ecosystem was broadly available, it was common to keep information in ordinary file systems, even in XML.

HDFS (Hadoop Distributed File System): HDFS was developed as an alternative to whatever filesystem you are using.

Databases

NoSQL (key/value stores such as Redis, document stores such as MongoDB, graph databases, etc.)

RDBMS (MySQL, SQL Server, Oracle)

Even though some portions of the Hadoop ecosystem could be categorized as NoSQL, it is important to understand that Hadoop itself is not a database. It is an alternative file system with a processing library. In Hadoop implementations, it is common to have a NoSQL implementation as well.

Also, relational databases are still around for the problems that Hadoop is not designed to solve. So, again, Hadoop is not a replacement for RDBMS, but an addition to it.

Hadoop and HBase

Hadoop uses an alternative filesystem, HDFS (Hadoop Distributed File System). HBase is a NoSQL database that is commonly used with Hadoop solutions. HBase is a wide column store, which means that each row consists of one key and a variable number of values (columns). For instance, suppose I have a database system for saving customer information.

In relational database:

Have a customers table with fixed columns for customer id, customer name, customer gender, customer location, etc.

In Hbase or NoSQL:

Have key-value pairs

Each instance can have a different number of attributes (columns)
e.g. [[name: Lynda, location: Irvine], [name: Keith]]

CAP theory and Hadoop

When we consider choosing a database system, we look at the CAP theorem to understand the characteristics of different database systems. Certain classes of databases support each characteristic.

Consistency

The guarantee that every read sees the most recent data. Traditional databases extend this with transactions: two or more data modification operations succeed or fail as a unit.

Example: a money transfer. Say we move money from a savings account to a checking account. We want both changes to occur successfully, or neither.

Availability (uptime)

The ability to make copies of the data so that if one copy goes down in one location, the data will still be available in other locations.

Partitioning (scalability)

The ability to split the dataset across different locations or machines so that the amount of data can keep growing.

Traditional databases can support consistency and availability, but not partitioning. As mentioned earlier, however, it is important to build scalability into the database system, because the amount of data keeps growing larger and larger. This is where Hadoop comes into play.

Hadoop supports high ‘Partitioning’ and ‘Availability’. For partitioning, you can run Hadoop storage on any commodity hardware. Hadoop makes three copies of the data by default, so when one server goes down for any reason, you can simply replace the hardware with a new one. The Hadoop file system automatically manages that copy process, so you can scale a Hadoop cluster nearly infinitely.

What kind of data is right for Hadoop?

So, what kind of data is right for a Hadoop system? We can think of two kinds: LOB (line of business) data and behavioral data.

If there’s one hallmark of the power sector at the beginning of 2017, it’s uncertainty.

At the time of our last trend forecast list in September 2015, the utility industry was already being disrupted: Customer demand for distributed resources and the push for cleaner electricity were reshaping centralized fossil fuel-based grids across the country to accommodate variable renewables and customer-sited resources.

Those trends toward a two-way, decarbonized grid are still very much in play at the beginning of 2017. Lower prices for wind and solar energy have seen those resources reach grid parity across much of the nation, and utilities continue to add flexible natural gas generation along with new technologies like energy storage to integrate the intermittent resources coming online.

But the election of Donald Trump as U.S. president threatens the political will behind the clean energy revolution. Whereas federal regulations and incentives pushed the power sector toward an increasingly decarbonized grid throughout the last eight years, the new president has openly disavowed the idea of climate change and plans to scrap the Clean Power Plan, Obama’s signature energy regulation and the centerpiece of his climate legacy.

While Donald Trump’s election threatens federal environmental regulations and pro-clean energy policies, the U.S. power sector is already in the middle of wholesale transformation.

Just how that will manifest in actual policy is still unclear. While Trump’s appointments to the Environmental Protection Agency and Department of Energy broadly committed to regulate carbon and protect clean energy programs in their Senate confirmation hearings, both did so without any specific promises or concrete policy positions.

But regardless of the policies of the incoming administration, they will be greeted by a power sector already in the midst of wholesale transformation. To help guide the industry through these uncertain times, Utility Dive has outlined the top ten trends that will shape the U.S. power industry in 2017. This list, like the last one, isn’t meant to be exhaustive or rank one trend over another, but to simply give readers an idea of where the industry is headed at the dawn of the Trump era.

10. Coal power could get a second lease on life

If there’s one resource that embodies the uncertainty present in the power sector today, it’s coal.

Coal power has had a tough go of it over the last decade. Low natural gas prices and increased environmental regulation — particularly the Mercury and Air Toxics Standards — have meant many coal plants were not competitive in regional markets or required costly upgrades to operate, leading to widespread retirements.

Since 2000, utilities have announced more than 100 GW of coal generation retirements; in 2015 alone, nearly 14 GW came offline, accounting for 80% of the plant retirements that year.

Those trends were set to continue under the EPA’s proposed Clean Power Plan, the nation’s first set of carbon regulations for existing plants. Under that Obama program, the Energy Information Administration estimated coal retirements would accelerate, with about 90 GW expected to come offline by 2040.

If President Trump scraps the Clean Power Plan, existing coal plants could get a new lease on life.

But the Clean Power Plan was put on hold by the Supreme Court last year pending legal challenges. Now, the Trump administration has promised to cancel it outright with no concrete plans for a replacement. If that happens, and a new regulatory regime is not put in place quickly, it could give remaining coal plants a new lease on life.

While utilities are not expected to add new coal capacity in the absence of carbon rules, EIA estimates the ones already on the system could generate more, and for longer. In its latest Annual Energy Outlook, the agency forecasts that coal generation would continue to decline with the Clean Power Plan, but could stay steady for the next decade if the rules are repealed.

9. Natural gas growth will continue and could even accelerate

Carbon regulations get all the headlines, but whether coal power enjoys the resurgence predicted in EIA forecasts will depend as much on natural gas as the repeal of the Clean Power Plan.

The reason is that natural gas sets the standard for power generation in the U.S. today. Due to their low cost and flexible generation capabilities, combined cycle gas plants typically set the prices in wholesale power market auctions, helping determine the dispatch order for other resources.

A pro-gas agenda from the Trump administration would likely keep gas prices low for U.S. plants — and greatly increase the resource’s share of power generation.

Since 2010, historically low prices for natural gas have encouraged utilities to run their gas plants more, and often at the expense of coal. Coal-to-gas switching helped natural gas surpass coal as the top U.S. generating resource last year.

If those low prices continue, it could see natural gas generation increase its dominance in the U.S. power sector at the expense of other resources. In its AEO report, the EIA forecasts that an increase in domestic oil and gas production (the “high oil and gas” scenario) would allow gas generation to widen its gap over coal and stunt growth in renewable resources.

The EIA numbers are only forecasts, and the actual generation numbers will almost certainly differ. But most analysts expect pro-gas policies from the Trump administration, from the cancellation of the EPA’s methane rules to easier siting for pipelines and other gas infrastructure. If that’s the case, it would likely keep gas prices low for U.S. plants, producing generation trendlines similar to the “high oil and gas” scenarios in EIA forecasts.

8. Renewables are at grid parity and will continue to grow

While the trend for gas generation depends on the price of its fuel, the outlook for renewable energy is simpler — it will continue to grow because prices continue to fall. The only question is how quickly.

The wind and solar industries have done well in the past few years. The resources accounted for over half of the more than 14 GW of generation capacity added in 2015, and renewables’ share of new generation is only expected to increase after the extension of key federal tax incentives at the end of that year.

Before the extension of the investment tax credit for solar and production tax credit for wind, renewables and natural gas were expected to split U.S. capacity additions over the next few years, with wind and solar adding less than 5 GW annually until the 2020s. But after the tax extenders, the Rhodium Group forecasted that renewables would “run the table” for capacity additions, “with annual capacity additions topping out at an unprecedented 30 GWs in 2021.”

Wind and solar are the lowest cost generation resource across large swaths of the country — even without subsidies.

Those forecasts took the Clean Power Plan into account, but wind and solar are expected to keep growing regardless of the carbon rules or even changes to tax policy. The reason is that renewables now find themselves increasingly at grid parity with natural gas, and are cheaper than coal, across large swaths of the country.

Recent numbers from the investment firm Lazard show the average levelized cost of energy (LCOE) for unsubsidized wind generation fell between $32/MWh and $62/MWh, lower than the average LCOE for natural gas, which came in between $48/MWh and $78/MWh. Utility-scale solar was not far behind, ranging between $48/MWh and $56/MWh for thin film systems. Both renewable resources were shown to be cheaper than coal.

While those numbers are U.S. averages, localized LCOE research reveals that wind and solar are the lowest cost generation resource across large swaths of the country. County-level cost analyses conducted by the University of Texas-Austin reveal that wind is the cheapest capacity across much of the heartland, while solar PV is most competitive in the Southwest, and natural gas dominates much of the South and Northeast.

7. Organized markets are in flux — and nuclear plants are at risk

If the trends for renewables and natural gas continue, it will likely mean more upheaval in the nation’s organized power markets.

In recent years, low-priced natural gas, stagnant load growth and growing penetrations of renewable energy have acted together to suppress power prices in the wholesale electricity markets that serve two-thirds of the U.S. population.

While that has helped keep electricity prices in check for consumers, it’s also made life difficult for aging baseload power plants, which have been unable to recover their fixed costs and are increasingly going offline as a result. Last summer, SNL identified 21 GW of coal, gas and nuclear generation as being “at risk” of retirement due to market conditions and aging by 2020, and nuclear lobbyists say as many as 20 nuclear plants could be threatened in the nation’s organized markets.

With gas and renewables suppressing power prices in organized markets, policymakers are looking for ways to save zero-carbon nuclear resources.

To preserve these plants, a number of states have devised “around market mechanisms” to compensate plants at risk of retirement. Last year, for instance, AEP and FirstEnergy won income supports for aging coal and nuclear plants in Ohio, only to see FERC block the subsidies and force the utilities to change course. Nuclear plants in Illinois and New York won income supports from policymakers based on their zero-carbon generation.

Grid operators, meanwhile, must continue to protect price formation and attempt to run efficient markets in the face of a litany of state interventions. How they and Donald Trump’s FERC respond to the plight of nuclear generation will have a significant impact on U.S. carbon emissions, as the zero-carbon resource accounts for nearly 20% of the nation’s generation. On top of compensation issues, nuclear operators say extending existing licenses will be key to keeping the resources on the grid.

6. Energy storage is maturing into a viable grid-scale resource

Just as a number of aging baseload plants are exiting the power system, a new grid resource is emerging on the horizon — energy storage.

Long thought of as a niche resource, energy storage showed in 2016 that it can be a viable replacement for fossil fuel peaker plants, potentially setting it up for massive growth in the coming years as other resources go offline.

In late 2015, when the Aliso Canyon methane leak shut down natural gas supplies to generators in the Los Angeles basin, California regulators enacted a number of emergency mitigation measures, including an expedited energy storage solicitation.

Local utilities responded by contracting for large battery systems on an accelerated deployment schedule. Southern California Edison chose Tesla for 20 MW (80 MWh) of storage, and San Diego Gas & Electric tapped AES for two projects totalling 37.5 MW (150 MWh). Both projects were scheduled for deployment and operation within 6 months from signing, significantly faster than the timeframe for deploying gas peaker plants.

As battery prices continue to fall, utilities and policymakers are increasingly looking at storage as an alternative to traditional peaking generation.

While battery storage remains more expensive than combined cycle gas plants on average, it may be able to challenge peaker plants on price. A proposal from the Arizona consumer advocate would mandate that some renewable resources supply power during peak demand periods, which would necessitate the use of energy storage. With a recent solar-plus-storage PPA in Hawaii coming in at $0.11/kWh, the proposal posits that solar-plus-storage could compete with those peaker plants today, supplanting carbon-emitting generation and saving consumers money.

That “Clean Peak Standard” is still only a proposal, but it and the Aliso Canyon projects show that energy storage can be a viable alternative to natural gas peakers today.

5. DER proliferation is forcing utility adaptation and policy fights

Change in the electric utility industry is by no means confined to the bulk power system. From rooftop solar to electric vehicles and smart thermostats, the American energy consumer is becoming more energy-savvy and increasingly demanding new generation and control technologies to give them a greater say in their energy consumption.

These evolving consumer preferences are forcing utilities and policymakers to adapt. Utilities facing DER proliferation must modernize their grids, installing smart meters, sensors and communication technologies that allow greater visibility into the system and two-way power flows. That requires new expenditures and justification to state regulators on grid modernization.

Often, the growth of DERs has given way to policy fights, particularly on rooftop solar. Utilities in high-distributed solar regions claim that customers with rooftop systems do not pay their fair share of grid upkeep costs, while solar advocates say utilities and regulators fail to account for the benefits they provide the grid.

The growth of DERs has given way to policy fights between utilities and solar activists over rate design and net metering.

The disputes have given way to contentious regulatory proceedings over net metering — the compensation scheme that pays solar owners the retail rate of electricity for exported power — in many states. In Nevada, for instance, regulators ended retail rate net metering for both existing and new solar owners at the end of 2015, sparking a year of controversy as solar companies protested and sued the commission.

Eventually, the governor convened an energy task force and installed a new regulator on the PUC, settling the initial contention. But net metering fights persist in states like Arizona and are expected to continue and pop up in new states as more customers go solar.

4. Utilities continue to push rate design reform

One of the main ways utilities have sought to respond to the proliferation of DERs and stagnant load growth is through rate design reform.

As customers consume less power or generate their own, utilities have sought to shore up their bottom lines by decreasing the volumetric portion of utility bills and increasing the fixed portion that customers pay.

Most commonly, utilities have requested increases to fixed charges or fees, either for DER customers or the entire rate base. But consumer and DER advocates decry the changes as limiting customer control over their energy bills, and regulators have seldom awarded utilities the full fixed charge increase amounts that they request.

While utilities push fixed charge increases in response to DERs and stagnant load growth, new rate design solutions — such as time-of-use rates and demand charges — are emerging.

But as opposition to them has mounted, some companies have opted for more sophisticated rate design solutions, such as time-of-use (TOU) rates or demand charges. TOU rates charge more for power consumed during periods of high grid demand, incentivizing customers to shift their usage to lower-demand periods. Demand charges, common for commercial and industrial customers, bill each customer based on their single highest period of demand each month.
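The difference between the two designs can be made concrete with a small calculation. The sketch below is purely illustrative — the peak window, prices, and load profile are made-up values, not any utility's actual tariff:

```python
# Hypothetical illustration of TOU rates vs. demand charges.
# All prices and the load profile are invented for this example.

PEAK_HOURS = range(16, 21)   # assume 4-9 pm counts as on-peak
TOU_PEAK_RATE = 0.30         # $/kWh during peak hours
TOU_OFFPEAK_RATE = 0.10      # $/kWh otherwise
DEMAND_CHARGE = 8.00         # $/kW of the customer's peak demand

def tou_bill(hourly_kwh):
    """Energy cost under time-of-use pricing for one day's hourly usage."""
    return sum(
        kwh * (TOU_PEAK_RATE if hour in PEAK_HOURS else TOU_OFFPEAK_RATE)
        for hour, kwh in enumerate(hourly_kwh)
    )

def demand_charge(hourly_kwh):
    """Charge based on the single highest hour of demand (kW ~ kWh/h)."""
    return max(hourly_kwh) * DEMAND_CHARGE

# Flat 1 kWh per hour, except a 5 kWh spike at 6 pm.
load = [1.0] * 24
load[18] = 5.0

print(f"TOU energy cost: ${tou_bill(load):.2f}")   # the spike costs peak rates
print(f"Demand charge:   ${demand_charge(load):.2f}")  # the spike sets the charge
```

Under TOU pricing the evening spike simply costs more per kWh, while under a demand charge that one hour sets a fee for the whole month — which is why demand charges are so contentious for residential customers.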

Utility proposals to apply demand charges to residential customers are a relatively new development, and one that’s been met with fierce opposition from consumer advocates and solar companies. More consensus is emerging around TOU rates, however, which have been shown to help reduce customer usage and bring down system costs. California utilities will move to default TOU rates in 2019 and the state is experimenting with special rates for EV charging.

As DERs proliferate and utilities feel the continued squeeze of stagnant demand, more rate design reform debates are expected in 2017 and beyond.

3. Federal policy uncertainty is likely to persist

While rate design and DER compensation discussions play out at the state level, the picture for federal energy and environmental policy is unlikely to get much clearer in the coming year.

Coming out of the Obama administration, the general trajectory of power sector regulation was relatively straightforward. Utilities would use gas and renewables to comply with the Clean Power Plan as the federal government supported advanced research into technologies to support deeper decarbonization. States and grid operators would work together to support the reliability of the transitioning power sector, and the whole nation would gradually move toward a less carbon-intensive energy system.

Now, with the election of Donald Trump, that narrative is thrown into question. While renewables and gas are expected to continue their growth, both federal environmental regulation and clean energy supports remain in question.

For a sector that thrives on predictability at the policy level, Trump’s election has thrown federal environmental regulations and clean energy supports into question.

Last week, Trump’s pick to lead the EPA, Oklahoma Attorney General Scott Pruitt, told senators in his confirmation hearing that he would follow the 2009 endangerment finding on carbon and saw “no reason” to review it. That finding labeled CO2 and other greenhouse gases as pollutants under the Clean Air Act, meaning the EPA would have to regulate them even if it rescinded the Clean Power Plan.

But Pruitt and his surrogates at the hearing offered no details on what a new carbon regulatory scheme would look like. Since the CPP is still tied up in a court challenge and any new federal regulation could take years to write and finalize, utilities will likely have to live with uncertainty on environmental regulation for some time.

The situation is similar with Department of Energy programs. At his hearing, former Texas Gov. Rick Perry, Trump’s pick for Energy Secretary, pledged to uphold climate science and clean energy research at DOE, though he would not commit to protecting particular programs by name. The comments came just hours after a leaked budget proposal showed the Trump team may be preparing to gut DOE programs, and Perry’s apparent lack of knowledge of the proposal before it surfaced in media reports did little to inspire confidence for those reliant on DOE programs.

Whether Pruitt and Perry intend to follow through on their commitments to regulate carbon and support the DOE remains to be seen; the DOE picture will likely become clearer as the federal budget is finalized. But for an industry that thrives on predictability, federal policy in 2017 is looking like it will be anything but predictable.

2. States will lead the clean energy transition

As the federal government takes a back seat on clean energy policies, proactive states are poised to take up the mantle.

Already, states like California, Vermont, New York, Oregon and others are pushing ambitious clean energy standards, with Hawaii in particular targeting 100% renewables by 2045. And some are protecting existing clean generation, with deals in New York and Illinois to save ailing nuclear plants and give them extra compensation for their carbon-free attributes.

While federal energy policy may revert under President Trump, proactive states plan to push forward on clean energy regardless.

Meanwhile, the nation’s grid operators are figuring out ways to integrate ever-increasing amounts of intermittent renewables and devising strategies (such as in CAISO and ISO-New England) to integrate carbon pricing into their market structures.

That progress comes on the back of a productive decade for many states in decreasing emissions. Using a combo platter of natural gas, nuclear and renewables, a number of states have grown their economies over the last decade while decreasing carbon emissions — though the pace is still not yet fast enough to meet U.S. goals under the Paris Accord.

With the federal initiative on decarbonization diminishing, the role of states in the clean energy transition will become even more important. The lessons that places like California and Hawaii learn as they integrate more renewables could show the nation the best path to the deep decarbonization needed to stem the most disastrous consequences of climate change.

1. Regulators and utilities will (continue to) reform business models

On top of energy mandates and carbon goals, the single most important development coming out of state energy policymaking in 2017 may be the continued evolution of the utility business model.

As the sector moves away from the traditional model of centralized generation, many are rethinking the utility’s role to help it encourage the adoption of customer-sited resources and optimize them for the grid.

Instead of the traditional cost-of-service revenue model, in which utilities petition regulators to build infrastructure for a set rate of return, many commissions are encouraging new revenue models and incentives. The New York REV docket, for instance, is testing performance-based incentives to push utilities to serve grid needs with DERs rather than building new bulk power infrastructure.

In a shift away from the traditional cost-of-service revenue model, many state regulatory commissions are encouraging new utility revenue models and incentives.

The REV docket, the most well-known of the “utility of the future” proceedings, is working alongside a number of other similar reform initiatives in states like California, Minnesota, and Massachusetts. And regulators in Illinois, Ohio and elsewhere have stated their intention to open proceedings soon.

In some states, like California and New York, this year could see some of the regulatory initiatives solidify into concrete utility plans for grid modernization and DER optimization. Other states will watch closely for lessons learned. But if one thing’s certain, it’s that the industry itself wants the regulatory model to change — and that could be the most significant trend of all.

If we examine a single time slice of the model, it can be seen as a mixture distribution with component densities given by p(x | z).

It can be interpreted as an extension of a mixture model in which the choice of mixture component for each observation is not independent, but depends on the choice of component for the previous observation.

Applications

Speech recognition

Natural language modeling

On-line handwriting recognition

Analysis of biological sequences such as proteins and DNA

Transition probability

Latent variables: discrete multinomial variables z_n that describe which component of the mixture is responsible for generating the corresponding observation x_n

The probability distribution of z_n depends on the previous latent variable z_{n-1} through the conditional distribution p(z_n | z_{n-1})

Conditional distribution: the transition probabilities form a matrix A with elements A_{jk} = p(z_{n,k} = 1 | z_{n-1,j} = 1), where each row sums to one

Initial latent node z_1 has no parent node, so it has a marginal distribution p(z_1), represented by a vector of probabilities π with elements π_k = p(z_{1,k} = 1)
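The structure in these notes — an initial distribution over latent states, a transition matrix, and per-state emission densities — can be sketched with ancestral sampling. All parameter values below (two states, Gaussian emissions, the specific π and A) are illustrative assumptions, not part of the notes:

```python
import numpy as np

# Minimal HMM sketch: discrete latent states z_n with transition matrix A,
# initial distribution pi, and Gaussian emission densities p(x | z).
rng = np.random.default_rng(0)

pi = np.array([0.6, 0.4])        # p(z_1): marginal over the initial state
A = np.array([[0.9, 0.1],        # A[j, k] = p(z_n = k | z_{n-1} = j)
              [0.2, 0.8]])       # each row sums to one
means = np.array([0.0, 5.0])     # one emission mean per latent state
std = 1.0

def sample_hmm(n_steps):
    """Ancestral sampling: z_1 ~ pi, then z_n | z_{n-1} ~ A, x_n | z_n."""
    states, obs = [], []
    z = rng.choice(2, p=pi)           # initial node drawn from its marginal
    for _ in range(n_steps):
        states.append(int(z))
        obs.append(rng.normal(means[z], std))
        z = rng.choice(2, p=A[z])     # next state depends only on current one
    return states, obs

states, obs = sample_hmm(10)
print(states)
```

Each single time step draws x_n from one mixture component p(x | z_n), but which component is active depends on the previous step through A — exactly the "mixture model with dependent component choices" interpretation above.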