It appears Google will be releasing a quite large data set through the Linguistic Data Consortium (LDC). No word yet on pricing, and a check of the LDC site showed a wide range of prices.