Overview of Language Learning in Linguistics, aka Why We Care
So, let's begin with an overview of the problem.
In linguistics, a big question of interest is how children learn all the incredibly complex details that go into knowing a language natively, and specifically, how they do most of it unconsciously during a time when the concept of the number 3 is a Great, Profound, and Difficult thing. (Tying their shoes may also fall into this category of Hugely Difficult things.)
In short, their cognitive abilities don't seem to be terribly developed - and yet, when exposed to a language, they will effortlessly acquire more proficiency before the age of 6 than nearly all adults who actively try to learn another language.
As you might imagine, there's a lot of linguistic brain power invested in finding out how on earth this could be possible.
What Must Be Learnt
There are many, many different pieces of knowledge one must learn about language - for instance, the sound system (phonology), the structure of words (morphology), the structure of sentences (syntax), how to get meaning from the structure (semantics), how to get meaning from other stuff that may or may not actually be immediately in the structure (pragmatics)...not to mention the way all these knowledge systems interact with each other.
In general, there's a progression of learning - first, figure out the sound system to give you words which give you sentences which give you meanings, etc. For this reason, the age at which children are reasonably proficient in phonology will be earlier than the age at which they are proficient with the nitty-gritty details of syntax, for instance.
The Problem jalenstrix is Playing With
How do children learn the stress system (metrical phonology) of English? The type of stress I mean is what tells you to put the EMphasis on a particular SYLlable in a word, rather than putting the emPHAsis on a different sylLAble. Despite a fair number of exceptions, English does have an underlying rule system that covers a hefty portion of English. For instance,
education = EduCAtion
educational = EduCAtional
This type of variation follows the metrical phonology rule system of English.
Now, the rules that apply for any given language vary quite a lot, but lots of phonologists who worry about this have come up with a set of parameters that seem to cover all the cross-linguistic variation. Thus, to get English, the parameters have a certain set of values V1; to get French, the parameters have a different set of values V2; to get Koya, V3; to get Latin, V4; Selkup, V5; and so on.
The Part Where Disbelief May Have To Be Suspended
Observation of Metrical Phonologists: Hey - we have this really fabulous, clean system that classifies all the different metrical phonology systems of the world's languages by means of a rather small set of parameters. In fact, all you really have to do is figure out the values of these parameters, and the metrical phonology system for whatever language just falls out.
Observation Two: Hey....children learn their metrical phonology very early...I mean, they can't even tie their shoes, and they've got this stuff down. :: brief ponder ::
Logic Jump: A ha! What if children's brains came equipped with the knowledge of these parameters, and all they had to do was take in enough data from their language to set them properly? That would make learning this stuff a heck of a lot easier! Let's see if that works out...
[cue various research projects on learning metrical phonology systems with knowledge of the parameters already built into the learner]
Specific Issues of Learning With Parameters, aka Nope, There Are Still Problems Here...
But wait, says B. Elan Dresher (a fun metrical phonologist who worries about learning up in Toronto), you know the part where you said all kids have to do is "take in enough data" to set their parameters? Yeah, that part's hard. The set of parameters we have is small, granted - only 5 or 6 of them. But they all interact. It may be really, really hard to figure out which parameters are responsible for the stress on a particular word heard in the input.
Dresher's solution: Let's credit the child with a little bit more built-in knowledge. Suppose that, along with the set of parameters, the child also innately knows
(a) what order to look for parameter data (i.e. try to figure out P1, and then P2, and then P3...)
(b) what specific data patterns will signal particular values of particular parameters (i.e. if you see this kind of data pattern, you know that your language is value a for P1)
And, lo, Dresher shows that this will theoretically work.
Well...that seems like an awful lot of stuff to build into the learner. I mean, brains are cool...but is there any way we can solve this learning problem without building in the very specific knowledge that Dresher wants in (a) and (b) above?
Moreover, theoretically works is nice, but how about if we instantiate this with a little data from real live learning situations, hmm?
Let's start with the real live data.
[grab data samples from input to children on the order of half a million words]
Okay, now let's talk about the assumption of (b) above, namely that the data patterns for particular parameters need to be built into the learner. One of the things that may be available for free to the learner (because it's needed for other language stuff on a regular basis) is the ability to assign structure to data heard in the input. This ability is commonly called "parsing". Parsing involves taking the structural pieces available in the language and using them to build a structure for the sentence (in the case of syntax) or the word (in the case of metrical phonology, say). Therefore, what an adult does upon hearing a sentence/word is parse the sentence/word using the structural pieces available.
Janet Fodor, a fun language learning person in New York: Hey, why don't we allow the structural pieces to actually be the parameters of the language? So, parsing a sentence involves using the structural pieces of your language - and these are the correct set of parameter values for your language. Therefore, when learning, a child tries to parse the input using the various structural pieces that are available in all languages of the world. When there's a successful parse, the child knows what structural pieces contributed to the successful structure, and therefore knows which parameter values were successful for that child's particular language.
[Note: When "the child does X" is mentioned, what this really means is "the child's unconscious language-learning abilities do X", because this is all presumably not conscious effort by the child. However, for ease of exposition, "child" or "learner" will be used for the rest of time here.]
Fodor, continuing: Hmm...but I wonder what happens if more than one set of structural pieces gives you a correct parse....you don't really want your learner guessing willy-nilly (that could lead to a lot of trouble if the learner guesses wrong). We better make the learner only use unambiguous data. This means that if a particular structural piece must be used in order to get a correct parse for a piece of data, that structural piece (and therefore, parameter value) must be right for the language.
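Here's a minimal sketch of that "only trust unambiguous data" idea, in Python. Everything in it is a toy assumption for illustration: the two parameters (`edge` and `skip_edge`), the `predicted_stress` function standing in for real parsing, and the representation of a word as just (number of syllables, index of the stressed syllable). None of these are the actual metrical phonology parameters. The point is only the logic: try every combination of parameter values, keep the ones that parse the word, and learn a value only if every successful parse agrees on it.

```python
from itertools import product

# Hypothetical toy parameters (NOT the real metrical phonology parameters).
PARAMETERS = {
    "edge": ("left", "right"),    # which end of the word stress is counted from
    "skip_edge": (False, True),   # whether the syllable at that edge is skipped
}

def predicted_stress(grammar, n_syllables):
    """Toy stand-in for parsing: index of the syllable this grammar stresses."""
    offset = 1 if grammar["skip_edge"] and n_syllables > 1 else 0
    if grammar["edge"] == "left":
        return offset
    return n_syllables - 1 - offset

def successful_grammars(n_syllables, stressed):
    """Every combination of parameter values that yields the observed stress."""
    names = list(PARAMETERS)
    candidates = (dict(zip(names, values)) for values in product(*PARAMETERS.values()))
    return [g for g in candidates if predicted_stress(g, n_syllables) == stressed]

def unambiguous_values(n_syllables, stressed):
    """Parameter values shared by ALL successful parses of this word."""
    grammars = successful_grammars(n_syllables, stressed)
    settled = {}
    for name in PARAMETERS:
        seen = {g[name] for g in grammars}
        if len(seen) == 1:  # every successful parse needed this value
            settled[name] = seen.pop()
    return settled

print(unambiguous_values(2, 0))  # two rival parses disagree on everything
print(unambiguous_values(3, 1))  # parses agree on skip_edge, not on edge
print(unambiguous_values(3, 0))  # only one parse works at all
```

In this toy world, a two-syllable word with initial stress teaches the learner nothing (two different grammars both parse it, and they disagree on every parameter), while a three-syllable word with middle stress settles `skip_edge` but leaves `edge` open. That's the flavor of "ambiguous for one parameter, unambiguous for another" that the learner has to cope with.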
jalenstrix's hijacking of Fodor: Well, we have a set of structural pieces for metrical phonology. How about seeing if we can end up with the English metrical phonology system using real live English data and making the learner only use unambiguous data to set parameters? If so, this gets rid of Additional Built-In Knowledge (b) of Dresher.
jalenstrix's further thoughts: Now, about Additional Built-In Knowledge (a)...do we really need to have the learner preprogrammed with the learning order for the parameters? I mean, how many different orderings will actually work, anyway? And do they show any pattern that might be classified in some other way besides "Do P1 first, then P2, then P3..."?
Cue Some Empirical Work & Some Results
If the learner parses incoming data (words, in this case) with the metrical phonology structural pieces (aka parameters), and uses unambiguous data to set the various parameter values, guess what happens?
The learner can end up with the English set of parameter values.
Moreover, of the ridiculous number of different parameter orderings available, a scant 12 or so will actually lead to the correct parameter value set.
More specifically, Additional Built-In Knowledge (b) is not necessary - a learner who parses and learns from unambiguous data can get the right answer. The learner does not need to have the parameter-specific patterns preprogrammed in.
Now, about Additional Built-In Knowledge (a). First, a little more background on the parameters in P - some are more general than others. For instance, P1 has two choices P1a and P1b. If P1a is chosen, then the learner must further choose between P2a and P2b, P3a and P3b, and P4a and P4b. So, parameters P2, P3, and P4 are less general than parameter P1 since they are effectively specifications of one of the values for P1.
A reasonable way to go for the learner might be to decide P1a vs. P1b, and then, if P1a is chosen, immediately start working on P2, P3, and P4 instead of looking at other more general parameters (say, P5). This is something like a depth-first traversal of the search space.
Another reasonable alternative is for the learner to decide P1a vs. P1b, and then, even if P1a is chosen, still go and work on the other more general parameters (say P5) before going back to the more-specific choices of P2, P3, and P4. This is something like a breadth-first traversal of the search space.
(Yet another alternative would be that it doesn't matter what sort of traversal the learner does. The learner chooses at random what parameters to work on.)
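To make the two systematic options concrete, here's a small Python sketch. It assumes the learner has picked P1a (so P2, P3, and P4 are in play) and that P5 is the only other general parameter; the dictionary layout and function names are just my illustration, and the order among sibling parameters (P2 vs. P3 vs. P4) is arbitrary here.

```python
from collections import deque

# Dependencies described in the text, assuming P1a was chosen:
# P2, P3, P4 are specifications of P1; P5 is another general parameter.
SUBPARAMETERS = {"P1": ["P2", "P3", "P4"], "P5": []}
GENERAL = ["P1", "P5"]

def depth_first_order():
    """Finish everything under a parameter before moving to the next general one."""
    order = []

    def visit(parameter):
        order.append(parameter)
        for child in SUBPARAMETERS.get(parameter, []):
            visit(child)

    for parameter in GENERAL:
        visit(parameter)
    return order

def breadth_first_order():
    """Set all the general parameters first, then their specifications."""
    order, queue = [], deque(GENERAL)
    while queue:
        parameter = queue.popleft()
        order.append(parameter)
        queue.extend(SUBPARAMETERS.get(parameter, []))
    return order

print(depth_first_order())    # ['P1', 'P2', 'P3', 'P4', 'P5']
print(breadth_first_order())  # ['P1', 'P5', 'P2', 'P3', 'P4']
```

The only difference between the two is where P5 lands: right after P1 (breadth-first) or after all of P1's more-specific parameters (depth-first).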
But then, guess what property all the 12 orderings which lead the learner to the correct English values have? Why, something like a breadth-first traversal of the search space? Why, yes! Yes, they do.
So what does this mean? That maybe Additional Built-In Knowledge (a) doesn't really have to be built-in so specifically. Instead, perhaps a learner could use the more general learning strategy of "Solve the more general problems first before worrying over the detailed stuff." The really peachy thing about this kind of heuristic is that it's already been touted in the realm of syntax learning as a sensible solution.
The fact that it falls out from the real live English data here? Confirmation, oooooh, baby.
So yes...this is basically the current academic happy.