Corpus Assignment Lots and Plenty


There are two words whose usage, and the question of when each is appropriate, have always piqued my interest. Those two words are ‘lots’ and ‘plenty’. On the surface, they seem very similar. Here are two examples of how I would use them.

  • Usage of lots – ‘I have lots of food in the refrigerator.’

  • Usage of plenty – ‘I have plenty of food in the refrigerator.’

Based on the sentences above, I think they are the same; the meaning essentially remains unchanged. Yet I feel that they are not identical. There must be something that differs between these two words. Fortunately, there is a way to analyze this and find out. I can use the tools of corpus linguistics to generate data, use that data to gather results, and then combine those results to answer the question:

“How similar are the words ‘lots’ and ‘plenty’?”

The paper starts by looking at the existing methods of language analysis. The two available methods are intuition-based analysis and evidence-based analysis. I will then carry out the analysis using two corpora that I have chosen, which should help me better understand the results and arrive at my conclusion.

Research Context

There are two major ways of doing linguistic analysis. One is intuition-based study; the other is corpus-based analysis.

Intuition

An intuition-based study works just as its name suggests: it proceeds by intuition, mostly based on experience (Gries, 2009: 40). If I were to say ‘I have lots of books’ and then ‘I have plenty of books’, I would decide which one seems more appropriate. I would use my experience of using these words in sentences to answer all the linguistic questions I want answered.

However, the intuition-based method is just that: intuition.

As The Language Guy (2006) puts it: “My reason is that I believe that what linguists should be working on is conversations like the Donny and Marsha ride request. Real data. It will force a radical revision in how linguists do theory construction. No longer would constructing a grammar Chomsky-style be a desideratum, and to some degree some of the motivations for using intuition would vanish.”

Intuition is something that rests in the mind of the person doing the study (Dobrić, 2009: 5). It is like saying that the weather is warmer today simply because one experiences it as warmer. That is intuition, and intuition-based analysis cannot go very far on that basis. If I were to ask someone whether he thought lots and plenty are the same, that person would use his own understanding of language to answer the question.

I would say that lots of bias could creep into this thought process. What if this person has some sort of preference for the word plenty? Even if plenty is not commonly used, he uses it. When we do an intuition-based study, we are dealing with biases such as this. Further, there is no way of asking or finding out why such a person thinks that way about the words being studied.

Not only that, intuition-based studies are no longer necessary, even if they were used in the past (Aijmer & Altenberg, 2014: 37). A word search done by hand would take years. However, much of English is now digitized, and tools are available that can run through vast amounts of text fast enough to give results in real time. Using these tools, I can get the one thing that no one can argue about: data.

If I attach data to the conclusions I draw, they become easier to understand. Instead of saying the weather is getting warmer, I would say that the temperature has increased by 5 degrees since yesterday. That is something that can be measured, validated and agreed upon without much discussion. Data is absolute, while intuition simply encourages people to argue endlessly.

Corpus-Based Study

I am going to carry out a corpus-based study, since there are so many corpora available and they are easy to access. For the purposes of this study, I am going to use two corpora.

  • WebCorp – uses the entire web as its reference text.

  • British National Corpus (BNC) – a 100 million word collection from a wide cross section of British English.

The primary corpus for this study will be WebCorp; the BNC serves more as a backup. I have chosen WebCorp because it provides the following features, which make it an excellent tool for linguistic analysis.

  • Standard search feature to search for a particular word or phrase. This is the keyword that will be used for the rest of the analysis. Since the study is about two words, plenty and lots, those two words become the keywords.

  • Uses the internet itself, which is probably the largest available source of English and a huge sample to draw references from. It also allows the user to specify which website to search. This is useful in later stages of the research, where I will need to use sub-corpora to arrive at the necessary collocates.

  • Ability to search based on case sensitivity, which might be useful for advanced searches, although it won’t really help the current study.

  • Concordance control, by ensuring only one result from a particular website. This is useful because some sites tend to repeat the same sentences.

  • Ability to add custom sites. As mentioned above, this is how I create sub-corpora.

  • Ability to include or exclude specific phrases in a search.

Collocation

Collocations are understood as words that usually occur together (Biber & Reppen, 2012: 105). Here is a definition, with examples, from the reference source:

“A collocation is a group of words that habitually co-occur. Collocations are frequently co-occurring word patterns like weapons of mass destruction or crystal clear.” (Ref 4)

Collocations can be easily extracted using a corpus. In English, some words are used with particular words more often than with others. Data about this ‘togetherness’ will help answer the question that motivates this study.
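To make the idea of extracting collocations more concrete, here is a minimal sketch in Python. It assumes a plain-text sample, a deliberately simple lowercase tokenizer and a window of two words on each side of the node; the function names are hypothetical, and this is not how WebCorp itself computes collocates. It simply counts every word that falls inside the window around each occurrence of the node.

```python
import re
from collections import Counter

def tokenize(text):
    # Lowercase and keep runs of letters/apostrophes (a deliberately simple tokenizer).
    return re.findall(r"[a-z']+", text.lower())

def window_collocates(tokens, node, span=2):
    """Count every word found within `span` tokens to the left or right of `node`."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == node:
            counts.update(tokens[max(0, i - span):i])   # left context
            counts.update(tokens[i + 1:i + 1 + span])   # right context
    return counts

sample = "I have lots of food. We ate lots of bread and plenty of fruit."
print(window_collocates(tokenize(sample), "lots").most_common(3))
```

In this toy sample, of is the word that co-occurs with lots most often, which is the kind of ‘togetherness’ a corpus makes visible at scale.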

Concordances

Closely related to collocations are concordances. In many ways, concordances can be thought of as a starting point for linguistic analysis using corpora (Biber & Reppen, 2012: 110). Here are some terms that will be used throughout this analysis.

  • Node – the word that is currently being analyzed. In my case, that would be ‘lots’ and ‘plenty’. The node is found in the center of the line.

  • Length of text – the length of the text that is specified while searching through the corpora. If the length of the text is specified as 20, the node will be found in the center with 10 words on either side.

  • Concordances – the lines themselves which contain the node.
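The three terms above fit together in a key-word-in-context (KWIC) display. The sketch below is a minimal Python illustration that assumes plain text and a word-based length of text, so a length of 20 gives 10 words on either side of the node. The function name `kwic` is my own and does not come from WebCorp.

```python
import re

def kwic(text, node, length=20):
    """Return (left, node, right) concordance lines with length // 2 words on each side."""
    tokens = re.findall(r"[a-z']+", text.lower())
    half = length // 2
    lines = []
    for i, tok in enumerate(tokens):
        if tok == node:
            left = " ".join(tokens[max(0, i - half):i])
            right = " ".join(tokens[i + 1:i + 1 + half])
            lines.append((left, tok, right))
    return lines

for left, node, right in kwic("I have lots of food in the refrigerator", "lots", length=4):
    # Right-align the left context so the node lines up in a column
    print(f"{left:>30} [{node}] {right}")
```

Each tuple is one concordance line, with the node in the middle and the context split evenly around it.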

In this paper, I will primarily use concordances to find out about collocations. WebCorp has excellent collocation facilities, which should aid my analysis.

Wordlists

If concordances can be thought of as the starting point, then wordlists are the most advanced tools for collocation analysis. Wordlists identify the frequencies with which words appear in a given text source (McEnery & Hardie, 2011: 30). Wordlist-based analysis only works when the word under investigation is used frequently.

In this paper and the analysis that follows, wordlists are not really used; I have mentioned them here for their theoretical importance. I would have liked to use them, but WebCorp does not have a robust tool for generating wordlists, and its functionality here is limited.
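Although wordlists were not used in this study, the underlying idea is simple enough to sketch. The Python example below assumes a local plain-text sample and a simple tokenizer; the function name `wordlist` and the per-1,000-token normalization are illustrative choices of mine, not a feature of WebCorp.

```python
import re
from collections import Counter

def wordlist(text, top=None):
    """Return (word, raw frequency, frequency per 1,000 tokens), most frequent first."""
    tokens = re.findall(r"[a-z']+", text.lower())
    total = len(tokens)
    counts = Counter(tokens)
    return [(w, c, round(1000 * c / total, 1)) for w, c in counts.most_common(top)]

text = "Lots of food and lots of time and plenty of room."
for word, freq, per_thousand in wordlist(text, top=3):
    print(word, freq, per_thousand)
```

Normalizing per 1,000 tokens would make frequencies from sources of different sizes comparable, which raw counts alone do not.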

Findings

Any findings from this study will only reflect the data obtained from the corpora and what they contain. The corpus I have used is WebCorp, which uses the entire internet as its source text. It is possible to think of the internet as one huge collection of millions of words. Considering its size, the internet should ideally reflect the way people normally use English in their everyday lives. If some parts of English are simply not available on the internet, the study won’t reflect them.

Methodology

The objective of this study is to understand how different or similar the two words ‘lots’ and ‘plenty’ are. I am doing this study because two words that are synonyms are not exactly identical. They are similar but not exact replacements for each other. This means there must be something that makes one word a better choice than the other in some contexts. There should be data I can gather to identify which of these words is used more frequently and in what circumstances.

Definitions

To begin, here are the definitions of these two words, along with example sentences where the definitions are not straightforward. The online dictionary dictionary.com was used to obtain the following information.

Lots (ref 5)

Verb: to divide into lots, as land. The farm was divided into lots.

Adverb: lots, a great deal; greatly. I care lots about my family.

Plenty (ref 6)

Noun: a full or abundant supply or amount. There is plenty of time.

Adjective: existing in ample quantity or number. Food is never too plenty in the area.

One of the things that grabs my attention right away is the number of definitions available for these two words. As a noun alone, lots has more definitions than plenty. My assumption is that the more definitions a word has, the more it will be used, and the fewer definitions it has, the fewer occasions there are to use it. From the definitions of the two words, I already have a rough idea that ‘lots’ is used much more frequently than ‘plenty’ in most circumstances.

It is possible to think of the dictionary as a huge corpus in its own way. The definitions represent the sum of the ways in which a particular word is used: the more a word is used, the more meanings get assigned to it. Based on the dictionary research alone, I expect lots to be used more frequently.

This already answers the question asked at the beginning of the study: lots is definitely different from plenty, and it seems to be used more frequently. All that is left to do is to use the WebCorp tool to get data to support this.

Frequency

The first factor I used is frequency. The number of times a particular word appears in a given corpus should tell us how frequently it is being used (Hyland, Huat & Handford, 2012: 60). The concordances for these words should also tell me the contexts in which they are used.

While using the WebCorp tool, I used the following settings.

  • Case Insensitive – I enabled this as the study does not make any distinction between ‘Lots’ and ‘lots’.

  • Search API – Google. As the world’s most popular search engine, it is something I use every day.

  • Span – I chose a span of 50 characters, which should be enough for concordances.

  • Language – English, since that is the focus language here.

  • Pages – 64. This is the default number available, and there is no option to change it.

  • One concordance line per web page – I found that some websites (like dictionary.com) were being overrepresented in the results, so I used this option to ensure that more sites were included.

  • Site – I used this option when I wanted to search within a particular subset of sites. This is similar to creating sub-corpora for deeper analysis.
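The span, case-insensitive and one-line-per-page settings can be mimicked on a small local collection of pages. The Python sketch below is only an approximation of what those options do, not WebCorp’s implementation: it assumes each page is a hypothetical (url, text) pair, treats the span as characters on each side of the node, and uses placeholder example.org addresses.

```python
import re

def char_concordance(pages, node, span=50, one_per_page=True):
    """pages: iterable of (url, text) pairs. Returns (url, line) tuples with the node
    in brackets and up to `span` characters of context on each side."""
    # Case-insensitive, whole-word match, mirroring the 'Case Insensitive' setting
    pattern = re.compile(rf"\b{re.escape(node)}\b", re.IGNORECASE)
    results = []
    for url, text in pages:
        for m in pattern.finditer(text):
            left = text[max(0, m.start() - span):m.start()]
            right = text[m.end():m.end() + span]
            results.append((url, f"{left}[{m.group(0)}]{right}"))
            if one_per_page:
                break  # keep only the first hit from this page
    return results

pages = [
    ("example.org/a", "Lots of food and lots of time."),
    ("example.org/b", "There is plenty of room."),
]
for url, line in char_concordance(pages, "lots"):
    print(url, line)
```

With `one_per_page=True`, the second ‘lots’ on the first page is dropped, which is how a single repetitive site is kept from dominating the results.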

With the above settings, I found that lots appeared 59 times and plenty appeared 53 times. Based on these numbers, the two words appear at almost the same frequency, although lots does appear about 11% more frequently than plenty.
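The 11% figure is the relative difference between the two counts, taking plenty as the baseline:

$$\frac{f_{\text{lots}} - f_{\text{plenty}}}{f_{\text{plenty}}} \times 100\% = \frac{59 - 53}{53} \times 100\% \approx 11.3\%$$

Taking lots as the baseline instead gives $(59 - 53)/59 \approx 10.2\%$, so the exact percentage depends on which word is treated as the reference.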

To get a little more information, I decided to narrow down the search. I set the ‘site’ option in WebCorp to ‘Wikipedia’, and here something interesting happened. Both lots and plenty appeared 63 times, making the two words even. However, I did notice how these words were used: lots was almost always used as part of a sentence. Here are a couple of lines where lots was used.

Cleromancy is a form of sortition, casting of lots, in which an outcome is determined by means ...

... to locate and identify land, particularly for lots in densely populated metropolitan areas

Here is how plenty is usually found.

Land of Plenty From Wikipedia, the free encyclopedia

Twenty is plenty From Wikipedia, the free encyclopedia

Based on the above lines and the other lines found in the Wikipedia sub-corpus, it looks like lots is used in the middle of sentences, as part of descriptions and narratives, whereas plenty is used as part of a title, for example the title of a song, book or movie. Plenty very rarely becomes part of a sentence, unlike lots (Baker, 2012: 7).

However, Wikipedia is an encyclopedia, so this might skew the results. I decided to run another frequency test, this time using ‘US newspapers’ as the sub-corpus. With this sub-corpus, lots appeared 55 times while plenty appeared 50 times. Unlike in the Wikipedia sub-corpus, lots and plenty were both used as part of sentences. Plenty continued to appear in the titles of articles, while lots never appeared in the title of any of the results retrieved.

I decided to use one more sub-corpus, this time setting the site option in WebCorp to ‘BBC news’. The results here were similar to those for ‘US newspapers’: lots appeared 62 times and plenty appeared 63 times. I now realize that Wikipedia is simply not a good sub-corpus for this kind of linguistic analysis.
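The four frequency results can be compared side by side by expressing lots as a share of the combined hits, where 50% means the two words are at parity. The short Python sketch below only re-tabulates the counts reported above; the labels are mine. Because the settings limit each search to 64 pages with one concordance line per page, these are probably best read as counts from a capped sample rather than true corpus frequencies.

```python
# Hit counts (lots, plenty) as reported in the text for each WebCorp search setting.
RESULTS = {
    "whole web": (59, 53),
    "Wikipedia": (63, 63),
    "US newspapers": (55, 50),
    "BBC news": (62, 63),
}

def lots_share(lots, plenty):
    """Proportion of the combined hits that are 'lots' (0.5 means parity)."""
    return lots / (lots + plenty)

for name, (lots, plenty) in RESULTS.items():
    print(f"{name:<15} lots={lots:>3} plenty={plenty:>3} share={lots_share(lots, plenty):.1%}")
```

Every setting stays within a few percentage points of 50%, which matches the observation that the raw frequencies of the two words are very close.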

Sub-corpora – US and BBC News

Based on the above frequency results, I would like to use the sub-corpora of US and BBC news for the rest of the study. I believe this is a good option since news is written conversationally and is not limited to academic uses. Also, news is read every day, and I am trying to find out the everyday differences between lots and plenty. So, these sub-corpora are a good choice.

Here are the results from these sub-corpora. Only the last ten are shown due to space restrictions.

Results for lots from the sub-corpora

Results for plenty from the sub-corpora

Collocates Analysis

Now that I have my sub-corpora, I decided to check collocates to identify patterns for both words. Here are the tables with collocates for both words. For the purposes of this study, I have kept only rows corresponding to words that appear at least 5 times, and I have set the word position to 1 (R1, the word immediately to the right of the node).

Collocates for lots

Collocates for plenty

Comparing R1

The word that appears most often with lots and plenty is of. In the case of lots, of appears 45 times, while in the case of plenty, of appears 36 times. In both cases, of appears to the right. The next most frequent word is to: with lots, to appears twice in the immediate right position, whereas with plenty it appears five times.
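The R1 counts described above can be reproduced on any local text with a few lines of Python. This is a minimal sketch, assuming a simple tokenizer and a hypothetical toy sample; the `min_freq` parameter mirrors the threshold of 5 used for the collocate tables, but the function is my own illustration rather than WebCorp’s collocation tool.

```python
import re
from collections import Counter

def r1_collocates(text, node, min_freq=5):
    """Count the word immediately to the right of each `node` (position R1)
    and keep only collocates occurring at least `min_freq` times."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens[i + 1] for i in range(len(tokens) - 1) if tokens[i] == node)
    return {word: n for word, n in counts.most_common() if n >= min_freq}

sample = "lots of food. " * 5 + "lots to do. " * 2
print(r1_collocates(sample, "lots"))               # threshold of 5, as in the study
print(r1_collocates(sample, "lots", min_freq=1))   # no threshold
```

Lowering `min_freq` shows less frequent R1 collocates such as to, which a threshold of 5 would otherwise filter out.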

That means there are more ‘lots of’ lines (45) than ‘plenty of’ lines (36). This leads me to believe that when it comes to describing something that is available in large numbers, lots is used more frequently than plenty. This also suggests that, when describing something that is present in large numbers, I would be better off using lots than plenty.

There is one particular area where plenty seems to be preferred. This became obvious when plenty was used as the keyword with English Wikipedia as the sub-corpus. Plenty, as a word, appears in titles of all kinds: books, movies and music often have the word plenty in their titles. This would imply that when it comes to using either lots or plenty in a title, it is generally better to go with plenty.

Conclusion

Based on the above analysis, I have found that both of these words are used when something is present in large quantities. In that respect, they are identical, and there are several occasions where one can replace the other. For instance, instead of saying ‘That is plenty of footage’, one could say ‘That is lots of footage’. However, based on the collocates tables and the data they present, it seems that in everyday usage lots is preferred over plenty. That should answer the question that started off this paper.

Are these words identical? They definitely are not. Are they used in the same way? The answer is yes, but with some caveats. When it comes to actually making a choice based on the current data alone, lots is preferred over plenty. The data also suggest that plenty is more likely than lots to be used in the title of a work of art. If there were a movie about having too much money, it would more likely have a title along the lines of ‘Plenty of Money’ than ‘Lots of Money’.

I also understand that the above study is in no way conclusive. The corpora I have chosen are web-based, and within them I have limited my searches to news sites only. News sites are probably not the most representative of language spoken today. If a more detailed study were conducted, I could gather more evidence about how these two words differ.

References

Gries, S. Th. (2009). What is corpus linguistics? University of California.

Dobrić, N. (2009). Corpus linguistics: The basic form of linguistic analysis. Alpen-Adria Universität Klagenfurt.

The Language Guy. (2006). The role of intuition in linguistics. http://thelanguageguy.blogspot.in/2006/05/role-of-intuition-in-linguistics.html [accessed May 2015]

Aijmer, K., & Altenberg, B. (Eds.). (2014). English corpus linguistics. Routledge.

Biber, D., & Reppen, R. (Eds.). (2012). Corpus linguistics. Sage.

McEnery, T., & Hardie, A. (2011). Corpus linguistics: Method, theory and practice. Cambridge University Press.

Hyland, K., Huat, C. M., & Handford, M. (Eds.). (2012). Corpus applications in applied linguistics. A&C Black.

Baker, P. (Ed.). (2012). Contemporary corpus linguistics (Vol. 16). A&C Black.