Work Journal 2019

    January

    Monday, January 1

    I rose early this morning and worked on cleaning up my late 2018 work journal, working for an hour on this.

    Wednesday, January 3

    We stopped by work this afternoon so that I could pick up my most recent SF50 and other documentation requisite for my unemployment application. I had received a box containing 199 Archaeognatha specimens in 88 vials.

    Monday, January 28

    To do:

    - Time sheet
    - AKES meeting arrangements.
    - Associate travel card with Concur account.
    - Refuge Notebook catch-up
    - Biology News entries, new literature to add to Bio publications bibliography page.
    - Biota of Canada post to akentsoc.org
    - Finish late 2018 work journal.
    - Look over bristletails received.

    I worked on adding images and scripts to my 2018 work journal, trying to finalize it.

    John asked me to post the recent Refuge Notebook articles from December, so I formatted and posted these. I also made posts to Biology News.

    Tuesday, January 29

    To do:

    - AKES meeting arrangements.
    - STDP Final Report to Liz
    - Figure out AKES meeting topic and get going on presentation.
    - Restart AWCC slurm job that was canceled on 26.Dec.2018.
    - Take care of vehicle.
    - New literature to add to Bio publications bibliography page and to our literature database.
    - Biota of Canada post to akentsoc.org
    - Finish late 2018 work journal.
    - Arrange for return shipping of Betula specimens.
    - Look over bristletails received.

    Debbie helped me make travel arrangements for Fairbanks, so this is a go. Now I need to figure out my talk since I was not able to work on the alpine defoliation project in January as I had intended.

    Liz Graham wrote, asking that I fill in the STDP report review. I started looking through that dataset for notable records. Pissodes fiskei is not in our Alaska checklist. I checked that sequence again. It is closest to a sequence of Pissodes costatus, which we have in Alaska. There were no other new records. I need to get this into a publication of some sort. I also need to start an Alaska library on github.

    I worked on re-submitting AWCC/Caribou Hills data for the ITSx step (see script).

    Examining specimen with barcode label UAM100185885 (KNWR:Ento:11301), from Bay View Cemetery. This is a female Petrobiinae with sole-shaped lateral ocelli. There appear to be 2+2 eversible vesicles on segment VI, but it is a little hard to tell because only the outer pair are everted. This looks like Pedetontus s. str.

    KNWR:Ento:11301, label.

    http://arctos.database.museum/guid/KNWR:Ento:11301

    KNWR:Ento:11301, face.

    Wednesday, January 30

    To do:

    - Figure out AKES meeting topic and get going on presentation.
    - New literature to add to Bio publications bibliography page and to our literature database.
    - Biota of Canada post to akentsoc.org
    - Finish late 2018 work journal.
    - Arrange for return shipping of Betula specimens.
    - Look over bristletails received.

    I sorted the remaining 67 or so vials of bristletail specimens given to me by Rod Crawford, not opening the vials, but just roughly sorting them based on what they appeared to be. There appear to be fewer than 88 vials listed on the loan, but this is ok. I was hoping to see some Petridiobius specimens, but there were none. There was more Pedetontus s. str. than anything, followed by Pedetontus cf. submutans and Machilinus. Most of these specimens had been sifted from moss or litter.

    Sorted vials of bristletails from Rod Crawford.

    I finished adding material to my late 2018 work journal, including scans, references, etc.

    Needing to figure out what to present next week at the AKES meeting, I resumed the AKES Newsletter article on soil fungi affected by earthworms at Stormy Lake with the intention of adding effects on soil fungi to the earthworm presentation I gave in the fall. See the R script.

    Biplot of a PCA of Stormy Lake soil fungi occurrence data. Note that all nightcrawler-infested sites (1-3) are to the left of the nightcrawler-free sites (4-6).

    Frequencies of occurrence of soil fungal OTUs.

    Thursday, January 31

    To do:

    - Pay Arctos invoices.
    - Work up Stormy Lake soil fungi data for AKES Newsletter and presentation.
    - New literature to add to Bio publications bibliography page and to our literature database.
    - Biota of Canada post to akentsoc.org
    - Arrange for return shipping of Betula specimens.

    I paid the Arctos invoices.

    I helped Todd with tagging eight sea otter pelts.

    Before the winter is over I want to ski down to the trail intersection near Slikok Lake where I collected specimen MOBIL6660-18 last year. This was a Diptera larva collected from a spring in winter. There is no close match for the COI sequence obtained from this specimen on BOLD; the closest match has 89.92% similarity and is identified as Diptera. Other matches over 89% are Chironomidae. I would like to collect more specimens to try to put a genus name on these using morphology.

    Chironomid larva specimen MOBIL6660-18.

    I requested that records from dataset DS-BOWSER be included the next time that the BOLD BIN algorithm (Ratnasingham et al., 2013) is run. This has not been done for some time. For example, the chironomid specimen referred to above, collected in March 2018, has not yet been assigned to a BIN despite having a clean, full-length sequence.

    http://boldsystems.org/index.php/Public_RecordView?processid=MOBIL6660-18

    I did look at results of the PIPITS analysis of 2017 Stormy Lake fungal data (see script).

    I also skied down to the spring where I had collected that mystery chironomid before and collected more sediment, both from the spring and from the stream where it crosses the snowmachine trail.

    Sediment sample taken from spring.

    Sediment sample taken from stream below spring where it crosses the trail.

    February

    Friday, February 1

    To do:

    - Post this week's Refuge Notebook article.
    - Look at samples collected yesterday.
    - Work up Stormy Lake soil fungi data for AKES Newsletter and presentation.
    - New literature to add to Bio publications bibliography page and to our literature database.
    - Biota of Canada post to akentsoc.org
    - Arrange for return shipping of Betula specimens.

    I looked through the sediment sample from the stream collected yesterday. There were many ostracods, a few tiny worms, two caddisfly larvae, and three Diptera larvae, each different from the other.

    I posted this week's Refuge Notebook article, starting a new volume for the year.

    I updated the KNWR Biology's publications bibliography and posted new literature announcements to http://www.akentsoc.org/.

    I resumed work on the Stormy Lake fungal data (see script).

    https://www.fws.gov/refuge/Kenai/what_we_do/science/bibliography.html

    I am excited to have just learned about FUNGuild (Nguyen et al., 2016). After some formatting fixes (removing spaces, etc.) I was able to submit my OTU table to FUNGuild at http://www.stbates.org/guilds/app.php.

    Guilds v1.0 Beta report:
    - 386 assignments were made on OTUs within the input file!
    - Total calculating time = 8.8 seconds!

    I looked through these results in R (see script).

    Relative abundances of guilds of fungi. Sites 1-3 were in the Lumbricus infestation; sites 4-6 were outside in otherwise similar woods.

    Comparison of relative abundances of guilds of fungi based on reads summed over infested and not infested sites.

    There were clearly proportionately more mycorrhizal fungi in Lumbricus-free plots than plots in the infestation.
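    The comparison of guild proportions summed over infested and uninfested sites can be sketched roughly as follows. This is only an illustration in Python (the journal's own analysis was done in R), and the guild names and read counts are made-up placeholders, not the Stormy Lake data.

```python
# Hypothetical read counts per guild for sites 1-6 (placeholders, not real data).
reads = {
    "Ectomycorrhizal": [10, 20, 30, 200, 250, 300],
    "Saprotroph": [300, 250, 200, 100, 120, 80],
    "Plant pathogen": [40, 30, 50, 20, 30, 20],
}
INFESTED = [0, 1, 2]    # list positions of sites 1-3 (Lumbricus present)
UNINFESTED = [3, 4, 5]  # list positions of sites 4-6 (Lumbricus absent)


def relative_abundance(reads, site_positions):
    """Sum reads over the chosen sites, then express each guild as a proportion."""
    sums = {guild: sum(counts[i] for i in site_positions)
            for guild, counts in reads.items()}
    total = sum(sums.values())
    return {guild: s / total for guild, s in sums.items()}


infested = relative_abundance(reads, INFESTED)
uninfested = relative_abundance(reads, UNINFESTED)
for guild in reads:
    print(f"{guild}: infested {infested[guild]:.2f}, uninfested {uninfested[guild]:.2f}")
```

    Summing reads before dividing weights each site by its sequencing depth; averaging per-site proportions instead would weight sites equally.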

    Monday, February 4

    To do:

    - Get going on AKES presentation.
    - Arrange for return shipping of Betula specimens.

    I started work on revising my worm presentation. I am trying to determine which Eisenia species are present.

    Specimen/lot UAM:Ento:378050 is now identified as "Eisenia andrei and Amynthas," identified by Adrian Wackett by morphology.

    http://arctos.database.museum/guid/UAM:Ento:378050

    The specimen I collected from my compost pile recently (MOBIL8994-18, now also KNWR:Inv:35) was identified inconclusively as just Eisenia by BOLD's ID Engine. I read in Römbke et al. (2016) that these two species are separable by DNA barcodes. I submitted the sequence from this specimen to NCBI BLAST and looked at only results that were in the Appendix of Römbke et al. (2016). This placed my specimen in the Eisenia andrei clade.

    Eisenia andrei specimen KNWR:Inv:35, live, 25.Nov.2018.

    I also updated the identification of specimen KNWR:Ento:6756 to E. andrei because the worms in my compost pile had come from the population at my parents' house. I later made this new identification unaccepted because vermicomposting cultures may contain both species (Domínguez, 2018). I need to look for E. fetida in these populations.

    http://boldsystems.org/index.php/Public_RecordView?processid=MOBIL8994-18
    http://arctos.database.museum/guid/KNWR:Inv:35
    http://arctos.database.museum/guid/KNWR:Ento:6756

    I worked on revising the worm presentation I gave in November, updating it with new information and modifying it for the upcoming AKES meeting.

    I skied east across Headquarters Lake, the small lake to the east, and onto the PSDRA trail system. These trails had not been groomed. Upon a little searching on the internet, it looks like this organization is no longer active. Its website is gone and I saw no activities cited past 2014.

    Wednesday, February 6

    To do:

    - Finish AKES presentation.
    - Finish travel-related arrangements.
    - Get specimens together to take to Fairbanks.
    - Fix akentsoc.org links and post meeting agenda.
    - Arrange for return shipping of Betula specimens.

    My SLURM job on Yeti was canceled, again after six days. I do not know why. I will have to look into this later. I need to get my presentation done today.

    In looking through literature for my talk, I learned that there is another invasive species very similar to Lumbricus terrestris that I need to watch out for, Lumbricus friendi (see Csuzdi and Szlávecz, 2003).

    I attended the all-employee meeting with the regional director and deputy director in the middle of the day.

    I worked on my AKES presentation on worms.

    Looking up more about Lumbricus friendi. This species is not included in the key of Gates and Reynolds (2017). There are currently only two COI sequences from this species in BOLD, both of them from Europe.

    I examined specimens KNWR:Ento:8612 and KNWR:Ento:7096. Both appear to have canoe-shaped tubercula pubertatis, so they appear to be Lumbricus terrestris.

    Thursday, February 7

    To do:

    - Finish AKES presentation.
    - Finish travel-related arrangements.
    - AKES accounts/passwords

    I want to find a good resource for discriminating between Lumbricus terrestris and Lumbricus friendi. I found a comparison from The Earthworm Society of Britain's ESB Earthworm Identikit at https://www.earthwormsoc.org.uk/fullscreen/earthwormkey.

    http://arctos.database.museum/guid/KNWR:Ento:8612
    http://arctos.database.museum/guid/KNWR:Ento:7096

    Comparison of Lumbricus terrestris and Lumbricus friendi from The Earthworm Society of Britain's ESB Earthworm Identikit.

    I requested Sherlock (2012) through ARLIS ILL.

    I spent most of the morning finishing my presentation for Saturday.

    Friday, February 8

    Much of my day was spent in travel to Fairbanks. After Derek brought me to the museum I walked around the museum vicinity looking for willow rosette galls, but I found none. I noted that the birches (Betula neoalaskana) still retained much if not most of their seeds and these had been falling very recently on top of the snow.

    I attended the Alaska Entomological Society evening social at Derek's house. There I was happy to meet Jessica Rykken and Chris Fettig.

    Saturday, February 9

    This was a busy day at the annual meeting of the Alaska Entomological Society. I presented on earthworms.

    Jessica Rykken presents at the 2019 annual meeting of the Alaska Entomological Society.

    Monday, February 11

    To do:

    - Uniform order
    - Travel voucher.
    - AKES presentation Biology News post.
    - Post last week's Refuge Notebook article.
    - akentsoc.org updates
    - Solicit AKES Newsletter articles.
    - Get permission to post AKES meeting presentations.
    - E-mail regarding student presentation award.
    - Resume AWCC analysis.
    - Examine Lumbricus specimens.
    - Finish article on Stormy Lake fungi.
    - Process STDP arthropod data using improved pipeline.
    - Format skunk moth article for AKES.

    I made some small updates to the akentsoc.org website and made a post about my presentation to the Biology News page of the Refuge's website.

    I edited, formatted, and posted last week's Refuge Notebook article.

    I resumed the AWCC soil fungi analysis. See the command line stuff and SLURM script.

    Examining KNWR:Inv:20, Lumbricus terrestris from Seward. Clitellum on 32-37. This is Lumbricus terrestris.

    KNWR:Inv:33, from Rainbow Lake; KNWR:Inv:36 (just entered), from Homer; KNWR:Ento:7060, from Canoe Lake; KNWR:Ento:7206, from Fish Lake; KNWR:Ento:7096, from Merganser Lake; and KNWR:Ento:8954, from Cooper Landing are also Lumbricus terrestris. Now I have examined at least representative individuals from all localities from which I currently have specimens. All are Lumbricus terrestris; none are Lumbricus friendi.

    http://arctos.database.museum/guid/KNWR:Inv:20
    http://arctos.database.museum/guid/KNWR:Inv:33
    http://arctos.database.museum/guid/KNWR:Inv:36
    http://arctos.database.museum/guid/KNWR:Ento:7060
    http://arctos.database.museum/guid/KNWR:Ento:7206
    http://arctos.database.museum/guid/KNWR:Ento:7096
    http://arctos.database.museum/guid/KNWR:Ento:8954

    Tuesday, February 12

    To do:

    - Travel voucher
    - Time
    - Request permission from presenters to post AKES meeting presentations.
    - Continue AWCC analysis.
    - Identify "snow worms" from Homer.
    - Finish article on Stormy Lake fungi.
    - Process STDP arthropod data using improved pipeline.
    - Format skunk moth article for AKES Newsletter.
    - Format tick announcement for AKES Newsletter.
    - Get worm protocol to Jess.

    I submitted 100K more reads to be processed by pipits_funits on Yeti. I also did some getting ready to process more, but I am waiting to receive results from yesterday's job before proceeding. See the command line script, a SLURM script, and an example of a series of SLURM scripts submitted.

    Examining worms brought to me from Homer. These look like Dendrobaena octaedra. Clitellum on 29-33. These are D. octaedra.

    I circumnavigated the upland island east of Headquarters Lake, skiing around it in the wetlands. It was splendid. At a spring I collected a sample of bottom sand and muck. In the lab I found some small crustaceans in this which look like Harpacticoida.

    Harpacticoida specimen in filamentous algae from bottom of spring southeast of Headquarters Lake.

    Wednesday, February 13

    To do:

    - Continue AWCC analysis.
    - Finish article on Stormy Lake fungi.
    - Process STDP arthropod data using improved pipeline.
    - Format skunk moth article for AKES Newsletter.
    - Format tick announcement for AKES Newsletter.
    - Get worm protocol to Jess.

    I entered data for that copepod from yesterday (KNWR:Inv:37) and a Lumbricus terrestris specimen which was not in Arctos for some reason (KNWR:Ento:7207).

    http://arctos.database.museum/guid/KNWR:Inv:37

    I resumed work on my Stormy Lake soil fungi article, making a map. See the R script.

    Map of soil sampling locations for the AKES Newsletter article.

    My AWCC soil fungi ITSx SLURM jobs finished just before lunch. I ran the rest of the PIPITS steps. See command line input and the SLURM script. The results did not look good, with way too many Cercozoa reads.

    http://arctos.database.museum/guid/KNWR:Ento:7207

    Thursday, February 14

    To do:

    - New Biology News entries.
    - AKES student presentation award.
    - Format this week's Refuge Notebook.
    - Continue AWCC analysis.
    - Finish article on Stormy Lake fungi.
    - Process STDP arthropod data using improved pipeline.
    - Format skunk moth article for AKES Newsletter.
    - Format tick announcement for AKES Newsletter.
    - Format meeting article for AKES Newsletter.
    - Get worm protocol to Jess.
    - Respond to e-mail about LaTeX insect labels.

    I posted Biology News entries.

    Now I really want to know what happened with that AWCC soil fungi analysis. I was thinking about it last night. I did some looking at output on Yeti (see notes/script). I might need to try split_libraries.py of QIIME (see Gweon et al., 2015). Looking at the documentation, I think I need to use split_libraries_fastq.py (http://qiime.org/scripts/split_libraries_fastq.html).

    From http://qiime.org/tutorials/processing_illumina_data.html:

    QIIME can be used to process single-end or paired-end read data from the Illumina platform. The primary script for merging paired-end read data in QIIME is join_paired_ends.py. See the script documentation for more details. This is typically applied as a pre-processing step before running split_libraries_fastq.py.

    I worked with QIIME for a while, but it seems the kind of input files I have from MrDNA are difficult to work with.

    I downloaded quality-filtered, demultiplexed reads I had made earlier on Galaxy (#15) and used an R script to reformat them to what pipits_funits expects.

    I ran another SLURM script. We will see what happens.

    Friday, February 15

    To do:

    - Format this week's Refuge Notebook.
    - Continue AWCC analysis.
    - Biology News entries for Dawn's new remote sense article and Voices of the Kenai.
    - Finish article on Stormy Lake fungi.
    - Process STDP arthropod data using improved pipeline.
    - Format skunk moth article for AKES Newsletter.
    - Format tick announcement for AKES Newsletter.
    - Format meeting article for AKES Newsletter.
    - Get worm protocol to Jess.
    - Respond to e-mail about LaTeX insect labels.

    I canceled the SLURM job begun at the end of the day yesterday. It was still running, but it would take days. I used an R script to format one fasta file per sample for pipits_funits. I then wrote a series of SLURM scripts to process these data (see example). I submitted the 12 jobs in parallel (see below).

    sbatch 2019-02-15-0806_ITSx_AWCC1.slurm
    sbatch 2019-02-15-0806_ITSx_AWCC2.slurm
    sbatch 2019-02-15-0806_ITSx_AWCC3.slurm
    sbatch 2019-02-15-0806_ITSx_AWCC4.slurm
    sbatch 2019-02-15-0806_ITSx_AWCC5.slurm
    sbatch 2019-02-15-0806_ITSx_AWCC6.slurm
    sbatch 2019-02-15-0806_ITSx_AWCC7.slurm
    sbatch 2019-02-15-0806_ITSx_AWCC8.slurm
    sbatch 2019-02-15-0806_ITSx_CaribouHills1.slurm
    sbatch 2019-02-15-0806_ITSx_CaribouHills2.slurm
    sbatch 2019-02-15-0806_ITSx_CaribouHills3.slurm
    sbatch 2019-02-15-0806_ITSx_CaribouHills4.slurm
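    Rather than typing twelve nearly identical submission lines, the list could be generated. The sketch below is an assumption about how one might do it in Python, not the script actually used; it only builds the command strings from the sample names and file-name prefix shown above and leaves the actual submission to the shell.

```python
def sbatch_commands(prefix="2019-02-15-0806_ITSx_"):
    """Build one sbatch command per sample, following the naming used above."""
    samples = [f"AWCC{i}" for i in range(1, 9)]
    samples += [f"CaribouHills{i}" for i in range(1, 5)]
    return [f"sbatch {prefix}{sample}.slurm" for sample in samples]


commands = sbatch_commands()
for command in commands:
    print(command)
```

    The printed lines can be pasted into a terminal on the cluster or written to a shell script.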

    I posted this week's Refuge Notebook article.

    I formatted a draft announcements article for the AKES Newsletter.

    I posted three Biology News entries and added Dawn's new article to our literature database and to our on-line Publications Bibliography page.

    https://www.fws.gov/refuge/Kenai/what_we_do/science/bibliography.html

    I started working on the skunk moth article for AKES Newsletter, but the author notified me that he is making some changes, so I will wait on this.

    I formatted the meeting article for AKES Newsletter and posted presentations to akentsoc.org so that they can be linked to in this article.

    I started on work using the vegan package to look at the Stormy Lake fungi data. See the R script.

    PCA plot of Stormy Lake soil fungi data with Lumbricus terrestris (Lt) as an environmental variable.

    Monday, February 18

    I checked on Yeti on my AWCC analysis. The ITSx step had failed because of an error in the re-inflation step. I looked at the files and could not see what was wrong. I may really need to somehow construct input files of the format PIPITS expects: separate R1 and R2 fasta files for each sample.

    Might try Bayexer (https://github.com/HaisiYi/Bayexer). I tried to use this, but I ran into problems. See script.

    Tuesday, February 19

    To do:

    - Write Refuge Notebook article on earthworms.
    - Continue AWCC analysis.
    - Finish article on Stormy Lake fungi.
    - Process STDP arthropod data using improved pipeline.
    - Get worm protocol to Jess.

    I came in late today due to family appointments in the morning.

    Going back to Galaxy, tried splitting original R1 FASTQ file using Barcode splitter tool. That worked, creating datasets 188 (data) and 189 (summary, below).

    # Barcode         Count
    AWCC1             42351
    AWCC2             45738
    AWCC3             46639
    AWCC4             52856
    AWCC5             33779
    AWCC6             25379
    AWCC7             55761
    AWCC8             64013
    CaribouHills1B    37214
    CaribouHills2B    50063
    CaribouHills3B    62257
    CaribouHills4B    60989
    unmatched        777148
    total           1354187
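    As a quick consistency check on the Barcode splitter summary, the per-sample counts plus the unmatched reads should add up to the reported total. The Python sketch below uses only the numbers from the summary; the variable names are my own.

```python
# Read counts copied from the Barcode splitter summary above.
counts = {
    "AWCC1": 42351, "AWCC2": 45738, "AWCC3": 46639, "AWCC4": 52856,
    "AWCC5": 33779, "AWCC6": 25379, "AWCC7": 55761, "AWCC8": 64013,
    "CaribouHills1B": 37214, "CaribouHills2B": 50063,
    "CaribouHills3B": 62257, "CaribouHills4B": 60989,
}
unmatched = 777148
total = 1354187

matched = sum(counts.values())
fraction_matched = matched / total

print(matched)                     # 577039
print(matched + unmatched == total)  # True
print(f"{fraction_matched:.1%}")   # 42.6%
```

    So the summary is internally consistent, and a little under half of the reads were assigned to a sample barcode.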

    Need to trim barcodes off of these. Should I trim the primer region also? I did some looking and reading and found that yes, I should trim this off, so I will do so. This should be trimming 8 bp for the barcode and 18 bp for the primer, so 26 bp. I did so using Trimmomatic on Galaxy (collection 203).
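    The trimming arithmetic (8 bp barcode + 18 bp primer = 26 bp from the start of each read) can be illustrated with a minimal Python sketch. This is not what Trimmomatic does internally; it is just a plain head crop on an in-memory list of FASTQ lines, and the example read is invented.

```python
BARCODE_LEN = 8   # barcode length stated in the journal entry
PRIMER_LEN = 18   # primer length stated in the journal entry


def head_crop(fastq_lines, n=BARCODE_LEN + PRIMER_LEN):
    """Remove the first n bases and quality characters from each 4-line record."""
    trimmed = []
    for i in range(0, len(fastq_lines) - 3, 4):
        header, seq, plus, qual = fastq_lines[i:i + 4]
        # Sequence and quality strings must be cropped together to stay aligned.
        trimmed.extend([header, seq[n:], plus, qual[n:]])
    return trimmed


# Invented 30-base read: 8 "barcode" bases, 18 "primer" bases, 4 remaining bases.
record = ["@read1", "A" * 8 + "C" * 18 + "GGTT", "+", "I" * 30]
result = head_crop(record)
print(result[1])  # GGTT
```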

    Todd invited me to go with him to investigate a congregation of eagles in Slikok Creek near the Sterling Highway and Arc Loop. I could not turn this down. Colin came also.

    We looked in Slikok Creek off of Arc Loop Road and off of the Sterling Highway where eagles were congregating. At Arc Loop we saw a dipper working in the culvert (ebird checklist: S52946889). We did see part of an old moose carcass, but most of the eagle activity was not centered around this. We had wondered if there was a run of fish or something moving in the stream, but we saw none. A few scoops with a net along vegetation yielded a sculpin and a nine-spined stickleback (iNaturalist observation: 20498842), but that was all. It appeared that this was just a big bird bath for eagles. There were many eagles drying their wings and we did see one bird just standing in the water.

    https://ebird.org/view/checklist/S52946889
    https://www.inaturalist.org/observations/20498842

    Bald eagles at Slikok Creek near the Sterling Highway (iNaturalist observation: 20498942).

    John had asked me late in the day on Friday to write a Refuge Notebook article on worms, due tomorrow, so I need to get this done. I did get started on it.

    Wednesday, February 20

    To do:

    - Set up phone voice mail.
    - Finish Refuge Notebook article on earthworms.
    - Continue AWCC analysis.

    Regarding the AWCC fungi, do I need to trim anything off of the tails? I checked e-mail correspondence with Dr. Dowd at MrDNA lab. We had planned on using the primers below.

    illITS3kyo2    GATGAAGAACGYAGYRAA
    illITS4kyo3    CTBTTVCCKCTTCACTCG

    https://www.inaturalist.org/observations/20498942

    illITS3kyo2 is the forward primer. The first two reads in the reverse FASTA file end in CGAGTGAAGCGGCAACAG. This looks like the reverse complement of illITS4kyo3, but I do not know how the degenerate (?) nucleotides K, V, and Y work. Anyway, I should trim this tail off. I used FASTQ Trimmer on Galaxy to trim the last 18 bp from the original R2 file (dataset 218). 1354187 fastq reads were processed.
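    On the question of the degenerate nucleotides: under the standard IUPAC codes each letter stands for a set of bases (K = G or T, V = A, C, or G, B = C, G, or T, Y = C or T, R = A or G, M = A or C), and the complement of a degenerate code is the code for the complemented set (K with M, B with V, R with Y). The Python sketch below, with function names of my own choosing, takes the reverse complement of illITS4kyo3 and checks the observed read tail against it.

```python
# Standard IUPAC nucleotide codes and the bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
# Complement of each code (the code for the complemented set of bases).
COMPLEMENT = {
    "A": "T", "C": "G", "G": "C", "T": "A",
    "R": "Y", "Y": "R", "S": "S", "W": "W", "K": "M", "M": "K",
    "B": "V", "V": "B", "D": "H", "H": "D", "N": "N",
}


def reverse_complement(seq):
    """Reverse the sequence and complement every (possibly degenerate) base."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def matches(pattern, read):
    """True if every read base is allowed by the IUPAC code at that position."""
    return len(pattern) == len(read) and all(
        base in IUPAC[code] for code, base in zip(pattern, read)
    )


primer = "CTBTTVCCKCTTCACTCG"       # illITS4kyo3
read_tail = "CGAGTGAAGCGGCAACAG"    # tail observed in the reverse reads
rc = reverse_complement(primer)
print(rc)                      # CGAGTGAAGMGGBAAVAG
print(matches(rc, read_tail))  # True
```

    The observed tail is therefore one of the concrete sequences covered by the reverse complement of illITS4kyo3, which supports trimming those 18 bp.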

    I continued writing the Refuge Notebook article, finishing a draft in the afternoon and getting it to John.

    I generated R2 fastq files using an R script. I uploaded these to Yeti. My first SLURM script failed. vsearch gave the error

    Fatal error: Invalid line 3 in FASTQ file: '+' line must be empty or identical to header

    I thought that "+" was a normal line 3 for FASTQ format.

    Thursday, February 21

    To do:

    - Set up phone voice mail.
    - Continue AWCC analysis.
    - Format tomorrow's Refuge Notebook article.

    Troubleshooting yesterday's AWCC work. I compared the R1 and R2 FASTQ files. I found two problems with the R2 files generated by R. First, the R2 file had the Windows carriage returns (CRLF) instead of UNIX (LF). Second, all of the quality scores were changed to ";" so that all quality scores were lost.
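    Checks like these can be automated before uploading files to the cluster. The following is a rough Python sketch with a hypothetical function name; it looks for CRLF line endings, for "+" lines that are neither empty nor identical to the header (the rule stated in the vsearch error message quoted earlier), and for quality strings that consist of a single repeated character.

```python
def fastq_problems(raw):
    """Return a list of problems found in the raw bytes of a FASTQ file."""
    problems = []
    if b"\r\n" in raw:
        problems.append("CRLF line endings")
    lines = raw.decode("ascii").splitlines()
    for i in range(0, len(lines) - 3, 4):
        header, seq, plus, qual = lines[i:i + 4]
        # Text after '+' must be empty or repeat the header (without '@').
        if not plus.startswith("+") or plus[1:] not in ("", header[1:]):
            problems.append(f"bad '+' line in record starting at line {i + 1}")
        if len(seq) != len(qual):
            problems.append(f"sequence/quality length mismatch at line {i + 1}")
    # A file where every quality character is the same has probably lost its scores.
    quality_chars = set("".join(lines[3::4]))
    if len(quality_chars) == 1:
        problems.append("all quality scores identical")
    return problems


good = b"@r1\nACGT\n+\nIIHG\n"
bad = b"@r1\r\nACGT\r\n+\r\n;;;;\r\n"
print(fastq_problems(good))  # []
print(fastq_problems(bad))   # ['CRLF line endings', 'all quality scores identical']
```

    The uniform-quality test is only a heuristic, since a very small file could legitimately have identical scores.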

    I installed the ShortRead package and used an R script to make the R2 FASTQ files. I transferred this to Yeti and then converted the newline characters to UNIX.

    dos2unix AWCC1_R2.fastq
    ...

    I re-ran that SLURM script from yesterday. For some reason the pispino_createreadpairslist worked but the pispino_createreadpairslist step did not. It worked over the command line, though, yielding 339K reads.

    I tried running the next steps of this analysis a couple of ways.

    I entered data for Nicoletiidae specimens KNWR:Ento:11302-KNWR:Ento:11304 and looked up a little literature on this group. I examined specimen KNWR:Ento:11302 (2 males and one female). They look like Grassiella as illustrated by Escherich (1905) and not Allograssiella as described by Mendes and Schmid (2010).

    http://arctos.database.museum/guid/KNWR:Ento:11302
    http://arctos.database.museum/guid/KNWR:Ento:11304

    Friday, February 22

    To do:

    - Fill out University of Alaska Press author questionnaire for Drivers of Landscape Change in the Northwest Boreal Region book.
    - Format and post today's Refuge Notebook article.
    - Set up phone voice mail.
    - Continue AWCC analysis.
    - Format skunk moth article.
    - Finish Stormy Lake fungi article.

    That last SLURM script from yesterday was successful.

    I tried to rerun the second script, but it failed very quickly.

    I found the problem. I had cut up that original prepped.fasta file wrongly, making the second file start with a read and not the label. I fixed this problem and re-ran this. See the R script, shell script for splitting the original fasta, example SLURM script, and shell script to run all of the SLURM scripts.
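    One way to avoid this kind of mistake is to split on record boundaries rather than on line counts, so that every chunk begins with a ">" label. The Python sketch below is an illustration under that assumption (the actual fix used an R script and a shell script); the function name and example sequences are hypothetical.

```python
def split_fasta(lines, n_chunks):
    """Split FASTA lines into up to n_chunks lists, never separating a label from its read."""
    records = []
    for line in lines:
        if line.startswith(">"):
            records.append([line])      # a label starts a new record
        elif records:
            records[-1].append(line)    # sequence lines belong to the current record
    # Ceiling division so that all records are assigned to some chunk.
    size = max(1, -(-len(records) // n_chunks))
    chunks = [records[i:i + size] for i in range(0, len(records), size)]
    return [[line for record in chunk for line in record] for chunk in chunks]


# Invented example with three records.
fasta = [">read1", "ACGT", ">read2", "GGCC", "TTAA", ">read3", "CATG"]
chunks = split_fasta(fasta, 2)
for chunk in chunks:
    print(chunk[0])  # each chunk starts with a label line
```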

    I formatted and posted this week's Refuge Notebook article.

    I edited and formatted the Polix article for the AKES Newsletter.

    All of those SLURM jobs except the first one appeared to complete successfully. I think with that first one I had neglected to remove the out_funits_001 directory or something like that. I restarted this.

    Monday, February 25

    To do:

    - Continue AWCC analysis.
    - Quick fix in skunk moth article.
    - Finish Stormy Lake fungi article.
    - Post AKES meeting presentations.

    That last SLURM pipits_funits job appears to have run correctly.

    I worked on coming up with a Refuge boundary map for the purposes of checklisting, removing all of the conveyed lands.

    Simple Kenai National Wildlife Refuge boundaries map extracted for the purpose of checklisting.

    I converted this to WKT format using https://mygeodata.cloud with the intent of supplying this over the URL to GBIF for Refuge-specific searches, but this did not work. I think that the URL was much too long. For the purpose of checklisting I will need to pull data off of GBIF just using the extent, then clip out the records from the Refuge.
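    The two-step plan (query by bounding extent, then clip to the boundary) can be sketched in plain Python. Everything here is an illustrative assumption: the polygon is a made-up L-shape rather than the Refuge boundary, the records are invented points, and a real workflow would more likely use a GIS library than this hand-written ray-casting test.

```python
def bounding_box(polygon):
    """Return (min_x, min_y, max_x, max_y) of a list of (x, y) vertices."""
    xs = [x for x, _ in polygon]
    ys = [y for _, y in polygon]
    return min(xs), min(ys), max(xs), max(ys)


def point_in_polygon(x, y, polygon):
    """Ray-casting test: count how many polygon edges a horizontal ray crosses."""
    inside = False
    j = len(polygon) - 1
    for i in range(len(polygon)):
        xi, yi = polygon[i]
        xj, yj = polygon[j]
        if (yi > y) != (yj > y) and x < (xj - xi) * (y - yi) / (yj - yi) + xi:
            inside = not inside
        j = i
    return inside


def clip(records, polygon):
    """Keep records inside the polygon, using the extent as a cheap first filter."""
    min_x, min_y, max_x, max_y = bounding_box(polygon)
    in_extent = [(x, y) for x, y in records
                 if min_x <= x <= max_x and min_y <= y <= max_y]
    return [(x, y) for x, y in in_extent if point_in_polygon(x, y, polygon)]


# Hypothetical L-shaped boundary and three invented occurrence points.
boundary = [(0, 0), (2, 0), (2, 1), (1, 1), (1, 2), (0, 2)]
records = [(0.5, 0.5), (1.5, 1.5), (3.0, 3.0)]
print(clip(records, boundary))  # [(0.5, 0.5)]
```

    The point (1.5, 1.5) shows why the second step is needed: it falls within the extent but outside the boundary.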

    Wow, the script I had started on Thursday running the whole thing through pipits_funits worked! See output. I started another SLURM script to run the pipits_process step.

    I left work to take care of animals. At home I found that that last pipits_process step had worked (see output). I then ran the vsearch step required by LULU via a SLURM script.

    In the evening I ran LULU, used FUNGuild, and looked at the results. See the R script and other stuff.

    A comparison of soil fungal communities categorized by guilds at the Alaska Wildlife Conservation Center inside the bison pens, outside the pens, and in the Caribou Hills.

    Hours today: 10:15-12:30, 16:30-17:45, 20:00-22:00, Σ = 5:30 hrs.

    Tuesday, February 26

    To do:

    - Continue AWCC analysis.
    - Quick fix in skunk moth article.
    - Revise Refuge Notebook article.
    - Finish Stormy Lake fungi article.
    - Post AKES meeting presentations.
    - Backups.

    John asked for a comparison of diversity among the AWCC/Caribou Hills soil fungi. I did so (see R script).

    Numbers of soil fungi OTUs detected inside the AWCC pens, outside the AWCC pens, and in the Caribou Hills.

    I made the small change requested for the AKES Newsletter skunk moth article.

    I worked on the introduction of my Lumbricus soil fungi article for the AKES Newsletter.

    I received an e-mail from Kyungsoo giving densities of up to over 30 g/m2 of ash-free dry biomass of earthworms at Stormy Lake for the site closest to the boat launch. I wanted to convert these biomass numbers to something convenient for the typical newspaper reader. Below is my back-of-the-envelope conversion.

    Ash-free dry biomass of earthworms is roughly 15-22% of their live biomass (see https://www.researchgate.net/post/Earthworm_biomass_relation_between_fresh_mass_and_dry_mass), with 15-18% apparently accepted. I went with 17%. Using this to convert the middle of the range of values from closest to the boat launch, an AFD of 25 g/m2:

    ## conversion factor from ash free dry biomass to fresh biomass is roughly
    cf

  • 0.73.

    ## Average biomass was 0.36 g/m2

    cf
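    The first step of the conversion described above (ash-free dry biomass to fresh biomass) amounts to dividing by the dry fraction. The Python sketch below uses only the figures stated in the text (17%, with 15-22% as the quoted range, and 25 g/m2 AFD); the variable names are mine, and it does not attempt to reproduce the later steps of the original R calculation.

```python
AFD_FRACTION = 0.17   # ash-free dry biomass as a fraction of live biomass (value chosen in the text)
afd_g_per_m2 = 25.0   # middle-of-the-range AFD value near the boat launch


def fresh_biomass(afd, fraction=AFD_FRACTION):
    """Convert ash-free dry biomass to fresh (live) biomass, same area units."""
    return afd / fraction


fresh = fresh_biomass(afd_g_per_m2)
low = fresh_biomass(afd_g_per_m2, 0.22)    # lower bound, using the 22% end of the range
high = fresh_biomass(afd_g_per_m2, 0.15)   # upper bound, using the 15% end of the range
print(f"{fresh:.0f} g/m2 fresh (range {low:.0f}-{high:.0f} g/m2)")
# 147 g/m2 fresh (range 114-167 g/m2)
```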

    - Get new lifescanner kits uploaded into Arctos and register kits.
    - Continue AWCC analysis.
    - Finish Refuge Notebook article.
    - Finish Stormy Lake fungi article.
    - Post AKES meeting presentations.

    I uploaded life scanner vial barcodes to Arctos so that they are ready to use (R script). I tried registering lifescanner kit A3OP00 through the http://lifescanner.net/ interface. This looks much better than the iPhone app in that coordinates, etc. can be manually entered. I will not register all of the kits to my account now so that these can be used by others.

    I looked through the vials of Pedetontus s. str. from Rod Crawford, looking for newer vials for sequencing. Seven vials of these specimens had been collected in 2018.

    The abovementioned specimens are now KNWR:Ento:11305 and KNWR:Ento:11306.

    http://arctos.database.museum/guid/KNWR:Ento:11305
    http://arctos.database.museum/guid/KNWR:Ento:11306

    Pedetontus specimen KNWR:Ento:11306, habitus.

    At home this afternoon I dug in my horse manure pile in search of Eisenia fetida, but all that I found looked like Eisenia andrei, lacking E. fetida's more conspicuous pale bands as figured by Domínguez (2018).

    Eisenia andrei in large horse manure pile, Old Kasilof Road.

    I looked at some literature on herbivory by Lumbricus terrestris and resumed work on my AKES Newsletter article on soil fungi at Stormy Lake.

    I continued with exploratory work on the Stormy Lake fungal data using the vegan package (see R script).

    Plot of CCA of Stormy Lake fungal data with Lumbricus presence as a factor.

    Hours: 07:00-07:45, 09:15-13:15, 14:15-14:45, 20:45-22:00, 22:45-00:45, Σ = 8.5 hrs.
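    Daily totals like this one can be checked mechanically. The Python sketch below (function names are my own) sums HH:MM-HH:MM intervals and treats an end time earlier than the start time as running past midnight, as in the 22:45-00:45 interval above.

```python
def interval_minutes(span):
    """Length in minutes of an 'HH:MM-HH:MM' span, allowing it to cross midnight."""
    start, end = span.split("-")

    def to_minutes(clock):
        hours, minutes = clock.split(":")
        return int(hours) * 60 + int(minutes)

    length = to_minutes(end) - to_minutes(start)
    return length if length >= 0 else length + 24 * 60


def total_hours(spans):
    """Total of all spans in decimal hours."""
    return sum(interval_minutes(span) for span in spans) / 60


spans = ["07:00-07:45", "09:15-13:15", "14:15-14:45", "20:45-22:00", "22:45-00:45"]
print(total_hours(spans))  # 8.5
```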

    Thursday, February 28

    To do:

    - Respond to Todd about blackfish permit.
    - Format Refuge Notebook article.
    - Finish Stormy Lake fungi article.
    - Post AKES meeting presentations.

    I filled out a project description for applying for an ADF&G permit to collect blackfish this summer.

    I worked on community analysis of Stormy Lake fungal data.I did some reading up on analysis types in McCune and Grace(2002). p. 102: It looks like either direct gradient analysis,where we know what the explanatory variables of interestare, or indirect gradient analysis, where we don't know aheadof time, would be appropriate. In my case I do know what myvariable of interest is. CCA and canonical correlation areindirect gradient analysis methods. p. 109: Vector fittingwould be ok for what I am trying to do. p. 115: PCA would notbe appropriate for my data, which are not linear or normal.At least I do not want to worry about these assumptions. p.125: NMS would be appropriate, apparently the best. p. 154:Should not use CA. p. 160: Should not use DCA. p. 164: CCAwould be ok with cautions.

    I still am unsure whether vector fitting or constrained methods would be best. In the end I did both. I found that I had insufficient data (not enough sites?) to use NMS, but CCA seemed to work. See R script and a second R script, in the end making plots that were colorful if nothing else.

    Biplot of OTUs (circles) and sampling sites (labeled boxes) from a correspondence analysis where presence of Lumbricus terrestris was included as an environmental variable. Colors of OTU circles correspond to category colors from the pie charts I made on February 1. Red and blue lines indicate groupings of sites by earthworm presence.

    Biplot of OTUs and sampling sites from a constrained correspondence analysis where presence of Lumbricus terrestris was included as a constraint. See explanation in the caption of the figure above.

    I also made a new set of pie charts for the article. See R script.

    I worked on the Stormy Lake fungi article, incorporating these figures.

    Hours: 06:30-07:30, 09:45-12:45, 16:00-18:00, 20:00-22:00, Σ = 8 hr.

    March

    Friday, March 1

    To do:

    Format Refuge Notebook article.
    Finish Stormy Lake fungi article.
    Post AKES meeting presentations.

    I formatted and posted today's Refuge Notebook article.

    I worked on the Stormy Lake earthworm soil fungi article.

    I just learned that there is a newer UNITE release (8.0). I should have used that. Oh, well, I am not starting over on the Stormy Lake fungal analysis at this point.

    I worked on graphs (R script) and on examining some of the mycorrhizal OTUs (R script and notes).

    Hours: 06:45-07:30, 10:00-12:00, 20:30-22:30

    Sunday, March 3

    I worked on trying to finish up the Stormy Lake earthworm soil fungi article.

    Monday, March 4

    To do:

    https://www.fws.gov/uploadedFiles/Region_7/NWRS/Zone_2/Kenai/Sections/What_We_Do/In_The_Community/Refuge_Notebooks/2019_Articles/Refuge_Notebook_v21_n05.pdf

    Finish Stormy Lake fungi article. See about uploading data to GenBank SRA, Zenodo, and/or PlutoF.
    Post AKES meeting presentations.

    I made BioProject, BioSample, and read submissions to NCBI GenBank SRA for the Stormy Lake soil fungi data. → BioProject PRJNA525443

    I started learning how to work with projects and samples in PlutoF, getting a project set up (https://plutof.ut.ee/#/study/view/74051), but I did not get to the point of uploading sequences.

    I am trying to figure out the best way to get HTS occurrence data to GBIF. It appears that NCBI GenBank reads/occurrences do not get harvested by or linked to GBIF in a regular way. I know that PlutoF/UNITE reads can be harvested by GBIF. I did some tinkering and testing and got one test sequence uploaded via the importer. It will take a little bit of work to get the names formatted correctly.

    Tuesday, March 5

    Those GenBank SRA submissions were published and are now available at https://www.ncbi.nlm.nih.gov/sra/PRJNA525443.

    I worked on reformatting OTU sequence data to upload to PlutoF. This ended up being a more difficult task than I had expected (see R script).

    http://www.ncbi.nlm.nih.gov/bioproject/525443
    https://plutof.ut.ee/
    https://plutof.ut.ee/#/study/view/74051
    https://www.ncbi.nlm.nih.gov/sra/PRJNA525443

    I worked on the Stormy Lake fungi article, making a small change to the map (R script). I sent off a draft, requesting reviews.

    Wednesday, March 6

    To do:

    Check PlutoF uploads.
    Post AKES meeting presentations.
    Start blackfish diet article.
    See about ordering fungal sequencing kits.
    Take A100 and A312/325R classes.

    Those PlutoF imports are still hung up, waiting to be processed.

    I posted presentations from the AKES annual meeting.

    I entered data for Dendrobaena octaedra specimen KNWR:Inv:38 and added this to the collection.

    I started work on an Alaska blackfish diet article for the AKES Newsletter.

    After some tweaking I got the first 60 records imported into PlutoF. See R script.

    Thursday, March 7

    I continued importing Stormy Lake sequence data to PlutoF, getting this done! (See R script)

    http://www.akentsoc.org/archives/1474
    http://arctos.database.museum/guid/KNWR:Inv:38

    Colin and I drove out to Kenai to look for Alaska blackfish in the vicinity of the pond off of Candlelight Dr. where I had seen them earlier and in the stream behind Walmart. We did find open water in a few places, mainly at seeps, but we saw no blackfish.

    I worked on entering data from blackfish gut contents specimens into Arctos in preparation for a short article on blackfish diet.

    Friday, March 8

    I started out responding to an e-mail inquiry about earthworms, finding out about the new Yukon record of Arctiostrotus fontinalis, and looking up and requesting pertinent literature.

    To do:

    Post today's Refuge Notebook article.
    Finish blackfish diet article.
    See about ordering fungal sequencing kits.
    Take A100 and A312/325R classes.

    I formatted and posted this week's Refuge Notebook article.

    I commented on a record of cutthroat trout from Woodpecker Lake on the Refuge (UAM:Fish:328). This seems to me to be a questionable record.

    Another interesting record was for Coregonus laurettae, Bering cisco, at Gene Lake (UAM:Fish:2444). We do not have this species on our checklist.

    http://arctos.database.museum/guid/UAM:Fish:328
    http://arctos.database.museum/guid/UAM:Fish:2444

    Sunday, March 10

    I worked on formatting a table of blackfish prey items.

    Monday, March 11

    To do:

    Finish blackfish diet article.
    See about ordering fungal sequencing kits.
    Take A100 and A312/325R classes.

    I worked on my blackfish diet article.

    Tuesday, March 12

    To do:

    Credit card stuff.
    Finish blackfish diet article.
    See about ordering fungal sequencing kits.
    Take A100 and A312/325R classes.
    Start on STDP article.

    I finished a draft of the blackfish diet article.

    I started an article on the 2017 STDP metagenomic work in the journal Research Ideas and Outcomes. I uploaded all 64 raw FASTQ files to Yeti. I intend to begin an analysis using QIIME 2™. I wrote a manifest file using an R script.
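    The manifest step can also be sketched in shell. This is only an illustration, not the R script referred to above. It assumes paired files named <sample>_R1.fastq.gz and <sample>_R2.fastq.gz (a hypothetical naming pattern) and writes the tab-separated paired-end layout with the columns sample-id, forward-absolute-filepath, and reverse-absolute-filepath that recent QIIME 2 releases accept. The function name make_manifest is made up for this sketch.

```shell
# Sketch: write a paired-end manifest for a directory of FASTQ files.
# ASSUMPTION: files are named <sample>_R1.fastq.gz / <sample>_R2.fastq.gz.
make_manifest() {
  local dir fwd rev sample
  dir=$(cd "$1" && pwd) || return 1     # manifest paths must be absolute
  printf 'sample-id\tforward-absolute-filepath\treverse-absolute-filepath\n'
  for fwd in "$dir"/*_R1.fastq.gz; do
    [ -e "$fwd" ] || continue           # skip the literal pattern if nothing matched
    rev="${fwd%_R1.fastq.gz}_R2.fastq.gz"
    if [ ! -e "$rev" ]; then
      echo "missing reverse reads for $fwd" >&2
      continue
    fi
    sample=$(basename "$fwd" _R1.fastq.gz)
    printf '%s\t%s\t%s\n' "$sample" "$fwd" "$rev"
  done
}
```

    Example use: make_manifest /path/to/fastq > manifest.tsv. Samples lacking a reverse file are reported on stderr and left out rather than written as incomplete rows.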

    https://riojournal.com/

    Wednesday, March 13

    To do:

    Time.
    Safety committee meeting
    Credit card stuff.
    Revise blackfish diet article.
    See about ordering fungal sequencing kits.
    Take A100 and A312/325R classes.
    STDP analysis.
    Review carabid table from Bergdahl.

    I attended the safety meeting at 09:00. Action points:

    MOCC refresher - I should take this this year.
    New PPE policy - Record keeping required PPE training. Refuge is responsible for providing PPE.
    JHAs - Need to check bio JHAs. These should be done before April 5. Definitely need JHA for spraying.
    Aviation. Will need to have all helmets inspected. Will likely need to destroy some helmets.
    Need to update lab safety plan.

    I revised my blackfish article. I wrote a short note on two new mayfly records from Alaska.

    Thursday, March 14

    To do:

    Credit card stuff.
    See about ordering fungal sequencing kits.
    Take A100 and A312/325R classes.
    STDP analysis.
    Review carabid table.

    I imported the STDP data into QIIME.

    I did some backing up of data because our server is scheduled to go down soon. Back-up list:

    Slikok project
    Elodea work
    LTEMP
    AWCC/Caribou Hills stuff
    Melvin thesis datasets

    I started an analysis in QIIME. See notes and I/O.

    QIIME summary of read counts per sample for STDP dataset.

    Read counts per sample:

    Sample name       Sequence count
    EAFB07JUN17-E             175187
    EAFB30JUN17-EA             27823
    EAFB30JUN17-IT             27604
    JNUF06JUL17-EA             26951
    JNUF26MAY17-EA             26040
    JBER06JUN17-E              24615
    JBER20JUN17-R              24546
    JBER11JUL17-IT             22823
    EAFB22MAY17-E              22578
    JBER06JUN17-IT             21891
    JNUF02JUN17-EA             21553
    JBER20JUN17-EA             21357
    JBER06JUN17-EA             21281
    JNUF20JUN17-R              20948
    EAFB30JUN17-E              20928
    JNUF06JUL17-E              20686
    JNUF11AUG17-E              19748
    JNUF26MAY17-E              19287
    JBER20JUN17-E              18981
    EAFB22MAY17-EA             18886
    JNUF20JUN17-E              18876
    JNUF02JUN17-E              18375
    EAFB07JUN17-EA             18233
    JBER24MAY17-E              17063
    JBER24MAY17-IT             16134
    EAFB07JUN17-IT             15180
    JBER24MAY17-EA             14808
    JBER10MAY17-R2             13334
    JBER10MAY17-R1             13144
    JNUF20JUL17-EA              9834

    Why does one sample, EAFB07JUN17-E, have over 170,000 reads while most are closer to 20,000?
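    One way to put a number on that question is to compare each sample with the median count. The sketch below is an illustration only: it assumes a two-column text file (sample name, sequence count), and the cutoff of three times the median is an arbitrary choice, not a QIIME convention. The function name flag_deep_samples is hypothetical.

```shell
# Sketch: report the median sequence count and any samples far above it.
# Input: whitespace-separated columns, sample name then count, no header.
# ASSUMPTION: "far above" means more than 3x the median (arbitrary cutoff).
flag_deep_samples() {
  sort -k2,2n "$1" | awk '
    { name[NR] = $1; count[NR] = $2 }
    END {
      if (NR == 0) exit 1
      if (NR % 2)
        median = count[(NR + 1) / 2]
      else
        median = (count[NR / 2] + count[NR / 2 + 1]) / 2
      printf "median\t%.1f\n", median
      for (i = 1; i <= NR; i++)
        if (count[i] > 3 * median)
          printf "%s\t%d\n", name[i], count[i]
    }'
}
```

    Example use: flag_deep_samples counts.txt, where counts.txt holds the table rows without the header line. The first output line is the median and any following lines are the flagged samples.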

    Forward read quality.

    Reverse read quality.

    I think that now I need to build a guild library, but I am not sure of the very best way to make this in a format that QIIME will like.

    Friday, March 15

    To do:

    See about ordering fungal sequencing kits.
    Take A100 and A312/325R classes.
    STDP analysis.
    Review carabid table.
    Assemble AKES Newsletter draft.

    I looked at methods of reference library creation of Nilsson et al. (2018), Richardson et al. (2018), and Pruesse et al. (2007). The work of Richardson et al. (2018) is most similar to what I need to do, but I think it would not be easy for me to set up Metaxa2. I need to figure out how to make a library that QIIME 2 can use for now, not necessarily a library as well-curated as these major databases.

    I started downloading records from BOLD, which was taking a long time.

    While downloading I worked on assembling all of the submitted articles for the AKES Newsletter into an issue.

    I got started on building a library. See notes, scripts, and I/O.

    Monday, March 18

    To do:

    Send out AKES Newsletter draft for review.
    STDP analysis.
    Review carabid table.
    See about ordering fungal sequencing kits.
    Take A100 and A312/325R classes.

    I did some editing of the AKES Newsletter draft and sent it out to the editorial committee, authors, and others for review.

    I saw that BOLD's BIN algorithm was run this weekend.

    I resumed work on construction of an Alaska DNA barcode library for HTS. See notes and I/O.

    I posted two Biology News entries requested by John.

    Wednesday, March 20

    To do:

    Revise Lumbricus-fungi article based on comments received.
    Scan some of Dominique's artwork.
    STDP analysis.
    Review carabid table.
    See about ordering fungal sequencing kits.
    Take A100 and A312/325R classes.
    Format this week's Refuge Notebook article.

    I found some problems with the library work I had done the other day. I resumed, dealing with these problems. See notes and I/O.

    I scanned a couple of Dominique's illustrations to fulfill a request for someone writing an article about Dominique's work.

    Watercolor illustration of the life history of a Mesopolobus by Dominique Collet.

    I formatted this week's Refuge Notebook article.

    Thursday, March 21

    To do:

    STDP analysis.
    Finish up AKES Newsletter.
    Review carabid table.
    See about ordering fungal sequencing kits.
    Take A100 and A312/325R classes.

    I worked on the STDP analysis, finishing the library work and for the first time generating an OTU table in QIIME 2. See scripts, I/O.

    Friday, March 22

    To do:

    STDP analysis.
    Finish up AKES Newsletter.
    Post Refuge Notebook article.

    I posted today's Refuge Notebook article.

    I worked more on the library, re-running the dereplication and clustering steps. I started selecting best library records, also. See I/O, R script, and SLURM script.

    References

    Csuzdi, Csaba and Katalin Szlávecz. 2003. “Lumbricus Friendi Cognetti, 1904 a New Exotic Earthworm in North America.” Northeastern Naturalist 10 (1): 77–83. https://bioone.org/journals/Northeastern-Naturalist/volume-10/issue-1/1092-6194(2003)010[0077:LFCANE]2.0.CO;2/span-classgenus-speciesLUMBRICUS-FRIENDI-spana-classinternal-link-hrefi1092-6194-10/10.1656/1092-6194(2003)010[0077:LFCANE]2.0.CO;2.full

    Domínguez, J. 2018. “Earthworms and Vermicomposting.” Ch. 5. In Ray, S. (Ed.). Earthworms, Rijeka: IntechOpen. https://doi.org/10.5772/intechopen.76088.

    Escherich, Karl. 1905. “Das System der Lepismatiden.” Zoologica 43 (18). https://doi.org/10.5962/bhl.title.7909.

    Gates, Gordon Enoch, and John Warren Reynolds. 2017. “Preliminary key to North American Megadriles (Annelida, Oligochaeta), based on external characters, insofar as possible.” Megadrilogica 22 (10).

    Gweon, Hyun S., Anna Oliver, Joanne Taylor, Tim Booth, Melanie Gibbs, Daniel S. Read, Robert I. Griffiths, and Karsten Schonrogge. 2015. “PIPITS: An automated pipeline for analyses of fungal internal transcribed spacer sequences from the illumina sequencing platform.” Methods in Ecology and Evolution 6 (8): 973–80. https://doi.org/10.1111/2041-210X.12399.

    McCune, Bruce, and Grace, James B. 2002. Analysis of Ecological Communities. Gleneden Beach, Oregon: MjM Software Design.

    Mendes, Luis F., and Volker S. Schmid. 2010. “Description of Allograssiella floridana gen. nov., spec. nov. from the southern United States living with Pseudomyrmex ants.” Spixiana 33: 49–54.

    Nilsson, R. H.; Glöckner, F. O.; Saar, I.; Tedersoo, L.; Kõljalg, U.; Abarenkov, K.; Larsson, K.-H.; Taylor, A. F.; Bengtsson-Palme, J.; Schigel, D.; Jeppesen, T. S.; Kennedy, P. & Picard, K. 2018. The UNITE database for molecular identification of fungi: handling dark taxa and parallel taxonomic classifications. Nucleic Acids Research 47:D259-D264. https://doi.org/10.1093/nar/gky1022

    Nguyen, Nhu H., Zewei Song, Scott T. Bates, Sara Branco, Leho Tedersoo, Jon Menke, Jonathan S. Schilling, and Peter G. Kennedy. 2016. “FUNGuild: An open annotation tool for parsing fungal community datasets by ecological guild.” Fungal Ecology 20 (April): 241–48. https://doi.org/10.1016/j.funeco.2015.06.006.

    Pruesse, E.; Quast, C.; Knittel, K.; Fuchs, B. M.; Ludwig, W.; Peplies, J. & Glöckner, F. O. 2007. SILVA: a comprehensive online resource for quality checked and aligned ribosomal RNA sequence data compatible with ARB. Nucleic Acids Research 35:7188-7196. https://doi.org/10.1093/nar/gkm864

    Ratnasingham, Sujeevan, and Paul D. N. Hebert. 2013. “A DNA-based registry for all animal species: The Barcode Index Number (BIN) system.” PLOS ONE 8 (7): e66213. https://doi.org/10.1371/journal.pone.0066213.

    Richardson, R. T.; Bengtsson-Palme, J.; Gardiner, M. M. & Johnson, R. M. 2018. A reference cytochrome c oxidase subunit I database curated for hierarchical classification of arthropod metabarcoding data. PeerJ 6:e5126. https://doi.org/10.7717/peerj.5126

    Römbke, Jörg, Manuel Aira, Thierry Backeljau, Karin Breugelmans, Jorge Domínguez, Elisabeth Funke, Nadin Graf, et al. 2016. “DNA barcoding of earthworms (Eisenia fetida/andrei Complex) from 28 ecotoxicological test laboratories.” ISEE-10: The 10th International Symposium on Earthworm Ecology, 22-27 June 2014, Athens, Georgia, USA 104 (August): 3–11. https://doi.org/10.1016/j.apsoil.2015.02.010.

    Saltmarsh, Deanna Marie, Matthew L. Bowser, John M. Morton, Shirley Lang, Daniel Shain, and Roman Dial. 2016. “Distribution and abundance of exotic earthworms within a boreal forest system in Southcentral Alaska.” NeoBiota 28 (August): 67–86. https://doi.org/10.3897/neobiota.28.5503.

    Sherlock, E., and Field Studies Council (Great Britain). 2012. Key to the Earthworms of the UK and Ireland. Occasional Publication / Field Studies Council. FSC.

    Appendices

    2019-01-29-1116_work_on_yeti.txt

    ## Switching between yeti and R figuring out how to continue the AWCC analysis in the background on yeti
    ## while I take care of other things.

    cd /home/mattbowser/2018_AWCC_soil_fungi/out_seqprep
    head prepped.fasta
    wc -l prepped.fasta
    1019826 prepped.fasta ## That is a little over 1e6 lines.

    ## That is 1019826/2 = 509,913 sequences.

    ## For my last analysis on ? there were 59,732 input sequences. From these there were 31,214 dereplicated sequences,
    ## 31214/59732 = 0.5225675 or 52%.

    ## This took ITSx
    2018-12-17 15:09:06 Extracting ITS2 from sequences [ITSx]
    2018-12-18 00:02:54 ... done
    start

    have about 509913 * 0.5225675 = 266464 dereplicated sequences,
    ## which should take about 266464/3508.505 = 75.94802 hours or 75.94802/24 = 3.164501 days.
    ## My previous job was canceled after six days (6-10:56:32), so it was taking longer than expected.

    ## Wanting to split this up.

    ## from https://stackoverflow.com/questions/6424856/r-function-for-returning-all-factors

    FUN

  • print(cmds)

    ## After editing...
    mv xaa xaa.fasta
    mv xab xab.fasta
    mv xac xac.fasta
    mv xad xad.fasta
    mv xae xae.fasta
    mv xaf xaf.fasta
    mv xag xag.fasta
    mv xah xah.fasta
    mv xai xai.fasta
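    The pieces xaa to xai renamed above only work as FASTA files if each one holds a whole number of two-line records. A minimal shell sketch of that idea follows. It assumes exactly two lines per record, as in prepped.fasta; the function name split_fasta and its rounding rule are illustrative choices, not the commands actually run here.

```shell
# Sketch: split a two-line-per-record FASTA into at most N pieces without
# separating a header line from its sequence line.
# ASSUMPTION: every record is exactly two lines (header, then sequence).
split_fasta() {
  local infile=$1 pieces=$2 prefix=$3
  local lines records per_piece
  lines=$(wc -l < "$infile")
  records=$(( lines / 2 ))
  # Round up so that no more than $pieces files are produced.
  per_piece=$(( (records + pieces - 1) / pieces ))
  # An even line count per piece keeps header/sequence pairs together.
  split -l $(( per_piece * 2 )) "$infile" "$prefix"
}
```

    Example use: split_fasta prepped.fasta 9 x. For the 1,019,826-line file and nine pieces this works out to 113,314 lines (56,657 records) per piece, since 509,913 = 9 × 56,657.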

    cmd

    srun --mpi=pmi2 pipits_funits -i out_seqprep/xag.fasta -o out_funits_xag -x ITS2
    srun --mpi=pmi2 pipits_funits -i out_seqprep/xah.fasta -o out_funits_xah -x ITS2
    srun --mpi=pmi2 pipits_funits -i out_seqprep/xai.fasta -o out_funits_xai -x ITS2

    cd /home/mattbowser/2018_AWCC_soil_fungi
    vi 2019-01-29-1202_AWCCfungi.slurm

    #!/bin/bash
    #SBATCH --job-name=AWCCITSx
    #SBATCH -n 9 # number of nodes
    #SBATCH -n 9 # number of tasks
    #SBATCH -p long # partition
    #SBATCH --account=bio # account code
    #SBATCH --time=4-01:00:00 # requested job time D-HH:MM:SS
    #SBATCH --mail-type=ALL # choose when you want to be emailed
    #SBATCH [email protected] # add your email address
    #SBATCH -o 2019-01-29-1202_AWCCITSx-%j.out # name of output file (the %j inserts the jobid)

    module load python/miniconda3-gcc6.1.0 # load required modules
    source activate pipits_env # load PIPITS environment
    cd /home/mattbowser/2018_AWCC_soil_fungi
    srun --mpi=pmi2 pipits_funits -i out_seqprep/xaa.fasta -o out_funits_xaa -x ITS2
    srun --mpi=pmi2 pipits_funits -i out_seqprep/xab.fasta -o out_funits_xab -x ITS2
    srun --mpi=pmi2 pipits_funits -i out_seqprep/xac.fasta -o out_funits_xac -x ITS2
    srun --mpi=pmi2 pipits_funits -i out_seqprep/xad.fasta -o out_funits_xad -x ITS2
    srun --mpi=pmi2 pipits_funits -i out_seqprep/xae.fasta -o out_funits_xae -x ITS2
    srun --mpi=pmi2 pipits_funits -i out_seqprep/xaf.fasta -o out_funits_xaf -x ITS2
    srun --mpi=pmi2 pipits_funits -i out_seqprep/xag.fasta -o out_funits_xag -x ITS2
    srun --mpi=pmi2 pipits_funits -i out_seqprep/xah.fasta -o out_funits_xah -x ITS2
    srun --mpi=pmi2 pipits_funits -i out_seqprep/xai.fasta -o out_funits_xai -x ITS2
    source deactivate # deactivate PIPITS
    module purge # unload those modules

    ## Running it.
    sbatch 2019-01-29-1202_AWCCfungi.slurm

    ## That failed right away. Judging from the out file, I think it again tried to run things in parallel. I might need to break this into separate slurm jobs.

    sf

  • fn, ".out

    module load python/miniconda3-gcc6.1.0 # load required modules
    source activate pipits_env # load PIPITS environment
    cd /home/mattbowser/2018_AWCC_soil_fungi
    srun --mpi=pmi2 pipits_funits -i out_seqprep/", fn, ".fasta -o out_funits_", fn, " -x ITS2
    source deactivate # deactivate PIPITS
    module purge # unload those modules", sep="" )

    wd

    sbatch xae.slurm
    sbatch xaf.slurm
    sbatch xag.slurm
    sbatch xah.slurm
    sbatch xai.slurm

    ## Wait. I want to change the job names.

    sf

    source deactivate # deactivate PIPITS
    module purge # unload those modules", sep="" )

    for (thisf in 1:9) { write(sf[thisf], file=sn[thisf]) }

    ## Ok, trying one now.
    sbatch xaa.slurm
    ## Got error. File has DOS line breaks instead of unix line breaks.

    for (thisf in 1:9) { of EOL conversion.
    sbatch xaa.slurm

    ## Got lots of errors. Maybe this cannot be split up like this.

    ## Going to just re-run that original job for now.
    sbatch 2018-12-20-1038_AWCCfungi.slurm
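    The DOS line break error noted above for xaa.slurm can also be caught on the cluster side before submitting. A minimal sketch, assuming only standard grep, tr, and mv; the helper names has_crlf and strip_crlf are hypothetical and this is not the conversion that was actually used:

```shell
# Sketch: detect and strip DOS carriage returns from a job script
# before handing it to sbatch.
has_crlf() {
  # Succeeds if the file contains at least one carriage return.
  grep -q "$(printf '\r')" "$1"
}

strip_crlf() {
  # Removes every carriage return, writing to a temporary copy first
  # so the original is only replaced if tr succeeds.
  tr -d '\r' < "$1" > "$1.unix" && mv "$1.unix" "$1"
}
```

    Example use: has_crlf xaa.slurm && strip_crlf xaa.slurm. Note that tr removes all carriage returns, not only those at line ends, which is fine for plain job scripts.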

    2019-01-30-1509_looking_at_results.R

    ## Looking at Stormy fungi data again, this time making sure to eliminate all non-fungi.
    wd

    ## Remaking histogram for the article.
    pdf(file="2019-01-30-1662_hist_freq.pdf", width=4, height=4 )
    par(mar=c(4,4,1,1))
    hist(ct4$freq, breaks=0:6/6, xlab="Number of detections", ylab="Number of OTUs", main="", xaxt="n", col="gray" )
    axis(side=1, at=1:6/6 - 0.5/6, labels=1:6)
    dev.off()

    ## Resorting to look at most frequently observed OTUs.
    ct4

                PC1        PC2        PC3        PC4        PC5           PC6
    site1 -4.914343  0.5844918  -3.839917 15.5903530 -1.3504194  2.517436e-14
    site2 -5.110602 -4.0457191 -10.797290 -7.3450695 -8.9553186  6.359063e-15
    site3 -9.776014 -7.2800890  14.443269 -2.1070481 -0.8510169 -3.169009e-14
    site4 -3.060605 -0.7910831  -5.807692 -3.7639991 13.8212994  3.054160e-14
    site5 20.789200 -6.1599321   1.973286  0.6347356 -0.6891978 -1.492242e-15
    site6  2.072364 17.6923314   4.028345 -3.0089719 -1.9753467 -2.819267e-14

    2019-01-31-1116_looking_at_results.R

    ## Today I am looking at differences between infested and non-infested sites.
    wd

  • ct4$wrmcor

    34    0   0   0
    21   20   0   0
    38   13   0   0
    30    0   0   0
    20    0  11   0
    5   137   0   0
    209   0   0   0
    79    0   0   0
        nreads otu_id king  phyl              clas               ord
    45     244 OTU661 Fungi Ascomycota        Dothideomycetes    Venturiales
    56     171 OTU14  Fungi Mortierellomycota Mortierellomycetes Mortierellales
    34     354 OTU611 Fungi Ascomycota        Leotiomycetes      Helotiales
    21     557 OTU613 Fungi Ascomycota        Leotiomycetes      Helotiales
    38     303 OTU674 Fungi Ascomycota        Leotiomycetes      Helotiales
    30     399 OTU330 Fungi Ascomycota        Sordariomycetes    Hypocreales
    20     578 OTU542 Fungi Ascomycota        Leotiomycetes      Helotiales
    5     1195 OTU471 Fungi Ascomycota        Sordariomycetes    Sordariales
    209     34 OTU459 Fungi Ascomycota        Dothideomycetes    Pleosporales
    79     121 OTU365 Fungi Ascomycota        Sordariomycetes    Hypocreales
        fam              gen          spec                            sim
    45  Venturiaceae     Venturia                                     0.86
    56  unidentified     unidentified Mortierellales_sp_SH213394.07FU 1.00
    34                                                                1.00
    21                                                                0.96
    38  unidentified     unidentified                                 0.91
    30  Nectriaceae                                                   0.86
    20  unidentified     unidentified Helotiales_sp_SH013008.07FU     0.95
    5   Chaetomiaceae    Humicola     Humicola_sp_SH195345.07FU       0.98
    209 Pleomassariaceae Tumularia    Tumularia_sp_SH198695.07FU      1.00
    79  Nectriaceae                                                   0.97
        freq      wrmcor
    45  0.5000000 0.9529024
    56  0.5000000 0.9378498
    34  0.5000000 0.9157785
    21  0.6666667 0.8991145
    38  0.6666667 0.8943249
    30  0.5000000 0.8894675
    20  0.6666667 0.8059055
    5   0.6666667 0.7767712
    209 0.3333333 0.7069103
    79  0.3333333 0.7067543

    ## 10 records most negatively correlated with nightcrawler presence:
    ct4[1:10,]]
        X5348.2017MLB100.MSITS3 X5348.2017MLB101.MSITS3 X5348.2017MLB102.MSITS3
    130                       0                       0                       0
    7                         0                       0                       0
    106                       0                      13                       0
    46                       14                      19                       0
    194                       0                       0                       0
    35                        0                       0                       0
    82                        0                       0                       0
    233                       0                       0                       0
    189                       0                       0                       0
    201                       0                       0                       0
        X5348.2017MLB103.MSITS3 X5348.2017MLB105.MSITS3 X5348.2017MLB107.MSITS3
    130                      11                      37                      21
    7                        10                     779                     328
    106                      36                      24                      20
    46                       20                      87                     103
    194                       0                      24                      17
    35                        0                     116                     215
    82                        0                      42                      75
    233                       0                      17                      11
    189                      28                       0                      15
    201                      13                      25                       0
        nreads otu_id king  phyl              clas               ord
    130     69 OTU552 Fungi Ascomycota        Dothideomycetes    Pleosporales
    7     1117 OTU715 Fungi Ascomycota        Dothideomycetes    Venturiales
    106     93 OTU349 Fungi Ascomycota        Eurotiomycetes     Eurotiales
    46     243 OTU525 Fungi Ascomycota        Dothideomycetes    Capnodiales
    194     41 OTU194 Fungi Mucoromycota      Umbelopsidomycetes Umbelopsidales
    35     331 OTU810 Fungi Ascomycota        Leotiomycetes      Helotiales
    82     117 OTU10  Fungi Mortierellomycota Mortierellomycetes Mortierellales
    233     28 OTU721 Fungi Ascomycota        Leotiomycetes      Helotiales
    189     43 OTU418 Fungi Ascomycota        Sordariomycetes
    201     38 OTU527 Fungi Ascomycota        Leotiomycetes      Helotiales
        fam             gen          spec                                sim
    130 Melanommataceae                                                  0.85
    7   Venturiaceae                                                     0.99
    106                                                                  1.00
    46                                                                   0.96
    194 Umbelopsidaceae Umbelopsis                                       1.00
    35  Myxotrichaceae  Oidiodendron Oidiodendron_pilicola_SH216991.07FU 0.91
    82  unidentified    unidentified Mortierellales_sp_SH026734.07FU     1.00
    233 Sclerotiniaceae Mycopappus   Mycopappus_alni_SH177350.07FU       0.98
    189                                                                  0.89
    201                                                                  0.92
        freq      wrmcor
    130 0.5000000 -0.9772985
    7   0.5000000 -0.8852464
    106 0.6666667 -0.8028416
    46  0.8333333 -0.7058453
    194 0.3333333 -0.7055784
    35  0.3333333 -0.7051749
    82  0.3333333 -0.7044942
    233 0.3333333 -0.7041084
    189 0.3333333 -0.7021832
    201 0.3333333 -0.7013344

    ## It is interesting that members of Venturiaceae are both most negatively and most positively correlated with nightcrawler presence. These are plant pathogens.

    ## Mortierellales - positively associated with worm presence. These are mostly saprobes (see https://doi.org/10.3767/003158513X666268).

    ## Humicola spp. appear to be mainly decomposers (see https://www.sciencedirect.com/science/article/pii/S0166061618300319).

    ## Whoa, I found out how to look up UNITE SH (species hypothesis) entities much like BOLD BINs. That Humicola identified as Humicola_sp_SH195345.07FU can be looked up at https://unite.ut.ee/bl_forw_sh.php?sh_name=SH195345.07FU or http://dx.doi.org/10.15156/BIO/SH195345.07FU. There is a newer version of this SH: https://unite.ut.ee/bl_forw_sh.php?sh_name=SH1615609.08FU. This is identified as genus Chaetomium. Some of these are endophytes. Some are soil-dwelling, I think decomposers.

    ## Looking up that Mortierellales_sp_SH213394.07FU thing, https://unite.ut.ee/bl_forw_sh.php?sh_name=SH213394.07FU -> now SH1507815.08FU. Not much info on that one.

    ## Mycopappus alni is a leaf disease of alders, birches, and crabapples.

    ## Save image...
    save.image("2019-01-31-1626_workspace.RData")

    2019-02-01-1024_looking_at_results.R

    ## Today I am looking at differences between infested and non-infested sites.
    wd

    length(levels(as.factor(ct4$accid)))
    [1] 245

    accids

  • ct6

    Epiphyte-Plant Saprotroph-Wood Saprotroph"
     [5] "Animal Pathogen-Dung Saprotroph-Endophyte-Lichen Parasite-Plant Pathogen-Undefined Saprotroph"
     [6] "Animal Pathogen-Endophyte-Fungal Parasite-Plant Pathogen-Wood Saprotroph"
     [7] "Animal Pathogen-Fungal Parasite-Undefined Saprotroph"
     [8] "Animal Pathogen-Plant Pathogen-Undefined Saprotroph"
     [9] "Animal Pathogen-Soil Saprotroph"
    [10] "Animal Pathogen-Undefined Saprotroph"
    [11] "Bryophyte Parasite-Ectomycorrhizal-Ericoid Mycorrhizal-Undefined Saprotroph"
    [12] "Bryophyte Parasite-Litter Saprotroph-Wood Saprotroph"
    [13] "Dung Saprotroph-Ectomycorrhizal"
    [14] "Dung Saprotroph-Ectomycorrhizal-Soil Saprotroph-Wood Saprotroph"
    [15] "Dung Saprotroph-Endophyte-Litter Saprotroph-Undefined Saprotroph"
    [16] "Dung Saprotroph-Endophyte-Undefined Saprotroph"
    [17] "Dung Saprotroph-Plant Saprotroph"
    [18] "Dung Saprotroph-Plant Saprotroph-Wood Saprotroph"
    [19] "Dung Saprotroph-Soil Saprotroph-Undefined Saprotroph"
    [20] "Dung Saprotroph-Soil Saprotroph-Wood Saprotrop"
    [21] "Ectomycorrhizal"
    [22] "Ectomycorrhizal-Endophyte-Ericoid Mycorrhizal-Litter Saprotroph-Orchid Mycorrhizal"
    [23] "Ectomycorrhizal-Fungal Parasite"
    [24] "Ectomycorrhizal-Fungal Parasite-Plant Pathogen-Wood Saprotroph"
    [25] "Ectomycorrhizal-Fungal Parasite-Soil Saprotroph-Undefined Saprotroph"
    [26] "Ectomycorrhizal-Lichenized-Wood Saprotroph"
    [27] "Ectomycorrhizal-Orchid Mycorrhizal-Root Associated Biotroph"
    [28] "Ectomycorrhizal-Undefined Saprotroph"
    [29] "Endophyte"
    [30] "Endophyte-Fungal Parasite-Plant Pathogen"
    [31] "Endophyte-Lichen Parasite-Plant Pathogen-Undefined Saprotroph"
    [32] "Endophyte-Litter Saprotroph-Soil Saprotroph-Undefined Saprotroph"
    [33] "Endophyte-Litter Saprotroph-Wood Saprotroph"
    [34] "Endophyte-Plant Pathogen"
    [35] "Endophyte-Plant Pathogen-Undefined Saprotroph"
    [36] "Endophyte-Plant Pathogen-Wood Saprotroph"
    [37] "Endophyte-Undefined Saprotroph"
    [38] "Endophyte-Undefined Saprotroph-Wood Saprotroph"
    [39] "Ericoid Mycorrhizal"
    [40] "Fungal Parasite"
    [41] "Fungal Parasite-Lichen Parasite"
    [42] "Fungal Parasite-Plant Pathogen-Plant Saprotroph"
    [43] "Fungal Parasite-Undefined Saprotroph"
    [44] "Leaf Saprotroph-Plant Pathogen-Undefined Saprotroph-Wood Saprotroph"
    [45] "Lichenized-Undefined Saprotroph"
    [46] "Litter Saprotroph"
    [47] "Litter Saprotroph-Plant Pathogen"
    [48] "NULL"
    [49] "Orchid Mycorrhizal"
    [50] "Plant Pathogen"
    [51] "Plant Pathogen-Plant Saprotroph"
    [52] "Plant Pathogen-Undefined Saprotroph"
    [53] "Plant Pathogen-Wood Saprotroph"
    [54] "Plant pathogenic (?) on polen"
    [55] "Plant Saprotroph"
    [56] "Plant Saprotroph-Wood Saprotroph"
    [57] "Soil Saprotroph"
    [58] "Undefined Saprotroph"
    [59] "Undefined Saprotroph-Wood Saprotroph"
    [60] "Wood Saprotroph"
    ## That is a lot of different guild assignments.

    ##"Orchid Mycorrhizal" This is just a cool one. What was this?
    ct6[ct6$Guild == "Orchid Mycorrhizal",]
        X5348.2017MLB100.MSITS3 X5348.2017MLB101.MSITS3 X5348.2017MLB102.MSITS3
    175                       0                       0                       0
        X5348.2017MLB103.MSITS3 X5348.2017MLB105.MSITS3 X5348.2017MLB107.MSITS3 nreads
    175                       0                      15                       0     15
        otu_id king  phyl          clas           ord         fam             gen
    175 OTU151 Fungi Basidiomycota Agaricomycetes Sebacinales Serendipitaceae Serendipita
        spec sim  freq      wrmcor     accid       Taxon       Taxon_Level Trophic_Mode
    175      0.99 0.1666667 -0.4472136 Serendipita Serendipita 13          Symbiotroph
        Guild              Confidence_Ranking Growth_Morphology Trait Notes
    175 Orchid Mycorrhizal Highly Probable    NULL              NULL  NULL
        Citation_Source
    175 Tedersoo L, et al. 2010. Mycorrhiza 20:217-263 (pro parte); Weiss et al. DOI: 10.1111/nph.13977

    ## Ok, I need to simplify this. Using Excel.
    gd

                                                          x
    1 endophyte or parasite                             765
    2 endophyte, mycorrhizal, parasite, or saprotroph   841
    3 endophyte, parasite, or saprotroph               9309
    4 mycorrhizal                                      6537
    5 mycorrhizal or saprotroph                        1860
    6 saprotroph                                       9592
    7 unknown                                         19411

    ag1

  • samples")

    ## Ok, now trying to compare the two.
    ct7$nreads_worms

    abundance with Lumbricus terrestris")
    dev.off()

    png(filename="2019-02-01-1240_guilds_noworms.png", width=900, height=500 )
    pie(agnw$x, labels = agnw$Group.1, main="Fungal guild abundance without Lumbricus terrestris")
    dev.off()

    ## Now percentages.
    agw$percent

                                                       298  1
    2 endophyte, mycorrhizal, parasite, or saprotroph  354  1
    3 endophyte, parasite, or saprotroph              5055 21
    4 mycorrhizal                                     5094 21
    5 mycorrhizal or saprotroph                        839  3
    6 saprotroph                                      4137 17
    7 unknown                                         8822 36
    ## So earthworm infestation takes mycorrhizal fungi from 21% to 6%, cutting mycorrhizal fungi abundance by 2/3.

    save.image("2019-02-01-1248_workspace.RData")

    ## Carrying on after a break...
    ## Let's compare individual sites.
    aga

    { pie(aga[,thissite + 1], labels = aga[,1], main=paste("Fungal guild relative abundances, site", thissite), col=cls) }

    png(filename="2019-02-01-1438_site_guilds.png", width=800, height=1200, pointsize = 24 )
    cls

    col=cls) }
    dev.off()

    png(filename="2019-02-01-1454_guilds_comparison.png", width=920, height=700, pointsize = 23 )
    cls

    pie(agnw$x, labels = "", col=cls, main="Lumbricus absent")
    pie(agw$x, labels = "", col=cls, main="Lumbricus present")
    plot.new()
    legend( "top", legend = aga[,1], fill = cls )
    dev.off()

    ## Now for a table.
    st

    this in its own directory.
    sed -e '10q' out_seqprep/prepped.fasta > out_seqprep_001/prepped.fasta

    ## made a new SLURM file, included below.

    sbatch 2019-02-11-1303_AWCCfungi_1-5.slurm

    ## That took 34 seconds but did not produce the expected output file.

    cat *3939971.out
    pipits_funits 2.2, the PIPITS Project
    https://github.com/hsgweon/pipits
    ---------------------------------

    2019-02-11 15:08:49 pipits_funits started
    2019-02-11 15:08:49 Checking input FASTA for illegal characters
    2019-02-11 15:08:49 ... done
    2019-02-11 15:08:49 Counting input sequences
    2019-02-11 15:08:50 ... number of input sequences: 5
    2019-02-11 15:08:50 Dereplicating sequences for efficiency
    2019-02-11 15:08:51 ... done
    2019-02-11 15:08:51 Counting dereplicated sequences
    2019-02-11 15:08:51 ... number of dereplicated sequences: 5
    2019-02-11 15:08:51 Extracting ITS2 from sequences [ITSx]
    2019-02-11 15:09:02 ... done
    2019-02-11 15:09:02 Counting ITS sequences (dereplicated)
    2019-02-11 15:09:02 ... number of ITS sequences (dereplicated): 5
    2019-02-11 15:09:02 Removing short sequences below <100bp
    2019-02-11 15:09:02 ... done
    2019-02-11 15:09:02 Counting length-filtered sequences (dereplicated)
    2019-02-11 15:09:02 ERROR: You have 0 sequences! Something isn't right.
    srun: error: n3-94: task 0: Exited with exit code 1

    ## Overwriting that previous file with a longer file.
    sed -e '100q' out_seqprep/prepped.fasta > out_seqprep_001/prepped.fasta

    wc -l out_seqprep_001/prepped.fasta
    ## Ok, that was 100 lines long as it should be.
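    Since prepped.fasta holds one record per two lines (1019826 lines for 509,913 sequences, as worked out earlier in this appendix), sed -e '100q' keeps 50 sequences, not 100. A small sketch for cutting test files by record count instead follows. The helper names head_fasta and count_fasta are hypothetical, and the two-line-per-record layout is an assumption.

```shell
# Sketch: take test subsets of a FASTA file by record rather than by line.
# ASSUMPTION: every record is exactly two lines (header, then sequence).

# Print the first N records of the file.
head_fasta() {
  head -n $(( $2 * 2 )) "$1"
}

# Count records by counting header lines.
count_fasta() {
  grep -c '^>' "$1"
}
```

    Example use: head_fasta out_seqprep/prepped.fasta 100 > out_seqprep_001/prepped.fasta, followed by count_fasta out_seqprep_001/prepped.fasta to confirm the number of sequences.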

    ## Trying again...sbatch 2019-02-11-1303_AWCCfungi_1-5.slurm## At the previous pace that should takeround(34*100/60)[1] 57 ## minutes, or one hour!## We will see how long that takes.## It took 00:01:21, much faster, thankfully.

    cat *3939978.out
    pipits_funits 2.2, the PIPITS Project
    https://github.com/hsgweon/pipits
    ---------------------------------

    2019-02-11 15:24:07 pipits_funits started
    2019-02-11 15:24:07 Checking input FASTA for illegal characters
    2019-02-11 15:24:07 ... done
    2019-02-11 15:24:07 Counting input sequences
    2019-02-11 15:24:07 ... number of input sequences: 50
    2019-02-11 15:24:07 Dereplicating sequences for efficiency
    2019-02-11 15:24:07 ... done
    2019-02-11 15:24:07 Counting dereplicated sequences
    2019-02-11 15:24:07 ... number of dereplicated sequences: 50
    2019-02-11 15:24:07 Extracting ITS2 from sequences [ITSx]
    2019-02-11 15:25:24 ... done
    2019-02-11 15:25:24 Counting ITS sequences (dereplicated)
    2019-02-11 15:25:24 ... number of ITS sequences (dereplicated): 31
    2019-02-11 15:25:24 Removing short sequences below <100bp
    2019-02-11 15:25:24 ... done
    2019-02-11 15:25:24 Counting length-filtered sequences (dereplicated)
    2019-02-11 15:25:24 ... number of length-filtered sequences (dereplicated): 2
    2019-02-11 15:25:24 Re-inflating sequences
    2019-02-11 15:25:25 ... done
    2019-02-11 15:25:25 Counting sequences after re-inflation
    2019-02-11 15:25:25 ... number of sequences with ITS subregion: 2
    2019-02-11 15:25:25 Cleaning temporary directory
    2019-02-11 15:25:25 Done - pipits_funits ended successfully. (Your ITS sequences are "out_funits_001/ITS.fasta")
    2019-02-11 15:25:25 Next step: pipits_process [ Example: pipits_process -i out_funits_001/ITS.fasta -o pipits_process ]
    [mattbowser@yeti-login20 2018_AWCC_soil_fungi]

    ## That worked!

    ## The rate was 100 lines (50 sequences) per 81 seconds or
    100/81
    [1] 1.234568 ## lines per second.

    ## Ok, trying a larger file. How big is the original file?
    wc -l out_seqprep/prepped.fasta
    1019826 out_seqprep/prepped.fasta

    ## Let's say we run 100K lines (50K reads) at a time.
    ## That might take
    81/100 * 1e5
    [1] 81000 ## seconds or
    81/100 * 1e5 * 1/60^2
    [1] 22.5 ## hours.
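The extrapolation above can be wrapped in a small helper. This is only a minimal sketch: the function name `estimate_hours` is hypothetical, and it assumes run time scales linearly with the number of input lines, which the short trial runs only roughly support.

```bash
# Estimate run time in hours by linear extrapolation from a timed trial.
# Usage: estimate_hours TRIAL_LINES TRIAL_SECONDS TARGET_LINES
# (estimate_hours is an illustrative name, not part of PIPITS or SLURM.)
estimate_hours() {
    awk -v n="$1" -v s="$2" -v target="$3" \
        'BEGIN { printf "%.1f\n", s / n * target / 3600 }'
}

# The 100-line trial took 81 seconds; extrapolate to 100,000 lines.
estimate_hours 100 81 100000
```

Padding the estimate generously before setting `--time` seems prudent, since a job that exceeds its requested time is killed.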

    ## Overwriting that previous file with a much longer file.
    sed -e '100000q' out_seqprep/prepped.fasta > out_seqprep_001/prepped.fasta
    wc -l out_seqprep_001/prepped.fasta ## That looked good.

    wc -l out_seqprep/prepped.fasta
    ## That looked good.

    ## Now going for it.
    sbatch 2019-02-11-1303_AWCCfungi_1-5.slurm

    ## Uh-oh. I might not have given that enough time.
    scancel 3939980

    ## Revised the script to give two days.
    ## Now going for it again.
    sbatch 2019-02-11-1303_AWCCfungi_1-5.slurm

    2019-02-11-1303_AWCCfungi_1-5.slurm

    #!/bin/bash
    #SBATCH --job-name=AWCCfungi_1-5
    #SBATCH -N 1 # number of nodes
    #SBATCH -n 1 # number of tasks
    #SBATCH -p long # partition
    #SBATCH --account=bio # account code
    #SBATCH --time=02-01:00:00 # requested job time D-HH:MM:SS
    #SBATCH --mail-type=ALL # choose when you want to be emailed
    #SBATCH --mail-user=[email protected] # add your email address
    #SBATCH -o 2019-02-11-1303_AWCCfungi_1-5-%j.out # name of output file (the %j inserts the jobid)

    module load python/miniconda3-gcc6.1.0 # load required modules
    source activate pipits_env # load PIPITS environment
    cd /home/mattbowser/2018_AWCC_soil_fungi
    srun --mpi=pmi2 pipits_funits -i out_seqprep_001/prepped.fasta -o out_funits_001 -x ITS2
    source deactivate # deactivate PIPITS
    module purge # unload those modules

    2019-02-12-0925_yeti_script.txt

    ## Work on Yeti, 11.Feb.2019
    ## I am going to see if I can run just the first 5 sequences through the
    cd /home/mattbowser/2018_AWCC_soil_fungi

    mkdir out_seqprep_002

    sed -e '1,100000d;200000q' out_seqprep/prepped.fasta > out_seqprep_002/prepped.fasta
    wc -l out_seqprep_002/prepped.fasta

    sbatch 2019-02-12-0940_AWCCfungi_2.slurm
    ## That started. I hope it works. Now getting ready for the rest.

    mkdir out_seqprep_003
    mkdir out_seqprep_004
    mkdir out_seqprep_005
    mkdir out_seqprep_006
    mkdir out_seqprep_007
    mkdir out_seqprep_008
    mkdir out_seqprep_009
    mkdir out_seqprep_010

    sed -e '1,200000d;300000q' out_seqprep/prepped.fasta > out_seqprep_003/prepped.fasta
    sed -e '1,300000d;400000q' out_seqprep/prepped.fasta > out_seqprep_004/prepped.fasta
    sed -e '1,400000d;500000q' out_seqprep/prepped.fasta > out_seqprep_005/prepped.fasta
    sed -e '1,500000d;600000q' out_seqprep/prepped.fasta > out_seqprep_006/prepped.fasta
    sed -e '1,600000d;700000q' out_seqprep/prepped.fasta > out_seqprep_007/prepped.fasta
    sed -e '1,700000d;800000q' out_seqprep/prepped.fasta > out_seqprep_008/prepped.fasta
    sed -e '1,800000d;900000q' out_seqprep/prepped.fasta > out_seqprep_009/prepped.fasta
    sed -e '1,900000d;1019826q' out_seqprep/prepped.fasta > out_seqprep_010/prepped.fasta

    wc -l out_seqprep_003/prepped.fasta
    wc -l out_seqprep_004/prepped.fasta
    wc -l out_seqprep_005/prepped.fasta
    wc -l out_seqprep_006/prepped.fasta
    wc -l out_seqprep_007/prepped.fasta
    wc -l out_seqprep_008/prepped.fasta
    wc -l out_seqprep_009/prepped.fasta
    wc -l out_seqprep_010/prepped.fasta
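The repeated mkdir and sed commands above could be generated by a loop. The following is a sketch under stated assumptions: `split_fasta` is a hypothetical helper, and it assumes an unwrapped FASTA file with exactly two lines per record, so an even chunk size never splits a record.

```bash
# Split an unwrapped FASTA file (two lines per record) into numbered chunk
# directories: PREFIX_001/prepped.fasta, PREFIX_002/prepped.fasta, ...
# Usage: split_fasta INPUT LINES_PER_CHUNK PREFIX
# (split_fasta is an illustrative name, not an existing tool.)
split_fasta() {
    local input=$1 size=$2 prefix=$3
    local total start=1 chunk=1 end dir
    # An odd chunk size would separate a header from its sequence line.
    if [ "$size" -le 0 ] || [ $((size % 2)) -ne 0 ]; then
        echo "lines per chunk must be a positive even number" >&2
        return 1
    fi
    total=$(( $(wc -l < "$input") ))
    while [ "$start" -le "$total" ]; do
        end=$((start + size - 1))
        dir=$(printf '%s_%03d' "$prefix" "$chunk")
        mkdir -p "$dir"
        # Print lines start..end, then quit so sed does not read the rest.
        sed -n "${start},${end}p;${end}q" "$input" > "$dir/prepped.fasta"
        start=$((end + 1))
        chunk=$((chunk + 1))
    done
}

# Example (not run here):
#   split_fasta out_seqprep/prepped.fasta 100000 out_seqprep
```

Unlike the commands above, which put the final 119,826 lines into chunk 010, this sketch would place the last 19,826 lines into an eleventh chunk.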

    ## Yeah, yesterday's SLURM job completed!
    ## Let's have a look.
    cat *983.out
    ## ok 50k input filtered to 7203 reads after re-inflation.

    ## Trying to run parallel analyses...
    sbatch 2019-02-12-1026_AWCCfungi_3-10.slurm
    ## That failed.

    ## Cleaned up a little.
    ## Now running as separate batch files.
    sbatch 2019-02-12-1055_AWCCfungi_03.slurm
    sbatch 2019-02-12-1055_AWCCfungi_04.slurm
    sbatch 2019-02-12-1055_AWCCfungi_05.slurm
    sbatch 2019-02-12-1055_AWCCfungi_06.slurm
    sbatch 2019-02-12-1055_AWCCfungi_07.slurm
    sbatch 2019-02-12-1055_AWCCfungi_08.slurm
    sbatch 2019-02-12-1055_AWCCfungi_09.slurm
    sbatch 2019-02-12-1055_AWCCfungi_10.slurm

    2019-02-12-0940_AWCCfungi_2.slurm

    #!/bin/bash
    #SBATCH --job-name=AWCCfungi_1-5
    #SBATCH -N 1 # number of nodes
    #SBATCH -n 1 # number of tasks
    #SBATCH -p long # partition
    #SBATCH --account=bio # account code
    #SBATCH --time=02-01:00:00 # requested job time D-HH:MM:SS
    #SBATCH --mail-type=ALL # choose when you want to be emailed
    #SBATCH --mail-user=[email protected] # add your email address
    #SBATCH -o 2019-02-12-0940_AWCCfungi_2-%j.out # name of output file (the %j inserts the jobid)

    module load python/miniconda3-gcc6.1.0 # load required modules
    source activate pipits_env # load PIPITS environment
    cd /home/mattbowser/2018_AWCC_soil_fungi
    srun --mpi=pmi2 pipits_funits -i out_seqprep_002/prepped.fasta -o out_funits_002 -x ITS2
    source deactivate # deactivate PIPITS
    module purge # unload those modules

    2019-02-12-1055_AWCCfungi_10.slurm

    This is an example of one of the eight files submitted nearly simultaneously.

    #!/bin/bash
    #SBATCH --job-name=AWCCfungi_10
    #SBATCH -N 1 # number of nodes
    #SBATCH -n 1 # number of tasks
    #SBATCH -p long # partition
    #SBATCH --account=bio # account code
    #SBATCH --time=02-01:00:00 # requested job time D-HH:MM:SS
    #SBATCH --mail-type=ALL # choose when you want to be emailed
    #SBATCH --mail-user=[email protected] # add your email address
    #SBATCH -o 2019-02-12-1052_AWCCfungi_10-%j.out # name of output file (the %j inserts the jobid)

    module load python/miniconda3-gcc6.1.0 # load required modules
    source activate pipits_env # load PIPITS environment
    cd /home/mattbowser/2018_AWCC_soil_fungi
    #srun --mpi=pmi2 pipits_funits -i out_seqprep_003/prepped.fasta -o out_funits_003 -x ITS2
    #srun --mpi=pmi2 pipits_funits -i out_seqprep_004/prepped.fasta -o out_funits_004 -x ITS2
    #srun --mpi=pmi2 pipits_funits -i out_seqprep_005/prepped.fasta -o out_funits_005 -x ITS2
    #srun --mpi=pmi2 pipits_funits -i out_seqprep_006/prepped.fasta -o out_funits_006 -x ITS2
    #srun --mpi=pmi2 pipits_funits -i out_seqprep_007/prepped.fasta -o out_funits_007 -x ITS2
    #srun --mpi=pmi2 pipits_funits -i out_seqprep_008/prepped.fasta -o out_funits_008 -x ITS2
    #srun --mpi=pmi2 pipits_funits -i out_seqprep_009/prepped.fasta -o out_funits_009 -x ITS2
    srun --mpi=pmi2 pipits_funits -i out_seqprep_010/prepped.fasta -o out_funits_010 -x ITS2
    source deactivate # deactivate PIPITS
    module purge # unload those modules
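Eight nearly identical batch files differ only in the chunk number, so a SLURM job array might replace them. The sketch below is an assumption, not something that was tried on Yeti: `chunk_args` is a hypothetical helper that builds the per-chunk arguments, and the array usage is shown only in a comment.

```bash
# Build the pipits_funits arguments for one chunk number (a plain integer
# such as 3 or 10). chunk_args is an illustrative name.
chunk_args() {
    local id
    id=$(printf '%03d' "$1")   # zero-pad to match out_seqprep_003 etc.
    printf '%s\n' "-i out_seqprep_${id}/prepped.fasta -o out_funits_${id} -x ITS2"
}

# In a single batch script containing "#SBATCH --array=3-10", each array
# task could then run its own chunk:
#   srun --mpi=pmi2 pipits_funits $(chunk_args "$SLURM_ARRAY_TASK_ID")
chunk_args 10
```

With an array, `%a` in the `-o` file name pattern inserts the array index, which would keep the eight logs separate.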

  • 2019-02-13-1006_soil_fungi_map.R

    ## Making a map for the Lumbricus soil fungi article.
    wd

    pts1

     lwd=2, add=TRUE )
     points( sitesa, lwd=2, cex=1.2 )
     legend("topright", bg="white", legend=ldf$lab, fill=ldf$fill, border=ldf$border, lwd=ldf$lwd, pch=ldf$pch, pt.cex=ldf$ptcex, pt.lwd=ldf$ptlwd, col=ldf$col )
     map.scale(x=160844.7+20, y=1204136+130, len=200, ndivs=2, units="m", subdiv=100)
     north.arrow(xb=160844.7+0, yb=1204136+60, len=8, lab="N")
    dev.off()

    lab

    pdf(file="2019-02-13-1041_soil_fungi_map.pdf", width=6, height=6 )
     par(mar=c(0.1, 0.1, 0.1, 0.1))
     par(bg=land)
     plot(sitesa, pch="", bg=land )
     plot(streams, add=TRUE, col="#1d6b95", lwd=1 )
     plot(roads, col="#888888", lwd=2, add=TRUE )
     points( sitesa, lwd=2, cex=1.2 )
     points( 160973.5, 1204136, pch=21, cex=40, col="red", lwd=2 )
     plot(lakes, add=TRUE, col=water, border="#1d6b95", lwd=1 )
     text(161110, 1204160, "Stormy Lake", srt=46 )
     legend("topright", bg="white", legend=ldf$lab, fill=ldf$fill, border=ldf$border, lwd=ldf$lwd, pch=ldf$pch, pt.cex=ldf$ptcex, pt.lwd=ldf$ptlwd, col=ldf$col )
     map.scale(x=160844.7+20, y=1204136+130, len=200, ndivs=2, units="m", subdiv=100)
     north.arrow(xb=160844.7+0, yb=1204136+60, len=8, lab="N")
    dev.off()

    2019-02-13-1244_yeti_stuff.txt

    cd /home/mattbowser/2018_AWCC_soil_fungi

    ## Trying to combine all of those output files.
    cat \
    out_funits_001/ITS.fasta \
    out_funits_002/ITS.fasta \
    out_funits_003/ITS.fasta \
    out_funits_003/ITS.fasta \
    out_funits_003/ITS.fasta \
    out_funits_004/ITS.fasta \
    out_funits_005/ITS.fasta \
    out_funits_006/ITS.fasta \
    out_funits_007/ITS.fasta \
    out_funits_008/ITS.fasta \
    out_funits_009/ITS.fasta \
    out_funits_010/ITS.fasta \
    > out_funits/ITS.fasta
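Listing each file by hand makes it easy to repeat one: out_funits_003/ITS.fasta appears three times in the command above, which would count that chunk's reads three times. A loop over chunk numbers visits each directory exactly once. This is a minimal sketch, and `combine_chunks` is a hypothetical name.

```bash
# Concatenate PREFIX_001/ITS.fasta ... PREFIX_00N/ITS.fasta into OUTPUT,
# visiting each chunk directory exactly once and stopping if one is missing.
# Usage: combine_chunks PREFIX N_CHUNKS OUTPUT
# (combine_chunks is an illustrative name, not an existing tool.)
combine_chunks() {
    local prefix=$1 n=$2 output=$3 i=1 file
    : > "$output"                      # start from an empty output file
    while [ "$i" -le "$n" ]; do
        file=$(printf '%s_%03d/ITS.fasta' "$prefix" "$i")
        if [ -f "$file" ]; then
            cat "$file" >> "$output"
        else
            echo "missing: $file" >&2
            return 1
        fi
        i=$((i + 1))
    done
}

# Example (not run here):
#   mkdir -p out_funits
#   combine_chunks out_funits 10 out_funits/ITS.fasta
```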

    ## Trying to finish...
    sbatch 2019-02-13-1226_AWCCfungi_finish.slurm

    ## How many lines are in that ITS.fasta file?
    wc -l out_funits/ITS.fasta
    ## Got 76624.

    2019-02-13-1226_AWCCfungi_finish.slurm

    #!/bin/bash
    #SBATCH --job-name=AWCCfungifinish
    #SBATCH -N 1 # number of nodes
    #SBATCH -n 1 # number of tasks
    #SBATCH -p long # partition
    #SBATCH --account=bio # account code
    #SBATCH --time=02-01:00:00 # requested job time D-HH:MM:SS
    #SBATCH --mail-type=ALL # choose when you want to be emailed
    #SBATCH --mail-user=[email protected] # add your email address
    #SBATCH -o 2019-02-12-1052_AWCCfungifinish%j.out # name of output file (the %j inserts the jobid)

    module load python/miniconda3-gcc6.1.0 # load required modules
    source activate pipits_env # load PIPITS environment
    cd /home/mattbowser/2018_AWCC_soil_fungi
    srun --mpi=pmi2 pipits_process -i out_funits/ITS.fasta -o out_process
    srun --mpi=pmi2 vsearch --usearch_global out_process/repseqs.fasta \
     --db out_process/repseqs.fasta --self --id .84 --iddef 1 \
     --userout match_list.txt -userfields query+target+id \
     --maxaccepts 0 --query_cov .9 --maxhits 10
    source deactivate # deactivate PIPITS
    module purge # unload those modules

    2019-02-14-0939_yeti_stuff.txt

    cd /home/mattbowser/2018_AWCC_soil_fungi

    ## Looking at output from yesterday.
    cat *638.out
    pipits_process 2.2, the PIPITS Project
    https://github.com/hsgweon/pipits
    ---------------------------------

    2019-02-13 14:40:43 pipits_process started
    2019-02-13 14:40:43 Generating a sample list from the input sequences
    2019-02-13 14:40:43 Dereplicating and removing unique sequences prior to picking OTUs
    2019-02-13 14:40:44 Picking OTUs [VSEARCH]
    2019-02-13 14:40:48 Removing chimeras [VSEARCH]
    2019-02-13 14:40:56 Renaming OTUs
    2019-02-13 14:40:56 Mapping reads onto centroids [VSEARCH]
    2019-02-13 14:42:10 Making OTU table
    2019-02-13 14:42:11 Converting classic tabular OTU into a BIOM format [BIOM]
    2019-02-13 14:42:38 Assigning taxonomy with UNITE [RDP Classifier]
    2019-02-13 14:46:34 Reformatting RDP_Classifier output
    2019-02-13 14:46:34 Adding assignment to OTU table [BIOM]
    2019-02-13 14:46:36 Converting OTU table with taxa assignment into a BIOM format [BIOM]
    2019-02-13 14:46:38 Phylotyping OTU table
    2019-02-13 14:46:40 Cleaning temporary directory
    2019-02-13 14:46:40 Number of reads used to generate OTU table: 29017
    2019-02-13 14:46:40 Number of OTUs: 899
    2019-02-13 14:46:40 Number of phylotypes: 99
    2019-02-13 14:46:40 Number of samples: 12
    2019-02-13 14:46:40 Done - Resulting files are in "out_process" directory
    2019-02-13 14:46:40 pipits_process ended successfully.
    vsearch v2.10.2_linux_x86_64, 125.9GB RAM, 20 cores
    https://github.com/torognes/vsearch

    Reading file out_process/repseqs.fasta 100%
    366088 nt in 899 seqs, min 103, max 472, avg 407
    Masking 100%
    Counting k-mers 100%
    Creating k-mer index 100%
    Searching 100%
    Matching query sequences: 711 of 899 (79.09%)

    ## Ok, just looking at how much was filtered..
    wc -l out_seqprep/prepped.fasta
    1019826 out_seqprep/prepped.fasta
    ## So that would have been
    1019826/2
    [1] 509913 # reads.

    wc -l out_funits/ITS.fasta
    76624 out_funits/ITS.fasta
    ## Down to
    76624/2
    [1] 38312 # reads after ITSx step. That is a whole lot fewer!
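Dividing the line count by two assumes every record occupies exactly two lines. Counting header lines gives the number of reads directly and also works for wrapped FASTA. The helpers below are a hedged sketch; `count_fasta_records` and `percent_retained` are hypothetical names.

```bash
# Count FASTA records by counting header lines (those starting with ">").
# (Both function names here are illustrative, not existing tools.)
count_fasta_records() {
    grep -c '^>' "$1"
}

# Percentage of records in BEFORE that are still present in AFTER, by count.
# Usage: percent_retained BEFORE.fasta AFTER.fasta
percent_retained() {
    awk -v before="$(count_fasta_records "$1")" \
        -v after="$(count_fasta_records "$2")" \
        'BEGIN {
            if (before == 0) { print "NA"; exit }   # avoid dividing by zero
            printf "%.1f\n", 100 * after / before
        }'
}

# Example (not run here):
#   percent_retained out_seqprep/prepped.fasta out_funits/ITS.fasta
```

With the counts above, 38312 of 509913 reads is about 7.5%.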

    ## Looking at output files.
    ## Here is the first of 10 (lines 1-100K):
    cat *983.out
    pipits_funits 2.2, the PIPITS Project
    https://github.com/hsgweon/pipits
    ---------------------------------

    2019-02-11 15:59:56 pipits_funits started
    2019-02-11 15:59:56 Checking input FASTA for illegal characters
    2019-02-11 15:59:56 ... done
    2019-02-11 15:59:56 Counting input sequences
    2019-02-11 15:59:56 ... number of input sequences: 50000
    2019-02-11 15:59:56 Dereplicating sequences for efficiency
    2019-02-11 15:59:57 ... done
    2019-02-11 15:59:57 Counting dereplicated sequences
    2019-02-11 15:59:57 ... number of dereplicated sequences: 48132
    2019-02-11 15:59:57 Extracting ITS2 from sequences [ITSx]
    2019-02-12 12:19:16 ... done
    2019-02-12 12:19:16 Counting ITS sequences (dereplicated)
    2019-02-12 12:19:16 ... number of ITS sequences (dereplicated): 25277
    2019-02-12 12:19:16 Removing short sequences below <100bp
    2019-02-12 12:19:16 ... done
    2019-02-12 12:19:16 Counting length-filtered sequences (dereplicated)
    2019-02-12 12:19:16 ... number of length-filtered sequences (dereplicated): 7166
    2019-02-12 12:19:16 Re-inflating sequences
    2019-02-12 12:19:17 ... done
    2019-02-12 12:19:17 Counting sequences after re-inflation
    2019-02-12 12:19:17 ... number of sequences with ITS subregion: 7203
    2019-02-12 12:19:17 Cleaning temporary directory
    2019-02-12 12:19:17 Done - pipits_funits ended successfully. (Your ITS sequences are "out_funits_001/ITS.fasta")
    2019-02-12 12:19:17 Next step: pipits_process [ Example: pipits_process -i out_funits_001/ITS.fasta -o pipits_process ]

    ## So here it looks like the ITSx step removed about half of the sequences (48K to 25K)
    ## But most of these were < 100bp. Removing these left only 7K.
    ## So overall the pipits_funits step is yielding
    7203/50000
    [1] 0.14406 # 14% of number of input reads.

    ## Looking at last set (lines 900001-1019826).
    cat *736.out
    pipits_funits 2.2, the PIPITS Project
    https://github.com/hsgweon/pipits
    ---------------------------------

    2019-02-12 13:02:16 pipits_funits started
    2019-02-12 13:02:16 Checking input FASTA for illegal characters
    2019-02-12 13:02:18 ... done
    2019-02-12 13:02:18 Counting input sequences
    2019-02-12 13:02:18 ... number of input sequences: 59913
    2019-02-12 13:02:18 Dereplicating sequences for efficiency
    2019-02-12 13:02:21 ... done
    2019-02-12 13:02:21 Counting dereplicated sequences
    2019-02-12 13:02:21 ... number of dereplicated sequences: 55663
    2019-02-12 13:02:21 Extracting ITS2 from sequences [ITSx]
    2019-02-13 13:39:14 ... done
    2019-02-13 13:39:14 Counting ITS sequences (dereplicated)
    2019-02-13 13:39:14 ... number of ITS sequences (dereplicated): 51522
    2019-02-13 13:39:14 Removing short sequences below <100bp
    2019-02-13 13:39:14 ... done
    2019-02-13 13:39:14 Counting length-filtered sequences (dereplicated)
    2019-02-13 13:39:14 ... number of length-filtered sequences (dereplicated): 1284
    2019-02-13 13:39:14 Re-inflating sequences
    2019-02-13 13:39:15 ... done
    2019-02-13 13:39:15 Counting sequences after re-inflation
    2019-02-13 13:39:15 ... number of sequences with ITS subregion: 1295
    2019-02-13 13:39:15 Cleaning temporary directory
    2019-02-13 13:39:15 Done - pipits_funits ended successfully. (Your ITS sequences are "out_funits_010/ITS.fasta")
    2019-02-13 13:39:15 Next step: pipits_process [ Example: pipits_process -i out_funits_010/ITS.fasta -o pipits_process ]
    [mattbowser@yeti-login20 2018_AWCC_soil_fungi]

    ## Wow, there was an even higher percentage dropped there. Most of these reads were < 100 bp.
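To compare the filtering steps across all ten chunks without reading each log in full, the count lines could be pulled out of every SLURM output file. This is a sketch that assumes each log entry sits on its own line in the form "... number of LABEL: COUNT", as in the pipits_funits output above; `funits_counts` is a hypothetical name.

```bash
# Print the counts reported in a pipits_funits log as "label: count" lines.
# Assumes one log entry per line containing "... number of <label>: <count>".
# (funits_counts is an illustrative name, not part of PIPITS.)
funits_counts() {
    sed -n 's/^.*\.\.\. number of //p' "$1"
}

# Example (not run here): summarize every job log in the working directory.
#   for f in *.out; do echo "== $f"; funits_counts "$f"; done
```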

    ## I just reviewed my Stormy Lake analysis for comparison. There were 85,606 raw reads, 60,178 joined reads, and 59,732 filtered reads input to pipits_funits. From those 59,732 input reads there were 31,214 dereplicated sequences, 28,218 sequences after ITSx and length filtering, and 54,294 reads after re-inflation. Got 874 OTUs from 6 samples. This is much different than the AWCC analysis.

    ## Looking at notes from December 20 for the AWCC analysis.
    ## In the pispino_seqprep step there were initially 577,039 reads.

    ## Perhaps I did something wrong on December 18 in regards to splitting the original file, etc.

    ## Going to try installing QIIME2 to demultiplex.
    ## Using instructions at the URI below.
    https://docs.qiime2.org/2019.1/install/native/#install-qiime-2-within-a-conda-environment

    module load python/miniconda3-gcc6.1.0
    conda update conda
    ## This started, but permission was denied.
    ## Just checking:
    module avail
    ## Nothing newer.
    ## Proceeding.
    wget https://data.qiime2.org/distro/core/qiime2-2019.1-py36-linux-conda.yml
    conda env create -n qiime2-2019.1 --file qiime2-2019.1-py36-linux-conda.yml

    ## (installed)
    ## Some output:

    # To activate this environment, use:
    # > source activate qiime2-2019.1
    #
    # To deactivate an active environment, use:
    # > source deactivate

    ## Transferred raw illumina FASTQ files. Need to uncompress these.
    gunzip original_data/SAMP1-12_S4_L001_R1_001.fastq.gz
    gunzip original_data/SAMP1-12_S4_L001_R2_001.fastq.gz

    ## Looking at these.
    wc -l original_data/SAMP1-12_S4_L001_R1_001.fastq
    5416748 original_data/SAMP1-12_S4_L001_R1_001.fastq

    wc -l original_data/SAMP1-12_S4_L001_R2_001.fastq
    5416748 original_data/SAMP1-12_S4_L001_R2_001.fastq

    ## So those are the same length, good (5.4 M lines, 1.4 M reads.)
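The same check can be scripted so that a mismatch between forward and reverse files is caught before joining. This is a minimal sketch that assumes uncompressed FASTQ with exactly four lines per read; `fastq_reads` and `check_pair` are hypothetical names.

```bash
# Number of reads in an uncompressed FASTQ file (four lines per read).
# (Both function names here are illustrative, not existing tools.)
fastq_reads() {
    echo $(( $(wc -l < "$1") / 4 ))
}

# Succeed only if the forward and reverse files hold the same number of reads.
# Usage: check_pair R1.fastq R2.fastq
check_pair() {
    [ "$(fastq_reads "$1")" -eq "$(fastq_reads "$2")" ]
}

# Example (not run here):
#   check_pair original_data/SAMP1-12_S4_L001_R1_001.fastq \
#              original_data/SAMP1-12_S4_L001_R2_001.fastq && echo "pair ok"
```

For the files above, 5416748 lines divided by 4 gives 1,354,187 reads per file.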

    ## Trying this as a SLURM script.
    sbatch 2019-02-14-1248_join.slurm

    qiime join_paired_ends.py \
     -f original_data/SAMP1-12_S4_L001_R1_001.fastq \
     -r original_data/SAMP1-12_S4_L001_R2_001.fastq \
     -o joined

    -bash