Study

Coding Lessons for Data Analysis in Animal Breeding

Are you interested in learning how to code for data analysis in animal breeding and genetics? Look no further! As a PhD in bioinformatics, I am excited to offer my expertise to help you learn coding skills in Linux, R, Python, Perl, and how to handle and process large datasets in genomics. 


Why learn coding for data analysis?


What you will learn?


If you're interested in learning to code for data analysis in animal breeding and genetics, please visit my blog to learn more about this and schedule a session. I am excited to work with you to enhance your coding skills and help you achieve your research goals.

Software programs for animal breeding analysis

There are several software programs available for animal breeding analysis. Some of the most popular ones include:


1. ASReml

ASReml is a powerful statistical package that fits linear mixed models using Residual Maximum Likelihood (REML). Developed in collaboration between experts in the analysis of mixed models, animal breeding, spatial and longitudinal data analysis, and software development, ASReml has been under development since 1993. It offers a stable platform for delivering established procedures while also delivering current research in the application of linear mixed models. Its use of the Average Information (AI) algorithm and sparse matrix methods allows for efficient analysis of large and complex datasets. ASReml offers a wide range of variance models for the random effects in the linear mixed model, making it a rich and flexible tool for analyzing many data sets commonly arising in the agricultural, biological, medical, and environmental sciences. ASReml offers an R package as well, but both the ASReml software and the R package are paid. ASReml can be used across different operating systems including Windows, Mac, and Linux. Download the latest version of ASReml here!!



2. BLUPF90

BLUPF90 is a collection of computer programs that specialize in mixed-model calculations for animal breeding applications. These programs offer a wide range of functions for large data sets and support the use of SNP information for improved accuracy in breeding values and GWAS. The programs were designed to be flexible, simple, and efficient, and have been used in numerous studies as well as for commercial genetic evaluation in many animal industries. The programs were initially created as teaching exercises and have since been improved by many contributors. BLUPF90 is written in Fortran 90/95, and compiled versions are available free for research purposes on Linux, Windows, and Mac OS X. More information on the programming and computing algorithms used in BLUPF90 can be found in an Interbull paper and as course notes. Download the latest version of BLUPF90 here!!



3. GenSel

GenSel v4.90 is a widely used program for estimating molecular breeding values of animals in selection using SNP or marker data for the phenotype of interest. The software was written by Rohan Fernando and implemented by Dorian J. Garrick at Iowa State University as part of the Bioinformatics to implement Genomic Selection (BIGS) project. The original version of GenSel was developed on the Mac platform using the GNU Compiler Collection (GCC) and libraries from the GNU Scientific Library (GSL), MatVec, and Boost. GenSel can be operated via the command line on Mac or Unix interfaces or through a user-friendly menu-driven approach on any operating system. The software can be used for Bayesian estimation of marker effects in a training dataset using Bayes A, B, C, and Cpi approaches, for least squares, for linkage disequilibrium estimation, and for prediction of molecular breeding values using validation or unrelated marker datasets without phenotypic records. Download the latest version of GenSel here!!



4. MTG2

MTG2 is a computer program designed to estimate genetic and environmental variance and covariance across multiple traits using the Genomic Residual Maximum Likelihood (GREML) method. The program employs a multivariate linear mixed model that can fit complex covariance structures derived from genomic information, making it a multivariate version of GCTA GREML. Additionally, the program provides the Best Linear Unbiased Prediction (BLUP) of additive genetic effects, which can be in the form of either breeding values or predictions of genetic risk. MTG2 uses the direct Average Information (AI) algorithm to achieve these estimates. The MTG2 software is available free for research purposes in a Linux environment. Download the latest version of MTG2 here!!



5. MiXBLUP

MiXBLUP is a state-of-the-art genetic evaluation system that can be used by all breeding organizations to estimate the genetic merit of individuals based on observations and genetic similarity, provided that the components of variance are known. MiXBLUP has the potential to accelerate genetic progress in breeding populations and is characterized by its ease of use, speed, ability to analyze a large number of traits simultaneously, and capacity to fit complex models. The software supports various methods for determining genetic similarity between individuals, including pedigree relationships, genomic relationships, and regression on SNP covariates. To access MiXBLUP, the software can be downloaded from its website, but a license is required for its use, which can be ordered from the website. A 30-day free trial license is available for evaluation purposes. Download the latest version of MiXBLUP here!!
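To make the idea of "estimating genetic merit when the variance components are known" more concrete, here is a minimal Python sketch of Henderson's mixed model equations for a simple animal model y = Xb + Za + e. This is not MiXBLUP's code or interface; the function name solve_mme, its arguments, and the toy data are assumptions made for illustration only, and real evaluation software uses sparse, iterative solvers rather than a dense solve.

```python
import numpy as np

def solve_mme(X, Z, y, A_inv, sigma_e2, sigma_a2):
    """Solve Henderson's mixed model equations for y = Xb + Za + e.

    X, Z      : design matrices for fixed and random (animal) effects
    A_inv     : inverse of the relationship matrix among animals
    sigma_e2  : residual variance (assumed known)
    sigma_a2  : additive genetic variance (assumed known)
    Returns (fixed-effect solutions, predicted breeding values).
    """
    lam = sigma_e2 / sigma_a2  # variance ratio that shrinks the animal effects
    lhs = np.block([
        [X.T @ X, X.T @ Z],
        [Z.T @ X, Z.T @ Z + lam * A_inv],
    ])
    rhs = np.concatenate([X.T @ y, Z.T @ y])
    sol = np.linalg.solve(lhs, rhs)
    p = X.shape[1]
    return sol[:p], sol[p:]

# Toy data (hypothetical): three unrelated animals, one record each,
# an overall mean as the only fixed effect, and equal variances.
y = np.array([10.0, 12.0, 14.0])
X = np.ones((3, 1))
Z = np.eye(3)
A_inv = np.eye(3)
b_hat, a_hat = solve_mme(X, Z, y, A_inv, sigma_e2=1.0, sigma_a2=1.0)
print(b_hat, a_hat)
```

With a variance ratio of 1, each animal's deviation from the mean is shrunk by half, which is the kind of regression toward the mean that BLUP is expected to produce.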



6. DMU

DMU is a software package that facilitates quantitative genetics research by estimating variance components and predicting random effects. It has been developed over a period of more than 25 years to meet the needs of applied quantitative animal genetics research in Denmark, and has since been used in genetic evaluations of cattle, sheep, mink, and horses in several countries. The package implements a wide range of statistical methods and computation algorithms and is distributed as executables for Linux, Windows, and OS X. DMU is free for research purposes and should be acknowledged in publications by citing the package's manual. The latest version of DMU can be found here!!



7. WOMBAT

WOMBAT is a program for fitting linear mixed models using restricted maximum likelihood (REML). Although WOMBAT was designed with quantitative genetic analysis in mind, it can be used in a variety of other contexts. Its primary goals are the estimation of (co)variance components and the resulting genetic parameters. It is particularly well suited to fitting relatively simple models to moderately large to very large data sets from livestock improvement programs. The WOMBAT software can be freely downloaded for research purposes and is compatible with Linux, Windows, and Mac OS X operating systems. The latest version of WOMBAT can be found here!!



8. HiBLUP

HiBLUP (Say 'Hi' to Best Linear Unbiased Prediction) is a software package for the analysis of genomic data in animal breeding. One of the key features of HiBLUP is its ability to handle large datasets, including both genotypic and phenotypic data. It is designed to perform efficient and accurate genomic prediction in high-dimensional settings, where the number of markers or SNPs is much larger than the number of individuals in the sample. HiBLUP is a C++ program that can be run from the terminal on various platforms without the need for installation. The executable files are freely available and can be found for different platforms here!!



9. JWAS

JWAS is an interactive software platform that utilizes Julia and Jupyter notebook for analyzing univariate and multivariate Bayesian mixed effects models. The software is particularly useful for genomic prediction and genome-wide association studies using either complete or incomplete genomic data. JWAS offers a broad range of analyses, including Bayesian methods for whole-genome analyses, shrinkage estimation, and variable selection methods. Its features include univariate and multivariate analysis, no limitations on fixed effects, random effects other than markers, additive genetic effects, maternal effects, random permanent environmental effects, correlated residuals, correlated random effects, use of genomic information, and support for incomplete genomic data. The latest version of JWAS can be found here!!



10. PLINK

PLINK (Purposeful Linkage) is a popular, open-source software package for genetic data analysis in humans and other organisms. It provides a suite of tools for performing various analyses, including genome-wide association studies (GWAS), linkage analysis, and population structure analysis. One of the key features of PLINK is its ability to handle large datasets, including both genotypic and phenotypic data. It is designed to be highly efficient and able to handle millions of markers or SNPs, making it an ideal tool for large-scale genetic studies. PLINK can be freely downloaded for research purposes and is compatible with Linux, Windows, and Mac OS X operating systems. The latest version of PLINK can be found here!!



11. SVS

SNP & Variation Suite is a user-friendly tool that enables biologists and researchers to analyze and visualize genomic and phenotypic data easily. The software offers a range of numerical analysis methods, including principal component analysis and Fisher's exact test, and supports data management with natural pan and zoom controls, smart labeling, and display of results, raw data, and annotation sources in a single view. Additionally, the software provides genomic visualization for microarray and whole-exome data and supports call copy number variants on large-N workflows. It also offers support and extensibility with technical documentation, customer support, and training, and includes features for clinical variant scoring, such as splice site predictions and functional predictions for missense mutations. The software is available for Linux, Windows, and MAC OS X for research purposes. To use this software, a license must be obtained from the website through ordering. The latest version of SVS can be found here!!



12. GCTA

GCTA (Genome-wide Complex Trait Analysis) is a comprehensive software package designed to analyze data from genome-wide association studies (GWASs). Initially developed for estimating the proportion of phenotypic variance explained by all genome-wide SNPs for a complex trait, GCTA has since been extended to support various other analyses. GCTA offers a range of analyses including heritability, genetic correlation, and phenotype prediction. It can estimate genetic relationships among individuals, inbreeding coefficients of individuals in GWAS data, SNP-based heritability, and the partitioning of genetic variance into contributions from different sets of SNPs stratified by chromosome location, allele frequency, or functional annotation. Additionally, GCTA offers genome-wide association analysis, GWAS simulation, population genetics, and Mendelian randomisation. GCTA supports several ultra-fast linear model association analyses, mixed linear model association analyses, conditional and joint association analysis, gene- or set-based association tests, and multi-trait-based conditional and joint association analysis. The software is freely available for Linux, Windows, and Mac OS X for research purposes. The latest version of GCTA can be found here!!
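As a rough illustration of what "estimating genetic relationships among individuals" from SNP data involves, the sketch below builds a genomic relationship matrix from standardized allele counts. It is a simplified teaching example, not GCTA's implementation: the function name, the 0/1/2 genotype coding, the absence of missing-data handling, and the choice to treat diagonal and off-diagonal elements identically are all assumptions made here.

```python
import numpy as np

def genomic_relationship_matrix(genotypes):
    """Build a genomic relationship matrix from SNP allele counts.

    genotypes : (n_individuals, n_snps) array coded 0/1/2 (copies of one allele),
                assumed to have no missing values.
    """
    G = np.asarray(genotypes, dtype=float)
    p = G.mean(axis=0) / 2.0              # allele frequency per SNP
    keep = (p > 0.0) & (p < 1.0)          # drop monomorphic SNPs (zero variance)
    G, p = G[:, keep], p[keep]
    W = (G - 2.0 * p) / np.sqrt(2.0 * p * (1.0 - p))  # standardize each SNP
    return W @ W.T / W.shape[1]           # average over SNPs

# Toy genotypes (hypothetical): 3 individuals x 2 SNPs.
toy = [[0, 2],
       [2, 0],
       [1, 1]]
grm = genomic_relationship_matrix(toy)
print(grm)
```

Individuals 1 and 2 carry opposite homozygous genotypes at both SNPs, so their off-diagonal entry is strongly negative, while the all-heterozygous individual 3 sits at the population mean and gets zeros.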



13. ADMIXTURE

ADMIXTURE is a software tool designed for maximum likelihood estimation of individual ancestries using multilocus SNP genotype datasets. It utilizes the same statistical model as STRUCTURE but employs a fast numerical optimization algorithm for much faster estimation of individual ancestry fractions. ADMIXTURE uses a block relaxation approach to iteratively update allele frequency and ancestry fraction parameters, with each block update handled by solving a large number of independent convex optimization problems using a fast sequential quadratic programming algorithm. The algorithm's convergence is enhanced using a quasi-Newton acceleration technique. ADMIXTURE is considerably faster than EM algorithms and MCMC sampling methods. The software is freely available for Linux and Mac OS X for research purposes. The latest version of ADMIXTURE can be found here!!



14. ARLEQUIN

Arlequin ver 3.5.2.2 is a software package for population genetic data analysis. It provides breeders and researchers with a suite of tools for performing various analyses, including the estimation of genetic diversity, population structure, and migration patterns. Arlequin ver 3.5.2.2 is designed to be highly flexible, allowing users to perform custom analyses and incorporate user-defined models. The software is freely available for Linux, Windows, and MAC OS X for research purposes. The latest version of ARLEQUIN can be found here!!



15. SelScan

SelScan is a software package for detecting signatures of selection in genomic data. It provides breeders and researchers with a suite of tools for identifying genomic regions that have been subject to positive selection or selective sweeps. It provides a fast and accurate method for detecting signatures of selection, allowing breeders and researchers to gain insights into the evolutionary history of their populations. SelScan is also highly flexible, allowing users to perform custom analyses and incorporate user-defined models. The software is freely available for Linux, Windows, and MAC OS X for research purposes. The latest version of SelScan can be found here!!



16. R

R is a powerful statistical software that can be used for animal breeding studies. Here are some of the ways R can be used in animal breeding:

Some popular R packages for animal breeding include BLR, ASReml, rrBLUP, and MCMCglmm. These packages provide various tools for analyzing breeding data and estimating breeding values.



17. FImpute

FImpute (ef-impute) is a software tool designed for efficient genotype imputation in large-scale livestock datasets. It employs an overlapping sliding window strategy to leverage haplotype similarities between target and reference individuals. The algorithm starts with longer windows to capture haplotype similarity between closely related individuals and subsequently shrinks the window size after each chromosome sweep to consider shorter haplotype similarities from more distantly related individuals. FImpute assumes that all individuals are related to each other at varying degrees, and pedigree information is utilized to improve imputation accuracy, particularly for sparse low-density panels. However, high-quality input genotypes are crucial for accurate imputation with FImpute. Currently, FImpute only supports SNP markers. The FImpute software is distributed "AS IS" solely for non-commercial use. The software can be used upon request to the authors, subject to reasonable conditions. See more here!!



18. Beagle

Beagle 5.4 is a software package for genotype imputation in animal breeding. It is a powerful tool for predicting missing genetic information in genomic datasets, allowing breeders and researchers to make predictions about the genotypes of individuals based on the data that is available. It provides a fast and efficient method for imputing missing genotypes, making it an ideal tool for large-scale genomic studies. It is also highly customizable, allowing users to easily incorporate custom algorithms and models into the framework. Beagle is free software and the latest version can be downloaded from here!!



19. CFC

CFC (Contribution, Inbreeding, and Coancestry) is a software package for genetic analysis in animal breeding. It estimates the individual contributions, inbreeding coefficients, and coancestry coefficients of individuals in a population based on pedigree information, and it provides a flexible and efficient framework for carrying out these calculations in animal populations. The software is freely available for Windows. The latest version of CFC can be found here!!
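For readers who want to see how inbreeding and coancestry coefficients can be obtained from a pedigree, here is a small Python sketch of the classic tabular method for the numerator relationship matrix. It is only an illustration under stated assumptions: CFC itself is described as using more efficient algorithms, and the function name, the (sire, dam) input format, and the five-animal pedigree below are hypothetical.

```python
import numpy as np

def relationship_matrix(pedigree):
    """Numerator relationship matrix by the tabular method.

    pedigree : list of (sire, dam) tuples; animal IDs are 1..n in list order,
               0 means an unknown parent, and parents must precede offspring.
    """
    n = len(pedigree)
    A = np.zeros((n + 1, n + 1))  # row/column 0 stands for the unknown parent
    for i, (s, d) in enumerate(pedigree, start=1):
        for j in range(1, i):
            # relationship with an older animal = average over i's parents
            A[i, j] = A[j, i] = 0.5 * (A[j, s] + A[j, d])
        # diagonal = 1 + inbreeding, which is half the parents' relationship
        A[i, i] = 1.0 + 0.5 * A[s, d]
    return A[1:, 1:]

# Toy pedigree (hypothetical): animals 1 and 2 are founders,
# 3 and 4 are full sibs, and 5 comes from mating 3 x 4.
ped = [(0, 0), (0, 0), (1, 2), (1, 2), (3, 4)]
A = relationship_matrix(ped)
inbreeding = np.diag(A) - 1.0   # F_i = a_ii - 1
coancestry = A / 2.0            # kinship between i and j is a_ij / 2
print(inbreeding)
```

The offspring of the full-sib mating (animal 5) comes out with an inbreeding coefficient of 0.25, which matches the textbook value.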



20. GenomeStudio

GenomeStudio is a software platform for genetic data analysis, primarily designed for the analysis of genotyping data generated using Illumina platforms. It is a powerful tool for performing a variety of genetic analyses, including genotype calling, quality control, and association studies. GenomeStudio provides a user-friendly interface for analyzing genetic data, with tools for visualizing and summarizing data, as well as conducting advanced statistical analysis. It is also highly customizable, with options for importing custom algorithms and models, as well as integrating with other data analysis tools. In addition to its robust feature set, GenomeStudio is also highly flexible, allowing users to work with a variety of data types, including genotyping arrays, whole-genome sequencing data, and transcriptomics data. This makes it a versatile solution for genetic data analysis in a variety of applications. The software is freely available. The latest version of GenomeStudio can be found here!!



21. fcGENE

fcGENE is a software tool for converting genotype data between different file formats. It provides a convenient and efficient way to convert genotype data from one format to another, making it easier to work with data from different sources and to use different analysis tools. fcGENE supports a wide range of genotype file formats, including PLINK, FASTA, and VCF, among others. It is designed to be fast and efficient, allowing users to quickly convert large datasets with a minimal amount of time and effort. fcGENE also provides a number of advanced features, such as the ability to filter and sort data, as well as to specify specific subsets of data to convert. These features make it possible to easily manipulate and pre-process genotype data prior to analysis, further enhancing its utility. The software is freely available for Linux, Windows, and MAC OS X. The latest version of fcGENE can be found here!!



22. Haploview

Haploview is a software tool designed for analyzing haplotype data from genetic studies. It offers a variety of functions, including LD (linkage disequilibrium) and haplotype block analysis, haplotype population frequency estimation, single SNP and haplotype association tests, permutation testing for association significance, implementation of the Tagger tag SNP selection algorithm, automatic download of phased genotype data from HapMap, and visualization and plotting of PLINK whole genome association results with advanced filtering options. Haploview is compatible with data dumps from the HapMap project and the Perlegen Genotype Browser, and can analyze thousands of SNPs in thousands of individuals. The software is easy to use and comes with a tutorial to help users get started. It is also fully compatible with different operating systems, and Broad Institute users can run the software without administrator privileges by saving the Haploview.jar file to their local user folder and double-clicking on it. See details here!!
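The LD measures at the heart of this kind of analysis can be computed by hand for a pair of SNPs. The sketch below calculates D, |D'|, and r² from phased haplotypes; it is a minimal example, not Haploview's code, and the function name and the 0/1 allele coding are assumptions for illustration.

```python
def ld_stats(hap_a, hap_b):
    """Pairwise LD between two biallelic loci from phased haplotypes.

    hap_a, hap_b : equal-length sequences of 0/1 alleles, where position k
                   in both sequences refers to the same haplotype.
    Returns (D, |D'|, r2).
    """
    n = len(hap_a)
    p_a = sum(hap_a) / n                      # frequency of allele 1 at locus A
    p_b = sum(hap_b) / n                      # frequency of allele 1 at locus B
    p_ab = sum(1 for a, b in zip(hap_a, hap_b) if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b                      # raw disequilibrium coefficient
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else 0.0
    return d, d_prime, r2

# Two loci that always carry the same allele: complete LD.
linked = ld_stats([1, 1, 0, 0], [1, 1, 0, 0])
# Two loci whose alleles are combined at random: no LD.
unlinked = ld_stats([1, 1, 0, 0], [1, 0, 1, 0])
print(linked, unlinked)
```

In practice genotypes are usually unphased, so tools have to estimate haplotype frequencies first; this sketch skips that step by assuming phase is known.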



23. CMS

CMS (Composite of Multiple Signals) is a software tool designed to detect regions in the genome that have undergone positive selection and to identify the causal variant within those regions. The tool utilizes three different signals of positive selection - long-range haplotypes, differentiated alleles, and high frequency derived alleles - to generate a composite score. By combining these signals, CMS is able to detect positive selection at a much higher resolution than traditional methods, often down to the level of single genes. Additionally, CMS has better power than haplotype-based methods when detecting regions under selection that occurred relatively long ago, around 20-35 thousand years ago. See details here!!


These programs offer a range of features, including pedigree management, genetic analysis, performance tracking, and data visualization. The specific features and capabilities of each program may vary, so it is important to carefully consider your needs and choose the software that is best suited for your specific breeding program.

A Comprehensive List of Research Tools for Researchers

In the world of academic research, access to the latest scientific articles and research papers is critical. However, the high cost of accessing scientific papers often limits the ability of researchers, particularly those in developing countries, to access the latest research findings. Fortunately, there are several online tools available that can help researchers access the research they need without breaking the bank. In this post, we will explore some of the most popular research tools available to researchers today.

1. EndNote: Researchers spend almost 200,000 hours annually just formatting citations. Think about how much more research they could conduct if they could retrieve that time. EndNote is a reference management software that helps to collect, organize, and cite references for academic works. It allows users to create a library of references, organize them into groups, and add notes and keywords. EndNote offers tools for formatting citations and bibliographies in a wide range of citation styles and has both desktop and web-based versions. It also integrates with Microsoft Word for easy citation and bibliography formatting. https://endnote.com/.

2. Mendeley: It is also a reference management software that helps researchers to manage their research, collaborate with others, and discover new research. It allows users to organize and annotate PDFs, import and export references, and collaborate with others. It also has a social networking component for connecting with other researchers and discovering new research. Mendeley is available on desktop and mobile devices, and there's a web-based version called Mendeley Web. https://www.mendeley.com/.

3. Sci-Hub: This website provides free access to millions of scientific articles and research papers that are typically available only through paid subscriptions. While the legality of Sci-Hub remains a subject of debate, many researchers, particularly those in developing countries, rely on the site for access to research that they would not otherwise be able to afford. Using Sci-Hub to access copyrighted material without permission is illegal in many countries, and researchers should be aware of the potential legal consequences before using the site. https://sci-hub.se/.

4. Science hub Mutual Aid community (wosonhj): The website provides a platform for people to share their access to research articles and papers with those who cannot afford to pay for them. All you need to do is sign up for an account and visit the website daily to gain points. These points can then be used to download premium articles for free. One of the most interesting things about the Science Hub Mutual Aid Community is that it relies on a community of individuals who help each other out. All it takes is 4-5 minutes for someone to share an article with others, helping to ensure that everyone has access to the latest research and findings. https://wosonhj.com/.

5. Z-library: This website provides access to a vast collection of research articles, books, and other documents. With over 11 million books and 84 million articles as of January 1, 2023, Z-library is one of the largest e-book libraries in the world. The organization operates as a non-profit and relies on donations to sustain its operations. https://z-lib.is/.

6. Anna’s Archive: It is a non-profit online shadow library metasearch engine created by anonymous archivists in response to law enforcement efforts to close down Z-Library in November 2022. Anna's Archive aims to fill the gap left by the Z-Library shutdown by providing access to a large number of books and articles, including many that are not easily available elsewhere. https://annas-archive.org/.

7. futurepedia.io: A knowledge-sharing AI platform that provides a comprehensive database of futuristic technologies and their applications. It aims to help researchers, engineers, and enthusiasts stay up-to-date with the latest advancements in technology, including AI, biotechnology, renewable energy, and space exploration. The platform features a user-friendly interface, a community-driven section for collaboration, and accurate information. https://www.futurepedia.io/.

8. Scholarcy: It is an AI-powered summarizer that helps researchers and students understand and summarize academic papers. It uses machine learning to identify key information like data analysis and main findings, generates summaries, and highlights important sections. Scholarcy saves time, increases productivity, and offers a web-based application and browser extension. It supports multiple citation styles and can automatically generate references and bibliographies. https://www.scholarcy.com/.

9. Semantic Scholar: Semantic Scholar is a platform that allows users to access over 200 million research papers, discover connections between different topics, receive recommendations based on recent searches, and generate summaries. https://www.semanticscholar.org/.

10. Elicit: AI tool that can be used to access over 175 million research articles and provide instant answers to any questions you may have. With Elicit, you can upload any research article and get answers to various questions related to that article in real-time. The tool can extract key information from the paper, even if the keywords do not perfectly match your query. It can also summarize the takeaways from the paper that are specific to your question. https://elicit.org/.

11. typeset.io: A research communication platform that solves various problems encountered while reading research papers. Many people have difficulty understanding research papers while reading them. This artificial intelligence tool provides a solution to this problem and helps users understand every line of the research paper. If you want to know the practical implications of the paper, this AI tool will analyze the article and provide you with the information quickly. https://typeset.io/.

12. Content Mine: Enables users to find, download, analyze, and extract knowledge from academic papers. It simplifies the research process by allowing researchers to search for papers and extract relevant data quickly and efficiently. https://lnkd.in/dQ73C2Af.

13. Scite: Allows researchers to see how their publications have been cited and provides the context of each citation. It helps users discover supporting and contrasting evidence for each paper using AI and machine learning algorithms. https://scite.ai/.

14. Paper Digest: This tool creates 3-minute summaries of research papers by extracting key ideas and sentences. It saves time and presents the key findings in an easy-to-read format, particularly useful for busy professionals who need to stay updated on the latest research in their field. https://lnkd.in/d7TsH96f.

15. SciSpace Copilot: A multi-lingual AI tool that helps users understand scientific papers by simplifying technical language, math, and tables. It can also provide quick answers to queries and convert lengthy sections into easy-to-read summaries. It is offered by Typeset, a research communication platform. https://typeset.io/.

16. Scrivener: This tool is designed for long writing projects. It helps writers overcome writer's block and page fright by allowing them to write text in any order and organize it later. It also includes features for note-taking and research organization to keep writing projects on track. https://lnkd.in/dtXgAQnT.

17. ChatGPT: ChatGPT is an artificial intelligence language model developed by OpenAI. It is capable of generating human-like text and can be used to suggest book titles or even create cover letters. However, it should not be used to generate content for research articles as the company has recently introduced a tool that can identify AI-generated text. If you use an AI tool to write an article, it may even catch Turnitin software's attention. https://chat.openai.com/.

18. ResearchRabbit or Connected Papers: Research tools that allow users to upload research articles and receive automatic suggestions for similar literature based on real-time analysis. They also offer features such as citation analysis and reference tracking. These tools can be particularly useful for literature reviews. https://www.researchrabbitapp.com/; https://www.connectedpapers.com/.

19. সঠিক: A Bengali-language spelling and grammar checker. It helps Bengali language users identify and correct spelling errors while typing in Bengali. The website has already released a test version of the spell checker. https://spell.bangla.gov.bd/.

20. Creately/diagrams.net/MindMup/Canva: You often need to create diagrams or images. Although it is possible to draw images in Microsoft Word, they often don't look as good. If you want to create your desired diagram quickly, you can use these websites. https://creately.com/; https://www.diagrams.net/; https://www.mindmup.com/; https://www.canva.com/.

21. Elink.io: A content curation tool that allows users to save articles, videos, cloud files, and social media posts from around the web, and share them with others. With Elink.io, users can create visually appealing and organized collections of content that can be shared via email, social media, or a unique link. The platform also offers a bookmarklet that makes it easy to save content while browsing the web. https://elink.io/.

22. GanttPRO: GanttPRO is a tool for planning research projects. It allows users to create a timeline, assign tasks, track progress and deadlines. The platform helps teams to collaborate and stay organized throughout the project. GanttPRO also offers templates and customization options to suit the user's specific needs. It is a web-based tool that can be accessed from anywhere with an internet connection. https://ganttpro.com/.

23. Trinka: Trinka is an AI-powered grammar checker and language correction tool designed specifically for academic and technical writing. It can identify errors that other grammar checkers might miss and helps improve the overall quality of the writing. https://www.trinka.ai/.

24. Lex: A user-friendly text editor that simplifies document formatting and organization. With its simple prompts and user-friendly interface, Lex allows you to quickly create documents with headers, bulleted lists, and references. https://lex.page/.

25. Grammarly: A writing tool that uses AI to check grammar, spelling, punctuation, and sentence structure in real-time, and provides suggestions for improvement. It offers a web-based editor, desktop app, browser extension, and mobile app, and can be used for emails, essays, articles, and social media posts. Grammarly has features like tone detection, plagiarism checking, and readability analysis to improve writing quality. It has both free and paid versions, with the paid version offering more advanced grammar checks and a plagiarism detector. https://www.grammarly.com/.

26. Tome: Tome is an AI-powered storytelling platform that helps writers and content creators develop compelling characters, generate new ideas, and plot engaging storylines. The platform uses natural language processing and machine learning techniques to analyze literature and generate new ideas based on user input. Users can input their own ideas or prompts and browse through existing literature to find inspiration for their writing. https://beta.tome.app/.

27. Quillbot: Quillbot is an AI-powered writing assistant that suggests rephrasing, summarizing, and paraphrasing for users. The platform uses advanced machine learning algorithms to analyze and improve the user's writing. Users can input their text, select from a range of templates, and access a variety of features including sentence rephrasing, tone and style change, and summarization. Quillbot can quickly generate a concise summary by capturing the most important information from longer articles or documents. https://quillbot.com/.

28. Turnitin: Turnitin is a web-based plagiarism detection software used by educators and institutions worldwide to ensure academic integrity. It works by comparing submitted papers to a vast database of academic and online sources to identify potential instances of plagiarism. https://www.turnitin.com/.

Apart from the tools mentioned above, there are several other tools available that can help researchers in their work. It is worth noting that while many of these tools are free and open-source, some do come with licensing restrictions or require payment for advanced features. Researchers should be aware of these restrictions and choose tools that are appropriate for their needs and budgets. In addition, always respect the copyright laws and obtain proper permissions before using copyrighted material.