Recently we’ve been busy updating Analyse-it to stay aligned with the latest updates to the CLSI protocols, and we have added a new inverse prediction feature.
If you have you can download and install the update now, see or visit the . If maintenance on your license has expired you can renew it to get this update and forthcoming updates, see .
New CLSI EP6-Ed2
The CLSI recently released guideline , which replaces the EP06-A published in 2003.
EP06-A relied on fitting linear (straight-line), 2nd-order (parabolic) and 3rd-order (sigmoidal) polynomials to the data. A method was then determined to be linear or possibly non-linear based on statistical criteria. The degree of nonlinearity was then calculated as the difference between the linear fit and the best-fitting non-linear model (parabolic or sigmoidal curves). Nonlinearity could then be compared against allowable nonlinearity criteria.
The new CLSI EP6-Ed2 protocol no longer requires fitting polynomial models to determine linearity. Instead, the deviation from linearity is calculated as the difference between the mean of each level and a linear fit through the data. That can then be compared against the allowable nonlinearity criteria. Other changes to the protocol affect the experimental design, and there is now more focus on the structure of the variance across the measuring interval.
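To make the calculation concrete, here is a minimal Python sketch of the idea described above: fit a straight line through all the data, then subtract the fitted value from the mean of each level. The data, the function names, and the use of an unweighted least-squares fit are assumptions for illustration only; the guideline itself should be consulted for the fit and weighting it prescribes.

```python
from statistics import mean

# Hypothetical data: assigned concentration -> replicate measurements.
levels = {
    1.0: [1.1, 0.9],
    2.0: [2.0, 2.2],
    3.0: [3.1, 2.9],
    4.0: [3.7, 3.9],
}

def ols_fit(xs, ys):
    """Unweighted least-squares intercept and slope for y = a + b*x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

def deviations_from_linearity(levels):
    """Mean of each level minus the straight-line fit at that level."""
    xs = [x for x, reps in levels.items() for _ in reps]
    ys = [y for reps in levels.values() for y in reps]
    intercept, slope = ols_fit(xs, ys)
    return {x: mean(reps) - (intercept + slope * x) for x, reps in levels.items()}

deviations = deviations_from_linearity(levels)
for x, d in deviations.items():
    print(f"level {x}: deviation {d:+.3f}")
```

Each deviation would then be compared against the allowable nonlinearity chosen for the method.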
Recent improvements to the , in version 5.50 and later, include the addition of probit regression. Probit regression is useful when establishing the detection limit (LoD) for an RT-qPCR assay.
The protocol provides guidance for estimating LoD and is recognized by the FDA. In this blog post, we will look at how to perform the relevant part of the CLSI EP17-A2 protocol using Analyse-it.
For details on experimental design, see section 5.5 in the CLSI EP17-A2 guideline. In Analyse-it, you should arrange the data in 2 columns: the first should be the concentration, and the second should be the result, positive or negative. You should have a minimum of 20 replicates at each concentration. We have put together a hypothetical example in the workbook which you can use to follow the steps below:
The analysis task pane opens.
NOTE: If using Analyse-it pre-version 5.65, on the Fit panel, in the Predict X given Probability edit box, type 0.95.
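As a rough illustration of what "Predict X given Probability" means for a probit model, the sketch below inverts a fitted probit curve to find the concentration at which 95% of results are expected to be positive. The coefficients and the function name are hypothetical, and the model form P(positive) = Φ(intercept + slope × x) on the untransformed concentration is an assumption; it is not taken from Analyse-it.

```python
from statistics import NormalDist

def predict_x_given_probability(intercept, slope, probability):
    """Invert a probit model P(positive) = Phi(intercept + slope * x) for x."""
    z = NormalDist().inv_cdf(probability)  # standard normal quantile
    return (z - intercept) / slope

# Hypothetical fitted coefficients, for illustration only.
intercept, slope = -2.0, 1.5

lod = predict_x_given_probability(intercept, slope, 0.95)
print(f"Concentration detected with 95% probability: {lod:.3f}")

# Sanity check: plugging the estimate back in recovers the target probability.
check = NormalDist().cdf(intercept + slope * lod)
```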
Following our last blog post, today, we will show how to calculate binary agreement using the . The protocol is a useful companion resource for laboratories and diagnostic companies developing qualitative diagnostic tests.
In Analyse-it, you should arrange the data in frequency or case form, as discussed in the blog post: . You can find an example of both and follow the steps below, using the workbook .
NOTE: The Average method is useful when comparing two laboratories or observers where neither is considered a natural comparator. The reference method is asymmetric, and the result will depend on the assignment of the X and Y methods, whereas the average method is symmetric, and the result does not change when swapping the X and Y methods.
INFO: Older versions of Analyse-it do not support the Average method, and the Agreement by category checkbox is called Agreement.
The analysis report shows positive and negative agreement statistics.
Due to COVID-19, there is currently a lot of interest surrounding the sensitivity and specificity of a diagnostic test. These terms relate to the accuracy of a test in diagnosing an illness or condition. To calculate these statistics, the true state of the subject, whether the subject does have the illness or condition, must be known.
In recent FDA guidance for laboratories and manufacturers, , the FDA state that users should use a clinical agreement study to establish performance characteristics (sensitivity/PPA, specificity/NPA). While the terms sensitivity/specificity are widely known and used, the terms PPA/NPA are not.
protocol describes the terms positive percent agreement (PPA) and negative percent agreement (NPA). When you have two binary diagnostic tests to compare, you can use an agreement study to calculate these statistics.
As you can see, these measures are asymmetric. That is, interchanging the test and comparative methods, and therefore the values of b and c, changes the statistics. They do, however, have a natural, simple, interpretation when one method is a reference/comparative method and the other a test method.
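The asymmetry is easy to see with a small Python sketch. The cell labels here are an assumption about the 2×2 layout: a = both methods positive, b = test positive and comparative negative, c = test negative and comparative positive, d = both negative. The counts are made up.

```python
def percent_agreement(a, b, c, d):
    """Return (PPA, NPA) with the comparative method as the reference."""
    ppa = a / (a + c)  # of the comparative positives, fraction the test also calls positive
    npa = d / (b + d)  # of the comparative negatives, fraction the test also calls negative
    return ppa, npa

# Hypothetical counts, for illustration only.
a, b, c, d = 40, 5, 10, 45

ppa, npa = percent_agreement(a, b, c, d)

# Interchanging the two methods swaps b and c, and the statistics change.
ppa_swapped, npa_swapped = percent_agreement(a, c, b, d)

print(f"PPA {ppa:.3f}, NPA {npa:.3f}")
print(f"After swapping methods: PPA {ppa_swapped:.3f}, NPA {npa_swapped:.3f}")
```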
It is important in diagnostic accuracy studies that the true clinical state of the patient is known. For example, in developing a SARS-CoV-2 antibody test, for the positive subgroup, you might enlist subjects who had a positive SARS-CoV-2 PCR test and clinically confirmed illness. Then, for the negative subgroup, you might use samples taken from subjects before the illness was in circulation. It is also essential to consider other factors, such as the severity of illness, as they can have a marked effect on the performance characteristics of the test. A test that shows high sensitivity/specificity in a hospital situation in very ill patients can be much less effective in population screening where the severity of the illness is less.
In cases where the true condition of the subject is not known, and only results from a comparative method and a new test method are available, an agreement measure is more suitable. We will cover that scenario in detail in a future blog post.
In our last post, we mentioned that the 'accuracy' statistic, also known as the probability of a correct result, was a useless measure for diagnostic test performance. Today we'll explain why.
Let's take a hypothetical test with a sensitivity of 86% and specificity of 98%.
As a first scenario we simulated test results on 200 subjects with, and 200 without, the condition. The accuracy statistic (TP+TN)/N is equal to (172+196)/400 = 92%. See below:
In a second scenario we again simulated test results on 400 subjects, but only 50 with, and 350 without, the condition. The accuracy statistic is (43+343)/400 = 96.5%. See below:
The accuracy statistic is effectively a weighted average of sensitivity and specificity, with weights equal to the sample prevalence P(D=1) and the complement of the prevalence (that is, P(D=0) = 1-P(D=1)).
Accuracy = P(TP or TN) = (TP+TN)/N = Sensitivity * P(D=1) + Specificity * P(D=0)
Therefore as the prevalence in the sample changes so does the statistic. The prevalence of the condition in the sample may vary due to the availability of subjects or it may be fixed during the design of the study. It's easy to see how to manipulate the accuracy statistic to weigh in favor of the measure that performs best.
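The two scenarios, and the weighted-average identity, can be checked with a few lines of Python. The function names are ours, chosen for illustration; the numbers are the ones used in the scenarios above.

```python
def accuracy_from_counts(sensitivity, specificity, n_with, n_without):
    """(TP + TN) / N for a sample with the given numbers of subjects."""
    tp = sensitivity * n_with      # expected true positives
    tn = specificity * n_without   # expected true negatives
    return (tp + tn) / (n_with + n_without)

def accuracy_from_prevalence(sensitivity, specificity, prevalence):
    """Sensitivity * P(D=1) + Specificity * P(D=0)."""
    return sensitivity * prevalence + specificity * (1 - prevalence)

sens, spec = 0.86, 0.98

scenario_1 = accuracy_from_counts(sens, spec, 200, 200)  # 200 with, 200 without
scenario_2 = accuracy_from_counts(sens, spec, 50, 350)   # 50 with, 350 without

print(f"Scenario 1 accuracy: {scenario_1:.1%}")
print(f"Scenario 2 accuracy: {scenario_2:.1%}")

# The same values follow from the sample prevalence alone.
weighted_1 = accuracy_from_prevalence(sens, spec, 200 / 400)
weighted_2 = accuracy_from_prevalence(sens, spec, 50 / 400)
```

The test itself is identical in both scenarios; only the mix of subjects changes, and the accuracy moves from 92% to 96.5%.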
There’s currently a lot of press attention surrounding the finger-prick antibody IgG/IgM strip test to detect if a person has had COVID-19. Here in the UK companies are buying them to test their staff, and some in the media are asking why the government hasn’t made millions of tests available to find out who has had the illness and could potentially get back to work.
We did a quick Google search, and there are many similar-looking test kits for sale. The performance claims on some were sketchy, with some using as few as 20 samples to determine their performance claim! However, we found a webpage for a COVID-19 IgG/IgM Rapid antibody test that used a total of 525 cases, with 397 positives, 128 negatives, clinically confirmed. We have no insight as to the reliability of the claims made in the product information. The purpose of this blog post is not to promote or denigrate any test but to illustrate how to look further than headline figures.
We ran the data through the version 5.51. Here's the workbook containing the analysis:
Our focus at Analyse-it has always been on the development and improvement of our software. While we provide extensive help, tutorials, and technical support for Analyse-it, one area we do not cover is training and consultancy. As many of you will know we are based in England in the United Kingdom, and providing training and consultancy is often done better locally, in-person.
Instead we partner with experts who can provide training and consultancy in various disciplines, in local language, and geographically near (or at least nearer) to our customers. You can always find a list of current consultant and training partners at
One of the experts we have had a long relationship with is Dr. Thomas Keller. Dr. Keller is an independent statistician and has run for 15 years. One of his many areas of expertise is the planning and evaluation of experiments for method validation, and he has been involved in international working groups (IFCC, CLSI) in the fields of clinical chemistry and laboratory medicine. Dr. Keller was actually a customer and started to provide training in Analyse-it shortly after. His reputation is second to none in the industry and he has provided consultancy and training to many companies using Analyse-it. See an example of a offered by Dr. Keller. He also provides for anything from simple questions to full courses for individuals and small groups.
It’s been a long-requested feature, and today we’re happy to announce that Analyse-it version 5.10 now includes the ability to save the dataset filter with an analysis and re-apply it on recalculation.
Analyse-it always allowed you to use Excel auto-filters to quickly limit analysis to just a subset of the data, but until now that filter wasn’t saved. Each time you recalculated the analysis it was based on the currently active filter rather than the filter in effect when you created the analysis.
This “active filter” method had its uses in exploratory data analysis: you can easily create an analysis, adjust the filter criteria, click Recalculate to see the changes to the analysis, then repeat as necessary to explore the data. But it also had its limitations. For example, if you created two analyses to look at subjects where Age > 20 and Age <= 20, simply clicking Recalculate on those analyses could get you in a mess if you didn’t reset the filter conditions on each analysis before recalculating.
Update 19-Sep-2019: Unfortunately this continues to be an issue for some users, and there is currently no solution from Microsoft except to suggest the use of compatibility mode as detailed below. We have requested this be fixed, so please up-vote it at
Update 27-Jun-2018: Although we have a fix for this issue on an internal build, it appears that Microsoft Office version 1807 (which is currently only available on the Office Insider track) fixes this issue. The missing user-interface problem was caused by a bug in Microsoft Office 1805/1806 updates. We will release our fix shortly, but the 1807 version update will also become available to everyone over the next month or so. If you want to get it immediately see .
Microsoft has recently released updates to both and to provide support for multiple monitor high DPI (dots-per-inch) displays.
In the early days of Microsoft Windows, monitors were assumed to have 96 DPI, and applications were built with a user interface fixed to that assumption. In the last 15 years, monitors with higher DPI have started to appear, with the benefit that on-screen text and graphics look much smoother because there are so many more dots per inch. That caused problems for many applications that assumed 96 DPI, causing their user interface to scale improperly on high-DPI monitors. Applications like Analyse-it supported high DPI monitors and adjusted their user interface appropriately... until now.
Prediction intervals on Deming regression are a major new feature in the Analyse-it Method Validation Edition version 4.90, just released.
A prediction interval is an interval that has a given probability of including one or more future observations. Prediction intervals are very useful in method validation for testing the commutability of reference materials or processed samples with patient samples. Two CLSI protocols, and both use prediction intervals.
We will illustrate this new feature using an example from CLSI EP14-A3:
1) Open the workbook .
2) On the Analyse-it ribbon tab, in the Statistical Analysis group, click Method Comparison and then click Ordinary Deming regression.
3) In the X (Reference / Comparative) drop-down list, select Cholesterol: A.
4) In the Y (Test / New) drop-down list, select Cholesterol: B.
5) On the Analyse-it ribbon tab, in the Method Comparison group, click Restrict to Group.
Often we collect a sample of data not to make statements about that particular sample but to generalize our statements to say something about the population. Estimation is the process of making inferences about an unknown population parameter from a random sample drawn from the population of interest. An estimator is a method for arriving at an estimate of the value of an unknown parameter. Often there are many competing estimators for the population parameter that differ based on the underlying statistical theory.
A point estimate is the best estimate, in some sense, of the population parameter. The most well-known estimator is the sample mean which produces an estimate of the population mean.
It should be obvious that any point estimate is not absolutely accurate. It is an estimate based on only a single random sample. If repeated random samples were taken from the population the point estimate would be expected to vary from sample to sample. This leads to the definition of an interval estimator which provides a range of values defined by the limits [L, U].
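As a simple illustration of a point estimate and an interval [L, U], the sketch below computes the sample mean and a large-sample interval around it. This is only one possible interval estimator: it uses a normal approximation (an assumption made here to keep the example self-contained), and for small samples a t-based interval would be somewhat wider. The data and function name are made up.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def mean_interval(sample, confidence=0.95):
    """Normal-approximation interval [L, U] for the population mean."""
    point = mean(sample)                               # point estimate
    std_error = stdev(sample) / sqrt(len(sample))      # standard error of the mean
    z = NormalDist().inv_cdf(0.5 + confidence / 2)     # two-sided critical value
    return point - z * std_error, point + z * std_error

# Hypothetical sample, for illustration only.
sample = [4.8, 5.1, 5.0, 4.7, 5.3, 5.2, 4.9, 5.0, 5.1, 4.9]

lower, upper = mean_interval(sample)
print(f"Point estimate: {mean(sample):.3f}")
print(f"Interval estimate: [{lower:.3f}, {upper:.3f}]")
```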
As we mentioned last week in the , in this release we took the opportunity to revamp the documentation.
The revamp involved rewriting many topics to make the content clearer, adding new task-oriented topics, including refresher topics on common statistical concepts, and improving the indexing and links between topics so you can more easily navigate the help system.
The new task-oriented topics give you step-by-step instructions on completing common tasks. For example you will now find topics on how to , , , and even simple tasks like . We have also fully documented the supported dataset layouts for each type of analysis so you can see how to arrange your data for Analyse-it. The links in each topic help you more easily find related topics, for example links to topics on how to interpret the statistics, links to explain the pros and cons of the available statistical tests, links to topics for common tasks, and a link showing you how to arrange the dataset.
Of all the requests, the most customer-requested improvement is the new . Previously we offered EPUB and Kindle reader editions of the help, but not PDF. To be honest, producing a PDF user guide from the tools we use to write the help was a real technical challenge. The PDF produced just wasn’t good enough for us, and certainly not for our customers – the formatting and layout were poor, indexing was non-existent, and there were so many other niggles. So we took the time to make the user-guide both look good and be usable. Take a look and let us know what you think!
Last week we released version 4.80 of Analyse-it.
The new release includes multi-way , , and in the Standard edition, and since every licence includes the Standard edition, these features are available to all users. We also took the opportunity to revamp the and develop a . We’ll go into more details on the improvements in the next few weeks.
If you have you can download and install the update now, see . If maintenance on your license has expired you can renew it to get this update and forthcoming updates, see .
Today we released version 3.80 of the Analyse-it Standard edition.
The new release includes Principal Component Analysis (PCA), an extension to the multivariate analysis already available in Analyse-it. It also includes probably the most advanced implementation of biplots available in any commercial package.
New features include:
The tutorial walks you through a guided example looking at how to use correlation and principal component analysis to discover the underlying relationships in data about New York Neighbourhoods. It demonstrates the amazing new features and helps you understand how to use them. You can either follow the tutorial yourself, at your own pace, or .
If you have you can download and install the update now, see . If maintenance on your licence has expired you can renew it to get this update and forthcoming updates, see .
If you you will no doubt already know about the recent improvements in the Analyse-it Method Validation edition and the release of our first video tutorial. If not, now is a good time to since we post short announcements and feature previews on Facebook, and use the blog only for news about major releases.
The latest changes and improvements to the Analyse-it Method Validation edition include:
Finally, we are delighted to release our first video tutorial. The tutorial is the video equivalent of the tutorial above. It walks and talks you through using Analyse-it to determine the agreement between methods. Sit back and .
We intend to produce more video tutorials in future, so let us know what you think: what you like, dislike, and how we can improve them in future.
What is a sample quantile or percentile? Take the 0.25 quantile (also known as the 25th percentile, or 1st quartile) -- it defines the value (let’s call it x) for a random variable, such that the probability that a random observation of the variable is less than x is 0.25 (25% chance).
A simple question, with a simple definition? The problem is calculating quantiles. The formulas are simple enough, but take a quick look on Wikipedia and you’ll see there are at least 9 alternative methods. Consequently, statistical packages use different formulas to calculate quantiles. And we're sometimes asked why the quantiles calculated by Analyse-it don’t agree with Excel, SAS, or R.
Excel uses formula R-7 (in the Wikipedia article) to calculate the QUARTILE and PERCENTILE functions. Excel 2010 introduced two new functions that use slightly different formulas, with different denominators: PERCENTILE.INC and PERCENTILE.EXC.
SAS, R and some other packages let you choose which formula is used to calculate the quantiles. While this provides some flexibility, as it lets you reproduce statistics calculated using another package, the options can be confusing. Most non-statisticians don’t know when to use one method over another. When would you use the "Linear interpolation of the empirical distribution function" versus the "Linear interpolation of the modes for the order statistics for the uniform distribution on [0,1]" method?
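To show how much the choice of formula can matter, here is a small Python sketch of two of the methods listed in the Wikipedia article, R-7 and R-6. Both interpolate linearly between neighbouring order statistics and differ only in how the position h is computed. The data and function name are made up, and this is not a description of how any particular package implements them.

```python
from math import floor

def quantile(sorted_values, p, method="R-7"):
    """Quantile by linear interpolation between order statistics."""
    n = len(sorted_values)
    if method == "R-7":
        h = (n - 1) * p + 1   # 1-based position used by R-7
    elif method == "R-6":
        h = (n + 1) * p       # 1-based position used by R-6
    else:
        raise ValueError(f"unsupported method: {method}")
    h = min(max(h, 1), n)     # keep the position inside the sample
    lower = floor(h)
    upper = min(lower + 1, n)
    fraction = h - lower
    x_lower, x_upper = sorted_values[lower - 1], sorted_values[upper - 1]
    return x_lower + fraction * (x_upper - x_lower)

# Hypothetical sample, for illustration only.
data = sorted([1, 3, 6, 10, 15, 21, 28, 36])

q_r7 = quantile(data, 0.25, "R-7")
q_r6 = quantile(data, 0.25, "R-6")
print(f"0.25 quantile, R-7: {q_r7}")
print(f"0.25 quantile, R-6: {q_r6}")
```

With only eight observations the two methods give noticeably different first quartiles, which is the kind of disagreement users notice when comparing packages.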
Yesterday we improved the help in the and added a statistical reference guide. The guide tells you about the statistical procedures in Analyse-it, with help on using and understanding the plots and statistics. It’s a work in progress, and we intend to improve it further with your comments and feedback, but it’s important to understand the role of the guide.
Firstly, the guide is not intended to be a statistics textbook. While it covers key concepts in statistical analysis, it is no substitute for learning statistics from a good teacher or textbook.
Secondly, the guide does not include the mathematical formulas behind the statistics. While an understanding of the mathematics is useful, it is better to understand the practical application of statistics: when and where they can be used, and how to interpret the results. Software makes it unnecessary to know the exact formulas, and often the exact mathematics used in software differ from those in textbooks since optimised routines are used to ensure good performance and numerical precision.
In clearly titling this blog post, we’ve probably already revealed the answer, but... Can you spot the difference between the two rows of values in the Excel spreadsheet shown below?
Sorry, it’s a trick question, because (visually) there is no difference. The difference is how the values are stored by Microsoft Excel. The value 57 in the cell on the second row is actually stored as a text string, not a number.
When you type a value into a cell, Excel looks at what you’ve typed and decides whether it’s a valid number. If it is, the value is stored as a number, and if not it’s stored as text (a string of characters).
Considering this, how is it possible for Excel to store a value that looks like a number as text? There are a few ways. Most common is when you copy-paste data from another application, and the application providing the data fools Excel into believing the values should be stored as text. Similarly, if you import data from a database field that contained numbers stored as text, the numbers will be imported as text. Finally, you can force Excel to store a number as text by prefixing it with an apostrophe (').
Today we’re delighted to publish the second case study into the use of Analyse-it.
The case study features a national clinical laboratory in the USA that offers more than 2,000 tests and combinations to major commercial and government laboratories. They use Analyse-it to determine analytical performance of automated immunoassays for some of the industry’s leading in-vitro diagnostic device makers -- including Abbott Diagnostics, Bayer Diagnostics, Beckman Coulter and Roche Diagnostics.
Unfortunately we cannot name the end-user, or the organisation she works for, in the case study. Although she was delighted to feature in the case study, at final approval her organisation's committee preferred the names be withheld. Thankfully they have allowed us to use the case study, albeit anonymously.
You can online now or download the .
We would love to feature more customer stories in case studies. If you can get approval to participate – which we realise is very difficult in many industries – and have 20 minutes to spare for a telephone interview, please contact us at .
In a previous post, , we explained the tests provided in Analyse-it to determine if a sample has normal distribution. In that post, we mentioned that although hypothesis tests are useful you should not solely rely on them. You should always look at the histogram and, maybe more importantly, the normal plot.
The beauty of the normal plot is that it is designed specifically for judging normality. The plot is very easy to interpret and lets you see where the sample deviates from normality.
As an example, let’s look at the distribution of systolic blood pressure, for a random group of healthy patients. Analyse-it creates the histogram (left) and normal plot (right) below:
Looking at the histogram, you can see the sample is approximately normally distributed. The bar heights for 120-122 and 122-124 make the distribution look slightly skewed, so it’s not perfectly clear.
The normal plot is clearer. It shows the observations on the X axis plotted against the expected normal score (Z-score) on the Y axis. It’s not necessary to understand what an expected normal score is, nor how it’s calculated, to interpret the plot. All you need to do is check that the points roughly follow the red line. The red line shows the ideal normal distribution with the mean and standard deviation of the sample. If the points roughly follow the line – as they do in this case – the sample has normal distribution.
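For readers who are curious, the sketch below shows one way the plotted pairs could be computed in Python. The plotting-position formula used here, (i − 0.375) / (n + 0.25), is one common approximation and is an assumption on our part; it is not necessarily the formula Analyse-it uses. The blood pressure values are made up.

```python
from statistics import NormalDist

def expected_normal_scores(n):
    """Approximate expected normal scores for a sample of size n."""
    standard_normal = NormalDist()
    return [standard_normal.inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]

# Hypothetical systolic blood pressure readings, for illustration only.
systolic = [118, 121, 119, 124, 122, 120, 123, 117, 121, 125]

observations = sorted(systolic)                 # X axis: ordered observations
scores = expected_normal_scores(len(systolic))  # Y axis: expected normal scores

for x, z in zip(observations, scores):
    print(f"{x:>5}  {z:+.2f}")
```

If the sample is roughly normal, these (observation, score) pairs fall close to a straight line.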
A customer contacted us last week to ask how to refer to cells on an Analyse-it report worksheet, from a formula on another worksheet. The customer often used Analyse-it's refresh feature, to repeat the statistical analysis and update the statistics, and direct references to cells on the report were being lost on refresh.
As an example, suppose you have used Analyse-it linear regression to calculate the linear relationship between installation cost and the number of employees required, distance to the site, and the cost of machine being installed. Analyse-it would calculate the effect of each variable on the final cost, technically known as regression coefficients, which you can then use to predict installation costs for jobs in future.
You might set up a worksheet to predict and quote installation costs for future jobs. You could use an Excel formula to reference the coefficients directly from the Analyse-it report, for example:
= Employees * CostAnalysis!C17 + Distance * CostAnalysis!C18 + MachineCost * CostAnalysis!C19
Today we’re delighted to publish the first case study into the use of Analyse-it.
Marco Balerna Ph.D., a Clinical Chemist at the in Switzerland, used Analyse-it when replacing the clinical chemistry and immunological analysers in EOC’s laboratories.
Since the EOC provides clinical chemistry services to five large hospitals and three small clinics in the region, it was essential the transition to the new analysers went smoothly. Marco used Analyse-it to ensure the analyser’s performance met the manufacturer’s claims, to ensure the reporting of patient results was not affected, and to comply with the regulations of the EOC’s accreditation.
Overall the project involved comparing performance for 110-115 parameters, comprising over 25,600 measurements with control materials and patient samples.
Marco was so impressed with Analyse-it and the time he saved, that he was very enthusiastic when we asked if we could feature his story in a case study. We would like to publicly thank Marco for his co-operation in the case study. Grazie Marco! Salute!
Although the charts in Analyse-it are large so they’re easy to read when printed, sometimes you need to print a chart to fill the full page. You can do so easily, without resizing the chart, in just a few steps:
Chart size is only limited by the page size your printer supports.
Identifying what was analysed, when, and by whom, is the first step in understanding any Analyse-it report. The top rows of each Analyse-it report provide you with this information. The statistical test used, dataset and variables analysed, user who analysed, and the date and time last analysed, are included (see below). When you print the report the header is repeated at the top of each printed page.
The date the report was last updated is included so you can see when reports are out of sync with changes made to the dataset. It’s also useful if you archive analysis reports and need to know when the analysis was performed. For brevity Analyse-it shows only the date, but the cell also contains the time of the last update to the report. To see the time, click the cell containing the date to activate it, and then look at the Excel formula bar to see the time (see screenshot above).
To aid traceability, Analyse-it includes the name of the user who last updated the report. Analyse-it gets the name from the Microsoft Office user name. The user name is shared among all Microsoft Office applications, including Word, Excel, and PowerPoint. Office applications use the name to identify changes in documents, and store it in the document properties to identify who created, last edited, or modified an Office document. Analyse-it includes the name in the report header so you can quickly see who last analysed the data, should you need to contact them.
In May this year, we surveyed users of the Analyse-it Method Evaluation edition to gain insight into how we can improve Analyse-it in future. Thank you to all those who responded.
In the responses, one issue became clear: the unfiled reports feature causes confusion.
When you run an analysis, Analyse-it creates a new worksheet containing the statistics and charts for that analysis (what we call a report). Analyse-it places the report in a temporary workbook called . From there you can then decide what you want to do with the analysis: keep it, print it, e-mail it, or discard it. If you want to keep it you click the (see below), and Analyse-it moves the report into the same workbook as your dataset.
You might wonder where the idea of unfiled reports originated. It was actually a carry-over from , the predecessor to Analyse-it. We implemented the same feature in Analyse-it without really questioning it. We thought the feature would be useful to help you manage reports, plus most Astute users upgrading would expect it.
The most used distribution in statistical analysis is the normal distribution. Sometimes called the Gaussian distribution, after , the normal distribution is the basis of much parametric statistical analysis.
Parametric statistical tests often assume the sample under test is from a population with normal distribution. By making this assumption about the data, parametric tests are more powerful than their equivalent non-parametric counterparts and can detect differences with smaller sample sizes, or detect smaller differences with the same sample size.
It’s vital you ensure the assumptions of a parametric test are met before use.
If you’re unsure of the underlying distribution of the sample, you should check it.
Only when you know the sample under test comes from a population with normal distribution – meaning the sample will also have normal distribution – should you consider skipping the normality check.
Many variables in nature naturally follow the normal distribution, for example, biological variables such as blood pressure, serum cholesterol, height and weight. You could choose to skip the normality check in these cases, though it’s always wise to check the sample distribution.
For new and occasional Analyse-it users, datasets can sometimes seem confusing. Today we’ll explain why we devised the 'dataset' concept, a concept now copied by some other Excel add-ins.
We introduced the dataset concept so Analyse-it could automatically pick-up the data and variables from your Excel worksheet. As we found with , the Analysis Toolpak, and other Excel add-ins, forcing you to select cells containing the data to be analysed can be problematic:
Statistics software is supposed to simplify what’s already a complex and error-prone subject area. Forcing you to select ranges of cells just seemed, to us, to introduce more potential for errors.
In Analyse-it we tried to solve all these problems. We wanted the software to do more of the work, eliminating the need to select or re-organise your data. You should be able to:
These requirements mean Analyse-it has to know exactly how your data is arranged on an Excel worksheet -- which cells contain data for analysis, and which cells contain the variable names.
A few readers have e-mailed to ask for more information about the book by David J. Sheskin we alluded to in the comment reply re: the , last week.
The book is the Handbook of Parametric & Non-parametric Statistical Procedures, by David J. Sheskin, ISBN: 1584888148.
We have the third edition of the book which runs to over 1,200 pages -- a phenomenal piece of work for a single (obviously very dedicated) author. While it’s not a book you would sit down and read cover-to-cover, it is a very readable reference guide, covering all the parametric and non-parametric statistical procedures included in Analyse-it.
For beginners the book starts at the very beginning, introducing summary statistics such as the mean, median, then moving on to explain concepts such as measurement scales, central tendency, variability, normal distribution, hypothesis testing, parametric and non-parametric statistics. The text is concise, but is clear, easy to read, and easy to understand -- ideal for anyone needing a refresher course on statistics.
Most of you know where to find the help and examples provided with Analyse-it, but if not, today we’d like to explain what’s available. If you're stuck we're always happy to help, and usually respond within a few hours, but it's always faster for you to check if the help answers your question first.
If you’re new to Analyse-it, or want a quick refresher, the best place to start is the Getting Started tutorial. It’s completely automated, no typing is required, so all you have to do is sit back and watch. In just 10 minutes it will demonstrate how to set up a dataset, how to filter the dataset, how to run a statistical test, and how to edit, refresh, and print the reports.
To watch the tutorial:
The application help provided with Analyse-it is a complete reference covering all aspects of Analyse-it: how to install Analyse-it, start it, lay out datasets, manage reports, and how to use the statistical tests. You can either browse the contents to learn about Analyse-it, or use the index or search to quickly find the right topic. Index and search were only recently added to the help, in Analyse-it 2.10, so if you’re not using the latest version .