Introduction
McMaille (pronounced "MacMy"; maille is French for cell) is a program for indexing powder patterns by Monte Carlo and grid search. The 2-theta peak positions extracted by a peak hunting program are used together with the intensities to build a pseudo powder pattern. Patterns calculated from the cell parameters proposed by a Monte Carlo or a systematic grid search process are then compared to it. In McMaille versions 0.9-2.0, the calculated intensities were adjusted by a Le Bail fit (applying 3 iterations of the Rietveld decomposition formula) using Gaussian peak shapes. In version 3.0, a factor of 20 in time is gained by using columnar peak shapes and a "fit" measured as the percentage of inclusion of the calculated columns inside the "observed" ones. The best cells are refined, more or less. This is similar to the (unnamed and still unavailable ?) software by B.M. Kariuki et al., J. Synchrotron Rad. 6 (1999) 87-92, though the latter uses a genetic algorithm and the raw data. Moreover, McMaille offers an option for simultaneous two-phase indexing and an automated expert system (black box mode) with a simplified manual (recommended for beginners). Version 4.00 is available as two executables, one for single processor machines and another, parallelized and much faster, for multi core machines (core duo, dual core, quad core, etc).
Main improvements in version 4.00 :
Armel Le Bail - Last update : November 2006
Download
McMaille version 4.00 is distributed under the conditions of the GNU General Public Licence (GPL). The zipped package contains the executable for Windows 95/98/NT/XP, the FORTRAN source codes (quite short and documented) for both the single and the multi processor machines, and some examples described below. Get it : McMaille-v4.zip
The compiler used for building the executable was Intel Visual Fortran 9.1. The parallelized version is realized by using OpenMP directives.
The package McMaille-v4.zip mainly contains :
McMaille.exe : executable for MS Windows, single processor
PMcMaille.zip : contains the executable for the multi core processors
McMaille-v4.html : the complete manual
McMaille.pdf : copy of the McMaille publication
tests.zip : various classical test files
cub.hkl, hex.hkl, rho.hkl, tet.hkl, ort.hkl, mon.hkl, tri.hkl
More about the .hkl files :
References In case of successful use, please cite the paper :
See comparisons of indexing software including McMaille :
Visit the Indexing Benchmarks Web page at :
The manual : McMaille-v4.html included in the package is also available
at :
More on the method
Read the paper cited above for full details. As soon as a Monte Carlo cell proposal produces Rp < Rmaxref ~0.5 (Rp being defined similarly to Rp in the Rietveld method), that cell is examined more closely. Because a least squares refinement would not be efficient, the cell parameters are changed slightly and randomly (NCYCLES times, see below; in the range 0. to 0.02 Angstroms and 0. to 0.2 degrees) around their initial values by the Monte Carlo process, checking whether Rp decreases. Most of the time Rp decreases enormously, sometimes below the selected Rmax (a limit value for keeping the cell) and Rmin (another limit value for stopping the run because, with such a fit quality, the cell could be the right one). This cell adjustment is analogous to simulated annealing. Moreover, a second criterion is used : if the expected number of peaks (NDAT-NIND) is explained while Rp > Rmaxref, that cell proposal is examined too. This is a brute force indexing approach, very simple to develop. Least squares parameter refinements (using the old CELREF routine by Laugier & Filhol) are performed at the end on the selected cell(s).
Some important values defined in the program are below :

Symmetry       Nhkl Min   Nhkl Max   NCYCLES   NTRIED/NSOL
cubic          6xNDAT     400        200       100
rhombohedral   12xNDAT    600        500       1000
hexagonal      12xNDAT    800        500       1000
tetragonal     12xNDAT    800        500       1000
orthorhombic   20xNDAT    1000       1000      10000
monoclinic     20xNDAT    1000       2000      100000
triclinic      20xNDAT    1000       5000      100000

NDAT = Number of powder pattern peaks examined
Nhkl = Number of calculated hkl compared to the data (read in the .hkl files)
NCYCLES = Number of small random parameter changes for a given selected cell proposal (having Rp < Rmaxref)
NTRIED = Number of Monte Carlo events
NSOL = Number of solutions retained having Rp < Rmax
The NTRIED/NSOL ratio helps to reduce the number of retained cells. If its value is smaller than the number listed above, then Rmax is decreased by 5%.
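The random cell adjustment described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the McMaille FORTRAN code : `adjust_cell` and `toy_rp` are hypothetical names, `rp` stands for any user-supplied function returning the profile factor of a cell, each trial is drawn around the best cell found so far, and all six parameters are treated as free (no symmetry constraint). Only the step ranges (0.02 Angstrom, 0.2 degree) and the NCYCLES loop come from the text.

```python
import random

def adjust_cell(cell, rp, ncycles, rng, max_dlen=0.02, max_dang=0.2):
    """Randomly perturb a cell ncycles times, keeping only changes that lower Rp.

    cell : [a, b, c, alpha, beta, gamma] in Angstroms and degrees
    rp   : hypothetical callable returning the profile factor of a cell
    rng  : random.Random instance, so that runs can be reproduced
    """
    best = list(cell)
    best_rp = rp(best)
    for _ in range(ncycles):
        trial = list(best)
        for i in range(6):
            # lengths move by at most max_dlen, angles by at most max_dang
            step = max_dlen if i < 3 else max_dang
            trial[i] += rng.uniform(-step, step)
        trial_rp = rp(trial)
        if trial_rp < best_rp:  # accept the trial only if Rp decreases
            best, best_rp = trial, trial_rp
    return best, best_rp

def toy_rp(cell, target=(10.25, 10.25, 10.25, 90.0, 90.0, 90.0)):
    """Toy stand-in for Rp (assumption for the demo): distance to a known cell."""
    return sum(abs(x - t) for x, t in zip(cell, target))

if __name__ == "__main__":
    rng = random.Random(1)
    start = [10.30, 10.20, 10.28, 90.1, 89.9, 90.05]
    cell, value = adjust_cell(start, toy_rp, ncycles=200, rng=rng)
    print("start Rp :", toy_rp(start))
    print("final Rp :", value)
```

In a real indexing run, `rp` would be replaced by the comparison between the pseudo powder pattern and the pattern calculated from the trial cell.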
However, this automatic decrease of Rmax is not active if NSOL < 50, and Rmax should be given as a negative value. Avoiding an overload of cell proposals is better achieved by decreasing the control parameters W (peak width) and/or Nind (number of non-indexed peak positions tolerated) and/or Rmaxref (the Rp level below which a cell will be refined).
The figures of merit (F.o.M.) as applied in McMaille
There are 4 F.o.M. used in McMaille :
1 - Classical P.M. de Wolff (1968) : M20
P.M. de Wolff, J. Appl. Cryst. 1 (1968) 108-113.
2 - Classical Smith & Snyder (1979) : FN (F20 for N=20)
G.S. Smith and R.L. Snyder, J. Appl. Cryst. 12 (1979) 60-65.
3 - Rp, which is equivalent to a Rietveld profile R factor.
H.M. Rietveld, J. Appl. Cryst. 2 (1969) 65-71.
4 - The new Rp-based McMaille F.o.M., McM20, calculated for 20 observed lines as :
McM20 = [100./(Rp*N20)] * Brav * Sym
where :
N20 is the number of possibly existing lines up to the 20th observed line (for a P lattice).
Brav is a factor equal to 6 for F and R Bravais lattices, 4 for I, 2 for A, B, C and 1 for P.
Sym is a factor equal to 6 for a cubic or a rhombohedral cell, 4 for a trigonal/hexagonal/tetragonal cell, 2 for an orthorhombic cell, and 1 for a monoclinic or triclinic cell.
Note that the M20 and F20 above are proposed without taking account of extinctions (P lattice). The larger M20, F20 and McM20 are, the better the solution. For Rp it is the reverse : the smaller Rp is, the better the fit. McM20 is the best at clearly separating the most probable solutions, owing to its consideration of symmetry and Bravais lattices. However, F20 and M20 may be artificially high in McMaille because some lines can be eliminated at the refinement stage if they show excessive discrepancy (decreasing the average discrepancy...).
Parameters
Running McMaille (either by clicking on McMaille.exe and giving the entry file name, without extension, or in a DOS box by typing "McMaille name") requires a parameters data file.
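As a side illustration of the McM20 definition given in the figures of merit section above, the formula can be written as a short Python function. The function name `mcm20`, the dictionary names and the spelling of the keys are assumptions made for this sketch; the factor values are those listed in the text.

```python
# Bravais lattice factors, as listed in the text
BRAV = {"F": 6, "R": 6, "I": 4, "A": 2, "B": 2, "C": 2, "P": 1}

# Crystal system factors, as listed in the text
SYM = {
    "cubic": 6, "rhombohedral": 6,
    "trigonal": 4, "hexagonal": 4, "tetragonal": 4,
    "orthorhombic": 2,
    "monoclinic": 1, "triclinic": 1,
}

def mcm20(rp, n20, lattice, system):
    """McM20 = [100 / (Rp * N20)] * Brav * Sym.

    rp      : profile factor of the cell (must be > 0)
    n20     : number of possible lines up to the 20th observed line (P lattice)
    lattice : one-letter Bravais lattice symbol, e.g. "P" or "F"
    system  : crystal system name, e.g. "cubic"
    """
    if rp <= 0 or n20 <= 0:
        raise ValueError("rp and n20 must be positive")
    return 100.0 / (rp * n20) * BRAV[lattice] * SYM[system]

if __name__ == "__main__":
    # Same fit quality, different symmetry: the higher symmetry is favoured
    print(mcm20(0.05, 40, "P", "monoclinic"))
    print(mcm20(0.05, 40, "F", "cubic"))
```

With the same Rp and N20, an F-centred cubic cell scores 36 times higher than a primitive monoclinic one, which is how McM20 separates the high-symmetry candidates from the rest.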
A typical data file (should be named name.dat, name being your choice) follows :

Sr2Cr2O7                          Title
1.54056 0.0 2                     Wavelength, Zeropoint, Ngrid
1 1 1 0 0 0                       Symmetry codes
0.16 6                            W, Nind
3. 15. 200. 1500. 0.1 0.2 0.4     Pmin, Pmax, Vmin, Vmax, Rmin, Rmax, Rmaxref
0.2 0.2                           Spar, Sang (grid search only)
20000 1                           Ntests, Nruns (Monte Carlo only)
!!! A line starting by ! is ignored
11.180 345.                       2-theta (or d(A)), Intensity
12.217 1120.                      Etc
15.835 124.                       20 couples of positions and
18.709 455.                       intensities should cover usual
Etc                               cases, but more may be necessary (max = 100)

Or, if W above is negative :

11.180 345. 0.16                  2-theta, Intensity, W
12.217 1120. 0.10                 Etc
15.835 124. 0.24                  triplets of positions,
18.709 455. 0.16                  intensities and widths
Etc

In Black box mode, the file is much shorter :

Sr2Cr2O7                          Title
1.54056 0.0 -3                    Wavelength, Zeropoint, Ngrid
!!! A line starting by ! is ignored
11.180 345.                       2-theta, Intensity
12.217 1120.                      Etc
15.835 124.                       20 couples of positions and
18.709 455.                       intensities. You may put more
Etc                               but only 20 will be used.

Title : for your problem identification.
Wavelength : your experiment wavelength. If you used CuKalpha, you should have stripped alpha2 before hunting for peak positions. If you try to index large cells (proteins, etc), then consider dividing the wavelength by 5 or 10, so that you will obtain cell parameters divided by 5 or 10 as well.
Zeropoint : your powder pattern zeropoint (global value including the zero due to the diffractometer and the zero due to sample misplacement; it will be added to the data). It is recommended to have a standard compound mixed with your sample or to apply the harmonics method for zeropoint estimation.
Ngrid : code for the process to be applied
In black box mode, the next lines should directly be the couples of 2-theta and intensity values (see the nameb.dat files).
NOTE-1 : grid search in triclinic is not implemented (it would take too long...)
Symmetry codes : 6 codes selecting the crystal systems to be explored.
W : the width of the columnar peak shape in degrees. It is recommended to choose W = 2 * FWHM, as a minimum. Using 0.2 < W < 0.3 should produce some correct cells for in-lab data at ~1.5 A wavelength. Using 0.05 < W < 0.15 could be applicable to data coming from a synchrotron facility at ~0.7 A wavelength (extremely good peak positions are certainly required, anyway). This parameter should reflect your data accuracy : it is close to a tolerated error. Large values (0.30 for a copper target) give the Monte Carlo process more chance to find a minimum easily, but the risk is to be overloaded by false proposals. Play with it... The fact is that most of the test cases will produce the correct solution faster with W=0.5. Being overloaded by cell proposals is resolved by decreasing W (peak width), Nind or Rmaxref.
NOTE : if W is negative, then triplets of [2-theta, I and Width] values are read instead of doublets of [2-theta and I] values. Moreover, these widths are multiplied by -W (so use W=-1 if you do not wish to change the widths, W=-2 if you want to enlarge them by a factor 2, etc).
Nind : number of non-indexed reflections you tolerate. Why not 2-6 for a set of 20 hkl ? Being overloaded by cell proposals is avoided by decreasing Nind (or W or Rmaxref). The larger Nind is, the longer the calculations...
Pmin, Pmax : minimum and maximum cell parameters for the search. Try first 2-15 or 2-20, then, if no solution appears, increase Pmax.
NOTE : If Pmin is negative, then it becomes possible to play more on the individual parameter limits, and a supplementary line should be given with 12 values :
Vmin, Vmax : minimum and maximum cell volumes for the search. Try first small volumes 20-400, then increase Vmax if no solution occurs.
Rmin, Rmax, Rmaxref : Rp profile reliability factor limits.
NOTE : the line including the 2 following parameters is optional (should not occur if NGRID = 0)
Spar : grid search step applied to the cell parameters.
Sang : grid search step applied to the cell angles.
NOTE : the line including the 2 following parameters is optional (should not occur if NGRID = 1)
Ntests : number of Monte Carlo tests. Use 500-10000000000 or more.
Nruns : number of Monte Carlo runs. One run will execute Ntests tests.
2-theta (or d(A)), Intensity : values obtained at the peak hunting step.
McMaille expects very accurate peak positions,
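As an illustration of the file layout described in this section, the following Python sketch builds a minimal black box mode data file from a list of peaks. The function names `black_box_lines` and `write_black_box_dat` and the number formatting are assumptions made for the example; only the order of the lines (title, then wavelength, zeropoint and Ngrid, then the 2-theta and intensity couples) and the "!" comment convention are taken from the sample files above.

```python
from pathlib import Path

def black_box_lines(title, wavelength, zeropoint, peaks, ngrid=-3):
    """Return the lines of a black box mode data file.

    peaks : sequence of (two_theta, intensity) couples, in the order in which
            they should be written
    ngrid : 3 or -3, the two black box codes named in the manual
            (-3 avoids the triclinic search)
    """
    if ngrid not in (3, -3):
        raise ValueError("black box mode expects Ngrid = 3 or -3")
    lines = [title, f"{wavelength} {zeropoint} {ngrid}"]
    lines.append("! 2-theta, Intensity")  # lines starting with ! are ignored
    for two_theta, intensity in peaks:
        lines.append(f"{two_theta:.3f} {intensity:.1f}")
    return lines

def write_black_box_dat(path, title, wavelength, zeropoint, peaks, ngrid=-3):
    """Write the data file to path, e.g. 'name.dat'."""
    text = "\n".join(black_box_lines(title, wavelength, zeropoint, peaks, ngrid))
    Path(path).write_text(text + "\n")

if __name__ == "__main__":
    peaks = [(11.180, 345.0), (12.217, 1120.0), (15.835, 124.0), (18.709, 455.0)]
    for line in black_box_lines("Sr2Cr2O7", 1.54056, 0.0, peaks):
        print(line)
```

A real file would normally list about 20 peaks; the four used here only repeat the values shown in the sample file.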
Output
McMaille produces 4 or 5 types of output files :
name.imp : contains the details of the calculations and a final sorted summary.
The screen output delivers, for each symmetry examined, the first cell proposal,
Strategy
McMaille is a "brute force" program that can be "almost exhaustive" in grid search mode, provided the grid steps are very short. The only problem is : TIME. Calculations for the triclinic case with 1000 steps for each of the six cell parameters would lead to 1000000000000000000 (10^18) tests, which corresponds to many centuries at the current speed of 30000 MC steps per second in McMaille-v4.0 (it was "only" 1000 steps per second in McMaille-v2.0) for a monoprocessor running at 3GHz (multiply by ~1.8 or 3.8 for a core duo or a quadcore, respectively, using the parallelized version...). However, an exhaustive search is quite manageable in grid search mode (not yet parallelized...) with a step of 0.01 Angstrom for the cubic/hexagonal/tetragonal crystal systems.
The recommendation is : first use TREOR, DICVOL, ITO, CRYSFIRE. If there is no result, then apply McMaille on your fastest PC in an automated run (black box mode, NGRID = 3 or -3).
If McMaille is so slow, and if it is suggested to apply the classical software first, what is the interest of McMaille ? McMaille is rather insensitive to IMPURITIES. Note that "impurity" means supplementary phase(s) that do not contribute more than 10% of the total diffracted intensity. You should not expect McMaille to find solutions for mixtures of 2 or more unknown major phases (though, see below...). Obviously, known impurity peaks (identified by a search/match process) should be removed from the list of peaks submitted to McMaille.
Making several successive applications of McMaille is recommended : first cubic, then hexagonal and tetragonal (or those 3 crystal systems in one try), then orthorhombic if no clear solution appeared in the previous runs, then monoclinic under the same condition, and finally triclinic. The black box mode detailed below can do that for you :
BLACK BOX MODE :
Symmetry          max MC events   Pmax     Vmax
cubic             V*0.5           3*dmax   (3*dmax)**3 - no limit
hex/rhomb/tetra   400000          30       4000
orthorhombic      6x1000000       20       0-500-1000-1500-2000-2500-3000
monoclinic        6x10000000      20       0-500-1000-1500-2000-2500-3000
triclinic         8x1000000000    20       0 to 2000 by ranges of 250

Six runs in orthorhombic and monoclinic, and eight in triclinic, are made successively with different maximum volumes.
Other global fixed parameters : NDAT cut at 20 (if not less), NIND = 3, Pmin = 2., Vmin = 8., W = 0.30*wavelength/1.54056, SPAR = 0.02, SANG = 0.05, Rmin = 0.02, Rmax = 0.15, Rmaxref = 0.40
dmax is the d value of the first peak position at low diffraction angle.
Note : using NGRID = -3 avoids searching in triclinic.
This black box mode could solve simple cases. If not, using the manual modes (NGRID = 0, 1, or 2) would be necessary, enlarging the above cell parameter and volume limits. Try first in cubic symmetry (this is why the name-new.dat file is made for the cubic case), then go to lower symmetries if there is no result.
For recognizing the very best solution in a black box mode output,
FASTER PRELIMINARY TEST :
Repeat several Monte Carlo runs if nothing is produced (several Monte
Carlo runs will not use the same random number sequences, and will not
examine the same combinations of cell parameters). This is essentially
a question of chance...
TWO PHASES MODE (use
cautiously !):
TRICKS
NOTE0 : Keep an eye on the Rp column on the left in the DOS box while McMaille is executing. If it goes to very low values (<0.05 or even less), there may be a solution, so you may consider reading NOTE1 below and stopping the calculation.
NOTE1 : pressing the K key (capital letter, for Kill) will stop the program a few seconds (or minutes) later, saving the current results.
NOTE2 : If you find that McMaille monopolizes the CPU, then decrease its priority.
NOTE3 : LARGE CELLS
Examples
The test samples attached to the McMaille package (testn.dat) come mainly from the TREOR and DICVOL distribution package tests (with intensities arbitrarily set to 100.), plus some other examples such as Y2O3, NAC, and samples 1-3 from the SDPDRR-2 Round Robin. Running them on your own PC should produce the solutions. Examples of the time (Pentium IV 2.4GHz) needed by McMaille for its test files are below (all tests by Monte Carlo, not grid search) :
Cimetidine (cim.dat) : monoclinic - 9 seconds
NAC (nac.dat) : cubic - < 1 second
SDPDRR2 Sample 1 (sample1.dat) : monoclinic - 23 seconds
SDPDRR2 Sample 2 (sample2.dat) : monoclinic - > 6 minutes
SDPDRR2 Sample 3 (sample3.dat) : cubic - 1 second
Test 1 - Cd3(OH)5(NO3) (test1.dat) : orthorhombic - 3 seconds
Test 2 (test2.dat) : tetragonal - < 1 second
Test 3 (test3.dat) : orthorhombic - 5 seconds
Test 4 : monoclinic - less than 1 minute
Test 5 - (NH4)2S2O3 : monoclinic - 16 seconds
Test 6 : triclinic - small cell - < 2 minutes
Test 7 : cubic ??? - < 1 second
Test 8 : monoclinic - < 2 minutes
Test 9 : triclinic - < 1 minute
Y2O3 : cubic - < 1 second
See also the nameb.* files, which correspond to the Black Box mode.
In mixture1.imp (2 cubic phases), the correct couple of solutions appears in 15th position :

Rp2     Vol        Ind   Nsol   a         b         c         alpha    beta     gamma
0.113   1078.288   29    7      10.2544   10.2544   10.2544   90.000   90.000   90.000
0.165   1190.411   14    7      10.5982   10.5982   10.5982   90.000   90.000   90.000

In mixture2.imp (one tetragonal + one orthorhombic phase), the correct solution is the 1st :

Rp2     Vol        Ind   Nsol   a         b         c         alpha    beta     gamma
0.259   1188.120   30    13     11.1880   11.1880    9.4919   90.000   90.000   90.000
0.106    378.244   15    4      10.0276    3.4206   11.0274   90.000   90.000   90.000

Times may be different on your machine (could be less or more, this is Monte Carlo... you need chance). In 15-20 years, computers will be 2^10 to 2^13 times faster (x1000 to x8000 faster), at least, probably. Even grid search in triclinic will be manageable.
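The time estimate of the Strategy section and the speed-up projection above can be combined in a small calculation. The Python sketch below only restates that arithmetic : the function names are hypothetical, the rate of 30000 tests per second is the figure quoted for McMaille-v4.0 on a single processor, and applying the same rate to grid search steps is an assumption.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600

def grid_tests(steps_per_parameter, n_parameters):
    """Number of cells visited by an exhaustive grid search."""
    return steps_per_parameter ** n_parameters

def years_needed(n_tests, tests_per_second, speedup=1.0):
    """Run time in years at a given rate, optionally on a faster machine."""
    return n_tests / (tests_per_second * speedup) / SECONDS_PER_YEAR

if __name__ == "__main__":
    rate = 30000  # tests per second, as quoted for McMaille-v4.0
    for steps in (1000, 100):
        n = grid_tests(steps, 6)  # six free parameters in triclinic
        for speedup in (1, 1000, 8000):
            print(steps, "steps, speed-up x", speedup, ":",
                  years_needed(n, rate, speedup), "years")
```

With 1000 steps per parameter the triclinic grid contains 10^18 cells, about a million years at 30000 tests per second; with 100 steps per parameter it contains 10^12 cells, about one year at the same rate.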
Parallel computing is clearly the best way now :
To do
I have done a lot already, wasting randomly considerable time ;-)... Parallelizing the grid-search mode is not yet done. New bugs are occurring erratically in the parallelized version... It sometimes displays NaN, Infinity, etc instead of the usual nice numbers... This is very probably due to incorrect assignment of some variables (shared or private) in the OpenMP directives. If you are an expert in parallelizing Fortran codes with OpenMP directives, please help ;-). Send your comments, ideas and bug reports